eLife Assessment
This study examines an important question regarding the developmental trajectory of neural mechanisms supporting facial expression processing. Leveraging a rare intracranial EEG (iEEG) dataset including both children and adults, the authors reported that facial expression recognition mainly engaged the posterior superior temporal cortex (pSTC) among children, while both pSTC and the prefrontal cortex were engaged among adults. In terms of strength of evidence, the solid methods, data and analyses broadly support the claims with minor weaknesses.
Reviewer #1 (Public review):
Summary:
This study investigates how the brain processes facial expressions across development by analyzing intracranial EEG (iEEG) data from children (ages 5-10) and post-childhood individuals (ages 13-55). The researchers used a short film containing emotional facial expressions and applied AI-based models to decode brain responses to facial emotions. They found that in children, facial emotion information is represented primarily in the posterior superior temporal cortex (pSTC) - a sensory processing area - but not in the dorsolateral prefrontal cortex (DLPFC), which is involved in higher-level social cognition. In contrast, post-childhood individuals showed emotion encoding in both regions. Importantly, the complexity of emotions encoded in the pSTC increased with age, particularly for socially nuanced emotions like embarrassment, guilt, and pride. The authors claim that these findings suggest that emotion recognition matures through increasing involvement of the prefrontal cortex, supporting a developmental trajectory where top-down modulation enhances understanding of complex emotions as children grow older.
Strengths:
(1) The inclusion of pediatric iEEG makes this study uniquely positioned to offer high-resolution temporal and spatial insights into neural development compared to non-invasive approaches, e.g., fMRI, scalp EEG, etc.
(2) Using a naturalistic film paradigm enhances ecological validity compared to static image tasks often used in emotion studies.
(3) The idea of using state-of-the-art AI models to extract facial emotion features allows for high-dimensional and dynamic emotion labeling in real time.
Weaknesses:
(1) The study has notable limitations that constrain the generalizability and depth of its conclusions. The sample size was very small, with only nine children included and just two having sufficient electrode coverage in the posterior superior temporal cortex (pSTC), which weakens the reliability and statistical power of the findings, especially for analyses involving age. Authors pointed out that a similar sample size has been used in previous iEEG studies, but the cited works focus on adults and do not look at the developmental perspectives. Similar work looking at developmental changes in iEEG signals usually includes many more subjects (e.g., n = 101 children from Cross ZR et al., Nature Human Behavior, 2025) to account for inter-subject variabilities.
(2) Electrode coverage was also uneven across brain regions, with not all participants having electrodes in both the dorsolateral prefrontal cortex (DLPFC) and pSTC, making the conclusion regarding the different developmental changes between DLPFC and pSTC hard to interpret (related to point 3 below). It is understood that it is rare to have such iEEG data collected in this age group, and the electrode location is only determined by clinical needs. However, the scientific rigor should not be compromised by the limited data access. It's the authors' decision whether such an approach is valid and appropriate to address the scientific questions, here the developmental changes in the brain, given all the advantages and constraints of the data modality.
(3) The developmental differences observed were based on cross-sectional comparisons rather than longitudinal data, reducing the ability to draw causal conclusions about developmental trajectories. Also, see comments in point 2.
(4) Moreover, the analysis focused narrowly on DLPFC, neglecting other relevant prefrontal areas such as the orbitofrontal cortex (OFC) and anterior cingulate cortex (ACC), which play key roles in emotion and social processing. Agree that this might be beyond the scope of this paper, but a discussion section might be insightful.
(5) Although the use of a naturalistic film stimulus enhances ecological validity, it comes at the cost of experimental control, with no behavioral confirmation of the emotions perceived by participants and uncertain model validity for complex emotional expressions in children. A non-facial music block that could have served as a control was available but not analyzed. The validity of the AI model's emotion output needs to be tested. It is understood that we cannot collect these behavioral data retrospectively from the recorded subjects. Potential post-hoc experiments and analyses could be done, e.g., collecting behavioral emotion-perception data from age-matched healthy subjects.
(6) Generalizability is further limited by the fact that all participants were neurosurgical patients, potentially with neurological conditions such as epilepsy that may influence brain responses. At least some behavioral measures between the patient population and the healthy groups should be done to ensure the perception of emotions is similar.
(7) Additionally, the high temporal resolution of intracranial EEG was not fully utilized, as data were downsampled and averaged in 500-ms windows. It seems like the authors are trying to compromise the iEEG data analyses to match up with the AI's output resolution, which is 2Hz. It is not clear then why not directly use fMRI, which is non-invasive and seems to meet the needs here already. The advantages of using iEEG in this study are missing here.
(8) Finally, the absence of behavioral measures or eye-tracking data makes it difficult to directly link neural activity to emotional understanding or determine which facial features participants attended to. Related to point 5 as well.
Comments on revisions:
A behavioral measurement would help address many of these questions. If data collection continues, additional subjects with both iEEG recordings and behavioral measurements would be valuable.
Reviewer #2 (Public review):
Summary:
In this paper, Fan et al. aim to characterize how neural representations of facial emotions evolve from childhood to adulthood. Using intracranial EEG recordings from participants aged 5 to 55, the authors assess the encoding of emotional content in high-level cortical regions. They report that while both the posterior superior temporal cortex (pSTC) and dorsolateral prefrontal cortex (DLPFC) are involved in representing facial emotions in older individuals, only the pSTC shows significant encoding in children. Moreover, the encoding of complex emotions in the pSTC appears to strengthen with age. These findings lead the authors to suggest that young children rely more on low-level sensory areas and propose a developmental shift from reliance on lower-level sensory areas in early childhood to increased top-down modulation by the prefrontal cortex as individuals mature.
Strengths:
(1) Rare and valuable dataset: The use of intracranial EEG recordings in a developmental sample is highly unusual and provides a unique opportunity to investigate neural dynamics with both high spatial and temporal resolution.
(2) Developmentally relevant design: The broad age range and cross-sectional design are well-suited to explore age-related changes in neural representations.
(3) Ecological validity: The use of naturalistic stimuli (movie clips) increases the ecological relevance of the findings.
(4) Feature-based analysis: The authors employ AI-based tools to extract emotion-related features from naturalistic stimuli, which enables a data-driven approach to decoding neural representations of emotional content. This method allows for a more fine-grained analysis of emotion processing beyond traditional categorical labels.
Weaknesses:
(1) While the authors leverage Hume AI, a tool pre-trained on a large dataset, its specific performance on the stimuli used in this study remains unverified. To strengthen the foundation of the analysis, it would be important to confirm that Hume AI's emotional classifications align with human perception for these particular videos. A straightforward way to address this would be to recruit human raters to evaluate the emotional content of the stimuli and compare their ratings to the model's outputs.
(2) Although the study includes data from four children with pSTC coverage, an increase from the initial submission, the sample size remains modest compared to recent iEEG studies in the field.
(3) The "post-childhood" group (ages 13-55) conflates several distinct neurodevelopmental periods, including adolescence, young adulthood, and middle adulthood. As a finer age stratification is likely not feasible with the current sample size, I would suggest authors temper their developmental conclusions.
(4) The analysis of DLPFC-pSTC directional connectivity would be significantly strengthened by modeling it as a continuous function of age across all participants, rather than relying on an unbalanced comparison between a single child and a post-childhood group (N=7). This continuous approach would provide a more powerful and nuanced view of the developmental trajectory. I would also suggest including the result in the main text.
Author response:
The following is the authors’ response to the original reviews.
eLife Assessment
This study examines a valuable question regarding the developmental trajectory of neural mechanisms supporting facial expression processing. Leveraging a rare intracranial EEG (iEEG) dataset including both children and adults, the authors reported that facial expression recognition mainly engaged the posterior superior temporal cortex (pSTC) among children, while both pSTC and the prefrontal cortex were engaged among adults. However, the sample size is relatively small, with analyses appearing incomplete to fully support the primary claims.
Public Reviews:
Reviewer #1 (Public review):
Summary:
This study investigates how the brain processes facial expressions across development by analyzing intracranial EEG (iEEG) data from children (ages 5-10) and post-childhood individuals (ages 13-55). The researchers used a short film containing emotional facial expressions and applied AI-based models to decode brain responses to facial emotions. They found that in children, facial emotion information is represented primarily in the posterior superior temporal cortex (pSTC) - a sensory processing area - but not in the dorsolateral prefrontal cortex (DLPFC), which is involved in higher-level social cognition. In contrast, post-childhood individuals showed emotion encoding in both regions. Importantly, the complexity of emotions encoded in the pSTC increased with age, particularly for socially nuanced emotions like embarrassment, guilt, and pride. The authors claim that these findings suggest that emotion recognition matures through increasing involvement of the prefrontal cortex, supporting a developmental trajectory where top-down modulation enhances understanding of complex emotions as children grow older.
Strengths:
(1) The inclusion of pediatric iEEG makes this study uniquely positioned to offer high-resolution temporal and spatial insights into neural development compared to non-invasive approaches, e.g., fMRI, scalp EEG, etc.
(2) Using a naturalistic film paradigm enhances ecological validity compared to static image tasks often used in emotion studies.
(3) The idea of using state-of-the-art AI models to extract facial emotion features allows for high-dimensional and dynamic emotion labeling in real time.
Weaknesses:
(1) The study has notable limitations that constrain the generalizability and depth of its conclusions. The sample size was very small, with only nine children included and just two having sufficient electrode coverage in the posterior superior temporal cortex (pSTC), which weakens the reliability and statistical power of the findings, especially for analyses involving age.
We appreciate the reviewer’s point regarding the constrained sample size.
As an invasive method, iEEG recordings can only be obtained from patients undergoing electrode implantation for clinical purposes. Thus, iEEG data from young children are extremely rare, and rapidly increasing the sample size within a few years is not feasible. However, we are confident in the reliability of our main conclusions. Specifically, 8 children (53 recording contacts in total) and 13 control participants (99 recording contacts in total) with electrode coverage in the DLPFC are included in our DLPFC analysis. This sample size is comparable to other iEEG studies with similar experiment designs [1-3].
For pSTC, we returned to the data set and found another two children who had pSTC coverage. After including these children’s data, the group-level analysis using a permutation test showed that children’s pSTC significantly encodes facial emotion in naturalistic contexts (Figure 3B). Notably, the two new children’s (S33 and S49) responses were highly consistent with our previous observations. Moreover, the averaged prediction accuracy in children’s pSTC (r<sub>speech</sub>=0.1565) was highly comparable to that in the post-childhood group (r<sub>speech</sub>=0.1515).
(1) Zheng, J. et al. Multiplexing of Theta and Alpha Rhythms in the Amygdala-Hippocampal Circuit Supports Pattern Separation of Emotional Information. Neuron 102, 887-898.e5 (2019).
(2) Diamond, J. M. et al. Focal seizures induce spatiotemporally organized spiking activity in the human cortex. Nat. Commun. 15, 7075 (2024).
(3) Schrouff, J. et al. Fast temporal dynamics and causal relevance of face processing in the human temporal cortex. Nat. Commun. 11, 656 (2020).
(2) Electrode coverage was also uneven across brain regions, with not all participants having electrodes in both the dorsolateral prefrontal cortex (DLPFC) and pSTC, and most coverage limited to the left hemisphere, hindering within-subject comparisons and limiting insights into lateralization.
The electrode coverage in each patient is determined entirely by clinical needs. Only a few patients have electrodes in both DLPFC and pSTC because these two regions are far apart, so it’s rare for a single patient’s suspected seizure network to span such a large territory. However, this does not affect our results, as most iEEG studies combine data from multiple patients to achieve sufficient electrode coverage in each target brain area. As our data are mainly from the left hemisphere (due to clinical needs), this study was not designed to examine whether there is a difference between hemispheres in emotion encoding. Nevertheless, lateralization remains an interesting question that should be addressed in future research, and we have noted this limitation in the Discussion (Page 8, in the last paragraph of the Discussion).
(3) The developmental differences observed were based on cross-sectional comparisons rather than longitudinal data, reducing the ability to draw causal conclusions about developmental trajectories.
In the context of pediatric intracranial EEG, longitudinal data collection is not feasible due to the invasive nature of electrode implantation. We have added this point to the Discussion to acknowledge that while our results reveal robust age-related differences in the cortical encoding of facial emotions, longitudinal studies using non-invasive methods will be essential to directly track developmental trajectories (Page 8, in the last paragraph of Discussion). In addition, we revised our manuscript to avoid emphasizing causal conclusions about developmental trajectories in the current study (for example, we use “imply” instead of “suggest” in the fifth paragraph of Discussion).
(4) Moreover, the analysis focused narrowly on DLPFC, neglecting other relevant prefrontal areas such as the orbitofrontal cortex (OFC) and anterior cingulate cortex (ACC), which play key roles in emotion and social processing.
We agree that both OFC and ACC are critically involved in emotion and social processing. However, we have no recordings from these areas because ECoG rarely covers the ACC or OFC due to technical constraints. We have noted this limitation in the Discussion (Page 8, in the last paragraph of Discussion). Future follow-up studies using sEEG or non-invasive imaging methods could be used to examine developmental patterns in these regions.
(5) Although the use of a naturalistic film stimulus enhances ecological validity, it comes at the cost of experimental control, with no behavioral confirmation of the emotions perceived by participants and uncertain model validity for complex emotional expressions in children. A non-facial music block that could have served as a control was available but not analyzed.
The facial emotion features used in our encoding models were extracted by Hume AI models, which were trained on human intensity ratings of large-scale, experimentally controlled emotional expression data [1-2]. Thus, the outputs of the Hume AI model reflect what typical facial expressions convey, that is, the presented facial emotion. Our goal in the present study was to examine how facial emotions presented in the videos are encoded in the human brain at different developmental stages. We agree that children’s interpretation of complex emotions may differ from that of adults, resulting in a different perceived emotion (i.e., the emotion that the observer subjectively interprets). Behavioral ratings are necessary to study the encoding of subjectively perceived emotion, which is a very interesting direction but beyond the scope of the present work. We have added a paragraph in the Discussion (see Page 8) to explicitly note that our study focused on the encoding of presented emotion.
We appreciate the reviewer’s point regarding the value of non-facial music blocks. However, although there are segments in the music condition in which no faces are presented, these cannot be used as a control condition to test whether the encoding model’s prediction accuracy in pSTC or DLPFC drops to chance when no facial emotion is present. This is because, in the absence of faces, no extracted emotion features are available for constructing the encoding model (see Author response image 1 below). Thus, we chose to use a different control analysis for the present work. For children’s pSTC, we shuffled the facial emotion features in time to generate a null distribution, which was then used to test the statistical significance of the encoding models (see Methods/Encoding model fitting for details).
(1) Brooks, J. A. et al. Deep learning reveals what facial expressions mean to people in different cultures. iScience 27, 109175 (2024).
(2) Brooks, J. A. et al. Deep learning reveals what vocal bursts express in different cultures. Nat. Hum. Behav. 7, 240–250 (2023).
Author response image 1.
Time courses of Hume AI extracted facial expression features for the first block of the music condition. Only the top 5 facial expressions are shown here due to space limitations.
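The time-shuffling control described in the response to comment (5) can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes a single emotion feature and uses a plain Pearson correlation in place of the full encoding model, and the names `pearson_r` and `time_shuffle_p_value` are hypothetical.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant series
    return cov / sqrt(var_x * var_y)


def time_shuffle_p_value(feature, response, n_perm=1000, seed=0):
    """One-sided permutation p-value for the feature-response correlation.

    Assumption: one feature time series and one neural response series,
    both sampled at the same rate. Shuffling the feature in time breaks
    its alignment with the response while keeping its value distribution.
    """
    rng = random.Random(seed)
    observed = pearson_r(feature, response)
    shuffled = list(feature)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pearson_r(shuffled, response) >= observed:
            exceed += 1
    # Add-one correction so the p-value is never exactly zero.
    p_value = (exceed + 1) / (n_perm + 1)
    return observed, p_value
```

Note that a full shuffle also destroys the temporal autocorrelation of the feature; whether that is appropriate depends on the null hypothesis being tested, and alternatives such as circular shifts are sometimes preferred.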
(6) Generalizability is further limited by the fact that all participants were neurosurgical patients, potentially with neurological conditions such as epilepsy that may influence brain responses.
We appreciate the reviewer’s point. However, iEEG data can only be obtained from clinical populations (usually epilepsy patients) who have undergone electrode implantation. Given current knowledge about focal epilepsy and its potential effects on brain activity, researchers believe that epilepsy-affected brains can serve as a reasonable proxy for normal human brains when confounding influences are minimized through rigorous procedures [1]. In our study, we took several steps to ensure data quality: (1) all data segments containing epileptiform discharges were identified and removed at the very beginning of preprocessing, and (2) patients were asked to participate in the experiment several hours outside the window of seizures. Please see the Methods for a description of the data quality checks (Page 9/ Experimental procedures and iEEG data processing).
(1) Parvizi J, Kastner S. 2018. Promises and limitations of human intracranial electroencephalography. Nat Neurosci 21:474–483. doi:10.1038/s41593-018-0108-2
(7) Additionally, the high temporal resolution of intracranial EEG was not fully utilized, as data were down-sampled and averaged in 500-ms windows.
We agree that one of the major advantages of iEEG is its millisecond-level temporal resolution. In our case, the main reason for down-sampling was that the time series of facial emotion features extracted from the videos, which were used for modelling neural responses, had a temporal resolution of 2 Hz. In naturalistic contexts, facial emotion features do not change on a millisecond timescale, so a 500 ms window is sufficient to capture the relevant dynamics. Another advantage of iEEG is its tolerance to motion, which is excessive in young children (e.g., 5-year-olds). This makes our dataset uniquely valuable, suggesting robust representation in the pSTC but not in the DLPFC in young children. Moreover, since our method framework (Figure 1) does not rely on a method with high temporal resolution, it can be transferred to non-invasive modalities such as fMRI, enabling future studies to test these developmental patterns in larger populations.
(8) Finally, the absence of behavioral measures or eye-tracking data makes it difficult to directly link neural activity to emotional understanding or determine which facial features participants attended to.
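As a rough illustration of the windowing discussed in the response to comment (7), the sketch below averages a neural signal in consecutive 500 ms windows so that it matches a 2 Hz feature time series. It is a sketch under assumptions: the function name `average_in_windows` is hypothetical, the input is taken to be a one-dimensional signal such as a high-frequency envelope, and the sampling rate is a free parameter because the source does not state it here.

```python
def average_in_windows(samples, sampling_rate_hz, window_s=0.5):
    """Average a 1-D signal in consecutive non-overlapping windows.

    With window_s=0.5 the output has one value per 500 ms, i.e. 2 Hz.
    Trailing samples that do not fill a complete window are dropped.
    """
    window_len = int(round(sampling_rate_hz * window_s))
    if window_len < 1:
        raise ValueError("window is shorter than one sample")
    n_windows = len(samples) // window_len
    return [
        sum(samples[i * window_len:(i + 1) * window_len]) / window_len
        for i in range(n_windows)
    ]
```

For example, three seconds of signal sampled at a hypothetical 1000 Hz would yield six averaged values, one per 500 ms window.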
We appreciated this point. Part of our rationale is presented in our response to (5) for the absence of behavioral measures. Following the same rationale, identifying which facial features participants attended to is not necessary for testing our main hypotheses because our analyses examined responses to the overall emotional content of the faces. However, we agree and recommend future studies use eye-tracking and corresponding behavioral measures in studies of subjective emotional understanding.
Reviewer #2 (Public review):
Summary:
In this paper, Fan et al. aim to characterize how neural representations of facial emotions evolve from childhood to adulthood. Using intracranial EEG recordings from participants aged 5 to 55, the authors assess the encoding of emotional content in high-level cortical regions. They report that while both the posterior superior temporal cortex (pSTC) and dorsolateral prefrontal cortex (DLPFC) are involved in representing facial emotions in older individuals, only the pSTC shows significant encoding in children. Moreover, the encoding of complex emotions in the pSTC appears to strengthen with age. These findings lead the authors to suggest that young children rely more on low-level sensory areas and propose a developmental shift from reliance on lower-level sensory areas in early childhood to increased top-down modulation by the prefrontal cortex as individuals mature.
Strengths:
(1) Rare and valuable dataset: The use of intracranial EEG recordings in a developmental sample is highly unusual and provides a unique opportunity to investigate neural dynamics with both high spatial and temporal resolution.
(2) Developmentally relevant design: The broad age range and cross-sectional design are well-suited to explore age-related changes in neural representations.
(3) Ecological validity: The use of naturalistic stimuli (movie clips) increases the ecological relevance of the findings.
(4) Feature-based analysis: The authors employ AI-based tools to extract emotion-related features from naturalistic stimuli, which enables a data-driven approach to decoding neural representations of emotional content. This method allows for a more fine-grained analysis of emotion processing beyond traditional categorical labels.
Weaknesses:
(1) The emotional stimuli included facial expressions embedded in speech or music, making it difficult to isolate neural responses to facial emotion per se from those related to speech content or music-induced emotion.
We thank the reviewer for raising this important point. We agree that in naturalistic settings, faces often co-occur with speech, and that these sources of emotion can overlap. However, emotions induced by background music have distinct temporal dynamics that are separable from facial emotion (see Author response image 2 (A) and (B) below). In addition, faces can convey a wide range of emotions (48 categories in the Hume AI model), whereas music conveys far fewer (13 categories reported by a recent study [1]). Thus, when using facial emotion feature time series as regressors (with 48 emotion categories and rapid temporal dynamics), the model performance will reflect neural encoding of facial emotion in the music condition, rather than the slower and lower-dimensional emotion from music.
For the speech condition, we acknowledge that it is difficult to fully isolate neural responses to facial emotion from those to speech when the emotional content from faces and speech highly overlaps. However, in our study, (1) the time courses of emotion features from face and voice are still different (Author response image 2 (C) and (D)), and (2) our main finding that DLPFC encodes facial expression information in post-childhood individuals but not in young children was found in both the speech and music conditions (Figure 2B and 2C). In the music condition, neural responses to facial emotion are not affected by speech. Thus, we have included the DLPFC results from the music condition in the revised manuscript (Figure 2C), and we acknowledge that this issue should be carefully considered in future studies using videos with speech, as we have indicated in the future directions in the last paragraph of Discussion.
(1) Cowen, A. S., Fang, X., Sauter, D. & Keltner, D. What music makes us feel: At least 13 dimensions organize subjective experiences associated with music across different cultures. Proc Natl Acad Sci USA 117, 1924–1934 (2020).
Author response image 2.
Time courses of amusement. (A) and (B) Amusement conveyed by face or music in a 30-s music block. Facial emotion features are extracted by Hume AI. For emotion from music, we approximated the amusement time course using a weighted combination of low-level acoustic features (RMS energy, spectral centroid, MFCCs), which capture intensity, brightness, and timbre cues linked to amusement. Notice that music continues when there are no faces presented. (C) and (D) Amusement conveyed by face or voice in a 30-s speech block. From 0 to 5 seconds, a girl is introducing her friend to a stranger. The camera focuses on the friend, who appears nervous, while the girl’s voice sounds cheerful. This mismatch explains why the shapes of the two time series differ at the beginning. Such situations occur frequently in naturalistic movies.
(2) While the authors leveraged Hume AI to extract facial expression features from the video stimuli, they did not provide any validation of the tool's accuracy or reliability in the context of their dataset. It remains unclear how well the AI-derived emotion ratings align with human perception, particularly given the complexity and variability of naturalistic stimuli. Without such validation, it is difficult to assess the interpretability and robustness of the decoding results based on these features.
Hume AI models were trained and validated using human intensity ratings of large-scale, experimentally controlled emotional expression data [1-2]. The training process used both manual annotations from human raters and deep neural networks. Over 3000 human raters categorized facial expressions into emotion categories and rated them on a 1-100 intensity scale. Thus, the outputs of the Hume AI model reflect what typical facial expressions convey (based on how people actually interpret them), that is, the presented facial emotion. Our goal in the present study was to examine how facial emotions presented in the videos are encoded in the human brain at different developmental stages. We agree that the interpretation of facial emotions may differ across individual participants, resulting in a different perceived emotion (i.e., the emotion that the observer subjectively interprets). Behavioral ratings are necessary to study the encoding of subjectively perceived emotion, which is a very interesting direction but beyond the scope of the present work. We have added text in the Discussion to explicitly note that our study focused on the encoding of presented emotion (second paragraph in Page 8).
(1) Brooks, J. A. et al. Deep learning reveals what facial expressions mean to people in different cultures. iScience 27, 109175 (2024).
(2) Brooks, J. A. et al. Deep learning reveals what vocal bursts express in different cultures. Nat. Hum. Behav. 7, 240–250 (2023).
(3) Only two children had relevant pSTC coverage, severely limiting the reliability and generalizability of results.
We appreciate this point and agree with both reviewers who raised it as a significant concern. As described in response to reviewer 1 (comment 1), we have added data from another two children who have pSTC coverage. Group-level analysis using a permutation test showed that children’s pSTC significantly encodes facial emotion in naturalistic contexts (Figure 3B). Because iEEG data from young children are extremely rare, rapidly increasing the sample size within a few years is not feasible. However, we are confident in the reliability of our conclusion that children’s pSTC can encode facial emotion. First, the two new children’s responses (S33 and S49) from pSTC were highly consistent with our previous observations (see individual data in Figure 3B). Second, the averaged prediction accuracy in children’s pSTC (r<sub>speech</sub>=0.1565) was highly comparable to that in the post-childhood group (r<sub>speech</sub>=0.1515).
(4) The rationale for focusing exclusively on high-frequency activity for decoding emotion representations is not provided, nor are results from other frequency bands explored.
We focused on high-frequency broadband (HFB) activity because it is widely considered to reflect the responses of local neuronal populations near the recording electrode, whereas low-frequency oscillations in the theta, alpha, and beta ranges are thought to serve as carrier frequencies for long-range communication across distributed networks[1-2]. Since our study aimed to examine the representation of facial emotion in localized cortical regions (DLPFC and pSTC), HFB activity provides the most direct measure of the relevant neural responses. We have added this rationale to the manuscript (Page 3).
(1) Parvizi, J. & Kastner, S. Promises and limitations of human intracranial electroencephalography. Nat. Neurosci. 21, 474–483 (2018).
(2) Buzsaki, G. Rhythms of the Brain. (Oxford University Press, Oxford, 2006).
(5) The hypothesis of developmental emergence of top-down prefrontal modulation is not directly tested. No connectivity or co-activation analyses are reported, and the number of participants with simultaneous coverage of pSTC and DLPFC is not specified.
Directional connectivity analysis results were not shown because only one child has simultaneous coverage of pSTC and DLPFC. However, the Granger causality results from the post-childhood group (N=7) clearly showed that the influence in the alpha/beta band from DLPFC to pSTC (top-down) gradually increased after the onset of face presentation (Author response image 3, below left, plotted in red). By comparison, the influence in the alpha/beta band from pSTC to DLPFC (bottom-up) gradually decreased after the onset of face presentation (Author response image 3, below left, blue curve). The influence in the alpha/beta band from DLPFC to pSTC was significantly increased at 750 and 1250 ms after the face presentation (face vs nonface, paired t-test, Bonferroni corrected P=0.005, 0.006), suggesting an enhanced top-down modulation in the post-childhood group while watching emotional faces. Interestingly, this top-down influence appears very different in the 8-year-old child at 1250 ms after the face presentation (Author response image 3, below left, black curve).
As we cannot draw direct conclusions from the single-subject sample presented here, the top-down hypothesis is introduced only as a possible explanation for our current results. We have removed potentially misleading statements, and we plan to test this hypothesis directly using MEG in the future.
Author response image 3.
Difference of Granger causality indices (face – nonface) in the alpha/beta and gamma bands for both directions. We identified a series of face onsets in the movie that participants watched. Each trial was defined as -0.1 to 1.5 s relative to the onset. For the non-face control trials, we used houses, animals and scenes. Granger causality was calculated for the 0-0.5 s, 0.5-1 s and 1-1.5 s time windows. For the post-childhood group, GC indices were averaged across participants. Error bars show SEM.
(6) The "post-childhood" group spans ages 13-55, conflating adolescence, young adulthood, and middle age. Developmental conclusions would benefit from finer age stratification.
We appreciate this insightful comment. Our current sample size does not allow such stratification. But we plan to address this important issue in future MEG studies with larger cohorts.
(7) The so-called "complex emotions" (e.g., embarrassment, pride, guilt, interest) used in the study often require contextual information, such as speech or narrative cues, for accurate interpretation, and are not typically discernible from facial expressions alone. As such, the observed age-related increase in neural encoding of these emotions may reflect not solely the maturation of facial emotion perception, but rather the development of integrative processing that combines facial, linguistic, and contextual cues. This raises the possibility that the reported effects are driven in part by language comprehension or broader social-cognitive integration, rather than by changes in facial expression processing per se.
We agree with this interpretation. Indeed, our results already show that speech influences the encoding of facial emotion in the DLPFC differently in the childhood and post-childhood groups (Figure 2D), suggesting that children’s ability to integrate multiple cues is still developing. Future studies are needed to systematically examine how linguistic cues and prior experiences contribute to the understanding of complex emotions from faces; we have added this point to our future directions section (last paragraph in Discussion, Pages 8-9).
Recommendations for the authors:
Reviewer #1 (Recommendations for the authors):
In the introduction: "These neuroimaging data imply that social and emotional experiences shape the prefrontal cortex's involvement in processing the emotional meaning of faces throughout development, probably through top-down modulation of early sensory areas." Aren't these supposed to be iEEG data instead of neuroimaging?
Corrected.
Reviewer #2 (Recommendations for the authors):
This manuscript would benefit from several improvements to strengthen the validity and interpretability of the findings:
(1) Increase the sample size, especially for children with pSTC coverage.
We added data from another two children who have pSTC coverage. Please see our response to reviewer 2’s comment 3 and reviewer 1’s comment 1.
(2) Include directional connectivity analyses to test the proposed top-down modulation from DLPFC to pSTC.
Thanks for the suggestion. Please see our response to reviewer 2’s comment 5.
(3) Use controlled stimuli in an additional experiment to separate the effects of facial expression, speech, and music.
This is an excellent point. However, iEEG data collection from children is an exceptionally rare opportunity and typically requires many years, so we are unable to add a controlled-stimulus experiment to the current study. We plan to use controlled stimuli to study the processing of complex emotions with non-invasive methods in the future. In addition, please see our response to reviewer 2’s comment 1 for a description of how neural responses to facial expression and music are separated in our study.
-
eLife Assessment
This important study introduces an advance in multi-animal tracking by reframing identity assignment as a self-supervised contrastive representation learning problem. It eliminates the need for segments of video where all animals are simultaneously visible and individually identifiable, and significantly improves tracking speed, accuracy, and robustness with respect to occlusion. This innovation has implications beyond animal tracking, potentially connecting with advances in behavioral analysis and computer vision. The strength of support for these advances is compelling overall, although there were some remaining minor methodological concerns.
-
Reviewer #1 (Public review):
Summary:
This is a strong paper that presents a clear advance in multi-animal tracking. The authors introduce an updated version of idtracker.ai that reframes identity assignment as a contrastive representation learning problem rather than a classification task requiring global fragments. This change leads to substantial gains in speed and accuracy and removes a known bottleneck in the original system. The benchmarking across species is comprehensive, the results are convincing, and the work significant.
Strengths:
The main strengths are the conceptual shift from classification to representation learning, the clear performance gains, and the improved robustness of the new version. Removing the need for global fragments makes the software much more flexible in practice, and the accuracy and speed improvements are well demonstrated across a diverse set of datasets. The authors' response also provides further support for the method's robustness.
The comparison to other methods is now better documented. The authors clarify which features are used, how failures are defined, how parameters are sampled, and how accuracy is assessed against human-validated data. This helps ensure that the evaluation is fair and that readers can understand the assumptions behind the benchmarks.
The software appears thoughtfully implemented, with GUI updates, integration with pose estimators, and tools such as idmatcher.ai for linking identities across videos. The overall presentation has been improved so that the limitations of the original idtracker.ai, the engineering optimizations, and the new contrastive formulation are more clearly separated. This makes the central ideas and contributions easier to follow.
Weaknesses:
I do not have major remaining criticisms. The authors have addressed my earlier concerns about the clarity and fairness of the comparison with prior methods, the benchmark design, and the memory usage analysis by adding methodological detail and clearly explaining their choices. At this point I view these aspects as transparent features of the experimental design that readers can take into account, rather than weaknesses of the work.
Overall, this is a high-quality paper. The improvements to idtracker.ai are well justified and practically significant, and the authors' response addresses the main concerns about clarity and evaluation. The conceptual contribution, thorough empirical validation, and thoughtful software implementation make this a valuable and impactful contribution to multi-animal tracking.
-
Reviewer #3 (Public review):
Summary:
The authors propose a new version of idTracker.ai for animal tracking. Specifically, they apply contrastive learning to embed cropped images of animals into a feature space where clusters correspond to individual animal identities. By doing this, they address the requirement for so-called global fragments - segments of the video, in which all entities are visible/detected at the same time. In general, the new method reduces the long tracking times from the previous versions, while also increasing the average accuracy of assigning the identity labels.
Strengths and weaknesses:
The authors have reorganized and rewritten a substantial portion of their manuscript, which has improved the overall clarity and structure to some extent. In particular, omitting the different protocols enhanced readability. However, all technical details are now in the appendix, which is referred to even more frequently in the manuscript; frequent appendix references were already an issue in the initial submission. These frequent references to the appendix - and even to appendices from previous versions - make it difficult to read and fully understand the method and the evaluations in detail. A more self-contained description of the method within the main text would be highly appreciated.
Furthermore, the authors state that they changed their evaluation metric from accuracy to IDF1. However, throughout the manuscript they continue to refer to "accuracy" when evaluating and comparing results. It is unclear which accuracy metric was used or whether the authors are confusing the two metrics. This point needs clarification, as IDF1 is not an "accuracy" measure but rather an F1-score over identity assignments.
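To make the distinction concrete, the following is a minimal sketch, assuming the usual identity-metric definition IDF1 = 2·IDTP / (2·IDTP + IDFP + IDFN). The function names and the toy counts are illustrative assumptions and are not taken from the manuscript; "accuracy" is interpreted here as the fraction of assigned identities that are correct, which may not be the definition the authors use.

```python
def idf1(idtp: int, idfp: int, idfn: int) -> float:
    """F1-score over identity assignments: 2*IDTP / (2*IDTP + IDFP + IDFN)."""
    denom = 2 * idtp + idfp + idfn
    return 2 * idtp / denom if denom else 0.0


def identity_accuracy(correct: int, assigned: int) -> float:
    """Fraction of assigned identities that are correct (assumed 'accuracy')."""
    return correct / assigned if assigned else 0.0


# Hypothetical counts: 1000 ground-truth detections; the tracker assigns an
# identity to 900 of them, and 891 of those assignments are correct.
idtp = 891
idfp = 900 - idtp    # assigned but wrong
idfn = 1000 - idtp   # ground-truth detections without a correct identity

print("accuracy over assigned:", identity_accuracy(idtp, 900))
print("IDF1:", idf1(idtp, idfp, idfn))
```

In this toy case the two numbers diverge because unassigned detections are penalized by IDF1 but are invisible to an accuracy computed only over assigned identities.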
The authors compare the speedups of the new version with those of the previous ones by taking the average. However, it appears that there are striking outliers in the tracking performance data (see Supplementary Table 1-4). Therefore, using the average may not be the most appropriate way to compare. The authors should consider using the median or providing more detailed statistics (e.g., boxplots) to better illustrate the distributions.
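The concern about averaging can be illustrated with a short sketch. The speedup values below are hypothetical and are not the values in the supplementary tables; they only show how a single extreme video can dominate the mean while leaving the median unchanged.

```python
import statistics

# Hypothetical per-video speedup factors; one difficult video dominates.
speedups = [3.0, 4.0, 5.0, 6.0, 440.0]

mean_speedup = statistics.mean(speedups)      # pulled up by the outlier
median_speedup = statistics.median(speedups)  # robust to the outlier
# Quartiles, i.e. the summary a boxplot would display.
q1, q2, q3 = statistics.quantiles(speedups, n=4)

print("mean:", mean_speedup)
print("median:", median_speedup)
print("quartiles:", q1, q2, q3)
```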
The authors did not provide any conclusion or discussion section. Including a concise conclusion that summarizes the main findings and their implications would help to convey the message of the manuscript.
The authors report an improvement in the mean accuracy across all benchmarks from 99.49% to 99.82% (with crossings). While this represents a slight improvement, the datasets used for benchmarking seem relatively simple and already largely "solved". Therefore, the impact of this work on the field may be limited. It would be more informative to evaluate the method on more challenging datasets that include frequent occlusions, crossings, or animals with similar appearances. The accuracy reported in the main text is "without crossings"; this seems like an incomplete evaluation, especially since tracking objects that do not cross seems a straightforward task. The manuscript does not explain why crossings are a problem and why they are dealt with separately. Several videos have a much lower tracking accuracy; explaining what the challenges of these videos are and why the method fails in such cases would help readers understand the method's usability and weak points.
-
Author response:
The following is the authors’ response to the original reviews.
Public Reviews:
Reviewer #1 (Public review):
Summary
This is a strong paper that presents a clear advance in multi-animal tracking. The authors introduce an updated version of idtracker.ai that reframes identity assignment as a contrastive learning problem rather than a classification task requiring global fragments. This change leads to gains in speed and accuracy. The method eliminates a known bottleneck in the original system, and the benchmarking across species is comprehensive and well executed. I think the results are convincing and the work is significant.
Strengths
The main strengths are the conceptual shift from classification to representation learning, the clear performance gains, and the fact that the new version is more robust. Removing the need for global fragments makes the software more flexible in practice, and the accuracy and speed improvements are well demonstrated. The software appears thoughtfully implemented, with GUI updates and integration with pose estimators.
Weaknesses
I don't have any major criticisms, but I have identified a few points that should be addressed to improve the clarity and accuracy of the claims made in the paper.
(1) The title begins with "New idtracker.ai," which may not age well and sounds more promotional than scientific. The strength of the work is the conceptual shift to contrastive representation learning, and it might be more helpful to emphasize that in the title rather than branding it as "new."
We considered using “Contrastive idtracker.ai”. However, we thought that readers could then think that we believe they could use either the old idtracker.ai or this contrastive version. But we want to say that the new version is the one to use, as it is better in both accuracy and tracking time. We think “New idtracker.ai” communicates better that this is the version we recommend.
(2) Several technical points regarding the comparison between TRex (a system evaluated in the paper) and idtracker.ai should be addressed to ensure the evaluation is fair and readers are fully informed.
(2.1) Lines 158-160: The description of TRex as based on "Protocol 2 of idtracker.ai" overlooks several key additions in TRex, such as posture image normalization, tracklet subsampling, and the use of uniqueness feedback during training. These features are not acknowledged, and it's unclear whether TRex was properly configured - particularly regarding posture estimation, which appears to have been omitted but isn't discussed. Without knowing the actual parameters used to make comparisons, it's difficult to assess how the method was evaluated.
We added the information about the key additions of TRex in the section “The new idtracker.ai uses representation learning”, lines 153-157. Posture estimation in TRex was not explicitly used but neither disabled during the benchmark; we clarified this in the last paragraph of “Benchmark of accuracy and tracking time”, lines 492-495.
(2.2) Lines 162-163: The paper implies that TRex gains speed by avoiding Protocol 3, but in practice, idtracker.ai also typically avoids using Protocol 3 due to its extremely long runtime. This part of the framing feels more like a rhetorical contrast than an informative one.
We removed this, see new lines 153-157.
(2.3) Lines 277-280: The contrastive loss function is written using the label l, but since it refers to a pair of images, it would be clearer and more precise to write it as l_{I,J}. This would help readers unfamiliar with contrastive learning understand the formulation more easily.
We added this change in lines 613-620.
(2.4) Lines 333-334: The manuscript states that TRex can fail to track certain videos, but this may be inaccurate depending on how the authors classify failures. TRex may return low uniqueness scores if training does not converge well, but this isn't equivalent to tracking failure. Moreover, the metric reported by TRex is uniqueness, not accuracy. Equating the two could mislead readers. If the authors did compare outputs to human-validated data, that should be stated more explicitly.
We observed TRex crashing without outputting any trajectories on some occasions (Appendix 1—figure 1), and this is what we labeled as “failure”. These failures happened in the most difficult videos of our benchmark, that’s why we treated them the same way as idtracker.ai going to P3. We clarified this in new lines 464-469.
The accuracy measured in our benchmark is not estimated; it is human-validated (see section Computation of tracking accuracy in Appendix 1). Both software packages report some quality estimators at the end of a tracking run (“estimated accuracy” for idtracker.ai and “uniqueness” for TRex), but these were not used in the benchmark.
(2.5) Lines 339-341: The evaluation approach defines a "successful run" and then sums the runtime across all attempts up to that point. If success is defined as simply producing any output, this may not reflect how experienced users actually interact with the software, where parameters are iteratively refined to improve quality.
Yes, our benchmark was designed to be agnostic to the user's level of experience. It was also designed for users who do not inspect the trajectories in order to choose parameters again, so as to leave no room for subjectivity.
(2.6) Lines 344-346: The simulation process involves sampling tracking parameters 10,000 times and selecting the first "successful" run. If parameter tuning is randomized rather than informed by expert knowledge, this could skew the results in favor of tools that require fewer or simpler adjustments. TRex relies on more tunable behavior, such as longer fragments improving training time, which this approach may not capture.
We precisely used the TRex parameter track_max_speed to elongate fragments for optimal tracking. Rather than randomized parameter tuning, we defined the “valid range” for this parameter so that all values in it would produce a decent fragment structure. We used this procedure to avoid worsening those methods that use more parameters.
(2.7) Line 354 onward: TRex was evaluated using two varying parameters (threshold and track_max_speed), while idtracker.ai used only one (intensity_threshold). With a fixed number of samples, this asymmetry could bias results against TRex. In addition, users typically set these parameters based on domain knowledge rather than random exploration.
idtracker.ai and TRex have several parameters. Some of them have a single correct value (e.g. number of animals) or the default value that the system computes is already good (e.g. minimum blob size). For a second type of parameters, the system finds a value that is in general not as good, so users need to modify them. In general, users find that for this second type of parameter there is a valid interval of possible values, from which they need to choose a single value to run the system. idtracker.ai has intensity_threshold as the only parameter of this second type and TRex has two: threshold and track_max_speed. For these parameters, choosing one value or another within the valid interval can give different tracking results. Therefore, when we model a user that wants to run the system once except if it goes to P3 (idtracker.ai) or except if it crashes (TRex), it is these parameters we sample from within the valid interval to get a different value for each run of the system. We clarify this in lines 452-469 of the section “Benchmark of accuracy and tracking time”.
Note that if we chose to simply run old idtracker.ai (v4 or v5) or TRex a single time, this would benefit the new idtracker.ai (v6). This is because old idtracker.ai can enter the very slow protocol 3 and TRex can fail to track. So running old idtracker.ai or TRex up to 5 times until old idtracker.ai does not use Protocol 3 and TRex does not fail is to make them as good as they can be with respect to the new idtracker.ai.
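The user model described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the benchmark code: `simulate_user`, `run_tracker`, and the toy `fake_tracker` are hypothetical names, the parameter is drawn uniformly from the valid interval, and the last attempt is kept if none of the attempts succeeds.

```python
import random
from typing import Callable, Optional, Tuple

# A tracker run returns (success, accuracy, elapsed_time).
RunResult = Tuple[bool, float, float]


def simulate_user(
    run_tracker: Callable[[float], RunResult],
    valid_range: Tuple[float, float],
    max_attempts: int = 5,
    rng: Optional[random.Random] = None,
) -> Tuple[float, float]:
    """Return (accuracy of the final attempt, time summed over all attempts)."""
    rng = rng or random.Random()
    total_time = 0.0
    accuracy = 0.0
    for _ in range(max_attempts):
        # Assumption: a new value is drawn uniformly from the valid interval.
        param = rng.uniform(*valid_range)
        success, accuracy, elapsed = run_tracker(param)
        total_time += elapsed  # failed attempts still cost the user time
        if success:
            break
    return accuracy, total_time


def fake_tracker(param: float) -> RunResult:
    """Hypothetical stand-in: low parameter values trigger a slow failure."""
    success = param >= 0.3
    return success, (0.99 if success else 0.95), (10.0 if success else 100.0)


# Repeating the simulated user many times gives a distribution of outcomes.
rng = random.Random(0)
outcomes = [simulate_user(fake_tracker, (0.0, 1.0), rng=rng) for _ in range(1000)]
```

Summing the time over failed attempts is what makes this model penalize a tool for runs that enter a slow fallback or crash.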
(2.8) Figure 2-figure supplement 3: The memory usage comparison lacks detail. It's unclear whether RAM or VRAM was measured, whether shared or compressed memory was included, or how memory was sampled. Since both tools dynamically adjust to system resources, the relevance of this comparison is questionable without more technical detail.
We modified the text in the caption (new Figure 1-figure supplement 2) adding the kind of memory we measured (RAM) and how we measured it. We already have a disclaimer for this plot saying that memory management depends on the machine's available resources. We agree that this is a simple analysis of the usage of computer resources.
(3) While the authors cite several key papers on contrastive learning, they do not use the introduction or discussion to effectively situate their approach within related fields where similar strategies have been widely adopted. For example, contrastive embedding methods form the backbone of modern facial recognition and other image similarity systems, where the goal is to map images into a latent space that separates identities or classes through clustering. This connection would help emphasize the conceptual strength of the approach and align the work with well-established applications. Similarly, there is a growing literature on animal re-identification (ReID), which often involves learning identity-preserving representations across time or appearance changes. Referencing these bodies of work would help readers connect the proposed method with adjacent areas using similar ideas, and show that the authors are aware of and building on this wider context.
We have now added a new section in Appendix 3, “Differences with previous work in contrastive/metric learning” (lines 792-841) to include references to previous work and a description of what we do differently.
(4) Some sections of the Results text (e.g., lines 48-74) read more like extended figure captions than part of the main narrative. They include detailed explanations of figure elements, sorting procedures, and video naming conventions that may be better placed in the actual figure captions or moved to supplementary notes. Streamlining this section in the main text would improve readability and help the central ideas stand out more clearly.
Thank you for pointing this out. We have rewritten the Results, for example streamlining the old lines 48-74 (new lines 42-48) by moving the comments about names, files and order of videos to the caption of Figure 1.
Overall, though, this is a high-quality paper. The improvements to idtracker.ai are well justified and practically significant. Addressing the above comments will strengthen the work, particularly by clarifying the evaluation and comparisons.
We thank the reviewer for the detailed suggestions. We believe we have taken all of them into consideration to improve the ms.
Reviewer #2 (Public review):
Summary:
This work introduces a new version of the state-of-the-art idtracker.ai software for tracking multiple unmarked animals. The authors aimed to solve a critical limitation of their previous software, which relied on the existence of "global fragments" (video segments where all animals are simultaneously visible) to train an identification classifier network, in addition to addressing concerns with runtime speed. To do this, the authors have both re-implemented the backend of their software in PyTorch (in addition to numerous other performance optimizations) as well as moving from a supervised classification framework to a self-supervised, contrastive representation learning approach that no longer requires global fragments to function. By defining positive training pairs as different images from the same fragment and negative pairs as images from any two co-existing fragments, the system cleverly takes advantage of partial (but high-confidence) tracklets to learn a powerful representation of animal identity without direct human supervision. Their formulation of contrastive learning is carefully thought out and comprises a series of empirically validated design choices that are both creative and technically sound. This methodological advance is significant and directly leads to the software's major strengths, including exceptional performance improvements in speed and accuracy and a newfound robustness to occlusion (even in severe cases where no global fragments can be detected). Benchmark comparisons show the new software is, on average, 44 times faster (up to 440 times faster on difficult videos) while also achieving higher accuracy across a range of species and group sizes. 
This new version of idtracker.ai is shown to consistently outperform the closely related TRex software (Walter & Couzin, 2021), which, together with the engineering innovations and usability enhancements (e.g., outputs convenient for downstream pose estimation), positions this tool as an advancement on the state-of-the-art for multi-animal tracking, especially for collective behavior studies.
Despite these advances, we note a number of weaknesses and limitations that are not well addressed in the present version of this paper:
Weaknesses
(1) The contrastive representation learning formulation. Contrastive representation learning using deep neural networks has long been used for problems in the multi-object tracking domain, popularized through ReID approaches like DML (Yi et al., 2014) and DeepReID (Li et al., 2014). More recently, contrastive learning has become more popular as an approach for scalable self-supervised representation learning for open-ended vision tasks, as exemplified by approaches like SimCLR (Chen et al., 2020), SimSiam (Chen et al., 2020), and MAE (He et al., 2021) and instantiated in foundation models for image embedding like DINOv2 (Oquab et al., 2023). Given their prevalence, it is useful to contrast the formulation of contrastive learning described here relative to these widely adopted approaches (and why this reviewer feels it is appropriate):
(1.1) No rotations or other image augmentations are performed to generate positive examples. These are not necessary with this approach since the pairs are sampled from heuristically tracked fragments (which produces sufficient training data, though see weaknesses discussed below) and the crops are pre-aligned egocentrically (mitigating the need for rotational invariance).
(1.2) There is no projection head in the architecture, like in SimCLR. Since classification/clustering is the only task that the system is intended to solve, the more general "nuisance" image features that this architectural detail normally affords are not necessary here.
(1.3) There is no stop gradient operator like in BYOL (Grill et al., 2020) or SimSiam. Since the heuristic tracking implicitly produces plenty of negative pairs from the fragments, there is no need to prevent representational collapse due to class asymmetry. Some care is still needed, but the authors address this well through a pair sampling strategy (discussed below).
(1.4) Euclidean distance is used as the distance metric in the loss rather than cosine similarity as in most contrastive learning works. While cosine similarity coupled with L2-normalized unit hypersphere embeddings has proven to be a successful recipe to deal with the curse of dimensionality (with the added benefit of bounded distance limits), the authors address this through a cleverly constructed loss function that essentially allows direct control over the intra- and inter-cluster distance (D_pos and D_neg). This is a clever formulation that aligns well with the use of K-means for the downstream assignment step.
No concerns here, just clarifications for readers who dig into the review. Referencing the above literature would enhance the presentation of the paper to align with the broader computer vision literature.
Thank you for this detailed comparison. We have now added a new section in Appendix 3, “Differences with previous work in contrastive/metric learning” (lines 792-841) to include references to previous work and a description of what we do differently, including the points raised by the reviewer.
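For readers less familiar with this family of losses, the sketch below shows one common double-margin contrastive form that is consistent with the description in the review (Euclidean distance, margins D_pos and D_neg, and a pair label l_{I,J} equal to 1 for a positive pair and 0 for a negative pair). It is an assumption for illustration only; the exact loss in the manuscript may differ, and the default margin values are hypothetical.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(
    emb_i: Sequence[float],
    emb_j: Sequence[float],
    l_ij: int,
    d_pos: float = 1.0,   # hypothetical margin: positives pulled within d_pos
    d_neg: float = 10.0,  # hypothetical margin: negatives pushed beyond d_neg
) -> float:
    """Double-margin contrastive loss for one pair of embeddings.

    Positive pairs (l_ij == 1) are penalized only when farther apart than
    d_pos; negative pairs (l_ij == 0) only when closer than d_neg.
    """
    d = euclidean(emb_i, emb_j)
    if l_ij == 1:
        return max(0.0, d - d_pos) ** 2
    return max(0.0, d_neg - d) ** 2


# Two embeddings at distance 5: too far for a positive, too close for a negative.
print(contrastive_loss([0.0, 0.0], [3.0, 4.0], l_ij=1))
print(contrastive_loss([0.0, 0.0], [3.0, 4.0], l_ij=0))
```

With two margins, pairs that already satisfy their constraint contribute zero loss, which is one way to obtain the direct control over intra- and inter-cluster distances mentioned in point (1.4).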
(2) Network architecture for image feature extraction backbone. As most of the computations that drive up processing time happen in the network backbone, the authors explored a variety of architectures to assess speed, accuracy, and memory requirements. They land on ResNet18 due to its empirically determined performance. While the experiments that support this choice are solid, the rationale behind the architecture selection is somewhat weak. The authors state that: "We tested 23 networks from 8 different families of state-of-the-art convolutional neural network architectures, selected for their compatibility with consumer-grade GPUs and ability to handle small input images (20 × 20 to 100 × 100 pixels) typical in collective animal behavior videos."
(2.1) Most modern architectures have variants that are compatible with consumer-grade GPUs. This is true of, for example, HRNet (Wang et al., 2019), ViT (Dosovitskiy et al., 2020), SwinT (Liu et al., 2021), or ConvNeXt (Liu et al., 2022), all of which report single GPU training and fast runtime speeds through lightweight configuration or subsequent variants, e.g., MobileViT (Mehta et al., 2021). The authors may consider revising that statement or providing additional support for that claim (e.g., empirical experiments) given that these have been reported to outperform ResNet18 across tasks.
Following the recommendation of the reviewer, we tested the architectures SwinT, ConvNeXt and ViT. We found out that none of them outperformed ResNet18 since they all showed a slower learning curve. This would result in higher tracking times. These tests are now included in the section “Network architecture” (lines 550-611).
(2.2) The compatibility of different architectures with small image sizes is configurable. Most convolutional architectures can be readily adapted to work with smaller image sizes, including 20x20 crops. With their default configuration, they lose feature map resolution through repeated pooling and downsampling steps, but this can be readily mitigated by swapping out standard convolutions with dilated convolutions and/or by setting the stride of pooling layers to 1, preserving feature map resolution across blocks. While these are fairly straightforward modifications (and are even compatible with using pretrained weights), an even more trivial approach is to pad and/or resize the crops to the default image size, which is likely to improve accuracy at a possibly minimal memory and runtime cost. These techniques may even improve the performance with the architectures that the authors did test out.
The only two tested architectures that require a minimum image size are AlexNet and DenseNet. DenseNet proved to underperform ResNet18 in the videos where the images are sufficiently large. We have tested AlexNet with padded images to see that it also performs worse than ResNet18 (see Appendix 3—figure 1).
We also tested the initialization of ResNet18 with pre-trained weights from ImageNet (in Appendix 3—figure 2) and it proved to bring no benefit to the training speed (added in lines 591-592).
(2.3) The authors do not report whether the architecture experiments were done with pretrained or randomly initialized weights.
We adapted the text to make it clear that the networks are always randomly initialized (lines 591-592, lines 608-609 and the captions of Appendix 3—figure 1 and 2).
(2.4) The authors do not report some details about their ResNet18 design, specifically whether a global pooling layer is used and whether the output fully connected layer has any activation function. Additionally, they do not report the version of ResNet18 employed here, namely, whether the BatchNorm and ReLU are applied after (v1) or before (v2) the conv layers in the residual path.
We use ResNet18 v1 with neither an activation function nor a bias in its last layer (this has been clarified in lines 606-608). Also, by design, ResNet has a global average pool right before the last fully connected layer, which we did not remove. In response to the reviewer, ResNet18 v2 was tested and its performance is the same as that of v1 (see Appendix 3—figure 1 and lines 590-591).
(3) Pair sampling strategy. The authors devised a clever approach for sampling positive and negative pairs that is tailored to the nature of the formulation. First, since the positive and negative labels are derived from the co-existence of pretracked fragments, selection has to be done at the level of fragments rather than individual images. This would not be the case if one of the newer approaches for contrastive learning were employed, but it serves as a strength here (assuming that fragment generation/first pass heuristic tracking is achievable and reliable in the dataset). Second, a clever weighted sampling scheme assigns sampling weights to the fragments that are designed to balance "exploration and exploitation". They weigh samples both by fragment length and by the loss associated with that fragment to bias towards different and more difficult examples.
(3.1) The formulation described here resembles and uses elements of online hard example mining (Shrivastava et al., 2016), hard negative sampling (Robinson et al., 2020), and curriculum learning more broadly. The authors may consider referencing this literature (particularly Robinson et al., 2020) for inspiration and to inform the interpretation of the current empirical results on positive/negative balancing.
Following this recommendation, we added references of hard negative mining in the new section “Differences with previous work in contrastive/metric learning”, lines 792-841. Regarding curriculum learning, even though in spirit it might have parallels with our sampling method in the sense that there is a guided training of the network, we believe the approach is more similar to an exploration-exploitation paradigm.
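The weighting idea discussed in point (3), where fragments are sampled according to both their length and their current loss, can be sketched as follows. This is a minimal illustration and not the formula used in idtracker.ai: the linear mixing rule, the `alpha` parameter, and the function names are all assumptions introduced here.

```python
import random

def fragment_weights(lengths, losses, alpha=0.5):
    """Mix a length-based and a loss-based term into one sampling weight.

    Hypothetical rule: alpha weighs fragment length, (1 - alpha) weighs
    the loss of the fragment, so that long fragments and poorly learned
    fragments are both favoured.
    """
    total_len = sum(lengths)
    total_loss = sum(losses)
    weights = []
    for n_images, loss in zip(lengths, losses):
        length_term = n_images / total_len
        # Fall back to a uniform term if every loss is zero.
        loss_term = loss / total_loss if total_loss > 0 else 1.0 / len(losses)
        weights.append(alpha * length_term + (1.0 - alpha) * loss_term)
    return weights

def sample_fragments(lengths, losses, k, alpha=0.5, rng=None):
    """Draw k fragment indices (with replacement) using the mixed weights."""
    rng = rng or random.Random()
    weights = fragment_weights(lengths, losses, alpha)
    return rng.choices(range(len(lengths)), weights=weights, k=k)
```

With `alpha=1.0` the sketch samples by length only, and with `alpha=0.0` by loss only, which is one simple way to picture the trade-off between the two terms.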
(4) Speed and accuracy improvements. The authors report considerable improvements in speed and accuracy of the new idTracker (v6) over the original idTracker (v4?) and TRex. It's a bit unclear, however, which of these are attributable to the engineering optimizations (v5?) versus the representation learning formulation.
(4.1) Why is there an improvement in accuracy in idTracker v5 (L77-81)? This is described as a port to PyTorch and improvements largely related to the memory and data loading efficiency. This is particularly notable given that the progression went from 97.52% (v4; original) to 99.58% (v5; engineering enhancements) to 99.92% (v6; representation learning), i.e., most of the new improvement in accuracy owes to the "optimizations" which are not the central emphasis of the systematic evaluations reported in this paper.
V5 was a two-year effort designed to improve the time efficiency of v4. It was also a surprise to us that accuracy was higher, but that likely comes from the fact that the v4 code that was replaced contained one or more small bugs. The improvements in v5 are retained in v6 (contrastive learning), and v6 has higher accuracy and shorter tracking times. The source of this extra accuracy and these shorter tracking times in v6 is contrastive learning.
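As a small arithmetic aid for the figures quoted in point (4.1), the accuracies can be converted to error rates and compared in relative terms. The sketch below only restates the reviewer's three numbers (on the original accuracy metric); the function name is introduced here for illustration.

```python
def relative_error_reduction(acc_before, acc_after):
    """Fraction of the remaining error removed when accuracy (in %) improves."""
    err_before = 100.0 - acc_before
    err_after = 100.0 - acc_after
    return (err_before - err_after) / err_before

# Accuracies quoted in point (4.1), in percent.
v4, v5, v6 = 97.52, 99.58, 99.92

v4_to_v5 = relative_error_reduction(v4, v5)  # error drops from 2.48% to 0.42%
v5_to_v6 = relative_error_reduction(v5, v6)  # error drops from 0.42% to 0.08%
print(f"v4 -> v5: {v4_to_v5:.1%} of errors removed")
print(f"v5 -> v6: {v5_to_v6:.1%} of errors removed")
```

In percentage points the v4 to v5 step is the larger one (2.06 versus 0.34), whereas each step removes roughly four fifths of the errors that remained before it.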
(4.2) What about the speed improvements? Relative to the original (v4), the authors report average speed-ups of 13.6x in v5 and 44x in v6. Presumably, the drastic speed-up in v6 comes from a lower Protocol 2 failure rate, but v6 is not evaluated in Figure 2 - figure supplement 2.
Version 5 of idtracker.ai runs an optimized Protocol 2 and, sometimes, Protocol 3, but v6 runs neither of them. While P2 is still present in v6 as a fallback protocol when contrastive learning fails, in our v6 benchmark P2 was never needed. So the v6 speedup comes from replacing both P2 and P3 with the contrastive algorithm.
(5) Robustness to occlusion. A major innovation enabled by the contrastive representation learning approach is the ability to tolerate the absence of a global fragment (contiguous frames where all animals are visible) by requiring only co-existing pairs of fragments owing to the paired sampling formulation. While this removes a major limitation of the previous versions of idtracker.ai, its evaluation could be strengthened. The authors describe an ablation experiment where an arc of the arena is masked out to assess the accuracy under artificially difficult conditions. They find that the v6 works robustly up to significant proportions of occlusions, even when doing so eliminates global fragments.
(5.1) The experiment setup needs to be more carefully described.
(5.1.1) What does the masking procedure entail? Are the pixels masked out in the original video or are detections removed after segmentation and first pass tracking is done?
The mask is defined as a region of interest in the software. This means that it is applied at the segmentation step where the video frame is converted to a foreground-background binary image. The region of interest is applied here, converting to background all pixels not inside of it. We clarified this in the newly added section Occlusion tests, lines 240-244.
(5.1.2) What happens at the boundary of the mask? (Partial segmentation masks would throw off the centroids, and doing it after original segmentation does not realistically model the conditions of entering an occlusion area.)
Animals at the boundaries of the mask are partially detected. This can change the location of their detected centroid. For this reason, when computing the ground-truth accuracy for these videos, only the ground-truth centroids that were at least 15 pixels away from the mask were considered. We clarified this in the newly added section Occlusion tests, lines 248-251.
(5.1.3) Are fragments still linked for animals that enter and then exit the mask area?
No artificial fragment linking was added in these videos. Detected fragments are linked the usual way. If an animal hides in the masked region, the animal disappears and the fragment breaks. We clarified this in the newly added section Occlusion tests, lines 245-247.
(5.1.4) How is the evaluation done? Is it computed with or without the masked region detections?
The groundtruth used to validate these videos contains the positions of all animals at all times. But only the positions outside the mask at each frame were considered to compute the tracking accuracy. We clarified this in the newly added section Occlusion tests, lines 248-251.
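The evaluation rule described in the answers to (5.1.2) and (5.1.4) can be sketched as below: ground-truth centroids inside the mask, or closer than 15 pixels to it, are left out of the accuracy computation. This is only an illustration under stated assumptions: the mask is modelled as an angular sector starting at angle 0 around the arena centre, the points are assumed to lie inside the arena, and all function names are hypothetical rather than part of idtracker.ai.

```python
import math

def in_masked_sector(x, y, cx, cy, mask_angle_deg):
    """True if (x, y) lies in the sector [0, mask_angle_deg) around (cx, cy)."""
    angle = math.degrees(math.atan2(y - cy, x - cx)) % 360.0
    return angle < mask_angle_deg

def dist_to_segment(px, py, ax, ay, bx, by):
    """Euclidean distance from point P to the segment AB (A != B)."""
    abx, aby = bx - ax, by - ay
    t = ((px - ax) * abx + (py - ay) * aby) / (abx ** 2 + aby ** 2)
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(px - (ax + t * abx), py - (ay + t * aby))

def dist_to_mask(x, y, cx, cy, radius, mask_angle_deg):
    """Distance from a point inside the arena to the masked sector."""
    if mask_angle_deg <= 0:
        return math.inf  # nothing is masked
    if in_masked_sector(x, y, cx, cy, mask_angle_deg):
        return 0.0
    # Outside the sector, the nearest mask point lies on one of its two radial edges.
    edges = (0.0, math.radians(mask_angle_deg))
    return min(
        dist_to_segment(x, y, cx, cy,
                        cx + radius * math.cos(a), cy + radius * math.sin(a))
        for a in edges
    )

def usable_groundtruth(points, cx, cy, radius, mask_angle_deg, margin_px=15.0):
    """Keep only ground-truth centroids at least margin_px away from the mask."""
    return [p for p in points
            if dist_to_mask(p[0], p[1], cx, cy, radius, mask_angle_deg) >= margin_px]
```

For example, with a 90 degree mask centred at the origin of an arena of radius 100, a centroid at (50, -10) is 10 pixels from the mask edge and would be dropped, while one at (50, -20) would be kept.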
(5.2) The circular masking is perhaps not the most appropriate for the mouse data, which is collected in a rectangular arena.
We wanted to show the same proof of concept in different videos. For that reason, we used a mask that covers a portion of the arena parametrized by an angle. In the rectangular arena, the circular masking uses an external circle, so it covers a portion of the rectangle parametrized by an angle.
(5.3) The number of co-existing fragments, which seems to be the main determinant of performance that the authors derive from this experiment, should be reported for these experiments. In particular, a "number of co-existing fragments" vs accuracy plot would support the use of the 0.25(N-1) heuristic and would be especially informative for users seeking to optimize experimental and cage design. Additionally, the number of co-existing fragments can be artificially reduced in other ways other than a fixed occlusion, including random dropout, which would disambiguate it from potential allocentric positional confounds (particularly relevant in arenas where egocentric pose is correlated with allocentric position).
We included the requested analysis about the fragment connectivity in Figure 3-figure supplement 1. We agree that there can be additional ways of reducing co-existing fragments, but we think the occlusion tests have the additional value that there are many real experiments similar to this test.
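One way to picture the quantity requested in (5.3) is to count, for each fragment, how many other fragments overlap with it in time, and to compare the average with 0.25(N-1). The sketch below is only an illustration: representing fragments as inclusive frame intervals, defining co-existence as temporal overlap, and reading the heuristic as a threshold on the mean count are all assumptions made here, not the definitions used in the manuscript.

```python
def coexisting_counts(fragments):
    """For each (start_frame, end_frame) interval, count the other
    fragments that overlap with it in time (inclusive bounds)."""
    counts = []
    for i, (start_i, end_i) in enumerate(fragments):
        n_overlapping = sum(
            1
            for j, (start_j, end_j) in enumerate(fragments)
            if j != i and start_i <= end_j and start_j <= end_i
        )
        counts.append(n_overlapping)
    return counts

def meets_heuristic(fragments, n_animals, factor=0.25):
    """Compare the mean co-existence count with factor * (n_animals - 1)."""
    counts = coexisting_counts(fragments)
    mean_count = sum(counts) / len(counts)
    return mean_count >= factor * (n_animals - 1), mean_count
```

Plotting `mean_count` against tracking accuracy for each masking angle would give the kind of connectivity-versus-accuracy curve the reviewer asks for.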
(6) Robustness to imaging conditions. The authors state that "the new idtracker.ai can work well with lower resolutions, blur and video compression, and with inhomogeneous light (Figure 2 - figure supplement 4)." (L156). Despite this claim, there are no speed or accuracy results reported for the artificially corrupted data, only examples of these image manipulations in the supplementary figure.
We added this information in the same image, new Figure 1 - figure supplement 3.
(7) Robustness across longitudinal or multi-session experiments. The authors reference idmatcher.ai as a compatible tool for this use case (matching identities across sessions or long-term monitoring across chunked videos), however, no performance data is presented to support its usage. This is relevant as the innovations described here may interact with this setting. While deep metric learning and contrastive learning for ReID were originally motivated by these types of problems (especially individuals leaving and entering the FOV), it is not clear that the current formulation is ideally suited for this use case. Namely, the design decisions described in point 1 of this review are at times at odds with the idea of learning generalizable representations owing to the feature extractor backbone (less scalable), low-dimensional embedding size (less representational capacity), and Euclidean distance metric without hypersphere embedding (possible sensitivity to drift). It's possible that data to support point 6 can mitigate these concerns through empirical results on variations in illumination, but a stronger experiment would be to artificially split up a longer video into shorter segments and evaluate how generalizable and stable the representations learned in one segment are across contiguous ("longitudinal") or discontiguous ("multi-session") segments.
We have now added a test to prove the reliability of idmatcher.ai in v6. In this test, 14 videos are taken from the benchmark and split into two non-overlapping parts (with a 200-frame gap in between). idmatcher.ai is run between the two parts, achieving 100% identity-matching accuracy across all of them (see section “Validity of idmatcher.ai in the new idtracker.ai”, lines 969-1008).
We thank the reviewer for the detailed suggestions. We believe we have taken all of them into consideration to improve the ms.
Reviewer #3 (Public review):
Summary
The authors propose a new version of idTracker.ai for animal tracking. Specifically, they apply contrastive learning to embed cropped images of animals into a feature space where clusters correspond to individual animal identities.
Strengths
By doing this, the new software alleviates the requirement for so-called global fragments - segments of the video, in which all entities are visible/detected at the same time - which was necessary in the previous version of the method. In general, the new method reduces the tracking time compared to the previous versions, while also increasing the average accuracy of assigning the identity labels.
Weaknesses
The general impression of the paper is that, in its current form, it is difficult to disentangle the old from the new method and understand the method in detail. The manuscript would benefit from a major reorganization and rewriting of its parts. There are also certain concerns about the accuracy metric and reducing the computational time.
We have made the following modifications in the presentation:
(1) We have added section titles to the main text so it is clearer what tracking system we are referring to. For example, we now have sections “Limitation of the original idtracker.ai”, “Optimizing idtracker.ai without changes in the learning method” and “The new idtracker.ai uses representation learning”.
(2) We have completely rewritten all the text of the ms until we start with contrastive learning. Old L20-89 is now L20-L66, much shorter and easier to read.
(3) We have rewritten the first 3 paragraphs in the section “The new idtracker.ai uses representation learning” (lines 68-92).
(4) We have now expanded Appendix 3 to discuss the details of our approach (lines 539-897). It discusses in detail the steps of the algorithm, the network architecture, the loss function, the sampling strategy, the clustering and identity assignment, and the stopping criteria in training.
(5) To cite previous work in detail and explain what we do differently, we have now added in Appendix 3 the new section “Differences with previous work in contrastive/metric learning” (lines 792-841).
Regarding accuracy metrics, we have replaced our accuracy metric with the standard metric IDF1. IDF1 is the standard metric that is applied to systems in which the goal is to maintain consistent identities across time. See also the section in Appendix 1 “Computation of tracking accuracy” (lines 414-436) explaining IDF1 and why this is an appropriate metric for our goal.
Using IDF1 we obtain slightly higher accuracies for the idtracker.ai systems. This is the comparison of mean accuracy over our whole benchmark between our previous accuracy score and the new one for the full trajectories:
v4: 97.42% -> 98.24%
v5: 99.41% -> 99.49%
v6: 99.74% -> 99.82%
trex: 97.89% -> 97.89%
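For readers unfamiliar with IDF1, the standard definition from the multi-object tracking literature is the harmonic mean of identification precision and recall, IDF1 = 2·IDTP / (2·IDTP + IDFP + IDFN). The sketch below only restates that textbook formula; the function name and the counts in the usage note are illustrative and are not taken from the benchmark.

```python
def idf1(idtp, idfp, idfn):
    """IDF1 from identity-level true positives, false positives and false negatives.

    Equivalent to the harmonic mean of
    IDP = IDTP / (IDTP + IDFP) and IDR = IDTP / (IDTP + IDFN).
    """
    denominator = 2 * idtp + idfp + idfn
    if denominator == 0:
        raise ValueError("IDF1 is undefined when there are no detections at all")
    return 2 * idtp / denominator
```

As a usage example with made-up counts, 990 correctly identified detections with 10 identity false positives and 10 identity false negatives give `idf1(990, 10, 10) == 0.99`.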
We thank the reviewer for the suggestions about presentation and about the use of more standard metrics.
Recommendations for the authors:
Reviewer #2 (Recommendations for the authors):
(1) Figure 1a: A graphical legend inset would make it more readable since there are multiple colors, line styles, and connecting lines to parse out.
Following this recommendation, we added a graphical legend in the old Figure 1 (new Figure 2).
(2) L46: "have images" → "has images".
We applied this correction. Line 35.
(3) L52: "videos start with a letter for the species (z,**f**,m)", but "d" is used for fly videos.
We applied this correction in the caption of Figure 1.
(4) L62: "with Protocol 3 a two-step process" → "with Protocol 3 being a two-step process".
We rewrote this paragraph without mentioning Protocol 3, lines 37-41.
(5) L82-89: This is the main statement of the problems that are being addressed here (speed and relaxing the need for global fragments). This could be moved up, emphasized, and made clearer without the long preamble and results on the engineering optimizations in v5. This lack of linearity in the narrative is also evident in the fact that after Figure 1a is cited, inline citations skip to Figure 2 before returning to Figure 1 once the contrastive learning is introduced.
We have rewritten all the text until the contrastive learning, (old lines 20-89 are now lines 20-66). The text is shorter, more linear and easier to read.
(6) L114: "pairs until the distance D_{pos}" → "pairs until the distance approximates D_{pos}".
We rewrote this as “pairs until the distance 𝐷pos (or 𝐷neg) is reached” in line 107.
(7) L570: Missing a right parenthesis in the equation.
We no longer have this equation in the ms.
(8) L705: "In order to identify fragments we, not only need" → "In order to identify fragments, we not only need".
We applied this correction, Line 775.
(9) L819: "probably distribution" → "probability distribution".
We applied this correction, Line 776.
(10) L833: "produced the best decrease the time required" → "produced the best decrease of the time required".
We applied this correction, Line 746.
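The wording in item (6), where pairs are trained until the distance D_pos (or D_neg) is reached, is consistent with a double-margin contrastive loss on Euclidean distances. The sketch below shows that generic loss as an illustration only: the squared-hinge form, the margin values, and the function names are assumptions made here, and the exact loss is the one described in Appendix 3 of the manuscript.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pair_loss(emb_a, emb_b, is_positive, d_pos=1.0, d_neg=10.0):
    """Double-margin contrastive loss for one pair of embeddings.

    Positive pairs (same animal) are pulled together until their distance
    falls below d_pos; negative pairs (different animals) are pushed apart
    until their distance exceeds d_neg. Beyond those margins the loss is 0.
    The default margins are hypothetical values chosen for illustration.
    """
    distance = euclidean(emb_a, emb_b)
    if is_positive:
        return max(0.0, distance - d_pos) ** 2
    return max(0.0, d_neg - distance) ** 2
```

A pair that already satisfies its margin contributes no loss, which is what makes the "until the distance is reached" phrasing natural.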
Reviewer #3 (Recommendations for the authors):
(1) We recommend rewriting and restructuring the manuscript. The paper includes a detailed explanation of the previous approaches (idTracker and idTracker.ai) and their limitations. In contrast, the description of the proposed method is short and unstructured, which makes it difficult to distinguish between the old and new methods as well as to understand the proposed method in general. Here are a few examples illustrating the problem.
(1.1) Only in line 90 do the authors start to describe the work done in this manuscript. The previous 3 pages list limitations of the original method.
We have now divided the main text into sections, so it is clearer what is the previous method (“Limitation of the original idtracker.ai”, lines 28-51), the new optimization we did of this method (“Optimizing idtracker.ai without changes in the learning method”, lines 52-66) and the new contrastive approach that also includes the optimizations (“The new idtracker.ai uses representation learning”, lines 66-164). Also, the new text has now been streamlined until the contrastive section, following your suggestion. You can see that in the new writing the three sections are 25, 15 and 99 lines long. The most detailed section describes the new system; the other two are needed as reference, to describe which problem we are solving and the extra new optimizations.
(1.2) The new method does not have a distinct name, and it is hard to follow which idtracker.ai is a specific part of the text referring to. Not naming the new method makes it difficult to understand.
We use the name new idtracker.ai (v6) so it becomes the current default version. v5 is now obsolete, as well as v4. And from the point of view of the end user, no new name is needed since v6 is just an evolution of the same software they have been using. Also, we added sections in the main text to clarify the ideas in there and indicate the version of idtracker.ai we are referring to.
(1.3) There are "Protocol 2" and "Protocol 3" mixed with various versions of the software scattered throughout the text, which makes it hard to follow. There should be some systematic naming of approaches and a listing of results introduced.
Following this recommendation, we no longer talk about the specific protocols of the old version of idtracker.ai in the main text. We have rewritten the explanation of these versions in a clearer and more straightforward way, lines 29-36.
(2) To this end, the authors leave some important concepts either underexplained or only referenced indirectly via prior work. For example, the explanation of how the fragments are created (line 15) is only explained by the "video structure" and the algorithm that is responsible for resolving the identities during crossings is not detailed (see lines 46-47, 149-150). Including summaries of these elements would improve the paper's clarity and accessibility.
We listed the specific sections from our previous publication where the reader can find information about the entire tracking pipeline (lines 539-549). This way, we keep the ms clear and focused on the new identification algorithm while indicating where to find such information.
(3) Accuracy metrics are not clear. In line 319, the authors define it as based on "proportion of errors in the trajectory". This proportion is not explained. How is the error calculated if a trajectory is lost or there are identity swaps? Multi-object tracking has a range of accuracy metrics that account for such events but none of those are used by the authors. Estimating metrics that are common for MOT literature, for example, IDF1, MOTA, and MOTP, would allow for better method performance understanding and comparison.
In the new ms, we replaced our accuracy metric with the standard metric IDF1. IDF1 is the standard metric that is applied to systems in which the goal is to maintain consistent identities across time. See also the section in Appendix 1 “Computation of tracking accuracy” explaining why IDF1, and not MOTA or MOTP, is the adequate metric for a system that aims to give correct tracking by identification over time. See lines 416-436.
Using IDF1 we obtain slightly higher accuracies for the idtracker.ai systems. This is the comparison of mean accuracy between our previous accuracy score and the new one for the full trajectories:
v4: 97.42% -> 98.24%
v5: 99.41% -> 99.49%
v6: 99.74% -> 99.82%
trex: 97.89% -> 97.89%
(4) Additionally, the authors distinguish between tracking with and without crossings, but do not provide statistics on the frequency of crossings per video. It is also unclear how the crossings are considered for the final output. Including information such as the frame rate of the videos would help to better understand the temporal resolution and the differences between consecutive frames of the videos.
We added this information in Appendix 1, “Benchmark of accuracy and tracking time”, lines 445-451. The frame rate in our benchmark videos ranges from 25 to 60 fps (average of 37 fps). On average, 2.6% of the blobs are crossings (1.1% for zebrafish, 0.7% for drosophila, 9.4% for mice).
(5) In the description of the dataset used for evaluation (lines 349-365), the authors describe the random sampling of parameter values for each tracking run. However, it is unclear whether the same values were used across methods. Without this clarification, comparisons between the proposed method, older versions, and TRex might be biased due to lucky parameter combinations. In addition, the ranges from which the values were randomly sampled were also not described.
Only one parameter is shared between idtracker.ai and TRex: intensity_threshold (in idtracker.ai) and threshold (in TRex). Both are conceptually equivalent but differ in their numerical values since they affect different algorithms. V4, v5, and TRex each required the same process of independent expert visual inspection of the segmentation to select the valid value range. Since versions 5 and 6 use exactly the same segmentation algorithm, they share the same parameter ranges.
All the ranges of valid values used in our benchmark are public here https://drive.google.com/drive/folders/1tFxdtFUudl02ICS99vYKrZLeF28TiYpZ as stated in the section “Data availability”, lines 227-228.
(6) Lines 122-123, Figure 1c. "batches" - is an imprecise metric of training time as there is no information about the batch size.
We clarified the Figure caption, new Figure 2c.
(7) Line 145 - "we run some steps... For example..." leaves the method description somewhat unclear. It would help if you could provide more details about how the assignments are carried out and which metrics are being used.
Following this recommendation, we listed the specific sections from our previous publication where the reader can find information about the entire tracking pipeline (lines 539-549). This way, we keep the ms clear and focused on the new identification algorithm while indicating where to find such information.
(8) Figure 3. How is tracking accuracy assessed with occlusions? Are the individuals correctly recognized when they reappear from the occluded area?
The groundtruth for this video contains the positions of all animals at all times. Only the groundtruth points inside the region of interest are taken into account when computing the accuracy. When the tracking reaches high accuracy, it means that animals are successfully relabeled every time they enter the non-masked region. Note that this software works all the time by identification of animals, so crossings and occlusion are treated the same way. What is new here is that the occlusions are so large that there are no global fragments. We clarified this in the new section “Occlusion tests” in Methods, lines 239-251.
(9) Lines 185-187 this part of the sentence is not clear.
We rewrote this part in a clearer way, lines 180-182.
(10) The authors also highlight the improved runtime performance. However, they do not provide a detailed breakdown of the time spent on each component of the tracking/training pipeline. A timing breakdown would help to compare the training duration with the other components. For example, the calculation of the Silhouette Score alone can be time-consuming and could be a bottleneck in the training process. Including this information would provide a clearer picture of the overall efficiency of the method.
We measured that the training of ResNet takes, on average in our benchmark, 47% of the tracking time (we added this information in line 551, section “Network Architecture”). In this training stage, the bottleneck becomes the network forward and backward pass, limited by the GPU performance. All other processes happening during training have been deeply optimized and parallelized when needed, so their contribution to the training time is minimal. Apart from the training, we also measured that 24.4% of the total tracking time is spent reading and segmenting the video files and 11.1% processing the identification images and detecting crossings.
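The three shares reported above can be tallied to see how much of the tracking time they leave unaccounted for. The sketch below only sums the quoted percentages; the dictionary labels and the function name are introduced here for illustration, and the remainder is not broken down in the response.

```python
def unaccounted_share(shares_percent):
    """Percentage of total time not covered by the listed components."""
    return round(100.0 - sum(shares_percent.values()), 1)

# Average shares of total tracking time quoted in the response, in percent.
reported = {
    "network training": 47.0,
    "reading and segmenting video": 24.4,
    "identification images and crossing detection": 11.1,
}

accounted = round(sum(reported.values()), 1)
remaining = unaccounted_share(reported)
print(f"accounted for: {accounted}%, remaining steps: {remaining}%")
```

The listed components cover 82.5% of the tracking time, leaving 17.5% for all remaining steps of the pipeline.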
(11) An important part of the computational cost is related to model training. It would be interesting to test whether a model trained on one video of a specific animal type (e.g., zebrafish_5) generalizes to another video of the same type (e.g., zebrafish_7). This would assess the model's generalizability across different videos of the same species and spare a lot of compute. Alternatively, instead of training a model from scratch for each video, the authors could also consider training a base model on a superset of images from different videos and then fine-tuning it with a lower learning rate for each specific video. This could potentially save time and resources while still achieving good performance.
Even before v6, the user could start training the identification network by copying the final weights from another tracking session. This knowledge transfer feature is still present in v6 and it still decreases the training times significantly. This information has been added in Appendix 4, lines 906-909.
We have already begun working on the interesting idea of a general base model but it brings some complex challenges. It could be a very useful new feature for future idtracker.ai releases.
We thank the reviewer for the many suggestions. We have implemented all of them.
-
eLife Assessment
This important study provides a detailed analysis of the transcriptional landscape of the mouse hippocampus in the context of various physiological states. The main conclusions have solid support: that most transcriptional targets are generally stable, with notable exceptions in the dentate gyrus and with regard to circadian changes. There are some weaknesses and it would improve the manuscript to address them.
-
Reviewer #1 (Public review):
Olmstead et al. present a single-cell nuclear sequencing dataset that interrogates how hippocampal gene expression changes in response to distinct physiological stimuli and across circadian time. The authors perform single-nucleus RNA sequencing on mouse hippocampal tissue after (1) kainic acid-induced seizure, (2) exposure to an enriched environment, and (3) at multiple circadian phases.
The dataset is rigorously collected, and a major strength is the use of the previously established ABC taxonomy from Yao et al. (2023) to define cell types. The authors further show that this taxonomy is largely independent of activity-driven transcriptional programs. Using these annotations, they examine activity-regulated gene expression across neuronal and glial subclasses. They identify ZT12, corresponding to the transition from the light to the dark period, as transcriptionally distinct from other circadian time points, and show that this pattern is conserved across many cell types. Finally, they test how circadian phase influences activity-dependent gene expression by exposing mice to an enriched environment at different times of day, and report no significant interaction between circadian phase and enriched environment exposure.
A crucial consideration for users of this dataset is the potential confounding effect between circadian phase and locomotor activity. This is particularly relevant because dentate gyrus activity is strongly modulated by locomotion. The authors acknowledge this issue in the Discussion and provide useful guidance for how to interpret their findings, considering this confound.
Taken together, this dataset represents a useful resource for the neuroscience community, particularly for investigators interested in how novel experience and circadian phase shape activity-related and immediate early gene expression in the hippocampus.
-
Reviewer #2 (Public review):
This manuscript presents the ACT-DEPP dataset, a comprehensive single-nucleus RNA-sequencing atlas of the mouse hippocampus that examines how activity-dependent and circadian transcriptional programs intersect. The dataset spans multiple experimental conditions and circadian time points, clarifying how cell-type identity relates to transcriptional state. In particular, the authors compare stimulus-evoked activity programs (environmental enrichment and kainate-induced seizures) with circadian phase-dependent transcriptional oscillations. They also identify a transcriptional inflection point near ZT12 and argue that immediate early gene (IEG) induction is broadly maintained across circadian phases, with minimal ZT-dependent modulation.
Strengths:
The study is ambitious in scope and data volume, and outlines the data-processing and atlas-registration workflows. The side-by-side treatment of stimulus paradigms and ZT sampling provides a coherent framework for parsing state (activity) from phase (circadian) across diverse neuronal and non-neuronal classes. Several findings - especially the ZT12 "inflection" and the differential sensitivity of pathways across subclasses - are intriguing.
Weaknesses:
(1) The authors acknowledge, but do not adequately address, the fundamental confounding factor between circadian phase and spontaneous locomotor activity. The assertion that these represent "orthogonal regulatory axes," based on largely non-overlapping DEGs, may be overstated. The absence of behavioral monitoring during baseline is a major limitation.
(2) The statement "Thus, novel experiences and seizures trigger categorically distinct transcriptional responses-with respect to both magnitude and specific genes-in these hippocampal subregions" is overstated, given the data presented. Figure 2A-B shows that approximately one-third of EE-induced DEGs at 30 minutes overlap with KA DEGs, and this overlap increases substantially at 6 hours in CA1 (where EE and KA responses become "fully shared"). This suggests the responses are quantitatively different rather than "categorically distinct."
(3) In Figure 4B, "active cells" are defined as those with ≥3 of 15 IEGs above the 90th percentile, with thresholds apparently calibrated in CA1. Because baseline expression distributions differ across subclasses, this rule can bias activation rates across cell types.
(4) Few genes show significant ZT × stimulus (EE or seizure) interactions, concentrated in neuronal populations. Given unequal nucleus counts and biological replicates across subclasses, small effects may be underpowered.
(5) In Figure 6 I, J, the relationship between the highlighted pathways/functions and circadian phase is not yet explicit.
(6) Lines 276-280: The enrichment of lncRNAs at ZT12 in CA1 is intriguing but underdeveloped. What are these lncRNAs, and what might they regulate?
Overall, most descriptive conclusions are supported (e.g., broad phase-robustness of classical IEGs; an inflection near ZT12). Claims about the separability/orthogonality of activity vs circadian programs, and about categorical distinctions between EE and KA responses, would benefit from more conservative wording or additional analyses to rule out behavioral and power-related alternatives.
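The concern in point (3) can be illustrated with a small simulation: if per-gene thresholds are calibrated in one subclass and then applied to a subclass with a higher baseline, the "≥3 of 15 IEGs above the 90th percentile" rule labels many more cells as active than subclass-specific thresholds would. The sketch uses synthetic Gaussian expression values with independent genes; the baselines, sample sizes, and function names are assumptions for illustration and do not reproduce the authors' pipeline.

```python
import random

def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty list."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

def gene_thresholds(cells, q=90.0):
    """Per-gene q-th percentile across cells (cells: list of per-gene value lists)."""
    n_genes = len(cells[0])
    return [percentile([cell[g] for cell in cells], q) for g in range(n_genes)]

def active_fraction(cells, thresholds, min_genes=3):
    """Fraction of cells with at least min_genes genes above their threshold."""
    n_active = sum(
        1 for cell in cells
        if sum(value > thr for value, thr in zip(cell, thresholds)) >= min_genes
    )
    return n_active / len(cells)

def simulate(n_cells, n_genes, baseline, rng):
    """Synthetic expression: independent Gaussian genes around a baseline."""
    return [[rng.gauss(baseline, 1.0) for _ in range(n_genes)]
            for _ in range(n_cells)]

rng = random.Random(0)
reference = simulate(500, 15, 0.0, rng)  # stands in for the calibration subclass
shifted = simulate(500, 15, 1.0, rng)    # subclass with a higher baseline

frac_shared = active_fraction(shifted, gene_thresholds(reference))
frac_own = active_fraction(shifted, gene_thresholds(shifted))
print(f"shared thresholds: {frac_shared:.2f}, own thresholds: {frac_own:.2f}")
```

Reporting activation rates under both shared and subclass-specific thresholds would show how sensitive the cross-subclass comparison in Figure 4B is to this choice.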
-
eLife Assessment
This valuable study uses fiber photometry, implantable lenses, and optogenetics, to show that a subset of subthalamic nucleus neurons are active during movement, and that active but not passive avoidance depends in part on STN projections to substantia nigra. The strength of the evidence for these claims is solid, whereas evidence supporting the claims that STN is involved in cautious responding is unclear as presented. This paper may be of interest to basic and applied behavioural neuroscientists working on movement or avoidance.
-
Reviewer #1 (Public review):
Summary:
The manuscript presents a robust set of experiments that provide new insights into the role of STN neurons during active and passive avoidance tasks. These forms of avoidance have received comparatively less attention in the literature than the more extensively studied escape or freezing responses, despite being extremely relevant to human behaviour and more strongly influenced by cognitive control.
Strengths:
Understanding the neural infrastructure supporting avoidance behaviour would be a fundamental milestone in neuroscience. The authors employ sophisticated methods to delineate the role of STN neurons during avoidance behaviours. The work is thorough and the evidence presented is compelling. Experiments are carefully constructed, well-controlled, and the statistical analyses are appropriate.
Weaknesses:
One possible remaining conceptual concern that might require future work is determining whether STN primarily mediates higher-level cognitive avoidance or if its activation primarily modulates motor tone.
-
Reviewer #2 (Public review):
Summary:
Zhou, Sajid et al. present a study investigating the STN involvement in signaled movement. They use fiber photometry, implantable lenses, and optogenetics during active avoidance experiments to evaluate this. The data are useful for the scientific community and the overall evidence for their claims is solid, but many aspects of the findings are confusing. The authors present a huge collection of data, and it is somewhat difficult to extract the key information and the meaningful implications resulting from these data.
Strengths:
The study is comprehensive in using many techniques and many stimulation powers and frequencies and configurations.
Weaknesses - re-review:
All previous weaknesses have been addressed. The authors should explain how inhibition of the STN impairing active avoidance is consistent with the STN encoding cautious action. If 'caution' is related to avoid latency, why does STN lesion or inhibition increase avoid latency, and therefore increase caution? Wouldn't the opposite be more consistent with the statement that the STN 'encodes cautious action'?
-
Reviewer #3 (Public review):
Summary:
The authors use calcium recordings from STN to measure STN activity during spontaneous movement and in a multi-stage avoidance paradigm. They also use optogenetic inhibition and lesion approaches to test the role of STN during the avoidance paradigm. The paper reports a large amount of data and makes many claims; some seem well supported to this Reviewer, others not so much.
Strengths:
Well-supported claims include data showing that during spontaneous movements, especially contraversive ones, STN calcium activity is increased using bulk photometry measurements. Single-cell measures back this claim but also show that it is only a minority of STN cells that respond strongly, with most showing no response during movement, and a similar number showing smaller inhibitions during movement.
Photometry data during cued active avoidance procedures show that STN calcium activity sharply increases in response to auditory cues, and during cued movements to avoid a footshock. Optogenetic and lesion experiments are consistent with an important role for STN in generating cue-evoked avoidance. And a strength of these results is that multiple approaches were used.
Original Weaknesses:
I found the experimental design and presentation convoluted and some of the results over-interpreted.
As presented, I don't understand this idea that delayed movement is necessarily indicative of cautious movements. Is the distribution of responses multi-modal in a way that might support this idea; or do the authors simply take a normal distribution and assert that the slower responses represent 'caution'? Even if responses are multi-modal and clearly distinguished by 'type', why should readers think that delayed responses imply cautious responding instead of, say, habituation or sensitization to cue/shock; variability in attention, motivation, or stress; or merely uncertainty, which seems plausible given what I understand of the task design where the same mice are repeatedly tested in changing conditions? This relates to a major claim (i.e., in the title).
Related to the last, I'm struggling to understand the rationale for dividing cells into 'types' based on their physiological responses in some experiments.
In several figures the number of subjects used was not described. This is necessary. Also necessary is some assessment of the variability across subjects. The only measure of error shown in many figures relates to trial-to-trial or event variability, which is minimal because in many cases it appears that hundreds of trials may have been averaged per animal, but this doesn't provide a strong view of biological variability (i.e., are results consistent across animals?).
It is not clear if or how spread of expression outside of target STN was evaluated, and if or how or how many mice were excluded due to spread or fiber placements. Inadequate histological validation is presented and neighboring regions that would be difficult to completely avoid, such as paraSTN may be contributing to some of the effects.
Raw example traces are not provided.
The timeline of the spontaneous movement and avoidance sessions was not clear, nor was the number of events or sessions per animal or how this was set. It is not clear if there was pre-training or habituation, if many or variable sessions were combined per animal, or what the time gaps between sessions were, or if or how any of these parameters might influence interpretation of the results.
Comments on revised version:
The authors removed the optogenetic stimulation experiments, but then also added a lot of new analyses. Overall the scope of their conclusions is essentially unchanged.
Part of the eLife model is to leave it to the authors' discretion how they choose to present their work. But my overall view of it is unchanged. There are elements that I found clear, well executed, and compelling. But there are other elements that I found difficult to understand and where I could not follow or concur with their conclusions.
-
Author response:
The following is the authors’ response to the original reviews.
Public Reviews:
Reviewer #2 (Public review):
(1) Vglut2 isn't a very selective promoter for the STN. Did the authors verify every injection across brain slices to ensure the para-subthalamic nucleus, thalamus, lateral hypothalamus, and other Vglut2-positive structures were never infected?
The STN is anatomically well-confined, with its borders and the overlying zona incerta (composed of GABAergic neurons) providing protection against off-target expression in most neighboring forebrain regions. All viral injections were histologically verified and did not extend into thalamic or hypothalamic areas. As described in the Methods, we employed an app we developed (Brain Atlas Analyzer, available on OriginLab) that aligns serial histological sections with the Allen Brain Atlas to precisely assess viral spread and confirm targeting accuracy. The experiments included in the revised manuscript now focus on optogenetic inhibition and irreversible lesion approaches (viral and electrolytic), three complementary methods that consistently targeted the STN and yielded similar behavioral effects.
(2) The authors say in the methods that the high vs low power laser activation for optogenetic experiments was defined by the behavioral output. This is misleading, and the high vs low power should be objectively stated and the behavioral results divided according to the power used, not according to the behavioral outcome.
Optogenetic excitation is no longer part of the study.
(3) In the fiber photometry experiments exposing mice to the range of tones, it is impossible to separate the STN response to the tone from the STN response to the movement evoked by the tone. The authors should expose the mouse to the tones in a condition that prevents movement, such as anesthetized or restrained, to separate out the two components.
The new mixed-effects modeling approach clearly differentiates sensory (auditory) from motor contributions during tone-evoked STN activation. In prior work (see Hormigo et al, 2023, eLife), we explored experimental methods such as head restraint or anesthesia to reduce movement, but we concluded that these approaches are unsuitable for addressing this question. Mice exhibit substantial residual movement even when head-fixed, and anesthesia profoundly alters neural excitability and behavioral state, introducing major confounds. To fully eliminate movement would require paralysis and artificial ventilation, which would again disrupt physiological network dynamics and raise ethical concerns. Therefore, the current modeling approach—incorporating window-specific covariates for movement—is the most appropriate and rigorous way to dissociate tone-evoked sensory activity from motor activity in behaving animals.
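The exact model specification is not given in this response. As a generic sketch of the kind of model the response describes, and not the authors' actual formula, a mixed-effects model with a window-specific movement covariate could be written as follows. All symbols are assumptions introduced here for illustration: $y_{ijk}$ is the mean ΔF/F in an analysis window for mouse $i$, session $j$, trial $k$; $T_{ijk}$ codes the tone condition; $M_{ijk}$ is the movement (for example, head speed) measured in the same window; and $B_{ijk}$ is the pre-tone baseline.

```latex
\begin{align}
y_{ijk} &= \beta_0 + \beta_{\mathrm{tone}}\, T_{ijk} + \beta_{\mathrm{move}}\, M_{ijk}
           + \beta_{\mathrm{base}}\, B_{ijk} + u_i + v_{ij} + \varepsilon_{ijk}, \\
u_i &\sim \mathcal{N}(0, \sigma_u^2), \qquad
v_{ij} \sim \mathcal{N}(0, \sigma_v^2), \qquad
\varepsilon_{ijk} \sim \mathcal{N}(0, \sigma^2).
\end{align}
```

Under this sketch, a tone-evoked sensory contribution corresponds to $\beta_{\mathrm{tone}} \neq 0$ after conditioning on $M_{ijk}$, while $u_i$ and $v_{ij}$ absorb the clustering of trials within mice and within sessions nested in mice.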
(4) The claim 'STN activation is ideally suited to drive active avoids' needs more explanation. This claim comes after the fiber photometry experiments during active avoidance tasks, so there has been no causality established yet.
Text adjusted.
(5) The statistical comparisons in Figure 7E need some justification and/or clarification. The 9 neuron types are originally categorized based on their response during avoids, then statistics are run showing that they respond differently during avoids. It is no surprise that they would have significantly different responses, since that is how they were classified in the first place. The authors must explain this further and show that this is not a case of circular reasoning.
Statistically verifying the clustering is useful to ensure that the selected number of clusters reflects distinct classes. It is also necessary when different measurements are used to classify (movement time series classified the avoids) and to compare neuronal types within each avoid mode/class (now called “mode”). Moreover, the new modeling approach goes beyond the prior statistical limitations related to considering movement and neuronal variables separately.
(6) The authors show that neurons that have strong responses to orientation show reduced activity during avoidance. What are the implications of this? The author should explain why this is interesting and important.
The new modeling approach goes beyond the prior analysis limitations. For instance, it shows that most of the prior orienting-related activations closely reflect the orienting movement, and only in a few cases (noted and discussed in the results) are orienting activations related to the behavioral contingencies or behavioral outcomes in the task.
(7) It is not clear which conditions each mouse experienced in which order. This is critical to the interpretation of Figure 9 and the reduction of passive avoids during STN stimulation. Did these mice have the CS1+STN stimulation pairing or the STN+US pairing prior to this experiment? If they did, the stimulation of the STN could be strongly associated with either punishment or with the CS1 that predicts punishment. If that is the case, stimulating the STN during CS2 could be like presenting CS1+CS2 at the same time and could be confusing.
Optogenetic excitation is no longer part of the study.
(8) The experiments in Figure 10 are used to say that STN stimulation is not aversive, but they only show that STN stimulation cannot be used as punishment in place of a shock. This doesn't mean that it is not aversive; it just means it is not as aversive as a shock. The authors should do a simpler aversion test, such as conditioned or real-time place preference, to claim that STN stimulation is not aversive. This is particularly surprising as previous work (Serra et al., 2023) does show that STN stimulation is aversive.
Optogenetic excitation is no longer part of the study.
(9) In the discussion, the idea that the STN encodes 'moving away' from contralateral space is pretty vague and unsupported. It is puzzling that the STN activates more strongly to contraversive turns, but when stimulated, it evokes ipsiversive turns; however, it seems a stretch to speculate that this is related to avoidance. In the last experiments of the paper, the axons from the STN to the GPe and to the midbrain are selectively stimulated. Do these evoke ipsiversive turns similarly?
Optogenetic excitation is no longer part of the study.
(10) In the discussion, the authors claim that the STN is essential for modulating action timing in response to demands, but their data really only show this in one direction. The STN stimulation reliably increases the speed of response in all conditions (except maximum speed conditions such as escapes). It seems to be over-interpreting the data to say this is an inability to modulate the speed of the task, especially as clear learning and speed modulation do occur under STN lesion conditions, as shown in Figure 12B. The mice learn to avoid and increase their latency in AA2 vs AA1, though the overall avoids and latency are different from controls. The more parsimonious conclusion would be that STN stimulation biases movement speed (increasing it) and that this is true in many different conditions.
Optogenetic excitation is no longer part of the study.
(11) In the discussion, the authors claim that the STN projections to the midbrain tegmentum directly affect the active avoidance behavior, while the STN projections to the SNr do not affect it. This seems counter to their results, which show STN projections to either area can alter active avoidance behavior. What is the laser power used in these terminal experiments? If it is high (3mW), the authors may be causing antidromic action potentials in the STN somas, resulting in glutamate release in many brain areas, even when terminals are only stimulated in one area. The authors could use low (0.25mW) laser power in the terminals to reduce the chance of antidromic activation and spatially restrict the optical stimulation.
Optogenetic excitation is no longer part of the study.
(12) Was normality tested for data prior to statistical testing?
Yes, although we now use mixed models.
(13) Why are there no error bars on Figure 5B, black circles and orange triangles?
When error bars are not visible, they are smaller than the trace thickness or bar line—for example, in Figure 5B, the black circles and orange triangles include error bars, but they are smaller than the symbol size.
Reviewer #3 (Public review):
(1) I really don't understand or accept this idea that delayed movement is necessarily indicative of cautious movements. Is the distribution of responses multi-modal in a way that might support this idea, or do the authors simply take a normal distribution and assert that the slower responses represent 'caution'? Even if responses are multi-modal and clearly distinguished by 'type', why should readers think that delayed responses imply cautious responding instead of, say, habituation or sensitization to cue/shock; variability in attention, motivation, or stress; or merely uncertainty, which seems plausible given what I understand of the task design where the same mice are repeatedly tested in changing conditions? This relates to a major claim (i.e., in the work's title).
In our study, “caution” is defined operationally as the tendency to delay initiation of an avoidance response in demanding situations (e.g., taking more time or care before crossing a busy street). The increase in avoidance latency with task difficulty is highly robust, as we have shown previously through detailed analyses of timing distributions and direct comparisons with appetitive behaviors (e.g., Zhou et al., 2022 JNeurosci). Moreover, we used the tracked movement time series to statistically classify responses into cautious modes, which is likely novel. This definition can dissociate cautious responding from broader constructs listed by a reviewer, such as attention, motivation, or stress, which must be explicitly defined to be rigorously considered in this context, including the likelihood that they covary with caution without being equivalent to it.
Cue-evoked orienting responses at CS onset are directly measured, and their habituation and sensitization have been characterized in our prior work (e.g., Zhou et al., 2023 JNeurosci). US-evoked escapes are also measured in the present study and directly compared with avoidance responses. Together, these analyses provide a rigorous and consistent framework for defining and quantifying caution within our behavioral procedures.
Importantly, mice exhibit cautious responding as defined here across different tasks, making it more informative to classify avoidance responses by behavioral mode rather than by task alone. Accordingly, in the miniscope, single-neuron, and mixed-effects model analyses, we classified active avoids into distinct modes reflecting varying levels of caution. Although these modes covary with task contingencies, their explicit classification improves model predictability and interpretability with respect to cautious responding.
(2) Related to the last, I'm struggling to understand the rationale for dividing cells into 'types' based on their physiological responses in some experiments (e.g., Figure 7).
This section has now been expanded into 3 figures (Fig. 7-9) with new modeling approaches that should make the rationale more straightforward.
By emphasizing the mixed-effects modeling results and integrating these analyses directly into the figures, the revised manuscript now more clearly delineates what is encoded at the population and single-neuron levels. Including movement and baseline covariates allowed us to dissociate motor-related modulation from other neural signals, substantially clarifying the distinction between movement encoding and other task-related variables, which we focus on in the paper. These analyses confirm the strong role of the STN in representing movement while revealing additional signals related to aversive stimulation and cautious responding that persist after accounting for motor effects. These signals arise from distinct neuronal populations that can be differentiated by their movement sensitivity and activation patterns across avoidance modes, reflecting varying levels of caution. At the same time, several effects that initially reflected orienting-related activity at CS-onset (note that our movement tracking captures both head position and orientation as a directional vector) dissipated once movement and baseline covariates were included in the models, emphasizing the utility of the analytical improvements in the revision.
(3) The description and discussion of orienting head movements were not well supported, but were much discussed in the avoidance datasets. The initial speed peaks to cue seem to be the supporting data upon which these claims rest, but nothing here suggests head movement or orientation responses.
As described in the methods (and noted above), we track the head and decompose the movement into rotational and translational components. With the new approach, several effects that initially reflected orienting-related activity at CS-onset (note that our movement tracking captures both head position and orientation as a directional vector) dissipated once movement and baseline covariates were included in the models, emphasizing the utility of the analytical improvements in the revision.
(4) Similar to the last, the authors note in several places, including abstract, the importance of STN in response timing, i.e., particularly when there must be careful or precise timing, but I don't think their data or task design provides a strong basis for this claim.
The avoidance modes and the measured latencies directly support the relation to action timing. In addition, the optogenetic excitation experiments, which appear to have been the main source of this criticism, are no longer part of the present study.
(5) I think that other reports show that STN calcium activity is recruited by inescapable foot shock as well. What do these authors see? Is shock, independent of movement, contributing to sharp signals during escapes?
The question, “Is shock, independent of movement, contributing to sharp signals during escapes?” is now directly addressed in the revised analyses. By incorporating movement and baseline covariates into the mixed-effects models, we dissociate STN activity related to aversive stimulation from that associated with motor output. The results show that shock-evoked STN activation persists even after controlling for movement within defined neuronal populations, supporting a specific nociceptive contribution independent of motor dynamics—a dissociation that appears to be new in this field.
(6) In particular, and related to the last point, the following work is very relevant and should be cited: Note that the focus of this other paper is on a subset of VGLUT2+ Tac1 neurons in paraSTN, but using VGLUT2-Cre to target STN will target both STN and paraSTN.
We appreciate the reviewer’s reference to the recent preprint highlighting the role of the para-subthalamic nucleus in avoidance learning. However, our study focused specifically on performance in well-trained mice rather than on learning processes. Behavioral learning is inherently more variable and can be disrupted by less specific manipulations, whereas our experiments targeted the stable execution of learned avoidance behaviors. Future work will extend these findings to the learning phase and examine potential contributions of subthalamic subdivisions, which our current Vglut2-based manipulations do not dissociate. We will consider this and related work more closely in those studies.
(7) In multiple other instances, claims that were more tangential to the main claims were made without clearly supporting data or statistics. E.g., claim that STN activation is related to translational more than rotational movement; claim that GCaMP and movement responses to auditory cues were small; claims that 'some animals' responded differently without showing individual data.
We have adjusted the text accordingly.
(8) In several figures, the number of subjects used was not described. This is necessary. Also necessary is some assessment of the variability across subjects. The only measure of error shown in many figures relates to trial-to-trial or event variability, which is minimal because, in many cases, it appears that hundreds of trials may have been averaged per animal, but this doesn't provide a strong view of biological variability. When bar/line plots are used to display data, I recommend showing individual animals where feasible.
All experiments report the number of mice and sessions. Wherever feasible, we display individual data points (e.g., Figures 1 and 2) to convey variability directly. However, in cases where figures depict hundreds of paired (repeated-measures) data points, showing all points without connecting them would not be appropriate, while linking them would make the figures visually cluttered and uninterpretable. All plots and traces include measures of variability (SEM), and the raw data will be shared on Dryad. When error bars are not visible, they are smaller than the trace thickness or bar line. For example, in Figure 5B, the black circles and orange triangles include error bars, but they are smaller than the symbol size.
Also, to minimize visual clutter, only a subset of relevant comparisons is highlighted with asterisks, whereas all relevant statistical results, comparisons, and mouse/session numbers are fully reported in the Results section, with statistical analyses accounting for the clustering of data within subjects and sessions.
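The distinction raised in comment (8), between error computed over pooled trials and error computed over animals, can be illustrated with a short sketch. This is not the authors' analysis; the function names and the data layout (a dictionary mapping a hypothetical animal ID to that animal's trial values) are assumptions made here for illustration only.

```python
import math
import statistics


def sem(values):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


def pooled_trial_sem(trials_by_animal):
    """SEM over all trials pooled together, ignoring which animal they came from.

    With hundreds of trials this shrinks toward zero even when animals differ.
    """
    pooled = [v for trials in trials_by_animal.values() for v in trials]
    return sem(pooled)


def across_animal_sem(trials_by_animal):
    """SEM over per-animal means, so n is the number of animals.

    This reflects biological variability (consistency across animals).
    """
    animal_means = [statistics.fmean(trials) for trials in trials_by_animal.values()]
    return sem(animal_means)


# Hypothetical data: three animals, 150 trials each, with different mean responses.
example = {
    "mouse_a": [1.0, 1.1, 0.9] * 50,
    "mouse_b": [2.0, 2.1, 1.9] * 50,
    "mouse_c": [3.0, 3.1, 2.9] * 50,
}
print("pooled-trial SEM:", pooled_trial_sem(example))
print("across-animal SEM:", across_animal_sem(example))
```

In this made-up dataset the pooled-trial SEM is much smaller than the across-animal SEM, which is the reviewer's concern in a nutshell. A mixed-effects model with subject and session as random effects, as the response describes, is another way to account for this clustering.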
(9) Can the authors consider the extent to which calcium imaging may be better suited to identify increases compared to decreases and how this may affect the results, particularly related to the GRIN data when similar numbers of cells show responses in both directions (e.g., Figure 3)?
This is an interesting issue related to a widely used technique beyond the scope of our study.
(10) Raw example traces are not provided.
We do not think raw traces are useful here. All figures contain average traces to reflect the activity of the estimated population.
(11) The timeline of the spontaneous movement and avoidance sessions was not clear, nor was the number of events or sessions per animal nor how this was set. It is not clear if there was pre-training or habituation, if many or variable sessions were combined per animal, or what the time gaps between sessions were, or if or how any of these parameters might influence interpretation of the results.
We have enhanced the description of the sessions, including the number of animals and sessions, which are daily and always equal per animal in each group of experiments. As noted, the sessions are part of the random effects in the model.
(12) It is not clear if or how the spread of expression outside of the target STN was evaluated, and if or how many mice were excluded due to spread or fiber placements.
The STN is anatomically well-confined, with its borders and the overlying zona incerta (composed of GABAergic neurons) providing protection against off-target expression in most neighboring forebrain regions. All viral injections were histologically verified and did not extend into thalamic or hypothalamic areas. As described in the Methods, we employed an app we developed (Brain Atlas Analyzer, available on OriginLab) that aligns serial histological sections with the Allen Brain Atlas to precisely assess viral spread and confirm targeting accuracy. The experiments included in the revised manuscript now focus on optogenetic inhibition and irreversible lesion approaches (viral and electrolytic), three complementary methods that consistently targeted the STN and yielded similar behavioral effects.
Recommendations for the authors:
Reviewing Editor Comments:
The primary feedback agreed upon by all the reviewers was that the manuscript requires significant streamlining as it is currently overly long and convoluted.
We thank the reviewers and editors for their thoughtful and constructive feedback. In response to the primary comment that “the manuscript requires significant streamlining as it is currently overly long and convoluted,” we have substantially revised and refocused the paper. Specifically, we streamlined the included data and enhanced the analyses to emphasize the central findings: the encoding of movement, cautious responding, and punishment in the STN during avoidance behavior. We also focused the causal component of the study by including only the loss-of-function experiments—both optogenetic inhibition and irreversible viral/electrolytic lesions—that establish the critical role of STN circuits in generating active avoidance. Together, these revisions enhance clarity, tighten the narrative focus, and align the manuscript more closely with the reviewers’ recommendations.
Major revisions include the addition of mixed-effects modeling to dissociate the contributions of movement from other STN-encoded signals related to caution and punishment. This modeling approach allowed us to reveal that these components are statistically separable, demonstrating that movement, cautious responding, and aversive input are encoded by neuronal subsets. To streamline the manuscript and address reviewer concerns, we removed the optogenetic excitation experiments. As revised, the paper presents a more concise and cohesive narrative showing that STN neurons differentially encode movement, caution, and aversive stimuli, and that this circuitry is essential for generating active avoidance behavior.
Many of the specific points raised by reviewers now fall outside the scope of the revised manuscript. This is primarily because the revised version omits data and analyses related to optogenetic excitation and associated control experiments. By removing these components, the paper now presents a streamlined and internally consistent dataset focused on how the STN encodes movement, cautious responding, and aversive outcomes during avoidance behavior, as well as on loss-of-function experiments demonstrating its necessity for generating active avoidance. Below, we address the points that remain relevant across reviews.
Following extensive revisions, the current manuscript differs in several important ways from what the assessment describes:
The description that the study “uses fiber photometry, implantable lenses, and optogenetics” is more accurately represented as using both fiber photometry and single-neuron calcium imaging with miniscopes, combined with optogenetic and irreversible lesion approaches.
The phrase stating that “active but not passive avoidance depends in part on STN projections to substantia nigra” is better characterized as “STN projections to the midbrain,” since our data show that optogenetic inhibition of STN terminals in both the mesencephalic reticular tegmentum (MRT) and substantia nigra pars reticulata (SNr) produces equivalent effects, and thus these sites are combined in the study.
Finally, the original concern that evidence for STN involvement in cautious responding or avoidance speed was incomplete no longer applies. The revised focus on encoding, through the inclusion of mixed-effects modeling, now dissociates movement-related, cautious, and aversive components of STN activity. By removing the optogenetic excitation data, we no longer claim that the STN controls caution but rather that it encodes cautious responding, alongside movement and punishment signals. Furthermore, loss-of-function experiments demonstrate that silencing STN output abolishes active avoidance entirely, supporting an essential role for the STN in generating goal-directed avoidance behavior—a behavioral domain that, unlike appetitive responding, is fundamentally defined by caution and the need to balance action timing under threat.
Reviewer #2 (Recommendations for the authors):
(1) Show individual data points on bar plots.
Wherever feasible, we display individual data points (e.g., Figures 1 and 2) to convey variability directly. However, in cases where figures depict hundreds of paired (repeated-measures) data points, showing all points without connecting them would not be appropriate, while linking them would make the figures visually cluttered and uninterpretable. All plots and traces include measures of variability (SEM), and the raw data will be shared on Dryad. When error bars are not visible, they are smaller than the trace thickness or bar line. For example, in Figure 5B, the black circles and orange triangles include error bars, but they are smaller than the symbol size.
Also, to minimize visual clutter, only a subset of relevant comparisons is highlighted with asterisks, whereas all relevant statistical results, comparisons, and mouse/session numbers are fully reported in the Results section, with statistical analyses accounting for the clustering of data within subjects and sessions.
(2) The active avoidance experiments are confusing when they are introduced in the results section. More explanation of what paradigms were used and what each CS means at the time these are introduced would add clarity. For example, AA1, AA2, etc, are explained only with references to other papers, but a brief description of each protocol and a schematic figure would really help.
The avoidance protocols (AA1–4) are now described briefly but clearly in the Results section (second paragraph of “STN neurons activate during goal-directed avoidance contingencies”) and in greater detail in the Methods section. As stated, these tasks were conducted sequentially, and mice underwent the same number of sessions per procedure, which are indicated. All relevant procedural information has been included in these sections. Mice underwent daily sessions and learnt these tasks within 1-2 sessions, progressing sequentially across tasks with an equal number of sessions per task (7 per task), and the resulting data were combined and clustered by mouse/session in the statistical models.
(3) How do the Class 1, 2, 3 avoids relate to Class 1, 2, 3 neural types established in Figure 3? It seems like they are not related, and if that is the case, they should be named something different from each other to avoid confusion. (4) Similarly, having 3 different cell types (a,b,c) in the active avoidance seems unrelated to the original classification of cell types (1,2,3), and these are different for each class of avoid. This is very confusing, and it is unclear how any of these types relate to each other. Presumably, the same mouse has all three classes of avoids, so there are recordings from each cell during each type of avoid.
The terms class, mode, and type are now clearly distinguished throughout the manuscript. Modes refer to distinct patterns of avoidance behavior that differ in the level of cautious responding (Mode 3 is most cautious). Within each mode, types denote subgroups of neurons identified based on their ΔF/F activity profiles. In contrast, classes categorize neurons according to their relationship to movement, determined by cross-correlation analyses between ΔF/F and head speed (Class 1-4; Fig. 7 is a new analysis) or head turns (Class A-C, renamed from 1-3). This updated terminology clarifies the analytic structure, highlighting distinct neuronal populations within each analysis. For example, during avoidance behaviors, these classifications distinguish neurons encoding movement-, caution-, and outcome-related signals. Comparisons are conducted within each analytical set, within classes (A-C or 1-4 separately), within avoidance modes, or within mode-specific neuronal types.
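As a rough illustration of what a cross-correlation-based grouping of neurons by movement could look like, the sketch below computes a lagged Pearson correlation between a ΔF/F trace and a head-speed trace and assigns a label from the peak. It is not the authors' procedure: the function names, the lag window, the 0.3 threshold, and the three labels are all hypothetical, and the manuscript's Class 1-4 and Class A-C definitions are not given in this response.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def peak_cross_correlation(dff, speed, max_lag):
    """Return (lag, r) at the largest |r| over lags in [-max_lag, max_lag].

    Both traces must have the same length and sampling rate.
    A positive lag means the dF/F trace follows the speed trace by that many samples.
    """
    best_lag, best_r = 0, 0.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            x, y = speed[: len(speed) - lag], dff[lag:]
        else:
            x, y = speed[-lag:], dff[: len(dff) + lag]
        r = pearson(x, y)
        if abs(r) > abs(best_r):
            best_lag, best_r = lag, r
    return best_lag, best_r


def label_neuron(peak_r, threshold=0.3):
    """Hypothetical labels; the threshold is an arbitrary illustrative choice."""
    if peak_r >= threshold:
        return "movement-activated"
    if peak_r <= -threshold:
        return "movement-inhibited"
    return "movement-unrelated"


# Synthetic example: a dF/F trace that copies head speed with a 2-sample delay.
speed = [math.sin(0.3 * t) + 1.0 for t in range(200)]
dff = [0.0, 0.0] + speed[:-2]
lag, r = peak_cross_correlation(dff, speed, max_lag=5)
print(lag, round(r, 3), label_neuron(r))
```

A real analysis would also need a significance criterion for the peak (for example, comparison against shuffled traces), which this sketch omits.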
…So the authors could compare one cell during each avoid and determine whether it relates to movement or sound, or something else. It is interesting that types a,b, and c have the exact same proportions in each class of avoid, and makes it important to investigate if these are the exact same cells or not.
That previous table with the a,b,c % in the three figure panels was a placeholder, which was not updated in the included figure. It has now been correctly updated. They do not have the same proportions as shown in Fig. 9, although they are similar.
Also, these mice could be recorded during the open field, so the original neural classification (class 1, 2,3) could be applied to these same cells, and then the authors can see whether each cell type defined in the open field has a different response to the different avoid types. As it stands, the paper simply finds that during movement and during avoidance behaviors, different cells in the STN do different things.
We included a new analysis in Fig. 7 that classifies neurons based on the cross-correlation with movement. The inclusion of the models now clearly assigns variance to movement versus the other factors, and this analysis leads to the classification based on avoid modes.
(5) The use of the same colors to mean two different things in Figure 9 is confusing. AA1 vs AA2 shouldn't be the same colors as light-naïve vs light signaling CS.
Optogenetic excitation is no longer part of the study.
(6) The exact timeline of the optogenetics experiments should be presented as a schematic for understanding. It is not clear which conditions each mouse experienced in which order. This is critical to the interpretation of Figure 9 and the reduction of passive avoids during STN stimulation. Did these mice have the CS1+STN stimulation pairing or the STN+US pairing prior to this experiment? If they did, the stimulation of the STN could be strongly associated with either punishment or with the CS1 that predicts punishment. If that is the case, stimulating the STN during CS2 could be like presenting CS1+CS2 at the same time and could be confusing. The authors should make it clear whether the mice were naïve during this passive avoid experiment or whether they had experienced STN stimulation paired with anything prior to this experiment.
Optogenetic excitation is no longer part of the study.
(20) Similarly, the duration of the STN stimulation should be made clear on the plots that show behavior over time (e.g., Figure 9E).
Optogenetic excitation is no longer part of the study.
(21) There is just so much data and so many conditions for each experiment here. The paper is dense and difficult to read. It would really benefit readability if the authors put only the key experiments and key figure panels in the main text and moved much of the repetitive figure panels to supplemental figures. The addition of schematic drawings for behavioral experiment timing and for the different AA1, AA2, and AA3 conditions would also really improve clarity.
By focusing the study, we believe it has substantially improved clarity and readability.
Reviewer #3 (Recommendations for the authors):
(1) Minor error in results 'Cre-AAV in the STN of Vglut2-Cre'
Fixed.
(2) In some Figure 2 panels, the peaks appear to be cut off, and blue traces are obscured by red.
In Fig. 2, the peaks of movement (speed) traces are intentionally truncated to emphasize the rising phase of the turn, which would otherwise be obscured if the full y-axis range were displayed (peaks and other measures are statistically compared). This adjustment enhances clarity without omitting essential detail and is now noted in the legend.
-
eLife Assessment
This valuable study provides a 3D standardised anatomical atlas of the brain of an orb-weaving spider. The authors describe the brain's shape and its inner compartments (the neuropils) and add information on the distribution of a number of neuroactive substances such as neurotransmitters and neuropeptides. Through the use of histological and microscopy methods, the authors provide a more complete view of an arachnid brain than previous studies and also present convincing evidence about the organisation and homology of brain regions. The work will serve as a reference for future studies on spider brains and will enable comparisons of brain regions with insects so that the evolution of these structures can be inferred across arthropods.
-
Reviewer #1 (Public review):
Summary:
Artiushin et al. establish a comprehensive 3D atlas of the brain of the orb-web building spider Uloborus diversus. First, they use immunohistochemistry detection of synapsin to mark and reconstruct the neuropils of the brain of six specimens and they generate a standard brain by averaging these brains. Onto this standard 3D brain, they plot immunohistochemical stainings of major transmitters to detect cholinergic, serotonergic, octopaminergic/tyraminergic and GABAergic neurons, respectively. Further, they add information on the expression of a number of neuropeptides (Proctolin, AllatostatinA, CCAP and FMRFamide). Based on this data and 3D reconstructions, they extensively describe the morphology of the entire synganglion, the discernible neuropils and their neurotransmitter/neuromodulator content.
Strengths:
While 3D reconstruction of spider brains and the detection of some neuroactive substances have been published before, this seems to be the most comprehensive analysis so far, both in terms of the number of substances tested and the ambition to analyze the entire synganglion. Interestingly, besides the previously described neuropils, they detect a novel brain structure, which they call the tonsillar neuropil.
Immunohistochemistry, imaging and 3D reconstruction are convincingly done and the data is extensively visualized in figures, schemes and very useful films, which allow the reader to work with the data. Due to its comprehensiveness, this dataset will be a valuable reference for researchers working on spider brains or on the evolution of arthropod brains.
Weaknesses:
As expected for such a descriptive groundwork, new insights or hypotheses are limited, although the first description of the tonsillar neuropil is interesting. The reconstruction of the main tracts of the brain would be a very valuable complementary piece of data.
-
Reviewer #2 (Public review):
Summary
Artiushin et al. created the first three-dimensional atlas of a synganglion in the hackled orb-weaver spider, which is becoming a popular model for web-building behavior. Immunohistochemical analysis with an impressive array of antisera reveals subcompartments of neuroanatomical structures described in other spider species as well as two previously undescribed arachnid structures, the protocerebral bridge, hagstone, and paired tonsillar neuropils. The authors describe the spider's neuroanatomy in detail and discuss similarities and differences from other spider species. The final section of the discussion examines the homology between onychophoran and chelicerate arcuate bodies and mandibulate central bodies.
Strengths
The authors set out to create a detailed 3D atlas and accomplished this goal.
Exceptional tissue clearing and imaging of the nervous system reveal the three-dimensional relationships between neuropils and some connectivity that would not be apparent in sectioned brains.
A detailed anatomical description makes it easy to reference structures described between the text and figures.
The authors used a large palette of antisera which may each be investigated in future studies for function in the spider nervous system and may be compared across species.
Weaknesses addressed in the revision
Added information about spider-specific neuropils helps orient a non-expert reader. While the function and connectivity of many of these structures are currently unknown, this study will be foundational in future investigations of function.
-
Reviewer #3 (Public review):
Summary:
This is an impressive paper that offers a much-needed 3D standardized brain atlas for the hackled orb-weaving spider Uloborus diversus, an emerging organism of study in neuroethology. The authors used a detailed immunohistological whole-mount staining method that allowed them to localize a wide range of common neurotransmitters and neuropeptides and map them on a common brain atlas. Through this approach, they discovered groups of cells that may form parts of neuropils that had not previously been described, such as the 'tonsillar neuropil', which might be part of a larger insect-like central complex. Further, this work provides unique insights into the previously underappreciated complexity of higher-order neuropils in spiders, particularly the arcuate body, and hints at a potentially important role for the mushroom bodies in vibratory processing for web-building spiders.
Strengths:
To understand brain function, data from many experiments on brain structure must be compiled to serve as a reference and foundation for future work. As demonstrated by the overwhelming success in genetically tractable laboratory animals, 3D standardized brain atlases are invaluable tools - especially as increasing amounts of data are obtained at the gross morphological, synaptic, and genetic levels, and as functional data from electrophysiology and imaging are integrated. Among 'non-model' organisms, such approaches have included global silver staining and confocal microscopy, MRI, and, more recently, micro-computed tomography (X-ray) scans used to image multiple brains and average them into a composite reference. In this study, the authors used synapsin immunoreactivity to generate an averaged spider brain as a scaffold for mapping immunoreactivity to other neuromodulators. Using this framework, they describe many previously known spider brain structures and also identify some previously undescribed regions. They argue that the arcuate body - a midline neuropil thought to have diverged evolutionarily from the insect central complex - shows structural similarities that may support its role in path integration and navigation.
Having diverged from insects such as the fruit fly Drosophila melanogaster over 400 million years ago, spiders are an important group for study - particularly due to their elegant web-building behavior, which is thought to have contributed to their remarkable evolutionary success. How such exquisitely complex behavior is supported by a relatively small brain remains unclear. A rich tradition of spider neuroanatomy emerged in the previous century through the work of comparative zoologists, who used reduced silver and Golgi stains to reveal remarkable detail about gross neuroanatomy. Yet, these techniques cannot uncover the brain's neurochemical landscape, highlighting the need for more modern approaches - such as those employed in the present study.
A key insight from this study involves two prominent higher-order neuropils of the protocerebrum: the arcuate body and the mushroom bodies. The authors show that the arcuate body has a more complex structure and lamination than previously recognized, suggesting it is insect central complex-like and may support functions such as path integration and navigation, which are critical during web building. They also report strong synapsin immunoreactivity in the mushroom bodies and speculate that these structures contribute to vibratory processing during sensory feedback, particularly in the context of web building and prey localization. These findings align with prior work that noted the complex architecture of both neuropils in spiders and their resemblance (and in some cases greater complexity) compared to their insect counterparts. Additionally, the authors describe previously unrecognized neuropils, such as the 'tonsillar neuropil,' whose function remains unknown but may belong to a larger central complex. The diverse patterns of neuromodulator immunoreactivity further suggest that plasticity plays a substantial role in central circuits.
Weaknesses:
My major concern, however, is that some of the authors' neuroanatomical descriptions rely too heavily on inference rather than what is currently resolvable from their immunohistochemistry stains alone.
Comments on revisions:
I thought that the authors did an excellent job responding to the reviews, and I have no further comments.
-
Author response:
The following is the authors’ response to the original reviews.
Public Reviews:
Reviewer #1 (Public review):
Summary:
Artiushin et al. establish a comprehensive 3D atlas of the brain of the orb-web building spider Uloborus diversus. First, they use immunohistochemistry detection of synapsin to mark and reconstruct the neuropils of the brain of six specimens and they generate a standard brain by averaging these brains. Onto this standard 3D brain, they plot immunohistochemical stainings of major transmitters to detect cholinergic, serotonergic, octopaminergic/tyraminergic and GABAergic neurons, respectively. Further, they add information on the expression of a number of neuropeptides (Proctolin, AllatostatinA, CCAP, and FMRFamide). Based on this data and 3D reconstructions, they extensively describe the morphology of the entire synganglion, the discernible neuropils, and their neurotransmitter/neuromodulator content.
Strengths:
While 3D reconstruction of spider brains and the detection of some neuroactive substances have been published before, this seems to be the most comprehensive analysis so far, both in terms of the number of substances tested and the ambition to analyze the entire synganglion. Interestingly, besides the previously described neuropils, they detect a novel brain structure, which they call the tonsillar neuropil.
Immunohistochemistry, imaging, and 3D reconstruction are convincingly done, and the data are extensively visualized in figures, schemes, and very useful films, which allow the reader to work with the data. Due to its comprehensiveness, this dataset will be a valuable reference for researchers working on spider brains or on the evolution of arthropod brains.
Weaknesses:
As expected for such a descriptive groundwork, new insights or hypotheses are limited, apart from the first description of the tonsillar neuropil. A more comprehensive labeling in the panels of the mentioned structures would help to follow the descriptions. The reconstruction of the main tracts of the brain would be a very valuable complementary piece of data.
Reviewer #2 (Public review):
Summary
Artiushin et al. created the first three-dimensional atlas of a synganglion in the hackled orb-weaver spider, which is becoming a popular model for web-building behavior. Immunohistochemical analysis with an impressive array of antisera reveals subcompartments of neuroanatomical structures described in other spider species as well as two previously undescribed arachnid structures, the protocerebral bridge, hagstone, and paired tonsillar neuropils. The authors describe the spider's neuroanatomy in detail and discuss similarities and differences from other spider species. The final section of the discussion examines the homology between onychophoran and chelicerate arcuate bodies and mandibulate central bodies.
Strengths
The authors set out to create a detailed 3D atlas and accomplished this goal.
Exceptional tissue clearing and imaging of the nervous system reveal the three-dimensional relationships between neuropils and some connectivity that would not be apparent in sectioned brains.
A detailed anatomical description makes it easy to reference structures described between the text and figures.
The authors used a large palette of antisera which may be investigated in future studies for function in the spider nervous system and may be compared across species.
Weaknesses
It would be useful for non-specialists if the authors would introduce each neuropil with some orientation about its function or what kind of input/output it receives, if this is known for other species. Especially those structures that are not described in other arthropods, like the opisthosomal neuropil. Are there implications for neuroanatomical findings in this paper on the understanding of how web-building behaviors are mediated by the brain?
Likewise, where possible, it would be helpful to have some discussion of the implications of certain neurotransmitters/neuropeptides being enriched in different areas. For example, GABA would signal areas of inhibitory connections, such as inhibitory input to mushroom bodies, as described in other arthropods. In the discussion section on relationships between spider and insect midline neuropils, are there similarities in expression patterns between those described here and in insects?
Reviewer #3 (Public review):
Summary:
This is an impressive paper that offers a much-needed 3D standardized brain atlas for the hackled orb-weaving spider Uloborus diversus, an emerging organism of study in neuroethology. The authors used a detailed immunohistological whole-mount staining method that allowed them to localize a wide range of common neurotransmitters and neuropeptides and map them on a common brain atlas. Through this approach, they discovered groups of cells that may form parts of neuropils that had not previously been described, such as the 'tonsillar neuropil', which might be part of a larger insect-like central complex. Further, this work provides unique insights into the previously underappreciated complexity of higher-order neuropils in spiders, particularly the arcuate body, and hints at a potentially important role for the mushroom bodies in vibratory processing for web-building spiders.
Strengths:
To understand brain function, data from many experiments on brain structure must be compiled to serve as a reference and foundation for future work. As demonstrated by the overwhelming success in genetically tractable laboratory animals, 3D standardized brain atlases are invaluable tools - especially as increasing amounts of data are obtained at the gross morphological, synaptic, and genetic levels, and as functional data from electrophysiology and imaging are integrated. Among 'non-model' organisms, such approaches have included global silver staining and confocal microscopy, MRI, and, more recently, micro-computed tomography (X-ray) scans used to image multiple brains and average them into a composite reference. In this study, the authors used synapsin immunoreactivity to generate an averaged spider brain as a scaffold for mapping immunoreactivity to other neuromodulators. Using this framework, they describe many previously known spider brain structures and also identify some previously undescribed regions. They argue that the arcuate body - a midline neuropil thought to have diverged evolutionarily from the insect central complex - shows structural similarities that may support its role in path integration and navigation.
Having diverged from insects such as the fruit fly Drosophila melanogaster over 400 million years ago, spiders are an important group for study - particularly due to their elegant web-building behavior, which is thought to have contributed to their remarkable evolutionary success. How such exquisitely complex behavior is supported by a relatively small brain remains unclear. A rich tradition of spider neuroanatomy emerged in the previous century through the work of comparative zoologists, who used reduced silver and Golgi stains to reveal remarkable detail about gross neuroanatomy. Yet, these techniques cannot uncover the brain's neurochemical landscape, highlighting the need for more modern approaches - such as those employed in the present study.
A key insight from this study involves two prominent higher-order neuropils of the protocerebrum: the arcuate body and the mushroom bodies. The authors show that the arcuate body has a more complex structure and lamination than previously recognized, suggesting it is insect central complex-like and may support functions such as path integration and navigation, which are critical during web building. They also report strong synapsin immunoreactivity in the mushroom bodies and speculate that these structures contribute to vibratory processing during sensory feedback, particularly in the context of web building and prey localization. These findings align with prior work that noted the complex architecture of both neuropils in spiders and their resemblance (and in some cases greater complexity) compared to their insect counterparts. Additionally, the authors describe previously unrecognized neuropils, such as the 'tonsillar neuropil,' whose function remains unknown but may belong to a larger central complex. The diverse patterns of neuromodulator immunoreactivity further suggest that plasticity plays a substantial role in central circuits.
Weaknesses:
My major concern, however, is that some of the authors' neuroanatomical descriptions rely too heavily on inference rather than what is currently resolvable from their immunohistochemistry stains alone.
We would like to thank the reviewers for their time and effort in carefully reading our manuscript and providing helpful feedback, and particularly for their appreciation and realistic understanding of the scope of this study and its context within the existing spider neuroanatomical literature.
Regarding the limitations and potential additions to this study, we believe these to be well-reasoned and are in agreement. We plan to address some of these shortcomings in future publications.
As multiple reviewers remarked, a mapping of the major tracts of the brain would be a welcome addition to understanding the neuroanatomy of U. diversus. This is something which we are actively working on and hope to provide in a forthcoming publication. Given the length of this paper as is, we considered that a treatment of the tracts would be better served as an additional paper. Likewise, mapping of the immunoreactive somata of the currently investigated targets is a component which we would like to describe as part of a separate paper, keeping the focus of the current one on neuropils, in order to leverage our aligned volumes to describe co-expression patterns, which is not as useful for the more widely dispersed somata. Furthermore, while we often see somata through immunostaining, the presence and intensity of the signal are variable among immunoreactive populations. We are finding that these populations are more consistently and comprehensively revealed through fluorescent in situ hybridization.
We appreciate the desire of the reviewers for further information regarding the connectivity and function of the described neuropils, and where possible we have added additional statements and references. That being said, where this context remains sparse, it largely reflects the lack of information in the literature. This is particularly the case for functional roles for spider neuropils, especially higher order ones of the protocerebrum, which are essentially unexamined. As summarized in the quite recent update to Foelix’s Spider Neuroanatomy, a functional understanding for protocerebral neuropil is really only available for the visual pathway. Consequently, it is also difficult to speak of the implications of the presence or absence of particular signaling elements in these neuropils if no further information about the circuitry or behavioral correlates is available. Finally, multiple reviewers suggested that it might be worthwhile to explore a comparison of the arcuate body layer innervation to that of the central bodies of insects, of which there is a richer literature. This is an idea which we were also initially attracted to, and have now added some lines to the discussion section. Our position on this is a cautious one, as a series of more recent comparative studies spanning many insect species using the same antibody reveals a considerable amount of variation in central body layering even within this clade, which has given us pause in interpreting how substantive similarities and differences to the far more distant spiders would be. Still, this is an interesting avenue which merits an eventual comprehensive analysis, one which would certainly benefit from having additional examples from more spider species, in order to not overstate conclusions based on the currently limited neuroanatomical representation.
Given our framing for the impetus to advance neuroanatomical knowledge in orb-web builders, the question of whether the present findings inform the circuitry controlling web-building is one that naturally follows. While we are unable with this dataset alone to define which brain areas mediate web-building (something which would likely be beyond any anatomical dataset lacking complementary functional data), the process of assembling the atlas has revealed structures and defined innervation patterns in previously ambiguous sectors of the spider brain, particularly in the protocerebrum. A simplistic proposal is that such regions, which are more conspicuous by our techniques and in this model species, would be good candidates for further inquiries into web-building circuitry, as their absence or oversight in past work could be attributable to the different behavioral styles of those model species. Regardless, the fact that such a hypothesis cannot be readily refuted by the existing neuroanatomical literature underscores the need for more finely refined models of the spider brain, to which we hope we have positively contributed. We are gratified by the reviewers’ enthusiasm for the strengths of this study.
Recommendations for the authors:
Reviewer #1 (Recommendations for the authors):
(1) Brenneis 2022 has done a very nice and comprehensive study focused on the visual system - this might be worth including.
Thank you, we have included this reference on Line 34.
(2) L 29: When talking about "connectivity maps", the emerging connectomes based on EM data could be mentioned.
Additional references have been added, thank you. Line 35.
(3) L 99: Please mention that you are going to describe the brain from ventral to dorsal.
Thank you, we have added a comment to Line 99.
(4) L 13: is found at the posterior.
Thank you, revised.
(5) L 168: How did you pick those two proctolin+ somata, given that there is a lot of additional punctate signal?
Although not visible in this image, if you scroll through the stack there is a neurite which extends from these neurons directly to this area of pronounced immunoreactivity.
(6) Figure 1: Please add the names of the neuropils you go through afterwards.
We have added labels for neuropils which are recognizable externally.
(7) Figure 1 and Figure 5: Please mark the esophagus.
Label has now been added to Figure 1. In Figure 5, the esophagus should not really be visible because these planes are just ventral to its closure.
(8) Figure 5A: I did not see any CCAP signal where the arrow points to; same for 5B (ChAT).
In hindsight, the CCAP point is probably too minor to be worth mentioning, so we have removed it.
The ChAT signal pattern in 5B has been reinforced by adding a dashed circle to show its location as well.
(9) L 249: Could the circular spot also be a tract (many tracts lack synapsin - at least in insects)?
Yes, thank you for pointing this out – the sentence is revised (L274). We are currently further analyzing anti-tubulin volumes and it seems that indeed there are tracts which occupy these synapsin-negative spaces, although interestingly they do not tend to account for the entire space.
(10) L 302: Help me see the "conspicuous" thing.
Brace added to Fig. 8B, note in caption.
(11) L 315: Please first introduce the number of the eyes and how these relate to the 1° and 2° pathways. Are these separate pathways from separate eyes or two relay stations of one visual pathway?
We have expanded the introduction to this section (L336). Yes, these are considered as two separate visual pathways, with a typical segregation of which eyes contribute to which pathway – although there is evidence for species-specific differences in these contributions. In the context of this atlas, we are not currently able to follow which eyes are innervating which pathway.
(12) L 343: It seems that the tonsillar neuropil could be midline spanning (at least this is how I interpret the signal across the midline). Would it make sense to re-formulate from a paired structure to midline-spanning? Would that make it another option for being a central complex homolog?
In the spectrum from totally midline spanning and unpaired (e.g., arcuate body (at least in adults)) to almost fully distinct and paired (e.g., mushroom bodies (although even here there is a midline spanning ‘bridge’)), we view the tonsillar to be more paired due to the oval components, although it does have a midline spanning section, particularly unambiguous just posterior to the oval sections.
Regarding central complex homology, if the suggestion is that the tonsillar with its midline spanning component could represent the entire central complex, then this is a possibility, but it would neglect the highly innervated and layered arcuate body, which we think represents a stronger contender – at least as a component of the central complex. For this reason, we would still be partial to the possibility that the tonsillar is a part of the central complex, but not the entire complex.
(13) L 407: ...and dorsal (..) lobe...
Added the word ‘lobe’ to this sentence (L429).
(14) L 620ff: Maybe mention the role of MBs in learning and memory.
A reference has been added at L661.
(15) L 644: In the context of arcuate body homology with the central body, I was missing a discussion of the neurotransmitters expressed in the respective parts in insects. Would that provide additional arguments?
This is an interesting comparison to explore, and is one that we initially considered making as well. There are certainly commonalities that one could point to, particularly in trying to build the case of whether particular lobes of the arcuate body are similar to the fan-shaped or ellipsoid bodies in insects. Nevertheless, something which has given us pause is studying the more recent comparative works between insect species (Timm et al., 2021, J Comp Neuro, Homberg et al., 2023, J Comp Neuro), which also reveal a fair degree of heterogeneity in expression patterns between species – and this is despite the fact that the neuropils are unambiguously homologous. When comparing to a much more evolutionarily distant organism such as the spider, it becomes less clear which extant species should serve as the best point of comparison, and therefore we fear making specious arguments by focusing on similarities when there are also many differences. We have added some of these comments to the discussion (L699-725).
Throughout the text, I frequently had difficulties in finding the panels right away in the structures mentioned in the text. It would help to number the panels (e.g., 6Ai, Aii, Aiii, etc.) and refer to those in the text. Further, all structures mentioned in the text should be labelled with arrows/arrowheads unless they are unequivocally identified in the panel.
Thank you for the suggestion. We have adopted the additional numbering scheme for panels, and added additional markers where suggested.
Reviewer #2 (Recommendations for the authors):
(1) L 18: "neurotransmitter" should be pluralized.
Thank you, revised (L18).
(2) L 55: Missing the word "the" before "U. diversus".
Thank you, revised (L57).
(3) L 179: Change synaptic dense to "synapse-dense".
Thank you, revised (L189).
(4) L 570: "present in" would be clearer than "presented on in".
Our intention here was to say that Loesel et al did not show slices from the subesophageal mass for CCAP, so it was ambiguous as to whether it had immunoreactivity there but they simply did not present it, or if it indeed doesn’t show signal in the subesophageal. But agreed, this is awkward phrasing which has been revised (L606-608), thank you.
(5) L 641: It would be worth noting that the upper and lower central bodies are referred to as the fan-shaped and ellipsoid bodies in many insects.
Thank you, this has been added in L694.
(6) L 642: Although cited here regarding insect central body layers, Strausfeld et al. 2006 mainly describe the onychophoran brain and the evolutionary relationship between the onychophoran and chelicerate arcuate bodies. The phylogenetic relationships described here would strengthen the discussion in the section titled "A spider central complex?"
The phylogenetic relationship of onychophorans and chelicerates remains controversial and therefore we find it tricky to use this point to advance the argument in that discussion section, as one could make opposing arguments. The homology of the arcuate body (between chelicerates, onychophorans, and mandibulates) has likewise been argued over, with this Strausfeld et al paper offering one perspective, while others are more permissive (good summary at end of Doeffinger et al., 2010). Our thought was simply to draw attention to grossly similar protocerebral neuropils in examples from distantly related arthropods, without taking a stance, as our data doesn’t really deeply advance one view over the other.
(7) L 701: Noduli have been described in stomatopods (Thoen et al., Front. Behav. Neurosci., 2017).
This is an important addition, thank you – it has been incorporated and cited (L766).
(8) Antisera against DC0 (PKA-C alpha) may distinguish globuli cells from other soma surrounding the mushroom bodies, but this may be accomplished in future studies.
Agreed, this is something we have been interested in, but have not yet acquired the antibody.
Reviewer #3 (Recommendations for the authors):
Overall, this paper is both timely and important. However, it may face some resistance from classically trained arthropod neuroanatomists due to the authors' reliance on immunohistochemistry alone. A method to visualize fiber tracts and neuropil morphology would have been a valuable and grounding complement to the dataset and can be added in future publications. Tract-tracing methods (e.g., dextran injections) would strengthen certain claims about connectivity - particularly those concerning the mushroom bodies. For delineating putative cell populations across regions, fluorescence in situ hybridization for key transcripts would offer convincing evidence, especially in the context of the arcuate body, the tonsillar neuropil, and proposed homologies to the insect central complex.
That said, the dataset remains rich and valuable. Outlined below are a number of issues the authors may wish to address. Most are relatively minor, but a few require further clarification.
(1) Abstract
(a) L 12-14: The authors should frame their work as a novel contribution to our understanding of the spider brain, rather than solely as a tool or stepping stone for future studies. The opening sentences currently undersell the significance of the study.
Thank you for your encouragement! We have revised the abstract.
(b) Rather than touting "first of its kind" in the abstract, state what was learned from this.
Thank you, we have revised the abstract.
(c) The abstract does not mention the major results of the study. It should state which brain regions were found. It should list all of the peptides and transmitters that were tested so that they can be discoverable in searches.
Thank you, revised.
(2) Introduction
(a) L 38: There's a more updated reference for Long (2016): Long, S. M. (2021). Variations on a theme: Morphological variation in the secondary eye visual pathway across the order of Araneae. Journal of Comparative Neurology, 529(2), 259-280.
Thank you, this has been updated (L41 and elsewhere).
(b) L 47: While whole-mount imaging offers some benefits, a downside is the need for complete brain dissection from the cuticle, which in spiders likely damages superficial structures (such as the secondary eye pathways).
True – we have added this caveat to the section (L48-51).
(c) L 49-52: If making this claim, more explicit comparisons with the non-web-building C. salei in terms of neuropil presence, volume, or density later in the paper would be useful.
We do not have the data on hand to make measured comparisons of C. salei structures, and the neuropils identified in this study are not clearly identifiable in the slices provided in the literature, so would likely require new sample preparations. We’ve removed the reference to proportionality and softened this sentence slightly – we are not trying to make a strong claim, but simply state that this is a possibility.
(3) Results
(a) The authors should state how they accounted for autofluorescence.
While we did not explicitly test for autofluorescence, the long process of establishing a working whole-mount immuno protocol and testing antibodies produced many examples of treated brains which did not show any substantial signal. We have added a note to the methods section (L866).
(b) L 69: There is some controversy in delineating the subesophageal and supraesophageal mass as the two major divisions despite its ubiquity in the literature. It might be safer to delineate the protocerebrum, deutocerebrum, and fused postoral ganglia (including the pedipalp ganglion) instead.
Thank you for this insight, we have modified the section, section headings and Figure 1 to account for this delineation as well. We have chosen to include both ways of describing the synganglion, in order to maintain a parallel with the past literature, and to be further accessible to non-specialist readers. L73-77
(c) L 90: It might be useful to include a justification for the use of these particular neuropeptides.
Thank you, revised. L97-99.
(d) L 106 - 108: It is stated that the innervation pattern of the leg neuropils is generally consistent, but from Figure 2, it seems that there are differences. The density of 5HT, Proctolin, ChAT, and FMRFamide seems to be higher in the posterior legs. AstA seems to have a broader distribution in L1 and is absent in L4.
We would still stand by the generalization that the innervation pattern is fairly similar for each leg. The L1 neuropils tend to be bigger than the posterior legs, which might explain the difference in density. Another important aspect to keep in mind is that not all of the leg neuropils appear at the exact same imaging plane as we move from ventral to dorsal. If you scroll through the synapsin stack (ventral to dorsal), you will see that L2 and L3 appear first, followed shortly by L1, and then L4, and at the dorsal end of the subesophageal they disappear in the opposite order. The observations listed here are true for the single z-plane in Figure 2, but the fact that they don’t appear at the same time seems to mainly account for these differences. For example, if you scroll further ventrally in the AstA volume, you will see a very similar innervation appear in L4 as well, even though it is absent in the Fig. 2 plane. We plan to have these individual volumes available from a repository so that they can be individually examined to better see the signal at all levels. At the moment, the entire repository can be accessed here: https://doi.org/10.35077/ace-moo-far.
(e) Figure 1 and elsewhere: The axes for the posterior and lateral views show Lateral and Medial. It would be more accurate to label them Left and Right, because the current labelling does not define the medial-to-lateral axis. The medial direction is correct for only one hemiganglion, and it's the opposite for the contralateral side.
Thank you, revised.
(f) In Figures that show particular sections, it might be helpful to include a plane in the standard brain to illustrate where that section is.
Yes, we agree and it was our original intention. It is something we can attempt to do, but there is not much room in the corners of many of the synapsin panels, making it harder to make the 3D representation big enough to be clear.
(g) Figure 2, 3: Presenting the z-section stack separately in B and C is awkward because it makes it seem that they are unrelated. I think it would be better to display the z160-190 directly above its corresponding z230-260 for each of the exemplars in B and C. Since there's no left-right asymmetry, a hemibrain could be shown for all examples as was done for TH in D. It's not clear why TH was presented differently.
Thank you for this suggestion. We rearranged the figure as described, but ultimately still found the original layout to be preferable, in part because the labelling becomes too cramped. We hope that the potential confusion about the continuity of the B and C sections will be mitigated by focusing on the z-plane labels and overall shape, which should suggest that the planes are not far from each other. We trust that the form of the leg neuropils is recognizable in both the B and C synapsin images, and so readers will make the connection.
Regarding TH, this panel is apart from the rest because we were unable to register the TH volume to the standard brain because the variant of the protocol which produced good anti-TH staining conflicted with synapsin, and we could not simultaneously have adequate penetration of the synapsin signal. We did not want to align the TH panel with the others to avoid potential confusion that this was a view from the same z-plane of a registered volume, as the others are. We have added a note to the figure caption.
(h) The locations of the labels should be consistent. The antisera are below the images in Figure 2, above in Figure 3, and to the bottom left in Figure 5. The slices are shown above in Figure 2 and below in Figure 3.
Thank you, this has been revised for better consistency.
(i) It is surprising to me that there is no mention of the neuronal somata visible in Figure 2 and Figure 3. A typical mapping of the brain would map the locations of the neurons, not just the neuropils.
Our first arrangement of this paper described each immunostain individually from ventral to dorsal, including locations of the immunoreactive somata which could be observed. To aid the flow of the paper and leverage the aligned volumes to emphasize co-expression in the functional divisions of the brain, we re-formulated to this current layout, which is organized around neuropils. Somata locations are tricky to incorporate in this format of the paper, which focuses on key z-planes or tight max projections, because the relevant immunoreactive somata are more dispersed throughout the synganglion, not always overlapping in neighboring z-planes. Further, since only a minority of the antisera we used can reveal traceable projections from the supplying somata in the whole-mount preparation, we would be quite limited in the degree to which we could integrate the specific somata mapping with expression patterns in the neuropil. Finally, compared to immuno, which can be variable in staining intensity between somata for the same target, we find that FISH reveals these locations more clearly and comprehensively, so while we agree that this mapping would also be useful for the atlas, we would like to better provide this information in a future publication using whole-mount FISH.
(j) L 139: There is a reference to a "brace" in Figure 3B, which does not seem to exist. There's one in Figure 3C.
There is a smaller brace near the bottom of the TDC2 panel in Fig. 3B.
(k) L 151 should be "3D".
Thank you, revised (L160).
(l) Figure 4C: It is not mentioned in the legend that the bottom inset is Proctolin without synapsin.
Thank you, revised (L1213).
(m) L 199: Are the authors sure this subdivision is solely on the anterior-posterior axis? Could it also be dorsal ventral? (i.e., could this be an artifact of the protocerebrum and deutocerebrum?)
Yes, this division can be appreciated to extend somewhat in the dorsal-ventral axis and it is possible that this is the protocerebrum emerging after the deutocerebrum, although this area is largely dorsal to the obvious part of the deutocerebrum. In the horizontal planes there appears to be a boundary line which we use for this subdivision in order to assist in better describing features within this generally ventral part of the protocerebrum – referred to as “stalk” because it is thinner before the protocerebrum expands in size, dorsally. Our intention was more organizational, and as stated in the text, this area is likely heterogenous and we are not suggesting that it has a unified function, so being a visual artifact would not be excluded.
(n) L 249: Could it also indicate large tracts projecting elsewhere?
Yes, definitely, we have evidence that part of the space is occupied by tracts. Revised, thank you (L262).
(o) L 281: Several investigators, including Long (2021), noted very large and robust mushroom bodies in Nephila.
Thank you – the point is well taken that there are examples of orb-web builders that do have appreciable mushroom bodies. We have added a note in this section (L295), giving the examples of Deinopis spinosa and Argiope trifasciata (Figure 4.20 and 4.22 in Long, 2016).
It looks like these species make the point better than Nephila, as Long lists the mushroom body percentage of total protocerebral volume for D. spinosa as 4.18%, for A. trifasciata as 2.38%, but doesn’t give a percentage for Nephila clavipes (Figure 4.24) and only labels the mushroom bodies structures as “possible” in the figure.
In Long (2021), Nephilidae is described as follows: “In Nephilidae, I found what could be greatly reduced medullae at the caudal end of the laminae, as well as a structure that has many physical hallmarks of reduced mushroom bodies”
(p) L 324: If the authors were able to stain for histamine or supplement this work with a different dissection technique for the dorsal structures, the visual pathways might have been apparent, which seems like a very important set of neuropils to include in a complete brain atlas.
Yes, for this reason histamine has been an interesting target which we have attempted to visualize, but unfortunately have not yet been able to successfully stain for in U. diversus. An additional complication is that the antibodies we have seen call for glutaraldehyde fixation, which may make them incompatible with our approach to producing robust synapsin staining throughout the brain.
We agree that the lack of the complete visual pathway is a substantial weakness of our preparation, and should be amended in future work, but this will likely require developing a modified approach in order to preserve these delicate structures in U. diversus.
(q) L 331: Is this bulbous shape neuropil, or just the remains of neuropil that were not fully torn away during dissection?
This certainly is a severed part of the primary pathway, although it seems more likely that the bulbous shape is indicative of a neuropil form, rather than just being a happenstance shape that occurred during the breakage. We have examples where the same bulbous shape appears on both sides, and in different brains. It is possible that this may be the principal eye lamina – although we did not see co-staining with expected markers in examples where it did appear, so cannot be sure.
(r) L 354: Is tyraminergic co-staining with the protocerebral bridge enough evidence to speculate that inputs are being supplied?
We agree that this is not compelling, and have removed the statement.
(s) L 372: This whole structure appears to be a previously described structure in spiders, the 'protocerebral commissure'.
We are reasonably sure that what we are calling the PCB is a distinct structure from the protocerebral commissure (PCC). In Babu and Barth’s (1984) horizontal slice (Fig. 11b), you can see the protocerebral commissure immediately adjacent to the mushroom body bridge. It is found similarly located in other species, as can be seen in the supplementary 3D files provided by Steinhoff et al. (2024).
While not visible with synapsin in U. diversus, we likewise can make out a commissure in this area in close proximity to the mushroom body bridge using tubulin staining. What we are calling the protocerebral bridge is a structure which is much more dorsal to the protocerebral commissure, not appearing in the same planes as the MB bridge.
(t) L 377: Do you have an intuition why the tonsillar neuropil and the protocerebral bridge would show limited immunoreactivity, while the arcuate body's is quite extensive?
This is an interesting question. Given the degree of interconnection and the fact that multiple classes of neurons in insects will innervate both the central body and the PCB or noduli, perhaps it would be expected that expression in the tonsillar neuropil and protocerebral bridge should be commensurate with the innervation by that particular neurotransmitter-expressing population in the arcuate body. Apart from the fact that the arcuate body is just bigger, perhaps this points to a greater role of the arcuate body in integration, whereas the tonsillar neuropil and PCB may engage in more particular processing, or be limited to certain sensory modalities.
Interestingly, it seems that this pattern of more limited immunoreactivity in the PCB and noduli compared with the central bodies (fan-shaped/ellipsoid) also appears in insects (Kahsai et al., 2010, J Comp Neuro, Timm et al., 2021, J Comp Neuro, Homberg et al., 2023, J Comp Neuro) – particularly, with almost every target having at least some layering in the fan-shaped body (Kahsai et al., 2010, J Comp Neuro). For example, serotoninergic innervation is fairly consistently seen in the upper and lower central bodies across insects, but its presence in the PCB or noduli is more variable – appearing in one or the other in a species-dependent manner (Homberg et al., 2023, J Comp Neuro).
(4) Discussion
(a) L 556: But if confocal images from slices are aligned, is the 3D shape not preserved?
Yes, fair enough – the point we wanted to make was that there is still a limitation in z resolution depending on the thickness of the slices used, which could obscure structures, but perhaps this is too minor of a comment.
(b) L 597: This is a very interesting result. I agree it's likely to do with the processing of mechanosensory information relevant to web activities, and the mushroom body seems like the perfect candidate for this.
(c) L 638: Worth noting that neuropil volume vs density of synapses might play a role in this, as the literature is currently a bit ambiguous with regards to the former.
Thank you, noted (L689).
(d) L 651: The latter seems far more plausible.
Agreed, though the presence of mushroom bodies appears to be variable in spiders, so we didn’t want to take a strong stance here.
eLife Assessment
This valuable study addresses T cell receptor activation during autoreactive T cell development and how the strength of T cell receptor engagement in naïve cells can predispose T cells to develop into effector/memory T cells. The authors lead with solid results that are largely consistent with data in the field suggesting that, in comparison to their counterparts with relatively lower basal self-reactivity, naive CD5hi CD8 T cells in non-obese diabetic (NOD) mice are poised for activation. They propose that diabetogenic T cells are preferentially found among the naive CD5hi CD8 T cell population. While the evidence does not fully support all the authors' conclusions, the data provide a foundation that sets up future studies.
Reviewer #1 (Public review):
Summary
In their manuscript, Ho and colleagues investigate the importance of thymically-imprinted self-reactivity in determining CD8 T cell pathogenicity in non-obese diabetic (NOD) mice. The authors describe pre-existing functional biases associated with naive CD8 T cell self-reactivity based on CD5 levels, a well characterized proxy for T cell affinity to self-peptide. They find that naive CD5hi CD8 T cells are poised to respond to antigen challenge; these findings are largely consistent with previously published data on the C57Bl/6 background. The authors go on to suggest that naive CD5hi CD8 T cells are more diabetogenic as 1) the CD5hi naive CD8 T cell receptor repertoire has features associated with autoreactivity and contains a larger population of islet-specific T cells, and 2) the autoreactivity of "CD5hi" monoclonal islet-specific TCR transgenic T cells cannot be controlled by phosphatase over-expression. Thus, they implicate CD8 T cells with relatively higher levels of basal self-reactivity in autoimmunity. The data presented offers valuable insights and sets the foundation for future studies, but some conclusions are not yet fully supported.
Specific comments
There is value in presenting phenotypic differences between naive CD5lo and CD5hi CD8 T cells in the NOD background as most previous studies have used T cells harvested from C57Bl/6 mice or peripheral blood from healthy human donors.
Caution is warranted when comparing a marker of self-reactivity, CD5 in this case, across broad thymocyte populations (DN/DP/CD8SP). CD5 is upregulated with signals associated with β-selection and positive selection; CD5 levels will thus vary even among subsets within these broad developmental intermediates. This is a particularly important consideration when comparing CD5 across thymic intermediates in polyclonal versus TCR transgenic thymocytes due to the striking differences in thymic selection efficiency, resulting in different developmental population profiles. The higher levels of CD5 noted in the DN population of NOD8.3 mice, for example, are likely due to the shift towards more mature DN4 post-β-selection cells. Similarly, in the DP population, the larger population of post-positive selection cells in the NOD8.3 transgenic thymus may also skew CD5 levels significantly. Overall, the reported differences between NOD and NOD8.3 thymocyte subsets could be due largely to differences in differentiation/maturation stage rather than affinity for self-antigen during T cell development. The authors have added some additional text to the revised manuscript that acknowledges some of these limitations.
The lack of differences in CD5 levels of post-positive selection DP thymocytes, CD8 SP thymocytes, and CD8 T cells in the pancreas draining lymph nodes from NOD vs NOD8.3 mice also raises questions about the relevance of this model to address the question of basal self-reactivity and diabetogenicity and the authors' conclusion "that intrinsic high CD5-associated self-reactivity in NOD8.3 T cells overrides the transgenic Pep-mediated protection observed in dLPC/NOD mice"; the phenotype of the polyclonal and NOD8.3 TCR transgenic CD8 T cells that were analyzed in the (spleen and) pancreas draining lymph nodes is not clear (i.e., are these gated on naive T cells?). Furthermore, the rationale for the comparison with NOD-BDC2.5 mice that carry an MHC II-restricted TCR is unclear.
In reference to the conclusion that transgenic Pep phosphatase does not inhibit the diabetogenic potential of "CD5hi" CD8 T cells, there is some concern that comparing diabetes development in mice receiving polyclonal versus TCR transgenic T cells specific for an islet antigen is not appropriate. The increased frequency and number of antigen specific T cells in the NOD8.3 mice may be responsible for some of the observed differences. Further justification for the comparison is suggested.
The manuscript presents an interesting observation that TCR sequences from CD5hi CD8 T cells may share certain characteristics with diabetogenic T cells found in patients (e.g., CDR3 length), and that autoantigen-specific T cells may be enriched within the CD5hi naive CD8 T cell population. However, the percentage of tetramer-positive cells among naive CD8 T cells appears unusually high in the data presented, and caution is warranted when comparing additional T cell receptor features of self-reactivity/auto-reactivity between CD4 and CD8 T cells.
The counts for the KEGG enrichment pathways presented are relatively low, and the robustness of the analysis should be carefully considered, particularly given that several significance values appear borderline. That said, the differentially expressed genes among CD5lo and CD5hi CD8 T cells are generally consistent with previously published datasets.
The manuscript includes some imprecise wording that may be misleading. For example (not exhaustive): The strength of TCR reactivity to foreign antigen is not "contributed by basal TCR signal" per se but rather correlates with sub-threshold TCR signals necessary for T cell development and survival, CD5 is not broadly expressed on all B cells as the text might suggest but is restricted to a specific subset of B cells, some of the proximal signaling molecules downstream of the preTCR are different than for the mature TCR, upregulation of CD127 at early timepoints post T cell activation is not directly suggestive of their "heightened capabilities in memory T cell homeostasis", etc. The statement "Our study exclusively examined female mice because the disease modeled is relevant in females" should be reconsidered. While the use of female NOD mice can be justified by their higher incidence of diabetes than their male counterparts, the current wording could be misleading.
For clarity and transparency, please consider the following: while additional information is provided in the revised manuscript, gating strategies are not always clear (i.e., naive versus total CD8 T cells), and the age/status of the mice from which cells are harvested (i.e., prediabetic?) is not consistently provided, as far as this reviewer noted.
Reviewer #2 (Public review):
Summary:
In this study, Chia-Lo Ho et al. examine the impact of CD5high CD8 T cells in the pathophysiology of type 1 diabetes (T1D) in NOD mice. The authors used high expression of CD5 as a surrogate of high TCR signaling and self-reactivity and compared the phenotype, transcriptome, TCR usage, function and pathogenic properties of CD5high vs. CD5low CD8 T cells extracted from the so-called naive T cell pool. The study shows that CD5high CD8 T cells resemble memory T cells poised for stronger response to TCR stimulation and that they exacerbate disease upon transfer into RAG-deficient NOD mice. The authors attempt to link these features to the thymic selection events of these CD5high CD8 T cells. Importantly, forced overexpression of the phosphatase PTPN22 in T cells attenuated TCR signaling and reduced pathogenicity of polyclonal CD8 T cells but not highly autoreactive 8.3-TCR CD8 T cells.
Strengths:
The study is nicely performed and the manuscript is clearly and well written. Interpretation of the data is careful and fair. The data are novel and likely important. However, some issues would need to be clarified through either text changes or addition of new data.
Weaknesses:
The definition of naïve T cells based solely on CD44low and CD62Lhigh staining may be oversimplistic. Indeed, even within this definition naïve CD5high CD8 T cells express much higher levels of CD44 than CD5low CD8 T cells.
Comments on revisions:
The authors addressed my previous comments thoughtfully and extensively.
Reviewer #3 (Public review):
Summary:
In this study, Ho et al. hypothesised that autoreactive T cells receiving enhanced TCR signals during positive selection in the thymus are primed for generating effector and memory T cells. They used CD5 as a marker for TCR signal strength during their selection at the double positive stage. Supporting their hypothesis, naïve T cells with high CD5 proliferated better and expressed markers of T cell activation compared to naïve T cells with lower levels of CD5. Furthermore, results showed that autoimmune diabetes can be efficiently induced after the transfer of naïve CD5 hi T cells compared to CD5 lo T cells. This provided solid evidence in support of their hypothesis that T cells receiving higher basal TCR signaling are primed to develop into effector T cells. However, all functional characterisation was done on the cells in the periphery, and CD5 hi cells in the peripheral lymphoid compartment can receive tonic TCR signaling. Hence, the function of CD5 hi T cells might not be related to development and programming in the thymus. This is a major hurdle in interpreting the results and justifying the title of the study. The evidence that transgenic PTPN22 expression could not regulate T cell activation in CD5 hi TCR transgenic autoreactive T cells was weak. Studying T cell development in TCR transgenic mice and looking at TCR downstream signaling could be misleading due to transgenic expression of TCR at all developmental stages.
Strengths:
(1) Demonstrating that CD5 hi cells in the naïve CD8 T cell compartment express markers of T cell activation, proliferation and cytotoxicity at a higher level.
(2) Using gene expression analysis, the study showed that CD5 hi cells among naïve CD8 T cells are transcriptionally poised to develop into effector or memory T cells.
(3) Study showed that CD5 hi cells have higher basal TCR signaling compared to CD5 lo T cells.
(4) Key evidence of pathogenicity of autoreactive CD5 hi T cells was provided by doing the adoptive transfer of CD5 hi and CD5 lo CD8 T cells into NOD Rag1-/- mice and comparing them.
Weaknesses:
(1) Although CD5 can be used as a marker for self-reactivity and T cell signal strength during thymic development, it can also be regulated in the periphery by tonic TCR signaling or when T cells are activated by their cognate antigen. Hence, TCR signals in the periphery could also prime the T cells towards effector/memory differentiation. Therefore, from the evidence presented here, it cannot be concluded that this predisposition of T cells towards effector/memory differentiation is programmed due to higher reactivity towards self-MHC molecules in the thymus, as stated in the title.
(2) Flow cytometry data needs to be revisited for the gating strategy, biological controls and interpretation.
(3) Evidence linking CD5 hi cells to a more effector phenotype using gene enrichment scores is very weak.
(4) Experiments done in this study did not address why CD5 hi T cells could be negatively regulated in NOD mice when PTPN22 is overexpressed resulting in protection from diabetes but the same cannot be achieved in NOD8.3 mice.
(5) Experimental evidence provided to show that PTPN22 overexpression does not regulate TCR signaling in NOD8.3 T cells is weak.
(6) TCR sequencing analysis does not conclusively show that the CD5 hi population is linked with autoreactive T cells. Doing single-cell RNAseq and TCR seq analysis would have helped address this question.
(7) When analysing data from CD5 hi T cells from the pancreatic lymph node, it is difficult to discriminate if the phenotype is just because of T cells that would have just encountered the cognate antigen in the draining lymph node or if it is truly due to basal TCR signaling.
Author response:
The following is the authors’ response to the original reviews.
Public Reviews:
Review #1 (Public review):
Figures 1 through 4 contain data that largely recapitulate published findings (Fulton et al., 2015; Lee et al., 2024; Swee et al., 2016; Dong et al., 2021); it is noted that there is value in confirming phenotypic differences between naive CD5lo and CD5hi CD8 T cells in the NOD background. It is important to contextualize the data while being wary of making parallels with results obtained from CD5lo and CD5hi CD4 T cells. There should also be additional attention paid to the wording in the text describing the data (e.g., the authors assert that, in Figure 4C, the “CD5hi group exhibited higher percentages of CD8+ T cells producing TNF-α, IFN-γ and IL-2” though there is no difference in IL-2 nor consistent differences in TNF-α between the CD5lo and CD5hi populations).
CD5<sup>hi</sup>CD8<sup>+</sup> and CD5<sup>lo</sup>CD8<sup>+</sup> T cells have been previously characterized in other genetic backgrounds. In our study, we aimed to confirm and extend these observations specifically in the autoimmune-prone NOD background, which had not been systematically addressed. Additionally, we carefully reviewed the text describing Figure 4C and revised the wording to accurately reflect the observed data (lines 263-264). Specifically, we now state that the CD5<sup>hi</sup> group exhibited higher levels of IFN-γ and a trend toward increased TNF-α, while IL-2 production did not show a significant difference.
The comparison of CD5 across thymocyte populations is cautioned due to variation in developmental stages, particularly in transgenic models. The reported differences may reflect maturation stages rather than self-reactivity.
We appreciate the reviewer’s important point regarding the interpretation of CD5 levels across thymocyte subsets. In our revised manuscript (lines 455–471), we have added clarification that CD5 expression in DN and DP subsets reflects pre-TCR and TCR signaling events during thymic development. We also acknowledge that differences in maturation stages, especially in the NOD8.3 transgenic model, may influence CD5 expression. We now discuss this caveat and interpret our results with caution, particularly emphasizing that our data support but do not sufficiently define their differential self-reactivity.
The conclusion that PTPN22 overexpression does not inhibit the diabetogenic potential of CD5<sup>hi</sup>CD8<sup>+</sup> T cells is potentially confounded by differences between polyclonal and TCR transgenic systems.
We thank the reviewer for raising this concern. We acknowledge that the TCR transgenic system introduces confounders due to differences in precursor frequencies and clonal expansion compared to polyclonal repertoires. These differences may affect the responsiveness to phosphatase-mediated attenuation of signaling. Therefore, while our results support that high-affinity autoreactive CD8<sup>+</sup> T cells may be less sensitive to PTPN22 overexpression, we do not claim that this finding generalizes to all autoreactive CD8<sup>+</sup> T cells. Rather, it highlights a potential limitation of peripheral tolerance in T cells with strong intrinsic self-reactivity.
TCR sequencing data shows variability; is this representative of the overall repertoire?
We appreciate the reviewer’s comment. We acknowledge that data from bulk TCR sequencing has potential limitations, including variability across experiments and limited resolution at the clonotype level. To improve representativeness and reduce sampling bias, we performed TCR repertoire analysis in two independent experiments. In each experiment, naïve CD5<sup>hi</sup> CD8<sup>+</sup> and CD5<sup>lo</sup>CD8<sup>+</sup> T cells were sorted from pooled peripheral lymph nodes of at least 20 individual NOD mice per group. This approach allowed us to capture a broader range of clonotypes and ensured that the resulting repertoire profiles reflect the characteristics of the overall CD5<sup>hi</sup> and CD5<sup>lo</sup> populations, rather than isolated outliers. Despite some variability, we observed consistent trends in key features, such as shorter CDR3β length, altered TRAV/TRBV usage and reduced diversity in the CD5<sup>hi</sup> subset across both experiments. To enhance resolution and directly assess clonotype-specific reactivity, we plan to perform single-cell RNA and TCR sequencing in future studies, as noted in the revised Discussion (lines 466–471).
Clarifications are requested regarding naive gating, controls, gMFI reporting, and missing methods.
We thank the reviewer for these specific suggestions. We have revised figure legends to better describe gating strategies and included appropriate controls in Figures or Supplementary Figures. Regarding gMFI reporting, we now indicate in the figure legends whether values are reported as gMFI. Additionally, we have added the missing methods for cytokine staining, EdU incorporation, overlapped count matrix construction and TCR repertoire diversity metrics.
Review #2 (Public review):
Summary Comment:
The study is nicely performed, but the definition of naive T cells using only CD44 and CD62L may be oversimplified. CD5hi naive T cells express higher CD44 than CD5lo cells.
We thank the reviewer for the critical evaluation and thoughtful comment. As noted, we defined naïve CD8<sup>+</sup> T cells using a well-established gating strategy based on CD44<sup>lo</sup> and CD62L<sup>hi</sup> expression, consistent with previous studies (Immunity. 2010; 32(2):214–26; Nat Immunol. 2015; 16(1):107–17). We acknowledge that CD44 is expressed along a continuum, and indeed, within the naïve gate, CD5<sup>hi</sup> CD8<sup>+</sup> T cells exhibited slightly higher CD44 levels compared to their CD5<sup>lo</sup> counterparts. However, both subsets remained well below the CD44 expression observed in conventional effector/memory CD8<sup>+</sup> T cells, supporting their classification as naïve. To further validate this, we assessed additional markers associated with activation and memory differentiation, including CD69, PD-1, KLRG1 and CD25. These analyses confirmed that the sorted CD5<sup>hi</sup> and CD5<sup>lo</sup> populations retained a phenotypically naïve profile while exhibiting meaningful differences in baseline activation readiness (Figure 1F).
Review #3 (Public review):
CD5 can be regulated by peripheral signals. Therefore, it cannot be concluded that predisposition to effector/memory differentiation is solely programmed in the thymus.
We thank the reviewer for this important point. We agree that CD5 expression can be dynamically regulated in the periphery by tonic TCR signals and antigen encounter, as also reflected in our own data showing that cells with high CD5 levels display elevated activation potential upon encountering antigen (e.g., Figure 3L). To minimize the confounding effects of pre-existing peripheral activation, we performed an adoptive T cell transfer experiment (Figure 4). In this experiment, naïve CD5<sup>hi</sup>CD8<sup>+</sup> and CD5<sup>lo</sup>CD8<sup>+</sup> T cells were sorted from the peripheral lymph nodes of young (6–8-week-old) prediabetic NOD mice and transferred into NOD Rag1<sup>–/–</sup> recipients. After 4 weeks, we compared the disease phenotypes and functional profiles of CD8<sup>+</sup> T cells from these two groups. This approach allowed us to evaluate the stability and differentiation capacity of CD5<sup>hi</sup> versus CD5<sup>lo</sup> cells in a lymphopenic environment, while excluding the possibility that the observed differences were due to already activated CD8<sup>+</sup> T cells at the time of isolation. We have revised the Discussion (lines 440–450) to acknowledge these experimental limitations and clarify that, while our findings demonstrate functional differences between CD5<sup>hi</sup>CD8<sup>+</sup> and CD5<sup>lo</sup>CD8<sup>+</sup> T cells, we cannot fully exclude contributions from peripheral influences.
Experiments do not explain why PTPN22 overexpression protects in polyclonal T cells but not in NOD8.3 mice.
We appreciate this critical comment. Our findings support that autoreactive T cells with high-affinity TCRs, as in NOD8.3 mice, receive signaling so strong that even PTPN22 overexpression is insufficient to attenuate their activation and effector function. We acknowledge that further mechanistic studies are needed to fully elucidate the differential effects of PTPN22 in polyclonal versus TCR-transgenic settings.
Evidence that PTPN22 does not regulate TCR signaling in NOD8.3 T cells is weak.
We thank the reviewer for this critical comment. Our data show that NOD8.3 T cells with an intrinsic high CD5-associated self-reactivity are more resistant to transgenic Pep-mediated change in the phosphorylation status of TCR signaling molecules CD3ζ and Erk and CD5 expression (Figure 6, B-D). However, we agree that additional functional assays would strengthen this conclusion.
TCR sequencing does not conclusively link CD5hi cells with autoreactivity; single-cell analysis is needed.
We agree with this critical comment. Bulk TCR sequencing revealed repertoire features associated with autoreactivity, but cannot definitively link specific TCRs to function. We have acknowledged this in the discussion (lines 466–471) and highlighted plans to perform single-cell analysis.
CD5hi cells in the PLNs may reflect antigen exposure rather than basal signaling.
We thank the reviewer for this insightful comment. As also noted in Figure 3L, CD5 expression can be influenced by peripheral tonic TCR signals and recent antigen exposure. To minimize the contribution of peripheral activation, we particularly characterized naïve CD8<sup>+</sup>T cells isolated from the peripheral lymph nodes of young (6–8-week-old) prediabetic NOD mice before the onset of overt autoimmunity. Furthermore, we performed an adoptive transfer experiment (Figure 4) using sorted naïve CD5<sup>hi</sup>CD8<sup>+</sup> and CD5<sup>lo</sup>CD8<sup>+</sup>T cells from these mice and characterized their disease phenotype after 4 weeks in lymphopenic NOD Rag1<sup>–/–</sup> recipients and evaluated the effector function of CD8<sup>+</sup>T cells. This approach allowed us to compare the differentiation potential of these subsets in a controlled setting, independent of their activation status at the time of isolation. We have revised the Discussion (lines 440–450) to emphasize that, while our data support functional differences between CD5<sup>hi</sup>CD8<sup>+</sup> and CD5<sup>lo</sup>CD8<sup>+</sup>T cells, we cannot fully exclude the role of peripheral cues in shaping CD5 expression.
Provide proper gating controls and representative flow plots.
We thank the reviewer for this comment. We have revised figure legends to better describe gating strategies and included representative flow cytometry plots and appropriate gating controls in Figures or Supplementary Figures.
Recommendations for the authors:
Reviewer #1 (Recommendations For The authors):
(1) The figure presentation is inconsistent and the labels/font are often too small to read easily.
As the Reviewer suggested, the figure presentation has been revised for consistency. Labels and fonts have been adjusted for improved readability. Specific figures that were difficult to read have been reformatted with larger fonts and clearer legends.
(2) A careful review of the text to ensure clarity of the content is suggested (e.g., “gratitude” at line 91, “were generally lied” at line 123).
We thank the Reviewer for these comments. The text has been carefully reviewed for clarity and grammatical accuracy. Corrections have been made, including changing “gratitude” to “magnitude” (line 47) and “were generally lied” to “fell between” (line 79).
Reviewer #2 (Recommendations For The Authors):
(1) The definition of naïve T cells based solely on CD44low and CD62Lhigh staining may be oversimplistic. Indeed, even within this definition, naïve CD5high CD8 T cells express much higher levels of CD44 than CD5low CD8 T cells.
We thank the Reviewer for these comments. We used a literature-supported gating strategy (Immunity. 2010; 32(2):214–26; Nat Immunol. 2015; 16(1):107–17) to define naïve T cells based on CD44<sup>low</sup> and CD62L<sup>high</sup> expression. It is important to note that CD44 expression exists along a continuum. While we were initially surprised to observe that CD5<sup>hi</sup>CD8<sup>+</sup> T cells expressed relatively higher levels of CD44 than CD5<sup>lo</sup>CD8<sup>+</sup> T cells within the naïve gate, both populations still exhibited significantly lower CD44 expression compared to conventional effector/memory CD8<sup>+</sup> T cells. To further validate the distinction between CD5<sup>hi</sup> and CD5<sup>lo</sup> subsets, we also examined additional markers such as CD69, PD-1, KLRG1 and CD25, which supported their phenotypic differences within the naïve compartment (Figure 1F).
(2) Figure 1G should show the proportion of IGRP-tetramer+ in the three groups of CD8 T cells. Additionally, it would be useful to assess reactivity against a pool of other islet autoantigens using a similar strategy.
As suggested by the reviewer, the revised manuscript now includes additional data showing the proportion of IGRP-tetramer<sup>+</sup> cells (Supplementary Figure 1D), as well as reactivity against another islet autoantigen, insulin-1/insulin-2 (Insulin B15–23) (Supplementary Figure 1E). The description of these results, including the proportions of IGRP-tetramer<sup>+</sup> and Insulin B15–23<sup>+</sup> CD8<sup>+</sup> T cells, has been added to lines 126–129 of the revised manuscript.
(3) The resolution of Figure 2 is suboptimal and at places poorly visible. Figure 2D is stated to show “two significant pathways stand out.” In fact, the data are barely significant, and the authors may want to correct their statement.
The resolution of Figure 2 has been improved. As the Reviewer suggested, the text has been revised to state “two potential pathways stand out” (line 187) instead of “two significant pathways stand out”.
(4) Figure 3C-F and 3H, showing fold change over baseline values would be much easier for the reader to grasp the data.
As the Reviewer suggested, data in Figures 3C-F and 3H are now shown as fold change over baseline values for clarity. Baseline gMFI is the mean of each group (total CD8<sup>+</sup>, CD5<sup>hi</sup>CD8<sup>+</sup> and CD5<sup>lo</sup>CD8<sup>+</sup>) at 0 μg/ml anti-CD3, with fold changes calculated for stimulation conditions (0.625–10 μg/ml anti-CD3). The figure legend has been updated accordingly.
(5) Figure 4A, it would be much more valuable to show the diabetes frequency upon transfer of CD25- CD4 T cells alone and upon transfer of CD5high CD8 T cells alone. The word “spontaneous” in the Figure 4A legend seems inappropriate.
Thanks for the Reviewer’s comment. We apologize for not including the data for the CD25<sup>–</sup>CD4<sup>+</sup> T cell transfer group in the original manuscript. While this group was part of our initial experimental design, we had considered it a control group and unintentionally omitted it from the figure. The revised manuscript now includes this group in Figure 4A. In addition, the term “spontaneous” has been replaced with “diabetes incidence” in the Figure 4A legend and manuscript (line 248). Regarding the suggestion to assess CD5<sup>hi</sup>CD8<sup>+</sup> T cell transfer alone, we appreciate the Reviewer’s point. However, previous studies have shown that CD8<sup>+</sup> T cells alone are not sufficient to induce diabetes in adoptive transfer models, and that effective β-cell destruction typically requires both CD4<sup>+</sup> and CD8<sup>+</sup> T cell subsets. For instance, Christianson et al. (1993) demonstrated that enriched CD8<sup>+</sup> T cells from NOD mice fail to transfer diabetes on their own, while CD4<sup>+</sup> T cells, particularly from diabetic donors, can induce disease only under specific conditions and are significantly potentiated by co-transfer of CD8<sup>+</sup> cells. These findings have contributed to the widely adopted standard of co-transferring both subsets when studying diabetogenic potential in NOD models (Diabetes. 1993;42(1):44–55).
(6) Line 257-258, please remove “indicating superior in vivo proliferation by the CD5hi subset.” Indeed, several other possibilities may explain the phenotype, including survival, migration, etc.
As the Reviewer suggested, the phrase “indicating superior in vivo proliferation by the CD5<sup>hi</sup> subset” has been replaced with “implying increased expansion and activation/effector potential” (line 261).
(7) Figure 5A, it is unclear to this referee what is the significance of CD5 and pCD3zeta expression on DN thymocytes. Do these cells express rearranged alpha/beta TCR? Is it signaling through pre-TCRalpha/TCRbeta pairs?
Thanks a lot for this important question. In the revised manuscript, we have expanded the discussion (line 455–471) to address the developmental significance of CD5 and pCD3ζ expression on DN thymocytes. CD5 expression at this stage reflects pre-TCR signaling strength during early selection, which occurs following successful TCRβ rearrangement. The associated phosphorylation of CD3ζ indicates activation of downstream signaling through the pre-TCRα/TCRβ complex. As discussed in the revised text, these early signals play a critical role in determining lineage progression and self-reactivity tuning. We now acknowledge that signaling at the DN stage occurs through the pre-TCRα/TCRβ heterodimer, not a fully rearranged αβ TCR, and that CD5 expression serves as a marker of the strength of these initial pre-selection signals (Sci Signal. 2022;15(736):eabj9842.). These developmental checkpoints are essential for calibrating TCR sensitivity and ensuring proper thymocyte maturation. This has been clarified in the revised discussion (line 455–471).
(8) Figure 5F, could the DP TCRbeta- CD69- thymocytes from 8.3-TCR NOD mice already express low levels of the self-reactive TCR at this stage to explain their high expression of CD5? Addressing the question experimentally would be useful.
Thanks a lot for this useful comment. According to a review by Huseby et al. (2022), expression of a functional TCRβ chain begins at the DN3 stage, initiating progression through the β-selection checkpoint. This is followed by TRAV locus recombination, resulting in the generation of αβ TCR-expressing double-positive 1 (DP-1) thymocytes. At the DP-1 stage, the quality of TCR signaling driven by self-pMHC interactions governs both positive and negative selection, as well as the development of nonconventional T cell lineages. We hypothesize that in transgenic NOD8.3 mice, which express pre-rearranged Tcra and Tcrb transgenes derived from the islet-reactive CD8<sup>+</sup>T cell clone NY8.3, thymocytes undergo allelic exclusion and lack the clonal diversity seen in non-transgenic mice. As a result, NOD8.3 thymocytes may receive strong TCR signals from early developmental stages (DN3 and DP-1) even without undergoing normal selection checkpoints. While the elevated TCR signal observed in NOD8.3 is indeed artificial, this model provides a unique system to test our hypothesis—namely, whether a strongly self-reactive TCR can generate high basal signaling during thymic development that overrides the negative regulatory effects of phosphatases like Pep. This possibility has been acknowledged in the revised Discussion section, along with a plan to validate the hypothesis experimentally (line 455–471).
(9) Figure 7, single-cell TCR-seq would be much more appropriate to tackle the question of self-reactivity of CD5hi vs. CD5low CD8 T cells.
Thanks a lot for this useful comment. The limitations of bulk TCR-seq are acknowledged, and single-cell TCR-seq is proposed as a future direction (line 455–471).
Note, for Reviewer #2 (Recommendations For The Authors) (7) (8) (9), the discussion paragraphs are included to address the reviewers’ questions (line 455–471).
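The fold-change normalization described in the response to point (4) above can be illustrated with a minimal sketch. It assumes that each group's gMFI values are divided by that group's mean gMFI at 0 μg/ml anti-CD3; the function name, the data layout and the numbers below are hypothetical and are not taken from the manuscript.

```python
from statistics import mean


def fold_change_over_baseline(gmfi_by_dose, baseline_dose=0.0):
    """Divide each gMFI value by the group's mean gMFI at the baseline dose.

    gmfi_by_dose maps an anti-CD3 dose (ug/ml) to a list of replicate gMFI
    values for ONE group (e.g. CD5hi CD8+ cells). Hypothetical layout.
    """
    baseline = mean(gmfi_by_dose[baseline_dose])
    if baseline <= 0:
        raise ValueError("baseline gMFI must be positive")
    # Only stimulated conditions are reported as fold changes.
    return {
        dose: [value / baseline for value in values]
        for dose, values in gmfi_by_dose.items()
        if dose != baseline_dose
    }


# Illustrative numbers only (not data from the study).
example_group = {
    0.0: [100.0, 120.0, 110.0],   # unstimulated replicates, mean = 110
    0.625: [220.0, 330.0],
    10.0: [1100.0],
}
fold_changes = fold_change_over_baseline(example_group)
```

Because the baseline is computed separately for each group, a group that starts with a higher unstimulated gMFI is not credited with that offset in its fold changes.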
Reviewer #3 (Recommendations For The Authors):
(1) Positive controls (activated T cells from PLN or spleen), gating controls (whole naïve T cells), and representative flow-cytometry plots are needed for T-bet, EOMES, GzmB, and cytokine staining in Figure 1.
As the Reviewer suggested, we added representative gating controls for T-bet, EOMES, GzmB and cytokine staining in Supplementary Figure 1 of the revised manuscript.
(2) For Figure 1F, MFI for activation markers for the CD44hiCD62Llo cells should be provided for the comparison of PLN data.
As the Reviewer suggested, MFI data for these markers have been included in Figure 1F of the revised manuscript.
(3) In many places and figure legends, it is not mentioned from which organ cells were collected, i.e., spleen or PLN.
As the Reviewer suggested, the origin of cells for each experiment has been explicitly indicated in the figure legends or figure content to ensure clarity.
(4) In the pancreatic lymph node, autoreactive T cells might be upregulating CD5 because they are encountering antigens. This should be addressed in the discussion.
As the Reviewer suggested, this issue has been included in the discussion of the revised manuscript (lines 440–450).
(5) It is not clear if T cells from the spleen and PLN were stimulated to detect the production of pro-inflammatory cytokines.
Thanks for the critical comment. The stimulation protocol and cytokine staining method have been added to the “Cytokine staining” section of the Supplementary Methods in the revised manuscript.
(6) Figure 4C-D: It is not clear if analysis was done on naïve T cells or if they were stimulated.
Thanks for the comment. The stimulation and cytokine staining methods used in Figure 4C-D have been described in detail in the “Cytokine staining” section of the Supplementary Materials in the revised manuscript.
(7) IGRP gating in Figure 4F should be revisited with negative controls.
Thanks for the critical comment. Negative controls have been added and used to adjust IGRP gating, and this is now mentioned in the figure legend of the revised manuscript.
(8) Interpretation that only CD5hi cells form a central memory T cell population (Figure 4F) could be misleading.
Thanks for this valuable comment. We agree that in conventional CD8<sup>+</sup> T cell immune responses, both CD5<sup>hi</sup> and CD5<sup>lo</sup> subsets have the potential to differentiate into central memory T cells. In our experimental approach, we adoptively transferred sorted CD5<sup>hi</sup>CD8<sup>+</sup> or CD5<sup>lo</sup>CD8<sup>+</sup> cells into Rag1<sup>-/-</sup> recipients and specifically analyzed PLNs four weeks after transfer. Using CD44 and CD62L expression as conventional markers for central memory T cells, we barely observed a CD44<sup>hi</sup>CD62L<sup>hi</sup> population in the CD5<sup>lo</sup>CD8<sup>+</sup>-transferred group. Based on these results, we stated: “This analysis underscores that the central memory T cell population and the frequency of islet autoantigen-specific CD8<sup>+</sup>T cells are higher in the CD5<sup>hi</sup> transferred subset within the PLNs, implying more robust immune responses initiated by the CD5<sup>hi</sup>cells” (line 272–274). Importantly, we did not intend to imply that only CD5<sup>hi</sup> cells can form central memory T cells, but rather that they were more enriched for this phenotype under the specific conditions and time point analyzed.
(9) IL-2 gating representative plot should be provided for Figure 5A.
As the Reviewer suggested, a representative IL-2 gating plot has been included in the revised Supplementary Figure 3B.
-
eLife Assessment
This important study demonstrates that in Drosophila melanogaster, tachykinin (Tk) expression is regulated by the microbiota. The authors present convincing evidence that axenic flies raised with no microbiota are longer-lived than conventionally reared animals, and that Tk expression and Tk receptors in the nervous system are required for this effect. They further test individual bacterial strains for their role in these effects and connect the effect to loss of lipid stores and suggest that FOXO may be involved in the phenotype, results that are of interest to the fields of environmental perception, host microbiome interactions, and geroscience.
[Editors' note: this paper was reviewed by Review Commons.]
-
Reviewer #1 (Public review):
Summary:
In this study the authors use a Drosophila model to demonstrate that Tachykinin (Tk) expression is regulated by the microbiota. In Drosophila, conventionally reared (CR) flies are typically shorter lived than those raised without a microbiota (axenic). Here, knockdown of Tk expression is found to prevent lifespan shortening by the microbiota and the reduction of lipid stores typically seen in CR flies when compared to axenic counterparts. It does so without reducing food intake or fecundity, which are often seen as necessary trade-offs for lifespan extension. Further, the strength of the interaction between Tk and the microbiota is found to be bacteria specific and is stronger in Acetobacter pomorum (Ap) mono-associated flies compared to Levilactobacillus brevis (Lb) mono-association. The impact on lipid storage was also only apparent in Ap-flies.
Building on these findings the authors show that gut specific knockdown is largely sufficient to explain these phenotypes. Knockdown of the Tk receptor, TkR99D, in neurons recapitulates the lifespan phenotype of intestinal Tk knockdown supporting a model whereby Tk from the gut signals to TkR99D expressing neurons to regulate lifespan. In addition, the authors show that FOXO may have a role in lifespan regulation by the Tk-microbiota interaction. However, they rule out a role for insulin producing cells or Akh-producing cells suggesting the microbiota-Tk interaction regulates lifespan through other, yet unidentified, mechanisms.
Major comments:
Overall, I find the key conclusions of the paper convincing. The authors present an extensive amount of experimental work, and their conclusions are well founded in the data. In particular, the impact of TkRNAi on lifespan and lipid levels, the central finding in this study, has been demonstrated multiple times in different experiments and using different genetic tools. As a result, I don't feel that additional experimental work is necessary to support the current conclusions.
However, I find it hard to assess the robustness of the lifespan data from the other manipulations used (TkR99DRNAi, TkRNAi in dFoxo mutants etc.) because information on the population size and whether these experiments have been replicated is lacking. Can the authors state in the figure legends the numbers of flies used for each lifespan and whether replicates have been done? For all other data it is clear how many replicates have been done, and the methods give enough detail for all experiments to be reproduced.
Significance:
Overall, I find the key conclusions of the paper convincing. The authors present an extensive amount of experimental work, and their conclusions are well founded in the data. We have known that the microbiota influence lifespan for some time but the mechanisms by which they do so have remained elusive. This study identifies one such mechanism and as a result opens several avenues for further research. The Tk-microbiota interaction is shown to be important for both lifespan and lipid homeostasis, although it's clear these are independent phenotypes. The fact that the outcome of the Tk-microbiota interaction depends on the bacterial species is of particular interest because it supports the idea that manipulation of the microbiota, or specific aspects of the host-microbiota interaction, may have therapeutic potential.
These findings will be of interest to a broad readership spanning host-microbiota interactions and their influence on host health. They move forward the study of microbial regulation of host longevity and have relevance to our understanding of microbial regulation of host lipid homeostasis. They will also be of significant interest to those studying the mechanisms of action and physiological roles of Tachykinins.
Field of expertise: Drosophila, gut, ageing, microbiota, innate immunity
-
Reviewer #2 (Public review):
Summary:
The main finding of this work is that microbiota impacts lifespan through regulating the expression of a gut hormone (Tk) which in turn acts on its receptor expressed on neurons. This conclusion is robust and based on a number of experimental observations, carefully using techniques in fly genetics and physiology: 1) microbiota regulates Tk expression, 2) lifespan reduction by microbiota is absent when Tk is knocked down in gut (specifically in the EEs), 3) Tk knockdown extends lifespan and this is recapitulated by knockdown of a Tk receptor in neurons. These key conclusions are very convincing. Additional data are presented detailing the relationship between Tk and insulin/IGF signalling and Akh in this context. These are two other important endocrine signalling pathways in flies. The presentation and analysis of the data are excellent.
There are only a few experiments or edits that I would suggest as important to confirm or refine the conclusions of this manuscript. These are:
(1) When comparing the effects of microbiota (or single bacterial species) in different genetic backgrounds or experimental conditions, I think it would be good to show that the bacterial levels are not impacted by the other intervention(s). For example, the lifespan results observed in Figure 2A are consistent with Tk acting downstream of the microbes but also with Tk RNAi having an impact on the microbiota itself. I think this simple, additional control could be done for a few key experiments. Similarly, the authors could compare the two bacterial species to see if the differences in their effects come from different ability to colonise the flies.
(2) The effect of Tk RNAi on TAG is opposite in CR and Ax or CR and Ap flies, and the knockdown shows an effect in either case (Figure 2E, Figure 3D). Why is this? Better clarification is required.
(3) With respect to insulin signalling, all the experiments bar one indicate that insulin is mediating the effects of Tk. The one experiment that does not is using dilpGS to knock down TkR99D. Is it possible that this driver is simply not resulting in an efficient KD of the receptor? I would be inclined to check this, but as a minimum I would be a bit more cautious with the interpretation of these data.
(4) Is it possible to perform at least one lifespan repeat with the other Tk RNAi line mentioned? This would further clarify that there are no off-target effects that can account for the phenotypes.
There are a few other experiments that I could suggest as I think they could enrich the current manuscript, but I do not believe they are essential for publication:
(5) The manuscript could be extended with a little more biochemical/cell biology analysis. For example, is it possible to look at Tk protein levels, Tk levels in circulation, or even TkR receptor activation or activation of its downstream signalling pathways? Comparing Ax and CR or Ap and CR one would expect to find differences consistent with the model proposed. This would add depth to the genetic analysis already conducted. Similarly, for insulin signalling - would it be possible to use some readout of the pathway activity and compare between Ax and CR or Ap and CR?
(6) The authors use a pan-acetyl-K antibody but are specifically interested in acetylated histones. Would it be possible to use antibodies for acetylated histones? This would have the added benefit that one can confirm the changes are not in the levels of histones themselves.
(7) I think the presentation of the results could be tightened a bit, with fewer sections and one figure per section.
Significance:
The main contribution of this manuscript is the identification of a mechanism that links the microbiota to lifespan. This is very exciting and topical for several reasons:
(1) The microbiota is very important for overall health but it is still unclear how. Studying the interaction between microbiota and health is an emerging, growing field, and one that has attracted a lot of interest, but one that is often lacking in mechanistic insight. Identifying mechanisms provides opportunities for therapies. The main impact of this study comes from using the fruit fly to identify a mechanism.
(2) It is very interesting that the authors focus on an endocrine mechanism, especially with the clear clinical relevance of gut hormones to human health recently demonstrated with new, effective therapies (e.g. Wegovy).
(3) Tk is emerging as an important fly hormone and this study adds a new and interesting dimension by placing Tk between microbiota and lifespan.
I think the manuscript will be of great interest to researchers in ageing, human and animal physiology and in gut endocrinology and gut function.
-
Reviewer #3 (Public review):
Summary:
Marcu et al. demonstrate a gut-neuron axis that is required for the lifespan-shortening effects mediated by gut bacteria. They show that the presence of commensal bacteria, particularly Acetobacter pomorum, promotes Tk expression in the gut, which then binds to neuronal tachykinin receptors to shorten lifespan. Tk has also recently been reported to extend lifespan via adipokinetic hormone (Akh) signaling (Ahrentløv et al., Nat Metab 7, 2025), but the mechanism here appears distinct. The lifespan shortening by Ap via Tk seems to be partially dependent on foxo and independent of both insulin signaling and Akh-mediated lipid mobilization.
Although the detailed mechanistic link to lifespan is not fully resolved, the experiment and its results clearly show the involvement of the molecules tested. This work adds a valuable dimension to our growing understanding of how gut bacteria influence host longevity. However, there are some points that should be addressed.
(1) Tk+ EEC activity should be assessed directly, rather than relying solely on transcript levels. Approaches such as CaLexA or GCaMP could be used.
(2) In Line 243, the manuscript states that the reporter activity was not increased in the posterior midgut. However, based on the results presented in Fig4E, there is seemingly no apparent regional specificity. A more detailed explanation is necessary.
(3) If feasible, assessing foxo activation would add mechanistic depth. This could be done by monitoring foxo nuclear localization or measuring the expression levels of downstream target genes.
(4) Fig1C uses Adh for normalization. Given the high variability of the result, the authors should (1) check whether Adh expression levels changed via bacterial association and/or (2) compare the results using different genes as internal standard.
(5) While the difficulty of maintaining lifelong axenic conditions is understandable, it may still be feasible to assess the induction of Tk (i.e., Tk transcription or EE activity upregulation) by the microbiome in males.
(6) We also had some concerns regarding the wording of the title.
Fig6B and C suggest that TkR86C, in addition to TkR99D, may be involved in the A. pomorum-lifespan interaction. Consider revising the title to refer more generally to the "tachykinin receptor" rather than only TkR99D.
The difference between "aging" and "lifespan" should also be addressed. While the study shows a role for Tk in lifespan, assessment of aging phenotypes (e.g. Climbing assay, ISC proliferation) beyond the smurf assay is required to make conclusions about aging.
(7) The statement in Line 82 that EEs express 14 peptide hormones should be supported with an appropriate reference, if available.
Significance:
General assessment: The main strength of this study is the careful and extensive lifespan analyses, which convincingly demonstrate the role of gut microbiota in regulating longevity. The authors clarify an important aspect of how microbial factors contribute to lifespan control. The main limitation is that the study primarily confirms the involvement of previously reported signaling pathways, without identifying novel molecular players or previously unrecognized mechanisms of lifespan regulation.
Advance: The lifespan-shortening effect of Acetobacter pomorum (Ap) has been reported previously, as has the lifespan-shortening effect of Tachykinin (Tk). However, this study is the first to link these two factors mechanistically, which represents a significant and original contribution to the field. The advance is primarily mechanistic, providing new insight into how microbial cues converge on host signaling pathways to influence ageing.
Audience: This work will be of particular interest to a specialized audience of basic researchers in ageing biology. It will also attract interest from microbiome researchers who are investigating host-microbe interactions and their physiological consequences. The findings will be useful as a conceptual framework for future mechanistic studies in this area.
Field of expertise: Drosophila ageing, lifespan, microbiome, metabolism
-
Author response:
(1) General Statements
The goal of our study was to mechanistically connect microbiota to host longevity. We have done so using a combination of genetic and physiological experiments, which outline a role for a neuroendocrine relay mediated by the intestinal neuropeptide Tachykinin, and its receptor TkR99D in neurons. We also show a requirement for these genes in metabolic and healthspan effects of microbiota.
The referees' comments suggest they find the data novel and technically sound. We have added data in response to numerous points, which we feel enhance the manuscript further, and we have clarified text as requested. Reviewer #3 identified an error in Figure 4, which we have rectified. We felt that some specific experiments suggested in review would not add significant further depth, as we articulate below.
Altogether our reviewers appear to agree that our manuscript makes a significant contribution to both the microbiome and ageing fields, using a large number of experiments to mechanistically outline the role(s) of various pathways and tissues. We thank the reviewers for their positive contributions to the publication process.
(2) Description of the planned revisions
Reviewer #2:
Not…essential for publication…is it possible to look at Tk protein levels?
We have acquired a small amount of anti-TK antibody and we will attempt to immunostain guts associated with A. pomorum and L. brevis. We are also attempting the equivalent experiment in mouse colon reared with/without a defined microbiota. These experiments are ongoing, but we note that the referee feels that the manuscript is a publishable unit whether these stainings succeed or not.
(3) Description of the revisions that have already been incorporated in the transferred manuscript
Reviewer #1:
Can the authors state in the figure legends the numbers of flies used for each lifespan and whether replicates have been done?
We have incorporated the requested information into legends for lifespan experiments.
Do the interventions shorten lifespan relative to the axenic cohort? Or do they prevent lifespan extension by axenic conditions? Both statements are valid, and the authors need to be consistent in which one they use to avoid confusing the reader.
We read these statements differently. The only experiment in which a genetic intervention prevented lifespan extension by axenic conditions is neuronal TkR86C knockdown (Figure 6B-C). Otherwise, microbiota shortened lifespan relative to axenic conditions, and genetic knockdowns blocked this effect (e.g. see lines 131-133). We have ensured that the framing is consistent throughout, with text edited at lines 198-199, 298-299, 311-312, 345-347, 407-408, 424-425, 450, 497-503.
TkRNAi consistently reduces lipid levels in axenic flies (Figs 2E, 3D), essentially phenocopying the loss of lipid stores seen in control conventionally reared (CR) flies relative to control axenic. This suggests that the previously reported role of Tk in lipid storage - demonstrated through increased lipid levels in TkRNAi flies (Song et al (2014) Cell Rep 9(1): 40) - is dependent on the microbiota. In the absence of the microbiota TkRNAi reduces lipid levels. The lack of acknowledgement of this in the text is confusing
We have added text at lines 219-222 to address this point. We agree that this effect is hard to interpret biologically, since expressing RNAi in axenics has no additional effect on Tk expression (Figure S7). Consequently we can only interpret this unexpected effect as a possible off-target effect of RU feeding on TAG, specific to axenic flies. However, this possibility does not void our conclusion, because an off-target diminution of TAG cannot explain why CR flies accumulate TAG following TkRNAi induction. We hope that our added text clarifies this point.
I have struggled to follow the authors' logic in ablating the IPCs and feel a clear statement on what they expected the outcome to be would help the reader.
We have added the requested statement at lines 423-424, explaining that we expected the IPC ablation to render flies constitutively long-lived and non-responsive to A. pomorum.
Can the authors clarify their logic in concluding a role for insulin signalling, and qualify this conclusion with appropriate consideration of alternative hypotheses?
We have added our logic at lines 449-454. In brief, we conclude involvement for insulin signalling because FoxO mutant lifespan does not respond to TkRNAi, and the FoxO mutation diminishes the lifespan-shortening effect of A. pomorum. However, we cannot state that the effects are direct because we do not have data that mechanistically connect Tk/TkR99D signalling directly in insulin-producing cells. The current evidence is most consistent with insulin signalling priming responses to microbiota/Tk/TkR99D, as per the newly-added text.
Typographical errors
We have remedied the highlighted errors, at lines 128-140.
Reviewer #2:
it would be good to show that the bacterial levels are not impacted [by TkRNAi]
We have quantified CFUs in CR flies upon ubiquitous TkRNAi (Figure S5), finding that the RNAi does not affect bacterial load. New text at lines 138-139 articulates this point.
The effect of Tk RNAi on TAG is opposite in CR and Ax or CR and Ap flies, and the knockdown shows an effect in either case (Figure 2E, Figure 3D). Why is this?
As per response to Reviewer #1, we have added text at lines 219-222 to address this point.
Is it possible to perform at least one lifespan repeat with the other Tk RNAi line mentioned?
We have added another experiment showing longevity upon knockdown in conventional flies, using an independent TkRNAi line (Figure S3).
Reviewer #3:
In Line243, the manuscript states that the reporter activity was not increased in the posterior midgut. However, based on the presented results in Fig4E, there is seemingly not apparent regional specificity. A more detailed explanation is necessary.
We thank the reviewer sincerely for their keen eye, which has highlighted an error in the previous version of the figure. In revisiting this figure we have noticed, to our dismay, that the figures for GFP quantification were actually re-plots of the figures for (ac)K quantification. This error led to the discrepancy between statistics and graphics, which thankfully the reviewer noticed. We have revised the figure to remedy our error, and the statistics now match the boxplots and results text.
Fig1C uses Adh for normalization. Given the high variability of the result, the authors should (1) check whether Adh expression levels changed via bacterial association
We selected Adh on the basis of our RNAseq analysis, which showed it was not different between AX and CV guts, whereas many commonly-used “housekeeping” genes were. We have now added a plot to demonstrate (Figure S2).
The statement in Line 82 that EEs express 14 peptide hormones should be supported with an appropriate reference
We have added the requested reference (Hung et al, 2020) at line 86.
(4) Description of analyses that authors prefer not to carry out
Reviewer #1:
I'd encourage the authors to provide lifespan plots that enable comparison between all conditions
We have avoided this approach because the number of survival curves that would need to be presented on the same axis (e.g. 16 for Figure 5) would not be legible. However, we have ensured that axes on faceted plots are equivalent and include grid lines for comparison. Moreover, our approach using statistical coefficients (EMMs) enables direct quantitative comparison of the differences among conditions.
Reviewer #2:
Is it possible that this driver is simply not resulting in an efficient KD of the receptor? I would be inclined to check this
This comment relates to Figure 7G. We do see an effect of the knockdown in this experiment, so we believe that the knockdown is effective. However the direction of response is not consistent with our hypothesis so the experiment is not informative about the role of these cells. We therefore feel there is little to be gained by testing efficacy of knockdown, which would also be technically challenging because the cells are a small population in a larger tissue which expresses the same transcripts elsewhere (i.e. necessitating FISH).
Would it be possible to use antibodies for acetylated histones?
The comment relates to Figure 4C-E. The proposed studies would be a significant amount of work because, to our knowledge, the specific histone marks which drive activation in TK+ cells remain unknown. On the other hand, we do not see how this information would enrich the present story, rather such experiments would appear to be the beginning of something new. We therefore agree with Reviewer #1 (in cross-commenting) that this additional work is not justified.
Reviewer #3:
Tk+ EEC activity should be assessed directly, rather than relying solely on transcript levels. Approaches such as CaLexA or GCaMP could be used.
We agree with reviewers 1-2 (in cross-commenting) that this proposal is non-trivial and not justified by the additional insight that would be gained. As described above, we are attempting to immunostain Tk, which if successful will provide a third line of evidence for regulation of Tk+ cells. However we note that we already have the strongest possible evidence for a role of these cells via genetic analysis (Figure 5).
While the difficulty of maintaining lifelong axenic conditions is understandable, it may still be feasible to assess the induction of Tk (ie. Tk transcription or EE activity upregulation) by the microbiome on males.
As the reviewer recognises, maintaining axenic experiments for months on end is not trivial. Given the tendency for males either to simply mirror female responses to lifespan-extending interventions, or to not respond at all, we made the decision in our work to only study females. We have instead emphasised in the manuscript that results are from female flies.
TkR86C, in addition to TkR99D, may be involved in the A. pomorum-lifespan interaction. Consider revising the title to refer more generally to the "tachykinin receptor" rather than only TkR99D.
We disagree with this interpretation: the results do not show that TkR86C-RNAi recapitulates the effect of enteric Tk-RNAi. A potentially interesting interaction is apparent, but the data do not support a causal role for TkR86C. A causal role is supported only for TkR99D, knockdown of which recapitulates the longevity of axenic flies and TkRNAi flies. We therefore feel that our current title is justified by the data, and that a more generic version would misrepresent our findings.
The difference between "aging" and "lifespan" should also be addressed.
The smurf phenotype is a well-established metric of healthspan. Moreover, lifespan is the leading aggregate measure of ageing. We therefore feel that the use of “ageing” in the title is appropriate.
If feasible, assessing foxo activation would add mechanistic depth. This could be done by monitoring foxo nuclear localization or measuring the expression levels of downstream target genes.
Foxo nuclear localisation has already been shown in axenic flies (Shin et al, 2011). We have added text and citation at lines 401-402.
-
eLife Assessment
In this important manuscript, the authors establish a vertebrate model for studying the development of circuits that control heart rate. This contribution uses a combination of experimental techniques to provide compelling information for scientists looking to understand how heart rate regulation emerges during development.
-
Reviewer #1 (Public review):
Summary:
The manuscript by Hernandez-Nunez et al. provides a comprehensive characterization of how heart-brain circuits develop in a vertebrate brain, namely the zebrafish. The characterization is performed using a combination of modern and sophisticated imaging and neural manipulation techniques and achieves unprecedented clarity and detail in how the heart-brain communication develops early in life. The paper describes a three-stage program, where first an efferent-circuit from the motor vagus to the heart develops, followed by sympathetic innervation, and lastly sensory neurons innervate the heart.
Strengths:
The paper is very clearly and nicely written. The findings are novel and of high quality and relevance. The presentations are very clear and nicely interpreted. The analyses are well presented and applied.
Weaknesses:
From the heart rate traces, heart rate variability seems to be prominent and changes across days post-fertilization (dpf). That would be a useful dependent variable, considering that the variation captured by the models does not fully explain heart rate, both for sympathetic and parasympathetic efferents. Given the strong autorhythmicity of nodal tissue in neurogenic hearts, modulatory inputs could potentially predict heart rate variability with higher precision.
-
Reviewer #2 (Public review):
Hernandez-Nunez et al. investigate the development and function of neural circuits involved in the regulation of heart rate in larval zebrafish. Using conserved genetic markers, they identify neural pathways involved in the bidirectional control of heart rate and in providing sensory feedback, potentially enabling more precise tuning. The main observation is that the different elements of this circuit are laid down in a developmentally staggered manner.
At 4 days old, the heart rate is invariant to a range of sensory stimuli, and the vagal motor or sympathetic pathways could not be seen to innervate the heart. Progressively through development, the heart is first innervated by the vagal motor pathway, whose axons are cholinergic, before the formation of phox2bb+ intracardiac neurons (ICNs). At this stage, before the first ICNs are observed, activation of the vagal motor pathway by optogenetic activation of a localized population of cholinergic hindbrain neurons leads to bradycardia. After the vagal motor innervation begins, the sympathetic pathway innervates the heart, which could be visualized in the form of TH+ fibers from the anterior paravertebral ganglia (APG). The activity of the TH+ APG neurons was diverse and showed proportional, integral, and derivative-like relationships to the heart rate, suggesting a role in more precise tuning of the rate than what could be achieved through the vagal pathway alone. The sensory vagus innervation of the heart was identified to be the last stage to develop; however, neurons in the nodose ganglion exhibited diverse responses tuned to the heart rate well before the innervation reached the heart. The authors attribute this to the fact that other indirect sensory cues from the gills or vasculature could be used to sense heart rate prior to innervation.
This study identifies key components of the control loop required for the regulation of heart rate in zebrafish. The control mechanism appears to be independent of the cues that trigger heart rate changes, indicating that the circuit is indeed part of an interoceptive pathway for heart rate control. Evidence for the staggered development of the vagal-motor, sympathetic, and sensory pathways is conclusive, and as the authors discuss, this phenomenon progressively allows for finer-grained control of the heart rate. This could be achieved through proportional-integral-derivative-like control properties emerging in a diverse set of neurons in the APG and sensory feedback of the state of the heart. In line with these findings, the baseline variability of heart rate prior to innervation at 4 days old appears to be comparatively lower than the later stages (Figure 1C, D, Supplementary Figure 1C-F) and increases over development.
Based on this observation and the time courses of the kernels identified by the GLMs, I would expect heart rate fluctuations of a finer time scale, ultimately limited by the time course of GCaMP6s, to be captured by the models in Figures 3, 5, and 7, in addition to the stimulus-locked changes that are highlighted. While the models yield valuable insight in the form of the activation kernels and their potential roles, in one instance, this captures the potential contribution of either the motor vagus or the APG to the change in heart rate. This makes it challenging to identify where it falls short and the potential functions of pathways that are yet to be discovered.
Lastly, the proposed anatomical connectivity of the heart-brain circuit is based on tracts observed in this study as well as those inferred from function and from previous studies.
(1) It is not clear from the images presented here whether the VSNs send feedback projections to the brainstem VPN.
(2) Do the brainstem neurons identified by their functional roles send efferent projections via the motor vagus nerve? This is unclear from the results presented and needs to be clarified in the text.
(3) Add appropriate clarifying annotations to Figure 9 and a section of text discussing the potential unknowns in the proposed circuit diagram.
-
Author response:
We thank the reviewers for their thoughtful, constructive, and generous evaluations of our manuscript. We are encouraged by their overall assessment of the clarity, novelty, and significance of the work, and we appreciate the opportunity to further strengthen the manuscript.
Both reviewers highlight the central contribution of this study: a developmental, circuit-level dissection of how heart–brain signaling emerges in a vertebrate. We are pleased that the evidence supporting the staggered assembly of vagal motor, sympathetic, and sensory pathways was found to be compelling, and that the computational and experimental framework was viewed as appropriate and informative.
Below, we briefly outline how we plan to address the main points raised in the reviews.
Heart rate variability and temporal structure
Both reviewers note that heart rate variability (HRV) changes across development and suggest that HRV may provide additional insight into the function of autonomic circuits. We agree that HRV is an important physiological readout and that its developmental changes are consistent with the progressive emergence of autonomic control.
In the revised manuscript, we plan to (i) discuss heart rate variability more explicitly in the context of circuit maturation and (ii) clarify the temporal scales captured by our experiments and modeling framework. In particular, we will emphasize that our analyses focus on relationships between neural activity and heart-rate trajectories at timescales accessible given imaging rate and indicator kinetics, rather than beat-to-beat variability. We will also consider adding a supplementary analysis of the variability that can be reliably measured within these constraints, and, where appropriate, how neural activity predicts that measurable variation.
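As a rough illustration of the kind of variability measure under discussion, the sketch below computes two widely used summary statistics, SDNN and RMSSD, from a series of inter-beat intervals. The function name and the example interval values are hypothetical and are not taken from the manuscript; whether individual beats can be resolved at all depends on the imaging rate, as noted above.

```python
import math

def hrv_summary(ibi_ms):
    """Summarise a list of inter-beat intervals (ms).

    Returns (heart rate implied by the mean interval in bpm,
             SDNN in ms, RMSSD in ms).
    """
    n = len(ibi_ms)
    if n < 2:
        raise ValueError("need at least two inter-beat intervals")
    mean_ibi = sum(ibi_ms) / n
    # SDNN: sample standard deviation of the intervals
    sdnn = math.sqrt(sum((x - mean_ibi) ** 2 for x in ibi_ms) / (n - 1))
    # RMSSD: root mean square of successive interval differences
    diffs = [b - a for a, b in zip(ibi_ms, ibi_ms[1:])]
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return 60000.0 / mean_ibi, sdnn, rmssd

# Hypothetical example: intervals alternating between 400 and 420 ms
rate_bpm, sdnn, rmssd = hrv_summary([400, 420, 400, 420])
```

SDNN reflects overall spread, whereas RMSSD emphasises fast, beat-to-beat changes, so the two can diverge when variability is dominated by slow drifts.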
Scope and interpretation of the computational models
Reviewer #2 raises thoughtful points regarding what the generalized linear models can and cannot disambiguate, particularly when multiple efferent pathways may contribute to heart-rate dynamics. We will revise the text to more clearly distinguish between functional encoding relationships inferred from the models and anatomical connectivity that is directly demonstrated.
Our intent is to frame the kernels identified in the motor and sympathetic pathways as computational motifs that capture distinct dynamical contributions, rather than as exclusive or complete explanations of heart-rate control. We will clarify these limitations explicitly in the Results and Discussion.
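To make the notion of a kernel concrete, the following minimal sketch estimates a linear temporal filter relating one activity trace to a heart-rate trace by ordinary least squares on a lagged design matrix. It is a generic illustration and not the GLM used in the manuscript: the function names, the number of lags, and the synthetic data are all assumptions.

```python
import numpy as np

def lagged_design(x, n_lags):
    """Matrix whose column k holds x delayed by k samples (zero-padded)."""
    T = len(x)
    X = np.zeros((T, n_lags))
    for k in range(n_lags):
        X[k:, k] = x[:T - k]
    return X

def fit_kernel(activity, heart_rate, n_lags):
    """Least-squares fit of
    heart_rate[t] ~ offset + sum_k kernel[k] * activity[t - k]."""
    X = np.column_stack([np.ones(len(activity)),
                         lagged_design(activity, n_lags)])
    coef, *_ = np.linalg.lstsq(X, heart_rate, rcond=None)
    return coef[0], coef[1:]

# Synthetic, noiseless check with a hypothetical kernel
rng = np.random.default_rng(0)
activity = rng.normal(size=2000)
true_kernel = np.array([0.0, -0.5, -0.3, -0.1])
heart_rate = 2.5 + lagged_design(activity, 4) @ true_kernel
offset, kernel = fit_kernel(activity, heart_rate, n_lags=4)
```

In a fit of this kind, regressors from two pathways that are strongly correlated in time cannot be cleanly attributed to one pathway or the other, which is the general limitation the text above refers to.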
Circuit diagram and anatomical interpretation
We appreciate the reviewer’s careful reading of the proposed circuit schematic. In the revised manuscript, we will revise the figure and accompanying text to clearly annotate which connections are directly observed, which are functionally inferred, and which remain hypothetical. We will also expand the Discussion to explicitly address open questions, including unresolved feedback pathways and the potential for additional nodes in the circuit.
We believe these revisions will improve clarity without altering the core conclusions of the study. We thank the reviewers again for their insightful feedback and look forward to submitting a revised version of the manuscript that addresses these points in detail.
-
eLife Assessment
This paper presents an important advance in genetically encoded voltage imaging of the developing zebrafish spinal cord in vivo, capturing voltage dynamics in neuronal populations, single cells, and subcellular compartments inaccessible to patch clamp, and diverse spike waveforms and subthreshold voltage dynamics inaccessible to calcium imaging. The work identifies a developmental progression from irregular voltage fluctuations to coordinated contralateral and ipsilateral activity, providing insight into how electrical dynamics and cellular morphology evolve during circuit formation. The strength of evidence is solid, with imaging data supporting the main conclusions, although the manuscript would be strengthened by more complete methodological documentation and clearer context relative to earlier calcium imaging studies. Overall, this study provides a resource that is of importance for researchers investigating neural development and circuit assembly, illustrating the value of voltage imaging as a general tool for probing bioelectric mechanisms in morphogenesis and circuit development.
-
Reviewer #1 (Public review):
Summary:
This paper demonstrates the first application of voltage imaging using a genetically encoded voltage indicator, ArcLight, for recording the spontaneous activity of the developing spinal cord in zebrafish. This technology enabled better temporal resolution compared to what has been demonstrated with calcium imaging in past studies (Muto et al., 2011; Warp et al., 2012; Wan et al., 2019), which led to the discovery of the maturation process of "firing" shapes in spinal neurons. This maturation process occurs simultaneously with axonal elongation and network integration. Thus, voltage imaging revealed new biological details of the development of the spinal circuits.
Strengths:
The use of voltage imaging instead of calcium imaging revealed biological details of the functional maturation of spinal cord neurons in developing zebrafish.
Weaknesses:
This manuscript lacks many basic components and explanations necessary for understanding the methodologies used in this study.
-
Reviewer #2 (Public review):
The authors present highly impressive in vivo voltage‐imaging data, demonstrating neuronal activity at subcellular, cellular, and population levels in a developing organism. The approach provides excellent spatial and temporal resolution, with sufficient signal-to-noise to detect hyperpolarizations and subthreshold events. The visualization of contralateral synchrony and its developmental loss over time is particularly compelling. The observation that ipsilateral synchrony persists despite contralateral desynchronization is a striking demonstration of the power of GEVIs in vivo. While I outline several points that should be addressed, I consider this among the strongest demonstrations of in vivo GEVI imaging to date.
Major points:
(1) Clarification of GEVI performance characteristics
There is a widespread misconception in the GEVI field that response speed is the dominant or primary determinant of sensor performance. Although fast kinetics are certainly desirable, they are not the only (or even necessarily the limiting) factor for effective imaging. Kinetic speed specifies the time to reach ~63% of the maximal ΔF/F for a given voltage step (typically 100 mV, approximating the amplitude of a neuronal action potential), but in practical imaging, a slower sensor with a large ΔF/F can outperform a faster sensor with a small ΔF/F. In this context, the authors' use of ArcLight is actually instructive. ArcLight is one of the slower GEVIs in common use, yet Figures S1a-b clearly show that it still reports voltage transients in vivo very well. I therefore strongly recommend moving these panels into the main text to emphasize that robust in vivo imaging can be achieved even with a relatively slow GEVI, provided the signal amplitude and SNR are adequate. This will help counteract the common misunderstanding in the field.
(2) ArcLight's voltage-response range
ArcLight is shifted toward more negative potentials (V₁/₂ ≈ −30 mV). This improves subthreshold detection but makes distinguishing action potentials from subthreshold transients more challenging. The comparison with GCaMP is helpful because the Ca²⁺ signal largely reflects action potentials. Panels S1c-f show similar onset kinetics but a longer decay for GCaMP. Surprisingly, the ΔF/F amplitudes are comparable; typically, GCaMP changes are larger. To support lines 193-194, the authors should include a table summarizing the onset/offset kinetics and ΔF/F ranges for neurons expressing ArcLight versus GCaMP.
Additionally, the expected action-potential amplitude in zebrafish neurons should be stated. In Figure S1b, a 40 mV change appears to produce ~0.5% ΔF/F, but this should be quantified and noted. Could this comparison to GCaMP help resolve action potentials from subthreshold bursts?
(3) Axonal versus somatic amplitudes (Line 203)
The manuscript states that voltage amplitudes are "slightly smaller" in axons than in somata; this requires quantitative values and statistical testing. More importantly, differences in optical amplitude reflect factors such as expression levels, background fluorescence, and optical geometry, not necessarily true differences in voltage amplitude. The axonal signals are clearly present, but their relative magnitude should not be interpreted without correction.
(4) Figure 4C: need for an off-ROI control
Figure 4C should include a control ROI located away from ROI3 to demonstrate that the axonal signal is not due to background fluctuations, similar to the control shown in Figure S3. Although the ΔF image suggests localization, showing the trace explicitly would strengthen the point. The fluorescence-change image in Figure 4c should also be fully explained in the legend.
(5) Figure 5: hyperpolarization signals
Figure 5 is particularly impressive. It appears that Cell 2 at 18.5 hpf and Cell 1 at 18 hpf exhibit hyperpolarizing events. The authors should confirm that these are true hyperpolarizations by giving some indication of how often they were observed.
(6) SNR comparison (Lines 300-302)
The claim that ArcLight and GCaMP exhibit comparable SNR requires statistical support across multiple cells.
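The argument in point (1) above, that a slow sensor with a large ΔF/F can outperform a fast sensor with a small ΔF/F, can be sketched numerically. The example assumes a single-exponential step response, ΔF/F(t) = A(1 − exp(−t/τ)), which is a simplification; the amplitudes, time constants, and event duration are hypothetical and are not measured values for ArcLight or any other indicator.

```python
import math

def peak_response(max_dff, tau_ms, event_ms):
    """Peak dF/F (%) reached during a voltage step lasting event_ms,
    for a sensor with steady-state response max_dff (%) and a
    single-exponential time constant tau_ms (an assumed model)."""
    return max_dff * (1.0 - math.exp(-event_ms / tau_ms))

# After one time constant the response is ~63% of its maximum
fraction_at_tau = peak_response(1.0, 10.0, 10.0)

# Hypothetical sensors responding to the same 5 ms event
slow_large = peak_response(max_dff=30.0, tau_ms=10.0, event_ms=5.0)
fast_small = peak_response(max_dff=3.0, tau_ms=1.0, event_ms=5.0)
```

Under these assumed numbers the slow sensor reaches roughly 12% while the fast sensor saturates near 3%. Actual detectability also depends on brightness and noise, which this sketch ignores.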
-
Reviewer #3 (Public review):
Summary:
The authors aimed to establish a long-term voltage imaging platform to investigate how coordinated neuronal activity emerges during spinal cord development in zebrafish embryos. Using the genetically encoded voltage indicator ArcLight, they tracked membrane potential dynamics in motor neurons at population, single-cell, and subcellular levels from 18 to 23 hours post-fertilization (hpf), revealing relationships between firing maturation, waveform characteristics, and axonal outgrowth.
Strengths:
(1) Technical advancement in developmental voltage imaging:
This study demonstrates voltage imaging of motor neurons in the developing vertebrate spinal cord. The approach successfully captures voltage dynamics at multiple spatial scales: neuronal population, single-cell, and subcellular compartments.
(2) Insights into the relationship between morphological and functional maturation:
The work reveals important relationships between voltage dynamics maturation and morphological changes.
(3) Kinetics analysis of membrane potential waveform enabled by voltage imaging:
The characterization of "immature" versus "mature" firing based on quantitative waveform parameters provides insights into functional maturation that are inaccessible by calcium imaging. This analysis reveals a maturation process in the biophysical properties of developing neurons.
(4) Matching of voltage indicator kinetics to biological signal:
The authors' choice of ArcLight, despite its slow kinetics compared to newer GEVIs, proved well-suited to the low-frequency activity patterns in developing spinal neurons (frequency ~0.3 Hz).
Weaknesses:
(1) Insufficient comparison with prior calcium imaging studies:
While the authors state that voltage imaging provides superior temporal resolution compared to calcium imaging (lines 192-196, 301), and this is generally true, the current manuscript does not adequately cite or discuss previous calcium imaging studies. Since neural activity occurs at low frequency in the developing spinal cord, calcium imaging is adequate for characterizing the emergence of coordinated activity patterns in the developing zebrafish spinal cord. Notably, Wan et al. (2019, Cell) performed a comprehensive single-cell reconstruction of emerging population activity in the entire developing zebrafish spinal cord using calcium imaging. This work should be properly acknowledged and compared. The specific advantages of voltage imaging over these prior studies need to be more clearly articulated, e.g. detection of subthreshold events and membrane potential waveform kinetics.
(2) Considerations for generalizability of the ArcLight-based voltage imaging approach:
While this study successfully demonstrates voltage imaging using ArcLight in the developing spinal cord, the generalizability of this approach to later developmental stages and other neural systems warrants discussion. ArcLight exhibits relatively slow kinetics (rise time ~100-200 ms, decay τ ~200-300 ms). In the current study, these kinetics are well-suited to the developmental activity patterns observed (firing frequency ~0.3 Hz), representing appropriate matching of indicator properties to biological timescales. However, the same approach may be less suitable for later developmental stages when neural activity occurs at higher frequencies.
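The kinetics argument above can be made concrete with a rough calculation. The following is a minimal sketch, assuming the indicator behaves like a first-order (single-exponential) low-pass filter with a time constant of 250 ms, a value picked from the middle of the decay range quoted above; the function name and the test frequencies are illustrative only, and real GEVI responses are more complex than this model.

```python
import math

def lowpass_gain(freq_hz, tau_s):
    """Amplitude gain of a first-order (single-exponential) low-pass filter
    for a sinusoidal input at freq_hz."""
    return 1.0 / math.sqrt(1.0 + (2.0 * math.pi * freq_hz * tau_s) ** 2)

# Assumed value: midpoint of the ~200-300 ms decay range quoted in the review.
TAU_S = 0.25

# 0.3 Hz is the firing frequency reported for the developing spinal cord;
# the higher frequencies are hypothetical later-stage activity.
for f in (0.3, 1.0, 5.0, 20.0):
    print(f"{f:5.1f} Hz: relative amplitude {lowpass_gain(f, TAU_S):.2f}")
```

Under these assumptions the reported signal is nearly unattenuated at ~0.3 Hz and strongly attenuated at tens of Hz, which is the sense in which the approach may not transfer to later developmental stages.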
(3) Incomplete methodological descriptions:
As a paper establishing a new imaging approach, several critical details are missing or unclear.
(a) Imaging system specifications: The imaging setup description lacks essential information, including light source specifications, excitation wavelength/filter sets, and light power at the sample. The authors should also clarify whether wide-field optics were used rather than confocal or selective plane imaging.
(b) Long-term imaging protocol: Whether neurons were imaged continuously or with breaks between imaging sessions is not explicitly stated. The current phrasing could be interpreted as a continuous 4.5-hour recording, which would be technically impressive but may not be what was actually done.
(c) Image processing procedures: Denoising and bleach correction procedures are mentioned but not described, which is critical for a methods-focused paper.
(d) The waveform classification (Supplementary Figure S6) shows overlapping kinetics between "immature" and "mature" firing, yet the classification method is not adequately justified.
(e) Given that photostability and toxicity are critical considerations for long-term voltage imaging, these aspects warrant further clarification. While the figures suggest stable ArcLight fluorescence during the experiments, the manuscript lacks quantification of photobleaching, a discussion of potential toxicity concerns associated with the indicator, and information regarding the maximum duration over which the ArcLight signal can faithfully report physiological voltage dynamics.
(4) Incomplete data representation and quantification:
(a) The claim of "reduced variability" in calcium imaging (line 194) is not clearly demonstrated in Supplementary Figure S1.
(b) Amplitude distributions for cell/subcellular compartments are not systematically quantified. Figure S3 shows ~5% changes in some axons versus ~2% in others, but it remains unclear whether these variabilities reflect differences between axonal compartments within the same cell, between individual cells, or between individual fish.
eLife Assessment
This study presents a valuable and practical approach for one-photon imaging through GRIN lenses. By scanning a low numerical aperture (NA) beam and collecting fluorescence with a high NA, the method expands the usable field of view and yields clearer cellular signals. The evidence is solid overall, with strong qualitative demonstrations, but some claims would benefit from additional quantitative tests. The work will interest researchers who need simple, scalable tools for large-area cellular imaging in the brain.
Reviewer #1 (Public review):
Summary:
The manuscript reports a method for deep brain imaging with a GRIN lens that combines "low-NA telecentric scanning (LNTS) of laser excitation with high-NA fluorescence collection" to achieve a larger FOV than conventional approaches.
Strengths:
The manuscript presents in vivo structural images and calcium activity results in side-by-side comparison with wide-field epifluorescence imaging through a GRIN lens and with two-photon scanning imaging.
Weaknesses:
(1) Insufficient technical information is given on the "high-NA (1.0) fluorescence collection". Is it custom-made or an off-the-shelf component? The only optical schematic, Figure 1, shows two lenses and a Si-PMT as the collection apparatus. There is no information about the lenses or the spacing between components.
(2) There is no discussion about the speed limitation of the LNTS method, which, as a scanning-based method, is limited by the scanner speed. At a 10 Hz frame rate, the LNTS, although it has a better FOV, is much slower than widefield fluorescence imaging. The 10 Hz speed is not sufficient for some fast calcium activities.
(3) Supplementary Figure 5 is irrelevant to the main claim of the manuscript. This is a preliminary simulation related to the authors' proposed future work.
Reviewer #2 (Public review):
Summary:
This study introduces a simple optical strategy for one-photon imaging through GRIN lenses that prioritizes coverage while maintaining practical signal quality. By using low-NA telecentric scanned excitation together with high-NA collection, the approach aims to convert nearly the full lens facet into a usable field of view (FOV) with uniform contrast and visible somata. The method is demonstrated in 4-µm fluorescent bead samples and mouse brain, with qualitative comparisons to widefield and two-photon (2P) imaging. Because the configuration relies on standard components and a minimalist optical layout, it may enable broader access to large-area cellular imaging in the deep brain across neuroscience laboratories.
Strengths:
(1) This method mitigates off-axis aberrations and enlarges the usable FOV. It achieves near full-facet usable FOV with consistent centre-to-edge contrast, as evidenced by 4-µm fluorescent bead samples (uniform visibility to the edge) and in vivo microglia imaging (resolvable somata across the field).
(2) The optical design is simple and supports efficient photon collection, lowering the barrier to adoption relative to adaptive optics (AO) or lens design-based correction. Using standard components and treating the GRIN lens as a high-NA (~1.0) light pipe increases collection efficiency for ballistic and scattered fluorescence. Figure annotations report the illumination energy required to reach a fixed detected-photon target (e.g., ~1000 detected photons per bead/cell for the 500-µm FOV condition), and under this equal-output criterion, the LNTS configuration achieves comparable or better image quality at lower illumination energy than conventional wide-field imaging, supporting improved photon efficiency and implying reduced bleaching and heating for equivalent signal levels.
(3) The in vivo functional recordings are stable and exhibit strong signals. In vivo calcium imaging shows high-SNR ΔF/F₀ traces that remain stable over ~30-minute sessions with only modest baseline drift reported, supporting physiological measurements without heavy denoising and enabling large-scale data collection.
(4) The low-NA excitation provides an extended focal depth, enabling more neurons to be tracked concurrently within a single FOV while maintaining practical signal quality. It reduces sensitivity to axial motion and minor misalignment and enhances overall experimental efficiency.
Weaknesses:
(1) Quantitative characterization is limited. Resolution and contrast are not comprehensively mapped as functions of field position and depth, and a clear, operational definition of "usable FOV" is not specified with threshold criteria.
(2) The claim of approximately 100% usable FOV is largely supported by qualitative images; standardized metrics (e.g., PSF/MTF maps, contrast-to-noise ratio profiles, cell-detection yield versus radius) are needed to calibrate expectations and enable comparison across systems.
(3) The trade-off inherent to low NA excitation, namely a broader axial PSF and possible neuropil/background contamination, is acknowledged qualitatively but not quantified. Analyses that separate in-focus from out-of-focus signal would help readers judge single-cell fidelity across the field.
(4) Generalizability remains to be established. Performance across multiple GRIN models (e.g., differing in diameter and NA) and across wavelengths is not yet demonstrated. Longer-session photobleaching, heating, and phototoxicity, particularly near the edge of the FOV, also require fuller evaluation.
Readers should view the method as a coverage-first strategy that enlarges the FOV while accepting a modest trade-off in resolution due to the low-NA excitation and the extended axial PSF.
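One of the standardized metrics suggested in weakness (2), cell-detection yield versus radius, could be computed along the following lines. This is a minimal sketch under stated assumptions: the inputs (reference cell positions, for example from a matched two-photon image, and a per-cell detected flag) and the function name are hypothetical and not taken from the manuscript.

```python
import math

def detection_yield_by_radius(ref_xy, detected, center, fov_radius, n_bins=5):
    """Fraction of reference cells detected in each equal-width radial bin.

    ref_xy   : list of (x, y) positions of reference cells (hypothetical input)
    detected : list of bools, True if that reference cell was also found
    Returns (bin_edges, yields); a bin with no reference cells yields None.
    """
    width = fov_radius / n_bins
    hits = [0] * n_bins
    totals = [0] * n_bins
    for (x, y), found in zip(ref_xy, detected):
        r = math.hypot(x - center[0], y - center[1])
        if r > fov_radius:
            continue  # outside the lens facet: ignore
        b = min(int(r / width), n_bins - 1)  # r == fov_radius goes in the last bin
        totals[b] += 1
        hits[b] += bool(found)
    edges = [i * width for i in range(n_bins + 1)]
    yields = [h / t if t else None for h, t in zip(hits, totals)]
    return edges, yields
```

An operational definition of "usable FOV" could then be the largest radius at which the yield stays above a stated threshold, which would let readers compare systems on a common scale.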
eLife Assessment
This study provides a valuable advance in understanding how decision boundaries may change over time during simple choices by introducing a method that uses information about non-decision components to improve parameter estimates. The evidence supporting the main claims is convincing, with clear demonstrations on simulated and real data, although additional model comparison work would further strengthen confidence. The findings will be of interest to researchers studying human decision processes and the methods used to analyse them.
Reviewer #1 (Public review):
Summary:
This paper proposes a non-decision time (NDT)-informed approach to estimating time-varying decision thresholds in diffusion models of decision making. The manuscript motivates the method well, outlines the identifiability issues it is intended to address, and evaluates it using simulations and two empirical datasets. The aim is clear, the scope is deliberately focused, and the manuscript is well written. The core idea is interesting, technically grounded, and a meaningful contribution to ongoing work on collapsing thresholds.
Strengths:
The manuscript is logically structured and easy to follow. The emphasis on parameter recovery is appropriate and appreciated. The finding that the exponential NDT-informed function produces substantially better recovery than the hyperbolic form is useful, given the importance placed on identifiability earlier in the paper. The threshold visualisations are also helpful for interpreting what the models are doing. Overall, the work offers a well-defined, methodologically oriented contribution that will interest researchers working on time-varying thresholds.
Weaknesses / Areas for Clarification:
A few points would benefit from clarification, additional analysis, or revised presentation:
(1) It would help readers to see a concrete demonstration of the trade-off between NDT and collapsing thresholds, to give a sense of the scale of the identifiability problem motivating the work.
(2) Before moving to the empirical datasets, the manuscript really needs a simulation-based model-recovery comparison, since all major conclusions of the empirical applications rely on model comparison. One approach might be to simulate from (a) an FT model with across-trial drift variability and (b) one of the CT models, then fit both models to each of the simulated data sets. This would address a longstanding issue: sometimes CT models are preferred even when the estimated collapse in the thresholds is close to zero. A recovery study would confirm that model selection behaves sensibly in the new framework.
(3) An additional subtle point is that BIC is defined in terms of the maximised log-likelihood of the model for the data being modelled. In the joint model, the parameter estimates maximise the combined likelihood of behavioural and non-decision-time data. This means the behavioural log-likelihood evaluated at the joint MLEs is not the behavioural MLE. If BIC is being computed for the behavioural data only, this breaks the assumptions underlying BIC. The only valid BIC here would be one defined for the joint model using the joint likelihood.
(4) Table 1 sets up the Study 1 comparisons, but there's no row for the FT model. Similarly, Figures 10 and 13 would be more informative if they included FT predictions. This matters because, in Study 1, the FT model appears to fit aggregate accuracy better than the BIC-preferred collapsing model, currently shown only in Appendix 5. Some discussion of why would strengthen the argument.
(5) In Figure 7, the degree of decay underestimation is obscured by the use of a density plot rather than the scatterplot format used in the other panels of the same figure. Presenting it the same way would make the mis-recovery more transparent. The accompanying text may also need clarification: when data are generated from an FT model with across-trial drift variability, the NDT-informed model seems to infer essentially fixed (FT) boundaries. If that's correct, the model must be misfitting the simulated data. This is actually a useful result as it suggests across-trial drift variability in FT models is discriminable from collapsing-threshold models. It would be good to make this explicit.
(6) Given the large recovery advantage of the exponential NDT-informed function over the hyperbolic one, the authors may want to consider whether the results favour adopting the former more generally; on these findings, I would lean towards recommending the exponential NDT-informed model for future use.
(7) In Study 2 (Figure 13), all models qualitatively miss an interesting empirical pattern: under speed emphasis, errors are faster than corrects, while under accuracy emphasis, errors become slower. The error RT distribution in the speed condition is especially poorly captured. It would be helpful for the authors to comment, as it suggests that something theoretically relevant is missing from all models tested.
(8) The threshold visualisations extend to 3 seconds, yet both datasets show decisions mostly finishing by ~1.5 seconds. Shortening the x-axis would better reflect the empirical RT distributions and avoid unintentionally overstating the timescale of the empirical decision processes.
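The BIC concern in point (3) can be illustrated with a toy calculation. The sketch below is not the authors' model: it assumes two Gaussian data streams with known unit variance that share a single mean parameter, standing in for the behavioural and non-decision-time data, and all numbers are made up. It shows that the behavioural log-likelihood evaluated at the joint MLE is lower than at the behavioural MLE, so a behaviour-only "BIC" computed from the joint fit is not a maximised-likelihood BIC.

```python
import math

def gauss_loglik(data, mu, sd=1.0):
    """Log-likelihood of i.i.d. Gaussian data with known sd."""
    return sum(-0.5 * math.log(2 * math.pi * sd ** 2)
               - (x - mu) ** 2 / (2 * sd ** 2) for x in data)

def bic(loglik, k, n):
    """BIC = k ln(n) - 2 ln(L), with L the maximised likelihood."""
    return k * math.log(n) - 2.0 * loglik

# Toy stand-ins (hypothetical values): two data streams sharing one parameter.
behav = [0.9, 1.1, 1.3, 0.7, 1.0]   # "behavioural" data
ndt = [1.6, 1.4, 1.5, 1.7, 1.3]     # "non-decision-time" data

mu_behav = sum(behav) / len(behav)                     # behavioural MLE
mu_joint = sum(behav + ndt) / (len(behav) + len(ndt))  # joint MLE

ll_at_behav_mle = gauss_loglik(behav, mu_behav)
ll_at_joint_mle = gauss_loglik(behav, mu_joint)  # not maximised for behav alone

print("behavioural BIC at behavioural MLE:", bic(ll_at_behav_mle, 1, len(behav)))
print("behavioural 'BIC' at joint MLE:    ", bic(ll_at_joint_mle, 1, len(behav)))

# A BIC consistent with the joint fit uses the joint likelihood and joint n.
ll_joint = gauss_loglik(behav + ndt, mu_joint)
print("joint BIC:", bic(ll_joint, 1, len(behav) + len(ndt)))
```

In the real models the parameter counts and likelihoods differ, but the same inequality holds whenever the second data stream pulls the shared parameters away from the behaviour-only optimum.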
Reviewer #2 (Public review):
Summary:
The authors use simulations and empirical data fitting to demonstrate that informing a decision model with estimates of single-trial non-decision time can guide the model to more reliable parameter estimates, especially when the model has collapsing bounds.
Strengths:
The paper is well written and motivated, with clear depth of knowledge in the areas of neurophysiology of decision-making, sequential sampling models, and, in particular, the phenomenon of collapsing decision bounds.
Two large-scale simulations are run to test parameter recovery, and two empirical datasets are fit and assessed; the fitting procedures themselves are state-of-the-art, and the study makes use of a very new and well-designed ERP decomposition algorithm that provides single-trial estimates of the duration of diffusion; the results provide inferences about the operation of decision bound collapse - all of this is impressive.
Weaknesses:
This is an interesting and promising idea, but a very important issue is not clear: it is an intuitive principle that information from an external empirical source can enhance the reliability of parameter estimates for a given model, but how can the overall BIC improve, unless it is in fact a different model? Unfortunately, it is not clear whether and how the model structure itself differs between the NDT-informed and non-NDT-informed cases. Ideally, they are the same actual model, but with one getting extra guidance on where to place the tau and/or sigma parameters from external measurements. The absence of sigma (non-decision time variance) estimates for the non-NDT-informed model, however, suggests it is different in structure, not just in its lack of constraints. If they were the same model, whether they do or do not possess non-decision time variability (which is not currently clear), the only possible reason that the NDT-informed model could achieve better BIC is because the non-NDT-informed model gets lost in the fitting procedure and fails to find the global optimum. If they are in fact different models - for example, if the NDT-informed model is endowed with NDT variability, while the non-NDT-informed model is not - then the fit superiority doesn't necessarily say anything about an NDT-informed reliability boost, but rather just that a model with NDT variability fits better than one without.
One reason this is unclear is that Footnote 4 says that this study did not allow trial-to-trial variability in nondecision time, but the entire premise of using variable external single-trial estimates of nondecision times (illustrated in Figure 2) assumes there is nondecision time variability and that we have access to its distribution.
It is good that there is an Intro section to explain how the tradeoff between NDT and collapsing bound parameters renders them difficult to simultaneously identify, but I think it needs more work to make it clear. First of all, it is not impossible to identify both, in the same way as, say, pre- and post-decisional nondecision time components cannot be resolved from behaviour alone - the intro had already talked about how collapsing bounds impact RT distribution shapes in specific ways, and obviously mean (or invariant) NDT can't do that - it can only translate the whole distribution earlier/later on the time axis. This is at odds with the phrasing "one CANNOT estimate these three parameters simultaneously." So it should be first clarified that this tradeoff is not absolute. Second, many readers will wonder if it is simply a matter of characterising the bound collapse time course as beginning at accumulation onset, instead of stimulus offset - does that not sidestep the issue? Third, assuming the above can be explained, and there is a reason to keep the collapse function aligned to stimulus onset, could the tradeoff be illustrated by picking two distinct sets of parameter values for non-decision time, starting threshold, and decay rate, which produce almost identical bound dynamics as a function of RT? It is not going to work for most readers to simply give the formula on line 211 and say "There is a tradeoff." Most readers will need more hand-holding.
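The kind of illustration requested above might look like the following. This is a minimal sketch under an assumed functional form, not the formula on line 211 of the manuscript: the bound is taken to start at b0 when accumulation begins and to decay exponentially with elapsed accumulation time. All parameter values and names are hypothetical.

```python
import math

def bound_at_rt(rt, b0, decay, ndt):
    """Exponentially collapsing bound, evaluated at response time rt.

    Assumed form (not taken from the manuscript): the bound equals b0 when
    accumulation begins (rt == ndt) and decays as exp(-decay * elapsed time).
    """
    return b0 * math.exp(-decay * (rt - ndt))

# Parameter set A (all values hypothetical).
A = dict(b0=2.0, decay=1.0, ndt=0.20)

# Set B: a longer non-decision time, compensated by a lower starting threshold.
ndt_b = 0.30
B = dict(b0=A["b0"] * math.exp(-A["decay"] * (ndt_b - A["ndt"])),
         decay=A["decay"], ndt=ndt_b)

for rt in (0.4, 0.8, 1.2, 1.6):
    print(rt, round(bound_at_rt(rt, **A), 4), round(bound_at_rt(rt, **B), 4))
```

Under this assumed form the two parameter sets give the same bound at every RT, yet they are not the same model: the accumulation window (RT minus non-decision time) differs, so the predicted RT distributions can still differ. That is consistent with the point that the trade-off is not absolute.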
A lognormal distribution is used as line 231 says it "must" produce a right-skew. Why? It is unusual for non-decision time distribution to be asymmetric in diffusion modeling, so this "must" statement must be fully explained and justified. Would I be right in saying that if either fixed or symmetrically distributed nondecision times were assumed, as in the majority of diffusion models, then the non-identifiability problem goes away? If the issue is one faced only by a special class of DDMs with lognormal NDT, this should be stated upfront.
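For context on the skew question, a lognormal distribution is right-skewed for any positive log-scale standard deviation, so choosing it builds the asymmetry into the model by construction. The short sketch below evaluates the standard closed-form skewness; the sigma values are arbitrary illustrations, not estimates from the paper.

```python
import math

def lognormal_skewness(sigma):
    """Skewness of a lognormal distribution; depends only on the log-scale sd."""
    w = math.exp(sigma ** 2)
    return (w + 2.0) * math.sqrt(w - 1.0)

# Arbitrary illustrative values of the log-scale standard deviation.
for sigma in (0.1, 0.5, 1.0):
    print(f"sigma = {sigma}: skewness = {lognormal_skewness(sigma):.3f}")
```

The skewness shrinks towards zero as sigma becomes small, so a lognormal with small sigma is nearly symmetric; whether the identifiability problem depends on the asymmetry is the question the authors would need to address.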
In the simulation study methods, is the only difference between NDT-informed and non-informed models that the non-NDT-informed must also estimate tau and sigma, whereas the NDT-informed model "knows" these two parameters and so only has the other three to estimate? And is it the exact same data that the two models are fit to, in each of the simulation runs? Why is sigma missing from the uninformed part of Figure 4? If it is nondecision time variability, shouldn't the model at least be aware of the existence of sigma and try to estimate it, in order for this to be a meaningful comparison?
I am curious to know whether a linear bound collapse suffers from the same identifiability issues with NDT, or was it not considered here because it is so suboptimal next to the hyperbolic/exponential?
The approach using HMP rests on the assumption that accumulation onset is marked by the peak of a certain neural event, but even if it is highly predictive of accumulation onset, depending on what it reflects, it could come systematically earlier or later than the actual accumulation onset. Could the authors comment on what implications this might have for the approach?
Figure 7: for this simulation, it would be helpful to know the degree to which you can get away with not equipping the model to capture drift rate variability, when the degree of that d.r. variability actually produces appreciable slow error rates. The approach here is to sample uniformly from ranges of the parameters, but how many of these produce data that can be reasonably recognised as similar to human behaviour on typical perceptual decision tasks? The authors point out that only 5% of fits estimate an appreciable bound collapse but if there are only 10% of the parameter vectors that produce data in a typical RT range with typical error rates etc, and half of these produce an appreciable downturn in accuracy for slower RT, and all of the latter represent that 5%, then that's quite a different story. An easy fix would be to plot estimated decay as a scatter plot against the rate of decline of accuracy from the median RT to the slowest RT, to visualise the degree to which slow errors can be absorbed by the no-dr-var model without falsely estimating steep bound collapse. In general, I'm not so sure of the value of this section, since, in principle, there is no getting around the fact that if what is in truth a drift-variability source of slow errors is fit with a model that can only capture it with a collapsing bound, it will estimate a collapsing bound, or just fail to capture those slow errors.
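The x-axis quantity for the suggested scatter plot, the rate of decline of accuracy from the median RT to the slowest RT, could be computed per simulated dataset roughly as follows. This is a minimal sketch: the binning scheme and the function name are assumptions made for illustration, and other definitions of the decline would serve equally well.

```python
def slow_accuracy_decline(rts, correct, n_bins=4):
    """Change in accuracy per second across the slow half of the RT distribution.

    Trials at or above the median RT are split into n_bins equal-count bins;
    the result is (accuracy in slowest bin - accuracy in first bin) divided by
    the difference in mean RT between those two bins. Negative values mean
    that slower responses are less accurate.
    """
    pairs = sorted(zip(rts, correct))
    slow = pairs[len(pairs) // 2:]  # trials at or above the median RT
    size = len(slow) // n_bins
    if n_bins < 2 or size == 0:
        raise ValueError("need at least two bins with one trial each")
    bins = [slow[i * size:(i + 1) * size] for i in range(n_bins - 1)]
    bins.append(slow[(n_bins - 1) * size:])  # last bin takes any remainder
    mean_rt = [sum(r for r, _ in b) / len(b) for b in bins]
    acc = [sum(c for _, c in b) / len(b) for b in bins]
    return (acc[-1] - acc[0]) / (mean_rt[-1] - mean_rt[0])
```

Plotting the estimated decay parameter against this value for each simulated dataset would show how large a slow-error effect the model can absorb before it starts to report a steep bound collapse.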
Reviewer #3 (Public review):
The current paper addresses an important issue in evidence accumulation models: many modelers implement flat decision boundaries because the collapsing alternatives are hard to reliably estimate. Here, using simulations, the authors demonstrate that parameter recovery can be drastically improved by providing the model with additional data (specifically, an EEG-informed estimate of non-decision time). Moreover, in two empirical datasets, it is shown that those EEG-informed models provide a better fit to the data. The method seems sound and promising and might inform future work on the debate regarding flat vs collapsing choice boundaries. As an evidence-accumulation enthusiast, I am quite excited about this work, although for a broader audience, the immediate applicability of this approach seems limited because it requires EEG data (limiting widespread use of the method and making it hard to address, e.g., questions about individual differences that require a very large N).
eLife Assessment
This study provides important evidence that myristate, a fatty acid commonly present in soil environments, is taken up by arbuscular mycorrhizal fungi during symbiosis with a plant host. The evidence presented is solid, with multiple experimental approaches including stable isotope tracing, transcriptional analysis, and physiological measurements across different plant species and phosphorus conditions. However, the main claims are only partially supported.
Reviewer #1 (Public review):
Summary:
Two major breakthroughs in the field of arbuscular mycorrhiza (AM) were the discoveries, first, that AM fungi obtain lipids (not only carbohydrates) from their plant hosts (Bravo et al 2017; Jiang et al 2017; Keymer et al 2017; Luginbuehl et al 2017) and, second, that presumably obligate biotrophic AM fungi can produce spores in the absence of host plants when exposed to myristate (Sugiura et al 2020; Tanaka et al 2022).
In this manuscript, Chen et al ask whether myristate in the soil also plays a role when AM fungi live in symbiosis with their plant hosts. They show that myristate occurs in natural as well as agricultural soils, probably as a component of root exudates. Further, they treat AM fungi with myristate when grown in symbiosis in a Petri dish system with carrot hairy roots or in pots with alfalfa or rice to describe what effect exogenous myristate has on symbiosis. Using 13C labelling, they show that myristate is taken up by AM fungi, even though they can obtain sugars and lipids from the plant host. They also show that myristate leads to an increase in root colonization as well as expression of fungal genes involved in FA assimilation.
Interestingly, the effect of myristate on colonization depends on the plant species and the level of phosphate fertilization provided to the plant. The reason for this remains unknown.
Strengths:
The findings are interesting and provide an advance in our understanding of lipid use by the extraradical mycelium of AM fungi.
Weaknesses:
However, there are some misconceptions in the writing, and some experimental results remain unclear because they are presented in a highly descriptive manner without interpretation or explanation.
Reviewer #2 (Public review):
Summary:
Arbuscular mycorrhizal fungi (AMF) are among the most widely distributed soil microorganisms, forming symbiotic relationships (AM symbiosis) with approximately 70% of terrestrial vascular plants. AMF are considered obligate biotrophs that rely on host-derived symbiotic carbohydrates. However, it remains unclear whether symbiotic AMF can access exogenous non-symbiotic carbon sources. By conducting three interconnected and complementary experiments, Chen et al. investigated the direct uptake of exogenous 13C1-labeled myristate by symbiotic Rhizophagus irregularis, R. intraradices, and R. diaphanus, and assessed their growth responses using AMF-carrot hairy root co-culture systems (Experiments 1 and 2). They also explored the environmental distribution of myristate in plant and soil substrates, and evaluated the impact of exogenous myristate on the symbiotic carbon-phosphorus exchange between R. irregularis and alfalfa or rice in a greenhouse experiment (Experiment 3). Given that the AM symbiosis not only plays a significant role in the biogeochemical cycling of C and P but also acts as a key driver of plant community structure and productivity, the topic of this manuscript is relevant. The study is well-designed, and the manuscript is well-written. I find it easy and interesting to follow the entire narrative.
Strengths:
The manuscript provides evidence from 13C labeling and molecular analyses showing that symbiotic AMF can absorb non-symbiotic C sources like myristate in the presence of plant-derived symbiotic carbohydrates, challenging the traditional assumption that AMF exclusively rely on symbiotic carbon sources supplied from associated host plants. This finding advances our understanding of the nutritional interactions between AMF and host plants. Furthermore, the manuscript reveals that myristate is widely present in diverse soil and plant components; however, exogenous myristate disrupts the carbon-phosphorus exchange in arbuscular mycorrhizal symbiosis. These insights have significant implications for the application and regulation of the AM symbiosis in sustainable agriculture and ecological restoration.
Weaknesses:
The limitations of this study include:
(1) The absorption of myristate by symbiotic AMF was observed only after exogenous application under artificial conditions, which may not accurately reflect natural environments.
(2) The investigation into the mechanism by which myristate disrupts C-P exchange in AM symbiosis remains preliminary.
Nevertheless, the authors have adequately discussed these limitations in the manuscript.
Reviewer #3 (Public review):
Summary:
The authors have addressed a major question that has been open since the discovery of myristate uptake by AM fungi as a non-symbiotic C source. Myristate has been used to grow some AM fungi axenically, but the biological significance of this saprobic behaviour in natural or agronomical environments remained unexplored. The results of this research soundly demonstrate that myristate-derived C is used by AM fungi, leading to improved development of both extraradical and intraradical mycelium (at least under low P conditions). However, this does not lead to obvious advantages for the plant, since symbiotic nutrient exchange (carbon and phosphorus) is reduced upon myristate application. Furthermore, myristate-treated plants quench their defence responses.
Strengths:
The study is extensive, based on a solid experimental setup and methodological approach, combining several state-of-the-art techniques. The conclusions are novel and of high relevance for the scientific community. The writing is fluent and clear.
Weaknesses:
Some of the figures should be improved for clarity. The conclusions do not express a conclusive remark that, in my opinion, emerges clearly from the results: myristate application in agriculture does not seem to be a very promising approach, since it unbalances the symbiosis nutritional equilibrium and may weaken plant immunity. This is a very important point (albeit rather unpleasant for applicative scientists) that should be stressed in the conclusions.
www.medrxiv.org www.medrxiv.org
eLife Assessment
This important study reports on the relationships between cerebral haemodynamics and a number of factors that relate to genetics, lifestyle, and medical history using data from a large cohort. Compelling evidence suggests that brief arterial spin labelling MRI acquisition can lead to both expected observations about brain health, as manifested in cerebral blood flow, and biomarkers for use in diagnosis and treatment monitoring. The results can be used as a starting point for hypothesis generation and further evaluation of conditions expected to affect haemodynamics in the brain.
Reviewer #1 (Public review):
Summary:
In this work, Okell et al. describe the imaging protocol and analysis pipeline pertaining to the arterial spin labeling (ASL) MRI protocol acquired as part of the UK Biobank imaging study. In addition, they present preliminary analyses of the first 7000+ subjects in whom ASL data were acquired, and this represents the largest such study to date. Careful analyses revealed expected associations between ASL-based measures of cerebral hemodynamics and non-imaging-based markers, including heart and brain health, cognitive function, and lifestyle factors. As it measures physiology and not structure, ASL-based measures may be more sensitive to these factors compared with other imaging-based approaches.
Strengths:
This study represents the largest MRI study to date to include ASL data in a wide age range of adult participants. The ability to derive arterial transit time (ATT) information in addition to cerebral blood flow (CBF) is a considerable strength, as many studies focus only on the latter.
Some of the results (e.g., relationships with cardiac output and hypertension) are known and expected, while others (e.g., lower CBF and longer ATT correlating with hearing difficulty in auditory processing regions) are more novel and intriguing. Overall, the authors present very interesting physiological results, and the analyses are conducted and presented in a methodical manner.
The analyses regarding ATT distributions and the potential implications for selecting post-labeling delays (PLD) for single PLD ASL are highly relevant and well-presented.
Weaknesses:
At a total scan duration of 2 minutes, the ASL sequence utilized in this cohort is much shorter than that of a typical ASL sequence (closer to 5 minutes as mentioned by the authors). However, this implementation also included multiple (n=5) PLDs. As currently described, it is unclear how many repetitions were acquired at each PLD and whether these were acquired efficiently (i.e., with a Look-Locker readout) or whether individual repetitions within this acquisition were dedicated to a single PLD. If the latter, the number of repetitions per PLD (and consequently signal-to-noise-ratio, SNR) is likely to be very low. Have the authors performed any analyses to determine whether the signal in individual subjects generally lies above the noise threshold? This is particularly relevant for white matter, which is the focus of several findings discussed in the study.
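The scale of the SNR concern can be gauged with a back-of-the-envelope calculation. The sketch below assumes only that SNR from signal averaging scales with the square root of the number of averages, with everything else (sequence, readout, per-repetition noise) held equal; it says nothing about the actual UK Biobank protocol, whose repetition scheme is the open question.

```python
import math

def relative_snr(scan_minutes, reference_minutes):
    """SNR ratio from averaging alone: SNR scales with sqrt(number of averages)."""
    return math.sqrt(scan_minutes / reference_minutes)

print(f"2 min vs 5 min scan: {relative_snr(2, 5):.2f}")

# If repetitions were split evenly over 5 PLDs, each single-PLD image would
# use one fifth of the averages (model fitting across PLDs pools them again).
print(f"one of 5 PLDs vs all repetitions pooled: {relative_snr(1, 5):.2f}")
```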
Hematocrit is one of the variables regressed out in order to reduce the effect of potential confounding factors on the image-derived phenotypes. The effect of this, however, may be more complex than accounting for other factors (such as age and sex). The authors acknowledge that hematocrit influences ASL signal through its effect on longitudinal blood relaxation rates. However, it is unclear how the authors handled the fact that the longitudinal relaxation of blood (T1Blood) is explicitly needed in the kinetic model for deriving CBF from the ASL data. In addition, while it may reduce false positives related to the relationships between dietary factors and hematocrit, it could also mask the effects of anemia present in the cohort. The concern, therefore, is two-fold: (1) Were individual hematocrit values used to compute T1Blood values? (2) What effect would the deconfounding process have on this?
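To show why the assumed T1Blood matters for question (1), the sketch below evaluates the T1Blood-dependent factor of the widely used single-PLD PCASL quantification formula. This is an illustration only: the study fits a multi-PLD kinetic model rather than this formula, and the PLD, label duration, and T1Blood values are assumed typical 3 T numbers, not the cohort's protocol.

```python
import math

def pcasl_scale(t1_blood, pld=1.8, label_dur=1.8):
    """T1Blood-dependent factor in the single-PLD PCASL formula (times in s):
    CBF is proportional to exp(PLD/T1b) / (T1b * (1 - exp(-tau/T1b)))."""
    return math.exp(pld / t1_blood) / (
        t1_blood * (1.0 - math.exp(-label_dur / t1_blood)))

REF_T1B = 1.65  # s, assumed reference arterial blood T1 at 3 T

# Higher hematocrit shortens T1Blood; lower hematocrit lengthens it.
for t1b in (1.5, 1.65, 1.8):
    ratio = pcasl_scale(t1b) / pcasl_scale(REF_T1B)
    print(f"T1b = {t1b:.2f} s: CBF estimate scales by {ratio:.2f} relative to reference")
```

Under these assumptions, a fixed T1Blood applied to a subject whose true value is shorter underestimates CBF, so whether hematocrit enters the kinetic model or only the deconfounding step changes what the regression removes.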
The authors leverage an observed inverse association between white matter hyperintensity volume and CBF as evidence that white matter perfusion can be sensitively measured using the imaging protocol utilized in this cohort. The relationship between white matter hyperintensities and perfusion, however, is not yet fully understood, and there is disagreement regarding whether this structural imaging marker necessarily represents impaired perfusion. Therefore, it may not be appropriate to use this finding as support for validation of the methodology.
-
Reviewer #2 (Public review):
Summary:
Okell et al. report the incorporation of arterial spin-labeled (ASL) perfusion MRI into the UK Biobank study and preliminary observations of perfusion MRI correlates from over 7000 acquired datasets, which is the largest sample of human perfusion imaging data to date.
ASL MRI noninvasively quantifies regional cerebral blood flow (CBF), which reflects both cerebrovascular integrity and neural activity, hence serves as a measure of brain function and a potential biomarker for a variety of CNS disorders. Despite a highly abbreviated ASL MRI protocol, significant correlations with both expected and novel demographic, physiological, and medical factors are demonstrated. In many such cases, ASL was also more sensitive than other MRI-derived metrics. The ASL MRI protocol implemented also enables quantification of arterial transit time (ATT), which provides stronger clinical correlations than CBF in some factors. The results demonstrate both the feasibility and the efficacy of ASL MRI in the UK Biobank imaging study, which expects to complete ASL MRI in up to 60,000 richly phenotyped individuals. Although a large literature already supports the value of ASL MRI as a biomarker of brain function, this important study provides compelling evidence that a brief ASL MRI acquisition may lead to both fundamental observations about brain health as manifested in CBF and valuable biomarkers for use in diagnosis and treatment monitoring.
Strengths:
A key strength of this study is the use of an ASL MRI protocol incorporating balanced pseudocontinuous labeling with a background-suppressed 3D readout, which is the current state-of-the-art. To compensate for the short scan time, voxel resolution was intentionally only moderate. The authors also elected to acquire these data across five post-labeling delays, enabling ATT and ATT-corrected CBF to be derived using the BASIL toolbox, which is based on a variational Bayesian framework. The resulting CBF and ATT maps shown in Figure 1 are quite good, especially when combined with such a large and deeply phenotyped sample.
Another strength of the study is the rigorous image analysis approach, which included covariation for a number of known CBF confounds as well as correction for motion and scanner effects. In doing so, the authors were able to confirm expected effects of age, sex, hematocrit, and time of day on CBF values. These observations lend confidence in the veracity of novel observations, for example, significant correlations between regional ASL parameters and cardiovascular function, height, alcohol consumption, depression, and hearing, as well as with other MRI features such as regional diffusion properties and magnetic susceptibility. They also provide valuable observations about ATT and CBF distributions across a large cohort of middle-aged and older adults.
Weaknesses:
This study primarily serves to illustrate the efficacy and potential of ASL MRI as an imaging parameter in the UK Biobank study, but some of the preliminary observations will be hypothesis-generating for future analyses in larger sample sizes. However, a weakness of the manuscript is that some of the reported observations are difficult to follow. In particular, the associations between ASL and resting fMRI illustrated in Figure 7 and described in the accompanying Results text are difficult to understand. It could also be clearer whether the spatial maps showing ASL correlates of other image-derived phenotypes in Figure 6B are global correlations or confined to specific regions of interest. Finally, while addressing partial volume effects in gray matter regions by covarying for cortical thickness is a reasonable approach, the Methods section seems to imply that a global mean cortical thickness is used, which could be problematic given that cortical thickness changes may be localized.
-
Reviewer #3 (Public review):
Summary:
This is an extremely important manuscript in the evolution of cerebral perfusion imaging using Arterial Spin Labelling (ASL). The number of subjects that were scanned has provided the authors with a unique opportunity to explore many potential associations between regional cerebral blood flow (CBF) and clinical and demographic variables.
Strengths:
The major strength of the manuscript is the access to an unprecedentedly large cohort of subjects. It demonstrates the sensitivity of regional tissue blood flow in the brain as an important marker of resting brain function. In addition, the authors have demonstrated a thorough analysis methodology and good statistical rigour.
Weaknesses:
This reviewer did not identify any major weaknesses in this work.
-
Author response:
We thank the editors and reviewers for their generally positive and thoughtful feedback on this work. Below are provisional responses to some of the concerns raised:
Reviewer 1:
At a total scan duration of 2 minutes, the ASL sequence utilized in this cohort is much shorter than that of a typical ASL sequence (closer to 5 minutes as mentioned by the authors). However, this implementation also included multiple (n=5) PLDs. As currently described, it is unclear how many repetitions were acquired at each PLD and whether these were acquired efficiently (i.e., with a Look-Locker readout) or whether individual repetitions within this acquisition were dedicated to a single PLD. If the latter, the number of repetitions per PLD (and consequently signal-to-noise ratio, SNR) is likely to be very low. Have the authors performed any analyses to determine whether the signal in individual subjects generally lies above the noise threshold? This is particularly relevant for white matter, which is the focus of several findings discussed in the study.
We agree that this was a short acquisition compared to most ASL protocols, necessitated by the strict time-keeping requirements for running such a large study. We apologise if this was not clear in the original manuscript, but due to this time constraint and the use of a segmented readout (which was not Look-Locker) there was only time available for a single average at each PLD. This does mean that the perfusion weighted images at each PLD are relatively noisy, although the image quality with this sequence was still reasonable, as demonstrated in Figure 1, with perfusion weighted images visibly above the noise floor. In addition, as has been demonstrated theoretically and experimentally in recent work (Woods et al., 2023, 2020), even though the SNR of each individual PLD image might be low in multi-PLD acquisitions, this is effectively recovered during the model fitting process, giving it comparable or greater accuracy than a protocol which collects many averages at a single (long) PLD. As also noted by the reviewers, this approach has the further benefit of allowing ATT estimation, which has proven to provide useful and complementary information to CBF. Finally, the fact that many of the findings in this study pass strict statistical thresholds for significance, despite the many multiple comparisons performed, and that the spatial patterns of these relationships are consistent with expectations, even in the white matter (e.g. Figure 6B), give us confidence that the perfusion estimation is robust. However, we will consider adding some additional metrics around SNR or fitting uncertainty in a revised manuscript, as well as clarifying details of the acquisition.
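To illustrate how a single average at each of several PLDs can still constrain both CBF and ATT, the following minimal Python sketch fits the two parameters jointly to five points generated from a simplified form of the standard general kinetic model for pseudocontinuous labeling. It is not the variational Bayesian fit performed by BASIL: it swaps in a plain grid search over ATT with a closed-form least-squares CBF at each grid point. All numerical values (label duration, PLDs, T1 values, labeling efficiency) and all function names are illustrative assumptions, not the study's parameters.

```python
import math

# Illustrative assumptions, not the study's acquisition parameters.
LABEL_DURATION = 1.8     # s
T1_BLOOD = 1.65          # s
T1_APPARENT = 1.3        # s, simplified single apparent tissue T1
LABEL_EFFICIENCY = 0.85


def signal_per_unit_cbf(pld, att):
    """Simplified pCASL kinetic model: difference signal / M0 for CBF = 1 ml/g/s."""
    t = LABEL_DURATION + pld                 # time since the start of labeling
    if t < att:
        return 0.0                           # labeled blood has not arrived yet
    scale = 2.0 * LABEL_EFFICIENCY * T1_APPARENT * math.exp(-att / T1_BLOOD)
    if t < att + LABEL_DURATION:             # bolus is still arriving
        return scale * (1.0 - math.exp(-(t - att) / T1_APPARENT))
    # Bolus fully delivered: the accumulated signal decays.
    decay = math.exp(-(t - LABEL_DURATION - att) / T1_APPARENT)
    return scale * decay * (1.0 - math.exp(-LABEL_DURATION / T1_APPARENT))


def fit_cbf_att(plds, signal, att_grid):
    """Grid search over ATT; CBF is the least-squares scale at each ATT."""
    best = None
    for att in att_grid:
        shape = [signal_per_unit_cbf(p, att) for p in plds]
        norm = sum(s * s for s in shape)
        if norm == 0.0:
            continue                         # no signal predicted at any PLD
        cbf = sum(s * y for s, y in zip(shape, signal)) / norm
        rss = sum((y - cbf * s) ** 2 for s, y in zip(shape, signal))
        if best is None or rss < best[0]:
            best = (rss, cbf, att)
    return best[1], best[2]


plds = [0.4, 0.8, 1.2, 1.6, 2.0]             # s, hypothetical five-PLD schedule
true_cbf = 60.0 / 6000.0                     # 60 ml/100 g/min expressed in ml/g/s
true_att = 1.2                               # s
signal = [true_cbf * signal_per_unit_cbf(p, true_att) for p in plds]
att_grid = [0.5 + 0.05 * i for i in range(41)]   # 0.5 s to 2.5 s
cbf_hat, att_hat = fit_cbf_att(plds, signal, att_grid)
```

In this sketch every PLD contributes to the same two-parameter fit, which is the sense in which noise in individual PLD images can be averaged down by the model rather than by repeated acquisitions. Adding noise to `signal` would show how the estimates degrade.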
Hematocrit is one of the variables regressed out in order to reduce the effect of potential confounding factors on the image-derived phenotypes. The effect of this, however, may be more complex than accounting for other factors (such as age and sex). The authors acknowledge that hematocrit influences ASL signal through its effect on longitudinal blood relaxation rates. However, it is unclear how the authors handled the fact that the longitudinal relaxation of blood (T1Blood) is explicitly needed in the kinetic model for deriving CBF from the ASL data. In addition, while it may reduce false positives related to the relationships between dietary factors and hematocrit, it could also mask the effects of anemia present in the cohort. The concern, therefore, is two-fold: (1) Were individual hematocrit values used to compute T1Blood values? (2) What effect would the deconfounding process have on this?
We agree this is an important point to clarify. In this work we decided not to use the haematocrit to directly estimate the T1 of blood for each participant a) because this would result in slight differences in the model fitting for each subject, which could introduce bias (e.g. the kinetic model used assumes instantaneous exchange between blood water and tissue, so changing the T1 of blood for each subject could make us more sensitive to inaccuracies in this assumption); and b) because typically the haematocrit measures were quite some time (often years) prior to the imaging session, leading to an imperfect correction. We therefore took the pragmatic approach to simply regress each subject’s average haematocrit reading out of the IDP and voxelwise data to prevent it contributing to apparent correlations caused by indirect effects on blood T1. However, we agree with the reviewer that this certainly would mask the effects of anaemia in this cohort, so for researchers interested in this condition a different approach should be taken. We will update the revised manuscript to try to clarify these points.
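The regression step described above can be sketched as follows. This is a minimal Python illustration with a single confound and ordinary least squares. The study's deconfounding involves many variables and is not reproduced here, and the function name and data values are hypothetical.

```python
def regress_out(values, confound):
    """Return the residuals of `values` after removing a linear fit to `confound`."""
    n = len(values)
    mean_v = sum(values) / n
    mean_c = sum(confound) / n
    cov = sum((c - mean_c) * (v - mean_v) for c, v in zip(confound, values))
    var = sum((c - mean_c) ** 2 for c in confound)
    slope = cov / var
    # Residuals are demeaned and, by construction, uncorrelated with the confound.
    return [(v - mean_v) - slope * (c - mean_c) for v, c in zip(values, confound)]


# Hypothetical per-subject values, for illustration only.
haematocrit = [0.38, 0.41, 0.43, 0.45, 0.40]
cbf_idp = [62.0, 58.0, 55.0, 51.0, 60.0]
cbf_deconfounded = regress_out(cbf_idp, haematocrit)
```

Because the residuals carry no linear dependence on haematocrit, any CBF variation that tracks haematocrit (including variation due to anaemia) is removed as well, which is the masking effect acknowledged in the response.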
The authors leverage an observed inverse association between white matter hyperintensity volume and CBF as evidence that white matter perfusion can be sensitively measured using the imaging protocol utilized in this cohort. The relationship between white matter hyperintensities and perfusion, however, is not yet fully understood, and there is disagreement regarding whether this structural imaging marker necessarily represents impaired perfusion. Therefore, it may not be appropriate to use this finding as support for validation of the methodology.
We appreciate the reviewer’s point that there is still debate about the relationship between white matter hyperintensities and perfusion. We agree that this observed relationship therefore does not validate the methodology in the sense that it is an expected finding, but it does demonstrate that the data quality is sufficient to show significant correlations between white matter hyperintensity volume and perfusion, even in white matter regions, which would not be the case if the signal there were dominated by noise. Similarly, the clear spatial pattern of perfusion changes in the white matter that correlate with DTI measures in the same regions also suggests there is sensitivity to white matter perfusion. However, we will update the wording in the revised manuscript to try to clarify this point.
Reviewer 2:
This study primarily serves to illustrate the efficacy and potential of ASL MRI as an imaging parameter in the UK Biobank study, but some of the preliminary observations will be hypothesis-generating for future analyses in larger sample sizes. However, a weakness of the manuscript is that some of the reported observations are difficult to follow. In particular, the associations between ASL and resting fMRI illustrated in Figure 7 and described in the accompanying Results text are difficult to understand. It could also be clearer whether the spatial maps showing ASL correlates of other image-derived phenotypes in Figure 6B are global correlations or confined to specific regions of interest. Finally, while addressing partial volume effects in gray matter regions by covarying for cortical thickness is a reasonable approach, the Methods section seems to imply that a global mean cortical thickness is used, which could be problematic given that cortical thickness changes may be localized.
We apologise if any of the presented information was unclear and will try to improve this in our revised manuscript. To clarify, the spatial maps associated with other (non-ASL) IDPs were generated by calculating the correlation between the ASL CBF or ATT in every voxel in standard space with the non-ASL IDP of interest, not the values of the other imaging modality in the same voxel. No region-based masking was used for this comparison. This allowed us to examine whether the correlation with this non-ASL IDP was only within the same brain region or if the correlations extended to other regions too.
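The voxelwise comparison described above can be sketched in a few lines of Python. This is a toy illustration, assuming each subject's standard-space CBF map is flattened to a list of voxels and the non-ASL IDP is one number per subject. The function names and data are hypothetical and do not reproduce the study's pipeline.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def voxelwise_correlation(maps, idp):
    """Correlate each voxel's values across subjects with one IDP value per subject."""
    n_voxels = len(maps[0])
    return [pearson([subject[v] for subject in maps], idp) for v in range(n_voxels)]


# Four hypothetical subjects, three voxels each; no region mask is applied.
idp = [1.0, 2.0, 3.0, 4.0]
cbf_maps = [
    [1.0, -1.0, 1.0],
    [2.0, -2.0, -1.0],
    [3.0, -3.0, -1.0],
    [4.0, -4.0, 1.0],
]
correlation_map = voxelwise_correlation(cbf_maps, idp)
```

Since every voxel is tested against the same subject-level IDP, the resulting map can show correlations outside the region from which the IDP was derived.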
We also agree that the associations between ASL and resting fMRI are not easy to interpret. We therefore tried to be clear in the manuscript that these were preliminary findings that may be of interest to others, but clearly further study is required to explore this complex relationship further. However, we will try to clarify how the results are presented in the revised manuscript.
In relation to partial volume effects, we did indeed use only a global measure of cortical thickness in the deconfounding and we acknowledged that this could be improved in the discussion: [Partial volume effects were] “mitigated here by the inclusion of cortical thickness in the deconfounding process, although a region-specific correction approach that is aware of the through-slice blurring (Boscolo Galazzo et al., 2014) is desirable in future iterations of the ASL analysis pipeline.” As suggested here, although this is a coarse correction, we did not feel that a more comprehensive partial volume correction approach could be used without properly accounting for the through-slice blurring effects from the 3D-GRASE acquisition (that will vary across different brain regions), which is not currently available, although this is an area we are actively working on for future versions of the image analysis pipeline. We again will try to clarify this point further in the revised manuscript.
References
Woods JG, Achten E, Asllani I, Bolar DS, Dai W, Detre J, Fan AP, Fernández-Seara M, Golay X, Günther M, Guo J, Hernandez-Garcia L, Ho M-L, Juttukonda MR, Lu H, MacIntosh BJ, Madhuranthakam AJ, Mutsaerts HJ, Okell TW, Parkes LM, Pinter N, Pinto J, Qin Q, Smits M, Suzuki Y, Thomas DL, Van Osch MJP, Wang DJ, Warnert EAH, Zaharchuk G, Zelaya F, Zhao M, Chappell MA. 2023. Recommendations for Quantitative Cerebral Perfusion MRI using Multi-Timepoint Arterial Spin Labeling: Acquisition, Quantification, and Clinical Applications (preprint). Open Science Framework. doi:10.31219/osf.io/4tskr
Woods JG, Chappell MA, Okell TW. 2020. Designing and comparing optimized pseudo-continuous Arterial Spin Labeling protocols for measurement of cerebral blood flow. NeuroImage 223:117246. doi:10.1016/j.neuroimage.2020.117246
-
eLife Assessment
This valuable study uses state-of-the-art neural encoding and video reconstruction methods to achieve a substantial improvement in video reconstruction quality from mouse neural data. It provides a convincing demonstration of how reconstruction performance can be improved by combining these methods. The goal of the study was improving reconstruction performance rather than advancing theoretical understanding of neural processing, so the results will be of practical interest to the brain decoding community.
-
Reviewer #2 (Public review):
Summary:
This is an interesting study exploring methods for reconstructing visual stimuli from neural activity in the mouse visual cortex. Specifically, it uses a competition dataset (published in the Dynamic Sensorium benchmark study) and a recent winning model architecture (DNEM, dynamic neural encoding model) to recover visual information stored in ensembles of mouse visual cortex.
Strengths:
This is a great start for a project addressing visual reconstruction. It is based on physiological data obtained at a single-cell resolution, the stimulus movies were reasonably naturalistic and representative of the real world, the study did not ignore important correlates such as eye position and pupil diameter, and of course, the reconstruction quality exceeded anything achieved by previous studies. There appear to be no major technical flaws in the study, and some potential confounds were addressed upon revision. The study is an enjoyable read.
Weaknesses:
The study is technically competent and benchmark-focused, but without significant conceptual or theoretical advances. The inclusion of neuronal data broadens the study's appeal, but the work does not explore potential principles of neural coding, which limits its relevance for neuroscience and may disappoint some neuroscientists. The authors are transparent that their goal was methodological rather than explanatory, but this raises the question of why neuronal data were necessary at all, as more significant reconstruction improvements might be achievable using noise-less artificial video encoders alone (network-to-network decoding approaches have been done well by teams such as Han, Poggio, and Cheung, 2023, ICML). Yet, even within the methodological domain, the study does not articulate clear principles or heuristics that could guide future progress. The finding that more neurons improve reconstruction aligns with well-established results in the literature that show that higher neuronal numbers improve decoding in general (for example, Hung, Kreiman, Poggio, and DiCarlo, 2005) and thus may not constitute a novel insight.
Specific issues:
(1) The study showed that it could achieve high-quality video reconstructions from mouse visual cortex activity using a neural encoding model (DNEM), recovering 10-second video sequences and approaching a two-fold improvement in pixel-by-pixel correlation over previous attempts. As a reader, I was left with the question: okay, does this mean that we should all switch to DNEM for our investigations of mouse visual cortex? What makes this encoding model special? It is introduced as "a winning model of the Sensorium 2023 competition which achieved a score of 0.301...single trial correlation between predicted and ground truth neuronal activity," but as someone who does not follow this competition (most eLife readers are not likely to do so, either), I do not know how to gauge my response. Is this impressive? What is the best theoretical score, given noise and other limitations? Is the model inspired by the mouse brain in terms of mechanisms or architecture, or was it optimized to win the competition by overfitting it to the nuances of the data set? Of course, I know that as a reader, I am invited to read the references, but the study would stand better on its own if it clarified how its findings depended on this model.
The revision helpfully added context to the Methods about the range of scores achieved by other models, but this information remains absent from the Abstract and other important sections. For instance, the Abstract states, "We achieve a pixel-level correlation of 0.57 between the ground truth movie and the reconstructions from single-trial neural responses," yet this point estimate (presented without confidence intervals or comparisons to controls) lacks meaning for readers who are not told how it compares to prior work or what level of performance would be considered strong. Without such context, the manuscript undercuts potentially meaningful achievements.
(2) Along those lines, the authors conclude that "the number of neurons in the dataset and the use of model ensembling are critical for high-quality reconstructions." If true, these principles should generalize across network architectures. I wondered whether the same dependencies would hold for other network types, as this could reveal more general insights. The authors replied that such extensions are expected (since prior work has shown similar effects for static images) but argued that testing this explicitly would require "substantial additional work," be "impractical," and likely not produce "surprising results." While practical difficulty alone is not a sufficient reason to leave an idea untested, I agree that the idea that "more neurons would help" would be unsurprising. The question then becomes: given that this is a conclusion already in the field, what new principle or understanding has been gained in this study?
(3) One major claim was that the quality of the reconstructions depended on the number of neurons in the dataset. There were approximately 8000 neurons recorded per mouse. The correlation difference between the reconstruction achieved by 1000 neurons and 8000 neurons was ~0.2. Is that a lot or a little? One might hypothesize that 7000 additional neurons could contribute more information, but perhaps, those neurons were redundant if their receptive fields are too close together or if they had the same orientation or spatiotemporal tuning. How correlated were these neurons in response to a given movie? Why did so many neurons offer such a limited increase in correlation? Originally, this question was meant to prompt deeper analysis of the neural data, but the authors did not engage with it, suggesting a limited understanding of the neuronal aspects of the dataset.
(4) We appreciated the experiments testing the capacity of the reconstruction process, by using synthetic stimuli created under a Gaussian process in a noise-free way. But this originally raised further questions: what is the theoretical capability for reconstruction of this processing pipeline, as a whole? Is 0.563 the best that one could achieve given the noisiness and/or neuron count of the Sensorium project? What if the team applied the pipeline to reconstruct the activity of a given artificial neural network's layer (e.g., some ResNet convolutional layer), using hidden units as proxies for neuronal calcium activity? In the revision, this concern was addressed nicely in Supplementary Figure 3C. One also appreciates that, as a follow-up, the team produced error maps (New Figure 6) that highlight where in the frames the reconstructions are likely to fail. But the maps were not analyzed further, and I am not sure whether there was a systematic trend in the errors.
(5) I was encouraged by Figure 4, which shows how the reconstructions succeeded or failed across different spatial frequencies. The authors note that "the reconstruction process failed at high spatial frequencies," yet it also appears to struggle with low spatial frequencies, as the reconstructed images did not produce smooth surfaces (e.g., see the top rows of Figures 4A and 4B). In regions where one would expect a single continuous gradient, the reconstructions instead display specular, high-frequency noise. This issue is difficult to overlook and might deserve further discussion.
-
Reviewer #3 (Public review):
Summary:
This paper presents a method for reconstructing input videos shown to a mouse from the simultaneously recorded visual cortex activity (two-photon calcium imaging data). The publicly available experimental dataset is taken from a recent brain-encoding challenge, and the (publicly available) neural network model that serves to reconstruct the videos is the winning model from that challenge (by distinct authors). The present study applies gradient-based input optimization by backpropagating the brain-encoding error through this selected model (a method that has been proposed in the past, with other datasets). The main contribution of the paper is, therefore, the choice of applying this existing method to this specific dataset with this specific neural network model. The quantitative results appear to go beyond previous attempts at video input reconstruction (although measured with distinct datasets). The conclusions have potential practical interest for the field of brain decoding, and theoretical interest for possible future uses in functional brain exploration.
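The gradient-based input optimization mentioned above can be illustrated with a toy Python sketch. It assumes a tiny fixed encoder (a linear map followed by a softplus nonlinearity) in place of the dynamic neural encoding model, a four-pixel static "stimulus" in place of a video, and a hand-derived gradient in place of automatic backpropagation. All names, sizes, and values are hypothetical.

```python
import math
import random


def softplus(z):
    return math.log1p(math.exp(z))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def encode(weights, stimulus):
    """Toy encoding model: predicted response of each neuron to the stimulus."""
    return [softplus(sum(w * s for w, s in zip(row, stimulus))) for row in weights]


def encoding_error(weights, stimulus, recorded):
    """Squared error between predicted and recorded responses."""
    return sum((p - r) ** 2 for p, r in zip(encode(weights, stimulus), recorded))


def reconstruct(weights, recorded, n_steps=5000, learning_rate=0.002):
    """Gradient descent on the stimulus; the encoder weights stay frozen."""
    n_pixels = len(weights[0])
    stimulus = [0.0] * n_pixels              # start from a blank input
    for _ in range(n_steps):
        grad = [0.0] * n_pixels
        for row, target in zip(weights, recorded):
            z = sum(w * s for w, s in zip(row, stimulus))
            # Chain rule: d(error)/dz for this neuron (softplus' = sigmoid).
            derror_dz = 2.0 * (softplus(z) - target) * sigmoid(z)
            for j, w in enumerate(row):
                grad[j] += derror_dz * w
        stimulus = [s - learning_rate * g for s, g in zip(stimulus, grad)]
    return stimulus


rng = random.Random(0)
weights = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(12)]
true_stimulus = [0.2, 0.8, 0.5, 0.1]
recorded = encode(weights, true_stimulus)    # noise-free surrogate "recordings"
reconstruction = reconstruct(weights, recorded)
```

The key design point carried over from the description is that the error is computed in response space and the only quantity being updated is the input. An ensemble variant would average the gradients from several such encoders before each update.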
Strengths:
The authors use a validated optimization method on a recent large-scale dataset, with a state-of-the-art brain encoding model. The use of an ensemble of 7 distinct model instances (trained on distinct subsets of the dataset, with distinct random initializations) significantly improves the reconstructions. The exploration of the relation between reconstruction quality and number of recorded neurons will be useful to those planning future experiments.
Weaknesses:
The main contribution is methodological, and the methodology combines pre-existing components without any new original component.
-
Author response:
The following is the authors’ response to the current reviews.
Public Reviews:
Reviewer #2 (Public review):
Summary:
This is an interesting study exploring methods for reconstructing visual stimuli from neural activity in the mouse visual cortex. Specifically, it uses a competition dataset (published in the Dynamic Sensorium benchmark study) and a recent winning model architecture (DNEM, dynamic neural encoding model) to recover visual information stored in ensembles of mouse visual cortex.
Strengths:
This is a great start for a project addressing visual reconstruction. It is based on physiological data obtained at a single-cell resolution, the stimulus movies were reasonably naturalistic and representative of the real world, the study did not ignore important correlates such as eye position and pupil diameter, and of course, the reconstruction quality exceeded anything achieved by previous studies. There appear to be no major technical flaws in the study, and some potential confounds were addressed upon revision. The study is an enjoyable read.
Weaknesses:
The study is technically competent and benchmark-focused, but without significant conceptual or theoretical advances. The inclusion of neuronal data broadens the study's appeal, but the work does not explore potential principles of neural coding, which limits its relevance for neuroscience and may disappoint some neuroscientists. The authors are transparent that their goal was methodological rather than explanatory, but this raises the question of why neuronal data were necessary at all, as more significant reconstruction improvements might be achievable using noise-less artificial video encoders alone (network-to-network decoding approaches have been done well by teams such as Han, Poggio, and Cheung, 2023, ICML). Yet, even within the methodological domain, the study does not articulate clear principles or heuristics that could guide future progress. The finding that more neurons improve reconstruction aligns with well-established results in the literature that show that higher neuronal numbers improve decoding in general (for example, Hung, Kreiman, Poggio, and DiCarlo, 2005) and thus may not constitute a novel insight.
We thank the reviewer for this second round of comments and hope we were able to address the remaining points below.
Indeed, using surrogate noiseless data is interesting and useful when developing such methods, or to demonstrate that they work in principle. But in order to evaluate if they really work in practice, we need to use real neuronal data. While we did not try movie reconstruction from layers within artificial neural networks as surrogate data, in Supplementary Figure 3C we provide the performance of our method using simulated/predicted neuronal responses from the dynamic neural encoding model alongside real neuronal responses.
Specific issues:
(1) The study showed that it could achieve high-quality video reconstructions from mouse visual cortex activity using a neural encoding model (DNEM), recovering 10-second video sequences and approaching a two-fold improvement in pixel-by-pixel correlation over previous attempts. As a reader, I was left with the question: okay, does this mean that we should all switch to DNEM for our investigations of mouse visual cortex? What makes this encoding model special? It is introduced as "a winning model of the Sensorium 2023 competition which achieved a score of 0.301...single trial correlation between predicted and ground truth neuronal activity," but as someone who does not follow this competition (most eLife readers are not likely to do so, either), I do not know how to gauge my response. Is this impressive? What is the best theoretical score, given noise and other limitations? Is the model inspired by the mouse brain in terms of mechanisms or architecture, or was it optimized to win the competition by overfitting it to the nuances of the data set? Of course, I know that as a reader, I am invited to read the references, but the study would stand better on its own if it clarified how its findings depended on this model.
The revision helpfully added context to the Methods about the range of scores achieved by other models, but this information remains absent from the Abstract and other important sections. For instance, the Abstract states, "We achieve a pixel-level correlation of 0.57 between the ground truth movie and the reconstructions from single-trial neural responses," yet this point estimate (presented without confidence intervals or comparisons to controls) lacks meaning for readers who are not told how it compares to prior work or what level of performance would be considered strong. Without such context, the manuscript undercuts potentially meaningful achievements.
We appreciate that the additional information about the performance of the SOTA DNEM to predict neural responses could be made more visible in the paper and will therefore move it from the methods to the results section instead:
Line 348 “This model achieved an average single-trial correlation between predicted and ground truth neural activity of 0.291 during the competition, this was later improved to 0.301. The competition benchmark models achieved 0.106, 0.164 and 0.197 single-trial correlation, while the third and second place models achieved 0.243 and 0.265. Across the models, a variety of architectural components were used, including 2D and 3D convolutional layers, recurrent layers, and transformers, to name just a few.” will be moved to the results.
With regard to the lack of context for the performance of our reconstruction in the abstract, we may have overcorrected in the previous revision round and have tried to find a compromise which gives more context to the pixel-level correlation value:
Abstract: “We achieve a pixel-level correlation of 0.57 (95% CI [0.54, 0.60]) between ground-truth movies and single-trial reconstructions. Previous reconstructions based on awake mouse V1 neuronal responses to static images achieved a pixel-level correlation of 0.238 over a similar retinotopic area.”
(2) Along those lines, the authors conclude that "the number of neurons in the dataset and the use of model ensembling are critical for high-quality reconstructions." If true, these principles should generalize across network architectures. I wondered whether the same dependencies would hold for other network types, as this could reveal more general insights. The authors replied that such extensions are expected (since prior work has shown similar effects for static images) but argued that testing this explicitly would require "substantial additional work," be "impractical," and likely not produce "surprising results." While practical difficulty alone is not a sufficient reason to leave an idea untested, I agree that the idea that "more neurons would help" would be unsurprising. The question then becomes: given that this is a conclusion already in the field, what new principle or understanding has been gained in this study?
As mentioned in our previous round of revisions, we chose not to pursue the comparison of reconstructions using different model architectures in this manuscript because we did not think it would add significant insights to the paper given the amount of work it would require, and we are glad the reviewer agrees.
While the fact that more neurons result in better reconstructions is unsurprising, how quickly performance drops off will depend on the robustness of the method, and on the dimensionality of the decoding/reconstruction task (decoding grating orientation likely requires fewer neurons than gray scale image reconstruction, which in turn likely requires fewer neurons than full color movie reconstruction). How dependent input optimization based image/movie reconstruction is on population size has not been shown, so we felt it was useful for readers to know how well movie reconstruction works with our method when recording from smaller numbers of neurons.
(3) One major claim was that the quality of the reconstructions depended on the number of neurons in the dataset. There were approximately 8000 neurons recorded per mouse. The correlation difference between the reconstruction achieved by 1000 neurons and 8000 neurons was ~0.2. Is that a lot or a little? One might hypothesize that 7000 additional neurons could contribute more information, but perhaps, those neurons were redundant if their receptive fields are too close together or if they had the same orientation or spatiotemporal tuning. How correlated were these neurons in response to a given movie? Why did so many neurons offer such a limited increase in correlation? Originally, this question was meant to prompt deeper analysis of the neural data, but the authors did not engage with it, suggesting a limited understanding of the neuronal aspects of the dataset.
We apologize that we did not engage with this comment enough in the previous round. We assumed that the question arose because there was a misunderstanding about figure 5: 1000 neurons, not 1, are sufficient to reconstruct the movies to a pixel-level correlation of 0.344. Of course, the fact that increasing the number of neurons from 1000 to 8000 only increased the reconstruction performance from 0.344 to 0.569 (a 65% increase in correlation) is still worth discussing. To illustrate this drop in performance qualitatively, we show 3 example frames from movie reconstructions using 1000-8000 neurons in Author response image 1.
Author response image 1.
3 example frames from reconstructions using different numbers of neurons.
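The size of the gain discussed above can be checked directly from the two correlation values quoted in this response. The following is a minimal sketch; the function name `relative_increase` is ours and is used only for illustration.

```python
def relative_increase(before: float, after: float) -> float:
    """Return the fractional change from `before` to `after`."""
    return (after - before) / before

# Pixel-level correlations quoted in the response (1000 vs ~8000 neurons).
absolute_gain = 0.569 - 0.344
gain = relative_increase(0.344, 0.569)
print(f"absolute gain: {absolute_gain:.3f}, relative gain: {gain:.0%}")
```

The absolute difference matches the "~0.2" mentioned by the reviewer, while the relative change corresponds to the 65% figure given by the authors.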
As the reviewer points out, the diminishing returns of additional neurons to reconstruction performance are at least partly because there is redundancy in how a population of neurons represents visual stimuli. In supplementary figure S2, we inferred the on-off receptive fields of the neurons and show that visual space is oversampled in terms of the receptive field positions in panel C. However, the exact slope/shape of the performance vs population size curve we show in Figure 5 will also depend on the maximum performance of our reconstruction method, which is limited in spatial resolution (Figure 4 & Supplementary Figure S5). It is possible that future reconstruction approaches will require fewer neurons than ours, so we interpret this curve rather as a description of the reconstruction method itself than a feature of the underlying neuronal code. For that reason, we chose caution and refrained from making any claims about neuronal coding principles based on this plot.
(4) We appreciated the experiments testing the capacity of the reconstruction process, by using synthetic stimuli created under a Gaussian process in a noise-free way. But this originally further raised questions: what is the theoretical capability for reconstruction of this processing pipeline, as a whole? Is 0.563 the best that one could achieve given the noisiness and/or neuron count of the Sensorium project? What if the team applied the pipeline to reconstruct the activity of a given artificial neural network's layer (e.g., some ResNet convolutional layer), using hidden units as proxies for neuronal calcium activity? In the revision, this concern was addressed nicely in the review in Supplementary Figure 3C. Also, one appreciates that as a follow up, the team produced error maps (New Figure 6) that highlight where in the frames the reconstructions are likely to fail. But the maps went unanalyzed further, and I am not sure if there was a systematic trend in the errors.
We are happy to hear that we were able to answer the reviewer’s question of what the maximum theoretical performance of our reconstruction process is in figure 3C. Regarding systematic trends in the error maps, we also did not observe any clear systematic trends. If anything, we noticed that some moving edges were shifted, but we do not think we can quantify this effect with this particular dataset.
(5) I was encouraged by Figure 4, which shows how the reconstructions succeeded or failed across different spatial frequencies. The authors note that "the reconstruction process failed at high spatial frequencies," yet it also appears to struggle with low spatial frequencies, as the reconstructed images did not produce smooth surfaces (e.g., see the top rows of Figures 4A and 4B). In regions where one would expect a single continuous gradient, the reconstructions instead display specular, high-frequency noise. This issue is difficult to overlook and might deserve further discussion.
Thank you for pointing this out, this is indeed true. The reconstructions do have high frequency noise. We mention this briefly in line 102 “Finally, we applied a 3D Gaussian filter with sigma 0.5 pixels to remove the remaining static noise (Figure S3) and applied the evaluation mask.” In revisiting this sentence, we think it is more appropriate to replace “remove” with “reduce”. This noise is more visible in the Gaussian noise stimuli (Figure 4) because we did not apply the 3D Gaussian filter to these reconstructions, in case it interfered with the estimates of the reconstruction resolution limits.
Given that the Gaussian noise and drifting grating stimuli reconstructions were from predicted activity (“noise-free”), this high-frequency noise is not biological in origin and must therefore come from errors in our reconstruction process. This kind of high-frequency noise has previously been observed in feature visualization (optimizing input to maximize the activity of a specific node within a neural network to visualize what that node encodes; Olah, et al., "Feature Visualization", https://distill.pub/2017/feature-visualization/, 2017). It is caused by a kind of overfitting, whereby a solution to the optimization is found that is not “realistic”. Ways of combating this kind of noise include gradient smoothing, image smoothing, and image transformations during optimization, but these methods can restrict the resolution of the features that are recovered. Since we were more interested in determining the maximum resolution of stimuli that can be reconstructed in Figure 4 and Supplementary Figures 5-6, we chose not to apply these methods.
Reviewer #3 (Public review):
Summary:
This paper presents a method for reconstructing input videos shown to a mouse from the simultaneously recorded visual cortex activity (two-photon calcium imaging data). The publicly available experimental dataset is taken from a recent brain-encoding challenge, and the (publicly available) neural network model that serves to reconstruct the videos is the winning model from that challenge (by distinct authors). The present study applies gradient-based input optimization by backpropagating the brain-encoding error through this selected model (a method that has been proposed in the past, with other datasets). The main contribution of the paper is, therefore, the choice of applying this existing method to this specific dataset with this specific neural network model. The quantitative results appear to go beyond previous attempts at video input reconstruction (although measured with distinct datasets). The conclusions have potential practical interest for the field of brain decoding, and theoretical interest for possible future uses in functional brain exploration.
Strengths:
The authors use a validated optimization method on a recent large-scale dataset, with a state-of-the-art brain encoding model. The use of an ensemble of 7 distinct model instances (trained on distinct subsets of the dataset, with distinct random initializations) significantly improves the reconstructions. The exploration of the relation between reconstruction quality and number of recorded neurons will be useful to those planning future experiments.
Weaknesses:
The main contribution is methodological, and the methodology combines pre-existing components without any new original component.
We thank the reviewer for their balanced assessment of our manuscript.
The following is the authors’ response to the original reviews.
Public Reviews:
Reviewer #1 (Public review):
Summary:
This paper presents a method for reconstructing videos from mouse visual cortex neuronal activity using a state-of-the-art dynamic neural encoding model. The authors achieve high-quality reconstructions of 10-second movies at 30 Hz from two-photon calcium imaging data, reporting a 2-fold increase in pixel-by-pixel correlation compared to previous methods. They identify key factors for successful reconstruction including the number of recorded neurons and model ensembling techniques.
Strengths:
(1) A comprehensive technical approach combining state-of-the-art neural encoding models with gradient-based optimization for video reconstruction.
(2) Thorough evaluation of reconstruction quality across different spatial and temporal frequencies using both natural videos and synthetic stimuli.
(3) Detailed analysis of factors affecting reconstruction quality, including population size and model ensembling effects.
(4) Clear methodology presentation with well-documented algorithms and reproducible code.
(5) Potential applications for investigating visual processing phenomena like predictive coding and perceptual learning.
We thank the reviewer for taking the time to provide this valuable feedback. We would like to add that in our eyes one additional main contribution is the step of going from reconstruction of static images to dynamic videos. We trust that in the revised manuscript, we have now made the point more explicit that static image reconstruction relies on temporally averaged responses, which negates the necessity of having to account for temporal dynamics altogether.
Weaknesses:
The main metric of success (pixel correlation) may not be the most meaningful measure of reconstruction quality:
High correlation may not capture perceptually relevant features.
Different stimuli producing similar neural responses could have low pixel correlations.
The paper doesn't fully justify why high pixel correlation is a valuable goal.
This is a very relevant point. In retrospect, perhaps we did not justify this enough. Sensory reconstruction typically aims to reconstruct sensory input based on brain activity as faithfully as possible. A brain-to-image decoder might therefore be trained to produce images as close to the original input as possible. The loss function to train the decoder would therefore be image similarity on the pixel level. In that case, evaluating reconstruction performance based on pixel correlation is somewhat circular.
However, when reconstructing videos, we optimize the input video in terms of its perceptual similarity to the original video and only then evaluate pixel-level similarity. The perceptual similarity metric we optimize for is the estimate of how the neurons in mouse V1 respond to that video. We then evaluate the similarity of this perceptually optimized video to the original input video with pixel-level correlation. In other words, we optimize for perceptual similarity and then evaluate pixel similarity. If our method optimized pixel-level similarity, then we would agree that perceptual similarity is a more relevant evaluation metric. We do not think it was clear in our original submission that our optimization loss function is a perceptual loss function, and have now made this clearer in Figure 1C-D and have clarified this in the results section, line 70:
“In effect, we optimized the input video to be perceptually similar with respect to the recorded neurons.”
And in line 110:
“Because our optimization of the movies was based on a perceptual loss function, we were interested in how closely these movies matched the originals on the pixel level.”
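The distinction drawn above (optimize the input so that predicted neural responses match the recorded ones, then evaluate at the pixel level) can be illustrated with a toy example. The sketch below is not the method used in the manuscript: it assumes a linear stand-in for the encoding model, a squared-error loss on the responses, and noise-free "recorded" activity. All names (`predict`, `reconstruct`, `weights`) are hypothetical.

```python
import random

def predict(weights, stimulus):
    """Toy linear encoder: one predicted response per neuron (row of weights)."""
    return [sum(w * s for w, s in zip(row, stimulus)) for row in weights]

def reconstruct(weights, recorded, n_pixels, steps=500, lr=0.3):
    """Gradient descent on the stimulus so predicted responses match recorded ones."""
    stimulus = [0.0] * n_pixels  # start from an all-zero input
    for _ in range(steps):
        errors = [p - r for p, r in zip(predict(weights, stimulus), recorded)]
        # Gradient of 0.5 * sum(errors^2) with respect to each pixel.
        grads = [sum(e * row[j] for e, row in zip(errors, weights))
                 for j in range(n_pixels)]
        stimulus = [s - lr * g for s, g in zip(stimulus, grads)]
    return stimulus

rng = random.Random(0)
n_neurons, n_pixels = 40, 8
# Assumed random encoder weights; a real encoding model is nonlinear and learned.
weights = [[rng.gauss(0, 1) / n_neurons ** 0.5 for _ in range(n_pixels)]
           for _ in range(n_neurons)]
true_stimulus = [rng.uniform(-1, 1) for _ in range(n_pixels)]
recorded = predict(weights, true_stimulus)  # noise-free stand-in for recorded activity

estimate = reconstruct(weights, recorded, n_pixels)
max_error = max(abs(a - b) for a, b in zip(estimate, true_stimulus))
print(f"largest pixel error: {max_error:.6f}")
```

Note that the loss never compares pixels: the stimulus estimate is only scored on the pixel level after optimization, which is the separation between optimization target and evaluation metric that the response describes.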
We chose to use pixel correlation to measure pixel-level similarity for several reasons. 1) It has been used in the past to evaluate reconstruction performance (Yoshida et al., 2020), 2) It is contrast and luminance insensitive, 3) correlation is a common metric so most readers will have an intuitive understanding of how it relates to the data.
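The second reason above, insensitivity to contrast and luminance, follows from the definition of Pearson correlation, which is unchanged by a positive rescaling and an offset of either input. A minimal sketch with made-up pixel values (the frame and the scaling factors are assumptions for illustration only):

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences of pixel values."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical flattened frame with pixel intensities in [0, 1].
original = [0.10, 0.40, 0.35, 0.80, 0.55, 0.20]
# The same frame with halved contrast and a luminance offset.
rescaled = [0.5 * p + 0.3 for p in original]

r_same = pearson(original, rescaled)
mse = sum((a - b) ** 2 for a, b in zip(original, rescaled)) / len(original)
print(f"correlation: {r_same:.6f}, mean squared error: {mse:.4f}")
```

The correlation stays at 1 while the mean squared error is nonzero, so a metric such as mean squared error would penalize a reconstruction for contrast or luminance mismatches that correlation ignores.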
To further highlight why pixel similarity might be interesting to visualize, we have included additional analysis in Figure 6 illustrating pixel-level differences between reconstructions from experimentally recorded activity and predicted activity.
We expect that the type of perceptual similarity the reviewer is alluding to is pretrained neural network image embedding similarity (Zhang et al., 2018: https://doi.org/10.48550/arXiv.1801.03924). While these metrics seem to match human perceptual similarity, it is unclear if they reflect mouse vision. We did try to compare the embedding similarity from pretrained networks such as VGG16, but got results suggesting the reconstructed frames were no more similar to the ground truth than random frames, which is obviously not true. This might be because the ground truth videos were too different in resolution from the training data of these networks and because these metrics are typically very sensitive to decreases in resolution.
The best alternative approach to evaluate mouse perceptual similarity would be to show the reconstructed videos to the same animals while recording the same neurons and to compare these neural activation patterns to those evoked by the original ground truth videos. This has been done for static images in the past: Cobos et al., bioRxiv 2022, found that static image reconstructions generated using gradient descent evoked more similar trial-averaged (40 trials) responses to those evoked by ground truth images compared to other reconstruction methods. Unfortunately, we are currently not able to perform these in vivo experiments, which is why we used publicly available data for the current paper. We plan to use this method in the future. But this method is also not flawless as it assumes that the average response to an image is the best reflection of how that image is represented, which may not be the case for an individual trial.
As far as we are aware, there is currently no method that, given a particular activity pattern in response to an image/video, can produce an image/video that induces a neural activity pattern that is closer to the original neural response than simply showing the same image/video again. Hypothetically, such a stimulus exists because of various visual processing phenomena we mention in our discussion (e.g., predictive coding and selective attention), which suggest that the image that is represented by a population of neurons likely differs from the original sensory input. In other words, what the brain represents is an interpretation of reality not a pure reflection. Experimentally verifying this is difficult, as these variations might be present on a single trial level. The first step towards establishing a method that captures the visual representation of a population of neurons is sensory reconstruction, where the aim is to get as close as possible to the original sensory input. We think pixel-level correlation is a stringent and interpretable metric for this purpose, particularly when optimizing for perceptual similarity rather than image similarity directly.
Comparison to previous work (Yoshida et al.) has methodological concerns: Direct comparison of correlation values across different datasets may be misleading; Large differences in the number of recorded neurons (10x more in the current study); Different stimulus types (dynamic vs static) make comparison difficult; No implementation of previous methods on the current dataset or vice versa.
Yes, we absolutely agree that direct comparison to previous static image reconstruction methods is problematic. We primarily do so because we think it is standard practice to give related baselines. We agree that direct comparison of the performance of video reconstruction methods to image reconstruction methods is not really possible. It does not make sense to train and apply a dynamic model on a static image data set where neural activity is time-averaged, as the temporal kernels could not be learned. Conversely, for a static model, which expects a single image as input and predicts time-averaged responses, it does not make sense to feed it a series of temporally correlated movie frames and to simply concatenate the resulting activity prediction. The static model would need to be substantially augmented to incorporate temporal dynamics, which in turn would make it a new method. This puts us in the awkward position of being expected to compare our video reconstruction performance to previous image reconstruction methods without a fair way of doing so. We have now added these caveats in line 119:
“However, we would like to stress that directly comparing static image reconstruction methods with movie reconstruction approaches is fundamentally problematic, as they rely on different data types both during training and evaluation (temporally averaged vs continuous neural activity, images flashed at fixed intervals vs continuous movies).”
We have also toned down the language emphasising the comparison to previous image reconstruction performance in the abstract, results, and conclusion.
Abstract: We removed “We achieve a ~2-fold increase in pixel-by-pixel correlation compared to previous state-of-the-art reconstructions of static images from mouse V1, while also capturing temporal dynamics.” and replaced with “We achieve a pixel-level correlation of 0.57 between the ground truth movie and the reconstructions from single-trial neural responses.”
Discussion: we removed “In conclusion, we reconstruct videos presented to mice based on the activity of neurons in the mouse visual cortex, with a ~2-fold improvement in pixel-by-pixel correlation compared to previous static image reconstruction methods.” and replaced with “In conclusion, we reconstruct videos presented to mice based on single-trial activity of neurons in the mouse visual cortex.”
We have also removed the performance table and have instead added supplementary figure 3 with in-depth comparison across different versions of our reconstruction method (variations of masking, ensembling, contrast & luminance matching, and Gaussian blurring).
Limited exploration of how the reconstruction method could provide insights into neural coding principles beyond demonstrating technical capability.
The aim of this paper was not to reveal principles of neural coding. Instead, we aimed to achieve the best possible performance of video reconstructions and to quantify the limitations. But to highlight its potential we have added two examples of how sensory reconstruction has been applied in human vision research in line 321:
“Although fMRI-based reconstruction techniques are starting to be used to investigate visual phenomena in humans (such as illusions [Cheng et al., 2023] and mental imagery [Shen et al., 2019; Koide-Majima et al., 2024; Kalantari et al., 2025]), visual processing phenomena are likely difficult to investigate using existing fMRI-based reconstruction approaches, due to the low spatial and temporal resolution of the data.”
We have also added a demonstration of how this method could be used to investigate which parts of a reconstruction from a single trial response differ from the model's prediction (Figure 6). We do this by calculating pixel-level differences between reconstructions from the recorded neural activity and reconstructions from the expected neural activity (predicted activity by the neural encoding model). Although difficult to interpret, this pixel-by-pixel error map could represent trial-by-trial deviations of the neural code from pure sensory representation. But at this point we cannot know whether these errors are nothing more than errors in the reconstruction process. To derive meaningful interpretations of these maps would require a substantial amount of additional work and in vivo experiments and so is outside the scope of this paper, but we include this additional analysis now to highlight a) why pixel-level similarity might be interesting to quantify and visualize and b) to demonstrate how video reconstruction could be used to provide insights into neural coding, namely as a tool to identify how sensory representations differ from a pure reflection of the visual input.
The claim that "stimulus reconstruction promises a more generalizable approach" (line 180) is not well supported with concrete examples or evidence.
What we mean by generalizable is the ability to apply reconstruction to novel stimuli, which is not possible for stimulus classification. We now explain this better in the paragraph in line 211:
“Stimulus identification, i.e. identifying the most likely stimulus from a constrained set, has been a popular approach for quantifying whether a population of neurons encodes the identity of a particular stimulus [Földiák, 1993, Kay et al., 2008]. This approach has, for instance, been used to decode frame identity within a movie [Deitch et al., 2021, Xia et al., 2021, Schneider et al., 2023, Chen et al.,2024]. Some of these approaches have also been used to reorder the frames of the ground truth movie [Schneider et al., 2023] based on the decoded frame identity. Importantly, stimulus identification methods are distinct from stimulus reconstruction where the aim is to recreate what the sensory content of a neuronal code is in a way that generalizes to new sensory stimuli [Rakhimberdina et al., 2021]. This is inherently a more demanding task because the range of possible solutions is much larger. Although stimulus identification is a valuable tool for understanding the information content of a population code, stimulus reconstruction could provide a more generalizable approach, because it can be applied to novel stimuli.”
None of the stimuli we reconstructed were in the training set of the model, i.e., all were novel. We have also toned down the claim: we have replaced “promises” with “could provide”.
The paper would benefit from addressing how the method handles cases where different stimuli produce similar neural responses, particularly for high-speed moving stimuli where phase differences might be lost in calcium imaging temporal resolution.
Thank you for this suggestion, we think this is a great question. Calcium dynamics are slow and some of the high temporal frequency information could indeed be lost, particularly phase information. In other words, when the stimulus has high temporal frequency information, it is harder to decode spatial information because of the slow calcium dynamics. Ideally, we would look at this effect using the drifting grating stimuli; however, this is problematic because we rely on predicted activity from the SOTA DNEM, and due to the dilation of the first convolution, the periodic grating stimulus causes aliasing. At 15Hz, when the temporal frequency of the stimulus is half the movie frame rate, the model is actually being given two static images, and so the predicted activity is the interleaved activity evoked by two static images. We therefore do not think using the grating stimuli is a good idea. But we have used the Gaussian stimuli as it is not periodic, and is therefore less of a problem.
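The aliasing argument above can be made concrete by listing the grating phase that each movie frame samples. The sketch below assumes only the 30 Hz frame rate mentioned in the reviews; the function name and the 5 Hz comparison frequency are ours and serve purely as illustration.

```python
FRAME_RATE_HZ = 30.0  # movie frame rate stated in the reviews

def sampled_phases(temporal_freq_hz, n_frames):
    """Phase (in cycles, wrapped to [0, 1)) of a drifting grating at each frame."""
    return [round((temporal_freq_hz * k / FRAME_RATE_HZ) % 1.0, 6)
            for k in range(n_frames)]

phases_15hz = sampled_phases(15.0, 6)  # half the frame rate
phases_5hz = sampled_phases(5.0, 6)    # slower grating for comparison
print("15 Hz:", phases_15hz)
print(" 5 Hz:", phases_5hz)
```

At 15 Hz the sampled phase alternates between two values, so the movie consists of two interleaved static images, as the response states. A slower grating samples a different phase on every frame within one cycle.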
We have now also reconstructed phase-inverted Gaussian noise stimuli and plotted the video correlation between the reconstructions from activity evoked by phase-inverted stimuli. On the one hand, we find that even for the fastest changing stimuli, the correlation between the reconstructions from phase-inverted stimuli is negative, meaning phase information is not lost at high temporal frequencies. On the other hand, for the highest spatial frequency stimuli, the correlation is positive. So, the predicted neural activity (and therefore the reconstructions) is phase-insensitive when the spatial frequency is higher than the reconstruction resolution limit we identified (spatial length constant of 1 pixel, or 3.38 degrees). Beyond this limit, the DNEM predicts activity in response to phase-inverted stimuli, which, when used for reconstruction, results in movies which are more similar to each other than to the stimulus that actually evokes them.
However, not all information is lost at these high spatial frequencies. If we plot the Shannon entropy in the spatial domain or the motion energy in the temporal domain, we find that even when the reconstructions fail to capture the stimulus at a pixel-specific level (spatial length constant of 1 pixel, or 3.38 degrees), they do capture the general spatial and temporal qualities of the videos.
We have added these additional analyses to Figure 4 and Supplementary Figure 5.
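For readers unfamiliar with the entropy measure mentioned above, one common way to compute Shannon entropy in the spatial domain is from the histogram of pixel intensities within a frame. The response does not state how the entropy was computed, so the binning scheme, the number of bins, and the example frames below are all assumptions for illustration.

```python
import math
from collections import Counter

def shannon_entropy(pixels, n_bins=8):
    """Entropy (in bits) of the intensity histogram of pixels in [0, 1]."""
    bins = Counter(min(int(p * n_bins), n_bins - 1) for p in pixels)
    total = len(pixels)
    return -sum((c / total) * math.log2(c / total) for c in bins.values())

flat_frame = [0.5] * 64                            # uniform gray frame
textured_frame = [(i % 8) / 8 for i in range(64)]  # eight equally used intensity levels

h_flat = shannon_entropy(flat_frame)
h_textured = shannon_entropy(textured_frame)
print(f"flat: {h_flat:.2f} bits, textured: {h_textured:.2f} bits")
```

Because this measure depends only on the distribution of intensities and not on where each intensity occurs, two frames can have similar entropy while having a low pixel-level correlation, which is consistent with the point that general spatial qualities can be captured even when pixel-specific structure is not.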
Reviewer #2 (Public review):
This is an interesting study exploring methods for reconstructing visual stimuli from neural activity in the mouse visual cortex. Specifically, it uses a competition dataset (published in the Dynamic Sensorium benchmark study) and a recent winning model architecture (DNEM, dynamic neural encoding model) to recover visual information stored in ensembles of the mouse visual cortex.
This is a great project - the physiological data were measured at a single-cell resolution, the movies were reasonably naturalistic and representative of the real world, the study did not ignore important correlates such as eye position and pupil diameter, and of course, the reconstruction quality exceeded anything achieved by previous studies. Overall, it is great that teams are working towards exploring image reconstruction. Arguably, reconstruction may serve as an endgame method for examining the information content within neuronal ensembles - an alternative to training interminable numbers of supervised classifiers, as has been done in other studies. Put differently, if a reconstruction recovers a lot of visual features (maybe most of them), then it tells us a lot about what the visual brain is trying to do: to keep as much information as possible about the natural world in which its internal motor circuits may act consequently.
While we enjoyed reading the manuscript, we admit that the overall advance was in the range of those that one finds in a great machine learning conference proceedings paper. More specifically, we found no major technical flaws in the study, only a few potential major confounds (which should be addressable with new analyses), and the manuscript did not make claims that were not supported by its findings, yet the specific conceptual advance and significance seemed modest. Below, we will go through some of the claims, and ask about their potential significance.
We thank the reviewer for the positive feedback on our paper.
(1) The study showed that it could achieve high-quality video reconstructions from mouse visual cortex activity using a neural encoding model (DNEM), recovering 10-second video sequences and approaching a two-fold improvement in pixel-by-pixel correlation over previous attempts. As a reader, I am left with the question: okay, does this mean that we should all switch to DNEM for our investigations of the mouse visual cortex? What makes this encoding model special? It is introduced as "a winning model of the Sensorium 2023 competition which achieved a score of 0.301... single-trial correlation between predicted and ground truth neuronal activity," but as someone who does not follow this competition (most eLife readers are not likely to do so, either), I do not know how to gauge my response. Is this impressive? What is the best achievable score, in theory, given data noise? Is the model inspired by the mouse brain in terms of mechanisms or architecture, or was it optimized to win the competition by overfitting it to the nuances of the data set? Of course, I know that as a reader, I am invited to read the references, but the study would stand better on its own if it clarified how its findings depended on this model.
This is a very good point. We do not think that everyone should switch to using this particular DNEM to investigate the mouse visual cortex, but we think DNEMs and stimulus reconstruction in general have a lot of potential. We think static neural encoding models have already been demonstrated to be an extremely valuable tool to investigate visual coding (Walker et al., 2019; Yoshida et al., 2021; Willeke et al., bioRxiv 2023). DNEMs are less common, largely because they are very large and are technically more demanding to train and use. That makes static encoding models more practical for some applications, but they do not have temporal kernels and are therefore only used for static stimuli. They cannot, for instance, encode direction tuning, only orientation tuning. But both static and dynamic encoding models have advantages over stimulus classification methods which we outline in our discussion. Here we provide the first demonstration that previous achievements in static image reconstruction are transferable to movies.
It has been shown in the past for static neural encoding models that choosing a better-performing model produces reconstructed static images that are closer to the original image (Pierzchlewicz et al., 2023). The factors in choosing this particular DNEM were its capacity to predict neural activity (benchmarked against other models), it was open source, and the data it was designed for was also available.
To give more context to the model used in the paper, we have included the following, line 348:
“This model achieved an average single-trial correlation between predicted and ground truth neural activity of 0.291 during the competition, this was later improved to 0.301. The competition benchmark models achieved 0.106, 0.164 and 0.197 single-trial correlation, while the third and second place models achieved 0.243 and 0.265. Across the models, a variety of architectural components were used, including 2D and 3D convolutional layers, recurrent layers, and transformers, to name just a few.”
Concerning biologically inspired model design. The winning model contained 3 fully connected layers comprising the “Cortex” just before the final readout of neural activity, but we would consider this level of biological inspiration as minor. We do not think that the exact architecture of the model is particularly important, as the crucial aspect of such neural encoders is their ability to predict neural activity irrespective of how they achieve it. There has been a move towards creating foundation models of the brain (Wang et al., 2025) and the priority so far has been on predictive performance over mechanistic interpretability or similarity to biological structures and processes.
Finally, we would like to note that we do not know what the maximum theoretical score for single-trial responses might be, and don't think there is a good way of estimating it in this context.
(2) Along those lines, two major conclusions were that "critical for high-quality reconstructions are the number of neurons in the dataset and the use of model ensembling." If true, then these principles should be applicable to networks with different architectures. How well can they do with other network types?
This is a good question. Our method critically relies on the accurate prediction of neural activity in response to new videos. It is therefore expected that a model that better predicts neural responses to stimuli will also be better at reconstructing those stimuli given population activity. This was previously shown for static images (Pierzchlewicz et al., 2023). It is also expected that whenever the neural activity is accurately predicted, the corresponding reconstructed frames will also be more similar to the ground truth frames. We have now demonstrated this relationship between prediction accuracy and reconstruction accuracy in supplementary figure 4.
Although it would be interesting to compare the movie reconstruction performance of many different models with different architectures and activity prediction performances, this would involve quite substantial additional work because movie reconstruction is very resource- and time-intensive. Finding optimal hyperparameters to make such a comparison fair and informative would therefore be impractical and likely not yield surprising results.
We also think it is unlikely that ensembling would not improve reconstruction performance in other models because ensembling across model predictions is a common way of improving single-model performance in machine learning. Likewise, we think it is unlikely that the relationship between neural population size and reconstruction performance would differ substantially when using different models, because using more neurons means that a larger population of noisy neurons is “voting” on what the stimulus is. However, we would expect that if the model were worse at predicting neural activity, then more neurons are needed for an equivalent reconstruction performance. In general, we would recommend choosing the best possible DNEM available, in terms of neural activity prediction performance, when reconstructing movies using input optimization through gradient descent.
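The "voting" intuition above can be illustrated with a toy simulation. This is only a sketch under strong assumptions that are not part of our pipeline: linear neurons with random tuning (`weights`), additive Gaussian noise, and a least-squares decoder stand in for the DNEM and for gradient-based input optimization. All names and parameter values are hypothetical.

```python
import numpy as np

def decode_stimulus(weights, responses):
    """Least-squares estimate of the stimulus from one noisy population response."""
    estimate, *_ = np.linalg.lstsq(weights, responses, rcond=None)
    return estimate

rng = np.random.default_rng(0)
n_pixels, noise_sd = 20, 5.0          # illustrative values, not from the paper
stimulus = rng.normal(size=n_pixels)

correlations = {}
for n_neurons in (25, 100, 1600):
    # Hypothetical linear tuning: each row is one neuron's weights over pixels.
    weights = rng.normal(size=(n_neurons, n_pixels))
    # Single-trial response: tuned drive plus independent noise per neuron.
    responses = weights @ stimulus + noise_sd * rng.normal(size=n_neurons)
    estimate = decode_stimulus(weights, responses)
    correlations[n_neurons] = np.corrcoef(estimate, stimulus)[0, 1]

for n_neurons, corr in correlations.items():
    print(f"{n_neurons:5d} neurons: stimulus correlation {corr:.3f}")
```

In this toy setting the independent noise averages out as neurons are added, so the largest population should recover the stimulus most faithfully. How steeply this saturates in real data depends on noise correlations and tuning redundancy, which the sketch does not model.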
(3) One major claim was that the quality of the reconstructions depended on the number of neurons in the dataset. There were approximately 8000 neurons recorded per mouse. The correlation difference between the reconstruction achieved by 1 neuron and 8000 neurons was ~0.2. Is that a lot or a little? One might hypothesize that ~7,999 additional neurons could contribute more information, but perhaps, those neurons were redundant if their receptive fields were too close together or if they had the same orientation or spatiotemporal tuning. How correlated were these neurons in response to a given movie? Why did so many neurons offer such a limited increase in correlation?
In the population ablation experiments, we compared the performance using ~1000, ~2000, ~4000, and ~8000 neurons and found an attenuation of 39.5% in video correlation when dropping 87.5% of the neurons (~1000 neurons remaining). We did not try reconstruction using just 1 neuron.
(4) On a related note, the authors address the confound of RF location and extent. The study resorted to the use of a mask on the image during reconstruction, applied during training and evaluation (Line 87). The mask depends on pixels that contribute to the accurate prediction of neuronal activity. The problem for me is that it reads as if the RF/mask estimate was obtained during the very same process of reconstruction optimization, which could be considered a form of double-dipping (see the "Dead salmon" article, https://doi.org/10.1016/S1053-8119(09)71202-9). This could inflate the reconstruction estimate. My concern would be ameliorated if the mask was obtained using a held-out set of movies or image presentations; further, the mask should shift with eye position, if it indeed corresponded to the "collective receptive field of the neural population." Ideally, the team would also provide the characteristics of these putative RFs, such as their weight and spatial distribution, and whether they matched the biological receptive fields of the neurons (if measured independently).
We can reassure the reviewer that there is no double-dipping. We would like to clarify that the mask was trained only on videos from the training set of the DNEM and not the videos which were reconstructed. We have added the sentence, line 91:
“None of the reconstructed movies were used in the optimization of this transparency mask.”
Making the mask dependent on eye position would be difficult to implement with the current DNEM, where eye position is fed to the model as an additional channel. When using a model where the image is first transformed into retinotopic coordinates in an eye position-dependent manner (such as in Wang et al., 2025) the mask could be applied in retinotopic coordinates and therefore be dependent on eye position.
Effectively, the alpha mask defines the relative level of influence each pixel contributes to neural activity prediction. We agree it is useful to compare the shape of the alpha mask with the location of traditional on-off receptive fields (RFs) to clarify what the alpha mask represents and characterise the neural population available for our reconstructions. We therefore presented the DNEM with on-off patches to map the receptive fields of single neurons in an in silico experiment (the experimentally derived RFs are not available). As expected, there is a rough overlap between the alpha mask (Supplementary Figure 2D), the average population receptive field (Supplementary Figure 2B), and the location of receptive field peaks (Supplementary Figure 2C). In principle, all three could be used during training or evaluation for masking, but we think that defining a mask based on the general influence of images on neural activity, rather than just on responses to on-off patches, is a more elegant solution.
One idea of how to go a step further would be to first set the alpha mask threshold during training based on the % loss of neural activity prediction performance that threshold induces (in our case alpha=0.5 corresponds to ~3% loss in correlation between predicted vs recorded neural responses, see Supplementary Figure 3D), and second base the evaluation mask on a pixel correlation threshold (see example pixel correlation map in Supplementary Figure 2E) instead to avoid evaluating areas of the image with low image reconstruction confidence.
We referred to this figure in the result section, line 83:
“The transparency masks are aligned with but not identical to the On-Off receptive field distribution maps using sparse-noise (Figure S2).”
We have also done additional analysis on the effect of masking during training and evaluation with different thresholds in Supplementary Figure 3.
(5) We appreciated the experiments testing the capacity of the reconstruction process, by using synthetic stimuli created under a Gaussian process in a noise-free way. But this further raised questions: what is the theoretical capability for the reconstruction of this processing pipeline, as a whole? Is 0.563 the best that one could achieve given the noisiness and/or neuron count of the Sensorium project? What if the team applied the pipeline to reconstruct the activity of a given artificial neural network's layer (e.g., some ResNet convolutional layer), using hidden units as proxies for neuronal calcium activity?
That’s a very interesting point. It is very hard to know what the theoretical best reconstruction performance of the model would be. Reconstruction performance could be decreased due to neural variability, experimental noise, the temporal kernel of the calcium indicator and the imaging frame rate, information compression along the visual hierarchy, visual processing phenomena (such as predictive coding and selective attention), failure of the model to predict neural activity correctly, or failure of the reconstruction process to find the best possible image which explains the neural activity. We don't think we can disentangle the contribution of all these sources, but we can provide a theoretical maximum assuming that the model and the reconstruction process are optimal. To that end, we performed additional simulations and reconstructed the natural videos using the predicted activity of the neurons in response to the natural videos as the target (similar to the synthetic stimuli) and got a correlation of 0.766. So, the single trial performance of 0.569 is ~75% of this theoretical maximum. This difference can be interpreted as a combination of the losses due to neuronal variability, measurement noise, and actual deviations in the images represented by the brain compared to reality.
We thank the reviewer for this suggestion, as it gave us the idea of looking at error maps (Figure 6), where the pixel-level deviation of the reconstructions from recorded vs predicted activity is overlaid on the ground truth movie.
(6) As the authors mentioned, this reconstruction method provided a more accurate way to investigate how neurons process visual information. However, this method consisted of two parts: one was the state-of-the-art (SOTA) dynamic neural encoding model (DNEM), which predicts neuronal activity from the input video, and the other part reconstructed the video to produce a response similar to the predicted neuronal activity. Therefore, the reconstructed video was related to neuronal activity through an intermediate model (i.e., SOTA DNEM). If one observes a failure in reconstructing certain visual features of the video (for example, high-spatial frequency details), the reader does not know whether this failure was due to a lack of information in the neural code itself or a failure of the neuronal model to capture this information from the neural code (assuming a perfect reconstruction process). Could the authors address this by outlining the limitations of the SOTA DNEM encoding model and disentangling failures in the reconstruction from failures in the encoding model?
To test if a better neural prediction by the DNEM would result in better reconstructions, we ran additional simulations and now show that neural activity prediction performance correlates with reconstruction performance (Supplementary Figure 4B). This is consistent with Pierzchlewicz et al. (2023), who showed that static image reconstructions using better encoding models lead to better reconstruction performance. As also mentioned in the answer to the previous comment, untangling the relative contributions of reconstruction losses is hard, but we think that improvements to the DNEM performance are key. Suggestions for improving the DNEM we used would be to translate the input image into retinotopic coordinates and shift this image relative to eye position before passing it to the first convolutional layer (as is done in Wang et al., 2025), to use movies that are not spatially downsampled as heavily, to not use a dilation of 2 in the temporal convolution of the first layer, and to train on a larger dataset.
(7) The authors mentioned that a key factor in achieving high-quality reconstructions was model ensembling. However, this averaging acts as a form of smoothing, which reduces the reconstruction's acuity and may limit the high-frequency content of the videos (as mentioned in the manuscript). This averaging constrains the tool's capacity to assess how visual neurons process the low-frequency content of visual input. Perhaps the authors could elaborate on potential approaches to address this limitation, given the critical importance of high-frequency visual features for our visual perception.
This is exactly what we also thought. To answer this point more specifically, we ran additional simulations where we also reconstruct the movies using gradient ensembling instead of reconstruction ensembling. Here, the gradients of the loss with respect to each pixel of the movie are calculated for each of the model instances and are averaged at every iteration of the reconstruction optimization. In essence, this means that one reconstruction solution is found, and the averaging across reconstructions, which could degrade high-frequency content, is skipped. The reconstructions from both methods look very similar, and the video correlation is, if anything, slightly worse (Supplemental Figure 3A&C). This indicates that our original ensembling approach did not limit reconstruction performance, but that both approaches can be used, depending on what is more convenient given hardware restrictions.
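The difference between the two ensembling schemes can be sketched in a few lines. This is a toy illustration, not our implementation: each "model instance" is assumed to be a linear encoder `W` with a squared-error loss (the DNEM is a deep network trained with its own loss), and the step size, iteration count, and perturbation scale are hypothetical.

```python
import numpy as np

def loss_grad(W, x, target):
    """Gradient of 0.5 * ||W @ x - target||^2 with respect to the input x."""
    return W.T @ (W @ x - target)

def reconstruction_ensembling(models, target, n_pixels, steps=200, lr=0.1):
    """Optimize one input per model instance, then average the solutions."""
    solutions = []
    for W in models:
        x = np.zeros(n_pixels)
        for _ in range(steps):
            x -= lr * loss_grad(W, x, target)
        solutions.append(x)
    return np.mean(solutions, axis=0)

def gradient_ensembling(models, target, n_pixels, steps=200, lr=0.1):
    """Average the gradients across instances at every step; one solution."""
    x = np.zeros(n_pixels)
    for _ in range(steps):
        grad = np.mean([loss_grad(W, x, target) for W in models], axis=0)
        x -= lr * grad
    return x

rng = np.random.default_rng(0)
n_pixels, n_neurons, n_models = 16, 200, 7
stimulus = rng.normal(size=n_pixels)
base = rng.normal(size=(n_neurons, n_pixels)) / np.sqrt(n_neurons)
# Hypothetical model instances: noisy copies of one underlying encoder.
models = [base + 0.05 * rng.normal(size=base.shape) for _ in range(n_models)]
# Noisy single-trial "recorded" activity used as the optimization target.
target = base @ stimulus + 0.1 * rng.normal(size=n_neurons)

rec_avg = reconstruction_ensembling(models, target, n_pixels)
rec_grad = gradient_ensembling(models, target, n_pixels)
corr_avg = np.corrcoef(rec_avg, stimulus)[0, 1]
corr_grad = np.corrcoef(rec_grad, stimulus)[0, 1]
print(f"reconstruction ensembling: {corr_avg:.3f}, gradient ensembling: {corr_grad:.3f}")
```

In reconstruction ensembling the instances can be run independently, and their optimization states never need to be held together. Gradient ensembling needs all instances available at every iteration, which is the hardware trade-off mentioned above.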
Reviewer #3 (Public review):
Summary:
This paper presents a method for reconstructing input videos shown to a mouse from the simultaneously recorded visual cortex activity (two-photon calcium imaging data). The publicly available experimental dataset is taken from a recent brain-encoding challenge, and the (publicly available) neural network model that serves to reconstruct the videos is the winning model from that challenge (by distinct authors). The present study applies gradient-based input optimization by backpropagating the brain-encoding error through this selected model (a method that has been proposed in the past, with other datasets). The main contribution of the paper is, therefore, the choice of applying this existing method to this specific dataset with this specific neural network model. The quantitative results appear to go beyond previous attempts at video input reconstruction (although measured with distinct datasets). The conclusions have potential practical interest for the field of brain decoding, and theoretical interest for possible future uses in functional brain exploration.
Strengths:
The authors use a validated optimization method on a recent large-scale dataset, with a state-of-the-art brain encoding model. The use of an ensemble of 7 distinct model instances (trained on distinct subsets of the dataset, with distinct random initializations) significantly improves the reconstructions. The exploration of the relation between reconstruction quality and the number of recorded neurons will be useful to those planning future experiments.
Weaknesses:
The main contribution is methodological, and the methodology combines pre-existing components without any new original components.
We thank the reviewer for taking the time to review our paper and for their overall positive assessment. We would like to emphasise that combining pre-existing machine learning techniques to achieve top results in a new modality does require iteration and innovation. While gradient-based input optimization by backpropagating the brain-encoding error through a neural encoding model has been used in 2D static image optimization to generate maximally exciting images and reconstruct static images, we are the first to have applied it to movies which required accounting for the time domain. Previous methods used time averaged responses and were limited to the reconstruction of static images presented with fixed image intervals.
The movie reconstructions include a learned "transparency mask" to concentrate on the most informative area of the frame; it is not clear how this choice impacts the comparison with prior experiments. Did they all employ this same strategy? If not, shouldn't the quantitative results also be reported without masking, for a fair comparison?
Yes, absolutely. All reconstruction approaches limit the field of view in some way, whether this is due to the size of the screen, the size of the image on the screen, or cropping of the presented/reconstructed images during analysis due to the retinotopic coverage of the recorded neurons. Note that we reconstruct a larger field of view than Yoshida et al. In Yoshida et al., the reconstructed field of view was 43 by 43 retinal degrees. We show the size of an example evaluation mask in comparison.
To address the reviewer’s concern more specifically, we performed additional simulations and now also show the performance using a variety of different training and evaluation masks, including different alpha thresholds for training and evaluation masks as well as the effective retinotopic coverage at different alpha thresholds. Despite these comparisons, we would also like to highlight that the comparison to the benchmark is problematic itself. This is because image and movie reconstruction are not directly comparable. It does not make sense to train and apply a dynamic model on a static image dataset where neural activity is time averaged. Conversely, it does not make sense to train or apply a static model that expects time-averaged neural responses on continuous neural activity unless it is substantially augmented to incorporate temporal dynamics, which in turn would make it a new method. This puts us in the awkward position of being expected to compare our video reconstruction performance to previous image reconstruction methods without a fair way of doing so. We have therefore de-emphasised the phrasing comparing our method to previous publications in the abstract, results, and discussion.
Abstract: replaced “We achieve a ~2-fold increase in pixel-by-pixel correlation compared to previous state-of-the-art reconstructions of static images from mouse V1, while also capturing temporal dynamics.” with “We achieve a pixel-level correlation of 0.57 between the ground truth movie and the reconstructions from single-trial neural responses.”
Results: “This represents a ~2x higher pixel-level correlation over previous single-trial static image reconstructions from V1 in awake mice (image correlation 0.238 +/- 0.054 s.e.m for awake mice) [Yoshida et al., 2020] over a similar retinotopic area (~43° x 43°) while also capturing temporal dynamics. However, we would like to stress that directly comparing static image reconstruction methods with movie reconstruction approaches is fundamentally problematic, as they rely on different data types both during training and evaluation (temporally averaged vs continuous neural activity, images flashed at fixed intervals vs continuous movies).”
Discussion: replaced “In conclusion, we reconstruct videos presented to mice based on the activity of neurons in the mouse visual cortex, with a ~2-fold improvement in pixel-by-pixel correlation compared to previous static image reconstruction methods.” with “In conclusion, we reconstruct videos presented to mice based on single-trial activity of neurons in the mouse visual cortex.”
We have also removed the performance table and have instead added supplementary figure 3 with in-depth comparison across different versions of our reconstruction method (variations of masking, ensembling, contrast & luminance matching, and Gaussian blurring).
We believe that we have given enough information in our paper now so that readers can make an informed decision whether our movie reconstruction method is appropriate for the questions they are interested in.
Recommendations for the authors:
Reviewer #2 (Recommendations for the authors):
(1) "Reconstructions have been luminance (mean pixel value across video) and contrast (standard deviation of pixel values across video) matched to ground truth." This was not clear: was it done by the investigating team? I imagine that one of the most easily captured visual features is luminance and contrast, why wouldn't the optimization titrate these well?
The contrast and luminance matching of the reconstructions to the ground truth videos was done by us, but this was only done to help readers assess the quality of the reconstructions by eye. Our performance metrics (frame and video correlation) are contrast and luminance insensitive. To clarify this, we have also added examples of non-adjusted frames in Supplementary Figure 3A, and added a sentence in the results, line 103:
“When presenting videos in this paper we normalize the mean and standard deviation of the reconstructions to the average and standard deviation of the corresponding ground truth movie before applying the evaluation masks, but this is not done for quantification except in Supplementary Figure 3D.”
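A short sketch of why this matching does not change a correlation-based metric: rescaling a reconstruction to the mean and standard deviation of the ground truth is a positive affine transform, and Pearson correlation is invariant to it. The function names and the toy data below are illustrative assumptions, not our analysis code.

```python
import numpy as np

def match_luminance_contrast(recon, truth):
    """Rescale recon so its mean and standard deviation equal those of truth."""
    z = (recon - recon.mean()) / recon.std()
    return z * truth.std() + truth.mean()

def video_correlation(a, b):
    """Pearson correlation across all pixels and frames of two videos."""
    return np.corrcoef(a.ravel(), b.ravel())[0, 1]

rng = np.random.default_rng(1)
truth = rng.uniform(0, 255, size=(10, 8, 8))          # frames x height x width
# Toy reconstruction with the wrong luminance and contrast plus pixel noise.
recon = 0.2 * truth + 30 + rng.normal(scale=10, size=truth.shape)

matched = match_luminance_contrast(recon, truth)
before = video_correlation(recon, truth)
after = video_correlation(matched, truth)
print(f"correlation before matching: {before:.4f}, after: {after:.4f}")
```

The two printed correlations should agree up to floating-point error, while the matched video has the luminance and contrast of the ground truth and is therefore easier to compare by eye.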
We were also initially surprised that contrast and luminance are not captured well by our reconstruction method, but this makes sense as V1 is largely luminance invariant (O’Shea et al., 2025 https://doi.org/10.1016/j.celrep.2024.115217 ) and contrast only has a gain effect on V1 activity (Tring et al., 2024 https://journals.physiology.org/doi/full/10.1152/jn.00336.2024). Decoding absolute contrast is likely unreliable because it is probably not the only factor modulating the overall gain of the neural population.
To address the reviewer’s comment more fully, we ran additional experiments. More specifically, to test why contrast and luminance are not recovered in the reconstructions, we checked how the predicted activity between the reconstruction and the contrast/luminance corrected reconstructions differs. Contrast and luminance adjustment had little impact on predicted response similarity on average. This makes the reconstruction optimization loss function insensitive to overall contrast and luminance, so they cannot be decoded. There is a small effect on activity correlation, however, so we cannot completely rule out that contrast and luminance could be reconstructed with a different loss function.
(2) The authors attempted to investigate the variability in reconstruction quality across different movies and 10-second snippets of a movie by correlating various visual features, such as video motion energy, contrast, luminance, and behavioral factors like running speed, pupil diameter, and eye movement, with reconstruction success. However, it would also be beneficial if the authors correlated the response loss (Poisson loss between neural responses) with reconstruction quality (video correlation) for individual videos, as these metrics are expected to be correlated if the reconstruction captures neural variance.
We thank the reviewer for this suggestion. We have now included this analysis and find that if the neural activity was better predicted by the DNEM then the reconstruction of the video was also more similar to the ground truth video. We further found that this effect is shift-dependent (in time), meaning the prediction of activity based on proximal video frames is more influential on reconstruction performance.
Reviewer #3 (Recommendations for the authors):
(1) I was confused about the choice of applying a transparency mask thresholded with alpha>0.5 during training and alpha>1 during evaluation. Why treat the two situations differently? Also, shouldn't we expect alpha to be in the [0,1] range, in which case, what is the meaning of alpha>1? (And finally, as already described in "Weaknesses", how does this choice impact the comparison with prior experiments? Did they also employ a similar masking strategy?)
We found that applying a mask during training increased performance regardless of the size of the evaluation mask. Using a less stringent mask during training than during evaluation increases performance slightly, but also allows inspection of the reconstruction in areas where the model will be less confident without sacrificing performance, if this is desired. The thresholds of 0.5 and 1 were chosen through trial and error, but the exact values do not hold intrinsic meaning. The alpha mask values can go above 1 during their optimization. We could have clipped alpha during the training procedure (algorithm 1), but we decided this was not worth redoing at this stage, as the alphas used for testing were not above 1. All reconstruction approaches in previous publications limit the field of view in some form, whether this is due to the size of the screen, the size of the image on the screen, or the cropping of the presented/reconstructed images during analysis.
To address the reviewer’s comment in detail, we have added extensive additional analysis to evaluate the coverage of the reconstruction achieved in this paper and how different masking strategies affect performance, as well as how the mask relates to more traditional receptive field mapping.
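To make the role of the evaluation threshold concrete, the following toy sketch computes a correlation restricted to pixels whose alpha exceeds a threshold. Everything here is assumed for illustration: the Gaussian-shaped `alpha` map (scaled so that its peak exceeds 1), the noise model in which reconstruction error grows where alpha is low, and the function name `masked_correlation`. It covers only masking at evaluation, not the masking applied during training.

```python
import numpy as np

def masked_correlation(recon, truth, alpha, threshold):
    """Pearson correlation restricted to pixels whose alpha exceeds the threshold."""
    keep = alpha > threshold                      # boolean mask, height x width
    return np.corrcoef(recon[:, keep].ravel(), truth[:, keep].ravel())[0, 1]

rng = np.random.default_rng(2)
height = width = 32
yy, xx = np.mgrid[:height, :width]
r2 = (yy - 15.5) ** 2 + (xx - 15.5) ** 2
alpha = 1.5 * np.exp(-r2 / (2 * 8.0 ** 2))        # toy mask; peak above 1 by construction

truth = rng.normal(size=(20, height, width))      # frames x height x width
noise_sd = 0.5 / (alpha + 0.05)                   # assumed: noisier where alpha is low
recon = truth + noise_sd * rng.normal(size=truth.shape)

n_loose = int((alpha > 0.5).sum())
n_strict = int((alpha > 1.0).sum())
corr_loose = masked_correlation(recon, truth, alpha, 0.5)
corr_strict = masked_correlation(recon, truth, alpha, 1.0)
print(f"alpha > 0.5: {n_loose} pixels, correlation {corr_loose:.3f}")
print(f"alpha > 1.0: {n_strict} pixels, correlation {corr_strict:.3f}")
```

Under these assumptions the stricter threshold keeps fewer pixels and scores higher, which is why the choice of evaluation mask needs to be reported alongside any correlation value.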
(2) I would not use the word "imagery" in the first sentence of the abstract, because this might be interpreted by some readers as reconstruction of mental imagery, a very distinct question.
We changed imagery to images in the abstract.
(3) Line 145-146: "<1 frame, or <30Hz" should be "<1 frame, or >30Hz".
We have corrected the error.
(4) Algorithm 1, Line 5, a subscript variable 'g' should be changed to 'h'
We have corrected the error.
Additional Changes
(1) Minor grammatical errors
(2) Addition of citations: We were previously not aware of a bioRxiv preprint from 2022 (Cobos et al., 2022), which used gradient descent-based input optimization to reconstruct static images but without the addition of a diffusion model. Instead, we had cited for this method Pierzchlewicz et al., 2023 bioRxiv/NeurIPS. In Cobos et al., 2022, they compare static image reconstruction similarity to ground truth images and the similarity of the in vivo evoked activity across multiple reconstruction methods. Performance values are only given for reconstructions from trial-averaged responses across ~40 trials (in the absence of original data or code we are also not able to retrospectively calculate single-trial performance). The authors find that optimizing for evoked activity rather than image similarity produces image reconstructions that evoke more similar in vivo responses compared to reconstructions optimized for image similarity itself. We have now added and discussed the citation in the main text.
(3) Workaround for error in the open-source code from https://github.com/lRomul/sensorium for the video hashing function in the SOTA DNEM: By checking the most correlated first frame for each reconstructed movie, we discovered there was a bug in the open-source code and 9/50 movies we originally used for reconstruction were not properly excluded from the training data between DNEM instances. The reason for this error was that some of the movies differ by only a few pixels, and the video hashing function used to split training and test set folds in the original DNEM code classified these movies as different and split them across folds. We have replaced these 9 movies and provide a figure below showing the next closest first frame for every movie clip we reconstruct. This does not affect our claims. Excluding these 9 movie clips did not affect the reconstruction performance (video correlation went from 0.563 to 0.568), so there was no overestimation of performance due to test set contamination. However, they should still be removed, so some of the values in the paper have changed slightly. The only statistical test that was affected was the correlation between video correlation and mean motion energy (Supplementary Figure 4A), which went from p = 0.043 to 0.071.
Author response image 2.
Exclusion of movie clips with duplicates in the DNEM training data. A) Example frame of a reconstructed movie (ground truth) and the most correlated first frame from the training data. B) All movie clips and their corresponding most correlated clip from the training data. Red boxes indicate excluded duplicates.
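The kind of check described in point (3) can be sketched as follows. This is a hedged illustration, not the code from the repository: SHA-256 over raw pixel bytes is used here only as a stand-in for an exact-content hash, the 0.99 correlation threshold is arbitrary, and the frames are random toy data.

```python
import hashlib
import numpy as np

def frame_hash(frame):
    """Exact-content hash: any single-pixel change yields a different digest."""
    return hashlib.sha256(np.ascontiguousarray(frame).tobytes()).hexdigest()

def most_correlated_training_frame(test_frame, training_frames):
    """Return (index, correlation) of the training frame most similar to test_frame."""
    flat = test_frame.astype(float).ravel()
    corrs = [np.corrcoef(flat, f.astype(float).ravel())[0, 1] for f in training_frames]
    best = int(np.argmax(corrs))
    return best, corrs[best]

rng = np.random.default_rng(3)
training_frames = [rng.integers(0, 256, size=(36, 64)).astype(np.uint8) for _ in range(5)]
test_frame = training_frames[2].copy()
test_frame[0, :3] ^= 1                      # differ from the original in only three pixels

same_hash = frame_hash(test_frame) == frame_hash(training_frames[2])
best_index, best_corr = most_correlated_training_frame(test_frame, training_frames)
is_near_duplicate = best_corr > 0.99        # illustrative threshold, not from the paper
print(f"hashes match: {same_hash}, closest training frame: {best_index}, "
      f"near duplicate: {is_near_duplicate}")
```

An exact hash treats the two frames as unrelated, whereas the correlation check flags them as near duplicates, which is why a similarity-based check is a useful safeguard when splitting folds.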
-
eLife Assessment
This important study demonstrates the significance of incorporating biological constraints in training neural networks to develop models that make accurate predictions under novel conditions. By comparing standard sigmoid recurrent neural networks (RNNs) with biologically constrained RNNs, the manuscript offers compelling evidence that biologically grounded inductive biases enhance generalization to perturbed conditions. This manuscript will appeal to a wide audience in systems and computational neuroscience.
-
Reviewer #1 (Public review):
This manuscript introduces a biologically informed RNN (bioRNN) that predicts the effects of optogenetic perturbations in both synthetic and in vivo datasets. By comparing standard sigmoid RNNs (σRNNs) and bioRNNs, the authors make a compelling case that biologically grounded inductive biases improve generalization to perturbed conditions. This work is innovative, technically strong, and grounded in relevant neuroscience, particularly the pressing need for data-constrained models that generalize causally.
Comments on revisions:
The authors have addressed all my concerns.
-
Reviewer #2 (Public review):
Sourmpis et al. present a study in which the importance of including certain inductive biases in the fitting of recurrent networks is evaluated with respect to the generalization ability of the networks when exposed to untrained perturbations.
The work proceeds in three stages:
(i) a simple illustration of the problem is made. Two reference (ground-truth) networks with qualitatively different connectivity, but similar observable network dynamics, are constructed, and recurrent networks with varying aspects of design similarity to the reference networks are trained to reproduce the reference dynamics. The activity of these trained networks during untrained perturbations is then compared to the activity of the perturbed reference networks. It is shown that, of the design characteristics that were varied, the enforced sign (Dale's law) and locality (spatial extent) of efference were especially important.
(ii) The intuition from the constructed example is then extended to networks that have been trained to reproduce certain aspects of multi-region neural activity recorded from mice during a detection task with a working-memory component. A similar pattern is demonstrated, in which enforcing the sign and locality of efference in the fitted networks has an influence on the ability of the trained networks to predict aspects of neural activity during unseen (untrained) perturbations.
(iii) The authors then illustrate the relationship between the gradient of the motor readout of trained networks with respect to the net inputs to the network units, and the sensitivity of the motor readout to small perturbations of the input currents to the units, which (in vivo) could be controlled optogenetically. The paper is concluded with a proposed use for trained networks, in which the models could be analyzed to determine the most sensitive directions of the network and, during online monitoring, inform a targeted optogenetic perturbation to bias behavior.
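The relationship described in (iii) can be illustrated with a toy example. This is a sketch under stated assumptions and not the authors' bioRNN: the "network" is a single tanh layer followed by one recurrent step and a linear readout, and `J`, `w`, `u`, and the perturbation size `eps` are all hypothetical.

```python
import numpy as np

def readout(u, J, w):
    """Toy motor readout: one recurrent step followed by a linear readout."""
    return w @ np.tanh(J @ np.tanh(u))

def readout_gradient(u, J, w):
    """Analytic gradient of the readout with respect to the input currents u."""
    h = np.tanh(u)
    z = np.tanh(J @ h)
    return (1 - h ** 2) * (J.T @ (w * (1 - z ** 2)))

rng = np.random.default_rng(4)
n_units, eps = 50, 1e-4
J = rng.normal(size=(n_units, n_units)) / np.sqrt(n_units)
w = rng.normal(size=n_units) / np.sqrt(n_units)
u = rng.normal(size=n_units)

grad = readout_gradient(u, J, w)
# Micro-perturbation along the gradient: the most sensitive direction to first order.
delta_grad = eps * grad / np.linalg.norm(grad)
# Control: a random perturbation with the same norm.
random_dir = rng.normal(size=n_units)
delta_rand = eps * random_dir / np.linalg.norm(random_dir)

effect_grad = readout(u + delta_grad, J, w) - readout(u, J, w)
effect_rand = readout(u + delta_rand, J, w) - readout(u, J, w)
predicted = grad @ delta_grad               # first-order prediction of the effect
print(f"gradient-aligned effect {effect_grad:.2e} (predicted {predicted:.2e}), "
      f"random effect {effect_rand:.2e}")
```

For small perturbations the change in the readout is well approximated by the inner product of the gradient with the perturbation, so a perturbation aligned with the gradient moves the readout more than a random perturbation of equal size. In a trained model, the same computation would be done with automatic differentiation rather than a hand-derived gradient.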
The authors do not overstate their claims, and in general, I find that I agree with their conclusions.
-
Author response:
The following is the authors’ response to the original reviews.
Public Reviews:
Reviewer #1 (Public review)
Major:
(1) In line 76, the authors make a very powerful statement: 'σRNN simulation achieves higher similarity with unseen recorded trials before perturbation, but lower than the bioRNN on perturbed trials.' I couldn't find a figure showing this. This might be buried somewhere and, in my opinion, deserves some spotlight - maybe a figure or even inclusion in the abstract.
We agree with the reviewer that these results are important. The failure of σRNN on perturbed data could be inferred from the former Figures 1E, 2C-E, and 3D. Following the reviewers' comments, we have tried to make this the most prominent message of Figure 1, in particular with the addition of the new panel E. We also moved Table 1 from the Supplementary to the main text to highlight this quantitatively.
(2) It's mentioned in the introduction (line 84) and elsewhere (e.g., line 259) that spiking has some advantage, but I don't see any figure supporting this claim. In fact, spiking seems not to matter (Figure 2C, E). Please clarify how spiking improves performance, and if it does not, acknowledge that. Relatedly, in line 246, the authors state that 'spiking is a better metric but not significant' when discussing simulations. Either remove this statement and assume spiking is not relevant, or increase the number of simulations.
We could not find the exact quote from the reviewer, and we believe that they intended to quote “spiking is better on all metrics, but without significant margins”. Indeed, spiking did not improve the fit significantly on perturbed trials; this is particularly true in comparison with the benefits of Dale’s law and local inhibition. As suggested by the reviewer, we rephrased the sentence containing this quote and, more generally, the corresponding paragraphs in the intro (lines 83-87) and in the results (lines 245-271). Our corrections in the results section are also intended to address the minor point (4) raised by the same reviewer.
(3) The authors prefer the metric of predicting hits over MSE, especially when looking at real data (Figure 3). I would bring the supplementary results into the main figures, as both metrics are very nicely complementary. Relatedly, why not add Pearson correlation or R2, and not just focus on MSE Loss?
For the in vivo data in Figure 3, the dataset does not include simultaneous electrophysiological recordings and optogenetic stimulation: the two were performed in different recording sessions. Therefore, we can only compare the effect of optogenetics on behavior, and we cannot compute the Pearson correlation or R2 of the perturbed network activity. To avoid ambiguity, we wrote “For the sessions of the in vivo dataset with optogenetic perturbation that we considered, only the behavior of an animal is recorded” on line 294.
(4) I really like the 'forward-looking' experiment in closed loop! But I felt that the relevance of micro perturbations is very unclear in the intro and results. This could be better motivated: why should an experimentalist care about this forward-looking experiment? Why exactly do we care about micro perturbation (e.g., in contrast to non-micro perturbation)? Relatedly, I would try to explain this in the intro without resorting to technical jargon like 'gradients'.
As suggested, we updated the last paragraph of the introduction (lines 88-95) to better motivate why algorithmically targeted acute spatio-temporal perturbations can be important for dissecting the function of neural circuits. We also added citations to recent studies with targeted in vivo optogenetic stimulation. As far as we know, previous work on targeted network stimulation mostly used linear models, whereas we used non-linear RNNs and their gradients.
Minor:
(1) In the intro, the authors refer to 'the field' twice. Personally, I find this term odd. I would opt for something like 'in neuroscience'.
We implemented the suggested change: l.27 and l.30
(2) Line 45: When referring to previous work using data-constrained RNN models, Valente et al. is missing (though it is well cited later when discussing regularization through low-rank constraints)
We added the citation: l.45
(3) Line 11: Method should be methods (missing an 's').
We fixed the typo.
(4) In line 250, starting with 'So far', is a strange choice of presentation order. After interpreting the results for other biological ingredients, the authors introduce a new one. I would first introduce all ingredients and then interpret. It's telling that the authors jump back to 2B after discussing 2C.
We restructured the last two paragraphs of section 2.1, and we hope that the presentation order is now more logical.
(5) The black dots in Figure 3E are not explained, or at least I couldn't find an explanation.
We added an explanation in the caption of Figure 3E.
Reviewer #2 (Public review):
(1) Some aspects of the methods are unclear. For comparisons between recurrent networks trained from randomly initialized weights, I would expect that many initializations were made for each model variant to be compared, and that the performance characteristics are constructed by aggregating over networks trained from multiple random initializations. I could not tell from the methods whether this was done or how many models were aggregated.
The reviewer's expectation is correct: we trained multiple models with different random seeds (affecting both the weight initialization and the noise of our model) for each variant and aggregated the results. We have now clarified this in Methods 4.6, lines 658-662.
(2) It is possible that including perturbation trials in the training sets would improve model performance across conditions, including held-out (untrained) perturbations (for instance, to units that had not been perturbed during training). It could be noted that if perturbations are available, their use may alleviate some of the design decisions that are evaluated here.
In general, we agree with the reviewer that including perturbation trials in the training set would likely improve model performance across conditions. One practical limitation that partially explains why we did not do so with our dataset is the small number of perturbed trials for each targeted cortical area: there are too few trials with light perturbations to robustly train and test our models.
More profoundly, to test hard generalizations to perturbations (aka perturbation testing), it will always be necessary that the perturbations are not trivially represented in the training data. Including perturbation trials during training would compromise our main finding: some biological model constraints improve the generalization to perturbation. To test this claim, it was necessary to keep the perturbations out of the training data.
We agree that including all available data of perturbed and non-perturbed recordings would be useful to build the best generalist predictive system. It could help, for instance, for closed-loop circuit control as we studied in Figure 5. Yet, there too, it will be important for the scientific validation process to always keep some causal perturbations of interest out of the training set. This is necessary to fairly measure the real generalization capability of any model. Importantly, this is why we think out-of-distribution “perturbation testing” is likely to have a recurring impact in the years to come, even beyond the case of optogenetic inactivation studied in detail in our paper.
Recommendation for the authors:
Reviewer #1 (Recommendation for the authors):
The code is not very easy to follow. I know this is a lot to ask, but maybe make clear where the code is to train the different models, which I think is a great contribution of this work? I predict that many readers will want to use the code and so this will improve the impact of this work.
We updated the code to make it easier to train a model from scratch.
Reviewer #2 (Recommendation for the authors):
The figures are really tough to read. Some of that small font should be sized up, and it's tough to tell in the posted paper what's happening in Figure 2B.
We updated Figures 1 and 2 significantly, in part to increase their readability. We also implemented the "Superficialities" suggestions.
eLife Assessment
This valuable study explores the role of the chromatin regulator ATAD2 in mouse spermatogenesis. The data convincingly demonstrate that ATAD2 is essential for proper chromatin remodeling in haploid spermatids, influencing gene accessibility, H3.3-mediated transcription, and histone eviction. Using Atad2 knockout (KO) mice, the authors link ATAD2 to the DNA-replication-independent incorporation of sperm-specific proteins like protamines and histone H3.3. Although the findings highlight chromatin abnormalities and impaired in vitro fertilization in KO mice, natural fertility remains unaffected, suggesting possible in vivo compensatory mechanisms. Future experiments will be needed to tease out the precise molecular role of ATAD2 in spermatogenesis. This work will be of interest to the epigenetics and developmental fields.
Reviewer #1 (Public review):
Summary:
The authors analyzed the expression of ATAD2 protein in post-meiotic stages and characterized the localization of various testis-specific proteins in the testis of the Atad2 knockout (KO). By cytological analysis as well as ATAC sequencing, the study showed increased levels of the HIRA histone chaperone, accumulation of histone H3.3 on post-meiotic nuclei, defective chromatin accessibility, and delayed deposition of protamines. Sperm from the Atad2 KO mice reduce the success of in vitro fertilization. The work was performed well, and most of the results are convincing. However, this manuscript does not suggest a molecular mechanism for how ATAD2 promotes the formation of testis-specific chromatin.
Strengths:
The paper describes the role of ATAD2 AAA+ ATPase in the proper localization of sperm-specific chromatin proteins such as protamine, suggesting the importance of the DNA replication-independent histone exchanges with the HIRA-histone H3.3 axis.
Weaknesses:
The work was performed well, and most of the results are convincing. However, this manuscript does not suggest a molecular mechanism for how ATAD2 promotes the formation of testis-specific chromatin.
Reviewer #2 (Public review):
Summary:
This manuscript by Liakopoulou et al. presents a comprehensive investigation into the role of ATAD2 in regulating chromatin dynamics during spermatogenesis. The authors elegantly demonstrate that ATAD2, via its control of histone chaperone HIRA turnover, ensures proper H3.3 localization, chromatin accessibility, and histone-to-protamine transition in post-meiotic male germ cells. Using a new well-characterized Atad2 KO mouse model, they show that ATAD2 deficiency disrupts HIRA dynamics, leading to aberrant H3.3 deposition, impaired transcriptional regulation, delayed protamine assembly, and defective sperm genome compaction. The study bridges ATAD2's conserved functions in embryonic stem cells and cancer to spermatogenesis, revealing a novel layer of epigenetic regulation critical for male fertility.
Strengths:
The MS provides the first demonstration of ATAD2's essential role in spermatogenesis, linking its expression in haploid spermatids to histone chaperone regulation by connecting ATAD2-dependent chromatin dynamics to gene accessibility (ATAC-seq), H3.3-mediated transcription, and histone eviction. Interestingly and surprisingly, sperm chromatin defects in Atad2 KO mice impair only in vitro fertilization but not natural fertility, suggesting unknown compensatory mechanisms in vivo.
Weaknesses:
The MS is robust and there are no major weaknesses.
The authors have addressed all the queries successfully.
Reviewer #3 (Public review):
Summary:
The authors generated knockout mice for Atad2, a conserved bromodomain-containing factor expressed during spermatogenesis. In Atad2 KO mice, HIRA, a chaperone for histone variant H3.3, was upregulated in round spermatids, accompanied by an apparent increase in H3.3 levels. Furthermore, the sequential incorporation and removal of TH2B and PRM1 during spermiogenesis were partially disrupted in the absence of ATAD2, possibly due to delayed histone removal. Despite these abnormalities, Atad2 KO male mice were able to produce offspring normally.
Strengths:
The manuscript addresses the biological role of ATAD2 in spermatogenesis using a knockout mouse model, providing a valuable in vivo framework to study chromatin regulation during male germ cell development. The observed redistribution of H3.3 in round spermatids is clearly presented and suggests a previously unappreciated role of ATAD2 in histone variant dynamics. The authors also document defects in the sequential incorporation and removal of TH2B and PRM1 during spermiogenesis, providing phenotypic insight into chromatin transitions in late spermatogenic stages. Overall, the study presents a solid foundation for further mechanistic investigation into ATAD2 function.
Weaknesses:
While the manuscript reports the gross phenotype of Atad2 KO mice, the findings remain largely superficial and do not convincingly demonstrate how ATAD2 deficiency affects chromatin.
Author response:
The following is the authors’ response to the original reviews.
Public Reviews:
Reviewer #1 (Public review):
Summary:
The authors analyzed the expression of ATAD2 protein in post-meiotic stages and characterized the localization of various testis-specific proteins in the testis of the Atad2 knockout (KO). By cytological analysis as well as ATAC sequencing, the study showed increased levels of the HIRA histone chaperone, accumulation of histone H3.3 on post-meiotic nuclei, defective chromatin accessibility, and delayed deposition of protamines. Sperm from the Atad2 KO mice reduce the success of in vitro fertilization. The work was performed well, and most of the results are convincing. However, this manuscript does not suggest a molecular mechanism for how ATAD2 promotes the formation of testis-specific chromatin.
We would like to take this opportunity to highlight that the present study builds on our previously published work, which examined the function of ATAD2 in both yeast S. pombe and mouse embryonic stem (ES) cells (Wang et al., 2021). In yeast, using genetic analysis we showed that inactivation of HIRA rescues defective cell growth caused by the absence of ATAD2. This rescue could also be achieved by reducing histone dosage, indicating that the toxicity depends on histone over-dosage, and that HIRA toxicity, in the absence of ATAD2, is linked to this imbalance.
Furthermore, HIRA ChIP-seq performed in mouse ES cells revealed increased nucleosome-bound HIRA, particularly around transcription start sites (TSS) of active genes, along with the appearance of HIRA-bound nucleosomes within normally nucleosome-free regions (NFRs). These findings pointed to ATAD2 as a major factor responsible for unloading HIRA from nucleosomes. This unloading function may also apply to other histone chaperones, such as FACT (see Wang et al., 2021, Fig. 4C).
In the present study, our investigations converge on the same ATAD2 function in the context of a physiologically integrated mammalian system: spermatogenesis. Indeed, in the absence of ATAD2, we observed H3.3 accumulation and enhanced H3.3-mediated gene expression. Consistent with this functional model of ATAD2 (unloading chaperones from histone- and non-histone-bound chromatin), we also observed defects in histone-to-protamine replacement.
Together, the results presented here and in Wang et al. (2021) reveal an underappreciated regulatory layer of histone chaperone activity. Previously, histone chaperones were primarily understood as factors that load histones. Our findings demonstrate that we must also consider a previously unrecognized regulatory mechanism that controls assembled histone-bound chaperones. This key point was clearly captured and emphasized by Reviewer #2 (see below).
Strengths:
The paper describes the role of ATAD2 AAA+ ATPase in the proper localization of sperm-specific chromatin proteins such as protamine, suggesting the importance of the DNA replication-independent histone exchanges with the HIRA-histone H3.3 axis.
Weaknesses:
(1) Some results lack quantification.
We will consider all the data and add appropriate quantifications where necessary.
(2) The work was performed well, and most of the results are convincing. However, this manuscript does not suggest a molecular mechanism for how ATAD2 promotes the formation of testis-specific chromatin.
Please see our comments above.
Reviewer #2 (Public review):
Summary:
This manuscript by Liakopoulou et al. presents a comprehensive investigation into the role of ATAD2 in regulating chromatin dynamics during spermatogenesis. The authors elegantly demonstrate that ATAD2, via its control of histone chaperone HIRA turnover, ensures proper H3.3 localization, chromatin accessibility, and histone-to-protamine transition in post-meiotic male germ cells. Using a new well-characterized Atad2 KO mouse model, they show that ATAD2 deficiency disrupts HIRA dynamics, leading to aberrant H3.3 deposition, impaired transcriptional regulation, delayed protamine assembly, and defective sperm genome compaction. The study bridges ATAD2's conserved functions in embryonic stem cells and cancer to spermatogenesis, revealing a novel layer of epigenetic regulation critical for male fertility.
Strengths:
The MS provides the first demonstration of ATAD2's essential role in spermatogenesis, linking its expression in haploid spermatids to histone chaperone regulation by connecting ATAD2-dependent chromatin dynamics to gene accessibility (ATAC-seq), H3.3-mediated transcription, and histone eviction. Interestingly and surprisingly, sperm chromatin defects in Atad2 KO mice impair only in vitro fertilization but not natural fertility, suggesting unknown compensatory mechanisms in vivo.
Weaknesses:
The MS is robust and there are no major weaknesses.
Reviewer #3 (Public review):
Summary:
The authors generated knockout mice for Atad2, a conserved bromodomain-containing factor expressed during spermatogenesis. In Atad2 KO mice, HIRA, a chaperone for histone variant H3.3, was upregulated in round spermatids, accompanied by an apparent increase in H3.3 levels. Furthermore, the sequential incorporation and removal of TH2B and PRM1 during spermiogenesis were partially disrupted in the absence of ATAD2, possibly due to delayed histone removal. Despite these abnormalities, Atad2 KO male mice were able to produce offspring normally.
Strengths:
The manuscript addresses the biological role of ATAD2 in spermatogenesis using a knockout mouse model, providing a valuable in vivo framework to study chromatin regulation during male germ cell development. The observed redistribution of H3.3 in round spermatids is clearly presented and suggests a previously unappreciated role of ATAD2 in histone variant dynamics. The authors also document defects in the sequential incorporation and removal of TH2B and PRM1 during spermiogenesis, providing phenotypic insight into chromatin transitions in late spermatogenic stages. Overall, the study presents a solid foundation for further mechanistic investigation into ATAD2 function.
Weaknesses:
While the manuscript reports the gross phenotype of Atad2 KO mice, the findings remain largely superficial and do not convincingly demonstrate how ATAD2 deficiency affects chromatin dynamics. Moreover, the phenotype appears too mild to elucidate the functional significance of ATAD2 during spermatogenesis.
We respectfully disagree with the statement that our findings are largely superficial. Based on our investigations of this factor over the years, it has become evident that ATAD2 functions as an auxiliary factor that facilitates mechanisms controlling chromatin dynamics (see, for example, Morozumi et al., 2015). These mechanisms can still occur in the absence of ATAD2, but with reduced efficiency, which explains the mild phenotype we observed.
This function, while not essential, is nonetheless an integral part of the cell’s molecular biology and should be studied and brought to the attention of the broader biological community, just as we study essential factors. Unfortunately, the field has tended to focus primarily on core functional actors, often overlooking auxiliary factors. As a result, our decade-long investigations into the subtle yet important roles of ATAD2 have repeatedly been met with skepticism regarding its functional significance, which has in turn influenced editorial decisions.
We chose eLife as the venue for this work specifically to avoid such editorial barriers and to emphasize that facilitators of essential functions do exist. They deserve to be investigated, and the underlying molecular regulatory mechanisms must be understood.
(1) Figures 4-5: The analyses of differential gene expression and chromatin organization should be more comprehensive. First, Venn diagrams comparing the sets of significantly differentially expressed genes between this study and previous work should be shown for each developmental stage. Second, given the established role of H3.3 in MSCI, the effect of Atad2 knockout on sex chromosome gene expression should be analyzed. Third, integrated analysis of RNA-seq and ATAC-seq data is needed to evaluate how ATAD2 loss affects gene expression. Finally, H3.3 ChIP-seq should be performed to directly assess changes in H3.3 distribution following Atad2 knockout.
(1) In the revised version, we will include Venn diagrams to illustrate the overlap in significantly differentially expressed genes between this study and previous work. However, we believe that the GSEAs presented here provide stronger evidence, as they indicate the statistical significance of this overlap (p-values). In our case, we observed p < 0.01 (**) and p < 0.001 (***).
(2) Sex chromosome gene expression was analyzed and is presented in Fig. 5C.
(3) The effect of ATAD2 loss on gene expression is shown in Fig. 4A, B, and C as histograms, with statistical significance indicated in the middle panels.
(4) Although mapping H3.3 incorporation across the genome in wild-type and Atad2 KO cells would have been informative, the available anti-H3.3 antibody did not work for ChIP-seq, at least in our hands. The authors of Fontaine et al., 2022, who studied H3.3 during spermatogenesis in mice, must have encountered the same problem, since they tagged the endogenous H3.3 gene to perform their ChIP experiments.
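For the gene-list overlap discussed in point (1) above, a Venn diagram can be paired with a simple significance estimate. The sketch below uses a one-sided hypergeometric test, which is a different and simpler technique than the GSEA reported in the manuscript. The function name `overlap_pvalue` and all gene counts are hypothetical and serve only as an illustration.

```python
from math import comb

def overlap_pvalue(universe, size_a, size_b, overlap):
    """One-sided hypergeometric p-value for a gene-list overlap.

    Probability of seeing at least `overlap` shared genes when a list of
    `size_b` genes is drawn at random from `universe` genes, of which
    `size_a` belong to the other list.
    """
    total = comb(universe, size_b)
    upper = min(size_a, size_b)
    tail = sum(
        comb(size_a, k) * comb(universe - size_a, size_b - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

# Hypothetical numbers: 1000 expressed genes, two lists of 50 and 40
# differentially expressed genes, 10 of which are shared.
p_example = overlap_pvalue(1000, 50, 40, 10)
print(f"overlap p-value: {p_example:.2e}")
```

By chance alone, about 2 shared genes would be expected in this hypothetical case (50 × 40 / 1000), so an overlap of 10 yields a very small p-value. Unlike GSEA, this test ignores the ranking of genes and depends on the thresholds used to define each list.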
(2) Figure 3: The altered distribution of H3.3 is compelling. This raises the possibility that histone marks associated with H3.3 may also be affected, although this has not been investigated. It would therefore be important to examine the distribution of histone modifications typically associated with H3.3. If any alterations are observed, ChIP-seq analyses should be performed to explore them further.
Based on our understanding of ATAD2’s function—specifically its role in releasing chromatin-bound HIRA—in the absence of ATAD2 the residence time of both HIRA and H3.3 on chromatin increases. This results in the detection of H3.3 not only on sex chromosomes but across the genome. Our data provide clear evidence of this phenomenon. The reviewer is correct in suggesting that the accumulated H3.3 would carry H3.3-associated histone PTMs; however, we are unsure what additional insights could be gained by further demonstrating this point.
(3) Figure 7: While the authors suggest that pre-PRM2 processing is impaired in Atad2 KO, no direct evidence is provided. It is essential to conduct acid-urea polyacrylamide gel electrophoresis (AU-PAGE) followed by western blotting, or a comparable experiment, to substantiate this claim.
Figure 7 does not suggest that pre-PRM2 processing is affected in Atad2 KO; rather, this figure—particularly Fig. 7B—specifically demonstrates that pre-PRM2 processing is impaired, as shown using an antibody that recognizes the processed portion of pre-PRM2. ELISA was used to provide a more quantitative assessment; however, in the revised manuscript we will also include a western blot image.
(4) HIRA and ATAD2: Does the upregulation of HIRA fully account for the phenotypes observed in Atad2 KO? If so, would overexpression of HIRA alone be sufficient to phenocopy the Atad2 KO phenotype? Alternatively, would partial reduction of HIRA (e.g., through heterozygous deletion) in the Atad2 KO background be sufficient to rescue the phenotype?
These are interesting experiments that require the creation of appropriate mouse models, which are not currently available.
(5) The mechanism by which ATAD2 regulates HIRA turnover on chromatin and the deposition of H3.3 remains unclear from the manuscript and warrants further investigation.
The Reviewer is absolutely correct. In addition to the points addressed in response to Reviewer #1’s general comments (see above), it would indeed have been very interesting to test the segregase activity of ATAD2 (likely driven by its AAA ATPase activity) through in vitro experiments using the Xenopus egg extract system described by Tagami et al., 2004. This system can be applied both in the presence and absence (via immunodepletion) of ATAD2 and would also allow the use of ATAD2 mutants, particularly those with inactive AAA ATPase or bromodomains. However, such experiments go well beyond the scope of this study, which focuses on the role of ATAD2 in chromatin dynamics during spermatogenesis.
References:
(1) Wang T, Perazza D, Boussouar F, Cattaneo M, Bougdour A, Chuffart F, Barral S, Vargas A, Liakopoulou A, Puthier D, Bargier L, Morozumi Y, Jamshidikia M, Garcia-Saez I, Petosa C, Rousseaux S, Verdel A, Khochbin S. ATAD2 controls chromatin-bound HIRA turnover. Life Sci Alliance. 2021 Sep 27;4(12):e202101151. doi: 10.26508/lsa.202101151. PMID: 34580178; PMCID: PMC8500222.
(2) Morozumi Y, Boussouar F, Tan M, Chaikuad A, Jamshidikia M, Colak G, He H, Nie L, Petosa C, de Dieuleveult M, Curtet S, Vitte AL, Rabatel C, Debernardi A, Cosset FL, Verhoeyen E, Emadali A, Schweifer N, Gianni D, Gut M, Guardiola P, Rousseaux S, Gérard M, Knapp S, Zhao Y, Khochbin S. Atad2 is a generalist facilitator of chromatin dynamics in embryonic stem cells. J Mol Cell Biol. 2016 Aug;8(4):349-62. doi: 10.1093/jmcb/mjv060. Epub 2015 Oct 12. PMID: 26459632; PMCID: PMC4991664.
(3) Fontaine E, Papin C, Martinez G, Le Gras S, Nahed RA, Héry P, Buchou T, Ouararhni K, Favier B, Gautier T, Sabir JSM, Gerard M, Bednar J, Arnoult C, Dimitrov S, Hamiche A. Dual role of histone variant H3.3B in spermatogenesis: positive regulation of piRNA transcription and implication in X-chromosome inactivation. Nucleic Acids Res. 2022 Jul 22;50(13):7350-7366. doi: 10.1093/nar/gkac541. PMID: 35766398; PMCID: PMC9303386.
(4) Tagami H, Ray-Gallet D, Almouzni G, Nakatani Y. Histone H3.1 and H3.3 complexes mediate nucleosome assembly pathways dependent or independent of DNA synthesis. Cell. 2004 Jan 9;116(1):51-61. doi: 10.1016/s0092-8674(03)01064-x. PMID: 14718166.
Recommendations for the authors:
Reviewing Editor Comments:
I note that the reviewers had mixed opinions about the strength of the evidence in the manuscript. A revision that addresses these points would be welcome.
Reviewer #1 (Recommendations for the authors):
Major points:
(1) No line numbers: It is hard to point out the issues.
The revised version includes line numbers.
(2) Given the results shown in Figure 3 and Figure 4, it would be nice to show the chromosomal localization of histone H3.3 in spermatocytes or post-meiotic cells by chromatin immunoprecipitation sequencing (ChIP-seq).
Although mapping H3.3 incorporation across the genome in wild-type and Atad2 KO cells would have been informative, the available anti-H3.3 antibody did not work for ChIP-seq in our hands. In fact, this antibody is not well regarded for ChIP-seq. For example, Fontaine et al. (2022), who investigated H3.3 during spermatogenesis in mice, circumvented this issue by tagging the endogenous H3.3 genes for their ChIP experiments.
(3) Figures 7B and 8: Why did the authors use ELISA for protein quantification? At least, western blotting should be shown.
ELISA is a more quantitative method than traditional immunoblotting. Nevertheless, as requested by the reviewer, we have now included a corresponding western blot in Fig. S3.
(4) For readers, please add a schematic pathway of histone-protamine replacement in sperm formation in Fig. 1. It would also be nice to have a model figure presenting the authors' idea as the last figure.
As requested by this reviewer, we have now included a schematic model in Figure 9 to summarize the main conclusions of our work.
Minor points:
(1) Page 2, the second paragraph, "pre-PRM2": Please explain more about pre-PRM2 and/or PRM2 as well as PRM1 (Figure 6).
More detailed descriptions of PRM2 processing are now given in this paragraph.
(2) Page 3, bottom paragraph, line 1: "KO" should be "knockout (KO)".
Done.
(3) Page 4, second paragraph bottom: Please explain more about the protein structure of germ-line-specific ATAD2S: how does it differ from ATAD2L? Does "germ-line specific" mean it is also expressed in the ovary?
As Atad2 is predominantly expressed in embryonic stem cells and in spermatogenic cells, we replaced "germ-line specific" with more appropriate terms throughout the text.
(4) Figure 1C, western blotting: In wild-type testis extracts, both ATAD2L and -S are present. Does this mean that ATAD2L is expressed in both the germ line and supporting cells? Please clarify this and, if possible, show the western blotting of spermatids as well as spermatocytes.
Figure 1D shows sections of seminiferous tubules from Atad2 KO mice, in which lacZ expression is driven by the endogenous Atad2 promoter. The results indicate that Atad2 is expressed mainly in post-meiotic cells. Most labeled cells are located near the lumen, whereas the supporting Sertoli cells remain unlabeled. Sertoli cells, which are anchored to the basal lamina, span the entire thickness of the germinal epithelium from the basal lamina to the lumen. Their nuclei, however, are usually positioned closer to the basal membrane. Thus, the observed lacZ expression pattern argues against substantial Atad2 expression in Sertoli cells.
(5) Figure 1C: Please explain a bit more about the reduction of ATAD2 proteins in heterozygous mice.
Done
(6) Figure 1C: Genotypes of the mice should be shown in the legend.
Done
(7) Figure 1D: Please add a more magnified image of the sections to see the staining pattern in the seminiferous tubules.
Higher magnification does not provide more information, since the structure of cells within the tubules is lost due to the nature of the treatment of the sections for X-gal staining. Please see our comments on question 1C from Reviewer 2.
(8) Page 5, first paragraph, line 2, histone dosage: What do the authors mean by histone dosage? Please explain more or use a more appropriate term.
"Histone dosage" refers to the amount or relative abundance of histone proteins in a cell.
(9) Figure 2A: Given the result in Figure 1C, it would be interesting to check the amount of HIRA in Atad2 heterozygous mice.
In Atad2 heterozygous mice, we would expect an increase in HIRA, but only to about half the level seen in the Atad2 homozygous knockout shown in Figure 2A, which is relatively modest. Therefore, we doubt that detecting such a small change—approximately half of that in Figure 2A—would yield clear or definitive results.
(10) Figure 2A, legend (n=5): What does this "n" mean? Extracts of testes from 5 male mice, as in Figure 2B, or 5 independent experiments? If the latter is true, it is important to share the other results in the Supplements.
“n” refers to five WT and five Atad2 KO males. The legend has been clarified as suggested by the reviewer.
(11) Figure 2A, legend, line 2, Atad2: This should be italicized.
Done
(12) Figure 2B: Please show the quantification of amounts of HIRA protein like Fig. 2A.
As indicated in the legend, what is shown is a pool of testes from 3 individuals per genotype.
(13) Figure 2B shows an increased level of HIRA in Atad2 KO testis. This suggests the role of ATAD2 in the protein degradation of HIRA. This possibility should be mentioned or tested since ATAD2 is an AAA+ ATPase.
The extensive literature on ATAD2 provides no indication that it is involved in protein degradation. In our early work on ATAD2 in the 2000s, we hypothesized that, as a member of the AAA ATPase family, ATAD2 might associate with the 19S proteasome subunit (through multimerization with the other AAA ATPase member of this regulatory subunit). However, both our published pilot studies (Caron et al., PMID: 20581866) and subsequent unpublished work ruled out this possibility. Instead, since the amount of nucleosome-bound HIRA increases in the absence of ATAD2, we propose that chromatin-bound HIRA is more stable than soluble HIRA once it has been released from chromatin by ATAD2.
(14) Page 6, second paragraph, line 5, ko: KO should be capitalized.
Done
(15) Page 6, second paragraph, line 2 from the bottom, chromatin dynamics: Throughout the text, the authors used "chromatin dynamics". However, all that the authors analyzed in the current study is the localization of chromatin proteins. So, it is much easier to explain the results by using "chromatin status," etc. In this context, "accessibility" is better.
Throughout the text, we replaced the term “chromatin dynamics” with a more precise term appropriate to each context.
(16) Figure 3: Please provide the quantification of signals of histone H3.3 in a nucleus or nuclear cytoplasm.
This request is not clear to us since we do not observe any H3.3 signal in the cytoplasm.
(17) Figure 3: As the control of specificity in post-meiotic cells, please show the image and quantification of the H3.3 signals in spermatocyte, for example.
This request is not clear to us. What specificity is meant?
(18) Figure 3, bottom panels: Please show what the white lines indicate?
The white lines indicate the boundary of the cell nucleus, as estimated by Hoechst staining. This is now indicated in the legend of the figure.
(19) Figure 4A: Please explain more about what kind of data is here. Is this wild-type and/or Atad2 KO? The label of the Y-axis should be "mean expression level". What is the standard deviation (SD) here on the X-axis. Moreover, there is only one red open circle, but the number of this class is 5611. All 5611 genes in this group show NO expression. Please explain more.
The plot displays the mean expression levels (y-axis, labeled as "mean expression level") versus the corresponding standard deviations (x-axis), both calculated from three independent biological replicates of isolated round spermatids (Atad2 wild-type and Atad2 KO). The standard deviation reflects the variability of gene expression across biological replicates. Genes were grouped into four categories (grp1: blue, grp2: cyan, grp3: green, grp4: orange) according to the quartile of their mean expression. For grp4, all genes have no detectable expression, resulting in a mean expression of zero and a standard deviation of zero; consequently, the 5611 genes in this group are represented by a single overlapping point (red open circle) at the origin.
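The construction of this plot can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the gene names and values are hypothetical, the sample standard deviation is an assumption (the response does not say which estimator was used), and splitting ranked genes into four equal-sized groups is one plausible reading of "quartile of their mean expression".

```python
from statistics import mean, stdev

def summarize_genes(expression):
    """Map each gene to (mean, sample SD) across its biological replicates."""
    return {gene: (mean(vals), stdev(vals)) for gene, vals in expression.items()}

def quartile_groups(summary):
    """Rank genes by mean expression (highest first) and split them into
    four equal-sized groups labelled grp1 (highest) to grp4 (lowest)."""
    ranked = sorted(summary, key=lambda g: summary[g][0], reverse=True)
    n = len(ranked)
    return {gene: "grp" + str(min(rank * 4 // n, 3) + 1)
            for rank, gene in enumerate(ranked)}

# Hypothetical toy data: three replicates per gene.
expression = {
    "geneA": [120.0, 130.0, 110.0],
    "geneB": [40.0, 44.0, 36.0],
    "geneC": [5.0, 7.0, 6.0],
    "geneD": [0.0, 0.0, 0.0],  # never detected
}
summary = summarize_genes(expression)
groups = quartile_groups(summary)
```

A gene with no detectable expression in any replicate has a mean of zero and a standard deviation of zero, so every such gene is drawn at the origin, which is why thousands of genes can collapse into a single visible point.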
(20) Figure 4C: If possible, it would be better to have a statistical comparison between wild-type and the KO.
The mean profiles are displayed together with their variability (± 2 s.e.m.) across the four replicates for both ATAD2 WT (blue) and ATAD2 KO (red). For groups 1, 2, and 3, the envelopes of the curves remain clearly separated around the peak, indicating a consistent difference in signal between the two conditions. In contrast, group 4 does not present a strong signal and, accordingly, no marked difference is observed between WT and KO in this group.
(21) Figure 5, GSEA panels: Please explain more about what the GSEA is in the legend.
The legend has been updated as follows:
(A) Expression profiles of post-meiotic H3.3-activated genes. The heatmap (left panel) displays the normalized expression levels of genes identified by Fontaine and colleagues as upregulated in the absence of histone H3.3 (Fontaine et al. 2022) for Atad2 WT (WT) and Atad2 KO (KO) samples at days 20, 22, 24, and 26 PP (D20 to D26). The colour scale represents the z-score of log-transformed DESeq2-normalized counts. The box plots in the middle panel display pooled, normalized expression levels, aggregated across replicates and genes, for each condition (WT and KO) and each time point (D20 to D26). Statistical significance between WT and KO conditions was determined using a two-sided t-test, with p-values indicated as follows: * for p-value<0.05, ** for p-value<0.01 and *** for p-value<0.001. The right panel shows the results of gene set enrichment analysis (GSEA), which assesses whether predefined groups of genes show statistically significant differences between conditions. Here, the post-meiotic H3.3-activated gene set, identified by Fontaine et al. (2022), is significantly enriched in Atad2 KO compared with WT samples at day 26 (p < 0.05, FDR < 0.25). Coloured vertical bars indicate the “leading edge” genes (i.e., those contributing most to the enrichment signal), located before the point of maximum enrichment score. (B) As shown in (A) but for the "post-meiotic H3.3-repressed genes" gene set. (C) As shown in (A) but for the "sex chromosome-linked genes" gene set.
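The heatmap colour scale described in this legend, a z-score of log-transformed normalized counts, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the base-2 logarithm, the pseudocount of 1, and the use of the population standard deviation are all assumptions, and the input values are hypothetical.

```python
from math import log2
from statistics import mean, pstdev

def row_zscores(counts, pseudocount=1.0):
    """Z-score one gene's normalized counts across samples after log2(x + pseudocount).

    The pseudocount and the population SD are assumptions of this sketch.
    """
    logged = [log2(c + pseudocount) for c in counts]
    mu = mean(logged)
    sd = pstdev(logged)
    if sd == 0:
        # A gene with identical values in all samples carries no contrast.
        return [0.0] * len(logged)
    return [(x - mu) / sd for x in logged]

# Hypothetical normalized counts for one gene across four samples.
z = row_zscores([1.0, 3.0, 7.0, 15.0])
```

Scaling each gene separately puts strongly and weakly expressed genes on the same colour scale, so the heatmap shows relative differences between samples rather than absolute abundance.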
(22) Figure 6. In the KO, the number of green cells is more than red and yellow cells, suggesting the delayed maturation of green (TH2B-positive) cells. It is essential to count the number of each cell and show the quantification.
The green cells correspond to those expressing TH2B but lacking transition proteins (TP) and protamine 1 (Prm1), indicating that they are at earlier stages than elongating–condensing spermatids. Counting these green cells simply reflects the ratio of elongating/condensing spermatids to earlier-stage cells, which varies depending on the field examined. The key point in this experiment is that in wild-type mice, only red cells (elongating/condensing spermatids) and green cells (earlier stages) are observed. By contrast, in Atad2 KO testes, a significant proportion of yellow cells appears, which are never seen in wild-type tissue. The crucial metric here is the percentage of yellow cells relative to the total number of elongating/condensing spermatids (red cells). In wild-type testes, this value is consistently 0%, whereas in Atad2 KO testes it always ranges between 50% and 100% across all fields containing substantial numbers of elongating/condensing spermatids.
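The metric described here reduces to a simple per-field ratio. The sketch below is illustrative only: the function name and the counts are hypothetical, and it assumes that yellow (double-positive) cells are counted among the elongating/condensing spermatids.

```python
def percent_yellow(yellow_cells, elongating_cells):
    """Percentage of elongating/condensing spermatids that also retain TH2B (yellow).

    Returns None for a field with no elongating/condensing spermatids,
    since the ratio is undefined there.
    """
    if elongating_cells == 0:
        return None
    return 100.0 * yellow_cells / elongating_cells

# Hypothetical counts for two microscope fields.
wild_type_field = percent_yellow(yellow_cells=0, elongating_cells=40)
knockout_field = percent_yellow(yellow_cells=30, elongating_cells=40)
```

Because the denominator is restricted to elongating/condensing spermatids, the value does not depend on how many earlier-stage (green) cells happen to fall within a given field.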
(23) Figure 8A: Please show the images of sperm (heads) in the KO mice with or without decompaction.
The requested image is now displayed in Figure S5.
(24) Figure 8C: In the legend, it says n=5. However, there are more than 5 plots on the graph. Please explain the experiment more in detail.
The experiment is now better explained in the legend of this Figure.
Reviewer #2 (Recommendations for the authors):
While the study is rigorous and well performed, the following minor points could be addressed to strengthen the manuscript:
Figure 1C should indicate each of the different types of cells present in the sections. It would be of interest to show specifically the different post-meiotic germ cells.
With this type of sample preparation, it is difficult to precisely distinguish the different cell types within the sections. Nevertheless, the staining pattern strongly indicates that most of the intensely stained cells are post-meiotic, situated near the tubule lumens and extending roughly halfway toward the basal membrane.
In the absence of functional ATAD2, the accumulation of HIRA primarily occurs in round spermatids (Fig. 2B). If technically possible, it would be of great interest to show this by IHC of testis section.
Unfortunately, our antibody did not satisfactorily work in IHC.
The increase in H3.3 signal in Atad2 KO spermatids (Fig. 3) is interpreted as resulting from reduced turnover. However, alternative explanations (e.g., H3.3 misincorporation or altered chaperone affinity) should not be ruled out.
The referee is correct that alternative explanations are possible. However, based on our previous work (Wang et al., 2021; PMID: 34580178), we demonstrated that in the absence of ATAD2, there is reduced turnover of HIRA-bound nucleosomes, as well as reduced nucleosome turnover, evidenced by the appearance of nucleosomes in regions that are normally nucleosome-free at active gene TSSs. We have no evidence supporting any other alternative hypothesis.
In the MS the reduced accessibility at active genes (Fig. 4) is attributed to H3.3 overloading. However, global changes in histone acetylation (e.g., H4K5ac) or other remodelers in KO cells could also be considered.
In fact, we meant that histone overloading could be responsible for the altered accessibility. This has been clearly demonstrated in the case of S. cerevisiae in the absence of Yta7 (the S. cerevisiae ATAD2) (PMID: 25406467).
In relation with the sperm compaction assay (Fig. 8A), the DTT/heparin/Triton protocol may not fully reflect physiological decompaction. This could be validated with alternative methods (e.g., MNase sensitivity).
The referee is right, but since this is a subtle effect, as can be judged from the normal fertility, we doubt that milder approaches could reveal significant differences between wild-type and Atad2 KO sperm.
It is surprising that despite the observed alterations in the genome organization of the sperm, the natural fertility of the KO mice is not affected (Fig. 8C). This warrants deeper discussion: Is functional compensation occurring (e.g., by p97/VCP)? Analysis of epididymal sperm maturation or uterine environment could provide insights.
As detailed in the Discussion section, this work, together with our previous study (Wang et al., 2021; PMID: 34580178), highlights an overlooked level of regulation in histone chaperone activity: the release of chromatin-bound factors following their interaction with chromatin. This is an energy-dependent process, driven by ATP and the associated ATPase activity of these factors. Such activity could be mediated by various proteins, such as p97/VCP or DNAJC9–HSP70, as discussed in the manuscript, or by yet unidentified factors. However, most of these mechanisms are likely to occur during the extensive histone-to-histone variant exchanges of meiosis and post-meiotic stages. To the best of our knowledge, epididymal sperm maturation and the uterine environment do not involve substantial histone-to-histone or histone-to-protamine exchanges.
The authors showed that MSCI genes present an enhancement of repression in the absence of ATAD2 by enhancing H3.3 function. It would be also of interest to analyze the behavior of the Sex body during its silencing (zygotene to pachytene) by looking at different markers (i.e., gamma-H2AX phosphorylation, Ubiquitylation etc).
The referee is correct that this is an interesting question. Accordingly, in our future work, we plan to examine the sex body in more detail during its silencing, using a variety of relevant markers, including those suggested by the reviewer. However, we believe that such investigations fall outside the scope of the present study, which focuses on the molecular relationship between ATAD2 and H3.3, rather than on the role of H3.3 in regulating sex body transcription. For a comprehensive analysis of this aspect, studies should primarily focus on the H3.3 mouse models reported by Fontaine and colleagues (PMID: 35766398).
Fig. 6: Co-staining of TH2B/TP1/PRM1 is convincing but would benefit from quantification (% cells with overlapping signals).
The green cells correspond to those expressing TH2B but lacking transition proteins (TP) and protamine 1 (Prm1), indicating that they are at earlier stages than elongating–condensing spermatids. Counting these green cells simply reflects the ratio of elongating/condensing spermatids to earlier-stage cells, which varies depending on the field examined. The key point is that in wild-type mice, only red cells (elongating/condensing spermatids) and green cells (earlier stages) are observed. By contrast, in Atad2 KO testes, a significant proportion of yellow cells appears, which are never seen in wild-type tissue. The crucial metric is the percentage of yellow cells relative to the total number of elongating/condensing spermatids (red cells). In wild-type testes, this value is consistently 0%, whereas in Atad2 KO testes it always ranges between 50% and 100% across all fields containing substantial numbers of elongating/condensing spermatids.
eLife Assessment
This useful study reports a method to detect and analyze a novel post-translational modification, lysine acetoacetylation (Kacac), finding it regulates protein metabolism pathways. The study unveils epigenetic modifiers involved in placing this mark, including key histone acetyltransferases such as p300, and concomitant HDACs, which remove the mark. Proteomic and bioinformatics analysis identified many human proteins with Kacac sites, potentially suggesting broad effects on cellular processes and disease mechanisms. The data presented are solid and the study will be of interest to those studying protein and metabolic regulation.
-
Reviewer #3 (Public review):
Summary:
This paper presents a timely and significant contribution to the study of lysine acetoacetylation (Kacac). The authors successfully demonstrate a novel and practical chemo-immunological method using the reducing reagent NaBH4 to transform Kacac into lysine β-hydroxybutyrylation (Kbhb).
Strengths:
This innovative approach enables simultaneous investigation of Kacac and Kbhb, showcasing their potential in advancing our understanding of post-translational modifications and their roles in cellular metabolism and disease.
Weaknesses:
The study lacks supporting in vivo data, such as gene knockdown experiments, to validate the proposed conclusions at the cellular level.
-
Author response:
The following is the authors’ response to the previous reviews
Public Reviews:
Reviewer #2 (Public review):
In the manuscript by Fu et al., the authors developed a chemo-immunological method for the reliable detection of Kacac, a novel post-translational modification, and demonstrated that acetoacetate and AACS serve as key regulators of cellular Kacac levels. Furthermore, the authors identified the enzymatic addition of the Kacac mark by acyltransferases GCN5, p300, and PCAF, as well as its removal by deacetylase HDAC3. These findings indicate that AACS utilizes acetoacetate to generate acetoacetyl-CoA in the cytosol, which is subsequently transferred into the nucleus for histone Kacac modification. A comprehensive proteomic analysis has identified 139 Kacac sites on 85 human proteins. Bioinformatics analysis of Kacac substrates and RNA-seq data reveal the broad impacts of Kacac on diverse cellular processes and various pathophysiological conditions. This study provides valuable additional insights into the investigation of Kacac and would serve as a helpful resource for future physiological or pathological research.
The authors have made efforts to revise this manuscript and address my concerns. The revisions are appropriate and have improved the quality of the manuscript.
We appreciate the constructive and thoughtful feedback, which has been invaluable in enhancing the quality of our manuscript.
Reviewer #3 (Public review):
Summary:
This paper presents a timely and significant contribution to the study of lysine acetoacetylation (Kacac). The authors successfully demonstrate a novel and practical chemo-immunological method using the reducing reagent NaBH4 to transform Kacac into lysine β-hydroxybutyrylation (Kbhb).
Thank you for the positive and insightful comments.
Strengths:
This innovative approach enables simultaneous investigation of Kacac and Kbhb, showcasing its potential in advancing our understanding of post-translational modifications and their roles in cellular metabolism and disease.
We are grateful for the reviewer’s comments, which have contributed to enhancing the quality of our study.
Weaknesses:
The experimental evidence presented in the article is insufficient to fully support the authors' conclusions. In the in vitro assays, the proteins used appear to be highly inconsistent with their expected molecular weights, as shown by Coomassie Brilliant Blue staining (Figure S3A). For example, p300, which has a theoretical molecular weight of approximately 270 kDa, appeared at around 37 kDa; GCN5/PCAF, expected to be ~70 kDa, appeared below 20 kDa. Other proteins used in the in vitro experiments also exhibited similarly large discrepancies from their predicted sizes. These inconsistencies severely compromise the reliability of the in vitro findings. Furthermore, the study lacks supporting in vivo data, such as gene knockdown experiments, to validate the proposed conclusions at the cellular level.
We appreciate the reviewer’s comments. In the biochemical assays, we used the expressed catalytic domains of HATs, rather than the full-length proteins, for activity testing. Specifically, the following constructs were expressed and purified: p300 (1287-1666), GCN5 (499-663), PCAF (493-658), MOF (125-458), MOZ (497-780), MBP-MORF (361-716), Tip60 (221-512), HAT1 (20-341), and HBO1 (full length). This explains why the molecular weights observed in Figure S3A differ from those expected for the full-length proteins.
Although a recent study (PMID: 37382194) reported the acetoacetyltransferase activities of p300 and GCN5 in cells, we recognize that additional knockdown experiments would be necessary to substantiate their contributions to in vivo Kacac generation and to explore the functional roles of Kacac in an enzyme-specific context. We plan to address these kinds of research issues in our future work.
eLife Assessment
This fundamental study provides new evidence of a change in how microglia survey neurons during the chronic phase of neurodegeneration, which researchers studying neuroinflammation and its role in neurodegenerative disease should find interesting. In this research, using time-lapse imaging of acute brain slices from prion-affected mice, the researchers show that, unlike in healthy brains, microglia become reactive, lose their territorial boundaries, and become highly mobile, exhibiting "kiss-and-ride" behavior, migrating into brain tissue and forming reversible, transient body-to-body contact with neurons. The evidence is compelling, with well-executed time-lapse imaging, good quantitative analysis across several disease stages, pharmacological validation of P2Y6 involvement, and the very surprising finding that this mobile behavior persists after microglia are removed from the brain.
-
Reviewer #1 (Public review):
Summary:
In this manuscript, Subhramanian et al. carefully examined how microglia adapt their surveillance strategies during chronic neurodegeneration, specifically in prion-infected mice. The authors used ex vivo time-lapse imaging and in vitro strategies and found that reactive microglia adopt a highly mobile, "kiss-and-ride" behavior, contrasting the more static surveillance typical of homeostatic microglia. The manuscript provides fundamental mechanistic insights into the dynamics of microglia-neuron interactions, implicates P2Y6 signaling in regulating mobility, and suggests that intrinsic reprogramming of microglia might underlie this behavior. The conclusions are therefore compelling.
Strengths:
(1) The novelty of the study is high, particularly the demonstration that microglia lose territorial confinement and dynamically migrate from neuron to neuron under chronic neurodegeneration.
(2) The possible implications of a stimulus-independent high mobility in reactive microglia are particularly striking, although this is not fully explored.
(3) The use of time-lapse imaging in organotypic slices rather than overexpression models provided a more physiological approach.
(4) Microglia-neuron interactions in neurodegeneration have broad implications for understanding the progression of diseases, such as Alzheimer's and Parkinson's, that are associated with chronic inflammation.
Weaknesses:
Previous weaknesses were addressed.
-
Reviewer #2 (Public review):
This is a nice paper focused on microglial responses to different clinical stages of prion infection in acute brain slices. The key here is the use of time-lapse imaging that captures the dynamics of microglial surveillance, including morphology, migration, and intracellular neuron/microglial contacts. The authors use a myeloid GFP-labeled transgenic mouse to track microglia in SSLOW-infected brain slices, quantifying differences in motility and microglial-neuronal interactions via live fluorescence imaging. Interesting findings include the elaborate patterns of motility among microglia, the distinct types and durations of intracellular contacts, the potential role of calcium signaling in facilitating hypermobility, and the fact that this motion-promoting status is intrinsic to the microglia, persisting even after the cells have been isolated from infected brains. Although largely a descriptive paper, it offers mechanistic insights, including the role of calcium in supporting microglial movement, with bursts of signaling identified even within the time lapse format, and inhibition studies implicating the purinergic receptor and calcium transient regulator P2Y6 in migratory capacity.
Strengths:
(1) The focus on microglia activation and activity in the context of prion disease is interesting
(2) Two different prions produce largely the same response
(3) Use of time-lapse provides insight into the dynamics of microglia, distinguishing between types of contact - mobility vs motility - and providing insight on the duration/transience and reversibility of extensive somatic contacts that include brief and focused connections in addition to soma envelopment.
(4) Imaging window selection (3 hours) guided by prior publications documenting preserved morphology, activity, and gene expression regulation up to 4 hours.
(5) The distinction between high- and low-mobility microglia is interesting, especially given that hypermobility seems to be an innate property of the cells.
(6) The live-imaging approach is validated by fixed tissue confocal imaging.
(7) The variance in duration of neuron/microglia contacts is interesting, although there is no insight into what might dictate which status of interaction predominates
(8) The reversibility of the enveloping action, which is not apparently a commitment to engulfment, is interesting, as is the fact that only neurons are selected for this activity.
(9) The calcium studies use the fluorescent dye calbryte-590, which picks up neuronal and microglial bursts; prolonged bursts are detected in enveloped neurons and in the hyper-mobile microglia. The microglial lead is followed up using the P2Y6 inhibitor MRS-2578, which blunts the mobility of the microglia.
Comments on revisions:
The authors have addressed my concerns in full - I think this is a very nice addition to the literature.
-
Author response:
The following is the authors’ response to the original reviews.
Public Reviews:
Reviewer #1 (Public review)
The Cx3cr1/EGFP line labels all myeloid cells, which makes it difficult to conclude that all observed behaviors are attributable to microglia rather than infiltrating macrophages. The authors refer to this and include it as a limitation. Nonetheless, complementary confirmation by additional microglia markers would strengthen their claims.
We appreciate the reviewer’s insightful comment regarding the cellular identity of the enveloping myeloid cells. As suggested, we performed triple co-immunostaining of SSLOW-infected Cx3cr1/EGFP mice using markers for neurons (NeuN), myeloid cells (IBA1), and resident microglia (TMEM119 or P2Y12). Because formic acid treatment used to deactivate prions abolishes the EGFP signal, we relied on IBA1 staining to identify the myeloid population. Our results confirmed that IBA1⁺ cells exhibiting the envelopment behavior are also TMEM119⁺ and P2Y12⁺, consistent with a resident microglial phenotype. These new data are presented in Figures S3 and S4 and described in the final section of the Results.
Although the authors elegantly describe the dynamic surveillance and envelopment hypothesis, it is unclear what role this phenotype plays in disease progression, i.e., what its functional consequences are. For example, are the neurons that undergo sustained envelopment more likely to degenerate?
We appreciate this important question regarding the functional implications of neuronal envelopment. At present, technical limitations prevent us from continuously tracking the fate of individual enveloped neurons in prion-infected mice. Nevertheless, our recent study demonstrated that P2Y12 knockout increases the prevalence of neuronal envelopment and accelerates disease progression (Makarava et al., 2025, J. Neuroinflammation). These findings suggest that while microglial envelopment may represent an adaptive response to increased neuronal surveillance demands, excessive envelopment, as observed in the absence of P2Y12, appears to be maladaptive. A new paragraph has been added to the Discussion to address this point.
Moreover, although the increase in mobility is a relevant finding, it would be interesting for the authors to further comment on what the molecular trigger(s) is/are that might promote this increase. These adaptations, which are at least long-lasting, confer apparent mobility in the absence of external stimuli.
We thank the reviewer for this thoughtful suggestion. The molecular mechanisms underlying the increased mobility of microglia in prion-infected brains remain to be identified, and we plan to pursue this question in future studies. One possibility we briefly discuss in the revised manuscript is that proinflammatory signaling, mediated by secreted cytokines or interleukins, may drive this phenotype. Supporting this hypothesis, recent work has shown that IFNγ enhances microglial migration in the adult mouse cortex (doi:10.1073/pnas.2302892120). This work has been cited in the revised manuscript.
The authors performed, as far as I could understand, the experiments in cortical brain regions. There is no clear rationale for this in the manuscript, nor is it clear whether the mobility is specific to a particular brain region. This is particularly important, as microglia reactivity varies greatly depending on the brain region.
We appreciate this insightful comment highlighting the importance of regional determinants of microglial reactivity, which indeed aligns with our ongoing research interests. In our previous studies, neuronal envelopment by microglia was observed consistently across all prion-affected brain regions exhibiting neuroinflammation. Assuming that envelopment requires microglial mobility, it is reasonable to speculate that microglia are mobile in all brain regions affected by prions and displaying neuroinflammatory responses. In the current study, we focused exclusively on the cortex because this region was used for quantifying the prevalence of neuronal envelopment as a function of disease progression in our prior work (DOI: 10.1172/JCI181169), which guided the present study design. Our ongoing investigations indicate that the prevalence of envelopment is region-dependent and correlates with microglial reactivity/the degree of neuroinflammation. In prion diseases, the degree of microglial reactivity is dictated by the tropism of specific prion strains to distinct brain regions. Notably, our prior studies have shown that strain-specific sialylation patterns of PrPSc glycans play a key role in determining both regional strain tropism and the extent of neuroinflammatory activation (DOI: 10.3390/ijms21030828, DOI: 10.1172/JCI138677). In response to this comment, we have added a brief rationale for using the cortex in the Results section.
It would be relevant information to have an analysis of the percentage of cells in normal, sub-clinical, early clinical, and advanced stages that became mobile. Without this information, the speed/distance alone can have different interpretations.
We thank the reviewer for this valuable suggestion. The percentage of mobile cells across normal, sub-clinical, early clinical, and advanced disease stages is presented in Figure 3b and described in the final paragraph of the section “Enveloping behavior of reactive myeloid cells.”
Reviewer #2 (Public review)
The number of individual cells tracked has been provided, but not the number of individual mice. The sex of the mice is not provided.
We used N = 3 animals per group throughout the study; this information has now been added to the figure legends. Animals of both sexes were included in random proportions. The sex information is now listed for each experiment in the Animals subsection of the Methods.
The statistical approach is not clear; was each cell treated as a single observation?
Yes, with the exception of the heat map in Figure 2d, all mobility parameters are analyzed and presented at the level of individual cells, with each cell treated as an independent observation. The primary aim of this study is to characterize behavioral patterns of single reactive myeloid cells. Analyzing data at the cell level allows us to capture the full distribution of cell behaviors and to preserve biologically meaningful heterogeneity within and across animals. By contrast, averaging values per animal would largely mask this variability. In the heat map in Figure 2d, data are averaged per animal, specifically to illustrate inter-animal variability within each group and to visualize changes across disease progression.
The potential for heterogeneity among animals has not been addressed.
To address this concern, we now include a new Supplemental Figure (Figure S4) presenting the data using Superplots, in which individual cells are shown as dots, animal-level averages as circles, and group means calculated based on animals as black horizontal lines. These plots demonstrate that cell mobility measures are highly consistent across animals within each group, indicating limited inter-animal heterogeneity.
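The two levels of summary shown in such a plot can be sketched as follows. This is a minimal illustration with hypothetical animal labels and values, not the authors' analysis code; it only shows how animal-level averages and an animal-based group mean are derived from cell-level observations.

```python
from collections import defaultdict
from statistics import mean

def animal_and_group_means(cells):
    """cells: iterable of (animal_id, value) pairs, one pair per tracked cell.

    Returns (per-animal means, group mean computed from the animal means).
    """
    by_animal = defaultdict(list)
    for animal_id, value in cells:
        by_animal[animal_id].append(value)
    per_animal = {animal: mean(values) for animal, values in by_animal.items()}
    group_mean = mean(list(per_animal.values()))
    return per_animal, group_mean

# Hypothetical mobility values for cells from three animals.
cells = [("m1", 1.0), ("m1", 3.0),
         ("m2", 4.0), ("m2", 6.0), ("m2", 8.0),
         ("m3", 5.0)]
per_animal, group_mean = animal_and_group_means(cells)
cell_level_mean = mean(value for _, value in cells)
```

In this toy example the animal-based group mean (about 4.33) differs from the pooled cell-level mean (4.5) because animals contribute unequal numbers of cells. Showing both levels lets a reader judge whether a few animals drive the cell-level distribution.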
Validation of prion accumulation at each clinical stage of the disease is not provided.
We now provide validation of PrPSc accumulation across disease stages by Western blot, along with quantitative analysis, in a new Supplemental Figure (Figure S2). This confirms progressive PrPSc accumulation with advancing disease.
How were the numerous captures of cells handled to derive morphological quantitative values? Based on the videos, there is a lot of movement and shape-shifting.
The following description has been added to Methods to clarify morphology analysis: For microglial morphology analysis, we quantified morphological parameters (radius, area, perimeter, and shape index) for individual EGFP⁺ cells in each time frame of the time-lapse recordings using the TrackMate 7.13.2 plugin in FIJI. Parameter values for each cell were then averaged across the entire three-hour imaging period to obtain a single mean value per cell.
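The averaging step in this description can be sketched as below. The record layout, field names, and numbers are hypothetical stand-ins for a per-frame measurement table and do not reproduce the TrackMate export format; only the logic of collapsing per-frame values into one mean per cell follows the text.

```python
from collections import defaultdict
from statistics import mean

def average_per_cell(frames, parameters=("area", "perimeter")):
    """Average each morphological parameter over all frames in which a cell was measured."""
    values = defaultdict(lambda: defaultdict(list))
    for row in frames:
        for name in parameters:
            values[row["cell_id"]][name].append(row[name])
    return {cell: {name: mean(vals) for name, vals in per_param.items()}
            for cell, per_param in values.items()}

# Hypothetical per-frame measurements for two cells.
frames = [
    {"cell_id": 1, "frame": 0, "area": 100.0, "perimeter": 40.0},
    {"cell_id": 1, "frame": 1, "area": 120.0, "perimeter": 44.0},
    {"cell_id": 2, "frame": 0, "area": 80.0, "perimeter": 36.0},
]
per_cell = average_per_cell(frames)
```

Averaging over the whole recording yields one value per cell regardless of how much the cell changes shape between frames, at the cost of hiding that frame-to-frame variation.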
While it is recognized that there are limits to what can be measured simultaneously with live imaging, the authors appear to have fixed tissues from each time point too - it would be very interesting to know if the extent of prion accumulation influences the microglial surveillance - i.e., do the enveloped ones have greater pathology.
This is a very interesting question, which is difficult to answer due to technical challenges in monitoring the pathology or fate of individual neuronal cells as a function of their envelopment in live prion-infected animals. Our previous work revealed that both accumulation of total PrPSc in the brain and accumulation of PrPSc specifically in lysosomal compartments of microglia due to phagocytosis precede the onset of neuronal envelopment (DOI: 10.1172/JCI181169). Moreover, the onset of neuronal envelopment occurred after a noticeable decline in neuronal levels of Grin1, a subunit of the NMDA receptor essential for synaptic plasticity. Reactive microglia were observed to envelop Grin1-deficient neurons, suggesting that microglia respond to neuronal dysfunction. However, considering that envelopment is very dynamic and, in most cases, reversible, correlating the degree of envelopment with dysfunction of individual neurons is technically challenging.
Recommendations for the authors
Reviewer #1 (Recommendations for the authors):
(1) I recommend performing additional immunostaining using microglial markers to address specificity.
These new data showing immunostaining for markers of resident microglia TMEM119 and P2Y12 are presented in Figures S6 and S7 and described in the final section of the Results.
(2) The authors can at least further discuss the functional consequences of their findings in further detail.
A new paragraph has been added to the Discussion to address this point.
(3) Quantify the % of cells that become mobile in the different conditions.
The percentage of mobile cells across normal, sub-clinical, early clinical, and advanced disease stages is presented in Figure 3b and described in the final paragraph of the section “Enveloping behavior of reactive myeloid cells.”
(4) Improve method details on the brain regions used and further expand the statistical section.
We have expanded the Statistical Analysis section to indicate whether statistical comparisons and mean values were calculated at the single-cell level or the animal level for each analysis. The specific statistical tests used and the number of animals (N) are now reported in the corresponding figure legends. The sex of animals is provided in Table 1 (Methods). Only the cortical region was examined in this study; this information is stated in the Methods and is now also noted in the figure legends for clarity.
Reviewer #2 (Recommendations for the authors):
(1) More details on members of the P2Y receptor family expressed in microglia would be helpful. The study highlights a previously published prion-induced decline in the expression of P2Y12, a microglial marker that is required for intracellular neuron-microglial contacts, and P2Y6, involved in calcium transients, which is required for hypermotility. How are members of this family of receptors regulated at the gene and/or protein level in microglia and, given their responsiveness to nucleotide ligands, are other members implicated in the properties being quantified here?
We appreciate the reviewer’s insightful comment. To address this point, we examined the expression of multiple P2Y receptors and ATP-gated P2X channels known to contribute to microglial surveillance, activation, motility, and phagocytosis, alongside the activation markers Tlr2, Cd68, and Trem2. Bulk brain transcript analyses indicated that all examined genes were upregulated in SSLOW-infected mice relative to controls (new Figure S5a). However, because microglial proliferation substantially increases microglial numbers during prion disease progression, bulk tissue measurements do not necessarily reflect per-cell expression levels. Therefore, we normalized gene expression values to the microglia-specific marker Tmem119, whose per-cell expression remains stable across disease stages (Makarava et al., 2025, J. Neuroinflammation). After normalization, Tlr2, Cd68, and Trem2 were increased approximately 10-, 6-, and 4-fold, respectively. In contrast, P2 receptor genes showed more modest changes: P2ry6 increased ~3-fold, P2ry13 ~2-fold, and P2rx7 ~1.3-fold, while P2rx4 remained unchanged (Figure S5a). Within the scope of the present study, we focused on P2Y6 due to (i) its role in regulating calcium transients, (ii) the magnitude of its upregulation relative to other P2 receptors, and (iii) its highly microglia-specific expression in the CNS. We note that currently available commercial P2Y6 antibodies lack sufficient specificity, making reliable assessment of protein-level expression challenging.
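The normalization logic described above can be sketched in a few lines of Python. This is only an illustration under the stated assumption that the marker's per-cell expression is stable; the function name and all numeric values are hypothetical and do not come from Figure S5a.

```python
def per_cell_fold_change(gene_infected, gene_control,
                         marker_infected, marker_control):
    """Fold change of a gene after dividing each sample by a marker.

    Dividing by a microglia-specific marker (Tmem119 in the response
    above) is meant to correct bulk measurements for the increase in
    microglial numbers. All inputs are bulk expression values.
    """
    if min(gene_control, marker_infected, marker_control) <= 0:
        raise ValueError("control and marker values must be positive")
    normalized_infected = gene_infected / marker_infected
    normalized_control = gene_control / marker_control
    return normalized_infected / normalized_control


# Hypothetical example: a gene is 30-fold higher in bulk tissue, but the
# marker is 3-fold higher because there are more microglia.
bulk_fold = per_cell_fold_change(30.0, 1.0, 1.0, 1.0)      # marker ignored
per_cell_fold = per_cell_fold_change(30.0, 1.0, 3.0, 1.0)  # marker applied
```

With these made-up inputs the apparent bulk increase shrinks once the rise in microglial numbers is accounted for, which is the reasoning behind reporting normalized values.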
(2) Is P2Y6 expressed in any other cell type that might account for the blunted mobility of the microglia? The authors mention P2Y12 also identifies the GFP cells; however, it would be beneficial to highlight the specificity of the target in the ex vivo treatment of the infected slices.
In the brain, both P2Y12 and P2Y6 are considered highly specific to resident microglia under physiological and neuroinflammatory conditions. P2Y12 is, in fact, widely used as a canonical marker of homeostatic and resident microglia. While P2Y6 is also expressed in peripheral myeloid cells such as macrophages, our phenotypic characterization indicates that the cells exhibiting neuronal envelopment are TMEM119⁺ and P2Y12⁺, consistent with a resident microglial identity. These data, including new analyses added to the revised manuscript, support that the cells responding to P2Y6 signaling in our ex vivo slice experiments are resident microglia.
(3) The fluorescent mouse lacks Cx3cr1 - have the authors investigated why there were no apparent consequences, at least in the context of prion infection? Are there functional redundancies that might be harnessed? Does this impact the generalizability of the findings here?
The role of Cx3cr1 in prion disease has been directly examined in two independent studies (doi: 10.1099/jgv.0.000442; doi: 10.1186/1471-2202-15-44). One study reported no effect of Cx3cr1 deficiency on disease incubation time, whereas the other observed only a minor difference. Importantly, both studies found no detectable alterations in microglial activation patterns, cytokine expression, or PrP<sup>Sc</sup> deposition in Cx3cr1-deficient mice compared to wild-type controls. Our own data (Figure S1) are consistent with these findings: disease course and PrP<sup>Sc</sup> deposition were comparable between Cx3cr1/EGFP and wild-type mice. Moreover, we observed reactive microglial envelopment of neurons in both genotypes. Microglia isolated from SSLOW-infected Cx3cr1/EGFP mice also displayed similarly elevated mobility in vitro, in agreement with our previous observations of high mobility of microglia isolated from SSLOW-infected wild-type mice (Makarava et al., 2025, J. Neuroinflammation). Taken together, these results indicate that Cx3cr1 is not a key determinant of reactive microglial mobility or envelopment behavior in prion disease. Thus, the use of the Cx3cr1/EGFP reporter line does not compromise the generalizability of our conclusions.
(4) The distinction between high mobility and low mobility microglia is interesting - is there any evidence to suggest that the slow-moving microglia are actually a separate class - do enveloping microglia exhibit both mobility states - can the authors comment on plasticity here?
We appreciate this insightful comment, which closely aligns with our ongoing interests. At present, we do not have evidence to support that high- versus low-mobility microglia represent distinct molecular phenotypes. Given that our time-lapse imaging spans only a three-hour window, it remains unclear whether these mobility states reflect stable cell-intrinsic properties or transient phases within a dynamic surveillance process. Notably, we observed that individual cells can transition between more stationary, neuron-associated states and highly mobile states within the same imaging session. In future work, we intend to investigate whether prolonged interactions with neuronal somas or other microenvironmental cues may drive diversification of reactive myeloid cell phenotypes.
(5) In the discussion, the authors speculate about "collective coordinated decision making" - that seems a stretch unless greater context is provided. The fact that several microglia can be found in contact with an individual neuron and that each microglia can connect with multiple neurons simultaneously is certainly interesting; however, evidence for hive behavior is entirely lacking.
We agree with the reviewer that our previous wording overstated the interpretation. The statement regarding collective decision-making has been removed.
eLife assessment
This important work is the first to suggest a model that the nematode C. elegans prefers specific bacteria (its major food source) that release high amounts of the known attractant isoamyl alcohol when supplemented with exogenous leucine and has also identified a likely receptor for the odorant isoamyl alcohol. The evidence supporting the claims of the authors is solid, and the manuscript would be improved by changes to the text that clarify and address the distinction between "supplemented" versus "enriched". The renaming of srd-12 to snif-1 should also be addressed.
Reviewer #1 (Public review):
Summary:
Siddiqui et al. investigate the question of how bacterial metabolism contributes to the attraction of C. elegans to specific bacteria. They show that C. elegans prefers three bacterial species when cultured in a leucine-enriched environment. These bacterial species release more isoamyl alcohol, a known C. elegans attractant, when cultured with leucine supplement than without leucine supplement. The study shows correlative evidence that isoamyl alcohol is produced from leucine by the Ehrlich pathway. In addition, they show that SNIF-1 is a receptor for isoamyl alcohol because a null mutant of this receptor exhibits lower chemotaxis to isoamyl alcohol and that chemotaxis to isoamyl alcohol is rescued by expression of snif-1 in AWC.
Strengths:
(1) This study takes a creative approach to examine the question of what specific volatile chemicals released by bacteria may signify to C. elegans by examining both bacterial metabolism and C. elegans preference behavior. Although C. elegans has long been known to be attracted to bacterial metabolites, this study may be one of the first to examine the possible role of a specific bacterial metabolic pathway in mediating attraction.
(2) A strength of the paper is the identification of SNIF-1 as a receptor for isoamyl alcohol. The ligands for very few olfactory receptors have been identified in C. elegans and so this is a significant addition to the field. The SNIF-1 null mutant strain will likely be a useful reagent for many labs examining olfactory and foraging behaviors.
Weaknesses:
(1) The authors write that leucine metabolism via the Ehrlich pathway is required for production of isoamyl alcohol by three bacteria (CEent1, JUb66, BIGb0170), but their evidence for this is correlation and not causation. They show that the gene ilvE (which is part of the Ehrlich pathway) is upregulated in CEent1 bacteria upon exposure to leucine. Although this indicates that the ilvE gene may be involved in leucine metabolism, it does not show causation. To show causation, they need to knock out ilvE from one of these strains, show that the bacteria do not have increased isoamyl alcohol production when cultured on leucine, and show that the bacteria are no longer attractive to C. elegans.
(2) Although the authors do show that the three bacterial strains they focus on (CEent1, JUb66, and BIGb0170) are more attractive to C. elegans when supplemented with leucine, some other strains such as BIGb0393 are also more attractive with leucine supplementation and produce isoamyl alcohol (Fig 1B and Supp Table 2). It is unclear why these other strains are not included with the selected three strains.
(3) The behavioral evidence that snif-1 gene encodes a receptor for isoamyl alcohol is compelling because of the mutant phenotype and rescue experiments. The evidence would be stronger with calcium imaging of AWC neurons in response to isoamyl alcohol in the receptor mutant with the expectation that the response would be reduced or abolished in the mutant compared to wildtype.
Reviewer #2 (Public review):
Summary:
Siddiqui et al. show that C. elegans prefers certain bacterial strains that have been supplemented with the essential amino acid (EAA) leucine. They convincingly show that some leucine-enriched bacteria stimulate the production of isoamyl alcohol (IAA). IAA is an attractive odorant that is sensed by the AWC. The authors identify a receptor, SRD-12, that is expressed in the AWC chemosensory neurons and is required for chemotaxis to IAA. The authors propose that IAA is a predominant olfactory cue that determines diet preference in C. elegans. Since leucine is an EAA, the authors propose that worm IAA sensing provides the animal with a proxy mechanism to identify EAA-rich diets.
Strengths:
The authors propose IAA as a predominant olfactory cue that determines diet preference in C. elegans, providing a molecular mechanism underlying diet selection. They show that wild isolates of C. elegans have a strong chemotactic response to IAA, indicating that IAA is an ecologically relevant odor for the worm. The paper is well written, and the presented data are convincing and well organized. This is an interesting paper that connects chemotactic response with bacterially produced odors and thus provides an understanding of how animals adapt their foraging behavior through the perception of molecules that may indicate the nutritional value.
Weaknesses:
Major: While I do like the way the authors frame C. elegans IAA sensing as a mechanism to identify leucine (EAA) rich diets, it is not fully clear whether bacterial IAA production is a proxy for bacterial leucine levels.
(1) Can the authors measure leucine (or other EAA) content of the different CeMbio strains? This would substantiate the premise in the way they frame this in the introduction. While the authors convincingly show that leucine supplementation induces IAA production in some strains, it is not clear whether leucine levels are lower in the non-preferred strains.
(2) It is not clear whether the non-preferred bacteria in Figure 1A and 1B have the ability to produce IAA. To substantiate the claim that C. elegans prefers CEent1, JUb66, and BIGb0170 due to their ability to generate IAA from leucine, it would be helpful to measure IAA levels in non-preferred bacteria (+ and - leucine supplementation). If the authors have these data it would be good to include them.
(3) The authors would strengthen their claim if they could show that deletion or silencing ilvE enzyme reduces IAA levels and eliminates the increased preference upon leucine supplementation.
(4) While the three preferred bacteria possess the ilvE gene, it is not clear whether this enzyme is present in the other non-preferred bacterial strains. As far as I know, the CeMbio strains have been sequenced, so it should be easy to determine if the non-preferred bacteria possess the capacity to make IAA. Does expression of ilvE in e.g. E. coli increase its preference index or are the other genes in the biosynthesis pathway missing?
(5) It is strongly implied that leucine-rich diets are beneficial to the worm. Do the authors have data to show the effect of leucine supplementation on C. elegans healthspan, lifespan or brood size?
Comments on revisions:
(1) The authors have addressed most of the earlier questions. The main unresolved issue is whether IAA production is a reflection of bacterial leucine levels. It is not clear whether leucine levels are lower in the non-preferred strains.
The main conclusions, (1) that some bacterial strains can convert exogenous leucine into IAA, which is an attractant to C. elegans, and (2) the identification of a GPCR required for IAA responses, are solid. These are important results that carry the paper. My outstanding concern remains the overinterpretation in framing C. elegans IAA sensing as a mechanism to identify leucine (EAA) rich diets. It is fine to leave this as a favorite hypothesis in the discussion, but statements throughout the paper need to be nuanced in the absence of leucine measurements for the different bacterial strains. (Also, because the bacterial chemotaxis assays were done with only a single concentration of leucine, it is difficult to infer bacterial leucine concentrations.) I recommend softening claims related to leucine-rich diet detection unless quantitative measurements are provided.
Part of the issue in the text lies in the difference between "supplemented" and "chemotaxis" (lab based constructs) and enriched and foraging (natural environment based). This is also the way it is set up in the introduction "Do animals use specific sensing mechanisms to find an EAA-enriched diet?". If enriched is used strictly the same as supplemented then it would be fine but in the text this distinction gets blurred and enriched drifts to the more ethological explanation.
Then it is more than just semantics since leucine-supplemented diets are not something that occurs in the natural environment. IAA production by bacteria could be a signal for a leucine rich environment and it is fine to speculate about this in the discussion.
Examples where the wording needs to be more precise to reflect the experimental results rather than the possible impact in its natural environment:
The title: "The olfactory receptor SNIF-1 mediates foraging for leucine-rich diets in C. elegans"
The intro:"Taken together, SNIF-1 regulates the dietary preference of worms to IAA-producing bacteria and thereby mediates the foraging behavior of C. elegans to leucine-enriched diets. Thus, IAA produced by bacteria is a dietary quality code for leucine-enriched bacteria."
Results "Figure 1. C. elegans relies on odors to select leucine-enriched bacteria"
"Supplementation" is used more in the text and the figure legends, whereas headings and the abstract use "enriched". The experiments in the paper only describe leucine-supplemented conditions, so I would use "supplemented" instead of "enriched" when describing experiments, for clarity.
For instance:
Page 4:"Microbial odors drive the preference of C. elegans for leucine-enriched diet"
Page 5: "Altogether, these findings suggested that worms rely on odors to distinguish various bacteria and find leucine-enriched bacteria"
Page 7: "Isoamyl alcohol odor is a signature for a leucine-enriched diet"
Page 9: "AWC odor sensory neurons facilitate the diet preference of C. elegans for leucine-enriched diets"
Page 20: "Leucine-enriched diets produce significantly higher levels of IAA odor, making up to 90% of their headspace"
(2) As suggested in the first round of review, the authors now add data on IAA levels in non-preferred bacteria (+ and - leucine supplementation) in Table S2. While it is good to have these data, the table is not very clear. It is not clear what ND stands for in Table S2: not determined or not detected? I assume not determined, since some strains (JUb44, BIGb0393, JUb134) produce IAA even in the absence of LEU. The authors mention that "the abundance of IAA in these strains is significantly less". However, the table just reflects yes or no. Can the authors give an indication of the concentration to understand what "significantly less" means? Fig. 2c at least gives a heat map.
(3) On WormBase the gene is still called srd-12. The authors should seek permission to rename srd-12 to snif-1.
Author response:
The following is the authors’ response to the original reviews.
eLife Assessment:
This is an important study, supported by solid to convincing data, that suggests a model for diet selection in C. elegans. The significance is that while C. elegans has long been known to be attracted to bacterial volatiles, what specific bacterial volatiles may signify to C. elegans is largely unknown. This study also provides evidence for a possible odorant/GPCR pairing.
Public Reviews:
Reviewer #1 (Public review):
Summary:
Siddiqui et al. investigate the question of how bacterial metabolism contributes to the attraction of C. elegans to specific bacteria. They show that C. elegans prefers three bacterial species when cultured in a leucine-enriched environment. These bacterial species release more isoamyl alcohol, a known C. elegans attractant, when cultured with leucine supplement than without leucine supplement. The study shows correlative evidence that isoamyl alcohol is produced from leucine by the Ehrlich pathway. In addition, they show that SRD-12 (SNIF-1) is likely a receptor for isoamyl alcohol because a null mutant of this receptor exhibits lower chemotaxis to isoamyl alcohol and lower preference for leucine-enriched bacteria.
Strengths:
(1) This study takes a creative approach to examine the question of what specific volatile chemicals released by bacteria may signify to C. elegans by examining both bacterial metabolism and C. elegans preference behavior. Although C. elegans has long been known to be attracted to bacterial metabolites, this study may be one of the first to examine the role of a specific bacterial metabolic pathway in mediating attraction.
(2) A strength of the paper is the identification of SRD-12 (SNIF-1) as a likely receptor for isoamyl alcohol. The ligands for very few olfactory receptors have been identified in C. elegans and so this is a significant addition to the field. The srd-12 (snif-1) null mutant strain will likely be a useful reagent for many labs examining olfactory and foraging behaviors.
Weaknesses:
(1) The authors write that leucine metabolism via the Ehrlich pathway is required for the production of isoamyl alcohol by three bacteria (CEent1, JUb66, BIGb0170), but their evidence for this is correlation and not causation. They write that the gene ilvE is a bacterial homolog of the first gene in the yeast Ehrlich pathway (it would be good to include a citation for this) and that the gene is present in these three bacterial strains. In addition, they show that this gene, ilvE, is upregulated in CEent1 bacteria upon exposure to leucine. To show causation, they need to knock out ilvE from one of these strains, show that the bacteria do not have increased isoamyl alcohol production when cultured on leucine, and show that the bacteria are no longer attractive to C. elegans.
Thank you for the comment. We have added the appropriate citation [1,2]. We agree that worms’ diet preference for the preferred strains upon ilvE knockout will further strengthen the claim for IAA being used as a proxy for leucine-enriched diet. Currently, protocols and tools for genetic manipulations for CeMbio strains are not available, making this experiment not feasible at this time.
(2) The authors examine three bacterial strains for which C. elegans showed increased preference when grown with leucine supplementation vs. without leucine supplementation. However, there also appears to be a strong preference for another strain, BIGb0393, when grown with leucine (Figure 1B). It would be good to include statistics and criteria for selecting the three strains.
Thanks for your comment. We agree that for Pantoea nemavictus, BIGb0393, worms seem to prefer the leucine-supplemented (+ LEU) bacteria over unsupplemented (- LEU). However, when given a choice between the individual CeMbio bacteria and E. coli OP50, worms showed preference for only CEent1, JUb66, and BIGb0170 (Figure 1F). Consequently, CEent1, JUb66, and BIGb0170 were selected for further analyses. We have included statistics for Figure 1B-C and Figure S1A-G with details mentioned in the figure legend.
(3) Although the behavioral evidence that srd-12 (snif-1) gene encodes a receptor for isoamyl alcohol is compelling, it does not meet the standard for showing that it is an olfactory receptor in C. elegans. To show it is indeed a likely receptor one or more of the following should be done:
(a) Calcium imaging of AWC neurons in response to isoamyl alcohol in the receptor mutant with the expectation that the response would be reduced or abolished in the mutant compared to wildtype.
(b)"A receptor swap" experiment where the SRD-12 (SNIF-1) receptor is expressed in AWB repulsive neuron in SRD-12 (SNIF-1) receptor mutant background with the expectation that with receptor swap C. elegans will now be repulsed from isoamyl alcohol in chemotaxis assays (experiment from Sengupta et al., 1996 odr-10 paper).
Thanks for all your comments and suggestions. While the lab currently does not have the necessary expertise to conduct calcium imaging of neurons, we have performed additional experiments to confirm the requirement of AWC neurons for SNIF-1 function. We generated transgenic worms with an extrachromosomal array expressing snif-1 under (a) the AWC-specific promoter, odr-1, and (b) the AWB-specific promoter, str-1. As shown in new panel 6H in the revised manuscript and Author response image 1, we found that overexpression of snif-1 in AWC neurons completely rescues the chemotaxis defect of the snif-1 mutant (referred to as VSL2401), whereas upon the “receptor swap” in AWB neurons IAA is sensed as a repellent.
Author response image 1.
(A) Chemotaxis index (CI) of WT, VSL2401, VSL2401 [AWCp::snif-1] and VSL2401 [AWBp::snif-1] worms to IAA at 1:1000 dilution. Significant differences are indicated as **** P ≤ 0.0001 determined by one-way ANOVA followed by post hoc Dunnett’s multiple comparison test. Error bars indicate SEM (n≥15).
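For readers unfamiliar with the metric in this legend, the following Python sketch shows one common way a chemotaxis index and its SEM are computed. The exact counting scheme used by the authors is not stated in this response, so the formula below (odor count minus control count, divided by their sum) is an assumption, and the function names and worm counts are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def chemotaxis_index(n_odor, n_control):
    """CI = (worms at odor - worms at control) / (worms at either spot).

    Ranges from +1 (all worms at the odor, attraction) to -1 (all worms
    at the control, repulsion). This convention is an assumption.
    """
    total = n_odor + n_control
    if total == 0:
        raise ValueError("no worms were scored on this plate")
    return (n_odor - n_control) / total


def mean_and_sem(values):
    """Mean and standard error of the mean across replicate plates."""
    if len(values) < 2:
        raise ValueError("need at least two replicates for an SEM")
    return mean(values), stdev(values) / sqrt(len(values))


# Hypothetical replicate plates given as (worms at odor, worms at control).
plates = [(80, 20), (75, 25), (85, 15)]
indices = [chemotaxis_index(odor, control) for odor, control in plates]
ci_mean, ci_sem = mean_and_sem(indices)
```

Under this convention, a rescue strain would be expected to give a positive CI near wild type, while a receptor-swap strain sensing the odor as a repellent would give a negative CI.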
(4) The authors conclude that C. elegans cannot detect leucine in chemotaxis assays. It is important to add the method for how leucine chemotaxis assay was done in order to interpret these results. Because leucine is not volatile if leucine is put on the plates immediately before the worms are added (as in a traditional odor chemotaxis assay), there is no leucine gradient for the worm to detect. It would be good to put leucine on the plate several hours before worms are introduced so worms have the possibility to be able to detect the gradient of leucine (for example, see Wakabayashi et al., 2009).
Previously, the chemotaxis assays with leucine were performed like traditional odor chemotaxis assays. We have now also performed the chemotaxis assay as detailed in Shingai et al. 2005 [3]: leucine was spotted on the assay plates 5 hours prior to the introduction of worms. As shown in new panel S1H in the revised manuscript, wild-type worms do not respond to leucine in the modified chemotaxis assay.
We have included the experimental details for leucine chemotaxis assays in the revised manuscript.
(5) The bacterial preference assay entitled "odor-only assay" is a misleading name. In the assay, C. elegans is exposed to both volatile chemicals (odors) and non-volatile chemicals because the bacteria are grown on the assay plate for 12 hours before the worms are introduced to the assay plate. In that time, the bacteria is likely releasing non-volatile metabolites into the plate which may affect the worm's preference. A true odor-only assay would have the bacteria on the lid and the worms on the plate.
The ‘odor-only’ diet preference assay does not allow for non-volatile chemicals to reach worms. We achieved this by using tripartite dishes where the compartments containing worms and bacterial odors are separated by polystyrene barriers. At the time of the assay, worms were spotted in a separate compartment from that of bacteria (as shown in schematic 1A). The soluble metabolites released by the bacteria during their growth will accumulate in the agar within the bacterial compartment alone such that worms only encounter the volatile metabolites produced by bacteria wafting past the polystyrene barrier.
(6) The findings of the study should be discussed more in the context of prior literature. For example, AWC neurons have been previously shown to be involved in bacterial preference (Harris et al., 2014; Worthy et al., 2018). In addition, CeMbio bacterial strains (the strains examined in this study) have been previously shown to release isoamyl alcohol (Chai et al. 2024).
Thanks for the suggestion. We have modified the Discussion section to discuss the study in the light of relevant prior literature.
Reviewer #2 (Public review):
Summary:
Siddiqui et al. show that C. elegans prefers certain bacterial strains that have been supplemented with the essential amino acid (EAA) leucine. They convincingly show that some leucine-enriched bacteria stimulate the production of isoamyl alcohol (IAA). IAA is an attractive odorant that is sensed by the AWC. The authors identify a receptor, SRD-12 (SNIF-1), that is expressed in the AWC chemosensory neurons and is required for chemotaxis to IAA. The authors propose that IAA is a predominant olfactory cue that determines diet preference in C. elegans. Since leucine is an EAA, the authors propose that worm IAA sensing provides the animal with a proxy mechanism to identify EAA-rich diets.
Strengths:
The authors propose IAA as a predominant olfactory cue that determines diet preference in C. elegans, providing a molecular mechanism underlying diet selection. They show that wild isolates of C. elegans have a strong chemotactic response to IAA, indicating that IAA is an ecologically relevant odor for the worm. The paper is well written, and the presented data are convincing and well organized. This is an interesting paper that connects chemotactic response with bacterially produced odors and thus provides an understanding of how animals adapt their foraging behavior through the perception of molecules that may indicate the nutritional value.
Weaknesses:
Major:
While I do like the way the authors frame C. elegans IAA sensing as a mechanism to identify leucine (EAA) rich diets, it is not fully clear whether bacterial IAA production is a proxy for bacterial leucine levels.
(1) Can the authors measure leucine (or other EAA) content of the different CeMbio strains? This would substantiate the premise in the way they frame this in the introduction. While the authors convincingly show that leucine supplementation induces IAA production in some strains, it is not clear whether leucine levels are lower in the non-preferred strains.
Thanks for your suggestion. Estimating leucine levels in various bacteria will provide useful information, and we hope to do so in future studies.
(2) It is not clear whether the non-preferred bacteria in Figure 1A and 1B have the ability to produce IAA. To substantiate the claim that C. elegans prefers CEent1, JUb66, and BIGb0170 due to their ability to generate IAA from leucine, it would be helpful to measure IAA levels in non-preferred bacteria (+ and - leucine supplementation). If the authors have these data it would be good to include them.
Thanks for the suggestion. We have included the table indicating the presence or absence of IAA production by all the bacteria under + LEU and – LEU conditions (Table S2). Some of the nonpreferred bacteria indeed produce isoamyl alcohol. However, the abundance of IAA in these strains is significantly less than in the preferred bacteria.
Using the available genomic sequence data, we found that all CeMbio strains encode IlvE-like transaminase enzymes[4]. This suggests that presumably all the bacteria have the metabolic capacity to make alpha-ketoisocaproate (an intermediate in IAA biosynthetic pathway) from leucine. However, the regulation of metabolic flux is likely to be quite complex in various bacteria.
(3) The authors would strengthen their claim if they could show that deletion or silencing ilvE enzyme reduces IAA levels and eliminates the increased preference upon leucine supplementation.
We agree that testing worms’ diet preference for the preferred strains upon ilvE knockout will further strengthen the claim that IAA is crucial for finding a leucine-enriched diet. Currently the lab does not have the necessary expertise and standardized protocols to do genetic manipulations of the CeMbio strains.
(4) While the three preferred bacteria possess the ilvE gene, it is not clear whether this enzyme is present in the other non-preferred bacterial strains. As far as I know, the CeMbio strains have been sequenced so it should be easy to determine if the non-preferred bacteria possess the capacity to make IAA. Does the expression of ilvE in e.g. E. coli increase its preference index or are the other genes in the biosynthesis pathway missing?
Thanks for the suggestion. Using the available genomic sequence data, we find that all the bacteria in the CeMbio collection possess an IlvE-like transaminase necessary for synthesis of alpha-ketoisocaproate, a key metabolite in leucine turnover as well as a precursor for IAA [4]. E. coli has an IlvE-encoding gene in its genome [2]. However, we do not find IAA in the headspace of E. coli either with or without leucine supplementation. This indicates either that (i) E. coli lacks enzymes for subsequent steps in IAA biosynthesis or (ii) the leucine provided under the experimental regime is not sufficient to shift the metabolic flux to IAA production.
Previous studies have suggested that in yeast, the final two steps leading to IAA production are catalyzed by decarboxylase and dehydrogenase enzymes [1]. The genomic and metabolic flux data available for CeMbio do not describe specific enzymes leading up to IAA synthesis [4].
(5) It is strongly implied that leucine-rich diets are beneficial to the worm. Do the authors have data to show the effect of leucine supplementation on C. elegans healthspan, lifespan or brood size?
Edwards et al. 2015 reported a 15% increase in the lifespan of worms upon 1 mM leucine supplementation [5]. Wang et al 2018 also showed lifespan extension upon 1 mM and 10 mM leucine supplementation. They also reported that while leucine supplementation did not have any effect on brood size, it did make worms more resistant to heat, paraquat, and UV-stress [6]. These studies have been included in the discussion section.
Other comments:
Page 6. Figure 2c. While the authors' conclusions are correct based on AWC expts., it would be good at this stage to include the possibility that odors that are enriched in the absence of leucine may be aversive.
Thanks for the comment. We have tested the chemotaxis response of the worms for most of the odors produced by CeMbio strains without leucine supplementation. We did not find any odor that is aversive to worms. However, we cannot completely rule out the possibility that a low abundance of aversive odor in the headspace of the bacteria was missed.
Interestingly, we did identify 2-nonanone, a known repellent, in the headspace of the preferred bacteria upon leucine supplementation. However, the abundance of 2-nonanone in the headspace of bacteria is relatively low (less than 1% for CEent1 and JUb66, and ~10% for BIGb0170). This suggests that the relative abundance of odors in an odor bouquet may be a relevant factor in determining worms’ preference.
Page 6. IAA increases 1.2-4 fold upon leucine supplementation. If the authors perform a chemotaxis assay with just IAA with 1-2-4 fold differences, do you get the shift in preference index as seen with the bacteria? i.e. is the difference in IAA concentration sufficient to explain the shift in bacterial PI upon leucine supplementation? Other attractants such as acetoin and isobutanol go up in -LEU conditions.
Thanks for the suggestion. As shown in Figure S2H and S2I, when given a choice between a concentration of IAA (1:1000 dilution) attractive to worms and a 4-fold higher amount of IAA, worms chose the latter. This result suggests that worms can distinguish between relatively small differences in concentrations of IAA.
We agree that the relative abundance of acetoin and isobutanol is high in -LEU conditions. The presence of other attractants in -LEU conditions should skew the preference of worms toward -LEU bacteria. However, we found that worms prefer +LEU bacteria (Figure 1B), suggesting that the abundance of IAA mainly influences the diet preference of the worms.
Page 14-15. The authors identify a putative IAA receptor based on expression studies. I compliment the authors for isolating two CRISPR deletion alleles. They show that the srd-12 (snif-1) mutants have obvious defects in IAA chemotaxis. Very few ligand-odorant receptor combinations have been identified, so this is an important discovery. CeNGEN data indicate that srd-12 (snif-1) is expressed in a limited set of neurons. Did the authors generate a reporter to show the expression of srd-12 (snif-1)? This is a simple experiment that would add to the characterization of the SRD-12 (SNIF-1) receptor. Rescue experiments would be nice even though the authors have independent alleles. To truly claim that SRD-12 (SNIF-1) is the receptor for IAA and activates the AWC neurons would require GCaMP experiments in the AWC neuron or a heterologous expression system. I understand that GCaMP imaging might not be part of the regular arsenal of the lab, but it would be a great addition (even in collaboration with one of the many labs that do this regularly). Comparing AWC activity using GCaMP in response to IAA-producing bacteria with high leucine levels in both wild-type and SRD-12 (SNIF-1)-deficient backgrounds would further support their narrative. I leave that to the authors.
Thanks for your comments and suggestions. To address this comment, we rescued the snif-1 mutant (referred to as VSL2401) with an extrachromosomal array expressing snif-1 under an AWC-specific promoter as well as under its native promoter. As shown in Figure 6H and Author response image 2, we find that both transgenic lines show a complete rescue of the chemotaxis response to isoamyl alcohol. To find where snif-1 is expressed, we generated a transgenic line of worms expressing GFP under the snif-1 promoter and mCherry under the odr-1 promoter (to mark AWC neurons). As shown in Figure 6I, we found that snif-1 is expressed faintly in many neurons, with strong expression in one of the two AWC neurons marked by odr-1::mCherry. This result suggests that SNIF-1 is expressed in an AWC neuron.
We hope to perform GCaMP assays and further characterization of SNIF-1 in the future.
Author response image 2.
Chemotaxis index (CI) of WT, VSL2401, VSL2401 [AWCp::snif-1] and VSL2401 [snif-1p::snif-1] worms to IAA at 1:1000 dilution. Significant differences are indicated as **** P ≤ 0.0001, determined by one-way ANOVA followed by post hoc Dunnett’s multiple comparison test. Error bars indicate SEM (n ≥ 15).
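For readers unfamiliar with the readout in this legend, the sketch below shows how a chemotaxis index and its SEM are commonly computed from plate counts. The formula (worms on the odor side minus worms on the control side, divided by the total scored), the function names, and the example counts are illustrative assumptions; the response does not state the exact formula or the raw counts used.

```python
import math
from statistics import mean, stdev


def chemotaxis_index(n_odor: int, n_control: int) -> float:
    """Assumed definition: CI = (odor - control) / (odor + control)."""
    total = n_odor + n_control
    if total == 0:
        raise ValueError("no worms scored on this plate")
    return (n_odor - n_control) / total


def sem(values):
    """Standard error of the mean across replicate plates."""
    return stdev(values) / math.sqrt(len(values))


# Hypothetical counts (odor side, control side) for three replicate plates.
plates = [(80, 20), (70, 30), (90, 10)]
indices = [chemotaxis_index(o, c) for o, c in plates]
print(mean(indices), sem(indices))
```

Under this definition a CI of +1 means every scored worm reached the odor, 0 means no preference, and negative values indicate repulsion.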
Minor:
Page 4. "These results suggested that worms can forage for diets enriched in specific EAA, leucine...." More precise at this stage would be to state "These results indicated that worms can forage for diets supplemented with specific EAA...".
We have changed the statement in the revised manuscript.
Page 5."these findings suggested that worms not only rely on odors to choose between two bacteria but also to find leucine enriched bacteria" This statement is not clear to me and doesn't follow the data in Fig. S2. Preferred diets in odorant assays are the IAA producing strains.
Thanks for your comment. We have revised the manuscript for clarity: “Altogether, these findings suggested that worms rely on odors to distinguish different bacteria and find leucine-enriched bacteria”. This statement summarizes all the data shown in Figure 1 and Figure S1.
Page 5. Figure S2A provides nice and useful data that can be part of the main Figure 1.
Thanks for the comment. We have incorporated the data from Figure S2A into main Figure 1.
Reviewer #3 (Public review):
Summary:
The authors first tested whether EAA supplementation increases olfactory preference for bacterial food for a variety of bacterial strains. Of the EAAs, they found only leucine supplementation increased olfactory preference (within a bacterial strain), and only for 3 of the bacterial strains tested. Leucine itself was not found to be intrinsically attractive.
They determined that leucine supplementation increases isoamyl alcohol (IAA) production in the 3 preferred bacterial strains. They identify the biochemical pathway that catabolizes leucine to IAA, showing that a required enzyme for this pathway is upregulated upon supplementation.
Consistent with earlier studies, they find that the AWC olfactory neuron is primarily responsible for increased preference for IAA-producing bacteria.
They tested volatile compounds produced by bacteria and identified by GC/MS, and found several to be attractive, most of which require AWC for the full effect. Adaptation assays were used to show that odorant levels produced by bacterial lawns were sufficient to induce olfactory adaptation, and adaptation to IAA reduced chemotaxis to leucine-supplemented lawns. They then showed that IAA attractiveness is conserved across wild strains, while other compounds are more variable, suggesting IAA is a principal foraging cue.
Finally, using the CeNGEN database, they developed a list of candidate IAA receptors. Using behavioral tests, they show that mutation of srd-12 (snif-1) greatly impairs IAA chemotaxis without affecting locomotion or attraction to another AWC-sensed odor, PEA.
Comments
This study will be of great interest in the field of C. elegans behavior, chemical senses and chemical ecology, and understanding of the sensory biology of foraging.
Strengths:
The identification of a receptor for IAA is an excellent finding. The combination of microbial metabolic chemistry and the use of natural bacteria and nematode strains makes an extremely compelling case for the ecological and adaptive relevance of the findings.
Weaknesses:
AWC receives synaptic input from other chemosensory neurons, and thus could potentially mediate navigation behaviors to compounds detected in whole or in part by those neurons. Language concluding detection by AWC should be moderated (e.g. p9 "worms sense an extensive repertoire...predominantly using AWC") unless it has been demonstrated.
Thanks for your comment. We have modified the manuscript to incorporate the suggestion.
srd-12 (snif-1) is not exclusively expressed in AWC. Normally, cell-specific rescue or knockdown would be used to demonstrate function in a specific cell. The authors should provide such a demonstration or explain why they are confident srd-12 (snif-1) acts in AWC.
Thanks for the comment. We have performed AWC-specific rescue of snif-1 in mutant worms. As shown in Figure 6H, we found that AWC-specific rescue completely recovered the chemotaxis defect of the snif-1 mutant (referred to as VSL2401) for IAA. In addition, snif-1 is expressed in one of the AWC neurons.
A comparison of AWC's physiological responses between WT and srd-12 (snif-1), preferably in an unc-13 background, would be nice. Even further, the expression of srd-12 (snif-1) in a different neuron type and showing that it confers responsiveness to IAA (in this case, inhibition) would be very convincing.
Thanks for the suggestion. We have performed a receptor swap experiment, where snif-1 is misexpressed in AWB neurons. We find that these worms show slight but significant repulsion to IAA compared to WT and snif-1 mutant worms (Author response image 1).
Recommendations for the authors:
Reviewing Editor:
Please consider all of the reviewer comments. In particular, as noted in the individual reviews, the strength of the evidence would be bolstered by additional experiments to demonstrate that the IlvE enzyme affects IAA levels in the preferred bacteria. The reviewers note that the authors haven't shown that IAA production is a reflection of leucine content. Are the non-preferred bacteria low on leucine, or do they lack IlvE or IAA synthesis pathways? Further, more direct evidence that SRD-12 (SNIF-1) is in fact the primary IAA receptor would further strengthen the study. The authors should also be aware that geographic distance for wild isolate C. elegans may not directly correlate with phylogenetic distance. This should be assessed/discussed for the strains used.
Thanks for the suggestions. Some of these have been addressed in response to reviewers. Thanks for your comments about the possible disconnect between geographical and phylogenetic distances amongst the natural isolates used here.
By analyzing the phylogenetic tree generated using the neighbor-joining algorithm available at the CaeNDR database, we found that QX1211 and JU3226 are phylogenetically close, but the remaining isolates fall under different clades separated by long phylogenetic distances [7,8].
Reviewer #1 (Recommendations for the authors):
(1) In the first sentence of the third paragraph of the introduction, C. elegans are described as "soil-dwelling." Although C. elegans has been described as soil-dwelling in the past, current research indicates they are most often found on rotten fruit, compost heaps and other bacterial-rich environments, not soil. "All Caenorhabditis species are colonizers of nutrient- and bacteria-rich substrates and none of them is a true soil nematode." from Kiontke, K. and Sudhaus, W. Ecology of Caenorhabditis species (WormBook).
Your specific comment about C. elegans’ habitat is well received. However, in that sentence we are referring to the chemosensory system of soil-dwelling animals in general, and not particularly C. elegans.
(2) Figure 3K, the model would be clearer if leucine-rich diet -> volatile chemicals ->AWC (instead of leucine-rich diet -> AWC <- volatile chemicals). The leucine-rich diet results in the production of volatile chemicals which are detected by AWC.
We have modified the figure to make it clearer.
(3) Figure 4 - it would help to include a table summarizing the volatile chemicals that each bacteria releases. Then the reader could more easily evaluate whether the adaptation to each specific odor is consistent with the change in preference for the specific bacteria based on what it releases in its headspace. In addition, Figure 4 would help to clarify whether bacteria in these experiments were cultured with or without leucine supplementation.
Table S2 summarizes the odors released by all the bacteria under +LEU and -LEU conditions.
In Figure 4, adaptation was performed with odors of bacteria cultured under leucine-unsupplemented conditions.
Reviewer #2 (Recommendations for the authors):
Page 9. Previous studies, e.g. Bargmann, Hartwieg and Horvitz, have shown IAA is sensed by the AWC. Would be good to cite appropriately.
Thanks for the comment. The reference has been cited on p9 and p16.
References:
(1) Yuan, J., Mishra, P., and Ching, C.B. (2017). Engineering the leucine biosynthetic pathway for isoamyl alcohol overproduction in Saccharomyces cerevisiae. Journal of Industrial Microbiology and Biotechnology 44, 107-117. 10.1007/s10295-016-1855-2.
(2) Kanehisa, M., Furumichi, M., Sato, Y., Matsuura, Y., and Ishiguro-Watanabe, M. (2025). KEGG: biological systems database as a model of the real world. Nucleic Acids Res 53, D672-D677. 10.1093/nar/gkae909.
(3) Shingai, R., Wakabayashi, T., Sakata, K., and Matsuura, T. (2005). Chemotaxis of Caenorhabditis elegans during simultaneous presentation of two water-soluble attractants, L-lysine and chloride ions. Comparative biochemistry and physiology. Part A, Molecular & integrative physiology 142, 308-317. 10.1016/j.cbpa.2005.07.010.
(4) Dirksen, P., Assié, A., Zimmermann, J., Zhang, F., Tietje, A.M., Marsh, S.A., Félix, M.A., Shapira, M., Kaleta, C., Schulenburg, H., and Samuel, B.S. (2020). CeMbio - The Caenorhabditis elegans Microbiome Resource. G3 (Bethesda, Md.) 10, 3025-3039. 10.1534/g3.120.401309.
(5) Edwards, C., Canfield, J., Copes, N., Brito, A., Rehan, M., Lipps, D., Brunquell, J., Westerheide, S.D., and Bradshaw, P.C. (2015). Mechanisms of amino acid-mediated lifespan extension in Caenorhabditis elegans. BMC genetics 16, 8. 10.1186/s12863-015-0167-2.
(6) Wang, H., Wang, J., and Zhang, Z. (2018). Leucine Exerts Lifespan Extension and Improvement in Three Types of Stress Resistance (Thermotolerance, Anti-Oxidation and Anti-UV Irradiation) in C. elegans. Journal of Food and Nutrition Research 6, 665-673.
(7) Crombie, T.A., McKeown, R., Moya, N.D., Evans, K.S., Widmayer, S.J., LaGrassa, V., Roman, N., Tursunova, O., Zhang, G., Gibson, S.B., et al. (2023). CaeNDR, the Caenorhabditis Natural Diversity Resource. Nucleic Acids Research 52, D850-D858. 10.1093/nar/gkad887.
(8) Cook, D.E., Zdraljevic, S., Roberts, J.P., and Andersen, E.C. (2017). CeNDR, the Caenorhabditis elegans natural diversity resource. Nucleic Acids Res 45, D650-D657. 10.1093/nar/gkw893.
eLife Assessment
This important study characterized and identified clonal MSC populations from human synovium. The authors provide convincing evidence that clonal MSC populations can be isolated and expanded from both normal and osteoarthritic synovium and that CD47 represents a potential marker for improved chondrogenic potential of MSC sub-populations. These findings could provide new avenues for osteoarthritis treatment in the future and deeper mechanistic understanding of the factors involved in the repair.
Reviewer #1 (Public review):
Summary:
This work by Al-Jezani et al. focused on characterizing clonally derived MSC populations from the synovium of normal and osteoarthritis (OA) patients. This included characterizing the cell surface marker expression in situ (at time of isolation), as well as after in vitro expansion. The group also tried to correlate marker expression with trilineage differentiation potential. They also tested the efficacy of the different sub-populations in repairing cartilage in a rat model of OA. The main finding of the study is that CD47hi MSCs may have a greater capacity to repair cartilage than CD47lo MSCs, suggesting that CD47 may be a novel marker of human MSCs that have enhanced chondrogenic potential.
Strengths:
Studies on cell characterization of the different clonal populations isolated indicate that the MSCs are heterogeneous and that traditional cell surface markers for MSCs do not accurately predict the differentiation potential of MSCs. While this has been previously established in the field of MSC therapy, the authors did attempt to characterize clones derived from single cells, as well as evaluate the marker profile at the time of isolation. While the outcome of heterogeneity is not surprising, the methods used to isolate and characterize the cells were well developed. The interesting finding of the study is the identification of CD47 as a potential MSC marker that could be related to chondrogenic potential. The authors suggest that MSCs with high CD47 repaired cartilage more effectively than MSCs with low CD47 in a rat OA model.
Comments on revisions:
Thank you for addressing the comments from the first review. No additional revisions.
Reviewer #2 (Public review):
Summary:
This is a compelling study that systematically characterized and identified clonal MSC populations derived from normal and osteoarthritis human synovium. There is immense growth in the focus on synovial-derived progenitors in the context of both disease mechanisms and potential treatment approaches, and the authors sought to understand the regenerative potential of synovial-derived MSCs.
Strengths:
This study has multiple strengths. MSC cultures were established from an impressive number of human subjects, and rigorous cell surface protein analyses were conducted, at both pre-culture and post-culture timepoints. In vivo experiments using a rat DMM model showed beneficial therapeutic effects of MSCs vs non-MSCs, with compelling data demonstrating that only "real" MSC clones incorporate into cartilage repair tissue and express Prg4. Proteomics analysis was performed to characterize non-MSC vs MSC cultures, and high CD47 expression was identified as a marker for MSC. Injection of CD47-Hi vs CD47-Low cells in the same rat DMM model also demonstrated beneficial effects, albeit only based on histology. A major strength of these studies is the direct translational opportunity for novel MSC-based therapeutic interventions, with high potential for a "personalized medicine" approach.
Weaknesses:
Weaknesses of this study include the rather cursory assessment of the OA phenotype in the rat model, confined entirely to histology (i.e. no microCT, no pain/behavioral assessments, no molecular readouts). This is relevant given the mixed results in therapeutic experiments demonstrating lower OA scores, but not lower inflammation scores, in CD47-Hi-treated rats. Thus, future work should focus on characterizing the therapeutic mechanism further given the clinical relevance of inflammation and pain in OA. It is somewhat unclear how the authors converged on CD47 vs other factors, but despite its somewhat broad profile, it was shown to be a useful marker to differentiate functional effects of MSCs. Additional work is needed to understand whether MSCs also engraft in ectopic cartilage (in the context of osteophyte/chondrophyte formation) or whether their effects are limited to articular cartilage. Despite these areas for improvement, this is a strong paper with a high degree of rigor, and the results are compelling, timely, and important.
Overall, the authors achieved their aims, and the results support not just the therapeutic value of clonally-isolated synovial MSCs but also the immense heterogeneity in stromal cell populations (containing true MSCs and non-MSCs) that must be investigated further. Of note, the authors employed the ISCT criteria to characterize MSCs, with mixed results in pre-culture and post-culture assessments. This work is likely to have a long-term impact on methodologies used to culture and study MSCs, in addition to advancing the field's knowledge about how synovial-derived progenitors contribute to cartilage repair in vivo.
Comments on revisions:
I commend the authors for a good revision. While the revision primarily entailed re-analysis or additional analysis of existing data, as well as text-based changes, it improved the clarity and completeness of the manuscript.
I do encourage the authors to expand their phenotyping assessments in future studies given that the interaction between structural disease, inflammation, and pain is complex, and our understanding of how they interact and affect each other is evolving. There are multiple recent publications that show that a therapeutic or knock-out is protective against cartilage damage but doesn't alleviate pain, or vice versa. Thus, as a field, understanding which therapies target which pathological manifestations is an important next step to advance treatments. I also look forward to the follow-up studies on the MSCs' role in ectopic cartilage.
Author response:
The following is the authors’ response to the original reviews.
Public Reviews:
Reviewer #1 (Public review):
Summary:
This work by Al-Jezani et al. focused on characterizing clonally derived MSC populations from the synovium of normal and osteoarthritis (OA) patients. This included characterizing the cell surface marker expression in situ (at time of isolation), as well as after in vitro expansion. The group also tried to correlate marker expression with trilineage differentiation potential. They also tested the efficacy of the different subpopulations in repairing cartilage in a rat model of OA. The main finding of the study is that CD47hi MSCs may have a greater capacity to repair cartilage than CD47lo MSCs, suggesting that CD47 may be a novel marker of human MSCs that have enhanced chondrogenic potential.
Strengths:
Studies on cell characterization of the different clonal populations isolated indicate that the MSCs are heterogeneous and that traditional cell surface markers for MSCs do not accurately predict the differentiation potential of MSCs. While this has been previously established in the field of MSC therapy, the authors did attempt to characterize clones derived from single cells, as well as evaluate the marker profile at the time of isolation. While the outcome of heterogeneity is not surprising, the methods used to isolate and characterize the cells were well developed. The interesting finding of the study is the identification of CD47 as a potential MSC marker that could be related to chondrogenic potential. The authors suggest that MSCs with high CD47 repaired cartilage more effectively than MSCs with low CD47 in a rat OA model.
Weaknesses:
While the identification of CD47 as a novel MSC marker could be important to the field of cell therapy and cartilage regeneration, there was a lack of robust data to support the correlation of CD47 expression to chondrogenesis. The authors indicated that the proteomics suggested that the MSC subtype expressed significantly more CD47 than the non-MSC subtype. However, it was difficult to appreciate where this was shown. It would be helpful to clearly identify where in the figure this is shown, especially since it is the key result of the study. The authors were able to isolate CD47hi and CD47lo cells. While this is exciting, it was unclear how many cells could be isolated and whether they needed to be expanded before being used in vivo. Additional details for the CD47 studies would have strengthened the paper. Furthermore, the CD47hi cells were not thoroughly characterized in vitro, particularly for in vitro chondrogenesis. More importantly, the in vivo study where the CD47hi and CD47lo MSCs were injected into a rat model of OA lacked experimental details regarding how many cells were injected and how they were labeled. No representative histology was presented, and there did not seem to be a statistically significant difference between the OARSI scores of the saline-injected and MSC-injected groups. The repair tissue was stained for Sox9 expression, which is an important marker of chondrogenesis but does not show production of cartilage. Expression of Collagen Type II would be needed to more robustly claim that CD47 is a marker of MSCs with enhanced repair potential.
Reviewer #2 (Public review):
Summary:
This is a compelling study that systematically characterized and identified clonal MSC populations derived from normal and osteoarthritis human synovium. There is immense growth in the focus on synovial-derived progenitors in the context of both disease mechanisms and potential treatment approaches, and the authors sought to understand the regenerative potential of synovial-derived MSCs.
Strengths:
This study has multiple strengths. MSC cultures were established from an impressive number of human subjects, and rigorous cell surface protein analyses were conducted, at both pre-culture and post-culture timepoints. In vivo experiments using a rat DMM model showed beneficial therapeutic effects of MSCs vs non-MSCs, with compelling data demonstrating that only "real" MSC clones incorporate into cartilage repair tissue and express Prg4. Proteomics analysis was performed to characterize non-MSC vs MSC cultures, and high CD47 expression was identified as a marker for MSC. Injection of CD47-Hi vs CD47-Low cells in the same rat DMM model also demonstrated beneficial effects, albeit only based on histology. A major strength of these studies is the direct translational opportunity for novel MSC-based therapeutic interventions, with high potential for a "personalized medicine" approach.
Weaknesses:
Weaknesses of this study include the rather cursory assessment of the OA phenotype in the rat model, confined entirely to histology (i.e. no microCT, no pain/behavioral assessments, no molecular readouts). It is somewhat unclear how the authors converged on CD47 vs the other factors identified in the proteomics screen, and additional information is needed to understand whether true MSCs only engraft in articular cartilage or also in ectopic cartilage (in the context of osteophyte/chondrophyte formation). Some additional discussion and potential follow-up analyses focused on other cell surface markers recently described to identify synovial progenitors is also warranted. A conceptual weakness is the lack of discussion or consideration of the multiple recent studies demonstrating that DPP4+ PI16+ CD34+ stromal cells (i.e. the "universal fibroblasts") act as progenitors in all mesenchymal tissues, and their involvement in the joint is actively being investigated. Thus, it seems important to understand how the MSCs of the present study are related to these DPP4+ progenitors. Despite these areas for improvement, this is a strong paper with a high degree of rigor, and the results are compelling, timely, and important.
Overall, the authors achieved their aims, and the results support not just the therapeutic value of clonally-isolated synovial MSCs but also the immense heterogeneity in stromal cell populations (containing true MSCs and non-MSCs) that must be investigated further. Of note, the authors employed the ISCT criteria to characterize MSCs, with mixed results in pre-culture and post-culture assessments. This work is likely to have a long-term impact on methodologies used to culture and study MSCs, in addition to advancing the field's knowledge about how synovial-derived progenitors contribute to cartilage repair in vivo.
Recommendations for the authors:
Reviewer #1 (Recommendations for the authors):
In all figures, it would be beneficial to report the sample number used for the data reported. It is difficult to appreciate the statistical analysis without that information.
Understood, the sample number and replicates have been added to each figure legend.
Please check that Table S7 is part of the manuscript. It could not be found.
It was added as an additional Excel file since it was too large to fit in the Word document.
Lines 377-379 (Figure 2E): the authors write that rats receiving MSCs had a significantly lower OARSI and Krenn score vs. rats injected with non-MSCs. However, none of the bars indicating statistical significance run between these two groups. Please verify the text and figure.
This has been corrected.
The details surrounding the labeling of the cells with tdTomato were not presented in the methods.
This has been added to the methods.
The fluorescent antibodies used should be listed and more details provided in the methods rather than a general statement that fluorescent antibodies were used.
Our apologies, the clones and companies have been added.
Additional information on the CD47 experiments (# cells, # animals) would have strengthened the study.
This has been added to the methods and figure legend.
Reviewer #2 (Recommendations for the authors):
My comments span minor corrections, requests for additional analyses, some suggestions for additional experiments, and requests for additional discussion of recent important studies.
Introduction:
The introduction is thorough and well-written. I recommend a brief discussion about the emerging evidence demonstrating that DPP4+ PI16+ CD34+ synovial cells, i.e. the "universal fibroblasts", act as stromal progenitors in development, homeostasis, and disease. Relevant osteoarthritis-related papers encompass human and mouse studies (PMIDs: 39375009, 38266107, 38477740, 36175067, 36414376).
This has been added.
Relatedly, as DPP4 is CD26 and therefore useful as a cell-surface antigen for flow cytometry, sorting, etc, it would be interesting to understand the relationship and similarities between the CD47-High cells identified in this study and the DPP4/PI16+ cells previously described. Do they overlap in phenotype/identity?
We have added a new flow cytometry figure to address this question.
Results:
Note typo on Line 311: "preformed" instead of "performed". Line 313: "prolife" instead of "profile".
Thank you for catching these.
The identified convergence of the cell surface marker profile of all normal and OA clones in culture is a highly intriguing result. Do the authors have stored aliquots of these cells to demonstrate whether this would also occur in soft substrate, i.e. low stiffness culture conditions? This could be done with standard dishes coated with bulk collagen or with commercially available low-stiffness dishes (1 kPa). This is relevant to multiple studies demonstrating the induction of a myofibroblast-like phenotype by stromal cells cultured on high-stiffness plastic or glass. This is also the experiment where assessment of DPP4/CD26 could be added, if possible.
While we agree it would be interesting to determine the mechanism by which the cell phenotypes converge, we would argue that it is outside the scope of the current manuscript. We have instead added a sentence to the discussion.
Line 353 regarding the use of CD68 as a negative gate: can the authors please comment on why they employed CD68 here and not CD45? While monocytes/macs/myeloid cells are the most abundant immune cells in synovium, CD45 would more comprehensively exclude all immune cells.
That is a fair point, and we really don’t have any reason to have picked CD68 over CD45. In our opinion either would be a fair negative marker to use based on the literature.
Fig 2, minor suggestion: consider adding "MSC vs non-MSC" on the experimental schematic to more comprehensively summarize the experiment.
This has been modified.
Fig 2E should show all individual datapoints, not just bar graphs.
This has been modified.
Fig 2: Given the significant reduction in Krenn score in DMM-MSC injected knees compared to DMM-saline knees, Fig 2 should also show representative images of the synovial phenotype to demonstrate which aspects of synovial pathology were mitigated. Was the effect related to lining hyperplasia, subsynovial infiltrate, fibrosis, etc? Similarly, can the authors narrate which aspects of the OARSI score drove the treatment effect (proteoglycans vs structure vs osteophytes, etc).
We have added a new supplementary figure breaking down the Krenn score as well as higher magnification images of representative synovium.
Fig 2: In the absence of microCT imaging, can the authors quantify subchondral bone morphometrics using multiple histological sections? The tibial subchondral bone in Fig 2D appears protected from sclerosis/thickening.
Unfortunately, this is beyond what we are able to add to the manuscript.
The Fig 3 results are highly compelling and interesting. Congratulations.
Thank you very much.
Fig 4A: the cell highlighted in the high-mag zoom box in Fig 4A appears to be localized within the joint capsule or patellar tendon (it is unclear which anatomic region this image represents). The highly aligned nature of the tissue and cells along a fibrillar geometry indicates that this is not synovium. The interface between synovium and the tissue in question can be clearly observed in this image. Please choose an image more representative of synovium.
We completely agree with the reviewer's assessment. However, it is the synovium that overlays this tissue (Fig 4A arrow). We are simply showing that there were very few MSCs that took up residence in the synovium or the adjacent tissues.
Fig 4C and F: please show individual data points.
This has been added
Fig 5D: I see DPP4 and ITGA5 were also hits in the proteomics analysis, which is intriguing. Besides my comments/suggestions regarding DPP4 above, please note this recent paper identifying an ITGA5+ synovial fibroblast subset that orchestrates pathological crosstalk with lymphocytes in RA, PMID: 39486872.
Thank you for the information. We have added the reference in the results section.
Fig 5B-D: How did the authors converge on CD47 as the target for follow-up study? It does not appear to be a differentially-expressed protein based on the Volcano plot in Fig 5B, and it's unclear why it is a more important factor than any of the other proteins shown in the network diagram in Fig 5D, e.g. CTSL, ITGA5, DPP4. Can the authors add a quantitative plot supporting their statement "the MSC sub-type expressed significantly more CD47 than the non-MSCs" on Line 458?
We have re-written this line. It was incorrect to discuss amount of CD47. That was shown later with the flow analysis.
Fig 6D: Please show individual data points and also representative histology images to demonstrate the nature of the phenotypic effect.
This has been added.
Fig 6E-F: In what anatomic region are these images? Please add anatomic markers to clarify the location and allow the reader to interpret whether this is articular cartilage or ectopic cartilage
We have redone the figure to show the area as requested.
Relevant to this, do the authors observe this type of cellular engraftment in ectopic cartilage/osteophytes or only in articular cartilage? Understanding the contribution of these cells to the formation/remodeling of various cartilage types in the context of OA is a critical aspect of this line of investigation.
We didn’t see any contribution of these cells to ectopic cartilage formation and are actively working on a follow-up study discussing this point specifically.
Discussion:
Besides my comments regarding DPP4 and ITGA5 above, the authors may also consider discussing PMID: 37681409 (JCI Insight 2023), which demonstrates that adult Prg4+ progenitors derived from synovium contribute to articular cartilage repair in vivo.
We agree that there are numerous markers we could look at in future studies and that other people in the field are actively studying.
-
-
-
eLife Assessment
This important study shows that a controlled pause in gene reading is required for early heart cells to form during development. The authors demonstrate that loss of this pause prevents the proper activation of the heart-producing program across animal and stem cell systems. The evidence is compelling, supported by careful genomic and functional analyses that clearly define the developmental block. Overall, this work will interest developmental biologists and inspire further studies on the origins of early heart defects.
-
Reviewer #1 (Public review):
This is a highly original and impactful study that significantly advances our understanding of transcriptional regulation, in particular RNAPII pausing, during early heart development. The Chen lab has a long history of producing influential studies in cardiac morphogenesis, and this manuscript represents another thorough and mechanistically insightful contribution. The authors have thoroughly addressed this Reviewer's concerns and incorporated all of my suggestions in the revised manuscript. In addition, their responses to the other reviewer's comments are also very clear. As it is, this work is of great interest to the readership of eLife, as well as to the general scientific community.
The authors reveal a fundamentally new role for Rtf1, a component of the PAF1 complex, in governing promoter-proximal RNAPII pausing in the context of myocardial lineage specification. While transcriptional pausing has been implicated in stress responses and inducible gene programs, its developmental relevance has remained poorly defined. This study fills that gap with rigorous in vivo evidence demonstrating that Rtf1-dependent pausing is indispensable for activating the cardiac gene program from the lateral plate mesoderm.
Importantly, the study also provides compelling therapeutic implications. Showing that CDK9 inhibition, using either flavopiridol or targeted knockdown, can restore promoter-proximal pausing and rescue cardiomyocyte formation in Rtf1-deficient embryos suggests that modulation of pause-release kinetics may represent a new avenue for correcting transcriptionally driven congenital heart defects. Given that many CDK inhibitors are clinically approved or in active development, this connection significantly elevates the translational impact of the findings.
In sum, this study is rigorous, innovative, and transformative in its implications for developmental biology and cardiac medicine. I strongly support its publication.
-
Reviewer #2 (Public review):
Summary:
Langenbacher et al. examine the requirement of Rtf1, a component of the PAF1C complex, which regulates transcriptional pausing in cardiac development. The authors first confirm that newly generated rtf1 mutant alleles recapitulate the defects in cardiac progenitor differentiation found using morpholinos from their previous work. The authors then show that conditional loss of Rtf1 in mouse embryos and depletion in mouse ESCs both demonstrate a failure to turn on cardiac progenitor and differentiation marker genes, supporting conservation of Rtf1 in promoting vertebrate cardiac progenitor development. The authors then employ bulk RNA-seq on flow-sorted hand2:GFP+ cells and multiomic single-cell RNA-seq on whole Rtf1-depleted zebrafish embryos at the 10-12 somite stage. These experiments corroborate that gene expression associated with cardiac progenitor differentiation is lost. Furthermore, analysis of differentiation trajectories suggests that the expression of genes associated with cardiac, blood, and endothelial progenitor differentiation is not initiated within the anterior lateral plate mesoderm. Structure-function analysis supports that the Rtf1 Plus3 domain is necessary for its function in promoting cardiac progenitor differentiation. ChIP-seq for RNA Pol II on 10-12 somite stage zebrafish embryos supports that Rtf1 is required for proper promoter pausing at the transcriptional start site. The transcriptional promoter pausing defect and cardiac differentiation can partially be rescued in zebrafish rtf1 mutants through pharmacological inhibition and depletion of Cdk9, a kinase that inhibits elongation. Thus, the authors have provided a clear analysis of the requirements and basic mechanism that Rtf1 employs in regulating cardiac progenitor development.
Strengths and weaknesses:
Overall, the data presented are strong and the message of the study is clear. The conclusions that Rtf1 is required for transcriptional pause release and promotes vertebrate cardiac progenitor differentiation are supported. Areas of strength include the complementary approaches in zebrafish and mouse embryos, and mouse embryonic stem cells, which together support the conserved requirement for Rtf1 in promoting cardiac differentiation. The bulk and single-cell RNA-sequencing analyses provide further support for this model via examining broader gene expression. In particular, the pseudotime analysis bolsters that there is a broader effect on differentiation of anterior lateral plate mesoderm derivatives. The structure-function analysis provides a relatively clean demonstration of the requirement of the Rtf1 Plus3 domain. The pharmacological and depletion epistasis of Cdk9 combined with the RNA Pol II ChIP-seq nicely support the mechanism implicating Cdk9 in the Rtf1-dependent RNA Pol II promoter pausing. Additionally, this is a revised manuscript. The authors were overall responsive to the previous critiques. The new analysis and revisions have helped to strengthen their hypothesis and improve the clarity of their study. While the revised manuscript is significantly improved, the lack of analysis from the multiomic analysis still represents a lost opportunity to provide further insight into Rtf1 mechanisms within this study. However, the authors have nevertheless achieved their goal for this study. The data sets reported will also be useful tools for further analysis and integration by the cardiovascular development community. Thus, the study will be of interest to scientists studying cardiovascular development and those broadly interested in epigenetic regulation controlling vertebrate development.
-
Author response:
The following is the authors’ response to the original reviews.
Public Reviews:
Reviewer #1 (Public Review):
Summary:
The manuscript submitted by Langenbacher et al., entitled "Rtf1-dependent transcriptional pausing regulates cardiogenesis", describes very interesting and highly impactful observations about the function of Rtf1 in cardiac development. Over the last few years, the Chen lab has published novel insights into the genes involved in cardiac morphogenesis. Here, they used the mouse model, the zebrafish model, cellular assays, single cell transcription, chemical inhibition, and pathway analysis to provide a comprehensive view of Rtf1 in RNAPII (Pol2) transcription pausing during cardiac development. They also conducted knockdown-rescue experiments to dissect the functions of Rtf1 domains.
Strengths:
The most interesting discovery is the connection between Rtf1 and CDK9 in regulating Pol2 pausing as an essential step in normal heart development. The design and execution of these experiments also demonstrate a thorough approach to revealing a previously underappreciated role of Pol2 transcription pausing in cardiac development. This study also highlights the potential amelioration of related cardiac deficiencies using small molecule inhibitors against cyclin dependent kinases, many of which are already clinically approved, while many other specific inhibitors are at various preclinical stages of development for the treatment of other human diseases. Thus, this work is impactful and highly significant.
We thank the reviewer for appreciating our work.
Reviewer #2 (Public Review):
Summary:
Langenbacher et al. examine the requirement of Rtf1, a component of the PAF1C, which regulates transcriptional pausing in cardiac development. The authors first confirm their previous morphant study with newly generated rtf1 mutant alleles, which recapitulate the defects in cardiac progenitor and differentiation gene expression observed previously in morphants. They then examine the conservation of Rtf1 in mouse embryos and embryonic stem cell-derived cardiomyocytes. Conditional loss of Rtf1 in mesodermal lineages and depletion in murine ESCs demonstrate a failure to turn on cardiac progenitor and differentiation marker genes, supporting conservation of Rtf1 in promoting cardiac development. The authors subsequently employ bulk RNA-seq on flow-sorted hand2:GFP+ cells and multiomic single-cell RNA-seq on whole Rtf1-depleted embryos at the 10-12 somite stage. These experiments corroborate that genes associated with cardiac and muscle development are lost. Furthermore, the differentiation trajectories suggest that the expression of genes associated with cardiac maturation is not initiated. Structure-function analysis supports that the Plus3 domain is necessary for its function in promoting cardiac progenitor formation. ChIP-seq for RNA Pol II on 10-12 somite stage embryos suggests that Rtf1 is required for proper promoter pausing. This defect can partially be rescued in rtf1 mutants through use of a pharmacological inhibitor for Cdk9, which inhibits elongation.
Strengths:
Many aspects of the data are strong and support the basic conclusions of the authors that Rtf1 is required for transcriptional pausing and has a conserved requirement in vertebrate cardiac development. Areas of strength include the genetic data supporting the conserved requirement for Rtf1 in promoting cardiac development, the complementary bulk and single-cell RNA-sequencing approaches providing some insight into the gene expression changes of the cardiac progenitors, the structure-function analysis supporting the requirement of the Plus3 domain, and the pharmacological epistasis combined with the RNA Pol II ChIP-seq, supporting the mechanism implicating Cdk9 in the Rtf1-dependent mechanism of RNA Pol II pausing.
We thank the reviewer for the summary and for recognizing many strengths of our work.
Weaknesses:
While most of the basic conclusions are supported by the data, there are a number of analyses where it is unclear why the authors chose to perform the experiments the way they did, and some places where the data presently do not support the interpretations. One of the conclusions is that the phenotype affects the maturation of the cardiomyocytes and they are arresting in an immature state. However, this seems to be mostly derived from picking a few candidates from the single cell data in Fig. 6. If that were the case, wouldn't the expectation be to observe relatively normal expression of earlier marker genes required for specification, such as Nkx2.5 and Gata5/6? The in situ expression analysis from fish and mice (Fig. 2 and Fig. 3) and bulk RNA-seq (Fig. 5) seems to suggest that there are pretty early specification and differentiation defects. While some genes associated with cardiac development are not changed, many of these are not specific to cardiomyocyte progenitors and are expressed broadly throughout the ALPM. Similarly, it is not clear why a consistent set of cardiac progenitor genes (for instance mef2ca, nkx2.5, and tbx20) was not analyzed for all the experiments, in particular with the single cell analysis.
A major conclusion of our study is that Rtf1 deficiency impairs myocardial lineage differentiation from mesoderm, as suggested by the reviewer. Thus, the main goal of this study is to understand how Rtf1 drives cardiac differentiation from the LPM, rather than the maturation of cardiomyocytes. Multiple lines of evidence support this conclusion:
(a) In situ hybridization showed that Rtf1 mutant embryos do not have nkx2.5+ cardiac progenitor cells and subsequently fail to produce cardiomyocytes (Figs. 2, 3).
(b) RT-PCR analysis showed that knockdown of Rtf1 in mouse embryonic stem cells causes a dramatic reduction of cardiac gene expression and production of significantly fewer beating patches (Fig.4).
(c) Bulk RNA sequencing revealed significant downregulation of cardiac lineage genes, including nkx2.5 (Fig. 5).
(d) Single cell RNA sequencing clearly showed that lateral plate mesoderm (LPM) cells are significantly more abundant in Rtf1 morphants, whereas cardiac progenitors are less abundant (Fig. 6 and Fig.6 Supplement 1-5).
When feasible, we used cardiac lineage restricted markers in our assays. Nkx2.5 and tbx5a are not highlighted in the single cell analysis because their expression in our sc-seq dataset was too low to examine in the clustering/trajectory analysis. In this revised manuscript, we provide violin plots showing the low expression levels of these genes in single cells from Rtf1 deficient embryos (Figure 6 Supplement 5).
The point of the multiomic analysis is confusing. RNA- and ATAC-seq were apparently done at the same time. Yet, the focus of the analysis that is presented is on a small part of the RNA-seq data. This data set could have been more thoroughly analyzed, particularly in light of how chromatin changes may be associated with the transcriptional pausing. This seems to be a lost opportunity. Additionally, how the single cell data is covered in Supplemental Fig. 2 and 3 is confusing. There is no indication of what the different clusters are in the Figure or the legend.
In this study, we performed single cell multiome analysis and used both scRNAseq and scATACseq datasets to generate reliable clustering. The scRNAseq analysis reveals how Rtf1 deficiency impacts cardiac differentiation from mesoderm, which inspired us to investigate the underlying mechanism and led to the discovery of defects in Rtf1-dependent transcriptional pause release.
We agree with the reviewer that deep examination of Rtf1-dependent chromatin changes would provide additional insights into how Rtf1 influences early development and careful examination of the scATACseq dataset is certainly a good future direction.
In this revised manuscript, we have revised Fig.6 Supplement 1 to include the predicted cell types and provide an additional excel file showing the annotation of all 39 clusters (Supplementary Table 2).
While the effect of Rtf1 loss on cardiomyocyte markers is certainly dramatic, it is not clear how well the mutant fish have been analyzed and how specific the effect is to this population. It is interpreted that the effects on cardiomyocytes are not due to "transfating" of other cell fates, yet supplemental Fig. 4 shows numerous effects on potentially adjacent cell populations. Minimally, additional data needs to be provided showing the live fish at these stages and marker analysis to support these statements. In some images, it is not clear the embryos are the same stage (one can see pigmentation in the eyes of controls that is not in the mutants/morphants), causing some concern about developmental delay in the mutants.
Single cell RNA sequencing showed an increased abundance of LPM cells and a reduced abundance of cardiac progenitors in Rtf1 morphants (Fig. 6 and Fig.6 Supplement 1-5). The reclustering of anterior lateral plate mesoderm (ALPM) cells and their derivatives further showed that cells representing undifferentiated ALPM were increased whereas cells representing all three ALPM derivatives were reduced. These findings indicate a defect in ALPM differentiation.
The reviewer questioned whether we examined stage-matched embryos. In our assay, Rtf1 mutant embryos were collected from crosses of Rtf1 heterozygotes. Each clutch from these crosses consists of ¼ embryos showing rtf1 mutant phenotypes and ¾ embryos showing wild type phenotypes which were used as control. Mutants and their wild type siblings were fixed or analyzed at the same time.
The reviewer questioned the specificity of the Rtf1 deficient cardiac phenotype and pointed out that Rtf1 mutant embryos do not have pigment cells around the eye. Rtf1 is a ubiquitously expressed transcriptional regulator. Previous studies in zebrafish have shown that Rtf1 deficiency significantly impacts embryonic development. Rtf1 deficiency causes severe defects in cardiac lineage and neural crest cell development; consequently, Rtf1 deficient embryos do not have cardiomyocytes and pigmentation (Langenbacher et al., 2011, Akanuma et al., 2007, and Jurynec et al., 2019). We now provide an image showing a 2-day-old Rtf1 mutant embryo and their wild type sibling to illustrate the cardiac, neural crest, and somitogenesis defects caused by loss of Rtf1 activity (Fig. 2 Supplement 1).
With respect to the transcriptional pausing defects in the Rtf1 deficient embryos, it is not clear from the data how this effect relates to the expression of the cardiac markers. This could have been directly analyzed with some additional sequencing, such as PRO-seq, which would provide a direct analysis of transcriptional elongation.
We showed that Rtf1 deficiency results in a nearly genome-wide decrease in promoter-proximal pausing and downregulation of cardiac markers. Attenuating transcriptional pause release could restore cardiomyocyte formation in Rtf1 deficient embryos. In this revised manuscript, we provide additional RNAseq data showing that the expression levels of critical cardiac development genes such as nkx2.5, tbx5a, tbx20, mef2ca, mef2cb, ttn.2, and ryr2b are significantly rescued. We agree with the reviewer that further analyses using the PRO-seq approach could provide additional insights, but it is beyond the scope of this manuscript.
Some additional minor issues include the rationale that sequence conservation suggests an important requirement of a gene (line 137), for which there are many examples where this isn't the case; referencing figure panels out of order (in Figs. 4, 7, and 8) as described in the text; and using the morphants for some experiments, such as the rescue, that could have been done in a blinded manner with the mutants.
We have clarified the rationale in this revised manuscript and made the effort to reference figures in order.
The reviewer commented that rescue experiments “could have been done in a blinded manner with the mutants”. This was indeed how the flavopiridol rescue and cdk9 knockdown experiments were carried out. Embryos from crosses of Rtf1 heterozygotes were collected, fixed after treatment and subjected to in situ hybridization. Embryos were then scored for cardiac phenotype and genotyped (Fig.8 d-g). Morpholino knockdown was used in genomic experiments because our characterization of rtf1 morphants showed that they faithfully recapitulate the rtf1 mutant phenotype during the timeframe of interest (Fig. 2).
Recommendations for the authors:
Reviewer #1 (Recommendations For The Authors):
This reviewer has a few suggestions below, aimed at improving the clarity and impact of the current study. Once these items are addressed, the manuscript should be of interest to the eLife reader.
Item 1. Strengthening the interaction between Rtf1 and CDK9 on Pol2 pausing.
The authors have convincingly shown that the chemical inhibition of CDK9 by flavopiridol can partially rescue the expression of cardiac genes in the zebrafish model. Although flavopiridol is FDA approved and has been a classical inhibitor for the dissection of CDK9 function, it also inhibits related CDKs: flavopiridol (alvocidib) competes with ATP to inhibit CDKs including CDK1, CDK2, CDK4, CDK6, and CDK9, with IC50 values in the 20-100 nM range. Therefore, this study could be more impactful if the authors can provide evidence on which of these CDKs may be most relevant during Rtf1-dependent cardiogenesis. Whether the observed cardiac defect indicates a preferential role for CDK9, or whether other CDKs may also be able to provide partial rescue, may be clarified using additional, more selective small molecules (e.g., BAY1251152 and LDC000067, which are commercially available).
The reviewer raised a reasonable concern about the specificity of flavopiridol. We thank the reviewer for the insightful suggestion and share the concern about specificity. To address this question, we used an orthogonal test, morpholino inhibition, in which we directly targeted CDK9 and observed the same level of rescue, supporting a critical role of transcription pausing in cardiogenesis.
Item 2. Differences between CRISPR lines and morphants
Much of the work presented used Rtf1 morphants while the authors have already generated 2 CRISPR lines. What is the difference between morphants and mutants? The authors should comment on the similarities and/or differences between using morphants or mutants in their study and whether the same Rtf1-CDK9 connection also occurs in the CRISPR lines.
The morphology of our mutants (rtf1<sup>LA2678</sup> and rtf1<sup>LA2679</sup>) resembles the morphants and the previously reported ENU-induced rtf1<sup>KT641</sup> allele. Extensive in situ hybridization analysis showed that the morphants faithfully recapitulate the mutant phenotypes (Fig.2). We have performed rescue experiments (flavopiridol and CDK9 morpholino) using Rtf1 mutant embryos and found that inhibiting Cdk9 restores cardiomyocyte formation (Fig.8).
Item 3. Discuss the therapeutic relevance of study
The authors have already generated a mouse model of Rtf1 Mesp1-Cre knockout where cardiac muscle development is severely derailed (Fig 3B). Thus, a demonstration of a conserved role for CDK9 inhibitor in rescuing cardiogenesis using mouse cells or the mouse model will provide important information on a conserved pathway function relevant to mammalian heart development. In the Discussion, how this underlying mechanistic role may be useful in the treatment of congenital heart disease should be provided.
Thank you for the insight. We have incorporated your comments in the discussion.
Item 4. Insights into the role of CDK9-Rtf1 in response to stress versus in cardiogenesis.
In the Discussion, the authors commented on the role of additional stress-related stimuli such as heat shock and inflammation that have been linked to CDK9 activity. However, the current ms provides the first, endogenous role of Pol2 pausing in a critical developmental step during normal cardiogenesis. The authors should emphasize the novelty and significance of their work by providing a paragraph on the state of knowledge on the molecular mechanisms governing cardiogenesis, then placing their discovery within this framework. This minor addition will also clarify the significance of this work to the broad readership of eLife.
Thank you for the suggestion. We have incorporated your comments and elaborate on the novelty and significance of our work in the discussion.
Reviewer #2 (Recommendations For The Authors):
(1) It is difficult to assess what the overt defects are in the embryos at any stages. Images of live embryos were not included in the supplement. Do these have a small, malformed heart tube later or are the embryos just deteriorating due to broad defects?
The Rtf1 deficient embryos do not produce nkx2.5+ cardiac progenitors. Consequently, we never observed a heart tube or detected cells expressing cardiomyocyte marker genes such as myl7. This finding is consistent with previous reports using rtf1 morphants and rtf1<sup>KT641</sup>, an ENU-induced point mutation allele (Langenbacher et al., 2011 and Akanuma, 2007). In this revised manuscript, we provide a live image of 2-day-old wild type and rtf1<sup>LA2679/LA2679</sup> embryos (Fig. 2 Supplement 1). After two days, rtf1 mutant embryos undergo broad cell death.
(2) Fig. 2, although the in situs are convincing, there is not a quantitative assessment of expression changes for these genes. This could have been done for the bulk or single cell RNA-seq experiments, but was not, and these genes were not included in the heat maps. A quantitative assessment of these genes would benefit the study.
The top 40 most significantly differentially expressed genes are displayed in the heatmap presented in Fig.5d. The complete differential gene expression analysis results for our hand2 FACS-based comparison of rtf1 morphants and controls are presented in Supplementary Data File 1. In this revised manuscript, we provide a new supplemental figure with violin plots showing the expression levels of genes of interest in our single cell sequencing dataset (Fig.6 Supplement 5).
(3) It does not appear that any statistical tests were used for the comparisons in Fig. 2.
We now provide the statistical data in the legend and Fig.2 b, d, f, h and i.
(4) It's not clear the magnifications and orientations of the embryos in Fig. 3b are the same.
Embryos shown in Fig.3b are at the same magnification. However, because Rtf1 mutant embryos display severe morphological defects, the orientation of mutant embryos was adjusted to examine the cardiac tissue.
(5) The n's for analysis of MLC2v in WT Rtf1 CKO embryos in Fig. 3b are only 1. At least a few more embryos should be analyzed to confirm that the phenotype is consistent.
We have revised the figure and present the number of embryos analyzed and statistics in Fig.3c.
(6) A number of figure panels are referred to out of order in the text. Fig. 4E-G are before Fig. 4C, D, Fig. 7C before 7B, Fig. 8D-I before 8A, B. In general, it is easier for the reader if the figure panels are presented in the order they are referred to in the text.
Revised as suggested.
(7) While additional genes can be included, it is not clear why the same sets of genes are not examined in the bulk or single-cell RNA-seq as with the in situs or expression was analyzed in embryos. I suggest including the genes like nkx2.5, tbx20, myl7, in all the sequencing analysis.
We used the same set of genes in all analyses when possible. However, the low expression of genes such as nkx2.5 and myl7 in our sc-seq dataset precludes them from the clustering/trajectory analysis. In this revised manuscript, we present violin plots showing their expression in wild type and rtf1 morphants (Fig. 6 Supplement 5).
(8) If a multiomic approach was used, why wasn't its analysis incorporated more into the manuscript? In general, a clearer presentation and deeper analysis of the single cell data would benefit the study. The integration of the RNA and ATAC would benefit the analysis.
As addressed in our response to the reviewer’s public review, both datasets were used in clustering. Examining changes in chromatin accessibility is certainly interesting, but beyond the scope of this study.
(9) Many of the markers analyzed are not cardiac specific or it is not clear they are expressed in cardiac progenitors at the stage of the analysis. Hand2 has broader expression. Additional confirmation of some of the genes through in situ would help the interpretations.
Markers used for the in situ hybridization analysis (myl7, mef2ca, nkx2.5, tbx5a, and tbx20) are known for their critical role in heart development. For sc-seq trajectory analyses, most displayed genes (sema3e, bmp6, ttn.2, mef2cb, tnnt2a, ryr2b, and myh7bb) were identified based on their differential expression along the LPM-cardiac progenitor pseudotime trajectory. Rather than selecting genes based on their cardiac specificity, our goal was to examine the progressive gene expression changes associated with cardiac progenitor formation and compare gene expression of wild type and rtf1 deficient embryos.
(10) Additional labels of the cell clusters are needed for Supplemental Figs. 2 and 3.
The cluster IDs were presented on Supplementary Figures 2 and 3. In this revised version, we added predicted cell types to the UMAP (revised Fig.6 Supplement 1) and provided an excel file with this information (revised Supplementary Table 2).
(11) On lines 101-102, the interpretation from the previous data is that differentiation of the LPM requires Rtf1. However, later from the single cell data the interpretation based on the markers is that Rtf1 loss affects maturation. However, it is not clear this interpretation is correct or what changed from the single cell data. If that were the case, one would expect to see maintenance of more early marks and subsequent loss of maturation markers, which does not appear to be the case from the presented data.
Our data suggests that cardiac progenitor formation is not accomplished by simultaneously switching on all cardiac marker genes. Our pseudotime trajectory analysis highlights tnnt2a, ryr2b, and myh7bb as genes that increase in expression in a lagged manner compared to mef2cb (Fig. 6). Thus, the abnormal activation of mef2cb without subsequent upregulation of tnnt2a, ryr2b, and myh7bb in rtf1 morphants suggests a requirement for rtf1 in the progressive gene expression changes required for proper cardiac progenitor differentiation. Our single cell experiment focuses on the process of cardiac progenitor differentiation and does not provide insights into cardiomyocyte maturation. We have edited the text to clarify these interpretations.
(12) The interpretation that there is not "transfating" is not supported by the shown data. Analysis of markers in other tissues, again with in situ, to show spatially would benefit the study.
As stated in our response to the reviewer’s public review, we observed a dramatic increase of ALPM cells, but a decrease of ALPM derivatives including the cardiac lineage. We did not observe the expansion of one ALPM-derived subpopulation at the expense of the others. These observations suggest a defect in ALPM differentiation and argue against the notion that the region of the ALPM that would normally give rise to cardiac progenitors is instead differentiating into another cell type.
(13) The rationale that sequence conservation means a gene is important (lines 137-139) is not really true. There are many examples of highly conserved genes whose mutants don't have defects.
We have revised the text to avoid confusion.
(14) The data showing that the 8 bp mutations do not affect the RNA transcript is not shown or at least indicated in Fig. 7. It would seem that this experiment could have been done in the mutant embryos, in which case the experiment would have been semi-blinded as the genotyping would occur after imaging.
The modified Rtf1 wt RNA (Rtf1 wt* in revised Fig. 7) robustly rescued nkx2.5 expression in rtf1 deficient embryos, demonstrating that the 8 bp modifications do not negatively impact the activity of the injected RNA. As stated previously, morpholino knockdown was used in some experiments because our characterization of rtf1 morphants showed that they faithfully recapitulate the rtf1 mutant phenotype during the timeframe of interest.
(15) Using a technique like PRO-seq at the same stage as the ChIP-seq would complement the ChIP-seq and allow a more detailed analysis of the transcriptional pausing on specific genes observed in WT and mutant embryos.
As stated in our response to the reviewer’s public review, we appreciate the suggestion but PRO-seq is beyond the scope of this study.
-
eLife Assessment
In the gram-positive model organism Bacillus subtilis, the membrane-associated ParA family member MinD concentrates the division inhibitor MinC at cell poles, where it prevents aberrant division events. This important study presents compelling data suggesting that polar localization of MinCD is largely due to differences in diffusion rates between monomeric and dimeric MinD. This finding is exciting as it negates the necessity for a third localization determinant in this system, as has been proposed by previous investigations.
-
Reviewer #1 (Public review):
The authors used fluorescence microscopy, image analysis, and mathematical modeling to study the effects of membrane affinity and diffusion rates of MinD monomer and dimer states on MinD gradient formation in B. subtilis. To test these effects, the authors experimentally examined MinD mutants that lock the protein in specific states, including Apo monomer (K16A), ATP-bound monomer (G12V) and ATP-bound dimer (D40A, hydrolysis defective), and compared them with wild-type MinD. Overall, the experimental results support the conclusions that reversible membrane binding of MinD is critical for the formation of the MinD gradient, but that the membrane binding affinities of monomers and dimers are similar.
The modeling part is a new attempt to use the Monte Carlo method to test the conditions for the formation of the MinD gradient in B. subtilis. The modeling results provide good support for the observations and find that the MinD gradient is sensitive to different diffusion rates between monomers and dimers. This simulation is based on several assumptions and predictions, which raises new questions that need to be addressed experimentally in the future.
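The one ingredient singled out here, different diffusion rates for monomers and dimers, can be illustrated with a minimal Monte Carlo sketch. This is not the authors' simulation: it is a free one-dimensional random walk with no membrane binding, dimerization, or cell geometry, so it does not by itself produce a polar gradient. The diffusion coefficients, time step, and particle counts are hypothetical values chosen only for illustration.

```python
import math
import random


def mean_squared_displacement(diffusion_coeff, n_particles=2000,
                              n_steps=200, dt=0.01, seed=0):
    """Estimate the 1D mean squared displacement of freely diffusing particles.

    Each step is drawn from a Gaussian with variance 2 * D * dt, the
    standard discretization of Brownian motion in one dimension.
    """
    rng = random.Random(seed)
    step_sd = math.sqrt(2.0 * diffusion_coeff * dt)
    total = 0.0
    for _ in range(n_particles):
        x = 0.0
        for _ in range(n_steps):
            x += rng.gauss(0.0, step_sd)
        total += x * x
    return total / n_particles


# Hypothetical diffusion coefficients (arbitrary units), not values from the study.
D_MONOMER = 1.0
D_DIMER = 0.1

msd_monomer = mean_squared_displacement(D_MONOMER)
msd_dimer = mean_squared_displacement(D_DIMER)

# Theory for free 1D diffusion: MSD = 2 * D * t, with t = n_steps * dt = 2.0
print(f"monomer MSD: {msd_monomer:.2f} (theory {2 * D_MONOMER * 2.0:.2f})")
print(f"dimer MSD:   {msd_dimer:.2f} (theory {2 * D_DIMER * 2.0:.2f})")
```

In this toy setting the slower species simply stays closer to where it started; whether that suffices for polar accumulation depends on the binding and dimerization assumptions of the full model, which the sketch deliberately leaves out.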
-
Reviewer #3 (Public review):
This important study by Bohorquez et al examines the determinants necessary for concentrating the spatial modulator of cell division, MinD, at the future site of division and the cell poles. Proper localization of MinD is necessary to bring the division inhibitor, MinC, in proximity to the cell membrane and cell poles where it prevents aberrant assembly of the division machinery. In contrast to E. coli, in which MinD oscillates from pole-to-pole courtesy of a third protein MinE, how MinD localization is achieved in B. subtilis, which does not encode a MinE analog, has remained largely a mystery. The authors present compelling data indicating that MinD dimerization is dispensable for membrane localization but required for concentration at the cell poles. Dimerization is also important for interactions between MinD and MinC, leading to the formation of large protein complexes. Computational modeling, specifically a Monte Carlo simulation, supports a model in which differences in diffusion rates between MinD monomers and dimers lead to concentration of MinD at cell poles. Once there, interaction with MinC increases the size of the complex, further reinforcing diffusion differences. Notably, interactions with MinJ, which has previously been implicated in MinCD localization, are dispensable for concentrating MinD at cell poles, although MinJ may help stabilize the MinCD complex at those locations.
[Editor's note: The editors and reviewers have no further comments and encourage the authors to proceed with a Version of Record.]
-
Author response:
The following is the authors’ response to the previous reviews
Public Review:
Reviewer #1 (Public review):
The authors used fluorescence microscopy, image analysis, and mathematical modeling to study the effects of membrane affinity and diffusion rates of MinD monomer and dimer states on MinD gradient formation in B. subtilis. To test these effects, the authors experimentally examined MinD mutants that lock the protein in specific states, including Apo monomer (K16A), ATP-bound monomer (G12V) and ATP-bound dimer (D40A, hydrolysis defective), and compared them with wild-type MinD. Overall, the experimental results support the conclusions that reversible membrane binding of MinD is critical for the formation of the MinD gradient, but that the membrane binding affinities of monomers and dimers are similar.
The modeling part is a new attempt to use the Monte Carlo method to test the conditions for the formation of the MinD gradient in B. subtilis. The modeling results provide good support for the observations and find that the MinD gradient is sensitive to different diffusion rates between monomers and dimers. This simulation is based on several assumptions and predictions, which raises new questions that need to be addressed experimentally in the future.
Reviewer #3 (Public review):
This important study by Bohorquez et al examines the determinants necessary for concentrating the spatial modulator of cell division, MinD, at the future site of division and the cell poles. Proper localization of MinD is necessary to bring the division inhibitor, MinC, in proximity to the cell membrane and cell poles where it prevents aberrant assembly of the division machinery. In contrast to E. coli, in which MinD oscillates from pole-to-pole courtesy of a third protein MinE, how MinD localization is achieved in B. subtilis, which does not encode a MinE analog, has remained largely a mystery. The authors present compelling data indicating that MinD dimerization is dispensable for membrane localization but required for concentration at the cell poles. Dimerization is also important for interactions between MinD and MinC, leading to the formation of large protein complexes. Computational modeling, specifically a Monte Carlo simulation, supports a model in which differences in diffusion rates between MinD monomers and dimers lead to concentration of MinD at cell poles. Once there, interaction with MinC increases the size of the complex, further reinforcing diffusion differences. Notably, interactions with MinJ, which has previously been implicated in MinCD localization, are dispensable for concentrating MinD at cell poles, although MinJ may help stabilize the MinCD complex at those locations.
Comments on revisions:
I believe the authors put respectable effort into revisions and addressing reviewer comments, particularly those that focused on the strengths of the original conclusions. The language in the current version of the manuscript is more precise and the overall product is stronger.
We are happy to learn that the reviewer considers our manuscript ready for publication.
Recommendations for the authors:
Reviewer #1 (Recommendations for the authors):
The authors have adequately answered the questions that were raised in my previous comments. Only a few minor revisions are needed for improvement.
Line 48−49: 'These proteins ensure that cell division occurs at midcell and not close to nascent division sites or cell poles'
delete 'nascent division site'
This has now been corrected as suggested.
Line 64−65: 'MinC inhibits polymerization of FtsZ by direct protein-protein interactions and needs to bind to the Walker A-type ATPase MinD for its recruitment to septa or the polar regions of the cell'
delete 'septa or', because MinD recruits MinC to the cell poles to block polar division, not septal formation.
This has now been corrected as suggested.
Supplemental information:
Some parameters in Table S1 are missing definitions. If these parameters relate to terms described in the "Methods" section, please add the corresponding parameter symbols after the terms.
We would like to thank the reviewer for pointing this out. We have improved Table S1 and corrected the related parameters in the Methods section (lines 605-619).
-
eLife Assessment
Ge et al here report a structural study of the native tripartite multidrug efflux pump complexes from Escherichia coli that identifies a novel accessory subunit, YbjP, the structure of the native TolC-YbjP-AcrABZ complex, as well as structures of the AcrB protein in L, T, and O conformations. The strength of the structural data is compelling, and the importance of the findings is potentially fundamental. However, additional analysis and comparison with pre-existing data would help to put the obtained data and its impact in the proper context, and the inclusion of functional data would help to substantiate some claims that are currently incompletely supported.
-
Reviewer #1 (Public review):
Summary:
This manuscript investigates the biological mechanism underlying the assembly and transport of the AcrAB-TolC efflux pump complex. By combining endogenous protein purification with cryo-EM analysis, the authors show that the AcrB trimer adopts three distinct conformations simultaneously and identify a previously uncharacterized lipoprotein, YbjP, as a potential additional component of the complex. The work aims to advance our understanding of the AcrAB-TolC efflux system in near-native conditions and may have broader implications for elucidating its physiological mechanism.
Strengths:
Overall, the manuscript is clearly presented, and several of the datasets are of high quality. The use of natively isolated complexes is a major strength, as it minimizes artifacts associated with reconstituted systems and enables the discovery of a novel subunit. The authors also distinguish two major assemblies, the TolC-YbjP sub-complex and the complete pump, which appear to correspond to the closed and open channel states, respectively. The conceptual advance is potentially meaningful, and the findings could be of broad interest to the field.
Weaknesses:
(1) As the identification of YbjP is a key contribution of this work, a deeper comparison with functional "anchor" proteins in other efflux pumps is needed. Including an additional supplementary figure illustrating these structural comparisons would be valuable.
(2) The observation of the LTO states in the presence of TolC represents an important extension of previous findings. A more detailed discussion comparing these LTO states to those reported in earlier structural and biochemical studies would improve the clarity and significance of this point.
-
Reviewer #2 (Public review):
Summary:
This manuscript reports the high-resolution cryo-EM structures of the endogenous TolC-YbjP-AcrABZ complex and a TolC-YbjP subcomplex from E. coli, identifying a novel accessory subunit. This work is an impressive effort that provides valuable structural insights into this native complex.
Strengths:
(1) The study successfully determines the structure of the complete, endogenously purified complex, marking a significant achievement.
(2) The identification of a previously unknown accessory subunit is an important finding.
(3) The use of cryo-EM to resolve the complex, including potential post-translational modifications such as N-palmitoyl and S-diacylglycerol, is a notable highlight.
Weaknesses:
(1) Clarity and Interpretation: Several points need clarification. Additionally, the description of the sample preparation method, which is a key strength, is currently misplaced and should be introduced earlier.
(2) Data Presentation: The manuscript would benefit significantly from improved figures.
(3) Supporting Evidence: The inclusion of the protein purification profile as a supplementary figure is essential. Furthermore, a discussion comparing the endogenous AcrB structure to those obtained in other systems (e.g., liposomes) and commenting on observed lipid densities would strengthen the overall analysis.
-
Reviewer #3 (Public review):
Summary:
The manuscript "Structural mechanisms of pump assembly and drug transport in the AcrAB-TolC efflux system" by Ge et al. describes the identification of a previously uncharacterized lipoprotein, YbjP, as a novel partner of the well-studied Enterobacterial tripartite efflux pump AcrAB-TolC. The authors present cryo-electron microscopy structures of the TolC-YbjP subcomplex and the complete AcrABZ-TolC-YbjP assembly. While the identification and structural characterization of YbjP are potentially novel, the stated focus of the manuscript, mechanisms of pump assembly and drug transport, is not sufficiently addressed. The manuscript requires reframing to emphasize the principal novelty associated with YbjP and significant development of the other aspects, especially the claimed novelty of the AcrB drug-efflux cycle.
Strengths:
The reported association of YbjP with AcrAB-TolC is novel; however, a recent deposition of a preceding and much more detailed manuscript to the bioRxiv server (Horne et al., https://doi.org/10.1101/2025.03.19.644130) removes much of the immediate novelty.
Weaknesses:
While the identification of YbjP is novel, the authors do not appear to acknowledge the precedence of another work (Horne et al., 2025), and it is not cited within the correct context in the manuscript.
Several results presented in the TolC-YbjP section do not represent new findings regarding TolC structure itself. The structure and gating behaviour of TolC should be more thoroughly introduced in the Introduction, including prior work describing channel opening and conformational transitions. The current manuscript does not discuss the mechanistic role of helices H3/H4 and H7/H8 in channel dilation, despite implying that YbjP binding may influence these features. Only the original closed TolC structure is cited, and the manuscript does not address prior mutational studies involving the D396 region, though this residue is specifically highlighted in the presented structures.
The manuscript provides only a general structural alignment between the closed TolC-YbjP subcomplex and the open TolC observed in the full pump assembly. However, multiple open, closed, and intermediate conformations of AcrAB-TolC have already been reported. Thus, YbjP alone cannot be assumed to account for TolC channel gating. A systematic comparison with existing structures is necessary to determine whether YbjP contributes any distinct allosteric modulation.
The analysis of AcrB peristaltic action is superficial, poorly substantiated and, importantly, not novel. Several references to the ATP-synthase cycle have been provided, but this analogy was already well established some 20 years ago (e.g., https://www.science.org/doi/10.1126/science.1131542).
The most significant limitation of the study is the absence of functional characterization of YbjP in vivo or in vitro. While the structural association between YbjP and TolC is interesting, the biological role of YbjP remains unclear. Moreover, the manuscript does not examine structural differences between the presented complex and previously solved AcrAB-TolC or MexAB-OprM assemblies that might support a mechanistic model.
-
Author response:
Public Reviews:
Reviewer #1 (Public review):
Summary:
This manuscript investigates the biological mechanism underlying the assembly and transport of the AcrAB-TolC efflux pump complex. By combining endogenous protein purification with cryo-EM analysis, the authors show that the AcrB trimer adopts three distinct conformations simultaneously and identify a previously uncharacterized lipoprotein, YbjP, as a potential additional component of the complex. The work aims to advance our understanding of the AcrAB-TolC efflux system in near-native conditions and may have broader implications for elucidating its physiological mechanism.
Strengths:
Overall, the manuscript is clearly presented, and several of the datasets are of high quality. The use of natively isolated complexes is a major strength, as it minimizes artifacts associated with reconstituted systems and enables the discovery of a novel subunit. The authors also distinguish two major assemblies, the TolC-YbjP sub-complex and the complete pump, which appear to correspond to the closed and open channel states, respectively. The conceptual advance is potentially meaningful, and the findings could be of broad interest to the field.
Weaknesses:
(1) As the identification of YbjP is a key contribution of this work, a deeper comparison with functional "anchor" proteins in other efflux pumps is needed. Including an additional supplementary figure illustrating these structural comparisons would be valuable.
We appreciate this helpful suggestion. We will expand the comparative analysis between YbjP and established anchoring or accessory components in other efflux pumps, and we will add a new supplementary figure illustrating these structural relationships.
(2) The observation of the LTO states in the presence of TolC represents an important extension of previous findings. A more detailed discussion comparing these LTO states to those reported in earlier structural and biochemical studies would improve the clarity and significance of this point.
We agree. In the revised manuscript we will expand our discussion of the LTO conformations, including a direct comparison with previously reported structural and biochemical observations, to better contextualize the significance of our findings.
Reviewer #2 (Public review):
Summary:
This manuscript reports the high-resolution cryo-EM structures of the endogenous TolC-YbjP-AcrABZ complex and a TolC-YbjP subcomplex from E. coli, identifying a novel accessory subunit. This work is an impressive effort that provides valuable structural insights into this native complex.
Strengths:
(1) The study successfully determines the structure of the complete, endogenously purified complex, marking a significant achievement.
(2) The identification of a previously unknown accessory subunit is an important finding.
(3) The use of cryo-EM to resolve the complex, including potential post-translational modifications such as N-palmitoyl and S-diacylglycerol, is a notable highlight.
Weaknesses:
(1) Clarity and Interpretation: Several points need clarification. Additionally, the description of the sample preparation method, which is a key strength, is currently misplaced and should be introduced earlier.
Thank you for pointing this out. We will reorganize the text to introduce the sample preparation strategy earlier and clarify the points that may cause ambiguity.
(2) Data Presentation: The manuscript would benefit significantly from improved figures.
We agree and will revise the figures to improve clarity, consistency, and readability. Additional schematic illustrations will also be included where appropriate.
(3) Supporting Evidence: The inclusion of the protein purification profile as a supplementary figure is essential. Furthermore, a discussion comparing the endogenous AcrB structure to those obtained in other systems (e.g., liposomes) and commenting on observed lipid densities would strengthen the overall analysis.
We appreciate these suggestions. We will add the purification profile and expand the comparison between our endogenous AcrB structure and previously reported structures from reconstituted systems, including a more detailed discussion of lipid densities.
Reviewer #3 (Public review):
Summary:
The manuscript "Structural mechanisms of pump assembly and drug transport in the AcrAB-TolC efflux system" by Ge et al. describes the identification of a previously uncharacterized lipoprotein, YbjP, as a novel partner of the well-studied Enterobacterial tripartite efflux pump AcrAB-TolC. The authors present cryo-electron microscopy structures of the TolC-YbjP subcomplex and the complete AcrABZ-TolC-YbjP assembly. While the identification and structural characterization of YbjP are potentially novel, the stated focus of the manuscript, mechanisms of pump assembly and drug transport, is not sufficiently addressed. The manuscript requires reframing to emphasize the principal novelty associated with YbjP and significant development of the other aspects, especially the claimed novelty of the AcrB drug-efflux cycle.
Strengths:
The reported association of YbjP with AcrAB-TolC is novel; however, a recent deposition of a preceding and much more detailed manuscript to the bioRxiv server (Horne et al., https://doi.org/10.1101/2025.03.19.644130) removes much of the immediate novelty.
Weaknesses:
While the identification of YbjP is novel, the authors do not appear to acknowledge the precedence of another work (Horne et al., 2025), and it is not cited within the correct context in the manuscript.
We thank the reviewer for raising this important point regarding the independent nature of our work.
Our study indeed progressed independently. The process began with our purification of an endogenous protein sample containing the AcrAB-TolC efflux pump. During our cryo-EM analysis, we observed an unassigned density in the map, for which we built a preliminary main-chain model. A subsequent search of structural databases, including AlphaFold predictions, allowed us to identify this density as the protein YbjP. It was only after this identification that we became aware of the related preprint by Horne et al. on bioRxiv (posted March 19, 2025).
Therefore, our structural determination of YbjP was conducted entirely independently. We fully acknowledge and respect the work by Horne et al. and have already cited their preprint in our manuscript. While their detailed structural data, maps, and coordinates are not yet publicly available, we have described their findings appropriately. We agree that our manuscript can better reflect this context and will carefully check for any missing citations to ensure that their contribution is properly and clearly acknowledged.
We also believe that the two studies are mutually complementary and collectively reinforce the emerging understanding of YbjP.
Several results presented in the TolC-YbjP section do not represent new findings regarding TolC structure itself.
We agree that the TolC features we describe are consistent with previously reported structural characteristics. However, these observations could only be confirmed in the context of the newly determined TolC–YbjP subcomplex, which was not available prior to this study. We will clarify this point in the revision to avoid overstating novelty.
The structure and gating behaviour of TolC should be more thoroughly introduced in the Introduction, including prior work describing channel opening and conformational transitions.
We appreciate this suggestion and agree that a more comprehensive overview of TolC gating and conformational transitions will strengthen the Introduction. We will revise the text to incorporate relevant prior structural and functional studies.
The current manuscript does not discuss the mechanistic role of helices H3/H4 and H7/H8 in channel dilation, despite implying that YbjP binding may influence these features.
Thank you for this comment. The primary novel contributions of this manuscript are the identification of YbjP and the structural characterization of AcrB in three distinct states. The discussion of the dilation mechanism, while included because we observed the closed TolC-YbjP state, is a secondary point. In the revised manuscript, we will expand this discussion as suggested.
Only the original closed TolC structure is cited, and the manuscript does not address prior mutational studies involving the D396 region, though this residue is specifically highlighted in the presented structures.
We appreciate the reviewer drawing attention to this oversight. We will add citations to the relevant mutational and mechanistic studies, including those involving the D396 region, and more clearly discuss these findings in relation to our structural observations.
The manuscript provides only a general structural alignment between the closed TolC-YbjP subcomplex and the open TolC observed in the full pump assembly. However, multiple open, closed, and intermediate conformations of AcrAB-TolC have already been reported. Thus, YbjP alone cannot be assumed to account for TolC channel gating. A systematic comparison with existing structures is necessary to determine whether YbjP contributes any distinct allosteric modulation.
We agree with the reviewer’s assessment and appreciate the constructive suggestion. In our revised manuscript, we will expand the structural comparison to include previously reported open, closed, and intermediate AcrAB–TolC conformations. This expanded analysis will more clearly position our findings within the existing structural framework.
The analysis of AcrB peristaltic action is superficial, poorly substantiated and, importantly, not novel. Several references to the ATP-synthase cycle have been provided, but this analogy was already well established some 20 years ago (e.g., https://www.science.org/doi/10.1126/science.1131542).
We thank the reviewer for this comment. We fully acknowledge the foundational studies that established the AcrB functional cycle and its analogy to the ATP-synthase mechanism. While previous work indeed defined the LTO (Loose, Tight, Open) cycle of AcrB, those structures were obtained using AcrB in isolation. In contrast, our endogenous sample, which includes the native constraints of AcrA from above and the presence of AcrZ, reveals conformational changes in the transmembrane and porter domains that differ from those previously reported. We interpret these differences as reflecting a more physiologically relevant mechanism. In our revision, we will provide a detailed discussion to contextualize these distinctions within the existing literature.
The most significant limitation of the study is the absence of functional characterization of YbjP in vivo or in vitro. While the structural association between YbjP and TolC is interesting, the biological role of YbjP remains unclear.
We agree that the lack of functional characterization is a limitation of the present work. Our study focuses on structural elucidation and structural analysis. Although the recent preprint you mentioned suggests that YbjP deletion may not produce a strong phenotype, we are still interested in conducting additional experiments to explore its potential roles in future work. We will revise the text to clearly acknowledge this limitation.
Moreover, the manuscript does not examine structural differences between the presented complex and previously solved AcrAB-TolC or MexAB-OprM assemblies that might support a mechanistic model.
We thank the reviewer for this suggestion. We will incorporate a more detailed comparative analysis with existing AcrAB–TolC and MexAB–OprM structures and highlight similarities and differences that may inform mechanistic interpretation.
-
eLife Assessment
This important study identifies PRRT2 as an auxiliary regulator of Nav channel slow inactivation, proposing that PRRT2 facilitates entry into, and delays recovery from, the slow-inactivated state. The evidence provided is compelling and well executed, though the work would be bolstered by additional studies of Nav1.6, as well as structural studies to directly investigate the molecular basis of gating modulation. Overall, this study will be of interest to ion channel biophysicists and neurophysiologists, particularly those studying channelopathies.
-
Reviewer #1 (Public review):
Summary:
The manuscript by Lu and colleagues demonstrates convincingly that PRRT2 interacts with brain voltage-gated sodium channels to enhance slow inactivation in vitro and in vivo. The work is interesting and rigorously conducted. The relevance to normal physiology and disease pathophysiology (e.g., PRRT2-related genetic neurodevelopmental disorders) seems high. Some simple additional experiments could elevate the impact and make the study more complete.
Strengths:
Experiments are conducted rigorously, including experimenter blinding and appropriate controls. Data presentation is excellent and logical. The paper is well written for a general scientific audience.
Weaknesses:
There are a few missing experiments and one place where data are over-interpreted.
(1) An in vitro study of Nav1.6 is conspicuously absent. In addition to being a major brain Na channel, Nav1.6 is predominant in cerebellar Purkinje neurons, which the authors note lack PRRT2 expression. They speculate that the absence of PRRT2 in these neurons facilitates the high firing rate. This hypothesis would be strengthened if PRRT2 also enhanced slow inactivation of Nav1.6. If a stable Nav1.6 cell line is not available, then simple transient co-transfection experiments would suffice.
(2) To further demonstrate the physiological impact of enhanced slow inactivation, the authors should consider a simple experiment in the stable cell line experiments (Figure 1) to test pulse frequency dependence of peak Na current. One would predict that PRRT2 expression will potentiate 'run down' of the channels, and this finding would be complementary to the biophysical data.
(3) The study of one K channel is limited, and the conclusion from these experiments represents an over-interpretation. I suggest removing these data unless many more K channels (ideally with measurable proxies for slow inactivation) were tested. These data do not contribute much to the story.
(4) In Figure 2, the authors should confirm that protein is indeed expressed in cells expressing each truncated PRRT2 construct. Absent expression should be ruled out as an explanation for absent enhancement of slow inactivation.
-
Reviewer #2 (Public review):
Summary:
As a member of the DspB subfamily, PRRT2 is primarily expressed in the nervous system and has been associated with various paroxysmal neurological disorders. Previous studies have shown that PRRT2 directly interacts with Nav1.2 and Nav1.6, modulating channel properties and neuronal excitability.
In this study, Lu et al. reported that PRRT2 is a physiological regulator of Nav channel slow inactivation, promoting the development of Nav slow inactivation and impeding the recovery from slow inactivation. This effect can be replicated by the C-terminal region (256-346) of PRRT2, and is highly conserved across species from zebrafish, mouse, to human PRRT2. TRARG1 and TMEM233, the other two DspB family members, showed similar effects on Nav1.2 slow inactivation. Co-IP data confirms the interaction between Nav channels and PRRT2. Prrt2-mutant mice, which lack PRRT2 expression, require lower stimulation thresholds for evoking after-discharges when compared to WT mice.
Strengths:
(1) This study is well designed, and data support the conclusion that PRRT2 is a potent regulator of slow inactivation of Nav channels.
(2) This study reveals similar effects on Nav1.2 slow inactivation by PRRT2, TMEM233, and TRARG1, indicating a common regulation of Nav channels by DspB family members (Supplemental Figure 2). A recent study has shown that TMEM233 is essential for ExTxA (a plant toxin)-mediated inhibition of fast inactivation of Nav channels, and that PRRT2 and TRARG1 can replicate this effect (Jami S, et al. Nat Commun 2023). It is possible that all three DspB members regulate Nav channel properties through the same mechanism, and exploring molecules that target PRRT2/TRARG1/TMEM233 might be a novel strategy for developing new treatments of DspB-related neurological diseases.
Weaknesses:
(1) Previously, the authors have reported that PRRT2 reduces Nav1.2 current density and alters biophysical properties of both Nav1.2 and Nav1.6 channels, including enhanced steady-state inactivation, slower recovery, and stronger use-dependent inhibition (Lu B, et al. Cell Rep 2021, Fig 3 & S5). All those changes are expected to alter neuronal excitability and should be discussed.
(2) In this study, fast inactivation kinetics were examined with a single stimulus at 0 mV, which may not be sufficient to support the conclusion. Inactivation kinetics at additional test potentials should be added.
(3) It is a little surprising that there is no difference in Nav1.2 current density in axon-blebs between WT and Prrt2-mutant mice (Figure 7B). PRRT2 significantly shifts the steady-state slow inactivation curve in the hyperpolarizing direction: at -70 mV, nearly 70% of Nav1.2 channels are inactivated by slow inactivation in cells expressing PRRT2, compared to less than 10% in cells expressing GFP (Figure supplement 1B). With a holding potential of -70 mV, I would expect most Nav channels to be inactivated in axon-blebs from WT mice but not in those from Prrt2-mutant mice, so sodium current density should differ in Figure 7B, which it does not. Any explanation?
(4) Besides Nav channels, PRRT2 has been shown to act on Cav2.1 channels as well as molecules involved in neurotransmitter release, which may also contribute to abnormal neuronal activity in Prrt2-mutant mice. These should be mentioned when discussing PRRT2's role in neuronal resilience.
-
Reviewer #3 (Public review):
This paper reveals that the neuronal protein PRRT2, previously known for its association with paroxysmal dyskinesia and infantile seizures, modulates the slow inactivation of voltage-gated sodium ion (Nav) channels, a gating process that limits excitability during prolonged activity. Using electrophysiology, molecular biology, and mouse models, the authors show that PRRT2 accelerates entry of Nav channels into the slow-inactivated state and slows their recovery, effectively dampening excessive excitability. The effect seems evolutionarily conserved, requires the C-terminal region of PRRT2, and is recapitulated in cortical neurons, where PRRT2 deficiency leads to hyper-responsiveness and reduced cortical resilience in vivo. These findings extend the functional repertoire of PRRT2, identifying it as a physiological brake on neuronal excitability. The work provides a mechanistic link between PRRT2 mutations and episodic neurological phenotypes.
Comments:
(1) The precise structural interface and the molecular basis of gating modulation remain inferred rather than demonstrated.
(2) The in vivo phenotype reflects a complex circuit outcome and does not isolate slow-inactivation defects per se.
(3) Expression of PRRT2 in muscle or heart is low, so the cross isoform claims are likely of limited physiological significance.
(4) The mechanistic separation between the trafficking of PRRT2 and its gating effects is not clearly resolved.
(5) Additional studies with Nav1.6 should be carried out.
-
Author response:
Public Reviews:
Reviewer #1 (Public review):
Summary:
The manuscript by Lu and colleagues demonstrates convincingly that PRRT2 interacts with brain voltage-gated sodium channels to enhance slow inactivation in vitro and in vivo. The work is interesting and rigorously conducted. The relevance to normal physiology and disease pathophysiology (e.g., PRRT2-related genetic neurodevelopmental disorders) seems high. Some simple additional experiments could elevate the impact and make the study more complete.
Strengths:
Experiments are conducted rigorously, including experimenter blinding and appropriate controls. Data presentation is excellent and logical. The paper is well written for a general scientific audience.
Weaknesses:
There are a few missing experiments and one place where data are over-interpreted.
(1) An in vitro study of Nav1.6 is conspicuously absent. In addition to being a major brain Na channel, Nav1.6 is predominant in cerebellar Purkinje neurons, which the authors note lack PRRT2 expression. They speculate that the absence of PRRT2 in these neurons facilitates the high firing rate. This hypothesis would be strengthened if PRRT2 also enhanced slow inactivation of Nav1.6. If a stable Nav1.6 cell were not available, then simple transient co-transfection experiments would suffice.
We thank the reviewer for this suggestion. In the revised manuscript, we will examine whether PRRT2 modulates slow inactivation of Nav1.6 channels using heterologous co-expression experiments.
(2) To further demonstrate the physiological impact of enhanced slow inactivation, the authors should consider a simple experiment in the stable cell line experiments (Figure 1) to test pulse frequency dependence of peak Na current. One would predict that PRRT2 expression will potentiate 'run down' of the channels, and this finding would be complementary to the biophysical data.
We agree that examining pulse frequency-dependent changes in peak sodium current would provide a functional readout linking PRRT2-mediated enhancement of slow inactivation to use-dependent channel availability. In the revision, we will include a pulse-train protocol to quantify use-dependent attenuation (“run-down”) of peak sodium current across stimulation trains and will compare this adaptation between control and PRRT2-expressing conditions.
(3) The study of one K channel is limited, and the conclusion from these experiments represents an over-interpretation. I suggest removing these data unless many more K channels (ideally with measurable proxies for slow inactivation) were tested. These data do not contribute much to the story.
We agree with the reviewer’s assessment. To avoid over-interpretation and to maintain focus on PRRT2-dependent regulation of Nav channel slow inactivation, we will remove the potassium channel dataset and the associated conclusions from the revised manuscript.
(4) In Figure 2, the authors should confirm that protein is indeed expressed in cells expressing each truncated PRRT2 construct. Absent expression should be ruled out as an explanation for absent enhancement of slow inactivation.
We appreciate the reviewer’s concern regarding expression of the truncated PRRT2 constructs in the Nav1.2 stable cell line, particularly PRRT2(1-266), which shows little effect on slow inactivation of Nav1.2 channels. In the revision, we will include expression controls for each truncation construct in the Nav1.2-expressing cells to rule out lack of expression as an explanation for the observed functional differences.
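The pulse-train readout proposed in the response to point (2) can be illustrated with a minimal sketch. It assumes that use-dependent attenuation is quantified by normalizing each pulse's peak current to the first pulse of the train; the function names and the current values are hypothetical placeholders, not the authors' protocol or data.

```python
def use_dependent_attenuation(peak_currents):
    """Normalize each pulse's peak current to the first pulse of the train."""
    if not peak_currents:
        raise ValueError("peak_currents must contain at least one pulse")
    first = peak_currents[0]
    if first == 0:
        raise ValueError("first-pulse peak current must be non-zero")
    return [peak / first for peak in peak_currents]


def fractional_rundown(peak_currents):
    """Fraction of first-pulse current lost by the last pulse (0 = no loss)."""
    return 1.0 - use_dependent_attenuation(peak_currents)[-1]


# Hypothetical peak inward currents (pA) for a 5-pulse train; not real data.
control = [-1000.0, -980.0, -960.0, -950.0, -940.0]
with_prrt2 = [-1000.0, -900.0, -820.0, -760.0, -700.0]

rundown_control = fractional_rundown(control)
rundown_prrt2 = fractional_rundown(with_prrt2)
```

Under the reviewer's prediction, the rundown fraction would be larger in PRRT2-expressing cells than in controls at a given stimulation frequency.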
Reviewer #2 (Public review):
Summary:
As a member of the DspB subfamily, PRRT2 is primarily expressed in the nervous system and has been associated with various paroxysmal neurological disorders. Previous studies have shown that PRRT2 directly interacts with Nav1.2 and Nav1.6, modulating channel properties and neuronal excitability.
In this study, Lu et al. report that PRRT2 is a physiological regulator of Nav channel slow inactivation, promoting the development of slow inactivation and impeding recovery from it. This effect can be replicated by the C-terminal region (256-346) of PRRT2 and is highly conserved across species, from zebrafish to mouse to human PRRT2. TRARG1 and TMEM233, the other two DspB family members, showed similar effects on Nav1.2 slow inactivation. Co-IP data confirm the interaction between Nav channels and PRRT2. Prrt2-mutant mice, which lack PRRT2 expression, require lower stimulation thresholds for evoking after-discharges than WT mice.
Strengths:
(1) This study is well designed, and data support the conclusion that PRRT2 is a potent regulator of slow inactivation of Nav channels.
(2) This study reveals similar effects on Nav1.2 slow inactivation by PRRT2, TMEM233, and TRARG1, indicating a common regulation of Nav channels by DspB family members (Supplemental Figure 2). A recent study has shown that TMEM233 is essential for ExTxA (a plant toxin)-mediated inhibition of fast inactivation of Nav channels, and that PRRT2 and TRARG1 could replicate this effect (Jami S, et al. Nat Commun 2023). It is possible that all three DspB members regulate Nav channel properties through the same mechanism, and exploring molecules that target PRRT2/TRARG1/TMEM233 might be a novel strategy for developing new treatments for DspB-related neurological diseases.
Weaknesses:
(1) Previously, the authors have reported that PRRT2 reduces Nav1.2 current density and alters biophysical properties of both Nav1.2 and Nav1.6 channels, including enhanced steady-state inactivation, slower recovery, and stronger use-dependent inhibition (Lu B, et al. Cell Rep 2021, Fig 3 & S5). All those changes are expected to alter neuronal excitability and should be discussed.
We agree that PRRT2 has been reported to exert multiple effects on Nav channels which are all expected to influence neuronal excitability (Fruscione et al., 2018; Lu et al., 2021; Valente et al., 2023). In the revised manuscript, we will expand the Discussion to integrate these prior findings and to clarify how these PRRT2-dependent changes may interact with (and potentially converge on) modulation of slow inactivation to shape neuronal excitability.
(2) In this study, fast inactivation kinetics were examined with a single stimulus at 0 mV, which may not be sufficient to support the conclusion. Inactivation kinetics at additional test potentials should be added.
We thank the reviewer for this suggestion. In the revision, we will extend our analysis of Nav1.2 fast-inactivation kinetics across a range of test potentials (e.g., -20, -10, 0, +10 and +20 mV) in the presence and absence of PRRT2.
(3) It is a little surprising that there is no difference in Nav1.2 current density in axon-blebs between WT and Prrt2-mutant mice (Figure 7B). PRRT2 significantly shifts the steady-state slow inactivation curve in the hyperpolarizing direction: at -70 mV, nearly 70% of Nav1.2 channels are inactivated by slow inactivation in cells expressing PRRT2, compared to less than 10% in cells expressing GFP (Figure supplement 1B). With a holding potential of -70 mV, I would expect most Nav channels to be inactivated in axon-blebs from WT mice but not in those from Prrt2-mutant mice, so sodium current density should differ in Figure 7B, which it does not. Any explanation?
We thank the reviewer for raising this point. In our axonal bleb recordings, although the holding potential was -70 mV, sodium current density was measured after a hyperpolarizing pre-pulse (-110 mV) to relieve inactivation immediately prior to the test depolarization (as described in the Methods). Thus, the current density measurement in Figure 7B reflects the maximal available current following this recovery step, rather than the steady-state availability at -70 mV. In the revision, we will state this explicitly in the Results and/or figure legend to avoid confusion.
(4) Besides Nav channels, PRRT2 has been shown to act on Cav2.1 channels as well as molecules involved in neurotransmitter release, which may also contribute to abnormal neuronal activity in Prrt2-mutant mice. These should be mentioned when discussing PRRT2's role in neuronal resilience.
We agree with the reviewer. In the revised manuscript, we will broaden the Discussion to acknowledge PRRT2 functions beyond Nav channels, including reported roles in Cav2.1 regulation and neurotransmitter release. We will frame the in vivo phenotypes in Prrt2-mutant mice as likely arising from convergent mechanisms: altered intrinsic excitability together with changes in synaptic transmission.
Reviewer #3 (Public review):
This paper reveals that the neuronal protein PRRT2, previously known for its association with paroxysmal dyskinesia and infantile seizures, modulates the slow inactivation of voltage-gated sodium ion (Nav) channels, a gating process that limits excitability during prolonged activity. Using electrophysiology, molecular biology, and mouse models, the authors show that PRRT2 accelerates entry of Nav channels into the slow-inactivated state and slows their recovery, effectively dampening excessive excitability. The effect seems evolutionarily conserved, requires the C-terminal region of PRRT2, and is recapitulated in cortical neurons, where PRRT2 deficiency leads to hyper-responsiveness and reduced cortical resilience in vivo. These findings extend the functional repertoire of PRRT2, identifying it as a physiological brake on neuronal excitability. The work provides a mechanistic link between PRRT2 mutations and episodic neurological phenotypes.
Comments:
(1) The precise structural interface and the molecular basis of gating modulation remain inferred rather than demonstrated.
We thank the reviewer for this comment. In the revision, we will make it explicit that our structural modeling is predictive rather than experimentally demonstrated. We will also expand the Limitations section to highlight that direct structural and biochemical mapping of the PRRT2-Nav interface (e.g., through targeted mutagenesis, crosslinking, and/or structural determination) will be required to define the binding interface and establish the molecular basis of gating modulation.
(2) The in vivo phenotype reflects a complex circuit outcome and does not isolate slow-inactivation defects per se.
We agree with the reviewer. In the revision, we will refine the Discussion to avoid over-attributing the in vivo phenotype to slow-inactivation defects alone and to explicitly state that impaired slow inactivation in Prrt2-mutant mice represents one plausible contributing mechanism to reduced cortical resilience, alongside other PRRT2-dependent processes.
(3) Expression of PRRT2 in muscle or heart is low, so the cross-isoform claims are likely of limited physiological significance.
We thank the reviewer for this comment about physiological relevance. In the revised manuscript, we will clarify that our Nav isoform panel was designed to assess mechanistic generality at the channel level rather than to imply broad in vivo relevance across tissues. We will also expand the Discussion to emphasize that any therapeutic strategy involving PRRT2 delivery should consider its consistent effect on slow inactivation across multiple Nav isoforms.
(4) The mechanistic separation between the trafficking effect of PRRT2 and its gating effects is not clearly resolved.
We thank the reviewer for raising this important point. In the revision, we will expand the Discussion to clarify why we interpret the effect of PRRT2 on slow inactivation as a gating modulation rather than a secondary consequence of altered channel abundance or localization. First, our slow inactivation measurements are expressed as the fraction of available channels after depolarizing conditioning relative to baseline availability within the same cell (post-/pre-conditioning), which minimizes confounding by differences in initial surface expression. Second, slow inactivation of Nav channels occurs on a rapid, activity-dependent timescale (seconds), whereas appreciable changes in trafficking and surface abundance generally develop over longer intervals (minutes to hours).
(5) Additional studies with Nav1.6 should be carried out.
We thank the reviewer for the suggestion. We will include Nav1.6 slow inactivation experiments in the revised manuscript.
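The within-cell normalization described in the response to comment (4) can be sketched as follows. This is an illustration only, assuming availability is computed as the post-conditioning peak current divided by the pre-conditioning peak current; the function name and current values are hypothetical and do not come from the manuscript.

```python
def available_fraction(pre_peak, post_peak):
    """Fraction of channels still available after the conditioning step."""
    if pre_peak == 0:
        raise ValueError("pre-conditioning peak current must be non-zero")
    return post_peak / pre_peak


# Hypothetical peak currents (pA) from two cells with identical gating but
# a twofold difference in surface expression; placeholder values, not data.
low_expression = available_fraction(pre_peak=-500.0, post_peak=-200.0)
high_expression = available_fraction(pre_peak=-1000.0, post_peak=-400.0)
```

Because both currents in a cell scale with the number of channels at the surface, the ratio is the same for the two cells, which is why this measure is expected to be insensitive to differences in initial surface expression.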
-
eLife Assessment
This important study fills a major geographic and temporal gap in understanding Paleocene mammal evolution in Asia and proposes an intriguing "brawn before bite" hypothesis grounded in diverse analytical approaches. However, the findings are incomplete because limitations in sampling design - such as the use of worn or damaged teeth, the pooling of different tooth positions, and the lack of independence among teeth from the same individuals - introduce uncertainties that weaken support for the reported disparity patterns. The taxonomic focus on predominantly herbivorous clades also narrows the ecological scope of the results. Clarifying methodological choices, expanding the ecological context, and tempering evolutionary interpretations would substantially strengthen the study.
-
Reviewer #1 (Public review):
Summary:
This work provides valuable new insights into the Paleocene Asian mammal recovery and diversification dynamics during the first ten million years post-dinosaur extinction. Studies that have examined the mammalian recovery and diversification post-dinosaur extinction have primarily focused on the North American mammal fossil record, and it's unclear if patterns documented in North America are characteristic of global patterns. This study examines dietary metrics of Paleocene Asian mammals and found that there is a body size disparity increase before dietary niche expansion and that dietary metrics track climatic and paleobotanical trends of Asia during the first 10 million years after the dinosaur extinction.
Strengths:
The Asian Paleocene mammal fossil record is greatly understudied, and this work begins to fill important gaps. In particular, the use of interdisciplinary data (i.e., climatic and paleobotanical) is really interesting in conjunction with observed dietary metric trends.
Weaknesses:
While this work has the potential to be exciting and contribute greatly to our understanding of mammalian evolution during the first 10 million years post-dinosaur extinction, the major weakness is in the dental topographic analysis (DTA) dataset.
There are several specimens in Figure 1 that have broken cusps, deep wear facets, and general abrasion. Thus, any values generated from DTA are not accurate and cannot be used to support their claims. Furthermore, the authors analyze all tooth positions at once, which makes this study seem comprehensive (200 individual teeth), but it's unclear what sort of noise this introduces to the study. Typically, DTA studies will analyze a singular tooth position (e.g., Pampush et al. 2018 Biol. J. Linn. Soc.), allowing for more meaningful comparisons and an understanding of what value differences mean. Even so, the dataset consists of only 48 specimens. This means that even if all the specimens were pristinely preserved and generated DTA values could be trusted, it's still only 48 specimens (representing 4 different clades) to capture patterns across 10 million years. For example, the authors note that their results show an increase in OPCR and DNE values from the middle to the late Paleocene in pantodonts. However, if a singular tooth position is analyzed, such as the lower second molar, the middle and late Paleocene partitions are only represented by a singular specimen each. With a sample size this small, it's unlikely that the authors are capturing real trends, which makes the claims of this study highly questionable.
-
Reviewer #2 (Public review):
Summary:
This study uses dental traits of a large sample of Chinese mammals to track evolutionary patterns through the Paleocene. It presents and argues for a 'brawn before bite' hypothesis - mammals increased in body size disparity before evolving more specialized or adapted dentitions. The study makes use of an impressive array of analyses, including dental topographic, finite element, and integration analyses, which help to provide a unique insight into mammalian evolutionary patterns.
Strengths:
This paper helps to fill in a major gap in our knowledge of Paleocene mammal patterns in Asia, which is especially important because of the diversification of placentals at that time. The total sample of teeth is impressive and required considerable effort for scanning and analyzing. And there is a wealth of results for DTA, FEA, and integration analyses. Further, some of the results are especially interesting, such as the novel 'brawn before bite' hypothesis and the possible link between shifts in dental traits and arid environments in the Late Paleocene. Overall, I enjoyed reading the paper, and I think the results will be of interest to a broad audience.
Weaknesses:
I have four major concerns with the study, especially related to the sampling of teeth and taxa, that I discuss in more detail below. Due to these issues, I believe that the study is incomplete in its support of the 'brawn before bite' hypothesis. Although my concerns are significant, many of them can be addressed with some simple updates/revisions to analyses or text, and I try to provide constructive advice throughout my review.
(1) If I understand correctly, teeth of different tooth positions (e.g., premolars and molars), and those from the same specimen, are lumped into the same analyses. And unless I missed it, no justification is given for these methodological choices (besides testing for differences in proportions of tooth positions per time bin; L902). I think this creates some major statistical concerns. For example, DTA values for premolars and molars aren't directly comparable (I don't think?) because they have different functions (e.g., greater grinding function for molars). My recommendation is to perform different disparity-through-time analyses for each tooth position, assuming the sample sizes are big enough per time bin. Or, if the authors maintain their current methods/results, they should provide justification in the main text for that choice.
Also, I think lumping teeth from the same specimen into your analyses creates a major statistical concern because the observations aren't independent. In other words, the teeth of the same individual should have relatively similar DTA values, which can greatly bias your results. This is essentially the same issue as phylogenetic non-independence, but taken to a much greater extreme.
It seems like it'd be much more appropriate to perform specimen-level analyses (e.g., Wilson 2013) or species-level analyses (e.g., Grossnickle & Newham 2016) and report those results in the main text. If the authors believe that their methods are justified, then they should explain this in the text.
(2) Maybe I misunderstood, but it sounds like the sampling is almost exclusively clades that are primarily herbivorous/omnivorous (Pantodonta, Arctostylopida, Anagalida, and maybe Tillodonta), which means that the full ecomorphological diversity of the time bins is not being sampled (e.g., insectivores aren't fully sampled). Similarly, the authors say that they "focused sampling" on those major clades and "Additional data were collected on other clades ... opportunistically" (L628). If they favored sampling of specific clades, then doesn't that also bias their results?
If the study is primarily focused on a few herbivorous clades, then the Introduction should be reframed to reflect this. You could explain that you're specifically tracking herbivore patterns after the K-Pg.
(3) There are a lot of topics lacking background information, which makes the paper challenging to read for non-experts. Maybe the authors are hindered by a short word limit. But if they can expand their main text, then I strongly recommend the following:
(a) The authors should discuss diets. Much of the data are diet correlates (DTA values), but diets are almost never mentioned, except in the Methods. For example, the authors say: "An overall shift towards increased dental topographic trait magnitudes ..." (L137). Does that mean there was a shift toward increased herbivory? If so, why not mention the dietary shift? And if most of the sampled taxa are herbivores (see above comment), then shouldn't herbivory be a focal point of the paper?
(b) The authors should expand on "we used dentitions as ecological indicators" (L75). For non-experts, how/why are dentitions linked to ecology? And, again, why not mention diet? A strong link between tooth shape and diet is a critical assumption here (and one I'm sure that all mammalogists agree with), but the authors don't provide justification (at least in the Introduction) for that assumption. Many relevant papers cited later in the Methods could be cited in the Introduction (e.g., Evans et al. 2007).
(c) Include a better introduction of the sample, such as explicitly stating that your sample only includes placentals (assuming that's the case) and is focused on three major clades. Are non-placentals like multituberculates or stem placentals/eutherians found at Chinese Paleocene fossil localities and not sampled in the study, or are they absent in the sampled area?
(d) The way in which "integration" is being used should be defined. That is a loaded term which has been defined in different ways. I also recommend providing more explanation on the integration analyses and what the results mean.
If the authors don't have space to expand the main text, then they should at least expand on the topics in the supplement, with appropriate citations to the supplement in the main text.
(4) Finally, I'm not convinced that the results fully support the 'brawn before bite' hypothesis. I like the hypothesis. However, the 'brawn before ...' part of the hypothesis assumes that body size disparity (L63) increased first, and I don't think that pattern is ever shown. First, body size disparity is never reported or plotted (at least that I could find) - the authors just show the violin plots of the body sizes (Figures 1B, S6A). Second, the authors don't show evidence of an actual increase in body size disparity. Instead, they seem to assume that there was a rapid diversification in the earliest Paleocene, and thus the early Paleocene bin has already "reached maximum saturation" (L148). But what if the body size disparity in the latest Cretaceous was the same as that in the Paleocene? (Although that's unlikely, note that papers like Clauset & Redner 2009 and Grossnickle & Newham 2016 found evidence of greater body size disparity in the latest Cretaceous than is commonly recognized.) Similarly, what if body size disparity increased rapidly in the Eocene? Wouldn't that suggest a 'BITE before brawn' hypothesis? So, without showing when an increase in body size diversity occurred, I don't think that the authors can make a strong argument for 'brawn before [insert any trait]'.
Although it's probably well beyond the scope of the study to add Cretaceous or Eocene data, the authors could at least review literature on body size patterns during those times to provide greater evidence for an earliest Paleocene increase in size disparity.
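To make the request in point (4) concrete, body size disparity could be reported per time bin with something like the sketch below. It assumes disparity is measured as the sample variance of log10 body mass, which is one common choice and not necessarily the metric the authors would use; the function name and the masses are hypothetical placeholders.

```python
import math
import statistics


def body_size_disparity(masses_g):
    """Sample variance of log10 body mass; one common disparity metric."""
    if len(masses_g) < 2:
        raise ValueError("need at least two taxa to estimate variance")
    return statistics.variance([math.log10(m) for m in masses_g])


# Hypothetical body masses (g) per time bin; placeholder values, not data.
bins = {
    "early": [100.0, 1000.0, 10000.0],
    "middle": [100.0, 1000.0, 10000.0],
    "late": [1000.0, 1000.0, 10000.0],
}
disparity = {name: body_size_disparity(m) for name, m in bins.items()}
```

Plotting one such value per bin, ideally with latest Cretaceous and early Eocene bins added from the literature, would show directly whether and when disparity increased.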
-
Author response:
eLife Assessment
This important study fills a major geographic and temporal gap in understanding Paleocene mammal evolution in Asia and proposes an intriguing "brawn before bite" hypothesis grounded in diverse analytical approaches. However, the findings are incomplete because limitations in sampling design - such as the use of worn or damaged teeth, the pooling of different tooth positions, and the lack of independence among teeth from the same individuals - introduce uncertainties that weaken support for the reported disparity patterns. The taxonomic focus on predominantly herbivorous clades also narrows the ecological scope of the results. Clarifying methodological choices, expanding the ecological context, and tempering evolutionary interpretations would substantially strengthen the study.
We thank Dr. Rasmann for the constructive evaluation of our manuscript. Considering the reviewers’ comments, we plan to implement revisions to our study focusing on (1) an expanded description of the fossil sample, including a detailed account of the process of excluding extremely worn or damaged teeth from all analyses, (2) expanded reporting of the analyses done on individual tooth positions, with a more tempered interpretation of the pooled samples in light of the issues raised by reviewers, and (3) a more comprehensive introduction that includes an overview of the Paleocene mammal faunas of south China, in which certain clades are unevenly sampled and others are extremely rare, and an explanation of why the currently available fossil samples do not permit an adequate whole-fauna analysis across the three land mammal age time bins of the Paleocene in China. We believe these revisions would substantially strengthen the study’s robustness and impact for understanding the ecomorphological evolution of the earliest abundant placental mammals during the Paleocene in Asia.
Public Reviews:
Reviewer #1 (Public review):
Summary:
This work provides valuable new insights into the Paleocene Asian mammal recovery and diversification dynamics during the first ten million years post-dinosaur extinction. Studies that have examined the mammalian recovery and diversification post-dinosaur extinction have primarily focused on the North American mammal fossil record, and it's unclear if patterns documented in North America are characteristic of global patterns. This study examines dietary metrics of Paleocene Asian mammals and found that there is a body size disparity increase before dietary niche expansion and that dietary metrics track climatic and paleobotanical trends of Asia during the first 10 million years after the dinosaur extinction.
Strengths:
The Asian Paleocene mammal fossil record is greatly understudied, and this work begins to fill important gaps. In particular, the use of interdisciplinary data (i.e., climatic and paleobotanical) is really interesting in conjunction with observed dietary metric trends.
Weaknesses:
While this work has the potential to be exciting and contribute greatly to our understanding of mammalian evolution during the first 10 million years post-dinosaur extinction, the major weakness is in the dental topographic analysis (DTA) dataset.
There are several specimens in Figure 1 that have broken cusps, deep wear facets, and general abrasion. Thus, any values generated from DTA are not accurate and cannot be used to support their claims. Furthermore, the authors analyze all tooth positions at once, which makes this study seem comprehensive (200 individual teeth), but it's unclear what sort of noise this introduces to the study. Typically, DTA studies will analyze a singular tooth position (e.g., Pampush et al. 2018 Biol. J. Linn. Soc.), allowing for more meaningful comparisons and an understanding of what value differences mean. Even so, the dataset consists of only 48 specimens. This means that even if all the specimens were pristinely preserved and generated DTA values could be trusted, it's still only 48 specimens (representing 4 different clades) to capture patterns across 10 million years. For example, the authors note that their results show an increase in OPCR and DNE values from the middle to the late Paleocene in pantodonts. However, if a singular tooth position is analyzed, such as the lower second molar, the middle and late Paleocene partitions are only represented by a singular specimen each. With a sample size this small, it's unlikely that the authors are capturing real trends, which makes the claims of this study highly questionable.
We thank Reviewer 1 for their careful review of our manuscript. A major external limitation of the application of DTA to fossil samples is the availability of specimens. Whereas a typical study design using extant or geologically younger/more abundant fossil species would preferably sample much larger quantities of teeth from each treatment group (time bins, in our case), the rarity of well-preserved Paleocene mammalian dentitions in Asia necessitates the analysis of small samples in order to make observations regarding major trends in a region and time period otherwise impossible to study (see Chow et al. 1977). That said, we plan to clarify methodological details in response to the reviewer’s comments, including a more comprehensive explanation of our criteria for exclusion of broken tooth crowns from the analyses. We also plan to expand our results reporting on individual tooth position analysis, potentially including resampling and/or simulation analyses to assess the effect of small and uneven samples on our interpretation of results. Lastly, we plan to revise the discussion and conclusion accordingly, including more explicit distinction between well-supported findings that emerge from various planned sensitivity analyses, versus those that are more speculative and tentative in nature.
Chow, M., Zhang, Y., Wang, B., and Ding, S. (1977). Paleocene mammalian fauna from the Nanxiong Basin, Guangdong Province. Paleontol. Sin. New Ser. C 20, 1–100.
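As an illustration of the kind of resampling analysis mentioned in the response above, the following is a minimal sketch, not the authors' method. It bootstraps two time bins at the size of the smaller bin and records how often the later bin has the higher mean. The function names and the trait values are hypothetical and chosen only for demonstration.

```python
import random
import statistics


def bootstrap_mean_differences(bin_a, bin_b, n_draws=1000, seed=0):
    """Bootstrap both bins at the size of the smaller bin.

    Returns one difference in means (bin_b minus bin_a) per draw.
    Sampling is with replacement, so a bin holding a single value
    always returns that value.
    """
    rng = random.Random(seed)
    k = min(len(bin_a), len(bin_b))
    diffs = []
    for _ in range(n_draws):
        sample_a = rng.choices(bin_a, k=k)
        sample_b = rng.choices(bin_b, k=k)
        diffs.append(statistics.mean(sample_b) - statistics.mean(sample_a))
    return diffs


def fraction_positive(diffs):
    """Share of draws in which the second bin had the larger mean."""
    return sum(d > 0 for d in diffs) / len(diffs)


# Hypothetical trait values for two time bins (not data from the study).
middle_paleocene = [150.0, 162.0, 171.0, 158.0, 166.0]
late_paleocene = [175.0, 181.0, 169.0]

diffs = bootstrap_mean_differences(middle_paleocene, late_paleocene)
print(f"share of draws with late > middle: {fraction_positive(diffs):.2f}")
```

When each bin holds a single specimen, every draw returns the same difference, so the resampling carries no information about uncertainty. This is the situation the reviewer describes for the lower second molar.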
Reviewer #2 (Public review):
Summary:
This study uses dental traits of a large sample of Chinese mammals to track evolutionary patterns through the Paleocene. It presents and argues for a 'brawn before bite' hypothesis - mammals increased in body size disparity before evolving more specialized or adapted dentitions. The study makes use of an impressive array of analyses, including dental topographic, finite element, and integration analyses, which help to provide a unique insight into mammalian evolutionary patterns.
Strengths:
This paper helps to fill in a major gap in our knowledge of Paleocene mammal patterns in Asia, which is especially important because of the diversification of placentals at that time. The total sample of teeth is impressive and required considerable effort for scanning and analyzing. And there is a wealth of results for DTA, FEA, and integration analyses. Further, some of the results are especially interesting, such as the novel 'brawn before bite' hypothesis and the possible link between shifts in dental traits and arid environments in the Late Paleocene. Overall, I enjoyed reading the paper, and I think the results will be of interest to a broad audience.
Weaknesses:
I have four major concerns with the study, especially related to the sampling of teeth and taxa, that I discuss in more detail below. Due to these issues, I believe that the study is incomplete in its support of the 'brawn before bite' hypothesis. Although my concerns are significant, many of them can be addressed with some simple updates/revisions to analyses or text, and I try to provide constructive advice throughout my review.
(1) If I understand correctly, teeth of different tooth positions (e.g., premolars and molars), and those from the same specimen, are lumped into the same analyses. And unless I missed it, no justification is given for these methodological choices (besides testing for differences in proportions of tooth positions per time bin; L902). I think this creates some major statistical concerns. For example, DTA values for premolars and molars aren't directly comparable (I don't think?) because they have different functions (e.g., greater grinding function for molars). My recommendation is to perform different disparity-through-time analyses for each tooth position, assuming the sample sizes are big enough per time bin. Or, if the authors maintain their current methods/results, they should provide justification in the main text for that choice.
We thank Reviewer 2 for raising several issues worthy of clarification. Separate analyses for individual tooth positions were performed but not emphasized in the first version of the study. In our revised version we plan to highlight the nuances of the results from premolar versus molar partition analyses.
Also, I think lumping teeth from the same specimen into your analyses creates a major statistical concern because the observations aren't independent. In other words, the teeth of the same individual should have relatively similar DTA values, which can greatly bias your results. This is essentially the same issue as phylogenetic non-independence, but taken to a much greater extreme.
It seems like it'd be much more appropriate to perform specimen-level analyses (e.g., Wilson 2013) or species-level analyses (e.g., Grossnickle & Newham 2016) and report those results in the main text. If the authors believe that their methods are justified, then they should explain this in the text.
We plan to emphasize individual tooth position analyses in our revisions, and provide a stronger justification for our current treatment of multiple teeth from the same individual specimens as independent samples. We recognize the statistical nonindependence raised by Reviewer 2, but we would point out that from an ecomorphological perspective, it is unclear to us that the heterodont dentition of these early Cenozoic placental mammals should represent a single ecological signal (and thus warrant using only a single tooth position as representative of an individual’s DTA values). We plan to closely examine the nature of nonindependence in the DTA data within individuals, to assess a balanced approach to maximize information content from the relatively small and rare fossil samples used, while minimizing signal nonindependence across the dentition.
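The non-independence concern can be made concrete with a small sketch of specimen-level aggregation. This is an illustrative assumption about how such an analysis could be set up, not the procedure used in the study. The record layout, the specimen identifiers, and the values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def specimen_means(records):
    """Collapse tooth-level values to one mean per specimen.

    records: iterable of (specimen_id, value) pairs, one per tooth.
    Returns a dict mapping specimen_id to the mean of its teeth.
    """
    grouped = defaultdict(list)
    for specimen_id, value in records:
        grouped[specimen_id].append(value)
    return {sid: mean(values) for sid, values in grouped.items()}


# Hypothetical records: specimen "A" contributes three teeth, "B" one.
records = [("A", 100.0), ("A", 110.0), ("A", 120.0), ("B", 200.0)]

tooth_level_mean = mean(value for _, value in records)    # 132.5
per_specimen = specimen_means(records)                     # A: 110.0, B: 200.0
specimen_level_mean = mean(per_specimen.values())          # 155.0

print(tooth_level_mean, specimen_level_mean)
```

In this toy case the tooth-level mean is pulled toward specimen "A" because it contributes three of the four observations, whereas the specimen-level mean weights each individual once.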
(2) Maybe I misunderstood, but it sounds like the sampling is almost exclusively clades that are primarily herbivorous/omnivorous (Pantodonta, Arctostylopida, Anagalida, and maybe Tillodonta), which means that the full ecomorphological diversity of the time bins is not being sampled (e.g., insectivores aren't fully sampled). Similarly, the authors say that they "focused sampling" on those major clades and "Additional data were collected on other clades ... opportunistically" (L628). If they favored sampling of specific clades, then doesn't that also bias their results?
If the study is primarily focused on a few herbivorous clades, then the Introduction should be reframed to reflect this. You could explain that you're specifically tracking herbivore patterns after the K-Pg.
We plan to revise the introduction section to more accurately reflect the emphasis on those clades. However, we would note that conventional dietary ecomorphology categories used to characterize later branching placental mammals are likely to be less informative when applied to their Paleocene counterparts. Although there are dental morphological traits that began to characterize major placental clades during the Paleocene, distinctive dietary ecologies have not been demonstrated for most of the clade representatives studied. Thus, insectivory was probably not restricted to “Insectivora”, nor carnivory to early Carnivoramorpha or “Creodonta”, each of which represented less than 5% of the taxonomic richness during the Paleocene in China (Wang et al. 2007).
Wang, Y., Meng, J., Ni, X., and Li, C. (2007). Major events of Paleogene mammal radiation in China. Geol. J. 42, 415–430.
(3) There are a lot of topics lacking background information, which makes the paper challenging to read for non-experts. Maybe the authors are hindered by a short word limit. But if they can expand their main text, then I strongly recommend the following:
(a) The authors should discuss diets. Much of the data are diet correlates (DTA values), but diets are almost never mentioned, except in the Methods. For example, the authors say: "An overall shift towards increased dental topographic trait magnitudes ..." (L137). Does that mean there was a shift toward increased herbivory? If so, why not mention the dietary shift? And if most of the sampled taxa are herbivores (see above comment), then shouldn't herbivory be a focal point of the paper?
We plan to revise the text to make clearer connections between DTA and dietary inferences, and at the same time advise caution in making one-to-one linkages between them. Broadly speaking, dental indices such as DTA are phenotypic traits, and as in other phenotypic traits, the strength of structure-function relationships needs to be explicitly established before dietary ecological inferences can be confidently made. There is, to date, no consistent connection between dental topology and tooth use proxies and biomechanical traits in extant non-herbivorous species (e.g., DeSantis et al. 2017, Tseng and DeSantis 2024), and in our analyses, FEA and DTA generally did not show strong correlations to each other. Thus, we plan to continue to exercise care in interpreting DTA data as dietary data.
DeSantis, L. R. G., Tseng, Z. J., Liu, J., Hurst, A., Schubert, B. W., and Jiangzuo, Q. (2017). Assessing niche conservatism using a multiproxy approach: dietary ecology of extinct and extant spotted hyenas. Paleobiology 43, 286–303. doi:10.1017/pab.2016.45
Tseng, Z. J., and DeSantis, L. R. (2024). Relationship between tooth macrowear and jaw morphofunctional traits in representative hypercarnivores. PeerJ 12, e18435.
(b) The authors should expand on "we used dentitions as ecological indicators" (L75). For non-experts, how/why are dentitions linked to ecology? And, again, why not mention diet? A strong link between tooth shape and diet is a critical assumption here (and one I'm sure that all mammalogists agree with), but the authors don't provide justification (at least in the Introduction) for that assumption. Many relevant papers cited later in the Methods could be cited in the Introduction (e.g., Evans et al. 2007).
Thank you for this suggestion. We plan to expand the introduction section to better contextualize the methodological basis for the work presented.
(c) Include a better introduction of the sample, such as explicitly stating that your sample only includes placentals (assuming that's the case) and is focused on three major clades. Are non-placentals like multituberculates or stem placentals/eutherians found at Chinese Paleocene fossil localities and not sampled in the study, or are they absent in the sampled area?
We thank Reviewer 2 for raising this important point worthy of clarification. Multituberculates are completely absent from the first two land mammal ages in the Paleocene of Asia, and non-placentals are rare in general (Wang et al. 2007). We plan to provide more context for the taxonomic sampling choices made in the study.
Wang, Y., Meng, J., Ni, X., and Li, C. (2007). Major events of Paleogene mammal radiation in China. Geol. J. 42, 415–430.
(d) The way in which "integration" is being used should be defined. That is a loaded term which has been defined in different ways. I also recommend providing more explanation on the integration analyses and what the results mean.
If the authors don't have space to expand the main text, then they should at least expand on the topics in the supplement, with appropriate citations to the supplement in the main text.
We plan to clarify our usage of “integration” to enable readers to accurately interpret what we mean by it.
(4) Finally, I'm not convinced that the results fully support the 'brawn before bite' hypothesis. I like the hypothesis. However, the 'brawn before ...' part of the hypothesis assumes that body size disparity (L63) increased first, and I don't think that pattern is ever shown. First, body size disparity is never reported or plotted (at least that I could find); the authors just show the violin plots of the body sizes (Figures 1B, S6A). Second, the authors don't show evidence of an actual increase in body size disparity. Instead, they seem to assume that there was a rapid diversification in the earliest Paleocene, and thus the early Paleocene bin has already "reached maximum saturation" (L148). But what if the body size disparity in the latest Cretaceous was the same as that in the Paleocene? (Although that's unlikely, note that papers like Clauset & Redner 2009 and Grossnickle & Newham 2016 found evidence of greater body size disparity in the latest Cretaceous than is commonly recognized.) Similarly, what if body size disparity increased rapidly in the Eocene? Wouldn't that suggest a 'BITE before brawn' hypothesis? So, without showing when an increase in body size diversity occurred, I don't think that the authors can make a strong argument for 'brawn before [insert any trait]'.
Although it's probably well beyond the scope of the study to add Cretaceous or Eocene data, the authors could at least review literature on body size patterns during those times to provide greater evidence for an earliest Paleocene increase in size disparity.
We plan to provide a broader discussion and any supporting evidence from the Cretaceous and Eocene to either make a stronger case for “brawn before bite”, or to refine what we mean by brawn/size/size disparity.
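To make the reviewer's request for an explicit disparity measure concrete, one simple option is the sample variance of log-transformed body mass within each time bin. The sketch below assumes that definition, which is only one of several used in the literature and is not necessarily the one the authors would adopt. The bin labels and masses are hypothetical.

```python
import math
import statistics


def size_disparity(body_masses_g):
    """Sample variance of log10 body mass (needs at least two values)."""
    logs = [math.log10(mass) for mass in body_masses_g]
    return statistics.variance(logs)


# Hypothetical body masses in grams per time bin (not data from the study).
bins = {
    "early": [50.0, 400.0, 9000.0, 120000.0],
    "middle": [80.0, 600.0, 7000.0, 90000.0],
    "late": [60.0, 900.0, 15000.0, 200000.0],
}

for label, masses in bins.items():
    print(f"{label}: disparity = {size_disparity(masses):.3f}")
```

Reporting a value like this for each bin, ideally alongside comparable values from the latest Cretaceous and the Eocene, would show directly whether and when disparity increased.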
papers.ssrn.com
eLife Assessment
This Review Article explores the intricate relationship between humans and Mycobacterium tuberculosis (Mtb), providing an additional perspective on TB disease. Specifically, this review focuses on the utilization of systems-level approaches to study TB, while highlighting challenges in the frameworks used to identify the relevant immunologic signals that may explain the clinical spectrum of disease. The work could be further enhanced by better defining key terms that anchor the review, such as "unified mechanism" and "immunological route." This review will be of interest to immunologists as well as those interested in evolution and host-pathogen interactions.
Reviewer #1 (Public review):
Summary:
This is an interesting and useful review highlighting the complex pathways through which pulmonary colonisation or infection with Mycobacterium tuberculosis (Mtb) may progress to develop symptomatic disease and transmit the pathogen. I found the section on immune correlates associated with individuals who have clearly been exposed to and reacted to Mtb but did not develop latent infections particularly valuable. However, several aspects would benefit from clarification.
Strengths:
The main strengths lie in the arguments presented for a multiplicity of immune pathways to TB disease.
Weaknesses:
The main weaknesses lie in clarity, particularly in the precise meanings of the three figures.
I accept that there is a 'goldilocks zone' that underpins the majority of TB cases we see and predominantly reflects different patterns of immune response, but the analogies used need to be more clearly thought through.
Reviewer #2 (Public review):
Summary:
This is a thought-provoking perspective by Reichmann et al., outlining supportive evidence that Mycobacterium tuberculosis co-evolved with its host Homo sapiens to both increase susceptibility to infection and reduce rates of fatal disease through decreased virulence. TB is an ancient disease where two modes of virulence are likely to have evolved through different stages of human evolution: one before the Neolithic Demographic Transition, where humans lived in sparse hunter-gatherer communities, which likely selected for prolonged Mtb infection with reduced virulence to allow for transmission across sparse populations. Conversely, following the agricultural and industrial revolutions, Mtb virulence is likely to have evolved to attack a higher number of susceptible individuals. These different disease modalities highlight the central idea that there are different immunological routes to TB disease, which converge on a disease phenotype characterized by high bacterial load and destruction of the extracellular matrix. The writing is very clear and provides a lot of supportive evidence from population studies and the recent clinical trials of novel TB vaccines, like M72 and H56. However, there are areas to support the thesis that have been described only in broad strokes, including the impact of host and Mtb genetic heterogeneity on this selection, and the alternative model that there are likely different TB diseases (as opposed to different routes to the same disease), as described by several groups advancing the concept of heterogeneous TB endotypes. I expand on specific points below.
Strengths:
(1) The idea that Mtb evolved to both increase transmission (and possible commensalism with humans) with low rates of reactivation is intriguing. The heterogeneous TB phenotypes in the collaborative cross model (PMID: 35112666) support this idea, where some genetic backgrounds can tolerate a high bacterial load with minimal pathology, while others show signs of pathogenesis with low bacterial loads. This supports the idea that the underlying host state, driven by a number of factors like genetics and nutrition, is likely to explain whether someone will co-exist with Mtb without pathology, or progress to disease. I particularly enjoyed the discussion of the protective advantages provided by Mtb infection, which may have rewired the human immune system to provide protection against heterologous pathogens; this is supported by recent studies showing that Mtb infection provides moderate protection against SARS-CoV-2 (PMIDs: 35325013 and 37720210), and may have applied to other viruses that are likely to have played a more significant role in the past in the natural selection of Homo sapiens.
(2) Modeling from Marcel Behr and colleagues (PMID: 31649096) indeed suggests that there are at least two TB clinical phenotypes that likely mirror the two distinct phases of Mtb co-evolution with humans. Most of the TB disease progression occurs rapidly (within 1-2 years of exposure), and the rest are slow cases of reactivation over time. I enjoyed the discussion of the difference between the types of immune hits needed to progress to disease in the two scenarios, where you may need severe immune hits for rapid progression, a phenotype that likely evolved after the Neolithic transition to larger human populations. On the other hand, a series of milder immune events leading to reactivation after a long period of asymptomatic infection likely mirrors slow progression in the hunter-gatherer communities, to allow for prolonged transmission in sparse populations. Perhaps a clearer analysis of these models would be helpful for the reader.
Weaknesses:
(1) The discussion of genetic heterogeneity is limited and only discusses evidence from MSMD studies. Genetics is an important angle to consider in the co-evolution of Mtb and humans. There is a large body of literature on both host and Mtb genetic associations with TB disease. The very fact that host variants in one population do not necessarily cross-validate across populations is evidence in support of population-specific adaptations. Specific Mtb lineages are likely to have co-evolved with distinct human populations. A key reference is missing (PMID: 23995134), which shows that different lineages co-evolved with human migrations. Also, meta-analyses of human GWAS studies to define variants associated with TB are very relevant to the topic of co-evolution (e.g., PMID: 38224499). eQTL studies can also highlight genetic variants associated with regulating key immune genes involved in the response to TB. The authors do mention that Mtb itself is relatively clonal with ~2K SNPs marking Mtb variation, much of which has likely evolved under the selection pressure of modern antibiotics. However, some of this limited universe of variants can still explain co-adaptations between distinct Mtb lineages and different human populations, as shown recently in the co-evolution of lineage 2 with a variant common in Peruvians (PMID: 39613754).
(2) Although the examples of anti-TNF and anti-PD1 treatments are relevant as drivers of TB in limited clinical contexts, the bigger picture is that they highlight major distinct disease endotypes. These restricted examples show that TB can be driven by immune deficiency (as in the case of anti-TNF, HIV, and malnutrition) or hyperactivation (as in the case of anti-PD1 treatment), but there are still certainly many other routes leading to immune suppression or hyperactivation. Considering the idea of hyperactivation as a TB driver, the apparent higher rate of recurrence in the H56 trial referenced in the review is likely due to immune hyperactivation, especially in the context of residual bacteria in the lung. These different TB manifestations (immune suppression vs immune hyperactivation) mirror TB endotypes described by DiNardo et al. (PMID: 35169026) from analysis of extensive transcriptomic data, which indicate that it's not merely different routes leading to the same final endpoint of clinical disease, but rather multiple different disease endpoints. A similar scenario is shown in the transcriptomic signatures underlying disease progression in BCG-vaccinated infants, where two distinct clusters mirrored the hyperactivation and immune suppression phenotypes (PMID: 27183822). A discussion of how to think about translating the extensive information from systems biology into treatment stratification approaches, or adjunct host-directed therapies, would be helpful.
Reviewer #3 (Public review):
Summary:
This perspective article by Reichmann et al. highlights the importance of moving beyond the search for a single, unified immune mechanism to explain host-Mtb interactions. Drawing from studies in immune profiling, host and bacterial genetics, the authors emphasize inconsistencies in the literature and argue for broader, more integrative models. Overall, the article is thought-provoking and well-articulated, raising a concept that is worth further exploration in the TB field.
Strengths:
Timely and relevant in the context of the rapidly expanding multi-omics datasets that provide unprecedented insights into host-Mtb interactions.
Weaknesses (Minor):
(1) Clarity on the notion of a "unified mechanism". It remains unclear whether prior studies explicitly proposed a single unifying immunological model. While inconsistencies in findings exist, they do not necessarily demonstrate that earlier work was uniformly "single-minded". Moreover, heterogeneity in TB has been recognized previously (PMIDs: 19855401, 28736436), which the authors could acknowledge.
(2) Evolutionary timeline and industrial-era framing. The evolutionary model is outdated. Ancient DNA studies place the Mtb's most recent common ancestor at ~6,000 years BP (PMIDs: 25141181; 25848958). The Industrial Revolution is cited as a driver of TB expansion, but this remains speculative without bacterial-genomics evidence and should be framed as a hypothesis. Additionally, the claim that Mtb genomes have been conserved only since the Industrial Revolution (lines 165-167) is inaccurate; conservation extends back to the MRCA (PMID: 31448322).
(3) Trained immunity and TB infection. The treatment of trained immunity is incomplete. While BCG vaccination is known to induce trained immunity (ref 59), revaccination does not provide sustained protection (ref 8), and importantly, Mtb infection itself can also impart trained immunity (PMID: 33125891). Including these nuances would strengthen the discussion.
Author response:
The following is the authors’ response to the original reviews.
eLife Assessment
This Review Article explores the intricate relationship between humans and Mycobacterium tuberculosis (Mtb), providing an additional perspective on TB disease. Specifically, this review focuses on the utilization of systems-level approaches to study TB, while highlighting challenges in the frameworks used to identify the relevant immunologic signals that may explain the clinical spectrum of disease. The work could be further enhanced by better defining key terms that anchor the review, such as "unified mechanism" and "immunological route." This review will be of interest to immunologists as well as those interested in evolution and host-pathogen interactions.
We thank the editors for reviewing our article and for the primarily positive comments. We accept that better definition and terminology will improve the clarity of the message, and so have changed the wording as suggested above in the revised manuscript.
Public Reviews:
Reviewer #1 (Public review):
Summary:
This is an interesting and useful review highlighting the complex pathways through which pulmonary colonisation or infection with Mycobacterium tuberculosis (Mtb) may progress to develop symptomatic disease and transmit the pathogen. I found the section on immune correlates associated with individuals who have clearly been exposed to and reacted to Mtb but did not develop latent infections particularly valuable. However, several aspects would benefit from clarification.
Strengths:
The main strengths lie in the arguments presented for a multiplicity of immune pathways to TB disease.
Weaknesses:
The main weaknesses lie in clarity, particularly in the precise meanings of the three figures.
We accept this point, and have completely changed figure 2 and expanded the legends for figures 1 and 3 to maximise clarity.
I accept that there is a 'goldilocks zone' that underpins the majority of TB cases we see and predominantly reflects different patterns of immune response, but the analogies used need to be more clearly thought through.
We are glad the reviewer agrees with the fundamental argument of different patterns of immunity, and have revised the manuscript throughout where we feel the analogies could be clarified.
Reviewer #2 (Public review):
Summary:
This is a thought-provoking perspective by Reichmann et al., outlining supportive evidence that Mycobacterium tuberculosis co-evolved with its host Homo sapiens to both increase susceptibility to infection and reduce rates of fatal disease through decreased virulence. TB is an ancient disease where two modes of virulence are likely to have evolved through different stages of human evolution: one before the Neolithic Demographic Transition, where humans lived in sparse hunter-gatherer communities, which likely selected for prolonged Mtb infection with reduced virulence to allow for transmission across sparse populations. Conversely, following the agricultural and industrial revolutions, Mtb virulence is likely to have evolved to attack a higher number of susceptible individuals. These different disease modalities highlight the central idea that there are different immunological routes to TB disease, which converge on a disease phenotype characterized by high bacterial load and destruction of the extracellular matrix. The writing is very clear and provides a lot of supportive evidence from population studies and the recent clinical trials of novel TB vaccines, like M72 and H56. However, there are areas to support the thesis that have been described only in broad strokes, including the impact of host and Mtb genetic heterogeneity on this selection, and the alternative model that there are likely different TB diseases (as opposed to different routes to the same disease), as described by several groups advancing the concept of heterogeneous TB endotypes. I expand on specific points below.
Strengths:
The idea that Mtb evolved to both increase transmission (and possible commensalism with humans) with low rates of reactivation is intriguing. The heterogeneous TB phenotypes in the collaborative cross model (PMID: 35112666) support this idea, where some genetic backgrounds can tolerate a high bacterial load with minimal pathology, while others show signs of pathogenesis with low bacterial loads. This supports the idea that the underlying host state, driven by a number of factors like genetics and nutrition, is likely to explain whether someone will co-exist with Mtb without pathology, or progress to disease. I particularly enjoyed the discussion of the protective advantages provided by Mtb infection, which may have rewired the human immune system to provide protection against heterologous pathogens; this is supported by recent studies showing that Mtb infection provides moderate protection against SARS-CoV-2 (PMIDs: 35325013 and 37720210), and may have applied to other viruses that are likely to have played a more significant role in the past in the natural selection of Homo sapiens.
We thank the reviewer for their positive comments, and also for pointing out work that we had previously overlooked. We now discuss and cite the work above as suggested.
Modeling from Marcel Behr and colleagues (PMID: 31649096) indeed suggests that there are at least two TB clinical phenotypes that likely mirror the two distinct phases of Mtb co-evolution with humans. Most of the TB disease progression occurs rapidly (within 1-2 years of exposure), and the rest are slow cases of reactivation over time. I enjoyed the discussion of the difference between the types of immune hits needed to progress to disease in the two scenarios, where you may need severe immune hits for rapid progression, a phenotype that likely evolved after the Neolithic transition to larger human populations. On the other hand, a series of milder immune events leading to reactivation after a long period of asymptomatic infection likely mirrors slow progression in the hunter-gatherer communities, to allow for prolonged transmission in sparse populations. Perhaps a clearer analysis of these models would be helpful for the reader.
We agree that we did not present these concepts in as much detail as we should have, and so we now discuss them further on lines 81–83 and 184–187.
Weaknesses:
The discussion of genetic heterogeneity is limited and only discusses evidence from MSMD studies. Genetics is an important angle to consider in the co-evolution of Mtb and humans. There is a large body of literature on both host and Mtb genetic associations with TB disease. The very fact that host variants in one population do not necessarily cross-validate across populations is evidence in support of population-specific adaptations. Specific Mtb lineages are likely to have co-evolved with distinct human populations. A key reference is missing (PMID: 23995134), which shows that different lineages co-evolved with human migrations. Also, meta-analyses of human GWAS studies to define variants associated with TB are very relevant to the topic of co-evolution (e.g., PMID: 38224499). eQTL studies can also highlight genetic variants associated with regulating key immune genes involved in the response to TB. The authors do mention that Mtb itself is relatively clonal with ~2K SNPs marking Mtb variation, much of which has likely evolved under the selection pressure of modern antibiotics. However, some of this limited universe of variants can still explain co-adaptations between distinct Mtb lineages and different human populations, as shown recently in the co-evolution of lineage 2 with a variant common in Peruvians (PMID: 39613754).
We thank the reviewer for these comments and agree that we failed to cite and discuss the work from Sebastian Gagneux’s group on co-migration, which we now discuss. We include a new paragraph discussing co-evolution as suggested on lines 145–155 and 218–220, citing the work proposed, which we agree enhances the arguments about co-evolution.
Although the examples of anti-TNF and anti-PD1 treatments are relevant as drivers of TB in limited clinical contexts, the bigger picture is that they highlight major distinct disease endotypes. These restricted examples show that TB can be driven by immune deficiency (as in the case of anti-TNF, HIV, and malnutrition) or hyperactivation (as in the case of anti-PD1 treatment), but there are still certainly many other routes leading to immune suppression or hyperactivation. Considering the idea of hyperactivation as a TB driver, the apparent higher rate of recurrence in the H56 trial referenced in the review is likely due to immune hyperactivation, especially in the context of residual bacteria in the lung. These different TB manifestations (immune suppression vs immune hyperactivation) mirror TB endotypes described by DiNardo et al. (PMID: 35169026) from analysis of extensive transcriptomic data, which indicate that it's not merely different routes leading to the same final endpoint of clinical disease, but rather multiple different disease endpoints. A similar scenario is shown in the transcriptomic signatures underlying disease progression in BCG-vaccinated infants, where two distinct clusters mirrored the hyperactivation and immune suppression phenotypes (PMID: 27183822). A discussion of how to think about translating the extensive information from systems biology into treatment stratification approaches, or adjunct host-directed therapies, would be helpful.
We agree with the points made and that the two publications above further enhance the paper. We have added discussion of the different disease endpoints on lines 65–67 and of the evidence regarding immune hyperactivation versus suppression in the vaccination study on lines 162–164, and have expanded on the translational implications on lines 349–352.
Reviewer #3 (Public review):
Summary:
This perspective article by Reichmann et al. highlights the importance of moving beyond the search for a single, unified immune mechanism to explain host-Mtb interactions. Drawing from studies in immune profiling, host and bacterial genetics, the authors emphasize inconsistencies in the literature and argue for broader, more integrative models. Overall, the article is thought-provoking and well-articulated, raising a concept that is worth further exploration in the TB field.
Strengths:
Timely and relevant in the context of the rapidly expanding multi-omics datasets that provide unprecedented insights into host-Mtb interactions.
Weaknesses (Minor):
Clarity on the notion of a "unified mechanism". It remains unclear whether prior studies explicitly proposed a single unifying immunological model. While inconsistencies in findings exist, they do not necessarily demonstrate that earlier work was uniformly "single-minded". Moreover, heterogeneity in TB has been recognized previously (PMIDs: 19855401, 28736436), which the authors could acknowledge.
We accept this point and have toned down the language, acknowledging that we are expanding on an argument that others have made, whilst focusing on the implications for the systems immunology era, and cite the previous work as suggested.
Evolutionary timeline and industrial-era framing. The evolutionary model is outdated. Ancient DNA studies place the Mtb's most recent common ancestor at ~6,000 years BP (PMIDs: 25141181; 25848958). The Industrial Revolution is cited as a driver of TB expansion, but this remains speculative without bacterial-genomics evidence and should be framed as a hypothesis. Additionally, the claim that Mtb genomes have been conserved only since the Industrial Revolution (lines 165-167) is inaccurate; conservation extends back to the MRCA (PMID: 31448322).
Our understanding is that the evolutionary timeline is not fully resolved, with conflicting evidence proposing different dates. The ancient DNA studies giving a timeline of 6,000 years seem to oppose the evidence of Mtb infection of humans in the Middle East 10,000 years ago, and other estimates suggesting 70,000 years. Therefore, we have cited the work above and added a sentence highlighting that different studies propose different timelines. We would propose that the Industrial Revolution created the ideal societal conditions for the expansion of TB, and this would seem widely accepted in the field, but we have added a proviso as suggested. We did not intend to claim that Mtb genomes have been conserved since the Industrial Revolution; the point we were making is that, despite rapid expansion within human populations, the genome has still remained conserved. We therefore have revised our discussion of the conservation of the Mtb genomes on lines 72 – 74, 81 – 83 and 185 – 190.
Trained immunity and TB infection. The treatment of trained immunity is incomplete. While BCG vaccination is known to induce trained immunity (ref 59), revaccination does not provide sustained protection (ref 8), and importantly, Mtb infection itself can also impart trained immunity (PMID: 33125891). Including these nuances would strengthen the discussion.
We have refined this section. We did cite PMID: 33125891 in the original submission but have changed the wording to emphasise the point on line …
Recommendations for the authors:
Reviewer #1 (Recommendations for the authors):
Abstract
Line 30: What is an immunological route? Suggest
“...host-pathogen interaction, with diverse immunological processes leading to TB disease (10%) or stable lifelong association or elimination. We suggest these alternate relationships result from the prolonged co-evolution of the pathogen with humans and may even confer a survival advantage in the 90% of exposures that do not progress to disease.”
Thank you, we have reworded the abstract along the lines suggested above, but not identically to allow for other reviewer comments.
Introduction
Ln 43: It is misleading to suggest that the study of TB was the leading influence in establishing the Koch's postulates framework. Many other infections were involved, and Jacob Henle, one of Koch's teachers, is credited with the first clear formulation (see Evans AS. 1976 The Yale Journal of Biology and Medicine, PMID: 782050).
We have downplayed the language, stating that TB “contributed” to the formulation of Koch’s postulates.
Ln 46: While the review rightly emphasises intracellular infection in macrophages, the importance and abundance of extracellular bacilli should not be ignored, particularly in transmission and in cavities.
We agree, and have added text on the importance of extracellular bacteria and transmission.
Ln 56: This is misleading as primary disease prevention is implied, whereas the vaccine was given to individuals presumed to be already infected (TST or IGRA positive). Suggest "...reduces by 50% progression to overt TB disease when given to those with immunological evidence of latent infection."
Thank you, edit made as suggested.
Ln 62: Not sure why it is urgent. Suggest "high priority".
Wording changed as suggested.
Figure 1 needs clarification. The colour scale appears to signify the strength or vigour of the immune response so that disease is associated with high (orange/red) or low (green/blue) activity. The arrows seem to imply either a sequence or a route map when all we really have is an association with a plausible mechanistic link. They might also be taken to imply a hierarchy that is not appropriate. I'm not sure that the X-rays and arrows add anything, and the rectangle provides the key information on its own. Clarify please.
We have clarified the figure legend. We feel the X-rays give the clinical context, and so have kept them, and now state in the legend that this is highlighting that there are diverse pathways leading to active disease to try to emphasise the point the figure is illustrating.
Ln 149-157: I agree that the current dogma is that overt pulmonary disease is required to spread Mtb and fuel disease prevalence. It is vitally important to distinguish the spread of the organism from the occurrence of disease (which does not, of itself, spread). However, both epidemiological (e.g. Ryckman TS, et al. 2022 Proc Natl Acad Sci U S A: 10.1073/pnas.2211045119) and recent mechanistic (Dinkele R, et al. 2024 iScience: 10.1016/j.isci.2024.110731; Patterson B, et al. 2024 Proc Natl Acad Sci U S A: 10.1073/pnas.2314813121; Warner DF, et al. 2025 Nat Rev Microbiol: 10.1038/s41579-025-01201-x) studies indicate the importance of asymptomatic infections, and those associated with sputum positivity have recently been recognised by WHO. I think it will be important to acknowledge the importance of this aspect and consider how immune responses may or may not contribute. I regard the view that Mtb is an obligate pathogen, dependent on overt pTB for transmission, as needing to be reviewed.
We agree that we did not give sufficient emphasis to the emerging evidence on asymptomatic infections, and that this may play an important part in transmission in high incidence settings. We now include a discussion on this, and citation of the papers above, on lines 168 – 170.
Ln 159: The terms colonise and colonisation are used, without a clear definition, several times. My view is that both refer to the establishment and replication of an organism on or within a host without associated damage. Where there is associated damage, this is often mediated by immune responses. In this header, I think "establishment in humanity" would be appropriate.
We agree with this point and have changed the header as suggested, and clarified our meaning when we use the term colonisation, which the reviewer correctly interprets.
Ln 181-: I strongly support the view that Mtb has contributed to human selection, even to the suggestion that humanity is adapted to maintain a long-term relationship with Mtb.
Thank you, and we have expanded on this evidence as suggested by other reviewers.
Ln 189: improved.
Apologies, typo corrected.
Figure 2: I was also confused by this. The x-axis does not make sense, as a single property should increase. Moreover, does incidence refer to incidence in individuals with that specific balance of resistance and susceptibility, or contribution to overall global incidence - I suspect the latter (also, prevalence would make more sense). At the same time, the legend implies that those with high resistance to colonisation will be infrequent in the population, suggesting that the Y axis should be labelled "frequency in human population". Finally, I can't see what single label could apply to the X axis. While the implication that the majority of global infections reflect a balance between the resistance and susceptibilities is indicated, a frequency distribution does not seem an appropriate representation.
The reviewer is correct that the X axis is aiming to represent two variables, which is not logical, and so we have completely changed this figure to a simple one that we hope makes the point clearly and have amended the legend appropriately. We are aiming to highlight the selective pressures of Mtb on the human population over millennia.
Ln 244: Immunological failure - I agree with the statement but again find the figure (3) unhelpful. Do we start or end in the middle? Is the disease the outside - if so, why are different locations implied? The notion of a maze has some value, but the bacteria should start and finish in the same place by different routes.
We are attempting to illustrate the concept that escape from host immunological control can occur through different mechanisms. As this comment was just from one reviewer, we have left the figure unchanged but have expanded the legend to try to make the point that this is just a conceptual illustration of multiple routes to disease.
Ln 262 onward: I broadly agree with the points made about omic technologies, but would wish to see major emphasis on clear phenotyping of cases. There is something of a contradiction in the review between the emphasis on the multiplicity of immunological processes leading ultimately to disease and the recommendation to analyse via omics, which, in their most widely applied format, bundle these complexities into analyses of the humoral and cellular samples available in blood. Admittedly, the authors point out opportunities for 3-dimensional and single-cell analyses, but it is difficult to see where these end without extrapolation ad infinitum.
We totally agree that clear phenotyping of infection is critical, and expand on this further on lines 307 - 309.
Reviewer #2 (Recommendations for the authors):
I suggest expanding on the genetic determinants of Mtb/host co-evolution.
Thank you, we have now expanded on these sections as suggested.
Reviewer #3 (Recommendations for the authors):
We are in an era of exploding large-scale datasets from multi-omics profiling of Mtb and host interactions, offering an unprecedented lens to understand the complexity of the host immune response to Mtb, a pathogen that has infected human populations for thousands of years. The guiding philosophy for how to interpret this tremendous volume of data and what models can be built from it will be critical. In this context, the perspective article by Reichmann et al. raises an interesting concept: to "avoid unified immune mechanisms" when attempting to understand the immunology underpinning host-Mtb interactions. To support their arguments, the authors review studies and provide evidence from immune profiling, host and bacterial genetics, and showcase several inconsistencies. Overall, this perspective article is well articulated, and the concept is worthwhile for further exploration. A few comments for consideration:
Clarity on the notion of a "unified mechanism". Was there ever a single, clearly proposed unified immunological mechanism? For example, in lines 64-65, the authors criticize that almost all investigations into immune responses to Mtb are based on the premise that a unifying disease mechanism exists. However, after reading the article, it was not clear to me how previous studies attempted to unify the model or what that unifying mechanism was. While inconsistencies in findings certainly exist, they do not necessarily indicate that prior work was guided by a unified framework. I agree that interpreting and exploring data from a broader perspective is valuable, but I am not fully convinced that previous studies were uniformly "single-minded". In fact, the concept of heterogeneity in TB has been previously discussed (e.g., PMIDs: 19855401, 28736436).
We accept this point, and that we have overstated the argument and not acknowledged previous work sufficiently. We now downplay the language and cite the work as proposed.
However, we would propose that essentially all published studies imply that single mechanisms underlie development of disease. The authors are not aware of any manuscript that concludes “Therefore, xxxx pathway is one of several that can lead to TB disease”; instead, they state “Therefore, xxxx pathway leads to TB disease”. The implication of this language is that the mechanism described occurs in all patients, whilst in fact it is likely involved only in a subset. We have toned down the language and expand on this concept on lines 268 – 270.
Evolutionary timeline and industrial-era framing. The evolutionary model needs updating. The manuscript cites a "70,000-year" origin for Mtb, but ancient-DNA studies place the most recent common ancestor at ~6,000 years BP (PMIDs: 25141181; 25848958). The Industrial Revolution is invoked multiple times as a driver of TB expansion, yet the magnitude of its contribution remains debated and, to my knowledge, lacks direct bacterial-genomics evidence for causal attribution; this should be framed as a hypothesis rather than a conclusion. In addition, the statement in lines 165-167 is inaccurate: at the genome level, Mtb has remained highly conserved since its most recent common ancestor, not specifically since the Industrial Revolution (PMID: 31448322).
We accept these points and have made the suggested amendments, as outlined in the public responses. Our understanding is that the evidence about the most recent common ancestor is controversial; if the divergence of human populations occurred concurrently with Mtb, then this must have been significantly earlier than 6,000 years ago, and so there are conflicting arguments in this domain.
Trained immunity and TB infection. The discussion of trained immunity could be expanded. Reference 59 suggests the induction of innate immune training, but reference 8 reports that revaccination does not confer protection against sustained TB infection, indicating that at least "re"-vaccination may not enhance protection. Furthermore, while BCG is often highlighted as a prototypical inducer of trained immunity, real-world infection occurs through Mtb itself. Importantly, a later study demonstrated that Mtb infection can also impart trained immunity (PMID: 33125891). Integrating these findings would provide a more nuanced view of how both vaccination and infection shape innate immune training in the TB context.
We thank the reviewer for these suggestions and have edited the relevant section to include these studies.
eLife Assessment
This important study describes the progressive transformation of olfactory information across five different brain regions in the olfactory pathway, including a comparison of responses to familiar and unfamiliar odors. This dataset is of broad interest for olfactory researchers and provides a solid analysis of a graded change in representations of odor identity and experience in different locations in the pathway.
Reviewer #1 (Public review):
In this important study, the authors characterized the transformation of neural representations of olfactory stimuli from primary sensory cortex to multisensory regions in the medial temporal lobe and investigated how they were affected by non-associative learning. The authors used high-density silicon probe recordings from five different cortical regions while familiar vs. novel odors were presented to a head-restrained mouse. This is a timely study because unlike other sensory systems (e.g., vision), the progressive transformation of olfactory information is still poorly understood. The authors report that both odor identity and experience are encoded by all of these five cortical areas but nonetheless, some themes emerge. Single neuron tuning of odor identity is broad in the sensory cortices but becomes narrowly tuned in hippocampal regions. Furthermore, while experience affects neuronal response magnitudes in early sensory cortices, it changes the proportion of active neurons in hippocampal regions. Thus, this study is an important step forward in the ongoing quest to understand how olfactory information is progressively transformed along the olfactory pathway.
The study is well-executed. The direct comparison of neuronal representations from five different brain regions is impressive. Conclusions are based on single neuronal level as well as population level decoding analyses. Among all the reported results, one stands out for being remarkably robust. The authors show that the anterior olfactory nucleus (AON), which receives direct input from the olfactory bulb output neurons, was far superior at decoding odor identity as well as novelty compared to all the other brain regions. This is perhaps surprising because the other primary sensory region - the piriform cortex - has been thought to be the canonical site for representing odor identity. A vast majority of studies have focused on aPCx, but direct comparisons between odor coding in the AON and aPCx are rare. The experimental design of this current study allowed the authors to do so and the AON was found to convincingly outperform aPCx. Although this result goes against the canonical model, it is consistent with a few recent studies including one that predicted this outcome based on anatomical and functional comparisons between the AON-projecting tufted cells vs. the aPCx-projecting mitral cells in the olfactory bulb.
Future experiments are needed to probe the circuit mechanisms underlying the differential importance of the two primary olfactory cortices, as well as their potential causal roles in odor identification. Moreover, future work should test whether the decoding accuracy of odor identity and experience from neural data (as reported here) can predict the causal contributions of these regions, as revealed through perturbations during behavioral tasks that explicitly probe odor identification and/or experience.
Reviewer #2 (Public review):
Summary:
This manuscript investigates how olfactory representations are transformed along the cortico-hippocampal pathway in mice during a non-associative learning paradigm involving novel and familiar odors. By recording single-unit activity in several key brain regions (AON, aPCx, LEC, CA1, and SUB), the authors aim to elucidate how stimulus identity and experience are encoded and how these representations change across the pathway.
The study addresses an important question in sensory neuroscience regarding the interplay between sensory processing and signaling novelty/familiarity. It provides insights into how the brain processes and retains sensory experiences, suggesting that the earlier stations in the olfactory pathway, the AON and aPCx, play a central role in detecting novelty and encoding odor, while areas deeper into the pathway (LEC, CA1 & SUB) are sparser and encode odor identity but not novelty/familiarity. However, there are several concerns related to methodology, data interpretation, and the strength of the conclusions drawn.
Strengths:
The authors combine the use of modern tools to obtain high-density recordings from large populations of neurons at different stages of the olfactory system (although mostly one region at a time) with elegant data analyses to study an important and interesting question.
Weaknesses:
The first and biggest problem I have with this paper is that it is very confusing, and the results seem to be all over the place. In some parts, it seems like the AON and aPCx are more sensitive to novelty; in others, it seems the other way around. I find their metrics confusing and unconvincing. For example, the example cells in Figure 1C show an AON neuron with a very low spontaneous firing rate and a CA1 neuron with a much higher firing rate, but the opposite is true in Fig. 2A. So, what are we to make of Fig. 2C, which shows the difference in firing rates between novel vs. familiar odors measured as a difference in spikes/sec? The meaning of this is unclear. The authors could have used a difference in Z-scored responses to normalize different baseline activity levels. (This is just one example of a problem with the methodology.)
There are a lot of high-level data analyses (e.g., decoding, analyzing decoding errors, calculating mutual information, calculating distances in state space, etc.) but very little neural data (except for Fig. 2C, and see my comment above about how this is flawed). So, if responses to novel vs. familiar odors are different in the AON and aPCx, how are they different? Why is decoding accuracy better for novel odors in CA1 but better for familiar odors in SUB (Fig. 3A)? The authors identify a small subset of neurons that have unusually high weights in the SVM analyses that contribute to decoding novelty, but they don't tell us which neurons these are and how they are responding differently to novel vs. familiar odors.
The authors call AON and aPCx "primary sensory cortices" and LEC, CA1, and Sub "multisensory areas". This is a straw man argument. For example, we now know that PCx encodes multimodal signals (Poo et al. 2021, Federman et al., 2024; Kehl et al., 2024), and LEC receives direct OB inputs, which has traditionally been the criterion for being considered a "primary olfactory cortical area". So, this terminology is outdated and wrong, and although it suits the authors' needs here in drawing distinctions, it is simplistic and not helpful moving forward.
Why not simply report z-scored firing rates for all neurons as a function of trial number? (e.g., Jacobson & Friedrich, 2018). Fig. 2C is not sufficient. For example, in the Discussion, they say, "novel stimuli caused larger increases in firing rates than familiar stimuli" (L. 270), but what does this mean? Odors typically increase the firing in some neurons and suppress firing in others. Where does the delta come from? Is this because novel odors more strongly activate neurons that increase their firing or because familiar odors more strongly suppress neurons?
Ls. 122-124 - If cells in AON and aPCx responded the same way to novel and familiar odors, then we would say that they only encode for odor and not at all for experience. So, I don't understand why the authors say these areas code for a "mixed representation of chemical identity and experience." "On the other hand," if LEC, CA1, and SUB are odor selective and only encode novel odors, then these areas, not AON and aPCx, are the ones jointly encoding chemical identity and experience. Also, I do not understand why, here, they say that AON and PCx respond to both while LEC, CA1, and SUB were selective for novel stimuli, but the authors then go on to argue that novelty is encoded in the AON and PCx, but not in the LEC, CA1, and SUB.
Ls. 132-140 - As presented in the text and the figure, this section is unclear and confusing. Their use of the word "shuffled" is a major source of this confusion, because this typically is the control that produces outcomes at chance level. More importantly, it seems as though they did the wrong analysis here. A better way to do this analysis is to train on some of the odors and test on an untrained odor (i.e., what Bernardi et al., 2021 called "cross-condition generalization performance"; CCGP).
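To make the suggested analysis concrete, the following is a minimal sketch of cross-condition generalization: a novel-vs-familiar decoder is fit on population responses to some odors and scored on odors it never saw. It is not the authors' pipeline. The nearest-centroid classifier stands in for the SVM used in the manuscript, and the function names (`centroid`, `sq_dist`, `ccgp`), the two-neuron population vectors, and the odor labels are all hypothetical values invented for illustration.

```python
from statistics import mean


def centroid(vectors):
    """Mean population vector across trials (one value per neuron)."""
    return [mean(col) for col in zip(*vectors)]


def sq_dist(a, b):
    """Squared Euclidean distance between two population vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ccgp(trials, held_out):
    """Train a novel-vs-familiar nearest-centroid decoder on all odors except
    those in `held_out`, then return its accuracy on the held-out odors.

    trials: list of (odor, label, population_vector) tuples.
    held_out: set of odor names excluded from training.
    """
    train = [t for t in trials if t[0] not in held_out]
    test = [t for t in trials if t[0] in held_out]
    labels = sorted({label for _, label, _ in train})
    centroids = {
        lab: centroid([v for _, l, v in train if l == lab]) for lab in labels
    }
    correct = sum(
        min(centroids, key=lambda lab: sq_dist(v, centroids[lab])) == label
        for _, label, v in test
    )
    return correct / len(test)


# Hypothetical data: neuron 1 tracks experience, neuron 2 tracks odor identity.
trials = [
    ("A", "novel", [5.0, 1.0]), ("A", "novel", [6.0, 1.5]),
    ("B", "novel", [5.5, 3.0]), ("B", "novel", [6.5, 3.5]),
    ("C", "familiar", [1.0, 1.0]), ("C", "familiar", [1.5, 1.2]),
    ("D", "familiar", [1.2, 3.0]), ("D", "familiar", [0.8, 3.4]),
]

# Train on odors A and C only; test on the unseen odors B and D.
accuracy = ccgp(trials, held_out={"B", "D"})
print(f"held-out accuracy: {accuracy:.2f}")
```

Because the decoder is scored only on odors absent from training, above-chance accuracy cannot be explained by the decoder having learned odor-specific response patterns, which is the confound this analysis is meant to exclude.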
Comments on revisions:
I think the authors have done an adequate job addressing the reviewers' concerns. Most importantly, I found the first version of the manuscript quite confusing, and the consequent clarifications have addressed this issue.
In several cases, I see their point, while I still disagree with whether they made the best decisions. However, the issues here do not fundamentally change the big-picture outcome, and if they want to dig in with their approaches (e.g., only using auROC or just reporting delta firing rates without any normalization), it's their choice.
Reviewer #3 (Public review):
In this manuscript, the authors investigate how odor-evoked neural activity is modulated by experience within the olfactory-hippocampal network. The authors perform extracellular recordings in the anterior olfactory nucleus (AON), the anterior piriform (aPCx) and lateral entorhinal cortex (LEC), the hippocampus (CA1) and the subiculum (SUB), in naïve mice and in mice repeatedly exposed to the same odorants. They determine the response properties of individual neurons and use population decoding analyses to assess the effect of experience on odor information coding across these regions.
The authors' findings show that odor identity is represented in all recorded areas, but that the response magnitude and selectivity of neurons are differentially modulated by experience across the olfactory-hippocampal pathway.
Overall, this work represents a valuable multi-region data set of odor-evoked neural activity. However, a few limitations in experimental design and analysis restrict the conclusions that can be drawn from this study.
Main limitations:
The authors use a non-associative learning paradigm - repeated odor exposure - to test how experience modulates odor responses along the olfactory-hippocampal pathway. While repeated odor exposure clearly modulates sampling behavior and odor-evoked neural activity, the relevance of this modulation across different brain areas remains difficult to assess.
The authors discuss the olfactory-hippocampal pathway as a transition from primary sensory (AON, aPCx) to associative areas (LEC, CA1, SUB). While this is reasonable, given the known circuit connectivity, other interpretations are possible. For example, AON, aPCx, and LEC receive direct inputs from the olfactory bulb ('primary cortex'), while CA1 and SUB do not; AON receives direct top-down inputs from CA1 ('associative cortex'), while aPCx does not. In fact, the data presented in this manuscript do not appear to support a consistent transformation from sensory to associative, as implied by the authors.
Author response:
The following is the authors’ response to the original reviews.
Public reviews:
Reviewer #1 (Public review):
In this important study, the authors characterized the transformation of neural representations of olfactory stimuli from the primary sensory cortex to multisensory regions in the medial temporal lobe and investigated how they were affected by non-associative learning. The authors used high-density silicon probe recordings from five different cortical regions while familiar vs. novel odors were presented to a head-restrained mouse. This is a timely study because unlike other sensory systems (e.g., vision), the progressive transformation of olfactory information is still poorly understood. The authors report that both odor identity and experience are encoded by all of these five cortical areas but nonetheless some themes emerge. Single neuron tuning of odor identity is broad in the sensory cortices but becomes narrowly tuned in hippocampal regions. Furthermore, while experience affects neuronal response magnitudes in early sensory cortices, it changes the proportion of active neurons in hippocampal regions. Thus, this study is an important step forward in the ongoing quest to understand how olfactory information is progressively transformed along the olfactory pathway.
The study is well-executed. The direct comparison of neuronal representations from five different brain regions is impressive. Conclusions are based on single neuronal level as well as population level decoding analyses. Among all the reported results, one stands out for being remarkably robust. The authors show that the anterior olfactory nucleus (AON), which receives direct input from the olfactory bulb output neurons, was far superior at decoding odor identity as well as novelty compared to all the other brain regions. This is perhaps surprising because the other primary sensory region - the piriform cortex - has been thought to be the canonical site for representing odor identity. A vast majority of studies have focused on aPCx, but direct comparisons between odor coding in the AON and aPCx are rare. The experimental design of this current study allowed the authors to do so and the AON was found to convincingly outperform aPCx. Although this result goes against the canonical model, it is consistent with a few recent studies including one that predicted this outcome based on anatomical and functional comparisons between the AON-projecting tufted cells vs. the aPCx-projecting mitral cells in the olfactory bulb (Chae, Banerjee et al. 2022). Future experiments are needed to probe the circuit mechanisms that generate this important difference between the two primary olfactory cortices as well as their potential causal roles in odor identification.
The authors were also interested in how familiarity vs. novelty affects neuronal representation across all these brain regions. One weakness of this study is that neuronal responses were not measured during the process of habituation. Neuronal responses were measured after four days of daily exposure to a few odors (familiar) and then some other novel odors were introduced. This creates a confound because the novel vs. familiar stimuli are different odorants and that itself can lead to drastic differences in evoked neural responses. Although the authors try to rule out this confound by doing a clever decoding and Euclidean distance analysis, an alternative, more straightforward strategy would have been to measure neuronal activity for each odorant during the process of habituation.
Reviewer #2 (Public review):
This manuscript investigates how olfactory representations are transformed along the cortico-hippocampal pathway in mice during a non-associative learning paradigm involving novel and familiar odors. By recording single-unit activity in several key brain regions (AON, aPCx, LEC, CA1, and SUB), the authors aim to elucidate how stimulus identity and experience are encoded and how these representations change across the pathway.
The study addresses an important question in sensory neuroscience regarding the interplay between sensory processing and signaling novelty/familiarity. It provides insights into how the brain processes and retains sensory experiences, suggesting that the earlier stations in the olfactory pathway, the AON and aPCx, play a central role in detecting novelty and encoding odor, while areas deeper into the pathway (LEC, CA1 & SUB) are sparser and encode odor identity but not novelty/familiarity. However, there are several concerns related to methodology, data interpretation, and the strength of the conclusions drawn.
Strengths:
The authors combine the use of modern tools to obtain high-density recordings from large populations of neurons at different stages of the olfactory system (although mostly one region at a time) with elegant data analyses to study an important and interesting question.
Weaknesses:
(1) The first and biggest problem I have with this paper is that it is very confusing, and the results seem to be all over the place. In some parts, it seems like the AON and aPCx are more sensitive to novelty; in others, it seems the other way around. I find their metrics confusing and unconvincing. For example, the example cells in Figure 1C show an AON neuron with a very low spontaneous firing rate and a CA1 neuron with a much higher firing rate, but the opposite is true in Figure 2A. So, what are we to make of Figure 2C, which shows the difference in firing rates between novel vs. familiar odors measured as a difference in spikes/sec? This seems nearly meaningless. The authors could have used a difference in Z-scored responses to normalize different baseline activity levels. (This is just one example of a problem with the methodology.)
We appreciate the reviewer’s concerns regarding clarity and methodology. It is less clear why all neurons in a given brain area should have similar firing rates. Anatomically defined brain areas typically comprise multiple cell types, which can have diverse baseline firing rates. Since we computed absolute firing rate differences per neuron (i.e., novel vs. familiar odor responses within the same neuron), baseline differences across neurons do not have a major impact.
The suggestion to use Z-scores instead of absolute firing rate differences is well taken. However, Z-scoring assumes that the underlying data are normally distributed, which is not the case in our dataset. Specifically, when analyzing odor-evoked firing rates on a per-neuron basis, only 4% of neurons exhibit a normal distribution. In cases of skewed distributions, Z-scoring can distort the data by exaggerating small variations, leading to misleading conclusions. We acknowledge that different analysis methods exist, but we believe that our chosen approach best reflects the properties of the dataset and avoids potential misinterpretations introduced by inappropriate normalization techniques.
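The disagreement above about normalization can be made concrete with a small sketch. The firing rates below are invented for illustration and are not taken from the manuscript, and the function name `novelty_effect` is hypothetical. The sketch compares a raw novel-minus-familiar rate difference with the same difference divided by the baseline standard deviation. This is only one common way to normalize by baseline and may differ from what either the reviewer or the authors have in mind.

```python
from statistics import mean, pstdev

def novelty_effect(baseline, novel, familiar):
    """Return (raw difference, baseline-normalized difference) for one neuron.

    All inputs are per-trial firing rates in spikes/s (hypothetical data).
    """
    raw = mean(novel) - mean(familiar)
    sd = pstdev(baseline)  # baseline variability of this neuron
    return raw, raw / sd

# Neuron A: low, steady baseline and a small absolute novelty effect.
raw_a, z_a = novelty_effect([1, 1, 1, 2, 1], novel=[2, 2, 2], familiar=[1, 2, 1.5])
# Neuron B: high, variable baseline and a larger absolute novelty effect.
raw_b, z_b = novelty_effect([10, 20, 5, 15, 30], novel=[25, 24, 26], familiar=[20, 19, 21])

print(f"A: raw={raw_a:.2f} spikes/s, normalized={z_a:.2f}")
print(f"B: raw={raw_b:.2f} spikes/s, normalized={z_b:.2f}")
```

In this toy case neuron B has the larger raw effect while neuron A has the larger normalized effect, so the choice of metric changes which neuron appears more novelty-sensitive.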
(2) There are a lot of high-level data analyses (e.g., decoding, analyzing decoding errors, calculating mutual information, calculating distances in state space, etc.) but very little neural data (except for Figure 2C, and see my comment above about how this is flawed). So, if responses to novel vs. familiar odors are different in the AON and aPCx, how are they different? Why is decoding accuracy better for novel odors in CA1 but better for familiar odors in SUB (Figure 3A)? The authors identify a small subset of neurons that have unusually high weights in the SVM analyses that contribute to decoding novelty, but they don't tell us which neurons these are and how they are responding differently to novel vs. familiar odors.
We performed additional analyses to address the reviewer’s feedback (Figures 2C-E and lines 118-132) and added more single-neuron data (Figures 1, S3 and S4).
(3) The authors call AON and aPCx "primary sensory cortices" and LEC, CA1, and Sub "multisensory areas". This is a straw man argument. For example, we now know that PCx encodes multimodal signals (Poo et al. 2021, Federman et al., 2024; Kehl et al., 2024), and LEC receives direct OB inputs, which has traditionally been the criterion for being considered a "primary olfactory cortical area". So, this terminology is outdated and wrong, and although it suits the authors' needs here in drawing distinctions, it is simplistic and not helpful moving forward.
We appreciate the reviewer’s concern regarding the classification of brain regions as “primary sensory” versus “multisensory.” Of note, the cited studies (Poo et al., 2021; Federman et al., 2024; Kehl et al., 2024) focus on posterior PCx (pPCx), while our recordings were conducted in very anterior section of anterior PCx. The aPCx and pPCx have distinct patterns of connectivity, both anatomically and functionally. To the best of our knowledge, there is no evidence for multimodal responses in aPCx, whereas there is for LEC, CA1 and SUB. Furthermore, our distinction is not based on a connectivity argument, as the reviewer suggests, but on differences in the α-Poisson ratio (Figure 1E and F).
To avoid confusion due to definitions of what constitutes a “primary sensory” region, we adopted a more neutral description throughout the manuscript.
(4) Why not simply report z-scored firing rates for all neurons as a function of trial number? (e.g., Jacobson & Friedrich, 2018). Figure 2C is not sufficient.
Regarding z-scores, please see our response to point (1). We further added a figure showing responses of all neurons to novel stimuli, using ROC instead of z-scoring, as described previously (e.g., Cohen et al., Nature 2012). We added this figure to the supplementary material for completeness of the analysis (Figure S2E).
For example, in the Discussion, they say, "novel stimuli caused larger increases in firing rates than familiar stimuli" (L. 270), but what does this mean?
This means that, on average, the population of neurons exhibits higher firing rates in response to novel odors than to familiar ones.
Odors typically increase the firing in some neurons and suppress firing in others. Where does the delta come from? Is this because novel odors more strongly activate neurons that increase their firing or because familiar odors more strongly suppress neurons?
We thank the reviewer for this valuable feedback and extended the characterization of firing rate properties, including a separate analysis of neurons i) significantly excited by odorants, ii) significantly inhibited by odorants and iii) not responsive to odorants. We added the analysis and corresponding discussion to the main manuscript (Figures 2C-E and lines 118-132).
(5) Lines 122-124 - If cells in AON and aPCx responded the same way to novel and familiar odors, then we would say that they only encode for odor and not at all for experience. So, I don't understand why the authors say these areas code for a "mixed representation of chemical identity and experience." "On the other hand," if LEC, CA1, and SUB are odor selective and only encode novel odors, then these areas, not AON and aPCx, are the ones jointly encoding chemical identity and experience. Also, I do not understand why, here, they say that AON and PCx respond to both while LEC, CA1, and SUB were selective for novel stimuli, but the authors then go on to argue that novelty is encoded in the AON and PCx, but not in the LEC, CA1, and SUB.
We appreciate the reviewer’s request for clarification. Throughout the brain areas we studied, odorant identity and experience can be decoded. However, the way information is represented differs between regions. We acknowledge that “mixed” representation is a misleading term and removed it from the manuscript.
In AON and aPCx, neurons significantly respond to both novel and familiar odors. However, the magnitude of their responses to novel and familiar odors is sufficiently distinct to allow for decoding of odor experience (i.e., whether an odor is novel or familiar). Moreover, novelty engages more neurons in encoding the stimulus (Figure 2D). In neural space, the position of an odor’s representation in AON and aPCx shifts depending on whether it is novel or familiar, meaning that experience modifies the neural representation of odor identity. This suggests that in these regions the two representations are intertwined.
In contrast, some neurons in LEC, CA1, and SUB exhibit responses to novel odors, but few neurons respond to familiar odors at all. This suggests a more selective encoding of novelty.
(6) Lines 132-140 - As presented in the text and the figure, this section is poorly written and confusing. Their use of the word "shuffled" is a major source of this confusion, because this typically is the control that produces outcomes at the chance level. More importantly, they did the wrong analysis here. The better and, I think, the only way to do this analysis correctly is to train on some of the odors and test on an untrained odor (i.e., what Bernardi et al., 2021 called "cross-condition generalization performance"; CCGP).
We appreciate the feedback and thank the reviewer for the recommendation to implement cross-condition generalization performance (CCGP) as used in Bernardi et al., 2020. We acknowledge that the term "shuffled" may have caused confusion, as it typically refers to control analyses producing chance-level outcomes. In our case, by "shuffling" we shuffled the identity of novel and familiar odors to assess how much the decoder relies on odor identity when distinguishing novelty. This test provided insight into how novelty-based structure exists within neural activity beyond random grouping but does not directly assess generalization.
As suggested, we used CCGP to measure how well novelty-related representations generalize across different odors. Our findings show that in AON and aPCx, novelty-related information is indeed highly generalizable, supporting the idea that these regions encode novelty in a less odor-selective manner (Figure 2K).
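The logic of CCGP described above (train a novelty decoder on some odors, test it on odors it has never seen) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: a nearest-centroid rule stands in for the SVM, the population vectors are synthetic, and all names (`ccgp`, `trials_by_odor`, the odor labels) are hypothetical.

```python
import random

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def ccgp(trials_by_odor, novel_odors, familiar_odors):
    """Mean novelty-decoding accuracy on held-out odors.

    For every (novel, familiar) pair of held-out odors, class centroids are
    estimated from the remaining odors and the held-out trials are assigned
    to the nearer centroid.
    """
    scores = []
    for held_novel in novel_odors:
        for held_fam in familiar_odors:
            train_novel = [v for o in novel_odors if o != held_novel
                           for v in trials_by_odor[o]]
            train_fam = [v for o in familiar_odors if o != held_fam
                         for v in trials_by_odor[o]]
            c_novel, c_fam = centroid(train_novel), centroid(train_fam)
            correct = total = 0
            for odor, is_novel in ((held_novel, True), (held_fam, False)):
                for v in trials_by_odor[odor]:
                    predicted_novel = sq_dist(v, c_novel) < sq_dist(v, c_fam)
                    if predicted_novel == is_novel:
                        correct += 1
                    total += 1
            scores.append(correct / total)
    return sum(scores) / len(scores)

# Synthetic data (assumption): neuron 0 carries an odor-independent novelty
# signal; every neuron also carries a small odor-identity offset plus noise.
rng = random.Random(0)
N_NEURONS, N_TRIALS = 3, 10
novel_odors = ["N1", "N2", "N3"]
familiar_odors = ["F1", "F2", "F3"]
trials_by_odor = {}
for odor in novel_odors + familiar_odors:
    identity = [rng.uniform(-1, 1) for _ in range(N_NEURONS)]
    novelty_shift = 10.0 if odor in novel_odors else 0.0
    trials_by_odor[odor] = [
        [identity[i] + (novelty_shift if i == 0 else 0.0) + rng.gauss(0, 0.5)
         for i in range(N_NEURONS)]
        for _ in range(N_TRIALS)
    ]

score = ccgp(trials_by_odor, novel_odors, familiar_odors)
print(f"CCGP for novelty: {score:.2f}")
```

Because the synthetic novelty signal is shared across odors, the decoder transfers to held-out odors. If novelty were encoded differently for each odor, accuracy on held-out odors would be expected to fall toward the 0.5 chance level.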
Reviewer #3 (Public review):
In this manuscript, the authors investigate how odor-evoked neural activity is modulated by experience within the olfactory-hippocampal network. The authors perform extracellular recordings in the anterior olfactory nucleus (AON), the anterior piriform (aPCx) and lateral entorhinal cortex (LEC), the hippocampus (CA1), and the subiculum (SUB), in naïve mice and in mice repeatedly exposed to the same odorants. They determine the response properties of individual neurons and use population decoding analyses to assess the effect of experience on odor information coding across these regions.
The authors' findings show that odor identity is represented in all recorded areas, but that the response magnitude and selectivity of neurons are differentially modulated by experience across the olfactory-hippocampal pathway.
Overall, this work represents a valuable multi-region data set of odor-evoked neural activity. However, limitations in the interpretability of odor experience of the behavioral paradigm, and limitations in experimental design and analysis, restrict the conclusions that can be drawn from this study.
Recommendations for the authors:
Reviewer #1 (Recommendations for the authors):
Some suggestions, in no particular order, to further improve the manuscript:
(1) The example neuronal responses for CA1 and SUB in Figure 1 are not very inspiring. To my eyes, the odor period response is not that different from the baseline period. In general, a thorough characterization of firing rate properties during the odor period between the different brain regions would be informative.
We thank the reviewer for this valuable feedback. We have replaced the example neurons from CA1 and SUB in Figure 1C. We further extended the characterization of firing rate properties, including a separate analysis of neurons i) significantly excited by odorants, ii) significantly inhibited by odorants and iii) not responsive to odorants. We added the analysis and corresponding discussion to the main manuscript (Figures 2C-E and lines 118-132).
(2) For the summary in Figure 1, why not show neuronal responses as z-scored firing rates as opposed to auROC?
We chose to use auROC instead of z-scored firing rates due to the non-normality of the dataset, which can distort results when using z-scores. Specifically, z-scoring can exaggerate small deviations in neurons with low responsiveness, potentially leading to misleading conclusions. auROC provides a more robust measure of response change that is less sensitive to these distortions because it does not assume any specific distribution. This approach has been used previously (e.g. Cohen et al. 2012, Nature).
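As a rough illustration of the auROC measure discussed here, the sketch below computes the probability that a spike count drawn from the stimulus window exceeds one drawn from the baseline window, with ties counted as one half. The spike counts are invented and the function name `auroc` is hypothetical; the authors' exact computation may differ.

```python
def auroc(baseline, stimulus):
    """Area under the ROC curve separating stimulus from baseline counts.

    Equals P(stimulus > baseline) + 0.5 * P(stimulus == baseline), so 0.5
    means no discriminability, values above 0.5 indicate excitation and
    values below 0.5 indicate inhibition. No distributional assumption.
    """
    wins = 0.0
    for s in stimulus:
        for b in baseline:
            if s > b:
                wins += 1.0
            elif s == b:
                wins += 0.5
    return wins / (len(stimulus) * len(baseline))

# Hypothetical per-trial spike counts for one neuron (10 trials each).
baseline_counts = [2, 3, 1, 2, 4, 2, 3, 1, 2, 3]
stimulus_counts = [5, 6, 4, 7, 5, 6, 8, 5, 4, 6]

print(f"auROC = {auroc(baseline_counts, stimulus_counts):.2f}")
```

As Reviewer #2 notes elsewhere in this thread, the value reflects how separable the two distributions are rather than the size of the rate change.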
(3) To study novelty, the authors presented odorants that were not used during four days of habituation. But this design makes it hard to dissociate odor identity from novelty. Why not track the response of the same odorants during the habituation process itself?
We respectfully disagree with the argument that using different stimuli as novel and familiar constitutes a confound in our analysis. In our study, we used multiple different, structurally dissimilar single-molecule chemicals, which were randomly assigned to novel and familiar categories in each animal. If individual stimuli did cause “drastic differences in evoked neural responses”, these would be evenly distributed between novel and familiar stimuli. It is therefore extremely unlikely that the clear differences we observed between novel and familiar conditions and between brain areas can be attributed to the contribution of individual stimuli, in particular given that our analysis was performed at the population level. In fact, we observed that responses between novel and familiar conditions were qualitatively very similar in the short time window after odor onset (Figure 1G and H).
Importantly, the goal of this study was to investigate the impact of long-term habituation over more than 4 days, rather than short term habituation during one behavioral session. However, tracking the activity of large numbers of neurons across multiple days presents a significant technical challenge, due to the difficulty of identifying stable single-unit recordings over extended periods of time with sufficient certainty. Tools that facilitate tracking have recently been developed (e.g. Yuan AX et al., Elife. 2024) and it will be interesting to apply them to our dataset in the future.
(4) Since novel odors lead to greater sniffing and sniffing strongly influences firing rates in the olfactory system, the authors decided to focus on a 400 ms window with similar sniffing rates for both novel vs. familiar odors. Although I understand the rationale for this choice, I worry that this is too restrictive, and it may not capture the full extent of the phenomenology.
Could the authors model the effect of sniffing on firing rates of individual neurons from the data, and then check whether the odor response for novel context can be fully explained just by increased sniffing or not?
It is an interesting suggestion to extend the window of analysis and observe how responses evolve with sniffing (and other behavioral reactions). To address this, we added an additional figure to the supplementary material, showing the mean responses of all neurons to novel stimuli during the entire odor presentation window (Fig. S1B).
As suggested, we further created a Generalized Linear Model (GLM) for the entire 2s odor stimulation period, incorporating sniffing and novelty as independent variables. As expected, sniffing had a dominant impact on firing rate in all brain areas. A smaller proportion of neurons was modulated by novelty or by the interaction between novelty x breathing, suggesting the entrainment of neural activity by sniffing during the response to novel odors. These results support our decision to focus the analysis on the early 400ms window in order to dissociate the effects of novelty and behavioral responses. Taken together, our results suggest that odorant responses are modulated by novelty early during odorant processing, whereas at later stages sniffing becomes the predominant factor driving firing (Figure S2C-D).
(5) The authors conclude that aPCx has a subset of neurons dedicated to familiar odors based on the distribution of SVM weights in Figure 3D. To me, this is the weakest conclusion of the paper because although significant, the effect size is paltry; the central tendencies are hardly different for the two conditions in aPCx. Could the authors show the PSTHs of some of these neurons to make this point more convincing?
We appreciate the reviewer’s concern regarding the effect size. To strengthen our conclusion, we now include PSTHs of representative neurons in the bottom 10% and top 10% of the neuronal population based on the SVM analysis (Figures S3 and S4). We hope this provides more clarity and support for the interpretation that there is a subset of neurons in aPCx that show greater sensitivity to familiar odors, despite the relatively modest central tendency differences.
In the revised manuscript, we discuss the effect size more explicitly in the text to provide context for its significance (lines 193 - 195).
Reviewer #2 (Recommendations for the authors):
(1) The authors only talk about "responsive" neurons. Does this include neurons whose activity increases significantly (activated) and neurons whose activity decreases (suppressed)?
Yes, the term "responsive" refers to neurons whose activity either increases significantly (excited) or decreases (inhibited) in response to the odor stimuli. We performed additional analyses to characterize responses separately for the different groups (Figure 2C-E and lines 118-132).
(2) Line 54 - The Schoonover paper doesn't show that cells lose their responses to odors, but rather that the population of cells that respond to odors changes with time. That is, population responses don't become more sparse.
The fact that “the population of cells that respond to odors changes with time” implies that some neurons lose their responsiveness (e.g. unit 2 in Figure 1 of Schoonover et al., 2021), while others become responsive (e.g. unit 1 in Figure 1 of Schoonover et al., 2021). Frequent responses reduce drift rate (Figure 4 of Schoonover et al., 2021), thus fewer neurons lose or gain responsiveness. We have revised the manuscript to clarify this.
(3) Line 104 - "Recurrent" is incorrectly used here. I think the authors mean "repeated" or something more like that.
Thank you for pointing this out. We replaced "recurrent" with "repeated".
(4) Figure 3D - What is the scale bar here?
We apologize for the accidental omission. The scale bar has been added to Figure 3D in the revised version of the manuscript.
(5) Line 377 - They say they lowered their electrodes to "200 um/s per second." This must be incorrect. Is this just a typo, or is it really 200 um/s, because that's really fast?
Thank you for pointing this out. It was 20 to 60 um/s; the change has been made in the manuscript.
(6) Line 431: The authors say they used auROC to calculate changes in firing rates (which I think is only shown in Figure 1D). Note that auROC measures the discriminability of two distributions, not the strength or change in the strength of response.
Indeed, we used auROC to measure the discriminability of firing between the baseline and the stimulus response. We have corrected the wording in the methods.
(7) Figure 1B: The anatomical locations of the five areas they recorded from are straightforward, and this figure is not hugely helpful. However, the reader would benefit tremendously by including an experimental schematic. As is, we needed to scour the text and methods sections to understand exactly what they did when.
We thank the reviewer for this suggestion. We included an experimental schematic in the supplementary material.
(8) Figure 1F (left): This plot is much less useful without showing a pre-odor window, even if only times after the odor onset were used for calculating alpha.
We appreciate this concern; however, the goal of Figure 1F is to illustrate the meaning of the alpha value itself. We chose not to include a pre-odor window comparison to avoid confusing the reader.
(9) Figure 2A: What are the bar plots above the raster plots? Are these firing rates? Are the bars overlaid or stacked? Where is the y-axis scale bar?
The bar plots above the raster plots represent a histogram of the spike count/trials over time, with a bin width of 50 ms. These bars are overlaid on the raster plot. We will include a y-axis scale bar in the revised figure to clarify the presentation.
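The histogram described above (spike count per trial in 50 ms bins) can be sketched as follows. The spike times and the analysis window are invented for illustration, and the function name `psth` is hypothetical.

```python
def psth(spike_times_by_trial, t_start_ms, t_stop_ms, bin_ms=50):
    """Mean spike count per trial in consecutive bins of width bin_ms.

    spike_times_by_trial: one list of spike times (ms, relative to odor
    onset) per trial. Spikes outside [t_start_ms, t_stop_ms) are ignored.
    """
    n_bins = (t_stop_ms - t_start_ms) // bin_ms
    counts = [0] * n_bins
    for trial in spike_times_by_trial:
        for t in trial:
            if t_start_ms <= t < t_start_ms + n_bins * bin_ms:
                counts[int((t - t_start_ms) // bin_ms)] += 1
    n_trials = len(spike_times_by_trial)
    return [c / n_trials for c in counts]

# Two hypothetical trials; times in ms relative to odor onset.
trials = [[-120, 10, 60, 75], [30, 55, 210]]
hist = psth(trials, t_start_ms=-200, t_stop_ms=400)
print(hist)
```

Dividing each bin by the bin width in seconds (0.05) would convert these per-trial counts to firing rates in spikes/s, which is one way to give the y-axis an interpretable scale.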
(10) Figure 4G: This makes no sense. First, the Y axis is supposed to measure standard deviation, but the axis label is spikes/s. Second, if responses in the AON are much less reliable than responses in "deeper" areas, why is odor decoding in AON so much better than in the other areas?
We acknowledge the error in the axis label, and we will correct it to indicate the correct units. AON has larger response variability but also larger response magnitudes, which can explain the higher decoding accuracy.
(11) From the model and text, one predicts that the lifetime sparseness increases along the pathway. The authors should use this metric as well/instead of "odor selectivity" because of problems with arbitrary thresholding.
We acknowledge that lifetime sparseness, often computed using lifetime kurtosis, can be an informative measure of selectivity. However, we believe it has limitations that make it less suitable for our analysis. One key issue is that lifetime sparseness does not account for the stability of responses across multiple presentations of the same stimulus. In contrast, our odor selectivity measure incorporates trial-to-trial variability by considering responses over 10 trials and assessing significance using a Wilcoxon test compared to baseline. While the choice of a p-value threshold (e.g., 0.05) is somewhat arbitrary, it is a widely accepted statistical convention. Additionally, lifetime sparseness does not account for excitatory and inhibitory responses. For example, if a neuron X is strongly inhibited by odor A, strongly excited by odor B, and unresponsive to odors C and D, lifetime sparseness would classify it as highly selective for odor B, without capturing its inhibitory selectivity for odor A. The lifetime sparseness will be higher than if X was simply unresponsive for A.
Our odor selectivity measure addresses this by considering both excitation and inhibition as potential responses. Thus, while lifetime sparseness could provide a useful complementary perspective in another type of dataset, it does not fully capture the dynamics of odor selectivity here.
Author response image 1. Lifetime kurtosis distribution per region.
Reviewer #3 (Recommendations for the authors):
Main points:
(1) The authors use a non-associative learning paradigm - repeated odor exposure - to test how experience modulates odor responses along the olfactory-hippocampal pathway. While repeated odor exposure clearly modulates odor-evoked neural activity, the relevance of this modulation and its differential effect across different brain areas are difficult to assess in the absence of any behavioral read-outs.
Our experimental paradigm involves a robust, reliable behavioral readout of non-associative learning. Novel olfactory stimuli evoke a well-characterized orienting reaction, which includes a multitude of physiological reactions, including exploratory sniffing, facial movements and pupil dilation (Modirshanechi et al., Trends Neuroscience 2023). In our study, we focused on exploratory sniffing.
Compared to associative learning, non-associative learning might have received less attention. However, it is critically important because it forms the foundation for how organisms adapt to their environment through experience without forming associations. This is highlighted by the fact that non-instrumental stimuli can be remembered in large numbers (Standing, 1973) and with remarkable detail (Brady et al., 2008). While non-associative learning can thus create vast, implicit memory of stimuli in the environment, it is unclear how stimulus representations reflect this memory. Our study contributes to answering this question. We describe the impact of experience on olfactory sensory representations and reveal a transformation of representations from olfactory cortical to hippocampal structures. Our findings also indicate that sensory responses to familiar stimuli persist within sensory cortical and hippocampal regions, even after spontaneous orienting behaviors have habituated. Further studies involving experimental manipulation techniques are needed to elucidate the causal mechanisms underlying the formation of stimulus memory during non-associative learning.
(2) The authors discuss the olfactory-hippocampal pathway as a transition from primary sensory (AON, aPCx) to associative areas (LEC, CA1, SUB). While this is reasonable, given the known circuit connectivity, other interpretations are possible. For example, AON, aPCx, and LEC receive direct inputs from the olfactory bulb ('primary cortex'), while CA1 and SUB do not; AON receives direct top-down inputs from CA1 ('associative cortex'), while aPCx does not. In fact, the data presented in this manuscript does not appear to support a consistent, smooth transformation from sensory to associative, as implied by the authors (e.g. Figure 4A, F, and G).
Thank you for this insightful comment. Indeed, there are complexities in the circuitry, and the relationships between different areas are not linear. We believe that AON and aPCx are distinctly different from LEC, CA1 and SUB, as the latter areas have been shown to integrate multimodal sensory information. To avoid confusion due to definitions of what constitutes a “primary sensory” region, we adopted a more neutral description throughout the manuscript. We also removed the term “gradual” to describe the transition of neural representations from olfactory cortical to hippocampal areas.
(3) The analysis of odor-evoked responses is focused on a 400 ms window to exclude differences in sniffing behavior. This window spans 200 ms before and after the first inhalation after odor onset. Inhalation onset initiates neural odor responses - why do the authors include neural data before inhalation onset?
The reason to include a brief time window prior to odor onset is to account for what are often called “partial” sniffs. In our experimental setup, odor delivery is not triggered by the animal’s inhalation. Therefore, it can happen that an animal has just begun to inhale when the stimulus is delivered. In this case, the animal is exposed to odorant molecules prior to the first complete inhalation after odor onset. We acknowledge that this limits the temporal resolution of our measurements, but it does not affect the comparison of sensory representations between different brain areas.
It would also be interesting to explore the effect of sniffing behavior (see point 2) on odor-evoked neural activity.
Thank you for your comment. We performed an additional analysis, including a GLM, to address this question (Figure S2C-D).
Minor points:
(4) Figure 2A represents raster plots for 2 neurons per area - it is unclear how to distinguish between the 2 neurons in the plots.
Figure 2A shows one example neuron per brain area. Each neuron has two raster plots, which indicate responses to either a novel (orange) or a familiar (blue) stimulus. We have revised the figure caption for clarity.
(5) Overall, axes should be kept consistent and labeled in more detail. For example, Figure 2H and I are difficult to compare, given that the y-axis changes and that decoding accuracies are difficult to estimate without additional marks on the y-axis.
The axes are indeed different, because the chance-level decoding accuracy differs between those two figures. Decoding novel versus familiar odors has a chance level of 0.5, while decoding odor identity has a chance level of 0.1 (there are 10 odors from which to decode identity).
(6) Some parts of the discussion seem only loosely related to the data presented in this manuscript. For example, the statement that 'AON rather than aPCx should be considered as the primary sensory cortex in olfaction' seems out of context. Similarly, it would be helpful to provide data on the stability of subpopulations of neurons tuned to familiar odors, rather than simply speculate that they could be stable. The authors could summarize more speculative statements in an 'Ideas and Speculation' subsection.
Thank you for your comment. We appreciate your perspective on our hypotheses. We have revised the discussion accordingly. Specifically, we removed the discussion of stable subpopulations, since we have not performed longitudinal tracking in this study.
(7) The authors should try to reference relevant published work more comprehensively.
Thank you for your comment. We attempted to include relevant published work without exceeding the limit for references but might have overlooked important contributions. We apologize to our colleagues whose relevant work might not have been cited.
eLife Assessment
This work provides a fundamental molecular mechanism of how a single enzyme can coordinate the ordered assembly of hyaluronan, a complex polysaccharide, from two different building blocks in an alternating pattern. The authors present compelling evidence by combining high-resolution structural data with rigorous biochemical validation to define the underlying process. Major strengths of the study include the clarity and coherence of the mechanistic insights and the complementary use of structural and functional approaches to address the research question.
-
Reviewer #1 (Public review):
Summary:
This manuscript describes critical intermediate reaction steps of a HA synthase at the molecular level; specifically, it examines the 2nd step, polymerization, adding GlcA to GlcNAc to form the initial disaccharide of the repeating HA structure. Unlike the vast majority of known glycosyltransferases, the viral HAS (a convenient proxy extrapolated to resemble the vertebrate forms) uses a single pocket to catalyze both monosaccharide transfer steps. The authors' work illustrates the interactions needed to bind & proof-read the UDP-GlcA using direct and '2nd layer' amino acid residues. This step also allows the HAS to distinguish the two UDP-sugars; this is very important as the enzymes are not known or observed to make homopolymers of only GlcA or GlcNAc, but only make the HA disaccharide repeats GlcNAc-GlcA.
Strengths:
Overall, the strengths of this paper lie in its techniques & analysis.
The authors make significant leaps forward towards understanding this process using a variety of tools and comparisons of wild-type & mutant enzymes. The work is well presented overall with respect to the text and illustrations (especially the 3D representations), and the robustness of the analyses & statistics is also noteworthy.
Furthermore, the authors make some strides towards creating novel sugar polymers using alternative primers & work with detergent binding to the HAS. The authors tested a wide variety of monosaccharides and several disaccharides for primer activity and observed that GlcA could be added to cellobiose and chitobiose, which are moderately close structural analogs to HA disaccharides. Did the authors also test the readily available HA tetramer (HA4, [GlcA-GlcNAc]2) as a primer in their system? This is a highly recommended experiment; if it works, then this molecule may also be useful for cryo-EM studies of CvHAS as well.
Weaknesses:
An earlier report described a failed attempt to elongate short primers (HA4 & chitin oligosaccharides larger than the cello- or chitobiose that have activity in this report) with a vertebrate HAS, XlHAS1, an enzyme that seems to behave like CvHAS (https://pubmed.ncbi.nlm.nih.gov/10473619/); this work should probably be cited and briefly discussed. It may be that the longer primers in the 1999 paper and/or the different construct or isolation specifics (detergent extract vs crude) were not conducive to the extension reaction, as the authors extracted recombinant enzyme.
There are a few areas that should be addressed for clarity and correctness, especially defining the class of HAS studied here (Class I-NR) as the results may (Class I-R) or may not (Class II) align (see comment (a) below), but overall, a very nicely done body of work that will significantly enhance understanding in the field.
-
Reviewer #2 (Public review):
Summary:
The paper by Stephens and co-workers provides important mechanistic insight into how hyaluronan synthase (HAS) coordinates alternating GlcNAc and GlcA incorporation using a single Type-I catalytic centre. Through cryo-EM structures capturing both "proofreading" and fully "inserted" binding poses of UDP-GlcA, combined with detailed biochemical analysis, the authors show how the enzyme selectively recognizes the GlcA carboxylate, stabilizes substrates through conformational gating, and requires a priming GlcNAc for productive turnover.
These findings clarify how one active site can manage two chemically distinct donor sugars while simultaneously coupling catalysis to polymer translocation.
The work also reports a DDM-bound, detergent-inhibited conformation that possibly illuminates features of the acceptor pocket, although this appears to be a purification artefact (it is indeed inhibitory) rather than a relevant biological state.
Overall, the study convincingly establishes a unified catalytic mechanism for Type-I HAS enzymes and represents a significant advance in understanding HA biosynthesis at the molecular level.
Strengths:
There are many strengths.
This is a multi-disciplinary study with very high-quality cryo-EM and enzyme kinetics (backed up with orthogonal methods of product analysis) to justify the conclusions discussed above.
Weaknesses:
There are few weaknesses.
The abstract and introduction assume a lot of detailed prior knowledge about hyaluronan synthases, and in doing so, risk lessening the readership pool.
A lot of discussion focuses on detergents (whose presence is totally inhibitory) and transfer to non-biological acceptors (at high concentrations). This risks weakening the manuscript.
-
eLife Assessment
This valuable study addresses a question related to how we achieve visual stability across saccadic eye movements. The authors' gaze-contingent fMRI design provides convincing evidence that peripherally presented visual stimuli are represented in foveal visual cortex prior to a saccade. The results will be of interest to vision scientists and behavioural neuroscientists.
-
Reviewer #2 (Public review):
Summary:
This study investigated whether the identity of a peripheral saccade target object is fed back to the foveal retinotopic cortex during saccade preparation, a critical prediction of the foveal prediction hypothesis proposed by Kroell & Rolfs (2022). To achieve this, the authors leveraged a gaze-contingent fMRI paradigm, where the peripheral saccade target was removed before the eyes landed near it, and used multivariate decoding analysis to quantify identity information in the foveal cortex. The results showed that the identity of the saccade target object can be decoded based on foveal cortex activity, despite the fovea never directly viewing the object, and that the foveal feedback representation was similar to passive viewing and not explained by spillover effects. Additionally, exploratory analysis suggested IPS as a candidate region mediating such foveal decodability. Overall, these findings provide neural evidence for the foveal cortex processing the features of the saccade target object, potentially supporting the maintenance of perceptual stability across saccadic eye movements.
Strengths:
This study is well-motivated by previous theoretical findings (Kroell & Rolfs, 2022), aiming to provide neural evidence for a potential neural mechanism of trans-saccadic perceptual stability. The question is important, and the gaze-contingent fMRI paradigm is a solid methodological choice for the research goal. The use of stimuli allowing orthogonal decoding of stimulus category vs stimulus shape is a nice strength, and the resulting distinctions in decoded information by brain region are clean. The results will be of interest to readers in the field, and they fill in some untested questions regarding pre-saccadic remapping and foveal feedback.
Weaknesses:
The authors have done a nice job addressing the previous weaknesses. The remaining weaknesses / limitations are appropriately discussed in the manuscript (e.g., the use of only 4 unique stimuli in the experiment). The findings are intriguing and relevant to saccadic remapping and foveal feedback, but somewhat limited in terms of the ability to draw theoretical distinctions between these related phenomena.
Specifics:
The revised manuscript is much improved in terms of framing and discussion of the prior literature, and the theoretical claims are now stated with appropriate nuance.
I have two remaining minor suggestions/comments, which the authors may optionally respond to:
(1) In the parametric modulation analysis, the authors' additional analyses nicely address my concern and strengthen the claim. However, the description in the revised manuscript (Pg 7 Ln 190-191) is minimal, and readers may find it difficult to grasp what the control analysis is about and how it rules out alternative explanations to the IPS findings. The authors may wish to elaborate on the description in the text.
(2) Out of curiosity (not badgering): The authors argued that the findings of Harrison et al. (2013) and Szinte et al. (2015) can be explained by feature integration between the currently attended location and its future, post-saccadic location. Couldn't the same argument apply in the current paradigm, where attention at the saccade target gets remapped to the pre-saccadic fovea (see also Rolfs et al., 2011 Fig 5), thus leading to the observed feature remapping?
-
Reviewer #3 (Public review):
Summary:
In this paper, the authors used fMRI to determine whether peripherally-viewed objects could be decoded from foveal cortex, even when the objects themselves were never viewed foveally. Specifically, they investigated whether pre-saccadic target attributes (shape, semantic category) could be decoded from foveal cortex. They found that object shape, but not semantic category, could be decoded, providing evidence that foveal feedback relies on low-mid-level information. The authors claim that this provides evidence for a mechanism underlying visual stability and object recognition across saccades.
Strengths:
I think this is another nice demonstration that peripheral information can be decoded from / is processed in foveal cortex - the methods seem appropriate, the experiments and analyses are carefully conducted, and the main results seem convincing. The paper itself was very clear and well-written.
Weaknesses:
Given that foveal feedback has been found in previous studies that don't incorporate saccades, it is still unclear how this mechanism might specifically contribute to stability across saccades, rather than just being a general mechanism that aids the processing/discrimination of peripherally-viewed stimuli. The authors address this point, but I guess whether foveal feedback during fixation and during saccade preparation are really the same is ultimately a question that needs more experimental work to disentangle.
-
Author response:
The following is the authors’ response to the original reviews.
Public Reviews:
Reviewer #1 (Public review):
Summary:
The main contributions of this paper are: (1) a replication of the surprising prior finding that information about peripherally-presented stimuli can be decoded from foveal V1 (Williams et al 2008), (2) a new demonstration of cross-decoding between stimuli presented in the periphery and stimuli presented at the fovea, (3) a demonstration that the information present in the fovea is based on shape not semantic category, and (4) a demonstration that the strength of foveal information about peripheral targets is correlated with the univariate response in the same block in IPS.
Strengths:
The design and methods appear sound, and finding (2) above is new, and importantly constrains our understanding of this surprising phenomenon. The basic effect investigated here is so surprising that even though it has been replicated several times since it was first reported in 2008, it is useful to replicate it again.
We thank the reviewer for their summary. While we agree with many points, we would like to respectfully push back on the notion that this work is a replication of Williams et al. (2008). What our findings share with those of Williams is a report of surprising decoding at the fovea without foveal stimulation. Beyond this similarity, we treat these as related but clearly separate findings, for the following reasons:
(1) Foveal feedback, as shown by Williams et al. (2008) and others during fixation, was only observed during a shape discrimination task, specific to the presented stimulus. Control experiments without such a task (or a color-related task) did not show effects of foveal feedback. In contrast, in the present study, the participants’ task was merely to perform saccades towards stimuli, independently of target features. We thus show that foveal feedback can occur independently of a task related to stimulus features. This dissociation demonstrates that our study must be tapping into something different than reported by Williams.
(2) In a related study, Kroell and Rolfs (2022, 2025) demonstrated a connection between foveal feedback and saccade preparation, including the temporal details of the onset of this effect before saccade execution, highlighting the close link of this effect to saccade preparation. Here we used a very similar behavioral task to capture this saccade-related effect in neural recordings and investigate how early it occurs and what its nature is. Thus, there is a clear motivation for this study in the context of eye movement preparation that is separate from the previous work by Williams.
(3) Lastly, decoding in the experimental task was positively associated with activity in FEF and IPS, areas that have been reliably linked to saccade preparation. We have now also performed an additional analysis (see our response to Specific point 2 of Reviewer 2) showing that decoding in the control condition did not show the same association, further supporting the link of foveal feedback to saccade preparation.
Despite our emphasis on these critical differences between studies, covert peripheral attention, as required by the task in Williams et al., and saccade preparation in natural vision, as in our study, are tightly coupled processes. Indeed, the task in Williams et al. would, during natural vision, likely involve an eye movement to the peripheral target. While speculative, a parsimonious and ecologically valid explanation is that both our study and earlier studies involve eye movement preparation, with execution being suppressed in studies enforcing fixation (e.g., Williams et al., 2008). We now discuss this idea of a shared underlying mechanism more extensively in the revised manuscript (pg 8 ln 228-240).
Weaknesses:
(1) The paper, including in the title ("Feedback of peripheral saccade targets to early foveal cortex") seems to assume that the feedback to foveal cortex occurs in conjunction with saccade preparation. However, participants in the original Williams et al (2008) paper never made saccades to the peripheral stimuli. So, saccade preparation is not necessary for this effect to occur. Some acknowledgement and discussion of this prior evidence against the interpretation of the effect as due to saccade preparation would be useful. (e.g., one might argue that saccade preparation is automatic when attending to peripheral stimuli.)
We agree that the effects Williams et al. showed were not sufficiently discussed in the first version of this manuscript. To engage more clearly with these findings, we now introduce saccade-related foveal feedback (foveal prediction) and foveal feedback during fixation separately in the introduction (pg 2 ln 46-59).
We further added another section in the discussion called “Foveal feedback during saccade preparation” in which we discuss how our findings are related to Williams et al. and how they differ (pg 8 ln 211-240).
As described in our previous response, we believe that our findings go beyond those described by Williams et al. (2008) and others in significant ways. However, during natural vision, the paradigm used by Williams et al. (2008) would likely be solved using an eye movement. Thus, while participants in Williams et al. (2008) did not execute saccades, it appears plausible that they prepared saccades. Given that covert peripheral attention and saccade preparation are tightly coupled processes (Kowler et al., 1995, Vis Res; Deubel & Schneider, 1996, Vis Res; Montagnini & Castet, 2007, J Vis; Rolfs & Carrasco, 2012, J Neurosci; Rolfs et al., 2011, Nat Neurosci), their results are parsimoniously explained by saccade preparation (but not execution) to a behaviorally relevant target.
(2) The most important new finding from this paper is the cross-decodability between stimuli presented in the fovea and stimuli presented in the periphery. This finding should be related to the prior behavioral finding (Yu & Shim, 2016) that when a foveal foil stimulus identical to a peripheral target is presented 150 ms after the onset of the peripheral target, visual discrimination of the peripheral target is improved, and this congruency effect occurred even though participants did not consciously perceive the foveal stimulus (Yu, Q., & Shim, W. M. (2016). Modulating foveal representation can influence visual discrimination in the periphery. Journal of Vision, 16(3), 15-15).
We thank the reviewer for highlighting this highly relevant reference. In the revised version of the manuscript, we now put more emphasis on the finding of cross-decodability (pg 2 ln 60-61). We now also discuss Yu and Shim's (2016) finding, which supports our conclusion that foveal feedback and direct stimulus presentation share representational formats in early visual areas (pg 9 ln 277-279).
(3) The prior literature should be laid out more clearly. For example, most readers will not realize that the basic effect of decodability of peripherally-presented stimuli in the fovea was first reported in 2008, and that that original paper already showed that the effect cannot arise from spillover effects from peripheral retinotopic cortex because it was not present in a retinotopic location between the cortical locus corresponding to the peripheral target and the fovea. (For example, this claim on lines 56-57 is not correct: "it remains unknown 1) whether information is fed back all the way to early visual areas".) What is needed is a clear presentation of the prior findings in one place in the introduction to the paper, followed by an articulation and motivation of the new questions addressed in this paper. If I were writing the paper, I would focus on the cross-decodability between foveal and peripheral stimuli, as I think that is the most revealing finding.
We agree that the structure of the introduction did not sufficiently place our work in the context of prior literature. We have now expanded upon our Introduction section to discuss past studies of saccade- and fixation-related foveal feedback (pg 2 ln 49-59), laying out how this effect has been studied previously. We also removed the claim that "it remains unknown 1) whether information is fed back all the way to early visual areas", where our intention was to specifically focus on foveal prediction. We realize that this was not clear and hence removed this section. Instead, we now place a stronger focus on the cross-decodability finding (pg 2 ln 60-61).
Reviewer #2 (Public review):
Summary:
This study investigated whether the identity of a peripheral saccade target object is predictively fed back to the foveal retinotopic cortex during saccade preparation, a critical prediction of the foveal prediction hypothesis proposed by Kroell & Rolfs (2022). To achieve this, the authors leveraged a gaze-contingent fMRI paradigm, where the peripheral saccade target was removed before the eyes landed near it, and used multivariate decoding analysis to quantify identity information in the foveal cortex. The results showed that the identity of the saccade target object can be decoded based on foveal cortex activity, despite the fovea never directly viewing the object, and that the foveal feedback representation was similar to passive viewing and not explained by spillover effects. Additionally, exploratory analysis suggested IPS as a candidate region mediating such foveal decodability. Overall, these findings provide neural evidence for the foveal cortex processing the features of the saccade target object, potentially supporting the maintenance of perceptual stability across saccadic eye movements.
Strengths:
This study is well-motivated by previous theoretical findings (Kroell & Rolfs, 2022), aiming to provide neural evidence for a potential neural mechanism of trans-saccadic perceptual stability. The question is important, and the gaze-contingent fMRI paradigm is a solid methodological choice for the research goal. The use of stimuli allowing orthogonal decoding of stimulus category vs stimulus shape is a nice strength, and the resulting distinctions in decoded information by brain region are clean. The results will be of interest to readers in the field, and they fill in some untested questions regarding pre-saccadic remapping and foveal feedback.
We thank the reviewer for the positive assessment of our study.
Weaknesses:
The conclusions feel a bit over-reaching; some strong theoretical claims are not fully supported, and the framing of prior literature is currently too narrow. A critical weakness lies in the inability to test a distinction between these findings (claiming to demonstrate that "feedback during saccade preparation must underlie this effect") and foveal feedback previously found during passive fixation (Williams et al., 2008). Discussions (and perhaps control analysis/experiments) about how these findings are specific to the saccade target and the temporal constraints on these effects are lacking. The relationship between the concepts of foveal prediction, foveal feedback, and predictive remapping needs more thorough treatment. The choice to use only 4 stimuli is justified in the manuscript, but remains an important limitation. The IPS results are intriguing but could be strengthened by additional control analysis. Finally, the manuscript claims the study was pre-registered ("detailing the hypotheses, methodology, and planned analyses prior to data collection"), but on the OSF link provided, there is just a brief summary paragraph, and the website says "there have been no completed registrations of this project".
We thank the reviewer for these helpful considerations. We agree that some of the claims were not sufficiently supported by the evidence, and in the revised manuscript, we added nuance to those claims (pg 8 ln 211-240). Furthermore, we now address more directly the distinction between foveal feedback during fixation and foveal feedback (foveal prediction) during saccade preparation. In particular, we now describe the literature about these two effects separately in the introduction (pg 2 ln 46-59), and we have added a new section in the discussion (“Foveal feedback during saccade preparation”) that more thoroughly explains why a passive fixation condition would have been unlikely to produce the same results we find (pg 8 ln 211-227). We also adapted the section about “Saccadic remapping or foveal prediction”, clearly delineating foveal prediction from feature remapping and predictive updating of attention pointers. As recommended by the reviewer, we conducted the parametric modulation analyses on the control condition, strengthening the claim that our findings are saccade-related. These results were added as Supplementary Figure 2 and are discussed in (pg 7 ln 190-191) and (pg 8 ln 224-227).
Lastly, we would like to apologize for a mistake we made with the pre-registration. We realized that the pre-registration had indeed not been submitted. We have now done so without changing the pre-registration itself, which can be seen from the recent activity of the pre-registration (screenshot attached at the end). After consulting an open science expert at the University of Leipzig, we added a note of this mistake to the methods section of the revised manuscript (pg 10 ln 326-332). We could remove the reference to this pre-registration altogether, but leave this to the discretion of the editor.
Specifics:
(1) In the eccentricity-dependent decoding results (Figure 2B), are there any statistical tests to support the results being a U-shaped curve? The dip isn't especially pronounced. Is 4 degrees lower than the further ones? Are there alternative methods of quantifying this (e.g., fitting it to a linear and quadratic function)?
We statistically tested the U-shaped relationship using a weighted quadratic regression, which showed significant positive curvature for decoding between fovea and periphery in all early visual areas (V1: t(27) = 3.98, p = 0.008; V2: t(27) = 3.03, p = 0.02; V3: t(27) = 2.776, p = 0.025; one-sided). We now report these results in the revised manuscript (pg 5 ln 137-138).
(2) In the parametric modulation analysis, the evidence for IPS being the only region showing stronger fovea vs peripheral beta values was weak, especially given the exploratory nature of this analysis. The raw beta value can reflect other things, such as global brain fluctuations or signal-to-noise ratio. I would also want to see the results of the same analysis performed on the control condition decoding results.
We appreciate the reviewer’s suggestion and repeated the same parametric modulation analysis on the control condition to assess the influence of potential confounds on the overall beta values (Supplementary Figure 2). The results show a negative association between foveal decoding and FEF and IPS (likely because eye movements in the control condition lead to less foveal presentation of the stimulus) and a positive association with LO. Peripheral decoding was not associated with significant changes in any of the ROIs, indicating that global brain fluctuations alone are not responsible for the effects reported in the experimental condition. The results of this analysis thus show a specific positive association of IPS activity with the experimental condition, not the control condition, which is in line with the idea that the foveal feedback effect reported in this study may be related to saccade preparation.
(3) Many of the claims feel overstated. There is an emphasis throughout the manuscript (including claims in the abstract) that these findings demonstrate foveal prediction, specifically that "image-specific feedback during saccade preparation must underlie this effect." To my understanding, one of the key aspects of the foveal prediction phenomenon that ties it closely to trans-saccadic stability is its specificity to the saccade target but not to other objects in the environment. However, it is not clear to what degree the observed findings are specific to saccade preparation and the peripheral saccade target. Should the observers be asked to make a saccade to another fixation location, or simply maintain passive fixation, will foveal retinotopic cortex similarly contain the object's identity information? Without these control conditions, the results are consistent with foveal prediction, but do not definitively demonstrate that as the cause, so claims need to be toned down.
We fully agree with the reviewer and toned down claims about foveal prediction. We engage with the questions raised by the reviewer more thoroughly in the new discussion section “Foveal feedback during saccade preparation”.
In addition, we agree that another condition in which subjects make a saccade towards a different location would have been a great addition that we also considered, but due to concerns with statistical power did not add. While including such a condition exceeds the scope of the current study, we included this limitation in the Discussion section (pg 10 ln 316) and hope that future studies will address this question.
(4) Another critical aspect is the temporal locus of the feedback signal. In the paradigm, the authors ensured that the saccade target object was never foveated via the gaze-contingent procedure and a conservative data exclusion criterion, thus enabling the test of feedback signals to foveal retinotopic cortex. However, due to the temporal sluggishness of fMRI BOLD signals, it is unclear when the feedback signal arrives at the foveal retinotopic cortex. In other words, it is possible that the feedback signal arrives after the eyes land at the saccade target location. This possibility is also bolstered by Chambers et al. (2013)'s TMS study, where they found that TMS to the foveal cortex at 350-400 ms SOA interrupts the peripheral discrimination task. The authors should qualify their claims of the results occurring "during saccade preparation" (e.g., pg 1 ln 22) throughout the manuscript, and discuss the importance of temporal dynamics of the effect in supporting stability across saccades.
We fully agree that the sluggishness of the fMRI signal presents an important challenge in investigating foveal feedback. We have now included this limitation in the discussion (pg 10 ln 306-318). We also clarify that our argument connects to previous studies investigating the temporal dynamics of foveal feedback using similar tasks (pg 10 ln 313-316). Specifically, in their psychophysical work, Kroell and Rolfs (2022, 2025) showed that foveal feedback occurs before saccade execution, with a peak around 80 ms before the eye movement.
(5) Relatedly, the claims that results in this paradigm reflect "activity exclusively related to predictive feedback" and "must originate from predictive rather than direct visual processes" (e.g., lines 60-65 and throughout) need to be toned down. The experimental design nicely rules out direct visual foveal stimulation, but predictive feedback is not the only alternative to that. The activation could also reflect mental imagery, visual working memory, attention, etc. Importantly, the experiment uses a block design, where the same exact image is presented multiple times over the block, and the activation is taken for the block as a whole. Thus, while at no point was the image presented at the fovea, there could still be more going on than temporally-specific and saccade-specific predictive feedback.
We agree that those claims could have misled the reader. Our intention was to state that the activation originates from feedback rather than direct foveal stimulation because of the nature of the design. We have now clarified these statements (pg 2 ln 65) and also included a discussion of other effects including imagery and working memory in the limitations section (pg 10 ln 306-313).
(6) The authors should avoid using the terms foveal feedback and foveal prediction interchangeably. To me, foveal feedback refers to the findings of Williams et al. (2008), where participants maintained passive fixation and discriminated objects in the periphery (see also Fan et al., 2016), whereas foveal prediction refers to the neural mechanism hypothesized by Kroell & Rolfs (2022), which occurs before a saccade to the target object and contains task-irrelevant feature information.
We agree, and we have now adopted a clearer distinction between these terms, referring to foveal prediction only when discussing the distinct predictive nature of the effect discovered by Kroell and Rolfs (2022). Otherwise, we refer to this effect as foveal feedback.
(7) More broadly, the treatment of how foveal prediction relates to saccadic remapping is overly simplistic. The authors seem to be taking the perspective that remapping is an attentional phenomenon marked by remapping of only attentional/spatial pointers, but this is not the classic or widely accepted definition of remapping. Within the field of saccadic remapping, it is an ongoing debate whether (/how/where/when) information about stimulus content is remapped alongside spatial location (and also whether the attentional pointer concept is even neurophysiologically viable). This relationship between saccadic remapping and foveal prediction needs clarification and deeper treatment, in both the introduction and discussion.
We thank the reviewer for their remarks. We reformulated the discussion section on “Saccadic remapping or foveal prediction” to include the nuances about spatial and feature remapping laid out in the reviewer’s comment (pg 8-9 ln 241-269). We also put a stronger focus on the special role the fovea seems to be playing regarding the feedback of visual features (pg 8-9 ln 265-269).
(8) As part of this enhanced discussion, the findings should be better integrated with prior studies. E.g., there is some evidence for predictive remapping inducing integration of non-spatial features (some by the authors themselves; Harrison et al., 2013; Szinte et al., 2015). How do these findings relate to the observed results? Can the results simply be a special case of non-spatial feature integration between the currently attended and remapped location (fovea)? How are the results different from neurophysiological evidence for facilitation of the saccade target object's feature across the visual field (Burrows et al., 2014)? How might the results be reconciled with a prior fMRI study that failed to find decoding of stimulus content in remapped responses (Lescroart et al, 2016)? Might this reflect a difference between peripheral-to-peripheral vs peripheral-to-foveal remapping? A recent study by Chiu & Golomb (2025) provided supporting evidence for peripheral-to-fovea remapping (but not peripheral-to-peripheral remapping) of object-location binding (though in the post-saccadic time window), and suggested foveal prediction as the underlying mechanism.
We thank the reviewer for raising these intriguing questions. We now address them in the revised discussion. We argue that the findings by Harrison et al., 2013 and Szinte et al., 2015 of presaccadic integration of features across two peripheral locations can be explained by presaccadic updating of spatial attention pointers rather than remapping of feature information (pg 8 ln 248-253). The lack of evidence for periphery-to-periphery remapping (Lescroart et al, 2016) and the recent study by Chiu & Golomb (2025) showing object-location binding from periphery to fovea nicely align with our characterization of foveal processing as unique in predicting feature information of upcoming stimuli (pg 8-9 ln 265-269). Finally, we argue that the global (i.e., space-invariant) selection of task-irrelevant saccadic target features (Burrows et al., 2014) is well-established at the neural level, but does not suffice to explain the spatially specific nature of foveal prediction (pg 8 ln 220-224). We now include these studies in the revised discussion section.
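For readers unfamiliar with the curvature test mentioned in the response to Specific point (1), the following Python sketch shows one way such a test can be set up. It is not the authors' code. It assumes a per-participant weighted least-squares fit of decoding accuracy against eccentricity, followed by a one-sample t-test on the quadratic coefficients across participants (which would yield 27 degrees of freedom for 28 participants). The eccentricity bins, weights, and simulated accuracies are hypothetical placeholders.

```python
import numpy as np


def quadratic_curvature(ecc, acc, weights):
    """Weighted least-squares fit of acc = b0 + b1*ecc + b2*ecc**2.

    Returns b2, the curvature term; b2 > 0 indicates a U-shaped profile.
    """
    ecc = np.asarray(ecc, dtype=float)
    acc = np.asarray(acc, dtype=float)
    w = np.asarray(weights, dtype=float)
    design = np.column_stack([np.ones_like(ecc), ecc, ecc ** 2])
    # Scaling rows by sqrt(w) turns ordinary least squares into weighted least squares.
    sqrt_w = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(design * sqrt_w[:, None], acc * sqrt_w, rcond=None)
    return beta[2]


def one_sample_t(values):
    """t statistic and degrees of freedom for the mean of `values` against zero."""
    values = np.asarray(values, dtype=float)
    n = values.size
    t_stat = values.mean() / (values.std(ddof=1) / np.sqrt(n))
    return t_stat, n - 1


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ecc_bins = np.array([0.0, 2.0, 4.0, 6.0, 8.0])    # hypothetical eccentricities (deg)
    bin_weights = np.array([1.0, 2.0, 3.0, 3.0, 2.0])  # hypothetical weights per bin
    n_participants = 28

    curvatures = []
    for _ in range(n_participants):
        # Simulated U-shaped accuracy profile with a dip at 4 deg, plus noise.
        acc = 0.5 + 0.002 * (ecc_bins - 4.0) ** 2 + rng.normal(0.0, 0.02, ecc_bins.size)
        curvatures.append(quadratic_curvature(ecc_bins, acc, bin_weights))

    t_stat, dof = one_sample_t(curvatures)
    print(f"curvature t({dof}) = {t_stat:.2f}")
```

A one-sided p-value for positive curvature can then be read from the t distribution with the returned degrees of freedom. Whether the weights reflect voxel counts, inverse variances, or something else is a design choice that this sketch leaves open.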
Reviewer #3 (Public review):
Summary:
In this paper, the authors used fMRI to determine whether peripherally viewed objects could be decoded from the foveal cortex, even when the objects themselves were never viewed foveally. Specifically, they investigated whether pre-saccadic target attributes (shape, semantic category) could be decoded from the foveal cortex. They found that object shape, but not semantic category, could be decoded, providing evidence that foveal feedback relies on low-mid-level information. The authors claim that this provides evidence for a mechanism underlying visual stability and object recognition across saccades.
Strengths:
I think this is another nice demonstration that peripheral information can be decoded from / is processed in the foveal cortex - the methods seem appropriate, and the experiments and analyses are carefully conducted, and the main results seem convincing. The paper itself was very clear and well-written.
We thank the reviewer for this positive evaluation of our work. As discussed in our response to Reviewer 1, we now elaborate on the differences between previous work showing decoding of peripheral information from foveal cortex and the effect shown here. While there are important similarities between these findings, foveal prediction in our study occurs in a saccade condition and in the absence of a task that is specific to stimulus features.
Weaknesses:
There are a couple of reasons why I think the main theoretical conclusions drawn from the study might not be supported, and why a more thorough investigation might be needed to draw these conclusions.
(1) The authors used a blocked design, with each object being shown repeatedly in the same block. This meant that the stimulus was entirely predictable on each block, which weakens the authors' claims about this being a predictive mechanism that facilitates object recognition - if the stimulus is 100% predictable, there is no aspect of recognition or discrimination actually being tested. I think to strengthen these claims, an experiment would need to have unpredictable stimuli, and potentially combine behavioural reports with decoding to see whether this mechanism can be linked to facilitating object recognition across saccades.
We appreciate the reviewer’s point and would like to highlight that it was not our intention to claim a behavioral effect on object recognition. We believe that an ambiguous formulation in the original abstract may have been interpreted this way, and we have therefore removed this reference. We also speculated in our Discussion that a potential reason for foveal prediction could be a headstart in peripheral object recognition; in the revised manuscript, we highlight more clearly that this is a potential future direction only.
(2) Given that foveal feedback has been found in previous studies that don't incorporate saccades, how is this a mechanism that might specifically contribute to stability across saccades, rather than just being a general mechanism that aids the processing/discrimination of peripherally-viewed stimuli? I don't think this paper addresses this point, which would seem to be crucial to differentiate the results from those of previous studies.
We fully agree that this point had not been sufficiently addressed in the previous version of the manuscript. As described in our responses to similar comments from reviewers 1 and 2, we included an additional section in the Discussion (“Foveal feedback during saccade preparation”) to more clearly delineate the present study from previous findings of foveal feedback. Previous studies (Williams et al., 2008) only found foveal feedback during narrow discrimination tasks related to spatial features of the target stimulus, not during color-discrimination or fixation-only tasks, concluding that the observed effect must be related to the discrimination behavior. In contrast, we found foveal feedback (as evidenced by decoding of target features) during a saccade condition that was independent of the target features, suggesting a different role of foveal feedback than hypothesized by Williams et al. (2008).
Recommendations for the authors:
Reviewer #2 (Recommendations for the authors):
(A) Minor comments:
(1) The task should be clarified earlier in the manuscript.
We now characterise the task in the abstract and clarified its description in the third paragraph, right after introducing the main literature.
(2) Is there actually only 0.5 seconds between saccades? This feels very short/rushed.
The inter-trial interval was 0.5 seconds, though effectively it varied because the target only appeared once participants fixated on the fixation dot. Note that this pacing is slower than the rate of saccades in natural vision (about 3 to 4 saccades per second). Participants did not report this paradigm as rushed.
(3) Typo on pg2 ln64 (whooe).
Fixed.
(4) Can the authors also show individual data points for Figures 3 and 4?
We added individual data points for Figures 4 and S2.
(5) The MNI coordinates on Figure 4A seem to be incorrect.
We took out those coordinates.
(6) Pg4 ln126 and pg6 ln194, why cite Williams et al. (2008)?
We included this reference here to acknowledge that Williams et al. raised the same issues. We added a “cf.” before this reference to clarify this.
(7) Pg7 ln207 Fabius et al. (2020) showed slow post-saccadic feature remapping, rather than predictive remapping of spatial attention.
We have corrected this mistake.
(8) The OSF link is valid, but I couldn't find a pre-registration.
The issue with the OSF link has been resolved. The pre-registration had been set up but not published. We now published it without changing the original pre-registration (see the screenshot attached).
(9) I couldn't access the OpenNeuro repository.
The issue with the OpenNeuro link has been resolved.
(B) Additional references you may wish to include:
(1) Burrows, B. E., Zirnsak, M., Akhlaghpour, H., Wang, M., & Moore, T. (2014). Global selection of saccadic target features by neurons in area v4. Journal of Neuroscience.
(2) Chambers, C. D., Allen, C. P., Maizey, L., & Williams, M. A. (2013). Is delayed foveal feedback critical for extra-foveal perception?. Cortex.
(3) Chiu, T. Y., & Golomb, J. D. (2025). The influence of saccade target status on the reference frame of object-location binding. Journal of Experimental Psychology. General.
(4) Harrison, W. J., Retell, J. D., Remington, R. W., & Mattingley, J. B. (2013). Visual crowding at a distance during predictive remapping. Current Biology.
(5) Lescroart, M. D., Kanwisher, N., & Golomb, J. D. (2016). No evidence for automatic remapping of stimulus features or location found with fMRI. Frontiers in Systems Neuroscience.
(6) Moran, C., Johnson, P. A., Hogendoorn, H., & Landau, A. N. (2025). The representation of stimulus features during stable fixation and active vision. Journal of Neuroscience.
(7) Szinte, M., Jonikaitis, D., Rolfs, M., Cavanagh, P., & Deubel, H. (2016). Presaccadic motion integration between current and future retinotopic locations of attended objects. Journal of Neurophysiology.
We thank the reviewer for pointing out these references. We have included them in the revised version of the manuscript.
Reviewer #3 (Recommendations for the authors):
I just have a few minor points where I think some clarifications could be made.
(1) Line 64 - "whooe" should be "whose" I think.
Fixed.
(2) Around line 53 - you might consider citing this review on foveal feedback - https://doi.org/10.1167/jov.20.12.2
We included the reference (pg 2 ln 55).
(3) Line 129 - you mention a u-shaped relationship for decoding - I wasn't quite sure of the significance/relevance of this relationship - it would be helpful to expand on this / clarify what this means.
We have expanded this section and added statistical tests of the u-shaped relationship in decoding using a weighted quadratic regression. We found significant positive curvature in all early visual areas between fovea and periphery (V1: t(27) = 3.98, p = 0.008; V2: t(27) = 3.03, p = 0.02; V3: t(27) = 2.776, p = 0.025). These findings support a u-shaped relationship. We now report these results in the revised manuscript (pg 5 ln 137-138).
(4) Figure 1 - it would be helpful to indicate how long the target was viewed in the "stim on" panels - I assume it was for the saccade latency, but it would be good to include those values in the main text.
We included that detail in the text (pg 3 ln 96-97).
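The weighted quadratic regression mentioned in the response to point (3) can be illustrated with a minimal sketch. The eccentricity bins, decoding accuracies, and weights below are made-up values for illustration only, not data from the study, and the function name is hypothetical. The response does not specify how the group-level t(27) statistics were obtained, so the t statistic computed here applies to a single fit and should not be read as the authors' procedure. A positive coefficient on the squared term indicates a u-shaped profile.

```python
import numpy as np

def weighted_quadratic_fit(x, y, w):
    """Fit y ~ b0 + b1*x + b2*x**2 by weighted least squares.

    Returns the coefficients (b0, b1, b2) and the t statistic of the
    curvature term b2. Weights are treated as inverse-variance weights
    known up to a common scale factor.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    X = np.column_stack([np.ones_like(x), x, x ** 2])  # design matrix
    XtWX = X.T @ (w[:, None] * X)
    beta = np.linalg.solve(XtWX, X.T @ (w * y))
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = (w * resid ** 2).sum() / dof              # residual variance scale
    cov = sigma2 * np.linalg.inv(XtWX)                 # coefficient covariance
    t_curvature = beta[2] / np.sqrt(cov[2, 2])
    return beta, t_curvature

# Hypothetical decoding accuracies across seven eccentricity bins
# (fovea to periphery), with hypothetical weights per bin.
ecc_bins = np.array([0, 1, 2, 3, 4, 5, 6])
accuracy = np.array([0.60, 0.55, 0.52, 0.51, 0.53, 0.56, 0.61])
bin_weights = np.array([1.0, 1.5, 2.0, 2.0, 2.0, 1.5, 1.0])

beta, t_curvature = weighted_quadratic_fit(ecc_bins, accuracy, bin_weights)
print("curvature coefficient:", beta[2])
print("t statistic for curvature:", t_curvature)
```

In this toy example the curvature coefficient is positive because accuracy is highest at both ends of the eccentricity axis and lowest in the middle.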
-
eLife Assessment
The development of glmSMA represents a valuable advancement in spatial transcriptomics analysis, offering a mathematically robust regression-based approach that achieves higher-resolution mapping of single-cell RNA sequencing data to spatial locations than existing methods. The evidence is convincing, as the authors demonstrate the method's superiority by formulating it as a convex optimization problem that ensures stable solutions, coupled with successful validation across multiple biological systems. The rigorous mathematical framework and validation across diverse tissues enable precise spatial mapping of cellular heterogeneity at enhanced resolution.
-
Reviewer #2 (Public review):
Summary:
The author proposes a novel method for mapping single-cell data to specific locations with higher resolution than several existing tools.
Strengths:
The spatial mapping tests were conducted on various tissues, including the mouse cortex, human PDAC, and intestinal villus.
Comments on revised version:
I have no additional comments regarding the current version of the manuscript.
-
Reviewer #3 (Public review):
Summary:
The authors have provided a thorough and constructive response to the comments. They effectively addressed concerns regarding the dependence on marker gene selection by detailing the incorporation of multiple feature selection strategies, such as highly variable genes and spatially informative markers (e.g., via Moran's I), which enhance glmSMA's robustness even when using gene-limited reference atlases.
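As an illustration of what selecting spatially informative markers via Moran's I involves, the statistic for a single gene can be computed from its expression values and a spatial weight matrix. The sketch below is not the glmSMA implementation; the function name, the binary neighbor matrix, and the four-spot layout are assumptions chosen to keep the example small.

```python
import numpy as np

def morans_i(values, weights):
    """Global Moran's I for one variable.

    values  : expression of one gene at each spatial location
    weights : square spatial weight matrix (w[i, j] > 0 for neighbors)
    """
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()                       # deviations from the mean
    numerator = (w * np.outer(z, z)).sum() # spatially weighted cross-products
    denominator = (z ** 2).sum()
    return (len(x) / w.sum()) * numerator / denominator

# Hypothetical layout: four spots on a line, each adjacent to its neighbors.
line_adjacency = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])

clustered = morans_i([1, 1, 0, 0], line_adjacency)    # similar values adjacent
alternating = morans_i([1, 0, 1, 0], line_adjacency)  # dissimilar values adjacent
print(clustered, alternating)
```

Genes with spatially clustered expression give positive values and are candidates for spatially informative markers, while spatially alternating expression gives negative values.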
Furthermore, the authors thoughtfully acknowledged the assumption underlying glmSMA-that transcriptionally similar cells are spatially proximal-and discussed both its limitations and empirical robustness in heterogeneous tissues such as human PDAC. Their use of real-world, heterogeneous datasets to validate this assumption demonstrates the method's practical utility and adaptability.
Overall, the response appropriately contextualizes the limitations while reinforcing the generalizability and performance of glmSMA. The authors' clarifications and experimental justifications strengthen the manuscript and address the reviewer's concerns in a scientifically sound and transparent manner.
Comments on revised version:
Figure 1 does not yet clearly convey what the glmSMA algorithm actually does. I recommend revising or redesigning the figure so that the workflow, main inputs, and outputs of the algorithm are more intuitively presented. A clearer visual explanation would help readers quickly grasp the core concept and contribution of glmSMA.
-
Author response:
The following is the authors’ response to the previous reviews.
Reviewer #1
(1) Related to comment 3 on the spatial communication section, either provide a clearer worked example or adjust the framing to avoid implying a more developed capability than is shown.
We appreciate the reviewer’s feedback regarding the framing of the spatial communication section. We have removed this section from the revised version.
(2) Related to comment 4 about resolution, consider including explicit numerical estimates of spatial resolution (e.g., median patch diameter in micrometers) for at least one dataset to help users understand practical mapping granularity.
We appreciate the suggestion. We have added explicit numerical estimates of spatial resolution to clarify our mappings. Specifically, we now (i) define “patch” precisely and (ii) report the median patch diameter (in µm) for representative datasets:
10x Visium (mouse cortex): spot diameter = 55 µm; center-to-center spacing = 100 µm.
Slide-seqV2 (mouse brain): bead diameter ≈ 10 µm. When we optionally coarse-grain to 5×5 bead tiles for robustness, the effective patch diameter is ~50 µm.
-
eLife Assessment
This valuable study investigates the relationship between pupil dilation and information gain during associative learning, using two different tasks. A key strength of this study is its exploration of pupil dilation beyond the immediate response period, extending analysis to later time windows after feedback, and it provides convincing evidence that pupillary response to information gain may be context-dependent during associative learning. The interpretation remains limited by task heterogeneity and unresolved contextual factors influencing pupil dynamics, but a range of interesting ideas are discussed.
-
Reviewer #1 (Public review):
Summary:
This study examines whether changes in pupil size index prediction-error-related updating during associative learning, formalised as information gain via Kullback-Leibler (KL) divergence. Across two independent tasks, pupil responses scaled with KL divergence shortly after feedback, with the timing and direction of the response varying by task. Overall, the work supports the view that pupil size reflects information-theoretic processes in a context-dependent manner.
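To make these quantities concrete, the sketch below computes surprise, entropy, and KL divergence for a simple count-based learner. The uniform pseudo-count prior, the direction of the KL divergence (updated distribution relative to the previous one), the function name, and the toy event sequence are all assumptions for illustration; the manuscript's ideal-learner model may be parameterised differently.

```python
import numpy as np

def ideal_learner_trace(events, n_events, prior_count=1.0):
    """Trial-by-trial surprise, entropy, and information gain (in bits).

    events      : sequence of observed event indices (0 .. n_events - 1)
    prior_count : pseudo-count given to every event before any observation
    Returns an array with one row per trial: (surprise, entropy, KL).
    """
    counts = np.full(n_events, prior_count, dtype=float)
    rows = []
    for k in events:
        p_before = counts / counts.sum()        # belief before the observation
        surprise = -np.log2(p_before[k])        # unexpectedness of event k
        counts[k] += 1.0                        # update with the observation
        p_after = counts / counts.sum()         # belief after the observation
        entropy = -(p_after * np.log2(p_after)).sum()
        kl = (p_after * np.log2(p_after / p_before)).sum()  # information gain
        rows.append((surprise, entropy, kl))
    return np.array(rows)

# Toy sequence with two possible events: three repeats, then a rare event.
trace = ideal_learner_trace([0, 0, 0, 1], n_events=2)
for trial, (surprise, entropy, kl) in enumerate(trace, start=1):
    print(f"trial {trial}: surprise={surprise:.3f} "
          f"entropy={entropy:.3f} KL={kl:.3f}")
```

In this toy sequence the rare event on the final trial yields the largest surprise, which is the kind of trial-wise variation that the pupil analyses relate to the model estimates.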
Strengths:
This study provides a novel and convincing contribution by linking pupil dilation to information-theoretic measures, such as KL divergence, supporting Zénon's hypothesis that pupil responses reflect information gain during learning. The robust methodology, including two independent datasets with distinct task structures, enhances the reliability and generalisability of the findings. By carefully analysing early and late time windows, the authors capture the timing and direction of prediction-error-related responses, offering new insights into the temporal dynamics of model updating. The use of an ideal-learner framework to quantify prediction errors, surprise, and uncertainty provides a principled account of the computational processes underlying pupil responses. The work also highlights the critical role of task context in shaping the direction and magnitude of these effects, revealing the adaptability of predictive processing mechanisms. Importantly, the conclusions are supported by rigorous control analyses and preprocessing sanity checks, as well as convergent results from frequentist and Bayesian linear mixed-effects modelling approaches.
Weaknesses:
Some aspects of directionality remain context-dependent, and on current evidence cannot be attributed specifically to whether average uncertainty increases or decreases across trials. Differences between the two tasks (e.g., sensory modality and learning regime) limit direct comparisons of effect direction and make mechanistic attribution cautious. In addition, subjective factors such as confidence were not measured and could influence both prediction-error signals and pupil responses. Importantly, the authors explicitly acknowledge these limitations, and the manuscript clearly frames them as areas for future work rather than settled conclusions.
-
Reviewer #2 (Public review):
Summary:
The authors investigate whether pupil dilation reflects information gain during associative learning, formalised as Kullback-Leibler divergence within an ideal observer framework. They examine pupil responses in a late time window after feedback and compare these to information-theoretic estimates (information gain, surprise, and entropy) derived from two different tasks with contrasting uncertainty dynamics.
Strength:
The exploration of task evoked pupil dynamics beyond the immediate response/feedback period and then associating them with model estimates was interesting and inspiring. This offered a new perspective on the relationship between pupil dilation and information processing.
Weakness:
However, the interpretability of the findings remains constrained by the fundamental differences between the two tasks (stimulus modality, feedback type, and learning structure), which confound the claimed context-dependent effects. The later time-window pupil effects, although intriguing, are small in magnitude and may reflect residual noise or task-specific arousal fluctuations rather than distinct information-processing signals. Thus, while the study offers valuable methodological insight and contributes to ongoing debates about the role of the pupil in cognitive inference, its conclusions about the functional significance of late pupil responses should be treated with caution.
-
Reviewer #3 (Public review):
Summary:
Thank you for inviting me to review this manuscript entitled "Pupil dilation offers a time-window on prediction error" by Colizoli and colleagues. The study examines prediction errors, information gain (Kullback-Leibler [KL] divergence), and uncertainty (entropy) from an information-theory perspective using two experimental tasks and pupillometry. The authors aim to test a theoretical proposal by Zénon (2019) that the pupil response reflects information gain (KL divergence). The conclusion of this work is that (post-feedback) pupil dilation in response to information gain is context dependent.
Strengths:
Use of an established Bayesian model to compute KL divergence and entropy.
Pupillometry data preprocessing and multiple robustness checks.
Weaknesses:
Operationalization of prediction errors based on frequency, accuracy, and their interaction:
The authors rely on a more model-agnostic definition of the prediction error in terms of stimulus frequency ("unsigned prediction error"), accuracy, and their interaction ("signed prediction error"). While I see the point, I would argue that this approach provides a simple approximation of the prediction error, but that a model-based approach would be more appropriate.
Model validation:
My impression is that the ideal learner model should work well in this case. However, the authors don't directly compare model behavior to participant behavior ("posterior predictive checks") to validate the model. Therefore, it is currently unclear if the model-derived terms like KL divergence and entropy provide reasonable estimates for the participant data.
Lack of a clear conclusion:
The authors conclude that this study shows for the first time that (post-feedback) pupil dilation in response to information gain is context dependent. However, the study does not offer a unifying explanation for such context dependence. The discussion is quite detailed with respect to task-specific effects, but fails to provide an overarching perspective on the context-dependent nature of pupil signatures of information gain. This seems to be partly due to the strong differences between the experimental tasks.
-
Author response:
The following is the authors’ response to the current reviews.
Public Reviews:
Reviewer #1 (Public review):
Summary:
This study examines whether changes in pupil size index prediction-error-related updating during associative learning, formalised as information gain via Kullback-Leibler (KL) divergence. Across two independent tasks, pupil responses scaled with KL divergence shortly after feedback, with the timing and direction of the response varying by task. Overall, the work supports the view that pupil size reflects information-theoretic processes in a context-dependent manner.
Strengths:
This study provides a novel and convincing contribution by linking pupil dilation to information-theoretic measures, such as KL divergence, supporting Zénon's hypothesis that pupil responses reflect information gain during learning. The robust methodology, including two independent datasets with distinct task structures, enhances the reliability and generalisability of the findings. By carefully analysing early and late time windows, the authors capture the timing and direction of prediction-error-related responses, offering new insights into the temporal dynamics of model updating. The use of an ideal-learner framework to quantify prediction errors, surprise, and uncertainty provides a principled account of the computational processes underlying pupil responses. The work also highlights the critical role of task context in shaping the direction and magnitude of these effects, revealing the adaptability of predictive processing mechanisms. Importantly, the conclusions are supported by rigorous control analyses and preprocessing sanity checks, as well as convergent results from frequentist and Bayesian linear mixed-effects modelling approaches.
Weaknesses:
Some aspects of directionality remain context-dependent, and on current evidence cannot be attributed specifically to whether average uncertainty increases or decreases across trials. Differences between the two tasks (e.g., sensory modality and learning regime) limit direct comparisons of effect direction and make mechanistic attribution cautious. In addition, subjective factors such as confidence were not measured and could influence both prediction-error signals and pupil responses. Importantly, the authors explicitly acknowledge these limitations, and the manuscript clearly frames them as areas for future work rather than settled conclusions.
Reviewer #2 (Public review):
Summary:
The authors investigate whether pupil dilation reflects information gain during associative learning, formalised as Kullback-Leibler divergence within an ideal observer framework. They examine pupil responses in a late time window after feedback and compare these to information-theoretic estimates (information gain, surprise, and entropy) derived from two different tasks with contrasting uncertainty dynamics.
Strength:
The exploration of task evoked pupil dynamics beyond the immediate response/feedback period and then associating them with model estimates was interesting and inspiring. This offered a new perspective on the relationship between pupil dilation and information processing.
Weakness:
However, the interpretability of the findings remains constrained by the fundamental differences between the two tasks (stimulus modality, feedback type, and learning structure), which confound the claimed context-dependent effects. The later time-window pupil effects, although intriguing, are small in magnitude and may reflect residual noise or task-specific arousal fluctuations rather than distinct information-processing signals. Thus, while the study offers valuable methodological insight and contributes to ongoing debates about the role of the pupil in cognitive inference, its conclusions about the functional significance of late pupil responses should be treated with caution.
Reviewer #3 (Public review):
Summary:
Thank you for inviting me to review this manuscript entitled "Pupil dilation offers a time-window on prediction error" by Colizoli and colleagues. The study examines prediction errors, information gain (Kullback-Leibler [KL] divergence), and uncertainty (entropy) from an information-theory perspective using two experimental tasks and pupillometry. The authors aim to test a theoretical proposal by Zénon (2019) that the pupil response reflects information gain (KL divergence). The conclusion of this work is that (post-feedback) pupil dilation in response to information gain is context dependent.
Strengths:
Use of an established Bayesian model to compute KL divergence and entropy.
Pupillometry data preprocessing and multiple robustness checks.
Weaknesses:
Operationalization of prediction errors based on frequency, accuracy, and their interaction:
The authors rely on a more model-agnostic definition of the prediction error in terms of stimulus frequency ("unsigned prediction error"), accuracy, and their interaction ("signed prediction error"). While I see the point, I would argue that this approach provides a simple approximation of the prediction error, but that a model-based approach would be more appropriate.
Model validation:
My impression is that the ideal learner model should work well in this case. However, the authors don't directly compare model behavior to participant behavior ("posterior predictive checks") to validate the model. Therefore, it is currently unclear if the model-derived terms like KL divergence and entropy provide reasonable estimates for the participant data.
Lack of a clear conclusion:
The authors conclude that this study shows for the first time that (post-feedback) pupil dilation in response to information gain is context dependent. However, the study does not offer a unifying explanation for such context dependence. The discussion is quite detailed with respect to task-specific effects, but fails to provide an overarching perspective on the context-dependent nature of pupil signatures of information gain. This seems to be partly due to the strong differences between the experimental tasks.
Recommendations for the authors:
Reviewer #1 (Recommendations for the authors):
I highly appreciate the care and detail in the authors' response and thank them for the effort invested in revising the manuscript. They addressed the core concerns to a high standard, and the manuscript has substantially improved in methodological rigour (through additional controls/sanity checks and complementary mixed-effects analyses) and in clarity of interpretation (by explicitly acknowledging context-dependence and tempering stronger claims). The present version reads clearly and is much strengthened overall. I only have a few minor points below:
Minor suggestions:
Abstract:
In the abstract, KL is introduced as an abbreviation, but at first occurrence it should be written out as "Kullback-Leibler (KL)" for readers not familiar with it.
We thank the reviewer for catching this error. It has been corrected in the version of record.
Methods:
I appreciate the additional Bayesian LME analysis. A few details of the model parameters were missing: 1) what was the target acceptance rate (default of .95?), and 2) which family was used to model the response distribution: (default) "gaussian" or robust "student-t"? Depending on the data, a student-t would be preferred, but since the authors checked the fit and the results corroborate the correlation analysis, using the default would also be fine! Just add the information for completeness.
Thank you for bringing this to our attention. We have now noted that default parameters were used in all cases unless otherwise mentioned.
Thank you once again for your time and consideration.
Reviewer #2 (Recommendations for the authors):
Thanks to the authors for their effort on the revision. I am happy with this new version of the manuscript.
Thank you once again for your time and consideration.
Reviewer #3 (Recommendations for the authors):
(1) Regarding comments #3 and #6 (first round) on model validation and posterior predictive checks, the authors replied that since their model is not a "generative" one, they can't perform posterior predictive checks. Crucially, in eq. 2, the authors present the p{tilde}^j_k variable denoting the learned probability of event k on trial j. I don't see why this can't be exploited for simulations. In my opinion, one could (and should) generate predictions based on this variable. The simplest implementation would translate the probability into a categorical choice (w/o fitting any free parameter). Based on this, they could assess whether the model and data are comparable.
We thank the reviewer for this clarification. The reviewer suggests using the probability distributions at each trial to predict which event should be chosen on each trial. More specifically, the event(s) with the highest probability on trial j could be used to generate a prediction for the choice of the participant on trial j. We agree that this would indeed be an interesting analysis. However, the response options of each task are limited to two alternatives. In the cue-target task, four events are modeled (representing all possible cue-target conditions) while the participants’ response options are only “left” and “right”. Similarly, in the letter-color task, 36 events are modeled while the participants’ response options are “match” and “no-match”. In other words, we do not know which event (out of four or 36, for the two tasks respectively) the participant would have indicated on each trial. As an approximation to this fine-grained analysis, we investigated the relationship between the information-theoretic variables separately for error and correct trials. Our rationale was that this would give more insight into how the model fits depended on the participants’ actual behavior as compared with the ideal learner model.
(2) I recommend providing a plot of the linear mixed model analysis of the pupil data. Currently, results are only presented in the text and tables, but a figure would be much more useful.
We thank the reviewer for the suggestion to add a plot of the linear mixed model results. We appreciate the value of visualizing model estimates; however, we feel that the current presentation in the text and tables clearly conveys the relevant findings. For this reason, and to avoid further lengthening the manuscript, we prefer to retain the current format.
(3) I would consider only presenting the linear mixed effects for the pupil data in the main results, and the correlation results in the supplement. It is currently quite long.
We thank the reviewer for this recommendation. We agree that the results section is detailed; however, we consider the correlation analyses to be integral to the interpretation of the pupil data and therefore prefer to keep them in the main text rather than move them to the supplement.
The following is the authors’ response to the original reviews
eLife Assessment
This important study seeks to examine the relationship between pupil size and information gain, showing opposite effects dependent upon whether the average uncertainty increases or decreases across trials. Given the broad implications for learning and perception, the findings will be of broad interest to researchers in cognitive neuroscience, decision-making, and computational modelling. Nevertheless, the evidence in support of the particular conclusion is at present incomplete - the conclusions would be strengthened if the authors could both clarify the differences between model-updating and prediction error in their account and clarify the patterns in the data.
Public Reviews:
Reviewer #1 (Public review):
Summary:
This study investigates whether pupil dilation reflects prediction error signals during associative learning, defined formally by Kullback-Leibler (KL) divergence, an information-theoretic measure of information gain. Two independent tasks with different entropy dynamics (decreasing and increasing uncertainty) were analyzed: the cue-target 2AFC task and the letter-color 2AFC task. Results revealed that pupil responses scaled with KL divergence shortly after feedback onset, but the direction of this relationship depended on whether uncertainty (entropy) increased or decreased across trials. Furthermore, signed prediction errors (interaction between frequency and accuracy) emerged at different time windows across tasks, suggesting task-specific temporal components of model updating. Overall, the findings highlight that pupil dilation reflects information-theoretic processes in a complex, context-dependent manner.
Strengths:
This study provides a novel and convincing contribution by linking pupil dilation to information-theoretic measures, such as KL divergence, supporting Zénon's hypothesis that pupil responses reflect information gained during learning. The robust methodology, including two independent datasets with distinct entropy dynamics, enhances the reliability and generalisability of the findings. By carefully analysing early and late time windows, the authors capture the temporal dynamics of prediction error signals, offering new insights into the timing of model updates. The use of an ideal learner model to quantify prediction errors, surprise, and entropy provides a principled framework for understanding the computational processes underlying pupil responses. Furthermore, the study highlights the critical role of task context - specifically increasing versus decreasing entropy - in shaping the directionality and magnitude of these effects, revealing the adaptability of predictive processing mechanisms.
Weaknesses:
While this study offers important insights, several limitations remain. The two tasks differ significantly in design (e.g., sensory modality and learning type), complicating direct comparisons and limiting the interpretation of differences in pupil dynamics. Importantly, the apparent context-dependent reversal between pupil constriction and dilation in response to feedback raises concerns about how these opposing effects might confound the observed correlations with KL divergence.
We agree with the reviewer’s concerns and acknowledge that the speculation concerning the directional effect of entropy across trials cannot be fully substantiated by the current study. As the reviewer points out, the directional relationship between pupil dilation and information gain must be due to other factors, for instance, the sensory modality, learning type, or the reversal between pupil constriction and dilation across the two tasks. Also, we would like to note that ongoing experiments in our lab already contradict our original speculation. In line with the reviewer’s point, we noted these differences in the section on “Limitations and future research” in the Discussion. To better align the manuscript with the above-mentioned points, we have made several changes in the Abstract, Introduction, and Discussion, summarized below:
We have removed the following text from the Abstract and Introduction: “…, specifically related to increasing or decreasing average uncertainty (entropy) across trials.”
We have edited the following text in the Introduction (changes in italics) (p. 5):
“We analyzed two independent datasets featuring distinct associative learning paradigms, one characterized by increasing entropy and the other by decreasing entropy as the tasks progressed. By examining these different tasks, we aimed to identify commonalities (if any) in the results across varying contexts. Additionally, the contrasting directions of entropy in the two tasks enabled us to disentangle the correlation between stimulus-pair frequency and information gain in the post-feedback pupil response.”
We have removed the following text from the Discussion:
“…and information gain in fact seems to be driven by increased uncertainty.”
“We speculate that this difference in the direction of scaling between information gain and the pupil response may depend on whether entropy was increasing or decreasing across trials.”
“…which could explain the opposite direction of the relationship between pupil dilation and information gain”
“… and seems to relate to the direction of the entropy as learning progresses (i.e., either increasing or decreasing average uncertainty).”
We have edited the following texts in the Discussion (changes in italics):
“For the first time, we show that the direction of the relationship between postfeedback pupil dilation and information gain (defined as KL divergence) was context dependent.” (p. 29)
Finally, we have added the following correction to the Discussion (p. 30):
“Although it is tempting to speculate that the direction of the relationship between pupil dilation and information gain may be due to either increasing or decreasing entropy as the task progressed, we must refrain from this conclusion. We note that the two tasks differ substantially in terms of design with other confounding variables and therefore cannot be directly compared to one another. We expand on these limitations in the section below (see Limitations and future research).”
Finally, subjective factors such as participants' confidence and internal belief states were not measured, despite their potential influence on prediction errors and pupil responses.
Thank you for the thoughtful comment. We agree with the reviewer that subjective factors, such as participants' confidence, can be important in understanding prediction errors and pupil responses. As per the reviewer’s point, we have included the following limitation in the Discussion (p. 33):
“Finally, while we acknowledge the potential relevance of subjective factors, such as the participants’ overt confidence reports, in understanding prediction errors and pupil responses, the current study focused on the more objective, model-driven measure of information-theoretic variables. This approach aligns with our use of the ideal learner model, which estimates information-theoretic variables while being agnostic about the observer's subjective experience itself. Future research is needed to explore the relationship between information-gain signals in pupil dilation and the observer’s reported experience of or awareness about confidence in their decisions.”
Reviewer #2 (Public review):
Summary:
The authors proposed that variability in post-feedback pupillary responses during the associative learning tasks can be explained by information gain, which is measured as KL divergence. They analysed pupil responses in a later time window (2.5s-3s after feedback onset) and correlated them with information-theory-based estimates from an ideal learner model (i.e., information gain-KL divergence, surprise-subjective probability, and entropy-average uncertainty) in two different associative decision-making tasks.
Strength:
The exploration of task-evoked pupil dynamics beyond the immediate response/feedback period and then associating them with model estimates was interesting and inspiring. This offered a new perspective on the relationship between pupil dilation and information processing.
Weakness:
However, disentangling these later effects from noise needs caution. Noise in pupillometry can arise from variations in stimuli and task engagement, as well as artefacts from earlier pupil dynamics. The increasing variance in the time series of pupillary responses (e.g., as shown in Figure 2D) highlights this concern.
It's also unclear what this complicated association between information gain and pupil dynamics actually means. The complexity of the two different tasks reported made the interpretation more difficult in the present manuscript.
We share the reviewer’s concerns. To make this point come across more clearly, we have added the following text to the Introduction (p. 5):
“The current study was motivated by Zenon’s hypothesis concerning the relationship between pupil dilation and information gain, particularly in light of the varying sources of signal and noise introduced by task context and pupil dynamics. By demonstrating how task context can influence which signals are reflected in pupil dilation, and highlighting the importance of considering their temporal dynamics, we aim to promote a more nuanced and model-driven approach to cognitive research using pupillometry.”
Reviewer #3 (Public review):
Summary:
This study examines prediction errors, information gain (Kullback-Leibler [KL] divergence), and uncertainty (entropy) from an information-theory perspective using two experimental tasks and pupillometry. The authors aim to test a theoretical proposal by Zénon (2019) that the pupil response reflects information gain (KL divergence). In particular, the study defines the prediction error in terms of KL divergence and speculates that changes in pupil size associated with KL divergence depend on entropy. Moreover, the authors examine the temporal characteristics of pupil correlates of prediction errors, which differed considerably across previous studies that employed different experimental paradigms. In my opinion, the study does not achieve these aims due to several methodological and theoretical issues.
Strengths:
(1) Use of an established Bayesian model to compute KL divergence and entropy.
(2) Pupillometry data preprocessing, including deconvolution.
Weaknesses:
(1) Definition of the prediction error in terms of KL divergence:
I'm concerned about the authors' theoretical assumption that the prediction error is defined in terms of KL divergence. The authors primarily refer to a review article by Zénon (2019): "Eye pupil signals information gain". It is my understanding that Zénon argues that KL divergence quantifies the update of a belief, not the prediction error: "In short, updates of the brain's internal model, quantified formally as the Kullback-Leibler (KL) divergence between prior and posterior beliefs, would be the common denominator to all these instances of pupillary dilation to cognition." (Zénon, 2019).
From my perspective, the update differs from the prediction error. Prediction error refers to the difference between outcome and expectation, while update refers to the difference between the prior and the posterior. The prediction error can drive the update, but the update is typically smaller, for example, because the prediction error is weighted by the learning rate to compute the update. My interpretation of Zénon (2019) is that they explicitly argue that KL divergence defines the update in terms of the described difference between prior and posterior, not the prediction error.
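The distinction can be made concrete with a minimal sketch, assuming a simple delta-rule learner. This learner is not a model used in the manuscript, and the function name `delta_rule_step` and its parameters are hypothetical, chosen only for illustration. The prediction error is the raw difference between outcome and expectation, whereas the update is that difference scaled by the learning rate.

```python
def delta_rule_step(expectation, outcome, learning_rate):
    """One trial of a delta-rule learner (illustrative assumption only).

    Returns the prediction error, the resulting update, and the new expectation.
    """
    # Prediction error: difference between outcome and expectation.
    prediction_error = outcome - expectation
    # Update: the prediction error weighted by the learning rate, so for a
    # learning rate between 0 and 1 it is never larger in magnitude.
    update = learning_rate * prediction_error
    new_expectation = expectation + update
    return prediction_error, update, new_expectation


# Example trial: the outcome (1.0) exceeds the expectation (0.5).
pe, update, new_expectation = delta_rule_step(
    expectation=0.5, outcome=1.0, learning_rate=0.2
)
print(f"prediction error = {pe:.3f}, update = {update:.3f}")
```

With a learning rate below 1, the two quantities differ in magnitude on every trial in which the prediction error is nonzero, which is why they need not be interchangeable as regressors.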
The authors also cite a few other papers, including Friston (2010), where I also could not find a definition of the prediction error in terms of KL divergence. For example [KL divergence:] "A non-commutative measure of the non-negative difference between two probability distributions." Similarly, Friston (2010) states: Bayesian Surprise - "A measure of salience based on the Kullback-Leibler divergence between the recognition density (which encodes posterior beliefs) and the prior density. It measures the information that can be recognized in the data." Finally, also in O'Reilly (2013), KL divergence is used to define the update of the internal model, not the prediction error.
The authors seem to mix up this common definition of the model update in terms of KL divergence and their definition of prediction error along the same lines. For example, on page 4: "KL divergence is a measure of the difference between two probability distributions. In the context of predictive processing, KL divergence can be used to quantify the mismatch between the probability distributions corresponding to the brain's expectations about incoming sensory input and the actual sensory input received, in other words, the prediction error (Friston, 2010; Spratling, 2017)."
Similarly (page 23): "In the current study, we investigated whether the pupil's response to decision outcome (i.e., feedback) in the context of associative learning reflects a prediction error as defined by KL divergence."
This is problematic because the results might actually have limited implications for the authors' main perspective (i.e., that the pupil encodes prediction errors) and could be better interpreted in terms of model updating. In my opinion, there are two potential ways to deal with this issue:
(a) Cite work that unambiguously supports the perspective that it is reasonable to define the prediction error in terms of KL divergence and that this has a link to pupillometry. In this case, it would be necessary to clearly explain the definition of the prediction error in terms of KL divergence and dissociate it from the definition in terms of model updating.
(b) If there is no prior work supporting the authors' current perspective on the prediction error, it might be necessary to revise the entire paper substantially and focus on the definition in terms of model updating.
We thank the reviewer for pointing out these inconsistencies in the manuscript and appreciate their suggestions for improvement. We take approach (a) recommended by the reviewer, and provide our reasoning as to why prediction error signals in pupil dilation are expected to correlate with information gain (defined as the KL divergence between posterior and prior belief distributions). This can be found in a new section in the introduction, copied here for convenience (p. 3-4):
“We reasoned that the link between prediction error signals and information gain in pupil dilation is through precision-weighting. Precision refers to the amount of uncertainty (inverse variance) of both the prior belief and sensory input in the prediction error signals [6,64–67]. More precise prediction errors receive more weighting, and therefore, have greater influence on model updating processes. The precision-weighting of prediction error signals may provide a mechanism for distinguishing between known and unknown sources of uncertainty, related to the inherent stochastic nature of a signal versus insufficient information on the part of the observer, respectively [65,67,68]. In Bayesian frameworks, information gain is fundamentally linked to prediction error, modulated by precision [65,66,69–75]. In non-hierarchical Bayesian models, information gain can be derived as a function of prediction errors and the precision of the prior and likelihood distributions, a relationship that can be approximately linear [70]. In hierarchical Bayesian inference, the update in beliefs (posterior mean changes) at each level is proportional to the precision-weighted prediction error; this update encodes the information gained from new observations [65,66,69,71,72]. Neuromodulatory arousal systems are well-situated to act as precision-weighting mechanisms in line with predictive processing frameworks [76,77]. Empirical evidence suggests that neuromodulatory systems broadcast precision-weighted prediction errors to cortical regions [11,59,66,78]. Therefore, the hypothesis that feedback-locked pupil dilation reflects a prediction error signal is similarly in line with Zenon’s main claim that pupil dilation generally reflects information gain, through precision-weighting of the prediction error. We expected a prediction error signal in pupil dilation to be proportional to the information gain.”
We have referenced previous work that has linked prediction error and information gain directly (p. 4): “The KL divergence between posterior and prior belief distributions has been previously considered to be a proxy of (precision-weighted) prediction errors [68,72].”
We have taken the following steps to remedy this error of equating “prediction error” directly with the information gain.
First, we have replaced “KL divergence” with “information gain” whenever possible throughout the manuscript for greater clarity.
Second, we have edited the section in the introduction defining information gain substantially (p. 4):
“Information gain can be operationalized within information theory as the Kullback-Leibler (KL) divergence between the posterior and prior belief distributions of a Bayesian observer, representing a formalized quantity that is used to update internal models [29,79,80]. Itti and Baldi (2005) [81] termed the KL divergence between posterior and prior belief distributions as “Bayesian surprise” and showed a link to the allocation of attention. The KL divergence between posterior and prior belief distributions has been previously considered to be a proxy of (precision-weighted) prediction errors [68,72]. According to Zénon’s hypothesis, if pupil dilation reflects information gain during the observation of an outcome event, such as feedback on decision accuracy, then pupil size will be expected to increase in proportion to how much novel sensory evidence is used to update current beliefs [29,63].”
Finally, we have made several minor textual edits to the Abstract and main text wherever possible to further clarify the proposed relationship between prediction errors and information gain.
(2) Operationalization of prediction errors based on frequency, accuracy, and their interaction:
The authors also rely on a more model-agnostic definition of the prediction error in terms of stimulus frequency ("unsigned prediction error"), accuracy, and their interaction ("signed prediction error"). While I see the point here, I would argue that this approach offers a simple approximation to the prediction error, but it is possible that factors like difficulty and effort can influence the pupil signal at the same time, which the current approach does not take into account. I recommend computing prediction errors (defined in terms of the difference between outcome and expectation) based on a simple reinforcement-learning model and analyzing the data using a pupillometry regression model in which nuisance regressors are controlled, and results are corrected for multiple comparisons.
We agree with the reviewer’s suggestion that alternatively modeling the data in a reinforcement learning paradigm would be fruitful. We adopted the ideal learner model as we were primarily focused on Information Theory, stemming from our aim to test Zenon’s hypothesis that information gain drives pupil dilation. However, we agree with the reviewer that it is worthwhile to pursue different modeling approaches in future work. We have now included a complementary linear mixed model analysis in which we controlled for the effects of the information-theoretic variables on one another, while also including the nuisance regressors of pre-feedback baseline pupil dilation and reaction times (explained in more detail below in our response to your point #4). Results, including correction for multiple comparisons, were reported for all pupil time course data as detailed in Methods section 2.5.
(3) The link between model-based (KL divergence) and model-agnostic (frequency- and accuracy-based) prediction errors:
I was expecting a validation analysis showing that KL divergence and model-agnostic prediction errors are correlated (in the behavioral data). This would be useful to validate the theoretical assumptions empirically.
The model limitations and the operationalization of prediction error in terms of post-feedback processing do not seem to allow for a comparison of information gain and model-agnostic prediction errors in the behavioral data for the following reasons. First, the simple ideal learner model used here is not a generative model, and therefore, cannot replicate or simulate the participants’ responses (see also our response to your point #6 “model validation” below). Second, the behavioral dependent variables obtained are accuracy and reaction times, which both occur before feedback presentation. While accuracy and reaction times can serve as a marker of the participant’s (statistical) confidence/uncertainty following the decision interval, these behavioral measures cannot provide access to post-feedback information processing. The pupil dilation is of interest to us because the peripheral arousal system is able to provide a marker of post-feedback processing. Through the analysis presented in Figure 3, we indeed aimed to make the comparison of the model-based information gain to the model-agnostic prediction errors via the proxy variable of post-feedback pupil dilation instead of behavioral variables. To bridge the gap between the “behaviorally agnostic” model parameters and the actual performance of the participants, we examined the relationship between the model-based information gain and the post-feedback pupil dilation separately for error and correct trials as shown in Figure 3D-F & Figure 3J-L. We hope this addresses the reviewer’s concern and apologize in case we did not understand the reviewer’s suggestion here.
(4) Model-based analyses of pupil data:
I'm concerned about the authors' model-based analyses of the pupil data. The current approach is to simply compute a correlation for each model term separately (i.e., KL divergence, surprise, entropy). While the authors do show low correlations between these terms, single correlational analyses do not allow them to control for additional variables like outcome valence, prediction error (defined in terms of the difference between outcome and expectation), and additional nuisance variables like reaction time, as well as x and y coordinates of gaze.
Moreover, including entropy and KL divergence in the same regression model could, at least within each task, provide some insights into whether the pupil response to KL divergence depends on entropy. This could be achieved by including an interaction term between KL divergence and entropy in the model.
In line with the reviewer’s suggestions, we have included a complementary linear mixed model analysis in which we controlled for the effects of the information-theoretic variables on one another, while also including the nuisance regressors of pre-feedback baseline pupil dilation and reaction times. We compared the performance of two models on the post-feedback pupil dilation in each time window of interest: Model 1 had no interaction between information gain and entropy and Model 2 included an interaction term as suggested. We did not include the x- and y-coordinates of gaze in the linear mixed model analysis, as there are multiple values of these coordinates per trial. Furthermore, regressing out the x- and y-coordinates of gaze can potentially remove signal of interest in the pupil dilation data in addition to the gaze-related confounds, and we did not measure absolute pupil size (Mathôt, Melmi & Castet, 2015; Hayes & Petrov, 2015). We present more sanity checks on the pre-processing pipeline as recommended by Reviewer 1.
This new analysis resulted in several additions to the Methods (see Section 2.5) and Results. In sum, we found that including an interaction term for information gain and entropy did not lead to better model fits, but sometimes led to significantly worse fits. Overall, the results of the linear mixed model corroborated the “simple” correlation analysis across the pupil time course while accounting for the relationship to the pre-feedback baseline pupil and preceding reaction time differences. There was only one difference to note between the correlation and linear mixed modeling analyses: for the error trials in the cue-target 2AFC task, including entropy in the model accounted for the variance previously explained by surprise.
(5) Major differences between experimental tasks:
More generally, I'm not convinced that the authors' conclusion that the pupil response to KL divergence depends on entropy is sufficiently supported by the current design. The two tasks differ on different levels (stimuli, contingencies, when learning takes place), not just in terms of entropy. In my opinion, it would be necessary to rely on a common task with two conditions that differ primarily in terms of entropy while controlling for other potentially confounding factors. I'm afraid that seemingly minor task details can dramatically change pupil responses. The positive/negative difference in the correlation with KL divergence that the authors interpret to be driven by entropy may depend on another potentially confounding factor currently not controlled.
We agree with the reviewer’s concerns and acknowledge that the speculation concerning the directional effect of entropy across trials cannot be fully substantiated by the current study. We note that Reviewer #1 had a similar concern. Our response to Reviewer #1 addresses this concern of Reviewer #3 as well. To better align the manuscript with the above-mentioned points, we have made several changes that are detailed in our response to Reviewer #1’s public review (above).
(6) Model validation:
My impression is that the ideal learner model should work well in this case. However, the authors don't directly compare model behavior to participant behavior ("posterior predictive checks") to validate the model. Therefore, it is currently unclear if the model-derived terms like KL divergence and entropy provide reasonable estimates for the participant data.
Based on our understanding, posterior predictive checks are used to assess the goodness of fit between generated (or simulated) data and observed data. Given that the “simple” ideal learner model employed in the current study is not a generative model, a posterior predictive check would not apply here (Gelman, Carlin, Stern, Dunson, Vehtari, & Rubin, 2013). The ideal learner model is unable to simulate or replicate the participants’ responses and behaviors such as accuracy and reaction times; it simply computes the probability of seeing each stimulus type at each trial based on the prior distribution and the exact trial order of the stimuli presented to each participant. The model’s probabilities are computed directly from a Dirichlet distribution of values that represent the number of occurrences of each stimulus-pair type for each task. The information-theoretic variables are then directly computed from these probabilities using standard formulas. The exact formulas used in the ideal learner model can be found in section 2.4.
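For readers unfamiliar with this class of model, the general idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the code or the exact formulas of section 2.4: the function name `ideal_learner`, the use of base-2 logarithms, the choice to compute entropy from the updated probabilities, and the direction of the KL divergence (posterior relative to prior) are all assumptions made here for illustration.

```python
import math


def ideal_learner(trial_sequence, prior_counts):
    """Counting-based ideal learner over stimulus-pair types (illustrative).

    trial_sequence: list of integer indices, one stimulus-pair type per trial.
    prior_counts:   positive pseudo-counts per type (e.g. all ones for a flat prior).
    Returns one dict per trial with surprise, entropy, and information gain (bits).
    """
    counts = list(prior_counts)
    results = []
    for stimulus in trial_sequence:
        # Probabilities before the observation: normalized Dirichlet counts.
        total = sum(counts)
        prior = [c / total for c in counts]
        # Surprise: negative log probability of the stimulus that occurred.
        surprise = -math.log2(prior[stimulus])
        # Observe the stimulus and update the counts.
        counts[stimulus] += 1
        total = sum(counts)
        posterior = [c / total for c in counts]
        # Entropy: average uncertainty of the updated probabilities (assumed timing).
        entropy = -sum(p * math.log2(p) for p in posterior if p > 0)
        # Information gain: KL divergence of posterior from prior (assumed direction).
        info_gain = sum(
            q * math.log2(q / p) for q, p in zip(posterior, prior) if q > 0
        )
        results.append(
            {"surprise": surprise, "entropy": entropy, "information_gain": info_gain}
        )
    return results


# Example: two stimulus-pair types, flat prior, hypothetical trial order.
for trial in ideal_learner([0, 0, 1, 0], prior_counts=[1, 1]):
    print({name: round(value, 3) for name, value in trial.items()})
```

Because every quantity is a deterministic function of the prior counts and the trial order, such a model produces trial-wise regressors but no simulated choices or reaction times, which is the sense in which it is not generative of behavior.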
We have now included a complementary linear mixed model analysis which also provides insight into the amount of explained variance of these information-theoretic predictors on the post-feedback pupil response, while also including the pre-feedback baseline pupil and reaction time differences (see section 3.3, Tables 3 & 4). The R<sup>2</sup> values ranged from 0.16 – 0.50 across all conditions tested.
(7) Discussion:
The authors interpret the directional effect of the pupil response w.r.t. KL divergence in terms of differences in entropy. However, I did not find a normative/computational explanation supporting this interpretation. Why should the pupil (or the central arousal system) respond differently to KL divergence depending on differences in entropy?
The current suggestion (page 24) that might go in this direction is that pupil responses are driven by uncertainty (entropy) rather than learning (quoting O'Reilly et al. (2013)). However, this might be inconsistent with the authors' overarching perspective based on Zénon (2019) stating that pupil responses reflect updating, which seems to imply learning, in my opinion. To go beyond the suggestion that the relationship between KL divergence and pupil size "needs more context" than previously assumed, I would recommend a deeper discussion of the computational underpinnings of the result.
Since we have removed the original speculative conclusion from the manuscript, we will refrain from discussing the computational underpinnings of a potential mechanism. Of note, as mentioned above, we have preliminary data from our own lab that contradict our original hypothesis about the relationship between entropy and information gain on the post-feedback pupil response.
Recommendations for the authors:
Reviewer #1 (Recommendations for the authors):
Apart from the points raised in the public review above, I'd like to use the opportunity here to provide a more detailed review of potential issues, questions, and queries I have:
(1) Constriction vs. Dilation Effects:
The study observes a context-dependent relationship between KL divergence and pupil responses, where pupil dilation and constriction appear to exhibit opposing effects. However, this phenomenon raises a critical concern: Could the initial pupil constriction to visual stimuli (e.g., in the cue-target task) confound correlations with KL divergence? This potential confound warrants further clarification or control analyses to ensure that the observed effects genuinely reflect prediction error signals and are not merely a result of low-level stimulus-driven responses.
We agree with the reviewer’s concern and have added the following information to the limitations section in the Discussion (changes in italics below; p. 32-33).
“First, the two associative learning paradigms differed in many ways and were not directly comparable. For instance, the shape of the mean pupil response function differed across the two tasks in accordance with a visual or auditory feedback stimulus (compare Supplementary Figure 3A with Supplementary Figure 3D), and it is unclear whether these overall response differences contributed to any differences obtained between task conditions within each task. We are unable to rule out whether so-called “low level” effects such as the initial constriction to visual stimuli in the cue-target 2AFC task as compared with the dilation in response to auditory stimuli in the letter-color 2AFC task could confound correlations with information gain. Future work should strive to disentangle how the specific aspects of the associative learning paradigms relate to prediction errors in pupil dilation by systematically manipulating design elements within each task.”
Here, I also was curious about Supplementary Figure 1, showing 'no difference' between the two tones (indicating 'error' or 'correct'). Was this the case for FDR-corrected or uncorrected cluster statistics? Especially since the main results also showed sig. differences only for uncorrected cluster statistics (Figure 2), but were n.s. for FDR corrected. I.e. can we be sure to rule out a confound of the tones here after all?
As per the reviewer’s suggestion, we verified that there were also no significant clusters after feedback onset before applying the correction for multiple comparisons. We have added this information to Supplementary section 1.2 as follows:
“Results showed that the auditory tone dilated pupils on average (Supplementary Figure 1C). Crucially, however, the two tones did not differ from one another in either of the time windows of interest (Supplementary Figure 1D; no significant time points after feedback onset were obtained either before or after correcting for multiple comparisons using cluster-based permutation methods; see Section 2.5).”
Supplementary Figure 1 is showing effects cluster-corrected for multiple comparisons using cluster-based permutation tests from the MNE software package in Python (see Methods section 2.5). We have clarified that the cluster-correction was based on permutation testing in the figure legend.
(2) Participant-Specific Priors:
The ideal learner models do not account for individualised priors, assuming homogeneous learning behaviour across participants. Could incorporating participant-specific priors better reflect variability in how individuals update their beliefs during associative learning?
We have clarified in the Methods (see section 2.4) that the ideal learner models did account for participant-specific stimuli including participant-specific priors in the letter-color 2AFC task. We have added the following texts:
“We also note that while the ideal learner model for the cue-target 2AFC task used a uniform (flat) prior distribution for all participants, the model parameters were based on the participant-specific cue-target counterbalancing conditions and randomized trial order.” (p. 13)
“The prior distributions used for the letter-color 2AFC task were estimated from the randomized letter-color pairs and randomized trial order presentation in the preceding odd-ball task; this resulted in participant-specific prior distributions for the ideal learner model of the letter-color 2AFC task. The model parameters were likewise estimated from the (participant-specific) randomized trial order presented in the letter-color 2AFC task.” (p. 13)
(3) Trial-by-Trial Variability:
The analysis does not account for random effects or inter-trial variability using mixed-effects models. Including such models could provide a more robust statistical framework and ensure the observed relationships are not influenced by unaccounted participant- or trial-specific factors.
We have included a complementary linear mixed model analysis in which “subject” was modeled as a random effect on the post-feedback pupil response in each time window of interest and for each task. Across all trials, the results of the linear mixed model corroborated the “simple” correlation analysis across the pupil time course while accounting for the relationship to the pre-feedback baseline pupil and preceding reaction time differences (see section 3.3, Tables 3 & 4).
(4) Preprocessing/Analysis choices:
Before anything else, I'd like to highlight the authors' effort in providing public code (and data) in a very readable and detailed format!
We appreciate the compliment - thank you for taking the time to look at the data and code provided.
I found the idea of regressing the effect of Blinks/Saccades on the pupil trace intriguing. However, I miss a complete picture here to understand how well this actually worked, especially since it seems to be performed on already interpolated data. My main points here are:
(4.1) Why is the deconvolution performed on already interpolated data and not on 'raw' data where there are actually peaks of information to fit?
To our understanding, at least one critical reason for interpolating the data before proceeding with the deconvolution analysis is that the raw data contain many missing values (i.e., NaNs) due to the presence of blinks. Interpolating over the missing data first ensures that there are valid numerical elements in the linear algebra equations. We refer the reviewer to the methods detailed in Knapen et al. (2016) for more details on this pre-processing method.
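The point about missing values can be illustrated with a minimal sketch, assuming plain linear interpolation across each run of NaNs. This is not the authors' pipeline nor the procedure of Knapen et al. (2016); the function name `interpolate_missing` and the choice to fill leading or trailing gaps with the nearest valid sample are assumptions made for illustration.

```python
import math


def interpolate_missing(samples):
    """Fill NaN samples by linear interpolation between valid neighbors.

    Gaps at the start or end of the trace copy the nearest valid sample
    (an assumption for this sketch; real pipelines may handle edges differently).
    """
    valid = [i for i, value in enumerate(samples) if not math.isnan(value)]
    if not valid:
        raise ValueError("no valid samples to interpolate from")
    filled = list(samples)
    for i, value in enumerate(samples):
        if not math.isnan(value):
            continue
        # Nearest valid sample indices on either side of the gap.
        left = max((j for j in valid if j < i), default=None)
        right = min((j for j in valid if j > i), default=None)
        if left is None:
            filled[i] = samples[right]
        elif right is None:
            filled[i] = samples[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = samples[left] + weight * (samples[right] - samples[left])
    return filled


# Example: a short pupil trace with a two-sample blink gap.
trace = [3.0, 3.1, float("nan"), float("nan"), 3.4, 3.5]
print(interpolate_missing(trace))
```

Any arithmetic involving a NaN sample returns NaN, so a least-squares deconvolution fitted to the unfilled trace would yield undefined coefficients; filling the gaps first avoids this.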
(4.2) What is the model fit (e.g. R-squared)? If this was a poor fit for the regressors in the first place, can we trust the residuals (i.e. clean pupil trace)? Is it possible to plot the same Pupil trace of Figure 1D with a) the 'raw' pupil time-series, b) after interpolation only (both of course also mean-centered for comparison), on top of the residuals after deconvolution (already presented), so we can be sure that this is not driving the effects in a 'bad' way? I'd just like to make sure that this approach did not lead to artefacts in the residuals rather than removing them.
We thank the reviewer for this suggestion. In the Supplementary Materials, we have included a new figure (Supplementary Figure 2, copied below for convenience), which illustrates the same conditions as in Figure 1D and Figure 2D, with 1) the raw data, and 2) the interpolated data before the nuisance regression. Both the raw data and interpolated data have been band-pass filtered as was done in the original pre-processing pipeline and converted to percent signal change. These figures can be compared directly to Figure 1D and Figure 2D, for the two tasks, respectively.
Of note is that the raw data seem to be dominated by responses to blinks (and/or saccades). Crucially, the pattern of results remains overall unchanged between the interpolated-only and fully pre-processed version of the data for both tasks.
In the Supplementary Materials (see Supplementary section 2), we have added the descriptives of the model fits from the deconvolution method. Model fits (R<sup>2</sup>) for the nuisance regression were generally low: cue-target 2AFC task, M = 0.03, SD = 0.02, range = [0.00, 0.07]; letter-color visual 2AFC, M = 0.08, SD = 0.04, range = [0.02, 0.16].
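For readers who want to see what the reported model fit measures, the following is a minimal sketch of the coefficient of determination for a nuisance regression. The function name and the toy values are hypothetical and are not taken from the authors' pipeline; the sketch only illustrates that a low R<sup>2</sup> means the nuisance fit removes little variance, so the residual trace stays close to the input trace.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy data (not from the study).
pupil = [1.0, 2.0, 3.0, 4.0]
nuisance_fit = [2.5, 2.5, 2.5, 2.5]  # a fit that captures only the mean

# The "clean" trace is the residual after subtracting the nuisance fit.
residual = [o - p for o, p in zip(pupil, nuisance_fit)]

fit_quality = r_squared(pupil, nuisance_fit)  # 0.0: no variance explained
```

In this toy case the residual is simply the mean-centered input, which is the limiting behavior one would expect when R<sup>2</sup> is close to zero.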
Furthermore, a Pearson correlation analysis between the interpolated and fully pre-processed data within the time windows of interest for both tasks indicated high correspondence:
Cue-target 2AFC task
Early time window: M = 0.99, SD = 0.01, range = [0.955, 1.000]
Late time window: M = 0.99, SD = 0.01, range = [0.971, 1.000]
Letter-color visual 2AFC
Early time window: M = 0.95, SD = 0.04, range = [0.803, 0.998]
Late time window: M = 0.97, SD = 0.02, range = [0.908, 0.999]
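A correspondence check of this kind could be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function names, the sampling rate, and the window boundaries are hypothetical, and the response does not specify over which samples the correlation was taken.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def window_correlation(trace_a, trace_b, sample_rate_hz, window_s):
    """Correlate two traces within a time window given in seconds."""
    start = int(round(window_s[0] * sample_rate_hz))
    stop = int(round(window_s[1] * sample_rate_hz))
    return pearson_r(trace_a[start:stop], trace_b[start:stop])


# Hypothetical traces: the second is the first plus a constant offset.
interpolated = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
preprocessed = [0.1, 1.1, 2.1, 3.1, 4.1, 5.1]

r_window = window_correlation(interpolated, preprocessed,
                              sample_rate_hz=2, window_s=(0.5, 2.5))
```

A value near 1 indicates that the nuisance regression changed the shape of the trace very little within that window.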
In hindsight, including the deconvolution (nuisance regression) method may not have changed the pattern of results much. However, the decision to include this deconvolution method was not data-driven; instead, it was based on the literature establishing the importance of removing variance (up to 5 s) of these blinks and saccades from cognitive effects of interest in pupil dilation (Knapen et al., 2016).
(4.3) Since this should also lead to predicted time series for the nuisance-regressors, can we see a similar effect (of what is reported for the pupil dilation) based on the blink/saccade traces of a) their predicted time series based on the deconvolution, which could indicate a problem with the interpretation of the pupil dilation effects, and b) the 'raw' blink/saccade events from the eye-tracker? I understand that this is a very exhaustive analysis so I would actually just be interested here in an averaged time-course / blink&saccade frequency of the same time-window in Figure 1D to complement the PD analysis as a sanity check.
Also included in Supplementary Figure 2 are the data averaged as in Figure 1D and Figure 2D for the raw data and the nuisance-predictor time courses (please refer to the bottom row of the sub-plots). Neither the raw data nor the nuisance predictors showed the pattern observed in the residual time courses.
(4.4) How many samples were removed from the time series due to blinks/saccades in the first place? 150ms for both events in both directions is quite a long bit of time so I wonder how much 'original' information of the pupil was actually left in the time windows of interest that were used for subsequent interpretations.
We thank the reviewer for bringing this issue to our attention. The size of the interpolation window was based on previous literature, indicating a range of 100-200 ms as acceptable (Urai et al., 2017; Knapen et al., 2016; Winn et al., 2018). The ratio of interpolated-to-original data (across the entire trial) varied greatly between participants and between trials: cue-target 2AFC task, M = 0.262, SD = 0.242, range = [0,1]; letter-color 2AFC task, M = 0.194, SD = 0.199, range = [0,1].
We have now included a conservative analysis in which only trials with more than half (threshold = 60%) of original data are included in the analyses. Crucially, we still observe the same pattern of effects as when all data are considered across both tasks (compare the second to last row in the Supplementary Figure 2 to Figure 1D and Figure 2D).
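The trial-exclusion rule described above can be sketched as below. The per-sample boolean mask, the function names, and the parameter `min_original_fraction` are assumptions made for illustration; the default of 0.6 follows the threshold quoted in the response.

```python
def original_fraction(interpolated_mask):
    """Fraction of samples in a trial that were NOT interpolated."""
    return 1.0 - sum(interpolated_mask) / len(interpolated_mask)


def keep_trials(trial_masks, min_original_fraction=0.6):
    """Indices of trials whose share of original samples exceeds the threshold."""
    return [i for i, mask in enumerate(trial_masks)
            if original_fraction(mask) > min_original_fraction]


# Hypothetical masks: True marks a sample that was interpolated.
masks = [
    [False] * 10,               # 100% original
    [True] * 3 + [False] * 7,   # 70% original
    [True] * 5 + [False] * 5,   # 50% original
    [True] * 10,                # 0% original
]

kept = keep_trials(masks)  # trials 0 and 1 pass the 60% threshold
```

Making the threshold an explicit parameter keeps it easy to rerun the analysis at other cut-offs as a robustness check.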
(4.5) Was the baseline correction performed on the percentage change unit?
Yes, the baseline correction was performed on the pupil time series after converting to percent signal change. We have added that information to the Methods (section 2.3).
(4.6) What metric was used to define events in the derivative as 'peaks'? I assume some sort of threshold? How was this chosen?
The threshold was chosen in a data-driven manner and was kept consistent across both tasks. The following details have been added to the Methods:
“The size of the interpolation window preceding nuisance events was based on previous literature [13,39,99]. After interpolation based on data-markers and/or missing values, remaining blinks and saccades were estimated by testing the first derivative of the pupil dilation time series against a threshold rate of change. The threshold for identifying peaks in the temporal derivative is data-driven, partially based on past work [10,14,33]. The output of each participant’s pre-processing pipeline was checked visually. Once an appropriate threshold was established at the group level, it remained the same for all participants (minimum peak height of 10 units).” (p. 8 & 11).
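As a rough illustration of derivative thresholding, the sketch below flags local maxima of the absolute first difference that reach a minimum peak height. Using the absolute value and a simple local-maximum test are assumptions, as are the function name and the toy trace; only the default height of 10 units comes from the quoted Methods text.

```python
def derivative_peaks(pupil, min_peak_height=10.0):
    """Indices (into the first-difference series) of candidate blink/saccade events."""
    # First derivative approximated by the sample-to-sample difference.
    deriv = [abs(b - a) for a, b in zip(pupil, pupil[1:])]
    peaks = []
    for i in range(1, len(deriv) - 1):
        is_local_max = deriv[i] >= deriv[i - 1] and deriv[i] > deriv[i + 1]
        if is_local_max and deriv[i] >= min_peak_height:
            peaks.append(i)
    return peaks


# Hypothetical trace with one abrupt drop between samples 2 and 3.
trace = [100, 101, 102, 80, 81, 82, 83]
events = derivative_peaks(trace)  # [2]
```

Each flagged index would then be padded with the interpolation window before the samples are replaced.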
(5) Multicollinearity Between Variables:
Lastly, the authors state on page 13: "Furthermore, it is expected that these explanatory variables will be correlated with one another. For this reason, we did not adopt a multiple regression approach to test the relationship between the information-theoretic variables and pupil response in a single model". However, the very purpose of multiple regression is to account for and disentangle the contributions of correlated predictors, no? I might have missed something here.
We apologize for the ambiguity of our explanation in the Methods section. We originally sought to assess the overall relationship between the post-feedback response and information gain (primarily), but also surprise and entropy. Our reasoning was that these variables are often investigated in isolation across different experiments (i.e., only investigating Shannon surprise), and we would like to know what the pattern of results would look like when comparing a single information-theoretic variable to the pupil response (one-by-one). We assumed that including additional explanatory variables (that we expected to show some degree of collinearity with each other) in a regression model would affect variance attributed to them as compared with the one-on-one relationships observed with the pupil response (Morrissey & Ruxton 2018). We also acknowledge the value of a multiple regression approach on our data. Based on the suggestions by the reviewers we have included a complementary linear mixed model analysis in which we controlled for the effects of the information-theoretic variables on one another, while also including the nuisance regressors of pre-feedback baseline pupil dilation and reaction times.
This new analysis resulted in several additions to the Methods (see Section 2.5) and Results (see Tables 3 and 4). Overall, the results of the linear mixed model corroborated the “simple” correlation analysis across the pupil time course while accounting for the relationship to the pre-feedback baseline pupil and preceding reaction time differences. There was only one difference to note between the correlation and linear mixed modeling analyses: for the error trials in the cue-target 2AFC task, including entropy in the model accounted for the variance previously explained by surprise.
Reviewer #2 (Recommendations for the authors):
(1) Given the inherent temporal dependencies in pupil dynamics, characterising later pupil responses as independent of earlier ones in a three-way repeated measures ANOVA may not be appropriate. A more suitable approach might involve incorporating the earlier pupil response as a covariate in the model.
We thank the reviewer for bringing this issue to our attention. From our understanding, a repeated-measures ANOVA with factor “time window” would be appropriate in the current context for the following reasons. First, autocorrelation (closely tied to sphericity) is generally not considered a problem when only two timepoints are compared from time series data (Field, 2013; Tabachnick & Fidell, 2019). Second, the repeated-measures component of the ANOVA takes the correlated variance between time points into account in the statistical inference. Finally, as a complementary analysis, we present the results testing the interaction between the frequency and accuracy conditions across the full time courses (see Figures 1D and 2D); in these pupil time courses, any difference between the early and late time windows can be judged by the reader visually and qualitatively.
(2) Please clarify the correlations between KL divergence, surprise, entropy, and pupil response time series. Specifically, state whether these correlations account for the interrelationships between these information-theoretic measures. Given their strong correlations, partialing out these effects is crucial for accurate interpretation.
As mentioned above, based on the suggestions by the reviewers we have included a complementary linear mixed model analysis in which we controlled for the effects of the information-theoretic variables on one another, while also including the nuisance regressors of pre-feedback baseline pupil dilation and reaction times.
This new analysis resulted in several additions to the Methods (see Section 2.5) and Results (see Tables 3 and 4). Overall, the results of the linear mixed model corroborated the “simple” correlation analysis across the pupil time course while accounting for the relationship to the pre-feedback baseline pupil and preceding reaction time differences. There was only one difference to note between the correlation and linear mixed modeling analyses: for the error trials in the cue-target 2AFC task, including entropy in the model accounted for the variance previously explained by surprise.
(3) The effects observed in the late time windows appear weak (e.g., Figure 2E vs. 2F, and the generally low correlation coefficients in Figure 3). Please elaborate on the reliability and potential implications of these findings.
We have now included a complementary linear mixed model analysis which also provides insight into the amount of explained variance of these information-theoretic predictors on the post-feedback pupil response, while also including the pre-feedback baseline pupil and reaction time differences (see section 3.3, Tables 3 & 4). The R<sup>2</sup> values ranged from 0.16 – 0.50 across all conditions tested. Including the pre-feedback baseline pupil dilation as a predictor in the linear mixed model analysis consistently led to more explained variance in the post-feedback pupil response, as expected.
(4) In Figure 3 (C-J), please clarify how the trial-by-trial correlations were computed (averaged across trials or subjects). Also, specify how the standard error of the mean (SEM) was calculated (using the number of participants or trials).
The trial-by-trial correlations between the pupil signal and model parameters were computed for each participant, then the coefficients were averaged across participants for statistical inference. We have added several clarifications in the text (see section 2.5 and legends of Figure 3 and Supplementary Figure 4).
We have added “the standard error of the mean across participants” to all figure labels.
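The two-level procedure described in this response (correlate within each participant, then summarize across participants) might look like the sketch below. All names and values are hypothetical, and a single correlation per participant stands in for the per-time-point correlations used in the manuscript.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def group_mean_and_sem(per_participant_r):
    """Mean and standard error of the mean, with n = number of participants."""
    n = len(per_participant_r)
    mean_r = statistics.fmean(per_participant_r)
    sem = statistics.stdev(per_participant_r) / math.sqrt(n)
    return mean_r, sem


# Hypothetical trial-wise data: (pupil response, model variable) per participant.
participants = {
    "p1": ([1, 2, 3, 4], [1, 2, 3, 4]),
    "p2": ([1, 2, 3, 4], [4, 3, 2, 1]),
    "p3": ([1, 2, 3, 4], [1, 3, 2, 4]),
}

# Step 1: one trial-by-trial correlation coefficient per participant.
coefficients = [pearson_r(pupil, model) for pupil, model in participants.values()]

# Step 2: average across participants; the SEM uses the participant count.
mean_r, sem_r = group_mean_and_sem(coefficients)
```

Because the SEM is computed over participants rather than trials, its size reflects between-subject variability in the coefficients.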
(5) For all time axes (e.g., Figure 2D), please label the ticks at 0, 0.5, 1, 1.5, 2, 2.5, and 3 seconds. Clearly indicate the duration of the feedback on the time axes. This is particularly important for interpreting the pupil dilation responses evoked by auditory feedback.
We have labeled the x-ticks every 0.5 seconds in all figures and indicated the duration of the auditory feedback in the letter-color decision task, as well as the stimuli presented in the control tasks in the Supplementary Materials.
Reviewer #3 (Recommendations for the authors):
(1) Introduction page 3: "In information theory, information gain quantifies the reduction of uncertainty about a random variable given the knowledge of another variable. In other words, information gain measures how much knowing about one variable improves the prediction or understanding of another variable."
(2) In my opinion, the description of information gain can be clarified. Currently, it is not very concrete and quite abstract. I would recommend explaining it in the context of belief updating.
We have removed these unclear statements in the Introduction. We now clearly state the following:
“Information gain can be operationalized within information theory as the Kullback-Leibler (KL) divergence between the posterior and prior belief distributions of a Bayesian observer, representing a formalized quantity that is used to update internal models [29,79,80].” (p. 4)
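To make this definition concrete in terms of belief updating, the sketch below computes the KL divergence between a posterior and a prior over two discrete hypotheses. The function names, the two-hypothesis setup, the likelihood values, and the choice of bits as the unit are all illustrative assumptions; the authors' ideal-learner model is not reproduced here.

```python
import math


def bayes_update(prior, likelihood):
    """Posterior over discrete hypotheses after one observation."""
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def kl_divergence(posterior, prior):
    """D_KL(posterior || prior) in bits for discrete distributions."""
    return sum(p * math.log2(p / q)
               for p, q in zip(posterior, prior) if p > 0)


# Hypothetical example: two equally likely hypotheses before feedback.
prior = [0.5, 0.5]
likelihood = [0.8, 0.2]  # probability of the observed outcome under each hypothesis

posterior = bayes_update(prior, likelihood)    # approximately [0.8, 0.2]
information_gain = kl_divergence(posterior, prior)  # about 0.278 bits
```

An observation that leaves the beliefs unchanged yields an information gain of zero, whereas a larger shift from prior to posterior yields a larger value.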
(3) Page 4: The inconsistencies across studies are described in extreme detail. I recommend shortening this part and summarizing the inconsistencies instead of listing all of the findings separately.
As per the reviewer’s recommendation, we have shortened this part of the introduction to summarize the inconsistencies in a more concise manner as follows:
“Previous studies have shown different temporal response dynamics of prediction error signals in pupil dilation following feedback on decision outcome: While some studies suggest that the prediction error signals arise around the peak (~1 s) of the canonical impulse response function of the pupil [11,30,41,61,62,90], other studies have shown evidence that prediction error signals (also) arise considerably later with respect to feedback on choice outcome [10,25,32,41,62]. A relatively slower prediction error signal following feedback presentation may suggest deeper cognitive processing, increased cognitive load from sustained attention or ongoing uncertainty, or that the brain is integrating multiple sources of information before updating its internal model. Taken together, the literature on prediction error signals in pupil dilation following feedback on decision outcome does not converge to produce a consistent temporal signature.” (p. 5)
We would like to note some additional minor corrections to the preprint:
We have clarified the direction of the effect in Supplementary Figure 3 with the following:
“Participants who showed a larger mean difference between the 80% as compared with the 20% frequency conditions in accuracy also showed smaller differences (a larger mean difference in magnitude in the negative direction) in pupil responses between frequency conditions (see Supplementary Figure 4).”
The y-axis labels in Supplementary Figure 3 were incorrect and have been corrected as the following: “Pupil responses (80-20%)”.
We corrected typos, formatting and grammatical mistakes when discovered during the revision process. Some minor changes were made to improve clarity. Of course, we include a version of the manuscript with Tracked Changes as instructed for consideration.
eLife Assessment
This study identifies 53BP1 as an interaction partner of GMCL1 (a likely CUL3 substrate receptor). The study proposes a novel mechanism by which cancer cells evade the mitotic surveillance pathway through GMCL1-mediated degradation of 53BP1, leading to reduced p53 activation and paclitaxel resistance. These data are the most useful aspect of the study, but the data supporting the authors' conclusions as to the clinical relevance of the study are inadequate. The authors have not taken relevant data about the clinical mechanism of taxanes into account.
Reviewer #2 (Public review):
Summary
This study investigates the role of GMCL1 in regulating the mitotic surveillance pathway (MSP), a protective mechanism that activates p53 following prolonged mitosis. The authors identify a physical interaction between 53BP1 and GMCL1, but not with GMCL2. They propose that the ubiquitin ligase complex CRL3-GMCL1 targets 53BP1 for degradation during mitosis, thereby preventing the formation of the "mitotic stopwatch" complex (53BP1-USP28-p53) and subsequent p53 activation. The authors show that high GMCL1 expression correlates with resistance to paclitaxel in cancer cell lines that express wild-type p53. Importantly, loss of GMCL1 restores paclitaxel sensitivity in these cells, but not in p53-deficient lines. They propose that GMCL1 overexpression enables cancer cells to bypass MSP-mediated p53 activation, promoting survival despite mitotic stress. Targeting GMCL1 may thus represent a therapeutic strategy to re-sensitize resistant tumors to taxane-based chemotherapy.
Strengths
This manuscript presents potentially interesting observations. The major strength of this article is the identification of GMCL1 as a 53BP1 interaction partner. The authors identified relevant domains and show that GMCL1 controls 53BP1 stability. The authors further show a potentially interesting link between GMCL1 status and sensitivity to Taxol.
Weaknesses
A major limitation of the original manuscript was that the functional relevance of GMCL1 in regulating 53BP1 within an appropriate model system was not clearly demonstrated. In the revised version, the authors attempt to address this point. However, the new experiment is insufficiently controlled, making it difficult to interpret the results. State-of-the-art approaches would typically rely on single-cell tracking to monitor cell fate following release from a moderately prolonged mitosis.
In contrast, the authors use a population-based assay, but the reported rescue from arrest is minimal. If the assay were functioning robustly, one would expect that nearly all cells depleted of USP28 or 53BP1 should have entered S-phase at a defined time after release. Thus, the very small rescue effect of siTP53BP1 suggests that the current assay is not suitable. It is also likely that release from a 16-hour mitotic arrest induces defects independent of the 53BP1-dependent p53 response.
Furthermore, the cell-cycle duration of RPE1 cells is less than 20 hours. It is therefore unclear why cells are released for 30 hours before analysis. At this time point, many cells are likely to have progressed into the next cell cycle, making it impossible to draw conclusions regarding the immediate consequences of prolonged mitosis. As a result, the experiment cannot be evaluated due to inadequate controls.
To strengthen this part of the study, I recommend that the authors first establish an assay that reliably rescues the mitotic-arrest-induced G1 block upon depletion of p53, 53BP1, or USP28. Once this baseline is validated, GMCL1 knockout can then be introduced to quantify its contribution to the response.
A broader conceptual issue is that the evidence presented does not form a continuous line of reasoning. For example, it is not demonstrated that GMCL1 interacts with or regulates 53BP1 in RPE1 cells, the system in which the limited functional experiments are conducted.
There are also a number of inconsistencies and issues with data presentation that need to be addressed:
(1) Figure 2C: p21 levels appear identical between GMCL1 KO and WT rescue. If GMCL1 regulates p53 through 53BP1, p21 should be upregulated in the KO.
(2) Figure 2A vs. 2C: GMCL1 KO affects chromatin-bound 53BP1 in Figure 2A, yet in Figure 2C it affects 53BP1 levels specifically in G1-phase cells. This discrepancy requires clarification.
(3) Figure 2C quantification: The three biological repeats show an unusual pattern, with one repeat's data points lying exactly between the other two. It is unclear what the line represents; please clarify.
(4) Figure nomenclature: Some abbreviations (e.g., FLAG-KI in Fig. 1F, WKE in Fig. 1C-D, ΔMFF in Fig. 1E) are not defined in the figure legends. All abbreviations must be explained.
(5) Figure 2D: Please indicate how many times the experiment was reproduced. Quantification with statistical testing would strengthen the result. Pull-downs of 53BP1 with calculation of the ubiquitinated/total ratio could also support the conclusion.
(6) Figures 3A and 3C: The G1 bars share the same color as the error bars, making the graphs difficult to interpret. Please adjust the color scheme.
Reviewer #3 (Public review):
Summary:
In this study, Kito et al. follow up on previous work that identified Drosophila GCL as a mitotic substrate recognition subunit of a CUL3-RING ubiquitin ligase (CRL3) complex. Here they identified mutants of the human ortholog of GCL, GMCL1, that disrupt the interaction with CUL3 (GMCL1 E142K) and that lack the substrate interaction domain (GMCL1 BBO). Immunoprecipitation followed by mass spectrometry identified 9 proteins that interacted with wild type FLAG-GMCL1 but not GMCL1 EK or GMCL1 BBO. These proteins included 53BP1, which plays a well characterized role in double strand break repair but also functions in a USP28-p53-53BP1 "mitotic stopwatch" complex that arrests the cell cycle after a substantially prolonged mitosis. Consistent with the IP-MS results, FLAG-GMCL1 immunoprecipitated 53BP1. Depletion of GMCL1 during mitotic arrest increased protein levels of 53BP1, and this could be rescued by wild type GMCL1 but not the E142K mutant or a R433A mutant that failed to immunoprecipitate 53BP1. Using a publicly available dataset, the authors identified a relatively small subset of cell lines with high levels of GMCL1 mRNA that were resistant to the taxanes paclitaxel, cabazitaxel, and/or docetaxel. This type of analysis is confounded by the fact that paclitaxel and other microtubule poisons accumulate to substantially different levels in various cell lines (PMID: 8105478, PMID: 10198049), so careful follow-up experiments are required to validate results. The correlation between increased GMCL1 mRNA and taxane resistance was not observed in lung cancer cell lines. The authors propose this was because nearly half of lung cancers harbor p53 mutations, and lung cancer cell lines with wild type but not mutant p53 showed the correlation between increased GMCL1 mRNA and taxane resistance. However, the other cancer cell types in which they report increased GMCL1 expression correlates with taxane sensitivity also have high rates of p53 mutation.
Furthermore, p53 status does not predict taxane response in patients (PMID: 10951339, PMID: 8826941, PMID: 10955790). The authors then depleted GMCL1 and reported that it increased apoptosis in two cell lines with wild type p53 (MCF7 and U2OS) due to activation of the mitotic stopwatch. This is surprising because the mitotic stopwatch paper cited (PMID: 38547292) reported that U2OS cells have an inactive stopwatch. Though it can be partially restored by treatment with an inhibitor of WIP1, the stopwatch was reported to be substantially impaired in U2OS cells, in contrast to what is reported here. Additionally, activation of the stopwatch results in cell cycle arrest rather than apoptosis in most cell types, including MCF7. Beyond this, it has recently been shown that the level of taxanes and other microtubule poisons achieved in patient tumors is too low to induce mitotic arrest (PMID: 24670687, PMID: 34516829, PMID: 37883329). Physiologically relevant concentrations are achieved with approximately 5-10 nM paclitaxel, rather than the 100 nM used here. The findings here demonstrating that GMCL1 mediates chromatin localization of 53BP1 during mitotic arrest are solid and of interest to cell biologists, but it is unlikely that these findings are relevant to paclitaxel response in patients.
Strengths:
This study identified 53BP1 as a target of CRL3-GMCL1-mediated degradation during mitotic arrest. AlphaFold3 predictions of the binding interface followed by mutational analysis identified mutants of each protein (GMCL1 R433A and 53BP1 IEDI1422-1425AAAA) that disrupted their interaction. Knock-in of a FLAG tag into the C-terminus of GMCL1 in HCT116 cells followed by FLAG immunoprecipitation confirmed that endogenous GMCL1 interacts with endogenous CUL3 and 53BP1 during mitotic arrest.
Weaknesses:
The clinical relevance of the study is overinterpreted. The authors have not taken relevant data about the clinical mechanism of taxanes into account. Supraphysiologic doses of microtubule poisons cause mitotic arrest and can activate the mitotic stopwatch. However, in physiologic concentrations of clinically useful microtubule poisons, cells proceed through mitosis and divide their chromosomes on mitotic spindles that are at least transiently multipolar. Though these low concentrations may result in a brief mitotic delay, it is substantially shorter than the arrest caused by high concentrations of microtubule poisons, and the one mimicked here by 16 hours of 0.4 mg/mL nocodazole or 48 hours of 100 nM paclitaxel. Resistance to mitotic arrest occurs through different mechanisms than resistance to multipolar spindles, raising concerns about the relevance of prolonged mitosis to paclitaxel response in cancer. Nocodazole is a microtubule poison that is not used clinically and does not induce multipolar spindles, so a similar apoptotic response to both drugs increases concern about a lack of physiological relevance. Moreover, clinical response to paclitaxel does not correlate with p53 status (PMID: 10951339, PMID: 8826941, PMID: 10955790). No evidence is presented that GMCL1 affects cellular response to clinically relevant doses of paclitaxel.
Comments on revisions:
(1) The claim that GMCL1 modulates paclitaxel sensitivity in cancer should be toned down. Inaccurate statements based on an outdated understanding of the anti-cancer mechanism of paclitaxel should be removed (e.g., lines 42-44: "In cancers that are resistant to paclitaxel, a microtubule-targeting agent, cells bypass mitotic surveillance activation, allowing unchecked proliferation..."; lines 73-75: "Proper mitotic arrest is critical for the efficacy of microtubule-targeting therapies..."; lines 78-79: "This resistance is frequently associated with loss of MSP activity, for example due to defective p53 signaling"). As cited in the public review, p53 status does not correlate with paclitaxel response in cancer.
(2) Perform timelapse experiments +/- GMCL1 siRNA in the absence of drug and in the presence of low, physiologically relevant concentrations of paclitaxel (5-10 nM), as well as supraphysiologic concentrations (100 nM) and correlate mitotic duration with cell cycle arrest. Test if co-depletion of 53BP1 with GMCL1 rescues cell cycle arrest after a substantially prolonged mitosis. Perform these experiments in a cell line with an intact mitotic stopwatch.
Author response:
The following is the authors’ response to the original reviews.
Public Reviews:
Reviewer #1 (Public review):
In this manuscript, Pagano and colleagues test the idea that the protein GMCL1 functions as a substrate receptor for a Cullin RING 3 E3 ubiquitin ligase (CUL3) complex. Using a pulldown approach, they identify GMCL1 binding proteins, including the DNA damage scaffolding protein 53BP1. They then focus on the idea that GMCL1 recruits 53BP1 for CUL3-dependent ubiquitination, triggering subsequent proteasomal degradation of ubiquitinated 53BP1.
In addition to its DNA damage signalling function, in mitosis, 53BP1 is reported to form a stopwatch complex with the deubiquitinating enzyme USP28 and the transcription factor p53 (PMID: 38547292). These 53BP1-stopwatch complexes generated in mitosis are inherited by G1 daughter cells and help promote p53-dependent cell cycle arrest independent from DNA damage (PMID: 38547292). Several studies show that knockout of 53BP1 overcomes G1 cell cycle arrest after mitotic delays caused by anti-mitotic drugs or centrosome ablation (PMID: 27432897, 27432896). In this model, it is crucial that 53BP1 remains stable in mitosis and more stopwatch complex is formed after delayed mitosis.
Major concerns:
Pagano and coworkers suggest that 53BP1 levels can sometimes be suppressed in mitosis if the cells overexpress GMCL1. They carry out a bioinformatic analysis of available public data for p53 wild-type cancer cell lines resistant to the anti-mitotic drug paclitaxel and related compounds. Stratifying GMCL1 into low and high expression groups reveals a weak (p = 0.05 or ns) correlation with sensitivity to taxanes. It is unclear on what basis the authors claim paclitaxel-resistant and p53 wild-type cancer cell lines bypass the mitotic surveillance/timer pathway. They have not tested this. Figure 3 is a correlation assembled from public databases but has no experimental tests. Figure 4 looks at proliferation but not cell cycle progression or the length of mitosis. The main conclusions relating to cell cycle progression and specifically the link to mitotic delays are therefore not supported by experimental data. There is no imaging of the cell cycle or cell fate after mitotic delays, or analysis of where the cells arrest in the cell cycle. Most of the cell lines used have been reported to lack a functional mitotic surveillance pathway in the recent work by Meitinger. To support these conclusions, the stability of endogenous 53BP1 under different conditions in cells known to have a functional mitotic surveillance pathway needs to be examined. A key suggestion in the work is that the level of GMCL1 expression correlates with resistance to taxanes. For the mitotic surveillance pathway, the type of drug (nocodazole, taxol, etc) used to induce a delay isn't thought to be relevant, only the length of the delay. Do GMCL1-overexpressing cells show resistance to anti-mitotics in general?
We thank the reviewer for this insightful comment. We propose that GMCL1 promotes CUL3-dependent ubiquitination of 53BP1 during prolonged mitotic arrest, thereby facilitating its proteasome-dependent degradation. To evaluate the potential clinical relevance of this mechanism, we stratified cancer cell lines based on GMCL1 mRNA expression using publicly available datasets from DepMap (PMID: 39468210). We observed correlations between GMCL1 expression levels and taxane sensitivity that appear to reflect specific cancer type-drug combinations. To experimentally evaluate this correlation and obtain mechanistic insights, we performed knockdown experiments in hTERT-RPE1 cells, which are known to possess an intact mitotic surveillance pathway. Silencing of GMCL1 alone inhibited cell proliferation and induced apoptosis, while co-depletion of either TP53BP1 or USP28 significantly rescued these effects. These results suggest that GMCL1 modulates the stability of 53BP1 and therefore the availability of the 53BP1-USP28-p53 ternary complex in cells with a functional mitotic surveillance pathway (MSP) (new Figure 5I,J), directly linking GMCL1 to the regulation of the MSP complex. Moreover, to further support our mechanism, we assessed the effect of GMCL1 levels on cell cycle progression. Briefly, following nocodazole synchronization and release, we treated cells with EdU and performed FACS analyses at different times. Knockdown of GMCL1 alone delayed cell cycle progression, but co-depletion of either TP53BP1 or USP28 rescued this phenotype (new Figure 3A and new Supplementary Figure 3A-C). These results are consistent with our proliferation data and suggest that the observed effects of GMCL1 are specific to mitotic exit. Finally, overexpression of GMCL1 accelerates cell cycle progression (as assessed by FACS analyses) upon release from prolonged mitotic arrest (new Figure 3B and new Supplementary Figure 3D-E).
Importantly, if GMCL1 specifically degrades 53BP1 during prolonged mitotic arrests, the authors should show what happens during normal cell divisions without any delays or drug treatments. How much 53BP1 is destroyed in mitosis under those conditions? Does 53BP1 destruction depend on the length of mitosis, drug treatment, or does 53BP1 get degraded every mitosis regardless of length? Testing the contribution of key mitotic E3 ligase activities on mitotic 53BP1 stability, such as the anaphase-promoting complex/cyclosome (APC/C) is important in this regard. One previous study reported an analysis of putative APC/C KEN-box degron motifs in 53BP1 and concluded these play a role in 53BP1 stability in anaphase (PMID: 28228263).
Physiological mitosis under unperturbed conditions is typically brief (approximately 30 minutes), making protein quantification during this window challenging. Despite this, we attempted the measurement by synchronizing cells using RO-3306 and releasing them into drug-free medium to assess GMCL1 dynamics during normal mitosis. Under these conditions, GMCL1 expression was similar to that in asynchronous cells and higher than the levels upon extended mitosis. However, when we attempted to measure the half-life of proteins using cycloheximide, most cells died, likely due to the toxic effect of cycloheximide in cells subjected to co-treatment with RO-3306 or nocodazole. This is the same reason why, in Figure 2C, we assessed 53BP1 in daughter cells rather than mitotic cells.
There is no direct test of the proposed mechanism, and it is therefore unclear if 53BP1 is ubiquitinated by a GMCL1-CUL3 ligase in cells, and how efficient this process would be at different cell cycle stages. A key issue is the lack of experimental data explaining why the proposed mechanism would be restricted to mitosis. Indirect effects, such as loss of 53BP1 from the chromatin fraction during M phase upon GMCL1 overexpression, do not necessarily mean that 53BP1 is degraded. PLK1-dependent chromatin-cytoplasmic shuttling of 53BP1 during mitotic delays has been described previously (PMID: 38547292, 37888778). These papers are cited in the text, but the main conclusions of those papers on 53BP1 incorporation into a stopwatch complex during mitotic delays have been ignored. Are the authors sure that 53BP1 is destroyed in mitosis and not simply re-localised between chromatin and non-chromatin fractions? At the very least, these reported findings should be discussed in the text.
To examine whether GMCL1 promotes 53BP1 ubiquitination in cells, we expressed Trypsin-Resistant Tandem Ubiquitin-Binding Entity (TR-TUBE), a protein that binds polyubiquitin chains. Abundant, endogenous ubiquitinated 53BP1 co-precipitated with TR-TUBE constructs only when wild-type GMCL1, but not the E142K GMCL1 mutant, was expressed (new Figure 2D). The PLK1-dependent incorporation of 53BP1 into the stopwatch complex and the chromatin-cytoplasmic shuttling of 53BP1 during mitotic delays are now discussed in the text. That said, compared to parental cells, 53BP1 levels in the chromatin fraction are high in two different GMCL1 KO clones in M phase-arrested cells (Figure 2A-B). This increase does not correspond to a decrease in the 53BP1 soluble fraction (Figure 2A and new Supplementary Figure 2D), suggesting that the difference in chromatin-bound 53BP1 is not due to re-localization. The increased half-life of 53BP1 in daughter cells (Figure 2C) also supports this hypothesis.
The authors use a variety of cancer cell line models throughout their study, most of which have been reported to lack a functional mitotic surveillance pathway. U2OS and HCT116 cells do not respond normally to mitotic delays, despite being annotated as p53 WT. Other studies have used p53 wild-type hTERT RPE-1 cells to study the mitotic surveillance pathway. If the model is correct, then over-expressing GMCL1 in hTERT-RPE1 cells should suppress cell cycle arrest after mitotic delays, and GMCL1 KO should make the cells more sensitive to delays. These experiments are needed to provide an adequate test of the proposed model.
We greatly appreciate the reviewer’s suggestion regarding overexpression of GMCL1 in hTERT-RPE1 cells. To address this, we generated stable RPE1 cells expressing V5-tagged GMCL1 and conducted EdU incorporation assays following nocodazole synchronization and release. Overexpression of GMCL1 enhanced cell cycle progression compared to control cells (new Figure 3B and new Supplementary Figure 3D-E) after mitotic arrest, consistent with our model. We, therefore, propose that GMCL1 controls 53BP1 stability to suppress p53-dependent cell cycle arrest.
We also want to point out that while some papers suggest that HCT116 and U2OS cells do not have an intact mitotic surveillance pathway, others have shown that the MSP is indeed functioning in HCT116 cells and can be triggered with variable efficiency in U2OS cells (PMID: 38547292). This is likely due to high heterogeneity and extensive clonal diversity of cancer cell lines grown in different labs. Please see examples in PMIDs: 3620713, 30089904, and 30778230. In particular, PMID: 30089904 shows that this heterogeneity correlates with considerably different drug responses.
To conclude, while the authors propose a potentially interesting model on how GMCL1 overexpression could regulate 53BP1 stability to limit p53-dependent cell cycle arrest, it is unclear what triggers this pathway or when it is relevant. 53BP1 is known to function in DNA damage signalling, and GMCL1 might be relevant in that context. The manuscript contains the initial description of GMCL1-53BP1 interaction but lacks a proper analysis of the function of this interaction and is therefore a preliminary report.
We hope that the new experiments, along with the clarifications provided in this response letter and revised manuscript, offer the reviewer increased confidence in the robustness and validity of our proposed model.
Reviewer #2 (Public review):
This study investigates the role of GMCL1 in regulating the mitotic surveillance pathway (MSP), a protective mechanism that activates p53 following prolonged mitosis. The authors identify a physical interaction between 53BP1 and GMCL1, but not with GMCL2. They propose that the ubiquitin ligase complex CRL3-GMCL1 targets 53BP1 for degradation during mitosis, thereby preventing the formation of the "mitotic stopwatch" complex (53BP1-USP28-p53) and subsequent p53 activation. The authors show that high GMCL1 expression correlates with resistance to paclitaxel in cancer cell lines that express wild-type p53. Importantly, loss of GMCL1 restores paclitaxel sensitivity in these cells, but not in p53-deficient lines. They propose that GMCL1 overexpression enables cancer cells to bypass MSP-mediated p53 activation, promoting survival despite mitotic stress. Targeting GMCL1 may thus represent a therapeutic strategy to re-sensitize resistant tumors to taxane-based chemotherapy.
Strengths:
This manuscript presents potentially interesting observations. The major strength of this article is the identification of GMCL1 as a 53BP1 interaction partner. The authors identified relevant domains and showed that GMCL1 controls 53BP1 stability. The authors further show a potentially interesting link between GMCL1 status and sensitivity to Taxol.
Weaknesses:
However, the manuscript is significantly weakened by unsubstantiated mechanistic claims, overreliance on a non-functional model system (U2OS), and overinterpretation of correlative data. To support the conclusions of the manuscript, the authors must show that the GMCL1-dependent sensitivity to Taxol depends on the mitotic surveillance pathway.
To demonstrate that GMCL1-dependent taxane sensitivity is mediated through the mitotic surveillance pathway (MSP), we now performed experiments using hTERT-RPE1 (RPE1) cells, a widely used, non-transformed cell line known to possess a functional MSP. We compared RPE1 cells with knockdown of GMCL1 alone to those with simultaneous knockdown of GMCL1 and either TP53BP1 or USP28. Upon paclitaxel (Taxol) treatment, cells with GMCL1 knockdown exhibited suppressed proliferation and increased apoptosis. Notably, these phenotypes were rescued by co-depletion of TP53BP1 or USP28 (new Figure 5I,J). These results support the notion that GMCL1 contributes to MSP activity, at least in part, through its regulation of 53BP1.
To further strengthen our mechanistic experiments, we assessed the effect of GMCL1 levels on cell cycle progression. Following nocodazole synchronization and release, we treated cells with EdU and performed FACS analyses at different times. Knockdown of GMCL1 alone led to a delay in cell cycle progression, but co-depletion of either TP53BP1 or USP28 alleviated this phenotype (new Figure 3A and new Supplementary Figure 3A, B). These results are consistent with our proliferation data.
Reviewer #3 (Public review):
Summary:
In this study, Kito et al follow up on previous work that identified Drosophila GCL as a mitotic substrate recognition subunit of a CUL3-RING ubiquitin ligase (CRL3) complex.
Here they characterize mutants of the human ortholog of GCL, GMCL1, that disrupt the interaction with CUL3 (GMCL1E142K) and that lack the substrate interaction domain (GMCL1 BBO). Immunoprecipitation followed by mass spectrometry identified 9 proteins that interacted with wild-type FLAG-GMCL1 and GMCL1 EK but not GMCL1 BBO. These proteins included 53BP1, which plays a well-characterized role in double-strand break repair but also functions in a USP28-p53-53BP1 "mitotic stopwatch" complex that arrests the cell cycle after a substantially prolonged mitosis. Consistent with the IP-MS results, FLAG-GMCL1 immunoprecipitated 53BP1. Depletion of GMCL1 during mitotic arrest increased protein levels of 53BP1, and this could be rescued by wild-type GMCL1 but not the E142K mutant or a R433A mutant that failed to immunoprecipitate 53BP1.
Using a publicly available dataset, the authors identified a relatively small subset of cell lines with high levels of GMCL1 mRNA that were resistant to the taxanes paclitaxel, cabazitaxel, and docetaxel. This type of analysis is confounded by the fact that paclitaxel and other microtubule poisons accumulate to substantially different levels in various cell lines (DOI: 10.1073/pnas.90.20.9552 , DOI: 10.1091/mbc.10.4.947 ), so careful follow-up experiments are required to validate results. The correlation between increased GMCL1 mRNA and taxane resistance was not observed in lung cancer cell lines. The authors propose this was because nearly half of lung cancers harbor p53 mutations, and lung cancer cell lines with wild-type but not mutant p53 showed the correlation between increased GMCL1 mRNA and taxane resistance. However, the other cancer cell types in which they report increased GMCL1 expression correlates with taxane sensitivity also have high rates of p53 mutation. Furthermore, p53 status does not predict taxane response in patients (DOI: 10.1002/1097-0142(20000815)89:4<769::aid-cncr8>3.0.co;2-6 , DOI: 10.1002/(SICI)1097-0142(19960915)78:6<1203::AID-CNCR6>3.0.CO;2-A , PMID: 10955790).
The authors then depleted GMCL1 and reported that it increased apoptosis in two cell lines with wild-type p53 (MCF7 and U2OS) due to activation of the mitotic stopwatch. This is surprising because the mitotic stopwatch paper they cite (DOI: 10.1126/science.add9528 ) reported that U2OS cells have an inactive stopwatch and that activation of the stopwatch results in cell cycle arrest rather than apoptosis in most cell types, including MCF7. Beyond this, it has recently been shown that the level of taxanes and other microtubule poisons achieved in patient tumors is too low to induce mitotic arrest (DOI: 10.1126/scitranslmed.3007965 , DOI: 10.1126/scitranslmed.abd4811 , DOI: 10.1371/journal.pbio.3002339 ), raising concerns about the relevance of prolonged mitosis to paclitaxel response in cancer. The findings here demonstrating that GMCL1 mediates degradation of 53BP1 during mitotic arrest are solid and of interest to cell biologists, but it is unclear that these findings are relevant to paclitaxel response in patients.
Strengths:
This study identified 53BP1 as a target of CRL3GMCL1-mediated degradation during mitotic arrest. AlphaFold3 predictions of the binding interface, followed by mutational analysis, identified mutants of each protein (GMCL1 R433A and 53BP1 IEDI1422-1425AAAA) that disrupted their interaction. Knock-in of a FLAG tag into the C-terminus of GMCL1 in HCT116 cells, followed by FLAG immunoprecipitation, confirmed that endogenous GMCL1 interacts with endogenous CUL3 and 53BP1 during mitotic arrest.
Weaknesses:
The clinical relevance of the study is overinterpreted. The authors have not taken relevant data about the clinical mechanism of taxanes into account. Supraphysiologic doses of microtubule poisons cause mitotic arrest and can activate the mitotic stopwatch. However, in physiologic concentrations of clinically useful microtubule poisons, cells proceed through mitosis and divide their chromosomes on mitotic spindles that are at least transiently multipolar. Though these low concentrations may result in a brief mitotic delay, it is substantially shorter than the arrest caused by high concentrations of microtubule poisons, and the one mimicked here by 16 hours of 0.4 mg/mL nocodazole, which is not used clinically and does not induce multipolar spindles. Resistance to mitotic arrest occurs through different mechanisms than resistance to multipolar spindles. No evidence is presented in the current version of the manuscript that GMCL1 affects cellular response to clinically relevant doses of paclitaxel.
We agree that it would be an overstatement to claim that GMCL1 and p53 regulate paclitaxel sensitivity in cancer patients in a clinical context. The correlations we observed were based on publicly available cancer cell lines from datasets catalogued in CCLE and DepMap, which do not fully account for clinical heterogeneity and patient-specific factors. In response to this important point, we have revised the text accordingly.
In the experiments shown in former Figure 4A-H (now Figure 5A-H) and in those shown in the new Figure 5I-J, we used 100 nM paclitaxel to test the hypothesis that low GMCL1 levels sensitize cancer cells in a p53-dependent manner. Here, the paclitaxel concentration was chosen to mimic the conditions reported in the PRISM dataset (PMID: 32613204), which compiles the proliferation inhibitory activity of 4,518 compounds tested across 578 cancer cell lines. Consistent with our cell cycle findings, the paclitaxel sensitivity caused by GMCL1 depletion was reversed by silencing 53BP1 or USP28 (new Figure 5I-J), again supporting the involvement of the stopwatch complex. We are unsure about how to model the “physiologic concentrations of clinically useful microtubule poisons” in cell-based studies. A recent review notes that “The time above a threshold paclitaxel plasma concentration (0.05 mmol/L) is important for the efficacy and toxicity of the drug” (PMID: 28612269). Two other reviews mention that the clinically relevant concentration of paclitaxel is considered to be plasma levels between 0.05–0.1 μmol/L (approximately 50–100 nM) and that in clinical dosing, typical patient plasma concentrations after paclitaxel infusion range from 80–280 nM, with corresponding intratumoral concentrations between 1.1–9.0 μM, due to drug accumulation in tumor tissue (PMIDs: 24670687 and 29703818). We have now emphasized in the revised text the rationale for using 100 nM paclitaxel in our experiments.
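The unit arithmetic behind these comparisons can be checked with a short sketch. The function and variable names below are illustrative assumptions and are not part of the manuscript; the numerical ranges are the ones quoted in the response above.

```python
# Minimal sketch of the concentration comparison described above.
# All names are hypothetical; only the quoted ranges come from the response.

def umol_per_l_to_nm(conc_umol_per_l: float) -> float:
    """Convert a concentration from micromol/L (i.e. uM) to nM."""
    return conc_umol_per_l * 1000.0

def within(value: float, low: float, high: float) -> bool:
    """Return True if value lies in the closed interval [low, high]."""
    return low <= value <= high

EXPERIMENTAL_DOSE_NM = 100.0            # paclitaxel dose used in the experiments
PLASMA_RELEVANT_UM = (0.05, 0.1)        # quoted clinically relevant plasma range
PLASMA_INFUSION_NM = (80.0, 280.0)      # quoted plasma range after infusion
INTRATUMORAL_UM = (1.1, 9.0)            # quoted intratumoral range

plasma_relevant_nm = tuple(umol_per_l_to_nm(c) for c in PLASMA_RELEVANT_UM)
intratumoral_nm = tuple(umol_per_l_to_nm(c) for c in INTRATUMORAL_UM)

dose_in_relevant_range = within(EXPERIMENTAL_DOSE_NM, *plasma_relevant_nm)
dose_in_infusion_range = within(EXPERIMENTAL_DOSE_NM, *PLASMA_INFUSION_NM)

# Fold difference between the quoted intratumoral range and the 100 nM dose.
fold_low = intratumoral_nm[0] / EXPERIMENTAL_DOSE_NM
fold_high = intratumoral_nm[1] / EXPERIMENTAL_DOSE_NM
```

Under these quoted figures, 100 nM sits at the upper edge of the 50–100 nM range and inside the 80–280 nM post-infusion range, while the quoted intratumoral concentrations are roughly one to two orders of magnitude higher.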
Recommendations for the authors:
Reviewer #1 (Recommendations for the authors):
General comments on the Figures:
(1) Western blots lack molecular weight markers on most panels and are often over-exposed and over-contrasted, rendering them hard to interpret.
We have now included molecular weight markers in all Western blot panels. We have also reprocessed the images to avoid overexposure and excessive contrast, ensuring that the bands are clearly visible and interpretable.
(2) Input and IP samples do not show percentage loading, so it is hard to interpret relative enrichments.
In the revised figures, we have indicated what % of the input was loaded.
(3) The authors change between cell line models for their experiments, and this is not clear in the figures. These are important details for interpreting the data, as many of the cell lines used are not functional for the mitotic surveillance pathway.
In the revised manuscript, we have clearly indicated the specific cell lines used in each experiment in the figure legends. Additionally, to address concerns regarding the mitotic surveillance pathway (MSP), we have included new experiments using hTERT-RPE1 cells, which have been reported to possess a functional MSP (new Figure 5I-J).
(4) No n-numbers are provided in the figure legends. Are the Western blots provided done once, or are they reproducible? Many of the blots would benefit from quantification and presentation via graphs to test for reproducible changes to 53BP1 levels under the different conditions.
As now indicated in the methods section, we have conducted each Western blot no less than three times, yielding results that exhibit a high degree of reproducibility. A representative Western blot has been selected for each figure. We did not include densitometric quantification of immunoblots, given that the semi-quantitative nature of this technique would lead to an overinterpretation of our data; unfortunately, this is a limitation of the technique. In fact, eLife and other similar scientific journals do not adhere to the practice of quantifying Western blots. One exception to this norm is protein half-life studies, in which quantification is used to measure the kinetics of decay rates and their internal comparisons. Accordingly, the experiments in Figure 2C were quantified.
(5) Graphs displayed in the supplementary figures are blacked out, and individual data points cannot be visualised. All graphs should have individual data points clearly visible.
We revised the quantified graphs and replaced them with scatter plots to clearly display individual data points, showing sample distribution.
Additional experiments with specific comments on Figures:
(1) Figure 1C-D: the relative amount of 53BP1 co-precipitating with FLAG-tagged GMCL1 WT appears very different between the two experiments. If the idea is that MLN4924 (Cullin neddylation inhibitor) makes the interaction easier to capture, then this should be explained in the text, and ideally shown on the same gel/blot -/+ MLN4924.
We now present the samples treated with and without MLN4924 on the same gel/blot to allow direct comparison (new Figure 1D) and clarified this point in the text.
(2) Figure 1E: The figure legend states that GMCL1 was immunoprecipitated, but the Figure looks as though FLAG-tagged 53BP1 was the bait protein being immunoprecipitated? Can the authors clarify?
We thank the reviewer for pointing out the discrepancy between the figure and the figure legend in Figure 1E. The immunoprecipitation was indeed performed using FLAG-tagged 53BP1, and we have now rectified the figure legend accordingly.
(3) Figure 1F: Rather than parental cell lysate, the better control would be to IP FLAG from another FLAG-tagged expressing cell line, to rule out non-specific binding with the FLAG tag at the non-overexpressed level.
Figure 1F shows interaction at the endogenous level. The specificity of binding with overexpressed proteins is shown in Figures 1C and 1D.
The USP28 blot is over-exposed and makes it hard to see any changes in electrophoretic mobility - it looks as though there is a change between the parental and the KI cell line? It is surprising that USP28 would co-IP with GMCL1 (presumably because USP28 is bound to 53BP1) if the function of GMCL1-53BP1 interaction is to promote 53BP1 degradation. Can the authors reconcile this? Crucially, if the authors claim that the 53BP1-GMCL1 interaction is specific to prolonged mitosis, then this experiment should be repeated and performed with asynchronous, normal-length mitosis, and prolonged mitosis conditions. This is vital for supporting the claim that this interaction only occurs during prolonged mitoses and does not occur in every mitosis regardless of length.
This is a good point. Unfortunately, many of the protein-protein interactions occur post lysis. Therefore, we could not observe differences in asynchronous vs. mitotic cells.
(4) Figure S1F: Label on blot should be CUL3 not CUI3.
We thank the reviewer for pointing this out and we have corrected the typo.
(5) Figure 2A: The authors suggest an increase in chromatin-bound 53BP1 in GMCL1 KO U2OS cells, specifically in M phase. Again, is this time in mitosis dependent, or would this be evident in every mitosis, regardless of length? Such an experiment would benefit from repetition and quantification to test whether the observed effect is reproducibly consistent. If the authors' model is correct, simply treating U2OS WT mitotic cells with MG132 during the mitotic arrest and performing the same fractionation should bring 53BP1 levels up to that seen in GMCL1 KO cells under the same conditions.
The reviewer’s suggestion to assess 53BP1 accumulation in wild-type U2OS cells treated with MG132 during mitotic arrest is indeed highly relevant. However, treatment with MG132 during prolonged mitosis consistently led to significant cell death, making it technically challenging to evaluate 53BP1 levels under these conditions.
(6) Figure 2B: The authors restore GMCL1 expression in the KO U2OS cells using WT and 2 distinct mutant cDNAs. However, the expression of these constructs is not equivalent, and thus their effects cannot be directly compared. It is also surprising that GMCL1 is much higher in M phase samples in this experiment (shouldn't it be destroyed?), when no such behaviour has been observed in the other figures.
There is no evidence in our study or others that GMCL1 should be destroyed in M phase. We show that the R433A mutant is expressed at a level very similar to the WT protein, yet it does not promote the degradation of 53BP1. It is true that the E142K mutant is expressed at the lowest level in mitotic cells, whereas it is the most highly expressed in asynchronous cells. For some reason, this mutant has an inverse behavior compared to the WT, limiting the interpretation of this result. We now mention this in the text.
(7) Figure 2C: The CHX experiment would benefit from inclusion of a control protein known to have a short half-life (e.g. c-myc, p53). Is GMCL1 known to have a relatively short half-life? It looks as though GMCL1 disappears after 1 h CHX treatment (although hard to definitively tell in the absence of molecular weight markers). 53BP1 appears to continue declining in the absence of GMCL1, which is surprising if 53BP1 degradation requires GMCL1. How can the authors reconcile this?
As a control for the CHX chase experiments, we included p21, whose protein levels decreased in a CHX-dependent manner. GMCL1 itself also appeared to undergo degradation upon CHX treatment, but it does not disappear completely.
(8) Supplemental Figure 2:
Transcription is largely inhibited in M phase, so the p53 target gene transcripts present in M phase are inherited from the preceding G2 phase. The qPCRs thus need a reference sample to compare against, i.e., was p21/PUMA/NOXA mRNA already low in G2 in the GMCL1 KO + WT cells before they entered mitosis? Or is the mRNA stability affected during M phase specifically? Is this effect on the mRNA dependent on the time in mitosis?
It is well established that transcription is not entirely shut down during mitosis, particularly for a subset of genes involved in cell cycle regulation. For example, p21, PUMA, NOXA, and p53 mRNAs have been shown to remain actively transcribed during mitosis (see Table S5 in PMID: 28912132). However, we currently lack direct evidence that p53 activation during mitosis, specifically through the mitotic surveillance pathway, drives the transcription of p21, PUMA, or NOXA mRNAs during M phase. In the absence of such mechanistic data, we opted to exclude these analyses from the final figures.
Panel B: blots are too over-exposed to see differences in p53 stability under the different conditions. Mitotic samples should be included to show how these differ from the G1 samples.
The background of all blot images has been adjusted to ensure clarity and consistency.
Panel D: The authors show no significant difference in the cell cycle profiles of the GMCL1 KO and reconstituted cells compared to parental U2OS cells. This should also be performed in the G1 daughter cells following a prolonged mitosis, to test the effect of the different GMCL1 constructs on G1 cell cycle arrest. U2OS cells have been reported not to have a functional mitotic surveillance pathway (Meitinger et al, Science, 2024), so U2OS cells are perhaps not a good model for testing this.
We performed cell cycle profiling using EdU incorporation in hTERT-RPE1 cells, which possess a functional MSP, to evaluate cell cycle progression in daughter cells following prolonged mitosis. We observed that GMCL1 knockdown alone leads to G1-phase arrest. In contrast, co-depletion of GMCL1 with either 53BP1 or USP28 bypasses this arrest, indicating that GMCL1 regulates cell cycle progression in an MSP-dependent manner. Please see also the answer to the public review above.
(9) Figure 3:
The authors show expression data for GMCL1 in the different cancer cell lines. This should be validated for a subset of cancer cell lines at the GMCL1 protein level, and cross-correlated to their MSP/mitotic timer status. Does GMCL1 depletion or knockout in p53 wild-type cancer cell lines overexpressing GMCL1 protein restore mitotic surveillance function?
We were unable to assess GMCL1 protein levels using publicly available proteomics datasets, as GMCL1 expression was not detected. In p53 wild-type hTERT-RPE1 cells, GMCL1 knockdown impaired the mitotic surveillance pathway, as evidenced by G1-phase arrest following prolonged mitosis (new Figure 3A and new Supplementary Figure 3A, B). This arrest was rescued by co-depletion of either TP53BP1 or USP28, indicating that GMCL1 acts upstream of the MSP.
(10) Figure 4:
The authors show siRNA experiments depleting GMCL1 and testing the effects of GMCL1 loss on cell viability and apoptosis induction. This is performed in different cell line backgrounds. However, there is no demonstration that any of the observed effects are due to a lack of GMCL1 activity on 53BP1. These experiments need to be repeated in 53BP1 co-depleted cells to test for rescue. Without this, the interpretation is purely correlative.
We assessed the effects of GMCL1 knockdown, alone or in combination with TP53BP1 or USP28 knockdown, on cell viability and apoptosis in hTERT-RPE1 cells using siRNA. Knockdown of GMCL1 alone led to a significant reduction in cell viability and an increase in apoptosis. However, co-depletion of GMCL1 with either TP53BP1 or USP28 restored both cell viability and apoptosis levels to those observed in control cells (new Figure 5I,J).
(11) Text comments:
Line 257: HeLa cells suppress p53 through the E6 viral protein and are not "mutant" for p53.
The authors should cite early work by Uetake and Sluder describing the effects of spindle poisons on the mitotic surveillance pathway.
We appreciate the reviewer’s comments. We have now made the necessary corrections.
Reviewer #2 (Recommendations for the authors):
Major Points:
(1) Unsubstantiated Mechanistic Claims:
In Figures 3 and 4, the authors show correlations between GMCL1 expression and sensitivity to Taxol. However, they fail to demonstrate that the mitotic stopwatch is mechanistically involved. To support this conclusion, the authors must test whether deletion of 53BP1, USP28, or disruption of their interaction rescues Taxol sensitivity in GMCL1-depleted cells. Since 53BP1 also plays a role in DNA damage response, such rescue experiments are necessary to distinguish between mitotic surveillance-specific and broader stress-response effects. Deletion of USP28 would be particularly informative.
We sought to experimentally determine whether GMCL1 is involved in regulating the mitotic stopwatch. Knockdown of GMCL1 alone resulted in reduced cell proliferation and increased apoptosis. In contrast, co-depletion of GMCL1 with either TP53BP1 or USP28 restored both proliferation and apoptosis levels to those observed in control cells (new Figure 5I, J). To further strengthen our mechanistic experiments, we assessed the effect of GMCL1 levels on cell cycle progression. We conducted EdU incorporation assays following nocodazole synchronization and release. Knockdown of GMCL1 alone led to a delay in G1 progression, whereas co-depletion of either TP53BP1 or USP28 rescued normal cell cycle progression (new Figure 3A and new Supplementary Figure 3A, B). These results are consistent with our proliferation data and suggest that GMCL1 functions upstream of the ternary complex, likely by regulating 53BP1 protein levels.
(2) Model System Limitations (U2OS Cells):
The use of U2OS cells is highly problematic for investigating the mitotic surveillance pathway. U2OS cells lack a functional mitotic stopwatch and do not arrest following prolonged mitosis in a 53BP1/USP28-dependent manner (PMID: 38547292). Therefore, conclusions drawn from this model system about the function of the mitotic surveillance pathway are not substantiated. Key experiments should be repeated in a cell line with an intact pathway, such as RPE1.
We have now also performed all key experiments in hTERT-RPE1 cells (see above). We would also like to point out that while some papers suggest that HCT116 and U2OS cells do not have an intact mitotic surveillance pathway, others have shown that the MSP is indeed functioning in HCT116 cells and can be triggered with variable efficiency in U2OS cells (PMID: 38547292). This is likely due to high heterogeneity and extensive clonal diversity of cancer cell lines grown in different labs. Please see examples in PMIDs: 3620713, 30089904, and 30778230. In particular, PMID: 30089904 shows that this heterogeneity correlates with considerably different drug responses.
(3) Misinterpretation of p53 Activity Timing:
The manuscript states that "GMCL1 KO cells led to decreased mRNA levels of p21 and NOXA during mitosis" (line 194). However, it is well established that the mitotic surveillance pathway activates p53 in the G1 phase following prolonged mitosis-not during mitosis itself (PMID: 38547292). Therefore, the observed changes in mRNA levels during mitosis are unlikely to be relevant to this pathway.
We currently lack direct evidence that p53 activated during mitosis through the mitotic surveillance pathway directly influences the transcription of p21, PUMA, or NOXA mRNAs during M phase. Therefore, we have chosen to exclude these data from the final figures.
(4) Incorrect Interpretation of 53BP1 Chromatin Binding:
The authors claim that 53BP1 remains associated with chromatin during mitosis, which contradicts established literature. It is known that 53BP1 is released from chromatin during mitosis via mitosis-specific phosphorylation (PMID: 24703952), and this is supported by more recent findings (PMID: 38547292). A likely explanation for the discrepancy may be contamination of mitotic fractions with interphase cells. The chromatin fraction data in Figure 2C must be interpreted with caution.
Our method to synchronize cells in M phase is rather stringent (see Supplementary Figure 3D as an example). The literature indicates that the bulk of 53BP1 is released from chromatin during mitosis. Yet, even in the two publications mentioned by the reviewer, there is a difference in the observable amount of 53BP1 bound to chromatin (compare Figure 2B in PMID: 38547292 and Figure 5A in PMID: 24703952). The difference is likely due to the different biochemical approaches used to purify chromatin-bound proteins (salt and detergent concentrations, sonication, etc.). Using our fractionation approach, we can reliably separate the soluble fraction (which also contains the nucleoplasmic fraction) from chromatin-associated proteins, as indicated by controls such as α-Tubulin and Histone H3. We have now mentioned these limitations when comparing different fractionation methods in our discussion section.
(5) Inadequate Citation of Foundational Literature:
The literature on the mitotic surveillance pathway is relatively limited, and it is essential that the authors provide a comprehensive and accurate account of its development. The foundational work by the Sluder lab (PMID: 20832310), demonstrating a p53-dependent arrest following prolonged mitosis, must be cited. Furthermore, the three key 2016 papers (PMID: 27432896, 27432897, 27432896) that identified the involvement of USP28 and 53BP1 in this pathway are critical and should be cited as the basis of the mitotic surveillance pathway.
In contrast, the manuscript currently emphasizes publications that either contribute minimally or have been contradicted by prior and subsequent work. For example: PMID: 31699974, which proposes Ser15 phosphorylation of p53 as critical, has been contradicted by multiple groups (e.g., Holland, Oegema, and Tsou labs).
PMID: 37888778, which suggests that 53BP1 must be released from kinetochores, is inconsistent with findings that indicate kinetochore localization is not relevant.
The authors should thoroughly revise the Introduction to reflect what this reviewer would describe as a more accurate and scholarly approach to the literature.
We have substantially revised both the Introduction and Discussion sections to incorporate important references kindly suggested by the reviewer.
Minor Points:
(1) Overexposed Western Blots:
The Western blots throughout the manuscript are heavily overexposed and saturated, obscuring differences in protein levels and hindering data interpretation. The authors should provide properly exposed blots with quantification where appropriate.
We have provided Western blot images with appropriate exposure levels and included quantification where appropriate (i.e., to measure the kinetics of decay rates as in Figure 2C). For all the other immunoblots, we did not include densitometric quantification, given that the semi-quantitative nature of this technique would lead to overinterpretation of our data. This is, unfortunately, a limitation of the technique. In fact, eLife and other similar scientific journals do not adhere to the practice of quantifying Western blot analyses.
(2) Missing information in the graphs in Figure 2C and 4; S2? How many repeats? What are the asterisks?
Panels referenced above have been repeated several times, and further details are now provided in the figure legends.
Reviewer #3 (Recommendations for the authors):
(1) The claim that GMCL1 modulates paclitaxel sensitivity in cancer should be toned down.
We agree that it would be an overstatement to claim that GMCL1 regulates paclitaxel sensitivity in cancer patients in a clinical context. The correlations we observed were based on publicly available, cell line–based datasets, which do not fully account for clinical heterogeneity and patient-specific factors. In response to this important point, we have revised our statements and the corresponding text accordingly. We now place greater emphasis on our molecular and cell biology studies.
(2) Additional experiments in low, physiologically relevant concentrations of paclitaxel would be interesting. It is possible that these concentrations activate the mitotic stopwatch in a portion of cells, in addition to inducing cell death due to chromosome loss, activation of an immune response, and chromothripsis. Results should be interpreted in the context of this complexity.
Please see the response to the public review.
(3) It would be helpful to show that CUL3 interacts with 53BP1 only in the presence of GMCL1.
We show that the binding of 53BP1 to GMCL1 is independent of the ability of GMCL1 to bind CUL3 (Figure 1C, D). The binding between 53BP1 and CUL3 is difficult to detect (Figure 1F) likely because it’s not direct but mediated by GMCL1.
(4) The GMCL1 "KO" lines appear to still express a low level of GMCL1 (Figure 2A), which should be acknowledged.
We have included the GMCL1 mRNA expression data, as measured by RT-PCR, in Supplementary Figure 1G, demonstrating that GMCL1 expression was undetectable under the tested conditions.
(5) Additional description of the methods is warranted. This is particularly true for the database analysis that forms the basis for the claim that GMCL1 overexpression causes resistance to paclitaxel and other taxanes presented in Figure 3, the methodology used to obtain M-phase cells, and the concentration and duration of taxol treatment.
We have now extensively revised the Methods section.
(6) "Taxol" and "paclitaxel" are used interchangeably throughout the manuscript. Consistency would be preferable.
We have revised the manuscript to maintain consistency in the use of the terms “Taxol” and “paclitaxel” and now refer to “paclitaxel” when discussing that individual compound; “taxanes” when referring collectively to cabazitaxel, docetaxel and paclitaxel; and “Taxol” has been removed entirely to avoid redundancy or confusion.
(7) It is unclear why it is claimed that GMCL1 interacts "specifically" with 53BP1 (line 176), since multiple interactors were identified in the IP-MS study.
We meant that the GMCL1 R433A mutant loses its ability to bind 53BP1, suggesting that the GMCL1-53BP1 interaction is not an artifact. We have now clarified the text.
(8) The bottom row in Figure S3 is misleading. Paclitaxel is not uniformly effective in every tumor of any given type, and so resistance occurs in every cancer type.
We fully agree that cancer is highly heterogeneous and that paclitaxel efficacy varies across tumors, even within the same histological subtype. Our intention was not to suggest uniform sensitivity/resistance, but rather to provide a high-level overview using aggregated data. We acknowledge that this coarse-grained representation may unintentionally imply overly generalized conclusions. To avoid potential misinterpretation, we have removed the corresponding panel in the revised paper.
-
eLife Assessment
This study presents a valuable advance in reconstructing naturalistic speech from intracranial ECoG data using a dual-pathway model. The evidence supporting the claims of the authors is solid, although the rationale for employing a smaller language model rather than a large language model (LLM) should be further clarified. This work will be of interest to cognitive neuroscientists and computer scientists/engineers working on speech reconstruction from neural data.
-
Reviewer #1 (Public review):
Summary:
This paper introduces a dual-pathway model for reconstructing naturalistic speech from intracranial ECoG data. It integrates an acoustic pathway (LSTM + HiFi-GAN for spectral detail) and a linguistic pathway (Transformer + Parler-TTS for linguistic content). Output from the two components is later merged via CosyVoice2.0 voice cloning. Using only 20 minutes of ECoG data per participant, the model achieves high acoustic fidelity and linguistic intelligibility.
Strengths:
(1) The proposed dual-pathway framework effectively integrates the strengths of neural-to-acoustic and neural-to-text decoding and aligns well with established neurobiological models of dual-stream processing in speech and language.
(2) The integrated approach achieves robust speech reconstruction using only 20 minutes of ECoG data per subject, demonstrating the efficiency of the proposed method.
(3) The use of multiple evaluation metrics (MOS, mel-spectrogram R², WER, PER) spanning acoustic, linguistic (phoneme and word), and perceptual dimensions, together with comparisons against noise-degraded baselines, adds strong quantitative rigor to the study.
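For readers less familiar with the linguistic metrics named in point (3), the following is a minimal sketch of how a word error rate (WER) or phoneme error rate (PER) is conventionally computed: the edit distance between reference and decoded token sequences, divided by the reference length. The function names, the lowercase whitespace tokenization, and the example strings are assumptions made for illustration; this is not the scoring code used in the study.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference token
                            curr[j - 1] + 1,      # insert a hypothesis token
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Edit distance normalised by reference length (works for words or phonemes)."""
    if not ref_tokens:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def wer(reference, hypothesis):
    """WER with naive lowercase whitespace tokenisation (an assumption)."""
    return error_rate(reference.lower().split(), hypothesis.lower().split())


# Illustrative strings only, not taken from the study's stimuli.
print(wer("the cat sat on the mat", "the cat sat on mat"))
```

Passing phoneme lists to `error_rate` gives a PER in the same way. Because the score is normalised by the reference length, it can exceed 1 when the decoded sequence contains many insertions.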
Weaknesses:
(1) It is unclear how much the acoustic pathway contributes to the final reconstruction results, based on Figures 3B-E and 4E. Including results from Baseline 2 + CosyVoice and Baseline 3 + CosyVoice could help clarify this contribution.
(2) As noted in the limitations, the reconstruction results heavily rely on pre-trained generative models. However, no comparison is provided with state-of-the-art multimodal LLMs such as Qwen3-Omni, which can process auditory and textual information simultaneously. The rationale for using separate models (Wav2Vec for speech and TTS for text) instead of a single unified generative framework should be clearly justified. In addition, the adaptor employs an LSTM architecture for speech but a Transformer for text, which may introduce confounds in the performance comparison. Is there any theoretical or empirical motivation for adopting recurrent networks for auditory processing and Transformer-based models for textual processing?
(3) The model is trained on approximately 20 minutes of data per participant, which raises concerns about potential overfitting. It would be helpful if the authors could analyze whether test sentences with higher or lower reconstruction performance include words that were also present in the training set.
(4) The phoneme confusion matrix in Figure 4A does not appear to align with human phoneme confusion patterns. For instance, /s/ and /z/ differ only in voicing, yet the model does not seem to confuse these phonemes. Does this imply that the model and the human brain operate differently at the mechanistic level?
(5) In general, is the motivation for adopting the dual-pathway model to better align with the organization of the human brain, or to achieve improved engineering performance? If the goal is primarily engineering-oriented, the authors should compare their approach with a pretrained multimodal LLM rather than relying on the dual-pathway architecture. Conversely, if the design aims to mirror human brain function, additional analysis, such as detailed comparisons of phoneme confusion matrices, should be included to demonstrate that the model exhibits brain-like performance patterns.
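The overlap check requested in point (3) could be sketched roughly as follows: for each test sentence, compute the fraction of its words that also occur in the training sentences, and then relate that fraction to the per-sentence reconstruction score. The function names, the naive tokenization, and the example sentences are hypothetical; they are not drawn from the study's stimuli or code.

```python
def vocabulary(sentences):
    """Set of lowercase whitespace-separated words across all sentences."""
    return {word for sentence in sentences for word in sentence.lower().split()}


def seen_word_fraction(test_sentence, train_vocab):
    """Fraction of words in a test sentence that also occur in the training set."""
    words = test_sentence.lower().split()
    if not words:
        raise ValueError("test sentence must contain at least one word")
    return sum(word in train_vocab for word in words) / len(words)


# Hypothetical sentences for illustration only.
train_vocab = vocabulary(["the reach was fast", "the mouse drank water"])
for sentence in ["the mouse was fast", "the cat drank milk"]:
    print(sentence, seen_word_fraction(sentence, train_vocab))
```

A strong positive relationship between this fraction and reconstruction accuracy would be consistent with memorisation of training vocabulary, although it would not by itself prove overfitting.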
-
Reviewer #2 (Public review):
Summary:
The study by Li et al. proposes a dual-path framework that concurrently decodes acoustic and linguistic representations from ECoG recordings. By integrating advanced pre-trained AI models, the approach preserves both acoustic richness and linguistic intelligibility, and achieves a WER of 18.9% with a short (~20-minute) recording.
Overall, the study offers an advanced and promising framework for speech decoding. The method appears sound, and the results are clear and convincing. My main concerns are the need for additional control analyses and for more comparisons with existing models.
Strengths:
(1) This speech-decoding framework employs several advanced pre-trained DNN models, reaching superior performance (WER of 18.9%) with relatively short (~20-minute) neural recording.
(2) The dual-pathway design is elegant, and the study clearly demonstrates its necessity: The acoustic pathway enhances spectral fidelity while the linguistic pathway improves linguistic intelligibility.
Weaknesses:
The DNNs used were pre-trained on large corpora, including TIMIT, which is also the source of the experimental stimuli. More generally, as DNNs are powerful at generating speech, additional evidence is needed to show that decoding performance is driven by neural signals rather than by the DNNs' generative capacity.
-
Author response:
Here we provide a provisional response addressing the public comments and outlining the revisions we are planning to make:
(1) We will add additional baseline models to delineate the contributions of the acoustic and linguistic pathways.
(2) We will show additional ablation analysis and other model comparison results, as suggested by the reviewers, to justify the choice of the DNN models.
(3) We will clarify the use of the TIMIT dataset during pre-training. In fact, the TIMIT speech data (the speech corpus used in the test set) was not included or used when pre-training the acoustic or linguistic pathway. It was only used in fine-tuning the final speech synthesizer (the CosyVoice model). We will present results without this fine-tuning step, which will fully eliminate the usage of the TIMIT data during model training.
(4) We will further analyze the phoneme confusion matrices and/or other data to evaluate the model behavior.
(5) We will analyze the test sentences with high and low accuracies. We will also include results with partial training data (e.g. using 25%, 50%, 75% of the training set) to further evaluate the impact of the total amount of training data.
-
eLife Assessment
This fundamental study provides compelling evidence for the functional segregation of the sensorimotor cortex into precisely delineated areas, and highlights a rapid transition in functional properties at the boundaries between these areas. This result further confirms and extends recent work on the diversity of neural response specificities across cortical areas in the context of complex behavioral tasks. This work will be of interest to neuroscientists studying sensory-motor functions.
-
Reviewer #1 (Public review):
Summary:
Here, the authors address the organization of reach-related activity in layer 2/3 across a broad swath of anterodorsal neocortex that included large subregions of M1, M2, and S1. In mice performing a novel variant of a water-reaching task, the authors measured activity using two-photon fluorescence imaging of a GECI expressed in excitatory projection neurons. The authors found a substantial diversity of response patterns using a number of metrics they developed for characterizing the PETHs of neurons across reach conditions (target locations). By mapping single-neuron properties across the cortex, the authors found substantial spatial variation, only some of which aligned with traditional boundaries between cortical regions. Using Gaussian mixture models, the authors found evidence of distinct response types in each region, with several types prominent in multiple cortical regions. Aggregating across regions, four primary subpopulations were apparent, each distinct in its average response properties. Strikingly, each subpopulation was observed in multiple regions, but subpopulation members from different regions exhibited largely similar response properties.
Strengths:
The work addresses a fundamental question in the field that has not previously been addressed at cellular resolution across such a broad cortical extent. I see this as truly foundational work that will support future investigation of how the rodent brain drives and controls reaching.
The quantification is thoughtful and rigorous. It is great that the authors provide an explanation for and intuition behind their response metrics, rather than burying everything in the Methods.
The Discussion and general contextualization of the results are thorough, thoughtful, and strong. It is great that the authors avoid the common over-interpretation of classical observations regarding cortical organization that is endemic in the field.
All things considered, this is the best paper regarding spatial structure in the motor system I have ever read. The breadth of cellular resolution activity measurement, the rigor of the quantification, and the clear and open-minded interrogation of the data collectively have produced a very special piece of work.
Weaknesses:
The behavioral task is very impressive and an important contribution to the field in its own right. However, given that it appears substantially different from the one used in the previous paper, the characterization of the behavior provided in the Results is too brief. More illustration of the behavior would be helpful. For example, it is rather deep into the paper when the authors reveal that the mice can whisk to help localize the target location. That should be expressed at the outset when the behavior is first described. Other suggestions for elaborating the behavior description are included below.
Statistical support for key claims is lacking. For example, "The five areas of interest varied in the fraction of neurons that were modulated: M2 had 14%, M1 had 23%, S1-fl had 30%, S1-hl had 25%, and S1-tr had 27%" - I cannot locate the statistical tests showing that these values are actually different. Another example is Figure 7, where a key observation is that distributions of PETH features are distinct across regions. It is clear that at least some distributions are not overlapping, but a clearer statistical basis for this key claim should be provided.
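One simple way to put the quoted fractions on a statistical footing is a Pearson chi-square test of homogeneity across the five areas. The sketch below is self-contained and uses a closed-form tail probability that is valid only for even degrees of freedom (five areas give df = 4). The per-area neuron totals are HYPOTHETICAL placeholders, since the review quotes only percentages, and the function names are likewise illustrative. Neurons imaged in the same animal or session are not independent samples, so a per-animal or mixed-effects analysis may be more appropriate than this naive test.

```python
import math


def chi_square_homogeneity(modulated, totals):
    """Pearson chi-square statistic for equal proportions across groups.

    Returns (statistic, degrees_of_freedom).
    """
    if len(modulated) != len(totals) or len(totals) < 2:
        raise ValueError("need matching counts for at least two groups")
    if min(totals) <= 0:
        raise ValueError("every group needs at least one observation")
    pooled = sum(modulated) / sum(totals)
    if pooled in (0.0, 1.0):
        raise ValueError("pooled proportion must lie strictly between 0 and 1")
    stat = 0.0
    for k, n in zip(modulated, totals):
        # Compare observed modulated / unmodulated counts with the counts
        # expected if every group shared the pooled proportion.
        for observed, expected in ((k, n * pooled), (n - k, n * (1.0 - pooled))):
            stat += (observed - expected) ** 2 / expected
    return stat, len(totals) - 1


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


# HYPOTHETICAL totals (1000 neurons per area), chosen only so that the counts
# reproduce the quoted percentages; the real per-area totals are not given here.
areas = ["M2", "M1", "S1-fl", "S1-hl", "S1-tr"]
totals = [1000, 1000, 1000, 1000, 1000]
modulated = [140, 230, 300, 250, 270]
stat, df = chi_square_homogeneity(modulated, totals)
print(f"chi2 = {stat:.1f}, df = {df}, p = {chi2_sf_even_df(stat, df):.3g}")
```

An omnibus test of this kind only indicates that at least one area differs; pairwise comparisons with a multiple-comparison correction would be needed to say which areas differ from which.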
I understand that the authors are planning a follow-up study that addresses the relation between activity patterns and kinematics. One question about interpreting the results here, though, is how much the activity variation across target locations may relate to the kinematic differences across these different conditions, as opposed to true higher-order movement features like reach direction.
-
Reviewer #2 (Public review):
Summary:
The functional parcellation of cortical areas is a critical question in neuroscience. This is particularly true in frontal areas in mice. While sensory areas are relatively well characterized by their tuning to sensory stimuli, the situation is much less clear for motor areas. This has become even more ambiguous since recent studies using large-scale neuronal recordings consistently report mixed sensory and motor-related activity throughout the brain, and motor mapping studies have shown that movements evoked by cortical stimulation are by no means limited to motor areas alone. Here, the authors use a correlation approach combining large-scale functional imaging at cellular resolution with movement-tracking in mice executing a reaching task. Across multiple recording sessions in the same animals, the authors imaged a large portion of the sensorimotor cortex, recording the activity of nearly 40,000 neurons. By aligning the calcium signal of each neuron to three task events (the Go cue triggering the reach, the onset of paw lift, and the contact between the paw and the target) for different target positions, the authors identified different response patterns distributed differently across cortical areas. They defined a set of features that describe the neurons' response pattern, representing the temporal dynamics and tuning properties for the different target positions. These features were used to construct cortical maps, and the authors show that, interestingly, gradient maps obtained from the first derivative of the feature maps reveal sharp discontinuities at the boundaries between anatomically defined cortical areas. Using dimensionality reduction of the neuronal response features, the authors found that, despite clear differences in their average response properties, individual neurons from the same cortical areas do not form distinct clusters in the reduced-dimensional space.
In fact, most areas contain heterogeneous neuronal populations, and most neuronal populations are present in multiple areas, albeit in different proportions. Interestingly, the authors identified four neuronal subpopulations based on the distance between the components of the Gaussian mixture model used to model the distribution of neurons within each area. One of these subpopulations is almost exclusively represented in the anterior M2 cortex, while another is broadly distributed across the different areas.
Strengths:
This article is based on an impressive dataset of nearly 40,000 neurons covering a large portion of the sensorimotor cortex and on innovative analytical approaches. This study is likely the first to clearly demonstrate boundaries between cortical areas defined based on the responses of individual neurons. This innovative approach to functional mapping of cortical areas potentially opens up new perspectives for higher-resolution mapping of frontal cortical areas, using a broader repertoire of sensory and motor evoked responses.
Weaknesses:
The second part of the article, which presents multimodal responses in the cortical areas, seems to be a perhaps overly complicated way of showing what has already been demonstrated in numerous recent publications, but these new analyses expand upon these previous observations by revealing an interesting functional organization of the sensorimotor cortex, highlighting interesting similarities and differences between certain areas.
-
eLife Assessment
The authors present evidence for a WIPI2-Retriever complex (termed CROP2) that couples cargo selection to carrier fission at endosomes. CROP2 appears to function analogously to the previously described CROP1 complex, formed by WIPI1 and Retromer, with which it shares structural similarities. They provide convincing evidence that CROP1 and CROP2 regulate the trafficking of distinct subsets of cargoes; however, the cellular evidence for the existence of these distinct complexes remains incomplete. Overall, the findings are important and expand our understanding of how cargo selection by Retriever and Retromer is orchestrated at endosomes.
-
Reviewer #1 (Public review):
WIPI1 is a PROPPIN family protein that has been implicated in Retromer-mediated membrane fission events. Although the cargos for which it has been shown to be important are diverse, one cargo that is unaffected is Beta1-Integrin. This leads the authors to assess another PROPPIN family protein, WIPI2, which is a homolog of WIPI1. KD using siRNA was effective and had no consequences on LAMP1, EGFR trafficking, or GLUT1 trafficking. Integrin-B1, however, had a large and significant defect in its recycling from the endosome, with a clear endosomal colocalisation. Complementation experiments with WT WIPI2 recovered the phenotype, but various mutant WIPI2 complements resulted in elongated tubules, and there was also a dominant negative effect of the mutant. Integrin is a classic retriever cargo, so the authors rationalise that WIPI2 may be playing a role with retriever analogous to the one that WIPI1 plays with retromer. To assess this, they perform a set of immunoprecipitations. SNX17, the retriever-associated sorting nexin, co-IPs with WIPI2 in a VPS26C-dependent manner. VPS26C but not VPS26 co-IPs with WIPI2, and the reciprocal holds for WIPI1. These interactions were not present for the FSSS mutation of WIPI2. WIPI2 localises mainly to Rab11 endosomes, as does retriever. Mutations of WIPI2 affected not only WIPI2 localisation but also VPS35L localisation, indicating that there is a functional relationship between the two.
On the whole, I find the manuscript compelling. The manuscript is very clearly written, the results are convincing and well performed. The flow of experiments is logical, and although not comprehensive in the subsequent mechanistic understanding, the fundamental findings are important and convincing. My comments below are, on the whole, minor and are intended to support the communication of the findings to the field.
(1) The IP interaction data were convincing; however, for me and some others, an interaction is only convincing when performed in vitro, and understood at a structural level. I do not suggest the authors do that in this case; however, I think, at a minimum, some sensible moderation of claims would be useful here.
(2) I found the final localisation data and its interpretation confusing. My interpretation of that data would not be that the retriever is relocalised, but rather that there is less of both recruited to the membrane and the remaining localisation distribution is shifted. In addition, I am not quite sure of the model here: is the idea that WIPI2 recruits retriever? If that is the case, I find it hard to reconcile with its role as a mediator of fission. Clarity would be appreciated here.
(3) I am concerned that the repeats being compared for statistical analysis are not biological repeats but technical repeats (cells in the same experiment). I should think the idea of the statistical comparison is to show experimental reproducibility and variability across biological repeats. Therefore, I would expect an appropriate number of biological repeats (3 or more minimum) to be the data compared in the statistical analysis and graphs. I think it is appropriate to average the technical repeats from each biological repeat. I find these to be useful resources: https://doi.org/10.1083/jcb.202401074 and https://doi.org/10.1083/jcb.200611141
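The averaging suggested in point (3) can be sketched as follows: technical repeats (for example, individual cells) are collapsed to one mean per biological repeat, and only those per-repeat means enter the statistical comparison, so that n equals the number of independent experiments. The function name, the replicate labels, and the values are hypothetical and serve only as an illustration.

```python
from statistics import mean


def biological_replicate_means(measurements):
    """Collapse (replicate_id, value) pairs to one mean per biological repeat."""
    grouped = {}
    for replicate_id, value in measurements:
        grouped.setdefault(replicate_id, []).append(value)
    return {replicate_id: mean(values) for replicate_id, values in grouped.items()}


# Hypothetical per-cell colocalisation scores from three independent experiments.
cells = [
    ("exp1", 1.0), ("exp1", 3.0),
    ("exp2", 5.0), ("exp2", 5.0), ("exp2", 8.0),
    ("exp3", 4.0),
]
per_experiment = biological_replicate_means(cells)
print(per_experiment)  # three values, so n = 3 for any downstream test
```

Plotting the individual cells in the background with the per-experiment means overlaid keeps the cell-level variability visible while making clear which values were tested.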
-
Reviewer #2 (Public review):
Summary:
The manuscript from De Leo and Mayer presents evidence that the PROPPIN protein, WIPI2, associates with the Retriever complex, and is required for the proper transport of the SNX17-Retriever cargo, beta1-integrin. This finding fits with prior papers from the Mayer lab, which showed that a related PROPPIN, WIPI1, is required for the transport of some SNX27-Retromer cargo, including GLUT1. The retromer and retriever complexes are architecturally similar. Importantly, they act at the same endosomes, and each transports cargo from endosomes to the plasma membrane. Thus, the possibility that each also requires a structurally related PROPPIN is of interest. However, the manuscript is incomplete, and the main claims are only partially supported.
Strengths:
The topic that PROPPIN proteins are important for the function of the Retromer and Retriever complexes expands our view of the trafficking complex.
Weaknesses:
Many important controls are missing. Several points that are made in the manuscript are only supported through a single approach.
-
Reviewer #3 (Public review):
Summary:
The manuscript of Mayer and colleagues analyzes the function of WIPI proteins in mammalian cells. The authors previously identified CROP as a complex consisting of WIPI1 and the retromer complex, primarily in yeast cells. In mammalian cells, both WIPI1 and WIPI2 exist, whereas retromer has a homologous complex termed retriever. They now find that WIPI2 can form a complex with retriever subunits. They named this complex CROP2. Their data further indicate that CROP2 and CROP1 have distinct substrate specificities as knockdown of CROP2 subunits affects beta1 integrin sorting, whereas knockdown of CROP1 affects EGFR and GLUT1. They further identify a similar sequence (FSSS) in both WIPI1 and WIPI2, which is required for their specific binding to retromer and retriever.
Strengths:
CROP1 and CROP2 seem to use similar features for their formation, and have different substrates, which is convincingly shown.
Weaknesses:
The analysis lacks information that this is a complex as claimed. It can be deduced from the interaction analysis, but was not shown.
-
eLife Assessment
This study reports a valuable method to predict the capacity of a candidate probiotic bacterium to metabolically outcompete a bacterial pathogen in the ecological niche of the murine respiratory tract (niche exclusion) based on the overlap of used carbon sources in vitro. The in vivo confirmation of the in vitro/in silico predicted efficacy is, at this stage, incomplete and would require more persuasive experimental evidence for the elimination of alternative mechanisms of action.
-
Reviewer #1 (Public review):
A summary of what the authors were trying to achieve:
(1) Identify probiotic candidates based on the phylogenetic proximity and their presence in the lower respiratory tract based on phylogenetic analysis and on meta-analysis of 16S rRNA sequencing of mouse lung samples.
(2) Predefine probiotic candidates with overlapping and competing metabolic profiles based on a simple and easy-to-apply score, taking carbon source use into consideration.
(3) Confirm the functionality of these candidate probiotics in vitro and define their mechanism of action (niche exclusion by either metabolic competition or active antibacterial strategies).
(4) Confirm the probiotic action in vivo.
Strengths:
The authors attempt to go the whole nine yards, from rational choice of phylogenetically close lower respiratory tract probiotics, through in silico modelling of a niche index based on use of similar carbon sources with in vitro confirmation, to in vivo competition experiments in mice.
Weaknesses:
(1) The use of a carbon source is defined as growth to OD600 two SD above the blank level. While allowing a clear cutoff, this procedure does not take into account larger differences in the preferences of carbon sources between the pathogen and the probiotic candidate. If the pathogen is much better at taking up and processing a carbon source, the competition by the probiotic might be biologically irrelevant.
(2) The authors do not take into account the growth of candidate probiotics in the presence of Bt. In monoculture, three of the four most potent candidate probiotics grow to comparable levels as Bt in LSM.
(3) Niche exclusion in vivo is not shown. Mortality of hosts after infection with Bt is not a measure for competition of CP with the pathogen. Only Bt titers would prove a competitive effect. For CP17, less than half of the mice were actually colonized, but still, there is 100% protection. Activation of the host immune system would explain this and has to be excluded as an alternative reason for improved host survival.
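The concern in weakness (1) can be made concrete with a short sketch of the cutoff as described there: a carbon source counts as used when the OD600 exceeds the blank mean by two standard deviations. The use of the sample standard deviation of the blank wells, the function names, and all OD values are assumptions for illustration; the study's actual plate-reader processing may differ.

```python
from statistics import mean, stdev


def growth_cutoff(blank_od, n_sd=2.0):
    """Blank mean plus n_sd sample standard deviations (assumed definition)."""
    return mean(blank_od) + n_sd * stdev(blank_od)


def uses_carbon_source(od, blank_od, n_sd=2.0):
    """Binary call: does growth exceed the blank-derived cutoff?"""
    return od > growth_cutoff(blank_od, n_sd)


# Hypothetical OD600 readings on a single carbon source.
blank = [0.04, 0.05, 0.06]
pathogen_od = 1.20   # grows strongly
candidate_od = 0.10  # grows only weakly

for name, od in [("pathogen", pathogen_od), ("candidate", candidate_od)]:
    print(name, uses_carbon_source(od, blank))
```

Both strains are scored as users of the source even though their final densities differ twelvefold in this made-up example, which is the information a binary call discards. Retaining the blank-subtracted OD, or a growth rate, alongside the binary call would be one way to capture such differences.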
Appraisal:
(1) Based on phylogenetic comparison and published resources on lower respiratory tract colonizing bacteria, the authors find a reasonably good number of candidate probiotics that grow in LSM and successfully compete with the pathogenic target bacterium Bt in vitro.
(2) In vivo, only host survival was tested, and a direct competition of CP with Bt by testing for Bt titers was not shown.
Impact:
Niche exclusion based on competition for environmentally provided metabolites is not a new concept and was experimentally tested, e.g. in the intestine. The authors show here that this concept could be translated into the resource-poor environment of the respiratory tract. It remains to be tested if the LSM growth-based competition data in vitro can be translated into niche exclusion in vivo.
-
Reviewer #2 (Public review):
Summary:
This study aims to establish a rational framework for designing bacterial probiotics against respiratory infections. The central hypothesis is that in vitro antagonism, particularly through metabolic niche overlap with a pathogen, predicts in vivo efficacy.
Strengths:
(1) Systematic pipeline: The study integrates bacterial isolation, in vitro characterization, model development, and in vivo validation into a cohesive workflow.
(2) Quantitative model: The introduction of the Niche Index (NI) and Niche Index Fraction (NIF) provides a novel, quantitative tool for predicting probiotic efficacy based on ecological principles.
(3) Mechanistic insight: The work dissects different modes of action, clearly demonstrating that inhibition can be driven by specialized metabolite production (CP8) or carbon resource competition (e.g., CP7), with lactate utilization identified as a key factor.
Weaknesses:
(1) Limited model generalizability: The predictive power of the NI model is not universal. It fails to account for the in vivo inefficacy of CP8 (a metabolite-dependent inhibitor) and cannot explain the short-term protection conferred by some non-inhibitory CPs in vivo, suggesting unmodeled mechanisms like immune priming are at play.
(2) Preliminary nature of key findings: The emphasis on lactate consumption as a critical predictor, while interesting, is not sufficiently explored to establish its general importance beyond the specific strains and conditions tested.
Appraisal:
The authors successfully achieve their aim of establishing a rational probiotic-design pipeline. The data robustly support the conclusion that metabolic niche overlap predicts efficacy for many strains, while also clearly delineating the model's limitations, as acknowledged by the authors.
Impact:
This work provides a valuable methodological framework for hypothesis-driven probiotic discovery. The quantitative Niche Index offers immediate utility to the field and, with further refinement, has the potential to become a fundamental tool for developing respiratory therapeutics.
-
eLife Assessment
This is an overall compelling set of findings on the role of centrally produced estrogens in the control of behaviors in male medaka. The significance of the findings rests on the revealed potential mechanism by which brain-derived estrogens modulate social behaviors in males, supported by the analysis of multiple transgenic lines. The evidence for the broader claim is incomplete since it has not been extended to female medaka, and further experimentation would be necessary to fully validate the conclusions on the role of brain-derived estrogens. Nonetheless, the findings have led to important hypotheses on the hormonal control of behaviors in teleosts that can be tested further.
-
Reviewer #1 (Public review):
Summary:
This research group has consistently performed cutting-edge research aiming to understand the role of hormones in the control of social behaviors, specifically by utilizing the genetically tractable teleost fish, medaka, and the current work is no exception. The overall claim they make, that estrogens modulate social behaviors in males, is supported, with important caveats. For one, there is no evidence these estrogens are generated by "neurons" as would be assumed by their main claim that it is NEUROestrogens that drive this effect. While indeed the aromatase they have investigated is expressed solely in the brain, in most teleosts, brain aromatase is only present in glial cells (astrocytes, radial glia). The authors should change this description so as not to mislead the reader. Below I detail more specific strengths and weaknesses of this manuscript.
Strengths:
Excellent use of the medaka model to disentangle the control of social behavior by sex steroid hormones
The findings are strong for the most part because deficits in the mutants are restored by the molecule (estrogens) that was no longer present due to the mutation
Presentation of the approach and findings are clear, allowing the reader to make their own inferences and compare them with the authors'
Includes multiple follow-up experiments, which leads to tests of internal replication and an impactful mechanistic proposal
Findings are provocative not just for teleost researchers, but for other species since, as the authors point out, the data suggest mechanisms of estrogenic control of social behaviors may be evolutionarily ancient
Weaknesses:
The experimental design for studying aggression in males has flaws, but it appears a typical resident-intruder type assay is not appropriate for medaka. It seems other species may be better for studying aggression in teleosts.
-
Reviewer #3 (Public review):
Summary:
Taking advantage of the existence in fish of two genes coding for estrogen synthase, the enzyme aromatase, one mostly expressed in the brain (Cyp19a1b) and the other mostly found in the gonads (Cyp19a1a), this study investigates the role of brain-derived estrogens in the control of sexual and aggressive behavior in male medaka. The constitutive deletion of Cyp19a1b, confirmed by the ablation of its transcript, markedly reduced brain estrogen content. This effect is accompanied by reduced sexual and aggressive behavior and reduced expression of the transcripts coding for androgen receptors (AR), ara and arb, in brain regions involved in social behavior regulation. Both AR expression and aspects of social behaviors were restored by adult treatment with estrogens, providing some support for a role of aromatization. Expression analysis of AR isoforms and behavior of mutants of estrogen receptors (ER) indicates that these effects are likely mediated by the activation of the esr1 and esr2a isoforms. Together, these results provide valuable insights into the role of brain-derived estrogens in social behavior in fish.
Strengths:
This study evaluates the role of brain "specific" Cyp19a1 in social behavior in male teleost fish, which as a taxon are more abundant and yet proportionally less studied than the most common birds and rodents. Therefore, evaluating the generalizability of results from higher vertebrates is important. The study suggests that, as opposed to mammals, the facilitatory role of brain-derived estrogens on mating and aggression is limited to adulthood.
Results obtained from multiple mutant lines converge to show that estrogens most likely synthesized in the brain drive aspects of male sexual behavior.
The comparative discussion of the age-dependent abundance of brain aromatase in fish vs mammals and its role in organization vs activation is important beyond the study of the targeted species.
Weaknesses:
Most experiments are weakly powered (low sample size).
The variability of the mRNA content for the same target gene between experiments (genotype comparison vs E2 treatment comparison) raises questions about the reproducibility of the data (apparent disappearance of genotype effect).
Conclusions:
Overall, the present study provides convincing evidence for a facilitatory role of estrogens originating from the brain on sexual behavior and aggressive behavior in male medaka. The role of specific estrogen receptor isoforms underlying the expression of androgen receptors is supported by converging evidence from multiple mutant lines.
-
Author response:
The following is the authors’ response to the previous reviews.
Public Reviews:
Reviewer #1 (Public Review):
Summary:
This research group has consistently performed cutting-edge research aiming to understand the role of hormones in the control of social behaviors, specifically by utilizing the genetically-tractable teleost fish, medaka, and the current work is no exception. The overall claim they make, that estrogens modulate social behaviors in males and females is supported, with important caveats. For one, there is no evidence these estrogens are generated by "neurons" as would be assumed by their main claim that it is NEUROestrogens that drive this effect. While indeed the aromatase they have investigated is expressed solely in the brain, in most teleosts, brain aromatase is only present in glial cells (astrocytes, radial glia). The authors should change this description so as not to mislead the reader. Below I detail more specific strengths and weaknesses of this manuscript.
We thank the reviewer for this positive evaluation of our work and for the helpful comments and suggestions. Regarding the concern that the term “neuroestrogens” may be misleading, we addressed this in the previous revision by consistently replacing it throughout the manuscript with “brain-derived estrogens” or “brain estrogens.”
In addition, the following sentence was added to the Introduction (line 61): “In teleost brains, including those of medaka, aromatase is exclusively localized in radial glial cells, in contrast to its neuronal localization in rodent brains (Forlano et al., 2001; Diotel et al., 2010; Takeuchi and Okubo, 2013).”
Strengths:
Excellent use of the medaka model to disentangle the control of social behavior by sex steroid hormones
The findings are strong for the most part because deficits in the mutants are restored by the molecule (estrogens) that was no longer present due to the mutation
Presentation of the approach and findings are clear, allowing the reader to make their own inferences and compare them with the authors'
Includes multiple follow-up experiments, which leads to tests of internal replication and an impactful mechanistic proposal
Findings are provocative not just for teleost researchers, but for other species since, as the authors point out, the data suggest mechanisms of estrogenic control of social behaviors may be evolutionarily ancient
We thank the reviewer again for their positive evaluation of our work.
Weaknesses:
As stated in the summary, the authors are attributing the estrogen source to neurons and there isn't evidence this is the case. The impact of the findings doesn't rest on this either
As mentioned above, we addressed this in the previous revision by replacing “neuroestrogens” with “brain-derived estrogens” or “brain estrogens” throughout the manuscript. In addition, the following sentence was added to the Introduction (line 61): “In teleost brains, including those of medaka, aromatase is exclusively localized in radial glial cells, in contrast to its neuronal localization in rodent brains (Forlano et al., 2001; Diotel et al., 2010; Takeuchi and Okubo, 2013).”
The d4 versus d8 esr2a mutants showed different results for aggression. The meaning and implications of this finding are not discussed, leaving the reader wondering
This comment is the same as one raised in the first review (Reviewer #1’s comment 2 on weaknesses), which we already addressed in our initial revision. For the reviewer’s convenience, we provide the response below:
Line 300: As the reviewer correctly noted, circles were significantly reduced in mutant males of the Δ8 line, whereas no significant reduction was observed in those of the Δ4 line. However, a tendency toward reduction was evident in the Δ4 line (P = 0.1512), and both lines showed significant differences in fin displays. Based on these findings, we believe our conclusion that esr2a<sup>−/−</sup> males exhibit reduced aggression remains valid. To clarify this point and address potential reader concerns, we have revised the text as follows: “esr2a<sup>−/−</sup> males exhibited significantly fewer fin displays (P = 0.0461 and 0.0293 for Δ8 and Δ4 lines, respectively) and circles (P = 0.0446 and 0.1512 for Δ8 and Δ4 lines, respectively) than their wild-type siblings (Fig. 5L; Fig. S8E), suggesting less aggression” was edited to read “esr2a<sup>−/−</sup> males from both the Δ8 and Δ4 lines exhibited significantly fewer fin displays than their wild-type siblings (P = 0.0461 and 0.0293, respectively). Circles followed a similar pattern, with a significant reduction in the Δ8 line (P = 0.0446) and a comparable but non-significant decrease in the Δ4 line (P = 0.1512) (Figure 5L, Figure 5—figure supplement 3E), showing less aggression.”
Lack of attribution of previous published work from other research groups that would provide the proper context of the present study
This comment is also the same as one raised in the first review (Reviewer #1’s comment 3 on weaknesses). In our previous revision, in response to this comment, we cited the relevant references (Hallgren et al., 2006; O’Connell and Hofmann, 2012; Huffman et al., 2013; Jalabert et al., 2015; Yong et al., 2017; Alward et al., 2020; Ogino et al., 2023) in the appropriate sections. We also added the following new references and revised the Introduction and Discussion accordingly:
(2) Alward BA, Laud VA, Skalnik CJ, York RA, Juntti SA, Fernald RD. 2020. Modular genetic control of social status in a cichlid fish. Proceedings of the National Academy of Sciences of the United States of America 117:28167–28174. DOI: https://doi.org/10.1073/pnas.2008925117
(39) O’Connell LA, Hofmann HA. 2012. Social status predicts how sex steroid receptors regulate complex behavior across levels of biological organization. Endocrinology 153:1341–1351. DOI: https://doi.org/10.1210/en.2011-1663
(54) Yong L, Thet Z, Zhu Y. 2017. Genetic editing of the androgen receptor contributes to impaired male courtship behavior in zebrafish. Journal of Experimental Biology 220:3017–3021. DOI: https://doi.org/10.1242/jeb.161596
There are a surprising number of citations not included; some of the ones not included argue against the authors' claims that their findings were "contrary to expectation"
In our previous revision, we cited the relevant references (Hallgren et al., 2006; O’Connell and Hofmann, 2012; Huffman et al., 2013; Jalabert et al., 2015) in the Introduction. We also revised the text to remove phrases such as “contrary to expectation” and “unexpected.”
The experimental design for studying aggression in males has flaws. A standard test like a resident-intruder test should be used.
Following this comment, we have attempted additional aggression assays using the resident-intruder paradigm. However, these experiments did not produce consistent or interpretable results. As noted in our previous revision, medaka naturally form shoals and exhibit weak territoriality, and even slight differences in dominance between a resident and an intruder can markedly increase variability, reducing data reliability. Therefore, we believe that the approach used in the present study provides a more suitable assessment of aggression in medaka, regardless of territorial tendencies. We will continue to explore potential refinements in future studies and respectfully ask the reviewer to evaluate the present work based on the assay used here.
While they investigate males and females, there are fewer experiments and explanations for the female results, making it feel like a small addition or an aside
While we did not adopt this comment in our previous revision, we have carefully reconsidered the reviewers’ feedback and have now decided to remove the female data. This change allows us to present a more focused and cohesive story centered on males. The specific revisions are outlined below:
Abstract
Line 25: The text “, thereby revealing a previously unappreciated mode of action of brain-derived estrogens. We additionally show that female fish lacking Cyp19a1b are less receptive to male courtship and conversely court other females, highlighting the significance of brain-derived estrogens in establishing sex-typical behaviors in both sexes.” has been revised to “. Taken together, these findings reveal a previously unappreciated mode of action of brain-derived estrogens in shaping male-typical behaviors.”
Results
Line 88: The text “Loss of cyp19a1b function in these fish was verified by measuring brain and peripheral levels of sex steroids. As expected, brain estradiol-17β (E2) in both male and female homozygous mutants (cyp19a1b<sup>−/−</sup>) was significantly reduced to 16% and 50%, respectively, of the levels in their wild-type (cyp19a1b<sup>+/+</sup>) siblings (P = 0.0037, males; P = 0.0092, females) (Fig. 1, A and B). In males, brain E2 in heterozygotes (cyp19a1b<sup>+/−</sup>) was also reduced to 45% of the level in wild-type siblings (P = 0.0284) (Fig. 1A), indicating a dosage effect of cyp19a1b mutation. In contrast, peripheral E2 levels were unaltered in both cyp19a1b<sup>−/−</sup> males and females (Fig. S1, C and D), consistent with the expected functioning of Cyp19a1b primarily in the brain. Strikingly, brain levels of testosterone, as opposed to E2, increased 2.2-fold in cyp19a1b<sup>−/−</sup> males relative to wild-type siblings (P = 0.0006) (Fig. 1A). Similarly, brain 11KT levels in cyp19a1b<sup>−/−</sup> males and females increased 6.2- and 1.9-fold, respectively, versus wild-type siblings (P = 0.0007, males; P = 0.0316, females) (Fig. 1, A and B). These results show that cyp19a1b-deficient fish have reduced estrogen levels coupled with increased androgen levels in the brain, confirming the loss of cyp19a1b function. They also suggest that the majority of estrogens in the male brain and half of those in the female brain are synthesized locally in the brain. In addition, peripheral 11KT levels in cyp19a1b<sup>−/−</sup> males and females increased 3.7- and 1.8-fold, respectively (P = 0.0789, males; P = 0.0118, females) (Fig. S1, C and D), indicating peripheral influence in addition to central effects.” has been revised to “Loss of cyp19a1b function in these fish was verified by measuring brain and peripheral levels of sex steroids in males.
As expected, brain estradiol-17β (E2) in homozygous mutants (cyp19a1b<sup>−/−</sup>) was significantly reduced to 16% of the levels in wild-type (cyp19a1b<sup>+/+</sup>) siblings (P = 0.0037) (Figure 1A). Brain E2 in heterozygotes (cyp19a1b<sup>+/−</sup>) was also reduced to 45% of wild-type levels (P = 0.0284) (Figure 1A), indicating a dosage effect of the cyp19a1b mutation. In contrast, peripheral E2 levels were unaltered in cyp19a1b<sup>−/−</sup> males (Figure 1B), consistent with the expected functioning of Cyp19a1b primarily in the brain. Strikingly, brain testosterone levels, as opposed to E2, increased 2.2-fold in cyp19a1b<sup>−/−</sup> males relative to wild-type siblings (P = 0.0006) (Figure 1A). Similarly, brain 11KT levels increased 6.2-fold (P = 0.0007) (Figure 1A). These results indicate that cyp19a1b-deficient males have reduced estrogen coupled with elevated androgen levels in the brain, confirming the loss of cyp19a1b function. They also suggest that the majority of estrogens in the male brain are synthesized locally in the brain. Peripheral 11KT levels also increased 3.7-fold in cyp19a1b<sup>−/−</sup> males (P = 0.0789) (Figure 1B), indicating peripheral influence in addition to central effects.”
Line 211: “expression of vt in the pNVT of cyp19a1b<sup>−/−</sup> males was significantly reduced to 18% as compared with cyp19a1b<sup>+/+</sup> males (P = 0.0040), a level comparable to that observed in females” has been revised to “expression of vt in the pNVT of cyp19a1b<sup>−/−</sup> males was significantly reduced to 18% as compared with cyp19a1b<sup>+/+</sup> males (P = 0.0040).”
The subsection entitled “cyp19a1b-deficient females are less receptive to males and instead court other females,” which followed line 311, has been removed.
Discussion
The two paragraphs between lines 373 and 374, which addressed the female data, have been removed.
Materials and methods
Line 433: “males and females” has been changed to “males”.
Line 457: “focal fish” has been changed to “focal male”.
Line 458: “stimulus fish” has been changed to “stimulus female”.
Line 458: “Fig. 6, E and F, ” has been deleted.
Line 460: “; wild-type males in Fig. 6, A to C” has been deleted.
Line 466: The text “The period of interaction/recording was extended to 2 hours in tests of courtship displays received from the stimulus esr2b-deficient female and in tests of mating behavior between females, because they take longer to initiate courtship (12). In tests using an esr2b-deficient female as the stimulus fish, where the latency to spawn could not be calculated because these fish were unreceptive to males and did not spawn, the sexual motivation of the focal fish was instead assessed by counting the number of courtship displays and wrapping attempts in 30 min. The number of these mating acts was also counted in tests to evaluate the receptivity of females. In tests of mating behavior between two females, the stimulus female was marked with a small notch in the caudal fin to distinguish it from the focal female.” has been revised to “In tests using an esr2b-deficient female as the stimulus fish, the latency to spawn could not be calculated because the female was unreceptive to males and did not spawn. Therefore, the sexual motivation of the focal male was assessed by counting the number of courtship displays and wrapping attempts in 30 min. To evaluate courtship displays performed by stimulus esr2b-deficient females toward focal males, the recording period was extended to 2 hours, as these females take longer to initiate courtship (Nishiike et al., 2021). In all video analyses, the researcher was blind to the fish genotype and treatment.”
Line 499: “brains dissected from males and females of the cyp19a1b-deficient line (analysis of ara, arb, vt, gal, npba, and esr2b) and males of the esr1-, esr2a-, and esr2b-deficient lines” has been revised to “male brains from the cyp19a1b-deficient line (analysis of ara, arb, vt, and gal) and from the esr1-, esr2a-, and esr2b-deficient lines.”
Line 504: “After color development for 15 min (gal), 40 min (npba), 2 hours (vt), or overnight (ara, arb, and esr2b)” has been revised to “After color development for 15 min (gal), 2 hours (vt), or overnight (ara and arb).”
Line 516: “Thermo Fisher Scientific, Waltham, MA” has been changed to “Thermo Fisher Scientific” to avoid redundancy.
Line 565: The subsection entitled “Measurement of spatial distances between fish” has been removed.
Line 585: “6/10 cyp19a1b<sup>+/+</sup>, 3/10 cyp19a1b<sup>+/−</sup>, and 6/10 cyp19a1b<sup>−/−</sup> females were excluded in Fig. 6B;” has been deleted.
References
The following references have been removed:
Capel B. 2017. Vertebrate sex determination: evolutionary plasticity of a fundamental switch. Nature Reviews Genetics 18:675–689. DOI: https://doi.org/10.1038/nrg.2017.60
Hiraki T, Nakasone K, Hosono K, Kawabata Y, Nagahama Y, Okubo K. 2014. Neuropeptide B is female-specifically expressed in the telencephalic and preoptic nuclei of the medaka brain. Endocrinology 155:1021–1032. DOI: https://doi.org/10.1210/en.2013-1806
Juntti SA, Hilliard AT, Kent KR, Kumar A, Nguyen A, Jimenez MA, Loveland JL, Mourrain P, Fernald RD. 2016. A neural basis for control of cichlid female reproductive behavior by prostaglandin F2α. Current Biology 26:943–949. DOI: https://doi.org/10.1016/j.cub.2016.01.067
Kimchi T, Xu J, Dulac C. 2007. A functional circuit underlying male sexual behaviour in the female mouse brain. Nature 448:1009–1014. DOI: https://doi.org/10.1038/nature06089
Kobayashi M, Stacey N. 1993. Prostaglandin-induced female spawning behavior in goldfish (Carassius auratus) appears independent of ovarian influence. Hormones and Behavior 27:38–55. DOI: https://doi.org/10.1006/hbeh.1993.1004
Liu H, Todd EV, Lokman PM, Lamm MS, Godwin JR, Gemmell NJ. 2017. Sexual plasticity: a fishy tale. Molecular Reproduction and Development 84:171–194. DOI: https://doi.org/10.1002/mrd.22691
Munakata A, Kobayashi M. 2010. Endocrine control of sexual behavior in teleost fish. General and Comparative Endocrinology 165:456–468. DOI: https://doi.org/10.1016/j.ygcen.2009.04.011
Nugent BM, Wright CL, Shetty AC, Hodes GE, Lenz KM, Mahurkar A, Russo SJ, Devine SE, McCarthy MM. 2015. Brain feminization requires active repression of masculinization via DNA methylation. Nature Neuroscience 18:690–697. DOI: https://doi.org/10.1038/nn.3988
Shaw K, Therrien M, Lu C, Liu X, Trudeau VL. 2023. Mutation of brain aromatase disrupts spawning behavior and reproductive health in female zebrafish. Frontiers in Endocrinology 14:1225199. DOI: https://doi.org/10.3389/fendo.2023.1225199
Stacey NE. 1976. Effects of indomethacin and prostaglandins on the spawning behaviour of female goldfish. Prostaglandins 12:113–126. DOI: https://doi.org/10.1016/s0090-6980(76)80010-x
Figure 1
Panel B, which originally showed steroid levels in female brains, has been replaced with steroid levels in the periphery of males, originally presented in Figure S1, panel C. Accordingly, the legend “(A and B) Levels of E2, testosterone, and 11KT in the brain of adult cyp19a1b<sup>+/+</sup>, cyp19a1b<sup>+/−</sup>, and cyp19a1b<sup>−/−</sup> males (A) and females (B) (n = 3 per genotype and sex).” has been revised to “(A, B) Levels of E2, testosterone, and 11KT in the brain (A) and periphery (B) of adult cyp19a1b<sup>+/+</sup>, cyp19a1b<sup>+/−</sup>, and cyp19a1b<sup>−/−</sup> males (n = 3 per genotype).”
Figure 3
The female data have been deleted from Figure 3. The revised Figure 3 is presented.
The corresponding legend text has been revised as follows:
Line 862: “males and females (n = 4 and 5 per genotype for males and females, respectively)” has been changed to “males (n = 4 per genotype)”.
Line 864: “males and females (n = 4 except for cyp19a1b<sup>+/+</sup> males, where n = 3)” has been changed to “males (n = 3 and 4, respectively)”.
Figure 6
Figure 6 and its legend have been removed.
Figure 1—figure supplement 1
Panel C, showing male data, has been moved to Figure 1B, as described above, while panel D, showing female data, has been deleted. The corresponding legend “(C and D) Levels of E2, testosterone, and 11KT in the periphery of adult cyp19a1b<sup>+/+</sup>, cyp19a1b<sup>+/−</sup>, and cyp19a1b<sup>−/−</sup> males (C) and females (D) (n = 3 per genotype and sex). Statistical differences were assessed by Bonferroni’s post hoc test (C and D). Error bars represent SEM. *P < 0.05.” has also been removed.
Line 804: Following this change, the figure title has been updated from “Generation of cyp19a1b-deficient medaka and evaluation of peripheral sex steroid levels” to “Generation of cyp19a1b-deficient medaka.”
The statistics comparing "experimental to experimental" and "control to experimental" isn't appropriate
This comment is the same as one raised in the first review (Reviewer #1’s comment 7 on weaknesses), which we already addressed in our initial revision. For the reviewer’s convenience, we provide the response below:
The reviewer raised concerns about the statistical analysis used for Figures 4C and 4E, suggesting that Bonferroni’s test should be used instead of Dunnett’s test. However, Dunnett’s test is commonly used to compare treatment groups to a reference group that receives no treatment, as in our study. Since we do not compare the treated groups with each other, we believe Dunnett’s test is the most appropriate choice.
Line 576: The reviewer’s concern may have arisen from the phrase “comparisons between control and experimental groups” in the Materials and methods. We have revised it to “comparisons between untreated and E2-treated groups in Figure 4C and D” for clarity.
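To make the distinction in this response concrete, the sketch below enumerates the two comparison families at issue: many-to-one (each treated group against the untreated reference, the design that Dunnett's test addresses) and all pairwise. It does not implement Dunnett's test itself. The group labels, the helper names, and the 0.05 level are hypothetical and are not taken from the manuscript.

```python
from itertools import combinations


def many_to_one(groups, control):
    """Comparisons of every non-control group against the single control."""
    return [(control, g) for g in groups if g != control]


def all_pairs(groups):
    """Every unordered pair of groups, as in an all-pairwise post hoc test."""
    return list(combinations(groups, 2))


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance level under a Bonferroni correction."""
    return alpha / n_comparisons


# Hypothetical design: one untreated group and three E2-treated groups.
groups = ["untreated", "E2_dose1", "E2_dose2", "E2_dose3"]

vs_control = many_to_one(groups, control="untreated")
pairwise = all_pairs(groups)

# The many-to-one family is smaller, so a correction sized to it is less
# conservative than one sized to the all-pairwise family.
alpha_vs_control = bonferroni_alpha(0.05, len(vs_control))
alpha_pairwise = bonferroni_alpha(0.05, len(pairwise))
```

With k groups, the many-to-one family has k − 1 comparisons while the all-pairwise family has k(k − 1)/2, which is why the choice of family matters when treated groups are not compared with each other.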
Reviewer #3 (Public Review):
Summary:
Taking advantage of the existence in fish of two genes coding for estrogen synthase, the enzyme aromatase, one mostly expressed in the brain (Cyp19a1b) and the other mostly found in the gonads (Cyp19a1a), this study investigates the role of brain-derived estrogens in the control of sexual and aggressive behavior in medaka. The constitutive deletion of Cyp19a1b markedly reduced brain estrogen content in males and to a lesser extent in females. These effects are accompanied by reduced sexual and aggressive behavior in males and reduced preference for males in females. These effects are reversed by adult treatment with estrogens, supporting a role for estrogens. The deletion of Cyp19a1b is associated with a reduced expression of the genes coding for the two androgen receptors, ara and arb, in brain regions involved in the regulation of social behavior. The analysis of the gene expression and behavior of mutants of estrogen receptors indicates that these effects are likely mediated by the activation of the esr1 and esr2a isoforms. These results provide valuable insight into the role of estrogens in social behavior in the most abundant vertebrate taxon; however, the conclusion of brain-derived estrogens awaits definitive confirmation.
We thank this reviewer for their positive evaluation of our work and comments that have improved the manuscript.
Strength:
Evaluation of the role of brain "specific" Cyp19a1 in male teleost fish, which as a taxon are more abundant and yet proportionally less studied than the most common birds and rodents. Therefore, evaluating the generalizability of results from higher vertebrates is important. This approach also offers great potential to study the role of brain estrogen production in females, an understudied question in all taxa.
Results obtained from multiple mutant lines converge to show that estrogen signaling, likely synthesized in the brain, drives aspects of male sexual behavior.
The comparative discussion of the age-dependent abundance of brain aromatase in fish vs mammals and its role in organization vs activation is important beyond the study of the targeted species.
The authors have made important corrections to tone down some of the conclusions, which are now more in line with the results.
We thank the reviewer again for their positive evaluation of our work and the revisions we have made.
Weaknesses:
No evaluation of the mRNA and protein products of Cyp19a1b and ESR2a is presented, such that there is no proper demonstration that the mutation indeed leads to aromatase reduction. The conclusion that these effects are dependent on brain-derived estrogens is therefore only supported by measures of E2 with an EIA kit that is not validated. No discussion of these shortcomings is provided in the discussion, thus further weakening the conclusions of the manuscript.
In response to this and other comments, we have now provided direct validation that the cyp19a1b mutation in our medaka leads to loss of function. Real-time PCR analysis showed that cyp19a1b transcript levels in the brain were reduced by approximately half in cyp19a1b<sup>+/−</sup> males and were nearly absent in cyp19a1b<sup>−/−</sup> males, consistent with nonsense-mediated mRNA decay.
In addition, AlphaFold 3-based structural modeling indicated that the mutant Cyp19a1b protein lacks essential motifs, including the aromatic region and heme-binding loop, and exhibits severe conformational distortion (see figure; key structural features are annotated as follows: membrane helix (blue), aromatic region (red), and heme-binding loop (orange)).
Results:
Line 101: The following text has been added: “Loss of cyp19a1b function was further confirmed by measuring cyp19a1b transcript levels in the brain and by predicting the three-dimensional structure of the mutant protein. Real-time PCR revealed that transcript levels were reduced by half in cyp19a1b<sup>+/−</sup> males and were nearly undetectable in cyp19a1b<sup>−/−</sup> males, presumably as a result of nonsense-mediated mRNA decay (Lindeboom et al., 2019) (Figure 1C). The wild-type protein, modeled by AlphaFold 3, exhibited a typical cytochrome P450 fold, including the membrane helix, aromatic region, and heme-binding loop, all arranged in the expected configuration (Figure 1—figure supplement 1C). The mutant protein, in contrast, was severely truncated, retaining only the membrane helix (Figure 1—figure supplement 1C). The absence of essential domains strongly indicates that the allele encodes a nonfunctional Cyp19a1b protein. Together, transcript and structural analyses consistently demonstrate that the mutation generated in this study causes a complete loss of cyp19a1b function.”
Materials and methods
Line 438: A subsection entitled “Real-time PCR” has been added. The text of this subsection is as follows: “Total RNA was isolated from the brains of cyp19a1b<sup>+/+</sup>, cyp19a1b<sup>+/−</sup>, and cyp19a1b<sup>−/−</sup> males using the RNeasy Plus Universal Mini Kit (Qiagen, Hilden, Germany). cDNA was synthesized with the SuperScript VILO cDNA Synthesis Kit (Thermo Fisher Scientific, Waltham, MA). Real-time PCR was performed on the LightCycler 480 System II using the LightCycler 480 SYBR Green I Master (Roche Diagnostics). Melting curve analysis was conducted to verify that a single amplicon was obtained in each sample. The β-actin gene (actb; GenBank accession number NM_001104808) was used to normalize the levels of target transcripts. The primers used for real-time PCR are shown in Supplementary file 2.”
Line 448: A subsection entitled “Protein structure prediction” has been added. The text of this subsection is as follows: “Structural predictions of Cyp19a1b proteins were conducted using AlphaFold 3 (Abramson et al., 2024). Amino acid sequences corresponding to the wild-type allele and the mutant allele generated in this study were submitted to the AlphaFold 3 prediction server. The resulting models were visualized with PyMOL (Schrödinger, New York, NY), and key structural features, including the membrane helix, aromatic region, and heme-binding loop, were annotated.”
References
The following two references have been added:
Abramson J, Adler J, Dunger J, Evans R, Green T, Pritzel A, Ronneberger O, Willmore L, Ballard AJ, Bambrick J, Bodenstein SW, Evans DA, Hung CC, O'Neill M, Reiman D, Tunyasuvunakool K, Wu Z, Žemgulytė A, Arvaniti E, Beattie C, Bertolli O, Bridgland A, Cherepanov A, Congreve M, Cowen-Rivers AI, Cowie A, Figurnov M, Fuchs FB, Gladman H, Jain R, Khan YA, Low CMR, Perlin K, Potapenko A, Savy P, Singh S, Stecula A, Thillaisundaram A, Tong C, Yakneen S, Zhong ED, Zielinski M, Žídek A, Bapst V, Kohli P, Jaderberg M, Hassabis D, Jumper JM. 2024. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 630:493–500. DOI: https://doi.org/10.1038/s41586-024-07487-w
Lindeboom RGH, Vermeulen M, Lehner B, Supek F. 2019. The impact of nonsense-mediated mRNA decay on genetic disease, gene editing and cancer immunotherapy. Nature Genetics 51:1645–1651. DOI: https://doi.org/10.1038/s41588-019-0517-5
Figure 1
The real-time PCR results described above have been incorporated in Figure 1, panel C, with the corresponding legend provided below (line 788).
(C) Brain cyp19a1b transcript levels in cyp19a1b<sup>+/+</sup>, cyp19a1b<sup>+/−</sup>, and cyp19a1b<sup>−/−</sup> males (n = 6 per genotype). Mean value for cyp19a1b<sup>+/+</sup> males was arbitrarily set to 1.
The subsequent panels have been renumbered accordingly. The entirety of the revised Figure 1 is presented.
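For readers unfamiliar with how relative transcript levels of this kind are typically derived, a minimal sketch follows. It assumes the common 2^-ΔCt calculation with actb as the reference gene and roughly 100% amplification efficiency; the response does not state which calculation was actually used, and the Ct values and function names here are hypothetical.

```python
from statistics import mean


def normalized_levels(ct_target, ct_reference):
    """Target level relative to the reference gene for each sample.

    Uses 2^-(Ct_target - Ct_reference), which assumes that the amount of
    product doubles every cycle for both amplicons.
    """
    return [2.0 ** -(t - r) for t, r in zip(ct_target, ct_reference)]


def scale_to_calibrator(levels, calibrator_levels):
    """Rescale levels so that the calibrator group has a mean of 1."""
    calibrator_mean = mean(calibrator_levels)
    return [x / calibrator_mean for x in levels]


# Hypothetical Ct values (target gene, then reference gene) per sample.
wt_raw = normalized_levels([24.0, 24.0], [18.0, 18.0])
het_raw = normalized_levels([25.0, 25.0], [18.0, 18.0])

# Express both groups relative to the wild-type mean.
wt_rel = scale_to_calibrator(wt_raw, wt_raw)
het_rel = scale_to_calibrator(het_raw, wt_raw)
```

In this toy input the heterozygous samples reach threshold one cycle later than wild type at equal reference-gene Ct, which under the stated assumptions corresponds to half the wild-type transcript level.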
Figure 1—figure supplement 1
The AlphaFold 3-generated structural models described above have been incorporated in Figure 1—figure supplement 1, panel C, with the corresponding legend provided below (line 811).
(C) Predicted three-dimensional structures of wild-type (left) and mutant (right) Cyp19a1b proteins. Key structural features are annotated as follows: membrane helix (blue), aromatic region (red), and heme-binding loop (orange).
The entirety of the revised Figure 1—figure supplement 1 is presented.
The information on the primers used for real-time PCR has been included in Supplementary file 2.
The functional deficiency of esr2a was already addressed in the previous revision. For clarity, we have reproduced the relevant information here.
A previous study reported that female medaka lacking esr2a fail to release eggs due to oviduct atresia (Kayo et al., 2019, Sci Rep 9:8868). Similarly, in this study, some esr2a-deficient females exhibited spawning behavior but were unable to release eggs, although the sample size was limited (Δ8 line: 2/3; Δ4 line: 1/1). In contrast, this was not observed in wild-type females (Δ8 line: 0/12; Δ4 line: 0/11). These results support the effective loss of esr2a function. To incorporate this information into the manuscript, the following text has been added to the Materials and methods (line 423): “A previous study reported that esr2a-deficient female medaka cannot release eggs due to oviduct atresia (Kayo et al., 2019). Likewise, some esr2a-deficient females generated in this study, despite the limited sample size, exhibited spawning behavior but were unable to release eggs (Δ8 line: 2/3; Δ4 line: 1/1), while such failure was not observed in wild-type females (Δ8 line: 0/12; Δ4 line: 0/11). These results support the effective loss of esr2a function.”
Most experiments are weakly powered (low sample size).
This comment is essentially the same as one raised in the first review (Reviewer #3’s comment 7 on weaknesses). We acknowledge the reviewer’s concern that the histological analyses were weakly powered due to the limited sample size. In our earlier revision, we responded as follows:
Histological analyses were conducted with a relatively small sample size, as our previous experience suggested that interindividual variability in the results would not be substantial. Since significant differences were detected in many analyses, further increasing the sample size was deemed unnecessary.
The variability of the mRNA content for the same target gene between experiments (genotype comparison vs E2 treatment comparison) raises questions about the reproducibility of the data (apparent disappearance of genotype effect).
This comment is the same as one raised in the first review (Reviewer #3’s comment 8 on weaknesses), which we already addressed in our initial revision. For the reviewer’s convenience, we provide the response below:
As the reviewer pointed out, the overall area of ara expression is larger in Figure 2J than in Figure 2F. However, the relative area ratios of ara expression among brain nuclei are consistent between the two figures, indicating the reproducibility of the results. Thus, this difference is unlikely to affect the conclusions of this study.
Additionally, the differences in ara expression in pPPp and arb expression in aPPp between wild-type and cyp19a1b-deficient males appear less pronounced in Figures 2J and 2K than in Figures 2F and 2H. This is likely attributable to the smaller sample size used in the experiments for Figures 2J and 2K, resulting in less distinct differences. However, as the same genotype-dependent trends are observed in both sets of figures, the conclusion that ara and arb expression is reduced in cyp19a1b-deficient male brains remains valid.
Conclusions:
Overall, the claims regarding the role of estrogens originating in the brain in male sexual behavior are supported by converging evidence from multiple mutant lines. The evidence for a role of brain-derived estrogens in gene expression in the brain is weaker, as are the results in females.
We appreciate the reviewer’s positive evaluation of our findings on male behavior. The concern regarding the role of brain-derived estrogens in gene expression has been addressed in our rebuttal, and the female data have been removed so that the analysis now focuses on males. The specific revisions for removing the female data are described in Response to reviewer #1’s comment 6 on weaknesses.
Recommendations For The Authors:
Reviewer #1 (Recommendations For The Authors):
The manuscript is improved slightly. I am thankful the authors addressed some concerns, but for several concerns the referees raised, the authors acknowledged them yet did not make corresponding changes to the manuscript or disagreed that they were issues at all without explanation. All reviewers had issues with the imbalanced focus on males versus females and the male aggression assay. Yet, they did not perform additional experiments or even make changes to the framing and scope of the manuscript. If the authors had removed the female data, they may have had a more cohesive story, but then they would still be left with inadequate behavior assays in the males. If the authors don't have the time or resources to perform the additional work, then they should have said so. However, the work would be incomplete relative to the claims. That is a key point here. If they change their scope and claims, the authors avoid overstating their findings. I want to see this work published because I believe it moves the field forward. But the authors need to be realistic in their interpretations of their data.
In response to this and related comments, we have removed the female data and focused the manuscript on analyses in males. The specific revisions are described in Response to reviewer #1’s comment 6 on weaknesses. Additionally, we have validated that the cyp19a1b mutation in our medaka leads to loss of function (see Response to reviewer #3’s comment 1 on weaknesses), which further strengthens the reliability of our conclusions regarding male behavior.
I agree with the reviewer who said we need to see validation of the absence of functional cyp19a1b in the brain. However, the results from staining for the protein and performing in situ could be quizzical. Indeed, there aren't antibodies that could distinguish between aromatase a and b, and it is not uncommon for expression of a mutated gene to be normal. One approach they could take is to measure aromatase activity, but they are *sort of* doing that by measuring brain E2. It's not perfect, but we teleost folks are limited in these areas. At the very least, they should show the predicted protein structure of the mutated aromatase alleles. It could show clearly that the tertiary structure is utterly absent, giving more support to the fact that their aromatase gene is non-functional.
As noted above, we have further validated the loss of cyp19a1b function by measuring cyp19a1b transcript levels in the brain and predicting the three-dimensional structure of the mutant protein. These analyses confirmed that cyp19a1b function is indeed lost, thereby increasing the reliability of our conclusions. For further details, please refer to Response to reviewer #3’s comment 1 on weaknesses.
With all of this said, the work is important, and it is possible that with a reframing of the impact of their work in the context of their findings, I could consider the work complete. I think with a proper reframing, the work is still impactful.
In accordance with this feedback, and as described above, we have reframed the manuscript by removing the female data and focusing exclusively on males. This revision clarifies the scope of our study and reinforces the support for our conclusions. For further details, please refer to Response to reviewer #1’s comment 6 on weaknesses.
(1) Clearly state in the Figure 1 legend that each data point for male aggressive behaviors represents the total # of behaviors calculated over the 4 males in each experimental tank.
In response to this comment, we have revised the legend of Figure 1K (line 797). The original legend, “(K) Total number of each aggressive act observed among cyp19a1b<sup>+/+</sup>, cyp19a1b<sup>+/−</sup>, or cyp19a1<sup>−/−</sup> males in the tank (n = 6, 7, and 5, respectively),” has been updated to “(K) Total number of each aggressive act performed by cyp19a1b<sup>+/+</sup>, cyp19a1b<sup>+/−</sup>, and cyp19a1b<sup>−/−</sup> males. Each data point represents the sum of acts recorded for the 4 males of the same genotype in a single tank (n = 6, 7, and 5 tanks, respectively).” This clarifies that each data point reflects the total behaviors of the 4 males within each tank.
(2) The authors wrote under "Response to reviewer #1's major comment "...the development of male behaviors may require moderate neuroestrogen levels that are sufficient to induce the expression of ara and arb, but not esr2b, in the underlying neural circuitry": "This may account for the lack of aggression recovery in E2-treated cyp19a1b-deficient males in this study.".
What is meant by the latter statement? What accounts for the lack of aggression? The lack of increase in esr2b? Please clarify.
Line 365: In response to this comment, “This may account for the lack of aggression recovery in E2-treated cyp19a1b-deficient males in this study.” has been revised to “Considering this, the lack of aggression recovery in E2-treated cyp19a1b-deficient males in this study may be explained by the possibility that the E2 dose used was sufficient to induce not only ara and arb but also esr2b expression in aggression-relevant circuits, which potentially suppressed aggression.”
This revision clarifies that, while moderate brain estrogen levels are sufficient to promote male behaviors via induction of ara and arb, the E2 dose used in this study may have additionally induced esr2b in circuits relevant to aggression, potentially underlying the lack of aggression recovery.
(3) This is a continuation of my comment/concern directly above. If the induction of ara and arb aren't enough, then how can, as the authors state, androgen signaling be the primary driver of these behaviors?
In response to this follow-up comment, we would like to clarify that, as described above, the lack of aggression recovery in E2-treated cyp19a1b-deficient males is not due to insufficient induction of ara and arb, but instead is likely because esr2b was also induced in aggression-relevant circuits, which may have suppressed aggression. Therefore, the concern that androgen signaling cannot be the primary driver of these behaviors is not applicable.
(4) The authors' point about sticking with the terminology for the ar genes as "ara" and "arb" is not convincing. The whole point of needing a change to match the field of neuroendocrinology as a whole (that is, across all vertebrates) is researchers, especially those with high standing like the Okubo group, adopt the new terminology. Indeed, the Okubo group is THE leader in medaka neuroendocrinology. It would go a long way if they began adopting the new terminology of "ar1" and "ar2". I understand this may be laborious to a degree, and each group can choose to use their terminology, but I'd be remiss if I didn't express my opinion that changing the terminology could help our field as a whole.
We sincerely appreciate the reviewer’s thoughtful comments regarding nomenclature consistency in vertebrate neuroendocrinology. We understand the motivation behind the suggestion to adopt ar1 and ar2. However, we consider the established nomenclature of ara and arb to be more appropriate for the following reasons.
First, adopting the ar1/ar2 nomenclature would introduce a discrepancy between gene and protein symbols. According to the NCBI International Protein Nomenclature Guidelines (Section 2B. Abbreviations and symbols; https://www.ncbi.nlm.nih.gov/genbank/internatprot_nomenguide/), the ZFIN Zebrafish Nomenclature Conventions (Section 2. PROTEINS; https://zfin.atlassian.net/wiki/spaces/general/pages/1818394635/ZFIN+Zebrafish+Nomenclature+Conventions), and the author guidelines of many journals (e.g., https://academic.oup.com/molehr/pages/Gene_And_Protein_Nomenclature), gene and protein symbols should be identical (with proteins designated in non-italic font and with the first letter capitalized). Maintaining consistency between gene and protein symbols helps avoid unnecessary confusion. The ara/arb nomenclature allows this, whereas ar1/ar2 does not.
Second, the two androgen receptor genes in teleosts are paralogs derived from the third round of whole-genome duplication that occurred early in teleost evolution. For such duplicated genes, the ZFIN Zebrafish Nomenclature Conventions (Section 1.2. Duplicated genes) recommend appending the suffixes “a” and “b” to the approved symbol of the human or mouse ortholog. This convention clearly indicates that these genes are whole-genome duplication paralogs and provides an intuitive way to represent orthologous and paralogous relationships between teleost genes and those of other vertebrates. As a result, it has been widely adopted, and we consider it logical and beneficial to apply the same principle to androgen receptors.
In light of these considerations, we respectfully maintain that the ara/arb nomenclature is more suitable for the present manuscript than the alternative ar1/ar2 system.
(5) In the discussion please discuss these potentially unexpected findings.
(a) gal was unaffected in female cyp19a1 mutants, but they exhibit mating behaviors towards females. Given gal is higher in males and these females act like females, what does this mean about the function of gal/its utility in being a male-specific marker (is it one??)?
(b) esr2b expression is higher in female cyp19a1 mutants. This is also unexpected, given that esr2b is required for female-typical mating, is higher in females than in males, and is increased by E2. Please explain what this means for our understanding of what esr2b expression tells us.
We thank the reviewer for the insightful comments. As the female data have been removed from the manuscript, discussion of these findings in female cyp19a1b mutants is no longer necessary.
Reviewer #3 (Recommendations For The Authors):
The authors have responded to a number of the reviewers' comments; notably, they provided missing methodological information and rephrased the text. However, the authors have not addressed the main issues raised by the reviewers. Notably, it is regrettable that the reduced amount of brain aromatase cannot be confirmed; this seems to be the primary step when validating a new mutant. Even if the protein products of the two genes may not be discriminated (which I can understand), it should be possible to evaluate the expression of a common messenger and/or peptide and confirm that aromatase expression is reduced in the brain. Since Cyp19a1b is relatively more abundant in the brain than Cyp19a1a, this would strengthen the conclusion and provide confidence that the mutant indeed does silence aromatase expression in the brain. Although these shortcomings are acknowledged in the rebuttal letter, this is not mentioned in the discussion. Doing so would make the manuscript more transparent and clearer.
As noted in Response to reviewer #3’s comment 1 on weaknesses, we have validated the loss of Cyp19a1b function by measuring its transcript levels in the brain and predicting the three-dimensional structure of the mutant protein. These analyses confirmed that Cyp19a1b function is indeed lost, thereby increasing the reliability of our conclusions.
FigS1 - panels C&D please indicate in which tissue were hormones measured. Blood?
We thank the reviewer for pointing this out. In our study, “peripheral” refers to the caudal half of the body excluding the head and visceral organs, not blood. Accordingly, we have revised the figure legend and the description in the Materials and Methods section as follows:
Legend for Figure 1B (line 787) now reads: “Levels of E2, testosterone, and 11KT in the brain (A) and peripheral tissues (caudal half of the body) (B) of adult cyp19a1b<sup>+/+</sup>, cyp19a1b<sup>+/−</sup>, and cyp19a1b<sup>−/−</sup> males (n = 3 per genotype).”
Materials and methods (line 431): The sentence “Total lipids were extracted from the brain and peripheral tissues (from the caudal half) of” has been revised to “Total lipids were extracted from the brain and from peripheral tissues, specifically the caudal half of the body excluding the head and visceral organs, of.”
Additional Alterations:
We have reformatted the text and supporting materials to comply with the journal’s Author Guidelines. The following changes have been made:
(1) Figures and supplementary files are now provided separately from the main text.
(2) The title page has been reformatted without any changes to its content.
(3) In-text citations have been changed from numerical references to the author–year format.
(4) Figure labels have been revised from “Fig. 1,” “Fig. S1,” etc., to “Figure 1,” “Figure 1—figure supplement 1,” etc.
(5) Table labels have been revised from “Table S1,” etc., to “Supplementary file 1,” etc.
(6) Line 324: The typo “is” has been corrected to “are”.
(7) Line 382: The section heading “Materials and Methods” has been changed to “Materials and methods” (lowercase “m”).
(8) Line 383: The Key Resources Table has been placed at the beginning of the Materials and methods section.
(9) Line 389: The sentence “Sexually mature adults (2–6 months) were used for experiments, and tissues were consistently sampled 1–5 hours after lights on.” has been revised to “Sexually mature adults (2–6 months) were used for experiments and assigned randomly to experimental groups. Tissues were consistently sampled 1–5 hours after lights on.”
(10) Line 393: The sentence “All fish were handled in accordance with the guidelines of the Institutional Animal Care and Use Committee of the University of Tokyo.” has been removed.
(11) Line 589: The following sentence has been added: “No power analysis was conducted due to the lack of relevant data; sample size was estimated based on previous studies reporting inter-individual variation in behavior and neural gene expression in medaka.”
(12) Line 598: The reference list has been reordered from numerical sequence to alphabetical order by author.
(13) In the figure legends, notations such as “A and B” have been revised to “A, B.”
-
www.medrxiv.org
-
eLife Assessment
This paper describes a useful Bayesian model to estimate the probabilities of access, use, and use given access of insecticide-treated bed nets (ITNs), by using sub-national cross-sectional survey data and the annual number of ITNs received at the country level. The authors provide convincing evidence to support their modeling approach, which could be enhanced by more validation and exploration of model assumptions.
-
Reviewer #1 (Public review):
Summary:
This paper provides a novel method to improve the accuracy of predictions of the impact of ITN strategies, by using sub-national estimates of the duration of ITN access and use over time from cross-sectional survey data and annual country ITNs received.
Strengths:
The approach is novel, makes use of available data, and has considered all of the relevant components of ITN distributions.
Weaknesses:
(1) The main message of the paper was not very clear, and did not seem to fit the title. The title focuses on sub-national tailoring of ITN, but the abstract did not feature results directly about SNT. It was not very clear what the main result of the paper was - there are several ITN observations in the results and discussion. Most did not seem to be directly about SNT, but rather sub-national differences in use and access were accounted for in the analyses. It was not clear if the same conclusions would be reached without accounting for sub-national differences, but the estimates and predictions could be expected to be more accurate.
(2) Some of the results seemed to me to be apparent even without a modelling exercise (eg high coverage could not be maintained between campaigns, use would be higher with 2-yearly distributions rather than 3-yearly) or were not in themselves new insights (eg estimates of the duration of use). It would be helpful to clearly state what the novel results are in the abstract, the first paragraph of the discussion and the conclusions, and to make sure that the title is consistent.
(3) On L236, the link to SNT is stated: "the models indicate trends that can support sub-national tailoring of ITNs". They could indeed, but SNT itself is not done in this paper. It seems to be about improving sub-national predictions of the impact of single ITN strategies, by taking into account sub-national variation in access and use duration. This is useful, and the model developed has novel aspects.
(4) Individual countries may have records on when nets were distributed to the regions rather than needing to use the annual country number of nets together with the DHS data. It could be helpful to say what the analysis steps would be in that case.
(5) There were several assumptions that needed to be made in building the model. There is some validation of the timing of the distributions (L633 "verified where possible through discussion with interested parties nationally and internationally") and the fit of estimated access and use to survey data, and agreement between predictions of prevalence and MAP estimates. It would be helpful to say which assumptions are important for the results (and would be key knowledge gaps) and which would not make a difference. It might be possible to validate the net timing model using a country where net distributions are known reasonably well.
(6) What was assumed about what happens to old nets after a mass campaign was not clear. This assumption is likely to affect the predictions of access for the biennial distributions.
(7) L312 and elsewhere: That use given access declines with net age is plausible. However, I wondered if this could be partly a consequence of the assumptions in the model (eg the two exponential decays for access and use, the possible assumption that new nets displace the current ones when there is a mass campaign).
(8) The Methods section on Estimating historical use and access seemed to be aimed at readers familiar with formulae, but I think it could lose other interested readers. It could be useful to explain a little more about what is happening at each step and also why.
(9) The model was fitted to MAP estimates of PfPR2-10, which themselves come from a model. It may be that there is different uncertainty in the MAP estimates for different regions. I couldn't see this on the graph, but maybe the uncertainty is small. Was this taken into account in the fitting?
(10) Was uncertainty from each estimated component integrated into the other components?
(11) Eyeballing Figure 2 (Burkina Faso), there is a general pattern of decline in all the regions, some differences between the regions and some differences in how well the model fits between the regions. If possible, it could be helpful to say how much better the fit was when using region-specific compared to countrywide parameter values for access and use, and how different the results would be.
(12) The question of moving from a campaign every three to every two years may not be the most pertinent question in the current funding landscape. I realise that a paper is in development for a long time, but it would be helpful to comment on what else the model could be used for when fewer rather than more nets are likely to be available.
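Point (7) above can be made concrete with a short derivation. This is only a sketch, assuming that access and use each follow a single exponential decay after a campaign and that use given access is their ratio. The symbols $a_0$, $u_0$, $\tau_a$ and $\tau_u$ are introduced here for illustration and are not taken from the manuscript.

```latex
\begin{align}
A(t) &= a_0 \, e^{-t/\tau_a}, \qquad U(t) = u_0 \, e^{-t/\tau_u}, \\
\frac{U(t)}{A(t)} &= \frac{u_0}{a_0} \exp\!\left[-t\left(\frac{1}{\tau_u} - \frac{1}{\tau_a}\right)\right].
\end{align}
```

Under these assumptions the ratio is monotone in net age $t$ and declines whenever $\tau_u < \tau_a$. Some of the apparent decline in use given access could therefore follow from the functional form alone, which is why it would help to know how sensitive this result is to the decay assumptions.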
-
Reviewer #2 (Public review):
Summary:
The authors design a custom Bayesian model to estimate the probabilities of access, use and use given access of insecticide-treated nets in six African countries, providing sub-national estimates and inferring the average duration of ITN use and access. An individual-based model was employed to simulate malaria epidemics and estimate the effectiveness of different ITN distribution strategies. The study finds that the mean probability of use or access did not reach 80% (the universal coverage level formerly targeted by WHO) in any of the regions, even for biennial campaigns. It also shows that switching from triennial to biennial distribution campaigns increases population use by 7.9%, and it evaluates the impact of employing more efficient ITNs on P. falciparum prevalence.
Strengths:
The authors developed a data-driven model that accounts for data collection imperfections and sources of uncertainty while differentiating between ITN use and access. They developed a methodology to infer the timing of a mass campaign from publicly available data instead of assuming fixed dates. The probability of use given access allows for determining the regions where ITN distribution is least effective. This work can help better inform future interventions by identifying regions where increasing mass campaign frequency or employing better ITNs are most effective. Finally, in addition to insights on ITN access and use for the six countries analyzed, the paper contributes a methodological framework that can likely be extended to other countries.
Weaknesses:
Since the models employed are rather complex, the description of the methodology may be hard to follow for most readers. In addition, the models rely on many assumptions, including:
(1) Exponential decay of ITN use/access.
(2) The decay rates for the probability of the ITN repelling and killing a mosquito are the same.
(3) At any given time, all individuals in the same administrative unit have the same probability of using a net.
(4) ITN use/access decay models do not depend on the distribution strategy (e.g. biennial vs triennial distribution).
(5) The Bayesian model assumes some narrow prior distributions.
The impact of these assumptions on the estimated parameters is not explored in the paper, and no sensitivity analyses are performed, although some limitations are discussed.
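As an illustration of the kind of sensitivity check suggested here, the sketch below varies the mean duration of access under assumption (1) and compares time-averaged access for triennial and biennial campaigns. It is not the authors' model. It assumes that each campaign resets access to the same starting level `P0` and that older nets contribute nothing afterwards. The function names and all parameter values are hypothetical.

```python
import math


def access_after_campaign(p0, mean_duration_years, t_years):
    """Probability of access t years after a campaign, assuming exponential decay."""
    return p0 * math.exp(-t_years / mean_duration_years)


def mean_access_over_cycle(p0, mean_duration_years, cycle_years):
    """Time-averaged access over one campaign cycle.

    Closed-form average of p0 * exp(-t / tau) for t in [0, cycle_years].
    """
    tau = mean_duration_years
    return p0 * (tau / cycle_years) * (1.0 - math.exp(-cycle_years / tau))


# Hypothetical values chosen only for illustration; not estimates from the paper.
P0 = 0.8
for tau in (1.0, 2.0, 3.0):
    triennial = mean_access_over_cycle(P0, tau, 3.0)
    biennial = mean_access_over_cycle(P0, tau, 2.0)
    gain = biennial - triennial
    print(f"tau={tau:.1f} y  triennial={triennial:.3f}  biennial={biennial:.3f}  gain={gain:.3f}")
```

Repeating the comparison with a different decay shape, or with a decay rate that depends on campaign frequency as questioned in assumption (4), would show how much of the estimated benefit of biennial campaigns depends on these choices.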
-
Author response:
We would like to thank both reviewers for taking the time to review the manuscript in detail. Your comments have been extremely useful and constructive. A revised version of the manuscript will seek to address the weaknesses raised, clarifying the reasons for the assumptions made, the impact they have and how they influence the policy implications of the work. We will clarify the language to differentiate the work from the standard sub-national tailoring that is typically conducted to support National Malaria Programmes, and emphasise why our mechanistic model can provide more information than simple summary statistics.
-
eLife Assessment
The authors provide a useful integrated analytical approach to investigating MASLD focused on diverse multiomic integration methods. The strength of evidence for this new resource is solid, as analyses highlight the importance of previously-described pathophysiologic processes, as well as unveil several new mechanisms as key features of MASLD in obese patients.
-
Reviewer #1 (Public review):
Summary:
Metabolic dysfunction-associated steatotic liver disease (MASLD) ranges from simple steatosis to steatohepatitis, fibrosis/cirrhosis, and hepatocellular carcinoma. In the current study, the authors aimed to determine the early molecular signatures differentiating patients with MASLD-associated fibrosis from patients with early MASLD but no symptoms. The authors recruited 109 obese individuals before bariatric surgery. They separated the cohort into no MASLD (without histological abnormalities) and MASLD. The liver samples were then subjected to transcriptomic and metabolomic analysis. The serum samples were subjected to metabolomic analysis. The authors identified dysregulated lipid metabolism, including glyceride lipids, in the liver samples of MASLD patients compared to the no-MASLD ones. Circulating metabolomic changes in lipid profiles correlated only slightly with MASLD, possibly because the no-MASLD samples were also derived from obese patients. Several genes involved in lipid droplet formation were also found to be elevated in MASLD patients. In addition, elevated levels of amino acids, which are possibly related to collagen synthesis, were observed in MASLD patients. Several antioxidant metabolites were increased in MASLD patients. Furthermore, dysregulated genes involved in mitochondrial function and autophagy were identified in MASLD patients, likely linking oxidative stress to MASLD progression. The authors then determined the representative gene signatures in the development of fibrosis by comparing this cohort with two other published cohorts. Top enriched pathways in fibrotic patients included GTPase signaling and innate immune responses, suggesting the involvement of GTPases in MASLD progression to fibrosis. The authors then challenged a patient-derived 3D spheroid system with a dual PPARa/d agonist and found that this treatment restored the expression levels of GTPase-related genes in MASLD 3D spheroids.
In conclusion, the authors suggested the involvement of upregulated GTPase-related genes during fibrosis initiation.
Concerns from first round of review:
(1) A recent study, via proteomic and transcriptomic analysis, revealed that four proteins (ADAMTSL2, AKR1B10, CFHR4 and TREM2) could be used to identify MASLD patients at risk of steatohepatitis (PMID: 37037945). It is not clear why the authors did not include this study in their comparison.
(2) The authors recruited 109 patients but only performed transcriptomic and metabolomic analysis in 94 liver samples. Why did the authors exclude other samples?
(3) The authors mentioned clinical data in Table 1 but did not present the table in this manuscript.
(4) The generated metabolomic data could be a very useful resource to the MASLD community. However, it is very confusing how the data was generated in those supplemental tables. There is no clear labeling of human clinical information in those tables. Also, what do those values mean in columns 47-154? This reviewer assumed that they are the raw data of metabolomic analysis in plasma samples. However, without clear clinical information in these patients, it is impossible that any scientist can use the data to reproduce the authors' findings.
(5) In Fig. 5B, the authors excluded the steatosis and fibrosis overlapped genes. Steatosis and fibrosis specific genes could simply reflect the outcomes rather than causes. In this case, the obtained results might not identify the gene signatures related to fibrosis initiation.
(6) In Fig. 6D, the authors used 3D liver spheroids to validate their findings. However, there are no images showing 3D liver spheroid formation before and after PPARa/d agonist treatment. It is not clear whether the 3D liver spheroids were successfully established.
(7) The authors suggested that targeting LX-2 cells with Rac1 and Cdc42 inhibitors could reduce collagen production. Did the authors observe upregulation of these two genes at the mRNA and protein levels in their cohort when comparing MASLD patients with and without fibrosis?
(8) Did the authors observe that the expression levels of Rac1 and Cdc42 are correlated with fibrosis progression in MASLD patients?
(9) Other studies have revealed several metabolite changes related to MASLD progression (PMID: 35434590, PMID: 22364559). However, the authors did not discuss the discrepancies between their findings with the previous studies.
Significance:
Overall, the current study might provide some new resources regarding transcriptomic and metabolomic data derived from obese patients with and without MASLD. The MASLD research community will be interested in the resource data.
Comments on revised version:
Thank you for the authors' responses to my concerns. I do not have any further comments.
-
Reviewer #2 (Public review):
In this paper, Kaldis and collaborators investigate the molecular heterogeneity of a cohort of 109 morbidly obese patients, focusing on liver transcriptomics and on metabolomic analysis of liver and serum. The main finding (i.e. upregulation of GTPase-coding genes) was validated in spheroids and a human HSC cell line. As these proteins are involved in critical cellular functions related to metabolism and cytoskeleton dynamics, these findings shed light on their involvement in human liver pathology, which has so far been poorly (or not at all) documented. This is an interesting addition to the current knowledge about chronic liver pathology and warrants further in-depth molecular investigations to address molecular mechanisms of action (cellular specificity, GTPase-driven pathways...).
Strengths:
Using a well-characterized patient cohort of moderate size, the study provides high-quality transcriptomic and metabolomic data with adequate statistical corrections, which are a very useful resource for the community. Mechanistic experiments usefully hint at novel druggable targets in the early steps of fibrosis, hence probably in hepatic stellate cell activation.
Weaknesses:
Cross-comparisons with other cohorts are informative but of limited interest due to patient classification issues inherent to histological staging practices. The lack of correlation between transcriptomic and metabolomic data is disappointing but expected, given the systemic nature of metabolomic analysis, and was also observed in recently published papers.
Comments on revised version:
I have no further comments on this amended version, aside from suggesting that the authors add (if known) the time at which biopsies were collected. Time of day is an important yet often overlooked source of variation in gene expression. Along the same lines, the fasting imposed on bariatric surgery patients is also a source of variation in gene expression and metabolite abundance. It is hoped that future investigations will more precisely characterize the role of the newly identified targets in MASLD.
-
Reviewer #3 (Public review):
Summary:
Metabolic dysfunction-associated steatotic liver disease (MASLD) describes a spectrum of progressive liver pathologies linked to lifestyle-associated metabolic alterations (such as increased body weight and elevated blood sugar levels), ranging from steatosis through steatohepatitis to fibrosis and finally end-stage complications, such as liver failure and hepatocellular carcinoma. Treatment options for MASLD include diet adjustments, weight loss, and the thyroid hormone receptor-β (THR-β) agonist resmetirom, but remain limited at this stage, motivating further studies to elucidate molecular disease mechanisms and identify novel therapeutic targets.
In their present study, the authors aim to identify early molecular changes in MASLD linked to obesity. To this end, they study a cohort of 109 obese individuals with no or early-stage MASLD, combining measurements from two anatomic sites: 1. bulk RNA-sequencing and metabolomics of liver biopsies, and 2. metabolomics from patient blood. Their major finding is that GTPase-related genes are transcriptionally altered in livers of individuals with steatosis with fibrosis compared to steatosis without fibrosis.
Comments from the first round of review:
(1) Confounders (such as (pre-)diabetes)
The patient table shows significant differences in non-MASLD vs. MASLD individuals, with the latter suffering more often from diabetes or hypertriglyceridemia. Rather than just stating corrections, subgroup analyses should be performed (accompanied by designated statistical power analyses) to infer the degree to which these conditions contribute to the observations. I.e., major findings stating MASLD-associated changes should hold true in the subgroup of MASLD patients without diabetes, of female sex, and so forth (testing for each of the significant differences between groups).
Post-rebuttal update: The authors have performed the requested subgroup analysis and find that the gene signatures hold for the non-diabetic sub-cohort, but not for the diabetic subgroup. They note a likely interaction between fibrosis and diabetes that was not corrected for in the original analysis.
(2) External validation
Additionally, to back up the major GTPase signature findings, it would be desirable to analyze an external dataset of (pre)diabetes patients (other biased groups) for alterations in these genes. It would be important to know if this signature also shows in non-MASLD diabetic patients vs. healthy patients or is a feature specific to MASLD. Also, could the matched metabolic data be used to validate metabolite alterations that would be expected under GTPase-associated protein dysregulation?
Post-rebuttal update: The authors confirm that with the present data, insulin resistance cannot be fully ruled out as a confounder of the GTPase-related gene signature. However, they plan future mouse model experiments to study whether the GTPase-fibrosis signature differs in diabetic vs. non-diabetic conditions.
(3) 3D liver spheroid MASH model, Fig. 6D/E
This 3D experiment is technically not an external validation of GTPase-related genes being involved in MASLD, since patient-derived cells may only retain changes that have happened in vivo. To demonstrate that the GTPase expression signature is specifically invoked by fibrosis, the LX-2 setup is more convincing; however, the up-regulation of the GTPase-related genes upon fibrosis induction with TGF-beta, in concordance with the patient data, needs to be shown first (qPCR or RNA-seq). Additionally, the description of the 3D model is too uncritical. The maintenance of functional PHHs is a major challenge (PMID: 38750036, PMID: 21953633, PMID: 40240606, PMID: 31023926). It cannot be ruled out that their findings are largely attributable to either 1) the other mesenchymal cells present (i.e., mesenchyme-derived cells such as hepatic stellate cells, not to be confused with mesenchymal stem cells, MSCs), or 2) potential changes in PHHs in culture, and these limitations need to be stated.
Post-rebuttal update: To address the concern that cells other than hepatocytes contribute to the observed effects in culture, the authors performed TGF-beta treatment in independent mono-cultures (Figure R4), LX-2 and hepatocytes, and in the spheroid system. Surprisingly, important genes highlighted in Figure 6E for the spheroid system (RAB6A, ARL4A, RAB27B, DIRAS2) are all absent from this qPCR(?) validation experiment. The authors evaluate instead RAC1, RHOU, VAV1, DOCK2, RAB32. In spheroids, RHOU and RAB32 are down-regulated with TGF-beta. In hepatocytes, DOCK2 and RAC seemed up-regulated. They find no difference in these genes in LX-2 cells. Surprisingly, ACTA2 expression values are missing for LX-2 cells. Together, it is hard to judge which individual cell type recapitulates the changes observed in patients in this validation experiment, as the major genes called out in Figure 6E are not analyzed.
Unfortunately, the 3D liver spheroid model used (as presented in PMID: 39605182) lacks important functional validation tests of maintained hepatocyte identity in culture (at the very least Albumin expression and secretion plus a CYP3A4 assay). These functional data (acquired at the time point in culture when the RNA expression analysis in 6E was performed) are indispensable prior to stating that mature hepatocytes cause the observed effects.
(4) Novelty / references
Similar studies that also combined liver and blood lipidomics/metabolomics in obese individuals with and without MASLD (e.g. PMID 39731853, 39653777) should be cited. Additionally, it would benefit the quality of the discussion to state how findings in this study add new insights over previous studies, if their findings/insights differ, and if so, why.
Post-rebuttal update: The authors have included the studies in their discussion.
-
- Dec 2025
-
www.biorxiv.org
-
eLife Assessment
Argunşah et al. investigate the mechanisms underlying the differential response dynamics of barrel vs septa domains in shaping the responses to single vs multiple whiskers. Based on the observation of a higher density of SST+ interneurons in the septa, the authors investigate the hypothesis that Elfn1-dependent short-term plasticity shapes these responses. This important study is, however, supported by incomplete evidence; factors restricting the strength of evidence are the limited spatial resolution of the multi-unit activity, as well as the lack of a mechanistic explanation. This provocative and intellectually stimulating hypothesis provides a contribution to work on how different cell types shape cortical representation.
-
Reviewer #1 (Public Review):
Summary:
Argunşah et al. describe and investigate the mechanisms underlying the differential response dynamics of barrel vs septa domains in the whisker-related primary somatosensory cortex (S1). Upon repeated stimulation, the authors report that the response ratio between multi- and single-whisker stimulation increases in layer (L) 4 neurons of the septal domain, while remaining constant in barrel L4 neurons. The authors attribute this divergence to differences in short-term synaptic plasticity, particularly within somatostatin-expressing (SST⁺) interneurons. This interpretation is supported by 1) the increased density of SST+ neurons in L4 of the septa compared to barrel domain, 2) the stronger response of (L2/3) SST+ neurons to repeated multi- vs single-whisker stimulation and 3) the reduced functional difference in single- versus multi-whisker response ratios across barrel and septal domains in Elfn1 KO mice, which lack a synaptic protein that confers characteristic short-term plasticity, notably in SST+ neurons. Consistently, a decoder trained on WT data fails to generalize to Elfn1 KO responses. Finally, the authors report a relative enrichment of S2- and M1-projecting cell densities in L4 of the septal domain compared to the barrel domain, suggesting that septal and barrel circuits may differentially route information about single vs multi-whisker stimulation downstream of S1.
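The cross-genotype decoding logic mentioned in this summary (train on WT, test on Elfn1 KO) can be illustrated with a toy example. The sketch below is only an illustration under stated assumptions: the nearest-centroid decoder, the function names, and all response-ratio values are invented here and are not the authors' decoder or data.

```python
# Toy nearest-centroid decoder. Every number below is invented for illustration;
# each sample is a hypothetical multi-/single-whisker response-ratio time course.

def centroid(rows):
    """Column-wise mean of equally long feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def train(samples_by_label):
    """Map each domain label to the centroid of its training samples."""
    return {label: centroid(rows) for label, rows in samples_by_label.items()}

def predict(model, x):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(model, key=lambda label: sq_dist(model[label]))

def accuracy(model, samples_by_label):
    """Fraction of samples whose predicted label matches the true label."""
    hits = total = 0
    for label, rows in samples_by_label.items():
        for x in rows:
            hits += predict(model, x) == label
            total += 1
    return hits / total

# Hypothetical WT data: barrel ratios stay flat, septal ratios grow over stimuli.
wt = {
    "barrel": [[1.0, 1.0, 1.1, 1.0], [0.9, 1.0, 1.0, 1.1]],
    "septa": [[1.0, 1.4, 1.8, 2.0], [1.1, 1.5, 1.7, 2.1]],
}
# Hypothetical KO data: both domains stay nearly flat.
ko = {
    "barrel": [[1.0, 1.1, 1.2, 1.1], [1.1, 1.2, 1.1, 1.2]],
    "septa": [[1.1, 1.2, 1.2, 1.3], [1.0, 1.1, 1.3, 1.2]],
}

model = train(wt)
wt_acc = accuracy(model, wt)  # evaluated on the training data, for brevity only
ko_acc = accuracy(model, ko)
print(wt_acc, ko_acc)
```

In this toy setting the decoder separates the WT domains but labels every KO sample as barrel, which is the kind of generalization failure the summary describes. A real analysis would evaluate on held-out trials rather than on the training set.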
Strengths:
This paper describes and aims to study a circuit underlying the differential response between barrel columns and septal domains of the primary somatosensory cortex. This work supports the view that these two domains contribute distinctly to the processing of single- versus multi-whisker inputs and highlights the role of SST+ neurons and their short-term plasticity. Together, this study suggests that the barrel cortex multiplexes whisker-derived sensory information across its domains, enabling parallel processing within S1.
Weaknesses:
Although the divergence in responses to repeated single- versus multi-whisker stimulation between barrel and septal domains is consistent with a role for SST⁺ neuron short-term plasticity, the evidence presented does not conclusively demonstrate that this mechanism is the critical driver of the difference. The lack of targeted recordings and manipulations limits the strength of this conclusion: SST⁺ neuron activity is not measured in L4, nor is it assessed in a domain-specific manner. The Elfn1 knockout manipulation does not appear to selectively affect either stimulus condition, domain or interneuron subtype. Finally, all experiments were performed under anesthesia, which raises concerns about how well the reported dynamics generalize to awake cortical processing.
-
Reviewer #2 (Public review):
Summary:
Argunşah and colleagues demonstrate that SST-expressing interneurons are concentrated in the mouse septa and differentially respond to repetitive multi-whisker inputs. Identifying how a specific neuronal phenotype impacts responses is an advance.
Strengths:
(1) Careful physiological and imaging studies.
(2) Novel result showing the role of SST+ neurons in shaping responses.
(3) Good use of a knockout animal to further the main hypothesis.
(4) Clear analytical techniques.
Comments on revisions:
The authors have effectively responded to my initial critiques - I have no further concerns.
-
Reviewer #3 (Public review):
Summary:
This study investigates the functional differences between barrel and septal columns in the mouse somatosensory cortex, focusing on how local inhibitory dynamics (particularly involving SST⁺ interneurons) may mediate temporal integration of multi-whisker (MW) stimuli in septa. Using a combination of in vivo multi-unit recordings, calcium imaging, and anatomical tracing, the authors propose a model in which Elfn1-dependent synaptic facilitation onto SST⁺ interneurons contributes to the distinct sensory responses to MW input in barrels and septa, enabling functional segregation between these domains.
Strengths:
The study presents a thought-provoking and useful conceptual model for understanding sensory processing in the somatosensory cortex. While barrel columns have been widely studied, septal regions remain relatively understudied in mice. If septa indeed act as selective integrators of distributed sensory input, this would suggest a novel computational role for cortical microcircuits beyond the classical view focused on barrels. Although still hypothetical, the proposed model in which SST⁺ interneurons contribute to domain-specific sensory responses between barrel and septal domains is intriguing and opens new avenues for investigating inhibitory circuit mechanisms.
Weaknesses:
The primary limitation of this study lies in the spatial and cellular specificity of the recording techniques. The physiological data rely predominantly on unsorted multi-unit activity (MUA) recorded with low-channel-count silicon probes. Because MUA aggregates signals from multiple neurons over a radius of approximately 50-100 µm (often wider than the typical septal width in mice), this approach makes it difficult to confidently isolate activity originating strictly from within septal domains. The manuscript would benefit from additional analyses to validate the spatial specificity of these recordings, such as systematically varying spike detection thresholds to test the robustness of domain attribution, as suggested by the reviewer. Furthermore, although the authors now appropriately frame their findings in the Elfn1 knockout mice as indirect evidence, it is worth emphasizing that the study lacks direct in vivo, cell-type-specific recordings and manipulations to more definitively test the proposed mechanism.
-
Author response:
The following is the authors’ response to the original reviews.
Public Reviews:
Reviewer #1 (Public Reviews):
Summary:
Argunşah et al. describe and investigate the mechanisms underlying the differential response dynamics of barrel vs septa domains of the whisker-related primary somatosensory cortex (S1). Upon repeated stimulation, the authors report that the response ratio between multi- and single-whisker stimulation increases in layer (L) 4 neurons of the septal domain, while remaining constant in barrel L4 neurons. This difference is attributed to the short-term plasticity properties of interneurons, particularly somatostatin-expressing (SST+) neurons. This claim is supported by the increased density of SST+ neurons found in L4 of the septa compared to barrels, along with a stronger response of (L2/3) SST+ neurons to repeated multi- vs single-whisker stimulation. The role of the synaptic protein Elfn1 is then examined. Elfn1 KO mice exhibited little to no functional domain separation between barrel and septa, with no significant difference in single- versus multi-whisker response ratios across barrel and septal domains. Consistently, a decoder trained on WT data fails to generalize to Elfn1 KO responses. Finally, the authors report a relative enrichment of S2- and M1-projecting cell densities in L4 of the septal domain compared to the barrel domain.
Strengths:
This paper describes and aims to study a circuit underlying differential response between barrel columns and septal domains of the primary somatosensory cortex. This work supports the view that barrel and septal domains contribute differently to processing single versus multi-whisker inputs, suggesting that the barrel cortex multiplexes sensory information coming from the whiskers in different domains.
We thank the reviewer for the very neat summary of our findings that barrel cortex multiplexes converging information in separate domains.
Weaknesses:
While the observed divergence in responses to repeated SWS vs MWS between the barrel and septal domains is intriguing, the presented evidence falls short of demonstrating that short-term plasticity in SST+ neurons critically underpins this difference. The absence of a mechanistic explanation for this observation limits the work’s significance. The measurement of SST neurons’ response is not specific to a particular domain, and the Elfn1 manipulation does not seem to be specific to either stimulus type or a particular domain.
We appreciate the reviewer’s perspective. Although further research is needed to understand the circuit mechanisms underlying the observed phenomenon, we believe our data suggest that altering the short-term dynamics of excitatory inputs onto SST neurons reduces the divergent spiking dynamics in barrels versus septa during repetitive single- and multi-whisker stimulation. Future work could examine how SST neurons, whose somata reside in barrels and septa, respond to different whisker stimuli and the circuits in which they are embedded. At this time, however, the authors believe there is no alternative way to test how the short-term dynamics of excitatory inputs onto SST neurons, as a whole, contribute to the temporal aspects of barrel versus septa spiking.
The study's reach is further constrained by the fact that results were obtained in anesthetized animals, which may not generalize to awake states.
We appreciate the reviewer’s concern regarding the generalizability of our findings from anesthetized animals to awake states. Anesthesia was employed to ensure precise individual whisker stimulation (and multi-whisker stimulation in the same animal), which is challenging in awake rodents due to active whisking. While anesthesia may alter higher-order processing, core mechanisms, such as short- and long-term plasticity in the barrel cortex, are preserved under anesthesia (Martin-Cortecero et al., 2014; Mégevand et al., 2009).
The statistical analysis appears inappropriate, with the use of repeated independent tests, dramatically boosting the false positive error rate.
Thank you for your feedback on our analysis using independent rank-based tests for each time point in wild-type (WT) animals. To address concerns regarding multiple comparisons and temporal dependencies (so far for Figures 1F and 4D; we will add more in our revision), we performed a repeated measures ANOVA for WT animals (13 Barrel, 8 Septa, 20 time points), which revealed a significant main effect of Condition (F(1,19) = 16.33, p < 0.001) and a significant Condition-Time interaction (F(19,361) = 2.37, p = 0.001). Post-hoc tests confirmed significant differences between Barrel and Septa at multiple time points (e.g., p < 0.0025 at times 3, 4, 6, 7, 8, 10, 11, 12, 16, 19 after Bonferroni correction), supporting a differential multi-whisker vs. single-whisker ratio response in WT animals. In contrast, a repeated measures ANOVA for knock-out (KO) animals (11 Barrel, 7 Septa, 20 time points) showed no significant main effect of Condition (F(1,14) = 0.17, p = 0.684) or Condition-Time interaction (F(19,266) = 0.73, p = 0.791), indicating that the Barrel-Septa difference observed in WT animals is absent in KO animals.
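The arithmetic behind this exchange can be sketched briefly. The snippet below assumes a nominal alpha of 0.05 and, for the family-wise error rate, fully independent tests. It uses the textbook degrees-of-freedom formulas for a design with one between-subject factor and one repeated factor. The function names are illustrative, and this is not the authors' analysis code.

```python
def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive across independent uncorrected tests."""
    return 1.0 - (1.0 - alpha) ** n_tests

def bonferroni_alpha(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests

def mixed_anova_dfs(n_subjects, n_groups, n_levels):
    """Degrees of freedom for the group effect and the group-by-time interaction.

    Returns ((df_group, df_group_error), (df_interaction, df_interaction_error)).
    """
    df_group = n_groups - 1
    df_group_error = n_subjects - n_groups
    df_interaction = (n_groups - 1) * (n_levels - 1)
    df_interaction_error = df_group_error * (n_levels - 1)
    return (df_group, df_group_error), (df_interaction, df_interaction_error)

# 20 time points, each tested separately at an assumed alpha of 0.05.
fwer = familywise_error_rate(0.05, 20)
alpha_adj = bonferroni_alpha(0.05, 20)

# WT group sizes as reported in the response: 13 barrel + 8 septa recordings.
wt_dfs = mixed_anova_dfs(13 + 8, 2, 20)
print(round(fwer, 3), alpha_adj, wt_dfs)
```

Under these assumptions, twenty uncorrected tests carry roughly a 64% chance of at least one false positive, the Bonferroni threshold is 0.05/20 = 0.0025, and the WT degrees of freedom come out as (1, 19) and (19, 361), matching the reported F statistics.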
Furthermore, the manuscript suffers from imprecision; its conclusions are occasionally vague or overstated. The authors suggest a role for SST+ neurons in the observed divergence in SWS/MWS responses between barrel and septal domains. However, this remains speculative, and some findings appear inconsistent. For instance, the increased response of SST+ neurons to MWS versus SWS is not confined to a specific domain. Why, then, would preferential recruitment of SST+ neurons lead to divergent dynamics between barrel and septal regions? The higher density of SST+ neurons in septal versus barrel L4 is not a sufficient explanation, particularly since the SWS/MWS response divergence is also observed in layers 2/3, where no difference in SST+ neuron density is found.
Moreover, SST+ neuron-mediated inhibition is not necessarily restricted to the layer in which the cell body resides. It remains unclear through which differential microcircuits (barrel vs septum) the enhanced recruitment of SST+ neurons could account for the divergent responses to repeated SWS versus MWS stimulation.
We fully appreciate the reviewer’s comment. We currently do not provide any evidence on the contribution of SST neurons in the barrels versus septa in layer 4 to the response divergence of spiking observed in SWS versus MWS. We only show that these neurons are differentially distributed in the two domains in this layer. It is known that there is molecular and circuit-based diversity of SST-positive neurons in different layers of the cortex, so it is plausible that this includes cells located in the two domains of vS1, something which has not been examined so far. Our data on their distribution are one piece of evidence that SST neurons may have a differential role in inhibiting barrel stellate cells versus septal ones. Morphological reconstructions of SST neurons in L4 of the somatosensory barrel cortex have shown that their dendrites and axons project locally and may be confined to individual domains, even though this was not specifically examined (Fig. 3 of Scala F et al., 2019). The same study also showed that L4 SST cells receive excitatory input from local stellate cells, and it is known that they are also directly excited by thalamocortical fibers (Beierlein et al., 2003; Tan et al., 2008); both inputs facilitate.
As shown in our supplementary figure, the divergence is also observed in L2/3 where, as the reviewer also points out, we do not have a differential distribution of SST cells, at least based on a columnar analysis extending from L4. There are multiple scenarios that could explain this “discrepancy” that one would need to examine further in future studies. One straightforward one is that the divergence in spiking in L2/3 domains may be inherited from L4 domains, on which L4 SST cells act. Another is that even though L2/3 SST neurons are not biased in their distribution, their input-output function is, something which one would need to examine by detailed in vitro electrophysiological and perhaps optogenetic approaches in S1. Despite the distinctive differences that have been found between the L4 circuitry in S1 and V1 (Scala F et al., 2019), recent observations indicate that small but regular patches of V1 marked by the absence of muscarinic receptor 2 (M2) have high temporal acuity (Ji et al., 2015), and selectively receive input from SST interneurons (Meier et al., 2025). Regions lacking M2 have distinct input and output connectivity patterns from those that express M2 (Meier et al., 2021; Burkhalter et al., 2023). These findings, together with ours, suggest that SST cells preferentially innervate and regulate specific domains (columns) in sensory cortices.
Regardless of the mechanism, the Elfn1 knock-out mouse line almost exclusively affects the incoming excitability onto SST neurons (see also reply to comment below), hence what can be supported by our data is that changing the incoming short-term synaptic plasticity onto these neurons brings the spiking dynamics between barrels and septa closer together.
The Elfn1 KO mouse model seems too unspecific to suggest the role of the short-term plasticity in SST+ neurons in the differential response to repeated SWS vs MWS stimulation across domains. Why would Elfn1-dependent short-term plasticity in SST+ neurons be specific to a pathway, or a stimulation type (SWS vs MWS)? Moreover, the authors report that Elfn1 knockout alters synapses onto VIP+ as well as SST+ neurons (Stachniak et al., 2021; previous version of this paper), so why attribute the phenotype solely to SST+ circuitry? In fact, the functional distinctions between barrel and septal domains appear largely abolished in the Elfn1 KO.
Previous work by others and us has shown that globally removing Elfn1 selectively removes a synaptic process from the brain without altering brain anatomy or structure. This allows us to study how the temporal dynamics of inhibition shape activity, as opposed to inhibition from particular cell types. We will nevertheless update the text to discuss more global implications for SST interneuron dynamics and include a reference to VIP interneurons that contain Elfn1.
When comparing SWS to MWS, we find that MWS replaces the neighboring excitation which would normally be preferentially removed by short-term plasticity in SST interneurons, thus providing a stable control comparison across animals and genotypes. On average, VIP interneurons failed to show modulation by MWS. We were unable to measure a substantial contribution of VIP cells to this process and also note that the Elfn1 expressing multipolar neurons comprise only ~5% of VIP neurons (Connor and Peters, 1984; Stachniak et al., 2021), a fraction that may be lost when averaging from 138 VIP cells. Moreover, the effect of Elfn1 loss on VIP neurons is quite different and marginal compared to that of SST cells, suggesting that the primary impact of Elfn1 knockout is mediated through SST+ interneuron circuitry. Therefore, even if we cannot rule out that these 5% of VIP neurons contribute to barrel domain segregation, we are of the opinion that their influence would be very limited if any.
Reviewer #2 (Public Reviews):
Summary:
Argunşah and colleagues demonstrate that SST-expressing interneurons are concentrated in the mouse septa and differentially respond to repetitive multi-whisker inputs. Identifying how a specific neuronal phenotype impacts responses is an advance.
Strengths:
(1) Careful physiological and imaging studies.
(2) Novel result showing the role of SST+ neurons in shaping responses.
(3) Good use of a knockout animal to further the main hypothesis.
(4) Clear analytical techniques.
We thank the reviewer for their appreciation of the study.
Weaknesses:
No major weaknesses were identified by this reviewer. Overall, I appreciated the paper but feel it overlooked a few issues and had some recommendations on how additional clarifications could strengthen the paper. These include:
(1) Significant work from Jerry Chen on how S1 neurons that project to M1 versus S2 respond in a variety of behavioral tasks should be included (e.g. PMID: 26098757). Similarly, work from Barry Connor’s lab on intracortical versus thalamocortical inputs to SST neurons, as well as excitatory inputs onto these neurons (e.g. PMID: 12815025) should be included.
We thank the reviewer for these valuable resources that we overlooked. We will include Chen et al. (2015), Cruikshank et al. (2007) and Gibson et al. (1999) to contextualize S1 projections and SST+ inputs, strengthening the study’s foundation, as well as Beierlein et al. (2003), which nicely shows both local and thalamocortical facilitation of excitatory inputs onto L4 SST neurons, in contrast to PV cells. The paper also shows the gradual recruitment of SST neurons by thalamocortical inputs to provide feed-forward inhibition onto stellate cells (regular spiking) of the barrel cortex L4 in rat.
(2) Using Layer 2/3 as a proxy to what is happening in layer 4 (~line 234). Given that layer 2/3 cells integrate information from multiple barrels, as well as receiving direct VPm thalamocortical input, and given the time window that is being looked at can receive input from other cortical locations, it is not clear that layer 2/3 is a proxy for what is happening in layer 4.
We agree with the reviewer that what we observe in L2/3 is not necessarily what is taking place in L4 SST-positive cells. The data on L2/3 was included to show that these cells, as a population, can show divergent responses when it comes to SWS vs MWS, which is not seen in L2/3 VIP neurons. Regardless of the mechanisms underlying it, our overall data support that SST-positive neurons can change their activation based on the type of whisker stimulus and when the excitatory input dynamics onto these neurons change due to the removal of Elfn1 the recruitment of barrels vs septa spiking changes at the temporal domain. Having said that, the data shown in Supplementary Figure 3 on the response properties of L2/3 neurons above the septa vs above the barrels (one would say in the respective columns) do show the same divergence as in L4. This suggests that a circuit motif may exist that is common to both layers, involving SST neurons that sit in L4, L5 or even L2/3. This implies that despite the differences in the distribution of SST neurons in septa vs barrels of L4 there is an unidentified input-output spatial connectivity motif that engages in both L2/3 and L4. Please also see our response to a similar point raised by reviewer 1.
(3) Line 267, when discussing distinct temporal response, it is not well defined what this is referring to. Are the neurons no longer showing peaks to whisker stimulation, or are the responses lasting a longer time? It is unclear why PV+ interneurons which may not be impacted by the Elfn1 KO and receive strong thalamocortical inputs, are not constraining activity.
We thank the reviewer for their comment and will clarify the statement.
This convergence of response profiles was further clear in stimulus-aligned stacked images, where the emergent differences between barrels and septa under SWS were largely abolished in the KO (Figure 4B). A distinction between directly stimulated barrels and neighboring barrels persisted in the KO. In addition, the initial response continued to differ between barrel and septa and also septa and neighbor (Figure 4B). This initial stimulus selectivity potentially represents distinct feedforward thalamocortical activity, which includes PV+ interneuron recruitment that is not directly impacted by the Elfn1 KO (Sun et al., 2006; Tan et al., 2008). PV+ cells are strongly excited by thalamocortical inputs, but these exhibit short-term depression, as does their output, contrasting with the sustained facilitation observed in SST+ neurons. These findings suggest that in WT animals, activity spillover from principal barrels is normally constrained by the progressive engagement of SST+ interneurons in septal regions, driven by Elfn1-dependent facilitation at their excitatory synapses. In the absence of Elfn1, this local inhibitory mechanism is disrupted, leading to longer responses in barrels, delayed but stronger responses in septa, and persistently stronger responses in unstimulated neighbors, resulting in a loss of distinction between the responses of barrel and septa domains that normally diverge over time (see Author response image 1 below).
Author response image 1.
(A) Barrel responses are longer following whisker stimulation in KO. (B) Septal responses are slightly delayed but stronger in KO. (C) Unstimulated neighbors show longer persistent responses in KO.
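The contrast drawn above between depressing inputs onto PV+ cells and facilitating inputs onto SST+ cells can be illustrated with a generic Tsodyks-Markram-style model of short-term plasticity. This is a minimal sketch under stated assumptions: the parameter values, the 100 ms stimulus interval, and the function name are hypothetical, and the source does not indicate that such a model was used in the study.

```python
import math

def stp_amplitudes(U, tau_f, tau_d, n_pulses=10, dt=100.0):
    """Relative synaptic response to each pulse of a regular stimulus train.

    U     : baseline release probability (hypothetical value)
    tau_f : facilitation time constant in ms (hypothetical value)
    tau_d : recovery-from-depression time constant in ms (hypothetical value)
    dt    : inter-pulse interval in ms (hypothetical value)
    """
    u, x = U, 1.0  # utilisation and fraction of available resources
    amps = []
    for i in range(n_pulses):
        if i > 0:
            # Between pulses: u relaxes back to U, x recovers towards 1.
            u = U + (u - U) * math.exp(-dt / tau_f)
            x = 1.0 - (1.0 - x) * math.exp(-dt / tau_d)
        u = u + U * (1.0 - u)  # each pulse increments utilisation
        amps.append(u * x)     # response scales with utilisation times resources
        x = x - u * x          # released resources are depleted
    return amps

# Low U with slow facilitation decay: responses grow along the train.
facilitating = stp_amplitudes(U=0.1, tau_f=1000.0, tau_d=50.0)
# High U with slow resource recovery: responses shrink along the train.
depressing = stp_amplitudes(U=0.7, tau_f=50.0, tau_d=500.0)
print([round(a, 2) for a in facilitating])
print([round(a, 2) for a in depressing])
```

With these illustrative parameters the first synapse strengthens over the train while the second weakens, which is the qualitative distinction the response relies on when contrasting SST+ and PV+ recruitment.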
(4) Line 585 “the earliest CSD sink was identified as layer 4…” were post-hoc measurements made to determine where the different shank leads were based on the post-hoc histology?
Post hoc histology was performed on plane-aligned brain sections, which allowed us to detect barrels and septa and so confirm the insertion domain of each recorded shank. The layer specificity of each electrode could therefore not be confirmed by histology, as we did not have coronal sections in which to measure electrode depth.
(5) For the retrograde tracing studies, how were the M1 and S2 injections targeted (stereotaxically or physiologically)? How was it determined that the injections were in the whisker region (or not)?
During the retrograde virus injection, the location of M1 and S2 injections was determined by stereotaxic coordinates (Yamashita et al., 2018). After acquiring the light-sheet images, we were able to post hoc examine the injection site in 3D and confirm that the injections were successful in targeting the regions intended. Although it would have been informative to do so, we did not functionally determine the whisker-related M1 and whisker-related S2 region in this experiment.
(6) Were there any baseline differences in spontaneous activity in the septa versus barrel regions, and did this change in the KO animals?
Thank you for this interesting question. Our previous study found that there was a reduction in baseline activity in L4 barrel cortex of KO animals at postnatal day (P)12, but no differences were found at P21 (Stachniak et al., 2023).
Reviewer #3 (Public Reviews):
Summary:
This study investigates the functional differences between barrel and septal columns in the mouse somatosensory cortex, focusing on how local inhibitory dynamics, particularly involving Elfn1-expressing SST⁺ interneurons, may mediate temporal integration of multiwhisker (MW) stimuli in septa. Using a combination of in vivo multi-unit recordings, calcium imaging, and anatomical tracing, the authors propose that septa integrate MW input in an Elfn1-dependent manner, enabling functional segregation from barrel columns.
Strengths:
The core hypothesis is interesting and potentially impactful. While barrels have been extensively characterized, septa remain less understood, especially in mice, and this study's focus on septal integration of MW stimuli offers valuable insights into this underexplored area. If septa indeed act as selective integrators of distributed sensory input, this would add a novel computational role to cortical microcircuits beyond what is currently attributed to barrels alone. The narrative of this paper is intellectually stimulating.
We thank the reviewer for finding the study intellectually stimulating.
Weaknesses:
The methods used in the current study lack the spatial and cellular resolution needed to conclusively support the central claims. The main physiological findings are based on unsorted multi-unit activity (MUA) recorded via low-channel-count silicon probes. MUA inherently pools signals from multiple neurons across different distances and cell types, making it difficult to assign activity to specific columns (barrel vs. septa) or neuron classes (e.g., SST⁺ vs. excitatory).
The recording radius (~50-100 µm or more) and the narrow width of septa (~50-100 µm or less) make it likely that MUA from "septal" electrodes includes spikes from adjacent barrel neurons.
The authors do not provide spike sorting, unit isolation, or anatomical validation that would strengthen spatial attribution. Calcium imaging is restricted to SST⁺ and VIP⁺ interneurons in superficial layers (L2/3), while the main MUA recordings are from layer 4, creating a mismatch in laminar relevance.
We thank the reviewer for pointing out the possibility of contamination in septal electrodes. Although it is reported in the methods (line 583), we may not have highlighted that we used an extremely high threshold (7.5 std) for spike detection precisely to address this issue, which limits such spatial contamination. Since spike amplitude decays rapidly with distance, at high thresholds only nearby neurons, potentially one or two, contribute to our analysis. We believe that this approach provides a very close approximation of single unit activity (SUA) in our reported data. We will include a sentence earlier in the manuscript to make this explicit and prevent further confusion.
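The thresholding step described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `detect_spikes`, the negative-going threshold, the use of the plain standard deviation of the trace, and the refractory period in samples are all assumptions; only the 7.5 std multiplier comes from the response.

```python
from statistics import mean, pstdev

def detect_spikes(trace, n_std=7.5, refractory=30):
    """Return sample indices where the trace first drops below mean - n_std * SD.

    trace: band-pass-filtered voltage samples (hypothetical input).
    refractory: minimum number of samples between detected events (assumed).
    """
    mu = mean(trace)
    sd = pstdev(trace)
    threshold = mu - n_std * sd  # negative-going threshold (assumption)
    spikes = []
    last = -refractory
    for i in range(1, len(trace)):
        # A crossing is a sample below threshold preceded by one at or above it.
        crossed = trace[i] < threshold <= trace[i - 1]
        if crossed and i - last >= refractory:
            spikes.append(i)
            last = i
    return spikes
```

Raising `n_std` shrinks the set of detected events to the largest deflections only, which is the property the response relies on when arguing that distant units contribute little.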
Regarding the point that calcium imaging was performed on L2/3 SST and VIP cells instead of L4: both Reviewers 1 and 2 raised the same issue, and we responded as follows. As shown in our supplementary figure, the divergence is also observed in L2/3, where we do not have a differential distribution of SST cells, at least based on a columnar analysis extending from L4. There are multiple scenarios that could explain this “discrepancy”, which would need to be examined in future studies. One straightforward scenario is that the divergence in spiking in L2/3 domains is inherited from the L4 domains on which L4 SST cells act. Another is that, even though L2/3 SST neurons are not biased in their distribution, their input-output function is, which would need to be examined by detailed in vitro electrophysiological and perhaps optogenetic approaches in S1. Despite the distinctive differences that have been found between the L4 circuitry in S1 and V1 (Scala F et al., 2019), recent observations indicate that small but regular patches of V1 marked by the absence of muscarinic receptor 2 (M2) have high temporal acuity (Ji et al., 2015), and selectively receive input from SST interneurons (Meier et al., 2025). Regions lacking M2 have distinct input and output connectivity patterns from those that express M2 (Meier et al., 2021; Burkhalter et al., 2023). These findings, together with ours, suggest that SST cells preferentially innervate and regulate specific domains (columns) in sensory cortices.
Furthermore, while the role of Elfn1 in mediating short-term facilitation is supported by prior studies, no new evidence is presented in this paper to confirm that this synaptic mechanism is indeed disrupted in the knockout mice used here.
We thank Reviewer #3 for noting the absence of new evidence confirming Elfn1’s disruption of short-term facilitation in our knockout mice. We acknowledge that our study relies on previously strong published data demonstrating that Elfn1 mediates short-term synaptic facilitation of excitatory inputs onto SST+ interneurons (Sylwestrak and Ghosh, 2012; Tomioka et al., 2014; Stachniak et al., 2019, 2023). These studies consistently show that Elfn1 knockout abolishes facilitation in SST+ synapses, leading to altered temporal dynamics, which we hypothesize underlies the observed loss of barrel-septa response divergence in our Elfn1 KO mice (Figure 4). Nevertheless, to address the point raised, we will clarify in the revised manuscript (around lines 245-247 and 271-272) that our conclusions are based on these established findings, stating: “Building on prior evidence that Elfn1 knockout disrupts short-term facilitation in SST+ interneurons (Sylwestrak and Ghosh, 2012; Tomioka et al., 2014; Stachniak et al., 2019, 2023), we attribute the abolished barrel-septa divergence in Elfn1 KO mice to altered SST+ synaptic dynamics, though direct synaptic measurements were not performed here.”
Additionally, since Elfn1 is constitutively knocked out from development, the possibility of altered circuit formation, including changes in barrel structure and interneuron distribution, cannot be excluded and is not addressed.
We thank Reviewer #3 for raising the valid concern that constitutive Elfn1 knockout could potentially alter circuit formation, including barrel structure and interneuron distribution. To address this, we will clarify in the revised manuscript (around line ~271 and in the Discussion) that in our previous studies that included both whole-cell patch-clamp in acute brain slices ranging from postnatal day 11 to 22 (P11 - P21) and in vivo recordings from barrel cortex at P12 and P21, we saw no gross abnormalities in barrel structure, with Layer 4 barrels maintaining their characteristic size and organization, consistent with wildtype (WT) mice (Stachniak et al., 2019, 2023). While we cannot fully exclude subtle developmental changes, prior studies indicate that Elfn1 primarily modulates synaptic function rather than cortical cytoarchitecture (Tomioka et al., 2014). Elfn1 KO mice show no gross morphological or connectivity differences and the pattern and abundance of Elfn1 expressing cells (assessed by LacZ knock in) appears normal (Dolan and Mitchell, 2013).
We will add the following to the Discussion: “Although Elfn1 is constitutively knocked out, we find here and in previous studies that barrel structure is preserved (Stachniak et al., 2019, 2023). Further, the distribution of Elfn1 expressing interneurons is not different in KO mice, suggesting minimal developmental disruption (Dolan and Mitchell, 2013).
Nonetheless, we acknowledge that subtle circuit changes cannot be ruled out without the use of a time-dependent conditional knockout of the gene.”
Recommendations for the authors:
Reviewer #1 (Recommendations for the authors):
(1) My biggest concern is regarding statistics. Did the authors repeatedly apply independent tests (Mann-Whitney) without any correction for multiple comparisons (Figures 1 and 4)? In that case, the chances of a spurious "significant" result rise dramatically.
In response to the reviewer’s comment, we now present new statistical results using ANOVA and have incorporated them into the manuscript between lines 172 and 192 for WT data and between lines 282 and 298 for Elfn1 KO data. This new statistical approach shows the same differences as we had previously reported, thereby supporting the statements made.
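The reviewer's concern about uncorrected repeated tests can be made concrete with a short sketch. For m independent tests at level alpha, the chance of at least one false positive is 1 - (1 - alpha)^m. The Holm-Bonferroni step-down procedure shown here is one common correction and is included purely as an illustration; it is not the analysis the authors adopted (they report ANOVA), and both function names are hypothetical.

```python
def familywise_error(alpha, m):
    """Probability of at least one false positive across m independent tests."""
    return 1.0 - (1.0 - alpha) ** m

def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans, True where the null hypothesis is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is compared against alpha / (m - k).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # step-down: stop at the first non-rejection
    return reject
```

With one uncorrected test per pulse in a 20-pulse train, `familywise_error(0.05, 20)` is roughly 0.64, which is why a correction or an omnibus test is needed before interpreting individual pulses.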
(2) The findings only hint at a mechanism involving SST+ neurons for how SWS and MWS are processed differently in the barrel vs septal domains. As a direct test of SST+ neuron involvement in the divergence of barrel and septal responses, the authors might consider SST-specific manipulations - for example, inhibitory chemo- or optogenetics during SWS and MWS stimulation.
We thank the reviewer for this comment and agree that a direct manipulation of SST+ neurons via inhibitory chemo- or optogenetics could provide further supporting evidence for the main claims in our study. We have opted not to perform these experiments for this manuscript, as we feel they can be part of a future study. At the same time, it is conceivable that such manipulations, depending on how they are performed, may lead to larger and non-specific effects on cortical activity, since SST neurons will likely be completely shut down. So even though we certainly appreciate and value the strengths of such approaches, our experiments have addressed a more nuanced hypothesis, namely that the synaptic dynamics onto SST+ neurons matter for response divergence of septa versus barrels, which could not have been easily and concretely addressed by manipulating SST+ cell firing activity.
(3) In general, it is hard to comprehend what microcircuit could lead to the observed divergence in the MWS/SWS ratio in the barrel vs septal domain. The preferential recruitment of SST+ neurons during MWS is not specific to a particular domain, and the higher density of SST+ neurons specifically in L4 septa cannot per se explain the diverging MWS/SWS ratio in L4 septal neurons, since similar ratio divergence is observed across domains in L2/3 neurons without increased SST+ neuron density in L2/3. This view would also assume that SST+ inhibition remains contained to its own layer and domain. Is this the case? Is it that different microcircuits between barrels and septa differently shape the response to repeated MWS? This is partially discussed in the paper; can the authors develop on that? What would the proposed mechanism be? Can the short-term plasticity of the thalamic inputs (VPM vs POm) be part of the picture?
We thank the reviewer for raising this important point. We propose that the divergence in MWS/SWS ratios across barrel and septal domains arises from dynamic microcircuit interactions rather than from static anatomical features such as the SST+ density that we describe, which can provide only a hint. In L2/3, where SST+ density is uniform, divergence persists, suggesting that trans-laminar and trans-domain interactions are key. Barrel domains, primarily receiving VPM inputs, exhibit short-term depression onto excitatory cells and engage PV+ and SST+ neurons to stabilize the MWS/SWS ratio, with Elfn1-dependent facilitation of SST+ neurons gradually increasing inhibition during repetitive SWS. Septal domains, in contrast, are targeted by facilitating POm inputs, combined with higher L4 SST+ density and Elfn1-mediated facilitation, producing progressive inhibitory buildup that amplifies the MWS/SWS ratio. SST+ projections in septa may extend trans-laminarly and laterally, influencing L2/3 and neighboring barrels, thereby explaining L2/3 divergence despite uniform SST+ density in L2/3. In this regard, direct laminar-dependent manipulations will be required to confirm whether L2/3 divergence is inherited from L4 dynamics. In Elfn1 KO mice, the loss of facilitation in SST+ neurons likely flattens these dynamics, disrupting functional segregation. Future experiments using VPM/POm-specific optogenetic activation and SST+ silencing will be critical to directly test this model.
We expanded the discussion accordingly.
(4) Can the decoder generalize between SWS and MWS? In this condition, if the decoder accuracy is higher for barrels than septa, it would support the idea that septa are processing the two stimuli differently.
Our results show that septal decoding accuracy is generally higher than barrel accuracy when generalizing from multi-whisker stimulation (MWS) to single-whisker stimulation (SWS), indicating distinct information processing in septa compared to barrels.
In wild-type (WT) mice, septal accuracy exceeds barrel accuracy across all time windows (1-50ms, 51-95ms, 1-95ms), with the largest difference in the 51-95ms window (0.9944 vs. 0.9214 at pulse 20, 10Hz stimulation). This septal advantage grows with successive pulses, reflecting robust, separable neural responses, likely driven by the posterior medial nucleus (POm)’s strong MWS integration contrasting with minimal SWS activation. Barrel responses, driven by consistent ventral posteromedial nucleus (VPM) input for both stimuli, are less distinguishable, leading to lower accuracy.
In Elfn1 knockout (KO) mice, which disrupt excitatory drive to somatostatin-positive (SST+) interneurons, barrel accuracy is higher initially in the 1-50ms window (0.8045 vs. 0.7500 at pulse 1), suggesting reduced early septal distinctiveness. However, septal accuracy surpasses barrels in later pulses and time windows (e.g., 0.9714 vs. 0.9227 in 51-95ms at pulse 20), indicating restored septal processing. This supports the role of SST+ interneurons in shaping distinct MWS responses in septa, particularly in late-phase responses (51-95ms), where inhibitory modulation is prominent, as confirmed by calcium imaging showing stronger SST+ activation during MWS.
These findings demonstrate that septa process SWS and MWS differently, with higher decoding accuracy reflecting structured, POm- and SST+-driven response patterns. In Elfn1 KO mice, early deficits in septal processing highlight the importance of SST+ interneurons, with later recovery suggesting compensatory mechanisms.
We have added Supplementary Figure 4 and included this interpretation between lines 338-353.
We thank the reviewer for suggesting this analysis.
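For readers unfamiliar with cross-condition generalization, the idea of training a decoder on one stimulus condition and testing it on the other can be sketched as below. The response does not specify the classifier or the labels being decoded, so this uses a nearest-centroid classifier on hypothetical feature vectors purely for illustration; every function name here is an assumption.

```python
def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def fit_centroids(X, y):
    """Compute one centroid per class label from training data."""
    groups = {}
    for vec, label in zip(X, y):
        groups.setdefault(label, []).append(vec)
    return {label: centroid(vecs) for label, vecs in groups.items()}

def predict(centroids, vec):
    """Assign the label of the nearest class centroid."""
    return min(centroids, key=lambda label: sq_dist(centroids[label], vec))

def cross_condition_accuracy(X_train, y_train, X_test, y_test):
    """Train on one condition (e.g. MWS trials), test on another (e.g. SWS trials)."""
    centroids = fit_centroids(X_train, y_train)
    hits = sum(predict(centroids, v) == t for v, t in zip(X_test, y_test))
    return hits / len(y_test)
```

Defined this way, accuracy is the fraction of correctly labeled test trials and is therefore bounded between 0 and 1.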
(5) It is not clear to me how the authors achieve SWS. How is it that the pipette tip "placed in contact with the principal whisker" does not detach from the principal whisker or stimulate other whiskers? Please clarify the methods.
Targeting the specific principal whisker is performed under the stereoscope.
Specifically, we have added this statement in line 628:
“We trimmed the whiskers where necessary, to avoid them touching each other and to avoid stimulating other whiskers. By putting the pipette tip very close (almost touching) to the principal whisker, the movement of the tip (limited to 1mm) would reliably move the targeted whisker. The specificity of the stimulation of the selected principal whisker was observed under the stereoscope.”
(6) The method for calculating decoder accuracy is not clearly described: how can accuracy exceed 1? The authors should clarify this metric and provide measures of variability (e.g., confidence intervals or standard deviations across runs) to assess the significance of their comparisons. Additionally, using a consistent scale across all plots would improve interpretability.
We thank the reviewer for raising this point. We have now changed the way accuracies are calculated and adopted a common scale among different plots (see updated Figure 5). We have also changed the methods section accordingly.
(7) Figure 1: The sample size is not specified. It looks like the numbers match the description in the methods, but the sample size should be clearly stated here.
These are the numbers the reviewer is inquiring about.
WT: a 280 × 95 × 20 matrix for the stimulated barrel (14 barrels, 95 ms, 20 pulses), a 180 × 95 × 20 matrix for the septa (9 septa, 95 ms, 20 pulses), and a 360 × 95 × 20 matrix for the neighboring barrel (18 neighboring barrels, 95 ms, 20 pulses). N=4 mice.
KO: 11-barrel columns, 7 septal columns, 11 unstimulated neighbors from N=4 mice.
Panels D-F are missing axes and axis labels (firing rate, p-value). Panel D is mislabeled (left, middle, and right). I can't seem to find the yellow line.
Thank you for this observation. We made changes in the figures to make them easier to navigate based on the collective feedback from the reviewers.
Why change the way the differences in the responses to repeated stimulation are compared between SWS and MWS?
To assess temporal accumulation of information, we compared responses to repeated single-whisker stimulation (SWS) and multi-whisker stimulation (MWS) using an accumulative decoding approach rather than simple per-pulse firing rates. This method captures domain-specific integration dynamics over successive pulses.
The use of the term "principal whisker" is confusing, as it could refer to the whisker that corresponds to the recorded barrel.
When we use the term principal whisker, the intention is indeed to refer to the whisker corresponding to the recorded barrel during single whisker stimulation. The term principal whisker is removed from Figure legend 1 and legend S1C where it may have led to ambiguity.
Why the statement "after the start of active whisking"? Mice are under anesthesia here; it does not appear to be relevant for the figure.
“After the start of active whisking” refers to the state of the barrel cortex circuitry at the time of recordings. This phrasing comes from our habit of also assessing sensory processing from a developmental point of view. The reviewer is correct that it has nothing to do with the status of the experiment. Since the reviewer found that it may create confusion, we have now removed it.
(8) Figure 3: The y-axis label is missing for panel C.
This is now fixed. (dF/F).
(9) Figure 4: Axis labels are missing.
Added.
Minor:
(10) Line 36: "progressive increase in septal spiking activity upon multi-whisker stimulation". There is no increase in septal spiking activity upon MWS; the ratio MWS/SWS increases.
We have changed the sentence as follows: Genetic removal of Elfn1, which regulates the incoming excitatory synaptic dynamics onto SST+ interneurons, leads to the loss of the progressive increase in septal spiking ratio (MWS/SWS) upon stimulation.
(11) Line 105: domain-specific, rather than column-specific, for consistency.
We have changed it.
(12) Lines 173-174: "a divergence between barrel and septa domain activity also occurred in Layer 4 from the 2nd pulse onward (Figure 1E)". The authors only show a restricted number of comparisons. Why not show the p-values as for SWS?
The statistics are now presented in the current Figure 1E.
(13) Lines 151-153: "Correspondingly, when a single whisker is stimulated repeatedly, the response to the first pulse is principally bottom-up thalamic-driven responses, while the later pulses in the train are expected to also gradually engage cortico-thalamo-cortical and cortico-cortical loops." Can the authors please provide a reference?
We have now added the following references: (Kyriazi and Simons, 1993; Middleton et al., 2010; Russo et al., 2025).
(14) Lines 184-186: "Our electrophysiological experiments show a significant divergence of responses over time upon both SWS and MWS in L4 between barrels (principal and neighboring) and adjacent septa, with minimal initial difference". The only difference between the neighboring barrel and septa is the responses to the initial pulse. Can the author clarify?
We have now changed the sentence as follows: Our electrophysiological experiments show a significant divergence of responses between domains upon both SWS and MWS in L4. (Line 198 now)
(15) Line 214: "suggest these interneurons may play a role in diverging responses between barrels and septa upon SWS". Why SWS specifically?
We have changed the sentence as follows: These results confirmed that SST+ and VIP+ interneurons have higher densities in septa compared to barrels in L4 and suggest these interneurons may play a role in diverging responses between barrels and septa. (Line 231 now).
(16) Line 235: "This result suggests that differential activation of SST+ interneurons is more likely to be involved in the domain-specific temporal ratio differences between barrels and septa". Why? The results here are not domain-specific.
We have now revised this statement to: This result suggested that temporal ratio differences specific to barrels and septa might involve differential activation of SST+ interneurons rather than VIP+ interneurons.
(17) Lines 241-243: "SST+ interneurons in the cortex are known to show distinct short-term synaptic plasticity, particularly strong facilitation of excitatory inputs, which enables them to regulate the temporal dynamics of cortical circuits." Please provide a reference.
We have now added the following references: (Grier et al., 2023; Liguz-Lecznar et al., 2016).
(18) Lines 245-247: "A key regulator of this plasticity is the synaptic protein Elfn1, which mediates short-term synaptic facilitation of excitation on SST+ interneurons (Stachniak et al., 2021, 2019; Tomioka et al., 2014)". Is Stachniak et al., 2021 not about the role of Elfn1 in excitatory-to-VIP+ neuron synapses?
The reviewer correctly spotted this discrepancy. This reference has now been removed from this statement.
(19) Lines 271-272: "Building on our findings that Elfn1-dependent facilitation in SST+ interneurons is critical for maintaining barrel-septa response divergence". The authors did not show that.
We have now changed the statement to: Building on our findings that Elfn1 is critical for maintaining barrel-septa response divergence
(20) Line 280: second firing peak, not "peal".
Thank you, it is now fixed.
(21) Lines 304-305: "These results highlight the critical role of Elfn1 in facilitating the temporal integration of sensory inputs through its effects on SST+ interneurons". This claim is also overstated.
We have now changed the statement to: These results highlight the contribution of Elfn1 to the temporal integration of sensory inputs. (Line 362)
(22) Line 329: Any reason why not cite Chen et al., Nature 2013?
We have now added this reference, as also pointed out by reviewer 1.
(23) Line 341-342: "wS1" and "wS2" instead of S1 and S2 for consistency.
Thanks, we have now updated the terms.
Reviewer #2 (Recommendations for the authors):
(1) Figure 3D - the SW conditions are labeled but not the MW conditions (two right graphs) - they should be labeled similarly (SSTMW, VIPMW).
The two right graphs in Figure 3D represent paired SW vs MW comparisons of the evoked responses for SST and VIP populations, respectively.
(2) Figure 6D and E: I think it would be better if the depth measurements were on the y-axis, which is more typical of these types of plots.
We thank the reviewer for this comment. Although we appreciate this may be the case, we feel that the current presentation may be easier for the reader to navigate, and we have hence kept it.
(3) Having an operational definition of septa versus barrel would be useful. As the authors point out, this is a tough distinction in a mouse, and often you read papers that use Barrel Wall versus Barrel Hollow/Center - operationally defining how these areas were distinguished would be helpful.
We thank the reviewer for this comment and understand the point made.
We have now updated the methods section in line 611:
DiI marks contained within the vGlut2 staining were defined as barrel recordings, while DiI marks outside vGlut2 staining were septal recordings.
Reviewer #3 (Recommendations for the authors):
To support the manuscript's major claims, the authors should consider the following:
(1) Validate the septal identity of the neurons studied, either anatomically or functionally at the single-cell level (e.g., via Ca²⁺ imaging with confirmed barrel/septa mapping).
We thank the reviewer for this suggestion, but we feel that these extensive experiments are beyond the scope of this study.
(2) Provide both anatomical and physiological evidence to assess the possibility of altered cortical development in Elfn1 KO mice, including potential changes in barrel structure or SST⁺ cell distribution.
To address the reviewer’s point, we have now added the following to the Discussion: “Although Elfn1 is constitutively knocked out, we find here and in previous studies that barrel structure is preserved (Stachniak et al., 2019, 2023). Further, the distribution of Elfn1 expressing interneurons is not different in KO mice, suggesting minimal developmental disruption (Dolan and Mitchell, 2013). Nonetheless, we acknowledge that subtle circuit changes cannot be ruled out without conditional knockouts.”
(3) Examine the sensory responses of SST⁺ and VIP⁺ interneurons in deeper cortical layers, particularly layer 4, which is central to the study's main conclusions.
We thank the reviewer for this suggestion and appreciate the value it would bring to the study. We nevertheless feel that these extensive experiments are beyond the scope of this study and hence opted out from performing them.
Minor Comments:
(1) The authors used a CLARITY-based passive clearing protocol, which is known to sometimes induce tissue swelling or distortion. This may affect anatomical precision, especially when assigning neurons to narrow domains such as septa versus barrels. Please clarify whether tissue expansion was measured, corrected, or otherwise accounted for during analysis.
Yes, tissue expansion was accounted for during analysis for the laminar specification. We excluded brains with severe distortion.
(2) While the anatomical data are plotted as a function of "depth from the top of layer 4," the manuscript does not specify the precise depth ranges used to define individual cortical layers in the cleared tissue. Given the importance of laminar specificity in projection and cell type analyses, the criteria and boundaries used to delineate each layer should be explicitly stated.
Thank you for pointing this out. We now include the criteria for delineating each layer in the manuscript. “Given that the depth of Layer 4 (L4) can be reliably measured due to its well-defined barrel boundaries, and that the relative widths of other layers have been previously characterized (El-Boustani et al., 2018), we estimated laminar boundaries proportionally. Specifically, Layer 2/3 was set to approximately 1.3–1.5 times the width of L4, Layer 5a to ~0.5 times, and Layer 5b to a similar width as L4. Assuming uniform tissue expansion across the cortical column, we extrapolated the remaining laminar thicknesses proportionally.”
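The proportional rule quoted above can be sketched as a small calculation. This is an illustration under stated assumptions: depths are expressed relative to the top of L4, the Layer 2/3 ratio defaults to 1.4 (the midpoint of the quoted 1.3–1.5 range), and the function name and the example L4 thickness are hypothetical.

```python
def laminar_boundaries(l4_thickness_um, l23_ratio=1.4, l5a_ratio=0.5, l5b_ratio=1.0):
    """Return {layer: (top, bottom)} in micrometers relative to the top of L4.

    Negative depths lie above L4 (toward the pia). Ratios scale the measured
    L4 thickness, so uniform tissue expansion cancels out.
    """
    t = l4_thickness_um
    l4_bottom = t
    l5a_bottom = l4_bottom + l5a_ratio * t
    l5b_bottom = l5a_bottom + l5b_ratio * t
    return {
        "L2/3": (-l23_ratio * t, 0.0),
        "L4": (0.0, l4_bottom),
        "L5a": (l4_bottom, l5a_bottom),
        "L5b": (l5a_bottom, l5b_bottom),
    }

# Example with a hypothetical measured L4 thickness of 200 µm.
boundaries = laminar_boundaries(200.0)
```

Because every boundary is a multiple of the measured L4 thickness, the estimate depends on the assumption of uniform expansion across the cortical column that the authors state.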
(3) In several key comparisons (e.g., SST⁺ vs. VIP⁺ interneurons, or S2-projecting vs. M1-projecting neurons), it is unclear whether the same barrel columns were analyzed across conditions. Given the anatomical and functional heterogeneity across wS1 columns, failing to control for this may introduce significant confounds. We recommend analyzing matched columns across groups or, if not feasible, clearly acknowledging this limitation in the manuscript.
We thank the reviewer for raising this important point. For the comparison of SST⁺ versus VIP⁺ interneurons, it would in principle have been possible to analyze the same barrel columns across groups. However, because some of the cleared brains did not reach the optimal level of clarity, our choice of columns was limited, and we were not always able to obtain sufficiently clear data from the same columns in both groups. Similarly, for the analysis of S2- versus M1-projecting neurons, variability in the position and spread of retrograde virus injections made it difficult to ensure measurements from identical barrel columns. We have now added a statement in the Discussion to acknowledge this limitation.
(4) Figure 1C: Clarify what each point in the t-SNE plot represents-e.g., a single trial, a recording channel, or an averaged response. Also, describe the input features used for dimensionality reduction, including time windows and preprocessing steps.
In response to the reviewer’s comment, we have now added the following in the methods: In summary, each point in the t-SNE plots represents an averaged response across 20 trials for a specific domain (barrel, septa, or neighbor) and genotype (WT or KO), with approximately 14 points per domain derived from the 280 trials in each dataset. The input features are preprocessed by averaging blocks of 20 trials into 1900-dimensional vectors (95ms × 20), which are then reduced to 2D using t-SNE with the specified parameters. This approach effectively highlights the segregation and clustering patterns of neural responses across cortical domains in both WT and KO conditions.
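The preprocessing described above (averaging blocks of 20 trials and flattening 95 ms × 20 pulses into 1900 features) can be sketched in plain Python. The nested-list layout `data[trial][time_bin][pulse]`, the time-major flattening order, and the function name are assumptions; the resulting feature matrix would then be passed to a t-SNE implementation, which is not shown here.

```python
def block_average_features(data, block_size=20):
    """Average consecutive blocks of trials and flatten each block mean.

    data: nested list indexed as data[trial][time_bin][pulse] (assumed layout).
    Returns one flattened vector per block, e.g. 14 vectors of length 1900
    for a 280 x 95 x 20 input.
    """
    n_trials = len(data)
    if n_trials % block_size != 0:
        raise ValueError("number of trials must be a multiple of block_size")
    features = []
    for start in range(0, n_trials, block_size):
        block = data[start:start + block_size]
        n_bins = len(block[0])
        n_pulses = len(block[0][0])
        # Mean over the trials in the block, flattened time-major (assumption).
        vec = [
            sum(trial[t][p] for trial in block) / block_size
            for t in range(n_bins)
            for p in range(n_pulses)
        ]
        features.append(vec)
    return features
```

For the stimulated-barrel dataset (280 trials) this yields the roughly 14 points per domain mentioned in the response.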
(5) Figures 1D, E (left panels): The y-axes lack unit labeling and scale bars. Please indicate whether values are in spikes/sec, spikes/bin, or normalized units.
We have now clarified this.
(6) Figures 1D, E (right panels): The color bars lack units. Specify whether the values represent raw firing rates, z-scores, or other normalized measures. Replace the vague term "Matrix representation" with a clearer label such as "Pulse-aligned firing heatmap."
Thank you, we have now done it.
(7) Figure 1E (bottom panel): There appears to be no legend referring to these panels. Please define labels such as "B" and "S."
Thank you, we have now done it.
(8) Figure 1E legend: If it duplicates the legend from Figure 1D, this should be made explicit or integrated accordingly.
We have changed the structure of this figure.
(9) Figure 1F: Define "AUC" and explain how it was computed (e.g., area under the firing rate curve over 0-50 ms). Indicate whether the plotted values represent percentages and, if so, label the y-axis accordingly. If normalization was applied, describe the procedure. Include sample sizes (n) and specify what each data point represents (e.g., animal, recording site).
The following paragraph has been added in the methods section:
The Area Under the Curve (AUC) was computed as the integral of the smoothed firing rate (spikes per millisecond) over a 50ms window following each whisker stimulation pulse, using trapezoidal integration. Firing rate data for layer 4 barrel and septal regions in wild-type (WT) and knockout (KO) mice were smoothed with a 3-point moving average and averaged across blocks of 20 trials. Plotted values represent the percentage ratio of multi-whisker (MW) to single whisker (SW) AUC with error bars showing the standard error of the mean. Each data point reflects the mean AUC ratio for a stimulation pulse across approximately 11 blocks (220 trials total). The y-axis indicates percentages.
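The AUC computation described above can be sketched as follows. This is an illustration rather than the authors' code: the edge handling of the moving average, smoothing after cropping to the 50 ms window, 1 ms bins, and all function names are assumptions.

```python
def moving_average3(x):
    """3-point centered moving average; endpoints use the available neighbors (assumed)."""
    out = []
    for i in range(len(x)):
        window = x[max(0, i - 1):i + 2]
        out.append(sum(window) / len(window))
    return out

def trapezoid_auc(rate, dt_ms=1.0):
    """Trapezoidal integral of a firing-rate trace sampled every dt_ms."""
    return sum((rate[i] + rate[i + 1]) / 2.0 * dt_ms for i in range(len(rate) - 1))

def mw_sw_ratio_percent(mw_rate, sw_rate, window_bins=50):
    """Percentage ratio of multi-whisker to single-whisker AUC for one pulse."""
    mw = trapezoid_auc(moving_average3(mw_rate[:window_bins]))
    sw = trapezoid_auc(moving_average3(sw_rate[:window_bins]))
    return 100.0 * mw / sw
```

A value of 100 means the multi-whisker and single-whisker responses have equal area over the window; values above or below 100 indicate a relatively larger or smaller multi-whisker response.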
(10) Figure 3C: Add units to the vertical axis.
We have added them.
(11) Figure 3D: Specify what each line represents (e.g., average of n cells, individual responses?).
Each line represents an average response of a neuron.
(12) Figure 4C legend: "Same" with what? No legend refers to the bottom panels - please revise to clarify.
Thank you. We have now changed the figure structure and legends and fixed the missing information issue.
(13) Supplementary Figure 1B: Indicate the physical length of the scale bar in micrometers.
This has been fixed. The scale bar is 250 µm.
(14) Indicate the catalog number or product name of the 8×8 silicon probe used for recordings.
We have added this information. It is the A8x8-Edge-5mm-100-200-177-A64
References
(1) Beierlein, M., Gibson, J. R. & Connors, B. W. (2003). Two dynamically distinct inhibitory networks in layer 4 of the neocortex. J. Neurophysiol. 90, 2987–3000.
(2) Burkhalter, A., D’Souza, R. D. & Ji, W. (2023). Integration of feedforward and feedback information streams in the modular architecture of mouse visual cortex. Annu. Rev. Neurosci. 46, 259–280.
(3) Chen, J. L., Margolis, D. J., Stankov, A., Sumanovski, L. T., Schneider, B. L. & Helmchen, F. (2015). Pathway-specific reorganization of projection neurons in somatosensory cortex during learning. Nat. Neurosci. 18, 1101–1108.
(4) Connor, J. R. & Peters, A. (1984). Vasoactive intestinal polypeptide-immunoreactive neurons in rat visual cortex. Neuroscience 12, 1027–1044.
(5) Cruikshank, S. J., Lewis, T. J. & Connors, B. W. (2007). Synaptic basis for intense thalamocortical activation of feedforward inhibitory cells in neocortex. Nat. Neurosci. 10, 462–468.
(6) Dolan, J. & Mitchell, K. J. (2013). Mutation of Elfn1 in mice causes seizures and hyperactivity. PLoS One 8, e80491.
(7) Gibson, J. R., Beierlein, M. & Connors, B. W. (1999). Two networks of electrically coupled inhibitory neurons in neocortex. Nature 402, 75–79.
(8) Ji, W., Gămănuţ, R., Bista, P., D’Souza, R. D., Wang, Q. & Burkhalter, A. (2015). Modularity in the organization of mouse primary visual cortex. Neuron 87, 632–643.
(9) Martin-Cortecero, J. & Nuñez, A. (2014). Tactile response adaptation to whisker stimulation in the lemniscal somatosensory pathway of rats. Brain Res. 1591, 27–37.
(10) Mégevand, P., Troncoso, E., Quairiaux, C., Muller, D., Michel, C. M. & Kiss, J. Z. (2009). Long-term plasticity in mouse sensorimotor circuits after rhythmic whisker stimulation. J. Neurosci. 29, 5326–5335.
(11) Meier, A. M., Wang, Q., Ji, W., Ganachaud, J. & Burkhalter, A. (2021). Modular network between postrhinal visual cortex, amygdala, and entorhinal cortex. J. Neurosci. 41, 4809–4825.
(12) Meier, A. M., D’Souza, R. D., Ji, W., Han, E. B. & Burkhalter, A. (2025). Interdigitating modules for visual processing during locomotion and rest in mouse V1. bioRxiv 2025.02.21.639505.
(13) Scala, F., Kobak, D., Shan, S., Bernaerts, Y., Laturnus, S., Cadwell, C. R., Hartmanis, L., Froudarakis, E., Castro, J. R., Tan, Z. H., et al. (2019). Layer 4 of mouse neocortex differs in cell types and circuit organization between sensory areas. Nat. Commun. 10, 4174.
(14) Stachniak, T. J., Sylwestrak, E. L., Scheiffele, P., Hall, B. J. & Ghosh, A. (2019). Elfn1-induced constitutive activation of mGluR7 determines frequency-dependent recruitment of somatostatin interneurons. J. Neurosci. 39, 4461–4475.
(15) Stachniak, T. J., Kastli, R., Hanley, O., Argunsah, A. Ö., van der Valk, E. G. T., Kanatouris, G. & Karayannis, T. (2021). Postmitotic Prox1 expression controls the final specification of cortical VIP interneuron subtypes. J. Neurosci. 41, 8150–8166.
(16) Stachniak, T. J., Argunsah, A. Ö., Yang, J. W., Cai, L. & Karayannis, T. (2023). Presynaptic kainate receptors onto somatostatin interneurons are recruited by activity throughout development and contribute to cortical sensory adaptation. J. Neurosci. 43, 7101–7118.
(17) Sun, Q.-Q., Huguenard, J. R. & Prince, D. A. (2006). Barrel cortex microcircuits: Thalamocortical feedforward inhibition in spiny stellate cells is mediated by a small number of fast-spiking interneurons. J. Neurosci. 26, 1219–1230.
(18) Sylwestrak, E. L. & Ghosh, A. (2012). Elfn1 regulates target-specific release probability at CA1-interneuron synapses. Science 338, 536–540.
(19) Tan, Z., Hu, H., Huang, Z. J. & Agmon, A. (2008). Robust but delayed thalamocortical activation of dendritic-targeting inhibitory interneurons. Proc. Natl. Acad. Sci. USA 105, 2187–2192.
(20) Tomioka, N. H., Yasuda, H., Miyamoto, H., Hatayama, M., Morimura, N., Matsumoto, Y., Suzuki, T., Odagawa, M., Odaka, Y. S., Iwayama, Y., et al. (2014). Elfn1 recruits presynaptic mGluR7 in trans and its loss results in seizures. Nat. Commun. 5, 4501.
(21) Yamashita, T., Vavladeli, A., Pala, A., Galan, K., Crochet, S., Petersen, S. S. & Petersen, C. C. (2018). Diverse long-range axonal projections of excitatory layer 2/3 neurons in mouse barrel cortex. Front. Neuroanat. 12, 33.
eLife Assessment
This important manuscript provides insights into the competition between Splicing Factor 1 (SF1) and Quaking (QKI) for binding at the ACUAA branch point sequence in a model intron, regulating exon inclusion. The study employs convincing, rigorous transcriptomic, proteomic, and reporter assays, with both mammalian cell culture and yeast models.
-
Reviewer #2 (Public review):
Summary:
In this manuscript, Pereira de Castro and coworkers are studying potential competition between a more standard splicing factor SF1 and an alternative splicing factor called QK1. This is interesting because they bind to overlapping sequence motifs and could potentially have opposing effects on promoting the splicing reaction. To test this idea, the authors knock down (KD) either SF1 or QK1 in mammalian cells and uncover several exons whose splicing regulation follows the predicted pattern of being promoted for splicing by SF1 and repressed by QK1. Importantly, these have introns enriched in SF1 and QK1 motifs. The authors then focus on one exon in particular with two tandem motifs to study the mechanism of this in greater detail and their results confirm the competition model. Mass spec analysis largely agrees with their proposal; however, it is complicated by the apparently quick transition of SF1-bound complexes to later splicing intermediates. An inspired experiment in yeast shows how QK1 competition could potentially have a detrimental impact on splicing in an orthogonal system. Overall, these results show how splicing regulation can be achieved by competition between a "core" and alternative splicing factor and provide additional insight into the complex process of branch site recognition. The manuscript is exceptionally clear and the figures and data are very logically presented. The work will be valuable to those in the splicing field who are interested in both mechanism and bioinformatics approaches to deconvolve any apparent "splicing code" being used by cells to regulate gene expression.
Strengths:
(1) The main discovery of the manuscript involving evidence for SF1/QK1 competition is quite interesting and important for this field. This evidence has been missing and may change how people think about branch site recognition.
(2) The experiments and the rationale behind them are clearly and logically presented.
(3) The experiments are carried out to a high standard and well-designed controls are included.
(4) The extrapolation of the result to yeast in order to show the potentially devastating consequences of QK1 competition was creative and informative.
Weaknesses:
Overall the weaknesses are relatively minor and involve cases where conclusions could potentially have been strengthened with additional experimentation. For example, pull-down of the U2 snRNP could be strengthened by detection of the snRNA whereas the proteins may themselves interact with these factors in the absence of the snRNA. In addition the discussion is a bit speculative given the data, but compelling nonetheless.
-
Reviewer #3 (Public review):
Summary:
In this manuscript the authors were trying to establish whether competition between the RNA binding proteins SF1 and QKI controlled splicing outcomes. These two proteins have similar binding sites and protein sequences, but SF1 lacks a dimerization motif and seems to bind a single version of the binding sequence. Importantly, these binding sequences correspond to branchpoint consensus sequences, with SF1 binding leading to productive splicing, but QKI binding leading instead to association with paraspeckle proteins. They show that in human cells SF1 generally activates exons and QKI represses, and a large group of the jointly regulated exons (43% of joint targets) are reciprocally controlled by SF1 and QKI. They focus on one of these exons RAI14 that shows this reciprocal pattern of regulation, and has 2 repeats of the binding site that make it a candidate for joint regulation, and confirm regulation within a minigene context. The authors used assembly of proteins within nuclear extracts to explain the effect of QKI versus SF1 binding. Finally the authors show that expression of QKI is lethal in yeast, and causes splicing defects.
How this fits in the field. This study is interesting and provides a conceptual advance by providing a general rule for how SF1 and QKI interact in relation to binding sites, and the relative molecular fates followed, so is very useful. Most of the analysis seems to focus on one example, but the choice of this example was carefully explained in the text. The molecular analysis and global work significantly add to the picture from the previously published paper about NUMB joint regulation by QKI and SF1 (Zong et al, cited in text as reference 50, that looked at SF1 and QKI binding in relation to a duplicated binding site/branchpoint sequence in NUMB).
Strengths:
The data presented are strong and clear. The ideas discussed in this paper are of wide interest, and present a simple model where two binding sites generates a potentially repressive QKI response, whereas exons that have a single upstream sequence are just regulated by SF1. The assembly of splicing complexes on RNAs derived from RAI14 in nuclear extracts, followed by mass spec gave interesting mechanistic insight into what was occurring as a result of QKI versus SF1 binding.
Weaknesses:
The authors have addressed the previous weaknesses of the study, resulting in a much stronger manuscript.
-
Author response:
The following is the authors’ response to the original reviews.
eLife Assessment
This important manuscript provides insights into the competition between Splicing Factor 1 (SF1) and Quaking (QKI) for binding at the ACUAA branch point sequence in a model intron, regulating exon inclusion. The study employs rigorous transcriptomic, proteomic, and reporter assays, with both mammalian cell culture and yeast models. Nevertheless, while the data are convincing, broadening the analysis to additional exons and narrowing the manuscript's title to better align with the experimental scope would strengthen the work.
Public Reviews:
Reviewer #1 (Public review):
In this manuscript, the authors aimed to show that SF1 and QKI compete for the intron branch point sequence ACUAA and provide evidence that QKI represses inclusion when bound to it.
Major strengths of this manuscript include:
(1) Identification of the ACUAA-like motif in exons regulated by QKI and SF1.
(2) The use of the splicing reporter and mutant analysis to show that upstream and downstream ACUAAC elements in intron 10 of RAI14 are required for repressing splicing.
(3) The use of proteomics to identify proteins in C2C12 nuclear extract that bind to the wild-type and mutant sequences.
(4) The yeast studies showing lethality when ectopic Qki5 expression was induced, due to increased mis-splicing of transcripts that contain the ACUAA element.
The authors conclusively show that the ACUAA sequence is bound by QKI and provide strong evidence that this leads to differences in exon inclusion and exclusion. In animal cells, and especially in humans, branchpoint sequences are degenerate but seem to be recognized by specific splicing factors. Although a subset of splicing factors shows tissue-specific expression patterns, most do not, suggesting that yet-to-be-identified mechanisms regulate splicing. This work suggests that an alternate mechanism could be related to the binding affinity of specific RNA binding factors for branchpoint sequences coupled with the level of these different splicing factors in a given cell.
We thank the reviewer for the positive comments.
Reviewer #2 (Public review):
Summary:
In this manuscript, Pereira de Castro and coworkers are studying potential competition between a more standard splicing factor SF1, and an alternative splicing factor called QK1. This is interesting because they bind to overlapping sequence motifs and could potentially have opposing effects on promoting the splicing reaction. To test this idea, the authors KD either SF1 or QK1 in mammalian cells and uncover several exons whose splicing regulation follows the predicted pattern of being promoted for splicing by SF1 and repressed by QK1. Importantly, these have introns enriched in SF1 and QK1 motifs. The authors then focus on one exon in particular with two tandem motifs to study the mechanism of this in greater detail and their results confirm the competition model. Mass spec analysis largely agrees with their proposal; however, it is complicated by the apparently quick transition of SF1-bound complexes to later splicing intermediates. An inspired experiment in yeast shows how QK1 competition could potentially have a detrimental impact on splicing in an orthogonal system. Overall, these results show how splicing regulation can be achieved by competition between a "core" and alternative splicing factor and provide additional insight into the complex process of branch site recognition. The manuscript is exceptionally clear and the figures and data are very logically presented. The work will be valuable to those in the splicing field who are interested in both mechanism and bioinformatics approaches to deconvolve any apparent "splicing code" being used by cells to regulate gene expression. Criticisms are minor and the most important of them stem from overemphasis on parts of the manuscript on the evolutionary angle when evolution itself wasn't analyzed per se.
We thank the reviewer for the positive comments and very clear and fair critical points.
Strengths:
(1) The main discovery of the manuscript involving evidence for SF1/QK1 competition is quite interesting and important for this field. This evidence has been missing and may change how people think about branch site recognition.
(2) The experiments and the rationale behind them are exceptionally clearly and logically presented. This was wonderful!
Thank you so much. We felt the overall flow of the paper and data make for a nice “story” that conveys a relatively easy-to-understand explanation for a complex subject.
(3) The experiments are carried out to a high standard and well-designed controls are included.
(4) The extrapolation of the result to yeast in order to show the potentially devastating consequences of the QK1 competition was very exciting and creative.
We agree this is a very exciting result and finding! Thanks.
Weaknesses:
Overall the weaknesses are relatively minor and involve cases where clarification is necessary, some additional analysis could bolster the arguments, and suggestions for focusing the manuscript on its strengths.
(1) The title (Ancient...evolutionary outcomes), abstract, and some parts of the discussion focus heavily on the evolutionary implications of this work. However, evolutionary analysis was not performed in these studies (e.g., when did QK1 and SF1 proteins arise and/or diverge? How does this line up with branch site motifs and evolution of U2? Any insight from recent work from Scott Roy et al?). I think this aspect either needs to be bolstered with experimental work/data or this should be tamped down in the manuscript. I suggest highlighting the idea expressed in the sentence "A nuanced implication of this model is that loss-of-function...". To me, this is better supported by the data and potentially by some analysis of mutations associated with human disease.
We have revised the title and dampened the evolutionary aspects of the previous version of the manuscript.
(2) One paper that I didn't see cited was that by Tanackovic and Kramer (Mol Biol Cell 2005). This paper is relevant because they KD SF1 and found it nonessential for splicing in vivo. Do their results have implications for those here? How do the results of the KD compare? Could QK1 competition have influenced their findings (or does their work influence the "nuanced implication" model referenced above?)?
This is an interesting point, and thank you for the suggestion. We have now included a brief description of this study in the Introduction of the revised manuscript and do note that the authors measured intron retention of a beta globin reporter and SF3A1, SF3A2, and SF3A3 during SF1 knockdown, but did not detect elevated unspliced RNA in these targets.
(3) Can the authors please provide a citation for the statement "degeneracy is observed to a higher degree in organisms with more alternative splicing"? Does recent evolutionary analysis support this?
We have removed the statement, as it did not add much to the content and I am not sure I can state the concept I was attempting to convey in a simple manner with few citations.
(4) For the data in Figure 3, I was left wondering if NMD was confounding this analysis. Can the authors respond to this and address this concern directly?
We have not measured whether the reporters used in Figure 3 produce protein(s). Presumably, though, all spliced reporter RNA would be degraded equally (the included/skipped isoforms’ “reading frames” are not altered from one another). This would not be the case for unspliced nuclear reporter RNA, however. Given this difference, and that our analysis cannot resolve the subcellular localization of the different reporter species, we have removed the measurement of and subsequent results describing unspliced reporter RNA from Figure 3.
(5) To me, the idea that an engaged U2 snRNP was pulled down in Figure 4F would be stronger if the snRNA was detected. Was that able to be observed by northern or primer extension? Would SF1 be enriched if the U2 snRNA was degraded by RNaseH in the NE?
We did not measure any co-associating RNAs in this experimental approach, but agree that this approach would strengthen the evidence for it.
(6) I'm wondering how additive the effects of QK1 and SF1 are... In Figure 2, if QK1 and SF1 are both knocked down, is the splicing of exon 11 restored to "wt" levels?
This is an interesting question that we were unfortunately unable to address experimentally here.
(7) The first discussion section has two paragraphs that begin "How does competition between SF1..." and "Relatively little is known about how...". I found the discussion and speculation about localization, paraspeckles, and lncRNAs interesting but a bit detracting from the strengths of the manuscript. I would suggest shortening these two paragraphs into a single one.
We have revised the Discussion.
Reviewer #3 (Public review):
Summary:
In this manuscript, the authors were trying to establish whether competition between the RNA-binding proteins SF1 and QKI controlled splicing outcomes. These two proteins have similar binding sites and protein sequences, but SF1 lacks a dimerization motif and seems to bind a single version of the binding sequence. Importantly, these binding sequences correspond to branchpoint consensus sequences, with SF1 binding leading to productive splicing, but QKI binding leading instead to association with paraspeckle proteins. They show that in human cells SF1 generally activates exons and QKI represses, and a large group of the jointly regulated exons (43% of joint targets) are reciprocally controlled by SF1 and QKI. They focus on one of these exons RAI14 that shows this reciprocal pattern of regulation, and has 2 repeats of the binding site that make it a candidate for joint regulation, and confirm regulation within a minigene context. The authors used the assembly of proteins within nuclear extracts to explain the effect of QKI versus SF1 binding. Finally, the authors show that the expression of QKI is lethal in yeast, and causes splicing defects.
How this fits in the field. This study is interesting and provides a conceptual advance by providing a general rule on how SF1 and QKI interact in relation to binding sites, and the relative molecular fates followed, so is very useful. Most of the analysis seems to focus on one example, although the molecular analysis and global work significantly add to the picture from the previously published paper about NUMB joint regulation by QKI and SF1 (Zong et al, cited in text as reference 50, that looked at SF1 and QKI binding in relation to a duplicated binding site/branchpoint sequence in NUMB).
Thank you for the encouraging remarks.
Strengths:
The data presented are strong and clear. The ideas discussed in this paper are of wide interest, and present a simple model where two binding sites generate a potentially repressive QKI response, whereas exons that have a single upstream sequence are just regulated by SF1. The assembly of splicing complexes on RNAs derived from RAI14 in nuclear extracts, followed by mass spec gave interesting mechanistic insight into what was occurring as a result of QKI versus SF1 binding.
Weaknesses:
I did not think the title best summarises the take-home message and could be perhaps a bit more modest. Although the authors investigated splicing patterns in yeast and human cells, yeast do not have QKI so there is no ancient competition in that case, and the study did not really investigate physiological or evolutionary outcomes in splicing, although it provides interesting speculation on them. Also, as I understood it, the important issue was less conserved branchpoints in higher eukaryotes enabling alternative splicing, rather than competition for the conserved branchpoint sequence. So despite the data being strong and properly analysed and discussed in the paper, could the authors think whether they fit best with the take-home message provided in the title? Just as a suggestion (I am sure the authors can do a better job), maybe "molecular competition between variant branchpoint sequences predict physiological and evolutionary outcomes in splicing"?
Thank you for this point (Reviewer 2 had a similar comment) and the suggestion. We have revised the title.
Although the authors do provide some global data, most of the detailed analysis is of RAI14. It would have been useful to examine members of the other quadrants in Figure 1C as well for potential binding sites to give a reason why these are not co-regulated in the same way as RAI14. How many of the RAI14 quadrants had single/double sites (the motif analysis seemed to pull out just one), and could one of the non-reciprocally regulated exons be moved into a different quadrant by addition or subtraction of a binding site or by changing the branchpoint (using a minigene approach, for example)?
This is an interesting point that we have considered. Our intent with the focus on RAI14 was to use a naturally occurring intron bps with evidence of strong QKI binding that did not require a high degree of sequence manipulation or engineering.
Recommendations for the authors:
Reviewer #1 (Recommendations for the authors):
(1) Most of my recommendations are really centered on the figures. In their current state, they detract from the data shown and could be improved: I recommend the authors use a uniform font. For example, Figure 1E and F have at least three different fonts of varying sizes making it very messy. In Figure 1C, the authors could bold the Ral14 ex11 or simply indicate that the blue is this exon in the legend, thus removing the text from this very busy graph. In Figure 4F, I would recommend, having all the labels the same size and putting those genes of interest like Sf3a1 in bold. This could also be done in Figure 4E.
Thank you for the suggestion; we have edited these (FYI, the font in Figs 1E and 1F was from the rMAPS default output, but I agree that it gives a sloppy appearance).
(2) In Figures 4D and 4G, is there QKI binding to the downstream deletion mutant after 30 minutes? Also, in Figure 4G, are these all from the same blot? The band sizes seem to be very different between lanes. If these were not on the same blot, the original gels should be submitted.
A small amount of Qki appears to be binding after 30 min. All lanes/blots are from the same gels/membranes; see new Supplemental Figure 4 for the original (uncropped) images of the blots.
(3) The authors should indicate, the source and concentration of the antibodies used for their WB. They should also indicate the primers used for RT-PCRs.
We have revised the methods to include the antibody information and have uploaded a supplemental table 8 with all oligonucleotide sequences used (which I (Sam Fagg) neglected to do initially, so that’s my bad).
Reviewer #2 (Recommendations for the authors):
(1) This may come down to the author's preference but branch point and branch site are frequently two words, not a single compound word (branch point vs. branchpoint). In addition, the authors may want to use branchsite with the abbreviation BS more frequently since they often don't describe the specific point of branching, and bp and bps could be confused for the more frequent abbreviations for base pair(s).
Good suggestion; we have edited the text accordingly.
(2) In general the addition of page numbers and line numbers to the manuscript would greatly aid reviewers!
Point taken…
(3) Introduction; "...under normal growth conditions they are efficiently spliced". I would say MOST introns in yeast are efficiently spliced. This is definitely not universal.
Text edited to indicate that most are efficiently spliced.
(4) Introduction; " recognition of the bps by SF1 (mammals) (20)". The choice of reference 20 is an odd one here. I think the Robin Reed and Michael Rosbash paper was the first to show SF1 was the human homolog of BBP.
Got it, thanks (added #14 here and kept #20 also since it shows the structure of SF1 in complex with a UACUAAC bps.)
(5) Results; "QK1 and SF1 co-regulate.."; it may be useful for the reader if you could explain in more detail why exon inclusion and intron retention are expected outcomes for QK1 knockdown and vice versa for SF1. The exon inclusion here is more obvious than the intron retention phenotype. (In other words, if more exons are included shouldn't it follow that more introns are removed?)
We explain the expected results for exon inclusion in the Introduction and this paragraph of the Results. Although we have observed more intron retention under QKI loss-of-function approaches before, I am uncertain where the reviewer sees that we indicate any expected result for intron retention from either QKI or SF1 knockdown. I believe the statement you refer to might be on line 162 and starts with: “Consistent with potentially opposing functions in splicing…” ?
Also, I agree that if SF1 is a “splicing activator,” one might expect more IR in its absence (but this is not the case; there is, in fact, less), but nonetheless, the opposite outcome is observed with QKI knockdown (more IR). It is unclear why this is the case, and we did not investigate it.
(6) Results; "QK1 and SF1 co-regulate.."; "Thus the most highly represented set.." To me, the most highly represented set is those which are not both QK1-repressed and SF1-activated. Does this indicate that other factors are involved at most sites than simple competition between these two?
We have revised the sentence in question to include the text “by quadrant” in order to convey our meaning more precisely.
(7) Throughout the manuscript, 5 apostrophes and 3 apostrophes are used instead of 5 prime symbols and 3 prime symbols.
Thank you for pointing that out. We have fixed each instance of this.
(8) Sometimes SF1 is written as Sf1. (also Tatsf1)
This was a mouse/human gene/protein nomenclature error that we have fixed; thank you for pointing this out.
(9) You may want to make sure that figures are labeled consistently with the manuscript text. In Figure 1B, it is RI rather than IR. In Figure 4 it is myoblast NE rather than C2C12 nuclear extract.
We have fixed these, checked for other examples, and where relevant, edited those too.
(10) I think Figure 1A could be improved by also including a depiction of the domain arrangements of SF1 and QK1.
Done.
(11) I was a bit confused with all the lines in Figure 1E and 1F. What is the difference between the log (pVal) and upregulated plots? Can these figures be simplified or explained more thoroughly?
Based on this comment and one from Reviewer 1, we have slightly revised the wording (and font) on the output, which hopefully clarifies. These are motif enrichment plots generated by rMAPS (Refs 61 and 62) analysis of rMATS (Ref 60) data for exons more included (depicted by the red lines) or more skipped (depicted by the blue lines) compared to control versus a “background” set of exons that are detectable but unchanged. The -log10 P-value (dotted line) indicates the significance of exons more included in shRNA treatment vs control shRNA (previously read “upregulated”) compared to background exons that are detectable but unchanged; the solid lines indicate the motif score; these are described in the references indicated.
(12) Figure 1B, it is a bit hard to conclude that there is more AltEx or "RI/IR" in one sample vs. the other from these plots since the points overlay one another. Can you include numbers here?
Added (and deleted Suppl Fig S1, which was simply a chart showing the numbers).
(13) How was PSI calculated in Figure 2A?
VAST-tools (we state this in the legend in the revised version).
You may want to include rel protein (or the lower limit of detection) for Figure 2B to be consistent with 2C. Why is KD of SF1 so poor and variable between 2C and 2D?
We have not investigated this, but these blots show an optimized result that we were able to obtain for the knockdown in each cell type. It may be that HEK293 cells (Fig 2B) have a stronger requirement for SF1 than C2C12 cells…? I would argue that it is not necessarily “poor” in Fig 2C, as we observe ~70% depletion of the protein.
Why are two bands present in the gel?
Two to three isoforms of SF1 are present in most cell types.
A good (or bad, really) example of an SF1 western blot (and knockdown of ~35% in K562 or ~45% in HepG2) can also be seen on the ENCODE project website, for reference:
By comparison, I think ours are much more cosmetically pleasing, and our knockdown (especially in C2C12) is much more efficient.
(14) Figure 3, The asterisk refers to a cryptic product. Can the uaAcuuuCAG be used as a branch point? Presumably the natural 3' SS is now too close so this would result in activation of a downstream 3'SS?
We did not pursue determining the identity of this minor and likely artefactual product, but we (and others) have observed a similar phenomenon when using splicing reporter-based mutational approaches.
(15) For the methods. The "RNA extraction, RT -PCR,..." subheading needs to be on its own line. Please add (w/v) or (v/v) to percentages where appropriate. Please convert ug to the symbol for "micro".
Thank you, we have made these changes.
(16) In Figure 4B, the text here and legend are microscopic. Even with reading glasses, I couldn't make anything out!
We have increased the font sizes for the text and scale bar…when referring to “legend” does the reviewer mean the scale bar?
(17) As a potential discussion item, it is worth noting that SF1 could also repress splicing if it could either not engage with U2AF or be properly displaced by U2 snRNP so the snRNA could pair. I was wondering if QK1 could similarly be activating if it could engage with U2AF. I'm unsure if this could be tested by domain swaps (and is beyond the scope of this paper). It just may be worth speculating about.
Good point and suggestion…we are looking into this.
Reviewer #3 (Recommendations for the authors):
(1) Is the reference in the text to Figure 5F correct for actin splicing (this is just before the discussion)?
I see references several lines up from this, but I do not see a reference just before the discussion…?
(2) I was not sure why the minigene experiments showed such high levels of intron retention that seemed to be impacted also by deletion of the branchpoint sequences, and suggest that the two branchpoints are not equal in strength.
Neither were we, but Reviewer 2 has suggested that degradation of the spliced products could be rapid (NMD substrates) which could complicate the interpretation of what appears to be higher levels of intron retention. Given the possibility that this could be a non-physiological artefact, we have removed the measurement of unspliced reporter and now only show the spliced products (equally subject to degradation) and report their percent inclusion.
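For readers unfamiliar with the percent-inclusion measure discussed in the responses above (PSI in Figure 2A, percent inclusion of spliced reporter products in Figure 3), the following is a minimal sketch of the generic definition only. It is not the VAST-tools implementation, and the function names and the example counts are hypothetical. It assumes that a single count (reads or band intensity) is available for the exon-included isoform and for the exon-skipped isoform.

```python
def percent_spliced_in(inclusion_signal: float, skipping_signal: float) -> float:
    """Return PSI on a 0-100 scale.

    inclusion_signal: reads (or band intensity) supporting the exon-included isoform.
    skipping_signal:  reads (or band intensity) supporting the exon-skipped isoform.
    Assumption: unspliced RNA is ignored, so only spliced products are compared.
    """
    total = inclusion_signal + skipping_signal
    if total <= 0:
        raise ValueError("no signal supporting either spliced isoform")
    return 100.0 * inclusion_signal / total


def delta_psi(ctrl_inc: float, ctrl_skip: float,
              trt_inc: float, trt_skip: float) -> float:
    """Change in PSI (treatment minus control); negative means more skipping."""
    return percent_spliced_in(trt_inc, trt_skip) - percent_spliced_in(ctrl_inc, ctrl_skip)


if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the arithmetic.
    print(percent_spliced_in(30, 10))
    print(delta_psi(30, 10, 10, 30))
```

Because both spliced isoforms appear in the numerator and denominator, a degradation rate that affects them equally would cancel out, which is consistent with the authors' rationale for reporting only the spliced products.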
eLife Assessment
The study presents convincing quantitative evidence, supported by appropriate negative controls, for the presence of low-abundance glycine receptors (GlyRs) within inhibitory synapses in telencephalic regions of the mouse brain. Using sensitive single-molecule localization microscopy of endogenously tagged GlyRs, the authors reveal previously undetected populations of these receptors. Although the functional significance of these low-abundance GlyRs remains to be established, the findings offer valuable insights and methodologies that will be of interest to neuroscientists studying inhibitory synapse biology.
[Editors' note: this paper was reviewed by Review Commons.]
-
Reviewer #1 (Public review):
Summary:
In this manuscript, the authors investigate the nanoscopic distribution of glycine receptor subunits in the hippocampus, dorsal striatum, and ventral striatum of the mouse brain using single-molecule localization microscopy (SMLM). They demonstrate that only a small number of glycine receptors are localized at hippocampal inhibitory synapses. Using dual-color SMLM, they further show that clusters of glycine receptors are predominantly localized within gephyrin-positive synapses. A comparison between the dorsal and ventral striatum reveals that the ventral striatum contains approximately eight times more glycine receptors and this finding is consistent with electrophysiological data on postsynaptic inhibitory currents. Finally, using cultured hippocampal neurons, they examine the differential synaptic localization of glycine receptor subunits (α1, α2, and β). This study is significant as it provides insights into the nanoscopic localization patterns of glycine receptors in brain regions where this protein is expressed at low levels. Additionally, the study demonstrates the different localization patterns of GlyR in distinct striatal regions and its physiological relevance using SMLM and electrophysiological experiments. However, several concerns should be addressed.
Specific comments on the original version:
(1) Colocalization analysis in Figure 1A. The colocalization between Sylite and mEos-GlyRβ appears to be quite low. It is essential to assess whether the observed colocalization is not due to random overlap. The authors should consider quantifying colocalization using statistical methods, such as a pixel shift analysis, to determine whether colocalization frequencies remain similar after artificially displacing one of the channels.
(2) Inconsistency between Figure 3A and 3B. While Figure 3B indicates an ~8-fold difference in the number of mEos4b-GlyRβ detections per synapse between the dorsal and ventral striatum, Figure 3A does not appear to show a pronounced difference in the localization of mEos4b-GlyRβ on Sylite puncta between these two regions. If the images presented in Figure 3A are not representative, the authors should consider replacing them with more representative examples or providing expanded images with multiple representative examples. Alternatively, if this inconsistency can be explained by differences in spot density within clusters, the authors should explain that.
(3) Quantification in Figure 5. It is recommended that the authors provide quantitative data on cluster formation and colocalization with Sylite puncta in Figure 5 to support their qualitative observations.
(4) Potential for pseudo-replication. It's not clear whether they're performing stats tests across biological replicates, images, or even synapses. They often quote mean +/- SEM with n = 1000s, and so does that mean they're doing tests on those 1000s? Need to clarify.
(5) Does mEos affect expression levels or function of the protein? Can't see any experiments done to confirm this. Could suggest WB on homogenate, or mass spec?
(6) Quantification of protein numbers is challenging with SMLM. Issues include i) some of FP not correctly folded/mature, and ii) dependence of localisation rate on instrument, excitation/illumination intensities, and also the thresholds used in analysis. Can the authors compare with another protein that has known expression levels- e.g. PSD95? This is quite an ask, but if they could show copy number of something known to compare with, it would be useful.
(7) Rationale for doing nanobody dSTORM not clear at all. They don't explain the reason for doing the dSTORM experiments. Why not just rely on PALM for coincidence measurements, rather than tagging mEoS with a nanobody, and then doing dSTORM with that? Can they explain? Is it to get extra localisations- i.e. multiple per nanobody? If so, localising same FP multiple times wouldn't improve resolution. Also, no controls for nanobody dSTORM experiments- what about non-spec nb, or use on WT sections?
(8) What resolutions/precisions were obtained in SMLM experiments? Should perform Fourier Ring Correlation (FRC) on SR images to state resolutions obtained (particularly useful for when they're presenting distance histograms, as this will be dependent on resolution). Likewise for precision, what was mean precision? Can they show histograms of localisation precision.
(9) Why were DBSCAN parameters selected? How can they rule out multiple localisations per fluor? If low copy numbers (<10), then why bother with DBSCAN? Could just measure distance to each one.
(10) For microscopy experiment methods, state power densities, not % or "nominal power".
(11) In general, not much data presented. Any SI file with extra images etc.?
(12) Clarification of the discussion on GlyR expression and synaptic localization: The discussion on GlyR expression, complex formation, and synaptic localization is sometimes unclear, and needs terminological distinctions between "expression level", "complex formation" and "synaptic localization". For example, the authors state: "What then is the reason for the low protein expression of GlyRβ? One possibility is that the assembly of mature heteropentameric GlyR complexes depends critically on the expression of endogenous GlyR α subunits." Does this mean that GlyRβ proteins that fail to form complexes with GlyRα subunits are unstable and subject to rapid degradation? If so, the authors should clarify this point. The statement "This raises the interesting possibility that synaptic GlyRs may depend specifically on the concomitant expression of both α1 and β transcripts." suggests a dependency on α1 and β transcripts. However, is the authors' focus on synaptic localization or overall protein expression levels? If this means synaptic localization, it would be beneficial to state this explicitly to avoid confusion. To improve clarity, the authors should carefully distinguish between these different aspects of GlyR biology throughout the discussion. Additionally, a schematic diagram illustrating these processes would be highly beneficial for readers.
(13) Interpretation of GlyR localization in the context of nanodomains. The distribution of GlyR molecules on inhibitory synapses appears to be non-homogeneous, instead forming nanoclusters or nanodomains, similar to many other synaptic proteins. It is important to interpret GlyR localization in the context of nanodomain organization.
Significance:
The paper presents biological and technical advances. The biological insights revolve mostly on the documentation of Glycine receptors in particular synapses in forebrain, where they are typically expressed at very low levels. The authors provide compelling data indicating that the expression is of physiological significance. The authors have done a nice job of combining genetically tagged mice with advanced microscopy methods to tackle the question of distributions of synaptic proteins. Overall, these advances are more incremental than groundbreaking.
Comments on revised version:
The authors have addressed the majority of the significant issues raised in the review and revised the manuscript appropriately. One issue that can be further addressed relates to the issue of pseudo-replication. The authors state in their response that "All experiments were repeated at least twice to ensure reproducibility (N independent experiments). Statistical tests were performed on pooled data across the biological replicates; n denotes the number of data points used for testing (e.g., number of synaptic clusters, detections, cells, as specified in each case).". This suggests that they're not doing their stats on biological replicates, and instead are pseudo replicating. It's not clear how they have ensured reproducibility, when the stats seem to have been done on pooled data across repeats.
-
Reviewer #2 (Public review):
Summary:
In their manuscript "Single molecule counting detects low-copy glycine receptors in hippocampal and striatal synapses" Camuso and colleagues apply single molecule localization microscopy (SMLM) methods to visualize low copy numbers of GlyRs at inhibitory synapses in the hippocampal formation and the striatum. SMLM analysis revealed higher copy numbers in striatum compared to hippocampal inhibitory synapses. They further provide evidence that these low copy numbers are tightly linked to post-synaptic scaffolding protein gephyrin at inhibitory synapses. Their approach profits from the high detection sensitivity and resolution of SMLM and challenges the controversial view on the presence of GlyRs in these formations although there are reports (electrophysiology) on the presence of GlyRs in these particular brain regions. These new datasets in the current manuscript may certainly assist in understanding the complexity of fundamental building blocks of inhibitory synapses.
Strengths:
The manuscript provides new insights into the presence of low-copy-number GlyRs by visualizing them via SMLM. This is the first report that visualizes GlyRs optically in the brain using the knock-in model of mEos4b-tagged GlyRβ and quantifies their copy number, comparing the distribution and amount of GlyRs in hippocampus and striatum. Imaging data correspond well to electrophysiological measurements in the manuscript.
Comments on revised version:
My concerns have been successfully addressed by the authors during the revision process.
-
Reviewer #3 (Public review):
In this study, Camuso et al., make use of a knock-in mouse model expressing endogenously mEos4b-tagged GlyRβ subunits to detect endogenous glycine receptors in mouse brain using single-molecule localization microscopy (SMLM). At synapses in the hippocampus GlyRβ molecules are detected at very low copy numbers. Assuming that each detected GlyRβ molecule is incorporated in a pentameric glycine receptor, it was estimated that while the majority of hippocampal inhibitory synapses do not contain glycine receptors, a small population of inhibitory synapses contain a few (up to 10) glycine receptors. Using dual-color SMLM approaches it is furthermore confirmed that the detected GlyRβ molecules are embedded in the postsynaptic domain marked by gephyrin. In contrast to the hippocampus, at inhibitory synapses in the striatum GlyRβ molecules were detected at considerably higher copy numbers. Interestingly, the observed number of GlyRβ detections was significantly higher in the ventral striatum compared to the dorsal striatum. These findings are corroborated by electrophysiological recordings showing that postsynaptic glycinergic currents can be readily detected in the ventral striatum but are almost absent in the dorsal striatum. Using lentiviral overexpression of recombinant GlyRalpha1, alpha2, and beta subunits in cultured hippocampal neurons, it is shown that GlyR alpha1 subunits are readily detectable at synapses, but overexpressed GlyRalpha2 and beta subunits did not strongly enrich at synapses. This could indicate that GlyRa1 expression is limiting the synaptic expression of GlyRβ-containing glycine receptors in hippocampal neurons.
Comments on revised version:
This revised manuscript is significantly improved. New experiments and quantitative analyses are presented that strengthen the conclusions. Overall, the results presented in this manuscript are based on carefully performed SMLM experiments and are well-presented and described. The knock-in mouse with endogenously tagged GlyRβ molecules is a very strong aspect of this study and provides confidence in the labeling; the combination with SMLM is very strong as it provides high sensitivity and spatial resolution. These results confirm previous studies and will be of interest to a specialised audience interested in glycine receptors, inhibitory synapse biology and super-resolution microscopy.
-
Author response:
The following is the authors’ response to the current reviews.
We thank the editors of eLife and the reviewers for their thorough evaluation of our study. As regards the final comments of reviewer 1, please note that all experimental replicates were first analyzed separately and were then pooled, since the observed changes were comparable between experiments. This means that statistical analyses were done on pooled biological replicates.
The following is the authors’ response to the original reviews.
General Statements
We thank the reviewers for their thorough and constructive evaluation of our work. We have revised the manuscript carefully and addressed all the criticisms raised, in particular the issues mentioned by several of the reviewers (see point-by-point response below). We have also added a number of explanations in the text for the sake of clarity, while trying to keep the manuscript as concise as possible.
In our view, the novelty of our research is two-fold. From a neurobiological point of view, we provide conclusive evidence for the existence of glycine receptors (GlyRs) at inhibitory synapses in various brain regions including the hippocampus, dentate gyrus and sub-regions of the striatum. This solves several open questions and has fundamental implications for our understanding of the organisation and function of inhibitory synapses in the telencephalon. Secondly, our study makes use of the unique sensitivity of single molecule localisation microscopy (SMLM) to identify low protein copy numbers. This is a new way to think about SMLM as it goes beyond a mere structural characterisation and towards a quantitative assessment of synaptic protein assemblies.
Point-by-point description of the revisions
Reviewer #1 (Evidence, reproducibility and clarity):
In this manuscript, the authors investigate the nanoscopic distribution of glycine receptor subunits in the hippocampus, dorsal striatum, and ventral striatum of the mouse brain using single-molecule localization microscopy (SMLM). They demonstrate that only a small number of glycine receptors are localized at hippocampal inhibitory synapses. Using dual-color SMLM, they further show that clusters of glycine receptors are predominantly localized within gephyrin-positive synapses. A comparison between the dorsal and ventral striatum reveals that the ventral striatum contains approximately eight times more glycine receptors and this finding is consistent with electrophysiological data on postsynaptic inhibitory currents. Finally, using cultured hippocampal neurons, they examine the differential synaptic localization of glycine receptor subunits (α1, α2, and β). This study is significant as it provides insights into the nanoscopic localization patterns of glycine receptors in brain regions where this protein is expressed at low levels. Additionally, the study demonstrates the different localization patterns of GlyR in distinct striatal regions and its physiological relevance using SMLM and electrophysiological experiments. However, several concerns should be addressed.
The following are specific comments:
(1) Colocalization analysis in Figure 1A. The colocalization between Sylite and mEos-GlyRβ appears to be quite low. It is essential to assess whether the observed colocalization is not due to random overlap. The authors should consider quantifying colocalization using statistical methods, such as a pixel shift analysis, to determine whether colocalization frequencies remain similar after artificially displacing one of the channels.
Following the suggestion of reviewer 1, we re-analysed CA3 images of Glrb<sup>eos/eos</sup> hippocampal slices by applying a pixel-shift type of control, in which the Sylite channel (in far red) was horizontally flipped relative to the mEos4b-GlyRβ channel (in green, see Methods). As expected, the number of mEos4b-GlyRβ detections per gephyrin cluster was markedly reduced compared to the original analysis (revised Fig. 1B), confirming that the synaptic mEos4b detections exceed chance levels (see page 5).
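The logic of this control can be illustrated with a minimal sketch. It is not the authors' analysis pipeline: the mask, the coordinates and the function names are hypothetical, and the only idea taken from the response is that flipping one channel horizontally removes any true spatial correspondence, so that the remaining overlap approximates chance.

```python
def detections_on_clusters(detections, mask):
    """Count detections (x, y pixel coordinates) that land on cluster pixels of a binary mask."""
    height, width = len(mask), len(mask[0])
    return sum(
        1
        for x, y in detections
        if 0 <= x < width and 0 <= y < height and mask[y][x]
    )


def flip_horizontally(mask):
    """Mirror the mask left to right, as in the flipped-channel control."""
    return [row[::-1] for row in mask]


# Hypothetical 20 x 20 pixel field with one gephyrin cluster on the left side.
mask = [[1 if (2 <= x <= 5 and 8 <= y <= 11) else 0 for x in range(20)] for y in range(20)]

# Hypothetical detections: five inside the cluster, four scattered background detections.
synaptic = [(2, 8), (3, 9), (4, 10), (5, 11), (3, 10)]
background = [(0, 0), (19, 19), (15, 9), (10, 3)]
detections = synaptic + background

observed = detections_on_clusters(detections, mask)
control = detections_on_clusters(detections, flip_horizontally(mask))
print(observed, control)
```

In this toy field the observed count exceeds the flipped control, which is the pattern that would indicate synaptic enrichment above random overlap.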
(2) Inconsistency between Figure 3A and 3B. While Figure 3B indicates an ~8-fold difference in the number of mEos4b-GlyRβ detections per synapse between the dorsal and ventral striatum, Figure 3A does not appear to show a pronounced difference in the localization of mEos4b-GlyRβ on Sylite puncta between these two regions. If the images presented in Figure 3A are not representative, the authors should consider replacing them with more representative examples or providing expanded images with multiple representative examples. Alternatively, if this inconsistency can be explained by differences in spot density within clusters, the authors should explain that.
The pointillist images in Fig. 3A are essentially binary (red-black). Therefore, the density of detections at synapses cannot be easily judged by eye. For clarity, the original images in Fig. 3A have been replaced with two other examples that better reflect the different detection numbers in the dorsal and ventral striatum.
(3) Quantification in Figure 5. It is recommended that the authors provide quantitative data on cluster formation and colocalization with Sylite puncta in Figure 5 to support their qualitative observations.
This is an important point that was also raised by the other reviewers. We have performed additional experiments to increase the data volume for analysis. For quantification, we used two approaches. First, we counted the percentage of infected cells in which synaptic localisation of the recombinant receptor subunit was observed (Fig. 5C). We found that mEos4b-GlyRa1 consistently localises at synapses, indicating that all cells express endogenous GlyRb. When neurons were infected with mEos4b-GlyRb, fewer cells had synaptic clusters, meaning that indeed, GlyR alpha subunits are the limiting factor for synaptic targeting. In cultures infected with mEos4b-GlyRa2, only very few neurons displayed synaptic localisation (as judged by epifluorescence imaging). We think this shows that GlyRa2 is less capable of forming heteromeric complexes than GlyRa1, in line with our previous interpretation (see pp. 9-10, 13).
Secondly, we quantified the total intensity of each subunit at gephyrin-positive domains, both in infected neurons as well as non-infected control cultures (Fig. 5D). We observed that mEos4b-GlyRa1 intensity at gephyrin puncta was higher than that of the other subunits, again pointing to efficient synaptic targeting of GlyRa1. Gephyrin cluster intensities (Sylite labelling) were not significantly different in GlyRb and GlyRa2 expressing neurons compared to the uninfected control, indicating that the lentiviral expression of recombinant subunits does not fundamentally alter the size of mixed inhibitory synapses in hippocampal neurons. Interestingly, gephyrin levels were slightly higher in hippocampal neurons expressing mEos4b-GlyRa1. In our view, this comes from an enhanced expression and synaptic targeting of mEos4b-GlyRa1 heteromers with endogenous GlyRb, pointing to a structural role of GlyRa1/b in hippocampal synapses (pp. 10, 13).
The new data and analyses have been described and illustrated in the relevant sections of the manuscript.
(4) Potential for pseudo-replication. It's not clear whether they're performing stats tests across biological replicates, images, or even synapses. They often quote mean +/- SEM with n = 1000s, and so does that mean they're doing tests on those 1000s? Need to clarify.
All experiments were repeated at least twice to ensure reproducibility (N independent experiments). Statistical tests were performed on pooled data across the biological replicates; n denotes the number of data points used for testing (e.g., number of synaptic clusters, detections, cells, as specified in each case). We have systematically given these numbers in the revised manuscript (n, N, and other experimental parameters such as the number of animals used, coverslips, images or cells). Data are generally given as mean +/- SEM or as mean +/- SD as indicated.
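The distinction between n (data points) and N (independent experiments) that is at stake here can be made concrete with a small sketch. The numbers below are invented for illustration and are not taken from the manuscript; the sketch only shows why an SEM computed on pooled synapses is typically much smaller than an SEM computed on per-experiment means.

```python
from math import sqrt
from statistics import mean, stdev


def sem(values):
    """Standard error of the mean (sample standard deviation divided by sqrt of the count)."""
    return stdev(values) / sqrt(len(values))


# Hypothetical detections per synapse, grouped by independent experiment (N = 3).
experiments = {
    "exp1": [2, 3, 1, 4, 2, 3],
    "exp2": [5, 6, 4, 5, 6, 4],
    "exp3": [3, 3, 4, 2, 3, 3],
}

# Pooled analysis: every synapse is treated as an independent data point (n = 18).
pooled = [value for values in experiments.values() for value in values]
pooled_mean, pooled_sem = mean(pooled), sem(pooled)

# Replicate-level analysis: one summary value per experiment (N = 3).
replicate_means = [mean(values) for values in experiments.values()]
replicate_mean, replicate_sem = mean(replicate_means), sem(replicate_means)

print(f"pooled:     {pooled_mean:.2f} +/- {pooled_sem:.2f} (n = {len(pooled)})")
print(f"replicates: {replicate_mean:.2f} +/- {replicate_sem:.2f} (N = {len(replicate_means)})")
```

With equal group sizes the two means coincide, but the pooled SEM understates the variability between experiments, which is the concern behind the pseudo-replication comment.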
(5) Does mEos affect expression levels or function of the protein? Can't see any experiments done to confirm this. Could suggest WB on homogenate, or mass spec?
The Glrb<sup>eos/eos</sup> knock-in mouse line has been characterised previously and does not display any ultrastructural or functional deficits at inhibitory synapses (Maynard et al. 2021 eLife). GlyRβ expression and glycine-evoked responses were not significantly different to those of the wildtype. The synaptic localisation of mEos4b-GlyRb in KI animals demonstrates correct assembly of heteromeric GlyRs and synaptic targeting. Accordingly, the animals do not display any obvious phenotype. We have clarified this in the manuscript (p. 4). In the case of cultured neurons, long-term expression of fluorescent receptor subunits with lentivirus has proven ideal to achieve efficient synaptic targeting. The low and continuous supply of recombinant receptors ensures assembly with endogenous subunits to form heteropentameric receptor complexes (e.g. [Patrizio et al. 2017 Sci Rep]). In the present study, lentivirus infection did not induce any obvious differences in the number or size of inhibitory synapses compared to control neurons, as judged by Sylite labelling of synaptic gephyrin puncta (new Fig. 5D).
(6) Quantification of protein numbers is challenging with SMLM. Issues include i) some of FP not correctly folded/mature, and ii) dependence of localisation rate on instrument, excitation/illumination intensities, and also the thresholds used in analysis. Can the authors compare with another protein that has known expression levels- e.g. PSD95? This is quite an ask, but if they could show copy number of something known to compare with, it would be useful.
We agree that absolute quantification with SMLM is challenging, since the number of detections depends on fluorophore maturation, photophysics, imaging conditions, and analysis thresholds (discussed in Patrizio & Specht 2016, Neurophotonics). For this reason, only very few datasets provide reliable copy numbers, even for well-studied proteins such as PSD-95. One notable exception is the study by Maynard et al. (eLife 2021) that quantified endogenous GlyRβ-containing receptors in spinal cord synapses using SMLM combined with correlative electron microscopy. The strength of this work was the use of a KI mouse strain, which ensures that mEos4b-GlyRβ expression follows intrinsic regional and temporal profiles. The authors reported a stereotypic density of ~2,000 GlyRs/µm² at synapses, corresponding to ~120 receptors per synapse in the dorsal horn and ~240 in the ventral horn, taking into account various parameters including receptor stoichiometry and the functionality of the fluorophore. These values are very close to our own calculations of GlyR numbers at spinal cord synapses that were obtained slightly differently in terms of sample preparation, microscope setup, imaging conditions, and data analysis, lending support to our experimental approach. Nevertheless, the obtained GlyR copy numbers at hippocampal synapses clearly have to be taken as estimates rather than precise figures, because the number of detections from a single mEos4b fluorophore can vary substantially, meaning that the fluorophores are not represented equally in pointillist images. This can affect the copy number calculation for a specific synapse, in particular when the numbers are low (e.g. in hippocampus); however, it should not alter the average number of detections (Fig. 1B) or the (median) molecule numbers of the entire population of synapses (Fig. 1C). We have discussed the limitations of our approach (p. 11).
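As a worked illustration of how the quoted spinal cord figures relate to each other, the sketch below divides the approximate receptor numbers by the approximate density to obtain the synaptic area they imply. The resulting areas are simple arithmetic on the rounded values quoted above, not measurements reported in the response, and the function name is hypothetical.

```python
def implied_area_um2(receptors_per_synapse, density_per_um2):
    """Synaptic area (in µm²) implied by a receptor count at a given packing density."""
    return receptors_per_synapse / density_per_um2


DENSITY_PER_UM2 = 2000  # ~2,000 GlyRs/µm², as quoted above

dorsal_horn_area = implied_area_um2(120, DENSITY_PER_UM2)   # ~120 receptors per synapse
ventral_horn_area = implied_area_um2(240, DENSITY_PER_UM2)  # ~240 receptors per synapse

print(f"dorsal horn:  ~{dorsal_horn_area:.2f} µm²")
print(f"ventral horn: ~{ventral_horn_area:.2f} µm²")
```

Under a constant density, the twofold difference in receptor number corresponds to a twofold difference in implied synaptic area.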
(7) Rationale for doing nanobody dSTORM not clear at all. They don't explain the reason for doing the dSTORM experiments. Why not just rely on PALM for coincidence measurements, rather than tagging mEoS with a nanobody, and then doing dSTORM with that? Can they explain? Is it to get extra localisations- i.e. multiple per nanobody? If so, localising same FP multiple times wouldn't improve resolution. Also, no controls for nanobody dSTORM experiments- what about non-spec nb, or use on WT sections?
As discussed above (point 6), the detection of fluorophores with SMLM is influenced by many parameters, not least the noise produced by emitting molecules other than the fluorophore used for labelling. Our study is exceptional in that it attempts to identify extremely low molecule numbers (down to 1). To verify that the detections obtained with PALM correspond to mEos4b, we conducted robust control experiments (including pixel-shift as suggested by the reviewer, see point 1, revised Fig. 1B). The rationale for the nanobody-based dSTORM experiments was twofold: (1) to have an independent readout of the presence of low-copy GlyRs at inhibitory synapses and (2) to analyse the nanoscale organisation of GlyRs relative to the synaptic gephyrin scaffold using dual-colour dSTORM with spectral demixing (see p. 6). The organic fluorophores used in dSTORM (AF647, CF680) ensure high photon counts, essential for reliable co-localisation and distance analysis. PALM and dSTORM cannot be combined in dual-colour mode, as they require different buffers and imaging conditions.
The specificity of the anti-Eos nanobody was demonstrated by immunohistochemistry in spinal cord cultures expressing mEos4b-GlyRb and wildtype control tissue (Fig. S3). In response to the reviewer's remarks, we also performed a negative control experiment in Glrb<sup>eos/eos</sup> slices (dSTORM), in which the nanobody was omitted (new Fig. S4F,G). Under these conditions, spectral demixing produced a single peak corresponding to CF680 (gephyrin) without any AF647 contribution (Fig. S4F). The number of "false" AF647 background detections at synapses was significantly lower than in the slices labelled with the nanobody. We conclude that the fluorescence signal observed in our dual-colour dSTORM experiments arises from the specific detection of mEos4b-GlyRb by the nanobody, rather than from background, cross-reactivity or wrong attribution of colour during spectral demixing. We have added these data and explanations in the results (p. 7) and in the figure legend of Fig. S4F,G.
(8) What resolutions/precisions were obtained in SMLM experiments? Should perform Fourier Ring Correlation (FRC) on SR images to state resolutions obtained (particularly useful for when they're presenting distance histograms, as this will be dependent on resolution). Likewise for precision, what was mean precision? Can they show histograms of localisation precision.
This is an interesting question in the context of our experiments with low-copy GlyRs, since the spatial resolution of SMLM is limited also by the density of molecules, i.e. the sampling of the structure in question (Nyquist-Shannon criterion). Accordingly, the priority of the PALM experiments was to improve the sensitivity of SMLM for the identification of mEos4b-GlyRb subunits, rather than to maximize the spatial resolution. The mean localisation precision in PALM was 33 +/- 12 nm, as calculated from the fitting parameters of each detection (Zeiss, ZEN software), which ultimately result from their signal-to-noise ratio. This is a relatively low precision for SMLM, which can be explained by the low brightness of mEos4b compared to organic fluorophores together with the elevated fluorescence background in tissue slices.
In the case of dSTORM, the aim was to study the relative distribution of GlyRs within the synaptic scaffold, for which a higher localisation precision was required (p. 6). Therefore, detections with a precision ≥ 25 nm were filtered during analysis with NEO software (Abbelight). The retained detections had a mean localisation precision of 12 +/- 5 nm for CF680 (Sylite) and 11 +/- 4 nm for AF647 (nanobody). These values are given in the revised manuscript (pp. 18, 22).
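The filtering step described here can be sketched as follows. This is only an illustration of a precision cutoff, not the NEO software workflow: the detection records, the field name `precision_nm` and the values are hypothetical, and only the 25 nm threshold comes from the response.

```python
from statistics import mean, stdev

PRECISION_CUTOFF_NM = 25.0  # detections with a precision >= 25 nm are discarded


def filter_by_precision(detections, cutoff_nm=PRECISION_CUTOFF_NM):
    """Keep only detections whose localisation precision is better (smaller) than the cutoff."""
    return [d for d in detections if d["precision_nm"] < cutoff_nm]


# Hypothetical detection table (coordinates and precision in nm).
detections = [
    {"x_nm": 100.0, "y_nm": 220.0, "precision_nm": 8.0},
    {"x_nm": 105.0, "y_nm": 218.0, "precision_nm": 12.0},
    {"x_nm": 400.0, "y_nm": 90.0, "precision_nm": 30.0},
    {"x_nm": 110.0, "y_nm": 225.0, "precision_nm": 25.0},
    {"x_nm": 98.0, "y_nm": 230.0, "precision_nm": 16.0},
    {"x_nm": 700.0, "y_nm": 640.0, "precision_nm": 41.0},
]

retained = filter_by_precision(detections)
precisions = [d["precision_nm"] for d in retained]
mean_precision, sd_precision = mean(precisions), stdev(precisions)

print(f"{len(retained)} of {len(detections)} detections retained")
print(f"mean precision: {mean_precision:.0f} +/- {sd_precision:.0f} nm")
```

Reporting the mean and spread of the retained precisions, as the authors do, makes the effect of the cutoff on the final images explicit.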
(9) Why were DBSCAN parameters selected? How can they rule out multiple localisations per fluor? If low copy numbers (<10), then why bother with DBSCAN? Could just measure distance to each one.
Multiple detections of the same fluorophore are intrinsic to dSTORM imaging and have not been eliminated from the analysis. Small clusters of detections likely represent individual molecules (e.g. single receptors in the extrasynaptic regions, Fig. 2A). DBSCAN is a robust clustering method that is quite insensitive to minor changes in the choice of parameters. For dSTORM of synaptic gephyrin clusters (CF680), a relatively low length (80 nm radius) together with a high number of detections (≥ 50 neighbours) were chosen to reconstruct the postsynaptic domain with high spatial resolution (see point 8). In the case of the GlyR (nanobody-AF647), the clustering was done mostly for practical reasons, as it provided the coordinates of the centre of mass of the detections. The low stringency of this clustering (200 nm radius, ≥ 5 neighbours) effectively filters single detections that can result from background noise or incorrect demixing. An additional reference explaining the use of DBSCAN including the choice of parameters is given on p. 22 (see also R2 point 4).
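To make the role of the two parameters concrete, here is a minimal pure-Python DBSCAN sketch followed by a centre-of-mass calculation. It is not the authors' implementation. It assumes that "neighbours" means other detections within the radius (excluding the detection itself), and the coordinates are invented; only the low-stringency setting (200 nm radius, at least 5 neighbours) is taken from the response.

```python
from math import hypot

NOISE = -1


def dbscan(points, eps, min_neighbours):
    """Label each (x, y) point with a cluster index, or NOISE if it belongs to no cluster."""
    labels = [None] * len(points)

    def neighbours(i):
        xi, yi = points[i]
        return [
            j
            for j, (xj, yj) in enumerate(points)
            if j != i and hypot(xi - xj, yi - yj) <= eps
        ]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_neighbours:
            labels[i] = NOISE  # may later be re-labelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_neighbours:
                queue.extend(reachable)  # core point: keep growing the cluster
    return labels


def centre_of_mass(points, labels, cluster):
    """Mean x and y of all points assigned to the given cluster."""
    members = [p for p, label in zip(points, labels) if label == cluster]
    n = len(members)
    return (sum(x for x, _ in members) / n, sum(y for _, y in members) / n)


# Hypothetical detections (nm): six close together and one isolated background detection.
points = [(0, 0), (50, 0), (0, 50), (50, 50), (25, 25), (100, 100), (2000, 2000)]

labels = dbscan(points, eps=200, min_neighbours=5)
centre = centre_of_mass(points, labels, cluster=0)
print(labels, centre)
```

The isolated detection is labelled as noise and dropped, while the grouped detections collapse to a single coordinate that can be used for distance measurements.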
(10) For microscopy experiment methods, state power densities, not % or "nominal power".
Done. We now report the irradiance (laser power density) instead of nominal power (pp. 18, 21).
(11) In general, not much data presented. Any SI file with extra images etc.?
The original submission included four supplementary figures with additional data and representative images that should have been available to the reviewer (Figs. S1-S4). The SI file has been updated during revision (new Fig. S4E-G).
(12) Clarification of the discussion on GlyR expression and synaptic localization: The discussion on GlyR expression, complex formation, and synaptic localization is sometimes unclear, and needs terminological distinctions between "expression level", "complex formation" and "synaptic localization". For example, the authors state:"What then is the reason for the low protein expression of GlyRβ? One possibility is that the assembly of mature heteropentameric GlyR complexes depends critically on the expression of endogenous GlyR α subunits." Does this mean that GlyRβ proteins that fail to form complexes with GlyRα subunits are unstable and subject to rapid degradation? If so, the authors should clarify this point. The statement "This raises the interesting possibility that synaptic GlyRs may depend specifically on the concomitant expression of both α1 and β transcripts." suggests a dependency on α1 and β transcripts. However, is the authors' focus on synaptic localization or overall protein expression levels? If this means synaptic localization, it would be beneficial to state this explicitly to avoid confusion. To improve clarity, the authors should carefully distinguish between these different aspects of GlyR biology throughout the discussion. Additionally, a schematic diagram illustrating these processes would be highly beneficial for readers.
We thank the reviewer for pointing this out. We are dealing with several processes: protein expression that determines subunit availability and the assembly of pentameric GlyR complexes, surface expression, membrane diffusion and accumulation of GlyRb-containing receptor complexes at inhibitory synapses. We have edited the manuscript, particularly the discussion, and tried to be as clear as possible in our wording.
We chose not to add a schematic illustration for the time being, because any graphical representation is necessarily a simplification. Instead, we preferred to summarise the main numbers in tabular form (Table 1). We are of course open to any other suggestions.
(13) Interpretation of GlyR localization in the context of nanodomains. The distribution of GlyR molecules on inhibitory synapses appears to be non-homogeneous, instead forming nanoclusters or nanodomains, similar to many other synaptic proteins. It is important to interpret GlyR localization in the context of nanodomain organization.
The dSTORM images in Fig. 2 are pointillist representations that show individual detections rather than molecules. Small clusters of detections are likely to originate from a single AF647 fluorophore (in the case of nanobody labelling) and therefore represent single GlyRb subunits. Since GlyR copy numbers are so low at hippocampal synapses (≤ 5), the notion of nanodomain is not directly applicable. Our analysis therefore focused on the integration of GlyRs within the postsynaptic scaffold, rather than attempting to define nanodomain structures (see also response to point 8 of R1). A clarification has been added in the revised manuscript (p. 6).
Reviewer #1 (Significance):
The paper presents biological and technical advances. The biological insights revolve mostly on the documentation of Glycine receptors in particular synapses in forebrain, where they are typically expressed at very low levels. The authors provide compelling data indicating that the expression is of physiological significance. The authors have done a nice job of combining genetically-tagged mice with advanced microscopy methods to tackle the question of distributions of synaptic proteins. Overall these advances are more incremental than groundbreaking.
We thank the reviewer for acknowledging both the technical and biological advances of our study. While we recognize that our work builds upon established models, we consider that it also addresses important unresolved questions, namely that GlyRs are present and specifically anchored at inhibitory synapses in telencephalic regions, such as the hippocampus and striatum. From a methodological point of view, our study demonstrates that SMLM can be applied not only for structural analysis of highly abundant proteins, but also to reliably detect proteins present at very low copy numbers. This ability to identify and quantify sparse molecule populations adds a new dimension to SMLM applications, which we believe increases the overall impact of our study beyond the field of synaptic neuroscience.
Reviewer #2 (Evidence, reproducibility and clarity):
In their manuscript "Single molecule counting detects low-copy glycine receptors in hippocampal and striatal synapses" Camuso and colleagues apply single molecule localization microscopy (SMLM) methods to visualize low copy numbers of GlyRs at inhibitory synapses in the hippocampal formation and the striatum. SMLM analysis revealed higher copy numbers in striatum compared to hippocampal inhibitory synapses. They further provide evidence that these low copy numbers are tightly linked to post-synaptic scaffolding protein gephyrin at inhibitory synapses. Their approach profits from the high sensitivity and resolution of SMLM and challenges the controversial view on the presence of GlyRs in these formations although there are reports (electrophysiology) on the presence of GlyRs in these particular brain regions. These new datasets in the current manuscript may certainly assist in understanding the complexity of fundamental building blocks of inhibitory synapses.
However I have some minor points that the authors may address for clarification:
(1) In Figure 1 the authors apply PALM imaging of mEos4b-GlyRß (knockin) and here the corresponding Sylite label seems to be recorded in widefield, it is not clearly stated in the figure legend if it is widefield or super-resolved. In Fig 1 A - is the scale bar 5 µm? Some Sylite spots appear to be sized around 1 µm, especially the brighter spots, but maybe this is due to the lower resolution of widefield imaging? Regarding the statistical comparison: what method was chosen to test for normality of the distribution? I think this point is missing in the methods section.
This is correct; the apparent size of the Sylite spots does not reflect the real size of the synaptic gephyrin domain due to the limited resolution of widefield imaging, including the detection of out-of-focus light. We have clarified in the legend of Fig. 1A that Sylite labelling was imaged with classic epifluorescence microscopy. The scale bar in Fig. 1A corresponds to 5 µm. Since the data were not normally distributed, nonparametric tests (Kruskal-Wallis one-way ANOVA with Dunn’s multiple comparison test or Mann-Whitney U-test for pairwise comparisons) were used (p. 23).
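For readers unfamiliar with the pairwise test named above, the Mann-Whitney U statistic can be sketched as follows. This is a minimal illustration only, not the analysis code used in the study; the function name `mann_whitney_u` and the example values are hypothetical, and the sketch computes only the test statistic (a p-value would normally come from a statistics package).

```python
def mann_whitney_u(x, y):
    """Return the U statistic of sample x versus sample y.

    U counts the pairs (a, b) with a from x and b from y in which a > b;
    tied pairs contribute 0.5. No normality assumption is needed because
    only the ordering of the values is used.
    """
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


# Hypothetical detection counts per synapse in two groups.
group_a = [1, 2, 3]
group_b = [4, 5, 6]
u_ab = mann_whitney_u(group_a, group_b)
u_ba = mann_whitney_u(group_b, group_a)
# The two statistics always sum to len(x) * len(y).
assert u_ab + u_ba == len(group_a) * len(group_b)
```

Because the statistic depends only on ranks, it is unaffected by the skewed distributions typical of low detection counts.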
Moreover I would appreciate a clarification and/or citation that the knockin model results in no structural and physiological changes at inhibitory synapses, I believe this model has been applied in previous studies and corresponding clarification can be provided.
The Glrbeos/eos mouse model has been described previously and does not exhibit any structural or physiological phenotypes (Maynard et al. 2021 eLife). The issue was also raised by reviewer R1 (point 5) and has been clarified in the revised manuscript (p. 4).
(2) In the next set of experiments the authors switch to demixing dSTORM experiments - an explanation why this is performed is missing in the text - I guess better resolution to perform more detailed distance measurements? For these experiments: which region of the hippocampus did the authors select, I cannot find this information in legend or main text.
Yes, the dSTORM experiments enable dual-colour structural analysis at high spatial resolution (see response to R1 point 7). An explanation has been added (p. 6).
(3) Regarding parameters of demixing experiments: the number of frames (10,000) seems quite low and the exposure time higher than expected for Alexa 647. Can the authors explain the reason for choosing these particular parameters (low expression profile of the target - so better separation?, less fluorophores on label and shorter collection time?) or is there a reference that can be cited? The laser power is given in the methods in percentage of maximal output power, but for better comparison and reproducibility I recommend to provide the values of a power meter (kW/cm²) as lasers may change their maximum output power during their lifetime.
Acquisition parameters (laser power, exposure time) for dSTORM were chosen to obtain a good localisation precision (~12 nm; see R1 point 8). The number of frames is adequate to obtain well sampled gephyrin scaffolds in the CF680 channel. In the case of the GlyR (nanobody-AF647), the concept of spatial resolution does not really apply due to the low number of targets (see R1, point 13). Power density (irradiance) values have now been given (pp. 18, 21).
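The conversion from a power-meter reading to the power density requested by the reviewer can be sketched as below. This is an illustration under stated assumptions, not the procedure used in the study: the illuminated area is assumed to be a uniformly lit circle, and the function name and the example values (100 mW, 100 µm diameter) are hypothetical.

```python
import math


def irradiance_kw_per_cm2(power_mw, spot_diameter_um):
    """Return the mean irradiance in kW/cm^2.

    Assumes the measured power (in mW, at the sample plane) is spread
    uniformly over a circular spot of the given diameter (in micrometres).
    """
    if spot_diameter_um <= 0:
        raise ValueError("spot diameter must be positive")
    power_kw = power_mw * 1e-6             # 1 mW = 1e-6 kW
    radius_cm = (spot_diameter_um / 2) * 1e-4  # 1 um = 1e-4 cm
    area_cm2 = math.pi * radius_cm ** 2
    return power_kw / area_cm2


# Hypothetical example: 100 mW over a 100 um wide field of illumination.
example = irradiance_kw_per_cm2(100, 100)
```

Reporting this value rather than a percentage of maximal laser output keeps the acquisition settings comparable as the laser ages.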
(4) For analysis of subsynaptic distribution: how did the authors decide to choose the parameters in the NEO software for DBSCAN clustering - was a series of parameters tested to find optimal conditions and did the analysis start with an initial test if data is indeed clustered (Ripley's K function) or is there a reference in literature that can be provided?
DBSCAN parameters were optimised manually, by testing different values. Identification of dense and well-delimited gephyrin clusters (CF680) was achieved with a small radius and a high number of detections (80 nm, ≥ 50 neighbours), whereas filtering of low-density background in the AF647 channel (GlyRs) required less stringent parameters (200 nm, ≥ 5) due to the low number of target molecules. Similar parameters were used in a previous publication (Khayenko et al. 2022, Angewandte Chemie). The reference has been provided on p. 22 (see also R1 point 9).
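The role of the two DBSCAN parameters (search radius and minimum number of neighbours) can be illustrated with a minimal pure-Python sketch. This is not the NEO implementation used in the study; it is a generic version of the algorithm, written under the assumption that a point counts towards its own neighbourhood, and all names and coordinates are hypothetical.

```python
import math


def dbscan(points, eps, min_pts):
    """Label 2D points with a cluster index (0, 1, ...) or -1 for noise.

    eps is the search radius (same units as the coordinates, e.g. nm);
    min_pts is the minimum number of points within eps (the point itself
    included, an assumption of this sketch) for a point to be a core point.
    """
    def neighbours(i):
        xi, yi = points[i]
        return [j for j, (xj, yj) in enumerate(points)
                if math.hypot(xi - xj, yi - yj) <= eps]

    labels = [None] * len(points)  # None means not yet visited
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins, does not expand
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # core point: keep expanding
    return labels


# Hypothetical detections (nm): five close together, one isolated.
detections = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5), (1000, 1000)]
labels = dbscan(detections, eps=2, min_pts=5)
```

With the values given in the response, the gephyrin channel would correspond to a call such as `dbscan(coords, eps=80, min_pts=50)` and the GlyR channel to `dbscan(coords, eps=200, min_pts=5)`, with coordinates in nanometres. The larger radius and lower threshold in the second case let sparse detections from a few molecules survive the filtering.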
(5) A conclusion/discussion of the results presented in Figure 5 is missing in the text/discussion.
This part of the manuscript has been completely overhauled. It includes new experimental data, quantification of the data (new Fig.5), as well as the discussion and interpretation of our findings (see also R1, point 3). In agreement with our earlier interpretation, the data confirm that low availability of GlyRa1 subunits limits the expression and synaptic targeting of GlyRa1/b heteropentamers. The observation that GlyRa1 overexpression with lentivirus increases the size of the postsynaptic gephyrin domain further points to a structural role, whereby GlyRs can enhance the stability (and size) of inhibitory synapses in hippocampal neurons, even at low copy numbers (pp. 13-14).
(6) In line 552 "suspension" is misleading, better use "solution"
Done.
Reviewer #2 (Significance):
Significance: The manuscript provides new insights into the presence of low-copy-number GlyRs by visualizing them via SMLM. This is the first report that visualizes GlyR optically in the brain applying the knock-in model of mEOS4b tagged GlyRß and quantifies their copy number comparing distribution and amount of GlyRs from hippocampus and striatum. Imaging data correspond well to electrophysiological measurements in the manuscript.
Field of expertise: Super-Resolution Imaging and corresponding analysis
Reviewer #4 (Evidence, reproducibility and clarity):
In this study, Camuso et al., make use of a knock-in mouse model expressing endogenously mEos4b-tagged GlyRβ to detect endogenous glycine receptors using single-molecule localization microscopy. The main conclusion from this study is that in the hippocampus GlyRβ molecules are barely detected, while inhibitory synapses in the ventral striatum seem to express functionally relevant GlyR numbers.
I have a few points that I hope help to improve the strength of this study.
- In the hippocampus, this study finds that the numbers of detections are very low. The authors perform adequate controls to indicate that these localizations are above noise level. Nevertheless, it remains questionable that these reflect proper GlyRs. The suggestion that in hippocampal synapses the low numbers of GlyRβ molecules "are important in assembly or maintenance of inhibitory synaptic structures in the brain" is on itself interesting, but is not at all supported. It is also difficult to envision how such low numbers could support the structure of a synapse. A functional experiment showing that knockdown of GlyRs affects inhibitory synapse structure in hippocampal neurons would be a minimal test of this.
It is not clear what the reviewer means by “it remains questionable that these reflect proper GlyRs”. The PALM experiments include a series of stringent controls (see R1, point 1) demonstrating the existence of low-copy GlyRs at inhibitory synapses in the hippocampus (Fig. 1) and in the striatum (Fig. 3), and are backed up by dSTORM experiments (Fig. 2). We have no reason to doubt that these receptors are fully functional (as demonstrated for the ventral striatum; Fig. 4). However, due to their low number, a role in inhibitory synaptic transmission is clearly limited, at least in the hippocampus and dorsal striatum.
We therefore propose a structural role, where the GlyRs could be required to stabilise the postsynaptic gephyrin domain in hippocampal neurons. This is based on the idea that the GlyR-gephyrin affinity is much higher than that of the GABAAR-gephyrin interaction (reviewed in Kasaragod & Schindelin 2018 Front Mol Neurosci). Accordingly, there is a close relationship between GlyRs and gephyrin numbers, sub-synaptic distribution, and dynamics in spinal cord synapses that are mostly glycinergic (Specht et al. 2013 Neuron; Maynard et al. 2021 eLife; Chapdelaine et al. 2021 Biophys J). It is reasonable to assume that low-copy GlyRs could play a similar structural role at hippocampal synapses. A knockdown experiment targeting these few receptors is technically very challenging and beyond the scope of this study. However, in response to the reviewer's question we have conducted new experiments in cultured hippocampal neurons (new Fig. 5). They demonstrate that overexpression of GlyRa1/b heteropentamers increases the size of the postsynaptic domain in these neurons, supporting our interpretation of a structural role of low-copy GlyRs (p. 14).
- The endogenous tagging strategy is a very strong aspect of this study and provides confidence in the labeling of GlyRβ molecules. One caveat however, is that this labeling strategy does not discriminate whether GlyRβ molecules are on the cell membrane or in internal compartments. Can the authors provide an estimate of the ratio of surface to internal GlyRβ molecules?
Gephyrin is known to form a two-dimensional scaffold below the synaptic membrane to which inhibitory GlyRs and GABAARs attach (reviewed in Alvarez 2017 Brain Res). The majority of the synaptic receptors are therefore thought to be located in the synaptic membrane, which is supported by the close relationship between the sub-synaptic distribution of GlyRs and gephyrin in spinal cord neurons (e.g. Maynard et al. 2021 eLife). To demonstrate the surface expression of GlyRs at hippocampal synapses we labelled cultured hippocampal neurons expressing mEos4b-GlyRa1 with anti-Eos nanobody in non-permeabilised neurons (see Author response image 1). The close correspondence between the nanobody (AF647) and the mEos4b signal confirms that the majority of the GlyRs are indeed located in the synaptic membrane.
Author response image 1.
Left: Lentivirus expression of mEos4b-GlyRa1 in fixed and non-permeabilised hippocampal neurons (mEos4b signal). Right: Surface labelling of the recombinant subunit with anti-Eos nanobody (AF647).
- “We also estimated the absolute number of GlyRs per synapse in the hippocampus. The number of mEos4b detections was converted into copy numbers by dividing the detections at synapses by the average number of detections of individual mEos4b-GlyRβ containing receptor complexes”. In essence this is a correct method to estimate copy numbers, and the authors discuss some of the pitfalls associated with this approach (i.e., maturation of fluorophore and detection limit). Nevertheless, the authors did not subtract the number of background localizations determined in the two negative control groups. This is critical, particularly at these low-number estimations.
We fully agree that background subtraction can be useful with low detection numbers. In the revised manuscript, copy numbers are now reported as background-corrected values. Specifically, the mean number of detections measured in wildtype slices was used to calculate an equivalent receptor number, which was then subtracted from the copy number estimates across hippocampus, spinal cord and striatum. This procedure is described in the methods (p. 20) and results (p. 5, 8), and mentioned in the figure legends of Fig. 1C, 3C. The background corrected values are given in the text and Table 1.
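The correction described above amounts to simple arithmetic, sketched here for clarity. This is an illustration of the stated procedure rather than the analysis code used in the study; the function name and all numerical values are hypothetical, and whether negative results should be clipped to zero is left open because the response does not specify it.

```python
def background_corrected_copy_number(synaptic_detections,
                                     detections_per_receptor,
                                     background_detections):
    """Estimate receptor copy number per synapse with background removed.

    synaptic_detections: mEos4b detections at a synapse (knock-in tissue).
    detections_per_receptor: mean detections from one receptor complex.
    background_detections: mean detections at synapses in wildtype tissue.
    """
    if detections_per_receptor <= 0:
        raise ValueError("detections_per_receptor must be positive")
    raw_copies = synaptic_detections / detections_per_receptor
    # Express the wildtype background as an equivalent receptor number.
    background_copies = background_detections / detections_per_receptor
    return raw_copies - background_copies


# Hypothetical values: 30 detections at a synapse, 10 detections per
# receptor complex, 5 background detections in wildtype tissue.
estimate = background_corrected_copy_number(30, 10, 5)
```

At copy numbers of only a few receptors per synapse, the subtracted term is a sizeable fraction of the raw estimate, which is why the correction matters most in the hippocampus.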
- Furthermore, the authors state that "The advantage of this estimation is that it is independent of the stoichiometry of heteropentameric GlyRs". However, if the stoichiometry is unknown, the number of counted GlyRβ subunits cannot simply be reported as the number of GlyRs. This should be discussed in more detail, and more carefully reported throughout the manuscript.
The reviewer is right to point this out. There is still some debate about the stoichiometry of heteropentameric GlyRs. Configurations with 2a:3b, 3a:2b and 4a:1b subunits have been advanced (e.g. Grudzinska et al. 2005 Neuron; Durisic et al. 2012 J Neurosci; Patrizio et al. 2017 Sci Rep; Zhu & Gouaux 2021 Nature). We have therefore chosen a quantification that is independent of the underlying stoichiometry. Since our quantification is based on very sparse clusters of mEos4b detections that likely originate from a single receptor complex (irrespective of its stoichiometry), the reported values actually reflect the number of GlyRs (and not GlyRb subunits). We have clarified this in the results (p. 5) and throughout the manuscript (Table 1).
- The dual-color imaging provides insights in the subsynaptic distribution of GlyRβ molecules in hippocampal synapses. Why are similar studies not performed on synapses in the ventral striatum where functionally relevant numbers of GlyRβ molecules are found? Here insights in the subsynaptic receptor distribution would be of much more interest as it can be tied to the function.
This is an interesting suggestion. However, the primary aim of our study was to identify the existence of GlyRs in hippocampal regions. At low copy numbers, the concept of sub-synaptic domains (SSDs, e.g. Yang et al. 2021 EMBO Rep) becomes irrelevant (see R1 point 13). It should be pointed out that the dSTORM pointillist images (Fig. 2A) represent individual GlyR detections rather than clusters of molecules. In the striatum, our specific purpose was to solve an open question about the presence of GlyRs in different subregions (putamen, nucleus accumbens).
- It is unclear how the experiments in Figure 5 add to this study. These results are valid, but do not seem to directly test the hypothesis that "the expression of α subunits may be limiting factor controlling the number of synaptic GlyRs". These experiments simply test if overexpressed α subunits can be detected. If the α subunits are limiting, measuring the effect of α subunit overexpression on GlyRβ surface expression would be a more direct test.
Both R1 and R2 have also commented on the data in Fig. 5 and their interpretation. We have substantially revised this section as described before (see R1 point 3) including additional experiments and quantification of the data (new Fig. 5). The findings lend support to our earlier hypothesis that GlyR alpha subunits (in particular GlyRa1) are the limiting factor for the expression of heteropentameric GlyRa/b in hippocampal neurons (pp. 13-14). Since the GlyRa1 subunit itself does not bind to gephyrin (Patrizio et al. 2017 Sci Rep), the synaptic localisation of the recombinant mEos4b-GlyRa1 subunits is proof that they have formed heteropentamers with endogenous GlyRb subunits and driven their membrane trafficking, which the GlyRb subunits are incapable of doing on their own.
Reviewer #4 (Significance):
These results are based on carefully performed single-molecule localization experiments, and are well-presented and described. The knockin mouse with endogenously tagged GlyRβ molecules is a very strong aspect of this study and provides confidence in the labeling, the combination with single-molecule localization microscopy is very strong as it provides high sensitivity and spatial resolution.
The conceptual innovation however seems relatively modest, these results confirm previous studies but do not seem to add novel insights. This study is entirely descriptive and does not bring new mechanistic insights.
This study could be of interest to a specialized audience interested in glycine receptor biology, inhibitory synapse biology and super-resolution microscopy.
My expertise is in super-resolution microscopy, synaptic transmission and plasticity
As we have stated before, the novelty of our study lies in the use of SMLM for the identification of very small numbers of molecules, which requires careful control experiments. This is something that has not been done before and that can be of interest to a wider readership, as it opens up SMLM for ultrasensitive detection of rare molecular events. Using this approach, we solve two open scientific questions: (1) the demonstration that low-copy GlyRs are present at inhibitory synapses in the hippocampus, (2) the sub-region specific expression and functional role of GlyRs in the ventral versus dorsal striatum.
The following review was provided later under the name “Reviewer #4”. To avoid confusion with the last reviewer from above we will refer to this review as R4-2.
Reviewer #4-2 (Evidence, reproducibility and clarity):
Summary:
Provide a short summary of the findings and key conclusions (including methodology and model system(s) where appropriate).
The authors investigate the presence of synaptic glycine receptors in the telencephalon, whose presence and function is poorly understood.
Using a transgenically labeled glycine receptor beta subunit (Glrb-mEos4b) mouse model together with super-resolution microscopy (SMLM, dSTORM), they demonstrate the presence of a low but detectable amount of synaptically localized GLRB in the hippocampus. While they do not perform a functional analysis of these receptors, they do demonstrate that these subunits are integrated into the inhibitory postsynaptic density (iPSD) as labeled by the scaffold protein gephyrin. These findings demonstrate that a low level of synaptically localized glycine receptor subunits exist in the hippocampal formation, although whether or not they have a functional relevance remains unknown.
They then proceed to quantify synaptic glycine receptors in the striatum, demonstrating that the ventral striatum has a significantly higher amount of GLRB co-localized with gephyrin than the dorsal striatum or the hippocampus. They then recorded pharmacologically isolated glycinergic miniature inhibitory postsynaptic currents (mIPSCs) from striatal neurons. In line with their structural observations, these recordings confirmed the presence of synaptic glycinergic signaling in the ventral striatum, and an almost complete absence in the dorsal striatum. Together, these findings demonstrate that synaptic glycine receptors in the ventral striatum are present and functional, while an important contribution to dorsal striatal activity is less likely.
Lastly, the authors use existing mRNA and protein datasets to show that the expression level of GLRA1 across the brain positively correlates with the presence of synaptic GLRB.
The authors use lentiviral expression of mEos4b-tagged glycine receptor alpha1, alpha2, and beta subunits (GLRA1, GLRA2, GLRB) in cultured hippocampal neurons to investigate the ability of these subunits to cause the synaptic localization of glycine receptors. They suggest that the alpha1 subunit has a higher propensity to localize at the inhibitory postsynapse (labeled via gephyrin) than the alpha2 or beta subunits, and may therefore contribute to the distribution of functional synaptic glycine receptors across the brain.
Major comments:
- Are the key conclusions convincing?
The authors are generally precise in the formulation of their conclusions.
(1) They demonstrate a very low, but detectable, amount of a synaptically localized glycine receptor subunit in a transgenic (GlrB-mEos4b) mouse model. They demonstrate that the GLRB-mEos4b fusion protein is integrated into the iPSD as determined by gephyrin labelling. The authors do not perform functional tests of these receptors and do not state any such conclusions.
(2) The authors show that GLRB-mEos4b is clearly detectable in the striatum and integrated into gephyrin clusters at a significantly higher rate in the ventral striatum compared to the dorsal striatum, which is in line with previous studies.
(3) Adding to their quantification of GLRB-mEos4b in the striatum, the authors demonstrate the presence of glycinergic miniature IPSCs in the ventral striatum, and an almost complete absence of mIPSCs in the dorsal striatum. These currents support the observation that GLRB-mEos4b is more synaptically integrated in the ventral striatum compared to the dorsal striatum.
(4) The authors show that lentiviral expression of GLRA1-mEos4b leads to a visually higher number of GLR clusters in cultured hippocampal neurons, and a co-localization of some clusters with gephyrin. The authors claim that this supports the idea that GLRA1 may be an important driver of synaptic glycine receptor localization. However, no quantification or statistical analysis of the number of puncta or their colocalization with gephyrin is provided for any of the expressed subunits. Such a claim should be supported by quantification and statistics.
A thorough analysis and quantification of the data in Fig.5 has been carried out as requested by all the other reviewers (e.g. R1, point 3). The new data and results have been described in the revised manuscript (pp. 9-10, 13-14).
- Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether?
One unaddressed caveat is the fact that a GLRB-mEos4b fusion protein may behave differently in terms of localization and synaptic integration than wild-type GLRB. While unlikely, it is possible that mEos4b interacts either with itself or synaptic proteins in a way that changes the fused GLRB subunit’s localization. Such an effect would be unlikely to affect synaptic function in a measurable way, but might be detected at a structural level by highly sensitive methods such as SMLM and STORM in regions with very low molecule numbers (such as the hippocampus). Since reliable antibodies against GLRB in brain tissue sections are not available, this would be difficult to test. Considering that no functional measures of the hippocampal detections exist, we would suggest that this possible caveat be mentioned for this particular experiment.
This question has also been raised before (R1, point 5). According to an earlier study the mEos4b-GlyRb knock-in does not cause any obvious phenotypes, with the possible exception of minor loss of glycine potency (Maynard et al. 2021 eLife). The fact that the synaptic levels in the spinal cord in heterozygous animals are precisely half of those of homozygous animals argues against differences in receptor expression, heteropentameric assembly, forward trafficking to the plasma membrane and integration into the synaptic membrane as confirmed using quantitative super-resolution CLEM (Maynard et al. 2021 eLife). Accordingly, we did not observe any behavioural deficits in these animals, making it a powerful experimental model. We have added this information in the revised manuscript (p. 4).
In addition, without any quantification or statistical analysis, the authors’ claims regarding the necessity of GLRA1 expression for the synaptic localization of glycine receptors in cultured hippocampal neurons should probably be described as preliminary (Fig. 5).
As mentioned before, we have substantially revised this part (R1, point 3). The quantification and analysis in the new Fig. 5 support our earlier interpretation.
- Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation.
The authors show that there is colocalization of gephyrin with the mEos4b-GlyRβ subunit using the Dual-colour SMLM. This is a powerful approach that allows for a claim to be made on the synaptic location of the glycine receptors. The images presented in Figure 1, together with the distance analysis in Figure 2, display the co-localization of the fluorophores. The co-localization images in all the selected regions, hippocampus and striatum, also show detections outside of the gephyrin clusters, which the authors refer to as extrasynaptic. These punctated small clusters seem to have the same size as the ones detected and assigned as part of the synapse. It would be informative if the authors analysed the distribution, density and size of these nonsynaptic clusters and presented the data in the manuscript and also compared it against the synaptic ones. Validating this extrasynaptic signal by staining for a dendritic marker, such as MAP-2 or maybe a somatic marker and assessing the co-localization with the non-synaptic clusters would also add even more credibility to them being extrasynaptic.
The existence of extrasynaptic GlyRs is well attested in spinal cord neurons (e.g. Specht et al. 2013 Neuron; this study see Fig. S2). The fact that these appear as small clusters of detections in SMLM recordings results from the fact that a single fluorophore can be detected several times in consecutive image frames and because of blinking. Therefore, small clusters of detections likely represent single GlyRs (that can be counted), and not assemblies of several receptor complexes. Due to their diffusion in the neuronal membrane, they are seen as diffuse signals throughout the somatodendritic compartment in epifluorescence images (e.g. Fig. 5A). SMLM recordings of the same cells resolve this diffuse signal into discrete nanoclusters representing individual receptors (Fig. 5B). It is not clear what information co-localisation experiments with specific markers could provide, especially in hippocampal neurons, in which the copy numbers (and density) of GlyRs are next to zero.
In addition we would encourage the authors to quantify the clustering and co-localization of virally expressed GLRA1, GLRA2, and GLRB with gephyrin in order to support the associated claims (Fig. 5). Preferably, the density of GLR and gephyrin clusters (at least on the somatic surface, the proximal dendrites, or both) as well as their co-localization probability should be quantified if a causal claim about subunit-specific requirements for synaptic localization is to be made.
Quantification of the data has been carried out (new Fig. 5C,D). The results have been described before (R1, point 3) and support our earlier interpretation of the data (pp. 13-14).
Lastly, even though it may be outside of the scope of such a study analysing other parts of the hippocampal area could provide additional important information. If one looks at the Allen Institute’s ISH of the beta subunit the strongest signal comes from the stratum oriens in the CA1 for example, suggesting that interneurons residing there would more likely have a higher expression of the glycine receptors. This could also be assessed by looking more carefully at the single cell transcriptomics, to see which cell types in the hippocampus show the highest mRNA levels. If the authors think that this is too much additional work, then perhaps a mention of this in the discussion would be good.
We have added the requested information from the ISH database of the Allen Institute in the discussion as suggested by the reviewer (p. 12). However, in combination with the transcriptomic data (Fig. S1) our findings strongly suggest that the expression of synaptic GlyRs depends on the availability of alpha subunits rather than on the presence of the GlyRb transcript. This is obvious when one compares the mRNA levels in the hippocampus with those in the basal ganglia (striatum) and medulla. While the transcript concentrations of GlyRb are elevated in all three regions and essentially the same, our data show that the GlyRb copy numbers at synapses differ over more than 2 orders of magnitude (Fig. 1B, Table 1).
- Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments.
Since the labeling and some imaging has been performed already, the requested experiment would be a matter of deploying a method of quantification. In principle, it should not require any additional wet-lab experiments, although it may require additional imaging of existing samples.
- Are the data and the methods presented in such a way that they can be reproduced?
Yes, for the most part.
- Are the experiments adequately replicated and statistical analysis adequate?
Yes
Minor comments:
- Specific experimental issues that are easily addressable.
N/A
- Are prior studies referenced appropriately?
Yes
- Are the text and figures clear and accurate?
Yes, although quantification in figure 5 is currently not present.
A quantification has been added (see R1, point 3).
- Do you have suggestions that would help the authors improve the presentation of their data and conclusions?
This paper presents a method that could be used to localize receptors and perhaps other proteins that are in low abundance or for which a detailed quantification is necessary. I would therefore suggest that Figure S4 is included into Figure 2 as the first panel, showcasing the demixing, followed by the results.
We agree in principle with this suggestion. However, the revised Fig. S4 is more complex and we think that it would distract from the data shown in Fig. 2. Given that Fig. S4 is mostly methodological and not essential to understand the text, we have kept it in the supplement for the time being. We leave the final decision on this point to the editor.
Reviewer #4-2 (Significance):
[This review was supplied later]
- Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field.
Using a novel and high resolution method, the authors have provided strong evidence for the presence of glycine receptors in the murine hippocampus and in the dorsal striatum. The number of receptors calculated is small compared to the numbers found in the ventral striatum. This is the first study to quantify receptor numbers in these regions. In addition it also lays a roadmap for future studies addressing similar questions.
- Place the work in the context of the existing literature (provide references, where appropriate).
This is done well by the authors in the curation of the literature. As stated above, the authors have filled a gap in the presence of glycine receptors in different brain regions, a subject of importance in understanding the role they play in brain activity and function.
- State what audience might be interested in and influenced by the reported findings.
Neuroscientists working at the synaptic level, on inhibitory neurotransmission and on fundamental mechanisms of expression of genes at low levels and their relationship to the presence of the protein would be interested. Furthermore, researchers in neuroscience and cell biology may benefit from and be inspired by the approach used in this manuscript, to potentially apply it to address their own aims.
We thank the reviewer for the positive assessment of the technical and biological implications of our work, as well as the interest of our findings to a wide readership of neuroscientists and cell biologists.
- Define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate.
Synaptic transmission, inhibitory cells and GABAergic synapses functionally and structurally, cortex and cortical circuits. No strong expertise in super-resolution imaging methods.
-
eLife Assessment
This important study characterizes with rigorous methodology anatomical and functional aspects of the peripheral innervation of the Drosophila male reproductive tract. The convincing analysis reveals two distinct types of glutamatergic neurons that co-release either serotonin or octopamine. While serotonergic neurons are required for male fertility, octopaminergic neurons are dispensable. The work provides invaluable insight into neurochemical control of insemination, peripheral motor control and neuromodulation in the male reproductive tract.
-
Reviewer #1 (Public review):
Summary:
This very thorough anatomical study addresses the innervation of the Drosophila male reproductive tract. Two distinct glutamatergic neuron types were classified: serotonergic (SGNs) and octopaminergic (OGNs). By expansion microscopy, it was established that glutamate and serotonin/octopamine are co-released. The expression of different receptors for 5-HT and OA in muscles and epithelial cells of the innervation target organs was characterized. The pattern of neurotransmitter receptor expression in the target organs suggests that seminal fluid and sperm transport and emission are subjected to complex regulation. While silencing of abdominal SGNs leads to male infertility and prevents sperm from entering the ejaculatory duct, silencing of OGNs does not render males infertile.
Strengths:
The studied neurons were analysed with different transgenes and methods, as well as antibodies against neurotransmitter synthesis enzymes, building a consistent picture of their neurotransmitter identity. The careful anatomical description of innervation patterns together with receptor expression patterns of the target organs provides a solid basis for advancing the understanding of how seminal fluid and sperm transport and emission are subjected to complex regulation. The functional data showing that SGNs are required for male fertility and for the release of sperm from the seminal vesicle into the ejaculatory duct is convincing.
Weaknesses:
The functional analysis of the characterized neurons is not as comprehensive as the anatomical description and phenotypic characterization was limited to simple fertility assays. It is understandable that a full functional dissection is beyond the scope of the present work. The paper contains experiments showing neuron-independent peristaltic waves in the reproductive tract muscles, which are thematically not very well integrated into the paper. Although very interesting, one wonders if these experiments would not fit better into a future work that also explores these peristaltic waves and their interrelation with neuromodulation mechanistically.
Comments on revisions:
The manuscript has improved after fixing many small issues/errors. The new sections in the discussion likewise add to the quality of the manuscript.
-
Reviewer #2 (Public review):
Summary:
Cheverra et al. present a comprehensive anatomical and functional analysis of the motor neurons innervating the male reproductive tract in Drosophila melanogaster, addressing a gap in our understanding of the peripheral circuits underlying ejaculation and male fertility. They identify two classes of multi-transmitter motor neurons, OGNs (octopamine/glutamate) and SGNs (serotonin/glutamate), with distinct innervation patterns across reproductive organs. The authors further characterize the differential expression of glutamate, octopamine, and serotonin receptors in both epithelial and muscular tissues of these organs. Behavioral assays reveal that SGNs are essential for male fertility, whereas OGNs and glutamatergic transmission are dispensable. This work provides a high-resolution map linking neuromodulatory identity to organ-specific motor control, offering a valuable framework to explore the neural basis of male reproductive function.
Strengths:
Through the use of an extensive set of GAL4 drivers and antibodies, this work successfully and precisely defines the neurons that innervate the male reproductive tract, identifying the specific organs they target and the nature of the neurotransmitters they release. It also characterizes the expression patterns and localization of the corresponding neurotransmitter receptors across different tissues. The authors describe two distinct groups of dual-identity neurons innervating the male reproductive tract: OGNs, which co-express octopamine and glutamate, and SGNs, which co-express serotonin and glutamate. They further demonstrate that the various organs within the male reproductive system differentially express receptors for these neurotransmitters. Based on these findings, the authors propose that a single neuron capable of co-releasing a fast-acting neurotransmitter alongside a slower-acting one may more effectively synchronize and stagger events that require precise timing. This, together with the differential expression of ionotropic glutamate receptors and metabotropic aminergic receptors in postsynaptic muscle tissue, adds an additional layer of complexity to the coordinated regulation of fluid secretion, organ contractility, and directional sperm movement, all contributing to the optimization of male fertility.
Weaknesses:
One potential limitation of the study is the absence of information regarding the number of individuals examined for the various characterizations, which may weaken the strength of the conclusions. Another limitation may be the lack of quantitative analyses in the colocalization and morphological differentiation experiments. Nevertheless, the authors have indicated that such quantifications will be provided in a forthcoming publication; therefore, this should be considered only a partial limitation, as it is expected to be addressed in the near future.
Wider context:
This study delivers the first detailed anatomical map connecting multi-transmitter motor neurons with specific male reproductive structures. It highlights a previously unrecognized functional specialization between serotonergic and octopaminergic pathways and lays the groundwork for exploring fundamental neural mechanisms that regulate ejaculation and fertility in males. The principles uncovered here may help explain how males of Drosophila and other organisms adjust reproductive behaviors in response to environmental changes. Furthermore, by shedding light on how multi-transmitter systems operate in reproductive control, this model could provide insights into therapeutic targets for conditions such as male infertility and prostate cancer, where similar neuronal populations are involved in humans. Ultimately, this genetically accessible system serves as a powerful tool for uncovering how multi-transmitter neurons orchestrate coordinated physiological actions necessary for the functioning of complex organs.
-
Reviewer #3 (Public review):
Summary:
This work provides an overview of the motor neuron landscape in the male reproductive system. Some work had been done to elucidate the circuits of ejaculation in the spine, as well as the cord, but this work fills a gap in knowledge at the level of the reproductive organs. Using complementary approaches, the authors show that there are two types of motor neurons that are mutually exclusive: neurons that co-express octopamine and glutamate and neurons that co-express serotonin and glutamate. They also show evidence that both types of neurons express large dense core vesicles, indicating that neuropeptides play a role in male fertility. This paper provides a thorough characterization of the expression of the different glutamate, octopamine, and serotonin receptors in the different organs and tissues of the male reproductive system. The differential expression in different tissues and organs allows building initial theories on the control of emission and expulsion. Additionally, the authors characterize the expression of synaptic proteins and the neuromuscular junction sites. On a mechanistic level, the authors show that neither octopamine/glutamate neuron transmission nor glutamate transmission in serotonin/glutamate neurons is required for male fertility. This final result is quite surprising and opens up many questions on how ejaculation is coordinated.
Strengths:
This work fills an important gap in the characterization of innervation of the male reproductive system by providing an extensive characterization of the motor neurons and the potential receptors of motor neuron release. The authors show convincing evidence of glutamate/monoamine co-release and of mutual exclusivity of serotonin/glutamate and octopamine/glutamate neurons.
Weaknesses:
The experiment looking at peristaltic waves in the male organs is missing labeling of the different regions and quantification of the observed waves.
-
Author response:
The following is the authors’ response to the original reviews.
Reviewer #1 (Public review):
Summary:
This very thorough anatomical study addresses the innervation of the Drosophila male reproductive tract. Two distinct glutamatergic neuron types were classified: serotonergic (SGNs) and octopaminergic (OGNs). By expansion microscopy, it was established that glutamate and serotonin/octopamine are co-released. The expression of different receptors for 5-HT and OA in muscles and epithelial cells of the innervation target organs was characterized. The pattern of neurotransmitter receptor expression in the target organs suggests that seminal fluid and sperm transport and emission are subjected to complex regulation. While silencing of abdominal SGNs leads to male infertility and prevents sperm from entering the ejaculatory duct, silencing of OGNs does not render males infertile.
Strengths:
The studied neurons were analysed with different transgenes and methods, as well as antibodies against neurotransmitter synthesis enzymes, building a consistent picture of their neurotransmitter identity. The careful anatomical description of innervation patterns together with receptor expression patterns of the target organs provides a solid basis for advancing the understanding of how seminal fluid and sperm transport and emission are subjected to complex regulation. The functional data showing that SGNs are required for male fertility and for the release of sperm from the seminal vesicle into the ejaculatory duct is convincing.
Weaknesses:
The functional analysis of the characterized neurons is not as comprehensive as the anatomical description, and phenotypic characterization was limited to simple fertility assays. It is understandable that a full functional dissection is beyond the scope of the present work. The paper contains experiments showing neuron-independent peristaltic waves in the reproductive tract muscles, which are thematically not very well integrated into the paper. Although very interesting, one wonders if these experiments would not fit better into a future work that also explores these peristaltic waves and their interrelation with neuromodulation mechanistically.
Reviewer #2 (Public review):
Summary:
Cheverra et al. present a comprehensive anatomical and functional analysis of the motor neurons innervating the male reproductive tract in Drosophila melanogaster, addressing a gap in our understanding of the peripheral circuits underlying ejaculation and male fertility. They identify two classes of multi-transmitter motor neurons, OGNs (octopamine/glutamate) and SGNs (serotonin/glutamate), with distinct innervation patterns across reproductive organs. The authors further characterize the differential expression of glutamate, octopamine, and serotonin receptors in both epithelial and muscular tissues of these organs. Behavioral assays reveal that SGNs are essential for male fertility, whereas OGNs and glutamatergic transmission are dispensable. This work provides a high-resolution map linking neuromodulatory identity to organ-specific motor control, offering a valuable framework to explore the neural basis of male reproductive function.
Strengths:
Through the use of an extensive set of GAL4 drivers and antibodies, this work successfully and precisely defines the neurons that innervate the male reproductive tract, identifying the specific organs they target and the nature of the neurotransmitters they release. It also characterizes the expression patterns and localization of the corresponding neurotransmitter receptors across different tissues. The authors describe two distinct groups of dual-identity neurons innervating the male reproductive tract: OGNs, which co-express octopamine and glutamate, and SGNs, which co-express serotonin and glutamate. They further demonstrate that the various organs within the male reproductive system differentially express receptors for these neurotransmitters. Based on these findings, the authors propose that a single neuron capable of co-releasing a fast-acting neurotransmitter alongside a slower-acting one may more effectively synchronize and stagger events that require precise timing. This, together with the differential expression of ionotropic glutamate receptors and metabotropic aminergic receptors in postsynaptic muscle tissue, adds an additional layer of complexity to the coordinated regulation of fluid secretion, organ contractility, and directional sperm movement, all contributing to the optimization of male fertility.
Weaknesses:
The main weakness of the manuscript is the lack of detail in the presentation of the results. Specifically, all microscopy image figures are missing information about the number of samples (N), and in the case of colocalization experiments, quantitative analyses are not provided. Additionally, in the first behavioral section, it would be beneficial to complement the data table with figures similar to those presented later in the manuscript for consistency and clarity.
Wider context:
This study delivers the first detailed anatomical map connecting multi-transmitter motor neurons with specific male reproductive structures. It highlights a previously unrecognized functional specialization between serotonergic and octopaminergic pathways and lays the groundwork for exploring fundamental neural mechanisms that regulate ejaculation and fertility in males. The principles uncovered here may help explain how males of Drosophila and other organisms adjust reproductive behaviors in response to environmental changes. Furthermore, by shedding light on how multi-transmitter systems operate in reproductive control, this model could provide insights into therapeutic targets for conditions such as male infertility and prostate cancer, where similar neuronal populations are involved in humans. Ultimately, this genetically accessible system serves as a powerful tool for uncovering how multi-transmitter neurons orchestrate coordinated physiological actions necessary for the functioning of complex organs.
Reviewer #3 (Public review):
Summary:
This work provides an overview of the motor neuron landscape in the male reproductive system. Some work had been done to elucidate the circuits of ejaculation in the spine, as well as the cord, but this work fills a gap in knowledge at the level of the reproductive organs. Using complementary approaches, the authors show that there are two types of motor neurons that are mutually exclusive: neurons that co-express octopamine and glutamate and neurons that co-express serotonin and glutamate. They also show evidence that both types of neurons express large dense core vesicles, indicating that neuropeptides play a role in male fertility. This paper provides a thorough characterization of the expression of the different glutamate, octopamine, and serotonin receptors in the different organs and tissues of the male reproductive system. The differential expression in different tissues and organs allows building initial theories on the control of emission and expulsion. Additionally, the authors characterize the expression of synaptic proteins and the neuromuscular junction sites. On a mechanistic level, the authors show that neither octopamine/glutamate neuron transmission nor glutamate transmission in serotonin/glutamate neurons is required for male fertility. This final result is quite surprising and opens up many questions on how ejaculation is coordinated.
Strengths:
This work fills an important gap in the characterization of innervation of the male reproductive system by providing an extensive characterization of the motor neurons and the potential receptors of motor neuron release. The authors show convincing evidence of glutamate/monoamine co-release and of mutual exclusivity of serotonin/glutamate and octopamine/glutamate neurons.
Weaknesses:
(1) Often, it is mentioned that the expression is higher or lower or regional without quantification or an indication of the number of samples analysed.
(2) The experiment aimed at tracking sperm in the male reproductive system is difficult to interpret when it is not assessed whether ejaculation has occurred.
(3) The experiment looking at peristaltic waves in the male organs is missing labeling of the different regions and quantification of the observed waves.
Recommendations for the authors:
Reviewer #1 (Recommendations for the authors):
(1) While the peripheral innervations are very carefully described, it is not clear to which SGNs and OGNs (i.e., cell bodies in the central nervous system) these innervations belong. Are SV, AG, and ED innervated by branches of one neuron or by separate neurons? Multi-color flip-out experiments could provide an answer to this.
We agree this is important and are planning these experiments for follow-up study.
(2) In contrast, for the analysis of the VT19028 split line (Figure 9), only VNC and cell body images are shown. How do the arborisations of these split combinations look in the periphery? Are the same reproductive organs innervated as shown in Figure 2?
Figure 9S3 was inadvertently omitted from the initial submission. That figure is now included and shows that the VT019028 split broadly innervates the SV, AG, and ED.
(3) In the discussion, I think it would be helpful to offer some potential explanations for the role of octopaminergic and glutamatergic signaling. If not required for basic fertility, they probably have some other role.
Thank you, we have included speculation in the Discussion section "Potential for adaptation to environment".
(4) Line 543: Figure 8S4 E, (not 8E).
Correction made.
Reviewer #2 (Recommendations for the authors):
(1) Line 213-217
Comment:
The use of "significantly less expression" may be misleading, as no quantification or statistical analysis is provided to support this comparison.
Suggestion:
Consider using a more neutral term, such as "markedly less" or "noticeably less," unless quantitative data and statistical analysis are included to substantiate the claim.
Good recommendation. This suggestion has been incorporated.
(2) Line 264-267
Comment:
The observation regarding the distinct morphology of SGNs and OGNs is interesting and could strengthen the argument regarding functional differences.
Suggestion:
Consider including a quantification of morphological complexity (e.g., branching) to support the claim. A method such as Sholl analysis (Sholl, 1953), as adapted in Fernández et al., 2008, could be applied.
This is a good suggestion, and we will consider it as part of a follow-up study.
(3) Line 269-271
Comment:
The anatomical context of the observation is not explicitly stated.
Suggestion:
Add "in the ED" for clarity: "With the TRH-GAL4 experiment in the ED, vGlut-40XMYC (Figure 5S1, A and E) and 6XV5-vMAT (Figure 5S1, B and F) were both present with a highly overlapping distribution (Figure 5S1, I)."
Suggestion has been incorporated.
(4) Line 275-276
Comment:
The claim about the reduced ability to distinguish SGNs and OGNs in the ED would benefit from quantitative support.
Suggestion:
Include a morphological comparison or quantification between SGNs and OGNs in the ED and SV to reinforce this point.
Certain information on morphological comparison can be inferred within the images themselves, and we will include quantitation in a follow-up study.
(5) Line 277-279
Comment:
As with line 269, the anatomical site could be specified more clearly.
Suggestion:
Rephrase as: "With the Tdc2-GAL4 experiment in the ED, vGlut-40XMYC (Figure 5S1, M and Q) and 6XV5-vMAT (Figure 5S1, N and R) were both observed in a highly overlapping distribution (Figure 5S1, U)."
Suggestion has been incorporated.
(6) Line 348-350
Comment:
The phrase "significantly higher density" implies a statistical comparison that is not shown.
Suggestion:
If no quantification is provided, replace with a qualitative term such as "visibly higher" or "notably more dense." Alternatively, add a quantitative analysis with statistical testing to justify the use of "significantly."
Suggestion has been incorporated.
(7) Lines 415-458 (Section comment)
Comment:
There appears to be differential localization of neurotransmitter receptor expression (glutamate in muscle vs. 5-HT in epithelium or neurons), which could have functional implications.
Suggestion:
Expand this section to briefly discuss the differential localization patterns of these receptors and potential implications for signal transduction in male reproductive tissues.
(8) Lines 638-682 (Section comment)
Comment:
The table summarizing fertility phenotypes would be more informative with additional detail on experimental outcomes.
Suggestion:
Add a column showing the number of fertile males over the total tested (e.g., "n fertile / n total"). Also, clarify whether the fertility assays are identical to those reported in Figure 10S2, and whether similar analyses were conducted for females. Consider including a figure summarizing fertility results for all genotypes listed in the table, similar to Figure 10S2.
The fertility tests reported in Table 1 were separate from those reported in Figure 10S2. For these tests, the results were clear-cut with 100% of males and females reported as infertile exhibiting the infertile phenotype. For the males and females reported as fertile, it was also clear-cut with nearly 100% showing fertility at a high level. In subsequent figures we attempted to assess degrees of fertility.
(9) Line 724-727
Comment:
There seems to be a mistake in the identification of the driver lines used to silence OA neurons. Also, figure references might be incorrect.
Suggestion:
The OA neuron driver line should be corrected to "Tdc2-GAL4-DBD ∩ AbdB-AD" instead of TRH-GAL4. Additionally, the figure references should be verified; specifically, the letter "B" (in "Figure 10B, D" and "10B, E") appears to be unnecessary or misplaced.
Thanks for catching this; the corrections have been made.
(10) Line 872-877
Comment:
The discussion on the co-release of fast-acting glutamate and slower aminergic neurotransmitters is interesting and well-articulated. However, it remains somewhat disconnected from the behavioral findings.
Suggestion:
Consider linking this proposed mechanism to the results observed in the mating duration assays. For instance, the sequential action of neurotransmitters described here could potentially underlie the prolonged mating observed when specific neuromodulators are active, helping to functionally integrate molecular and behavioral data.
(11) Line 926-928
Comment:
The interpretation of 5-HT7 receptor expression in the sphincter is compelling, suggesting a role in regulating its function. However, this anatomical observation could be further contextualized with the functional data.
Suggestion:
It may strengthen the interpretation to explicitly connect this finding with the fertility assays, where SGNs - presumably acting via serotonergic signaling - are shown to be necessary for male fertility. This would support a functional role for 5-HT7 in reproductive success via sphincter regulation.
This has been added.
(12) Figure 1
Comment:
The figure legend is generally clear, but could benefit from more consistency and precision in the color-coded labeling. Additionally, the naming of some structures could be more explicit.
Suggestion:
Revise the figure and the legend as follows:
Figure 1. The Drosophila male reproductive system. A) Schematic diagram showing paired testes (colour), SVs (green), AGs (purple), Sph (red), ED (gray), and EB (colour). B) Actual male reproductive system. Te - testes, SV - seminal vesicle, AG - accessory gland, Sph - singular sphincter, ED - ejaculatory duct, EB - ejaculatory bulb. Scale bar: 200 µm.
This suggestion has been incorporated.
(13) Figure 3S2
Comment:
There appears to be a typographical error in the description of the genotypes, which may lead to confusion.
Suggestion:
Correct the legend to reflect the appropriate genotypes:
Figure 3S2. Expression of vGlut-LexA and Tdc2-GAL4 in the Drosophila male reproductive system. A, D, G, J, M, P) vGlut-LexA, LexAop-6XmCherry; B, E, H, K, N, Q) Tdc2-GAL4, UAS-6XGFP; C, F, I, L, O, R) Overlay. Scale bars: O - 50 µm; R - 10 µm.
The corrections have been made.
(14) Figure 3S3
Comment:
The genotypes for panels D and E appear to be incomplete; the DBD component of the split-GAL4 drivers is missing.
Suggestion:
Update the figure legend to:
Figure 3S3. Fruitless and Doublesex expression in the Drosophila male reproductive system. A) fru-GAL4, UAS-6XGFP; B) vGlut-LexA, LexAop-6XmCherry; C) Overlay; D) Tdc2-AD ∩ dsx-GAL4-DBD; E) TRH-AD ∩ dsx-GAL4-DBD. Scale bar: 200 µm.
The corrections have been made.
(15) Figure 4S4
Comment:
There is a repeated segment in the figure legend, which makes it unclear and redundant.
Suggestion:
Edit the legend to remove the duplicated lines:
Figure 4S4. Expression of vGlut, TβH-GFP, and 5-HT at the junction of the SV and AGs with the ED of the Drosophila male reproductive system. A) vGlut-40XV5; B) TβH-GFP; C) 5-HT; D) vGlut-40XV5, TβH-GFP overlay; E) vGlut-40XV5, 5-HT overlay; F) TβH-GFP, 5-HT overlay. Scale bar: 50 µm.
The correction has been made.
(16) Figure 6S5
Comment:
Within this figure, the orientation and/or scale of the tissue varies noticeably between individual panels, making it difficult to directly compare the different experimental conditions.
Suggestion:
For improved clarity and interpretability, consider standardizing the orientation and size of the tissue shown across all panels within the figure. Consistent presentation will facilitate direct comparisons between treatments or genotypes.
There is often variation in the size of the male reproductive organs. They were all acquired at the same magnification. The only point of this figure is that there is no vGAT or vAChT at these NMJs, and the result is unambiguously negative.
(17) Figure 10
Comment:
Panel A appears redundant, as it shows the same information as the other panels but without indicating statistical significance.
Suggestion:
Consider removing panel A and keeping only the remaining four graphs, which include relevant statistical comparisons and clearly show significant differences.
We realize there is some redundancy of panel A with the other panels, but we feel there is value in having all the genotypes in a single panel for comparison.
Reviewer #3 (Recommendations for the authors):
Here are some suggestions to improve the manuscript:
(1) Prot B GFP experiment: the authors should explain better the time chosen to look at the sperm content of the male reproductive system. At 10 minutes, it is expected that the male has already ejaculated, and therefore, a failure to ejaculate would result in more sperm in the reproductive system, not less. Since we are not certain when the male ejaculates, it would be important to do the analysis at different time points.
In the Prot-GFP experiments, the 10-minute time point was chosen because we nearly always observe sperm in the ejaculatory duct of control males. In the experimental males, we never observed sperm in the ejaculatory duct at this time point. Also, no Prot-GFP sperm were observed in the reproductive tract of females mated to experimental males even when mating was allowed to go to completion, while abundant sperm were found in females mated to Prot-GFP controls. Figure 10S1 has been updated to include images of these female reproductive systems. The absence of Prot-GFP sperm in the reproductive tract of females mated to experimental males indicates that sperm transfer in these males isn't occurring earlier during the copulation process than in control males and that we didn't miss it by only examining the ejaculatory duct.
(2) Discuss what may be the role of the octopamine/glutamate neurons and glutamate transmission in serotonin/glutamate neurons in the male reproductive system, given that they are not required for fertility (at least under the context in which it was tested). It is quite a striking result that deserves some attention.
We agree it is a surprising result and have included speculation on the role of glutamate and octopamine in male reproduction in the Discussion section "Potential for adaptation to environment".
(3) Very important:
(a) Figure 3 is present in the Word document but not the PDF.
(b) Figure 9S3 is not present.
(c) In Figure 5 X), the legend does not correspond to the panel.
All of these corrections have been made.
(4) Other suggestions:
(a) A summary schematic (or several) of the findings would make it an easier read.
(b) Explain why the ejaculatory bulb was left out of the analysis.
(c) Explain in the main text some of the tools, such as BONT-C and the conditional vGlut mutation.
-
eLife Assessment
The authors employ an unbiased, affinity-guided reagent to label P2X7 receptor and use super-resolution imaging to monitor P2X7 redistribution in response to inflammatory signaling. The evidence is convincing and the study will be valuable to those studying the dynamics of receptor distribution and clustering.
-
Reviewer #1 (Public review):
Summary:
In this paper, the authors developed a chemical labeling reagent for P2X7 receptors, called X7-uP. This labeling reagent selectively labels endogenous P2X7 receptors with biotin based on ligand-directed NASA chemistry. After labeling the endogenous P2X7 receptor with biotin, the receptor can be fluorescently labeled with streptavidin-AlexaFluor647. The authors carefully examined the binding properties and labeling selectivity of X7-uP to P2X7, characterized the labeling site of P2X7 receptors, and demonstrated fluorescence imaging of P2X7 receptors. The data obtained by SDS-PAGE, Western blot, and fluorescence microscopy clearly show that X7-uP labels the P2X7 receptor. Finally, the authors fluorescently labeled the endogenous P2X7 in BV2 cells, which are a murine microglia model, and used dSTORM to reveal a nanoscale P2X7 redistribution mechanism under inflammatory conditions at high resolution.
Strengths:
X7-uP selectively labels endogenous P2X7 receptors with biotin. Streptavidin-AlexaFluor647 binds to the biotin attached to the P2X7 receptor, allowing visualization of endogenous P2X7 receptors.
-
Reviewer #2 (Public review):
Summary:
In this manuscript, Arnould et al. develop an unbiased, affinity-guided reagent to label P2X7 receptor and use super-resolution imaging to monitor P2X7 redistribution in response to inflammatory signaling.
Strengths:
I think the X7-uP probe that they developed is very useful for visualizing localization of P2X7 receptor. They convincingly show that under inflammatory conditions, there is a reorganization of P2X7 localization into receptor clusters. Moreover, I think they have shown a very clever way to specifically label any receptor of interest. This has broad appeal.
I think the authors have done a very nice job addressing my original concerns. Here are those original concerns and my new comments related to how the authors address them.
(1) While the authors state that chemical modification of AZ10606120 to produce the X7-uP reagent has "minimal impact" on the inhibition of P2X7, we can see from Figure 2A and 2B that it does not antagonize P2X7 as effectively as the original antagonist. For the sake of completeness and quantitation, I think it would be great if the authors could determine the IC50 for X7-uP and compare it to the IC50 of AZ10606120.
The authors now show the relative inhibition of X7-uP compared to AZ10606120 at different concentrations. This provides a nice comparison to give the reader an idea of how effectively X7-uP inhibits P2X7 receptor. This is great.
(2) Do the authors know whether modification of the lysines with biotin affects the receptor's affinity for ATP (or ability to be activated by ATP)? What about P2X7 that has been modified with biotin and then labeled with Alexa 647? For the sake of completeness and quantitation, I think it would be great if the authors could determine the EC50 of biotinylated P2X7 for ATP as well as biotinylated and then Alexa 647 labeled P2X7 for ATP and compare these values to the affinity of unmodified WT P2X7 for ATP.
I agree with the authors that assessing the functional integrity of P2X7 following biotinylation and fluorophore labeling is outside the scope of this paper but would be important for studies involving dynamic or post-labeling functional analyses such as live trafficking.
(3) It is a little misleading to color the fluorescence signal from mScarlet green (for example, in Figure 3 and Figure 4). The fluorescence is not at the same wavelength as GFP. In fact, the wavelength (570 nm - 610 nm) for emission is closer to orange/red than to green. I think this color should be changed to differentiate the signal of mScarlet from the GFP signal used for each of the other P2X receptor subtypes.
The authors have now changed the mScarlet color to orange, which solves my concern.
(4) It is my understanding that P2X6 does not form homotrimers. Thus, I was a little surprised to see that the density and distribution of P2X6-GFP in Figure 3 looks very similar to the density and distribution of the other P2X subtypes. Do the authors have an explanation for this? Are they looking at P2X6 protomers inserted into the plasma membrane? Does the cell line have endogenous P2X receptor subtypes? Is Figure 3 showing heterotrimers with P2X6 receptor? A little explanation might be helpful.
The authors address this point very well and include nice data to show that P2X6 does not insert into the plasma membrane as a homotrimer.
(5) It is easy to overlook the fact that the antagonist leaves the binding pocket once the biotin has been attached to the lysines. It might be helpful if the authors made this a little more apparent in Figure 1 or in the text describing the NASA chemistry reaction.
The authors have modified Figure 1 to make it easier to understand the NASA chemistry reaction.
I congratulate the authors on an outstanding paper!
-
Author response:
The following is the authors’ response to the original reviews.
Public Reviews:
Reviewer #1 (Public review):
Summary:
In this paper, the authors developed a chemical labeling reagent for P2X7 receptors, called X7-uP. This labeling reagent selectively labels endogenous P2X7 receptors with biotin based on ligand-directed NASA chemistry (Ref. 41). After labeling the endogenous P2X7 receptor with biotin, the receptor can be fluorescently labeled with streptavidin-AlexaFluor647. The authors carefully examined the binding properties and labeling selectivity of X7-uP to P2X7, characterized the labeling site of P2X7 receptors, and demonstrated fluorescence imaging of P2X7 receptors. The data obtained by SDS-PAGE, Western blot, and fluorescence microscopy clearly show that X7-uP labels the P2X7 receptor. Finally, the authors fluorescently labeled the endogenous P2X7 in BV2 cells, which are a murine microglia model, and used dSTORM to reveal a nanoscale P2X7 redistribution mechanism under inflammatory conditions at high resolution.
Strengths:
X7-uP selectively labels endogenous P2X7 receptors with biotin. Streptavidin-AlexaFluor647 binds to the biotin labeled to the P2X7 receptor, allowing visualization of endogenous P2X7 receptors.
We thank the reviewer for their positive comment.
Weaknesses:
Weaknesses & Comments
(1) The P2X7 receptor exists in a trimeric form. If it is not a monomer under the conditions of the pull-down assay in Figure 2C, the quantitative values may not be accurate.
We thank the reviewer for this comment. As shown in Figure 2C, the band observed on the denaturing SDS-PAGE corresponds to the monomeric form of the P2X7 receptor. While we cannot exclude the presence of non-monomeric species under native conditions, no such higher-order forms are visible in the gel. This observation supports the conclusion that the quantitative values presented are based on the monomeric form and are therefore reliable.
(2) In Figure 3, GFP fluorescence was observed in the cell. Are all types of P2X receptors really expressed on the cell surface?
We thank the reviewer for this excellent comment, which was also raised by reviewer 2. To address this concern, we performed a commercial cell-surface protein biotinylation assay to assess whether GFP-tagged P2X receptors reach the plasma membrane. As expected, all P2X subtypes except P2X6 were detected at the cell surface in HEK293T cells, thereby validating our confocal fluorescence microscopy assay. These new data are now included in Figure 3 — figure supplement 1.
(3) The reviewer was not convinced of the advantages of the approach taken in this paper, because the endogenous receptor labeling in this study could also be done using conventional antibody-based labeling methods.
We thank the reviewer for raising this important point and would like to highlight several advantages of our approach compared to conventional antibody-based labeling.
First, commercially available P2X7 antibodies often suffer from poor specificity and are generally not suitable for reliably detecting endogenous P2X7 receptors, as documented in previous studies (e.g., PMID: 16564580 and PMID: 15254086). While recent advances have been made using nanobodies with improved specificity for P2X7 (e.g., PMID: 30074479 and PMID: 38953020), our strategy is distinct and complementary to nanobody-based approaches.
Second, antibodies rely on non-covalent interactions with the receptor, which can result in dissociation over time. In contrast, our X7-uP probe covalently biotinylates lysine residues on the P2X7 receptor through stable amide bond formation. This covalent labeling ensures that the biotin moiety remains permanently attached, an advantage not afforded by reversible binding strategies.
Third, by selectively biotinylating P2X7 receptors, our method provides a versatile platform for the chemical attachment of a wide range of probes or functional moieties. Although we did not demonstrate this application in the current study, we believe this modularity represents an additional advantage of our approach.
We have now revised the discussion to highlight these key advantages, allowing the reader to form their own opinion. We hope this addresses the reviewer’s concerns and clarifies the benefits of our approach.
(4) Although P2X7 was successfully labeled in this paper, it is not new as a chemistry. There is a need for more attractive functional evaluation such as live trafficking analysis of endogenous P2X7.
We agree with the reviewer that the underlying chemistry is not novel per se. However, to our knowledge, it has not previously been applied to the P2X7 receptor, and thus constitutes a novel application with specific relevance for studying native P2X7 biology.
We also appreciate the reviewer’s suggestion regarding live trafficking analysis of endogenous P2X7. While this is indeed a valuable and interesting direction, we believe it lies beyond the scope of the present study, as it would first require demonstrating that the labeling itself does not affect P2X7 function (see below). This important step would necessitate additional experiments, which we consider more appropriate for a follow-up investigation.
(5) The reviewer has concerns that the use of the large-size streptavidin to label the P2X7 receptor may perturb the dynamics of the receptor.
We thank the reviewer for raising this important point. Although we did not directly measure receptor dynamics, it is indeed possible that tetrameric streptavidin (tStrept-A 647) could promote P2X7 clustering by cross-linking nearby receptors due to its tetravalency (see also point 7 raised by the reviewer). To address this concern, we performed additional dSTORM experiments using a monomeric form of streptavidin-Alexa 647 (mSA) (see PMID: 26979420). Owing to its reduced size and lack of tetravalency, mSA has been shown to minimize artificial crosslinking of synaptic receptors (PMID: 26979420). A drawback of using mSA, however, is that the monomeric form carries only two fluorophores (estimated degree of labeling, DOL ≈ 2, PMID: 26979420), whereas the tetrameric form, according to the manufacturer’s certificate of analysis (Invitrogen S21374), has an average DOL of three fluorophores per monomer, resulting in a total of ~12 fluorophores per streptavidin.
We tested three conditions with mSA incubation: (i) control BV2 cells (without X7-uP), (ii) untreated X7-uP-labeled BV2 cells, and (iii) X7-uP-labeled BV2 cells treated with LPS and ATP (using the same concentrations and incubation times described in the manuscript). As shown in Author response image 1, only LPS+ATP treatment induced a clear increase in the mean cluster density compared to quiescent (untreated) BV2 cells. This effect closely matches the results obtained with tStrept-A 647, supporting the conclusion that tetrameric streptavidin does not artificially promote P2X7 clustering. It is also possible that the cellular environment of BV2 microglia differs from the confined architecture of synapses, which may further explain why cross-linking effects are less pronounced in our system.
As expected, the overall fluorescence signal with mSA was about tenfold lower than with tStrept-A 647, consistent with the expected fluorophore stoichiometry. This lower signal may explain why the values for the untreated condition appeared slightly higher than for the control, although the difference was not statistically significant (P = 0.1455).
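The stoichiometry argument above can be checked with a few lines of arithmetic. The following is only a rough sketch using the degree-of-labeling (DOL) values quoted in this response; the function and constant names are illustrative, and the calculation assumes one streptavidin conjugate bound per biotin, which was not measured here. Under these assumptions the per-conjugate ratio is about sixfold, the same order of magnitude as the roughly tenfold signal difference reported.

```python
# Back-of-the-envelope fluorophore stoichiometry for the two conjugates.
# DOL values are those quoted in the response; all names are illustrative.

DOL_PER_TETRAMER_SUBUNIT = 3  # Alexa 647 dyes per subunit (manufacturer's average)
SUBUNITS_PER_TETRAMER = 4     # streptavidin is a homotetramer
DOL_MONOMERIC = 2             # estimated dyes per monomeric streptavidin (mSA)


def fluorophores_per_tetramer(dol_per_subunit: int, subunits: int) -> int:
    """Total dyes carried by one tetrameric streptavidin conjugate."""
    return dol_per_subunit * subunits


def expected_signal_ratio(tetramer_dyes: int, monomer_dyes: int) -> float:
    """Expected tetramer/monomer signal ratio, assuming one conjugate per biotin."""
    return tetramer_dyes / monomer_dyes


tetramer_dyes = fluorophores_per_tetramer(DOL_PER_TETRAMER_SUBUNIT, SUBUNITS_PER_TETRAMER)
ratio = expected_signal_ratio(tetramer_dyes, DOL_MONOMERIC)
print(f"Dyes per tetramer: {tetramer_dyes}; expected signal ratio: {ratio:.1f}")
```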
We hope these additional experiments adequately address the reviewer’s concerns.
Author response image 1.
BV2 labeling with monomeric streptavidin–Alexa 647 (mSA). (A) Bright-field and dSTORM images of BV2 cells labeled with mSA in the presence (untreated and LPS+ATP) or absence (control) of 1 µM X7-uP. Treatment: LPS (1 µg/mL for 24 hours) and ATP (1 mM for 30 minutes). Scale bars, 10 µm. Insets: Magnified dSTORM images. Scale bars, 1 µm. (B) Quantification of the number of localizations (n = 2 independent experiments). Bars represent mean ± s.e.m. One-way ANOVA with Tukey’s multiple comparisons (P values are indicated above the graph).
(6) It is better to directly label Alexa647 to the P2X7 receptor to avoid functional perturbation of P2X7.
Directly labeling the P2X7 receptor with Alexa647 would require the design and synthesis of a novel probe, which is currently not available. Implementing such a strategy would involve substantial new experimental work that lies beyond the scope of the present study.
(7) In all imaging experiments, the addition of streptavidin, which acts as a cross-linking agent, may induce P2X7 receptor clustering. This concern would be dispelled if the receptors were labeled with a fluorescent dye instead of biotin and observed.
We refer the reviewer to our response in point 5, where we addressed this concern by comparing tetrameric and monomeric streptavidin conjugates. As noted above (see also point 6), directly labeling the receptor with a fluorescent dye would require the development of a new probe, which is outside the scope of the present study.
(8) There are several mentions of microglia in this paper, even though they are not used. This can lead to misunderstanding for the reader. The author conducted functional analysis of the P2X7 receptor in BV-2 cells, which are a model cell line but not microglia themselves. The text should be reviewed again and corrected to remove the misleading parts that could lead to misunderstanding. e.g. P8. lines 361-364
First, it combines N-cyanomethyl NASA chemistry with the high-affinity AZ10606120 ligand, enabling rapid labeling in microglia (within 10 min)
P8. lines 372-373
Our results not only confirm P2X7 expression in microglia, as previously reported (6, 26-33), but also reveal its nanoscale localization at the cell surface using dSTORM.
We agree with the reviewer’s comment. We have now modified the text, including the title.
Reviewer #2 (Public review):
Summary:
In this manuscript, Arnould et al. develop an unbiased, affinity-guided reagent to label P2X7 receptor and use super-resolution imaging to monitor P2X7 redistribution in response to inflammatory signaling.
Strengths:
I think the X7-uP probe that they developed is very useful for visualizing localization of P2X7 receptor. They convincingly show that under inflammatory conditions, there is a reorganization of P2X7 localization into receptor clusters. Moreover, I think they have shown a very clever way to specifically label any receptor of interest. This has broad appeal.
We thank the reviewer for their positive comment.
Weaknesses:
Overall, the manuscript is novel and interesting. However, I do have some suggestions for improvement.
(1) While the authors state that chemical modification of AZ10606120 to produce the X7-uP reagent has "minimal impact" on the inhibition of P2X7, we can see from Figure 2A and 2B that it does not antagonize P2X7 as effectively as the original antagonist. For the sake of completeness and quantitation, I think it would be great if the authors could determine the IC50 for X7-uP and compare it to the IC50 of AZ10606120.
We thank the reviewer for this insightful comment. Unfortunately, due to the limited availability of X7-uP, we were not able to establish a complete concentration–response curve to determine its IC<sub>50</sub>, which would require testing at concentrations >1 µM. Nevertheless, to estimate the effect of the modification, we assessed current inhibition at 300 µM X7-uP and compared it with the reported IC<sub>50</sub> of AZ10606120 (10 nM). Under these conditions, both compounds produced a similar level of inhibition, indicating that while the chemical modification reduces potency relative to AZ10606120, X7-uP still functions as an effective probe for P2X7. We have now included these data in Figure 2 and revised the text accordingly.
(2) Do the authors know whether modification of the lysines with biotin affects the receptor's affinity for ATP (or ability to be activated by ATP)? What about P2X7 that has been modified with biotin and then labeled with Alexa 647? For the sake of completeness and quantitation, I think it would be great if the authors could determine the EC50 of biotinylated P2X7 for ATP as well as biotinylated and then Alexa 647 labeled P2X7 for ATP and compare these values to the affinity of unmodified WT P2X7 for ATP.
We thank the reviewer for raising this important point. At present, we have not determined whether modification of lysine residues with biotin, or subsequent labeling with Alexa647, affects the ATP sensitivity or functional properties of P2X7. However, we believe this does not impact the conclusions of the current study, as all functional assays were conducted prior to X7-uP labeling. The labeling is used here as a terminal "snapshot" to visualize the endogenous receptor without interfering with the functional characterization.
We fully agree that assessing the functional integrity of P2X7 following biotinylation and fluorophore labeling—such as by determining the EC<sub>50</sub> for ATP—would be essential for studies involving dynamic or post-labeling functional analyses, such as live trafficking. However, as noted earlier in our response to Reviewer 1 (point 4), these experiments lie beyond the scope of the current study.
(3) It is a little misleading to color the fluorescence signal from mScarlet green (for example, in Figure 3 and Figure 4). The fluorescence is not at the same wavelength as GFP. In fact, the wavelength (570 nm - 610 nm) for emission is closer to orange/red than to green. I think this color should be changed to differentiate the signal of mScarlet from the GFP signal used for each of the other P2X receptor subtypes.
As suggested, we changed the mScarlet color to orange for all relevant figures.
(4) It is my understanding that P2X6 does not form homotrimers. Thus, I was a little surprised to see that the density and distribution of P2X6-GFP in Figure 3 looks very similar to the density and distribution of the other P2X subtypes. Do the authors have an explanation for this? Are they looking at P2X6 protomers inserted into the plasma membrane? Does the cell line have endogenous P2X receptor subtypes? Is Figure 3 showing heterotrimers with P2X6 receptor? A little explanation might be helpful.
We thank the reviewer for raising this important point. Indeed, it is well established that P2X6 does not form functional channels, which supports the conclusion that it does not form homotrimeric complexes. Although previous studies have shown that P2X6–GFP expression is generally lower, more diffuse, and not efficiently targeted to the cell surface compared with other P2X subtypes (see PMID: 12077178), the similar fluorescence distribution and density observed in our Figure 3 do not imply that P2X6 forms homotrimers.
We did not directly assess the presence of endogenous P2X6 in our HEK293T cells; however, according to the Human Protein Atlas, there is no detectable P2X6 RNA expression in HEK293 cells (nTPM = 0), indicating that endogenous P2X6 is not expressed in this cell line. To further investigate surface expression (see also point 2 of reviewer 1), we performed a commercial cell-surface protein biotinylation assay to assess whether GFP-tagged P2X6 reaches the plasma membrane. As expected, P2X6 was not detected at the cell surface in HEK293T cells, whereas GFP-tagged P2X1 to P2X5 were readily detected. These results further support the conclusion that P2X6 does not insert into the plasma membrane as a homotrimer, thereby validating our confocal fluorescence microscopy assay. These new data are now included in Figure 3 — figure supplement 1.
(5) It is easy to overlook the fact that the antagonist leaves the binding pocket once the biotin has been attached to the lysines. It might be helpful if the authors made this a little more apparent in Figure 1 or in the text describing the NASA chemistry reaction.
We thank the reviewer for this insightful suggestion. To address this, we have modified Figure 1A and updated the legend.
Reviewer #3 (Public review):
Summary:
This manuscript describes the development of a covalent labeling probe (X7-uP) that selectively targets and tags native P2X7 receptors at the plasma membrane of BV2 microglial cells. Using super-resolution imaging (dSTORM), the authors demonstrate that P2X7 receptors form nanoscale clusters upon microglial activation by lipopolysaccharide (LPS) and ATP, correlating with synergistic IL-1β release. These findings advance understanding of P2X7 reorganization during inflammation and provide a generalizable labeling strategy for monitoring endogenous P2X7 in immune cells.
Strengths:
(1) The authors designed X7-uP by coupling a high-affinity, P2X7-specific antagonist (AZ10606120) with N-cyanomethyl NASA chemistry to achieve site-directed biotinylation. This approach offers high specificity, minimal off-target reactivity, and a straightforward pull-down/imaging readout.
(2) The results connect P2X7's nanoscale clustering directly with IL-1β secretion in microglia, reinforcing the role of P2X7 in inflammation. By localizing endogenous P2X7 at single-molecule resolution, the authors reveal how LPS priming and ATP stimulation synergistically reorganize the receptor.
(3) The authors systematically validate their method in recombinant systems (HEK293 cells) and in BV2 cells, showing selective inhibition, mutational confirmation of the binding site, and Western blot pulldown experiments.
We thank the reviewer for their positive comment.
Weaknesses:
(1) While the data strongly indicate that P2X7 clustering contributes to IL-1β release, the manuscript would benefit from additional experiments (if feasible) or discussion on how receptor clustering interfaces with downstream inflammasome assembly. Clarification of whether the P2X7 clusters physically colocalize with known inflammasome proteins would solidify the mechanism.
We thank the reviewer for this valuable suggestion. Determining the physical colocalization of P2X7 clusters with known inflammasome components would provide important insight into the molecular partners involved in inflammasome activation. However, we believe that such an investigation would constitute a substantial study on its own and therefore lies beyond the scope of the present work.
Nevertheless, in response to the reviewer’s suggestion, we have added a short paragraph at the end of the Discussion section addressing potential mechanisms by which P2X7 clustering may contribute to downstream inflammasome activation. We also revised the text to tone down the hypothesis of physical colocalization.
(2) The authors might expand on the scope of X7-uP in other native cells that endogenously express P2X7 (e.g., macrophages, dendritic cells). Although they mention the possibility, demonstrating the probe's applicability in at least one other primary immune cell type would strengthen its general utility.
We thank the reviewer for this valuable suggestion. Again, we believe that such an investigation would constitute a substantial study on its own and therefore lies beyond the scope of the present work.
(3) The authors do include appropriate negative controls, yet providing additional details (e.g., average single-molecule on-time or blinking characteristics) in supplementary materials could help readers assess cluster calculations.
As suggested, we have included additional data showing single-molecule blinking events in untreated and LPS+ATP-treated BV2 cells, along with the corresponding movies. The data are now presented in Figure 5—figure supplement 3A and B and Figure 5—Videos 1 and 2.
Recommendations for the authors:
Reviewer #2 (Recommendations for the authors):
(1) On line 96, the authors refer to the "ballast" domain of P2X7 receptor but do not cite the original article from which this nomenclature originated (McCarthy et al., 2019, Cell). This article should be cited to give appropriate credit.
Done.
(2) On line 602, the authors state that they use models from PDB 1MK5 and 6U9W to generate the cartoons in Figure 6. The manuscripts from which these PDB files were generated need to be appropriately cited.
Done.
(3) On line 319, the authors say "300 mM BzATP" but I think they mean 300 µM.
Done. Thank you for catching the typo.
Reviewer #3 (Recommendations for the authors):
Overall, excellent data quality. The paper would benefit from a discussion of the physiological implications of clustering. It would also be helpful to elaborate about the potential mechanisms for clustering: diffusion and/or insertion. Finally, the authors should comment on work by Mackinnon's (PMID: 39739811) and Santana lab (PMID: 31371391) on two distinct models for clustering of proteins.
As suggested by the reviewer, we have revised the discussion to incorporate their comments. First, we have added the following text:
“Upon BV2 activation, we observed significant nanoscale reorganization of P2X7. Both LPS and ATP (or BzATP) trigger P2X7 upregulation and clustering, increasing the overall number of surface receptors and the number of receptors per cluster, from one to three (Figure 6). By labeling BV2 cells with X7-uP shortly after IL-1β release, we were able to correlate the nanoscale distribution of P2X7 with the functional state of BV2 cells, consistent with the two-signal, synergistic model for IL-1β secretion observed in microglia and other cell types (Ferrari et al, 1996; Perregaux et al, 2000; Ferrari et al, 2006; Di Virgilio et al, 2017; He et al, 2017; Swanson et al, 2019). In this model, LPS priming leads to intracellular accumulation of pro-IL-1β, while ATP stimulation activates P2X7, triggering NLRP3 inflammasome activation and the subsequent release of mature IL-1β.
What is the mechanism underlying P2X7 upregulation that leads to an overall increase in surface receptors—does it result from the lateral diffusion of previously masked receptors already present at the plasma membrane, or from the insertion of newly synthesized receptors from intracellular pools in response to LPS and ATP? Although our current data do not distinguish between these possibilities, a recent study suggests that the α1 subunit of the Na<sup>+</sup>/K<sup>+</sup>-ATPase (NKAα1) forms a complex with P2X7 in microglia, including BV2 cells, and that LPS+ATP induces NKAα1 internalization (Huang et al, 2024). This internalization appears to release P2X7 from NKAα1, allowing P2X7 to exist in its free form. We speculate that the internalization of NKAα1 induced by both LPS and ATP exposes previously masked P2X7 sites, including the allosteric AZ10606120 sites, thus making them accessible for X7-uP labeling.”
Second, we have added a short paragraph at the end of the Discussion section addressing potential mechanisms by which P2X7 clustering may contribute to downstream inflammasome activation:
“What mechanisms underlie P2X7 clustering in response to inflammatory signals? Several models have been proposed to explain membrane protein clustering, including recruitment to structural scaffolds (Feng & Zhang, 2009), partitioning into membrane domains enriched in specific chemical components such as lipid rafts (Simons & Ikonen, 1997), and self-assembly mechanisms (Sieber et al, 2007). These self-assembly mechanisms include an irreversible stochastic model (Sato et al, 2019) and a more recent reversible self-oligomerization model which gives rise to higher-order transient structures (HOTS) (Zhang et al, 2025). Supported by cryogenic optical localization microscopy with very high resolution (~5 nm), the HOTS model has been observed in various membrane proteins, including ion channels and receptors (Zhang et al, 2025). Furthermore, HOTS are suggested to be dynamically modulated and to play a functional role in cell signaling, potentially influencing both physiological and pathological processes (Zhang & MacKinnon, 2025). While this hypothesis is compelling, our current dSTORM data lack sufficient spatial resolution to confirm whether P2X7 trimers form HOTS via self-oligomerization. Further biophysical and ultra-high-resolution imaging studies are required to test this model in the context of P2X7 clustering.”
-
eLife Assessment
This fundamental manuscript provides compelling evidence that BK and CaV1.3 channels can co-localize as ensembles early in the biosynthetic pathway, including within the ER and Golgi. The findings, supported by a range of imaging and proximity assays, offer insights into channel organization in both heterologous and endogenous systems. The data substantiate the central claims, while highlighting intriguing mechanistic questions for future studies: the determinants of mRNA co-localization, the temporal dynamics of ensemble trafficking, and the physiological implications of pre-assembly for channel function at the plasma membrane.
-
Reviewer #1 (Public review):
Summary:
The co-localization of large conductance calcium- and voltage-activated potassium (BK) channels with voltage-gated calcium channels (CaV) at the plasma membrane is important for the functional role of these channels in controlling cell excitability and physiology in a variety of systems. An important question in the field is where and how BK and CaV channels assemble as 'ensembles' to allow this coordinated regulation: is this through preassembly early in the biosynthetic pathway, during trafficking to the cell surface, or once channels are integrated into the plasma membrane? These questions also have broader implications for assembly of other ion channel complexes. Using an imaging-based approach, this paper addresses the spatial distribution of BK-CaV ensembles using both overexpression strategies in tsa201 and INS-1 cells and analysis of endogenous channels in INS-1 cells using proximity ligation and super-resolution approaches. In addition, the authors analyse the spatial distribution of mRNAs encoding BK and CaV1.3. The key conclusion of the paper, that BK and CaV1.3 are co-localised as ensembles intracellularly in the ER and Golgi, is well supported by the evidence. The experiments and analysis are carefully performed and the findings are very well presented.
-
Reviewer #3 (Public review):
Summary:
The authors present a clearly written and beautifully presented piece of work demonstrating clear evidence to support the idea that BK channels and CaV1.3 channels can co-assemble prior to their insertion in the plasma membrane.
Strengths:
The experimental records shown back up their hypotheses and the authors are to be congratulated for the large number of control experiments shown in the ms.
-
Author response:
The following is the authors’ response to the previous reviews.
Reviewer #1 (Public review):
Summary:
This manuscript by Pournejati et al investigates how BK (big potassium) channels and CaV1.3 (a subtype of voltage-gated calcium channels) become functionally coupled by exploring whether their ensembles form early, during synthesis and intracellular trafficking, rather than only after insertion into the plasma membrane. To this end, the authors use the PLA technique to assess the formation of ion channel associations in the different compartments (ER, Golgi or PM), single-molecule RNA in situ hybridization (RNAscope), and super-resolution microscopy.
Strengths:
The manuscript is well written and addresses an interesting question, combining a range of imaging techniques. The findings are generally well-presented and offer important insights into the spatial organization of ion channel complexes, both in heterologous and endogenous systems.
Weaknesses:
The authors have improved their manuscript after revisions, and some previous concerns have been addressed.
Still, the main concern about this work is that the current experiments do not quantitatively or mechanistically link the ensembles observed intracellularly (in the endoplasmic reticulum (ER) or Golgi) to those found at the plasma membrane (PM). As a result, it is difficult to fully integrate the findings into a coherent model of trafficking. Specifically, the manuscript does not address what proportion of ensembles detected at the PM originated in the ER. Without data on the turnover or half-life of these ensembles at the PM, it remains unclear how many persist through trafficking versus forming de novo at the membrane. The authors report the percentage of PLA-positive ensembles localized to various compartments, but this only reflects the distribution of pre-formed ensembles. What remains unknown is the proportion of total BK and Ca<sub>V</sub>1.3 channels (not just those in ensembles) that are engaged in these complexes within each compartment. Without this, it is difficult to determine whether ensembles form in the ER and are then trafficked to the PM, or if independent ensemble formation also occurs at the membrane. To support the model of intracellular assembly followed by coordinated trafficking, it would be important to quantify the fraction of the total channel population that exists as ensembles in each compartment. A comparable ensemble-to-total ratio across ER and PM would strengthen the argument for directed trafficking of pre-assembled channel complexes.
We appreciate the reviewer’s thoughtful comment and agree that quantitatively linking intracellular hetero-clusters to those at the plasma membrane is an important and unresolved question. Our current study does not determine what proportion of ensembles at the plasma membrane originated during trafficking. It also does not quantify the fraction of total BK and Ca<sub>V</sub>1.3 channels engaged in these complexes within each compartment. Addressing this requires simultaneous measurement of multiple parameters—total BK channels, total Ca<sub>V</sub>1.3 channels, hetero-cluster formation (via PLA), and compartment identity—in the same cell. This is technically challenging. The antibodies used for channel detection are also required for the proximity ligation assay, which makes these measurements incompatible within a single experiment.
To overcome these limitations, we are developing new genetically encoded tools to enable real-time tracking of BK and Ca<sub>V</sub>1.3 dynamics in live cells. These approaches will enable us to monitor channel trafficking and the formation of hetero-clusters, as detected by colocalization. Experiments of this kind will provide insight into their origin and turnover. While these experiments are beyond the scope of the current study, the findings in our current manuscript provide the first direct evidence that BK and CaV channels can form hetero-clusters intracellularly prior to reaching the plasma membrane. This mechanistic insight reveals a previously unrecognized step in channel organization and lays the foundation for future work aimed at quantifying ensemble-to-total ratios and determining whether coordinated trafficking of pre-assembled complexes occurs.
This limitation is acknowledged in the discussion section, page 23. It reads: “Our findings highlight the intracellular assembly of BK-Ca<sub>V</sub>1.3 hetero-clusters, though limitations in resolution and organelle-specific analysis prevent precise quantification of the proportion of intracellular complexes that ultimately persist on the cell surface.”
Reviewer #2 (Public review):
Summary:
The co-localization of large conductance calcium- and voltage-activated potassium (BK) channels with voltage-gated calcium channels (CaV) at the plasma membrane is important for the functional role of these channels in controlling cell excitability and physiology in a variety of systems.
An important question in the field is where and how BK and CaV channels assemble as 'ensembles' to allow this coordinated regulation: is this through preassembly early in the biosynthetic pathway, during trafficking to the cell surface, or once channels are integrated into the plasma membrane? These questions also have broader implications for the assembly of other ion channel complexes.
Using an imaging-based approach, this paper addresses the spatial distribution of BK-CaV ensembles using both overexpression strategies in tsa201 and INS-1 cells and analysis of endogenous channels in INS-1 cells using proximity ligation and super-resolution approaches. In addition, the authors analyse the spatial distribution of mRNAs encoding BK and CaV1.3.
The key conclusion of the paper, that BK and Ca<sub>V</sub>1.3 are co-localised as ensembles intracellularly in the ER and Golgi, is well supported by the evidence. However, whether they are preferentially co-translated at the ER requires further work. Moreover, whether intracellular pre-assembly of BK-Ca<sub>V</sub>1.3 complexes is the major mechanism for functional complexes at the plasma membrane in these models requires more definitive evidence, including both refinement of analysis of current data and potentially additional experiments.
The reviewer raises the question of whether BK and Ca<sub>V</sub>1.3 channels are preferentially co-translated. In fact, we would like to propose that co-translation has not yet been clearly defined for this type of interaction between ion channels. In our current work, we 1) observed the colocalization between BK and Ca<sub>V</sub>1.3 mRNAs and 2) determined that 70% of BK mRNA in active translation also colocalizes with Ca<sub>V</sub>1.3 mRNA. We think these results favor the idea of translational complexes that can underlie the process of co-translation. However, and in total agreement with the Reviewer, the conclusion that the mRNAs for the two ion channels are co-translated would require further experimentation. For instance, mRNA coregulation is one aspect that could help to define co-translation.
To avoid overinterpretation, we have revised the manuscript to remove references to “co-translation” in the Results section and included the word “potential” when referring to co-translation in the Discussion section. We also clarified the limitations of our evidence in the Discussion that can be found on page 25: “It is important to note that while our data suggest mRNA coordination, additional experiments are required to directly assess co-translation.”
Strengths & Weaknesses
(1) Using proximity ligation assays of overexpressed BK and CaV1.3 in tsa201 and INS-1 cells, the authors provide strong evidence that BK and CaV can exist as ensembles (i.e., channels within 40 nm) at both the plasma membrane and intracellular membranes, including ER and Golgi. They also provide evidence for endogenous ensemble assembly at the Golgi in INS-1 cells, and it would have been useful to determine if endogenous complexes are also observed in the ER of INS-1 cells. There are some useful controls, but the specificity of ensemble formation would be better determined using other transmembrane proteins rather than peripheral proteins (e.g., Golgi 58K).
We thank the reviewer for their thoughtful feedback and for recognizing the strength of our proximity ligation assay data supporting BK–Ca<sub>V</sub>1.3 hetero-cluster formation at both the plasma membrane and intracellular compartments. As for specificity controls, we appreciate the suggestion to use transmembrane markers. To strengthen our conclusion, we have performed an additional experiment comparing the number of PLA puncta formed by the interaction of Ca<sub>V</sub>1.3 and BK channels with the number of PLA puncta formed by the interaction of Ca<sub>V</sub>1.3 channels and ryanodine receptors in INS-1 cells. As shown in the figure below, the number of interactions between Ca<sub>V</sub>1.3 and BK channels is significantly higher than that between Ca<sub>V</sub>1.3 and RyR<sub>2</sub>. Of note, RyR<sub>2</sub> is a protein resident of the ER. These results provide additional evidence of the existence of endogenous complex formation in INS-1 cells. We have added this figure as a supplement.
(2) Ensemble assembly was also analysed using super-resolution (dSTORM) imaging in INS-1 cells. In these cells only 7.5% of BK and CaV particles (endogenous?) co-localise, which was only marginally above chance based on scrambled images. More detailed quantification and validation of potential 'ensembles' needs to be made, for example by exploring nearest neighbour characteristics (but see point 4 below) to define the proportion of ensembles versus clusters of BK or CaV1.3 channels alone. For example, it is mentioned that a distribution of distances between BK and CaV is seen, but the data are not shown.
We thank the reviewer for this comment. To address the request for more detailed quantification and validation of ensembles, we performed additional analyses:
Proportion of ensembles vs isolated clusters: We quantified clusters within 200 nm and found that 37 ± 3% of BK clusters are near one or more CaV1.3 clusters, whereas 15 ± 2% of CaV1.3 clusters are near BK clusters (Figure 8–Supplementary 1A).
Distance distribution: As shown in Figure 8–Supplementary 1B, the nearest-neighbor distance distribution for BK-to-CaV1.3 in INS-1 cells (magenta) is shifted toward shorter distances compared to randomized controls (gray), supporting preferential localization of BK–CaV1.3 hetero-clusters.
Together, these analyses confirm that BK–CaV1.3 ensembles occur more frequently than expected by chance and exhibit an asymmetric organization favoring BK proximity to CaV1.3 in INS-1 cells. We have included these data and figures in the revised manuscript, as well as description in the Results section.
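The two analyses described in this response (fraction of clusters within 200 nm of a partner, and comparison against randomized positions) can be illustrated with a minimal sketch. This is not the authors' pipeline: the coordinates are toy values, the function names are hypothetical, and the randomization assumes uniform placement in a square field, which is only a stand-in for a scrambled image.

```python
import math
import random

def nearest_neighbor_distances(sources, targets):
    """For each source point, distance (nm) to the closest target point."""
    return [min(math.dist(s, t) for t in targets) for s in sources]

def fraction_within(sources, targets, radius_nm):
    """Fraction of source points with at least one target within radius_nm."""
    distances = nearest_neighbor_distances(sources, targets)
    return sum(d <= radius_nm for d in distances) / len(distances)

def randomized_fraction(sources, n_targets, field_nm, radius_nm,
                        n_shuffles=200, seed=0):
    """Mean fraction expected when targets are placed uniformly at random
    in a square field of side field_nm (assumed null model)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_shuffles):
        fake = [(rng.uniform(0, field_nm), rng.uniform(0, field_nm))
                for _ in range(n_targets)]
        total += fraction_within(sources, fake, radius_nm)
    return total / n_shuffles

# Toy cluster centroids in nm (hypothetical, not data from the study).
bk = [(100, 100), (500, 500), (900, 100), (3000, 3000)]
cav = [(150, 120), (520, 480)]

print(fraction_within(bk, cav, 200))                 # BK clusters near CaV1.3
print(fraction_within(cav, bk, 200))                 # CaV1.3 clusters near BK
print(randomized_fraction(bk, len(cav), 4000, 200))  # chance level for BK
```

Because the two populations differ in size, the fraction depends on which channel is treated as the source, which is why measuring proximity in both directions is informative.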
(3) The evidence that the intracellular ensemble formation is in large part driven by co-translation, based on co-localisation of mRNAs using RNAscope, requires additional critical controls and analysis. The authors now include data of co-localised BK protein that is suggestive but does not show co-translation. Secondly, while they have improved the description of some controls, mRNA co-localisation needs to be measured in both directions (e.g., BK to SCN9A as well as SCN9A to BK), especially if the mRNAs are expressed at very different levels. The relative expression levels need to be clearly defined in the paper. The authors also use a randomized image of BK mRNA to show specificity of co-localisation with CaV1.3 mRNA; however, the mRNA distribution would not be expected to be random across the cell but constrained by ER morphology if co-translated, so using ER labelling as a mask would be useful.
We thank the reviewer for these constructive suggestions. We measured mRNA colocalization in both directions as recommended. As shown in the figure below, colocalization between KCNMA1 and SCN9A transcripts was comparable in both directions, with no statistically significant difference, supporting the specificity of the observed associations. We decided not to add this to the original figure to keep the figure simple.
We agree that co-localization of BK protein with BK mRNA is not conclusive evidence of co-translation, and we do not intend to mislead readers in our conclusion. Consequently, we were careful to avoid the term co-translation in the Results section and added the word “potential” when referring to co-translation in the Discussion section. We added a sentence in the Discussion to caution our interpretation: “It is important to note that while our data suggest mRNA coordination, additional experiments are required to directly assess co-translation.”
Author response image 1.
(4) The authors attempt to define if plasma membrane assemblies of BK and CaV occur soon after synthesis. However, because the expression of BK and CaV occur at different times after transient transfection of plasmids more definitive experiments are required. For example, using inducible constructs to allow precise and synchronised timing of transcription. This would also provide critical evidence that co-assembly occurs very early in synthesis pathways - ie detecting complexes at ER before any complexes
We appreciate the reviewer’s insightful suggestion regarding the use of inducible constructs to synchronize transcription timing. This is an excellent approach and would allow direct testing of whether co-assembly occurs early in the synthesis pathway, including detection of complexes at the ER prior to plasma membrane localization. These experiments are beyond the scope of the present work but represent an important direction for future studies.
We have added the following sentence to the Discussion section (page 24) to highlight this idea. “Future experiments using inducible constructs to precisely control transcription timing will enable more precise quantification of hetero-cluster formation in the ER compartment prior to plasma membrane insertion and reduce the variability introduced by differences in expression timing after plasmid transfection.”
(5) While the authors have improved the definition of hetero-clusters etc., it is still not clear in the super-resolution analysis how they separate a BK tetramer from a cluster of BK tetramers with the monoclonal antibody employed, i.e., each BK channel will have 4 binding sites (4 subunits in a tetramer) whereas CaV1.3 has one binding site per channel. Thus, how do the authors discriminate between a single BK tetramer (molecular cluster) with potentially 4 antibodies bound and a cluster of 4 independent BK channels?
We appreciate the reviewer’s thoughtful comment regarding the interpretation of super-resolution data. We agree that distinguishing a single BK tetramer from a cluster of multiple BK channels is challenging when using an antibody that can bind up to four sites per channel. To clarify, our analysis does not attempt to resolve individual subunits within a tetramer; rather, it focuses on the nanoscale spatial proximity of BK and Ca<sub>V</sub>1.3 signals.
We want to note that this limitation applies only to the super-resolution maps in Figures 8C and 9D and does not affect Airyscan-based analyses or measurements of BK–Ca<sub>V</sub>1.3 proximity.
To address how we might distinguish between a single BK tetramer and a cluster of multiple BK channels, we considered two contrasting scenarios. In the first case, we assume that all four α-subunits within a tetramer are labeled. Based on cryo-EM structures, a BK tetramer measures approximately 13 nm × 13 nm (≈169 nm²). Adding two antibody layers (primary and secondary) would increase the footprint by ~14 nm in each direction, resulting in an estimated area of ~41 nm × 41 nm (≈1681 nm²). Under this assumption, particles smaller than ~1681 nm² would likely represent individual tetramers, whereas larger particles would correspond to clusters of multiple tetramers.
In the second scenario, we propose that steric constraints at the S9–S10 segment, where the antibody binds, limit labeling to a single antibody per tetramer. If true, the localization precision would approximate 14 nm × 14 nm—the combined size of the antibody complex and the channel—close to the resolution limit of the microscope. To test this, we performed a control experiment using two antibodies targeting the BK C-terminal domain, raised in different species and labeled with distinct fluorophores. Super-resolution imaging revealed that only ~12% of particles were colocalized, suggesting that most channels bind a single antibody.
If multiple antibodies could bind each tetramer, we would expect much greater colocalization.
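The footprint estimate in the first scenario above is simple arithmetic and can be checked with a short sketch. The constants are the approximate values quoted in the response; the function name is hypothetical.

```python
TETRAMER_SIDE_NM = 13.0    # approximate side of a BK tetramer (from the response)
ANTIBODY_LAYERS_NM = 14.0  # primary + secondary antibody, added on each side

def labeled_footprint(side_nm, label_nm):
    """Side length (nm) and area (nm^2) of a square particle after adding
    label_nm of antibody on both ends of each axis."""
    side = side_nm + 2 * label_nm
    return side, side * side

side, area = labeled_footprint(TETRAMER_SIDE_NM, ANTIBODY_LAYERS_NM)
print(side, area)  # 41.0 1681.0
```

Under this assumption the 1681 nm² threshold follows directly from 13 + 2 × 14 = 41 nm per side.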
Although these data are not included in the manuscript, we have added the following clarification to the Results section (page 19): “It is important to note that this technique does not allow us to distinguish between labeling of four BK α-subunits within a tetramer and labeling of multiple BK channel clusters. Hence, particles smaller than ~1680 nm² may represent either a single tetramer or a cluster. This limitation applies to Figures 8C and 9D and does not affect measurements of BK–Ca<sub>V</sub>1.3 proximity.”
Author response image 2.
(6) The post-hoc tests used for one-way ANOVA and ANOVA statistics need to be defined throughout.
We thank the reviewer for highlighting the need for clarity regarding our statistical analyses. We have now specified the post-hoc tests used for all one-way ANOVA and ANOVA comparisons throughout the manuscript, and updated figure legends.
Reviewer #3 (Public review):
Summary:
The authors present a clearly written and beautifully presented piece of work demonstrating clear evidence to support the idea that BK channels and CaV1.3 channels can co-assemble prior to their insertion in the plasma membrane.
Strengths:
The experimental records shown back up their hypotheses and the authors are to be congratulated for the large number of control experiments shown in the ms.
Recommendations for the authors:
Reviewer #1 (Recommendations for the authors):
The authors have sufficiently addressed the specific points previously raised and the manuscript has improved clarity in those aspects. My main concern, which still remains, is stated in the public review.
Reviewer #3 (Recommendations for the authors):
I am content that the authors have attempted to fully address my previous criticisms.
I have only three suggestions
(1) I think the word Homo-clusters at the bottom right of Figure 1 is erroneously included.
We thank the reviewer for bringing this to our attention. The figure has been corrected accordingly.
(2) The authors should, for completeness, refer to the beta, gamma and LINGO subunit families in the Introduction and include appropriate references:
Knaus, H. G., Folander, K., Garcia-Calvo, M., Garcia, M. L., Kaczorowski, G. J., Smith, M., & Swanson, R. (1994). Primary sequence and immunological characterization of beta-subunit of high conductance Ca2+-activated K+ channel from smooth muscle. The Journal of Biological Chemistry, 269(25), 17274-17278.
Brenner, R., Jegla, T. J., Wickenden, A., Liu, Y., & Aldrich, R. W. (2000a). Cloning and functional characterization of novel large conductance calcium-activated potassium channel beta subunits, hKCNMB3 and hKCNMB4. The Journal of Biological Chemistry, 275(9), 6453-6461.
Yan, J & R.W. Aldrich. (2010) LRRC26 auxiliary protein allows BK channel activation at resting voltage without calcium. Nature. 466(7305):513-516
Yan, J & R.W. Aldrich. (2012) BK potassium channel modulation by leucine-rich repeat-containing proteins. Proceedings of the National Academy of Sciences 109(20):7917-22
Dudem, S, Large RJ, Kulkarni S, McClafferty H, Tikhonova IG, Sergeant, GP, Thornbury, KD, Shipston, MJ, Perrino BA & Hollywood MA (2020). LINGO1 is a novel regulatory subunit of large conductance, Ca2+-activated potassium channels. Proceedings of the National Academy of Sciences 117 (4) 2194-2200
Dudem, S., Boon, P. X., Mullins, N., McClafferty, H., Shipston, M. J., Wilkinson, R. D. A., Lobb, I., Sergeant, G. P., Thornbury, K. D., Tikhonova, I. G., & Hollywood, M. A. (2023). Oxidation modulates LINGO2-induced inactivation of large conductance, Ca2+-activated potassium channels. The Journal of Biological Chemistry, 299 (3) 102975.
We agree with the reviewer’s suggestion and have revised the Introduction to include references to the beta, gamma, and LINGO subunit families. Appropriate citations have been added to ensure completeness and contextual relevance.
Additionally, BK channels are modulated by auxiliary subunits, which fine-tune BK channel gating properties to adapt to different physiological conditions. The β, γ, and LINGO1 subunits each contribute distinct structural and regulatory features: β-subunits modulate Ca²⁺ sensitivity and can induce inactivation; γ-subunits shift voltage-dependent activation to more negative potentials; and LINGO1 reduces surface expression and promotes rapid inactivation (18-24). These interactions ensure precise control over channel activity, allowing BK channels to integrate voltage and calcium signals dynamically in various cell types.
(3) I think it may be more appropriate to include the sentence "The probes against the mRNAs of interest and tested in this work were designed by Advanced Cell Diagnostics." (P16, right hand column, L12-14) in the appropriate section of the Methods, rather than in Results.
We thank the reviewer for this helpful suggestion. In response, we have relocated the sentence to the appropriate section of the Methods, where it now appears with relevant context.
eLife Assessment
The authors studied cognitive control signals in the anterior cingulate cortex (ACC) while rats selected between small immediate and larger delayed rewards. The description of behavioral strategies related to value-tracking signals in ACC is potentially useful. The evidence in support of this finding is incomplete due to issues with the task design, analyses, and modeling.
Reviewer #1 (Public review):
Summary:
Adult (4mo) rats were tasked to either press one lever for an immediate reward or another for a delayed reward. The task had an adjusting amount structure in which (1) the number of pellets provided on the immediate reward lever changed as a function of the decisions made, (2) rats were prevented from pressing the same lever three times in a row.
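The adjusting-amount structure described above can be sketched as a small simulation. Several details are assumptions for illustration only: the step size, the bounds and starting value of the immediate reward, and the way the "no three presses in a row" rule is enforced (here, by forcing the other lever). The 6-pellet delayed payoff is taken from a later comment in this review, and the direction of adjustment (the immediate reward shrinks when chosen and grows otherwise) follows the description in the third review.

```python
import random

DELAY_PELLETS = 6           # delayed-lever payoff mentioned in the review
IVAL_MIN, IVAL_MAX = 0, 6   # assumed bounds on the immediate reward
IVAL_STEP = 1               # assumed adjustment per choice

def simulate_session(p_delay, n_trials=40, start_ival=3, seed=0):
    """Simulate choices made with a fixed probability of picking the delayed lever.

    Returns a list of (choice, ival_at_choice, pellets_received).
    """
    rng = random.Random(seed)
    ival = start_ival
    history = []
    for _ in range(n_trials):
        choice = "delay" if rng.random() < p_delay else "immediate"
        # Assumed rule: a third consecutive press of the same lever is
        # replaced by a press of the other lever.
        if len(history) >= 2 and history[-1][0] == history[-2][0] == choice:
            choice = "immediate" if choice == "delay" else "delay"
        if choice == "delay":
            pellets = DELAY_PELLETS
            next_ival = min(IVAL_MAX, ival + IVAL_STEP)
        else:
            pellets = ival
            next_ival = max(IVAL_MIN, ival - IVAL_STEP)
        history.append((choice, ival, pellets))
        ival = next_ival
    return history

session = simulate_session(p_delay=0.5, n_trials=20, seed=1)
print(sum(pellets for _, _, pellets in session))  # total pellets in the session
```

A fixed `p_delay` corresponds to the "random choice with a basic p(delay)" baseline that the later comments compare against.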
While the authors have been very responsive to the reviews, and I appreciate that, unfortunately, the new analyses reported in this revision actually lead me to deeper concerns about the adequacy of the data to support the conclusions. In this revision, it has become clear that the conclusions are forced and not supported by the data. Alternative theories are not considered or presented. This revision has revealed deep problems with the task, the analyses, and the modeling.
Data Weaknesses
Most importantly, the inclusion of the task behavior data has revealed a deep problem with the entire structure of the data. As is obvious in Figure 1D, there is a slow learning effect that is changing over the sessions as the animals learn to stop taking the delayed outcome. Unfortunately, the 8s delays came *after* the 4s. The first 20 sessions contain 19 4s delays and 1 8s delay, while the last 20 sessions contain 14 8s delays and 6 4s delays. Given the changes across sessions, it is likely that a large part of the difference is due to across-session learning (which is never addressed or considered).
These data are not shown by subject, and I suspect that individual subjects did all 4s then all 8s and some subjects switched tasks at different times. If my suspicion is true, then any comparisons between the 4s and 8s conditions (which are a major part of the authors' claims) may have nothing to do with the delays, but rather with increased experience on the task.
Furthermore, the four "groups", which are still poorly defined, seem to have been assessed at a session-by-session level. So when did each animal fall into a given group? Why is Figure 1D not showing which session fell into which group and why are we not seeing each animal's progression? They also admit that animals used a mixture of strategies, which implies that the "group" assignment is an invalid analysis, as the groups do not accommodate strategy mixing.
Figure 2 shows that none of the differences of the group behavior against random choice with a basic p(delay) are significant. They use a KS test to measure these differences. KS tests are notoriously sensitive as KS tests simply measure whether there are any statistical differences between two distributions. They do not report the full statistics for Figure 2, but only say that the 4HI group was not significant (KS p-value = 0.72) and the 8LO showed a p-value of 0.1 (which they interpret as significant). p=0.1 is not significant. They don't report the value of the 4LO or 8HI groups (why not?), but say they are in-between these two extremes. That means *none* of the differences are significant.
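For readers unfamiliar with the test discussed here, the following is a minimal sketch of a two-sample Kolmogorov-Smirnov comparison. It is a hand-rolled illustration, not the authors' code: the p-value uses the standard asymptotic approximation with a small-sample correction, and an established statistics library would be preferable for real analyses. The sample values are toy numbers.

```python
import math

def ks_two_sample(a, b):
    """Return (D, approximate p-value) for the two-sample KS test."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    # Walk through the pooled values, tracking the largest gap between
    # the two empirical cumulative distribution functions.
    while i < n and j < m:
        x = min(a[i], b[j])
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    n_eff = n * m / (n + m)
    lam = (math.sqrt(n_eff) + 0.12 + 0.11 / math.sqrt(n_eff)) * d
    if lam < 0.2:
        return d, 1.0  # the series below is unreliable here; p is essentially 1
    p = 2 * sum((-1) ** (k - 1) * math.exp(-2 * (k * lam) ** 2)
                for k in range(1, 101))
    return d, min(1.0, max(0.0, p))

observed = [31, 35, 36, 40, 42, 44, 47, 50]   # toy totals, not study data
simulated = [30, 33, 37, 39, 43, 45, 46, 52]  # toy totals from a null model
d_stat, p_value = ks_two_sample(observed, simulated)
print(d_stat, p_value)
```

With the conventional 0.05 threshold, a p-value of 0.1 or 0.72 would both count as failing to reject the hypothesis that the two samples share a distribution.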
They then test a model with additional parameters, and say that the model includes more than the minimal p_D parameter, but never report BIC or AIC model comparisons. In order to claim that the model is better than the bare p_D assumption, they should be reporting model-comparison statistics. But given that the p_D parameters are enough (q.v. Figure 2), this entire model seems unnecessary.
It took me a while to determine what was being shown in Figure 3, but I was eventually able to determine that 0 was the time after the animal made the choice to wait out the delay side, so the 4s in Figure 3A1 with high power in the low-frequency (<5 Hz) range is the waiting time. They don't show the full 8s time. Nor do they show the spectrograms separated by group (assuming that group is the analytical tool they are using). In B they show only theta power, but it is unclear how to interpret these changes over time.
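The model-comparison statistics requested here are straightforward to compute once each model's maximized log-likelihood is known. The sketch below uses toy choice data and a hypothetical log-likelihood for the richer model; only the AIC and BIC formulas themselves are standard.

```python
import math

def bernoulli_log_likelihood(choices, p):
    """Log-likelihood of binary choices (1 = delayed lever) under a fixed p."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    return sum(math.log(p) if c else math.log(1 - p) for c in choices)

def aic(log_lik, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_lik

def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion: k ln n - 2 ln L."""
    return n_params * math.log(n_obs) - 2 * log_lik

choices = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]  # toy data, not from the study
p_hat = sum(choices) / len(choices)       # maximum-likelihood estimate of p_D
ll_simple = bernoulli_log_likelihood(choices, p_hat)
ll_rich = ll_simple + 0.5                 # hypothetical fit with 2 extra parameters

print(aic(ll_simple, 1), aic(ll_rich, 3))
print(bic(ll_simple, 1, len(choices)), bic(ll_rich, 3, len(choices)))
```

Lower values are preferred; in this toy case the small gain in likelihood does not offset the penalty for the two additional parameters.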
In Figure 4, panel A is mostly useless because it is just five sample sessions showing firing rate plotted on the same panels as the immediate reward amount. If they want to claim correlation, they should show and test it. But moreover, this is not how neural data should be presented - we need to know what the cells are doing, population-wise. We need to have an understanding of the neural ensemble. These data are clearly being picked and chosen, which is not OK.
Figure 4, panels B and C show that the activity trivially reflects the reward that has been delivered to the animal, if I am understanding the graphs correctly. (The authors do not interpret it this way, but the data is, to my eyes, clear.) The "immediate" signal shows up immediately at choice and reflects the size of the immediate reward (which is varying). The "delay" signal shows up after the delay and does not, which makes sense as the animals get 6 pellets on the delayed side no matter what. In fact, the max delayed side activity = the max immediate side activity, which is 6 pellets. This is just reward-related firing.
Figure 5 is poorly laid out, switching the order in 5C to be 2 1 3 in E and F. (Why?!) The statistics for Figure 5 on page 17 should be asking whether there are differences between neuron types, not whether there is a choice x time interaction in a given neuron type. When I look at Figure 5F1-3, all three types look effectively similar with different levels of noise. It is unclear why they are doing this complicated PC analysis or what we should be drawing from it.
Figure 6 mis-states pie charts as "total number" rather than proportions.
Interpretation Weaknesses
The separation of cognitive effort into "resource-based" and "resistance-based" seems artificial to me. I still do not understand why the ability to resist a choice does not also depend on resources or why using resources is not a form of resistance. Doesn't every action in the end depend on the resources one has available? And doesn't every use of a resource resist one option by taking another? Even if one buys these two separate cognitive control processes (which at this point in reading the revision, I do not), the paper starts from the assumption that a baseline probability of waiting out the delays is a "resistance-based cognitive control" (why?) and a probability of choice that takes into account the size of the immediate value (confusingly abbreviated as ival) is a "resource-based cognitive control" (again, why?).
Reviewer #2 (Public review):
Summary:
I appreciate the considerable work the authors have done on the revision. The manuscript is markedly improved.
Strengths still include the strong theoretical basis, well-done experiments, and clear links to LFP / spectral analyses that have links to human data. The task is now more clearly explained, and the neural correlates better articulated.
Weaknesses:
I had remaining questions, many related to my previous questions.
(1) The results have some complexity, but I still had questions about which is resource and which is resistance based. The authors say in the last sentence of the discussion: "Prominent pre-choice theta power was associated with a behavioral strategy characterized by a strong bias towards a resistance-based strategy, whereas the neural signature of ival-tracking was associated with a strong bias towards a resource-based strategy."
I might suggest making this simpler and clear in the abstract and the first paragraph of the discussion. A simple statement like "pre-choice theta was biased towards resistance whereas single neurons were biased towards resources" might make this idea come across?
(2) I think most readers would like to see raw single trial LFP traces in Figure 3, single unit rasters in Figure 4, and spike-field records in Figure 5.
(3) What limitations are there to this work? I wonder if readers might benefit from some contextualization - the sample size, heterogeneous behavior - lack of cell-type specificity - using PC3 to define spectral relationships - I might suggest pointing these out.
(4) I still wasn't sure what 4 Hz vs. theta 6-12 Hz meant - is it all based on PC3's pos/neg correlation? I wonder if showing a scatter plot with the y-axis being PC3 and the x-axis being theta 4 Hz power would help distinguish these? Is this the first time this sort of analysis has been done? If so, it requires clearer definitions.
Reviewer #3 (Public review):
Summary:
The study investigated decision making in rats choosing between small immediate rewards and larger delayed rewards, in a task design where the size of the immediate rewards decreased when this option was chosen and increased when it was not chosen. The authors conceptualise this task as involving two different types of cognitive effort; 'resistance-based' effort putatively needed to resist the smaller immediate reward, and 'resource-based' effort needed to track the changing value of the immediate reward option. They argue based on analyses of the behaviour, and computational modelling, that rats use different strategies in different sessions, with one strategy in which they preferentially choose the delayed reward option irrespective of the current immediate reward size, and another strategy in which they preferentially choose the immediate reward option when the immediate reward size is large, and the delayed reward option when the immediate reward size is small. The authors recorded neural activity in anterior cingulate cortex. They propose that oscillatory activity in the 6-12 Hz theta band occurs when subjects use a 'resistance-based' strategy of choosing the delayed option irrespective of the current value of the immediate reward option. They also examine neural representation of the current value of the immediate reward option, and suggest that this value is more strongly represented when subjects are using this value information to guide choice. They further argue that neurons whose activity is modulated by theta oscillations are less involved in tracking the value of the immediate reward option than neurons whose activity is not theta modulated. If solid, these findings will be of interest to researchers working on cognitive control and the ACC's involvement in decision making. However, there are some issues with the modelling and analysis which preclude high confidence in the validity of the conclusions.
Strengths:
The behavioural task used is interesting and the recording methods used (64-channel silicon probes) should enable the collection of good quality single unit and LFP electrophysiology data. The authors recorded from a sizable sample of subjects for this type of study. The approach of splitting the data into sessions where subjects used different strategies and then examining the neural correlates of each is in principle interesting, though I have some reservations about the strength of evidence for the existence of multiple strategies.
Limitations:
The dataset is unbalanced in terms of both the number of sessions contributed by each subject, and their distribution across the different putative behavioural strategies (see Table 1), with some subjects contributing 7 sessions to a given strategy and others 0. Further, only 2 of 10 subjects contribute any sessions to one of the behavioural strategies (8LO), and a single subject contributes >50% of the sessions (7 of 13 sessions) to another strategy (8HI). Apparent differences in brain activity between the strategies could therefore in fact reflect differences between subjects, which could arise due to e.g. differences in electrode placement. To make firm conclusions that neural activity is different in sessions where different strategies are thought to be employed, it would be necessary to account for potential cross-subject variation in the data. The current statistical methods don't appear to do this as they use within-subject measures (e.g. trials or neurons) as the experimental unit and ignore which subject the neuron/trial came from.
The starting point for the analysis was the splitting of sessions into 4 groups based on the duration of the delay (4 vs 8 seconds) and then clustering within each delay category into two sub-groups. It was not clear why 2 clusters per delay category were used, nor whether the data did in fact have a clear split into two distinct clusters or continuous variation across the population of sessions. The simplified RL model used in the revised manuscript (which is an improvement from that used in the previous version) could in principle help to quantify variation across the populations of sessions, by using model fitting and comparison methods to evaluate variation in strategy across subjects. However, as far as I could tell no model-fitting or comparison was performed, and the only attempt to link the model to data was by simulating data using a fixed probability of choosing the delayed lever (i.e. with no learning across trials) and comparing the distribution of total rewards obtained per session with that of the subjects in each group (Figure 2). Total reward per session is a very coarse behavioural metric and using likelihood-based methods to fit model parameters to subjects' trial-by-trial choice data would provide a more sensitive way of using the modelling to assess behavioural strategy across sessions.
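One simple way to make the subject the experimental unit, as this comment asks, is to average within each subject before comparing groups. The sketch below is only an illustration of that idea with toy values and hypothetical subject labels; a mixed-effects model would be a more complete treatment.

```python
from collections import defaultdict
from statistics import mean

def subject_means(records):
    """records: iterable of (subject_id, group, value).

    Returns {group: [one mean per subject]} so that each subject counts once.
    """
    per_subject = defaultdict(list)
    for subject, group, value in records:
        per_subject[(group, subject)].append(value)
    by_group = defaultdict(list)
    for (group, _subject), values in sorted(per_subject.items()):
        by_group[group].append(mean(values))
    return dict(by_group)

# Toy example: one subject contributes 7 of 8 measurements to a group.
records = [("rat1", "8HI", 2.0)] * 7 + [("rat2", "8HI", 0.0)]

pooled = mean(value for _, _, value in records)
by_subject = subject_means(records)
print(pooled)                   # dominated by rat1
print(mean(by_subject["8HI"]))  # each subject weighted equally
```

The gap between the pooled mean and the subject-level mean shows how a single heavily sampled animal can drive an apparent group effect.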
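The likelihood-based fitting suggested above can be sketched for a deliberately simple choice rule. The logistic form P(delay) = sigmoid(b0 + b1 × ival) is an assumption made for illustration and is not the authors' RL model; the grid search and the toy data are likewise hypothetical.

```python
import math

def log_likelihood(b0, b1, ivals, choices):
    """Log-likelihood of choices (1 = delayed lever) under
    P(delay) = sigmoid(b0 + b1 * ival)."""
    total = 0.0
    for ival, chose_delay in zip(ivals, choices):
        z = b0 + b1 * ival
        # Numerically stable log(sigmoid(z)).
        log_p = -math.log1p(math.exp(-z)) if z >= 0 else z - math.log1p(math.exp(z))
        log_q = log_p - z  # log(1 - sigmoid(z))
        total += log_p if chose_delay else log_q
    return total

def fit_grid(ivals, choices, grid=None):
    """Coarse maximum-likelihood fit by exhaustive grid search.

    Returns (best log-likelihood, b0, b1).
    """
    grid = grid or [x / 10 for x in range(-50, 51)]
    return max((log_likelihood(b0, b1, ivals, choices), b0, b1)
               for b0 in grid for b1 in grid)

# Toy trial-by-trial data: the delayed lever is chosen more when ival is small.
ivals = [0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6]
choices = [1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0]

best_ll, b0_hat, b1_hat = fit_grid(ivals, choices)
print(best_ll, b0_hat, b1_hat)
```

A slope near zero would correspond to choosing the delayed lever irrespective of ival, while a clearly negative slope would indicate value-sensitive choice; fitting each session separately would show whether sessions fall into discrete clusters or vary continuously.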
Conceptually, it is not obvious that choices towards the delayed vs immediate lever reflect use of different strategies employing different types of cognitive effort. Rather these could reflect a single strategy which compares the estimated value of the two levers, with differences in behaviour between sessions accounted for either by differences in the task itself (between the 8s and 4s delay condition) or differences in the parameters of the strategy, such as the strength of temporal discounting.
Even if one accepts the claim that the task recruits two distinct types of cognitive control, the argument that theta oscillations, which occur on delay choice trials in the 4s delay condition, are a correlate of a 'resistance-based' strategy (resisting the immediate reward), is hard to reconcile with the fact that theta oscillations do not occur on delay choice trials in the 8s delay condition (Figure 3). The authors note this discrepancy, but state that 'The reason was because these groups largely avoided the delayed lever (Figure 1) and thereby abandoned the need to implement resistance-based control altogether.' However, the data in Figure 1D show that even in the 8s condition the subjects choose the delayed lever on around 50% of trials. It is not obvious why choosing the delayed lever on 50% of trials in the 8s condition does not require 'resistance-based' cognitive effort, while choosing it in the 4s delay condition does.
The other main claims regarding the neural data are that the neuronal representation of the value of the immediate reward lever (ival) is stronger in sessions where subjects are choosing that lever more often, particularly the 8LO group, and that neurons whose activity tracks ival are a different population from neurons whose activity is theta modulated. However, the analysis methods used to make these claims are rather convoluted and make it hard to assess the strength of the evidence for them.
To evaluate the strength of ival representation in neural activity, the authors first fit a regression model predicting each neuron's activity at different timepoints as a function of behavioural variables including ival, which is a sensible first step. However, they then perform clustering on the regression coefficients and then plot neural activity only for the cluster which they state 'provided the clearest example of value tracking'. It is not clear how the clustering was done, whether there were in fact well defined clusters in the neural activity, how the clusters whose activity is plotted were chosen, nor the proportion of neurons in this cluster for each group of sessions. The analysis therefore provides only limited information about the strength of ival representation in different session groups. It would be useful to quantify the variance explained by ival in neural activity for each group of sessions using a simpler quantification of the regression analysis, such as cross-validated coefficient of partial determination.
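As an illustration of the suggested metric, the coefficient of partial determination (CPD) for a regressor compares the error of the full model with that of a model omitting that regressor: CPD = (SSE_reduced - SSE_full) / SSE_reduced, here evaluated on held-out trials. The sketch below is a minimal standard-library version under stated assumptions: the data are synthetic, the column layout (intercept, ival, one nuisance variable) is hypothetical, and the helper names `fit_ols`, `sse` and `cv_cpd` are invented for this example.

```python
import random


def fit_ols(X, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def sse(X, y, beta):
    """Sum of squared prediction errors."""
    return sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2
               for row, yi in zip(X, y))


def cv_cpd(X, y, drop_col, n_folds=5):
    """Cross-validated CPD for the regressor in column drop_col."""
    n = len(y)
    X_red = [[v for j, v in enumerate(row) if j != drop_col] for row in X]
    sse_full = sse_reduced = 0.0
    for fold in range(n_folds):
        test = list(range(fold, n, n_folds))
        held_out = set(test)
        train = [i for i in range(n) if i not in held_out]
        y_train = [y[i] for i in train]
        y_test = [y[i] for i in test]
        b_full = fit_ols([X[i] for i in train], y_train)
        b_red = fit_ols([X_red[i] for i in train], y_train)
        sse_full += sse([X[i] for i in test], y_test, b_full)
        sse_reduced += sse([X_red[i] for i in test], y_test, b_red)
    return (sse_reduced - sse_full) / sse_reduced


# Synthetic "firing rate" that depends on ival but not on the nuisance variable.
rng = random.Random(0)
X, y = [], []
for _ in range(200):
    ival = rng.uniform(0, 6)
    nuisance = rng.gauss(0, 1)
    X.append([1.0, ival, nuisance])
    y.append(1.0 + 2.0 * ival + rng.gauss(0, 1))

cpd_ival = cv_cpd(X, y, drop_col=1)
cpd_nuisance = cv_cpd(X, y, drop_col=2)
print(cpd_ival, cpd_nuisance)
```

Computed per neuron and summarised per session group, a quantity of this kind would not depend on selecting a single cluster for display.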
The analysis of how theta modulation related to representation of ival across neurons was also complicated and non-standard. To determine whether individual neurons were theta modulated, the authors did PCA on a matrix composed of spike train autocorrelations for individual neurons, and then grouped neurons according to the projection of their autocorrelation function onto the 3rd Principal Component, on the basis that neurons with negative projection onto this component showed a peak roughly at theta frequency in the power spectrum of their autocorrelation. Even ignoring the fact that the peak in the power spectrum is broad and centred above the standard theta frequency (see figure 5B3), this is an arbitrary and unnecessarily complex way to determine if neurons are theta modulated. It would be much simpler and greatly preferable to either directly assess the modulation depth of each individual neuron's spike train autocorrelation in the theta band, or to use a metric of spike-LFP coupling in the theta band instead. The authors do include some analysis of spike field coherence in Figure 6 and this is a much more sensible approach. However, it is worth noting that the only session group which shows a difference in coherence at theta frequency relative to the other groups is 8LO, to which only 2 of 8 animals contribute any data and 70% of sessions come from one animal. It is therefore unclear whether differences in this group are due to differences in behavioural strategy, or reflect other sources of cross-animal variation.
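One simple spike-LFP coupling metric of the kind suggested is the mean resultant length of spike phases (often called a phase-locking value). The sketch below is illustrative only and rests on several assumptions: the theta phase is taken as known from an idealised 8 Hz oscillation, whereas in practice it would be estimated from the band-pass filtered LFP; the spike times are simulated; and the names `theta_phase` and `mean_resultant_length` are hypothetical.

```python
import cmath
import math
import random

THETA_HZ = 8.0  # hypothetical theta frequency for the idealised LFP


def theta_phase(t):
    """Phase (radians) of an idealised theta oscillation at time t (s)."""
    return (2 * math.pi * THETA_HZ * t) % (2 * math.pi)


def mean_resultant_length(phases):
    """Length of the mean unit vector of the phases: 0 means no phase
    preference, 1 means every spike occurs at the same phase."""
    vec = sum(cmath.exp(1j * p) for p in phases) / len(phases)
    return abs(vec)


rng = random.Random(1)

# Simulated theta-modulated neuron: spikes are kept with a probability
# that peaks at phase 0 of the oscillation.
candidates = [rng.uniform(0, 100) for _ in range(5000)]
locked_spikes = [t for t in candidates
                 if rng.random() < (1 + math.cos(theta_phase(t))) / 2]

# Simulated unmodulated neuron: spike times are independent of phase.
unlocked_spikes = [rng.uniform(0, 100) for _ in range(2500)]

plv_locked = mean_resultant_length([theta_phase(t) for t in locked_spikes])
plv_unlocked = mean_resultant_length([theta_phase(t) for t in unlocked_spikes])
print(plv_locked, plv_unlocked)
```

A per-neuron value of this kind can be tested against a shuffle distribution and then compared across session groups with subject included as a factor.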
-
Author response:
The following is the authors’ response to the current reviews.
We would like to thank the reviewers for their efforts and feedback on our preprint. We have elected to rework the manuscript for publication in a different journal. In this process we will alter many of the approaches and re-evaluate the conclusions. With this, many of the points raised by the reviewers will no longer be relevant and therefore do not require a response. Again, we thank the reviewers for their time and helpful feedback.
The following is the authors’ response to the original reviews.
eLife Assessment:
The authors present a potentially useful approach of broad interest arguing that anterior cingulate cortex (ACC) tracks option values in decisions involving delayed rewards. The authors introduce the idea of a resource-based cognitive effort signal in ACC ensembles and link ACC theta oscillations to a resistance-based strategy. The evidence supporting these new ideas is incomplete and would benefit from additional detail and more rigorous analyses and computational methods.
We are extremely grateful for the several excellent comments of the reviewers. To address these concerns, we have completely reworked the manuscript, adding more rigorous approaches in each phase of the analysis and computational model. We realize that this has taken some time to prepare the revision. However, given the comments of the reviewers, we felt it necessary to thoroughly rework the paper based on their input. Here is a (nonexhaustive) overview of the major changes we made:
We have developed a way to more adequately capture the heterogeneity in the behavior.
We have completely reworked the RL model.
We have added additional approaches and rigor to the analysis of the value-tracking signal.
Reviewer #1 (Public Review):
Summary:
Young (2.5 mo [adolescent]) rats were tasked to either press one lever for immediate reward or another for delayed reward.
Please note that at the time of testing and training the rats were > 4 months old.
The task had a complex structure in which (1) the number of pellets provided on the immediate reward lever changed as a function of the decisions made, (2) rats were prevented from pressing the same lever three times in a row. Importantly, this task is very different from most intertemporal choice tasks which adjust delay (to the delayed lever), whereas this task held the delay constant and adjusted the number of 20 mg sucrose pellets provided on the immediate value lever.
Several studies parametrically vary the immediate lever (PMID: 39119916, 31654652, 28000083, 26779747, 12270518, 19389183). While most versions of the task will yield qualitatively similar estimates of discounting, the adjusting amount is preferred as it provides the most consistent estimates (PMID: 22445576). More specifically, this version of the task avoids contrast effects that result from changing the delay during the session (PMID: 23963529, 24780379, 19730365, 35661751), which complicate value estimates.
Analyses are based on separating sessions into groups, but group membership includes arbitrary requirements and many sessions have been dropped from the analyses.
We have updated this approach and now provide a more comprehensive assessment of the behavior. The updated approach applies a hierarchical clustering model to the behavior in each session. This was applied at each delay to separate animals that prefer the immediate option more/less. This results in 4 statistically dissociable groups (4LO, 4HI, 8LO, 8HI) and includes all sessions. Please see Figure 1.
Computational modeling is based on an overly simple reinforcement learning model, as evidenced by fit parameters pegging to the extremes.
We have completely reworked the simulations in the revision. In the updated RL model we carefully add parameters to determine which are necessary to explain the experimental data. We feel that it is simplified yet more descriptive. Please see Figure 2 and associated text.
The neural analysis is overly complex and does not contain the necessary statistics to assess the validity of their claims.
We have dramatically streamlined the spike train analysis approach and added several statistical tests to ensure the rigor of our results. Please see Figures 4,5,6 and associated text.
Strengths:
The task is interesting.
Thank you for the positive comment.
Weaknesses:
Behavior:
The basic behavioral results from this task are not presented. For example, "each recording session consisted of 40 choice trials or 45 minutes". What was the distribution of choices over sessions? Did that change between rats? Did that change between delays? Were there any sequence effects? (I recommend looking at reaction times.) Were there any effects of pressing a lever twice vs after a forced trial?
Please see the updated statistics and panels in Figures 1 and 2. We believe these address this valid concern.
This task has a very complicated sequential structure that I think I would be hard pressed to follow if I were performing this task.
Human tasks implement a similar task structure (PMID: 26779747). Please note the response above that outlines the benefits of using this task.
Before diving into the complex analyses assuming reinforcement learning paradigms or cognitive control, I would have liked to have understood the basic behaviors the rats were taking. For example, what was the typical rate of lever pressing? If the rats are pressing 40 times in 45 minutes, does waiting 8s make a large difference?
Thank you for this suggestion. Our additions to Figure 1 are intended to better explain and quantify the behavior of the animals. Note that this task is designed to hold the rate of reinforcement constant no matter the choices of the animals. Our analysis supports the long-held view in the literature that rats do not like waiting for rewards, even at small delays. Going from the 4 to the 8 sec delay results in significantly more immediate choices, indicating that the rats will forgo waiting 8 sec for a larger reinforcer and take a smaller reinforcer at 4 sec.
For that matter, the reaction time from lever appearance to lever pressing would be very interesting (and important). Are they making a choice as soon as the levers appear? Are they leaning towards the delay side, but then give in and choose the immediate lever? What are the reaction time hazard distributions?
This is an excellent suggestion; we have added a brief analysis of reaction times (please see the section entitled “4 behavioral groups are observed across all sessions” in the Results). Please note that an analysis of the reaction times has been presented in a prior analysis of this data set (White et al., 2024). In addition, an analysis of reaction times in this task was performed in Linsenbardt et al. (2017). In short, animals tend to choose within 1 second of the lever appearing. In addition, our prior work shows that responses on the immediate lever tend to be slower, which we viewed as evidence of increased deliberation requirements (possibly required to integrate value signals).
It is not clear that the animals on this task were actually using cognitive control strategies on this task. One cannot assume from the task that cognitive control is key. The authors only consider a very limited number of potential behaviors (an overly simple RL model). On this task, there are a lot of potential behavioral strategies: "win-stay/lose-shift", "perseveration", "alternation", even "random choices" should be considered.
The strategies the Reviewer mentioned are descriptors of the actual choices the rats made. For example, perseveration means the rat is choosing one of the levers at an excessively high rate whereas alternation means it is choosing the two levers more or less equally, independent of payouts. But the question we are interested in is why? We are arguing that the type of cognitive control determines the choice behavior, but cognitive control is an internal variable that guides behavior, rather than simply a descriptor of the behavior. For example, the animal opts to perseverate on the delayed lever because the cognitive control required to track ival is too high. We then searched the neural data for signatures of the two types of cognitive control.
The delay lever was assigned to the "non-preferred side". How did side bias affect the decisions made?
The side bias clearly does not impact performance as the animals prefer the delay lever at shorter delays, which works against this bias.
The analyses based on "group" are unjustified. The authors compare the proportion of delayed to immediate lever press choices on the non-forced trials and then did k-means clustering on this distribution. But the distribution itself was not shown, so it is unclear whether the "groups" were actually different. They used k=3, but do not describe how this arbitrary number was chosen. (Is 3 the optimal number of clusters to describe this distribution?) Moreover, they removed three group 1 sessions with an 8s delay and two group 2 sessions with a 4s delay, making all the group 1 sessions 4s delay sessions and all group 2 sessions 8s delay sessions. They then ignore group 3 completely. These analyses seem arbitrary and unnecessarily complex. I think they need to analyze the data by delay. (How do rats handle 4s delay sessions? How do rats handle 6s delay sessions? How do rats handle 8s delay sessions?). If they decide to analyze the data by strategy, then they should identify specific strategies, model those strategies, and do model comparison to identify the best explanatory strategy. Importantly, the groups were session-based, not rat based, suggesting that rats used different strategies based on the delay to the delayed lever.
We have completely reworked our approach for capturing the heterogeneity in behavior. We have taken care to show more of the behavioral statistics that have gone into identifying each of the groups. All sessions are included in this analysis. As the reviewer suggests, we used the statistics from each of the behavioral groups to inform the RL model that explores neural signals that underlie decisions in this task. We strongly disagree that groups should be rat and not session based, as the behavior of the animal can, and does, change from day to day. This is important to consider when analyzing the neural data, as rat-based groupings would ignore this potential source of variance.
The reinforcement learning model used was overly simple. In particular, the RL model assumes that the subjects understand the task structure, but we know that even humans have trouble following complex task structures. Moreover, we know that rodent decision-making depends on much more complex strategies (model-based decisions, multi-state decisions, rate-based decisions, etc). There are lots of other ways to encode these decision variables, such as softmax with an inverse temperature rather than epsilon-greedy. The RL model was stated as a given and not justified. As one critical example, the RL model fit to the data assumed a constant exponential discounting function, but it is well-established that all animals, including rodents, use hyperbolic discounting in intertemporal choice tasks. Presumably this changes dramatically the effect of 4s and 8s. As evidence that the RL model is incomplete, the parameters found for the two groups were extreme. (Alpha=1 implies no history and only reacting to the most recent event. Epsilon=0.4 in an epsilon-greedy algorithm is a 40% chance of responding randomly.)
While we agree that the approach was not fully justified, we do not agree that it was invalid. Simply stated, a softmax approach gives the best fit to the choice behavior, whereas our epsilon-greedy approach attempted to reproduce the choice behavior using a naïve agent that progressively learns the values of the two levers on a choice-by-choice basis. Nevertheless, we certainly appreciate that important insights can be gained by fitting a model to the data as suggested. We feel that the new modeling approach we have now implemented is optimal for the present purposes and it replaces the one used in the original manuscript.
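To make the two choice rules under discussion concrete, the following sketch computes choice probabilities for a pair of action values under each rule. It is a generic textbook illustration, not the model used in either version of the manuscript; the function names and the example values are hypothetical.

```python
import math


def epsilon_greedy_probs(q_values, epsilon):
    """Choice probabilities under epsilon-greedy: with probability epsilon
    an option is drawn uniformly at random, otherwise the best is chosen."""
    n = len(q_values)
    best = max(range(n), key=lambda i: q_values[i])
    probs = [epsilon / n] * n
    probs[best] += 1.0 - epsilon
    return probs


def softmax_probs(q_values, beta):
    """Choice probabilities under softmax with inverse temperature beta."""
    m = max(q_values)  # subtract the maximum for numerical stability
    weights = [math.exp(beta * (q - m)) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


q = [1.0, 2.0]  # hypothetical values of the immediate and delayed levers
print(epsilon_greedy_probs(q, 0.4))  # random draws still hit the best option half the time
print(softmax_probs(q, 1.0))         # graded by the size of the value difference
print(softmax_probs(q, 0.0))         # beta = 0 gives random choice
```

The practical difference is that epsilon-greedy probabilities depend only on which option is best, whereas softmax probabilities also scale with how much better it is, which is why softmax is usually preferred when fitting trial-by-trial choices.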
The authors do add a "dbias" (which is a preference for the delayed lever) term to the RL model, but note that it has to be maximal in the 4s condition to reproduce group 2 behavior, which means they are not doing reinforcement learning anymore, just choosing the delayed lever.
The dbias term was dropped in the new model implementation.
Neurophysiology:
The neurophysiology figures are unclear and mostly uninterpretable; they do not show variability, statistics or conclusive results.
While the reviewer is justified in criticizing the clarity of the figures, the statement that “they do not show variability, statistics or conclusive results” is not correct. Each of the figures presented in the first draft of the manuscript, except Figure 3, are accompanied by statistics and measures of variability. Nonetheless we have updated each of the neurophysiology analyses. We hope that the reviewer will find our updates more rigorous and thorough.
As with the behavior, I would have liked to have seen more traditional neurophysiological analyses first. What do the cells respond to? How do the manifolds change aligned to the lever presses? Are those different between lever presses?
We have added several figures that plot the mean +/- SEM of the neural activity (see Figures 4 and 5). Hopefully this provides a more intuitive picture of the changes in neural activity throughout the task.
Are there changes in cellular information (both at the individual and ensemble level) over time in the session?
We provide several analyses of how firing rate changes over trials in relation to ival over time and trials in the session. In addition, we describe how these signals change in each of the behavioral groups.
How do cellular responses differ during that delay while both levers are out, but the rats are not choosing the immediate lever?
We were somewhat unclear about this suggestion as the delay follows the lever press. In addition, there is no delay after immediate presses.
Figure 3, for example, claims that some of the principal components tracked the number of pellets on the immediate lever ("ival"), but they are just two curves. No statistics, controls, or justification for this is shown. BTW, on Figure 3, what is the event at 200s?
This comment is no longer relevant based on the changes we’ve made to the manuscript.
I'm confused. On Figure 4, the number of trials seems to go up to 50, but in the methods, they say that rats received 40 trials or 45 minutes of experience.
This comment is no longer relevant based on the changes we’ve made to the manuscript.
At the end of page 14, the authors state that the strength of the correlation did not differ by group and that this was "predicted" by the RL modeling, but this statement is nonsensical, given that the RL modeling did not fit the data well, depended on extreme values. Moreover, this claim is dependent on "not statistically detectable", which is, of course, not interpretable as "not different".
This comment is no longer relevant based on the changes we’ve made to the manuscript.
There is an interesting result on page 16 that the increases in theta power were observed before a delayed lever press but not an immediate lever press, and then that the theta power declined after an immediate lever press.
Thank you for the positive comment.
These data are separated by session group (again group 1 is a subset of the 4s sessions, group 2 is a subset of the 8s sessions, and group 3 is ignored). I would much rather see these data analyzed by delay itself or by some sort of strategy fit across delays.
Thank you for the excellent suggestion. Our new group assignments take delay into account.
That being said, I don't see how this description shows up in Figure 6. What does Figure 6 look like if you just separate the sessions by delay?
We are unclear what the reviewer means by “this description”.
Discussion:
Finally, it is unclear to what extent this task actually gets at the questions originally laid out in the goals and returned to in the discussion. The idea of cognitive effort is interesting, but there is no data presented that this task is cognitive at all. The idea of a resourced cognitive effort and a resistance cognitive effort is interesting, but presumably the way one overcomes resistance is through resource-limited components, so it is unclear that these two cognitive effort strategies are different.
The basis for the reviewer's assertion that “the way one overcomes resistance is through resource-limited components” is not clear. In the revised version, we have taken greater care to outline how each type of effort signal facilitates performance of the task and articulate these possibilities in our stochastic and RL models. We view the strong evidence for ival tracking presented herein as a critical component of resource-based cognitive effort.
The authors state that "ival-tracking" (neurons and ensembles that presumably track the number of pellets being delivered on the immediate lever - a fancy name for "expectations") "taps into a resourced-based form of cognitive effort", but no evidence is actually provided that keeping track of the expectation of reward on the immediate lever depends on attention or mnemonic resources. They also state that a "dLP-biased strategy" (waiting out the delay) is a "resistance-based form of cognitive effort" but no evidence is made that going to the delayed side takes effort.
We challenge the reviewer's assertion that ival tracking is a “fancy name for expectations”. We make no claim about the prospective or retrospective nature of the signal. Clearly, expectations should be prospective and therefore different from ival tracking. Regarding the resistance signal: First, animals avoid the delay lever more often at the 8 sec delay (Figure 1). We have shown that increasing the delay systematically biases responses AWAY from the delay (Linsenbardt et al., 2017). This is consistent with a well-developed literature that rats and mice do not like waiting for delayed reinforcers. We contend that enduring something you don’t like takes effort.
The authors talk about theta synchrony, but never actually measure theta synchrony, particularly across structures such as amygdala or ventral hippocampus. The authors try to connect this to "the unpleasantness of the delay", but provide no measures of pleasantness or unpleasantness. They have no evidence that waiting out an 8s delay is unpleasant.
We have added spike-field coherence to better contact the literature on synchrony. Note that we never refer to our results as “synchrony”. However, we would be remiss to not address the growing literature on theta synchrony in effort allocation. There is a well-developed literature that rats and mice do not like waiting for delayed reinforcers. If waiting out the delay were not unpleasant, then why would the animals forgo larger rewards to avoid it?
The authors hypothesize that the "ival-tracking signal" (the expectation of number of pellets on the immediate lever) "could simply reflect the emotional or autonomic response". Aside from the fact that no evidence for this is provided, if this were to be true, then, in what sense would any of these signals be related to cognitive control?
This is proposed as an alternative explanation to the ival signal in the discussion. It was added as our due diligence. Emotional state could provide feedback to the currently implemented control mechanism. If waiting for reinforcement is too unpleasant, this could drive them to ival tracking and choosing the immediate option more frequently. We provide this option only as a possibility, not a conclusion. We have clarified this in the revised text. Nevertheless, based on our review of the literature, autonomic tracking, in some form, seems to be the most likely function of ACC (Seamans & Floresco 2022). While the reviewer may disagree with this, we feel it is at least as valid as all the complex, cognitively-based interpretations that commonly appear in the literature.
Reviewer #2 (Public Review):
Summary:
This manuscript explores the neuronal signals that underlie resistance vs resource-based models of cognitive effort. The authors use a delayed discounting task and computational models to explore these ideas. The authors find that the ACC strongly tracks value and time, which is consistent with prior work. Novel contributions include quantification of a resource-based control signal among ACC ensembles, and linking ACC theta oscillations to a resistance-based strategy.
Strengths:
The experiments and analyses are well done and have the potential to generate an elegant explanatory framework for ACC neuronal activity. The inclusion of local-field potential / spike-field analyses is particularly important because these can be measured in humans.
Thank you for the endorsement of our work.
Weaknesses:
I had questions that might help me understand the task and details of neuronal analyses.
(1) The abstract, discussion, and introduction set up an opposition between resource and resistance-based forms of cognitive effort. It's clear that the authors find evidence for each (ACC ensembles = resource, theta=resistance?) but I'm not sure where the data fall on this dichotomy.
(a) An overall very simple schematic early in the paper (prior to the MCML model? or even the behavior) may help illustrate the main point.
(b) In the intro, results, and discussion, it may help to relate each point to this dichotomy.
(c) What would resource-based signals look like? What would resistance-based signals look like? Is the main point that resistance-based strategies dominate when delays are short, but resource-based strategies dominate when delays are long?
(d) I wonder if these strategies can be illustrated? Could these two measures (dLP vs ival tracking) be plotted on separate axes or extremes, and behavior, neuronal data, LFP, and spectral relationships be shown on these axes? I think Figure 2 is working towards this. Could these be shown for each delay length? This way, as the evidence from behavior, model, single neurons, ensembles, and theta is presented, it can be related to this framework, and the reader can organize the findings.
These are excellent suggestions, and we have implemented them, where possible.
(2) The task is not clear to me.
(a) I wonder if a task schematic and a flow chart of training would help readers.
Yes, excellent idea, we have now included this in Figure 1.
(b) This task appears to be relatively new. Has it been used before in rats (Oberlin and Grahame is a mouse study)? Some history / context might help orient readers.
Indeed, this task has been used in rats in several prior studies. Please see the following references (PMID: 39119916, 31654652, 28000083, 26779747, 12270518, 19389183).
(c) How many total sessions were completed with ascending delays? Was there criteria for surgeries? How many total recording sessions per animal (of the 54?)
Please note that the delay does not change within a session. There were no criteria for surgery.
(d) How many trials completed per session (40 trials OR 45 minutes)? Where are there errors? These details are important for interpreting Figure 1.
Every animal in this data set completed 40 trials, and we have updated the task description to clarify this issue. There are no errors in this task; rather, the task is designed to measure the tendency to make an impulsive choice (smaller reward now).
(3) Figure 1 is unclear to me.
(a) Delayed vs immediate lever presses are being plotted - but I am not sure what is red, and what is blue. I might suggest plotting each animal.
We have updated Figure 1 considerably for clarity.
(b) How many animals and sessions go into each data point?
We hope this is clarified now with our new group assignments as all sessions were included in the analysis.
(c) Table 1 (which might be better referenced in the paper) refers to rats by session. Is it true that some rats (2 and 8) were not analyzed for the bulk of the paper? Some rats appear to switch strategies, and some stay in one strategy. How many neurons come from each rat?
We have updated Table 1 based on our new groupings. The rats that contribute the most sessions also tend to be represented across the behavioral groups; therefore, it is unlikely that effort allocation strategies across groupings are an esoteric feature of an animal.
(d) Task basics - RT, choice, accuracy, video stills - might help readers understand what is going into these plots
(e) Does the animal move differently (i.e., RTs) in G1 vs. G2?
Excellent suggestion. We have added more analysis of the task variables in the revision (e.g. RT, choice comparisons across delays, etc.).
(4) I wasn't sure how clustered G1 vs. G2 vs G3 are. To make this argument, the raw data (or some axis of it) might help.
(a) This is particularly important because G3 appears to be a mix of G1 and G2, although upon inspection, I'm not sure how different they really are
(b) Was there some objective clustering criteria that defined the clusters?
(c) Why discuss G3 at all? Can these sessions be removed from analysis?
Based on our updates to the behavioral analysis these comments are no longer relevant.
(5) The same applies to neuronal analyses in Fig 3 and 4
(a) What does a single neuron peri-event raster look like? I would include several of these.
(b) What does PC1, 2 and 3 look like for G1, G2, and G3?
(c) Certain PCs are selected, but I'm not sure how they were selected - was there a criteria used? How was the correlation between PCA and ival selected? What about PCs that don't correlate with ival?
(d) If the authors are using PCA, then scree plots and PETHs might be useful, as well as comparisons to PCs from time-shuffled / randomized data.
We hope that our reworking of the neural data analysis has clarified these issues. We now include several firing rate examples and aggregate data.
(6) I had questions about the spectral analysis
(a) Theta has many definitions - why did the authors use 6-12 Hz? Does it come from the hippocampal literature, and is this the best definition of theta? What about other bands: delta (1-4 Hz), theta (4-7 Hz), and beta (13-30 Hz)? These bands are of particular importance because they have been associated with errors, dopamine, and are abnormal in schizophrenia and Parkinson's disease.
This designation comes mainly from the hippocampal and ACC literature in rodents. In addition, this range best captured the peak in the power spectrum in our data. Note that we focus our analysis on theta given the literature regarding theta in the ACC as a correlate of cognitive control (references in manuscript). We did interrogate other bands as a sanity check and the results were mostly limited to theta. Given the scope of our manuscript and the concerns raised regarding complexity, we are concerned that adding frequency analyses beyond theta would obfuscate the take-home message.
However, the spectrograms in Figure 3 show a range of frequencies and highlight the ones in the theta band as the most dynamic prior to the choice.
(b) Power spectra and time-frequency analyses may justify the authors' focus. I would show these (y-axis - frequency, x-axis - time, z-axis - power).
Thank you for the suggestion. We have added this to Figure 3.
(7) PC3 as an autocorrelation doesn't seem to be the right way to infer theta entrainment or spike-field relationships, as PCA can be vulnerable to phantom oscillations, and coherence can be transient. It is also difficult to compare to traditional measures of phase-locking. Why not simply use spike-field coherence? This is particularly important with reference to the human literature, which the authors invoke.
Excellent suggestion. Note that PCA provided a way to classify neurons that exhibited peaks in the autocorrelation at theta frequencies. We have added spike-field coherence, and this analysis confirms the differences in theta entrainment of the spike trains across the behavioral groups. Please see Figure 6D.
Reviewer #3 (Public Review):
Summary:
The study investigated decision making in rats choosing between small immediate rewards and larger delayed rewards, in a task design where the size of the immediate rewards decreased when this option was chosen and increased when it was not chosen. The authors conceptualise this task as involving two different types of cognitive effort: 'resistance-based' effort putatively needed to resist the smaller immediate reward, and 'resource-based' effort needed to track the changing value of the immediate reward option. They argue based on analyses of the behaviour, and computational modelling, that rats use different strategies in different sessions, with one strategy in which they consistently choose the delayed reward option irrespective of the current immediate reward size, and another strategy in which they preferentially choose the immediate reward option when the immediate reward size is large, and the delayed reward option when the immediate reward size is small. The authors recorded neural activity in anterior cingulate cortex (ACC) and argue that ACC neurons track the value of the immediate reward option irrespective of the strategy the rats are using. They further argue that the strategy the rats are using modulates their estimated value of the immediate reward option, and that oscillatory activity in the 6-12 Hz theta band occurs when subjects use the 'resistance-based' strategy of choosing the delayed option irrespective of the current value of the immediate reward option. If solid, these findings will be of interest to researchers working on cognitive control and ACC's involvement in decision making. However, there are some issues with the experiment design, reporting, modelling and analysis which currently preclude high confidence in the validity of the conclusions.
Strengths:
The behavioural task used is interesting and the recording methods should enable the collection of good quality single unit and LFP electrophysiology data. The authors recorded from a sizable sample of subjects for this type of study. The approach of splitting the data into sessions where subjects used different strategies and then examining the neural correlates of each is in principle interesting, though I have some reservations about the strength of evidence for the existence of multiple strategies.
Thank you for the positive comments.
Weaknesses:
The dataset is very unbalanced in terms of both the number of sessions contributed by each subject, and their distribution across the different putative behavioural strategies (see table 1), with some subjects contributing 9 or 10 sessions and others only one session, and it is not clear from the text why this is the case. Further, only 3 subjects contribute any sessions to one of the behavioural strategies, while 7 contribute data to the other such that apparent differences in brain activity between the two strategies could in fact reflect differences between subjects, which could arise due to e.g. differences in electrode placement. To firm up the conclusion that neural activity is different in sessions where different strategies are thought to be employed, it would be important to account for potential cross-subject variation in the data. The current statistical methods don't do this as they all assume fixed effects (e.g. using trials or neurons as the experimental unit and ignoring which subject the neuron/trial came from).
In the revised manuscript we have updated the group assignments. We have improved our description of the logic and methods for employing these groupings as well. With this new approach, all sessions are now included in the analysis. The group assignments are made purely on the behavioral statistics of an animal in each session. We feel this approach is preferable to eliminating neurons or sessions with the goal of balancing them, which may introduce bias. Further, the rats that contribute the most sessions also tend to be represented across the behavioral groups; therefore, it is unlikely that effort allocation strategies across groupings are an esoteric feature of an animal. As neurons are randomly sampled from each animal on a given session, we feel that we’re justified in treating these as fixed effects.
It is not obvious that the differences in behaviour between the sessions characterised as using the 'G1' and 'G2' strategies actually imply the use of different strategies, because the behavioural task was different in these sessions, with a shorter wait (4 seconds vs 8 seconds) for the delayed reward in the G1 strategy sessions where the subjects consistently preferred the delayed reward irrespective of the current immediate reward size. Therefore the differences in behaviour could be driven by difference in the task (i.e. external world) rather than a difference in strategy (internal to the subject). It seems plausible that the higher value of the delayed reward option when the delay is shorter could account for the high probability of choosing this option irrespective of the current value of the immediate reward option, without appealing to the subjects using a different strategy.
Further, even if the differences in behaviour do reflect different behavioural strategies, it is not obvious that these correspond to allocation of different types of cognitive effort. For example, subjects' failure to modify their choice probabilities to track the changing value of the immediate reward option might be due simply to valuing the delayed reward option higher, rather than not allocating cognitive effort to tracking immediate option value (indeed this is suggested by the neural data). Conversely, if the rats assign higher value to the delayed reward option in the G1 sessions, it is not obvious that choosing it requires overcoming 'resistance' through cognitive effort.
The RL modelling used to characterise the subject's behavioural strategies made some unusual and arguably implausible assumptions:
Thank you for the feedback. Based on these comments (and those above), we have completely reworked the RL model. In addition, we’ve taken care to separate out the variables that correspond to a resistance- versus a resource-based signal.
There were also some issues with the analyses of neural data which preclude strong confidence in their conclusions:
Figure 4I makes the striking claim that ACC neurons track the value of the immediately rewarding option equally accurately in sessions where two putative behavioural strategies were used, despite the behaviour being insensitive to this variable in the G1 strategy sessions. The analysis quantifies the strength of correlation between a component of the activity extracted using a decoding analysis and the value of the immediate reward option. However, as far as I could see this analysis was not done in a cross-validated manner (i.e. evaluating the correlation strength on test data that was not used for either training the MCML model or selecting which component to use for the correlation). As such, the chance level correlation will certainly be greater than 0, and it is not clear whether the observed correlations are greater than expected by chance.
We have added more rigorous methods to assess the ival tracking signal (Figure 4 and 5). In addition, we’ve dropped the claim that ival tracking is the same across the behavioral groups. We suspect that this was an artifact of a suboptimal group assignment approach in the previous version.
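The cross-validation concern raised above can be illustrated with a small simulation. The sketch below is a minimal example under stated assumptions, not a reconstruction of the authors' MCML pipeline: random noise "components" stand in for decoded activity components, and the "value" signal is also pure noise, so the true correlation is zero. The function names, trial counts, and component counts are hypothetical.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def selection_bias_demo(n_trials=200, n_components=20, n_repeats=200, seed=0):
    """Mean correlation of the selected component, in-sample vs. held-out.

    All signals are independent noise, so any nonzero in-sample
    correlation arises purely from selecting the best component on the
    same trials that are used to evaluate it.
    """
    rng = random.Random(seed)
    half = n_trials // 2
    in_sample, held_out = [], []
    for _ in range(n_repeats):
        value = [rng.gauss(0, 1) for _ in range(n_trials)]
        components = [[rng.gauss(0, 1) for _ in range(n_trials)]
                      for _ in range(n_components)]
        # Select the component on the first half of trials only.
        best = max(components, key=lambda c: pearson(c[:half], value[:half]))
        in_sample.append(pearson(best[:half], value[:half]))
        # Evaluate the same component on trials not used for selection.
        held_out.append(pearson(best[half:], value[half:]))
    return sum(in_sample) / n_repeats, sum(held_out) / n_repeats


in_sample_r, held_out_r = selection_bias_demo()
print(f"in-sample r = {in_sample_r:.3f}, held-out r = {held_out_r:.3f}")
```

In this toy setting the in-sample correlation is reliably positive even though no relationship exists, while the held-out correlation averages close to zero. This is why the chance level for a non-cross-validated analysis cannot be assumed to be zero.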
An additional caveat with the claim that ACC is tracking the value of the immediate reward option is that this value likely correlates with other behavioural variables, notably the current choice and recent choice history, that may be encoded in ACC. Encoding analyses (e.g. using linear regression to predict neural activity from behavioural variables) could allow quantification of the variance in ACC activity uniquely explained by option values after controlling for possible influence of other variables such as choice history (e.g. using a coefficient of partial determination).
We agree that the ival tracking signal may be influenced by other variables – especially ones that are not cognitive but rather more generated by the autonomic system. We have included a discussion of this possibility in the Discussion section. Our previous work has explored the role of choice history on neural activity; please see White et al. (2024).
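The encoding analysis suggested by the reviewer can be sketched for the simplest case of one variable of interest and one control variable. The example below is a hedged illustration with simulated data, not an analysis of the authors' recordings. It computes the coefficient of partial determination as the squared correlation between residuals, after regressing both activity and option value on the control variable (choice). All names and effect sizes are hypothetical.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def residualize(y, x):
    """Residuals of y after a least-squares fit y ~ a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    intercept = my - slope * mx
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def partial_r2(activity, value, control):
    """Variance in activity uniquely explained by value, given the control."""
    return pearson(residualize(activity, control),
                   residualize(value, control)) ** 2


# Simulated neuron whose activity depends on choice only (hypothetical).
rng = random.Random(1)
choice = [rng.choice([0.0, 1.0]) for _ in range(2000)]
value = [2.0 * c + rng.gauss(0, 1) for c in choice]      # value covaries with choice
activity = [3.0 * c + rng.gauss(0, 1) for c in choice]   # no direct value dependence

raw_r2 = pearson(activity, value) ** 2
unique_r2 = partial_r2(activity, value, choice)
print(f"raw r^2 = {raw_r2:.3f}, partial r^2 = {unique_r2:.4f}")
```

Here the raw correlation suggests substantial value coding, but the partial coefficient is near zero because the apparent relationship is fully accounted for by choice. With several control variables (for example, choice history), the same logic applies using multiple regression in place of the single-predictor fit.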
Figure 5 argues that there are systematic differences in how ACC neurons represent the value of the immediate option (ival) in the G1 and G2 strategy sessions. This is interesting if true, but it appears possible that the effect is an artefact of the different distribution of option values between the two session types. Specifically, due to the way that ival is updated based on the subjects' choices, in G1 sessions where the subjects are mostly choosing the delayed option, ival will on average be higher than in G2 sessions where they are choosing the immediate option more often. The relative number of high, medium and low ival trials in the G1 and G2 sessions will therefore be different, which could drive systematic differences in the regression fit in the absence of real differences in the activity-value relationship. I have created an ipython notebook illustrating this, available at: https://notebooksharing.space/view/a3c4504aebe7ad3f075aafaabaf93102f2a28f8c189ab9176d4807cf1565f4e3. To verify that this is not driving the effect it would be important to balance the number of trials at each ival level across sessions (e.g. by subsampling trials) before running the regression.
This is an excellent point and led us to abandon the linear regression-based approach to quantify differences in ival coding across behavioral groups.
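The subsampling control proposed by the reviewer can be sketched as follows. This is a minimal illustration, not the reviewer's notebook or the authors' code. It assumes each trial is stored as a tuple of an ival level and an activity measurement, and it keeps the same number of trials at each level in both session types. The function name `balance_trials`, the tuple layout, and the toy data are hypothetical.

```python
import random
from collections import Counter, defaultdict


def balance_trials(session_a, session_b, seed=0):
    """Subsample two sets of trials to equal counts at each ival level.

    Each trial is a (ival_level, activity) tuple. Levels present in only
    one session are dropped, since they cannot be matched.
    """
    rng = random.Random(seed)

    def group_by_level(trials):
        groups = defaultdict(list)
        for trial in trials:
            groups[trial[0]].append(trial)
        return groups

    a_groups = group_by_level(session_a)
    b_groups = group_by_level(session_b)
    balanced_a, balanced_b = [], []
    for level in sorted(set(a_groups) & set(b_groups)):
        n_keep = min(len(a_groups[level]), len(b_groups[level]))
        balanced_a.extend(rng.sample(a_groups[level], n_keep))
        balanced_b.extend(rng.sample(b_groups[level], n_keep))
    return balanced_a, balanced_b


# Toy data: one session type is dominated by high ival, the other by low ival.
g1 = [(5, 1.0)] * 30 + [(3, 0.6)] * 10 + [(1, 0.2)] * 5
g2 = [(5, 0.9)] * 5 + [(3, 0.5)] * 10 + [(1, 0.1)] * 30
g1_balanced, g2_balanced = balance_trials(g1, g2)
print(Counter(level for level, _ in g1_balanced))
print(Counter(level for level, _ in g2_balanced))
```

Because the subsample is random, repeating the procedure over many seeds and averaging the downstream statistic would give a more stable estimate than a single draw.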
Recommendations for the authors:
Reviewer #1 (Recommendations For The Authors):
This paper was extremely hard to read. In addition to the issues raised in the public review (overly complex and incomplete analyses), one of the hardest things to deal with was the writing.
Thank you for the feedback. Hopefully we have addressed this with our thorough rewrite.
The presentation was extremely hard to follow. I had to read through it several times to figure out what the task was. It wasn't until I got to the RL model Figure 2A that I realized what was really going on with the task. I strongly recommend having an initial figure that lays out the actual task (without any RL or modeling assumptions) and identifies the multiple different kinds of sessions. What is the actual data you have to start with? That was very unclear.
Excellent idea. We have implemented this in Figure 1.
Labeling session by "group" is very confusing. I think most readers take "group" as the group of subjects, but that's not what you mean at all. You mean some sessions were one way and some were another. (And, as I noted in the public review, you ignore many of the sessions, which I think is not OK.) I think a major rewrite would help a lot. Also, I don't think the group analysis is necessary at all. In the public review, I recommend doing the analyses very differently and more classically.
We have updated the group assignments in a manner that is more intuitive, reflects the delays, and includes all sessions.
The paper is full of arbitrary abbreviations that are completely unnecessary. Every time I came to "ival", I had to translate that into "number of pellets delivered on the immediate lever" and every time I came to dLP, I had to translate that into "delayed lever press". Making the text shorter does not make the text easier to read. In general, I was taught that unless the abbreviation is the common term (such as "DNA" not "deoxyribonucleic acid"), you should never use an abbreviation. While there are some edge cases (ACC probably over "anterior cingulate cortex"), dLP, iLP, dLPs, iLPs, ival, are definitely way over the "don't do that" line.
We completely agree here and apologize for the excessive use of abbreviations. We have removed nearly all of them.
The figures were incomplete, poorly labeled, and hard to read. A lot of figures were missing, for example:
Basic task structure
Basic behavior on the task
Scatter plot of the measures that you are clustering (lever press choice X number of pellets on the immediate lever, you can use color or multiple panels to indicate the delay to the delayed lever)
Figure 3 is just a couple of examples. That isn't convincing at all.
Figure 4 is missing labels. In Figure 4, I don't understand what you are trying to say.
I don't see how the results on page 16 arise from Figure 6. I strongly recommend starting from the actual data and working your way to what it means rather than forcing this into this unreasonable "session group" analysis.
We have completely reworked the Figures for clarity and content.
The statement that "no prior study has explored the cellular correlates of cognitive effort" is ludicrous and insulting. There are dozens of experiments looking at ACC in cognitive effort tasks, in humans, other primates, and rodents. There are many dozens of experiments looking at cellular correlates in intertemporal choice tasks, some with neural manipulations, some with ensemble recordings. There are many dozens of experiments looking at cellular relationships to waiting out a delay.
We agree that our statement was extremely imprecise. We have updated this to say: “Further, a role for theta oscillations in allocating physical effort has been identified. However, the cellular mechanisms within the ACC that control and deploy types of cognitive effort have not been identified.”
Reviewer #2 (Recommendations For The Authors):
In Figure 2, the panels below E and F are referred to as 'right' - but they are below? I would give them letters.
I would make sure that animal #s, neuron #s, and LFP#s are clearly presented in the results and in each figure legend. This is important to follow the results throughout the manuscript.
Some additional proofreading ('Fronotmedial') might help with clarity.
Based on our updates, this is no longer relevant.
Reviewer #3 (Recommendations For The Authors):
In addition to the suggestions above to address specific issues, it would be useful to report some additional information about aspects of the experiments and analyses:
Specify how spike sorting was performed and what metrics were used to select well isolated single units.
Done.
Provide histology showing the recording locations for each subject.
Histological assessments of electrode placements are provided in White et al. (2024), but we provide an example placement. This has been added to the text.
Indicate the sequence of recording sessions that occurred for each subject, including for each session what delay duration was used and which dataset the session contributed to, and indicate when the neural probes were advanced between sessions.
We feel that this adds complexity unnecessarily, as we make no claims about holding units across sessions or about differences in coding along the dorsoventral gradient of ACC.
Indicate the experimental unit when reporting uncertainty measures in figure legends (e.g. mean +/- SEM across sessions).
Done.
eLife Assessment
This study investigates how the HIV inhibitor lenacapavir influences capsid mechanics and interactions with the nuclear pore complex. It provides important insights into how drug-induced hyperstabilization of the viral shell can compromise its structural integrity during nuclear entry. While the modeling is technically sophisticated and the results are promising, some mechanistic interpretations rely on assumptions embedded in the simulations, leaving parts of the evidence incomplete.
-
Reviewer #1 (Public review):
The paper from Hudait and Voth details a number of coarse-grained simulations as well as some experiments focused on the stability of HIV capsids in the presence of the drug lenacapavir. The authors find that LEN hyperstabilizes the capsid, making it fragile and prone to breaking inside the nuclear pore complex.
I found the paper interesting. I have a few suggestions for clarification and/or improvement.
(1) How directly comparable are the NPC-capsid and capsid-only simulations? A major result rests on the conclusion that the kinetics of rupture are faster inside the NPC, but are the numbers of LENs bound identical? Is the time really comparable, given that the simulations have different starting points? I'm not really doubting the result, but I think it could be made more rigorous/quantitative.
(2) Related to the above, it is stated on page 12 that, based on the estimated free-energy barrier, pentamer dissociation should occur in ~10 us of CG time. But certainly, the simulations cover at least this length of time?
(3) At first, I was surprised that even in a CG simulation, LEN would spontaneously bind to the correct site. But if I read the SI correctly, LEN was parameterized specifically to bind to hexamers and not pentamers. This is fine, but I think it's worth describing in the main text.
-
Reviewer #2 (Public review):
Here, Hudait et al. use CG modeling to investigate the mechanism by which lenacapavir (LEN) treats HIV capsids that dock to the nuclear pore complex (NPC). However, the manuscript fails to present meaningful findings that were previously unreported in the literature, and is thus of low impact. Many claims made in the manuscript are not substantiated by the presented data. Key mechanistic details that the work purports to reveal are artifacts of the parameterization choices or simulation/analysis design, with the simulations said to reveal details that they were specifically biased to reproduce. This makes the manuscript highly problematic, as its contributions to the literature would represent misconceptions based on oversights in modeling, and thus mislead future readers.
(1) Considering the literature, it is unclear that the manuscript presents new scientific discoveries. The following are results from this paper that have been previously reported:
(a) LEN-bound capsid can dock to the nuclear pore (Figure 2; see e.g. 10.1016/j.cell.2024.12.008 or 10.1128/mbio.03613-24).
(b) NUP98 interacts with the docked capsid (Figure 2; see e.g. 10.1016/j.virol.2013.02.008 or 10.1038/s41586-023-06969-7 or 10.1016/j.cell.2024.12.008).
(c) LEN and NUP98 compete for a binding interface (Figure 2; see e.g. 10.1126/science.abb4808 or 10.1371/journal.ppat.1004459).
(d) LEN creates capsid defects (Figure 3 and 5, see e.g. 10.1073/pnas.2420497122).
(e) RNP can emerge from a damaged capsid (Figure 3 and 5; see e.g. 10.1073/pnas.2117781119 or 10.7554/eLife.64776).
(f) LEN hyperstabilizes/reduces the elasticity of the capsid lattice (Figure 6; see e.g. 10.1371/journal.ppat.1012537).
(2) The mechanistic findings related to how these processes occur are problematic, either based on circular reasoning or unsubstantiated, based on the presented data. In some cases, features of parameterization and simulation/analysis design are erroneously interpreted as predictions by the CG models.
(a) Claim: LEN-bound capsids remain associated with the NPC after rupture. CG simulations did not reach the timescale needed to demonstrate continued association or failure to translocate, leaving the claim unsubstantiated.
(b) Claim: LEN contributes to loss of capsid elasticity. The authors do not measure elasticity here, only force constants of fluctuations between capsomers in freely diffusing capsids. Elasticity is defined as the ability of a material to undergo reversible deformation when subjected to stress. Other computational works that actually measure elasticity (e.g., 10.1371/journal.ppat.1012537) could represent a point of comparison, but are not cited. The changes in force constants in the presence of LEN are shown in Figure 6C, but the text of the scale bar legend and units of k are not legible, so one cannot discern the magnitude or significance of the change.
(c) Claim: Capsid defects are formed along striated patterns of capsid disorder. Data is not presented that correlates defects/cracks with striations.
(d) Claim: Typically 1-2 LEN, but rarely 3 bind per capsid hexamer. The authors state: "The magnitude of the attractive interactions was adjusted to capture the substoichiometric binding of LEN to CA hexamers (Faysal et al., 2024). ... We simulated LEN binding to the capsid cone (in the absence of NPC), which resulted in a substoichiometric binding (~1.5 LEN per CA hexamer), consistent with experimental data (Singh et al., 2024)." This means LEN was specifically parameterized to reproduce the 1-2 binding ratio per hexamer apparent from experiments, so this was a parameterization choice, not a prediction by CG simulations as the authors erroneously claim: "This indicates that the probability of binding a third LEN molecule to a CA hexamer is impeded, likely due to steric effects that prevent the approach of an incoming molecule to a CA hexamer where 2 LEN molecules are already associated. ... Approximately 20% of CA hexamers remain unoccupied despite the availability of a large excess of unbound LEN molecules. This suggests a heterogeneity in the molecular environment of the capsid lattice for LEN binding." These statements represent gross over-interpretation of a bias deliberately introduced during parameterization, and the "finding" represents circular reasoning. Also, if "steric effects" play any role, the authors could analyze the model to characterize and report them rather than simply speculate.
(e) Claim: Competition between NUP98 and LEN regulates capsid docking. The authors state: "A fraction of LEN molecules bound at the narrow end dissociate to allow NUP98 binding to the capsid ... Therefore, LEN can inhibit the efficient binding of the viral cores to the NPC, resulting in an increased number of cores in the cytoplasm." Capsid docking occurs regardless of the presence of LEN, and appears to occur at the same rate as the LEN-free capsid presented in the authors' previous work (Hudait & Voth, 2024). The presented data simply show that there is a fluctuation of bound LEN, with about 10 fewer (<5%) bound at the end of the simulation than at the beginning, and the curve (Figure 2A) does not clearly correlate with increased NUP98 contact. In that case, no data is shown that connects LEN binding with the regulation of the docking process. Further, the two quoted statements contradict each other. The presented data appear to show that NUP outcompetes LEN binding, rather than LEN inhibiting NUP binding. The "Therefore" statement is an attempt to reconcile with experimental studies, but is not substantiated by the presented data.
(f) Claim: LEN binding leads to spontaneous dissociation of pentamers. The CG simulation trajectories show pentamer dissociation. However, it is quite difficult to believe that a pentamer in the wide end of the capsid would dissociate and diffuse 100 nm away before a hexamer in the narrow end (previously between two pentamers and now only partially coordinated, also in a highly curved environment, and further under the force of the extruding RNA) would dissociate, as in Figure 2B. A more plausible explanation could be force balance between pent-hex versus hex-hex contacts, an aspect of CG parameterization. No further modeling is presented to explain the release of pentamers, and changes in pent-hex stiffness are not apparent in the force constant fluctuation analysis in Figure 6C.
(g) Claim: WTMetaD simulations predict capsid rupture. The authors state: "In WTMetaD simulations, we used the mean coordination number (Figure S6) between CA proteins in pentamers and in hexamers as the reaction coordinate." This means that the coordination number, the number of pent-hex contacts, is the bias used to accelerate simulation sampling. Yet the authors then interpret a change in coordination number leading to capsid rupture as a discovery, representing a fundamental misuse of the WTMetaD method. Changes in coordination number cannot be claimed as an emergent property when they are in fact the applied bias, when the simulation forced them to sample such states. The bias must be orthogonal to the feature of interest for that feature to be discoverable. While the reported free energies are orthogonal to the reaction coordinate, the structural and stepwise-mechanism "findings" here represent circular reasoning.
(3) Another major concern with this work is the excessive self-citation, and the conspicuous lack of engagement with similar computational modeling studies that investigate the HIV capsid and its interactions with LEN, capsid mechanical properties relevant to nuclear entry, and other capsid-NPC simulations (e.g., 10.1016/j.cell.2024.12.008 and 10.1371/journal.ppat.1012537). Other such studies available in the literature include examination of varying aspects of the system at both CG and all-atom levels of resolution, which could be highly complementary to the present work and, in many cases, lend support to the authors' claims rather than detract from them. The choice to omit relevant literature implies either a lack of perspective or a lack of collegiality, which the presentation of the work suffers from. Overall, it is essential to discuss findings in the context of competing studies to give readers an accurate view of the state of the field and how the present work fits into it. It is appropriate in a CG modeling study to discuss the potential weaknesses of the methodology, points of disagreement with alternative modeling studies, and any lack of correlation with a broader range of experimental work. Qualitative agreement with select experiments does not constitute model validation.
(4) Other critiques, questions, concerns:
(a) The first Results sub-heading presents "results", complete with several supplementary figures and a movie that are from a previous publication about the development of the HIV capsid-NPC model in the absence of LEN (Hudait & Voth, 2024). This information should be included as part of the introduction or an abbreviated main-text methods section rather than being included within Results as if it represents a newly reported advancement, as this could be misleading.
(b) The authors say the unbiased simulations of capsid-NPC docking were run as two independent replicates, but results from only one trajectory are ever shown plotted over time. It is not mentioned if the time series data are averaged or smoothed, so what is the shadow in these plots (e.g., Figures 1, 2, and Supplementary Figure 5)?
(c) Why do the insets showing LEN binding in Figure 2A look so different from the models they are apparently zoomed in on? Both instances really look like they are taken from different simulation frames, rather than being a zoomed-in view.
(d) What are the sudden jerks apparent in the SI movies? Perhaps this is related to the rate at which trajectory frames are saved, but occasionally, during the relatively smooth motion of the capsid-NPC complex, something dramatic happens all of a sudden in a frame. For example, significant and apparently instantaneous reorientation of the cone far beyond what preceding motions suggest is possible (SI movie 2, at timestamp 0.22), RNP extrusion suddenly in a single frame (SI movie 2, at timestamp 0.27), and simultaneous opening of all pentamers all at once starting in a single frame (SI movie 2, at timestamp 0.33). This almost makes the movie look generated from separate trajectories or discontinuous portions of the same trajectory. If movies have been edited for visual clarity (e.g., to skip over time when "nothing" is happening and focus on the exciting aspects), then the authors should state so in the captions.
(e) Figure 3c presents a time series of the degree of defects at pent-hex and hex-hex interfaces, but I do not understand the normalization. The authors state, "we represented the defects as the number of under-coordinated CA monomers of the hexamers at the pentamer-hexamer-pentamer and hexamer-hexamer interface as N_Pen-Hex and N_Hex-Hex ... Note that in N_Pen-Hex and N_Hex-Hex are calculated by normalizing by the total number of CA pentamer (12) and hexamer rings (209) respectively." Shouldn't the number of uncoordinated monomers be normalized by the number of that type of monomer, rather than the number of capsomers/rings? E.g., 12*5 and 209*6, rather than 12 and 209?
(f) The authors state that "Although high computational cost precluded us from continuing these CG MD simulations, we expect these defects at the hexamer-hexamer interface to propagate towards the high curvature ends of the capsid." The defects being reported are apparently propagating from (not towards) the high curvature ends of the capsid.
(g) The first half of the paper uses the color orange in figures to indicate LEN, but the second half uses orange to indicate defects, and this could be confusing for some readers. Both LEN and "defects" are simply a cluster of spheres, so highlighted defects appear to represent LEN without careful reading of captions.
(h) The SI Figure S3 caption says "The CA monomers to which at least one LEN molecule is bound are shown in orange spheres. The CA monomers to which no LEN molecule is bound are shown in white spheres." In contradiction, the main-text Figure 2 caption says "The CA monomers to which at least one LEN molecule is bound are shown in white spheres. The CA monomers to which no LEN molecule is bound are shown in orange spheres." One of these must be a typo.
(i) The authors state that: "CG MD simulations and live-cell imaging demonstrate that LEN-treated capsids dock at the NPC and rupture at the narrow end when bound to the central channel and then remain associated to the NPC after rupture." However, the live cell imaging data do not show where rupture occurs, such that this statement is at least partially false. It is also unclear that CG simulations show that cores remain bound following rupture, given that simulations were not extended to the timescale needed to observe this, again rendering the statement partially false.
(j) The authors state: "We previously demonstrated that the RNP complex inside the capsid contributes to internal mechanical strain on the lattice driven by CACTD-RNP interactions and condensation state of RNP complex (Hudait & Voth, 2024)." In that case, why do the present CG models detect no difference in results for condensed versus uncondensed RNP?
(k) The authors state: "The distribution demonstrates that the binding of LEN to the distorted lattice sites is energetically favorable. Since LEN localizes at the hydrophobic pocket between two adjoining CA monomers, it is sterically favorable to accommodate the incoming molecule at a distorted lattice site. This can be attributed to the higher available void volume at the distorted lattice relative to an ordered lattice, the latter being tightly packed. This also allows the drug molecule to avoid the multitude of unfavorable CA-LEN interactions and establish the energetically favorable interactions leading to a successful binding event." What multitude of unfavorable interactions are the authors referring to? Data is not presented to substantiate the claim of increased void volume between hexamers in the distorted lattice. Capsomer distortion is shown as a schematic in Figure 6A rather than in the context of the actual model.
(l) The authors state that "These striated patterns also demonstrate deviations from ideal lattice packing. " What does ideal lattice packing mean in this context, where hexamers are in numerous unique environments in terms of curvature? What is the structural reference point?
(m) If pentamer-hexamer interactions are weakened in the presence of LEN, why are differences at these interfaces not apparent in the Figure 6C data that shows stiffening of the interactions between capsomer subunits?
(n) The authors state: "Lattice defects arising from the loss of pentamers and cracks along the weak points of the hexameric lattice drive the uncoating of the capsid." The word rupture or failure should be used here rather than uncoating; it is unclear that the authors are studying the true process of uncoating and whether the defects induced by LEN binding relate in any way to uncoating.
(o) The authors state: "LEN-treated broken cores are stabilized by the interaction with the disordered FG-NUP98 mesh at the NPC." But no data is presented to demonstrate that capsid stability is increased by NUP98 interaction. In fact, the presented data could suggest the opposite since capsids in contact with NUP98 in the NPC appeared to rupture faster than freely diffusing capsids.
(p) The authors state: "LEN binding stimulates similar changes in free capsids, but they occur with lower frequency on similar time scales, suggesting that the cores docked at the NPC are under increased stress, resulting in more frequent weakening of the hexamer-pentamer and hexamer-hexamer interactions, as well as more nucleation of defects at the hexamer-hexamer interface. ... Our results suggest that in the presence of the LEN, capsid docking into the NPC central channel will increase stress, resulting in more frequent breaks in the capsid lattice compared to free capsids." The first is a run-on sentence. The results shown support that LEN stimulates changes in free capsids to happen faster, but not more frequently. The frequency with which an event occurs is separate from the speed with which the event occurs.
(q) The authors state: "A possible mechanistic pathway of capsid disassembly can be that multiple pentamers are dissociated from the capsid sequentially, and the remaining hexameric lattice remains stabilized by bound LEN molecules for a time, before the structural integrity of the remaining lattice is compromised." This statement is inconsistent with experimental studies that say LEN does not lead to capsid disassembly, and may even prevent disassembly as part of its disruption of proper uncoating (e.g., 10.1073/pnas.2420497122 previously published by the authors).
(r) Finally, it remains a concern with the authors' work that the bottom-up solvent-free CG modeling software used in this and supporting works is not open source or even available to other researchers like other commonly used molecular dynamics software packages, raising significant questions about transparency and reproducibility.
-
Author response:
Before providing a brief provisional response to the two reviews, it is important to reiterate a few key points about our work. First, our paper is largely a computational biophysics paper, augmented by experimental results. Generally speaking, computational biophysics work intends to achieve one of two things (or both). One is to provide more molecular-level insight into various behaviors of biomolecular systems that have not been (or cannot be) provided by qualitative experimental results alone. The second general goal of computational biophysics is to formulate new hypotheses to be tested subsequently by experiment. In our paper, we have achieved both of these goals and then confirmed the key computational results by experiment.
The first reviewer has some valuable points, which can be addressed as follows (and will be emphasized in the revised version of the paper): (1) Yes, the simulations of capsid rupture in the NPC and capsid-only are directly comparable, as both have approximately the same number of bound LEN, as determined by following the LEN-capsid interaction protocol described in the main text (around Fig 6) and in the SI section S3; (2) While we have stressed this point in several places in the manuscript, here again we stress that coarse-grained (CG) MD time is not the same as real time. The point of CG simulations is to accelerate the timescale of the MD and the associated sampling, so the CG “time” from the MD integrator needs to be rescaled to associate a real time to it. As such, our CG simulation is not representing a microsecond of real time but rather something much longer. We will emphasize this again in the revised text. (3) Actually, we think that the parameterization of the LEN model and the LEN-capsid interactions is well described in the text associated with Fig 6 and in SI section S3. It is true that this one part of the CG model was parameterized “top-down” given the good experimental structures of bound LEN to capsid and other data, but the rest of the CG model is “bottom-up” (meaning developed from well-defined coarse-graining statistical mechanics as applied to molecular-level structures and interactions, see also below).
As for the second reviewer, this review is quite problematic in our view, as the reviewer seems to think that quoting a number of qualitative experimental results is sufficient to undermine the impact of our paper (it is not) and, furthermore, the reviewer appears to have a very minimal understanding of “bottom-up” CG modeling, which we have utilized. This modeling does not in fact rely on the “assumptions” this reviewer alleges we have relied on. (As an aside, it could be helpful for this reviewer to study the review by Jin et al., https://doi.org/10.1021/acs.jctc.2c00643, in order to become more familiar with the field and our approach before criticizing it.) We also note that our main HIV capsid-NPC docking model is already published in PNAS (https://doi.org/10.1073/pnas.2313737121), where it underwent rigorous peer review. In our forthcoming full response to the reviews and in the revised paper we will attempt to address a number of this reviewer’s comments, but the number, extent, and tone of this collection of criticisms, for us, calls into question the objectivity of this reviewer, not to mention the reviewer’s rather weak understanding of what we have done and how we have done it.
Finally, while we certainly appreciate the overall positive eLife assessment, we are disappointed by the statement “some mechanistic interpretations rely on assumptions embedded in the simulations, leaving parts of the evidence incomplete”. Of course, all simulations (and experiments) rely on certain assumptions, but we have gone to great lengths to provide a “bottom-up” approach to our modeling, based on underlying molecular-level structures and interactions, and we have provided experimental validation of the main simulation predictions. It seems that the comments of the second reviewer may have influenced this point of view, but we do not feel it is justified.
-
-
www.biorxiv.org
-
eLife Assessment
This study offers valuable insights into the anatomical and physiological features of cold-selective lamina I spinal projection neurons. The evidence supporting the authors' claims is compelling, although including a larger sample size and more quantification would have strengthened the study further, and the claims of monosynaptic connectivity would benefit from being stated more cautiously. The work will interest those in the field of somatosensory biology, especially researchers studying spinal cord dorsal horn circuits and projection neuron cell types.
-
Reviewer #1 (Public review):
Summary:
Spinal projection neurons in the anterolateral tract transmit diverse somatosensory signals to the brain, including touch, temperature, itch, and pain. This group of spinal projection neurons is heterogeneous in their molecular identities, projection targets in the brain, and response properties. While most anterolateral tract projection neurons are multimodal (responding to more than one somatosensory modality), it has been shown that cold-selective projection neurons exist in lamina I of the spinal cord dorsal horn. Using a combination of anatomical and physiological approaches, the authors discovered that the cold-selective lamina I projection neurons are heavily innervated by Trpm8+ sensory neuron axons, with calb1+ spinal projection neurons primarily capturing these cold-selective lamina I projection neurons. These neurons project to specific brain targets, including the PBNrel and cPAG. This study adds to the ongoing effort in the field to identify and characterize spinal projection neuron subtypes, their physiology, and functions.
Strengths:
(1) The combination of anatomical and physiological analyses is powerful and offers a comprehensive understanding of the cold-selective lamina I projection neurons in the spinal cord dorsal horn. For example, the authors used detailed anatomical methods, including EM imaging of Trpm8+ axon terminals contacting the Phox2a+ lamina I projection neurons. Additionally, they recorded stimulus-evoked activity in Trpm8-recipient neurons, carefully selected by visual confirmation of tdTomato and GFP juxtaposition, which is technically challenging.
(2) This study identifies, for the first time, a molecular marker (calb1) that labels cold-selective lamina I projection neurons. Although calb1+ projection neurons are not entirely specific to cold-selective neurons, using an intersectional strategy combined with other genes enriched in this ALS group or cold-induced FosTRAP may further enhance specificity in the future.
(3) This study shows that cold-selective lamina I projection neurons specifically innervate certain brain targets of the anterolateral tract, including the NTS, PBNrel, and cPAG. This connectivity provides insights into the role of these neurons in cold sensation, which will be an exciting area for future research.
Weaknesses:
(1) The sample size for the ex vivo electrophysiology is small. Given the difficulty and complexity of the preparation, this is understandable. However, a larger sample size would have strengthened the authors' conclusions.
(2) The authors used tdTomato expression to identify brain targets innervated by these cold-selective lamina I projection neurons. Since tdTomato is a soluble fluorescent protein that fills the entire cell, using synaptophysin reporters (e.g., synaptophysin-GFP) would have been more convincing in revealing the synaptic targets of these projection neurons.
(3) The summary cartoon shown in Figure 7 can be misleading because this study did not determine whether these cold-selective lamina I projection neurons have collateral branches to multiple brain targets or if there are anatomical subtypes that may project exclusively to specific targets. For example, a recent study (Ding et al., Neuron, 2025) demonstrated that there are PBN-projecting spinal neurons that do not project to other rostral brain areas. Furthermore, based on the authors' bulk labeling experiments, the three main brain targets are NTS, PBNrel, and cPAG. The VPL projection is very sparse and almost negligible.
-
Reviewer #2 (Public review):
Summary:
In this study, the authors took advantage of a semi-intact ex vivo somatosensory preparation that includes hindlimb skin to characterize the response of projection neurons in the dorsal horn of the spinal cord to peripheral stimulation, including cold thermal stimuli. The main aim was to characterize the connectivity between peripheral afferents expressing the cold-sensing receptor TRPM8 and a set of genetically tagged neurons of the anterolateral system (ALS). These ALS neurons expressed high levels of the calcium-binding protein calbindin 1.
In addition, combining different viral tracing methods, the authors could identify the anatomical targets of this specific subset of projection neurons within the brainstem and diencephalon.
Strengths:
The use of a relatively new (seldom used previously) transgenic line to label TRPM8-expressing afferents, combined with the genetic characterization of a previously identified subset of projection neurons, adds specificity to the characterization. The transgenic line appears to capture the subpopulation of Trpm8-expressing neurons well.
In addition, the use of electron microscopy techniques makes the interpretation of the structural contacts more compelling.
The writing is clear, and the presentation of findings follows a logical flow.
Overall, this study provides solid, novel information about the brain circuits involved in cold thermosensation.
Weaknesses:
In the characterization of recorded neurons with or without close contacts from TRPM8 afferents, the number of recorded neurons is relatively low. In addition, the strength of the thermal stimuli is not well controlled, which prevents a more precise characterization of the connectivity.
The authors could provide some sense of the effort needed to record from the 6 cold-activated neurons described. How many preparations were needed, etc?
-












