      Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      The current claims should be better supported by more evidence.

      R1-1: In the first experiment, have the statistics undergone multiple comparison corrections (e.g., Line 441-442)? Given the small sample size, incorporating additional statistical tests (such as the Bayes Factor) could strengthen the analysis.

      We confirm that corrections for multiple comparisons are now applied where appropriate, particularly in the group-level ANOVA analyses.

      “Post-hoc tests using Holm-Bonferroni correction show that V1 neuronal populations receiving inputs from the central visual field (0.5-4.5°) showed greater contrast sensitivity to high spatial frequency as compared to low spatial frequency stimuli (steeper slope for the 3cpd versus 0.3cpd condition: 0.5-2.5°: t(6) = 4.35, p<sub>bonf</sub> = 0.0149; 2.5-4.5°: t(6) = 3.471, p<sub>bonf</sub> = 0.0266). Conversely, peripheral eccentricities in V1 (above 9.5°) showed higher contrast sensitivity to low as compared to high spatial frequency stimuli (steeper slope for 0.3cpd versus 3cpd condition: 9.5-15°: t(6) = −4.591, p<sub>bonf</sub> = 0.0149; 15-20°: t(6) = −6.615, p<sub>bonf</sub> = 0.0029). Between 4.5° and 9.5°, V1 contrast sensitivity was similar for both spatial frequencies (t(6) = −0.226, p<sub>bonf</sub> = 0.8286). Crucially, these effects remained when using retinotopic estimates based on structural scans derived from the Benson retinotopic atlas instead of the pRF-mapping measures (0.5-2.5°: t(6) = 5.768, p<sub>bonf</sub> = 0.0059; 2.5-4.5°: t(6) = 2.531, p<sub>bonf</sub> = 0.0892; 4.5-9.5°: t(6) = −0.293, p<sub>bonf</sub> = 0.7792; 9.5-15°: t(6) = −3.274, p<sub>bonf</sub> = 0.0509; 15-20°: t(6) = −3.528, p<sub>bonf</sub> = 0.0496; see Figure A2 and Table A3 in Appendix section).”

      “Post-hoc pairwise comparisons using Holm-Bonferroni corrections revealed that, as predicted, the cortical contrast response function had a higher slope – indicating better V1 sensitivity – along the horizontal versus vertical quadrants (Horizontal-Vertical Anisotropy – HVA: t(6) = 5.908, p<sub>bonf</sub> = 0.0031) and along the lower versus upper quadrant (Vertical Meridian Anisotropy – VMA: t(6) = 4.106, p<sub>bonf</sub> = 0.0126). Conversely, no difference in cortical contrast sensitivity was found between V1 neuronal populations encoding the left and right quadrants of the visual field (Left-Right Horizontal Meridian Anisotropy – LRHMA: t(6) = 0.7197, p<sub>bonf</sub> = 0.4988).”

      “We found that the horizontal-vertical anisotropy effect was recovered (HVA: t(6) = 3.584, p<sub>bonf</sub> = 0.0347), but that the vertical meridian anisotropy effect was not (VMA: t(6) = 0.744, p<sub>bonf</sub> = 0.9697) with this approach.”
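
      For readers unfamiliar with the correction named in the passages above, the following is a minimal sketch of the Holm-Bonferroni step-down adjustment. It is not the authors' analysis code: the function name `holm_bonferroni` and the uncorrected p-values are hypothetical and chosen only for illustration.

```python
def holm_bonferroni(p_values):
    """Return Holm-adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: an adjusted value never drops below an earlier one
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical uncorrected p-values for five eccentricity bins (not study data)
raw_p = [0.004, 0.030, 0.800, 0.012, 0.001]
adjusted_p = holm_bonferroni(raw_p)
print(adjusted_p)
```

      Because of the monotonicity step, two comparisons can end up sharing the same adjusted p-value even when their uncorrected values differ.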

      R1-2a: The authors claim that "structure-based atlases can replace the need for pRF mapping in cases where it might otherwise be difficult or impossible to collect pRF data." This claim needs further scrutiny. Currently, only one simulated condition of visual field loss was examined in one subject.

      AR-R1-2a: We agree that further work is needed to fully establish the utility of structure-based atlases. As a first step, we have followed the reviewer’s suggestion and collected an additional dataset from one of the seven participants, in whom we simulated another condition of visual field loss – specifically, loss of the upper right quadrant. This participant is the same individual already presented in the manuscript (C5), but with a different simulated vision loss condition.

      This new condition has been introduced in the Methods, Results and Discussion sections, and a new Figure 10 has been added alongside Figure 9, which shows the 3º-8º scotoma. The relevant changes are as follows:

      “We also demonstrate the clinical relevance of this approach by recovering simulated scotomas (i.e., a ring of visual field loss around fixation and the loss of an entire visual field quadrant), as well as visual field loss in a patient with a neurodegenerative disorder causing large areas of visual field loss.”

      “Additionally, one participant (C5) repeated the task under two simulated vision loss conditions (ring or quadrant loss), and two others (C5, C6) completed it with different levels of eye movement.”

      “Simulated vision loss

      One healthy control participant (C5) also performed a version of the task designed to simulate two forms of visual input loss (i.e., artificial scotoma). These simulations were implemented by: (a) masking a region of the visual field with a grey, annular ring, covering 3º-8º eccentricity, and (b) masking the upper right visual quadrant using a grey quarter-sector overlay. The stimuli and contrast levels used in this task were identical to those described in the original task.”

      “A test-case of simulated loss of visual inputs

      In the previous sections, we showed that the slope of a square root function provides a reliable measure of contrast sensitivity in the brain of healthy controls. But can this brain-level model also quantify loss of visual inputs? To test this, we first simulated an artificial scotoma in one normally sighted participant, by (a) masking a region of the visual field with a grey, annular ring, covering 3°-8° eccentricity (Figure 9A), and (b) masking the upper-right visual quadrant using a grey quarter-sector overlay (Figure 10A). We expect smaller slope values in V1 neuronal populations that would under normal circumstances encode that part of the visual space.

      As expected, we observed reduced responses in V1 locations corresponding to the artificial scotoma (Figures 9 and 10), with increased responses along the edges of the mask for the ring scotoma condition (Figure 9B). This artificial loss of visual input was also clearly present in the cortical contrast sensitivity estimate, with significantly reduced slope steepness in V1 between 3-8° for the ring scotoma condition (Figure 9C&D) and in the upper-right quadrant for the quarter-sector scotoma condition (Figure 10B&C). Additionally, we could recover this scotoma using the calibrated Benson template, although less accurately (Figures 9E and 10D). These results show that this measure of V1 contrast sensitivity is sensitive enough to detect loss of visual inputs in the brain at an individual level, when a complete local loss of sight is simulated, and that this approach does not crucially rely on pRF mapping data from the individual. This supports the utility of our approach in recovering patterns of vision loss and recovery at a cortical level.”
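
      To illustrate what a slope-based sensitivity estimate of this kind can look like, here is a minimal sketch that fits response = slope × √contrast + intercept by ordinary least squares. The exact model form, the function name `fit_sqrt_slope`, and the response values are assumptions made for illustration; only the four contrast levels (7.5, 42.2, 60, 100%) are taken from the eye-movement task description in this response.

```python
import math


def fit_sqrt_slope(contrasts, responses):
    """Least-squares fit of response = slope * sqrt(contrast) + intercept."""
    x = [math.sqrt(c) for c in contrasts]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(responses) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, responses))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


contrasts = [7.5, 42.2, 60.0, 100.0]          # percent contrast
# Hypothetical responses (arbitrary units), not study data
intact = [0.2 * math.sqrt(c) + 0.1 for c in contrasts]  # stimulated location
masked = [0.1, 0.1, 0.1, 0.1]                 # location behind a simulated scotoma

slope_intact, _ = fit_sqrt_slope(contrasts, intact)
slope_masked, _ = fit_sqrt_slope(contrasts, masked)
print(slope_intact, slope_masked)
```

      In this toy case the masked location yields a flat response and therefore a slope near zero, which is the qualitative signature described above for V1 locations inside the simulated scotoma.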

      “Mapping Simulated and Pathology-Driven Vision Loss

      Our method successfully identified both simulated retinal loss in a healthy volunteer and real visual field loss in a patient with Leber Hereditary Optic Neuropathy (LHON). The signal drop observed in response to masking portions of the visual field in the healthy control was both large and significant at the individual level, as demonstrated by non-overlapping 95% confidence intervals (Figures 9B-C and 10B). This provides proof-of-concept evidence that our approach can detect signal changes in individual patients, which is a critical requirement for clinical translation.

      Unlike previous fMRI studies that used high-contrast stimuli (Farahbakhsh et al., 2022; Pawloff et al., 2023; Ritter et al., 2019), which may not accurately represent partial vision loss due to potential saturation effects and the stimulation of less sensitive retinal cells, our use of multiple contrast levels offers a more nuanced assessment of cortical contrast sensitivity.

      Combined with the large-field set-up allowing stimulation up to 20° eccentricity, this approach may be particularly well-suited for evaluating treatment efficacy in cases of widespread and variable vision loss.

      Future work will focus on further validating reconstruction accuracy under controlled conditions, including simulated scotomas of varying severity and location, expanding testing to larger patient cohorts, and establishing a normative dataset to contextualize patient data.”

      R1-2b: Also, in Figure 7, contrast sensitivity in the periphery differs between pRF mapping and the Benson atlas. How do the authors explain this discrepancy?

      AR-R1-2b: The discrepancy in the periphery between pRF mapping and the Benson atlas is caused by several factors. These include (a) individual differences in the retinotopy/structure relationship that are not captured in the template, (b) the fact that the Benson atlas at larger eccentricities was obtained with hemifield stimulation, and (c) a larger impact of any inaccuracies at larger eccentricities because of cortical magnification. As a result, peripheral vertices are more likely to be mis-assigned by the template than central ones. Note that this adds distortion in cortical visual field maps which will be consistent across timepoints (rather than noise). Critically, a reduction in accuracy does not preclude utility if meaningful differences in spatial patterns in cortical sensitivity can still be recovered, as is the case in our data. We cover this in the discussion.

      “Particularly at large eccentricities however, we initially observed inaccuracies between the template and individual retinotopy eccentricity estimates which led to substantial distortions in cortical visual field maps due to cortical magnification (see Figure A4 in Appendix section). To address this, we adjusted the Benson eccentricity estimates to align with the cortical magnification scaling function (Horton & Hoyt, 1991).”

      “Beyond ROI considerations, we still observed differences in cortical sensitivity between pRF mapping and the adjusted Benson atlas - particularly in the periphery. Several factors likely contribute to this. First, individual differences in the relationship between cortical structure and retinotopy are not fully captured by the template. Second, the Benson atlas has never been fit with empirical data more eccentric than approximately 20°, which naturally limits its precision in the far periphery. Third, because of cortical magnification, any small inaccuracy at larger eccentricities has a disproportionately large effect, making peripheral vertices more susceptible to mis-assignment than central ones. These influences introduce systematic distortions in cortical visual field maps rather than random noise and thus remain consistent across time points - an important point when assessing longitudinal changes (e.g., ageing or gene-therapy interventions). Importantly, the spatial gradients in cortical contrast sensitivity were preserved across both the pRF and Benson atlas approaches, indicating that minor ROI differences do not affect our conclusions. Together, these findings show that the Benson atlas remains a useful alternative when pRF mapping is not feasible.”
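
      As background to the cortical magnification argument, the sketch below evaluates the scaling function attributed to Horton & Hoyt (1991), assuming its commonly cited form M(E) = 17.3 / (E + 0.75) mm/deg. The adjustment procedure applied to the Benson atlas is not detailed in this response, so this is not a reimplementation of it; the function and constant names are hypothetical.

```python
import math

A_MM = 17.3     # scaling constant in mm (assumed form of Horton & Hoyt, 1991)
E2_DEG = 0.75   # eccentricity offset in degrees (same assumption)


def magnification(ecc_deg):
    """Linear cortical magnification (mm of cortex per degree) at an eccentricity."""
    return A_MM / (ecc_deg + E2_DEG)


def cortical_distance(ecc_deg):
    """Cortical distance (mm) from the foveal representation: integral of M from 0 to E."""
    return A_MM * math.log((ecc_deg + E2_DEG) / E2_DEG)


def eccentricity_from_distance(dist_mm):
    """Inverse mapping: eccentricity (deg) for a given cortical distance (mm)."""
    return E2_DEG * (math.exp(dist_mm / A_MM) - 1.0)


# One millimetre of cortex covers far more visual angle in the periphery
for ecc in (1.0, 5.0, 20.0):
    print(ecc, round(1.0 / magnification(ecc), 3), "deg per mm")
```

      Under this function, a fixed positional error on the cortical surface corresponds to a much larger error in degrees of visual angle at 20° than at 1°, which is the reasoning behind the third factor listed above.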

      R1-3: Overall, the writing could be significantly improved.

      AR-R1-3: We have made edits throughout the manuscript and hope this has improved the writing.

      Reviewer #1 (Recommendations for the authors):

      R1-Recommendation 1a: The writing can be significantly improved for clarity.

      The introduction section is not well-organized, and the motivation for developing the current method (Paragraphs 2-3) is vague and lacks adequate documentation.

      Several references are missing (e.g., Lines 90-92) or incorrectly placed (e.g., Lines 108-109).

      AR-R1-Recommendation 1a: We have revised the Introduction to clarify the motivation for developing the current method and to correct missing or misplaced references.

      “Still, testing visual function across the visual field remains limited in clinical and therapeutic contexts, especially in patients with drastic central vision loss. In this study, we aimed to address this gap by introducing a novel fMRI-based approach to measure visual field sensitivity across a wide expanse of the visual field (40º diameter).”

      “Beyond visual acuity, functional impairment across the wider visual field can be measured using a range of visual field tests, from the finger counting visual confrontation field test to more complicated and/or computerized tests (e.g., standard automatic perimetry, kinetic perimetry, microperimetry; Rai et al., 2024). Computerized tests typically involve measuring sensitivity to the luminance contrast of a target relative to a background at different visual field locations while the participant’s gaze is fixed on a central point. In some cases (e.g., microperimetry), sensitivity measurements are paired with fundus imaging, offering greater precision in linking visual field functions to specific retinal locations (Rai et al., 2024). As a result, visual field assessments can reveal functionally relevant deficits – including localized sensitivity loss and scotomas – that are not captured by foveal acuity alone, and are therefore potentially valuable for tracking disease progression and therapeutic efficacy.

      Despite their clinical relevance, visual field testing comes with challenges and limitations, and as a result, the inclusion of visual field measures in sight-rescuing therapy trials is limited. Firstly, it requires prolonged fixation and sustained visual attention. This can be very challenging for patients with severe vision loss, who often struggle to fixate, and strain to detect even high intensity stimuli. This can lead to long and unpleasant testing sessions with unreliable results. Secondly, as perception of light stimuli is inherently subjective (Rai et al., 2024) and effortful, patients may vary in their criteria for visual recognition, and in their ability to report visual signals that are weakened or distorted by disease. Together, these constraints reduce the feasibility, robustness, and interpretability of conventional visual field testing in clinical trials, underscoring the need for alternative or complementary approaches that can assess functional vision while placing fewer demands on subjective reporting.”

      “Functional MRI (fMRI) has recently been proposed as a promising alternative to measure visual field loss, as it requires no overt task, and instead measures visual sensitivity directly from brain responses (Farahbakhsh et al., 2022; Prabhakaran et al., 2021; Ritter et al., 2019). Population receptive field (pRF) mapping fMRI can measure which parts of the cortex respond to which parts of the visual scene (Dumoulin & Wandell, 2008).”

      “Finally, most studies use a single maximum contrast stimulus to assess visual function (Broderick et al., 2022; Farahbakhsh et al., 2022; Liu et al., 2006; O’Connell et al., 2016; Ritter et al., 2019).”

      R1-Recommendation 1b: The strengths of the current method and its applicable scenarios are unclear. For example, in Lines 39-40: "We developed an fMRI-based approach to measure contrast sensitivity across the visual field without the need for precise fixation." To what extent can fixation be imprecise? Could this protocol be applied to patients with strabismus, who have biased fixation?

      AR-R1-Recommendation 1b: We agree with the reviewer that the tolerance to fixation challenges is key here, and so we collected additional data to address the reviewer’s points regarding the effects of eye movement on the cortical contrast sensitivity maps.

      In terms of biased fixation, the approach should be very robust to this, as this would just reduce the cortical visual field covered on one side and extend it on the other.

      We collected new data to test the tolerance to fixation instability across a wide range of eye movement, including severe nystagmus-level movement. Despite large eye movements, the cortical contrast-sensitivity pattern remained largely consistent, though extreme movements reduced slope estimates and flattened the cortical sensitivity pattern for 3cpd, indicating reduced measurement sensitivity for extreme eye movement to high spatial frequency gratings.

      These additions have been incorporated into the Abstract, Methods, Results, and Discussion sections as follows:

      Abstract

      “To assess the method’s tolerance to fixation variability, we further investigated how different levels of eye movement affect cortical sensitivity patterns in two participants. We found that cortical sensitivity patterns were largely preserved across eye movement, particularly at low spatial frequencies. This suggests that our approach can accommodate several degrees of fixation instability, making it suitable for populations with unstable or biased fixation for whom visual field maps are harder to acquire behaviorally (e.g., patients with dense central scotoma or strabismus).”

      Methods

      “Additionally, one participant (C5) repeated the task under two simulated vision loss conditions (ring or quadrant loss), and two others (C5, C6) completed it with different levels of eye movement.”

      Results

      “Effect of eye movement

      Participants C5 and C6 also performed a version of the task designed to test the effect of eye movements. In this version, saccades were elicited by randomly and rapidly shifting the fixation dot away from central fixation (C5: 2º and 5º from fixation and random motion; C6: up to 2º from fixation). Participant C5 was tested using 0.3 and 3cpd gratings at four contrast levels (7.5, 42.2, 60, 100%), while participant C6 was tested only under the low spatial frequency condition (0.3cpd).

      Fixation stability was assessed for each fMRI run using the bivariate contour ellipse area (BCEA), which estimates the area (in degrees<sup>2</sup> or arcmin<sup>2</sup>) of an ellipse that contains approximately 95% of fixation points. BCEA was calculated using the formula: BCEA = 2kπσ<sub>h</sub>σ<sub>v</sub>√(1 – p<sup>2</sup>), as described by Morales et al. (2016). In this expression, σ<sub>h</sub> and σ<sub>v</sub> represent the standard deviations of eye position in the horizontal and vertical directions, respectively, while p corresponds to the Pearson correlation coefficient between horizontal and vertical eye positions. The constant k determines the size of the ellipse based on the desired probability area, defined by the relationship P = 1 – e<sup>-k</sup>, with P set to 0.95 in this study. A smaller BCEA indicates greater fixation stability.”
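
      The BCEA computation described above can be sketched as follows, assuming the standard form BCEA = 2kπσ<sub>h</sub>σ<sub>v</sub>√(1 – p<sup>2</sup>) with k = –ln(1 – P). This is an illustration rather than the authors' pipeline: the function names are hypothetical and the gaze samples are invented.

```python
import math


def bcea_from_stats(sd_h, sd_v, corr, prob=0.95):
    """BCEA from summary statistics, in squared units of sd_h and sd_v."""
    k = -math.log(1.0 - prob)  # follows from P = 1 - exp(-k)
    return 2.0 * k * math.pi * sd_h * sd_v * math.sqrt(max(0.0, 1.0 - corr ** 2))


def bcea_from_samples(h, v, prob=0.95):
    """BCEA from paired horizontal/vertical gaze samples (same units, e.g. deg)."""
    n = len(h)
    mean_h, mean_v = sum(h) / n, sum(v) / n
    var_h = sum((x - mean_h) ** 2 for x in h) / (n - 1)
    var_v = sum((y - mean_v) ** 2 for y in v) / (n - 1)
    cov = sum((x - mean_h) * (y - mean_v) for x, y in zip(h, v)) / (n - 1)
    sd_h, sd_v = math.sqrt(var_h), math.sqrt(var_v)
    corr = cov / (sd_h * sd_v)  # Pearson correlation between the two axes
    return bcea_from_stats(sd_h, sd_v, corr, prob)


# Hypothetical gaze samples in degrees (illustrative only, not study data)
gaze_h = [0.1, -0.2, 0.3, 0.0, -0.1, 0.2]
gaze_v = [0.0, 0.1, -0.1, 0.2, -0.2, 0.1]
print(round(bcea_from_samples(gaze_h, gaze_v), 3))
```

      For a fixed pair of standard deviations, a stronger correlation between axes narrows the ellipse and lowers the BCEA, so the assumed correlation matters when converting published variability values into an area.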

      “Effect of eye movements on V1 cortical sensitivity

      So far, we have demonstrated that our measure of cortical sensitivity can reliably recover known gradients in sensitivity across eccentricities and visual quadrants. We also showed that this measure was consistent across visits and sessions, suggesting its potential utility for monitoring changes over time. However, all prior tasks were conducted under conditions of central fixation, with participants instructed to maintain gaze on a central dot. A key motivation for this approach was its theoretical robustness to fixation instability. We therefore also aimed to investigate how varying degrees of eye movement might influence cortical sensitivity across the visual field.

      To address this, two participants (C5 and C6) completed a modified version of the contrast sensitivity task in which they made eye movements either by following a dot moving randomly at a radius of 2º or 5º around fixation, or by self-initiated very large eye movements. Eye movements across these conditions (Figure 7, bottom row; Figure 8, bottom row) were quantified using BCEA (C5 – Central fixation: mean±SD = 0.57±0.11 deg<sup>2</sup>, 2º eye motion: 2.69±0.48 deg<sup>2</sup>, 5º eye motion: 20.3±1.32 deg<sup>2</sup>, random eye motion: 133.7±23.36 deg<sup>2</sup>; C6 – Central fixation: 0.96±0.56 deg<sup>2</sup>, 2º eye motion: 1.28±0.15 deg<sup>2</sup>). For reference, in severe (idiopathic) nystagmus, the eye movement variability along the vertical and horizontal planes is on average 1.08 deg and 1.60 deg, respectively (Tailor et al., 2021). Assuming a moderate correlation between axes (p = 0.3), the average fixation stability would equate to a BCEA of ~21.46 deg<sup>2</sup> (i.e., ~5º eye motion condition in our data).

      Despite these very large levels of eye movements, we observed that the overall cortical contrast sensitivity spatial pattern across eccentricity remained remarkably consistent (Figure 7, top and middle rows; Figure 8, top row). However, at the most extreme movements, contrast sensitivity estimates (slope values) were lower; and while the overall cortical visual field map structure was still clearly present for low spatial frequencies, it appeared more flattened for 3cpd, suggesting reduced sensitivity of our measure for large eye movement and high spatial frequency stimuli.”

      Discussion

      “Crucially, one advantage of cortical visual field mapping is that the maps are inherently centered on the foveal confluence, providing a stable reference point for comparing responses across eccentricities. When combined with large-field, spatially homogeneous stimuli, this anchoring means that our approach should remain robust to moderate fixation variability and still quantify sensitivity changes across the visual field – provided that fixation instability does not exceed the stimulus extent (40º diameter).

      When measuring the impact of eye movements, we found that spatial sensitivity patterns were largely preserved, even for extreme eye movements (emulating severe nystagmus). However, under the most extreme conditions, sensitivity estimates (i.e., slope values) were reduced, especially for high spatial frequency (SF) stimuli. This likely reflects image blurring from large rapid eye movements, which degrades high-SF inputs and shifts activation toward neurons tuned to lower SFs. This aligns with evidence that nystagmus and large saccades impair perception of fine detail and grating stimuli due to retinal image slip (Abadi & Bjerre, 2002; Dickinson & Abadi, 1985; Hertle et al., 2017; Randall et al., 2020). While classic findings report suppression of low-SF signals during saccades (Burr et al., 1994; Ross et al., 2001), our results suggest that high-SF sensitivity may be more vulnerable to large eye movements when participants are presented with 2Hz phase-flickering gratings. Validation in clinical groups with naturally occurring fixation instability would further strengthen these conclusions.”

      R1-Recommendation 1c: There are also some confusing descriptions, such as Lines 130-132.

      AR-R1-Recommendation 1c: We have also clarified ambiguous descriptions of the Benson atlas templates.

      “We therefore also evaluated the approach using the structure-based atlas of retinotopic values developed by Benson et al. (Benson et al., 2014; Benson & Winawer, 2018). This atlas predicts retinotopic organization by aligning individual cortical anatomy (e.g., surface curvature) to a group-average template that incorporates an algebraic model of retinotopy (Benson et al., 2014). Once the subject’s brain is aligned to this structural atlas, retinotopic maps defined by the model – i.e., polar angle and eccentricity maps – are projected onto the individual’s cortex. This allows estimation of visual field maps without requiring functional imaging, and provides a non-invasive, anatomy-driven approximation of visual field representations.”

      R1-Recommendation 1d: Line 361: "Assessing the brain's ability to discriminate shapes"-is the author referring to the functional relevance of contrast tuning assessment here? Since the task or stimuli are not related to shapes, this description is unclear.

      AR-R1-Recommendation 1d: We have revised the reference to “discriminating shapes” to more accurately reflect the functional relevance of contrast sensitivity mapping.

      “To measure visual field function, we developed a new measure of cortical contrast sensitivity, assessing the brain’s ability to discriminate gratings of varying spatial frequencies based on luminance variations.”

      R1-Recommendation 2a: Simulated visual loss experiment: only one condition of visual field loss was examined in a single subject. I encourage the authors to include additional subjects to meet statistical test criteria at group level. Simulated scotomas in more visual quadrants, including both central and peripheral areas, should be examined, as asymmetries may exist.

      AR-R1-Recommendation 2a: We agree that it is important to verify that the approach can also capture other types of scotomas. We have therefore now incorporated another simulated condition of visual field loss, namely loss of the upper right quadrant.

      Regarding adding more participants: The drop in signal is clearly large and significant at the individual level (error bars corresponding to 95% confidence interval do not overlap; Figures 9B-C & 10B). The ability to detect signal change at the individual level is what we need for clinical application, and here we are showing proof-of-concept of its feasibility with our approach. However, we do appreciate that it might be valuable to test cortical visual field loss reconstruction accuracy with simulated scotomas of varying levels of vision loss in variable locations. We now highlight this as a future direction.

      Please refer to our response to R1-2a, where we also detail the corresponding changes made in the manuscript.

      R1-Recommendation 2b: Additionally, why do the results from pRF mapping and the corrected Benson atlas differ, particularly in the far periphery?

      AR-R1-Recommendation 2b: Please refer to our response to R1-2b, where we also detail the corresponding changes made in the manuscript.

      R1-Recommendation 3: To validate the recovery of visual field loss in the case study, it would be necessary to include fundus imaging to characterize the structural loss and correlate it with the behavioral and fMRI results.

      AR-R1-Recommendation 3: We included Compass perimetry data for the LHON patient, which is fundus-tracked perimetry and uses fundus imaging to keep the visual stimulation fixed to retinal locations.

      In the context of LHON, the fundus image is not expected to provide more information than perimetry. This is because the visual deficit in LHON arises from optic nerve dysfunction, and retinal abnormalities are typically minimal. Aside from the characteristic pallor of the optic disc, the fundus usually appears normal.

      For illustration, Author response image 1 shows the Compass-acquired fundus image from the LHON patient included in this study. For comparison, we also show a normal fundus image from a 25-year-old male volunteer, reproduced from Häggström, Mikael (2014). "Medical gallery of Mikael Häggström 2014". WikiJournal of Medicine 1 (2). DOI:10.15347/wjm/2014.008. ISSN 2002-4436. Public Domain.

      Author response image 1.

      We do, however, recognize the importance of linking functional changes to structural alterations (e.g., retinal thickness measured with OCT), and we now highlight this as a key future direction in the discussion. This will be a central focus of a planned follow-up study involving a larger patient cohort.

      “Next steps in this work will therefore involve testing larger patient cohorts with diverse forms of vision loss, validating the approach for tracking pathology over time, and investigating how cortex-based visual field measures relate to and complement other visual field and retinal integrity indices including Compass measures and OCT-derived retinal layer thickness.”

      “Additionally, linking brain-based variations in function across the visual field to behavioral performance (e.g., perimetry, microperimetry) and retinal structure (fundus imaging, retinal thickness from Optical Coherence Tomography), could help bridge the gap between neural measures and functional outcomes. Such integration would provide deeper insights into developmental, learning, and vision loss mechanisms.”

      R1-Recommendation 4a: Why is a 0.5 mm smoothing applied to the contrast task data?

      AR-R1-Recommendation 4a: We have now clarified this in the Methods section. The 0.5 mm FWHM smoothing kernel was applied to the contrast sensitivity task data to meet the minimum requirements of the GLM module in SPM.

      “To accurately capture neural activity across various eccentricities and polar angle locations, minimal smoothing (0.5mm FWHM Gaussian blur) was applied to the contrast sensitivity task data using AFNI’s 3dmerge program. This was done to meet the minimum requirements of the GLM module in SPM.”

      R1-Recommendation 4b: Is this the first time the cortical magnification calibration has been applied to the Benson atlas? I recommend including a figure to describe this method.

      AR-R1-Recommendation 4b: This is indeed the first time this correction has been applied to the Benson atlas. We have now added a figure (Figure 3) to illustrate the eccentricity adjustment procedure applied to the Benson atlas.

      R1-Recommendation 5: In Figure 5, the test-retest reliability can be reported by including r-values.

      AR-R1-Recommendation 5: We have now included Spearman correlation 𝜌-coefficients for test-retest and between-condition comparisons in Figure 6 (previously Figure 5).
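
      For reference, a Spearman coefficient of the kind reported for test-retest comparisons can be computed as the Pearson correlation of the ranked values. The sketch below uses only the standard library; the function names and the slope values are hypothetical and are not taken from the study.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg_rank
        i = j + 1
    return ranks


def spearman_rho(a, b):
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    ra, rb = average_ranks(a), average_ranks(b)
    n = len(ra)
    mean_a, mean_b = sum(ra) / n, sum(rb) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical slope estimates for the same locations in two sessions
session_1 = [0.31, 0.28, 0.22, 0.15, 0.09]
session_2 = [0.29, 0.30, 0.20, 0.16, 0.08]
print(round(spearman_rho(session_1, session_2), 3))
```

      A rank-based coefficient is insensitive to any monotonic rescaling of slope values between sessions, so it assesses whether the spatial ordering of sensitivity is reproduced rather than whether absolute values match.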

      R1-Recommendation 6: Inconsistency in the reporting format of statistical values: e.g., the degrees of freedom are presented with, or without parentheses.

      AR-R1-Recommendation 6: Thank you for pointing this out. We have reviewed and standardized the reporting format of all statistical values throughout the manuscript to ensure consistency. Degrees of freedom are now all presented in parentheses, as detailed below:

      “Using ANOVA, we found the expected interaction between spatial frequency and eccentricity (F(1.96,11.79) = 28.66, p < 0.001; Figure 4) as well as a main effect of eccentricity (F(2.33,13.99) = 12.67, p < 0.001).”

      “We found a main effect of visual field quadrant location on V1 sensitivity (F(2.46,14.76) = 20.71, p < 0.001).”

      “Moreover, there was no interaction between spatial frequency and visual field quadrant position (F(2.16,12.99) = 1.34, p = 0.298), suggesting V1 visual field anisotropies are relatively constant across spatial frequencies.”

      Reviewer #2 (Public reviews):

      R2-1a: Questionable sensitivity to differences in patients. The variability in heat maps across healthy control participants is somewhat surprising. Do differences between individuals represent actual visual sensitivity differences, or are they an artifact of the measurement technique, e.g., due to signal-to-noise differences introduced by local variations in brain anatomy? Will the substantial variance across controls allow for a sufficiently stable baseline to detect meaningful differences in individual patients?

      AR-R2-1a: We agree the variability across healthy controls is surprising. It is unclear whether this reflects true individual differences in visual sensitivity or arises from factors like local signal-to-noise introduced by local variations in brain anatomy. It will be really interesting to investigate this further by examining structural variations across the visual field and comparing them with behavioral measures.

      As for establishing a stable baseline for patient comparisons, this is inherently an empirical question and depends on the degree of vision loss. LHON patients typically show dense central scotomas (up to 15º) in the chronic phase, making them well suited for detecting sensitivity differences – e.g., between central versus peripheral locations. Detecting subtler changes – in the acute phase or other conditions – may be more challenging. We agree with the reviewer that a normative range will be essential for contextualizing patient data, which we now mention in the Discussion and aim to develop in the future based on the present data.

      “Future work will focus on further validating reconstruction accuracy under controlled conditions, including simulated scotomas of varying severity and location, expanding testing to larger patient cohorts, and establishing a normative dataset to contextualize patient data.”

      R2-1b: Also, as the authors rightly point out, Benson atlas does not model differences along meridians, so upper/lower field differences might not be detectable.

      AR-R2-1b: We acknowledge the limitations of the Benson atlas, particularly its inability to model meridional asymmetries (e.g., upper vs. lower visual field). Still, our goal is to provide a method for tracking visual cortex changes over time. By consistently projecting longitudinal functional data onto the same structural image fitted with the Benson atlas, we maintain a stable anatomical reference, which supports reliable comparisons across timepoints – even with limited spatial accuracy. Future improvements could include shearing corrections, Bayesian updating, or alternative models such as DeepRetinotopy developed by Ribeiro et al.

      “Further enhancing the alignment between retinotopic template atlases and individual retinotopic tuning could improve this approach further, for example, by integrating them with functional measures using Bayesian methods (Benson & Winawer, 2018). In parallel, geometric deep learning frameworks such as DeepRetinotopy (Ribeiro et al., 2021) could also offer anatomy-driven predictions from structural MRI, and combining these strategies may yield more accurate and generalizable retinotopic reconstructions.”

      R2-2: Effects of unstable fixation/eye movements not explicitly tested: The methods state, 'In all tasks, participants were asked to report when the color of a central fixation dot changed', suggesting participants maintained fairly good fixation. Most of the results seem to pertain to measurements where central fixation is required. How does unstable fixation affect measurements?

      AR-R2-2: This is an important point. We have now extensively and systematically investigated the impact of eye movements on the cortical contrast sensitivity maps and updated the Abstract, Methods, Results, and Discussion sections (see R1-1b).

      R2-3: Potential for clinical translation. Although it is a sensitive measure, functional MRI is costly, is not available in all clinical settings, requires significant post-processing analyses, and may be contraindicated in some individuals due to safety (e.g., metallic implants) or other concerns (e.g., claustrophobia). These could present significant barriers to widespread clinical translation if this were the ultimate goal of the study.

      AR-R2-3: We agree that fMRI, while sensitive, has practical limitations for broad clinical adoption due to cost, accessibility, and contraindications. However, it remains a valuable tool in targeted contexts, where sensitive detection of visual field loss has large utility – for example for evaluating treatment effects in clinical trials. This application has been demonstrated in recent studies (Farahbakhsh et al., 2022; Maimon-Mor et al., 2025; Haal et al., 2016; Ritter et al., 2019).

      R2-4: Limited range of spatial frequencies. The spatial frequencies tested were still quite low (0.3 and 3cpd) compared to measures such as visual acuity. Extending the measurements to higher spatial frequencies could allow better characterization of central vision, although not necessarily for peripheral vision.

      AR-R2-4: We agree that extending to higher spatial frequencies could improve central vision characterization and note this can be readily incorporated into future studies using the current framework. However, acuity in LHON patients tends to be very low, and in a prior pilot we found that 5cpd did not allow us to measure any cortical contrast sensitivity. To characterize the visual field in LHON with fMRI, we therefore aimed to balance central and peripheral coverage: 0.3 cpd ensured broad detectability, while 3 cpd offered a middle ground to assess central vision without exceeding the acuity of this population. Additional approaches, such as neural contrast sensitivity functions (e.g., Roelofzen et al., 2025), may also offer complementary insights such as acuity and contrast sensitivity across the full spatial frequency range (area under the curve).

      Reviewer #2 (Recommendations for the authors):

      R2-Recommendation 1: It appears that the reliability measures, comparing differences in Spearman correlations between and within sessions, were not tested statistically, but evaluated qualitatively. What was the justification for this? The results only state Spearman values, but the discussion claims that the differences between the two comparisons were significant.

      AR-R2-Recommendation 1: The differences in Spearman correlations between and within sessions were tested statistically, and the omission of p-values was an oversight. We have now revised the Results section to report the results of the paired one-tailed t-test as follows:

      “We collected test-retest reliability measures from 4 out of 7 participants (Figures 6A-B) and benchmarked them against the correlations between the 0.3cpd condition and 3cpd spatial frequency condition, collected in the same session (Figure 6C). If measures are reliable, correlations should be higher for repeated measures with the same spatial frequency stimulus, collected on different days. We tested this prediction using a one-tailed paired t-test.”

      “This difference was statistically significant (t(3) = 2.62, p = 0.0395).”
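
As a rough check on the reported statistic, the one-tailed p-value for t(3) = 2.62 can be recomputed from the Student's t density, which gives approximately 0.0395. The sketch below uses only the Python standard library. The helper names `paired_t` and `t_upper_tail` are illustrative and are not taken from the authors' analysis code, and the numerical integration (Simpson's rule) is an assumption of this sketch rather than the method the authors used.

```python
import math

def paired_t(a, b):
    """Paired t statistic and degrees of freedom for two equal-length samples.

    Assumes the paired differences are not all identical (non-zero SD).
    """
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    mean = sum(d) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in d) / (n - 1))
    return mean / (sd / math.sqrt(n)), n - 1

def t_upper_tail(t, df, steps=20000):
    """P(T > t) for Student's t with `df` degrees of freedom.

    Integrates the density from 0 to |t| with Simpson's rule
    (`steps` must be even) and subtracts the area from 0.5.
    """
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    a = abs(t)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3          # integral of the density over [0, |t|]
    upper = 0.5 - area            # P(T > |t|)
    return upper if t >= 0 else 1 - upper

# One-tailed p-value for the statistic reported in the response.
p_one_tailed = t_upper_tail(2.62, 3)
```

With only four participants (df = 3), the one-tailed critical value at α = 0.05 is about 2.35, so the reported t of 2.62 clears it by a modest margin.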

      R2-Recommendation 2a: The variability of heat maps (visual field sensitivities) between healthy controls should also be discussed. What are potential explanations for this variability?

      AR-R2-Recommendation 2a: We have expanded the Discussion section to address the variability observed in cortical sensitivity maps across healthy controls.

      “We also observed intriguing variability in cortical visual field maps across healthy controls, and this variability was consistent across measures. This may reflect genuine individual differences in visual sensitivity that are relevant for behavioral performance. Alternatively, it could arise from factors such as local signal-to-noise differences driven by anatomical variability. However, the fact that maps derived from different spatial stimulus conditions showed markedly different patterns argues against a purely anatomical explanation and suggests that at least part of the variability is functional. Despite this inter-subject variability, variations in cortical contrast sensitivity across eccentricities and visual field quadrants were significant at the individual level indicating high sensitivity.”

      R2-Recommendation 2b: There should also be more discussion about any potential effects of eye movements/unstable fixation in order to address the suitability of the methods for these clinical populations.

      AR-R2-Recommendation 2b: Please refer to our response to R2-2, where we also detail the corresponding changes made in the manuscript.

      Reviewer #3 (Public review):

      R3-1: The authors should more strongly emphasize their findings on the organization of contrast sensitivity, particularly in light of the stimulation extent provided by the wide-field setup.

      AR-R3-1: Thank you for this important point – we have now emphasized more clearly in the manuscript that our method extends the measurement of contrast sensitivity to 20º eccentricity, which represents a significant advancement over previous studies.

      “These results demonstrate that our approach can detect subtle changes in visual sensitivity across eccentricities at the individual participant level. The ability to reveal these gradients was made possible by the large peripheral coverage provided by our large-field stimulation set-up (see Figure A1 in Appendix section), which enabled a more complete characterization of V1 sensitivity across the visual field. Importantly, the same effects were preserved when using retinotopic estimates derived from structure-based atlases, demonstrating that atlas-based methods can be used as an alternative to pRF mapping in cases where it might otherwise be difficult or impossible to directly collect pRF measures. Together, these results highlight both the validity of our approach and its potential to broaden the scope of visual neuroscience.”

      “Crucially, the ability to visualize these sensitivity gradients was made possible by the large peripheral coverage provided by our large-field stimulation set-up. Such coverage is particularly important for clinical applications, as it enables the detection of visual field losses beyond the macula (i.e., beyond 10º eccentricity) and the evaluation of residual peripheral vision in patients with macular-restricted damage. In doing so, this work provides a useful tool for advancing both basic visual neuroscience and translational research in clinical populations.”

      R3-2: Certain methodological aspects require further clarification, particularly regarding the correction of eccentricity values from the Benson atlas. It's not clear which V1 masks are used for the specific analysis which could have a substantial impact on the reported differences between the two approaches of pRF mapping and atlas-based pRF parameters.

      AR-R3-2: The correction of eccentricity values was performed using the V1 label provided by the Benson atlas. We have now explicitly stated this in the Methods section:

      “We collected data from 7 healthy controls (mean±SD: 29.6±4.7yo; 1M). All controls had either normal or corrected-to-normal vision, with no other ocular pathologies, and were recruited from the local staff and student pool at University College London. Each control completed both the population receptive field (pRF) mapping and the fMRI contrast sensitivity task. To assess measurement repeatability, four participants (C2, C4, C5, C6) performed the contrast sensitivity task twice. Additionally, one participant (C5) repeated the task under two simulated vision loss conditions (ring or quadrant loss), and two others (C5, C6) completed it with different levels of eye movement.”

      “Four participants (C2, C4, C5, C6) were invited for a second session in which they repeated the task to assess the reliability of the measures.”

      R3-4: The conclusion that high-contrast patterns as in pRF mapping are not optimal to test for subtle but potentially clinically relevant changes in the visual field coverage is very valid. The suggested use of contrast sensitivity can therefore be a potentially well-suited parameter for estimating visual field losses. The presented work is an interesting starting point and the proposed method of using contrast sensitivity as a measure for partial vision loss should further be explored.

      AR-R3-4: Thank you for the positive evaluation of our work.

      Reviewer #3 (Recommendations for the authors):

      R3-Recommendation 1: The shown organization of contrast sensitivities is consistent with previous studies; however, it extends the measurements to up to 20º eccentricity, which is, to my knowledge, much more than previously reported. The authors should therefore emphasize this more strongly.

      AR-R3-Recommendation 1: Please refer to our response to R3-1, where we also detail the corresponding changes made in the manuscript.

      R3-Recommendation 2: In the Methods section, it is not entirely clear why the eccentricity values originating from the Benson atlas need to be corrected using Horton & Hoyt cortical magnification. Do the authors consider these cortical magnification measurements as ground truth? Is the correction only applied to higher eccentricity values that are not mapped by the Benson atlas?

      AR-R3-Recommendation 2: The Benson et al. (2014) atlas predicts both polar angle and eccentricity from cortical anatomy (curvature, thickness) using a template pRF dataset and a mathematical retinotopic model. However, it does not incorporate a smooth parametric cortical magnification function such as Horton & Hoyt. Because the atlas is fit to an average map across subjects, and because the FreeSurfer alignment used to apply the template cannot incorporate functional information, the atlas cannot capture individual variability in eccentricity or cortical magnification. In practice, we therefore treat the Benson atlas as providing the correct topological layout of eccentricity, but not necessarily the correct eccentricity values for a given individual. Moreover, the data used to generate the Benson atlas have mainly been restricted to the central visual field (roughly 8º-12º), and the Benson atlas itself has never been fit with data more eccentric than 20º. Consequently, peripheral eccentricity values are more model-driven and less constrained by ground-truth data.

      To improve the correspondence between the atlas and expected cortical representations, we applied the Horton & Hoyt cortical magnification function to all eccentricities in the V1 Benson mask (from the foveal confluence to the periphery, up to 90º). We assume that the Horton & Hoyt model, adapted from physiology data, provides an accurate model of group-level cortical magnification (Benson et al., 2021), even though it does not capture individual differences. It therefore offers the best approximation of ground truth in the absence of individual pRF data, which are often not feasible to collect in patients with unstable fixation. We have now added a figure that showcases the method and shows how this correction affects the distribution of eccentricity values in the Benson atlas.
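
For reference, the Horton & Hoyt (1991) cortical magnification function is commonly written as M(E) = 17.3 / (E + 0.75) mm/deg, which integrates to a cortical distance from the foveal representation of D(E) = 17.3 ln(1 + E / 0.75) mm. The sketch below implements that formula and its inverse. It illustrates the function only: how the authors use it to adjust the Benson eccentricity values is not specified in this response, and the function names are hypothetical.

```python
import math

A_MM = 17.3     # scaling constant (mm), Horton & Hoyt (1991)
E2_DEG = 0.75   # eccentricity offset (deg), Horton & Hoyt (1991)

def magnification(ecc_deg):
    """Linear cortical magnification in mm of V1 per degree of visual angle."""
    return A_MM / (ecc_deg + E2_DEG)

def cortical_distance(ecc_deg):
    """Cortical distance (mm) from the foveal representation to `ecc_deg`.

    This is the integral of `magnification` from 0 to `ecc_deg`.
    """
    return A_MM * math.log(1 + ecc_deg / E2_DEG)

def eccentricity_from_distance(dist_mm):
    """Inverse of `cortical_distance`: eccentricity (deg) at a cortical distance."""
    return E2_DEG * (math.exp(dist_mm / A_MM) - 1)
```

Because magnification falls off steeply with eccentricity, a fixed error in cortical position corresponds to a much larger error in degrees in the periphery than near the fovea, which is consistent with the authors' remark that peripheral eccentricity values are less well constrained.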

      R3-Recommendation 3: For the analysis using the atlas-based retinotopy, it is not entirely clear whether the authors also use the provided V1 masks. In other words, differences between the original pRF-based and atlas-based analyses could originate from different borders of V1 rather than from the atlas-based pRF parameters. The authors could try using the same mask for both analyses, either the manually delineated one or the atlas-based one.

      AR-R3-Recommendation 3: This is an important point to clarify. We used a manually delineated V1 mask for each participant's own pRF map data and the Benson mask for the adjusted Benson atlas-based analysis, both restricted to the screen size. The difference in included vertices could indeed have introduced some additional error beyond the atlas/pRF mapping itself. We have opted not to correct this in this version of the manuscript because (1) the error introduced is likely small (on inspection, the V1 ROI delineations aligned well with the Benson ROIs, although using identical masks might slightly improve the mapping, particularly at the very center and in the outer periphery), and (2) our ROI selection for each approach is in line with the procedures typically used in practice. Critically, the spatial gradients in cortical contrast sensitivity are preserved across the pRF and Benson atlas approaches with their different ROIs, so we believe that such improvements would not alter our conclusion that the Benson atlas offers a useful alternative when pRF mapping is not possible. We now highlight this difference between the two approaches in the paper.

      “With this structure-based atlas, we successfully replicated key variations in visual field function (across eccentricity and polar quadrants), although sensitivity to more subtle differences (e.g., upper versus lower quadrant anisotropy) was reduced. This reduction may partly stem from differences in ROI definitions: a manually delineated V1 mask was used for the pRF-based data, while the Benson atlas mask was used for the adjusted Benson atlas analysis. Such differences could introduce minor error beyond the atlas/pRF mapping itself due to differences in the vertices included by each mask.”

      “Importantly, the spatial gradients in cortical contrast sensitivity were preserved across both the pRF and Benson atlas approaches, indicating that minor ROI differences do not affect our conclusions. Together, these findings show that the Benson atlas remains a useful alternative when pRF mapping is not feasible.”

      R3-Recommendation 4: The patient was measured monocularly. Given the widefield stimulation setup and the fact that the blind spot is located at about 15º eccentricity, do the authors expect to measure this blind spot with the given setup?

      Does this have an influence in binocular measurements?

      AR-R3-Recommendation 4: This is an interesting point. In theory, our wide-field setup could allow for the detection of the blind spot, which is located at around 12-15º eccentricity. However, in our LHON patient, the visual field defect typically extends to or beyond the blind spot, making it difficult to isolate its boundary, as shown in Figure 11 (previously Figure 7). Additionally, under binocular viewing, the brain integrates inputs from both eyes to create a unified percept, which may obscure blind spots unless specific paradigms are used (e.g., binocular rivalry or dichoptic tasks). Whilst this is outside the scope of this work, our setup could be adapted to map out the blind spot or explore phenomena like binocular rivalry more directly in future research.

      R3-Recommendation 5: How stable is the presented wide-field stimulation setup? In other words, does the eye tracker still capture the eye reliably after small head movements?

      AR-R3-Recommendation 5: While small head movements can occur, these were minimized by the use of padding cushions and monitored throughout the session, and the eye tracker maintained reliable tracking throughout the sessions.

      R3-Recommendation 6: Are the shown sine-wave gratings always oriented the same? We would expect orientation tuning curves in the early visual cortex; how could this influence the results?

      AR-R3-Recommendation 6: For six of the seven control participants (C1-C6), the sinewave gratings were presented with a fixed horizontal orientation. In an updated version of the task – used for participant C7, cases of simulated eye movements, cases of artificial scotoma, and the patient – the orientation of the gratings was varied every 5 seconds among four angles (−45º, 0º, 45º, 90º) during each 15-second stimulus block.

      We acknowledge that orientation tuning in the early visual cortex could influence responses, since V1 neurons are selective for specific stimulus orientations and respond most strongly to their preferred orientation. However, we replicated the same overall pattern of results in groups tested with a single orientation and with multiple orientations. Importantly, some participants completed both versions of the task, and the contrast sensitivity patterns remained consistent across conditions. This suggests that the results we report are robust across different orientation-tuned populations for the purposes of this study. A more fine-grained investigation of orientation effects would nevertheless be an interesting direction for future work.

      “For six control participants (C1–C6), gratings were initially presented with a fixed horizontal orientation. In an updated version of the task – used for C7, cases of simulated eye movement, cases of artificial scotoma, and the LHON patient – the orientation varied every 5 s among four angles (−45º, 0º, 45º, 90º). Contrast sensitivity patterns were consistent across single and multiple-orientation conditions, including in participants who completed both versions, indicating robustness across orientation-tuned populations.”

      R3-Recommendation 7: Are pRF centers also fitted outside the stimulated 20º radius? If yes, were they masked for the analysis?

      AR-R3-Recommendation 7: During pRF model fitting, pRF centers were allowed to extend beyond the stimulated visual field, up to approximately 1.5 times the maximum stimulus eccentricity (~30°), to improve model stability near stimulus boundaries. Eccentricity was sampled on a logarithmically spaced grid defined as 2<sup>x</sup>, with x ranging from −5 to 0.6 in steps of 0.2, and then scaled by the maximum stimulus eccentricity (20°) to express pRF centers in degrees of visual angle. This spacing approach provided finer sampling near the fovea and progressively coarser sampling at larger eccentricities, consistent with cortical magnification principles. For all subsequent analyses of cortical contrast sensitivity, pRF centers located outside the stimulated 20° eccentricity were explicitly excluded. Likewise, although the Benson atlas provides eccentricity estimates extending well beyond the stimulated range (up to ~90°), only pRF centers within 20° were included to ensure consistency across pRF-based and atlas-based analyses.

      “During pRF model fitting, pRF centers were allowed to extend beyond the stimulated visual field to improve model stability near stimulus boundaries – up to approximately 1.5 times the maximum stimulus eccentricity (~30°). Eccentricity was sampled on a logarithmically spaced grid defined as 2<sup>x</sup>, with x ranging from −5 to 0.6 in steps of 0.2, and then scaled by the maximum stimulus eccentricity (20°) to express pRF centers in degrees of visual angle. This sampling scheme provided finer resolution near the fovea and progressively coarser sampling at larger eccentricities, consistent with cortical magnification principles.”

      “For all subsequent analyses of cortical contrast sensitivity, pRF centers outside the stimulated 20° eccentricity were excluded. Similarly, although the Benson atlas provides eccentricity estimates extending far beyond the stimulated range (up to ~90°), only values within 20° were retained to maintain consistency across pRF-based and atlas-based analyses.”
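
The eccentricity grid described above can be written out directly, which makes the "~30°" upper bound easy to verify: 20° × 2^0.6 ≈ 30.3°. The following is a minimal sketch based only on the numbers given in the response. The function names are hypothetical, and this is not the SamSrf implementation.

```python
MAX_STIM_ECC = 20.0  # maximum stimulated eccentricity (deg), from the response

def eccentricity_grid(max_ecc=MAX_STIM_ECC):
    """Log-spaced search grid: max_ecc * 2**x for x = -5.0, -4.8, ..., 0.6.

    Exponents are built from integers to avoid floating-point drift
    when stepping by 0.2.
    """
    exponents = [(i - 25) / 5 for i in range(29)]
    return [max_ecc * 2 ** x for x in exponents]

def keep_within_stimulus(eccentricities, max_ecc=MAX_STIM_ECC):
    """Drop eccentricities beyond the stimulated field, as done for later analyses."""
    return [e for e in eccentricities if e <= max_ecc]

grid = eccentricity_grid()
inside = keep_within_stimulus(grid)
```

The grid has 29 values, running from 0.625° to about 30.3°. Successive values differ by a constant factor of 2^0.2 ≈ 1.15, which is what gives finer sampling near the fovea.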

      R3-Recommendation 8: L212: Could the authors please clarify what "scaled across eccentricity to account for cortical magnification" means for the given stimulus?

      AR-R3-Recommendation 8: The pRF stimulus was scaled across eccentricity using a logarithmic transformation of retinal radius to approximate cortical magnification. Radial checker boundaries were defined in log eccentricity space (log(r)), resulting in an exponential increase in checker size with eccentricity (scaling factor = 3.2; ~1.37× increase per radial step). As a result, the spatial frequency content of the stimulus decreases with eccentricity (i.e., checker size increases), compensating for known changes in V1 spatial frequency preference across the visual field. This eccentricity-dependent scaling inherently relies on precise fixation to stimulate the intended retinal locations, which can be difficult for patients with central vision loss and therefore motivates the use of Benson templates.

      “This scaling was implemented by applying a logarithmic transformation of retinal radius, such that radial checker boundaries were defined in log eccentricity space (log(r), where r denotes eccentricity relative to the fixation target). This produced an exponential increase in checker size with eccentricity (scaling factor = 3.2; ~1.37 times increase per radial step), resulting in lower spatial frequency content at larger eccentricities – consistent with known variations in V1 spatial frequency tuning. Because this eccentricity-dependent scaling assumes precise fixation, it can be challenging for individuals with central vision loss, further motivating the use of Benson atlas templates in such populations.”
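
One reading that is consistent with the two reported numbers is that checker boundaries are equally spaced in log(r) with a step of 1/3.2, since exp(1/3.2) ≈ 1.37. The sketch below illustrates that reading only. The role assigned to the scaling factor, the innermost radius `r_min`, and the function name are all assumptions of this sketch, not details confirmed by the response.

```python
import math

SCALING_FACTOR = 3.2  # value reported in the response; its role here is assumed

def ring_boundaries(r_min, n_rings, k=SCALING_FACTOR):
    """Radial checker boundaries equally spaced in log eccentricity.

    Assumed rule: log(r_i) = log(r_min) + i / k, so each boundary is
    exp(1 / k) times the previous one. `r_min` is a hypothetical
    innermost radius in degrees.
    """
    return [r_min * math.exp(i / k) for i in range(n_rings + 1)]

# Growth factor per radial step under this assumption.
step_ratio = math.exp(1 / SCALING_FACTOR)
```

Under this assumption each checker ring is about 37% wider than the one inside it, so local spatial frequency falls roughly in proportion to 1/r.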

      R3-Recommendation 9: L213: Three runs were measured per session, were they averaged before analysis or analyzed independently? If analyzed independently, how were the individual results handled?

      AR-R3-Recommendation 9: As described in the Methods, data from all three runs were first aligned to an alignment scan that had been co-registered to the MPRAGE image – typically the scan with the fewest outlier voxels, or alternatively, a single-band reference scan in cases of misregistration. The runs were then analyzed as separate regressors in a single design matrix in SPM to account for run-specific variation - following standard recommendations for this software (Author response image 2 shows the SPM design matrix for the GLM). We did not average the runs beforehand due to differences in the order of stimulus presentation across runs. Instead, the GLM modeled each run’s specific presentation sequence to estimate condition-specific beta values, capturing the average contribution of each spatial frequency and contrast level to the BOLD response.

      Author response image 2.

      R3-Recommendation 10: L289: Did the authors check for very small pRF sizes, as SamSrf is prone to fitting many small sizes?

      AR-R3-Recommendation 10: We did not apply an explicit filter to remove very small pRF sizes; we excluded only pRFs with σ > 6.

      R3-Recommendation 11: L384: p is missing before the value.

      AR-R3-Recommendation 11: Thank you for catching this oversight. We have now added the missing p-value in the revised manuscript.

      “Post-hoc tests using Holm-Bonferroni correction show that V1 neuronal populations receiving inputs from the central visual field (0.5-4.5°) showed greater contrast sensitivity to high spatial frequency as compared to low spatial frequency stimuli (steeper slope for the 3cpd versus 0.3cpd condition: 0.5-2.5º: t(6) = 4.35, p<sub>bonf</sub> = 0.0149; 2.5-4.5º: 𝑡(6) = 3.471, p<sub>bonf</sub> = 0.0266).”
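
For readers unfamiliar with the correction named in this passage, the standard Holm-Bonferroni step-down adjustment can be sketched as follows. This is the textbook procedure, written with a hypothetical function name. It is not the authors' code, and it is not applied here to their unadjusted p-values, which the response does not report.

```python
def holm_adjust(pvalues):
    """Holm-Bonferroni adjusted p-values, returned in the input order.

    The smallest p-value is multiplied by m, the next by m - 1, and so on.
    A running maximum keeps the adjusted values monotone, and each value
    is capped at 1.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * pvalues[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted
```

The running maximum means that tests with different raw p-values can share the same adjusted value, which would explain why several adjusted p-values in the quoted passages are identical.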

      R3-Recommendation 12: I have a very subjective comment regarding the figures. I do not really like the use of the hot colormap in this setting, as I feel it is hard to interpret high and low values.

      AR-R3-Recommendation 12: We appreciate the suggestion, but we have had many heated discussions amongst the authors about this and have moved back and forth several times before settling. Hopefully the reviewer will be happy for us to stick with the authors' eventually agreed-upon subjective preference, although we acknowledge that it is by no means a perfect color scheme.

      R3-Recommendation 13: L474: Suddenly, a second session appears in the Results section; please report this in Methods.

      AR-R3-Recommendation 13: Please refer to our response to R3-3, where we also detail the corresponding changes made in the manuscript.

      R3-Recommendation 14: Figure 5C: are the reported results from the first session of the same subjects?

      AR-R3-Recommendation 14: That is correct. The results shown in Figure 6C (previously 5C) reflect correlations between slope estimates obtained from the 0.3 and 3cpd conditions within the same session for each subject. We have updated the panel title to “C. 0.3cpd vs 3cpd (within session)” to clarify this point.

      R3-Recommendation 15: For the classic pRF mapping (Figure 6D), the artificial scotoma shows lower contrast sensitivity within the scotoma and increased values outside its borders. In contrast, using the retinotopic template (Figure 6E), the area of increased sensitivity is shifted inside the scotoma. Can the authors please comment on this discrepancy?

      Is this shift due to systematic differences between the eccentricity values estimated during the pRF run and those derived from the template?

      If such a shift exists, is it induced by the eccentricity correction step performed?

      AR-R3-Recommendation 15: The shift inside the scotoma observed in the atlas-based analysis (Figure 9E; previously Figure 6E) compared to the pRF-based analysis (Figure 9D; previously Figure 6D) likely reflects residual inaccuracies in eccentricity estimates from the adjusted Benson atlas. While the Horton & Hoyt correction improves the alignment of eccentricity values, it does not ensure perfect matching with the pRF data. Without the Horton & Hoyt correction, the misalignment and shift of activity in the scotoma region are even more pronounced (see below).

      We have added a sentence to the Methods section to justify the applied correction. Furthermore, to illustrate the impact of misalignment and its correction on cortical sensitivity maps, we have included an additional figure in the Appendix section showcasing the effect of applying the correction to improve mapping of the artificial scotoma.

      “We initially observed inaccuracies between the template and individual retinotopy eccentricity estimates which led to substantial distortions in cortical visual field maps due to cortical magnification – especially in peripheral locations (see Figure A4 in Appendix section).”

      R3-Recommendation 16: L532: The age and mutation type of the patient are already reported in the Methods. In general, many Methods and Discussion statements are embedded within the Results section.

      AR-R3-Recommendation 16: We are aware that recalling methods in the Results and foreshadowing the Discussion is a stylistic choice. We chose this approach to support the interpretability of the results for less specialist readers.

      R3-Recommendation 17: L636: Did the authors consider other options for estimating pRF parameters based on anatomical features, such as Ribeiro et al. (2021; https://github.com/felenitaribeiro/deepRetinotopy_TheToolbox)?

      AR-R3-Recommendation 17: We agree that alternative approaches to estimating pRF parameters based on anatomical features, such as the DeepRetinotopy method proposed by Ribeiro et al. (2021), are promising and worth exploring. In this study, we used the Benson atlas as a starting point, along with an adjustment of eccentricity estimates based on cortical magnification. Future work could compare the performance of different retinotopic template fitting approaches, including deep learning-based methods, to further improve anatomical alignment and functional predictions.

      “Further enhancing the alignment between retinotopic template atlases and individual retinotopic tuning could improve this approach further, for example, by integrating them with functional measures using Bayesian methods (Benson & Winawer, 2018). In parallel, geometric deep learning frameworks such as DeepRetinotopy (Ribeiro et al., 2021) could also offer anatomy-driven predictions from structural MRI, and combining these strategies may yield more accurate and generalizable retinotopic reconstructions.”

      R3-Recommendation 18: Figure A4: This figure brings up a very important point, namely, whether small eye movements reduce the accuracy of pRF and contrast sensitivity estimates. However, these experiments and results are not reported in the manuscript. I would prefer the authors to add all necessary Methods and Results, or at least not leave this Figure unexplained.

      AR-R3-Recommendation 18: We thank the reviewer for highlighting the importance of this figure. To address this point, we collected additional data and have revised the manuscript to include a dedicated section on the effects of eye movements, with corresponding updates in the Abstract, Methods, Results, and Discussion.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) First, a central claim is that arousal modulates functional connectivity in a hemispherically asymmetric and community-specific manner. Although structured asymmetries are demonstrated at the group level, it remains unclear whether these effects reflect a stable neurobiological principle or arise from high-dimensional, connection-wise analyses that are sensitive to sampling variability. Given the interpretive weight placed on hemispheric lateralization, stronger evidence of robustness and individual-level consistency would be necessary to support this conclusion.

      We appreciate your critical comments on the robustness of our lateralization findings. We fully agree with you that it is essential to demonstrate that the observed hemispheric asymmetries reflect a stable neurobiological principle rather than an artifact of sampling variability or high-dimensional noise. To address this concern, we performed two rigorous validation analyses using 500-iteration resampling schemes, consisting of a split-half reliability test and a participant-level consistency assessment.

      First, to ensure our findings do not depend on specific sample compositions, we conducted a split-half reliability test where the dataset was randomly partitioned into two independent subgroups over 500 iterations. As shown in Figure S1A, the community labels maintained high spatial consistency across iterations (as evidenced by the confusion matrix and Dice coefficient distributions), and our original findings—including network-pair community architecture (Fig. S2A), regional affiliation patterns (Fig. S3A-B), and arousal–tvFC coupling lateralization (Fig. S4A-B)—were consistently situated at the center of the iteration distributions.

      Second, to account for potential within-participant dependencies in the HCP 7T dataset, we performed a participant-level resampling analysis (N = 139). By randomly selecting a different session for each participant across 500 iterations, we confirmed that the community architecture and hemispheric biases remain robust even under this strict control (Figure S1A, S2B, S3C-D and S4C-D). Collectively, these additional analyses provide strong evidence that the hemispheric lateralization we reported is not a byproduct of sampling bias, but instead represents a stable organizational principle of the arousal-modulated connectome.
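A minimal sketch of the two resampling ingredients described above: a per-community Dice overlap, a random split into halves, and a draw of one run per participant. The function names and the data layout are hypothetical and are not taken from the authors' repository. The sketch also assumes that community labels have already been matched between the two partitions being compared, which k-means does not guarantee on its own.

```python
import random
from collections import defaultdict


def dice(labels_a, labels_b, community):
    """Dice overlap of one community between two labelings of the same edges.

    Assumes label values are already aligned across the two partitions.
    """
    in_a = {i for i, lab in enumerate(labels_a) if lab == community}
    in_b = {i for i, lab in enumerate(labels_b) if lab == community}
    if not in_a and not in_b:
        return 1.0  # community absent from both: treat as full agreement
    return 2 * len(in_a & in_b) / (len(in_a) + len(in_b))


def split_half(items, rng):
    """Randomly partition items into two halves of (nearly) equal size."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def one_run_per_participant(runs, rng):
    """Pick one run at random per participant.

    runs: iterable of (participant_id, run_id) pairs (hypothetical layout).
    """
    by_participant = defaultdict(list)
    for participant_id, run_id in runs:
        by_participant[participant_id].append(run_id)
    return {pid: rng.choice(ids) for pid, ids in sorted(by_participant.items())}
```

Repeating either draw many times (500 in the response) and recomputing the clustering on each draw gives a distribution of Dice values against which the original partition can be compared.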

      (2) Second, all analyses are based on ultra-high-field imaging. The manuscript does not address whether the reported arousal-related patterns, including the community structure and hemispheric asymmetries, are expected to be reproducible at standard field strengths. It therefore remains unclear whether the findings depend critically on the use of high-field data or whether they would generalize to more widely available datasets, limiting the broader applicability of the results.

      We appreciate your constructive comments on the generalizability of our findings across different field strengths.

      As you noted, our primary motivation for employing 7T ultra-high-field imaging was to leverage its superior signal-to-noise ratio (SNR) and significantly enhanced BOLD sensitivity. These technical advantages were instrumental in capturing the subtle, moment-to-moment coupling between spontaneous pupillary fluctuations and tvFC—signals that might be close to the detection threshold in standard field strength environments.

      However, we fully recognize your point that 3T remains the standard in most clinical and research settings. In the revised manuscript, we have added a dedicated discussion to address this (page 21, lines 447-456):

      “Fifth, the findings reported here were derived exclusively from ultra-high-field (7T) imaging data. The superior BOLD sensitivity of 7T fMRI was instrumental in resolving the fine-scale community architecture of arousal–tvFC coupling, which involves subtle signals that may be challenging to detect at lower field strengths. Given that 3T remains the most common field strength for neuroimaging research and clinical applications, future investigations are needed to determine the extent to which these organizational principles generalize to standard field strength data. Validating these motifs in large-scale 3T datasets will be essential to establish their broader applicability across different imaging environments.”

      (3) Third, arousal-connectivity coupling is assessed using zero-lag correlations between pupil diameter and time-resolved connectivity estimates. Physiological and hemodynamic considerations suggest that pupil-linked arousal and blood-based imaging signals may exhibit systematic temporal delays. The absence of analyses examining sensitivity to such delays raises the possibility that the reported coupling patterns depend on a specific temporal alignment assumption.

      Given the inherent delay of the hemodynamic response function (HRF) and the complex temporal relationship between pupillary dynamics and neural activity, we conducted an additional lagged cross-correlation analysis to test the sensitivity of our findings. Following established frameworks for linking BOLD signals with pupillometry (Yellin et al., 2015; Gonzalez-Castillo et al., 2022; Lloyd et al., 2023), we systematically shifted the pupil time series relative to the fMRI data by -3 TR to +3 TR (-3s to +3s) and evaluated the consistency of the community architecture across these different lags using Dice coefficients.

      As shown in Figure S5, these results demonstrate that the community organization remains stable across the tested range of physiological delays. This stability indicates that the arousal-modulated communities we reported are not specific to the zero-lag assumption but instead persist throughout the physiologically plausible lag window. Consequently, our findings reflect a robust neurobiological phenomenon rather than an artifact of a specific temporal alignment.
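The lag analysis can be illustrated with a short sketch that shifts one series against the other and recomputes the correlation at each lag. This is an illustration under stated assumptions, not the authors' pipeline: the function names are hypothetical, lags are counted in samples, and both series are assumed to share one sampling grid.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


def lagged_corr(pupil, edge, lag):
    """Correlate pupil[t] with edge[t + lag].

    A positive lag means the connectivity series follows the pupil series.
    """
    if lag > 0:
        p, e = pupil[:-lag], edge[lag:]
    elif lag < 0:
        p, e = pupil[-lag:], edge[:lag]
    else:
        p, e = pupil, edge
    return pearson(p, e)


def lag_profile(pupil, edge, max_lag=3):
    """Correlation at every integer lag from -max_lag to +max_lag."""
    return {lag: lagged_corr(pupil, edge, lag)
            for lag in range(-max_lag, max_lag + 1)}
```

Running the clustering once per lag and comparing each partition with the zero-lag partition then gives the stability measure described in the response.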

      (4) Fourth, the estimation of time-resolved connectivity relies on a single choice of sliding-window length. The manuscript does not examine whether the reported patterns are stable across different window sizes. Given ongoing concerns about parameter dependence in time-resolved connectivity analyses, sensitivity analyses would be important to establish that the findings are not artifacts of a particular analytical choice.

      To ensure that our findings are not artifacts of a specific analytical choice, we performed an exhaustive sensitivity analysis by repeating our entire pipeline across a wide range of window lengths (30s, 35s, 60s, and 90s) and step sizes (1s, 5s, and 10s). We then employed Dice coefficients to quantify the topological similarity between these alternative configurations and our original parameters (30s window, 5s step).

      As shown in Figure S5, our results demonstrate high topological consistency, with Dice coefficients for community structures remaining consistently above 0.8 across all tested parameter combinations. These findings provide strong evidence that the arousal-modulated organizational principles we reported are inherent to the data rather than being driven by specific analytical choices in the sliding-window setup.
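For readers unfamiliar with the estimator, a sliding-window correlation for a single pair of regional time series might look like the sketch below. The names are hypothetical, and the window and step are given in samples rather than seconds, so a conversion using the repetition time is assumed to happen beforehand.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


def sliding_window_fc(x, y, window, step):
    """Time-resolved connectivity: one correlation per window position.

    window and step are expressed in samples; only full windows are used.
    """
    values = []
    for start in range(0, len(x) - window + 1, step):
        stop = start + window
        values.append(pearson(x[start:stop], y[start:stop]))
    return values
```

Changing `window` and `step` and re-running the downstream clustering is the kind of sensitivity check described above. Shorter windows give more estimates but noisier ones, which is why the comparison across settings matters.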

      (5) Finally, the identification of seven connectivity communities is a central result, yet the justification for this choice relies primarily on a single clustering quality measure. In practice, evaluation of clustering solutions typically draws on multiple complementary criteria, including measures of compactness and separation, approaches for selecting the number of clusters, and assessments of stability under resampling. Without such complementary evaluations, it is difficult to determine whether the reported community structure reflects a stable organizational feature or sensitivity to specific methodological decisions.

      We agree that relying on a single measure can be limiting, and in the revised manuscript, we have implemented a comprehensive multi-criteria evaluation to justify our selection of K=7. To ensure the robustness of the community partition, we expanded our analysis to include several complementary indices, such as the Davies-Bouldin Index, Calinski-Harabasz Score, and Silhouette Coefficient, alongside the original Within-Cluster Sum of Squares (WCSS), as detailed in Figure S7A.

      To further minimize subjective bias in "elbow" detection, we utilized the L-method (Salvador & Chan, 2004), which identifies the optimal K by minimizing the combined root-mean-square error (RMSE) of two linear regression segments. As illustrated in Figure S7B, the RMSE was minimized at K=7, providing a robust mathematical basis for our partition. Furthermore, we systematically visualized the community maps across a range of granularities from K=5 to 9 (Figure S7C). This stability analysis demonstrates that the fundamental topological features and the resulting hemispheric asymmetries are not transient artifacts of a specific K but are consistently preserved as the clustering granularity increases. These additional evaluations demonstrate that the seven-community structure reflects a stable organizational feature of arousal-modulated connectivity.
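The two-segment fit behind the L-method can be sketched as follows. The function names are hypothetical, and each segment's RMSE is weighted by its share of the points, which is one common reading of the method. It is offered as an illustration of the idea rather than as the exact procedure used for Figure S7B.

```python
def line_rmse(xs, ys):
    """RMSE of the least-squares line fitted through the points (xs, ys)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    sq_err = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return (sq_err / n) ** 0.5


def l_method(ks, scores):
    """Return the K at which two straight lines best fit the evaluation curve.

    ks must be sorted; every candidate split leaves at least two points on
    each side, and each segment's RMSE is weighted by its share of points.
    """
    n = len(ks)
    best_k, best_err = None, float("inf")
    for c in range(2, n - 1):  # left segment = first c points
        err = (c / n) * line_rmse(ks[:c], scores[:c]) \
            + ((n - c) / n) * line_rmse(ks[c:], scores[c:])
        if err < best_err:
            best_k, best_err = ks[c - 1], err
    return best_k
```

Here `scores` would be a quality curve such as WCSS evaluated at each candidate K. The returned value is the last K belonging to the steep left segment.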

      Reviewer #2 (Public review):

      (1) Arousal effects on BOLD signals and on pupil size can have different delays, so it would be valuable to test lagged relationships (for example, shifting the pupil series forward and backward) to show that the main community structure and lateralization results are not sensitive to an arbitrary temporal alignment.

      We agree with you that accounting for the varying delays between BOLD signals and pupillary dynamics is essential for ensuring the robustness of our results. We conducted a comprehensive lagged cross-correlation analysis to address it. Following established frameworks for linking BOLD signals with pupillometry (Yellin et al., 2015; Gonzalez-Castillo et al., 2022; Lloyd et al., 2023), we systematically shifted the pupil time series relative to the fMRI data by -3 TR to +3 TR (-3s to +3s) and evaluated the consistency of the community architecture across these lags using Dice coefficients.

      As shown in Figure S5C, these results demonstrate that the core community organization remains stable across the tested range of physiological delays. This stability confirms that our findings are not sensitive to an arbitrary temporal alignment but instead reflect a robust neurobiological phenomenon that persists throughout the physiologically plausible lag window.

      (2) Pupil diameter covaries with blinks, eye closure, and other factors that can covary with head motion and physiological noise. The Methods include substantial quality control and denoising, including motion regression and scrubbing, plus exclusions for eye closure.

      We appreciate your attention to these potential confounding factors. While we implemented rigorous preprocessing, including confound regression on the fMRI images, we agree that physiological noise and motion may have influenced the pupil signals.

      To address this, we conducted an additional control analysis where we included head motion (framewise displacement, FD) and the global signal (defined as the mean signal across all gray matter voxels) as covariates when calculating the arousal–tvFC coupling. We then re-evaluated the similarity between the resulting community architecture and our original findings. As shown in Figure S4, the community structure remained stable after controlling for these variables.
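One way to express "including covariates when calculating the coupling" is a partial correlation, sketched below with the standard recursion formula. The function names are hypothetical and the response does not say which implementation was used, so this is an assumption-laden illustration. In this setting `x` would be the pupil series, `y` one edge's time-resolved connectivity, and the covariates would be framewise displacement and the global signal.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


def partial_corr(x, y, covariates):
    """Correlation of x and y after removing linear effects of the covariates.

    Uses the recursion r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2)),
    applied one covariate at a time.
    """
    if not covariates:
        return pearson(x, y)
    z, rest = covariates[0], covariates[1:]
    r_xy = partial_corr(x, y, rest)
    r_xz = partial_corr(x, z, rest)
    r_yz = partial_corr(y, z, rest)
    return (r_xy - r_xz * r_yz) / ((1 - r_xz ** 2) * (1 - r_yz ** 2)) ** 0.5
```

If the coupling were driven only by a shared nuisance signal, the partial correlation would fall toward zero even when the raw correlation is large.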

      Regarding eye closure, we intentionally did not regress this out, as extensive literature demonstrates that eye closure is itself a reliable physiological proxy for arousal levels (Sommer & Golz, 2010; Chang et al., 2016; Gonzalez-Castillo et al., 2022); regressing it out would likely remove the very arousal-related coupling effects we aim to investigate.

      (3) The dataset is described in terms of runs retained (for example, 485 resting runs), and runs are treated as observations in clustering after z-scoring across runs. If multiple runs come from the same individuals, the manuscript would benefit from explicitly showing that results replicate at the participant level (for example, community structure stability within participant across runs, and participant-level summary statistics used for inference), rather than relying primarily on pooled run-level patterns.

      We fully agree with you that it is essential to demonstrate that the observed hemispheric asymmetries reflect a stable neurobiological principle rather than an artifact of sampling variability or high-dimensional noise. To address this concern, we performed two rigorous validation analyses using 500-iteration resampling schemes, consisting of a split-half reliability test and a participant-level consistency assessment.

      First, to ensure our findings do not depend on specific sample compositions, we conducted a split-half reliability test where the dataset was randomly partitioned into two independent subgroups over 500 iterations. As shown in Figure S1A, the community labels maintained high spatial consistency across iterations (as evidenced by the confusion matrix and Dice coefficient distributions), and our original findings—including network-pair community architecture (Fig. S2A), regional affiliation patterns (Fig. S3A-B), and arousal–tvFC coupling lateralization (Fig. S4A-B)—were consistently situated at the center of the iteration distributions.

      Second, to account for potential within-participant dependencies in the HCP 7T dataset, we performed a participant-level resampling analysis (N = 139). By randomly selecting a different session for each participant across 500 iterations, we confirmed that the community architecture and hemispheric biases remain robust even under this strict control (Figure S1A, S2B, S3C-D and S4C-D). Collectively, these additional analyses provide strong evidence that the hemispheric lateralization we reported is not a byproduct of sampling bias, but instead represents a stable organizational principle of the arousal-modulated connectome.

      (4) Time-resolved connectivity is estimated using a 30-second sliding window and 5 second step. It is reasonable to wonder whether the same conclusions hold with alternative estimators that do not rely on fixed windows. The Discussion acknowledges this limitation, but adding a small robustness analysis would make the paper more definitive.

      To ensure that our findings are not artifacts of a specific analytical choice, we performed an exhaustive sensitivity analysis by repeating our entire pipeline across a wide range of window lengths (30s, 35s, 60s, and 90s) and step sizes (1s, 5s, and 10s). We then employed Dice coefficients to quantify the topological similarity between these alternative configurations and our original parameters (30s window, 5s step).

      As shown in Figure S3, our results demonstrate high topological consistency, with Dice coefficients for community structures remaining consistently above 0.8 across all tested parameter combinations. Furthermore, the core hemispheric asymmetry patterns were robustly preserved regardless of the specific windowing configuration used. These results provide strong evidence that the arousal-modulated organizational principles we reported are inherent to the data and are stable across a broad range of temporal scales.

      Reviewer #3 (Public review):

      (1) A major limitation of the study is the limited discussion of subcortical regions, which play a central role in arousal regulation according to extensive prior literature. Although the current analyses focus primarily on cortical organization, the authors should include a brief discussion of how their findings relate to subcortical arousal systems.

      We completely agree that subcortical structures are pivotal drivers of arousal regulation. While our study primarily utilized a symmetric cortical atlas to ensure a mathematically rigorous assessment of hemispheric lateralization, we recognize that the exclusion of subcortical regions limits the functional interpretation of the observed patterns.

      In the revised manuscript, we have added a dedicated discussion part (page 20, lines 412-428) to address this point:

      “First, to ensure a mathematically rigorous assessment of hemispheric asymmetry, our analysis was restricted to a symmetric cortical parcellation. Consequently, while we demonstrate that arousal-modulated connectivity follows a structured macroscopic architecture, we did not explicitly analyze the subcortical nuclei hypothesized to drive these patterns. We hypothesize that the presence of these low-dimensional cortical communities reflects coordinated motifs rather than a homogeneous gain modulation, potentially mirroring the differentiated projection patterns of subcortical neuromodulatory systems. For instance, the locus coeruleus–noradrenergic pathway (Chandler et al., 2014; Schwarz & Luo, 2015) and thalamus (Hwang et al., 2017; Shine, 2019; Müller et al., 2020; Shine et al., 2023) possess extensive yet non-uniform projections that may anchor the community-specific and hemispherically asymmetric patterns observed here.”

      (2) While sliding window methods can capture temporal changes in functional organization, they have limitations in characterizing moment-to-moment neural fluctuations. In particular, results can be highly sensitive to window length and step size. The manuscript would benefit from (a) a clearer discussion of these methodological limitations, (b) justification for the chosen window length and step size, and (c) a sensitivity analysis demonstrating whether the main findings are robust across different parameter choices.

      To ensure that our findings are not artifacts of a specific analytical choice, we performed an exhaustive sensitivity analysis by repeating our entire pipeline across a wide range of window lengths (30s, 35s, 60s, and 90s) and step sizes (1s, 5s, and 10s). We then employed Dice coefficients to quantify the topological similarity between these alternative configurations and our original parameters (30s window, 5s step).

      As shown in Figure S5, our results demonstrate high topological consistency, with Dice coefficients for community structures remaining consistently above 0.8 across all tested parameter combinations. Furthermore, the core hemispheric asymmetry patterns were robustly preserved regardless of the specific windowing configuration used. These results provide strong evidence that the arousal-modulated organizational principles we reported are inherent to the data and are stable across a broad range of temporal scales.

      (3) The authors use k-means clustering to identify groups of brain regions and refer to these groupings as "communities." However, in general, community detection typically refers to graph-based algorithms that identify modules based on connectivity structure (e.g., modularity maximization). The clusters derived from k-means in feature space are not necessarily equivalent to graph-theoretic communities. The authors should explicitly clarify this distinction and adjust terminology accordingly to avoid conceptual ambiguity.

      We agree that the term "community detection" is often specifically associated with graph-based algorithms, such as modularity maximization, which define modules based on topological connectivity. In contrast, our implementation of k-means identifies groupings based on the similarity of arousal–FC coupling patterns within a high-dimensional feature space.

      To avoid any conceptual ambiguity or potential confusion, we have explicitly clarified this distinction in the Methods (pages 24-25, lines 533-542) section of the revised manuscript:

      “We employed the k-means clustering algorithm (Euclidean distance) to explore a range of cluster solutions from K = 2 to 15. To ensure the stability of the results and avoid local optima, each K was repeated 250 times with random initializations. The optimal number of clusters was determined by evaluating clustering quality and reproducibility (e.g., maximizing silhouette stability). It is important to clarify that "communities" in this context refer to clusters of edges that exhibit similar arousal-modulation motifs within a high-dimensional feature space, rather than topological modules typically derived from graph-theoretic algorithms like modularity maximization. This procedure consistently identified seven distinct communities, each representing a robust, arousal-sensitive connectivity motif that characterizes the large-scale organization of brain-pupil coupling.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) To strengthen confidence in the reported hemispheric effects, the authors should provide additional robustness analyses, such as subject-level consistency of lateralization measures, split-half or resampling reliability, and sensitivity to alternative preprocessing or analysis choices. Reporting the distribution of lateralization effects across individuals would help clarify whether the observed asymmetries reflect stable features or group-level averages driven by a subset of connections or participants.

      We agree that establishing the individual-level stability of lateralization is essential. We have now provided extensive validation, including split-half reliability tests and participant-level consistency analyses (500 iterations). These results confirm that the reported asymmetries are robust and consistent across the sample. Please refer to Reviewer #1 Weakness1 for the full analysis and associated figures (Figures S1–S4).

      (2) The authors should examine whether arousal-connectivity coupling patterns are robust to plausible temporal delays between pupil diameter and BOLD signals. Lagged or time-shifted analyses would help establish that the findings do not depend on a specific zero-lag assumption.

      We agree that validating the coupling between pupil dynamics and the time-varying FC is essential. To address this, we conducted a lag sensitivity analysis by shifting the pupil-derived arousal signal within a physiologically plausible range (-3 to +3 TR). The community architecture remains highly consistent across these temporal offsets, showing high spatial correlation and Dice coefficients with our original findings. This stability confirms that the identified organizational motifs are robust and not dependent on a specific zero-lag assumption. For the full details of this validation and the associated figures, please refer to Reviewer #1 Weakness3 and Figure S5 in the Supplementary Material.

      (3) Given reliance on a single sliding-window length, the authors should assess how key results vary across different window sizes. Demonstrating stability of the community structure and lateralization patterns across parameter choices would strengthen the methodological foundation of the study.

      We have conducted an exhaustive sensitivity analysis across various window lengths (30s, 35s, 60s, 90s) and step sizes (1s, 5s, 10s). The high Dice coefficients (>0.8) confirm that our findings are not dependent on specific windowing choices. Please refer to Reviewer #1 Weakness4 and Figure S5 for the full results.

      (4) The justification for the chosen number of connectivity communities would benefit from additional clustering evaluations. Complementary criteria such as measures of compactness and separation, model selection approaches for determining the number of clusters, and stability or reproducibility under resampling would help establish whether the reported community structure is robust rather than method-dependent.

      To strengthen the mathematical basis for our partition, we have implemented a multi-metric evaluation and the L-method for objective K selection. These metrics consistently support the seven-community structure. Please refer to our response to Reviewer #1 Weakness5 and Figure S7 for the comprehensive evaluation.

      (5) The manuscript would benefit from a clearer discussion of why ultra-high-field imaging was required for the present analyses and whether similar results are expected at standard field strengths. If feasible, validation using lower-field data or reference to existing datasets would substantially enhance generalizability.

      We have expanded our discussion to clarify that 7T was instrumental for capturing the subtle, high-frequency arousal-tvFC coupling due to its superior SNR. We also explicitly discuss the potential and limitations of generalizing these findings to 3T datasets. Please refer to our response to Reviewer #1 Weakness2 for the full discussion (page 21, lines 447-456).

      (6) The authors should more explicitly report exclusion related to pupil measurements and discuss how missing or noisy pupillometry may affect the applicability of the approach in other datasets or experimental settings.

      We agree that transparency in data screening is essential for the reproducibility of our method. In the revised manuscript, we have clarified our quality control pipeline in the quality control section in Methods (page 23, lines 502-510):

      “The final analyzed sample for the resting-state consisted of N = 139 healthy participants (mean age = 29.1±3.5 years, 77 female). Runs were excluded if (a) more than 20% of frames exceeded motion thresholds, (b) eye tracking did not cover the full fMRI time series, or (c) more than 90% of samples were classified as eye closure. After applying these criteria, 485 of the initial 723 scans were retained for analysis. The same quality-control pipeline was applied to the movie-watching dataset, yielding 513 usable scans out of the original 725. Detailed information on data retention and run distribution per participant is summarized in Figure S9.”
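The three exclusion criteria quoted above translate directly into a small filter. The sketch below is illustrative only: the `Run` fields and the function name are hypothetical, and the per-run quantities (fraction of high-motion frames, eye-tracking coverage, fraction of eye-closure samples) are assumed to have been computed upstream.

```python
from dataclasses import dataclass


@dataclass
class Run:
    run_id: str
    frac_high_motion: float  # fraction of frames exceeding motion thresholds
    n_fmri_frames: int       # length of the fMRI time series
    n_eye_frames: int        # fMRI frames covered by eye tracking
    frac_eye_closed: float   # fraction of samples classified as eye closure


def keep_run(run, max_motion=0.20, max_closed=0.90):
    """Apply the three exclusion criteria (a), (b) and (c) in order."""
    if run.frac_high_motion > max_motion:       # (a) too much motion
        return False
    if run.n_eye_frames < run.n_fmri_frames:    # (b) incomplete eye tracking
        return False
    if run.frac_eye_closed > max_closed:        # (c) eyes closed almost throughout
        return False
    return True
```

Applying `keep_run` to every run and counting the survivors per participant yields the kind of retention summary reported in Figure S9.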

      Furthermore, we have added a discussion regarding how noisy or missing pupillary signals might affect the generalizability of our approach (pages 20-21, lines 437-447):

      “Fourth, the generalizability of our approach to external cohorts warrants caution regarding pupillary data integrity. In contexts where high-fidelity eye-tracking is technically demanding—such as in clinical settings involving patients with restricted compliance or in naturalistic fMRI studies—the prevalence of blink artifacts and signal dropouts may bias the estimation of arousal-modulated states. Excessive reliance on data interpolation in such cases could artificially smooth temporal fluctuations, leading to an overestimation of community stability. Future applications should therefore prioritize high-frequency sampling and potentially incorporate multi-modal physiological features (e.g., respiratory or cardiac signals) to cross-validate arousal dynamics when pupillary data is suboptimal (Meissner et al., 2023; Bolt et al., 2025; Weijs et al., 2025).”

      (7) The authors should ensure that all data and analysis code necessary to reproduce the results are made publicly available in accordance with eLife policies, including clear documentation of preprocessing steps, parameter choices, and clustering procedures.

      All analysis code and the necessary processed data required to reproduce our findings have been made publicly available through https://github.com/kongxy6478/Arousal-modulates-functional-connectivity. This repository includes documented pipelines for pupillometry cleaning and fMRI denoising, alongside the core Python scripts used for sliding-window connectivity calculation, k-means clustering, and hemispheric lateralization analysis.

      Reviewer #2 (Recommendations for the authors):

      (1) Add a lag sensitivity analysis between pupil-derived arousal and time-resolved connectivity, and report whether the seven community structure and key lateralization findings are stable across a plausible lag range.

      We agree that validating the coupling between pupil dynamics and the time-varying FC is essential. To address this, we conducted a lag sensitivity analysis by shifting the pupil-derived arousal signal within a physiologically plausible range (-3 to +3 TR). The community architecture remains highly consistent across these temporal offsets, showing high spatial correlation and Dice coefficients with our original findings. This stability confirms that the identified organizational motifs are robust and not dependent on a specific zero-lag assumption. For the full details of this validation and the associated figures, please refer to Reviewer #1 Weakness3 and Figure S5 in the Supplementary Material.

      (2) Quantify and report the extent to which residual head motion, blink rate, eye closure segments, and global signal changes explain arousal connectivity coupling, for example, via partial correlation or regression controls, and show that key effects persist.

      We agree that it is essential to demonstrate that the observed arousal-connectivity coupling is not driven by non-specific physiological or motion-related artifacts. As requested, we have quantified the influence of head motion (FD) and global signal on our primary results. By implementing partial correlation analyses, we confirmed that the identified arousal-modulated community structures persist even after strictly controlling for these variables. These results indicate that the arousal-tvFC coupling we report reflects a specific neuro-arousal process rather than a byproduct of motion or systemic physiological fluctuations. For the detailed quantitative results and control analysis figures, please refer to our response to Reviewer #2 Weakness2 and Figure S6 in the Supplementary Material.

      (3) Add participant-level validation: demonstrate that community profiles and lateralization signatures are consistent within participants across runs, and consider participant-level statistical summaries rather than treating all runs as independent observations.

      We agree that demonstrating participant-level consistency is vital. In response, we performed two rigorous 500-iteration resampling schemes: a split-half reliability test and a participant-level consistency assessment (N = 139). These analyses, which involved randomly partitioning the sample and selecting single sessions per participant, confirm that our community architecture and hemispheric biases are remarkably stable and not driven by sampling variability or high-dimensional noise. For a comprehensive description of these validations and the associated statistical distributions, please refer to our detailed response to Reviewer #2 Weakness3 and Figures S1–S4.

      (4) Provide an alternative dynamic connectivity estimator robustness check, or at a minimum, vary the window length and step size to show stability of the primary conclusions.

      We have conducted an exhaustive sensitivity analysis across various window lengths (30s, 35s, 60s, 90s) and step sizes (1s, 5s, 10s). The high Dice coefficients (>0.8) confirm that our findings are not dependent on specific windowing choices. Please refer to Reviewer #1 Weakness4 and Figure S5 for the full results.

      (5) Consider validating the seven community solutions with at least one additional unsupervised approach, and report agreement with the main k-means solution.

      We agree that validating the clustering scheme is essential. To this end, we implemented a multi-criteria evaluation (including Davies-Bouldin and Silhouette indices) and utilized the L-method (Salvador & Chan, 2004) to mathematically confirm K=7 as the optimal granularity (Figure S7A–B). Furthermore, we verified that the core topological features and hemispheric asymmetries remain robustly consistent across a range of granularities from K=5 to 9 (Figure S7C). These analyses demonstrate that our findings are not dependent on a specific K or subjective bias. For the full quantitative evaluation and stability maps, please refer to our response to Reviewer #1 Weakness5 and Figure S7.

      (6) State explicitly, early in Results, what the main inferential unit is (run or participant) for each key analysis, and clarify how repeated runs per participant are handled.

      We agree that defining the inferential unit is critical for methodological clarity. In the revised manuscript, we have explicitly stated at the beginning of the Results section (page 5, lines 113-116):

      “While our primary inferential analyses were conducted at the run level to leverage the high-density sampling of the HCP 7T dataset, we further validated the robustness of these findings using participant-level statistical summaries and resampling to account for within-participant dependencies (see Figures S1–S2 in the Supplementary Material).”

      Specifically, all key findings—including community architecture and hemispheric asymmetries—were validated using participant-level statistics and resampling schemes (N = 139) to ensure that the results are not biased by within-participant dependencies.

      (7) When introducing the integration and segregation indices, add a brief intuitive explanation of what a positive or negative value means in plain language before the equations.

      We thank the reviewer for this suggestion to improve the accessibility of our methods. We have added brief, intuitive explanations for both indices in the Methods section (pages 26-27, lines 569-582):

      “The integration index provides a measure of the overall hemispheric dominance of arousal-modulated connections. A positive value indicates that arousal-related edges are preferentially concentrated in the left hemisphere (including its internal and outgoing connections) compared to the right.” and “The segregation index assesses whether arousal preferentially modulates local, intra-hemispheric communication versus long-range, inter-hemispheric communication. A positive value reflects a ‘segregated’ left-hemisphere bias, where arousal strengthens within-hemisphere connections more than it strengthens across-hemisphere communication for that same hemisphere.”
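
      The sign conventions described above can be illustrated with a small sketch. The formulas below are an assumption: one formalization consistent with the verbal description, not a reproduction of the equations in the manuscript, which may differ. The inputs `ll`, `lr`, `rr`, and `rl` are hypothetical summaries of arousal-related connectivity (left-to-left, left-to-right, right-to-right, right-to-left).

```python
def integration_index(ll, lr, rr, rl):
    """Assumed form: all connections of the left hemisphere minus all of the right."""
    return (ll + lr) - (rr + rl)


def segregation_index(ll, lr, rr, rl):
    """Assumed form: within-minus-across bias of the left minus that of the right."""
    return (ll - lr) - (rr - rl)


# Hypothetical toy values, chosen only to show how the signs are read.
ll, lr, rr, rl = 0.60, 0.20, 0.30, 0.25

integration = integration_index(ll, lr, rr, rl)
segregation = segregation_index(ll, lr, rr, rl)

# Positive integration: arousal-related connectivity is concentrated on the left.
# Positive segregation: the left favors within-hemisphere over cross-hemisphere links
# more strongly than the right does.
print(integration > 0, segregation > 0)
```

      Under these assumed definitions, swapping the left and right inputs flips the sign of both indices.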

      (8) In the Discussion, separate claims into "what we show" versus "what we hypothesize," especially when connecting findings to neuromodulatory pathways.

      In the revised manuscript, we have carefully separated our direct empirical findings from our mechanistic hypotheses. We have used more cautious and speculative language (e.g., “suggesting a potential role of,” “may be mediated by,” and “we hypothesize that”) (page 17, lines 352-358):

      “Specifically, we show the presence of low-dimensional, reproducible communities, which suggests that arousal modulates the connectome through coordinated motifs rather than homogeneous gain modulation. We hypothesize that this structured macroscopic architecture reflects the differentiated projection patterns of subcortical neuromodulatory systems, such as the locus coeruleus–noradrenergic pathway (Aston-Jones & Cohen, 2005; Jordan, 2024) and thalamus (Magnin et al., 2010; Lewis et al., 2015; Liu et al., 2018).”

      (9) Provide a clear participant-level summary (number of participants contributing to the retained runs, demographics if available, and distribution of runs per participant), alongside the reported run counts retained after quality control.

      We agree that clear reporting of participant-level data is essential. In the revised Methods section, we have added a detailed summary of participant demographics (age and sex) and clarified the sample composition (page 23, lines 502-503):

      “The final analyzed sample for the resting-state consisted of N = 139 healthy participants (mean age = 29.1±3.5 years, 77 female).”

      Furthermore, to provide a transparent view of the data retained after quality control, we have included Figure S9 to illustrate the distribution of valid runs per participant. This visualization confirms the amount of data contributing to our group-level inferences and accounts for exclusions due to motion or pupillary signal quality.

      (10) Report the robustness of results to reasonable changes in pupil preprocessing choices (for example, smoothing parameters or interpolation rules), since pupil diameter is the key arousal index.

      We agree that the robustness of pupil-derived arousal estimates is fundamental to our findings. To address this, we compared our original pupil preprocessing pipeline against 18 alternative parameter combinations. These variations included different smoothing window sizes (100 ms, 200 ms, and 500 ms), interpolation methods (linear vs. cubic spline), and blink buffer durations (25 ms, 50 ms, and 100 ms). As shown in Figure S8, the pupil diameter time courses derived from these pipelines remained highly correlated with our original estimates (all correlations above 0.65). This indicates that our arousal-modulated connectivity results are robust to reasonable changes in pupil preprocessing choices.
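
      The parameter grid described above can be enumerated as a quick check: three smoothing windows, two interpolation methods, and three blink buffers give 3 × 2 × 3 = 18 combinations. The sketch below also includes a plain Pearson correlation of the kind that could compare two pupil traces. The variable names and the toy traces are hypothetical, and whether the original pipeline is itself one of the 18 is not stated here.

```python
from itertools import product
from math import sqrt

smoothing_ms = [100, 200, 500]
interpolation = ["linear", "cubic_spline"]
blink_buffer_ms = [25, 50, 100]

# Full grid of preprocessing settings: 3 * 2 * 3 = 18 combinations.
pipelines = list(product(smoothing_ms, interpolation, blink_buffer_ms))
print(len(pipelines))


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical toy traces standing in for two preprocessed pupil time courses.
original_trace = [3.1, 3.4, 3.3, 3.8, 4.0, 3.7]
alternative_trace = [3.0, 3.5, 3.2, 3.9, 4.1, 3.6]
r = pearson_r(original_trace, alternative_trace)
print(r > 0.65)
```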

      Reviewer #3 (Recommendations for the authors):

      I have two additional minor comments:

      (1) Given the overall goal of this study to identify large-scale brain communities or clusters underlying arousal, the results may be sensitive to the choice of cortical parcellation. The authors should consider:

      (a) including analyses using additional parcellation schemes, or

      (b) discussing how the current findings might depend on the chosen parcellation and the implications for robustness and generalizability.

      We have addressed this by adding a dedicated point in the Discussion (page 21, lines 456-465):

      “Sixth, our findings were derived using a single high-resolution cortical parcellation. While the specific choice of atlas can influence fine-grained regional connectivity, it is important to note that our primary conclusions—such as hemispheric asymmetries and community-level preferences—were identified and interpreted at the macroscopic network and system level. By aggregating signals across broad functional systems, this approach likely mitigates the dependency on precise regional boundary definitions. Nevertheless, future studies employing alternative parcellation schemes would be valuable to further confirm that these organizational principles are not specific to the current atlas but represent a generalizable feature of the arousal-modulated connectome.”

      (2) Some key details, such as the number of participants included in the study, as well as basic demographic information, are not reported.

      We apologize for this omission. In the revised Methods section, we have now included a detailed summary of the participant demographics, including the final sample size (N = 139), age, and sex distribution (page 23, lines 502-503):

      “The final analyzed sample for the resting-state consisted of N = 139 healthy participants (mean age = 29.1±3.5 years, 77 female).”

      Furthermore, to ensure full transparency regarding data retention, we have added a new figure (Figure S9) illustrating the distribution of valid fMRI runs per participant following our quality-control procedures. We believe these additions provide a clear and complete overview of the study sample.

      References

      Aston-Jones, G., & Cohen, J. D. (2005). An integrative theory of locus coeruleus-norepinephrine function: Adaptive gain and optimal performance. Annual Review of Neuroscience, 28, 403–450. https://doi.org/10.1146/annurev.neuro.28.061604.135709

      Bolt, T., Wang, S., Nomi, J. S., Setton, R., Gold, B. P., deB.Frederick, B., Yeo, B. T. T., Chen, J. J., Picchioni, D., Duyn, J. H., Spreng, R. N., Keilholz, S. D., Uddin, L. Q., & Chang, C. (2025). Autonomic physiological coupling of the global fMRI signal. Nature Neuroscience, 28(6), 1327–1335. https://doi.org/10.1038/s41593-025-01945-y

      Chandler, D. J., Gao, W.-J., & Waterhouse, B. D. (2014). Heterogeneous organization of the locus coeruleus projections to prefrontal and motor cortices. Proceedings of the National Academy of Sciences, 111(18), 6816–6821. https://doi.org/10.1073/pnas.1320827111

      Chang, C., Leopold, D. A., Schölvinck, M. L., Mandelkow, H., Picchioni, D., Liu, X., Ye, F. Q., Turchi, J. N., & Duyn, J. H. (2016). Tracking brain arousal fluctuations with fMRI. Proceedings of the National Academy of Sciences, 113(16), 4518–4523. https://doi.org/10/f8ktgg

      Gonzalez-Castillo, J., Fernandez, I. S., Handwerker, D. A., & Bandettini, P. A. (2022). Ultra-slow fMRI fluctuations in the fourth ventricle as a marker of drowsiness. NeuroImage, 259, 119424. https://doi.org/10.1016/j.neuroimage.2022.119424

      Hwang, K., Bertolero, M. A., Liu, W. B., & D’Esposito, M. (2017). The Human Thalamus Is an Integrative Hub for Functional Brain Networks. The Journal of Neuroscience, 37(23), 5594–5607. https://doi.org/10.1523/JNEUROSCI.0067-17.2017

      Jordan, R. (2024). The locus coeruleus as a global model failure system. Trends in Neurosciences, 47(2), 92–105. https://doi.org/10.1016/j.tins.2023.11.006

      Lewis, L. D., Voigts, J., Flores, F. J., Schmitt, L. I., Wilson, M. A., Halassa, M. M., & Brown, E. N. (2015). Thalamic reticular nucleus induces fast and local modulation of arousal state. eLife, 4, e08760. https://doi.org/10.7554/eLife.08760

      Liu, X., De Zwart, J. A., Schölvinck, M. L., Chang, C., Ye, F. Q., Leopold, D. A., & Duyn, J. H. (2018). Subcortical evidence for a contribution of arousal to fMRI studies of brain activity. Nature Communications, 9(1), 395. https://doi.org/10.1038/s41467-017-02815-3

      Lloyd, B., De Voogd, L. D., Mäki-Marttunen, V., & Nieuwenhuis, S. (2023). Pupil size reflects activation of subcortical ascending arousal system nuclei during rest. eLife, 12, e84822. https://doi.org/10.7554/eLife.84822

      Magnin, M., Rey, M., Bastuji, H., Guillemant, P., Mauguière, F., & Garcia-Larrea, L. (2010). Thalamic deactivation at sleep onset precedes that of the cerebral cortex in humans. Proceedings of the National Academy of Sciences, 107(8), 3829–3833. https://doi.org/10.1073/pnas.0909710107

      Meissner, S. N., Bächinger, M., Kikkert, S., Imhof, J., Missura, S., Carro Dominguez, M., & Wenderoth, N. (2023). Self-regulating arousal via pupil-based biofeedback. Nature Human Behaviour, 8(1), 43–62. https://doi.org/10.1038/s41562-023-01729-z

      Müller, E. J., Munn, B., Hearne, L. J., Smith, J. B., Fulcher, B., Arnatkevičiūtė, A., Lurie, D. J., Cocchi, L., & Shine, J. M. (2020). Core and matrix thalamic sub-populations relate to spatio-temporal cortical connectivity gradients. NeuroImage, 222, 117224. https://doi.org/10.1016/j.neuroimage.2020.117224

      Salvador, S., & Chan, P. (2004). Determining the number of clusters/segments in hierarchical clustering/segmentation algorithms. 16th IEEE International Conference on Tools with Artificial Intelligence, 576–584. https://doi.org/10.1109/ICTAI.2004.50

      Schwarz, L. A., & Luo, L. (2015). Organization of the Locus Coeruleus-Norepinephrine System. Current Biology, 25(21), R1051–R1056. https://doi.org/10.1016/j.cub.2015.09.039

      Shine, J. M. (2019). Neuromodulatory Influences on Integration and Segregation in the Brain. Trends in Cognitive Sciences, 23(7), 572–583. https://doi.org/10.1016/j.tics.2019.04.002

      Shine, J. M., Lewis, L. D., Garrett, D. D., & Hwang, K. (2023). The impact of the human thalamus on brain-wide information processing. Nature Reviews Neuroscience, 24(7), 416–430. https://doi.org/10.1038/s41583-023-00701-0

      Sommer, D., & Golz, M. (2010). Evaluation of PERCLOS based current fatigue monitoring technologies. 2010 Annual International Conference of the IEEE Engineering in Medicine and Biology, 4456–4459. https://doi.org/10.1109/IEMBS.2010.5625960

      Weijs, M. L., Missura, S., Potok-Szybińska, W., Bächinger, M., Badii, B., Carro-Domínguez, M., Wenderoth, N., & Meissner, S. N. (2025). Modulating cortical excitability and cortical arousal by pupil self-regulation. Nature Communications, 16(1), 4552. https://doi.org/10.1038/s41467-025-59837-5

      Yellin, D., Berkovich-Ohana, A., & Malach, R. (2015). Coupling between pupil fluctuations and resting-state fMRI uncovers a slow build-up of antagonistic responses in the human cortex. NeuroImage, 106, 414–427. https://doi.org/10.1016/j.neuroimage.2014.11.034

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      (1) Figure 1A and B: Although a trend is evident, it does not appear that the absolute number of cNK cells at day 14 is significantly changed from day 6.5?

      We thank the reviewer for this careful observation. We had not originally performed a statistical comparison between the number of cNK cells present at gds 6.5 and 14.5. We have now conducted the appropriate statistical analysis for this dataset and found that the absolute number of cNK cells at day 14.5 is in fact significantly different from day 6.5 (p = 0.0005; unpaired t test, Mann-Whitney correction). The figure and corresponding legend have been updated to reflect this analysis. Please see Figure 1B:

      “Statistics were calculated using unpaired t tests with the Mann-Whitney correction. Error bars indicate SEM; *** p < 0.001.”

      (2) Figure 2E: The authors state, "This reduction of uterine trNK cells was accompanied by a concomitant increase in the absolute number and frequency of CD49b<sup>+</sup> Eomes<sup>+</sup> cNK cells within the pregnant uterus of TGF-βRII<sup>Ncr1Δ</sup> dams (Figure 2 D, E)." The number of cNK cells appears relatively low (visually ~1,000-1,300), and although the difference is statistically significant, its physiological relevance is unclear. More importantly, this modest increase does not correlate with the marked decrease in trNK and ILC1 populations, as cNK cells do not appear to accumulate. In my opinion, the conclusion "Collectively, these findings indicate that a TGF-β-driven differentiation pathway directs the conversion of peripheral cNK cells into uterine trNK cells during murine pregnancy" should be slightly toned down.

      We thank both reviewers for this suggestion. Regarding the absence of cNK cell accumulation in the absence of TGF-β signaling, we suggest that this may be related to the normal passage of cNK cells circulating in the placenta, i.e., these cells may not have acquired the signals needed to remain in the uterus and simply continue to pass through without accumulating. Nonetheless, we have rephrased our wording to address this concern as follows:

      “This reduction of uterine trNK cells was accompanied by a small increase in the absolute number and frequency of CD49b<sup>+</sup> Eomes<sup>+</sup> cNK cells within the pregnant uterus of TGF-βRII<sup>Ncr1∆</sup> dams (Figure 2 D, E). Collectively, these findings suggest that a TGF-β–driven differentiation pathway directs the conversion of peripheral cNK cells into uterine trNK cells during murine pregnancy.”

      “The absence of cNK cell accumulation in the gravid uterus in the setting of impaired TGF-β signaling suggests a defect in tissue retention rather than recruitment. In the absence of TGF-β–mediated cues, circulating cNK cells that enter the uterine vasculature may fail to acquire the molecular programs required for residency and instead continue to transit through the tissue. This is consistent with a model in which TGF-β signaling promotes not only phenotypic conversion but also the acquisition of retention signals necessary for persistence within the uterine microenvironment, reinforcing that acquisition of tissue-residency in the gravid uterus is an actively instructed process [29,32].”

      (3) Figures 2-4: It is unclear whether the littermate controls are floxed mice or floxhet-Ncr1iCre mice? This distinction is important, as Ncr1iCre expression itself could potentially lead to a phenotype.

      To address this concern, we characterized the uterine innate lymphoid cell compartment in the pregnant uterus of Ncr1<sup>iCre</sup> dams at gestational day 6.5. We did not observe a difference in the absolute number or frequency of trNK cells, cNK cells, and ILC1s in the gravid uterus of Ncr1<sup>iCre</sup> dams compared to wildtype CD45.1 C57BL/6 mice. Additionally, the number of implantation sites and resorption rates in Ncr1<sup>iCre</sup> dams were comparable to those in wildtype CD45.1 C57BL/6 mice. Together, these data indicate that Ncr1<sup>iCre</sup> expression itself does not influence the phenotype we report in TGF-βRII<sup>Ncr1∆</sup> dams. These additional findings have been included in Supplementary Figure 1 and in the text as follows:

      “To ensure we exclude a confounding effect of Ncr1<sup>iCre</sup> expression, we profiled the uterine innate lymphoid compartment in pregnant Ncr1<sup>iCre</sup> dams at gestational day 6.5. No differences were observed in the absolute number of trNK cells, cNK cells, or ILC1s relative to wildtype controls (Figure S1 A-D), and implantation site number and resorption rates were likewise unchanged (Figure S1 E-F). These data indicate that Ncr1<sup>iCre</sup> expression alone does not perturb uterine ILC composition or early pregnancy outcomes.”

      Reviewer #1 (Recommendations for the authors):

      (1) Figure 1C & D: The adoptive transfer experiment is convincing. As a minor point, why is the gate setting for Eomes different between panels 1C and 1D?

      To clarify the phenotype of the adoptively transferred cNK cells, we included two additional gates depicting the expression of CD49a and CD49b in unlabeled (non-vascular) trNK cells and cNK cells in the pregnant uterus. Please see the revised Figure 1C and the revised figure legend:

      “(C) Concatenated flow plots of implantation sites showing that adoptively transferred cNK cells in the pregnant uterus of wildtype dams upregulate CD49a and downregulate CD49b by gd 10.5, acquiring a CD49a<sup>+</sup> CD49b<sup>-</sup> Eomes<sup>+</sup> phenotype characteristic of uterine trNK cells (C57BL/6 dams n=4). Here, 2.5x10<sup>6</sup> CD45.2<sup>+</sup> CD3<sup>-</sup> CD19<sup>-</sup> NK1.1<sup>+</sup> NKp46<sup>+</sup> CD49b<sup>+</sup> splenic cNK cells were adoptively transferred into pregnant C57BL/6-CD45.1 dams at gd 0.5, and the receptor profile of these cells was subsequently assessed at gd 10.5. Gating strategy: Live, Single Cells; CD3<sup>-</sup> CD19<sup>-</sup> CD45.1<sup>-</sup> CD45.2–PE-Cy7<sup>-</sup> CD45.2–PE<sup>+</sup> NK1.1<sup>+</sup> NKp46<sup>+</sup> cells.”

      (2) Figure 3: Has the pup ratio male/female changed?

      We did not observe a statistically significant difference in the female-to-male pup ratio between groups.

      Reviewer #2 (Public review):

      (1) The authors suggest cNK extravasation and local differentiation into iv- trNK. Can it be estimated how much this process contributes to the trNK pool vs. a potential local proliferation of already existing trNK? How do absolute numbers of CD49a+ Eomes+ trNK change during pregnancies? (In Figure 1A, the cell numbers of CD49a+ Eomes+ trNK seem to go down dramatically between gd 6.5 and 14.5). The plot in 1B could also include absolute numbers of ILC1s and trNKs. Would recruited cNK cells compensate for a potential loss of CD49a+ Eomes+ trNK?

      Our prior work, as well as that of others, has tracked the changes in uterine trNK cells, cNK cells, and ILC1s over the course of murine pregnancy. Consistent with these studies, the absolute number of uterine CD49a<sup>+</sup> Eomes<sup>+</sup> trNK cells peaks during early pregnancy (roughly between gds 5.5 and 7.5) and subsequently declines until term. The decrease in uterine trNK cells between gd 6.5 and gd 14.5 observed in Figure 1A is therefore consistent with the known physiological contraction of the decidual NK compartment as pregnancy progresses. Thus, it is unlikely that cNK cells recruited within the uterine tissue compensate for the loss of CD49a<sup>+</sup> Eomes<sup>+</sup> trNK cells observed. To address the reviewer’s request, we have now included the absolute number of uterine trNK cells and ILC1s in Figure 1; please see the updated Figure 1C and D and the corresponding figure legend (provided below). With respect to the relative contribution of cNK cell extravasation versus local proliferation of trNK cells, our data do not allow us to quantitatively distinguish between these mechanisms. Moreover, previous studies have demonstrated that uterine trNK cells express Ki67, suggesting that they exhibit proliferative activity during this period. Thus, we hypothesize that both local proliferation of existing trNK cells and recruitment of circulating cNK cells contribute to the population of uterine trNK cells during early pregnancy.

      “(C) Concatenated flow plots of implantation sites showing that adoptively transferred cNK cells in the pregnant uterus of wildtype dams upregulate CD49a and downregulate CD49b by gd 10.5, acquiring a CD49a<sup>+</sup> CD49b<sup>-</sup> Eomes<sup>+</sup> phenotype characteristic of uterine trNK cells (C57BL/6 dams n=4). Here, 2.5x10<sup>6</sup> CD45.2<sup>+</sup> CD3<sup>-</sup> CD19<sup>-</sup> NK1.1<sup>+</sup> NKp46<sup>+</sup> CD49b<sup>+</sup> splenic cNK cells were adoptively transferred into pregnant C57BL/6-CD45.1 dams at gd 0.5, and the receptor profile of these cells was subsequently assessed at gd 10.5. Gating strategy: Live, Single Cells; CD3<sup>-</sup> CD19<sup>-</sup> CD45.1<sup>-</sup> CD45.2–PE-Cy7<sup>-</sup> CD45.2–PE<sup>+</sup> NK1.1<sup>+</sup> NKp46<sup>+</sup> cells. (D) Proportion of uterine ILC subsets derived from adoptively transferred splenic cNK cells in the pregnant uterus of wildtype dams. Statistics were calculated using unpaired t tests with the Mann-Whitney correction. Error bars indicate SEM; ***p < 0.001.”

      Barahona, J.D., Yang, L. and Yokoyama, W.M., 2025. Eomesodermin defines uterine NK cells crucial for pregnancy success in mice. The Journal of Immunology, 214(10), pp.2549-2556.

      Filipovic, I., Chiossone, L., Vacca, P., Hamilton, R.S., Ingegnere, T., Doisne, J.M., Hawkes, D.A., Mingari, M.C., Sharkey, A.M., Moretta, L. and Colucci, F., 2018. Molecular definition of group 1 innate lymphoid cells in the mouse uterus. Nature Communications, 9(1), p.4492.

      (2) Figure 1C: 2.5 Mio cNK cells have been transferred, but only very few cells can be detected within the uterus (concatenated FACS plot shown). What may represent the limit to generate uterine trNK out of cNK? Is the niche supporting cNK-trNK differentiation limited? Is it only a specific subset of (splenic) cNK capable of differentiating into trNK? Is gd 0.5 the optimal timepoint for the transfer? Is there continuous recruitment of cNK into the uterus and differentiation into trNK, or is it enhanced at specific timepoints of pregnancy? Could there be local proliferation of cNK-derived trNK? This could be studied by proliferation dye dilution of WT cNK cells in this transfer-setup.

      We recognize that transferring cNK cells at gestational day 0.5, prior to placental formation, may partially account for the low uterine reconstitution observed. At this time point, the local signals necessary for efficient recruitment and retention of cNK cells in the uterus may not yet be fully established, potentially resulting in preferential homing to peripheral tissues such as the spleen and liver. Consistent with this possibility, we do observe a robust population of adoptively transferred cNK cells in the spleen and liver of our pregnant dams. We decided to transfer cNK cells at gestational day 0.5 to ensure that the cells were present throughout most of early pregnancy, particularly during implantation and the initial stages of decidualization. We also did not transfer cells before mating to minimize the number of mice that did not get pregnant. Additionally, performing the transfer at this early time point minimized repeated manipulation of pregnant dams, as procedural stress itself has been shown to affect physiological processes of gestation and could thereby confound the pregnancy outcomes we were assessing. Furthermore, Filipovic et al. (2018) previously showed that both trNK cells and cNK cells in the pregnant uterus expressed Ki67 at gestational day 9.5, suggesting that there could be local proliferation of cNK-derived trNK cells in the gravid uterus that could limit the migration of circulating cNK cells into this microenvironment. We have discussed this in more depth in our Discussion section as follows:

      “Interestingly, the inability to fully reconstitute the uterine trNK cell compartment following adoptive transfer suggests that only a subset of circulating cNK cells may be capable of differentiating into trNK cells during pregnancy, or alternatively that trNK cells already present in the virgin uterus may undergo in situ proliferation in the gravid uterus. Previous studies from our lab as well as others show that trNK cells within the pregnant murine uterus express marked levels of Ki67, supporting a model in which local proliferation of uterine trNK cells is a major contributor to the uterine trNK cell pool during pregnancy [7,32]. Prior studies have also described hematopoietic precursors within endometrial and decidual tissues that generate uterine trNK cells, suggesting that the compartment may also be sustained by local precursor differentiation [33-35]. Together, these findings suggest that uterine trNK cell ontogeny may be more complex than a single-source model and raise the possibility that distinct developmental pathways may operate at different stages of reproductive life. Therefore, defining the relative contribution and developmental timing of hematogenous versus locally maintained sources in vivo could provide relevant insights into the developmental trajectories and transcriptional programs that underlie decidual NK cell heterogeneity.”

      Zhai, Q.Y., Wang, J.J., Tian, Y., Liu, X. and Song, Z., 2020. Review of psychological stress on oocyte and early embryonic development in female mice. Reproductive Biology and Endocrinology, 18(1), p.101.

      Wiebold, J.L., Stanfield, P.H., Becker, W.C. and Hillers, J.K., 1986. The effect of restraint stress in early pregnancy in mice. Reproduction, 78(1), pp.185-192.

      Sánchez-Rubio, M., Abarzúa-Catalán, L., Del Valle, A., Méndez-Ruette, M., Salazar, N., Sigala, J., Sandoval, S., Godoy, M.I., Luarte, A., Monteiro, L.J. and Romero, R., 2024. Maternal stress during pregnancy alters circulating small extracellular vesicles and enhances their targeting to the placenta and fetus. Biological Research, 57(1), p.70.

      Filipovic, I., Chiossone, L., Vacca, P., Hamilton, R.S., Ingegnere, T., Doisne, J.M., Hawkes, D.A., Mingari, M.C., Sharkey, A.M., Moretta, L. and Colucci, F., 2018. Molecular definition of group 1 innate lymphoid cells in the mouse uterus. Nature Communications, 9(1), p.4492.

      (3) The authors should consider inducible Tgfbr2 deletion (e.g. with Tamoxifen-inducible Cre) to enable development of the uterine NK compartment in virgin mice and only ablate trNK differentiation during pregnancy. This could help to estimate the turnover of cNK into trNK, or to understand if constant cNK recruitment is required to form the uterine trNK compartment during pregnancy.

      Thank you for this suggestion. We did initially consider incorporating a mouse model with a tamoxifen-inducible deletion of TGF-βRII to examine the differentiation of peripheral cNK cells into uterine trNK cells more precisely. However, the administration of tamoxifen during murine pregnancy has well-established deleterious effects on implantation, fetal viability, and placentation, which would confound our interpretation of any adverse pregnancy outcome observed in our studies. Because our goal was to assess NK cell-specific contributions to murine gestation without introducing additional pregnancy-related perturbations, we elected to use an Ncr1<sup>iCre</sup>-based mouse model in our studies.

      Ved, N., Curran, A., Ashcroft, F.M. and Sparrow, D.B., 2019. Tamoxifen administration in pregnant mice can be deleterious to both mother and embryo. Laboratory animals, 53(6), pp.630-633.

      Sun, M.R., Steward, A.C., Sweet, E.A., Martin, A.A. and Lipinski, R.J., 2021. Developmental malformations resulting from high-dose maternal tamoxifen exposure in the mouse. PLoS One, 16(8), p.e0256299.

      Ilchuk, L.A., Stavskaya, N.I., Varlamova, E.A., Khamidullina, A.I., Tatarskiy, V.V., Mogila, V.A., Kolbutova, K.B., Bogdan, S.A., Sheremetov, A.M., Baulin, A.N. and Filatova, I.A., 2022. Limitations of tamoxifen application for in vivo genome editing using Cre/ERT2 system. International Journal of Molecular Sciences, 23(22), p.14077.

      (4) Did the authors consider transfer of Tgfbr2-floxed Ncr1-Cre cNK in the same setup as in Fig. 1C? This experiment could confirm the requirement of Tgfbr-dependent signaling for cNK to trNK conversion during pregnancy versus effects of Tgfb signals on trNK numbers in the uterus at steady state (before pregnancy).

      We thank the reviewer for this mechanistically insightful suggestion. We did consider performing reciprocal transfer experiments using TGF-βRII<sup>fl/fl</sup> Ncr1<sup>iCre</sup> cNK cells in the same adoptive transfer system as in Figure 1C. However, our current adoptive transfer experiments already directly address this question. Transfer of congenically labeled wild-type splenic cNK cells into TGF-βRII<sup>Ncr1Δ</sup> dams at gestational day 0.5 resulted in partial reconstitution of the uterine trNK compartment and, importantly, this was sufficient to rescue the adverse pregnancy outcomes observed at midgestation. These findings indicate that TGF-β–competent cNK cells can differentiate and function appropriately within the pregnant uterine environment, supporting a requirement for TGF-β–dependent signaling in cNK-to-trNK conversion during pregnancy. Because restoration of TGF-β–sufficient cNK cells rescues these pregnancy outcomes, we believe this experiment functionally demonstrates the importance of TGF-β signaling in this process and therefore did not pursue reciprocal transfer of TGF-βRII–deficient cNK cells.

      “Partial reconstitution of uterine trNK cells restores midgestational pregnancy outcomes in TGF-βRII<sup>Ncr1∆</sup> dams

      To determine whether restoring uterine trNK cells could rescue the midgestational pregnancy defects observed in TGF-βRII<sup>Ncr1∆</sup> dams, we adoptively transferred wildtype, congenically labeled splenic cNK cells into pregnant TGF-βRII<sup>Ncr1∆</sup> dams at gd 0.5. By gd 10.5, donor cNK cells were detected in the pregnant uterus, where a subset upregulated CD49a and downregulated CD49b, consistent with acquisition of a uterine trNK cell phenotype (Figure 5 A). However, adoptively transferred splenic cNK cells only partially reconstituted the uterine trNK cell population in the gravid uterus of TGF-βRII<sup>Ncr1∆</sup> dams, as evidenced by reduced absolute number and frequency of donor-derived trNK cells in reconstituted TGF-βRII<sup>Ncr1∆</sup> dams (Figure 5 A-C). Notably, this partial reconstitution was sufficient to rescue the gestational defects caused by impaired TGF-β–mediated uterine trNK cell differentiation. Reconstituted TGF-βRII<sup>Ncr1∆</sup> dams exhibited implantation site numbers and fetal resorption rates at gd 10.5 comparable to those observed in littermate controls (Figure 5 D, E). Together, these findings suggest that even partial restoration of the uterine trNK cell compartment in pregnant TGF-βRII<sup>Ncr1∆</sup> dams is sufficient to restore pregnancy outcomes at midgestation, supporting a central role for uterine trNK cells as the principal NK cell subset required for successful murine pregnancy.”

      (5) Figures 2D/E: The authors should state that ILC1s are reduced in the virgin uterus of female Tgfbr2-floxed or Tgfb1-floxed Ncr1-Cre mice and cite the relevant work (the Ref #29 discussed in this context did not show that?). It would be helpful to include an analysis of all three uterine ILC subsets in steady state. This could help to answer the question if the cNK cell changes are pregnancy-specific or a general phenomenon in Tgfbr2-floxed Ncr1-Cre mice.

      We thank the reviewer for this important comment and for noting the miscitation. We regret the error and have corrected the reference in the revised manuscript to cite the appropriate study demonstrating reduced ILC1s in the virgin uterus of Tgfb1<sup>fl/fl</sup> Ncr1<sup>iCre</sup> mice {Sparano, C. et al. 2024. Autocrine TGF-β1 drives tissue-specific differentiation and function of resident NK cells. Journal of Experimental Medicine, 222(3), p.e20240930}. Please see Line 148. Importantly, the steady-state ILC compartment in virgin Tgfb1<sup>fl/fl</sup> Ncr1<sup>iCre</sup> mice has already been carefully characterized in the previously published work, including analysis of all three uterine ILC subsets. Because the steady-state uterine ILC landscape in this mouse model has already been established by Sparano, C. et al. 2024, our study focuses specifically on the pregnancy-associated changes in the uterine ILC landscape occurring in the absence of TGF-β signaling in Ncr1-expressing cells and their subsequent effects on gestational outcomes. In the absence of TGF-β signaling there appears to be a higher frequency of cNK cells in both the virgin uterus and pregnant uterus, suggesting that this is more of a general phenomenon.

      “However, in the pregnant uterus, CD49a<sup>+</sup> Eomes<sup>-</sup> ILC1s were markedly reduced in implantation sites of TGF-βRII<sup>Ncr1∆</sup> dams, paralleling the reduction of ILC1s previously reported in the virgin uterus of TGF-βRII<sup>Ncr1∆</sup> female mice [26].”

      (6) Figure 2E: Please phrase more carefully about the "concomitant increase" of cNKs, since this increase is much less pronounced compared to the very strong reduction (absence) of trNKs in Tgfbr2-floxed Ncr1-Cre mice. Do the authors suggest that cNKs are halted at this stage and cannot differentiate into trNK, based on these data?

      We thank both reviewers for this suggestion, and we have rephrased our wording to address this concern as follows:

      “This reduction of uterine trNK cells was accompanied by a small increase in the absolute number and frequency of CD49b<sup>+</sup> Eomes<sup>+</sup> cNK cells within the pregnant uterus of TGF-βRII<sup>Ncr1∆</sup> dams (Figure 2 D, E). Collectively, these findings suggest that a TGF-β–driven differentiation pathway directs the conversion of peripheral cNK cells into uterine trNK cells during murine pregnancy.”

      Please also see our response to Reviewer #1, Comment #2.

      (7) Can the reduced litter size and the abnormal spiral artery formation be rescued by transfer of WT cNK into Tgfbr2-floxed Ncr1-Cre mice?

      We thank the reviewers for this interesting question. In subsequent experiments, we transferred congenically labeled splenic cNK cells from wildtype female mice into TGF-βRII<sup>Ncr1∆</sup> dams at gestational day 0.5. We observed only partial reconstitution of the uterine trNK cell population; however, the number of viable implantation sites and the resorption rates in reconstituted TGF-βRII<sup>Ncr1∆</sup> dams were comparable to those in HBSS-treated littermate controls at gestational day 10.5. Given that partial reconstitution of the uterine trNK cell compartment in reconstituted TGF-βRII<sup>Ncr1∆</sup> dams was sufficient to rescue the defects in implantation site number and fetal resorption rates observed at midgestation, we hypothesize that this level of restoration may permit partial but functionally sufficient spiral artery remodeling to reestablish maternal-fetal blood flow adequate to support fetal viability, although spiral artery remodeling was not directly assessed in this transfer study.

      “Partial reconstitution of uterine trNK cells restores midgestational pregnancy outcomes in TGF-βRII<sup>Ncr1∆</sup> dams

      To determine whether restoring uterine trNK cells could rescue the midgestational pregnancy defects observed in TGF-βRII<sup>Ncr1∆</sup> dams, we adoptively transferred wildtype, congenically labeled splenic cNK cells into pregnant TGF-βRII<sup>Ncr1∆</sup> dams at gd 0.5. By gd 10.5, donor cNK cells were detected in the pregnant uterus, where a subset upregulated CD49a and downregulated CD49b, consistent with acquisition of a uterine trNK cell phenotype (Figure 5 A). However, adoptively transferred splenic cNK cells only partially reconstituted the uterine trNK cell population in the gravid uterus of TGF-βRII<sup>Ncr1∆</sup> dams, as evidenced by reduced absolute number and frequency of donor-derived trNK cells in reconstituted TGF-βRII<sup>Ncr1∆</sup> dams (Figure 5 A-C). Notably, this partial reconstitution was sufficient to rescue the gestational defects caused by impaired TGF-β–mediated uterine trNK cell differentiation. Reconstituted TGF-βRII<sup>Ncr1∆</sup> dams exhibited implantation site numbers and fetal resorption rates at gd 10.5 comparable to those observed in littermate controls (Figure 5 D, E). Together, these findings suggest that even partial restoration of the uterine trNK cell population in pregnant TGF-βRII<sup>Ncr1∆</sup> dams is sufficient to restore pregnancy outcomes at midgestation, supporting a central role for uterine trNK cells as the principal NK cell subset required for successful murine pregnancy.”

      Reviewer #2 (Recommendations for the authors):

      (1) Figure 1C: The shown gate seems to "cut" into the CD49b staining; staining for all transferred cells should be shown; have cNK cells been stained in parallel with the same panel to provide a positive and compensation control?

      To clarify the phenotype of the adoptively transferred cNK cells, we included two additional gates depicting the expression of CD49a and CD49b in unlabeled (non-vascular) trNK cells and cNK cells in the pregnant uterus. Please see the revised Figure 1C.

      “(C) Concatenated flow plots of implantation sites showing that adoptively transferred cNK cells in the pregnant uterus of wildtype dams upregulate CD49a and downregulate CD49b by gd 10.5, acquiring a CD49a<sup>+</sup> CD49b<sup>-</sup> Eomes<sup>+</sup> phenotype characteristic of uterine trNK cells (C57BL/6 dams n=4). Here, 2.5x10<sup>6</sup> CD45.2<sup>+</sup> CD3<sup>-</sup> CD19<sup>-</sup> NK1.1<sup>+</sup> NKp46<sup>+</sup> CD49b<sup>+</sup> splenic cNK cells were adoptively transferred into pregnant C57BL/6-CD45.1 dams at gd 0.5, and the receptor profile of these cells was subsequently assessed at gd 10.5. Gating strategy: Live, Single Cells; CD3<sup>-</sup> CD19<sup>-</sup> CD45.1<sup>-</sup> CD45.2–PE-Cy7<sup>-</sup> CD45.2–PE<sup>+</sup> NK1.1<sup>+</sup> NKp46<sup>+</sup> cells.”

      (2) Figure 2A: The authors could include an isotype control or a staining in a genetic knockout as a control staining.

      Thank you for this suggestion. As suggested, we included staining in a genetic TGF-βRII<sup>Ncr1∆</sup> knockout as additional control staining. Please see the revised Figure 2A.

      “Representative histograms depicting TGF-β Receptor II expression on splenic NK cells from virgin TGF-βRII<sup>Ncr1∆</sup> and wildtype mice as well as splenic and uterine NK cell subsets from pregnant wildtype mice at gd 10.5 (virgin TGF-βRII<sup>Ncr1∆</sup> mice, n=2; virgin mice: C57BL/6, n=5; gd 10.5: C57BL/6 dams, n=8, implantation sites n=8). MFI, median fluorescent intensity. Gating strategy: Live, Single Cells; CD3<sup>-</sup> CD19<sup>-</sup> CD45.1<sup>-</sup> CD45.2<sup>+</sup> NK1.1<sup>+</sup> NKp46<sup>+</sup> cells.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors generated mouse and zebrafish models for DeSanto-Shinawi Syndrome, caused by loss-of-function variants in the WAC gene. Using these vertebrate systems, they demonstrate conserved craniofacial and social-behavioral phenotypes that parallel human clinical features, along with deficits in GABAergic markers. They observe increased seizure susceptibility and male-biased brain volumetric changes in Wac mutant mice. Together, these findings begin to define the biological consequences of Wac haploinsufficiency and provide valuable resources for future mechanistic studies.

      Strengths:

      WAC is a high-confidence neurodevelopmental disorder gene and one of the genes identified by large-scale exome sequencing efforts, including the Satterstrom et al. (2020) autism spectrum disorder cohort. This study establishes the first vertebrate Wac models, addressing a major gap in the understanding of DeSanto-Shinawi Syndrome, and provides a framework for studying other syndromic forms of autism. The models generated will be impactful and useful to the community to study and understand DeSanto-Shinawi Syndrome.

      The cross-species analysis is important and well executed, and reveals both conserved and divergent phenotypes. The behavioral and anatomical assays are rigorously executed and well-controlled, and the inclusion of RNA-sequencing analyses adds valuable insights into the mechanisms underlying brain function in Wac mutants. Notably, the RNA-seq data reveal upregulation of several clustered protocadherins, genes central to neuronal identity and cell-cell interactions, which are known to be regulated by dynamic developmental regulation of chromatin architecture. This observation provides an intriguing hint that could link Wac function to higher-order chromatin organization and neuronal connectivity.

      Weaknesses:

      The evidence is solid, but the study remains incomplete in its mechanistic depth and molecular interpretation. The authors compellingly describe behavioral, anatomical, and transcriptomic phenotypes associated with WAC loss, yet do not explore how WAC mechanistically regulates chromatin or transcription. Given prior evidence that WAC interacts with the RNF20/40 ubiquitin ligase complex and promotes histone H2B ubiquitination and transcriptional elongation, the paper would benefit from a discussion of these functions as a potential link between Wac haploinsufficiency and the observed changes in neuronal gene expression. Similarly, the authors mention WAC's WW and coiled-coil domains but do not consider how these domains could mediate nuclear interactions or recruitment of transcriptional cofactors that shape gene regulation and chromatin organization in neurons.

      We agree that many of the mechanisms by which Wac loss produces the animal model phenotypes and human symptoms still need to be worked out. Because a great deal of data was needed to first describe these models, this manuscript does not address those mechanisms; we plan to follow up with mechanistic papers to fill this gap. We have now added a paragraph to the discussion that raises these important points regarding the roles of Wac during transcription and how its protein domains might be involved in these processes.

      The transcriptomic analysis is rich but largely descriptive. Although the upregulation of clustered protocadherins is particularly intriguing, these findings are not validated or localized to specific neuronal populations. The study would be strengthened by independently validating the most significant RNA-seq changes, such as protocadherin gamma genes, using in situ hybridization methods to confirm the spatial and cellular specificity of expression changes.

      We have greatly expanded the analyses of the bulk RNA-seq data, including a more rigorous look into the differences in gene expression between sexes, which has additionally revealed males to be more impacted by Wac loss of function. We have also added new western blot data for pan protocadherin alpha, which is now validated to be upregulated in the cortex (new Figure 7I and 7J). We are holding back any additional data from this report as we have single nucleus RNA-seq data that will be reported on in follow-up papers with targeted conditional deletion models.

      Finally, while the behavioral and MRI results add valuable breadth, their interpretation would be improved by clearer reporting of sample sizes, statistical corrections, and effect sizes to support claims of sex-specific and regional brain volume differences.

      Some additional details have been added to the methods section. In addition, we have now provided sample sizes assessed in each figure legend.

      Reviewer #2 (Public review):

      The authors describe the first deep neurological characterization of WAC mutation in two vertebrate species (zebrafish and mouse). They examine these at various levels, guided by the work in humans that has associated a heterozygous WAC mutation with DeSanto-Shinawi Syndrome (DESSH). Therefore, they investigate the animals for a variety of phenotypes, following a template for what is seen when characterizing a new mouse/fish model of a developmental disability gene. Investigations include analysis of skull and jaw for abnormalities (both species), MRI of brain structure (in mice), electrophysiology (mice), assessment of signaling pathways (by Western blot, in mice), cell counts (both, more in mice), transcriptomics (mice), and behavior (both).

      Generally, this describes an important first characterization of the consequences of the mutation. Most of the studies appear well-conducted and reasonably powered, thus solid or convincing. However, there are a few places where the data presentation could be improved for clarity, and a few concerns about some choices in analytical approach for a couple of the experiments, where improved statistical approaches could improve their sensitivity and/or better rule out false positives, and thus the support of some of these claims is currently incomplete. There is also some lack of clarity about the rationale for some decisions regarding the fish genetics. Nonetheless, this is an important and useful first characterization of many phenotypes of these lines. Such experiments form a baseline for future mechanistic studies in the same lines and a platform to test approaches to reverse phenotypes.

      Individual claims and their strength & weaknesses:

      (1) The authors developed mouse and zebrafish models of WAC deletion

      They used the existing KOMP floxed WAC line to generate a null allele. For the mouse, there is a Western showing that it is indeed null for the protein. The fish data is less robustly validated - they don't confirm the allele is null at the protein or RNA level, and fish have two paralogs (waca and wacb), and this paper only characterizes one of these. So this evidence is less clear. The evaluated mice are heterozygous (Het), similar to patients, while the fish appear to be evaluated as homozygous mutants.

      We agree with the reviewer’s comments on zebrafish genetics. Since antibodies against zebrafish Wac proteins are not available, we could not examine protein levels in zebrafish. We predicted frameshift mutations based on DNA analyses in waca and wacb KO zebrafish. We made waca KO, wacb KO, and waca/wacb double KO zebrafish. The waca/wacb double KO zebrafish showed a lethal phenotype, similar to homozygous mouse mutants. Since wacb KO zebrafish did not show any detectable phenotype, we do not report those here. However, we now show examples of the wacb and dKO zebrafish in Figure S1. Since waca KO zebrafish showed craniofacial and behavioral phenotypes that are comparable to those of Het mice and human patients, they are the focus of this report.

      (2) The authors show that both species show altered craniofacial features

      These data appear well powered, and the findings are robust.

      We appreciate this confirmation.

      (3) Each model altered GABAergic neurons

      In mice, the authors stained with PV antibodies and saw a decrease in cells positive for this staining. A second marker, Lhx6, does not show a difference, suggesting this might be a change in PV expression rather than cell number. They could maybe look into the literature to see if this loss of just the protein also occurs in other models. Overall, the sample size here is a bit smaller than other parts of the paper (n=3), and the methods on the cell counts were less clear, so it is not as clear that this finding is as robust. The authors counted several other broad classes of cells, and those appear normal. Interestingly, there might also be some TBR1 mislocalization in layer 6 that might be significant with added power.

      Thank you for these suggestions. Yes, other models also show this lack of PV expression even when MGE-lineage interneurons are present at normal levels. We mention in the discussion a previous study on the ASD gene CNTNAP2 that showed this. We also agree that there is a trend in the Tbr1 population. We assessed another WT and Het pair for Tbr1 laminar distribution and were able to determine that these changes held up and are now significantly different; the person counting was blind to the genotypes. Finally, we added more details to the methods to describe how the counting was performed.

      The fish data is based on an in situ hybridization for GAD. The measure shown is the width of the positive area in the forebrain. This measure is not one I have seen much before, and has potential to be driven by something unrelated to GABA (e.g., if the whole forebrain were simply a bit smaller). So this analysis could use a couple of other approaches (density of signal?) and/or a control probe for some other brain gene showing the measure is normal, and thus it is not just a size issue.

      To compare altered GABAergic neurons in mice and zebrafish, we tried to isolate zebrafish PV genes and examined their expression by whole-mount in situ hybridization (now included in Figure S3), but found no differences and could not identify any zebrafish PV gene that was useful as a marker for GABAergic neurons. We therefore chose to examine the gad1b-positive area of the forebrain in WT and waca KO zebrafish and found differences in the brain area expressing gad1b. Since WT and waca KO brain sizes are generally the same, we believe this measurement is a reasonable basis for this conclusion and have added text to the results section to justify it.

      (4) Mice were more susceptible to the seizure-inducing agent PTZ

      These data appear well powered, and the findings are robust. The authors also did a fair amount of useful electrophysiology that was all normal, but appeared to be well executed.

      Thank you, we appreciate this confirmation.

      (5) Mice had changes in brain volume that interact with sex

      The authors conducted an MRI on a good number of mice and reported a slight increase in global volume just in males. Sample size is fair, but the statistical approach here may be better if it puts males and females in the same model (to boost power and explicitly test for sex by genotype interaction that they report), and there is some chance that the brain region level differences that they report could include some false positives. They tested many regions, and it is not clear whether or not they corrected for the number of tests. Often, an FDR correction would be used in such imaging studies. It may be that only the most robust regional findings will survive those corrections. It is interesting data either way, but the analysis could be improved.

      Given the 80 regions (bilaterally) that we used and the number of mice, i.e., 6-7, we are underpowered to robustly undertake FDR-type corrections. In the data presented, we used t-tests between sex and regions to illuminate putative regional changes. However, we did revisit our MRI data and found three data sets in which the results were not normally distributed. We thus changed our statistical test to Mann-Whitney for male retrosplenial cortex, male parietal cortex, and female corpus callosum; these changes are now reflected in the figures, and the differing statistics are noted in the figure legends.
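For readers unfamiliar with the FDR correction discussed in this exchange, the following is a minimal sketch of the Benjamini-Hochberg adjustment. It is not the authors' analysis: the function name `benjamini_hochberg` and the five p-values are hypothetical and serve only to illustrate how per-region p-values would be adjusted.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values for five brain regions (illustrative only).
example_p = [0.001, 0.008, 0.039, 0.041, 0.60]
example_q = benjamini_hochberg(example_p)
```

With many regions and few animals, only the smallest raw p-values remain below a chosen threshold after this adjustment, which is the trade-off both the reviewer and the authors describe.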

      (6) Several behaviors are altered in the mice as well

      These studies were fairly well-powered (n=15,16), and they found several positive and negative results, including alterations in memory and sociability in both species. There is a minor statistical flaw in the three-chamber analysis (they don't actually compare the Hets directly to the wildtypes in their statistical testing - a common mistake in neuroscience that should be addressed). But the data look like they will probably still be significant when correctly analyzed. In the supplement, the authors could do a bit more with the data they have to look at hyperactivity (i.e., show total motion in open field, not just time in center vs. periphery), and adding sex to their model might improve sensitivity for genotype effects.

      Thank you for these suggestions. We have done several things to address this behavioral paradigm. First, we increased the sample size and switched from comparing mouse vs. object within a genotype to comparing the genotypes directly. In addition, we switched to quantifying a discrimination index, described in Phillips et al., 2019 (PMID: 31112129), for our measurement. These new data are shown in Figure 3A. Open field total distance traveled has now been added to Figure S2A. For all other measurements, we first assessed for sex differences but found none and thus pooled both sexes for the graphs.

      (7) Some biochemical signaling pathways are altered in the brain

      These are n=4 immunoblots, and show altered phospho ERK, but no changes in other signaling events predicted from prior WAC literature like H2B ubiquitination. They appear well done, and the authors share the full blots in the supplement.

      Thank you, we appreciate this confirmation. Since Wac is an adaptor protein, we needed to test in neurons these reported molecular changes, which were previously reported only in cell lines and Drosophila. We were not surprised that some of these previously reported changes were not the same in brain cells. However, it is possible that these changes might arise in more discrete brain regions or at different times during development, which will be tested in our future conditional knockout models.

      (8) WAC deletion also alters gene expression in the brain

      These studies were well-powered for RNAseq, with 10 and 14 samples, using neonates (P2), just the forebrain. The sequencing quality metrics all looked good, and the approach to analysis was okay. It would be stronger to again include sex in the model, rather than separate by sex. There were some typos in this part of the paper that made part of the conclusions unclear, but the RNAseq nicely confirmed the mutation of the mice, and discovered many differentially expressed genes, consistent with the role of this gene as a regulator of transcription. The presentation could be expanded to make more use of the data. Overall, though, this is a useful first characterization of the transcriptome in the line.

      Thank you for the suggestions. We have greatly expanded our assessments of the RNA-seq data. Upon analyzing the data, we found many differences between males and females and now show combined and sex-separated data. Our new data isolate several more extreme and some unique changes in males that are better shown as stand-alone figure panels. In addition to these edits, we have also reworked all the text in this section of the results for readability.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The cause and timing of lethality in the homozygous Wac knockout should be reported or discussed. Investigating Wac homozygous knockout embryos, if viable at early stages, could provide valuable insight into the developmental origins of the neuroanatomical and behavioral phenotypes described in the heterozygous animals. Even a brief histological or transcriptomic characterization of embryonic brains would strengthen the mechanistic understanding of Wac function during neurodevelopment.

      We agree and have collected embryos as early as embryonic day 12.5 from multiple litters but never detected a homozygous knockout. We have added this text to the animal methods section so that readers understand that an effort was made to determine when death occurs. While we do not currently explore this further in mice, we now include zebrafish waca; wacb double knockouts. Notably, while we were able to generate a few of these mutants, most died. However, some zebrafish survived long enough for us to observe lethal deficits in heart formation and swim bladder development, suggesting that early loss of Wac could impair these critical organs and lead to death.

      (2) A better description of the data reported in Supplementary Tables 3 through 5 is needed. Supplementary Table 3 does not report any statistically significantly differentially expressed genes in the FDR column, and Supplementary Table 5 reports only two, and the reader should understand what the columns are indicating.

      We have now added figure legend text to the supplementary file to explain each Table mentioned here.

      Reviewer #2 (Recommendations for the authors):

      (1) Page 3, last paragraph. The description of wacb is confusing. I recommend that the authors provide the unshown data they mention and also further explanation of the breeding scheme and result. Indeed, if wacb is homozygous lethal, does that make it more like the mouse WAC gene, and thus potentially the more relevant paralogue to study? Are both waca and wacb expressed in the same tissues? How does that compare to mouse and human WAC expression? Such figures about gene expression (even when adapted with permission from public resources like Allen brain atlas or GTEX) are common in this sort of paper, as they can be helpful to understand when and where the gene is thought to act. For waca vs. wacb, they may help determine which gene is more relevant to the brain (for example, if only one is expressed in the brain).

      First, this is a great question and we have now added whole-mount in situ hybridization for the waca and wacb genes as Figure S1. These data show low to no wacb expression in brain regions, while waca is highly expressed there. Since the waca mutants showed phenotypes relevant to DESSH but wacb mutants did not, this correlates with the observed expression patterns without fully excluding wacb from any role. Thus, we also made waca/wacb double KO zebrafish, which showed a lethal phenotype, similar to homozygous mouse mutants. Only a few waca; wacb double knockouts survived partway through development and are now shown in Figure S1. Since wacb KO zebrafish did not show any detectable phenotype on their own, we did not include those data, as there are already several figures/tables in this manuscript. However, the waca KO zebrafish did show phenotypes similar to humans with DESSH and are the ones we focused on.

      (2) Why did the authors cross the mice into the outbred CD1 background? Usually, most labs keep the lines on an inbred background. Was there a particular rationale here? I am not saying that they could not outcross them. It is just a bit puzzling why. Perhaps a sentence of explanation in the methods section would be warranted.

      This is a great question and we have now added text to the animal methods section. Many labs that study development, especially on genes critical for survival/life like the Wac gene, use a more robust strain like CD-1. By doing this, we have a better chance of evaluating mutants at more mature ages and getting enough progeny to do more reproducible studies.

      (3) A typical first experiment in a new knockout (fish or mouse) is to establish that the deletion does indeed result in a loss of RNA and protein. In the absence of this, the rest of the paper cannot be as confidently interpreted.

      We did this for the mouse model and found reduced protein expression in the constitutive Het; however, these data are part of the western blots in Figure 5. We now mention in the early results section that protein levels were reduced in the Hets, but maintain that the western blot is better presented in Figure 5 for comparison with the other western blots. For zebrafish this was attempted but was more difficult: available antibodies do not work in zebrafish. RNA expression was assessed in both models and, because Wac is a critical gene for life, there are checks in place to upregulate faulty and normal RNA in the waca model. We screened for frameshift mutations in multiple KO lines and confirmed them by genomic DNA sequencing. In making many KOs and in large-scale mutagenesis in zebrafish, we usually depend on phenotype-genotype segregation by Mendelian inheritance over many generations.

      (4) Are these new lines indeed knockouts? I did find a WAC western as part of a later figure for the mouse. The authors may want to mention that earlier, or present at least that data right away. What about in the fish? Is there a way to confirm at the RNA or protein level that it is indeed a null allele?

      Yes, as mentioned in the above response we have now mentioned our Wac western blot results early when introducing the mouse mutants and the issues with doing this in fish are presented above as well.

      (5) Why are fish used that are KO while mice are Hets? Are WAC homozygous mice not viable? This should be mentioned. Regardless, the rationale for examining heterozygous mice and homozygous mutant fish should be provided. Each kind of experiment is useful, but they are interpreted in different ways. Hets will genocopy the patients, who are generally hets, while KOs are often useful for a study of the essential roles of the genes, even if they are not really modeling the patient gene dose.

      Wac homozygous mice in our hands are embryonic lethal, as now mentioned in the animal methods section, but we found early on that the Hets mimic several features of human DESSH patients. In zebrafish it is more complicated. We analyzed waca and wacb hets in zebrafish but found no phenotypes. This could be due in part to some complementation between the waca and wacb genes. It is also possible that a full waca KO could resemble a human DESSH individual since wacb may complement somewhat, even though deleting wacb entirely does not produce a measurable phenotype. We have added more text to the discussion to explore these complexities. We also made waca/wacb double KO (dKO) zebrafish, but they showed a lethal phenotype, similar to homozygous mouse mutants, suggesting some complementation by the wacb gene even though its loss alone did not produce phenotypes.

      (6) Figure 3A: It does not appear that the authors are directly statistically comparing the two groups (genotypes) that they are drawing conclusions about. This is an unfortunately common mistake in the neuroscience literature across papers. There is a nice older review about it here: https://pubmed.ncbi.nlm.nih.gov/21878926/. To draw conclusions about the differences between the mouse genotypes, they need to compare the two genotypes directly with a statistical test. See Nygard et al for a recommended approach, like comparing social preference indexes (https://onlinelibrary.wiley.com/doi/abs/10.1002/aur.2154).

      Thank you for this information. Previous reviewers at a different journal asked for this particular evaluation. We have now made changes to address the assessment, and graphs now reflect comparisons of genotypes instead of a single genotype between time with a mouse or object. We have also moved to using a social discrimination index to compare the genotypes, similar to the study mentioned.
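To illustrate the kind of per-animal index this exchange refers to, here is a minimal sketch. It assumes one common definition of a discrimination index, (time with mouse − time with object) / (total investigation time); the formula in the cited study and in the manuscript may differ. The function name and the investigation times are hypothetical.

```python
from statistics import mean


def discrimination_index(time_mouse, time_object):
    """(mouse - object) / (mouse + object); ranges from -1 to 1."""
    total = time_mouse + time_object
    if total == 0:
        raise ValueError("animal spent no time investigating either stimulus")
    return (time_mouse - time_object) / total


# Hypothetical investigation times in seconds: (mouse, object) per animal.
wt_times = [(120, 60), (100, 50), (90, 30)]
het_times = [(70, 70), (60, 90), (80, 80)]

# One index per animal, so the two genotypes can be compared directly.
wt_di = [discrimination_index(m, o) for m, o in wt_times]
het_di = [discrimination_index(m, o) for m, o in het_times]
genotype_difference = mean(wt_di) - mean(het_di)
```

Because each animal contributes a single number, the two genotype groups can then be compared with one between-group test, which is the direct comparison the reviewer asks for.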

      (7) MRI - it is a bit weird to separate the male and female brains just for the MRI. Was there a premise from human data to do so? If not, the authors should probably pool them. If they are concerned there are sex effects (or, more likely, a sex by genotype interaction) I recommend that they use a two-factor ANOVA and simply put both sex and genotype into the model. This will also have the advantage of increasing their statistical power for genotype effects a bit. If their current results are robust, they will still show up as a significant sex x genotype interaction.

      All data in the manuscript were initially compared between the sexes. We have now added this text to the animal section of the methods: for MRI, some zebrafish behaviors, and now the RNA-seq data, we observed sex differences, and sex is therefore presented independently for these measurements. We now state that the data were pooled if no sex differences were observed.
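The two-factor model the reviewer recommends can be sketched as follows. This is an illustrative, stdlib-only computation of F statistics for a balanced design, not the authors' analysis; the function name, the factor labels, and the volume values are hypothetical. Unbalanced data (such as groups of 6 and 7 animals) would need a general linear-model approach instead.

```python
from statistics import mean


def two_way_anova_balanced(cells):
    """F statistics for a balanced two-factor design with interaction.

    `cells` maps (level_a, level_b) -> list of observations; every cell
    must contain the same number of observations (at least two).
    """
    levels_a = sorted({a for a, _ in cells})
    levels_b = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))
    if any(len(v) != n for v in cells.values()):
        raise ValueError("design must be balanced")
    grand = mean(x for v in cells.values() for x in v)
    mean_a = {a: mean(x for b in levels_b for x in cells[(a, b)]) for a in levels_a}
    mean_b = {b: mean(x for a in levels_a for x in cells[(a, b)]) for b in levels_b}
    # Sums of squares for each main effect, the interaction, and the residual.
    ss_a = n * len(levels_b) * sum((mean_a[a] - grand) ** 2 for a in levels_a)
    ss_b = n * len(levels_a) * sum((mean_b[b] - grand) ** 2 for b in levels_b)
    ss_ab = n * sum(
        (mean(cells[(a, b)]) - mean_a[a] - mean_b[b] + grand) ** 2
        for a in levels_a
        for b in levels_b
    )
    ss_err = sum((x - mean(v)) ** 2 for v in cells.values() for x in v)
    df_a, df_b = len(levels_a) - 1, len(levels_b) - 1
    df_err = len(levels_a) * len(levels_b) * (n - 1)
    ms_err = ss_err / df_err
    return {
        "A": (ss_a / df_a) / ms_err,
        "B": (ss_b / df_b) / ms_err,
        "AxB": (ss_ab / (df_a * df_b)) / ms_err,
        "df_error": df_err,
    }


# Hypothetical regional volumes; factor A = genotype, factor B = sex.
volumes = {
    ("WT", "F"): [10, 11, 12],
    ("WT", "M"): [10, 11, 12],
    ("Het", "F"): [10, 11, 12],
    ("Het", "M"): [13, 14, 15],
}
f_stats = two_way_anova_balanced(volumes)
```

Each F statistic can be converted to a p-value with the F distribution (for example `scipy.stats.f.sf`). The `AxB` term is the explicit test of a sex-by-genotype interaction that splitting the data by sex does not provide.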

      (8) Also, did the authors correct for multiple testing in the MRI analysis? Since they are testing many regions, there is a risk of false positives if they do not. This could be confounded further by their splitting the data by sex, thus doubling the number of tests.

      As noted above, we did not apply multiple-comparison corrections given the large number of regions and low number of replicates.

      (9) How many images per animal were analyzed for the cell counts? This detail is absent from the methods and would help with evaluating the robustness of these findings. What other approaches were used to make sure the counting was unbiased?

      We analyzed 3-4 images per animal for counts and counted hundreds of cells per image. In addition, the person counting was blinded to avoid any bias. These details have now been updated in the methods.

      (10) As with the MRI, for the DEG analysis, I recommend the authors simply put sex and genotype into the same model as two factors (with an interaction), to increase their sensitivity to genotype effects, as well as be able to report on robust genotype x sex differences, if there are any. They may also consider testing the model with and without excluding the three outlier animals on their PCA. It may be that the noise of those outliers is detracting from their sensitivity for DEGs somewhat.

      We greatly expanded our analyses and found more robust and unique changes in males, which are now added to Figure 7 and the supplemental files. After considering the data, we decided to highlight the sex differences separately.

      (11) A few more relatively simple things could readily be done with the RNAseq data to add some depth and interpretation. For example, do the hits here overlap other published IDD/autism DEG lists from mouse knockout studies of genes like FoxP2, Chd8, Dnmt3a, Myt1l, Tcf4, etc? Do autism genes show up in the lists of hits here? And if so, more than expected by chance? Can they provide some visualization of their GO results in the main figure?

      When we looked into the sex differences further, we found that only the males showed significant upregulation of other autism risk genes, an increase that was previously unappreciated when the sexes were assessed together. Yes, several autism genes do show up, but the effect is heavily biased toward males. Our main Figure 7 and new supplemental files show new GO term analyses and provide additional data looking not only at autism but also at other factors.

      (12) It appears the IMPC has phenotyped this mouse somewhat, including craniofacial abnormalities. They also report on some blood cell differences. Anyway, if no one has written about that data yet (as it was generated in the context of a big consortium effort), their guidelines may allow you to include some of their data as Supplementary Figures here with proper attribution. It might help to at least summarize useful findings from there in your discussion.

      Due to the large number of figures/tables already in this report we don’t think this will be helpful. However, we do refer readers to the consortium in the animal methods section so they can explore data already generated by the IMPC.

      (13) Minor/Typos:

      (a) Figure 2K: I am confused by the description of three genotypes in the legend, but only two in the panel?

      Corrected.

      (b) I found it a little distracting that some results figures were embedded in the introduction.

      We have moved the figures further in the manuscript to start in the results section.

      (c) I don't understand this sentence: "Due to reduced sample size, sex-stratified DE was performed without model corrections at FDR < 0.1, 7 and found genes significantly upregulated and downregulated, respectively;" The sample size here seemed robust, so I am not sure what they were referring to? Are there missing numbers from this sentence? What is the 7? I think there are enough typos here that I am not sure how to evaluate this claim. Thus, the writing and clarity of this part could be improved.

      This section had several typos that have now been corrected.

      (d) "Marwan Shinawi, (unpublished results)" is a bit atypical of a citation. Are these results being reported with his permission? If so, then it should say 'personal communication' (if the journal permits this - some do not). If not, they should not report someone else's unpublished results without their explicit permission. It might upset some people to have their results presented this way.

      We have changed "unpublished results" to "personal communication". Marwan Shinawi is an author on this manuscript and has approved of everything we have reported.

      (e) In all figures, consider shape or color coding for sex, even when pooling the data (e.g, the data points in the behavior figures).

      This is a good idea, but since we found no difference when analyzing the data, we don’t see how this extra work will make a difference. Since we now mention in the methods that sex differences were presented as separate graphs only when observed, we think this should be acceptable.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The behaviour of cells expressing constitutively active HRas is examined in mosaic monolayers, both in MCF10a breast epithelial and Beas2b bronchial epithelial cell lines, mimicking the potential initial phase of development of carcinoma. Single HRas-positive cells are excluded from MCF10a but not Beas2b monolayers. Most interestingly, however, when in groups, these cells are not excluded, but rather sharply segregated within a MCF10a monolayer. In contrast, they freely mix with wt Beas2b cells. Biophysical analysis identifies high tension at heterotypic interfaces between HRas and wild-type cells as the likely reason for segregation of MCF10a cells. The hypothesis is supported experimentally, as myosin inhibition abolishes segregation. The probable reason for lack of segregation in the bronchial epithelium is to be found in the different intrinsic properties of these cells, which form a looser tissue with lower basal actomyosin activity. The behaviour of single cells and groups is recapitulated in a vertex model based on the principle of differential interfacial tension, under the condition of high heterotypic interfacial tension.

      Strengths:

      Despite being long recognized as a crucial event during cancer development, segregation of oncogenic cells has been a largely understudied question. This nice work addresses the mechanics of this phenomenon through a straightforward experimental design, applying the biophysical analytical approaches established in the field of morphogenesis. Comparison between two cell types provides some preliminary clues on the diversity of effects in various cancers.

      Weaknesses:

      Although not calling into question the main message of this study, there are a few issues that one may want to address:

      (1) One may be careful in interpreting the comparison between MCF10a and Beas2b cells as used in this study. The conditions may not necessarily be representative of the actual properties of breast and bronchial epithelia. How much of the epithelial organization is reconstituted under these experimental conditions remains to be established. This is particularly obvious for bronchial cells, which would need quite specific culture conditions to build a proper bronchial layer. In this study, they seemed to be on the verge of a mesenchymal phenotype (large gaps, huge protrusions, cells growing on top of each other, as mentioned in the manuscript).

      As an alternative to Beas2b, comparison of MCF10a with another cell line capable of more robust in vitro epithelial organization, but ideally with different adhesive and/or tensile properties, would be highly interesting, as it may narrow down the parameters involved in segregation of oncogenic cells.

      (2) While the seminal description of tissue properties based on interfacial tensions (Brodland 2002) is clearly key to interpreting these data, the actual "Differential Interfacial Tension Hypothesis" poses that segregation results from global differences, i.e., juxtaposition of two tissues displaying different intrinsic tensions. On the contrary, the results of the present work support a different scenario, where what counts is the actual difference in tension ALONG the tissue boundary, in other words, that segregation is driven by high HETEROTYPIC interfacial tension. This is an important distinction that should be clarified.

      (3) Related: The fact that actomyosin accumulates at the heterotypic interface is key here. It would be quite informative to better document the pattern of this accumulation, which is not clear enough from the images of the current manuscript: Are we talking about the actual interface between mutant and wt cells (membrane/cortex of heterotypic contacts)? Or is it more globally overactivated in the whole cell layer along the border? Some better images and some quantification would help.

      (4) In the case of Beas2b cells, mutant cells show higher actin than wt cells, while actin is, on the contrary, lower in mutant MCF10a cells (Figure 2b). Has this been taken into account in the model? It may be in line with the idea that HRas may have a different action on the two cell types, a possibility that would certainly be worth considering and discussing.

      Comments on revisions:

      There is still one last point that should be made even clearer:

      The system is being modelled based on the principle of INTERFACIAL TENSION, a description pioneered by the works of Steinberg and of Harris, and nicely conceptualized by Brodland (2002). Now the observed behaviour is a perfect case of sorting based on higher interfacial tension AT the boundary between cell types (with nice additional documentation of local actin and myosin enrichment in the revised manuscript). What needs to be made crystal clear is that this is NOT equivalent to the model of DITH ("DIFFERENTIAL INTERFACIAL TENSION HYPOTHESIS") (Brodland 2002, Krieg et al 2008). It is important to stop using DITH in this context, as it leads to confusion and misinterpretations. Indeed, DITH predicts cell/tissue sorting based on differences in interfacial tension WITHIN the two cell types. While DITH accounts for relative POSITIONING (one tissue engulfing the other), it is now established that this is not the motor for cell sorting and tissue segregation, the key parameter being heterotypic tension at the heterotypic interface. I thus invite the authors to avoid the terms "differential"/DITH, and rather use either "interfacial tension", or specifically "HIGH HETEROTYPIC INTERFACIAL TENSION".

      Related: the authors correctly cite Canty et al NatComm2017 when discussing this phenomenon. I suggest adding a key supporting reference: "D.M. Sussman, J.M. Schwarz, M.C. Marchetti, M.L. Manning, Soft yet sharp interfaces in a vertex model of confluent tissue, Phys. Rev. Letters 120 (2018) 058001". One may also include another pioneering work in Drosophila: "M. Aliee, J.C. Roper, K.P. Landsberg, C. Pentzold, T.J. Widmann, F. Julicher, C. Dahmann, Physical mechanisms shaping the Drosophila dorsoventral compartment boundary, Curr. Biol. 22 (2012) 967-976."

      We thank the reviewer for this important clarification. We fully agree that the mechanism underlying the observed segregation in our system is best described in terms of elevated heterotypic interfacial tension, rather than the classical Differential Interfacial Tension Hypothesis (DITH). As the reviewer correctly points out, DITH in its original formulation refers to differences in intrinsic interfacial tensions within each cell population, which primarily governs relative positioning (e.g., tissue engulfment), rather than the local sorting dynamics we observe here.

      In contrast, our experimental and modeling results support a scenario in which segregation is driven by increased tension specifically at heterotypic interfaces between HRasV12 and wild-type cells. We agree that continued use of the term “Differential interfacial tension” in this context may lead to conceptual ambiguity.

      Accordingly, we have revised the manuscript throughout to replace references to “differential interfacial tension” with more precise terminology, namely “interfacial tension” or “heterotypic interfacial tension”, wherever appropriate. We have also updated the Discussion to explicitly clarify this distinction and its implications for interpreting our results.

      We thank the reviewer for suggesting additional relevant literature, which we have now included.

      Reviewer #2 (Public review):

      Summary:

      The authors investigate the behavior of oncogenic cells in mammary and bronchial epithelia. They observe that individual oncogenic cells are preferentially excluded from the mammary epithelium, but they remain integrated in the bronchial epithelium. They also observe that clusters of oncogenic cells form a compact cluster in mammary epithelium, but they disperse in the bronchial epithelium. The authors demonstrate experimentally and in the vertex model simulations that the difference in observed behavior is due to the differential tension between the mutant and wild-type cells due to a differential expression of actin and myosin.

      Strengths:

      Very detailed analysis of experiments to systematically characterize and quantify differences between mammary and bronchial epithelia

      Detailed comparison between the experiments and vertex model simulations to identify the differential cell line tension between the oncogenic and wild-type cells as one of the key parameters that are responsible for the different behavior of oncogenic cells in mammary and bronchial epithelia

      Weaknesses:

      It is unclear what is the mechanistic origin of the shape-tension coupling, which is used in the vertex model, and how important that coupling is for the presented results. Authors claim that the shape-tension coupling is due to the anisotropic distribution of stress fibers when cells are under external stress. It is unclear why the stress fibers should affect an effective line tension on the cell boundaries and why the stress fibers should be sensitive to the magnitude of the internal isotropic cell pressure. In experiments, it makes sense that stress fibers form when cells are stretched. Similar stress fibers form when cytoskeleton or polymer networks are stretched. It is unclear why the stress fibers should be sensitive to the magnitude of internal isotropic cell pressure. If all the surrounding cells have the same internal pressure, then the cell would not be significantly deformed due to that pressure and stress fibers would not form. Authors should better justify the use of the shape-tension coupling in the model, since most of the observed behavior is already captured by the differential tension even if there is no shape-tension coupling.

      We thank the reviewer for this comment. We agree that we did not provide a mechanistic origin for the shape-tension coupling. In our model, stress fiber formation, along with actin ring formation, indicated that cells at the interface were elongated. Hence, we hypothesised that an interfacial force could induce nematic alignment at the interface. However, such an activity would only be feasible if the interface interaction were sufficiently high. Thus, the isotropic pressure at the heterotypic interface served as a proxy for cell-cell interactions in our model. However, inspired by recent work [1], we have tested whether activation of cells at the interface by shear stress would produce similar results. Exploring this aspect will require additional simulations.

      (1) Pérez-Verdugo, F., Maniou, E., Galea, G. L., & Banerjee, S. (2026). Mechanosensitive feedback organizes cell shape and motion during hindbrain neuropore morphogenesis. Current Biology.

      The observed difference of shape indices between the interfacial and bulk cells in simulations in the absence of differential line tension is concerning. This suggests that either there are not enough statistics from the simulations or that something is wrong with the simulations. For all presented simulation results, the authors should repeat multiple simulations and then present both averages and standard deviations. This way it would be easier to determine whether the observed differences in simulations are statistically significant.

      The observed differences in shape indices between interfacial and bulk cells in simulations in the zero-line-tension case (Lambda=0) remain non-zero at the zero-stress threshold because the interface cells are still subject to the shape-dependent contribution gamma_ij, since the current model treats gamma_ij as independent of Lambda. We are exploring the possible relationship between Lambda and gamma_ij, and we will update this in the next version of the manuscript.

      Recommendations for the authors:

      The editor recommends considering the new comment made by reviewer #1 in his/her report:

      "There is still one last point that should be made even more clear:

      The system is being modelled based on the principle of INTERFACIAL TENSION, a description pioneered by the works of Steinberg and of Harris, and nicely conceptualized by Brodland (2002). Now the observed behaviour is a perfect case of sorting based on higher interfacial tension AT the boundary between cell types (with nice additional documentation of local actin and myosin enrichment in the revised manuscript). What needs to be made crystal clear is that this is NOT equivalent to the model of DITH ("DIFFERENTIAL INTERFACIAL TENSION HYPOTHESIS") (Brodland 2002, Krieg et al 2008). It is important to stop using DITH in this context, as it leads to confusion and misinterpretations. Indeed, DITH predicts cell/tissue sorting based on differences in interfacial tension WITHIN the two cell types. While DITH accounts for relative POSITIONING (one tissue engulfing the other), it is now established that this is not the motor for cell sorting and tissue segregation, the key parameter being heterotypic tension at the heterotypic interface. I thus invite the authors to avoid the terms "differential"/DITH, and rather use either "interfacial tension", or specifically "HIGH HETEROTYPIC INTERFACIAL TENSION".

      Related: the authors correctly cite Canty et al NatComm2017 when discussing this phenomenon. I suggest adding a key supporting reference: "D.M. Sussman, J.M. Schwarz, M.C. Marchetti, M.L. Manning, Soft yet sharp interfaces in a vertex model of confluent tissue, Phys. Rev. Letters 120 (2018) 058001". One may also include another pioneering work in Drosophila: "M. Aliee, J.C. Roper, K.P. Landsberg, C. Pentzold, T.J. Widmann, F. Julicher, C. Dahmann, Physical mechanisms shaping the Drosophila dorsoventral compartment boundary, Curr. Biol. 22 (2012) 967-976."

      Please see our response to Reviewer #1.

      Reviewer #2 (Recommendations for the authors):

      The authors have improved the manuscript and addressed some of my concerns. However, some of the questions were not adequately addressed.

      (1) I appreciate additional justification regarding the need for the shape-tension coupling in the vertex model. However, the authors have not answered my question regarding why the shape-tension coupling model should be sensitive to the magnitude of the internal isotropic cell pressure. In experiments, it makes sense that stress fibers form when cells are stretched, but it is unclear why the stress fibers should be sensitive to the magnitude of internal isotropic cell pressure. If all the surrounding cells have the same internal pressure, then the cell would not be significantly deformed due to that pressure, and stress fibers would not form.

      We thank the reviewer for pointing this out. We agree that we did not provide a mechanistic origin for the shape-tension coupling. In our model, stress fiber formation, along with actin ring formation, indicated that cells at the interface were elongated. Hence, we hypothesized that an interfacial force could induce nematic alignment at the interface. However, such an activity would only be feasible if the interface interaction were sufficiently high. Thus, the isotropic pressure at the heterotypic interface served as a proxy for cell-cell interactions in our model.

      However, inspired by recent work [1], we have tested whether activation of cells at the interface by shear stress would produce similar results. Exploring this aspect will require additional simulations.

      (1) Pérez-Verdugo, F., Maniou, E., Galea, G. L., & Banerjee, S. (2026). Mechanosensitive feedback organizes cell shape and motion during hindbrain neuropore morphogenesis. Current Biology.

      (2) I appreciate that the authors provided additional statistics related to simulations. I am still very concerned about the observed difference in the shape indices between the cells at the interface and the bulk, when the interfacial line tension is exactly zero (Lambda=0). In that case, the cells at the interface and in the bulk are identical, and there should be no difference in the shape indices. Are cells at the interface for the zero-line-tension case (Lambda=0) still subject to the shape-dependent contribution gamma_ij? If that contribution is still included for the cells at the interface, then this could explain why cells at the interface are still different from cells in the bulk even when Lambda=0.

      The observed differences in shape indices between interfacial and bulk cells in simulations in the zero-line-tension case (Lambda=0) remain non-zero at the zero-stress threshold because the interface cells are still subject to the shape-dependent contribution gamma_ij, since the current model treats gamma_ij as independent of Lambda. We are exploring the possible relationship between Lambda and gamma_ij, and we will update this in the next version of the manuscript.
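      To make the quantity under discussion concrete: in the vertex-model literature the shape index of a cell is commonly defined as its perimeter divided by the square root of its area. The sketch below computes it for a polygonal cell under the assumption that the manuscript uses this standard definition; the function name and the two test polygons are illustrative and do not come from the authors' simulation code.

```python
import math


def shape_index(vertices):
    """Shape index P / sqrt(A) of a simple polygon.

    `vertices` is a list of (x, y) tuples in order around the cell.
    Assumes the standard vertex-model definition; hypothetical helper.
    """
    n = len(vertices)
    perimeter = 0.0
    twice_area = 0.0
    for k in range(n):
        x0, y0 = vertices[k]
        x1, y1 = vertices[(k + 1) % n]
        # Edge length contributes to the perimeter.
        perimeter += math.hypot(x1 - x0, y1 - y0)
        # Shoelace term contributes to twice the signed area.
        twice_area += x0 * y1 - x1 * y0
    area = abs(twice_area) / 2.0
    return perimeter / math.sqrt(area)


# Illustrative cells: a unit square and a regular hexagon with unit side.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
hexagon = [(math.cos(k * math.pi / 3), math.sin(k * math.pi / 3)) for k in range(6)]
```

      With this definition a square gives 4.0 and a regular hexagon about 3.72, so elongated interface cells would show a larger index than compact bulk cells. Comparing the mean and standard deviation of this index across repeated simulations is what the reviewer's request amounts to.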

      (3) Authors included several additional supplemental figures (Figs. S4, S5, S6, S7), but they are not discussed in the manuscript text. These new supplemental figures were only discussed in the rebuttal letter. These figures should also be discussed in the manuscript text.

      We have cited the new supplementary figures in the main text.

      (4) Authors have answered in the rebuttal letter what experimental data was used in Fig. 4c. This information also needs to be provided in the manuscript text.

      We have added this information to the caption of Figure 4.

      (5) Supplementary Figure 3 is missing. That figure got moved to the appendix.

      This has been rectified in the Supplementary file and the citations have been updated accordingly in the main text.

      (6) At the end of section 4 in the main text, the authors introduced a new sentence regarding simulations of the vertex model with interfacial tension and mechanochemical feedback. The details of that model are described in the appendix, but it would be helpful to add a sentence or two already in the main text describing the mechanism of the mechanochemical feedback.

      We have added a line describing the mechanism of mechanochemical feedback.

      (7) In the definition of the eccentricity, 'a' should be the minor axis and 'b' the major axis, i.e., 'a' and 'b' should be swapped.

      We have corrected this.
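      For reference, the correction can be checked against the standard ellipse formula, e = sqrt(1 - a^2 / b^2), with 'a' the minor axis and 'b' the major axis. The sketch below assumes the manuscript uses this standard definition; the function name is hypothetical.

```python
import math


def eccentricity(a, b):
    """Ellipse eccentricity with a = minor axis and b = major axis.

    Assumes the standard definition e = sqrt(1 - a**2 / b**2);
    hypothetical helper, not the authors' code.
    """
    if a <= 0 or b <= 0:
        raise ValueError("axis lengths must be positive")
    if a > b:
        # With the axes swapped the square root would be of a negative number.
        raise ValueError("'a' must be the minor axis (a <= b)")
    return math.sqrt(1.0 - (a / b) ** 2)
```

      With the axes in this order the value runs from 0 for a circle toward 1 for a strongly elongated cell. Swapping 'a' and 'b' makes the argument of the square root negative, which is why the labelling matters.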

      (8) There is a typo at the end of the vertex model description in the methods section. "The details of the shape-tension coupling is described in the interface." The word "interface" should be "appendix".

      We have fixed the typo.

      (9) In the appendix section describing the shape-tension coupling, the authors should explain how the cell's director n is defined.

      In the appendix section describing the shape-tension coupling, we have added a line explaining how the cell’s director n is defined.

      (10) In Appendix Fig. 1, the two angles are defined as theta and theta' but the figure caption is defining angles theta_1 and theta_2. These angles need to be consistent.

      This has been fixed.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors reveal that the availability of extracellular asparagine (Asn) represents a metabolic vulnerability for the activation and differentiation of naive CD4+ T cells. To deplete extracellular Asn, they employed two orthogonal approaches: activating naive CD4+ T cells in either PEGylated asparaginase (PEG-AsnASE)-treated medium or custom-formulated RPMI medium specifically lacking Asn. Importantly, they demonstrate that depletion not only impaired metabolic reprogramming associated with CD4+ T cell activation but also reduced CD4+ helper T cell lineage-specific cytokine production, thereby ameliorating the severity of experimental autoimmune encephalomyelitis.

      Strengths:

      The experiments presented here are comprehensive and well-designed, providing compelling evidence for the conclusions. The conclusions will be important to the field.

      We thank the reviewer for their assessment of our work and enthusiasm towards our findings.

      Weaknesses:

      (1) EAE is the prototypic T cell-mediated autoimmune disease model, and both Th1 and Th17 cells are implicated in its pathogenesis. In contrast, Th2 and Treg cells and their associated cytokines (such as IL-4 and IL-10) have been shown to play a role in the resolution of EAE, and potentially in the modulation of disease progression. Thus, it will be important to determine whether Asn depletion affects the differentiation of naive CD4+ T cells into corresponding subsets under Th2 and Treg polarization conditions, as well as the expression of lineage-specific transcription factors and cytokine production.

      We appreciate that the reviewer recognizes the functional relevance of our findings showing that Asn is important for proper Th17 differentiation and promotion of EAE (Figure 5 E-J, Figure 6). Given that multiple CD4+ T cell subsets play a role in both the initiation and resolution of EAE, we agree that it would be valuable to further support these findings with complementary Th2 and Treg differentiation experiments.

      To address this, we examined the effects of asparagine depletion during in vitro iTreg and Th2 differentiation. We found that the frequencies of FOXP3+ iTreg and GATA3+ Th2 cells were reduced when cultures were grown in asparagine-deficient media. These results have been added to Supplementary Figure 5.

      (2) EAE is characterized by inflammation and demyelination in the central nervous system (CNS), leading to neurological deficits. Myelin destruction is directly correlated with the severity of the disease. For Figure 6, did the authors perform spinal cord histological analysis by hematoxylin and eosin (H&E) or Luxol fast blue (LFB) staining? This is important to rigorously examine pathological EAE symptoms.

      We agree with the reviewer that histopathology including H&E and/or LFB staining is a useful indicator of EAE disease severity. However, we are no longer able to obtain PEG-AsnASE (Oncaspar) to perform these studies.

      Reviewer #2 (Public review):

      While the importance of asparagine in the differentiation and activation of CD8+ T cells has been previously reported, its role in CD4+ T cells remained unclear. Using culture media containing specific amino acids, the authors demonstrated that extracellular asparagine promotes CD4+ T cell proliferation. Consistent with this, depletion of extracellular asparagine using PEG-AsnASE suppressed CD4+ T cell activation. Proteomic analysis focusing on asparagine content revealed that, during the early phase of T cell activation, most asparagine incorporated into proteins is derived from extracellular sources. The authors further confirmed the importance of extracellular asparagine in vivo, demonstrating improved EAE pathology.

      While the data are well organized and convincing, the mechanism by which asparagine deficiency leads to altered T cell differentiation remains unclear. It is also necessary to investigate the transporters involved in asparagine uptake. In particular, elucidating whether different T cell subsets utilize the same or distinct transport mechanisms would provide important insight into the immunoregulatory role of asparagine.

      (1) The finding that asparagine supplementation promotes T cell proliferation under various amino acid conditions is highly significant. However, the concentration at which this effect occurs remains unclear. A titration analysis would be necessary to determine the dose-dependency of asparagine.

      Our studies indicate that the concentration of asparagine present in conventional RPMI lymphocyte media is sufficient to support CD4+ T cell activation and proliferation in vitro (Figure 1, Supplementary Figure 1 & Figure 2). This concentration was consistently used throughout our studies. In line with the reviewer’s comments, however, we have not yet determined the dose dependency of Asn during CD4+ T cell activation.

      To address this, we performed a titration experiment in which asparagine was supplemented at varying concentrations in DMEM and Asn-deficient RPMI. Activation markers were measured 24 hours after TCR stimulation under these culture conditions. We found that the critical asparagine concentration lies between 3.78 and 37.8 µM. This concentration range is consistent with the physiological concentration of asparagine in murine plasma, which is approximately 50 µM (PMID: 24842860; PMID: 23853755). These data have been added to Supplementary Figure 1.

      (2) The effects of asparagine deficiency occur during the early phase of T cell activation. Thus, it is likely that the transporters responsible for asparagine uptake are either rapidly induced upon activation or already expressed in the resting state. Since this is central to the focus of the manuscript, it is interesting to identify the transporter responsible for asparagine uptake during early T cell activation. A recent paper (DOI: 10.1126/sciadv.ads350) reported that macrophages utilize Slc6a14 to use extracellular asparagine. Is this also true for CD4+ T cells?

      While a comprehensive characterization of the amino acid transporter network is certainly of interest, it is beyond the scope of the present study. As the reviewer notes, others have explored asparagine transport in lymphocytes. For example, Wu et al. (PMID: 33420490) determined that the asparagine transporter Slc1a5 is significantly upregulated in CD8+ T cells upon activation, based on qRT-PCR measurements comparing mRNA from naïve and activated CD8+ T cells. They further validated the functional role of Asn transporters in CD8+ T cells by measuring N15-labeled asparagine uptake in the presence of siRNAs targeting the asparagine transporters Slc1a5 or Slc38a2 and found that inhibition of either transporter significantly reduced intracellular N15-Asn accumulation.

      To gain additional insight into Asn transporters in distinct CD4+ T cell subsets, we reanalyzed a published RNA-seq dataset (Thakore et al., 2024; PMID: 39009838). We quantified the expression of transporters Slc1a5, Slc38a2, and Slc6a14 in naïve and activated CD4+ T cells polarized under Th1, npTh17, or pTh17 conditions at various time points. We observed that Slc1a5 expression increased upon activation in all subsets. Similarly, Slc38a2 expression increased during the early activation stage, but subsequently returned to basal levels similar to those of naïve cells. In contrast, Slc6a14 showed relatively low basal expression in naïve cells compared to the other transporters investigated, and its expression decreased over the differentiation period in all CD4+ T cell subsets examined. These results indicate that Asn transporters Slc1a5 and Slc38a2 are expressed in CD4+ T cells during early activation and differentiation. These data have been included in Supplementary Figure 3.

      (3) Given that depletion of extracellular asparagine impairs differentiation of Th1 and Th17 cells, it is possible that TCR signaling is compromised under these conditions. This point should be investigated by targeting downstream signaling molecules such as Lck, ZAP70, or mTOR. Also, does it affect the protein stability of master transcription factors such as Tbet and RORgt?

      We agree with the reviewer that asparagine deprivation could impact several aspects of T cell function. In our study, we demonstrate that asparagine is crucial for CD4+ T cell protein synthesis and the expression of activation markers (Figure 1B-K, Figure 2K-L, and Figure 3A-C). We also highlight its importance in promoting CD4+ T cell subset differentiation and lineage-defining cytokine production (Figure 5B-J). Other studies have reported a role for asparagine in early activation marker expression in CD8+ T cells and in enhancing LCK function (PMID: 33822775; PMID: 33420490). Given its proposed function as a promoter of LCK signaling in CD8+ T cells, it will be important to determine in future studies whether a similar mechanism operates during CD4+ T cell activation.

      We appreciate the reviewer’s inquiry regarding the stability of critical transcription factors defining Th1 and Th17 subsets. We have examined the expression of the transcription factors RORγT and Tbet in Th17 and Th1 polarized cells and observed reduced expression in the absence of asparagine. We have included these findings in Supplementary Figure 5.

      (4) Is extracellular asparagine also important for the differentiation of helper T cell subsets other than Th1 and Th17, such as Th2, Th9, and iTreg?

      Please see our response to Reviewer 1 regarding iTreg and Th2. Investigation of Th9 cells is beyond the scope of the present study.

      (5) Asparagine taken up from outside the cell has been shown to be used for de novo protein synthesis (Figure 3E), but are there any proteins that are particularly susceptible to asparagine deficiency? This can be verified by performing proteome analysis, and the effects on Th1/17 subset differentiation mentioned above should also be examined.

      The investigation of specific proteins that exhibit asparagine dependency would indeed be interesting. Given our results showing that global protein synthesis is blunted with asparagine deprivation (Figure 3A-C), it would be particularly compelling to identify proteins with a specific requirement for asparagine. However, this level of analysis is beyond the scope of our study.

      (6) While the importance of extracellular asparagine is emphasized, Asns expression is markedly induced during early T cell activation. Nevertheless, the majority of asparagine incorporated into proteins appears to be derived from extracellular sources. Does genetic deletion of Asns have any impact on early CD4+ T cell activation? The authors indicated that newly synthesized Asns have little impact on CD8+ T cells in the Discussion section, but is this also true for CD4+ T cells? This could be verified through experiments using CRISPR-mediated Asns gene targeting or pharmacological inhibition.

      We appreciate the reviewer’s consideration of the contribution of endogenous asparagine to CD4+ T cell function. However, genetic perturbation of Asns is beyond the scope of our study, which is specifically focused on defining the requirements for extracellular asparagine and its role in CD4+ T cell activation.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this study, the authors set out to define how arginine availability regulates lipid metabolism and to explore the implications of this relationship in pancreatic ductal adenocarcinoma (PDAC), a tumor type known to exist in an arginine-poor microenvironment. Using a combination of rigorous genetic and metabolomic approaches, they uncover a previously underappreciated role for arginine in maintaining lipid homeostasis. Importantly, they demonstrate that arginine deprivation sensitizes PDAC cells to ferroptosis through lipidome perturbations, which can be exploited therapeutically via co-treatment with aESA and ferroptosis inducers (FINs). These findings have meaningful implications for the field. They not only shed light on the metabolic vulnerabilities created by nutrient restriction in PDAC, but also suggest a practical avenue for combination therapies that exploit ferroptosis sensitivity. This is particularly relevant in the context of pancreatic cancer, which is notoriously resistant to conventional treatments. The methods employed are broadly applicable to other nutrient-stress contexts and may inspire similar investigations in other solid tumor types.

      Strengths:

      One of the major strengths of the study is the use of complementary and well-controlled approaches, including metabolomic profiling, genetic perturbations, and in vivo models, to support the central hypothesis. The experiments are thoughtfully designed and clearly presented, and the conclusions are, for the most part, well supported by the data. The findings provide mechanistic insight into nutrient-lipid crosstalk and identify a potential therapeutic strategy for targeting arginine-deprived tumors.

      We thank the reviewer for their positive assessment of our manuscript.

      Weaknesses:

      A key weakness of the study lies in the mechanistic connection between arginine levels and SREBP1 activation. While the authors show that arginine restriction leads to reduced SREBP1 expression, the magnitude of this effect appears modest relative to the substantial changes observed in the lipidome. The study would benefit from a deeper analysis of SREBP1 regulation, particularly whether nuclear translocation or activation is affected. This could be addressed by examining the nuclear pool of SREBP1, using either subcellular fractionation or improved immunofluorescence imaging in both cell lines and tissue samples.

      We thank the reviewer for this comment. In our revised manuscript, we have undertaken several new studies to assess how the nuclear pool of SREBP1 is regulated by arginine starvation. We further identified one mechanism by which arginine starvation suppresses SREBP1 protein levels, namely GCN2 activation. We believe these additional studies strengthen the manuscript and appreciate the reviewer suggesting them.

      Another area where additional context would strengthen the manuscript is in the transcriptomic profiling of PDAC cells cultured in a tumor interstitial fluid mimic (TIFM). While the study emphasizes lipid-related pathways, highlighting the most significantly upregulated and downregulated pathways in Figure 1B would give readers a broader perspective on how arginine restriction reprograms the PDAC transcriptome. For instance, because polyamines are downstream of arginine and are known to influence lipid metabolism, it would be worth discussing whether these metabolites contribute to the phenotypes observed. Similarly, an evaluation of whether Dgat1/2 expression is altered could help delineate the full scope of lipid metabolic rewiring.

      We thank the reviewer for suggesting this change to our manuscript. We now provide a much more extensive presentation of our transcriptomic analyses in Figure 1 – Figure supplement 1, which we think will make our manuscript more useful to readers.

      Finally, it is worth noting that the KPC mouse model used in this study is based on conditional deletion of p53, which leads to faster-growing tumors and a distinct tumor microenvironment compared to models harboring the p53^R172H point mutation. Including a brief discussion of this distinction would help readers contextualize the translational relevance of the findings.

      We have revised the manuscript to include a discussion of this point.

      Reviewer #2 (Public review):

      This study by Jonker et al. examines how the metabolic adaptations to the microenvironment by pancreatic ductal adenocarcinomas (PDAC) present vulnerabilities that could be used for therapeutic purposes. The evidence supporting the claims of the authors is mostly solid, and the multiplicity of models used, as well as the combination of in vitro and in vivo work, are appreciated, but some conclusions would benefit from additional substantiation. This work would be of interest to biologists working on the impact of microenvironment and metabolism in cancer, and especially those investigating pancreatic cancer.

      We thank the reviewer for their positive assessment of our manuscript.

      In this study, the authors use mostly "doublings per day" as an indicator of cell death, notably for Figures 4 to 6. However, proliferative arrest (or a decrease in the proliferative rate) is not necessarily synonymous with cell death. It might be nice to complement these experiments with a true measure of cell death (e.g., PI uptake).

      We thank the reviewer for this important comment and have performed extensive additional experiments to measure cell death directly via viability markers in addition to our indirect measurements of cell number at the start and end of experiments. We believe these additions strengthen our claims that PUFAs cause arginine starved PDAC cells to undergo ferroptotic cell death.
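
      The distinction the reviewer draws can be illustrated with a minimal sketch of the "doublings per day" readout. The function name and the cell counts below are hypothetical and are not taken from the manuscript; the sketch only assumes that the metric is computed from cell numbers at the start and end of an experiment, as described above.

```python
import math


def doublings_per_day(n_start: float, n_end: float, days: float) -> float:
    """Population doublings per day from cell counts at the start and end of an assay."""
    if n_start <= 0 or n_end <= 0 or days <= 0:
        raise ValueError("cell counts and assay duration must be positive")
    return math.log2(n_end / n_start) / days


# Hypothetical counts over a 3-day assay (illustrative values only).
growing = doublings_per_day(10_000, 80_000, 3)   # three doublings in three days
arrested = doublings_per_day(10_000, 10_000, 3)  # no net change in cell number
dying = doublings_per_day(10_000, 2_500, 3)      # net loss of cells
```

      A value between zero and the control rate is compatible with slower proliferation, partial cell death, or both, which is why a direct viability readout adds information that this metric alone cannot provide.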

      The composition of Tumor Interstitial Fluid Medium (TIFM) was published previously, but nonetheless a reminder of the composition of this medium in a Supplemental file of this study might be helpful. In particular, at the start of the Results section, the nature of serum/lipids in the different media should be specifically noted, especially given that the subsequent focus of the work is on lipids/SREBP. It is known that differences in the extracellular availability of lipids can profoundly alter de novo lipid biosynthesis pathways.

      We thank the reviewer for this comment. We have edited the text to provide additional context on the composition of TIFM, especially lipid availability. We further have provided a supplemental file with the composition of TIFM. We hope this will make the manuscript more useful and readily interpretable for readers.

      Reviewer #3 (Public review):

      This important study investigates the impact of nutrient stress in the tumor microenvironment (TME), focusing on lipid metabolism in pancreatic ductal adenocarcinoma (PDAC).

      Understanding TME composition is crucial, as it highlights cancer vulnerabilities independent of intracellular mutations, particularly because PDAC tumors are often exposed to limited nutrient availability due to reduced perfusion.

      By utilizing a medium that mimics the nutrient conditions of PDAC tumors, the authors convincingly show that TME nutrient stress suppresses SREBP1, leading to reduced lipid synthesis, with low arginine levels identified as a key driver of this suppression. Importantly, mice with arginine-starved pancreatic tumors respond to a polyunsaturated fatty acid-rich diet. This discovery uncovers a synthetic lethal interaction in the tumor microenvironment that could be leveraged through dietary interventions.

      The conclusions of this paper are mostly well supported by data; however, below are some aspects that could be further clarified.

      We thank the reviewer for their positive assessment of our manuscript.

      This study uses PDAC cells from the LSL-Kras G12D/+ ; Trp53 ; Pdx-1-Cre PDAC model. The authors convincingly demonstrate that the cell-extrinsic stimuli of low arginine availability suppress lipid synthesis and thus exert a dominant effect over the cell-intrinsic oncogenic Ras mutation, which is known to enhance fatty acid synthesis. Could the effect of low arginine on lipid synthesis be specific for certain mutations in PDAC? It would be interesting to investigate or discuss whether different mutations show the same SREBP1 reduction caused by low arginine levels, and whether these low SREBP1 levels can be ameliorated by arginine re-supplementation. Here, Jonker et al. show that human PDAC cells cultured in TIFM have reduced SREBP1 levels (Figure 1 - Figure supplement 1C). It would be further supportive of their conclusions if the authors could show that arginine re-supplementation is sufficient to restore SREBP1 levels in human PDAC cells.

      We thank the reviewer for this comment. In response, we have now shown that arginine supplementation increases SREBP1 levels and fatty acid synthesis in human PDAC cells (Figure 2 – Figure supplement 2). Further, we have also updated the manuscript to discuss that using the LSL-Kras G12D/+; Trp53; Pdx-1-Cre PDAC model limits our ability to assess how genetic differences influence the response to arginine starvation. We additionally discuss the genetic diversity of the human PDAC cell lines used in these studies, which do include different oncogenic mutations. We believe that these results provide some data that the findings we have made regarding arginine deprivation and SREBP in our genetically defined murine PDAC cell line are applicable to human PDAC cells with more diverse oncogenic lesions.

      The authors demonstrate that mPDAC cells cultured in RPMI and subsequently implanted into an orthotopic mouse model exhibit reduced expression of SREBP target genes when compared to in vitro cultured mPDAC-RPMI cells. This finding is in line with the observation that culturing PDAC cells in TIFM downregulates SREBP target genes compared to PDAC cells cultured in RPMI. However, caution is needed when directly comparing mPDAC-RPMI cultured cells to those in the orthotopic model, as the latter may include non-tumor cells and additional factors that could confound the results. The authors should explicitly acknowledge this limitation in their study.

      We thank the reviewer for this important caveat and we have revised the text to address this point. Importantly, we note that for all comparisons between in vitro and in vivo cultures, we carefully sort malignant cancer cells from orthotopic tumors prior to analysis. We believe this approach mitigates the impact of stromal contamination on our analyses.

      The in vivo evidence demonstrating that PUFA-rich tung oil reduces tumor size is compelling. However, the specific in vitro findings regarding its impact on doubling rates per day, particularly in the context of arginine-dependent PUFA supplementation, require further explanation. To enhance the robustness of their data and conclusions, the authors could consider conducting additional cell viability and proliferation assays. Moreover, it would be valuable to assess whether the observed effects on doubling rates per day remain significant after normalizing the data to the initial doubling time prior to PUFA supplementation. This is in particular important regarding the statement that "Addition of arginine significantly decreases sensitivity to a-ESA" as these cells already start with a higher doubling rate prior to a-ESA treatment.

      We thank the reviewer for this important comment and have performed additional experiments to measure cell death directly via viability markers in addition to our indirect measurements of cell number at the start and end of experiments. Furthermore, to address the issue of different rates of cell growth in cultures affecting the response to perturbations, we also used growth rate corrected metrics (PMID: 27135972) to ensure that the effects of perturbations on cell growth and viability are not confounded by the baseline proliferative kinetics of the cells under various media conditions. We believe these additions strengthen our claims that arginine starvation sensitizes PDAC cells to PUFAs.
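
      As an illustration of why growth rate correction matters here, the sketch below computes the normalized growth rate inhibition (GR) value as defined in the cited reference (PMID: 27135972). Whether the authors used exactly this form, and all cell counts shown, are assumptions made for illustration; the function name is hypothetical.

```python
import math


def gr_value(x0: float, x_ctrl: float, x_treated: float) -> float:
    """GR value: 1 = no effect, 0 = complete cytostasis, < 0 = net cell loss.

    x0        cell count at the time of treatment
    x_ctrl    cell count of the untreated control at the end of the assay
    x_treated cell count of the treated culture at the end of the assay
    """
    if min(x0, x_ctrl, x_treated) <= 0:
        raise ValueError("cell counts must be positive")
    k_ctrl = math.log2(x_ctrl / x0)
    if k_ctrl <= 0:
        raise ValueError("control culture must have grown during the assay")
    k_treated = math.log2(x_treated / x0)
    return 2 ** (k_treated / k_ctrl) - 1


# Hypothetical fast-growing culture: control doubles 3 times, treated doubles 2 times.
gr_fast = gr_value(1_000, 8_000, 4_000)
# Hypothetical slow-growing culture: control doubles once, treated grows 2/3 as fast.
gr_slow = gr_value(1_000, 2_000, 1_000 * 2 ** (2 / 3))

# Naive end-point ratios differ between the two cultures even though the
# relative slowdown in growth rate is identical.
ratio_fast = 4_000 / 8_000
ratio_slow = (1_000 * 2 ** (2 / 3)) / 2_000
```

      In this toy comparison the end-point ratio suggests a stronger effect in the fast-growing culture, whereas the GR value is the same in both, which is the kind of confound the reviewer raised for cultures with and without added arginine.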

      Overall, this paper presents a compelling study that significantly enhances our understanding of the PDAC tumor microenvironment and its complex interactions with the tumor lipid metabolism.

      We again thank the reviewer for their positive assessment of our manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      In this study, the authors employ rigorous genetic and biochemical (metabolomic) approaches to uncover a previously unappreciated role for arginine in regulating lipid homeostasis. They further demonstrate the relevance of this pathway in pancreatic tumors, a solid tumor type often characterized by limited access to extracellular arginine. The authors present compelling evidence that arginine deprivation creates a metabolic liability, rendering tumors more susceptible to lipidome perturbations. This vulnerability can be therapeutically exploited through co-treatment with aESA and FIN to induce ferroptosis. Overall, the conclusions are convincing, the manuscript is well-written, and the figures are clearly presented.

      We again thank the reviewer for their positive assessment of our manuscript.

      The key weakness of the study lies in the mechanistic link between arginine levels and SREBP1 expression. While the data support the authors' argument, the observed changes in SREBP1 expression following arginine restriction appear modest relative to the more pronounced changes in the lipidome. To strengthen this connection, the authors may consider performing cellular fractionation to focus their analysis on the nuclear (active) pool of SREBP1. Improved immunofluorescence imaging and quantification of nuclear SREBP1 levels in tissues would also provide additional support for their model.

      We thank the reviewer for this helpful comment. To strengthen this study, we both examined the nuclear levels of SREBP1 in TIFM-cultured cells and worked to identify the mechanistic link connecting arginine levels to SREBP1 expression.

      First, we found that arginine starvation does not lead to nuclear exclusion of SREBP1. We believe this finding strengthens our conclusion that arginine starvation regulates SREBP1 at the level of protein expression. We do agree with the reviewer that the change in SREBP1 protein level is modest, but we do show that the effects of arginine on PDAC cell lipid metabolism are SREBP1 dependent (Figure 3O-P, Figure 5F, Figure 5 – Figure supplement 2D). Thus, we interpret these data as indicating that even the relatively modest change in SREBP1 protein levels is sufficient to cause large changes in the output of this transcription factor and the cellular lipidome.

      Second, we determined if the arginine-responsive GCN2 signaling pathway, which is known to regulate SREBP1, could contribute to the suppression of SREBP1 observed in PDAC cells. We found that GCN2 signaling is activated in PDAC cells in TIFM culture by arginine starvation and is active in animal tumors. We further found that activation of GCN2 is in part responsible for suppression of SREBP1, which is consistent with prior literature describing a role for GCN2 activation in suppressing SREBP1 translation (PMID: 17276353). Thus, while other mechanisms are at play in transducing arginine starvation to reduced SREBP1 protein levels, we have identified one mechanism (activation of GCN2) by which arginine starvation suppresses SREBP1, leading to the lipidomic changes we observed upon starvation of this amino acid.

      In addition, it would be helpful for the authors to highlight the most significantly upregulated and downregulated pathways in Figure 1B to give a more comprehensive view of transcriptomic changes in PDAC cells cultured under TIFM conditions. For example, since polyamines are downstream of arginine and known to regulate lipid metabolism, could some of the observed effects be attributed to changes in polyamine levels? Similarly, do arginine levels affect the expression of Dgat1 or Dgat2?

      We have added an additional Figure supplement to Figure 1 that includes a comprehensive list of up- and downregulated gene sets in PDAC cells cultured in TIFM, as identified by GSEA. We also added KEGG metabolic pathway analysis via GATOM (PMID: 35639928). We hope these additions will be useful for readers and point their attention to other metabolic pathways that are significantly altered by nutrient stress, such as the TCA cycle and oxidative phosphorylation, beyond those related to lipid metabolism that we investigated here.

      From this analysis, we did not specifically note strong changes in the expression of polyamine metabolic enzymes or DGATs.

      Finally, the KPC model used in this study involves conditional deletion of p53, which is known to produce tumors with a faster progression and a distinct tumor microenvironment compared to the more commonly used p53^R172H knock-in model. Including this point in the discussion would help contextualize the findings.

      We thank the reviewer for mentioning this limitation of our study. In the text, we now include a discussion of the limitations of the mouse model used in this work. We also now highlight in the text that, in addition to the murine p53 deletion model, our studies make use of human PDAC lines that contain p53 mutations. We believe that these results provide some evidence that the findings we have made regarding arginine deprivation and SREBP in our genetically defined murine PDAC cell line are applicable to human PDAC cells with more diverse oncogenic lesions.

      Minor comments to improve clarity:

      (1) In Figure 3C, it would be helpful to annotate the PE-linked TG for clarity.

      We do not understand exactly what PE-linked TGs refers to. We note in Fig. 3C that ether-linked triglycerides are labeled in orange and annotated as O-TG and vinyl ether-linked triglycerides are labeled in grey and annotated as P-TG.

      (2) Is Figure 3P mislabeled? Both conditions are labeled as +Arg / -lipid.

      We thank the reviewers for pointing out this mistake in the figure and have updated it to correctly label these samples as sgSREBP1 and sgNTG transduced PDAC cell lines.

      Reviewer #2 (Recommendations for the authors):

      (1) Figure 1B: Misspelling in Y axis "Normalized enrichment score".

      We thank the reviewer for catching this mistake and have corrected this error.

      (2) Figure 1B: Could the authors elaborate on why they decided to focus specifically on these three hits, which are not the most downregulated genes (the "top hits") appearing in the GSEA?

      We chose to focus on lipid metabolism as multiple transcriptomic analysis tools, namely GSEA and GATOM, which specifically focuses on enrichment in KEGG annotated metabolic pathways, highlighted lipid synthesis as being the most transcriptionally regulated metabolic pathway in TIFM. To make this apparent to readers, we added an additional Figure supplement to Figure 1 that includes a comprehensive list of up- and downregulated gene sets in PDAC cells cultured in TIFM from GSEA and GATOM analysis. We hope these additions will make the logic for our focus on lipid synthesis clear and will be useful for readers in highlighting other metabolic pathways that are significantly altered by nutrient stress, such as the TCA cycle and oxidative phosphorylation.

      (3) Figure 1: It might improve the clarity of the text if the three pairs of murine cell lines (mPDAC1, mPDAC2, mPDAC3) were introduced in a bit more detail in the main text and not just in the figure legend.

      We have added more detail describing the three mouse cell lines used in the main text.

      (4) Figure 1E: The authors may wish to comment on why they chose to perform transcriptomic analyses with the mPDAC3 derived models, and not mPDAC1 or mPDAC2, given that mPDAC3 appears to exhibit the most distinct phenotype of the three, according to the results presented in Figure 1 J-L.

      The transcriptional analysis described in Fig. 1E was performed on a previously acquired dataset using mPDAC3 cell lines (PMID: 37254839), which is why this line was used. We have revised the text to make it clear that this transcriptional analysis uses pre-existing data from a previous publication.

      (5) Figure 1L: The authors may wish to clarify why they only show relative palmitate to assess global fatty acid biosynthesis in these cell lines. There is a decrease in labeled palmitate of mPDAC3 cells cultured in TIFM in comparison to the cells cultured in RPMI media, showing a decrease in the lipid biosynthesis of these cells in these conditions. However, there also seems to be lower palmitate levels in the TIFM-cultured mPDAC3 cells specifically, in comparison to their mPDAC1 and mPDAC2 counterparts. Why is that? Could the authors comment on this result?

      We thank the reviewer for this helpful observation. In Figure 1L (now Figure 1N), we wanted to show how culture conditions (RPMI/TIFM) affected both the total amount of palmitate in PDAC cells and the fraction that is labeled (i.e., arising from de novo synthesis). We think this provides more information for readers by allowing them to assess both changes in the pool size of palmitate and changes in the fraction of palmitate that is synthesized. We like this presentation as it shows clearly that while total palmitate levels behave differently across cell lines (with TIFM culture reducing levels in mPDAC1-2 but increasing levels in mPDAC3), the amount of palmitate that is synthesized de novo is decreased in all three cell lines when cultured in TIFM. To highlight this, we also present the fraction of palmitate that is labeled in Fig. 1O.
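
      The difference between pool size and labeled fraction can be sketched as follows. This is a simplified illustration, assuming isotopologue intensities that have already been corrected for natural isotope abundance, with the first entry being the unlabeled (M+0) species; the function name and all intensity values are hypothetical and do not come from the manuscript.

```python
from typing import Sequence, Tuple


def palmitate_summary(isotopologues: Sequence[float]) -> Tuple[float, float]:
    """Return (total pool, labeled fraction) from isotopologue intensities.

    isotopologues[0] is the unlabeled M+0 species; the remaining entries are
    labeled species. Intensities are assumed to be natural-abundance corrected.
    """
    total = float(sum(isotopologues))
    if total <= 0:
        raise ValueError("total intensity must be positive")
    labeled = total - isotopologues[0]
    return total, labeled / total


# Hypothetical cell line A: the total pool falls in TIFM.
total_a_rpmi, frac_a_rpmi = palmitate_summary([50, 20, 30])
total_a_tifm, frac_a_tifm = palmitate_summary([48, 6, 6])

# Hypothetical cell line B: the total pool rises in TIFM.
total_b_rpmi, frac_b_rpmi = palmitate_summary([50, 20, 30])
total_b_tifm, frac_b_tifm = palmitate_summary([96, 12, 12])
```

      In both hypothetical lines the labeled fraction drops in TIFM even though the total pool moves in opposite directions, which is the pattern that reporting both quantities is meant to make visible.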

      We are unsure why TIFM culture reduces total palmitate levels in some PDAC cell lines, while others are able to maintain total palmitate pools. We assume that TIFM cultures increase lipid uptake to compensate for lack of synthesis, and potentially differences in lipid scavenging capacity between the lines could explain this difference. We are currently working on experiments to test these hypotheses and will present the results in a future study.

      (6) Figure 2 - Figure Supplement 1A: It would be informative and appreciated to know which nutrients are actually represented and correspond to certain points on the graph, in particular for the ones that are the most differentially present in the two different media.

      We have now updated this graph to highlight key metabolites that are most differentially abundant between the two media. We also now provide as a Supplementary file the composition of TIFM, which provides readers with all the information needed to understand which metabolites are differentially abundant in TIFM and any media they wish to compare.

      (7) Figure 2 - Related to Figure supplement 1D: It would be useful to know how or why arginine was selected for further investigation from the subset of amino acids. The authors could elaborate on this, by showing or highlighting the data that drew attention to this amino acid initially.

      We thank the reviewer for this note. We have tried to make Figure 2 – Figure supplement 1 clearer as to how arginine was selected for further investigation. We have updated the figure to improve clarity for the comparisons of different media that enabled us to identify differences in amino acids between RPMI and TIFM as driving the difference in lipid metabolism. We have also highlighted in Figure 2 – Figure supplement 1A that arginine is the most differentially abundant amino acid and edited the text to explain that this high degree of differential abundance is why we focused on arginine, amongst all the amino acids, as a likely candidate for regulation of SREBP1.

      (8) The legends for Figures 2G and 2H could be improved, i.e., making clearer that 2H shows incorporation in the circulating fatty acids, unlike 2G.

      We have updated the figure with improved labeling as the reviewer suggested to denote which panels correspond to which sample type.

      (9) Figure 3E and 3G: The heatmaps displayed here show that the addition of arginine to TIFM culture medium restores fatty acid synthesis; however, it appears that the nature of the lipids synthesized in this condition may differ from the ones synthesized in RPMI cultured conditions.

      We have added additional text highlighting that arginine supplementation to TIFM and RPMI culture led to induction of different SREBP1-target genes, but that both lead to activation of fatty acid synthesis and desaturation genes, which motivates our focus on de novo synthesis of saturated and monounsaturated fatty acids.

      (10) Figure 3O: The SREBP1 immunoblot still seems to show some residual bands for the cells transduced with SREBP1 targeting sgRNAs, therefore, the authors may want to be more nuanced and present this model as a KD, instead of a KO, as mentioned in the text?

      We agree with the reviewer’s suggestion, and we have changed the text to describe these as knockdowns rather than full knockouts.

      (11) Figure 3P: Is it possible that there is an error in the legend of the figure (Lipids + for the first bar and - for the second one?). The figure could also be improved by a legend that explains what the different colored bars represent.

      We thank the reviewers for pointing out this mistake in the figure and have updated it to correctly label these samples as sgSREBP1 and sgNTG transduced PDAC cell lines.

      (12) Figure 4: The authors are stating in Figure 4 - Figure supplement 1A-F, that arginine-restricted mPDAC cells are not sensitized to xCT or GPX4 inhibitors that trigger ferroptosis and that therefore SREBP1 suppression by arginine restriction in the TME does not sensitize PDAC cells to ferroptosis inducers. However, this does not appear to be so clear with the data shown. This might be due to the limitations associated with the population doubling measurements instead of the lethality measures noted above. Likewise, later it is proposed that arginine restriction sensitizes both mPDAC cells and human PDAC cells to α-ESA induced ferroptosis. These results would benefit from a direct measure of cell death. Related to the above point, it would be useful to better understand why cells cultured in arginine-deprived TIFM do not appear to be sensitized to ferroptosis inducers, but these same cells die from ferroptosis when treated with α-ESA. It would be useful to present some thoughts.

      We thank the reviewer for bringing up this important point. To the reviewer's first point, we repeated the xCT and GPX4 inhibitor treatment experiments to include both growth rate corrected (PMID: 27135972) proliferation assays and Sytox-based viability assays. In both cases, we did not find consistent sensitization to xCT or GPX4 inhibitors across multiple PDAC lines when cultured in TIFM. In contrast, we found consistent sensitization to PUFA treatment across multiple murine and human PDAC cell lines cultured in TIFM. Together, this analysis suggests that arginine starvation specifically sensitizes PDAC cells to PUFAs, but not to other ferroptosis inducers.

      We agree with the reviewer that this is an interesting and unexpected observation. We do not have a mechanistic understanding as to why this is the case. However, we believe it suggests that PUFAs may be a better method of inducing ferroptosis in certain conditions than other ferroptosis-inducing approaches. We have added text to the discussion to highlight this unexplained observation.

      (13) Figure 6: The authors mention that α-ESA is used here at sublethal doses, which do not affect viability or proliferation, but this is not shown in either the main or supplementary data. These data should be provided somewhere. It might also be nice to mention in the main text (not just in the legend) the dose of α-ESA used for the combination treatments.

      We thank the reviewer for this helpful suggestion. To illustrate that α-ESA is used at a sublethal dose, we altered each panel to use a linear rather than logarithmic x-axis, thereby including the DMSO control arm for each ferroptosis inducer in combination with α-ESA. We hope this now clearly illustrates that this dose of α-ESA does not perturb cell growth or viability in these assays.

      (14) Figure 6B: Fer-1 treatment does not seem to rescue the phenotype very clearly. This could again be because cell death is being conflated (to a degree) with effects on proliferation, and Fer-1 is not expected to affect cell proliferation. Again, measuring cell death directly would be better than measuring population doublings.

      We thank the reviewers for this helpful comment. To address this concern, we have added Sytox-based viability assays to Figure 6. These assays indicate that Fer-1 treatment rescues the viability of PDAC cells treated with ferroptosis inducers, α-ESA, or the two in combination.

      Reviewer #3 (Recommendations for the authors):

      General notes:

      (1) It would be easier for the reader if one condition were consistently placed in the same position throughout the graphs. For example, RPMI results should always appear first and TIFM second. Currently, this is inconsistent throughout the manuscript (e.g., Figure 1 - Figure Supplement 1: RPMI is first and TIFM second; Figure 2 - Figure Supplement 1: TIFM is first and RPMI second).

      We thank the reviewers for this note. We have updated the figures to remain consistent in their ordering throughout the manuscript.

      (2) Please briefly explain the differences between PDAC1-3 and clarify why most follow-up experiments were conducted using PDAC1. Presumably, this was because PDAC1 showed the most robust effect on fatty acid synthesis.

      We have added additional text in the results section of the manuscript describing the different murine PDAC lines used in this study. We performed most studies with mPDAC1 as this line has robust differences in fatty acid synthesis between culture conditions. However, murine PDAC lines recapitulate the transcriptional subtype diversity of PDAC (PMID: 29364867), so we critically repeat key experiments in multiple mPDAC lines to determine if a given finding is translatable to other PDAC subtypes.

      (3) Are only SREBP1 protein levels affected or are SREBP1 RNA levels also decreased in low arginine TME?

      We appreciate this important comment. We have added SREBP1 RNA levels to Figure 1 to show that RNA levels do not differ between conditions, whereas protein levels of SREBP1 change significantly.

      (4) What was the rationale for investigating lipid metabolism even though it was not the top changed metabolic gene signature? It would be interesting to briefly discuss which pathways were the most enriched.

      We chose to focus on lipid metabolism as multiple transcriptomic analysis tools, namely GSEA and GATOM, which specifically focuses on enrichment in KEGG annotated metabolic pathways, highlighted lipid synthesis as being the most transcriptionally regulated metabolic pathway in TIFM. To make this apparent to readers, we added an additional Figure supplement to Figure 1 that includes a comprehensive list of up- and downregulated gene sets in PDAC cells cultured in TIFM from GSEA and GATOM analysis. We hope these additions will make the logic for our focus on lipid synthesis clear and will be useful for readers in highlighting other metabolic pathways that are significantly altered by nutrient stress, such as the TCA cycle and oxidative phosphorylation.

      Further comments:

      (1) Figure 1 Supplement 1A: It is not clear which SREBP target genes are significant. Please indicate this more clearly.

      The analysis in this section was performed on the expression levels of all the indicated genes between groups (tumor/normal), rather than testing the significance of individual genes between the two groups. We have updated both the text and the figure legend to clarify that this was the statistical analysis performed.

      (2) Figure 1J and 2C: The Western blot loading control (Actin) does not appear equal across all samples. It would be helpful to include a quantification normalized to the Actin loading control.

      We have included quantification of each western blot to help interpret these immunoblots.

      (3) Supplementary Figure 2: How often has this experiment been performed? The TIFM results appear to consistently show the same values. If this is the case, it needs to be labeled appropriately.

      Thank you for pointing out that our presentation of the data obscured how the experiment was performed. Initially, we performed multiple separate experiments to identify arginine starvation as the TIFM-driver of SREBP1 suppression. To compare across all the separate media conditions, we performed one experiment with all the relevant media conditions together, which is the experiment that is described in the manuscript. Thus, there was one set of control TIFM/RPMI conditions to which we compared all of the different media conditions. As we initially presented the data, it appeared as if we had performed multiple experiments in which the TIFM/RPMI controls had exactly the same behavior, which is not the case. We have updated the data presentation in this figure to make it clear that this was the experimental design for the data presented.

      (4) Figure 3P: Please add a legend for this panel.

      We thank the reviewers for pointing out this mistake in the figure and have updated it to correctly label these samples as sgSREBP1 and sgNTG transduced PDAC cell lines.

      (5) Figure 4 - Figure Supplement 1: Please review the legend carefully. The legend currently includes only circles, but some of the graphs (A and F) display squares.

      Thank you for catching this mistake. We have updated the panels and legends for this figure so they are concordant.

      (6) Figure 4D: The effect of a-ESA treatment on the doubling delta of arginine-treated versus non-treated TIFM cells looks similar. It looks like the difference is because cells treated with arginine start at higher doubling values from the beginning. I would suggest looking at the delta and subsequently tone down the statement: "Addition of arginine significantly decreases sensitivity to a-ESA."

      Thank you for this helpful comment. To avoid any confounding effects of differences in basal growth rate between mPDAC cells grown in different media, we have converted all of our data to GR values as described previously (PMID: 27135972), which enables us to take into account the basal growth rates of cultures when calculating the effects of treatments/perturbations on culture growth and viability. We hope this addition makes the effect that arginine has on α-ESA sensitivity clear beyond the impact that arginine has on basal growth rate.
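
      For readers unfamiliar with GR values, the following is a minimal illustrative sketch, not the authors' analysis code. It assumes the commonly used count-based definition of the growth-rate inhibition metric, in which the number of doublings in the treated culture is divided by the number of doublings in the untreated control. The function and argument names (`gr_value`, `x0`, `x_ctrl`, `x_treated`) and the example counts are hypothetical.

```python
import math


def gr_value(x0: float, x_ctrl: float, x_treated: float) -> float:
    """Growth-rate inhibition (GR) value from three cell counts.

    x0        -- count at the time of treatment (hypothetical input name)
    x_ctrl    -- count of the untreated control at the end of the assay
    x_treated -- count of the treated culture at the end of the assay
    """
    if min(x0, x_ctrl, x_treated) <= 0:
        raise ValueError("cell counts must be positive")
    if x_ctrl <= x0:
        raise ValueError("control culture must have grown during the assay")
    # Doublings completed during the assay, with and without treatment.
    doublings_treated = math.log2(x_treated / x0)
    doublings_ctrl = math.log2(x_ctrl / x0)
    # Normalizing by control doublings removes the basal growth rate.
    return 2 ** (doublings_treated / doublings_ctrl) - 1


# Made-up example: both treated cultures end at half of their control count,
# yet the GR values differ because the basal growth rates differ.
fast = gr_value(x0=100, x_ctrl=800, x_treated=400)  # control doubles 3 times
slow = gr_value(x0=100, x_ctrl=200, x_treated=100)  # control doubles once
```

      Under this definition, a GR value of 1 indicates no effect, 0 indicates complete cytostasis, and negative values indicate net cell loss. In the made-up example, the slowly growing culture is fully arrested (GR of 0) whereas the fast-growing culture is only partially inhibited, even though the treated-to-control count ratio is 0.5 in both cases.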

      In addition, we measured the viability of α-ESA treated mPDAC cells with and without supplemental arginine (current Fig. 5E) by Sytox-exclusion assay. We believe these new data support the claim that arginine makes PDAC cells resistant to the addition of exogenous PUFAs.

    1. Author response:

      We appreciate the constructive feedback from the reviewers and are currently working diligently to address all concerns raised in both the public reviews and the recommendations for the authors. Below, we outline the revisions planned for the revised manuscript.

      (1) We acknowledge the limitations of the current modeling framework regarding spatial integration, and we agree that the present model does not account for the short lifetime of the dot stimuli.

      For spatial integration, our current data suggest a relatively narrow, center-weighted integration function in zebrafish, compared to a broader integration function in medaka. While incorporating such spatial weighting into the model would improve its completeness, we do not expect it to substantially alter our current interpretation of the underlying mechanisms.

      Regarding the responses to short-lifetime dot stimuli, we hypothesize that medaka may possess local retinal receptive units that function as low-pass filters, as illustrated schematically in Figure 3e. At present, however, we believe that explicitly modeling this component would remain largely uninformative and would not substantially increase the explanatory power of the model.

      In the revised manuscript, we will discuss these limitations and the possible neural implementations more explicitly in the Discussion section.

      (2) We appreciate the reviewer’s comments regarding the clarity of data presentation and statistical descriptions.

      In the revised manuscript, we will improve the clarity of the figures and legends and provide more explicit explanations of the statistical analyses and summary metrics used throughout the study. We will also revise several sections of the text to improve the framing and interpretation of the results.

    1. Author response:

      We thank the editors and reviewers for their constructive feedback on our manuscript. We accept the reviewers' recommendations and will implement them fully in our revised manuscript and include all of the suggested literature references. Below, we highlight several key points raised during the evaluation and outline exactly how we will address them. We will also explicitly address every other point and minor recommendation raised by the reviewers in our final, comprehensive point-by-point response.

      Population-level quantification and statistical thresholds: The reviewers noted that our manuscript relied on single-neuron examples without fully demonstrating how widespread these patterns are across the recorded population. To address this, we will add population-level quantification across the recorded units using standard False Discovery Rate (FDR) corrections for multiple comparisons. We will include summary tables in the text and add statistical threshold lines to the distribution figures to report the proportion of significant neurons per region.
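
      As an illustration of the kind of correction described above, the following minimal sketch applies the Benjamini-Hochberg step-up procedure, one standard FDR method, to a list of per-neuron p-values. It is an assumption that this particular FDR procedure is the one intended; the function name and the p-values are hypothetical and not taken from the study.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans marking which tests are significant
    under the Benjamini-Hochberg FDR procedure at level alpha."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Reject every hypothesis ranked at or below max_k (step-up rule).
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            significant[idx] = True
    return significant


# Hypothetical p-values, one per recorded neuron in a region.
p_per_neuron = [0.001, 0.008, 0.039, 0.041, 0.042,
                0.060, 0.074, 0.205, 0.212, 0.216]
flags = benjamini_hochberg(p_per_neuron, alpha=0.05)
proportion_significant = sum(flags) / len(flags)
```

      The proportion of flagged units per region is the quantity that would be reported in a summary table, and the largest rejected p-value gives the threshold line that could be drawn on a distribution figure.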

      Identifying amodal neurons: Reviewers raised concerns that our classification of amodal language neurons required a more direct test. We will provide additional measures of modality and, in particular, we will implement a cross-modal generalization analysis where our encoding models are trained on one modality (e.g., listening) and evaluated on the other (e.g., reading). This additional procedure will classify neurons as amodal if their cross-modal predictive performance exceeds a baseline null model.
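
      The logic of such a cross-modal generalization test can be sketched as follows. This is a deliberately simplified toy, not the planned analysis: it assumes a single stimulus feature and an ordinary least-squares line in place of a full encoding model, and it uses a label-shuffling permutation test as the null model. All function names and data are hypothetical.

```python
import random


def fit_line(x, y):
    """Least-squares slope and intercept for one predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("predictor has no variance")
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    if saa == 0 or sbb == 0:
        return 0.0
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (saa * sbb) ** 0.5


def cross_modal_score(train_x, train_y, test_x, test_y):
    """Fit on one modality, then correlate predictions with the other."""
    slope, intercept = fit_line(train_x, train_y)
    predicted = [slope * v + intercept for v in test_x]
    return pearson(predicted, test_y)


def permutation_p_value(train_x, train_y, test_x, test_y, n_perm=200, seed=0):
    """Share of shuffled-response scores at least as high as the observed one."""
    rng = random.Random(seed)
    observed = cross_modal_score(train_x, train_y, test_x, test_y)
    shuffled = list(test_y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if cross_modal_score(train_x, train_y, test_x, shuffled) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


# Toy neuron whose firing follows the same feature in both modalities.
listen_x = [0, 1, 2, 3, 4, 5, 6, 7]
listen_y = [2 * v + 1 for v in listen_x]
read_x = [3, 0, 7, 1, 6, 2, 5, 4]
read_y = [2 * v + 1 for v in read_x]
score = cross_modal_score(listen_x, listen_y, read_x, read_y)
p_value = permutation_p_value(listen_x, listen_y, read_x, read_y)
```

      In this sketch a neuron would be labeled amodal when its cross-modal score exceeds what the shuffled null produces; the same test would normally be run in both directions (listening to reading and reading to listening).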

      Isolating linguistic features from sensory confounds: A point was raised regarding whether some neurons were tracking low-level sensory properties (like sound amplitude or visual text size) rather than language features. We will address this by running encoding analyses that include additional basic acoustic envelopes and visual baseline properties as control variables. This will allow us to evaluate the unique variance explained by linguistic features after accounting for these low-level sensory baselines.

      Evaluating the "Compositional Code" in the Fusiform Gyrus: Reviewers pointed out that our claim regarding a "compositional code" (neurons tracking a combination of letter identity and position) was supported primarily by individual examples. To provide population-level context, we will perform a model comparison across our fusiform gyrus neurons. We will compare a baseline letter-only model against a model that includes letter-by-position interactions to report how many neurons statistically support this compositional structure.

      TRF Feature and procedure explanation: Reviewers requested clarification on the construction of our TRF features. We will update the Methods section to explicitly detail how the features were constructed for both modalities. We will also include a feature correlation matrix in the Supplementary Materials. Furthermore, in order to contrast low-level possible confounds and high-level linguistic features, we will also conduct a control analysis tracking, e.g., specific affixes across different structural roles – for example, comparing how neurons respond to the phoneme /-s/ when it functions as a plural number marker versus when it appears as part of a lexical item (e.g., pass) or a third-person verb agreement. We will conduct such analyses in addition to fitting the main TRF models with these additional confounds included, ensuring a clear dissociation between high and low-level features.

    1. Author response:

      Reviewer #1 (Public Review): 

      The medial reticular formation (MRF) in the brainstem has long been implicated in the regulation of locomotion. One common - albeit very simple - model often presents the MRF as a major relay station receiving inputs from MLR circuits, among other brain regions, that together convey locomotor signals through efferent projections targeting the caudal brainstem and the spinal cord. Yet, the MRF is a particularly large brain area whose cellular complexity is far from understood. How molecularly distinct MRF ensembles contribute to the regulation of locomotor behaviors is largely unknown. Here, the authors apply focal activation of either glutamatergic, GABAergic, or serotonergic neurons throughout the MRF using a chemogenetic gain-of-function approach to uncover the putative modulatory properties of these neuronal ensembles during walking. Using kinematic analysis of mouse limbs during self-paced over-ground walkway locomotion, the authors find that activation of GABAergic MRF neurons can selectively slow down walking, whereas activation of glutamatergic neurons can induce a specific "shuffle" limb trajectory, altogether revealing that distinct MRF populations may retain the capability to engage divergent walking signatures, whose behavioral relevance is not yet clear. In contrast, the activation of serotonergic neurons did not affect walking signatures as described for the other two subgroups but led to an increase of locomotor speed. Interestingly, MRF neurons in each of the regional activation "hotspots" appear to target different domains in the lumbar spinal cord, suggesting that distinct circuit mechanisms are at play for the slowmo vs shuffle effects.

      Major points: 

      (1) While the experiments are carefully done and the results are well analyzed and clearly presented in a series of beautiful figures, several aspects of the methodology remain very confusing. 

      A) In particular, the initial choice for the injection coordinates is not justified and the authors don't leverage the mapping of spinal projection neurons to drive their chemogenetic screen. 

      Thank you for pointing this out. To clarify this, we now start the results with an extra paragraph and accompanying figures (Figure 2 and its supplementary figures) in which we define the region of interest (ROI) within the mRF. The ROI is based upon the distribution of reticulospinal neurons in the brainstem mRF that connect directly with the lumbosacral enlargement (whether or not this ROI projects to other CNS sites), which contains the main networks important for hindlimb control during locomotion, including walking gait. Reticulospinal neurons in the mRF in the caudal pons and medulla oblongata form longitudinal columns that together occupy up to more than half of the entire brainstem. While the morphology of the medulla and caudal pons varies little from level to level, in contrast to rapid changes at the midbrain level, this doesn’t necessarily mean that the neuronal populations, even within neurotransmitter classes, are homogeneous in connectivity and function. We have now clearly denoted the rostrocaudally extensive field with its dorsoventral and mediolateral dimensions that comprises the anatomical region of interest in the new figure. While this dataset is rather basic, it allows us to directly refer back to it and clarify additional queries that came up related to the anatomy (i.e. that the hotspots for slomo- and shuffle-like gaits only cover a small portion of the reticulospinal field).

      We then included detailed anatomical mapping of the spinal projections for the identified hotspots for changes in walking quality (phenomenology), the central theme of the study, and immediately adjacent regions to highlight contrasting location-connectivity-functional properties between these adjacent sites. To better incorporate these mapping results, we now present them directly after the walking-function-based transfection site mapping, but before delving into the details of the walking gait phenotypes. We did not systematically include mapping results from all sites in the mRF ROI in this manuscript, as this was beyond the scope of this already very large functional-anatomical study.

      B) Similarly, the authors group very different injection schemes (unilateral or bilateral targeting of MRF neurons), that should be analyzed separately. 

      We now clarify early in the results section how uni- and bilateral groups were composed and what the rationale was for this. As pilot data suggested that the slomo gait style was only seen following bilateral activation in VGaT-cre mice, but not in all bilateral cases, we designed the VGaT cohort to contain mainly bilateral injections, spread across the mRF region of interest, with a smaller group of unilateral injections to verify the pilot data. 

      For the shuffle gait style, pilot data suggested that both uni- and bilateral activation of VGluT2 neurons could elicit this style, but only in a subset of uni- and bilateral cases. Therefore we mainly included unilateral injections in this group with a smaller bilateral cohort for verification.  This approach served the main goal of the study, which was to map the walking style changes to subregions in the mRF.

      However, laterality is indeed very important when it comes to locomotor control. The effects of laterality on the walking gait styles generated from the hotspots were included in supplemental figures and accompanying Tables. We have now better highlighted these in the body of the text and we have added analyses of the motor tests for uni- or bilateral groups. 

      Furthermore, it should be noted that the uni- and bilateral groups are heterogeneous when it comes to rostrocaudal and dorsoventral placement within the mRF ROI. As such, we were not able to rigorously compare uni- versus bilateral activation effects while at the same time separating cases out by dorsoventral and rostrocaudal location (which would be needed to do justice to the functional anatomical organization of the mRF) as we do not have sufficient power in each of the subgroups (i.e. 3 rostrocaudal levels, with each a dorsal, intermediate and ventral region to target, which each would have to be injected unilaterally and bilaterally). This was beyond the scope of this already very large study. Further studies designed to balance ipsi- and contralateral groups will be necessary to map out the hotspots for mobility phenotypes that may be driven by the mRF beyond the slomo- and shuffle-hotspots or to systematically study the impact of laterality on mobility from the mRF.  

      To summarize, analyses of uni- vs bilateral stimulation demonstrate that bilateral inhibition within the slomo hotspot is necessary to create the slomo walking phenotype, and that unilateral inhibition within the shuffle hotspot is sufficient to create the shuffle walking phenotype (with bilateral stimulation not enhancing the phenotype further). Unilateral activation of the slomo hotspot did not induce asymmetries in gait or a reduction in motor performance, whereas unilateral activation of the shuffle hotspot induced an asymmetry in swing time but not stride length, with laterality affecting horizontal ladder but not other motor tests. Mice with transfection sites within the mRF region of interest but outside of the slomo and shuffle hotspots did not display these walking phenotypes but did display slowed walking without qualitative changes. The connectivity to spinal and other supraspinal substrates differed between these sites, providing clues for the substrates that mediate these differential functions.

      C) The choice of Z score cutoff that dictates the in-depth analysis of the chemogenetic phenotypes appears arbitrary and is not grounded in a set of objective criteria. 

      We are sorry that the Z score cutoff appeared arbitrary as that was not our intention. 

      The values to separate mice with and without a significant change were simply set at 2 standard deviations from the population mean in the control mice (i.e. Z=2). Two standard deviations from the population mean is a widely used threshold in statistical analyses. We have now included the rationale for the cutoff of Z=2 in the text. Where group size allowed, to increase contrast between positive and negative groups in terms of gait characteristics, other behavioral assays and mapping, we used data from Z scores >3 (or < -3), but we can assure the reviewer that all moderately positive data (i.e. from mice with gait style Z scores between 2 and 3, and between -3 and -2) were reported as well in the statistical tables or supplementary figures. We have now included the links to these supplementary tables and figures in the text, rather than only in the figure legends.
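
      The cutoff logic described above can be sketched as follows. This is an illustrative example only: the gait values are made up, the function names are hypothetical, the use of the sample standard deviation of the control group is an assumption, and so is the treatment of values falling exactly on a cutoff.

```python
import statistics


def z_scores(control_values, test_values):
    """Z score of each test value relative to the control distribution."""
    mean = statistics.mean(control_values)
    sd = statistics.stdev(control_values)  # sample SD (assumption)
    if sd == 0:
        raise ValueError("control values have no variance")
    return [(v - mean) / sd for v in test_values]


def classify(z, moderate_cutoff=2.0, strong_cutoff=3.0):
    """Label a Z score using the two cutoffs mentioned in the text."""
    if abs(z) > strong_cutoff:
        return "strong"
    if abs(z) >= moderate_cutoff:
        return "moderate"
    return "none"


# Made-up gait metric values (arbitrary units).
control = [10, 12, 11, 9, 13]   # control mice
treated = [11, 15, 6]           # mice after chemogenetic activation
z = z_scores(control, treated)
labels = [classify(value) for value in z]
```

      With these made-up numbers, the first treated mouse lies at the control mean, the second falls between 2 and 3 standard deviations above it, and the third lies more than 3 standard deviations below it, so the three mice are labeled "none", "moderate" and "strong", respectively.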

      The Z scores for the different gait styles indeed appear to map to discrete sites, but the Z score cutoff was not informed by these sites or by anatomical data. Similarly, Z scores for changes in tonic muscle activity elicited by activation of inhibitory neurons also mapped to a hotspot in the same rostrocaudal column as the slomo gait style, but further caudally. This further demonstrates the strength of function-based mapping. 

      (2) One issue that arises from the work presented here is that we don't know if these MRF neurons are active during locomotion in normal, unperturbed conditions. Knowing the recruitment profile of these MRF neurons would clarify whether the chemogenetic activation boosts the firing of neurons that are already active during walking, or activates neurons that are otherwise silent. Disentangling these possibilities may have a profound impact on the overall interpretation of the results.

      We agree that this knowledge would improve our ability to interpret and apply the findings of the current study. It is indeed important to learn when these mRF sites are being recruited, whether part of normal modulatory strategies in order to navigate through a complex environment or as part of specialized behavioral modules or both.  Another question is how loss of function in these sites impacts behavior and function. This concept has been added to the discussion and these questions can now be pursued in future experiments. 

      (3) The results should be discussed in the broader context of historic stimulation experiments, notably in cats and other species, as well as more recent circuit mapping approaches in rodents. For instance, the notion that focal stimulation of distinct areas within the MRF can elicit or modify the pattern of locomotion is not really new, nor is the notion that some of these modulations are phase-specific and can influence the duration of single muscle activation during stance or swing phases. This last point has for instance already been assessed through individual muscle recordings paired with MRF stimulation in cats. Perhaps better introducing these key studies and a thorough discussion of what the results presented in this manuscript bring in terms of novelty will help readers ground this work into a more comprehensive and larger body of work.

      There is indeed a rich series of meticulous work done in cats, which included effects from stimulation of inhibitory and excitatory neurons on limb EMG, and rodent work focusing on excitatory mRF neurons. These studies show that distinct neurons or sites within the mRF drive distinct changes in motor readouts, albeit not described in terms of modulation of walking gait as we do here in terms of gait signatures. Despite this solid body of prior work, the notion of phase specificity and separate modulation of swing versus stance phase metrics has been underappreciated and therefore deserves to be emphasized. We have expanded the discussion to better highlight prior work and the interpretation of phase specificity has been enriched.  

      Reviewer #2 (Public Review): 

      This paper is an interesting conceptual work where certain hotspot areas were found to induce unique gait patterns. These patterns differed from a classic change in speed or gait pattern from a walk to a gallop. From this, a hypothesis was formed that these areas could be important for possible alternative walking patterns seen, for example, during pathologies such as Parkinson's disease or perhaps related to stalking behaviors. 

      While I liked the work and found it interesting, it remains descriptive in that the actual behaviors observed can't be causally related to a particular behavior such as stalking or shuffling. If the necessity or sufficiency of this region was related to a specific hunting behavior, for example, its interest to the field would be greater. 

      Nevertheless, this paper does contribute to growing evidence that specific behaviors can be triggered by specific neuronal populations within the brainstem. 

      We thank the reviewer for their thoughtful comments. We agree that more studies are necessary to understand how the slomo and shuffle hotspots serve behavioral repertoires (such as stalking or other internally driven activities) and adaptations (such as object avoidance or more subtle adjustments to terrain or internal cues). The experimental details of the present study leave ample leads for the research community to pursue these new directions.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1.

      We appreciate the constructive comments, which greatly improved this manuscript.

      Reviewer #2.

      We appreciate Reviewer #2's thorough analysis of our manuscript. However, we are concerned that the reviewer criticized a conclusion different from the one we claim in the manuscript. Although Reviewer #2's public comment stated, "Such an approach is insufficient to unequivocally support the central claim that DNA methylation increases accessibility of H2A.Z-containing nucleosomes", we did not draw such a bold conclusion. In the Abstract, we cautiously described that the impact of DNA methylation we observed was subtle and based on satellite II-derived DNA sequences. We made a nuanced proposal regarding this observation, stating, "Altogether, we propose that SRCAP drives the biased association of H2A.Z to unmethylated DNA, while additional mechanisms, potentially taking advantage of the subtle DNA methylation-induced physical effects, further assist the exclusion of H2A.Z from methylated DNA". We believe our analysis will contribute valuable insights into the mechanistic basis behind the antagonism between DNA methylation and H2A.Z.

      Reviewer #3.

      We appreciate the constructive comments, which greatly improved this manuscript.


      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study provides valuable mechanistic insight into the mutually exclusive distributions of the histone variant H2A.Z and DNA methylation by testing two hypotheses: (i) that DNA methylation destabilizes H2A.Z nucleosomes, thereby preventing H2A.Z retention, and (ii) that DNA methylation suppresses H2A.Z deposition by ATP-dependent chromatin remodeling complexes. Through a series of well-designed and carefully executed experiments, findings are presented in support of both hypotheses. However, the evidence in support of either hypothesis is incomplete, so that the proposed mechanisms underlying the enrichment of H2A.Z on unmethylated DNA remain somewhat speculative.

      We would like to thank the editor and reviewers for their critical assessments of our manuscript. While we do acknowledge the limitations of our work, we believe that our results provide important mechanistic insights into the long-standing question of how H2A.Z is preferentially enriched in hypomethylated genomic DNA regions. First, our structural and biochemical data suggest that DNA methylation increases the openness and physical accessibility of H2A.Z, albeit the effect is relatively subtle and is sequence-dependent. Second, using Xenopus egg extracts and synthetic DNA templates, we provide the first clear and direct evidence that DNA methylation-sensitive H2A.Z deposition is due to the H2A.Z chaperone SRCAP-C, corroborated by our discovery that SRCAP-C binding to DNA is suppressed by DNA methylation. Although the molecular details by which DNA methylation inhibits binding of SRCAP-C are an important area of future study, in our current manuscript, we do provide evidence that directly links the presence of SRCAP-C to the establishment of the DNA methylation/H2A.Z antagonism in a physiological system. Thanks to criticisms by the reviewers, we realized that we did not clearly state in our Abstract that the impact of DNA methylation on intrinsic H2A.Z nucleosome stability is relatively subtle, although we did explain these observations and limitations in the main text. In our revised manuscript, we are willing to edit the text to better address the criticisms raised by the reviewers.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors considered the mechanism underlying previous observations that H2A.Z is preferentially excluded from methylated DNA regions. They considered two non-mutually exclusive mechanisms. First, they tested the hypothesis that nucleosomes containing both methylated DNA and H2A.Z might be intrinsically unstable due to their structural features. Second, they explored the possibility that DNA methylation might impede SRCAP-C from efficiently depositing H2A.Z onto these DNA methylated regions.

      Their structural analyses revealed subtle differences between H2A.Z-containing nucleosomes assembled on methylated versus unmethylated DNA. To test the second hypothesis, the authors allowed H2A.Z assembly on sperm chromatin in Xenopus egg extracts and mapped both H2A.Z localization and DNA methylation in this transcriptionally inactive system. They compared these data with corresponding maps from a transcriptionally active Xenopus fibroblast cell line. This comparison confirmed the preferential deposition or enrichment of H2A.Z on unmethylated DNA regions, an effect that was much more pronounced in the fibroblast genome than in sperm chromatin. Furthermore, nucleosome assembly on methylated versus unmethylated DNA, along with SRCAP-C depletion from Xenopus egg extracts, provided a means to test whether SRCAP-C contributes to the preferential loading of H2A.Z onto unmethylated DNA.

      Strengths:

      The strength and originality of this work lie in its focused attempt to dissect the unexplained observation that H2A.Z is excluded from methylated genomic regions.

      Weaknesses:

      The study has two weaknesses. First, although the authors identify specific structural effects of DNA methylation on H2A.Z-containing nucleosomes, they do not provide evidence demonstrating that these structural differences lead to altered histone dynamics or nucleosome instability. Second, building on the elegant work of Berta and colleagues (cited in the manuscript), the authors implicate SRCAP-C in the selective deposition of H2A.Z at unmethylated regions. Yet the role of SRCAP-C appears only partial, and the study does not address how the structural or molecular consequences of DNA methylation prevent efficient H2A.Z deposition. Finally, additional plausible mechanisms beyond the two scenarios the authors considered are not investigated or discussed in the manuscript.

      Although we acknowledge the limitations of our study and are willing to expand our discussion to more thoroughly discuss these points, we believe our manuscript provides several important mechanistic insights which this reviewer may not have fully appreciated.

      Our first conclusion that H2A.Z nucleosomes on methylated DNA are more open and accessible compared to their unmethylated counterparts is supported by both our cryo-EM study and the restriction enzyme accessibility assay. Although the physical effect of DNA methylation is relatively subtle and is likely sequence dependent, as we clearly noted within the manuscript, the difference does exist and is valuable information for the chromatin field at large to consider.

      The second major conclusion of our manuscript is that SRCAP-C exhibits preferential binding to unmethylated DNA over methylated DNA, and that SRCAP-C represents the major mechanism that can explain the biased deposition of H2A.Z to unmethylated DNA in Xenopus egg extracts. Furthermore, our experiments using Xenopus egg extract clearly demonstrated that H2A.Z is deposited by both DNA methylation-sensitive and -insensitive mechanisms. Depletion of SRCAP-C almost completely eliminated DNA methylation-sensitive H2A.Z deposition and reduced the total level of H2A.Z on chromatin to less than half of that seen in non-depleted extract. This result demonstrated that DNA methylation-sensitive H2A.Z loading is primarily regulated by SRCAP-C, at least in our experimental context where transcription, replication, and other epigenetic modifications are not involved. It is likely that additional mechanisms contribute as well, as implied by our sequencing experiments, particularly at regions with active transcription, and we have noted these possibilities and the rationale for their existence in the Discussion.

      Our study also suggests that a SRCAP-independent, DNA methylation-insensitive mechanism of H2A.Z loading exists, which we suspect to be mediated by Tip60-C. In line with this possibility, our data suggest that Tip60-C binds DNA in a DNA methylation-insensitive manner in Xenopus egg extract. Since antibodies to deplete Tip60-C from Xenopus egg extract are currently unavailable, we were unable to directly test that hypothesis and decided not to include Tip60-C in our final model, as we lacked experimental evidence for its role. However, whether or not Tip60-C is the complex responsible for the DNA methylation-insensitive pathway does not influence our final conclusion that SRCAP-C plays a major role in DNA methylation-sensitive H2A.Z loading. We are planning to edit our manuscript to more comprehensively discuss these points.

      Please note that while Berta et al. reported that DNA methylation increases at H2A.Z loci in tumors defective in SRCAP-C, they selected those regions based on where H2A.Z is typically enriched within normal tissues (Berta et al., 2021). They did not show data indicating whether H2A.Z is still retained specifically at those analyzed loci upon mutation of SRCAP-C subunits. Thus, although we greatly admire their work and are pleased that many of our findings align with theirs, their paper did not directly address whether SRCAP-C itself differentiates between DNA methylation states, nor the impact that this has on H2A.Z and DNA methylation colocalization. In contrast, our Xenopus egg extract system, where de novo methylation is undetectable (Nishiyama et al., 2013; Wassing et al., 2024), offers a unique opportunity to examine the direct impact of DNA methylation on H2A.Z deposition using controlled synthetic DNA substrates. Together with our demonstration that DNA binding of SRCAP-C is suppressed by DNA methylation, we believe that our manuscript provides a specific mechanism that can explain the preferential deposition of H2A.Z at hypomethylated genomic regions.

      Reviewer #2 (Public review):

      This manuscript aims to elucidate the mechanistic basis for the long-standing observation that DNA methylation and the histone variant H2A.Z occupy mutually exclusive genomic regions. The authors test two hypotheses: (i) that DNA methylation intrinsically destabilizes H2A.Z nucleosomes, thereby preventing H2A.Z retention, and (ii) that DNA methylation suppresses H2A.Z deposition by ATP-dependent chromatin-remodelling complexes. However, neither hypothesis is rigorously addressed. There are experimental caveats, issues with data interpretation, and conclusions that are not supported by the data. Substantial revision and additional experiments, including controls, would be required before mechanistic conclusions can be drawn. Major concerns are as follows:

      We appreciate the critical assessment of our manuscript by this reviewer. Although we acknowledge the limitations of our study and will revise the manuscript to better describe them, we would like to respectfully argue against the statement that our "conclusions […] are not supported by the data".

      (1) The cryo-EM structure of methylated H2A.Z nucleosomes is insufficiently resolved to address the central mechanistic question: where the methylated CpGs are located relative to DNA-histone contact points and how these modifications influence H2A.Z nucleosome structure. The structure provides no mechanistic insights into methylation-induced destabilization.

      The fact that the DNA resolution in the methylated structure was not high enough to resolve the positions of methylated CpGs despite a high overall resolution of 2.78 Å implies that 1) the Sat2R-P DNA was not as stably registered as the 601L sequence, requiring us to create two alternative Sat2R-P atomic models to account for the variable positioning in our samples, and 2) that the presence of DNA methylation increases that positional variability. We understand that one may prefer to see highly resolved density around each methylation mark, but we do believe that our inability to accomplish that is actually a feature rather than a weakness and has important biological implications. The decrease in local DNA resolution on the methylated Sat2R-P structure compared to its unmethylated counterpart is meaningful and suggests to us that DNA methylation weakens overall DNA wrapping and positioning on the nucleosome, supported by the increased flexibility seen at the linker DNA ends as well as an increase in the population of highly shifted nucleosomes amongst the methylated particles. Additionally, one major view in the DNA methylation/nucleosome stability field is that the presence of DNA methylation can make DNA stiffer and harder to bend, causing opening and destabilization of nucleosomes (Ngo et al., 2016). The increased opening of linker DNA ends and accessibility of methylated H2A.Z nucleosomes in our hands also aligns with such an idea, again suggesting decreased histone-DNA contact stability on methylated DNA substrates. We plan to revise the writing in our manuscript to better reflect these ideas.

      The experimental system also lacks physiological relevance. The template DNA sequence is artificial, despite the existence of well-characterised native genomic sequences for which DNA methylation is known to inhibit H2A.Z incorporation. Alternatively, there are a number of studies examining the effect of DNA methylation on nucleosome structure, stability, DNA unwrapping, and positioning. Choosing one of these DNA sequences would have at least allowed a direct comparison with a canonical nucleosome. Indeed, a major omission is the absence of a cryo-EM structure of a canonical nucleosome assembled on the same DNA template - this is essential to assess whether the observed effects are H2A.Z-specific.

      The reviewer raises a fair question about whether canonical H2A would experience the same DNA methylation-dependent structural effects. We had considered solving the H2A structures but ultimately decided against it for a few reasons. First, there already exist crystal structures of canonical H2A nucleosomes using a DNA sequence highly similar to our Sat2R-P with and without the presence of DNA methylation (PDB: 5CPI and 5CPJ). The authors of this study did not see any physical differences present in their structures (Osakabe et al., 2015). Additionally, we had included canonical H2A conditions within our restriction enzyme accessibility assay and did not see a significant impact of DNA methylation on those samples (Fig 3). Because of the previous report and our own negative data, we expected that only limited additional insights would be obtained from the canonical H2A structures and decided not to pursue that analysis.

      One of the primary reasons we chose the Sat2R-P sequence was, as noted above, that there already was a published study examining how DNA methylation affects nucleosome structure using a variant of this sequence which we could compare to our results, as the reviewer has suggested. We did have to modify the sequence, namely by making it palindromic, in order to increase the final achievable resolution. We viewed the Sat2R-P sequence as an attractive candidate because it is physiologically relevant; the initial sequence was taken directly from human satellite II. Several modifications were made for technical reasons, including making the sequence palindromic as described above and also ensuring that each CpG is recognizable by a methylation-sensitive restriction enzyme so that we could be certain about the degree of methylation on our substrates. These practical concerns outweighed the necessity of maintaining a strict physiological sequence to us. However, we still believe the final Sat2R-P more closely mimics physiological sequences than Widom 601. Additionally, human satellite II is a highly abundant sequence in the human genome that is known to undergo large methylation changes on the onset of many disorders, like cancer, as well as during aging. Thus, there are interesting biological questions surrounding how the methylation state of this particular sequence affects chromatin structure.

      Furthermore, it has been reported that satellite II is devoid of H2A.Z (Capurso et al., 2012). Beyond those reasons, the satellite II sequence is generally interesting to our lab because we have been studying genes involved in ICF syndrome, where hypomethylation of satellite II sequences forms one of the hallmarks of this disorder (Funabiki et al., 2023; Jenness et al., 2018; Wassing et al., 2024). We understand that sequence context plays a large role in nucleosome wrapping and stability. This is why we strived to test multiple sequences in each of our assays. We do agree that it would be interesting to use DNA sequences where H2A.Z binding has already been described to be affected in a DNA methylation-dependent manner, forming an exciting future study to pursue.

      Furthermore, the DNA template is methylated at numerous random CpG sites. The authors' argument that only the global methylation level is relevant is inconsistent with the literature, which clearly demonstrates that methylation effects on canonical nucleosomes are position-dependent. Not all CpG sites contribute equally to nucleosome stability or unwrapping, and this critical factor is not considered.

      We did not argue that only the global methylation level is relevant. We also would appreciate it if the reviewer could provide specific references that "clearly demonstrates that methylation effects on canonical nucleosomes are position-dependent". We are aware of a series of studies conducted by Chongli Yuan's group, including one testing the effect of placing methylated CpGs at different positions along the Widom 601 sequence. In that study (Jimenez-Useche et al., 2013), they did find that positioning of mCpGs has differential impacts on the salt resistance of the nucleosomes, with 5 tandem mCpG copies at the dyad causing the most dramatic nucleosome opening whereas having mCpGs only at the DNA major grooves, but not elsewhere, increased nucleosome stability. However, they also found that methylation of the original Widom 601 sequence caused destabilization, albeit to a lesser degree, and another study by the same group (Jimenez-Useche et al., 2014) found that CpG methylation decreased nucleosome-forming ability for all tested variants of the Widom 601 sequence, regardless of CpG density or positioning.

      Other studies monitored how distribution of methylated CpGs correlates with nucleosome positioning (Collings et al., 2013; Davey et al., 1997; Davey et al., 2004). However, these studies assessed the sequence-dependent effects specifically on nucleosome assembly during in vitro salt dialysis, which is a different physical process than the one our manuscript focuses on, especially when considering the fact that H2A.Z is deposited onto preassembled H2A-nucleosome. Our cryo-EM analysis examines the structural changes induced by DNA methylation on already formed nucleosomes rather than the process of formation. Thus, probing accessibility changes using a restriction enzyme was the more appropriate biochemical assay to verify our structures.

      We do very much agree that DNA context can influence nucleosome stability under different conditions. A study of molecular dynamics simulations concluded that the "combination of overall DNA geometrical and shape properties upon methylation" makes nucleosomes resistant to unwrapping (Li et al., 2022), while another modeling study suggests that DNA methylation impacts nucleosome stability in a manner dependent on DNA sequence, where "[s]trong binding is weakened and weak binding is strengthened" (Minary and Levitt, 2014). While G/C-dinucleotides are preferentially placed at major groove-inward positions in the nucleosomes in vivo (Chodavarapu et al., 2010; Segal et al., 2006) and G/C-rich segments are excluded from major groove-outward positions in Widom 601-like nucleosomes (Chua et al., 2012), methylated CpG dinucleotides are preferably, if not exclusively, located at major groove-outward positions in vivo. Mechanisms behind this biased mCpG positioning on the nucleosome remain speculative, likely caused by a combination of multiple factors, but the fact that we did not observe clear structural impacts using the Widom 601L sequence, where mCpGs are located at the major groove-outward and -inward positions (Chua et al., 2012, and our structure), deserves a space for discussion. On the other hand, positioning of mCpG on satellite II-derived sequences that we used in this study was based on a physiological sequence, and thus it may not be appropriate to say that those CpGs are placed at multiple "random" positions. Although we decided not to discuss the position of 5mC on our Sat2R nucleosome structure due to ambiguous base assignments, neither of our two atomic models is consistent with an idea that DNA methylation repositions the CpG to the outward major grooves. As the potential contribution of how DNA methylation affects the nucleosome structure via modulating DNA stiffness has been extensively studied (Choy et al., 2010; Li et al., 2022; Ngo et al., 2016; Perez et al., 2012), we believe that it is appropriate to consider overall DNA properties along the whole DNA sequence, though we are willing to discuss potential positional effects in the revised manuscript.

      Perhaps one of the most important points that we did not emphasize enough in our original manuscript was that, in contrast to the subtle intrinsic effect of DNA methylation that was DNA sequence dependent, we observed SRCAP-dependent preferential H2A.Z deposition to unmethylated DNA over methylated DNA in both 601 and satellite II DNAs. In the revised manuscript, we will make clearer the value of comparing 601 and satellite II DNAs with respect to these two distinct mechanisms.

      Finally, and most importantly, the reported increase in accessibility of the methylated H2A.Z nucleosome is negligible compared with the much larger intrinsic DNA accessibility of the unmethylated H2A.Z nucleosome. These data do not support the authors' hypothesis and contradict the manuscript's conclusions. Claims that methylated H2A.Z nucleosomes are "more open and accessible" must therefore be removed, and the title is misleading, given that no meaningful impact of DNA methylation on H2A.Z nucleosome stability is demonstrated.

      We respectfully disagree with this reviewer's criticism. We investigated the potential impact of DNA methylation on nucleosome stability to the best of our abilities through complementary assays and reported our observations. The effect of DNA methylation is smaller than the difference between H2A.Z and H2A, but we were able to see an effect. It is also not uncommon for small differences to have functional impacts in biological systems. We agree that further testing is required to determine whether this subtle effect is functionally important, and it remains the subject of future research due to the many technical challenges associated with addressing said question. We would like to note that 18 years have passed since Daniel Zilberman first reported the antagonistic relationship between H2A.Z and DNA methylation (Zilberman et al., 2008) but very few studies have since directly tested specific mechanistic hypotheses. We believe that our study lays the groundwork for exciting future investigation that better elucidates the pathways that contribute to this antagonism and will have meaningful impacts on the field in general. However, thanks to the reviewer's criticism, we realized that we did not clearly state in the Abstract the relatively subtle effect of DNA methylation on the intrinsic H2A.Z nucleosome stability. Therefore, we will revise the Abstract accordingly to make this point clearer.

      (2) The cryo-EM structures of methylated and unmethylated 601L H2A.Z nucleosomes show no detectable differences. As presented, this negative result adds little value. If anything, it reinforces the point that the positional context of CpG methylation is critical, which the manuscript does not consider.

      We believe the inclusion and factual reporting of negative data is important for the scientific community as one of the major issues currently in biology research is biased omission of negative data. We considered eLife as a venue to publish this work for this reason. We understand that the reviewer believes our 601L structures may detract from the overall message of our manuscript. We believe this data rather emphasizes the importance of DNA sequence context, something that the reviewer also rightfully notes. It is standard practice in the nucleosome field to use the Widom 601 sequence, along with its variants. Our experience has shown that use of an artificially strong positioning sequence may mask weaker physical effects that could play a physiological role. Thus, we were careful to validate all further assays with multiple DNA sequences and believed it important to report these sequence-dependent effects on nucleosome structure.

      (3) Very little H3 signal coincides with H2A.Z at TSSs in sperm pronuclei, yet this is neither explained nor discussed (Supplementary Figure 10D). The authors need to clarify this.

      Our H3 signal, which represents the global nucleosome population, is more broadly distributed across the genome than H2A.Z, which is known to localize at specific genomic sites. Since both histone types were sequenced to similar read depths, H3 peaks are generally shallower than H2A.Z and peak heights cannot be directly compared (i.e. they should be represented in separate appropriate data ranges).

      (4) In my view, the most conceptually important finding is that H2A.Z-associated reads in sperm pronuclei show ~43% CpG methylation. This directly contradicts the model of strict mutual exclusivity and suggests that the antagonism is context-dependent. Similarly, the finding that the depletion of SRCAP reduces H2A.Z deposition only on unmethylated templates is also very intriguing. Collectively, these results warrant further investigation (see below).

      (5) Given that H2A.Z is located at diverse genomic elements (e.g., enhancers, repressed gene bodies, promoters), the manuscript requires a more rigorous genomic annotation comparing H2A.Z occupancy in sperm pronuclei versus XTC-2 cells. The authors should stratify H2A.Z-DNA methylation relationships across promoters, 5′UTRs, exons, gene bodies, enhancers, etc., as described in Supplementary Figure 10A.

      We agree that the substantial presence of co-localized H2A.Z and DNA methylation specifically in the sperm pronuclei samples and the changes in pattern between nuclear types are highly interesting and require further investigation. However, we faced technical challenges in our sequencing experiments that made us refrain from conducting a more detailed analysis for fear of over-interpreting potential artifacts. These challenges mainly stemmed from the difficulties in collecting enough material from Xenopus egg extracts and Tn5’s innate bias towards accessible regions of the genome. Because of this, open regions of the genome tend to be overrepresented in our data (as noted in our Discussion), making it challenging to rigorously compare methylation profiles and H2A.Z/H3 associated genomic elements.

      While the degree of separation seems to be dependent on nuclei type, we still believe the antagonism exists in both the sperm pronuclei and XTC-2 samples when comparing H2A.Z methylation profiles to the corresponding H3 condition. Our study also demonstrates that H2A.Z is preferentially deposited to hypomethylated DNA in a manner dependent on SRCAP-C (the loss of SRCAP only reduces H2A.Z on unmethylated substrates) but that an additional methylation-insensitive H2A.Z deposition mechanism also exists. We realized that this interesting point was not clearly highlighted in the Abstract, so we will revise it accordingly.

      (6) Although H2A.Z accumulates less efficiently on exogenous methylated substrates in egg extract, substantial deposition still occurs (~50%). This observation directly challenges the strong antagonistic model described in the manuscript, yet the authors do not acknowledge or discuss it. Moreover, differences between unmethylated and methylated 601 DNA raise further questions about the biological relevance of the cryo-EM 601 structures.

      As depicted in Figure 6 and described in the Discussion, we clearly indicated that both methylation-sensitive and methylation-insensitive pathways exist to deposit H2A.Z within the genome. We also directly stated in our Discussion that a substantial proportion of H2A.Z colocalizes with DNA methylation both in our study as well as in previous reports, which is of major interest for future study. Additionally, we further discussed how the absence of transcription in Xenopus eggs is a likely reason for the more limited effect of DNA methylation restricting H2A.Z deposition in our egg extract system.

      As noted in our response to (2), the lack of a clear impact of DNA methylation on our 601L structures is likely due to the extraordinarily strong artificial nucleosome positioning capacity of the 601 sequence and its variants. Since 601 is heavily used in chromatin biology, including within DNA methylation research, such negative data are still useful to include and publish.

      (7) The SRCAP depletion is insufficiently validated i.e., the antibody-mediated depletion of SRCAP lacks quantitative verification. A minimum of three biological replicates with quantification is required to substantiate the claims.

      We are willing to address this concern. However, please note that our data showed that methylation-dependent H2A.Z deposition is almost completely erased upon SRCAP depletion, indicating functionally effective depletion. The specificity of the custom antibody against Xenopus SRCAP was verified by mass spectrometry. Additionally, we have obtained the same effect using another commercially available SRCAP antibody, though we did not include this preliminary result in our original manuscript. Due to its relatively low abundance and high molecular weight, SRCAP western blot signals are weak, making it challenging to quantify the degree of depletion. We also believe that the value of quantification in this context, with the points noted above, is rather limited. In the past, our lab has published papers on depleting the H3T3 kinase Haspin from Xenopus egg extracts (Ghenoiu et al., 2013; Kelly et al., 2010), but we were never able to detect Haspin via western blot. This protein was only detected by mass spectrometry specifically on nucleosome array beads with H3K9me3 (Jenness et al., 2018). However, depletion of Haspin was readily monitored by erasure of H3T3ph, the enzymatic product of Haspin. In these experiments, it was impossible, and not critical, to quantitatively monitor the depletion of Haspin protein in order to investigate its molecular functions. Similarly, in this current study, the important fact is that depletion of SRCAP suppressed methylation-sensitive H2A.Z deposition, and quantifying the degree of SRCAP depletion would not have a major impact on this conclusion.

      (8) It appears that the role of p400-Tip60 has been completely overlooked. This complex is the second major H2A.Z deposition complex. Because p400 exhibits DNA methylation-insensitive binding (Supplementary Figure 14), it may account for the deposition of H2A.Z onto methylated DNA. This possibility is highly significant and must be addressed by repeating the key experiments in Figure 5 following p400-Tip60 depletion.

      We are aware that the Tip60 complex is a very likely candidate for mediating DNA methylation-insensitive H2A.Z deposition, which is why we tested whether DNA binding of p400 is methylation sensitive. Therefore, the reviewer's statement that we "completely overlooked" Tip60-C’s role does not fairly report on our efforts. We wished to test the potential contribution of Tip60-C, but, unfortunately, the antibodies currently available to us were not successful in depleting the complex from egg extract. Since we had no direct experimental evidence indicating the role Tip60-C plays, we decided to take a conservative approach to our model and leave the methylation-insensitive pathway as mediated by something still unidentified. While further investigating Tip60-C’s contribution to this pathway is of definite value, we do not believe that it impacts our major conclusion that SRCAP-C is the main mediator responsible for H2A.Z deposition on unmethylated DNA, and it thus remains a subject for future study.

      (9) The manuscript repeatedly states that H2A.Z nucleosomes are intrinsically unstable; however, this is an oversimplification. Although some DNA unwrapping is observed, multiple studies show that H3/H4 tetramer-H2A.Z/H2B interactions are more stable (important recent studies include the following: DOI: 10.1038/s41594-021-00589-3; 10.1038/s41467-021-22688-x; and reviewed in 10.1038/s41576-024-00759-1).

      We understand that the H2A.Z stability field is highly controversial. We have introduced the many conflicting reports that have been published in the field but can further expand on the controversies if desired. We also understand that the term “nucleosome stability” is broad and encompasses many physical aspects. As noted in a prior response, we will better specify our use of the term within the manuscript. In our assays, we are most focused on the DNA wrapping stability of the nucleosome and have consistently seen in our hands that H2A.Z nucleosomes are much more open and accessible compared to canonical H2A on satellite II-derived sequences, regardless of methylation status. However, we do understand that many groups have observed the opposite findings while others have obtained results similar to us. We reported on our findings of the general H2A.Z stability with the hopes to help clarify some of the field’s controversies.

      In summary, the current manuscript does not present a convincing mechanistic explanation for the antagonism between DNA methylation and H2A.Z. The observation that H2A.Z can substantially coexist with DNA methylation in sperm pronuclei, perhaps, should be the conceptual focus.

      We appreciate this reviewer’s advice. However, please note that the first author who led this project has already successfully defended their PhD thesis primarily based on this project, making it impractical and unrealistic to completely change the focus of this manuscript to include an entirely new avenue of research. We believe that our data provide important insights into the mechanisms by which H2A.Z is excluded from methylated DNA, particularly via the DNA methylation-sensitive binding of SRCAP-C, which has never been described before. We agree that many questions are still left unanswered, including the exact molecular mechanism behind how DNA methylation prevents SRCAP-C binding. We have preliminary data that suggest none of the known DNA-binding modules of SRCAP-C, including ZNHIT1, by themselves can explain this sensitivity. This implies that domain dissection in the context of the holo-SRCAP complex is required to fully address this question. We believe this represents a very exciting future avenue of study; however, it does not negate our finding that SRCAP-C itself is important for maintaining the DNA methylation/H2A.Z antagonism. Therefore, we respectfully disagree with this reviewer's summary statement, which misleadingly undermines the impact of our work.

      Reviewer #3 (Public review):

      Summary:

      Histone variant H2A.Z is evolutionarily conserved among various species. The selective incorporation and removal of histone variants on the genome play crucial roles in regulating nuclear events, including transcription. Shih et al. aimed to address antagonistic mechanisms between histone variant H2A.Z deposition and DNA methylation. To this end, the authors reconstituted H2A.Z nucleosomes in vitro using methylated or unmethylated human satellite II DNA sequence and examined how DNA methylation affects H2A.Z nucleosome structure and dynamics. The cryo-EM analysis revealed that DNA methylation induces a more open conformation in H2A.Z nucleosomes. Consistent with this, their biochemical assays showed that DNA methylation subtly increases restriction enzyme accessibility in H2A.Z nucleosomes compared with canonical H2A nucleosomes. The authors identified genome-wide profiles of H2A.Z and DNA methylation using genomic assays and found that their distributions differ between Xenopus sperm pronuclei and fibroblast cells. Using Xenopus egg extract systems, the authors showed that the SRCAP complex, the chromatin remodeler for H2A.Z deposition, preferentially deposits H2A.Z on unmethylated DNA.

      Strengths:

      The study is solid, and most conclusions are well-supported. The experiments are rigorously performed, and interpretations are clear. The study presents a high-resolution cryo-EM structure of human H2A.Z nucleosome with methylated DNA. The discovery that the SRCAP complex senses DNA methylation is novel and provides important mechanistic insight into the antagonism between H2A.Z and DNA methylation.

      We are grateful that this reviewer recognizes the importance of our study.

      Weaknesses:

      The study is already strong, and most conclusions are well supported. However, it can be further strengthened in several ways.

      (1) It is difficult to interpret how DNA methylation alters the orientation of the H4 tail and leads to the additional density on the acidic patch. The data do not convincingly support whether DNA methylation enhances interactions with H2A.Z mono-nucleosomes, nor whether this effect is specific to methylated H2A.Z nucleosomes.

      The altered H4 tail orientation and extra density seen on the acidic patch were incidental findings that we thought could be interesting for the field to be aware of but decided not to follow up on as there were other structural differences that were more directly related to our central question. We do believe that the above two differences are linked to each other because we used a highly purified and homogenous sample for cryo-EM analysis and the H4 tail/acidic patch interaction is a well characterized contact that mediates inter-nucleosome interactions. Additionally, other groups have reported that the presence of DNA methylation causes condensation of both chromatin and bare DNA (cited within our manuscript), though the mechanics behind this phenomenon remain to be elucidated. We believed that our structure data may also align with those findings. However, the reviewer is fair in pointing out that we do not provide further experimental evidence in verifying the existence of these increased interactions. We can revise our writing to clarify that these points are currently hypotheses rather than validated results.

      (2) It remains unclear whether DNA methylation alters global H2A.Z nucleosome stability or primarily affects local DNA end flexibility. Moreover, while the authors showed locus-specific accessibility by HinfI digestion, an unbiased assay such as MNase digestion would strengthen the conclusions.

      We would like to thank the reviewer for bringing up these issues. Although our current data cannot explicitly distinguish between these possibilities, we favor the idea that DNA methylation specifically alters histone-DNA contacts and that this effect is felt globally across the entire nucleosome rather than only at specific locations. The intrinsic flexibility of linker DNA ends means that that region tends to exhibit the greatest differences under different physical influences, hence the focus on characterizing that area; flexibility of a thread on a spool is most pronounced at the ends. However, we also found that the DNA backbone of H2A.Z on methylated DNA had a lower local resolution compared to its unmethylated counterpart, despite that structure having a higher global resolution, which suggested to us that DNA positioning along the nucleosome is overall weaker in the presence of DNA methylation. This is corroborated by the increased population of open/shifted structures in our classification analysis. The reviewer raises a fair point about the use of a specific restriction enzyme versus MNase. We agree that our accessibility assay is highly influenced by the position of the restriction site and have previously seen that moving the cut site too close to the linker DNA end will abolish any DNA methylation-dependent differences. We did initially attempt an MNase digestion-based assay, but the data were not as reproducible as with the use of a specific restriction enzyme. We do not know the reason behind this irreproducibility, though we believe that the processivity of MNase could make it difficult to capture subtle effects like those induced by DNA methylation on already highly accessible H2A.Z nucleosomes. Overall, while we believe that DNA methylation does exert a physical effect, its subtlety may explain the many contradictory studies present within the DNA methylation and nucleosome stability field.

      References

      Berta, D.G., H. Kuisma, N. Valimaki, M. Raisanen, M. Jantti, A. Pasanen, A. Karhu, J. Kaukomaa, A. Taira, T. Cajuso, S. Nieminen, R.M. Penttinen, S. Ahonen, R. Lehtonen, M. Mehine, P. Vahteristo, J. Jalkanen, B. Sahu, J. Ravantti, N. Makinen, K. Rajamaki, K. Palin, J. Taipale, O. Heikinheimo, R. Butzow, E. Kaasinen, and L.A. Aaltonen. 2021. Deficient H2A.Z deposition is associated with genesis of uterine leiomyoma. Nature. 596:398–403.

      Capurso, D., H. Xiong, and M.R. Segal. 2012. A histone arginine methylation localizes to nucleosomes in satellite II and III DNA sequences in the human genome. BMC Genomics. 13:630.

      Chodavarapu, R.K., S. Feng, Y.V. Bernatavichute, P.Y. Chen, H. Stroud, Y. Yu, J.A. Hetzel, F. Kuo, J. Kim, S.J. Cokus, D. Casero, M. Bernal, P. Huijser, A.T. Clark, U. Kramer, S.S. Merchant, X. Zhang, S.E. Jacobsen, and M. Pellegrini. 2010. Relationship between nucleosome positioning and DNA methylation. Nature. 466:388–392.

      Choy, J.S., S. Wei, J.Y. Lee, S. Tan, S. Chu, and T.H. Lee. 2010. DNA methylation increases nucleosome compaction and rigidity. J Am Chem Soc. 132:1782–1783.

      Chua, E.Y., D. Vasudevan, G.E. Davey, B. Wu, and C.A. Davey. 2012. The mechanics behind DNA sequence-dependent properties of the nucleosome. Nucleic Acids Res. 40:6338–6352.

      Collings, C.K., P.J. Waddell, and J.N. Anderson. 2013. Effects of DNA methylation on nucleosome stability. Nucleic Acids Res. 41:2918–2931.

      Davey, C., S. Pennings, and J. Allan. 1997. CpG methylation remodels chromatin structure in vitro. J Mol Biol. 267:276–288.

      Davey, C.S., S. Pennings, C. Reilly, R.R. Meehan, and J. Allan. 2004. A determining influence for CpG dinucleotides on nucleosome positioning in vitro. Nucleic Acids Res. 32:4322–4331.

      Funabiki, H., I.E. Wassing, Q. Jia, J.D. Luo, and T. Carroll. 2023. Coevolution of the CDCA7-HELLS ICF-related nucleosome remodeling complex and DNA methyltransferases. Elife. 12.

      Ghenoiu, C., M.S. Wheelock, and H. Funabiki. 2013. Autoinhibition and polo-dependent multisite phosphorylation restrict activity of the histone h3 kinase haspin to mitosis. Mol Cell. 52:734–745.

      Jenness, C., S. Giunta, M.M. Muller, H. Kimura, T.W. Muir, and H. Funabiki. 2018. HELLS and CDCA7 comprise a bipartite nucleosome remodeling complex defective in ICF syndrome. Proc Natl Acad Sci U S A. 115:E876–E885.

      Jimenez-Useche, I., J. Ke, Y. Tian, D. Shim, S.C. Howell, X. Qiu, and C. Yuan. 2013. DNA methylation regulated nucleosome dynamics. Sci Rep. 3:2121.

      Jimenez-Useche, I., D. Shim, J. Yu, and C. Yuan. 2014. Unmethylated and methylated CpG dinucleotides distinctively regulate the physical properties of DNA. Biopolymers. 101:517–524.

      Kelly, A.E., C. Ghenoiu, J.Z. Xue, C. Zierhut, H. Kimura, and H. Funabiki. 2010. Survivin reads phosphorylated histone H3 threonine 3 to activate the mitotic kinase Aurora B. Science. 330:235–239.

      Li, S., Y. Peng, D. Landsman, and A.R. Panchenko. 2022. DNA methylation cues in nucleosome geometry, stability and unwrapping. Nucleic Acids Res. 50:1864–1874.

      Minary, P., and M. Levitt. 2014. Training-free atomistic prediction of nucleosome occupancy. Proc Natl Acad Sci U S A. 111:6293–6298.

      Ngo, T.T., J. Yoo, Q. Dai, Q. Zhang, C. He, A. Aksimentiev, and T. Ha. 2016. Effects of cytosine modifications on DNA flexibility and nucleosome mechanical stability. Nat Commun. 7:10813.

      Nishiyama, A., L. Yamaguchi, J. Sharif, Y. Johmura, T. Kawamura, K. Nakanishi, S. Shimamura, K. Arita, T. Kodama, F. Ishikawa, H. Koseki, and M. Nakanishi. 2013. Uhrf1-dependent H3K23 ubiquitylation couples maintenance DNA methylation and replication. Nature. 502:249–253.

      Osakabe, A., F. Adachi, Y. Arimura, K. Maehara, Y. Ohkawa, and H. Kurumizaka. 2015. Influence of DNA methylation on positioning and DNA flexibility of nucleosomes with pericentric satellite DNA. Open Biol. 5.

      Perez, A., C.L. Castellazzi, F. Battistini, K. Collinet, O. Flores, O. Deniz, M.L. Ruiz, D. Torrents, R. Eritja, M. Soler-Lopez, and M. Orozco. 2012. Impact of methylation on the physical properties of DNA. Biophys J. 102:2140–2148.

      Segal, E., Y. Fondufe-Mittendorf, L. Chen, A. Thastrom, Y. Field, I.K. Moore, J.P. Wang, and J. Widom. 2006. A genomic code for nucleosome positioning. Nature. 442:772–778.

      Wassing, I.E., A. Nishiyama, R. Shikimachi, Q. Jia, A. Kikuchi, M. Hiruta, K. Sugimura, X. Hong, Y. Chiba, J. Peng, C. Jenness, M. Nakanishi, L. Zhao, K. Arita, and H. Funabiki. 2024. CDCA7 is an evolutionarily conserved hemimethylated DNA sensor in eukaryotes. Sci Adv. 10:eadp5753.

      Zilberman, D., D. Coleman-Derr, T. Ballinger, and S. Henikoff. 2008. Histone H2A.Z and DNA methylation are mutually antagonistic chromatin marks. Nature. 456:125–129.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The authors designed two sets of experiments to explore the molecular mechanisms underlying the mutually exclusive distribution of H2A.Z and DNA methylation previously reported by several groups.

      First, they examined how DNA methylation affects the physical stability of H2A.Z-containing nucleosomes. Although their results point to subtle differences between nucleosomes assembled on methylated versus unmethylated DNA, the authors did not extend their analyses to directly test the stability of these H2A.Z-containing nucleosomes under more challenging conditions. Prior studies have demonstrated that certain nucleosomes, such as those containing H3.3-H2A.Z or H2A.Z-H3K56Q, exhibit specific instability, but such instability is only revealed under challenging conditions, for example, altered salt concentrations or the presence of additional factors like FACT (PMID: 17575053; PMID: 19633671; PMID: 19639024; PMID: 41303375). In light of this literature, the observable structural features noted here for nucleosomes containing H2A.Z and methylated DNA are suggestive of increased instability, yet the authors did not employ comparable approaches to rigorously test whether such instability might explain the absence of H2A.Z from methylated genomic regions.

      As a result, at this stage of analysis, the idea that nucleosomes containing both H2A.Z and methylated DNA are intrinsically unstable, and that this instability accounts for the depletion of H2A.Z from methylated regions, remains unsubstantiated.

      We thank the reviewer for the constructive criticism. Through our response to these points, we were able to significantly improve our manuscript, including major rewriting of the Abstract and Discussion as well as incorporation of new data.

      We agree that combinations with other histone variants, modifications, and mutations could further affect our observed impact of DNA methylation on H2A.Z nucleosome stability. What we observed based on satellite II-derived DNA was that DNA methylation made H2A.Z nucleosomes (with H3.2) more open, although the effect of DNA methylation is relatively small (as compared to the general impact of H2A.Z incorporation). We readily admit that such a subtle physical effect is unlikely to be the main driver of the antagonistic distribution of H2A.Z and DNA methylation, though small physical changes have been known to influence larger biological functions. We therefore sought to describe additional regulatory factors that could play major roles.

      We also agree that H3.3 is of major interest when discussing H2A.Z. In our Xenopus egg extract experiments using DNA beads, the primary H3 variant deposited is H3.3, as no DNA replication occurs on the beads to allow for H3.1/.2 replication-coupled deposition. From those experiments, we demonstrated that preferential loading of H2A.Z can be primarily explained by SRCAP. In other words, in the absence of SRCAP, loading/retention of H2A.Z on H3.3 nucleosomes was not noticeably affected by DNA methylation, indicating that DNA methylation’s physical effects on H2A.Z nucleosomes play little, if any, role in the preferential accumulation of H2A.Z on unmethylated DNA, at least in the context of synthetic DNA beads incubated in Xenopus egg extract lacking active transcription. Our sequencing data hint at the interesting possibility that transcription, along with other factors missing in egg extract, may be involved in further pruning H2A.Z from methylated DNA, which conceivably could take advantage of subtle physical alterations. However, we agree that we lack firm supporting evidence for such a mechanism, which led us to forgo including it in our final model figure; instead, we only report our observations, with discussion of potential biological implications and limitations. Of note, it has been reported that the H2A.Z nucleosome is more accessible than the H2A nucleosome, while inclusion of H3.3 does not further enhance accessibility of the H2A.Z nucleosome (PMID 38920622). We have now noted these points in the Discussion of our revised manuscript.

      We appreciate and agree with this reviewer’s point that nucleosome instability sometimes requires challenging conditions to be fully revealed. However, in our system, use of H2A.Z was itself the challenge, as we find in our hands that H2A.Z by itself substantially destabilizes histone-DNA contacts compared to canonical H2A. It is only with this already destabilized nucleosome that we see further enhancement of accessibility/openness in the presence of DNA methylation. This is similar to a previous report (PMID: 23260052) that only an intrinsically destabilized sub-population of canonical H2A nucleosomes on 601 DNA experienced detectable physical changes in the presence of DNA methylation.

      In response to this reviewer's comment, we edited the Abstract and Discussion to clearly note the subtlety of the impact of DNA methylation on H2A.Z nucleosome structure, and that the potential functional significance remains an open question.

      Second, the authors investigated whether SRCAP-C contributes to preferential H2A.Z incorporation into unmethylated DNA. The absence of H2A.Z from methylated regions does not necessarily imply that it cannot be incorporated there; it may instead reflect the chromatin environment associated with DNA methylation, which could disfavor SRCAP-C activity, whereas open chromatin environments strongly promote SRCAP-dependent H2A.Z deposition.

      This reviewer suggested an alternative model where SRCAP prefers to act on open chromatin and that the apparent preferential H2A.Z deposition to unmethylated DNA is due solely to the increased accessibility associated with unmethylated DNA. Following such a model, one would predict that SRCAP-C's preference for unmethylated DNA would be eliminated on nucleosome-free DNA in Xenopus egg extracts. To test this alternative model, we repeated the SRCAP-C binding experiment in egg extracts depleted of the HIRA complex, the H3.3-H4 chaperone responsible for de novo nucleosome assembly on exogenously added DNA in egg extracts. Contrary to this prediction, both SRCAP and ZNHIT1 still display preferential binding to unmethylated DNA substrates in HIRA-depleted extracts in which nucleosome assembly is suppressed (newly added Suppl Fig 16). The results argue that SRCAP-C's discrimination against methylated DNA is not due to a potential effect of chromatin compaction by DNA methylation. Furthermore, our new result is in line with the idea that SRCAP employs 1D diffusion on the linker DNA before engaging the H2A nucleosome (PMID 39131301), implying that SRCAP-C's discrimination against methylated linker DNA contributes to this process. This is now illustrated in the new model Figure 6.

      Please note we also indicate in both our model and in text that there exists an additional methylation-insensitive mechanism that drives H2A.Z deposition on methylated DNA, leading to a substantial amount of colocalized H2A.Z and DNA methylation. Why two different deposition pathways for H2A.Z differing in their methylation sensitivities must exist is an interesting topic for future work and has not been described prior to our report.

      This interpretation is consistent with the authors' own comparative mapping of H2A.Z and DNA methylation in sperm pronuclei incubated in egg extract versus a transcriptionally active Xenopus fibroblast line. They observed that about 40% of H2A.Z-associated genomic DNA is methylated in sperm pronuclei, but only 3% in fibroblasts. As they note, the major difference between these systems is the presence of transcription in fibroblasts, a process known to drive H2A.Z eviction/recycling, and which is absent in the egg-extract system. Thus, no specific inhibition of SRCAP-C by methylated DNA needs to be invoked: H2A.Z deposition on both methylated and unmethylated accessible regions, followed by preferential eviction from methylated sites in active nuclei, could fully account for the observed patterns.

      As the reviewer correctly notes here, we proposed that transcription is likely to play an important role in pruning H2A.Z from methylated DNA. Our observations and proposed mechanism do not argue against the possible existence of a DNA methylation-insensitive, transcription-dependent mechanism that promotes dissociation of H2A.Z from methylated DNA, which we believe likely would be correlated to gene body methylation. In fact, we did propose in our Discussion that such a transcription-mediated mechanism may conceivably take advantage of the subtly destabilized DNA wrapping of H2A.Z nucleosomes on methylated DNA to further selectively prune H2A.Z at colocalized regions. However, such a mechanism would be an additional component to what we have already described and does not explain the observed preferential recruitment of SRCAP-C to unmethylated DNA in Xenopus egg extracts in the absence of active transcription.

      In this respect, studies from the Felsenfeld laboratory showing that double-variant nucleosomes are highly unstable under physiological ionic conditions are particularly relevant (PMID: 19633671; PMID: 19639024). They demonstrated that such unstable nucleosomes are only evident under low ionic strength extraction conditions, emphasizing that the apparent absence of H2A.Z may reflect facilitated removal rather than failure of assembly.

      The authors may also have been influenced by the study of Berta et al. (cited in the manuscript), which examined uterine leiomyomas harboring somatic or germline mutations in SRCAP-C subunits. In those tumors, the normal association of H2A.Z with accessible, active chromatin, and its exclusion from methylated regions, was lost. However, this observation does not demonstrate that SRCAP-C actively prevents H2A.Z incorporation into methylated DNA. Instead, it may simply reflect that in the absence of SRCAP-C, a default, less efficient deposition pathway operates regardless of whether the chromatin environment is normally permissive or restrictive for SRCAP-dependent activity.

      Even if one accepts the more straightforward interpretation proposed by the present authors, that SRCAP-C is actively inhibited by methylated DNA, as suggested by their pull-down experiments from Xenopus egg extracts using unmethylated and methylated DNA, the hypothesis lacks mechanistic support.

      Considering this reviewer's criticism, we have expanded our discussion to indicate the possibility that SRCAP-C may have an alternative mechanism to find open chromatin independent of DNA methylation status. However, our data show that SRCAP-C preferentially binds to unmethylated DNA in a manner independent of transcription or other epigenetic status in Xenopus egg extracts, and that SRCAP-C carries the major mechanism that explains preferential deposition of H2A.Z to unmethylated DNA. Therefore, we believe that our study offers, for the first time, a mechanistic explanation of how H2A.Z discrimination from methylated DNA is accomplished through SRCAP-dependent H2A.Z deposition.

      The following points summarize the issues discussed above:

      (1) The authors did not sufficiently test the hypothesis that H2A.Z-methylated DNA nucleosomes are inherently unstable and could explain the exclusion of H2A.Z from methylated genomic regions.

      We stand by our conclusion that DNA methylation has an intrinsic capacity to make the H2A.Z nucleosome more open and accessible, even though the effect is subtle. We did not argue that this subtle effect can fully explain the exclusion of H2A.Z from methylated genomic regions. Rather, our Xenopus egg extract experiment suggested that in the transcriptionally inactive egg extract setting, such a mechanism plays little or no role and it is SRCAP-C instead that is the major driver. Whether this physical mechanism also contributes to their exclusion in cells with active transcription remains a future subject of study.

      (2) The proposed active role of SRCAP-C in preventing H2A.Z assembly on methylated DNA is supported only by limited experimental data and lacks a mechanistic explanation. In particular, this hypothesis does not account for the significant H2A.Z assembly observed on methylated DNA regions in sperm nuclei after incubation in egg extract.

      We respectfully disagree with this summary assessment. Our conclusions are well aligned with the substantial H2A.Z association with methylated DNA seen in sperm pronuclei assembled in Xenopus egg extracts. We demonstrated that:

      (1) In transcriptionally silent Xenopus egg extracts using synthetic DNA beads, DNA binding of SRCAP-C is inhibited by DNA methylation.

      (2) In this setup, H2A.Z is preferentially, if not exclusively, loaded onto unmethylated DNA over methylated DNA.

      (3) Depletion of SRCAP-C almost completely eliminated preferential association of H2A.Z to unmethylated DNA, while leaving some DNA methylation-insensitive H2A.Z loading.

      (4) These data indicate the presence of a SRCAP-C-dependent, DNA methylation-sensitive mechanism as well as a SRCAP-C-independent, DNA methylation-insensitive mechanism to load H2A.Z onto chromatin. This conclusion matches well with our genomic analysis showing that H2A.Z is preferentially but not exclusively loaded onto hypomethylated genomic segments in sperm pronuclei in Xenopus egg extracts.

      (5) As we clearly discussed, this SRCAP-C-dependent mechanism by itself is insufficient to explain the much clearer exclusion of H2A.Z in somatic cells. We discussed the possibility that transcription contributes to further pruning of H2A.Z from methylated DNA.

      To deliver this overall message with nuances that we noted above, we have heavily revised the Abstract, the model Figure 6, and Discussion. Thanks to the criticisms raised by this reviewer, we believe that our revised manuscript has been significantly improved.

      Reviewer #2 (Recommendations for the authors):

      (1) A major omission is the absence of a cryo-EM structure of a canonical nucleosome assembled on the same DNA template - this is essential to assess whether the observed effects are H2A.Z-specific.

      We had considered solving the H2A structures but ultimately decided against it for a few reasons. First, there already exist crystal structures of canonical H2A nucleosomes using a DNA sequence highly similar to our Sat2R-P with and without DNA methylation (PDB: 5CPI and 5CPJ). The authors of that study did not see any physical differences in their structures (Osakabe et al., 2015). Additionally, we had included canonical H2A conditions within our restriction enzyme accessibility assay and did not see a significant impact of DNA methylation on those samples (Fig 3). Because of the previous report and our own negative data, we expected that only limited additional insights would be obtained from the canonical H2A structures and decided not to pursue that analysis, considering the cost and effort required for this additional cryo-EM analysis.

      (2) The reported increase in accessibility of the methylated H2A.Z nucleosome is negligible compared with the much larger intrinsic DNA accessibility of the unmethylated H2A.Z nucleosome. Claims that methylated H2A.Z nucleosomes are "more open and accessible" must therefore be removed, and the title is misleading, given that no meaningful impact of DNA methylation on H2A.Z nucleosome stability is demonstrated.

      We respectfully disagree with this reviewer's criticism. We investigated the potential impact of DNA methylation on nucleosome stability to the best of our abilities through complementary assays and reported our observations. The effect of DNA methylation is smaller than the difference between H2A.Z and H2A, but we were able to see an effect. It is also not uncommon for small differences to have functional impacts in biological systems. We agree that further testing is required to determine whether this subtle effect is functionally important, and it remains the subject of future research due to the many technical challenges associated with addressing said question. We would like to note that 18 years have passed since Daniel Zilberman first reported the antagonistic relationship between H2A.Z and DNA methylation (Zilberman et al., 2008) but very few studies have since directly tested specific mechanistic hypotheses. We believe that our study lays the groundwork for exciting future investigation that better elucidates the pathways that contribute to this antagonism and will have meaningful impacts on the field in general. However, thanks to the reviewer's criticism, we realized that we did not clearly state in the Abstract that the effect of DNA methylation on intrinsic H2A.Z nucleosome stability is relatively subtle. We will accordingly revise the Abstract, the model Figure 6, and Discussion to make this point clearer.

      (3) The cryo-EM structures of methylated and unmethylated 601L H2A.Z nucleosomes show no detectable differences. As presented, this negative result adds little value and should be removed.

      We believe the inclusion and factual reporting of negative data is important for the scientific community, as one of the major issues currently in biology research is biased omission of negative data. We considered eLife as a venue to publish this work for this reason. We understand that the reviewer believes our 601L structures may detract from the overall message of our manuscript; however, we believe that these data instead emphasize the importance of DNA sequence context, something that the reviewer also rightfully notes. It is standard practice in the nucleosome field to use the Widom 601 sequence, along with its variants. Our experience has shown that use of an artificially strong positioning sequence may mask weaker physical effects that could play a physiological role. Thus, we were careful to validate all further assays with multiple DNA sequences and believed it important to report these sequence-dependent effects on nucleosome structure.

      (4) Very little H3 signal coincides with H2A.Z at TSSs in sperm pronuclei, yet this is neither explained nor discussed (Supplementary Figure 10D). The authors need to clarify this.

      Our H3 signal, which represents the global nucleosome population, is more broadly distributed across the genome than H2A.Z, which is known to localize at specific genomic sites. Since both histone types were sequenced to similar read depths, H3 peaks are generally shallower than H2A.Z and peak heights cannot be directly compared (i.e. they should be represented in separate appropriate data ranges).

      (5) In my view, the most conceptually important finding is that H2A.Z-associated reads in sperm pronuclei show ~43% CpG methylation. This directly contradicts the model of strict mutual exclusivity and suggests that the antagonism is context-dependent. Similarly, the finding that the depletion of SRCAP reduces H2A.Z deposition only on unmethylated templates is also very intriguing. Collectively, these results warrant further investigation (see below).

      (6) Given that H2A.Z is located at diverse genomic elements (e.g., enhancers, repressed gene bodies, promoters), the manuscript requires a more rigorous genomic annotation comparing H2A.Z occupancy in sperm pronuclei versus XTC-2 cells. The authors should stratify H2A.Z-DNA methylation relationships across promoters, 5′UTRs, exons, gene bodies, enhancers, etc., as described in Supplementary Figure 10A.

      We appreciate recognition of the importance of our finding by this reviewer. We agree that the substantial presence of co-localized H2A.Z and DNA methylation specifically in the sperm pronuclei samples and the changes in pattern between nuclear types are highly interesting and require further investigation. However, we faced technical challenges in our sequencing experiments that made us refrain from conducting a more detailed analysis for fear of over-interpreting potential artifacts. These challenges mainly stemmed from the difficulties in collecting enough material from Xenopus egg extracts and Tn5’s innate bias towards accessible regions of the genome. Because of this, open regions of the genome tend to be overrepresented in our data (as noted in our Discussion), making it challenging to rigorously compare methylation profiles and H2A.Z/H3 associated genomic elements.

      While the degree of separation seems to be dependent on nuclei type, we still believe the antagonism exists in both the sperm pronuclei and XTC-2 samples when comparing H2A.Z methylation profiles to the corresponding H3 condition. Our study also demonstrates that H2A.Z is preferentially deposited to hypomethylated DNA in a manner dependent on SRCAP-C (the loss of SRCAP only reduces H2A.Z on unmethylated substrates), but that an additional methylation-insensitive H2A.Z deposition mechanism also exists. We realized that this interesting point was not clearly highlighted in the Abstract, so we will revise it accordingly.

      (7) Although H2A.Z accumulates less efficiently on exogenous methylated substrates in egg extract, substantial deposition still occurs (~50%). This observation directly challenges the strong antagonistic model described in the manuscript. The authors need to discuss this in more detail.

      As depicted in Figure 6 and described in the Discussion, we indicated that both methylation-sensitive and methylation-insensitive pathways exist to deposit H2A.Z within the genome. We also directly stated in our Discussion that a substantial proportion of H2A.Z colocalizes with DNA methylation both in our study as well as in previous reports, which is of major interest for future study. Additionally, we further discussed how the absence of transcription in Xenopus eggs is a likely reason for the more limited effect of DNA methylation restricting H2A.Z deposition in our egg extract system. In the revised manuscript, we heavily edited the Discussion to better clarify these points.

      (8) The SRCAP depletion is insufficiently validated, i.e., the antibody-mediated depletion of SRCAP lacks quantitative verification. A minimum of three biological replicates with quantification is required to substantiate the claims.

      In response to this, quantification of the SRCAP depletion is now included as Supplementary Figure 13A and B. Since our anti-ZNHIT1 antibodies reproducibly detected ZNHIT1 on DNA beads isolated from egg extracts, we have conducted additional verification of the SRCAP depletion by probing for SRCAP and ZNHIT1 on DNA beads, confirming that these proteins were depleted on DNA beads upon immunodepletion with anti-SRCAP antibodies (Author response image 1). To further validate this conclusion, we added data showing that the effect of SRCAP depletion on methylation-sensitive H2A.Z deposition was reproduced through use of a different commercially available antibody raised against human SRCAP (newly added Suppl Fig 14).

      Author response image 1.

      Verification of SRCAP depletion using DNA beads. DNA beads were incubated in interphase-cycled Xenopus egg extract that had been depleted with either our custom SRCAP antibody or an IgG negative control. SRCAP and ZNHIT1 association was then assessed via Western Blot.

      (9) It appears that the role of p400-Tip60 has been completely overlooked. This complex is the second major H2A.Z deposition complex. Because p400 exhibits DNA methylation-insensitive binding (Supplementary Figure 14), it may account for the deposition of H2A.Z onto methylated DNA. This possibility is highly significant and must be addressed by repeating the key experiments in Figure 5 following p400-Tip60 depletion.

      Thank you very much for raising this interesting point. We were aware that the TIP60 complex is a very likely candidate for mediating DNA methylation-insensitive H2A.Z deposition, which is why we tested whether DNA binding of p400 is methylation sensitive (shown in the revised Supplementary Figure 15). We wished to test the potential contribution of TIP60-C, but, unfortunately, the antibodies we currently have available to us were not successful in depleting the complex from egg extract. Since we had no direct experimental evidence indicating the role TIP60-C plays, we decided to take a conservative approach to our model and leave the methylation-insensitive pathway as mediated by something still unidentified. While further investigating TIP60-C’s contribution to this pathway is of definite value, we do not believe that it impacts our major conclusion that SRCAP-C is the main mediator responsible for H2A.Z deposition on unmethylated DNA, and it thus remains a subject for future study. However, we have now added descriptions to note that TIP60-C is a likely candidate to execute the SRCAP-independent and methylation-insensitive mechanism of H2A.Z loading in Xenopus egg extracts. In the model figure, we initially did not include TIP60-C, but we now indicate TIP60-C as a likely candidate in the revised model (Figure 6) to facilitate future research in the field.

      (10) The manuscript repeatedly states that H2A.Z nucleosomes are intrinsically unstable; however, this is an oversimplification. Although some DNA unwrapping is observed, multiple studies show that H3/H4 tetramer-H2A.Z/H2B interactions are more stable (important recent studies include the following: DOI: 10.1038/s41594-021-00589-3; 10.1038/s41467-021-22688-x; and reviewed in 10.1038/s41576-024-00759-1). These references should be considered.

      We appreciate that the reviewer points out this important issue. Although we had described that controversy exists regarding how H2A.Z and DNA methylation contribute to nucleosome stability, it was not clearly explained. We understand that this confusion was in part due to the term “nucleosome stability”, which is broad and encompasses many physical aspects. As noted in a prior response, we now better specify our use of the term within the manuscript, emphasizing nucleosome openness and accessibility, particularly at the nucleosome core particle entry/exit sites. As noted by published studies (PMID 38920622), the impact on nucleosome stability may differ between the internal and external segments of nucleosomal DNA. In our assays, we are most focused on the DNA wrapping stability of the nucleosome and have consistently seen in our hands that H2A.Z nucleosomes are much more open and accessible at DNA ends compared to canonical H2A on satellite II-derived sequences, regardless of methylation status. However, we do understand that many groups have observed the opposite findings while others have obtained results similar to ours. This may be caused by the use of different assays (for example, nucleosome assembly during salt dialysis or salt sensitivity vs openness/accessibility of preassembled nucleosomes). In the Discussion of the revised manuscript, we now explain these factors, with the hope that our study will help clarify some of the field’s controversies.

      Reviewer #3 (Recommendations for the authors):

      (1) Since the cryo-EM structure determined by single-particle analysis represents only one major population, it would be important to determine the dyad axis position by complementary biochemical assays, such as MNase-seq or chemical digestion by the Fenton reaction (PMID: 22929776).

      We would like to thank the reviewer for bringing up this important issue. We agree that the high-resolution structure represents only a subpopulation in which we specifically selected for the most stably wrapped nucleosomes in each sample. This issue is why we then supplemented our high-resolution structure with our in-silico classification analysis to survey the overall structure distribution of the full nucleosome particle population. The classification input contains all nucleosome-like particles picked from both unmethylated and methylated sample micrographs mixed together, ensuring that all particles are taken into consideration and that both samples have been analyzed in an identical manner. From our sorting analysis, we find an increased population of open and shifted nucleosome structures present in our methylated DNA sample, indicating destabilization of DNA-histone wrapping with DNA methylation. This is corroborated by the lower local resolution seen on the DNA backbone of our high-resolution H2A.Z on methylated DNA structure, despite it having a higher global resolution compared to its unmethylated counterpart. This suggested to us that DNA positioning along the nucleosome is overall weaker under the presence of DNA methylation.

      The reviewer raises a fair point about the use of a specific restriction enzyme versus MNase. We agree that our accessibility assay is highly influenced by the position of the restriction site and have previously seen that moving the cut site too close to the linker DNA end will abolish any DNA methylation-dependent differences. We realized that we did not explain how we decided to place the HinfI site in the context of our solved cryo-EM structure. In the revised Figure 3B, we now illustrate that the HinfI site is located at a segment where H2A/H2A.Z directly contacts the DNA and explained that this segment belongs to the region that exhibited clear methylation-induced flexibility in our cryo-EM structures. Thus, our structure helped us design this experiment.

      We did initially attempt an MNase digestion-based assay, but the data were not as reproducible as with the use of a specific restriction enzyme. We do not know the reason behind this irreproducibility though we believe that the processivity of MNase could make it difficult to capture subtle effects like those induced by DNA methylation on already highly accessible H2A.Z nucleosomes, as subtle technical errors in the MNase concentration can have significant effects. Overall, while we believe that DNA methylation does exert a physical effect, its subtlety may explain the many contradictory studies present within the DNA methylation and nucleosome stability field.

      (2) I assume that the authors confirmed complete DNA methylation by restriction enzyme digestion. It would be helpful to include this validation in supplementary figures.

      We would like to thank the reviewer for pointing out that this critical verification was missing from our initial manuscript. DNA methylation of Sat2R-P and Sat2R was verified via BstBI digestion (Suppl Fig 1B and 7D, respectively); 601L was verified via HpaII digestion (Suppl Fig 6B); and 19x601 DNA was verified via BstUI digestion (Suppl Fig 11A). All data have been added to the specified figures. Unfortunately, the 16xHSat2 DNA substrate we used in our assays does not contain appropriate cut-sites for methylation-sensitive restriction enzymes. For that reason, we always prepared the 16xHSat2 DNA in parallel with the 19x601 substrate under identical conditions, then used digestion of the 19x601 substrate to verify the quality of methylation for each batch. To more directly verify methylation of 16xHSat2 DNA, we used Xenopus laevis ZHX2 and ZHX3, which we recently identified as proteins that selectively associate with methylated DNA in Xenopus egg extracts. Although identification and characterization of Xenopus ZHX2/3 will be described elsewhere, previously published proteomic studies have also identified mammalian ZHXs as proteins that enrich on methylated DNA (PMID 21029866, 23434322). By incubating DNA beads in Xenopus egg extract and probing for endogenous ZHX2/3 (our antibody recognizes both ZHX2 and ZHX3), we verified that ZHXs selectively bind to methylated 16xHSat2 but not unmethylated DNA (Author response image 2). Although this does not necessarily verify that all CpGs in 16xHSat2 were methylated, we observed comparable methylation-induced inhibition of SRCAP binding between 16x601 and 16xHSat2, supporting our conclusion.

      Author response image 2.

      Verification of 16xHSat2 methylation status via ZHX2/3 protein binding. 16xHSat2 DNA beads were incubated in Xenopus egg extract and endogenous ZHX2/3 protein binding assessed via Western Blot with a custom generated antibody that recognizes both ZHX2 and ZHX3.

      (3) Figure 1A: The dyad position is difficult to identify. Please indicate it clearly using a distinct color (not green).

      We now directly indicate each sequence midpoint with a black triangle and also changed the font of DNA sequences to further clarify that the dyad resides at the palindromic center.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript presents an extensive body of work and an outstanding contribution to our understanding of the IFN type I and III system in chickens. The research started with the innovative approach of generating KO chickens that lack the receptor for IFNα/β (IFNAR1) or IFN-λ (IFNLR1). The successful deletion and functional loss of these receptors was clearly and comprehensively demonstrated in comparison to the WT. Moreover, the homozygous KO lines (IFNAR1-/- or IFNLR1-/-) were found to have similar body weights, and normal egg production and fertility compared to their WT counterparts. These lines are a major contribution to the toolbox for the study of avian/chicken immunology.

      The significance of this contribution is further demonstrated by the use of these lines by the authors to gain insight into the roles of IFN type I and IFN-type III in chickens, by conducting in ovo and in vivo studies examining basic aspects of immune system development and function, as well as the responses to viral challenges conducted in ovo and in vivo.

      Solid, state-of-the-art methods and convincing evidence from studies comparing various immune system-related functions in the IFNAR1-/- or IFNLR1-/- lines to the WT revealed that the deletion of IFNAR1 and/or IFNLR1 resulted in:

      (1) impaired IFN signaling and induction of anti-viral state;

      (2) modulation of immune cell profiles in the peripheral blood circulation and spleen;

      (3) modulation of the cecum microbiome;

      (4) reduced concentrations of IgM and IgY in the blood plasma before and following immunization with model antigen KLH, whereby also line differences in the time-course of the antibody production were observed;

      (5) decrease in MHCII+ macrophages and B cells in the spleen of IFNAR1 KO chickens, although the MHCII-expression per cell was not affected in this line; and

      (6) reduction in the response of αβ1 TCR+ T cells of IFNAR1 KO chickens as suggested by clonal repertoire analyses.

      These studies were then followed by examination of the role of type I and type III IFN in virus infection, using different avian influenza A virus strains as well as an avian gamma corona virus (IBV) in in ovo challenge experiments. These studies revealed: viral titers that reflect virus-species and strain-specific IFN responses; no differences in the secretion of IFN-α/β in both KO compared to the WT lines; a predominant role of type I IFN in inducing the interferon-stimulated gene (ISG) Mx; and that an excessive and unbalanced type I IFN response can harm host fitness (survival rate, length of survival) and contribute to immunopathology.

      Based on guidance from the in ovo studies, comprehensive in vivo studies were conducted on host-pathogen interactions in hens from the three lines (WT, IFNAR1 KO, or IFNLR1 KO). These studies revealed the early appearance of symptoms and poor survival of hens from the IFNAR1 KO line challenged with H3N1 avian influenza A virus; efficient H3N1 virus replication in IFNAR1 KO hens; increased plasma concentrations of IFNα/β and mRNA expression of IFN-λ in spleens of the IFNAR1 KO hens; a pro-inflammatory role of IFN-λ in the oviduct of hens infected with H3N1 virus; increased proinflammatory cytokine expression in spleens of IFNAR1 KO hens; and impairment of negative feedback mechanisms regulating IFN-α/β secretion in IFNAR1-KO hens, together with a significant decrease in this group's antiviral state. Additionally, it was demonstrated that IFN-α/β can compensate for IFN-λ to induce an adequate antiviral state in the spleen during H3N1 infection, but IFN-λ cannot compensate for IFN-α/β signaling in the spleen.

      Strengths:

      (1) Both the methods and results from the comprehensive, well-designed, and well-executed experiments are considered excellent. The results are well and correctly described in the result narrative and well presented in both the manuscript and supplement Tables and Figures. Excellent discussion/interpretation of results.

      (2) The successful generation of the type I and type III IFN KO lines offers unprecedented insight and opens multiple new venues for exploring the IFN system in chickens. The new knowledge reported here is direct evidence of the high impact of this model system on effectively addressing a critical knowledge gap in avian immunology.

      (3) The thoughtful selection of highly relevant viruses to poultry and human health for the in ovo and in vivo challenge studies to examine and assess host-pathogen interactions in the IFNR KO and WT lines.

      (4) Making use of the unique opportunities in the chicken model to examine and evaluate the host's IFN system responses to various viral challenges in ovo, before conducting challenge studies in hens.

      (5) The new knowledge gained from the IFNAR1 and IFNLR1 KO lines will find much-needed application in developing more effective strategies to prevent health challenges like avian influenza and its devastating effects on poultry, humans, and other mammals.

      (6) The excellent cooperation and contributions of the co-authors and institutions.

      Weaknesses:

      No weaknesses were identified by this reviewer.

      We thank Reviewer #1 for the very positive and thoughtful evaluation of our manuscript. We appreciate the recognition of the effort involved in generating and characterizing the IFNAR1<sup>-/-</sup> and IFNLR1<sup>-/-</sup> chicken lines and for highlighting their significance as valuable tools for advancing avian immunology.

      We are grateful for the reviewer’s clear summary of our findings and for acknowledging the quality of the experimental design, data presentation, and interpretation. The encouraging feedback affirms the broader impact of our study and its contribution to understanding type I and type III interferon biology and antiviral defense mechanisms in chickens.

      We have carefully considered all reviewer comments and revised the manuscript accordingly to further clarify methodological details and improve the presentation of our results.

      Reviewer #1 (Recommendations for the authors):

      Minor suggestions/corrections:

      (1) Line 192, 193, 196 - the superscript "+" sign appears to be underlined.

      We corrected the formatting of all superscript "+" symbols (L 192-196).

      (2) L195: ...in the spleen "of both IFNR KO lines" (or some clarification of what you are comparing).

      The sentence was revised to read “in the spleen of both IFNR knockout lines” for clarity (L 195).

      (3) L198: replace "highlighting" with "and".

      “Highlighting” was replaced with “and” as suggested (L 198).

      (4) L231 and 235: change "monocytes" to "macrophages" as this description appears to refer to spleen cells. Also, make this change in Figure 3b and in the Figure 3 caption (e.g. monocytes/macrophages).

      “Monocytes” was replaced with “macrophages” to accurately describe spleen cells. The same correction was made in Figure 3b and the Figure 3 caption as well as in the supplementary Figure 4 (L 229-234).

      (5) L257: indicate this significant difference in Figure 5b.

      The significant difference has now been clearly indicated in Figure 5b.

      (6) L420, 421: change "monocytes" to "macrophages" as this discussion appears to refer to the spleen.

      “Monocytes” was replaced with “macrophages” to reflect the correct cell type discussed in the spleen context (L 226-227).

      (7) L564-565: has the anti-human MX antibody been shown to cross-react with chicken Mx?

      We thank the reviewer for this valuable comment. Yes, the cross-reactivity of the anti-human MxA monoclonal antibody (clone M143, mouse IgGκ; Merck, Germany) with chicken Mx protein has been previously demonstrated. This antibody has been used successfully to detect chicken Mx in several published studies, including Schusser et al., Journal of Virology (2011). Accordingly, supporting references have been added to the revised manuscript (L584-586).

      (8) L608: how were PBMC and splenocytes (mononuclear spleen cells?) isolated? Line 647 on page 14 mentions their isolation using Histopaque-1077 density gradient centrifugation.

      We thank the reviewer for this helpful comment. A detailed description of the isolation procedure for PBMCs and mononuclear spleen cells has now been added to the Materials and Methods section under the new subsection titled “Isolation of peripheral blood and splenic mononuclear cells”. In this section, we specify that both PBMCs and splenic mononuclear cells were isolated using Histopaque®-1077 density gradient centrifugation (page 14, lines 668-676).

      Reviewer #2 (Public review):

      Summary:

      This study attempts to dissect the contributions of type I and type III IFNs to the antiviral response in chickens. The first part of the study characterises the generation of IFNAR and IFNLR KO chicken strains and describes basic differences. Four different viruses are then tested in chicken embryos, while the subsequent analysis of the antiviral response in vivo is performed with one influenza H3N1 strain.

      Strengths:

      Having these two KO chicken strains as a tool is a great achievement. The initial analysis is solid. Clear effect of IFNAR deficiency in in vivo infection, less so for IFNLR deficiency.

      Weaknesses:

      (1) The antibody induction by KLH immunisation: No data indicated whether or not this vaccination induces IFN responses in wt mice, so the effects observed may be due to steady-state differences or to differential effects of IFN induced during the vaccination phase. No pre-immune results are shown. The differences are relatively small and often found at only one plasma dilution - the whole of Figure 4 could be condensed into one or two panels by proper calculation of Ab titers - would these titres be significantly different? This, as all of the other in vivo experiments, has not been repeated, if I understand the methods section correctly.

      We thank the reviewer for the valuable comments and helpful suggestions.

      Regarding interferon induction by KLH immunisation, we agree that KLH is not known to strongly induce type I or type III interferon responses. Importantly, the goal of this experiment was not to quantify IFN induction per se, but to assess how the absence of IFN receptors affects adaptive antibody responses under standard immunisation conditions. KLH is a highly immunogenic, copper‑containing extracellular oxygen‑carrier protein derived from the marine gastropod Megathura crenulata and is widely used as a T cell–dependent model antigen to study B‑cell activation, antibody production, and class switching in vivo (Harris & Markl, Micron 1999, doi: 10.1016/s0968-4328(99)00036-0; Schusser et al., 2016, doi: 10.1002/eji.201546171). Because chickens are extremely unlikely to encounter KLH under natural conditions, KLH behaves as a neo‑antigen, and anti‑KLH antibodies can be considered to arise from de novo adaptive responses rather than pre‑existing antigen experience. Owing to its structural complexity and unusual glycosylation, KLH provides broad antigenic stimulation and engages adaptive immune mechanisms largely independently of pathogen‑specific innate pattern recognition, while still supporting robust T helper cell responses (Swaminathan et al., 2014, doi: 10.1111/bcp.12422; Geyer et al., 2004, doi: 10.1016/j.micron.2003.10.033). This makes KLH particularly suitable for dissecting intrinsic differences in adaptive immune responses between genotypes.

      We have now included pre-immune plasma controls (Figure 4 c, d), demonstrating that baseline antibody levels did not differ statistically between groups and were negligible prior to immunisation.

      As for the use of different plasma dilutions, this was necessary to ensure that all samples were measured within the linear detection range of our in-house ELISA. For example, after the primary immunisation, IgY concentrations were relatively low (e.g., day 5 post-immunisation), and plasma samples had to be diluted only 1:100 to detect measurable differences between groups. In contrast, after the booster immunisation, IgY concentrations increased substantially, and lower dilutions such as 1:100 led to signal saturation. Therefore, higher dilutions (up to 1:1600) were required to keep the values within the measurable range.

      Following the reviewer’s recommendation, we have now unified the presentation of results by showing data at a single representative dilution for each isotype: 1:100 for IgM (Figure 4C) and 1:1600 for IgY (Figure 4D). These dilutions fall within the linear part of the standard curve to distinguish between groups. We also calculated endpoint antibody titers, which confirmed that the observed differences remain statistically significant (p < 0.05).
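
      As an illustration of the endpoint titer calculation mentioned above, the following is a minimal Python sketch. It assumes one common convention, namely that the endpoint titer is the highest reciprocal dilution whose optical density (OD) still exceeds a cutoff defined as the mean plus three standard deviations of the pre-immune readings. This cutoff rule, the function names, and all numbers are assumptions made for illustration only; they are not taken from the manuscript and may differ from the procedure actually used.

```python
import statistics


def cutoff_from_preimmune(preimmune_ods, n_sd=3.0):
    """Cutoff = mean + n_sd * SD of pre-immune ODs (assumed convention)."""
    return statistics.mean(preimmune_ods) + n_sd * statistics.stdev(preimmune_ods)


def endpoint_titer(reciprocal_dilutions, ods, cutoff):
    """Return the highest reciprocal dilution whose OD is above the cutoff.

    Dilutions are walked from least to most dilute, and the search stops at
    the first reading at or below the cutoff. Returns None when even the
    least dilute sample is not above the cutoff.
    """
    titer = None
    for dilution, od in sorted(zip(reciprocal_dilutions, ods)):
        if od > cutoff:
            titer = dilution
        else:
            break
    return titer


# Hypothetical two-fold dilution series for one plasma sample.
dilutions = [100, 200, 400, 800, 1600, 3200]
sample_ods = [1.90, 1.50, 1.00, 0.60, 0.30, 0.08]
preimmune = [0.05, 0.07, 0.06]  # hypothetical pre-immune readings

cutoff = cutoff_from_preimmune(preimmune)
titer = endpoint_titer(dilutions, sample_ods, cutoff)
```

      Reducing each animal to a single titer in this way is what allows several dilution panels to be condensed into one comparison per isotype and time point.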

      Regarding experimental replication, the study design already incorporated sufficient biological replication and longitudinal sampling to ensure robustness of the findings. Each experimental group consisted of ten animals, including three animals that served as negative controls. In addition, animals were sampled at multiple time points following immunisation, allowing the dynamics of the antibody response to be monitored over time. This longitudinal design provides repeated biological measurements within the same experimental cohort and allows confirmation of consistent response patterns across time points. All ELISA measurements were performed in technical triplicates. Together, the combination of adequate group size, appropriate controls, repeated sampling over time, and technical replication provides sufficient statistical power and internal validation of the observed effects. Furthermore, all animal experiments were conducted under strict approval of the Government of Upper Bavaria and in accordance with German animal welfare regulations, which limit unnecessary repetition of in vivo experiments beyond the approved experimental design.

      (2) The basic conundrum here and in later figures is never addressed by the authors: Situations where IFN type 1 and 3 signalling deficiency each have an independent effect (i.e., Figure 4d) suggest that they act by separate, unrelated mechanisms. However, all the literature about these IFN families suggests that they show almost identical signalling and gene induction downstream of their respective receptors. How can the same signalling, clearly active here downstream of the receptors for IFN type 1 or type 3, be non-redundant, i.e., why does the unaffected IFN family not stand in? This is a major difference from the mouse studies, which showed a rather subtle phenotype when only one of the two IFN systems was missing, but a massive reduction in virus control in double KO mice (the correct primary paper should be quoted here, not only the review by McNab). Reasons could be a direct effect of IFNab on B cells and an indirect effect of IFNL through non-B cells, timing issues, and many other scenarios can be envisaged. The authors do not address this question, which limits the depth of analysis.

      We thank the reviewer for this insightful comment. Indeed, this represents one of the most interesting and novel findings of our study. Unlike in mice, where both type I and type III interferon systems need to be disrupted to observe clear susceptibility to influenza infection, in our chicken model the loss of IFNAR1 alone was sufficient to render the animals highly susceptible. This highlights a key difference between mammalian and avian interferon biology and supports the main goal of our work, to investigate the specific biological activities of avian interferons rather than directly transferring conclusions from mammalian systems.

      In relation to Figure 4d (anti-KLH IgY), we observed that both IFNAR1<sup>-/-</sup> and IFNLR1<sup>-/-</sup> animals showed reduced IgY levels compared to wild type at day 3 after the booster immunisation. However, by day 5 post-booster, IgY levels in IFNLR1<sup>-/-</sup> animals had returned to wild-type levels, while IFNAR1<sup>-/-</sup> animals still showed significantly lower IgY. This indicates that type III IFN contributes to the early phase of the IgY response but that its absence can later be compensated by type I IFN signalling. In contrast, loss of type I IFN cannot be compensated by type III IFN, suggesting that type I IFN plays a more dominant or sustained role in antibody induction.

      Although type I and type III IFNs share overlapping signaling pathways and induce similar sets of ISGs, their effects are not entirely redundant in chickens. A likely explanation is the difference in receptor distribution: IFNAR1 is broadly expressed across most cell types, while IFNLR1 expression is mainly confined to epithelial cells (Reuter et al. 2014, doi: 10.1128/jvi.02764-13; Santhakumar et al., 2017, doi: 10.3389/fimmu.2017.00049). This systemic versus localized receptor pattern likely determines the range of responsive cells and may account for the differential outcomes observed when either receptor is absent.

      Taken together, our findings indicate that while type I and type III IFNs share overlapping signaling mechanisms, they maintain distinct biological functions in chickens, consistent with their differing receptor expression and cellular responsiveness. This contrasts with mammalian models, where redundancy between these systems is more apparent and only double knockouts show strong phenotypes, especially during influenza infection (Mordstein et al., 2008, doi: 10.1371/journal.ppat.1000151; Mordstein et al., 2010, doi: 10.1128/jvi.00272-10). We have now cited these primary studies instead of the McNab review and expanded the Discussion to reflect this interpretation (Page 10, Line 463-467).

      (3) In the one in vivo experiment performed with chickens, only one virus was tested; more influenza strains should be included, as well as non-influenza viruses.

      We thank the reviewer for this valuable suggestion. The main objective of the present study was to generate and characterize novel chicken models lacking type I and type III interferon receptors in order to investigate their physiological relevance and to obtain the first insights into their roles during viral infection with more emphasis on avian influenza. As part of this manuscript, we performed detailed in ovo experiments using both influenza and non-influenza viruses (Figure 6). These included three influenza strains: H1N1, a mammalian-adapted strain; H3N1, a low pathogenic avian strain showing features of high pathogenicity; and H9N2, a low pathogenic avian strain, as well as a non-influenza virus, the infectious bronchitis virus (IBV). The in ovo analyses revealed clear strain-dependent modulation of interferon responses, and have provided a comprehensive overview of virus-specific interferon activity in chickens. The subsequent in vivo experiment was therefore designed as a proof of concept using the most suitable viral strain to robustly challenge the immune system and to identify the distinct functions of chicken interferons.

      (4) The basic conundrum of point 2 applies equally to Figure 6a; both KOs have a phenotype. Again in 6d, both IFNs appear to be separately required for Mx induction. An explanation is needed.

      We thank the reviewer for raising this important point. We have revised the Discussion (page 10, lines 442-454) and provided supporting references to clarify how the composition of the chorioallantoic membrane (CAM) and virus tropism together determine the apparent requirement for type I and type III interferons. The CAM contains both epithelial and mesodermal–vascular layers, which support complementary interferon functions: type I IFN acts mainly in systemic and vascular compartments, while type III IFN provides localized protection at the epithelial surface. Consequently, viruses that replicate in both compartments (e.g., WSN33, H3N1) require both IFN pathways for maximal Mx induction (Figures 6a, 6d), whereas viruses with a predominant or prolonged epithelial phase (e.g., H9N2, IBV) at the time point analyzed are effectively controlled by type I IFN signaling alone.

      These differences likely reflect virus-specific factors, including cell tropism, replication kinetics, and the spatial–temporal dynamics of receptor expression and signaling. Notably, our measurement of Mx expression at 24 hours post infection (hpi) may represent a phase when type I IFN signaling is dominant and can compensate for the absence of type III IFN. It remains possible that IFN-λ plays a more critical, non-redundant role at earlier stages post infection, when rapid antiviral protection is first required at the epithelial surface. Thus, the apparent redundancy observed at 24 hpi likely reflects temporal compensation and crosstalk between the IFN pathways rather than a lack of biological relevance for type III IFN.

      (5) Line 308, where are the viral titers you refer to in the text? The statement that the results demonstrate that excessive IFNab has a negative impact is overstretched, as no IFN measurements of the infected embryos are shown here.

      We thank the reviewer for this comment and would like to clarify that measurements of type I IFN (IFN-α/β) concentrations were indeed performed. The data are presented in Figure 6b and cited in the Results section (“Knockout of IFNAR1 and IFNLR1 did not affect IFN-α/β secretion in ovo”). To avoid misunderstanding, the Results section has been revised to explicitly reference the IFN-α/β measurements supporting this conclusion (line 302-309).

      These data indicate that all genotypes produced comparable IFN-α/β levels upon viral infection, with the IBV infection inducing approximately tenfold higher IFN-α/β secretion than the influenza strains tested (Figure 6b). The interpretation that an excessive type I IFN response can negatively affect host fitness is based on the combination of quantified IFN-α/β data (Figure 6b) and survival probability results (Supplementary Figure 10), where embryos exhibiting the highest IFN-α/β levels (embryos of all genotypes infected with IBV and IFNLR1<sup>-/-</sup> embryos infected with H9N2) showed the poorest survival despite moderate or low viral titers.

      (6) The in vivo infection is the most interesting experiment, and the key outcome here is that IFN type 1 is crucial for anti-H3N1 protection in chickens, while type 3 is less impactful. However, this experiment suffers from the different time points when chickens were culled, so many parameters are impossible to compare (e.g., weight loss, histopathology, IFN measurements, and more). Many of these phenomena are highly dynamic in acute virus infections, so disparate time points do not allow a meaningful comparison between different genotypes. What are the stats in 7b? Is the median rather than the mean indicated by the line? Otherwise, the lines appear in surprising places. SD must be shown, and I find it difficult to believe that there is a significant difference in weight, for e.g., IFNAR KO, unless maybe with a paired t test. What is the statistical test?

      We thank the reviewer for these thoughtful comments and agree that disease progression and sampling time can influence comparisons in acute infection studies. Hens were euthanized upon reaching predefined humane endpoint scores in full compliance with the Bavarian animal welfare regulations. Because the infection produced markedly different clinical kinetics among genotypes, all data were interpreted with reference to matched disease stages rather than absolute days post-infection.

      For matched comparisons: Viral titers in the trachea and cloaca, as well as plasma IFN-α/β concentrations, were compared between day 2 in IFNAR1<sup>-/-</sup> hens and day 3 in WT and IFNLR1<sup>-/-</sup> hens, which represent equivalent clinical stages before the sharp viral rise seen later in WT and IFNLR1<sup>-/-</sup> birds. At these comparable stages, viral titers were still low and IFN-α/β concentrations remained significantly lower in WT and IFNLR1<sup>-/-</sup> than in IFNAR1<sup>-/-</sup> hens (Figure 7c, d, f), indicating that uncontrolled viral replication and IFN-α/β secretion in the absence of type I signaling occur earlier and more intensely.

      For Figure 7b: Because chickens reached humane endpoints at different days post infection (2 dpi for IFNAR1<sup>-/-</sup> and 5–7 dpi for WT and IFNLR1<sup>-/-</sup>), statistical comparisons were performed within each genotype using paired t-tests and all data were shown together as mean ± SD.
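
      As a minimal sketch of the within-genotype paired comparison described above, the following computes a paired t statistic from per-bird weight differences. The function name `paired_t` and the weight values are hypothetical and chosen only for illustration; they are not data from the study, and the actual analysis may have used different software and settings.

```python
import math
import statistics


def paired_t(before, after):
    """Return (t statistic, degrees of freedom) for a paired t-test."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    # Per-animal change: each bird serves as its own control.
    diffs = [a - b for b, a in zip(before, after)]
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample SD of the differences
    if sd_d == 0:
        raise ValueError("all differences are identical; t is undefined")
    n = len(diffs)
    t = mean_d / (sd_d / math.sqrt(n))
    return t, n - 1


# Hypothetical body weights (g) for one genotype; NOT data from the study.
weights_day0 = [1500.0, 1620.0, 1480.0, 1550.0, 1590.0]
weights_endpoint = [1440.0, 1570.0, 1450.0, 1480.0, 1540.0]

t, df = paired_t(weights_day0, weights_endpoint)
print(f"t = {t:.2f}, df = {df}")
```

      Because the test operates on each bird's own change in weight, a small but consistent loss can reach significance even when the between-bird SD is large, which is the situation the reviewer asks about.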

      We acknowledge that unequal survival times limit direct temporal comparison. However, the consistent pattern across all parameters, including early severe disease, high viral load, and excessive IFN-α/β secretion in IFNAR1<sup>-/-</sup> hens versus delayed onset in WT and IFNLR1<sup>-/-</sup> hens, supports the conclusion that type I IFN signaling is essential for early viral restriction and host survival, while type III IFN contributes mainly to localized inflammatory responses. The experiment cannot be repeated under the current animal welfare authorization.

      (7) Figures 7e,f: these comparisons are very difficult to interpret as the virus loads at these time points already differ significantly, so any difference could be secondary to virus load differences.

      We thank the reviewer for this valuable comment. We agree that viral load can influence interferon induction; however, our comparisons in Figures 7e and 7f were designed to reflect equivalent stages of disease progression rather than identical time points post-infection. For IFN-λ mRNA expression (Fig. 7e), spleens from IFNAR1<sup>-/-</sup> hens were sampled on day 2 post-infection, when viral titers were maximal, and compared to WT and IFNLR1<sup>-/-</sup> hens sampled on day 5 post-infection, at which point viral titers reached comparable levels. Thus, this comparison represents the phase of peak infection and systemic immune activation across all genotypes rather than an absolute temporal comparison.

      Similarly, for IFN-α/β concentrations (Fig. 7f), two levels of comparison were made: between IFNAR1<sup>-/-</sup> hens at day 2 post-infection (high viral titer) and WT and IFNLR1<sup>-/-</sup> hens at day 3 (low viral titer), and between WT and IFNLR1<sup>-/-</sup> hens at day 5 post-infection (high viral titer). In both cases, IFN-α/β levels remained disproportionately elevated in IFNAR1<sup>-/-</sup> hens, indicating that the excessive type I IFN response is primarily due to the loss of receptor-mediated feedback regulation rather than viral load alone.

      We have clarified this rationale in the legend of figure 7 and in the results (Line 338-345). We believe these results are valuable as they provide important insight into the temporal dynamics and regulatory interplay between type I and type III interferons during avian influenza infection.

      Reviewer #2 (Recommendations for the authors):

      Experiments need to be repeated. Comparisons in infection experiments must be done on the same day. More viruses need to be tested.

      We thank the reviewer for these constructive recommendations. All infection experiments were conducted under approved animal welfare regulations, which limited the number of replicates and prevented repeating in vivo challenges beyond the authorized design, in line with the 3R principles, particularly Reduction, to avoid unnecessary animal use. To ensure comparability, samples were analyzed at matched disease stages rather than identical time points, as clarified in the revised figure legends (figure 7) and Results (Line 338-345). The study already includes multiple influenza and non-influenza viruses (H1N1, H3N1, H9N2, and IBV) tested in ovo to capture virus-specific interferon responses, while the in vivo H3N1 infection served as a proof-of-concept to dissect genotype-specific immune dynamics.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Ducrocq et al. present research exploring the genetic link between simple multicellular group formation (ace2Δ/ace2Δ) and its interaction with cell-cycle progression mutants (e.g., cln3Δ/cln3Δ), demonstrating that this combination can provide fitness benefits during fluctuating resource conditions, resulting in a rapid increase in the fraction of multicellular cell-cycle mutants over unicellular yeast without selection for multicellular size. Because both the multicellular phenotype and the regulatory link enabling faster escape from the stationary phase are controlled by the Ace2 transcription factor, this work demonstrates that multicellularity can arise as a side-effect of a completely independent fitness advantage unrelated to the benefits of group formation itself. As a "passenger phenotype," multicellularity could thus emerge for other selective reasons, potentially facilitating a later transition to more entrenched multicellularity if novel conditions arise where group formation becomes directly beneficial.

      Strengths:

      This work is novel and exciting for research exploring the very first steps of the transition from unicellularity to simple multicellularity. This is particularly significant because the formation of multicellular groups is almost always assumed to come at a cell-level fitness cost due to reduced reproductive fitness compared to remaining unicellular. This cell-level fitness cost generally needs to be outweighed by the benefits of multicellular group formation (e.g., large size escaping predation) for the multicellular phenotype to be stable, which is true for a large number of cases studied in the literature, where the multicellular phenotype can only evolve over unicellular competitors under strong selection for multicellular groups. However, this study presents an interesting case of a genetic and environmental condition under which individual cells (forming simple multicellular clusters) can actually have higher reproductive fitness than unicellular yeast. This demonstrates that the assumed cost at the single-cell level does not always apply. In summary, this work represents a unique example contrary to common assumptions regarding the costs of multicellular phenotypes, showing that simple multicellular phenotypes can evolve and remain stable without requiring strong selection for multicellular size or other benefits of group formation.

      The claims and interpretation of the results align well with the data presented. This is due to the careful and straightforward experimental design testing predictions with a clear, stepwise methodology, ruling out alternative explanations and providing support for the proposed link between the mutations (ace2, cln3, and others), their impact on faster exit from quiescence, and thus earlier entry into reproduction in fresh media, resulting in higher fitness in the snowflake yeast phenotype compared to unicellular yeast.

      Weaknesses:

      The authors show that the same multicellular phenotype with higher cell-level fitness due to faster exit from the stationary phase can also be observed with alleles found at other loci in non-laboratory yeast strains, implying that the results are likely not specific to a peculiar case genetically engineered in laboratory strains, but that similar phenotypes may be present in nature. However, this remains to be explored further by examining the natural ecology of commercially available or wild yeast isolates and their genomes. This is by no means a weakness of this study and, therefore, not necessarily something the current work can improve. It does mean, however, that the relevance of these findings for early multicellularity in yeast, and even more so for nascent multicellularity in distinct taxa, remains to be explored in the future. Until then, it is difficult to make strong claims about how applicable these results would be for non-laboratory yeast and other taxa. Regardless, this work does its part by representing a very exciting finding.

      Reviewer #2 (Public review):

      Summary:

      Here, the authors attempt to demonstrate that a simple model of multicellularity - snowflake yeast - exhibits key ecologically relevant changes in the regulation of the cell cycle. By examining the effects of the ace2 mutation in environments where multicellularity is not directly selected for or against, and combining it with mutations in key cell cycle regulators, they hope to show that mutations driving simple multicellularity can be selectively favored due to their effects on the release from quiescence rather than their effects on multicellularity itself.

      Strengths:

      The experiments performed are extensive and thorough. The yeast genotypes examined are judiciously chosen, so as to map out a functional model of the relationship between alterations to cell cycle control and changes to multicellularity phenotypes. Multiple possible interactions are examined, with the causal link and model of the relationship between the multicellular passenger phenotype and the selectable quiescence-release phenotype being well-supported. There are extensive controls demonstrating the separation between the 'passenger' multicellular phenotype and the cell cycle regulation phenotypes examined, including haploid/diploid strains with different multicellular phenotypes but similar cell cycle regulation phenotypes, and phenocopy strains in which downstream enzymes are deleted rather than key central regulators.

      Weaknesses:

      My only concerns about these results relate to the focus on selection on cell cycle control being examined in a model of multicellularity with key core cell cycle mutations rather than in a wild-type background, as this is a somewhat artificial system.

      I believe, however, that the authors convincingly make their case that this work on the multicellular phenotypes of yeast represents a potent proof-of-concept that simple multicellularity can be driven into existence or selected for as a passenger phenotype due to pleiotropic effects of mutations under selection from real-world ecological pressures. They are able to connect this phenotype back to known mutations of particular cell cycle regulators (RB) in other multicellular lineages and demonstrate that ecologically relevant changes to the cell cycle are connected to multicellular phenotypes. As a proof of concept of the connection between these phenotypes, rather than a study of a particular event in the past of a living lineage, it makes a strong case.

      A longstanding question in the field of multicellularity is the selective pressures that can drive simple multicellularity into existence and then act on simple multicells to drive their increased size and complexity. This work brings to the table tangible evidence of the possibility that, instead of being selected for on its own, simple multicellularity can be a side-effect of selection on other key phenotypes.

      This separates the question of the origins of multicellularity and the forces that drive its further evolution. This separation can reframe how the field is studied, especially in the context of the apparent dichotomy between dozens of origins of 'simple' multicellularity across the tree of life and a few origins of 'complex' multicellularity in the history of Earth. Especially in light of other evidence that multicellularity is connected to changes in cell cycle regulation, I believe that this is an important insight that will alter the way we think about the origins of this key evolutionary transition.

      We thank the reviewers for their insightful comments on our work.

      We agree with reviewer #1 that further experiments would be needed to determine how the observations made on lab strains apply to yeast in various ecological conditions, particularly in the wild. We here provide a proof of principle that selection of multicellularity can arise as a side effect. This obviously does not prove that it took place during yeast evolution, but we would like to emphasize that resource fluctuations are very common in ecological conditions, making it highly likely that the environmental conditions necessary for the selection of the side effects described have arisen.

      We agree with reviewer #2 that our work on yeast strains is “somewhat artificial”, as is often the case with model organisms under laboratory conditions. Importantly, though, we showed that the effect found with the cln3 knock-out mutation can be phenocopied by overexpression of WHI5 (encoding the yeast equivalent of Rb). We propose that variations in the levels of cell cycle regulators during evolution may have played a role in the selection of multicellularity as a side effect. We agree that this is merely a hypothesis to explain the selection of multicellularity (just like predator escape) and that there is no direct evidence that this occurred in the history of the lineage. Nevertheless, our work provides initial evidence that such selection of multicellularity as a side effect could be possible, and gives a framework to understand how multicellularity can persist in the wild, even when it is not the primary target of selection.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      As mentioned in my public review, I very much appreciate this work, its interpretation for early multicellularity as an example opposite to the assumed cost of multicellular phenotypes, and the robust design behind the premise and claims. Therefore, my suggestions below are mostly aimed at improving the readability and data presentation.

      (1) In the abstract, Lines 24-27 (the last sentence): This statement is worded too generally and therefore reads as too strong. I think the authors' work provides an example that multicellularity itself does not need to be beneficial all the time - this is really exciting and makes sense! However, there is a substantial body of work showing the origin and maintenance of multicellularity for its direct benefits. Relative to that body of work, this represents a special case, and therefore, while we should definitely reconsider the view that "multicellularity always comes at a cell-level fitness cost," we cannot overgeneralize these findings. Please consider reframing this statement.

      Done, now line 25 (addition of “in some cases”)

      (2) Line 48 (Introduction): "This mostly concerns two major regulators, RB and Cyclin D." Which organisms are you referring to? Please specify.

      Done.

      (3) In the Introduction, there are at least three sentences that need citations: L57-58, L59-60, and L65. For instance, I do not know what makes CLN3 the yeast functional equivalent of RB, and I wanted to verify this claim, but no references are cited. Please ensure citations are provided throughout the manuscript.

      Done: refs 11, 12, and 13 were added.

      (4) This is my main request regarding data collection and presentation. The authors share some microscopy images of mutant strains in Figure 2 for different purposes (e.g., Figure 2B compares the fraction of budded cells between two genotypes). However, I would appreciate seeing a collected microscopy figure showcasing the phenotypes of all genotypes that went into competition experiments, including the planktonic (WT lab strain) yeast, either where they appear or in a supplementary figure, all presented with the same magnification and scale to make them comparable. Because cell size, shape, and multicellular phenotype are all key aspects of the competition experiments, being able to see all those genotypes/phenotypes would prepare the reader to make predictions about the fitness assays and other experiments.

      Done: Supplementary Figure 1B-E was added.

      (5) Related to my previous point, I would appreciate seeing cell size measurements for the different genotypes (both single cells of planktonic genotypes and single cells forming multicellular clusters). Cell size is a key trait that directly impacts the results shown in the paper, and summary statistics comparing them would be helpful for interpreting the results.

      Done: Supplementary Figure 1F was added.

      (6) In competition experiments, the authors mix unicellular and multicellular yeast clusters at 50/50 and measure the fraction of a phenotype of interest (usually the % of snowflake). It took me a while to understand what is being counted under the "% snowflake yeast" category. This is because, while each cell in unicellular yeast should be counted as one unit, one can count a snowflake yeast composed of 50 cells as 50 units or as 1 unit. Please clearly state what is being counted for the Y-axis labeled "% of snowflake yeast" (or relabel those Y-axes in plots to make this clear).

      Done: Added in figure legend 1A and Y-axes of competition figures

      (7) I recommend editing the genotype labels in figures (see, for instance, Figure 1B, C, D). In Figure 1B, the bars are labeled as "CLN3/CLN3 co-culture" or "cln3Δ/cln3Δ co-culture," etc. These are actually co-cultures of SF vs. PK (with or without a CLN3 copy). Please consider using more representative labels that will be easier for readers to understand.

      Done: this has been changed in all concerned figures

      (8) In the Results, L225, you begin referring to AMN1368D as AMN1. I suggest using the full allelic form throughout the text so it will be clear each time that you are referring to that specific allele, as I was confused about whether you were discussing the allele or the gene AMN1 itself.

      This has been changed throughout the text.

      (9) Discussion, Lines 250-252, states that this is a "situation that is likely to happen very often under ecological conditions." Are there any examples you can cite?

      Done, as also requested by reviewer #2 (now line 256-7)

      (10) Lines 272-275 contain a strong, general statement suggesting that co-evolution of cell cycle regulation and multicellularity could be more general (which is acceptable as speculation). However, the suggestion that this co-evolution could have "started very early in the evolution of eukaryotic cells" is too speculative. I would recommend sticking with the alternative, suggesting that the link between the two phenotypes may be a case of convergent evolution.

      Done

      (11) Lines 278-279 are both vague and too bold. The text mentions a link between cancer and multicellularity and then extends this link through cell cycle regulators. Without explaining the connection between cancer and multicellularity and then trying to link it to cell cycle regulators, all in a few words without background, this sentence is too vague. Please consider deleting this or spending more time clearly explaining the link, which would at best still be speculative.

      These speculative sentences were removed.

      (12) First, I wanted to note that I highlighted Lines 284-287, as this passage is clearly written and provides a nice argument. I also wonder if you could mention that your work shows simple multicellular cluster formation should not always come at a cost, contrary to the general assumption in the literature, and add a few citations to support that claim. This would highlight how significant this work is within the broader multicellularity literature.

      Changed in discussion (now line 242-4 with additional references 30 and 31)

      (13) I recommend labeling the genotype of your "quintuple mutant" in Figure 3. You can refer to it as the quintuple mutant in the text, but I had to go back and forth to see what those mutations were when trying to think about potential genetic interactions. Even the legend of Figure 3 does not specify the genotype and refers to it only as the "quintuple mutant."

      Now explicitly stated in the title of the figure

      Reviewer #2 (Recommendations for the authors):

      I find the presented research to be of high quality, with very important implications. I have suggestions for improvement of the manuscript, but they are largely stylistic, with one paper that I believe deserves citation regarding the proteins involved. I see little need for additional experiments or analysis, just a clearer description of the results and their significance.

      (1) Line 62: Yeast CLN3 definitely performs the same role as cyclin D in the cell cycle, but has an unclear phylogenetic relationship with the rest of the cyclins. See Cross, Buchler, & Skotheim 2011 ("Evolution of networks and sequences in eukaryotic cell cycle control"). This reference also covers the functional relationship between RB and Whi5, referred to in nearby sentences, as does Medina, Walsh, and Buchler 2019 ("Evolutionary innovation, fungal cell biology, and the lateral gene transfer of a viral KilA-N domain").

      The reference has been added

      (2) Line 69: Is the question whether the evolution of G1/S regulation favors multicellularity, or whether the two are connected such that the evolution of one can affect the other?

      It is clearly the first of the two questions.

      (3) Line 73: Comma after Ace2.

      Done

      (4) Line 76: It would be clearer to specify that snowflake and ACE2 yeast were co-cultured without settling selection or other selection that explicitly favors multicellularity, unlike in experiments where multicellular evolution is observed, as in Ratcliff publications.

      This is now specified.

      (5) Line 80: Specify which phenotypes observed for ace2 mutants are observed, specifically, both the multicellularity and the release from quiescence.

      Done

      (6) Line 146: This observation should be noted as another indication that the multicellular phenotype is not behind the selective pressure, because it is so different between unicells and multicells.

      Overall, you have very strong evidence that this is the case, and emphasizing this would benefit the paper!

      Done.

      (7) Line 151: specify that you are maintaining yeast in proliferation in coculture.

      Done.

      (8) Line 181: This is another key experiment showing that the multicellular phenotype is not the causal reason for the change in quiescence. It might make things clearer to bring all these confirmatory experiments together, particularly the haploids and the sonicated single cells.

      This is now clearly stated at line 195.

      (9) Line 225: The choice of referring to the non-laboratory strain as the 'AMN1' wild type default may be confusing to readers, who may treat the genetic background you are using as the ground truth wild type. I recommend throughout the paper always specifying the allele's amino acid to avoid any confusion.

      The genotype is now clearly presented throughout the text.

      (10) Line 238: I would continue to specify that the multicellular phenotype has no selective advantage, specifically when no selection for size is applied.

      See added sentence Line 242-4 (revised version)

      (11) Line 243: I would say that the evolution of cell cycle regulation may interact with the multicellular phenotype.

      This was changed (now line 248)

      (12) Line 244: Strike 'indeed' and the 'the' before AMN1 and ACE2.

      Done

      (13) Line 252: Suggest some ecological conditions under which quiescence exit is likely, such as boom and bust or moving from rotting fruit to rotting fruit.

      Done

      (14) Line 267: Are you suggesting that the specific genes AMN1 and ACE2 had particular effects on actual organisms in the past, or that it represents a broad pattern of evolution in which multicellularity could be more broadly related to exit from quiescence? I believe it is the latter, and I think that should be clearer.

      Modified as suggested

      (15) Line 280: In this paragraph, I think that the point being made could be slightly clearer - if I am not mistaken, you are making the distinction between the appearance of multicellularity and its refinement under selection, and that the former may be more common than previously believed, given this proof of concept. I think this can be made clearer. Furthermore, it is worth noting that all experiments that show effects of the multicellular phenotype are in mutant backgrounds, and explaining why this is still relevant to wild organisms. It might be taken by some as indicating that the multicellular phenotypes are not relevant to a wild population, but the connection to known RB mutations in known multicellular lineages and the fact that it is connected to a very key aspect of cell cycle regulation, I think, overcomes this issue, and this should be made clear.

      Our study reveals a genetic link between multicellularity and Whi5 and Cln3, two important G1/S cell cycle regulators. Similar genetic interactions have been observed in phylogenetically distant species, reinforcing the idea that the interplay between cell cycle regulation and multicellularity is a general feature and not a mere artifact of mutant background.

      The neutral fitness effect of multicellularity in wild-type backgrounds is of particular interest. Because multicellularity can be maintained as a side effect of selection on fundamental cellular processes, its neutral effect may have provided “an evolutionary scheme” for its repeated emergence throughout the tree of life. As such, the "passenger selection" hypothesis fits well with the observations of phenotypic reversibility and facultative multicellularity, despite varying and specific selective pressures. Our work thus gives a framework to understand how multicellularity can persist in the wild, even when it is not the primary target of selection.

      (16) Line 314: What promoters are they driven by?

      Specified

      (17) Line 336: What was the culture volume, and the volume transferred?

      Specified

      (18) Line 362: How was the proportion of blue-stained cells scored? Manually, or with an imaging software cutoff?

      Specified

      (19) Figure 1: I think that the full genotypes of each strain should be specified, either in the legend or the key of the figure, rather than always specifying the ACE2 genotype and other mutations separately.

      Done as requested by reviewer #1

      (20) Figure 2E, 2F: Same as Figure 1, regarding genotypes.

      Done

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Morgan et al. studied how paternal dietary alteration influenced testicular phenotype, placental and fetal growth using a mouse model of paternal low protein diet (LPD) or Western Diet (WD) feeding, with or without supplementation of methyl-donors and carriers (MD). They found diet- and sex-specific effects of paternal diet alteration. All experimental diets decreased paternal body weight and the number of spermatogonial stem cells, while fertility was unaffected. WD males (irrespective of MD) showed signs of adiposity and metabolic dysfunction, abnormal seminiferous tubules, and dysregulation of testicular genes related to chromatin homeostasis. Conversely, LPD induced abnormalities in the early placental cone, fetal growth restriction, and placental insufficiency, which were partly ameliorated by MD. The paternal diets changed the placental transcriptome in a sex-specific manner and led to a loss of sexual dimorphism in the placental transcriptome. These data provide a novel insight into how paternal health can affect the outcome of pregnancies, which is often overlooked in prenatal care.

      Strengths:

      The authors have performed a well-designed study using commonly used mouse models of paternal underfeeding (low protein) and overfeeding (Western diet). They performed comprehensive phenotyping at multiple timepoints, including the fathers, the early placenta, and the late gestation feto-placental unit. The inclusion of both testicular and placental morphological and transcriptomic analysis is a powerful, non-biased tool for such exploratory observational studies. The authors describe changes in testicular gene expression revolving around histone (methylation) pathways that are linked to altered offspring development (H3.3 and H3K4), which is in line with hypothesised paternal contributions to offspring health. The authors report sex differences in control placentas that mimic those in humans, providing potential for translatability of the findings. The exploration of sexual dimorphism (often overlooked) and its absence in response to dietary modification is novel and contributes to the evidence-base for the inclusion of both sexes in developmental studies.

      Weaknesses:

      The data are overall consistent with the conclusions of the authors. The paternal and pregnancy data are discussed separately, instead of linking the paternal phenotype to offspring outcomes. Some clarifications regarding the methods and the model would improve the interpretation of the findings.

      (1) The authors insufficiently discuss their rationale for studying methyl-donors and carriers as micronutrient supplementation in their mouse model. The impact of the findings would be better disseminated if their role were explained in more detail.

      We acknowledge the Reviewer’s comments regarding the amount of detail in support of the inclusion of methyl carriers and donors within our diet. Therefore, we will revise the manuscript to include more justification, especially within the Introduction section, for their inclusion. Please see lines 111-120.

      (2) It is unclear from the methods exactly how long the male mice were kept on their respective diets at the time of mating and culling. Male mice were kept on the diet between 8 and 24 weeks before mating, which is a large window in which the males undergo a considerable change in body weight (Figure 1A). If males were mated at 8 weeks but phenotyped at 24 weeks, or if there were differences between groups, this complicates the interpretation of the findings and the extrapolation of the paternal phenotype to changes seen in the fetoplacental unit. The same applies to paternal age, which is an important known factor affecting male fertility and offspring outcomes.

      We thank the Reviewer for their comments regarding the ages of the males analysed. As we had 5 treatment groups, and intended to generate a minimum of 8 litters of offspring per treatment group, this resulted in over 40 litters in total. In order to dissect these litters appropriately, and in a timely fashion, we had to stagger their generation over time. As such, this resulted in utilising our males at different ages/durations on the diet. However, in all our statistical analysis, we factored in the duration of time on the diet, which also acted as a proxy measure of paternal age. We also ensured that we staggered the generation of litters in each diet group so that any age effects were experienced across all paternal regimens.

      We have revised the manuscript to acknowledge this fact and to highlight that the duration of time on any diet was factored into the statistical analysis.

      (3) The male mice exhibited lower body weights when fed experimental diets compared to the control diet, even when placed on the hypercaloric Western Diet. As paternal body weight is an important contributor to offspring health, this is an important confounder that needs to be addressed. This may also have translational implications; in humans, consumption of a Western-style diet is often associated with weight gain. The cause of the weight discrepancy is also unaddressed. It is mentioned that the isocaloric LPD was fed ad libitum, while it is unclear whether the WD was also fed ad libitum, or whether males under- or over-ate on each experimental diet.

      We agree with the Reviewer that the general trend towards a lighter body weight for our experimental animals is unexpected. We can confirm that all diets were fed ad libitum. However, as males were group housed, we were unable to measure food consumption for individual males. We also observed that males fed the high-fat diets often shredded significant quantities of their diet rather than eating it, which prevented accurate measurement of food intake.

      We also agree with the Reviewer that body weight can be a significant confounder for many paternal and offspring parameters. However, while the experimental males did become lighter, there were no statistical differences between groups in mean body weight. As such, body weight was not included as a variable within our statistical analysis.

      (4) The description and presentation of certain statistical analyses could be improved.

      (i) It is unclear what statistical analysis has been performed on the time-course data in Figure 1A (if any). If one-way ANOVA was performed at each timepoint (as the methods and legend suggest), this is an inaccurate method to analyse time-course data.

      (ii) It is unclear what methods were used to test the relative abundance of microbiome species at the family level (Figure 2L), whether correction was applied for multiple testing, and what the stars represent in the figure.

      (iii) Mentioning whether siblings were used in any analyses would improve transparency, and if so, whether statistical correction needed to be applied to control for confounding by the father.

      We apologise for the lack of clarity regarding the statistical analyses. Going forward, we will revise the manuscript and include a more detailed description of the different analyses, the inclusion of siblings and the correction for multiple testing.

      Reviewer #1 (Public review):

      Summary:

      The authors investigated the effects of a low-protein diet (LPD) and a high sugar- and fat-rich diet (Western diet, WD) on paternal metabolic and reproductive parameters and fetoplacental development and gene expression. They did not observe significant effects on fertility; however, they reported gut microbiota dysbiosis, alterations in testicular morphology, and severe detrimental effects on spermatogenesis. In addition, they examined whether the adverse effects of these diets could be prevented by supplementation with methyl donors. Although LPD and WD showed limited negative effects on paternal reproductive health (with no impairment of reproductive success), the consequences on fetal and placental development were evident and, as reported in many previous studies, were sex-dependent.

      Strengths:

      This study is of high quality and addresses a research question of great global relevance, particularly in light of the growing concern regarding the exponential increase in metabolic disorders, such as obesity and diabetes, worldwide. The work highlights the importance of a balanced paternal diet in regulating the expression of metabolic genes in the offspring at both fetal and placental levels. The identification of genes involved in metabolic pathways that may influence offspring health after birth is highly valuable, strengthening the manuscript and emphasizing the need to further investigate long-term outcomes in adult offspring.

      The histological analyses performed on paternal testes clearly demonstrate diet-induced damage. Moreover, although placental morphometric analyses and detailed histological assessments of the different placental zones did not reveal significant differences between groups, their inclusion is important. These results indicate that even in the absence of overt placental phenotypic changes, placental function may still be altered, with potential consequences for fetal programming.

      Weaknesses:

      Overall, this manuscript presents a rich and comprehensive dataset; however, this has resulted in the analysis of paternal gut dysbiosis remaining largely descriptive. While still valuable, this raises questions regarding why supplementation with methyl donors was unable to restore gut microbial balance in animals receiving the modified diets.

      We thank the Reviewer for their considered thoughts on the gut dysbiosis induced in our models and the minimal impact of the methyl donors and carriers. We will include additional text within the Discussion to acknowledge this. However, at this point in time, we are unsure as to why the methyl donors had minimal impact. It could be that the macronutrients (i.e. protein, fat, carbohydrates) have more of an influence on gut bacterial profiles than micronutrients. Alternatively, due to the prolonged nature of our feeding regimens, any initial influences of the methyl donors may become diluted out over time. We will amend the text to reflect these potential factors.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The authors have done an immense amount of work, which should be commended. In addition to the public review, I have a few suggestions for improvement.

      (1) To further explore the weight discrepancy between the males subjected to diet alteration and those on the control diet, further details about the intake and provision of the diets would be beneficial. Seeing as the fat mass was increased in males fed a WD, do you have information on where the weight 'loss' originated from?

      We thank the Reviewer for their insight into the changes in male body weight. We agree that the difference between total body weight and the amount of adipose tissue is intriguing. Unfortunately, we were unable to monitor the food intake of our animals for two main reasons. The first was that, for animal welfare considerations, all our males were initially group housed prior to mating. This meant that males were typically housed in groups of 4 during the initial feeding (pre-mating) period. Males were only housed singly once they were used for mating. As such, it was not possible to obtain food consumption data for individual males.

      A second limitation arose from the extent to which males fed the Western Diet shredded their diet. This meant that it was not possible to weigh the food to obtain even a crude idea of how much they were consuming. The reason for this shredding is not clear to us. All mice received environmental enrichment, and we did not observe this behaviour in our control or low protein diet fed males.

      With regards to the weight of the other organs, we did not observe any significant overall changes in organ weight, or in organ weight relative to body weight. Unfortunately, we did not have access to, or conduct, any whole body scanning, such as DEXA, which would have given more insight into the body composition of our mice.

      (2) The testicular abnormalities and gene expression findings are linked nicely to the offspring's story. This is not as compelling for other findings, including the gut microbiome changes, which are not discussed in the context of the fetoplacental outcomes. More discussion of the potential impact of paternal changes on fetal outcomes would strengthen claims that these findings are impactful.

      We thank the Reviewer for their comments and suggestion. Our caution with connecting the gut microbiota to offspring development is that, to the best of our understanding, there is little data with regards to its effect on post-fertilisation development. While there is data showing that the microbiome can produce compounds and metabolites that can affect sperm quality and metabolism, lipid composition and testicular morphology, the connection with post-fertilisation development is limited. Additionally, as we saw no difference in fundamental fertility, as measured by changes in litter size, we propose that there were no overall changes in the ability of the sperm from our experimental males to reach, fertilise and support development.

      However, we acknowledge the Reviewer's comments on strengthening the manuscript and so have included some additional text within the Discussion to highlight the links between the microbiome and male reproductive fitness. Please see lines 337-348.

      (3) It is clarified in the methods that n=8 males were used in the study, but different n numbers are shown for some parameters. It would improve transparency for the reader if it were clarified whether these differences result from missing data or from the removal of statistical outliers.

      The Reviewer is correct that while 8 males were initially placed on their respective diets, for some of the analyses, the n-number is less than 8. In some instances, for example the analysis of total body fat (Fig. 1D), data was unfortunately not collected during an initial round of dissections. As such, the n number here is only 6 in each group. Additionally, due to the high cost associated with sequencing the microbiome for 5 groups, we decided to only sequence 6 samples per group. However, we do not feel that this impacts significantly on the overall focus of the data presented.

      (4) Despite this, you may have been underpowered to detect differences in some parameters, for example, the placental stereology. Alternative approaches, such as immunostaining with whole-section quantification, may be more sensitive to detect subtle changes. Alternatively, have you considered using smaller grids for improved sensitivity of the stereological analysis?

      We thank the Reviewer for their insight into the data and their suggestion for immunostaining. We agree with the Reviewer that a greater number of samples would have strengthened our analyses. However, we are not in possession of further samples which have been processed in the correct manner for additional stereological analysis. We are hoping to conduct further placental analyses based on our RNA-Seq data, but this will require the generation of new samples.

      (5) It would be easier to interpret the figures if it were clear which datasets were analysed using non-parametric tests. Are Figures 2F, 2G, 6A, 6E, and 6I shown differently for that reason, perhaps? It would improve transparency if non-normally distributed data are shown as medians, as that's what's being compared in a non-parametric test.

      We apologise for any confusion regarding the analysis of our data. The Reviewer is correct that the data in 2F and 2G were analysed using a non-parametric test. We have now made this clearer in the legend to the figure, highlighting which data sets were analysed by ANOVA or Kruskal–Wallis test. We have also done this for the other figure legends where appropriate. With regard to Figure 6, the data presented in Panels A, E and I were intended to show the range of data extending above and below the 90th and 10th centiles of the CD fetuses. As such, we felt that violin plots were the most appropriate way to display these data.

      (6) Supplemental Figure 1 seems to be missing.

      We apologise sincerely for the omission of Supplemental Figure 1. We will ensure that it is included in our resubmission.

      (7) Line 523 states that samples with RIN < 7 were used for microarray analysis. Do the authors mean RIN > 7?

      We thank the Reviewer for identifying our mistake. The Reviewer is correct that this should have been a RIN >7. We have now corrected this.

      (8) It is mentioned in lines 603-604 that paraffin shrinkage was accounted for. It could be useful to describe how this was done.

      We have revised the text within the Materials and Methods to provide additional clarity on how we compensated for the shrinkage due to the paraffin processing.

      In the revised Methods we have added a brief “Shrinkage correction” subsection describing how paraffin-embedding shrinkage was quantified for each placenta individually. Specifically, we now state that post-embedding placental volume was estimated using the Cavalieri Principle on systematic and uniformly-random sampled H&E sections, and a per-placenta volume shrinkage coefficient (k<sub>V</sub> = V<sub>post</sub>/V<sub>pre</sub>) was calculated.

      We have also added the equations showing how this coefficient was used to correct compartment volumes and the derived surface area estimates (surface area calculated from S<sub>v</sub> and the corresponding shrinkage-corrected placenta volume). Please see lines 618-644.
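
      As an illustrative sketch only (not the code or the exact equations used in the study), the correction described above could be expressed as follows. The function names are hypothetical, and the assumption that each compartment volume is rescaled by dividing by k<sub>V</sub> is ours for the purpose of the example.

```python
def cavalieri_volume(point_counts, area_per_point, section_spacing):
    """Cavalieri estimate: V = T * (a/p) * sum(P) over the sampled sections."""
    return section_spacing * area_per_point * sum(point_counts)


def shrinkage_coefficient(v_post, v_pre):
    """Per-placenta volume shrinkage coefficient k_V = V_post / V_pre."""
    if v_pre <= 0:
        raise ValueError("pre-embedding volume must be positive")
    return v_post / v_pre


def correct_volume(v_measured, k_v):
    """Rescale a post-embedding volume to its pre-embedding equivalent.

    Assumption for illustration: corrected volume = measured volume / k_V.
    """
    if k_v <= 0:
        raise ValueError("shrinkage coefficient must be positive")
    return v_measured / k_v


def surface_area(s_v, corrected_volume):
    """Surface area from surface density S_v and the corrected volume."""
    return s_v * corrected_volume
```

      For example, a placenta with a post-embedding volume of 60 units and a pre-embedding volume of 80 units has k<sub>V</sub> = 0.75, so a compartment measured at 30 units after embedding would be corrected to 40 units under this assumption.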

      (9) This may be due to the generation of the reviewer PDF, but Figure 4E and 4H are illegible in our version of the manuscript.

      We apologise for the low resolution of these figures and the difficulty in seeing the information presented. We have created revised versions of these figures which we hope are of higher quality and clarity.

      (10) What do the stars represent in Figure 6A, E, I - compared to what, controls?

      The Reviewer is correct that the asterisks in Figures 6A, E and I represent differences in the proportion of fetuses either above or below the 90th and 10th centile of the CD fetuses respectively. As such, in panel A, for both the LPD and MD-LPD groups, there are significantly more fetuses who are below the 10th centile of the CD group. Similarly, in panel E, there are significantly more placentas in the LPD group that have a weight above the 90th centile of the CD group. We have revised the graphs to make these differences, and their comparisons clearer.
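
      To make the comparison concrete, the sketch below shows one way the proportions above and below the control centiles could be computed. It is a hypothetical illustration rather than our analysis code: the function names are invented, the centiles use the inclusive interpolation method of Python's standard library, and the statistical test applied to the resulting proportions is not shown.

```python
from statistics import quantiles


def control_cutoffs(control_values):
    """Return the 10th and 90th centiles of the control (CD) group."""
    deciles = quantiles(control_values, n=10, method="inclusive")
    return deciles[0], deciles[-1]


def proportions_outside(values, low, high):
    """Proportion of values below the low cutoff and above the high cutoff."""
    n = len(values)
    below = sum(v < low for v in values) / n
    above = sum(v > high for v in values) / n
    return below, above
```

      The cutoffs are derived once from the CD group and then applied unchanged to each experimental group, so every asterisk refers to a comparison against the same control distribution.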

      Reviewer #2 (Recommendations for the authors):

      Some Recommendations for improving the writing and presentation, and minor corrections to the text and figures:

      (1) Please describe Wnt signaling in the Abstract.

      The Abstract has been amended to provide some additional text regarding Wnt signalling. Please see lines 60-63.

      (2) Page 6, line 134: A brief explanation of why measuring the inhibin beta-A chain should be included.

      The text within this section has been amended to include a brief description of the role of Inhibin β-A chain on testicular function. Please see lines 135-139.

      (3) The methodology used for Tnf determination is missing and should be described.

      We apologise for the lack of detail regarding our analysis of serum Tnf in our males. This has now been included. Please see lines 479-480.

      (4) It is important to mention that free fatty acid levels in the MD-WD group were similar to those in the CD group, although they remained comparable to the WD group.

      We agree with the Reviewer and have amended the text to indicate that there was no difference in the FFA profile of the MD-WD males to either the CD or WD males. Please see lines 147-148.

      (5) Figure 2 presents both metabolic parameters and bacterial profile analyses. Although the authors appear to relate these outcomes, clarity would be improved by presenting them in separate figures.

      As requested, we have now presented these data as two separate Figures.

      (6) Figure 3H: The data suggest that the decrease in the number of spermatogonia (PLZF⁺) observed in the LPD and WD groups was prevented when the diets were supplemented with methyl donors.

      (7) However, the description and interpretation of this result (or of a neutral effect) are missing.

      We agree with the Reviewer in their interpretation of the PLZF+ data. We have indicated this in the text within the Results and Discussion sections. Please see lines 177-178 and lines.

      (8) Line 284: Please check the abbreviation for MD-LPD.

      We thank the Reviewer for identifying this typographical mistake. This has now been corrected to state MD-LPD and not MDL.

      (9) Line 285: Please check the lettering in the text and in Figure 6H-K.

      We thank the Reviewer for identifying this typographical mistake. This has now been corrected to state the panels are Figure 9H-K, as we have split the original Figure 2 into two figures.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      The study presents important insights into the regulation of muscle hypertrophy, regulated by Muscle Ankyrin Repeat Proteins (MARPs) and mTOR. The methods are overall solid and complementary, with only minor limitations. Overall, the findings will be of interest for both muscle-biology specialists and the broader mechanobiology community.

      We thank the editors for their interest in our manuscript. Below we respond to the reviewers’ comments. Based on these comments, we made extensive textual revisions throughout the manuscript and added further analyses to the revised results.

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors employ diaphragm denervation in rats and mice to study titin‑based mechanosensing and longitudinal muscle hypertrophy. By integrating bulk RNA‑seq, proteomics, and phosphoproteomics, they map the stretch‑responsive signalling landscape, uncovering robust induction of the muscle‑ankyrin‑repeat proteins (MARP1‑3) together with enhanced phosphorylation of titin's N2A element. Genetic ablation of MARPs in mice amplifies longitudinal fibre growth and is accompanied by activation of the mTOR pathway, whereas systemic rapamycin treatment suppresses the hypertrophic response, highlighting mTORC1 as a key downstream effector of titin/MARP signalling.

      Strengths:

      The authors address a clear biological question: "how titin‑associated factors translate mechanical stretch into longitudinal fibre growth" using a unique and clinically relevant animal model of diaphragm denervation. Using a comprehensive multiomics approach, the authors identify MARPs as potential mediators of these effects and use a genetic mouse model to provide compelling evidence supporting causality. Additionally, connecting these findings to rapamycin, a drug widely used clinically, further increases the relevance and potential impact of the study.

      We thank the reviewer for their kind words and critical review of our manuscript. The roles of the MARP proteins are diverse and form an intriguing target for further study.

      Weaknesses:

      There are several areas where the manuscript could be substantially improved.

      (1) The statistical analysis of multi-omics data needs clarification. Typically, analyses across multiple experimental groups require controlling the false discovery rate (FDR) simultaneously to avoid reporting false-positive findings. It would be very helpful if the authors could specify whether adjusted p-values were calculated using a multi-factorial statistical model (e.g., ~group) or through separate pairwise contrasts.

      We agree with the reviewer that the description of the statistical analysis could be improved. We report q-values in the supplemental data tables to control for false positives; the p-values reflect pairwise comparisons. Statistical testing was performed on whole proteomes or phospho-proteomes, making for very stringent testing (please also see reply to reviewer 2, response 5). Unbiased quantitative proteomics functions primarily as a screen: in-solution digestion of muscle proteins yields comparatively few peptides, which makes population-adjusted p-value calculation very stringent and suggests few or no differences in expression. Hence, we compared RNAseq to proteome data to isolate consistently differential proteins. We have revised the method section (lines 745-746) to include clarifications of the FDR analysis.
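
      For readers less familiar with FDR adjustment, the sketch below illustrates why adjusting across a whole proteome is stringent. It assumes the Benjamini-Hochberg procedure purely for illustration; this response does not state which procedure produced the reported q-values, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values
```

      Because each p-value is scaled by the total number of tests divided by its rank, a pairwise p-value that looks convincing in isolation can yield a much larger adjusted value when thousands of proteins are tested together.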

      (2) (A) There are three separate points regarding MARP3 that could be improved. First, the authors report that MARP3-KO mice exhibit smaller increases in muscle mass after diaphragm denervation compared to wild-type mice (a -13% difference), indicating MARP3 likely promotes rather than attenuates hypertrophy. However, the manuscript currently states the opposite (lines 215-216); this interpretation should be revisited. (B) Second, it would be valuable if the authors could provide data showing whether MARP3 transcript or protein levels change in response to denervation - if they do not, discussing mechanisms behind the observed phenotype would help clarify the findings. (C) Finally, given that some MARP-KO mice already exhibit baseline differences, employing and reporting the full two-way ANOVA (including genotype × treatment interaction) would allow a direct statistical assessment of whether MARP deficiency modifies the muscle's response to stretch. This analysis would help clearly resolve any existing ambiguity.

      (A) Compared to wildtype mice, MARP3 KO mice exhibit baseline diaphragm hypertrophy. This suggests that MARP3 may normally restrain hypertrophy under basal conditions. However, in response to UDD, MARP3 KO mice display an attenuated hypertrophic response, which could be interpreted as MARP3 promoting hypertrophy under stress conditions, as noted by the reviewer. The relationship between MARP3 and metabolism remains incompletely understood, but prior studies indicate that loss of MARP3 enhances glucose tolerance and insulin sensitivity (PMID: 12456686), suggesting that MARP3 may act as a negative regulator of metabolic signaling. Both glucose and insulin can activate the PI3K pathway to promote hypertrophy (PMID: 16679293), which may contribute to the baseline hypertrophy observed in MARP3 KO diaphragms. In addition, MARP3 deficiency has been associated with activation of AMPK signaling (PMID: 26398569). AMPK is a key regulator of metabolic pathways and a well-established inhibitor of hypertrophic signaling, in part through suppression of mTOR activity, and is also responsive to mechanical stimuli (PMID: 18556591). Thus, increased AMPK activity in MARP3 KO mice may limit hypertrophy in response to UDD. Supporting this, our phospho-proteomics data indicate increased activation of the AMPK β-subunit following UDD, suggesting a potential role for AMPK signaling in stretch-induced hypertrophy. Based on these considerations, we have removed the statement that MARP3 attenuates hypertrophy and instead incorporated the potential role of AMPK signaling into the Discussion (lines 354–355). While the present study focuses on the triple MARP KO model, future work will examine the specific contributions of individual MARP proteins to muscle hypertrophy.

      (B) MARP3 (Ankrd23) upregulation at the RNA level was detected by RNA-seq in rat diaphragm following both UDD and BDD (Supplemental Tables 1 and 2). This is consistent with our prior findings in mice, where western blot analysis showed increased MARP3 protein expression following UDD (PMID: 29978560). We note that reliable detection of MARP3 protein remains technically challenging due to limited availability of specific antibodies.

      (C) We agree with the reviewer and have added the results of the two-way ANOVA to the figures (see updated Figure 4). The three MARP proteins exhibit differential effects on diaphragm hypertrophy, supporting their role as modulators of stretch-induced hypertrophy.

      (3) The current presentation of multi-omics data is somewhat difficult to follow, making it challenging to determine whether observed changes occur at the transcript or protein level due to inconsistent gene/protein naming and capitalization (e.g., proper forms are mTOR, p70 S6K, 4E-BP1). Clearly organizing and presenting transcript and protein-level changes side-by-side, especially for key molecules discussed in later experiments, would make the data more accessible and provide clearer insights into the biology of titin-mediated mechanosensing.

      We agree with the reviewer that naming conventions between gene and protein can be hard to follow. We kept the names for titin-associated proteins, as some have multiple protein names and the most common name is shown here. However, we made the suggested changes for the mTOR-related proteins (for example, see figure 5).

      (4) The current analysis relies on total protein measurements downstream of mTOR, yet mTOR's primary mode of action is to change phosphorylation status. Because the authors have already generated a phosphoproteomic dataset, it would be very helpful to report - or at least comment on - whether known mTOR target phosphosites were detected and how they respond to denervation and rapamycin. Including even a brief summary of canonical sites such as S6K1 Thr389 or 4E-BP1 Thr37/46 would make the link between mTOR activity and hypertrophy much clearer.

      We agree with the reviewer that the mTOR data requires more work to ascertain its function in regulating hypertrophy following UDD. We investigated S6K1 Thr389 and 4E-BP1 Thr37/46 in both the phosphoproteomic dataset and by western blot. These sites do not appear in the phosphoproteome mass spectrometry data (supplemental data table 13), and 4E-BP1 Thr37/46 was unchanged by western blot (not shown). The S6K1 Thr389 antibody was non-specific in our hands, but Norrby et al (PMID: 22657251) saw increased levels by 6-days UDD. Hence the mTOR aspect of this study is quite complex, suggesting that mTOR plays a major role in UDD hypertrophy, but potentially through an activation pathway different from the one classically described for muscle hypertrophy. We are investigating the mTOR mechanism further, focusing on mTOR’s role in regulating longitudinal hypertrophy with a potential connection to titin signaling, and hope to publish this in the next few years. We revised the discussion to include canonical mTOR activation in hypertrophy; please see lines 388-392.

      (5) Finally, since rapamycin blocks only a subset of mTOR signalling, a brief discussion that distinguishes rapamycin‑sensitive from rapamycin‑insensitive pathways would be valuable. Clarifying whether diaphragm stretch relies exclusively on the sensitive branch or also engages the resistant branch would place the results in a broader mTOR context and deepen the mechanistic narrative.

      We agree with the reviewer that distinguishing between rapamycin-sensitive and -insensitive mTOR signaling adds useful context to the interpretation of stretch-induced hypertrophy. Rapamycin primarily inhibits mTORC1, whereas mTORC2 is generally considered rapamycin-insensitive, although prolonged or high-dose exposure can also affect mTORC2 activity. Our data indicate that UDD induces a form of hypertrophy that is sensitive to rapamycin, supporting a prominent role for mTORC1 in this process. However, we cannot exclude the possibility that rapamycin-insensitive pathways, including mTORC2 signaling, also contribute. Notably, denervation itself may influence mTORC2 activity, which could complicate the distinction between stretch- and denervation-mediated signaling. Given these considerations, we have added a brief discussion to acknowledge potential contributions of rapamycin-insensitive mTOR signaling (lines 379-384). A more comprehensive dissection of mTORC1 versus mTORC2 signaling in this context will require targeted approaches and falls beyond the scope of the present study.

      Reviewer #1 (Recommendations for the authors):

      Minor comments:

      (6) The manuscript notes that KEGG analysis "confirmed" the GO‑term findings. Because KEGG pathways and GO terms describe different types of biological information, it might be clearer simply to present them as complementary lines of evidence rather than one validating the other.

      We agree and modified the text accordingly. “Concurrently, KEGG PATHWAY database searches (Supplemental data Table 6) indicated that the DEG’s are involved in muscle remodeling.” See lines 166-169.

      (7) Figure 2's legend mentions a two‑way ANOVA, but the specific factors tested are not specified. Listing those two factors would help readers interpret the statistics more easily.

      The two-way ANOVA refers to the violin plot in figure 2E and tests the difference of the 2 surgical modalities sham vs UDD and sham vs BDD. Sham groups were combined in the graphs for easy comparison. We clarified the text of figure legend 2.

      (8) The Methods briefly describe phosphopeptide enrichment, but additional details on the criteria for site identification - such as the localisation algorithm, probability cut‑off, and FDR thresholds - would make the phosphoproteomics section more transparent and reproducible.

      Please see the updated method section, lines 756-765.

      Reviewer #2 (Public review):

      Summary:

      Muscle hypertrophy is a major regulator of human health and performance. Here, van der Pijl and colleagues assess the role of the giant elastic protein, titin, in regulating the longitudinal hypertrophy of diaphragm muscles following denervation. Interestingly, the authors find an early hypertrophic response, with 30% new serial sarcomeres added within 6 days, followed by subsequent muscle atrophy. Using RBM20 mutant mice, which express a more compliant titin, the authors discovered that this longitudinal hypertrophy is mediated via titin mechanosensing. Through an omics approach, it is suggested that the muscle ankyrin repeat proteins (MARPs) may regulate this process. Genetic ablation of MARPs 1-3 blocks the hypertrophic response, although single knockouts are more variable, suggesting extensive complementation between these titin binding proteins. Finally, it is found through the administration of rapamycin that the mTOR signalling pathway plays a role in longitudinal hypertrophic growth.

      Strengths:

      This paper is well written and uses an impressive suite of genetic mouse models to address this interesting question of what drives longitudinal muscle growth.

      We appreciate the reviewer’s kind words on our manuscript and their critical review of our work. A potential separate mechanism governing cross-sectional versus longitudinal hypertrophy is of great interest and something we aim to address in future manuscripts.

      Weaknesses:

      While the findings are of interest, they lack sufficient mechanistic detail in the current state to separate cross-sectional versus longitudinal hypertrophy. The authors have excellent tools such as the RBM20 model to functionally dissect mTOR signalling to these processes. It is also unclear if this process is unique to the diaphragm or is conserved across other muscle groups during eccentric contractions.

      Reviewer #2 (Recommendations for the authors):

      (1) Cross-sectional hypertrophy characterization: The paper emphasizes longitudinal hypertrophy but does not quantify the contribution of radial (cross-sectional) hypertrophy to the total mass increase. Given that the denervated costal diaphragm shows ~50% increase in mass (Figure 1B) but there is only ~30% fiber lengthening, it is important to determine the proportion attributable to fiber diameter changes. Histological analysis of muscle fiber cross-sectional area would clarify the relative contributions of longitudinal versus radial hypertrophy to the overall mass phenotype.

      We agree with the reviewer that radial hypertrophy is an important mechanism for muscle weight gain in UDD. In previous work we characterized both the radial and longitudinal hypertrophy response in 6-day UDD and found that ~20% of the mass gain seen in UDD is radial hypertrophy (PMID: 29978560). We reference this paper in the discussion section, lines 277-278. A full histological work-up of the UDD diaphragm would be interesting but falls outside the scope of this manuscript. Our focus was to characterize longitudinal hypertrophy by addition of sarcomeres in series and provide insight into titin’s role in regulating longitudinal hypertrophy. We hope that the reviewer agrees with this approach.

      (2) Titin isoform expression analysis: At line 103, the authors propose that longitudinal hypertrophy reduces strain on titin by decreasing fractional sarcomere extension. However, this hypothesis does not exclude the possibility of isoform switching to a less elastic titin variant, which may compensate for changes in mechanical stress. The RNA-sequencing data should be analyzed for titin exon usage patterns between sham and UDD to determine whether changes in isoform composition (e.g., PEVK region splicing) accompany longitudinal hypertrophy. If isoform switching occurs, this represents an alternative or complementary mechanism to sarcomere addition.

      We analyzed titin exon usage in rat following both UDD and BDD. Increases in sarcomeres in series associated with UDD show modest changes in titin exon usage, though not significant by population adjusted p-values. The denervation effect of BDD did show changes in splicing, indicating lower inclusion of PEVK encoding exons, suggesting a stiffening of the titin molecules. Stiffening of titin molecules might be protective for the fully paralyzed diaphragm and preserve muscle mass. This would align with our prior publication (PMID: 29978560) which showed that stiffer titin generated more radial hypertrophy in response to UDD. In response to the reviewer’s comment, we added the splicing data to the supplemental data as new figure 2 and briefly address titin splicing in the results section, see lines 121-125.

      (3) The comparison of 3-day unilateral diaphragm denervation (UDD) and bilateral diaphragm denervation (BDD) in rats (Figure 1D-E) is used to argue that hypertrophic signaling is stretch-dependent rather than denervation-dependent. However, this interpretation requires clarification. In mice, hypertrophy is detectable as early as 1 day post-UDD, whereas the 3-day BDD protocol may drive an accelerated hypertrophic-to-atrophic remodelling process given the severity of the model. Moreover, longitudinal and global muscle hypertrophy may operate through distinct mechanisms: denervation could suppress longitudinal hypertrophy through a separate pathway while promoting or delaying cross-sectional hypertrophy. The authors should acknowledge that the current evidence does not fully exclude denervation-dependent mechanisms and should consider extended BDD time points or additional mechanistic studies to clarify this distinction.

      UDD and BDD are both denervation models, and hypertrophy occurs in the denervated costal diaphragm of UDD-operated animals. Stretch is the mechanical difference between UDD and BDD and is thus the trigger for hypertrophy signaling. Denervation signaling should in principle be comparable between the two models and is unlikely to play a different role in each, except that UDD also induces a more potent hypertrophy signaling profile on top of the atrophy program. That said, BDD is a more severe model, and respiration rate is depressed in BDD, whereas it is elevated in UDD. BDD rats also engage in abdominal breathing, which mildly stretches the diaphragm. Hypoxia is likely to play a stronger role in BDD than in UDD and could thus further enhance the atrophy profile of BDD. We agree with the reviewer that more work is needed to elucidate the BDD remodeling response; however, UDD-induced stretch is the main driver of longitudinal hypertrophy. In response to the reviewer’s comment, we have added clarifying text to the discussion, lines 286-292.

      The possibility that radial and longitudinal hypertrophy are driven by independent mechanisms is of great interest to us. We foresee that dissecting out these differences will require a cell culture-based approach, which would help avoid the complexity of overlapping denervation and hypertrophy signals seen in this manuscript.

      (4) Characterization of RBM20 models: The RBM20 experiments rely on the assumption that increased titin compliance reduces stretch sensitivity. However, the paper provides minimal baseline characterization of the diaphragms. Specifically: (a) What are the sarcomere lengths in RBM20-deficient diaphragms at rest and under stretch? (b) How does the passive force-length relationship differ between wildtype and RBM20-deficient diaphragm muscles? and (c) Would RBM20-deficient muscles, despite having longer sarcomeres at baseline, actually experience sufficient strain to activate mechanosensing? These data are necessary to interpret why RBM20-deficient mice show attenuated mass gain rather than none (as in BDD) during UDD (Supplemental Figure 2A-C). Additionally, what would the authors hypothesize would happen if rapamycin were used in RBM20 UDD models? It appears to be an attractive experimental approach to separate potential mTOR contributions to longitudinal versus cross-sectional hypertrophy.

      We agree with the reviewer that more work is needed on Rbm20 deficient mice and rats to elucidate their response to stretch. Part of this characterization has previously been published (PMID: 29978560) and Rbm20 splice-deficient mice have reduced passive stiffness in the diaphragm and show a robust mechanosensing response to UDD. Rbm20 splice-deficient mice also show a similar increase in longitudinal hypertrophy, but a blunted radial hypertrophy in response to 6-days UDD. The main reason for not expanding on these mice/rats further was the added complexity of Rbm20 splicing multiple targets that could affect hypertrophy signaling, for example LDB3 (ZASP) and FLNC (Filamin C) are both associated with hypertrophic cardiomyopathy. Hence for the purpose of this manuscript we showed mice and rats having a similar response to UDD, hypertrophy wise, and that titin stiffness (reduced in Rbm20-deficient animals) affects hypertrophy at the diaphragm mass level.

      Testing rapamycin on Rbm20-deficient animals could be interesting; however, the complexities of also changing splicing of non-titin targets would make interpretation of mTOR signaling difficult. An alternative approach would be to generate a mouse model with more compliant titin (e.g. by increasing the size of the PEVK segment), a model we are considering for future studies. TtnΔ112-158 mice, in which a large portion of the PEVK region is deleted (PMID: 30565562), show increases in sarcomere number. We would thus expect a model with a longer PEVK segment to show a reduction in the number of sarcomeres in series. We discuss the role of titin stiffness, and how it ties to longitudinal hypertrophy, in the discussion; please see lines 302-314.

      (5) Statistical analysis and multiple hypothesis correction: The proteomic analyses appear to employ a nominal p-value threshold (p < 0.05) without correction for multiple comparisons or false discovery rate (FDR) control. This is particularly concerning given the large number of comparisons. For example, the authors report 142 titin phosphorylation sites significantly different between sham and UDD at p < 0.05 (approximately 20% of ~700 identified sites). However, with proper FDR correction (adjusted p < 0.05), only 14 sites remain significant - a 90% reduction. This discrepancy is critical for the discussion on titin N2A phosphorylation sites pS9459 and pS9520, where only pS9520 achieves statistical significance after FDR adjustment. The authors should justify their choice of statistical thresholds and reanalyze key findings using FDR-corrected p-values. Additionally, the phosphoproteomics dataset should be screened for duplicate phosphosite identifications to ensure each site is counted only once.

      Reviewer 1 voiced similar concerns, and we have therefore expanded the methodology to explain the statistical tests used to analyze the data and the process of establishing Z-scores of isobaric peptides for the same phospho-sites (see lines 756-765). Our statistical analysis covers all detected peptides. When we analyze only the titin peptides, pS9459 is significant only in the t-test, likely due to large variation in isobaric peptides, whereas pS9520 is significant in both the independent t-test and after FDR correction. We changed figure 3D to show the fold change instead of the previous Z-score for more intuitive interpretation.
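
      To illustrate the gap between nominal and FDR-adjusted significance discussed in this point, the following is a minimal sketch of the Benjamini-Hochberg step-up adjustment. It assumes that Benjamini-Hochberg is the FDR procedure in question, which the response does not state; the function name `benjamini_hochberg` and the p-values are hypothetical and are not data from the manuscript.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values for illustration only (assumption, not study data).
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
adjusted_p = benjamini_hochberg(example_p)

# Count hits at the nominal threshold and after FDR adjustment.
nominal_hits = sum(p < 0.05 for p in example_p)
fdr_hits = sum(q < 0.05 for q in adjusted_p)
print(nominal_hits, fdr_hits)
```

      In this toy input, five of eight tests pass the nominal threshold but only two remain below 0.05 after adjustment, which is the same kind of shrinkage the reviewer describes for the phosphosite list.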

      Minor comments:

      (6) Line 52: "thesarcomeres" should read "the sarcomeres".

      A space has been added, please see line 52.

      (7) Line 52: "half-sarcomer" should read "half-sarcomere"

      Spelling has been corrected, please see line 52.

      (8) Figure clarity: Figure 1 (B-C) presents mouse data, while Figure 1 (D-E) presents rat data. This distinction should be clearly labeled in the figure legend or on the axes to prevent misinterpretation, particularly for readers unfamiliar with the experimental design.

      We added the species to the y-axis of revised figure 1B-E and added additional clarification in the figure legend.

      (9) Supplementary tables: When reporting statistical comparisons in the supplementary tables, please consider including the directionality of the statistical tests (e.g., which group was higher or lower) alongside p-values. This will facilitate interpretation without requiring reference to the main text figures.

      We agree with the reviewer and added statistical direction as a new column next to the p-values, please see the revised supplemental tables.

      (10) Given the interesting divergent findings in MARPtKO versus single knockouts, it would be interesting to assess by immunofluorescence the association of each MARP with the N2A region of titin following UDD.

      We agree with the reviewer that localization is important. Miller et al. (PMID: 14583192) previously localized MARP1-3 to the N2A segment by immuno-EM, and our earlier work localized MARP1 to N2A using SR-SIM (PMID: 29978560). We will further investigate MARP binding to the N2A region in an upcoming study that we intend to publish soon.

    1. Author response:

      Reviewer #1 (Public review):

      Weaknesses:

      This is a challenging hypothesis that would require some additional experimental controls. The pathway dissection, while extensive, is sometimes approached in unconvincing ways, and the results are not always evident to judge or interpret. Technically, the western blots and transcriptomic analyses require notable improvements.

      We would like to thank the reviewer for the careful and patient examination of the issues identified in our manuscript. The poor quality of some of the Western blot bands in Figure 4 may have been caused by inappropriate electrophoresis conditions during the Western blot experiments. In the revised manuscript, we will optimize the electrophoresis conditions to obtain higher-quality protein bands and update the quantitative data. Regarding the quantification format, we believe that heatmaps provide a more intuitive representation of trends in protein expression across different treatment groups. This approach more accurately reflects the results of our biological replicates than simply analyzing the significance of differences in the grayscale values of protein bands. For the analysis of transcriptomic data, we will conduct a more detailed analysis of signal pathway enrichment and the identified differentially expressed genes to ensure that predicted genes are excluded from our current results and redundant data presentation is removed.

      Regarding additional experimental controls, the reviewer suggests incorporating data obtained under blue light treatment as a control for red light. While exploring the optimal red light irradiation dose at the cellular level, we simultaneously tested the effects of blue light irradiation at the same doses on keratinocyte activity. As the blue light irradiation dose increased (0–160 J/cm<sup>2</sup>), keratinocyte activity exhibited a dose-dependent decline, indicating that blue light is phototoxic to keratinocytes. These results have already been published in our previous study (Communications Biology 2024, doi: 10.1038/s42003-024-06973-1). Taken together with the data from the current study, they support the conclusion that the anti-aging effects reported in this manuscript are specific to red light.

      Reviewer #2 (Public review):

      Weaknesses:

      The paper does not evolve to use the mechanistic discoveries of the manuscript to help our community to identify the mechanism of photobiomodulation, which is not known so far.

      I would like to draw attention to a recently published paper by Herrera et al. (FEBS Letters 2025, doi:10.1002/1873-3468.70195), which shows that red light (660 nm) stimulates mitochondrial fatty acid oxidation in keratinocytes via AMPK‑dependent phosphorylation of ACC, without altering expression of electron transport chain complexes. I believe this paper is highly complementary to the current study.

      Herrera et al. demonstrate that red light increases basal, ATP-linked, and maximal oxygen consumption rates in keratinocytes specifically through enhanced fatty acid oxidation (inhibited by etomoxir). This independently validates the central finding of the current manuscript, i.e., red light boosts lipid metabolism, strengthening the robustness of this concept.

      While the current manuscript focuses on the SIRT4-MCD axis, Herrera et al. identify AMPK phosphorylation and ACC inhibition as key effectors. The authors can integrate and expand their discussion, since SIRT4 downregulation may converge on AMPK activation, or they may represent parallel, reinforcing mechanisms. This would enrich the mechanistic model and open new hypotheses.

      The mechanism of photobiomodulation: Herrera et al. explicitly challenge the prevailing paradigm that red light acts solely via cytochrome c oxidase (by showing long-lasting effects, unchanged OXPHOS protein levels, and no difference in permeabilised cells). The current finding (red light acts through SIRT4 downregulation, i.e., not direct enzymatic activation) aligns perfectly with Herrera et al.'s critique.

      Long-term metabolic effects: Herrera et al. show that a single red light exposure elevates oxygen consumption for up to 2 days. The current study focuses on changes at 12-24 h. Their data extend the time window and suggest that the metabolic reprogramming you describe may persist longer than currently discussed, which is clinically relevant.

      Discussing Herrera et al.'s results would not only acknowledge independent, corroborating evidence but would also allow the authors to position their SIRT4-centric mechanism within a broader, emerging understanding of red-light photobiomodulation.

      We would like to thank the reviewer for providing us with constructive suggestions for discussion. Our results showed that under red light conditions, both glycolipid and lipid metabolism were activated in keratinocytes, and cellular metabolic flux increased. The activation of lipid metabolism directly led to an increase in metabolism-associated H3K9ac and drove the upregulation of anti-aging-related genes; we believe this is key to the anti-aging effects of red light. Mechanistic analysis combining proteomics and acetylation proteomics revealed that red light significantly downregulated SIRT4 expression and increased the acetylation of MCD, a protein regulated by SIRT4 that governs cellular fatty acid oxidation rates. Through validation using cell-level knockdown and inhibitors, we confirmed that SIRT4 inhibition exerts anti-aging effects in vitro and that inhibiting MCD function under red light conditions suppresses H3K9ac. These results establish the role of the SIRT4-MCD signalling axis in mediating the anti-aging effects of red light.

      The study by Herrera et al. included a substantial body of validation data confirming the role of red light in promoting fatty acid oxidation, providing robust empirical support for our research. Herrera et al. also revealed that red light-induced fatty acid oxidation depends on AMPK and ACC phosphorylation. This mechanism of red-light photobiomodulation may refute the notion that its bio-regulatory effects rely solely on the action of mitochondrial cytochrome c oxidase. Together with our study, which shows that red light exerts anti-aging photobiomodulatory effects via the SIRT4-MCD signalling axis, these findings independently confirm that red light regulates cellular fatty acid oxidation and point to a pivotal role for activated fatty acid oxidation in the bio-regulatory effects of red light. In the revised manuscript, we will include a discussion of the potential link between the red light-driven downregulation of SIRT4 and the phosphorylation of AMPK/ACC. This will help to elucidate how SIRT4 exerts its anti-aging effects by regulating lipid metabolism and to explain the possible mechanisms by which red light downregulates SIRT4.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Strengths:

      The manuscript has several strengths, including a technically comprehensive approach that combines mouse genetics, electrophysiology, live imaging in assembloids, and human organoid models, providing a rich and multifaceted dataset. Cross-species validation through the parallel use of mouse and human systems strengthens the generality of the observed phenotypes and increases relevance to human neurodevelopment.

      Consistent phenotypic observations across systems show that ARHGEF6 loss affects migration, neurite morphology, growth cone structure, and neuronal survival, supporting a coherent role in cytoskeletal regulation.

      There is clear evidence for developmental defects, including reduced interneuron numbers, increased apoptosis in the ganglionic eminences, and migration deficits, all well supported by quantitative analyses. Also, there is a high-quality electrophysiological characterization that demonstrates reduced firing in interneurons, providing a well-controlled functional phenotype.

      We thank the reviewer for their positive and thoughtful assessment of our manuscript. We appreciate their recognition of the technical breadth of the study, including the integration of mouse genetics, electrophysiology, live imaging in assembloids, and human organoid models. We are also grateful that the reviewer highlights the value of our cross-species approach, as a major goal of the study was to determine whether ARHGEF6 loss produces convergent developmental and cellular phenotypes in both mouse and human systems.

      Weaknesses:

      Despite the strengths mentioned above, the study has some conceptual and experimental weaknesses that reduce its impact. The mechanistic insight is limited, as the research does not directly establish how ARHGEF6 regulates downstream signaling pathways.

      We appreciate the reviewer’s constructive comment. We agree that, although our data establish a phenotypic link between ARHGEF6 loss and interneuron development, they do not directly dissect the molecular mechanisms underlying the observed defects. Our interpretation that the mutant phenotype involves dysregulation of cytoskeletal dynamics is based on the directly observed defects in actin polymerization and organization in neural progenitor cells and neuronal growth cones respectively, and is consistent with the abnormalities observed in neurite morphology and neuronal migration. This interpretation is further supported by the established role of Arhgef6 as a regulator of the small Rho GTPases Rac1 and Cdc42. Previous evidence shows that Arhgef6 loss reduces the activity of both GTPases and deregulates the expression of the cytoskeletal regulators Pak1–3, Limk1, and Cofilin in the mouse brain (Ramakers et al., 2012). Moreover, spine abnormalities in Arhgef6-knockdown ex vivo slice cultures can be rescued by expressing the active form of Pak3, a downstream effector of Rac1 and Cdc42 (Node-Langlois et al., 2006). Together, these findings support a model in which the loss of the protein affects development through cytoskeletal dysregulation, likely involving altered Rho GTPase signalling. We nevertheless agree that further experiments would be required to establish a direct causal relationship between ARHGEF6 loss, Rho GTPase activity, cytoskeletal dysregulation, and the interneuron phenotypes described here. We will therefore revise the manuscript to clarify that this mechanistic link remains an interpretation supported by our data and the literature, rather than a direct demonstration within the present study.

      Also, there is insufficient evidence for interneuron specificity; although the central claim is that ARHGEF6 plays a selective role in interneurons, the data do not adequately exclude the possibility that the observed effects reflect broader neuronal defects. The study lacks critical controls across cell types, as several phenotypes observed in organoids and progenitors, including apoptosis, reduced neuronal output, and altered morphology, could also affect multiple neuronal populations without being directly tested.

      We agree that the current data do not exclude the possibility of alterations in other neuronal lineages, specifically the excitatory lineage. With regard to this, we would like to emphasize that the investigation of excitatory cell phenotypes was beyond the scope of the present study, as this aspect has previously been examined by Ramakers et al., 2012 and Node-Langlois et al., 2006, particularly in the context of hippocampal pyramidal cells, which are among the few cell types showing consistent expression of the gene in the adult mouse brain (Allen Brain Atlas; Yao et al., 2021). In this context, it is interesting to note that, in Ramakers et al., 2012 (Figure S1), MAP2 immunostaining of hippocampal formations revealed comparable distribution and intensity of neuronal cell bodies and dendrites throughout the hippocampus of both wild-type and Arhgef6-KO animals. With regard to morphological maturation of excitatory cells, whereas we observe a simplification of interneuron morphology in both mouse and human models, Ramakers et al., 2012 reported increased dendritic arborization complexity in hippocampal pyramidal cells. With regard to migration, a direct comparison with excitatory neurons would be intrinsically difficult, as excitatory and inhibitory neurons undergo highly distinct migratory processes and are therefore not directly comparable. We greatly appreciate the reviewer’s comment, as it gives us the opportunity to better discuss the relationship between our findings and previous studies in the Discussion. We will revise the manuscript and avoid implying that the phenotype observed is exclusive to interneurons.

      Furthermore, the data are predominantly descriptive, with many results remaining correlative and failing to establish causal relationships.

      We agree that our study primarily establishes a phenotypic framework and does not fully resolve the causal hierarchy among altered survival, migration, cytoskeletal morphology, and intrinsic excitability. We will revise the manuscript to make this limitation explicit, avoiding statements that imply direct causality beyond the data presented.

      Some more comments:

      (1) Given that ARHGEF6 is a guanine nucleotide exchange factor for Rac1 and Cdc42, the absence of direct measurements of GTPase activity or downstream signaling represents a significant gap. The interpretation that the observed phenotypes are mediated through specific cytoskeletal pathways, therefore, remains inferential.

      We appreciate the comment. The interpretation that our phenotype involves dysregulated cytoskeletal dynamics is based on the observed defects in actin polymerization and F-actin organization in neuronal growth cones and is consistent with the abnormalities in neurite morphology and neuronal migration. We will explicitly state in the Discussion that, because we did not directly measure Rac1 and Cdc42 activity levels in our models, our hypothesis regarding the involvement of this molecular pathway in the observed phenotype remains inferential, despite being supported by the current literature.

      (2) The manuscript repeatedly interprets the findings as interneuron-specific. However, several key observations are not demonstrated to be restricted to IN. Without direct comparison to excitatory neurons or other cell types, it is difficult to conclude that ARHGEF6 plays a selective role in interneurons rather than a more general role in neuronal development. The well-done analysis of the transcriptomic dataset is not sufficient to claim IN specificity. This issue is particularly important for the interpretation of the human organoid experiments, where reductions in SOX2⁺ progenitors and NEUN⁺ neurons, as well as increased apoptosis, could reflect global developmental defects. Similarly, in the mouse experiments, the reduction in GAD67⁺ cells is compelling, but it is not shown whether other neuronal populations are also affected.

      As previously mentioned, we understand the reviewer’s concern regarding the specificity of the observed phenotypes in interneurons and agree that the claims should be tempered. However, it is important to note that the interpretation of the human organoid experiments should be reconsidered. The use of specifically ventralized MGE-like organoids allowed us to assess the cell-autonomous nature of defects such as the reduction in inhibitory progenitors’ neuronal output, the increased apoptosis, and the morphological abnormalities of inhibitory neurons. We will acknowledge in the Discussion the limitations of the study with regard to assessing the cell-autonomous nature of the observed migration defects.

      (3) The study provides a strong phenotypic description but limited causal resolution. For example, migration defects, altered growth cone morphology, and reduced branching are all consistent with impaired cytoskeletal regulation, but the links between these phenotypes are not directly established. Likewise, while the electrophysiological data convincingly show reduced firing in interneurons, the connection between altered cytoskeletal dynamics and intrinsic excitability is not explored.

      The observed migration defects, altered growth-cone morphology, and reduced branching are consistent with impaired cytoskeletal regulation. However, we acknowledge that the mechanistic links among these phenotypes remain to be directly demonstrated. Similarly, although our electrophysiological data show reduced firing in ARHGEF6-KO interneurons, the present study does not provide direct evidence linking impaired excitability to altered cytoskeletal dynamics. In the latter case, we think that the underlying mechanisms should be further investigated at the subcellular level, particularly with respect to cytoskeleton-mediated intracellular trafficking and localization and distribution of ion channels. One limitation of the present study, which may have masked electrophysiological alterations associated with differences in membrane composition (current Figure S1D–H), is that different interneuron subtypes with distinct intrinsic properties were pooled together in the analysis. We will expand the Discussion to address these limitations.

      (4) Several aspects of data presentation could be improved. In multiple figures (e.g., Figure 1A, D; Figure 4 and Video S1, 2), the images are difficult to interpret due to high cellular density, limited magnification, or lack of clear annotation. In some cases, it is not fully clear how quantifications were performed or which regions were analyzed. Improving the visual clarity with arrows, boxes, and high-magnification inserts of the data would strengthen confidence in the conclusions.

      We would like to thank the reviewer for pointing this out. We agree that some images and videos would benefit from clearer annotation. In the revised manuscript, we will add high-magnification insets, arrows or boxes highlighting the relevant regions/cells, and clearer descriptions of the quantified regions. We will also improve legends and video labels to indicate genotype, region, and tracked cells.

      Reviewer #2 (Public review):

      The authors investigate the impact of the deletion of the small GTPase regulator ARHGEF6 on the development and physiology of interneurons. Using public databases, they first show that ARHGEF6 is enriched in interneurons or in areas that give rise to them, both in development and adulthood, in humans and mice. Using a previously reported complete KO mouse and a GAD67-GFP reporter mouse line, they show that in the adult mouse cortex and hippocampus, there is a notable reduction in GFP+ cells. These mice show increased apoptotic cells at different timepoints and areas of the brain during development. In the developing cortex of ARHGEF6-KO mice, there are fewer IN in all layers, and cells present processes that are not correctly oriented. IN from the hippocampus in culture show reduced excitability and impaired neurite branching. The authors then established isogenic hiPSC lines to study ARHGEF6 deletion in human cells and differentiated ventral forebrain neurons, finding interneuron-related and non-related phenotypes. Most importantly, human interneurons grown in organoids show reduced branching and altered growth cone morphology. The authors claim that the novel interneuron phenotypes found in these models can explain, in part, the human intellectual disabilities associated with mutations in this protein. The study is well conducted and opens new avenues of research not only for the role of small GTPase regulation in early nervous system development, but also for how interneuron deficiencies impact a wider range of intellectual disability syndromes found in humans.

      We appreciate the reviewer’s positive evaluation of our manuscript and their recognition of this work’s potential to expand the focus of intellectual disability research on the development and function of the inhibitory system. We are particularly encouraged that the reviewer highlights the strength of our combined mouse and human cellular models, as well as the relevance of the interneuron-related phenotypes we identify across systems.

      However, most conclusions of the present version would be strengthened after considering the following comments:

      Major comments:

      (1) The reported biological processes evaluated at different developmental stages may be directly or indirectly related to ARHGEF6 function itself. As a model of a hereditary disease, full organism gene deletion is valid, since the human patients suffer from that condition as well. However, to investigate the roles of a protein, complete deletions may not be very accurate since they can give rise to phenotypes that are only indirectly related to the protein function itself. Most conclusions of the present manuscript should either be discussed in this regard or add evidence for a direct role of the protein. One such evidence is typically performed with acute knockdowns in culture, or in developing brains by in utero electroporation. For example, Figure 1C shows that the principal excitatory neurons in the hippocampus do not express ARHGEF6. However, most electrophysiological and behavioral evidence of defects in ARHGEF6-KO mice arises from evaluating these cells (Ramakers et al., 2012). I am not suggesting that either previous or actual evidence is wrong. But I believe readers would benefit from a clear distinction (or add caution notes) between a functional consequence of the deletion (that can be months away and in other cells than the actual molecular defect) and a true cell biological function of the protein under study. In favor of the authors, this is a concern with most conclusions derived from KO organisms.

      We agree with the reviewer that phenotypes observed in constitutive knockout models may, in some contexts, reflect indirect or compensatory consequences of long-term gene loss. Conditional and/or inducible knockout or knockdown approaches can certainly help dissect the nature of the observed defects and better define the effects of gene ablation at different developmental stages or in specific cell types. However, in the context of our study, it is important to note that the experiments performed in ventralized MGE-like organoids allowed us to assess the cell-autonomous nature of very early developmental defects in the inhibitory lineage, in isolation from other cell types. These defects include reduced neuronal output from inhibitory progenitors, increased apoptosis, and morphological abnormalities in inhibitory neurons. Therefore, the phenotypes reported here are less likely to reflect effects originating in, or indirectly caused by, cell types that do not express Arhgef6.

      With regard to Figure 1C, we state in the Results that “among excitatory populations, only CA3 pyramidal neurons and mossy cells exhibited expression levels comparable to those observed in inhibitory clusters (Figure 1D, Table S2),” thereby not neglecting the potential effect of the lack of a functional protein in these populations.

      (2) Figure 1E-G H I. All conclusions are made with a GAD67-GFP reporter, which is a very powerful and reliable tool for large-scale screening. All the conclusions of the paper would be strengthened if some immunohistochemical staining in the same areas of specific markers for interneurons would be added as supporting complementary evidence.

      We appreciate the insightful comment of the reviewer. Additional validation using established interneuronal markers will further strengthen the GAD67-eGFP analysis. We will perform complementary stainings (e.g., PVALB and CCK) and quantifications and include these data as a Supplementary Figure.

      (3) Cell death in development: It is surprising that the high amount of TUNEL staining during development does not translate into gross histological changes in the adult brain (studied elsewhere). Can authors discuss possible explanations?

      We appreciate the thoughtful consideration of our findings. We think that possible explanations include partial compensatory mechanisms during development, which may mitigate the long-term anatomical consequences of increased cell death. In addition, the phenotype may be restricted to specific neuronal populations or developmental windows, thereby producing functional alterations without necessarily resulting in overt macroanatomical defects. Thus, although increased developmental cell death may contribute to altered circuit assembly and neuronal output, it may not be sufficient to produce gross histological changes detectable at the adult brain level.

      (4) Section 4 (Figures 2F-J) - The authors present this staining as an analysis of migration. Normally, migration studies are performed with a "pulse-chase" paradigm, where a single cohort is labeled and then followed over time (normally by in utero electroporation of a fluorescent protein). Tissue is then fixed at different time points, and migration can be followed. On the contrary, the evidence is from a single point, in an experimental setting in which all Gad67 IN are stained, and hence, one cannot imply a defect in migration. The differences between WT and ARHGEF6-KO are obvious and interesting; it is just that they cannot be solely attributed to a problem in migration.

      Also, a true phenotype of migration in the current setting should have found that the cells that failed to migrate are accumulated in deeper layers. My impression is that the changes in IN per layer are easier explained by total cell number, rather than migration. Perhaps evaluating earlier timepoints could clarify this.

      We appreciate the reviewer’s suggestion to implement an additional time point in the in vivo migration analysis. Since an earlier in vivo time point would most likely not reveal migration-related defects, as most cells would still be confined to the ganglionic eminence (Liaci et al., 2022), we will include analyses performed at a later developmental time point as supplementary evidence. We will also revise the wording to clarify that the fixed-tissue data show altered distribution and orientation of GAD67-eGFP-positive interneurons, which are consistent with impaired migratory behavior when considered together with the in vitro live-imaging data. At the same time, we will acknowledge that reduced interneuron survival and/or neuronal output may also contribute to the observed phenotype.

      (5) It is known that ARHGEF6 deletion produces severe F-actin phenotypes in neurons. Have the authors confirmed that GAD67 cells in their hippocampal cultures also show these phenotypes (stress fibers in somas, growth cones, and actin patches along neurites)?

      We did not directly assess F-actin organization in GAD67-eGFP murine primary cultures. Direct analyses of F-actin organization, growth-cone morphology, and cytoskeletal organization were performed only in the human system. To further assess this phenotype, we will perform phalloidin staining on GAD67-eGFP brain sections to evaluate F-actin organization in interneurons in vivo.

      (6) Section 4. The authors present data for deficient migration of the GFP-labeled interneurons. Is it possible to assess, in the same sections, whether other cell types are also affected? Although the hypothesis that ARHGEF6 deletion will have an impact in IN is well rooted in expression data, by assessing other cell types, one can even include a positive control or evidence for a cell-autonomous phenotype.

      We thank the reviewer for their thoughtful suggestions. We agree that extending the analysis to additional cell types would provide further insight into the specificity of the phenotype; however, a comprehensive evaluation of all neuronal populations falls beyond the scope of this research. The use of ventralized MGE-like organoids enabled us to examine whether key defects were cell-autonomous, including the reduced neuronal output of inhibitory progenitors, increased apoptosis, and abnormal inhibitory-neuron morphology.

      (7) ARHGEF6 deletion has an important impact on organoid development (size, shape, etc). Have the authors analysed whether these organoids produced fewer interneurons?

      We would like to clarify that the organoids analyzed in the study are ventral MGE-like organoids and therefore the reduction in neuronal output (current Figure 4K) primarily reflects the ventral/interneuron lineage in this model.

      (8) In assembloids, the differences in migration parameters are very small between WT and ARHGEF6-KO, which reinforces that perhaps what is observed in the different layers of cortex during mouse development is likely not entirely due to migration, as concluded.

      We agree that the migration parameters in assembloids should not be interpreted in isolation. We will revise the text to emphasize that the reduction in the number of interneurons observed in the adult brains is part of a broader pattern that also includes altered neuronal output and reduced viability.

      (9) To properly weigh the present evidence -interneuron deficits- using the ARHGEF6-KO model, authors should include a deeper discussion in light of much work that has been done using these mice. How does the finding of a diminished IN population in the brain of these mice explain the large amount of electrophysiological and behavioral evidence produced before with these animals? Perhaps the most important work to discuss these aspects is the initial ARHGEF6-KO report by Ramakers and colleagues (2012), but there are others.

      We appreciate the reviewer’s emphasis on the importance of framing our findings within the broader context of the existing literature. We will expand the Discussion to better integrate previous work on ARHGEF6-KO mice. Specifically, we will discuss how reduced interneuron number and altered interneuronal function may contribute to previously reported electrophysiological and behavioral phenotypes, acting in concert with previously described alterations in excitatory neurons and synaptic plasticity (Ramakers et al., 2012).

      Minor comments:

      (1) Figure 1A. It looks clear that the GE shows the highest expression of ARHGEF6; however, the reader needs the reference levels where the log2 expression is calculated. What are the reference levels?

      We would like to thank the reviewer for pointing this out. We will clarify in the caption that the log2(RPKM+1) expression values are shown as absolute values and are not relative to a reference condition.

      (2) Have the authors compared the number of GAD67-eGFP cells in the hippocampal cultures between WT and ARHGEF6-KO mice?

      We did not rely on total GAD67-eGFP counts in dissociated hippocampal cultures because differences could reflect initial plating composition, survival, and maturation. In our experience, the MGE-like organoid system provides a more controlled in vitro context to assess neuronal output in the ventral lineage.

      (3) Section 3, as a caution note, authors should mention that it is not possible to know from the evidence provided which cells are dying.

      We agree with the reviewer and will add a cautionary statement noting that TUNEL staining alone does not identify the precise dying cell type. We will clarify that increased cell death in the ganglionic eminence and MGE-like organoids is consistent with a prominent involvement of the ventral/inhibitory lineage, while acknowledging the limits of the assay.

      (4) In the dorsal-ventral assembloids, it is expected that the ventral organoid would contain lots of GFP expression compared to the dorsal, but in the image shown (Figure 5A) both parts of the assembloid seem to have the same amount and distribution of GFP. How is that possible?

      We appreciate the thoughtful comment of the reviewer. After two weeks of fusion, a considerable number of interneurons are expected to have migrated from the ventral to the dorsal compartment of the assembloid (Birey et al., 2017; Sloan et al., 2018). In terms of distribution, we think that current Figure 5A shows a gradient of eGFP-positive cells within the dorsal compartment, with the number of labeled cells decreasing as the distance from the fusion interface between the two organoids increases. By contrast, a comparable gradient is not evident in the ventral compartment, where several labeled neurons remain present even in regions distal to the fusion site.

      Reviewer #3 (Public review):

      Summary:

      ARHGEF6 is a RAC1/CDC42 guanine nucleotide exchange factor that has been proposed to be associated with X-linked intellectual disability, but its relevance to the pathology is not well established. ARHGEF6 has been assigned a role in spine density and plasticity of hippocampal pyramidal neurons, but nothing is known about its role in interneuron development. Here, the authors show that ARHGEF6 is expressed early in development in the inhibitory lineage during the peak of interneuron generation and migration. The aim of the study is therefore to investigate whether, in addition to its role in pyramidal neurons, ARHGEF6 could play a role in inhibitory neuron development. Using both ARHGEF6-KO mice and organoids from ARHGEF6-KO hiPSCs, the authors show that ARHGEF6 plays a critical role in interneuron development and function.

      Strengths:

      The major strength of the paper is the very detailed analysis of the role of ARHGEF6 using two different systems: ARHGEF6-KO mice and deletion of ARHGEF6 in human iPSC-derived organoids. Strikingly, deletion of ARHGEF6 in both systems induces similar defects such as an increase in apoptosis, reduced neuronal output, impaired neuronal morphology, and disrupted migratory dynamics. This compelling evidence demonstrates that ARHGEF6, in addition to its already well-described role in spine formation and plasticity, is playing a crucial role during embryonic development through its function in interneurons.

      We thank the reviewer for this positive assessment of our work and for highlighting the strength of our combined in vivo and human iPSC-derived organoid approaches. We are pleased that the reviewer recognizes the consistency of the phenotypes observed across both systems and acknowledges that our findings support a crucial role, during early stages of embryonic development, for a protein previously thought to be relevant primarily in the synaptic context.

      Weaknesses:

      (1) In Figure 1, the authors show that ARHGEF6 is expressed in different regions of the brain, including the interneuron lineage, and that depletion of ARHGEF6 reduces the number of GABAergic neurons in the adult cortex and hippocampus. To try to better characterize this defect, the authors in Figure 2 investigate whether deletion of ARHGEF6 affects interneuron migration and survival during embryonic development. To do so, ARHGEF6-KO mice were crossed with the GAD67-eGFP reporter line to follow the inhibitory lineage. The authors analyse apoptosis using TUNEL staining, and show that it is significantly increased in the ganglionic eminence of ARHGEF6-KO E14.5 embryos. The authors claim that this is not the case in the cortex. However, the image shown in Figure 2A really suggests that staining is increased. Which part of the neocortex is analysed for quantification? This should be clarified.

      We would like to thank the reviewer for pointing this out. The region analyzed was the same as that used to assess GAD67-eGFP-positive cells in Figure 2F. We will clarify the exact neocortical region used for TUNEL quantification and revise the figure and legend to make the analyzed area explicit. We will also analyze additional animals to improve the accuracy of the analysis.

      (2) In Figure 2F-J, the authors investigate the migration of interneurons by analysing the GAD67-eGFP staining, and clearly show that the migratory abilities of the depleted neurons are reduced. However, the authors do not discuss the fact that, because depletion of ARHGEF6 increases apoptosis, there are fewer neurons available for migration. This is important for the interpretation of the data. This point should be clarified.

      We appreciate this comment and believe that it is particularly relevant to the interpretation of the data shown in Figure 2F–G. We will clarify the limited interpretation of this specific analysis in the Results section. The altered directionality observed in vivo, together with evidence of impaired migratory behavior obtained through in vitro live imaging, supports the possibility that altered migratory dynamics contribute to the phenotype, although increased apoptosis and reduced neuronal output may also contribute.

      (3) In Supplementary Figure S2, the authors describe the establishment of the ARHGEF6-KO human iPSC line and test the ability of these cells to undergo correct development, especially for the generation of neural progenitor cells. I was wondering why the authors do not present the data of both control and ARHGEF6-KO cells.

      We thank the reviewer for pointing this out. All staining reported in the organoids and assembloids in this paper shows that the WT ATCC-DYS0100 cell line, as well as the mutant, efficiently differentiates into neuronal tissue. The Supplementary Figure was intended to validate the impact of the mutation on the ability of the iPSC line to retain its differentiation capacity as a preliminary step before proceeding with organoid differentiation. We will integrate stainings for NPC markers on the WT line in the Supplementary Figure.

      (4) At the molecular level, how ARHGEF6 depletion could affect neuronal survival is missing. In addition, as ARHGEF6 is a GEF for RAC1 and Cdc42 amongst other GEFs, I would have expected that the authors test how RAC1 activity (and Cdc42) is affected in ARHGEF6-depleted brains and in ARHGEF6-KO organoids. The measure of phalloidin staining and the anisotropy index are not really meaningful.

      We appreciate the thoughtful comment of the reviewer. Previous evidence already shows that Arhgef6 loss reduces the activity of both GTPases and deregulates the expression of the cytoskeletal regulators Pak1–3, Limk1, and Cofilin in the mouse brain (Ramakers et al., 2012). Regarding organoids, we agree that direct RAC1/CDC42 activity measurements would have strengthened the molecular mechanism. We will revise the manuscript to avoid implying that our phalloidin-based measurements alone establish the underlying dysregulated molecular pathway.

      (5) The authors show that ARHGEF6-KO forebrain organoids were markedly smaller compared to their isogenic controls, and their study suggests that ARHGEF6 expression impacts progenitor maintenance and neurogenesis. Despite representing only a minority of the total neuronal population, I was wondering whether ARHGEF6-KO mice present brain morphology defects such as microcephaly.

      We appreciate the comment. We did not perform a morphometric analysis for microcephaly in the present study. We will add this limitation to the Discussion and note that gross brain morphology changes were not reported in the previously published ARHGEF6-KO mouse characterization (Ramakers et al., 2012). We will also clarify that the smaller organoid phenotype may reflect developmental defects that are not fully compensated in a reductionist in vitro model and therefore do not necessarily imply overt microcephaly in vivo.

      References

      Allen Institute for Brain Science. Allen Mouse Brain Atlas: Arhgef6 ISH data. Available from: Allen Brain Map.

      Birey, F., Andersen, J., Makinson, C. D., Islam, S., Wei, W., Huber, N., Fan, H. C., Metzler, K. R. C., Panagiotakos, G., Thom, N., O’Rourke, N. A., Steinmetz, L. M., Bernstein, J. A., Hallmayer, J., Huguenard, J. R., & Pașca, S. P. (2017). Assembly of functionally integrated human forebrain spheroids. Nature, 545(7652), 54–59. https://doi.org/10.1038/nature22330

      Liaci, C., Camera, M., Zamboni, V., Sarò, G., Ammoni, A., Parmigiani, E., Ponzoni, L., Hidisoglu, E., Chiantia, G., Marcantoni, A., Giustetto, M., Tomagra, G., Carabelli, V., Torelli, F., Sala, M., Yanagawa, Y., Obata, K., Hirsch, E., & Merlo, G. R. (2022). Loss of ARHGAP15 affects the directional control of migrating interneurons in the embryonic cortex and increases susceptibility to epilepsy. Frontiers in Cell and Developmental Biology, 10, 875468. https://doi.org/10.3389/fcell.2022.875468

      Nodé-Langlois, R., Muller, D., & Boda, B. (2006). Sequential implication of the mental retardation proteins ARHGEF6 and PAK3 in spine morphogenesis. Journal of Cell Science, 119(23), 4986–4993. https://doi.org/10.1242/jcs.03273

      Pelkey, K. A., Chittajallu, R., Craig, M. T., Tricoire, L., Wester, J. C., & McBain, C. J. (2017). Hippocampal GABAergic inhibitory interneurons. Physiological Reviews, 97(4), 1619–1747. https://doi.org/10.1152/physrev.00007.2017

      Ramakers, G. J. A., Wolfer, D., Rosenberger, G., Kuchenbecker, K., Kreienkamp, H.-J., Prange-Kiel, J., Rune, G., Richter, K., Langnaese, K., Masneuf, S., Bösl, M. R., Fischer, K.-D., Krugers, H. J., Lipp, H.-P., van Galen, E., & Kutsche, K. (2012). Dysregulation of Rho GTPases in the αPix/Arhgef6 mouse model of X-linked intellectual disability is paralleled by impaired structural and synaptic plasticity and cognitive deficits. Human Molecular Genetics, 21(2), 268–286. https://doi.org/10.1093/hmg/ddr457

      Sloan, S. A., Andersen, J., Pașca, A. M., Birey, F., & Pașca, S. P. (2018). Generation and assembly of human brain region-specific three-dimensional cultures. Nature Protocols, 13(9), 2062–2085. https://doi.org/10.1038/s41596-018-0032-7

      Yao, Z., Nguyen, T. N., van Velthoven, C. T. J., Goldy, J., Sedeno-Cortes, A. E., Baftizadeh, F., Bertagnolli, D., Casper, T., Chiang, M., Crichton, K., Ding, S.-L., Fong, O., Garren, E., Glandon, A., Gouwens, N. W., Gray, J., Graybuck, L. T., Hawrylycz, M. J., Hirschstein, D., … Zeng, H. (2021). A taxonomy of transcriptomic cell types across the isocortex and hippocampal formation. Cell, 184(12), 3222–3241.e26. https://doi.org/10.1016/j.cell.2021.04.021

    1. Author response:

      We sincerely thank the reviewers and editors for the thorough, constructive, and insightful comments, which have greatly helped us improve the accuracy, clarity, and rigor of the manuscript. We acknowledge that the current version has several limitations, including insufficient contextualization with other model systems and lack of critical synthesis. These important weaknesses will be comprehensively addressed in a future revised version of the review.

      For the present revision, we have focused exclusively on correcting objective errors, factual inaccuracies, and citation mistakes as pointed out by the reviewers. All specific factual and reference issues raised by Reviewer 2 and Reviewer 3 have been carefully corrected in the revised manuscript, including inaccurate statements, incorrect citations, missing references, and inconsistent descriptions of zebrafish clock genes, photoreception, and physiological functions.

      We appreciate the reviewers’ thoughtful suggestions regarding the conceptual depth, comparative context, critical synthesis, and expanded discussion of sleep and model limitations. While we fully agree that these aspects would significantly strengthen the review, we plan to systematically incorporate these broader conceptual improvements in a future, more substantial revision.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Evidence, reproducibility and clarity):

      Summary

      Sheidaei and colleagues report a novel and potentially important role for an early mitotic actomyosin-based mechanism, PANEM contraction, in promoting timely congression of chromosomes located at the nuclear periphery, particularly those in polar positions. The manuscript will interest researchers studying cell division, cytoskeletal dynamics, and motor proteins. Although some data overlap with the group's prior work, the authors extend those findings by optimizing key perturbations and performing more detailed analyses of chromosome movements, which together provide a clearer mechanistic explanation. The study also builds naturally on recent ideas from other groups about how chromosome positioning influences both early and later mitotic movements.

      In its current form, however, the manuscript is not acceptable for publication. It suffers from major organizational problems, an overcrowded and confusing Results section and figures, and a lack of essential experimental controls and contextual discussion. These deficiencies make it difficult to evaluate the data and the authors' conclusions. A substantial structural revision is required to improve clarity and persuasiveness. In addition, several key control experiments and more conceptual context are needed to establish the specificity and relevance of PANEM relative to other microtubule- and actin-based mitotic mechanisms. Testing PANEM in additional cell lines or contexts would also strengthen the claim. I therefore recommend Major Revision, addressing the structural, conceptual, and experimental issues detailed below.

      Major Comments

      A. Structural overhaul and figure reorganization

      The Results section is overly dense, lacks clear structure, and includes descriptive content that belongs in the Methods. Many figure panels should be moved to Supplementary Materials. A substantial reorganization is required to transform the manuscript into a focused, "Reports"-type article.

      Move methodological and descriptive details (e.g., especially from the second Results subheading and Figure 2) to the Methods or Supplementary Materials.

      In these parts, we define four phases of kinetochore motion in early mitosis. Without such a description in the main text, readers would be confused about subsequent analyses. Figure 2 is also important to show examples of how the four phases develop. Although we respect this suggestion from the reviewer, we would like to keep these parts in the main text and main figure.

      Remove repetitive statements that simply restate that later phenotypes arise as consequences of delayed Phase 1 (applicable to subheadings 3 onward).

      As suggested, we have removed the statement for the delayed start of Phase 2 for peripheral kinetochores in azBB-treated cells (Page 9, second paragraph). We have also simplified the statement for the delayed start of Phase 3 and Phase 4 to avoid repetition (Page 9, third paragraph; Page 10, second paragraph).

      Figure 4I: This panel is currently unclear and should be drastically simplified.

      Following this suggestion, we simplified Figure 4I by removing the column of ‘Start’, which is easily deduced from the ‘Duration’ results and therefore does not provide much new information.

      I recommend reorganizing the figures as follows:

      Figure 1: Keep as single figure but simplify. Figure 1D and 1E could be combined, move unnormalized SCV to supplementary materials. Same goes for 1F.

      We have reorganized Figure 1, as suggested, and moved unnormalized data to supplemental materials.

      New Figure 2: Combine current Figures 2A, 3A, 3C, 3D, 4C, 4F, and 4H to illustrate how PANEM contraction facilitates initial interactions of peripheral chromosomes with spindle microtubules which increases speed of congression initiation.

      If we were to follow this suggestion, we would lose Figure 2B, D, Figure 3B and Figure 4A, where examples of kinetochore motions are shown in images and 3D diagrams. The new Figure would mostly consist of only graphs. Without examples of images and 3D diagrams, readers would have difficulty understanding the study. Although we respect this suggestion from the reviewer, we would like to keep Figures 2, 3 and 4, as they are (except for making Figure 4I simpler; see above).

      New Figure 3: Combine current Figures 5A, 5C, 5D, 5F, 6B, 6C, and lower panels of 4H to show how PANEM contraction repositions polar chromosomes and reduces chromosome volume in early mitosis to enable rapid initiation of congression.

      If we were to follow this suggestion, we would lose Figure 5B and Figure 6A, where examples of kinetochore/chromosome dynamics are shown in images and 3D diagrams. For the same reason as above, we would like to keep Figure 5 and 6 as they are, although we respect this suggestion from the reviewer.

      New Figure 4: Combine Figures 7A, 7B, 7D, 7E, 7F, expanded Supplementary Figure S7, and new data to demonstrate that PANEM actively pushes peripheral chromosomes inward which is important for efficient chromosome congression in diverse cellular contexts.

      We have conducted new experiments to demonstrate the role of PANEM in diverse cellular contexts, as detailed below. We have combined the new results with the original Figure S7 to create Figure 8 in line with this suggestion.

      On the other hand, in our view, combining Figure 7A-E and the extended Figure S7 would be confusing because the two parts address different topics. Although we respect this suggestion from the reviewer, we would like to keep Figure 7 and the extended Figure S7 (i.e. Figure 8) separate.

      B. Specificity and redundancy of actin perturbation

      To establish the specificity and relevance of PANEM, the authors should include or discuss appropriate controls:

      Apply global actin inhibitors (e.g., cytochalasin D, latrunculin A) to disrupt the entire actin cytoskeleton. These perturbations strongly affect mitotic rounding and cytokinesis but only modestly influence early chromosome movements, as reported previously (Lancaster et al., 2013; Dewey et al., 2017; Koprivec et al., 2025). The minimal effect of global inhibition must be addressed when proposing a localized actomyosin mechanism. Comment on whether the apparent differences between this approach and the one the authors used arise from different cell types.

      We did experiments along this line, using a dominant-negative LINC construct, in our previous study (Booth et al eLife 2019). LINC-DN should more specifically remove/reduce PANEM than the global actin inhibitors mentioned above. LINC-DN attenuated the reduction of CSV soon after NEBD and increased the number of polar chromosomes (Booth et al eLife 2019); i.e. in this regard, the outcome was similar to azBB treatment in the current study. One can expect that global actin inhibitors would also inhibit the PANEM formation and show effects similar to LINC-DN. By contrast, the indicated references reported that global actin inhibitors strongly affect mitotic rounding and cytokinesis but only modestly influence early chromosome movements, as the reviewer noted. One possibility is that such differences may have arisen from different cell types – this could be important, especially given that some cells form the PANEM and others do not (Figure 8A). A second possibility is that cytokinesis, mitotic rounding and PANEM formation may rely on actin polymerization to different extents. For example, the same concentration of global actin polymerization inhibitors may affect cytokinesis, but may still allow PANEM formation to proceed without observable effects on early chromosome movements. As suggested, we discussed this topic in the Discussion (page 16, third paragraph).

      Clarify why spindle-associated actin, especially near centrosomes, as reported in prior studies using human cultured cells (Kita et al., 2019; Plessner et al., 2019; Aquino-Perez et al., 2024), was not observed in this study. Myosin-10 and actin were also observed close to centrosomes during mitosis in X. laevis mitotic spindles (Woolner et al., 2008). Possible explanations include differences in fixation, probe selection, imaging methods, or cell type. Note that some actin probes (e.g., phalloidin) poorly penetrate internal actin, and certain antibodies require harsh extraction protocols. Comment on the possibility that interference with a pool of Myo10 at the centrosomes is important for the effects on congression.

      As the reviewer implies, we cannot rule out that we could not detect actin associated with the spindle or centrosomes because of the difference in methods or cell lines between the current study and the literature mentioned by the reviewer. We have therefore moderated our claim in the Discussion that ‘we did not detect any actin network inside the nucleus, on the spindle or between chromosomes’ by adding ‘at least, using the method and the cell line in the current study’ to this statement (Page 14, second paragraph). We have also cited the three references mentioned by the reviewer in the Discussion (Page 14, second paragraph). Regarding Myosin-10, azBB (blebbistatin variant) should have negligible effects on class-X myosin, including Myosin-10 (Limouze et al 2004 [PMID 15548862]). It is therefore unlikely that the effects of azBB that we observed in the current study are due to the inhibition of Myosin-10. We have cited Woolner et al 2008 and another paper and discussed this topic in the Discussion (Page 14, second paragraph).

      C. Expansion of PANEM functional analysis

      To strengthen the conclusions and broaden the study beyond the group's previous work, PANEM function should be tested in additional contexts (some may be considered optional but important for broader impact): [underlined by authors]

      Test PANEM function in at least one additional cell line that displays PANEM to rule out cell-line-specific effects.

      As suggested, we have studied the effect of PANEM contraction in cell lines other than U2OS. We have found that when PANEM contraction was inhibited, the reduction in chromosome scattering was diminished in RPE1 cells (new Figure 8B, C). Moreover, we have found that inhibition of PANEM contraction increased polar chromosomes during prometaphase/metaphase in RPE1 and HCT116 cells (which form PANEM), but not in HeLa cells (which do not form PANEM) (new Figure 8D, E). These results suggest that the effects of PANEM contraction, originally observed in U2OS cells, are also present in other cell lines (RPE1 and HCT116) that form PANEM.

      Examine higher-ploidy or binucleated cells to determine whether multiple PANEM contractions are coordinated and if PANEM contraction contributes more in cells of higher ploidies or specific nuclear morphologies.

      This is an interesting suggestion, but it takes lots of time to conduct such a study, and it goes beyond the scope of this paper.

      Investigate dependency on nuclear shape or lamina stiffness; test whether PANEM force transmission requires a rigid nuclear remnant.

      This is an interesting suggestion, but it takes lots of time to conduct such a study, and it goes beyond the scope of this paper.

      Analyze PANEM's contribution under mild microtubule perturbations that are known to induce congression problems (e.g., low-dose nocodazole).

      In the current study, we found that PANEM contraction affects chromosome motions in Phase 1 and Phase 3 but not Phase 2 or Phase 4. Mild microtubule perturbation itself could affect chromosome motions in all four Phases. We do not think it would be so informative to study what additional effects the reduced PANEM contraction shows when combined with mild microtubule perturbation.

      Evaluate PANEM contraction role in unsynchronized U2OS cells, where centrosome separation can occur before NEBD in a subset of cells (Koprivec et al., 2025), and in other cell types with variable spindle elongation timing.

      Following this suggestion, we first investigated the timing of spindle elongation, relative to NEBD, in asynchronous U2OS cells (Figure 8 – figure supplement 3). We imaged cells every 5 min (it was difficult to reasonably observe enough mitotic cells using a shorter interval). Most of the cells showed no significant change in the spindle length (distance between two spindle poles) after (or around) NEBD [e.g. Cell 1 in A] or a mild reduction in it [e.g. Cell 2 in A]. Only a small number of cells (2-3 out of 26) showed a mild increase in the spindle length after (or around) NEBD [e.g. Cell 3 in A]. Because the spindle elongation after NEBD was rare and mild, it was difficult to address how the timing of spindle elongation affects the effect of PANEM on reducing chromosome scattering and on chromosome relocation from polar regions. We explained this result and discussed this topic in the Discussion section.

      Quantify not only the percentage of affected cells after azBB but also the number of chromosomes per cell with congression defects in the current and future experiments.

      It is tricky to count the number of chromosomes because they frequently overlap. Counting kinetochores is more feasible, but kinetochore signals show some non-specific background (e.g. those outside of the nucleus in prophase). We therefore quantified the chromosome volume at polar regions in azBB-treated cells (Figure 6C).

      D. Conceptual integration in Introduction and Discussion

      The manuscript should better situate its findings within the context of early mitotic chromosome movements:

      Clearly state in the Introduction and elaborate in the Discussion that initiation of congression is coupled to biorientation (Vukušić & Tolić, 2025). This provides essential context for how PANEM-mediated nuclear volume reduction supports efficient congression of polar chromosomes.

      It has been a widely accepted view in the field that chromosome congression precedes biorientation, since the publication in 2006 (Kapoor et al Science 2006). Very recently, this view has been challenged by a new publication (Vukušić & Tolić, Nat Commun 2025), as indicated by this reviewer. We have mentioned this new model and discussed the new interpretation of our results based on it in the Discussion (page 15; ‘It has been a widely accepted view…’).

      To explain the new interpretation of our results more clearly, we have added a new diagram as a supplemental figure (Figure 9 – figure supplement 1) in the revised manuscript.

      Explain that PANEM is most critical for polar chromosomes because their peripheral positions are unfavorable for rapid biorientation (Barišić et al., 2014; Vukušić & Tolić, 2025).

      We have included such a statement in the Discussion, as a part of the new interpretation of our results based on the new model that chromosome biorientation precedes congression (see above). We have also cited the indicated two papers.

      Discuss how cell lines lacking PANEM (e.g., HeLa and others) nonetheless achieve efficient congression, and what alternative mechanisms compensate in the absence of PANEM. For example, it is well established that cells congress chromosomes after monastrol or nocodazole washout, which essentially bypasses the contribution of PANEM contraction.

      Following this suggestion, we discussed three possible mechanisms that could compensate for a lack of PANEM and facilitate kinetochore-MT interaction and chromosome congression, based on previous literature (Page 17): 1) the enhanced assembly rate of spindle MTs may facilitate kinetochore-MT interactions in N-CIN+ cancer cells, 2) chromosome biorientation may precede congression more frequently to promote the congression towards the spindle midplane, and 3) the balance between CENP-E, Dynein and chromokinesin’s activities may incline to greater chromosome-arm ejection forces towards the spindle midplane.

      Minor Comments

      These issues are more easily addressable but will significantly improve clarity and presentation.

      Introduction

      Remove the reference to Figure 1A in the Introduction. The portion of Figure 1 and related text that recapitulates the authors' previous work should be incorporated into the Introduction, not the Results.

      As suggested in the second sentence of this comment, we have moved most of the second paragraph of the first section of Results to Introduction (Page 4) and cited Figure 1A and 1B in Introduction. We would like to keep the reference to Figure 1A in the Introduction, because showing the PANEM images at the beginning of the manuscript would help readers’ understanding of our study. In addition, citing Figure 1A in the Introduction is more consistent with the suggestion in the second sentence of this comment.

      Results (by subheading)

      First subheading: When introducing the ~8-minute early mitotic interval, cite additional studies that have characterized this period: Magidson et al., 2011 (Cell); Renda et al., 2022 (Cell Reports); Koprivec et al., 2025 (bioRxiv); Vukušić & Tolić, 2025 (Nat Commun); Barišić et al., 2013 (Nat Cell Biol).

      As suggested, we cited these references at the indicated part of the first section of the Results (page 5).

      Second subheading: Cite key reviews and foundational research on kinetochore architecture and sequential chromosome movement during early mitosis: Musacchio & Desai, 2017 (Biology); Itoh et al., 2018 (Sci Rep); Magidson et al., 2011 (Cell); Vukušić & Tolić, 2025 (Nat Commun); Koprivec et al., 2025 (bioRxiv); Rieder & Alexander, 1990 (J Cell Biol); Skibbens et al., 1993 (J Cell Biol); Kapoor et al., 2006 (Science); Armond et al., 2015 (PLoS Comput Biol); Jaqaman et al., 2010 (J Cell Biol).

      Rieder & Alexander, 1990 (J Cell Biol) and Kapoor et al., 2006 (Science) have already been cited in the second section of the Results in the original manuscript. We agree that all other references should be cited in this manuscript, and they are now cited in the Introduction and/or Discussion where they fit best (e.g. Musacchio & Desai 2017 reviews the kinetochore in general and is therefore best cited in the Introduction).

      Third subheading: Clarify why some kinetochores on Figure 3A appear outside the white boundaries if these boundaries are intended to represent the nuclear envelope.

      We interpret that these are background signals in the cytoplasm, which do not come from kinetochores, because 1) before NEBD, they were outside of the nucleus, and 2) after NEBD, they did not show any characteristic kinetochore motions such as those towards a spindle pole (Phase 2) and the spindle mid-plane (Phase 4). We have commented on these background signals in the legend for Figure 3A.

      Fourth subheading: Note that congression speed is lower for centrally located kinetochores because they achieve biorientation more rapidly (Barišić et al., 2013, Nat Cell Biol; Vukušić & Tolić, 2025, Nat Commun).

      Relevant to this comment, there was an error regarding the congression speed of central kinetochores (original Figure 4H). The congression speed of peripheral kinetochores was shown correctly, but for central kinetochores it was incorrectly shown in µm per time interval (30 s) rather than µm per minute. We have corrected this error in the revised manuscript (new Figure 4H). Based on the corrected data, the speed of congression is similar between peripheral and central kinetochores. The original Figure 3G (the speed of poleward motion for central kinetochores) had a similar error, which we have also corrected in the revised manuscript. We apologize for these errors and the confusion they may have caused.

      Regarding this comment, if biorientation is achieved more rapidly for central kinetochores, Phase 3 (rather than congression speed) would be shorter for central kinetochores. Indeed, Phase 3 is slightly shorter for central kinetochores (control) than for peripheral kinetochores (control) (Figure 4C), but the difference is not statistically significant (t test; p = 0.21).
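      The unit correction described in this response amounts to rescaling a per-frame displacement by the number of frames per minute. The following is a minimal sketch of that arithmetic, not the authors' analysis code; the function name and the example displacement of 0.5 µm are hypothetical, and only the 30 s frame interval comes from the response.

```python
def speed_um_per_min(displacement_per_frame_um: float, frame_interval_s: float) -> float:
    """Convert a displacement per imaging frame (µm) into a speed in µm/min."""
    if frame_interval_s <= 0:
        raise ValueError("frame interval must be positive")
    frames_per_minute = 60.0 / frame_interval_s
    return displacement_per_frame_um * frames_per_minute


# Hypothetical value: 0.5 µm travelled per 30 s frame.
# With a 30 s interval there are two frames per minute, so the
# per-frame figure understates the per-minute speed by a factor of 2.
per_frame = 0.5
print(speed_um_per_min(per_frame, frame_interval_s=30.0))  # 1.0
```

      With a 30 s interval, any value mistakenly reported per frame is half the value in µm per minute, which is why the corrected central-kinetochore speeds moved closer to the peripheral ones.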

      Fifth subheading: Cite studies on polar chromosome movements: Klaasen et al., 2022 (Nature); Koprivec et al., 2025 (bioRxiv). Clarify that Figure 5F displays only those kinetochores that initiated directed congression movements.

      These two references have already been cited and discussed in this Results section of our original manuscript. However, considering this suggestion, we have expanded our discussion of the polar chromosome movements reported by Koprivec et al (page 11). Meanwhile, the reviewer is correct about Figure 5F, and we have clarified this point in the Figure 5F legend.

      Sixth subheading (currently in Discussion): Move the final paragraph of the Discussion into the Results and expand it with preliminary analyses linking PANEM contraction to congression efficiency across untreated cell types or under mild nocodazole treatment.

      As suggested, we have moved the final paragraph of the Discussion in the original manuscript to make a new final section in the Results in the revised manuscript. Moreover, as suggested, we have studied the outcome of inhibiting PANEM contraction in cell lines other than U2OS (Figure 8B–E), and have described the new results in the new final section of the Results.

      Discussion

      1. When discussing cortical actin, cite key reviews on its presence and function during mitosis: Kunda & Baum, 2009 (Trends Cell Biol); Pollard & O'Shaughnessy, 2019 (Annu Rev Biochem); Di Pietro et al., 2016 (EMBO Rep).

      As suggested, we have cited all these review papers in the Discussion (page 17), and mentioned the role of the cortical actin on the spindle orientation and positioning (Kunda & Baum, 2009; Di Pietro et al., 2016), as well as the function of the actomyosin ring on cytokinesis (Pollard & O'Shaughnessy, 2019).

      Significance

      Advance

      This study's main strength is its novel and potentially important demonstration that contraction of PANEM, a peripheral actomyosin network that contracts in early mitosis, contributes to the timely initiation of chromosome congression, especially for polar chromosomes. While PANEM itself was previously described by this group, this manuscript provides new mechanistic evidence, improved perturbations, and detailed chromosome tracking. To my knowledge, no prior studies have mechanistically connected this contraction to polar chromosome congression in this level of detail. The work complements dominant microtubule-centric models of chromosome congression and introduces actomyosin-based forces as a cooperating system during very early mitosis. However, the impact of the study is currently limited by major organizational issues, insufficient controls, and incomplete contextualization within existing literature. Addressing these issues will substantially improve clarity and credibility. [underlined by authors]

      We have addressed the underlined criticisms as detailed above.

      Audience

      The primary audience of this study will be researchers working in cell division, mitosis, cytoskeleton dynamics, and motor proteins. The findings may also interest the wider cell biology community, particularly those studying chromosome segregation fidelity, spindle mechanics, and cytoskeletal crosstalk. If validated and clarified, the concept of PANEM could be integrated into textbooks and models of chromosome congression and could inform studies on mitotic errors and cancer cell mechanics.

      Expertise

      My expertise lies in kinetochore-microtubule interactions, spindle mechanics, chromosome congression, and mitotic signaling pathways.

      Reviewer #2 (Evidence, reproducibility and clarity):

      In this manuscript, Sheidaei et al. reported on their study of chromosome congression during the early stages of mitotic spindle assembly. Building on their previous study (ref. #15, Booth et al., Elife, 2019), they focused on the exact role of the actin-myosin-based contraction of the nuclear envelope. First, they addressed a technical issue from their previous study, finding a way to specifically impair the actomyosin contraction of the nuclear membrane without affecting the contraction of the plasma membrane. This allowed them to study the former more specifically. They then tracked individual kinetochores to reveal which were affected by nuclear membrane contraction and at what stage of displacement towards the metaphase plate. The investigation is rigorous, with all the necessary controls performed. The images are of high quality. The analyses are accurate and supported by convincing quantifications. In summary, they found that peripheral chromosomes, which are close to the nuclear membrane, are more influenced by nuclear membrane contraction than internal chromosomes. They discovered that nuclear membrane contraction primarily contributes to the initial displacement of peripheral chromosomes by moving them towards the microtubules. The microtubules then become the sole contributors to their motion towards the pole and subsequently the midplane. This step is particularly critical for the outermost chromosomes, which are located behind the spindle pole and are most likely to be missegregated.

      Significance

      While the conclusions are somewhat intuitive and could be considered incremental with regard to previous works, they are solid and improve our understanding of mitotic fidelity. The authors had already reported the overall role of nuclear membrane contraction in reducing chromosome missegregation in their previous study, as mentioned fairly and transparently in the text. However, the reason for this is now described in more detail with solid quantification. Overall, this is good-quality work which does not drastically change our understanding of chromosome congression, but contributes to improving it. Personally, I am surprised by the impact of such a small contraction (of around one micron) on the proper capture of chromosomes and wonder whether the signalling associated with the contraction has a local impact on microtubule dynamics. However, investigating this point is clearly beyond the scope of this study, which can be published as it is. [underlined by authors]

      The suggested topic (underlined) is intriguing. However, we agree with the reviewer that it is beyond the scope of this paper. The reviewer recommends publication of our manuscript as it is.

      Reviewer #3:

      Sheidaei et al. report how chromosomes are brought to positions that facilitate kinetochore-microtubule interactions during mitosis. The study focusses on an important early step of the highly orchestrated chromosome segregation process. Studying kinetochore capture during early prophase is extremely difficult due to kinetochore crowding, but the team has taken up the challenge by classifying the types of kinetochore movements, carefully marking kinetochore positions in early mitosis and linking these to map their fate/next-positions over time. The work is an excellent addition to the field, as most of the literature has thus far focussed on tracking kinetochores in slightly later stages of mitosis. The authors show that the PANEM facilitates chromosome positioning towards the interior of the newly forming spindle, which in turn facilitates chromosome congression; in the absence of PANEM, chromosomes end up in unfavourable locations and fail to form proper kinetochore-microtubule interactions. The work highlights the perinuclear actomyosin network in early mitosis (PANEM) as a key spatial and temporal element of chromosome congression, which precedes the segregation process.

      Major points

      (1) The complexity of tracking has been managed by classifying kinetochore movements into 4 categories, considering motions towards or away from the spindle mid-plane. While this is a very creative solution in most cases, there may be some difficult phases that involve movement in both directions or no dominant direction (e.g., Phase 3-like). It is unclear whether all kinetochores go through Phases 1, 2, 3 and 4 sequentially or whether a few deviate from this pattern. A comment on this would be helpful. Also, it may be interesting to compare those that deviate from the sequence, and ask how they recover in the presence and absence of azBB.

      To respond to this comment, we would like to first clarify how we selected kinetochores for our analysis. We selected kinetochores that can be individually tracked. If kinetochore tracking was difficult (before the start of Phase 4 in control and azBB-treated cells or before observing the extended Phase 3 in azBB-treated cells) because of kinetochore crowding, we did not choose such kinetochores. For example, related to the next comment of this Reviewer, we did not include kinetochores close to spindle poles (within 4 µm) at NEBD in our analysis for the following two reasons: First, these kinetochores often did not show clear and rapid movements towards a spindle pole, which we used to define Phase 2. Second, although we referred to kinetochore co-localization with a microtubule signal for the start of Phase 2, this was difficult for kinetochores close to spindle poles because of a high density of microtubules. As requested, we have added this comment to the Method section (page 25).

      With the above selection, all selected kinetochores without azBB treatment (control) showed the poleward motion (Phase 2) and congression (Phase 4) in this order, though their extents varied among kinetochores. All selected kinetochores with azBB treatment also showed the poleward motion (Phase 2), and some of them showed congression (Phase 4) after Phase 2. Phase 1 and Phase 3 were then defined as the intervals between NEBD and Phase 2 and between Phase 2 and Phase 4, respectively. If no Phase 4 was observed with azBB, we judged that Phase 3 continued until the end of tracking. We have added this comment to the Method section (pages 25-26).
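      The phase definitions in this response can be written down as a small rule set. The sketch below is illustrative only and is not the authors' analysis code: the function name, argument names, and example time points (in minutes after NEBD) are hypothetical. It assumes that the start and end of Phase 2, and the start of Phase 4 when present, have already been identified from the tracks, and it treats a missing Phase 4 end as the end of tracking, following the authors' response to minor point 6.

```python
from typing import Dict, Optional, Tuple

Interval = Tuple[float, float]


def assign_phases(
    nebd: float,
    phase2_start: float,
    phase2_end: float,
    tracking_end: float,
    phase4_start: Optional[float] = None,
    phase4_end: Optional[float] = None,
) -> Dict[str, Optional[Interval]]:
    """Derive Phase 1 and Phase 3 from the observed Phase 2 and Phase 4 times."""
    if not nebd <= phase2_start <= phase2_end <= tracking_end:
        raise ValueError("expected NEBD <= Phase 2 start <= Phase 2 end <= end of tracking")

    phases: Dict[str, Optional[Interval]] = {
        "phase1": (nebd, phase2_start),      # NEBD until poleward motion begins
        "phase2": (phase2_start, phase2_end),
    }
    if phase4_start is None:
        # No congression observed: Phase 3 runs until the end of tracking.
        phases["phase3"] = (phase2_end, tracking_end)
        phases["phase4"] = None
    else:
        if not phase2_end <= phase4_start <= tracking_end:
            raise ValueError("Phase 4 must start after Phase 2 and before the end of tracking")
        end = tracking_end if phase4_end is None else phase4_end
        phases["phase3"] = (phase2_end, phase4_start)
        phases["phase4"] = (phase4_start, end)
    return phases


# Hypothetical kinetochore without congression (e.g. extended Phase 3).
print(assign_phases(nebd=0.0, phase2_start=1.0, phase2_end=2.0, tracking_end=8.0))
```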

      (2) Would peripheral kinetochores close to poles behave differently compared to peripheral kinetochores close to the midplane (figure S4)? In figure 3D, are they separated? If not, would it look different?

      Since we did not include kinetochores close to spindle poles (at NEBD), for which it was difficult to define Phase 2 (see our response to the above major point 1), in our analysis, the suggested comparison is not feasible.

      (3) Uncongressed polar chromosomes (eg., CENPE inhibited cells) are known to promote tumbling of the spindle. In figure 5B with polar chromosomes, it will be helpful to indicate how the authors decouple spindle pole movements from individual kinetochore movements.

      In contrast to CENPE-inhibited cells, azBB-treated cells did not show much tumbling of the spindle, though both cells showed uncongressed polar chromosomes. The reason for this difference may be fewer uncongressed polar chromosomes in azBB-treated cells. There were still modest spindle motions in azBB-treated cells. However, because kinetochore motions were assessed relative to a spindle pole (and other reference points on the spindle) in our study (Figure 2A, C), the modest spindle motions were offset in our analyses of kinetochore motions. We have clarified the underlined part in the Method section (page 24).
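      The reasoning that spindle motion is offset when kinetochore positions are measured relative to a spindle pole can be illustrated with a short sketch. This is not the authors' code: the function names and coordinates are hypothetical, and it shows only the kinetochore-to-pole distance for a rigid translation of the spindle, whereas the study also used other reference points on the spindle.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def distance_to_pole(kinetochore: Sequence[Point], pole: Sequence[Point]) -> List[float]:
    """Frame-by-frame distance (µm) between a kinetochore and a spindle pole."""
    if len(kinetochore) != len(pole):
        raise ValueError("tracks must have the same number of frames")
    return [math.dist(k, p) for k, p in zip(kinetochore, pole)]


def translate(track: Sequence[Point], drift: Sequence[Point]) -> List[Point]:
    """Add a per-frame displacement (whole-spindle motion) to a track."""
    return [(x + dx, y + dy, z + dz) for (x, y, z), (dx, dy, dz) in zip(track, drift)]


# Hypothetical two-frame tracks (µm) and a hypothetical spindle drift.
kinetochore = [(3.0, 4.0, 0.0), (2.0, 0.0, 0.0)]
pole = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
drift = [(1.0, 1.0, 1.0), (5.0, -2.0, 3.0)]

still = distance_to_pole(kinetochore, pole)
moving = distance_to_pole(translate(kinetochore, drift), translate(pole, drift))
# The same drift applied to both tracks cancels in the pole-relative measure.
print(all(math.isclose(a, b) for a, b in zip(still, moving)))  # True
```

      A single pole-relative distance removes translation of the whole spindle; accounting for spindle rotation requires additional reference points, as in the spindle-based reference frame the authors describe.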

      (4) The work has high quality manual tracking of objects in early mitosis- if this would be made available to the field, it can help build AI models for tracking. The authors could consider depositing the tracking data and increasing the impact of their work.

      As suggested, we have included kinetochore tracking data as supplemental data in the revised manuscript (Figure 3 – source data 1–4; Figure 5 – source data 1, 2).

      Minor points

      (1) It will be helpful for readers to see how many kinetochores/cell were considered in the tracking studies. Figure legends show kinetochore numbers but not cell numbers.

      As suggested, we have now mentioned the number of cells, where the kinetochore motions were analyzed, in the legends for Figures 3, 4, 5, and supplemental figures.

      (2) Discussion point: If cells had not separated their centrosomes before NEBD, would PANEM still be effective? Perhaps the cancer cell lines or examples as shown in Figure 6A have some clues here.

      Following this suggestion, we first investigated the timing of spindle elongation, relative to NEBD, in asynchronous U2OS cells (Figure 8 – figure supplement 3). We imaged cells every 5 min (it was difficult to reasonably observe enough mitotic cells using a shorter interval). Most of the cells showed no significant change in the spindle length (distance between two spindle poles) after (or around) NEBD [e.g. Cell 1 in A] or a mild reduction in it [e.g. Cell 2 in A]. Only a small number of cells (2-3 out of 26) showed a mild increase in the spindle length after (or around) NEBD [e.g. Cell 3 in A]. Because the spindle elongation after NEBD was rare and mild, it was difficult to address how the timing of spindle elongation affects the effect of PANEM on reducing chromosome scattering and on chromosome relocation from polar regions. We explained this result and discussed this topic in the Discussion section.

      (3) Figure 7 cartoon shows misalignment leading to missegregation. It may be useful to consider this in the context of the centrosome directed kinetochore movements via pivoting microtubules. Is this process blocked in azBB-treated cells?

      We understand that the Reviewer refers to the kinetochore pivoting mechanism around a spindle pole, which was recently reported by the Tolic group (Koprivec et al., 2026). Such a pivoting mechanism would work only when the spindle elongates (i.e. the distance between spindle poles is enlarged) after NEBD. Therefore, to address this Reviewer’s question, we tried to assess how PANEM contraction contributes to relocating polar chromosomes when the spindle elongates before or after NEBD in asynchronous U2OS cells (i.e. in the situation where the kinetochore pivoting mechanism is applied or not), as we noted above in response to Point 2. However, spindle elongation after NEBD was rare and mild, and we were unable to address this issue (see our response to Point 2). We discussed this matter in the Discussion section.

      (4) Are all the N-CIN- lines with PANEM highly sensitive to azBB? In other words, is PANEM essential for normal congression in some of these lines.

      Because blebbistatin could kill cells by inhibiting cytokinesis, the blebbistatin sensitivity of cell growth may not necessarily reflect how essential the PANEM contraction is for chromosome congression.

      Instead, we addressed more directly how essential the PANEM contraction is for chromosome congression. We analyzed chromosome congression in RPE1 and HCT116 cells (both are N-CIN-) in the presence and absence of pnBB, the inhibitor of PANEM contraction (new Figure 8D, E). With pnBB, these cells showed congression defects, suggesting that the PANEM contraction is essential for chromosome congression in these N-CIN- cells.

      (5) Are congression times delayed in lines that naturally lack PANEM?

      For example, it takes 10-20 min for HeLa cells (lacking PANEM) to complete chromosome congression after NEBD (Bancroft et al 2015: https://doi.org/10.1242/jcs.163659). This is not significantly different from the time (8-18 min) for chromosome congression we observed in U2OS cells (which form PANEM). We assume that cells lacking PANEM have developed a compensatory mechanism for efficient chromosome congression; we have discussed possible compensatory mechanisms in the last paragraph of the Discussion (page 17).

      (6) Page 23 "we first identified the end of congression" how does this relate to kinetochore oscillations that move kinetochores away from the metaphase plate?

      The start of kinetochore oscillation was defined as the end of Phase 4 if we could track the kinetochore until that point. In some cases where the kinetochore became close to the midplane (< 2.5 µm), it was not possible to track it further due to kinetochore crowding around the spindle mid-plane – in such cases, the end of Phase 4 was assigned as the end of tracking. These definitions were not necessarily clear in the original manuscript. Moreover, in the original manuscript, it was not clearly stated that the end of Phase 4 was defined in the same way for both non-polar and polar kinetochores. We have now clarified these points in the Method section (page 25).

      (7) Are spindle pole distances (spindle sizes) different in early and late mitotic cells (4min vs 6min after NEBD) in control vs azBB-treated cells? Please comment on Figure S2E (mean distance) in the context of when phase 4 is completed. Does spindle size return to normal after congression?

      In Figure S2E (Figure 1 – figure supplement 6 in the revised manuscript), we did not observe a significant difference in the spindle-pole distance (the spindle size) between control and azBB-treated cells at any individual time point. The smallest p-value was 0.094 at 6.0 min. As suggested, we have explained this in the legend for this supplementary figure. Completion of Phase 4 is highly variable across different kinetochores within the same cell; thus, a general comment on its completion timing in cells is not feasible.

      Significance:

      The current work builds upon their previous work, in which the authors demonstrated that an actomyosin network forms on the cytoplasmic side of the nuclear envelope during prophase. This work explains how the network facilitates chromosome capture and congression by tracking motions of individual kinetochores during early mitosis. The findings can be broadly useful for cell division and the cytoskeletal fields.

    1. Author response:

      Thank you for your decision letter with the public review and the recommendations. While we are delighted that the referees feel the work addresses an outstanding and important issue, they have raised concerns regarding the strength of the support. We will address all the concerns in full in a revised manuscript in due course. Please find below a few general points regarding the referees’ concerns and a proposal as to how we plan to address them.

      (1) The idea of the manuscript is to present a plausible solution to a long-standing question in the field of mitochondrial biology and evolution. That the identified solution to the origin of AAC transporters is a remote structural homolog is to be expected (as our later detailed response will show, it is better than any other sequence/structure available to date). If the actual similarities were any better than what we have identified (with a special case of circular permutation), they could have been identified by other, simpler structural homology search methodologies.

      (2) A recurrent and strong objection of the reviewers to the findings presented in this manuscript is rooted in the fact that the structural and sequence relatedness between AAC and CysZ detected in this work is so weak that it could be coincidental rather than an actual evolutionary link. Given this, we have now searched carefully in available structural databases such as SCOP, CATH and ECOD to see whether the above fold link has been noted by others independently. We notice that in the ECOD (Evolutionary Classification of Protein Domains) database only AAC and CysZ are grouped together under a single Possible homology group (X) called ‘Mitochondrial ADP/ATP carrier-like’. The ECOD database contains a hierarchical classification of protein domains organized according to their evolutionary relationships, and the server is maintained by Prof. Nick Grishin at The University of Texas Southwestern Medical Center.

      Link to ECOD database: http://prodata.swmed.edu/ecod/index_af2_pdb.php

      Reference: Cheng H, Schaeffer RD, Liao Y, Kinch LN, Pei J, et al. (2014) ECOD: An Evolutionary Classification of Protein Domains. PLOS Computational Biology 10(12): e1003926. https://doi.org/10.1371/journal.pcbi.1003926

      Therefore, our study and the independent findings of the ECOD database team together offer greater confidence in the proposed remote evolutionary relationship between AAC and CysZ, and indicate that the structural and sequence similarity we report in the manuscript is not a mere coincidence. We will also incorporate the details of the possible evolutionary relationship between AAC and CysZ identified in the ECOD database in the revised version of the manuscript.

      (3) One point we would like to stress is that, considering all the similarities identified, the relationship is very unlikely to fall into the class of ‘convergent evolution’. We will make this point explicit in the revised version.

      (4) Lastly, while we totally agree that the similarities are in the twilight zone, considering the importance of the problem, we feel that our work would induce researchers from the field of protein design to attempt possible interconversion of the two distantly related transporters thus providing an experimental rationale for the evolution of these transporters.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Renard, Ukrow et al. applied their recently published computational pipeline (CHROMAS) to the skin of Euprymna berryi and Sepia officinalis to track the dynamics of cephalopod chromatophore expansion. By segmenting each chromatophore into radial slices and analyzing the co-expansion of slices across regions of the skin, they inferred the motor control underlying chromatophore groups.

      Strengths:

      The authors demonstrate that most motor units of cephalopod skin include a subregion of multiple chromatophores, creating "virtual chromatophores" in between the fixed chromatophores. This is an interesting concept that challenges prevailing models of chromatophore organization, and raises interesting possibilities for how chromatophore arrays may be patterned during development.

      This study introduces new analyses of cephalopod skin that will be valuable for the quantitative study of cephalopod behavior.

      Weaknesses:

      The authors chose to image spontaneous skin changes in sedated animals, rather than visually-evoked skin changes in awake, freely-moving animals. Spontaneous chromatophore changes tend to be small shimmers of expansion and contraction, rather than obvious, sizable expansions. This may make it more challenging to distinguish truly co-occurring expansions from background activity. The authors don't provide any raw data (videos) of the skin, so it is difficult to independently assess the robustness of the inferred chromatophore groupings.

      The patch-clamp experiments in E. berryi are used to test the validity of their approach for inferring motor units. The stimulations evoke expansions of sub-regions of each chromatophore, creating "virtual chromatophores" as predicted from the behavioral analysis. However, the authors were not able to predict these specific motor units from behavioral analysis before confirming them with patch-clamp, limiting the strength of the validation. It would be informative to quantify the results of the patch-clamp experiments - are the inferred motor units of similar sizes to those predicted from behavior?

      The authors report testing multiple experimental conditions (e.g., age, size, behavioral stimuli, sedation, head-fixation, and lighting), but only a small subset of these data are presented. It is difficult to determine which conditions were used for which experiments, and the manuscript would benefit from pooling data from multiple experiments to draw general conclusions about the motor control of cephalopod skin.

      The authors use a different clustering algorithm for E. berryi and S. officinalis, but do not discuss why different clustering approaches were required for the two species.

      Impact:

      The authors use their computational pipeline to generate a number of interesting predictions about chromatophore control, including motor unit size, their spatial distribution within the skin, and the independent control of subregions within individual chromatophores by putatively distinct motor neurons. While these observations are interesting, the current data do not yet fully support them.

      The CHROMAS tool is likely to be valuable to the field, given the need for quantitative frameworks in cephalopod biology. The predictions outlined here provide a useful foundation for future experimental investigation.

      We thank the reviewer for the thoughtful and detailed evaluation of our work and for recognizing the potential of the CHROMAS pipeline for studying chromatophore control.

      We agree that some aspects of the manuscript required clarification and additional explanation, and we have revised the text accordingly. We also now provide access to representative raw video recordings in the Data Availability section. In the E. berryi patch-clamp experiments, single motor neurons evoked expansions of sub-regions of chromatophores, consistent with the “virtual chromatophore” concept. We have now quantified the size of motor units across patch-clamp sessions, and the results show that the inferred motor-unit sizes broadly match those predicted from behavioral recordings, supporting the validity of our approach.

      We agree that pooling data across individuals would provide valuable insight into variability across animals. In practice, we recorded chromatophore activity from several animals (14 Euprymna berryi and 12 Sepia officinalis) under different experimental conditions during development of the experimental pipeline. However, acquiring long, stable, artifact-free recordings suitable for motor unit analysis is technically challenging. We now clarify this point in the manuscript. Specifically, we explain that multiple animals were recorded during pipeline development, while the analyses presented focus on recordings with the highest signal quality. We anticipate that the framework introduced here will enable future studies to collect larger datasets and compare motor unit organization across individuals, developmental stages, and species.

      HDBSCAN was used for E. berryi during initial exploratory analyses, and Affinity Propagation was adopted for S. officinalis because it better captured the correlation structure of those recordings. We did not re-analyze the E. berryi data with Affinity Propagation, and the implications of algorithm choice are now discussed in the Discussion.

      Reviewer #2 (Public review):

      Summary:

      Overall, this is an excellent paper, making use of a newly developed system for monitoring the behaviour of chromatophores in the skin of (mostly) free-swimming bobtail squid and European cuttlefish. The manuscript is very well-written, clearly presented and very well-structured. The central finding, that individual chromatophores are connected to multiple motor neurones, is not new. Novelty instead comes from the ability to measure the actuation of chromatophore sections across wide areas of skin in free-swimming animals, showing the diversity of local motor units and reinforcing the notion that individual chromatophores are not necessarily the individual units of colour change, but rather local motor units that cover multiple neighbour and near-neighbour chromatophore muscles. This is an excellent finding and one that will shape our understanding of the neural control of cephalopod skin colour.

      Strengths:

      The methodological approach to collecting large amounts of data about local variations in the expansion of sections of chromatophores is exciting, and the analysis pipeline for clustering sections of chromatophores whose spontaneous activity correlated over time is powerful and exciting.

      Weaknesses:

      Some minor edits and typographical errors need correcting. I also had some concerns about whether the preparation for the electrophysiological section of the manuscript complies with the journal's ethical requirements, so I would urge that this be carefully checked.

      We thank the reviewer for the positive evaluation of our work and for recognizing the value of the methodological approach and the clarity of the manuscript.

      We have carefully reviewed the manuscript and corrected minor typographical errors.

      Regarding the ethical considerations raised for the electrophysiological experiments, we have carefully verified that the experimental procedures comply with the journal's ethical requirements and relevant institutional guidelines.

      Reviewer #3 (Public review):

      Summary:

      This study uses high-resolution videography and a custom computer-vision pipeline to dissect the motor control of cephalopod chromatophores in Euprymna berryi and Sepia officinalis. By quantifying anisotropic chromatophore deformations and applying dimensionality reduction methods, the authors infer that individual chromatophores can be a part of multiple motor units. Clustering analyses reveal putative motor units that often span multiple chromatophores, with diverse and overlapping geometries. Chromatophore expansion dynamics are faster and more stereotyped than relaxation, consistent with active neural contraction followed by passive recoil. Together, the results show that chromatophores function not as uniform pixels but as fractionated, coordinately controlled elements that enable flexible pattern generation.

      Strengths:

      The authors present compelling, direct evidence that (a) chromatophore deformations are anisotropic, and indirect evidence that (b) individual chromatophores can be split across multiple putative motor units. This evidence is provided through data collected over large spatial scales, but also at a sub-chromatophore resolution. This combination of scale and resolution is not possible using traditional neuroanatomical and physiological approaches alone.

      The authors also develop a new non-invasive, image analysis approach to extract information about chromatophore deformation across large spatial scales on the organism's body. In principle, this approach is applicable across species and may allow for further comparative characterization of chromatophore motor control. It is therefore a promising new tool and useful resource for the community.

      Weaknesses:

      An important weakness of the work is that the methods the authors develop can only be applied during resting, spontaneous 'flickering' activity of chromatophores. The inability to reliably apply their technique during any kind of realistic camouflage is a large limitation, as it means this method cannot be used to study the dynamics of motor control during realistic camouflage behaviors.

      Another weakness of this paper is the rather limited electrophysiological validation of the computational findings. The authors present only one electrophysiology experiment in E. berryi, the species that they used only for 'methodological development' and not for detailed characterization. A complementary electrophysiological experiment in S. officinalis, or some visualization of neuron morphology confirming that motor neurons do indeed project to multiple chromatophores, would strengthen the generalizability of their computational analysis. This would be particularly pertinent to validate the authors' claim that some motor units contain chromatophores that are quite distant from one another on the animal.

      Overall, the authors' technical contributions and method development are an important advance. This work serves as an excellent proof of concept that their method can extract useful information about chromatophore motor control. Further validation of their method is needed to fully trust the fine-scale conclusions drawn about the distribution and composition of multi-innervated chromatophores. Furthermore, the authors raise many interesting ideas about developmental constraints on circuit wiring and potential adaptive significance of multi-innervated chromatophores for certain features of camouflage patterning. Their method may be able to help resolve some of these questions in the future if it is refined and applied across developmental stages, regions of the animal, and across species.

      We thank the reviewer for their thoughtful evaluation and for recognizing the potential of the computational approach introduced in this study.

      Regarding the focus on spontaneous chromatophore activity, we have clarified earlier in the Results section why these events are necessary to isolate individual muscle activations. While large camouflage patterns are visually striking, they involve the coordinated activation of many groups of chromatophores by premotor circuits simultaneously, making the identification of individual motor units, our goal here, impossible. Our approach can, however, also be applied during active behavior, including camouflage; the questions addressed there would be different, focusing on how multiple motor units are coordinated to generate the resulting skin patterns, rather than resolving the structure of single motor units. This could be challenging if the patterns of premotor control are highly variable, thus making the detection of meaningful or interpretable motion correlations difficult. This remains to be tested.

      We also acknowledge that electrophysiological validation remains limited. Patch-clamp experiments were performed in Euprymna berryi to test predictions generated by the computational analysis, and these experiments confirmed that activation of single motor neurons can produce anisotropic expansion of chromatophore subregions. We now provide the associated datasets in the Data Availability section. We agree that complementary electrophysiological or anatomical experiments in Sepia officinalis would further strengthen the conclusions. Such experiments represent an important direction for future work.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      General points:

      (1) Given all the experimental conditions and animals tested, the manuscript would be much stronger if the figures represented pooled data from many animals and experiments (e.g. Figure 1C).

      We agree that pooling data from multiple animals would strengthen the manuscript. In practice, we tested these experimental conditions across several animals (14 Euprymna berryi and 12 Sepia officinalis), but we selected the segments shown in the figures for their minimal artifacts and errors. Acquiring high-quality, stable recordings of this type is extremely challenging, and the presented data represent the clearest examples suitable for analysis and visualization. We hope that in the future these methods will enable not only the collection of a larger, high-quality dataset, but also comparisons across individuals, ages, species, and different regions of the mantle.

      (2) It's very unclear what animals were used for each experiment:

      (a) E. berryi: L677 states that 14 animals were filmed, and L684 implies that non-sedated individuals were used in addition to sedated animals, but it appears all the data is from a single E. berryi with sedation?

      The original wording was unclear, so we modified the sentence for clarity. The Methods now specify that 14 animals were filmed to refine the experimental pipeline and explore different conditions, while the data presented in the Results are from a single lightly sedated individual chosen for quality and stability of chromatophore activity.

      (b) S. officinalis: L692 onwards states that lots of different conditions and animals were explored, but only minimal data from a couple of animals is described in the figures. L156 states that all (?) the data comes from one head-fixed animal and one sedated and head-fixed animal. L549: The conclusion states that the pipeline was used in freely moving animals, but it appears that all of the S. officinalis were head-fixed? This is very confusing. Rather than describing the conditions of every experiment ever performed, the manuscript would benefit from explicitly stating the experimental conditions used for each figure.

      The original text was unclear. We have clarified in the manuscript which animals and experimental conditions were used for the analyses in each figure. Specifically, E. berryi was recorded without head fixation, whereas S. officinalis data were obtained under head-fixed conditions. We did film 11 S. officinalis without head fixation, and data can in principle be extracted from these recordings. Head fixation was used both to minimize visual artifacts and to enable longer, stable recordings, which was important for capturing the highest level of apparent noise in motor unit activation. This information is critical for our analyses of motor-unit organization, though not necessary for studies of broader camouflage patterns. The statement in the conclusion was meant to convey that our computational pipeline enables large-scale analyses that would be very difficult or impossible with traditional electrophysiology, not that all data were acquired from freely behaving animals. While fully unconstrained recordings remain technically challenging due to optical and logistical constraints, we maintain that our approach provides a valid framework for analyzing freely behaving animals.

      (c) Additionally, there is a claim that the sedated condition represents the unsedated one (e.g. L151 and L643), but no data is shown to support this. L173 references Figure 6d as evidence, but 6d doesn't exist. Only L210 provides sedation/no sedation statistics for the number of components per motor unit. However, in L643 it says "and motor unit organization remained unchanged". This data needs to be shown to include that statement.

      The reference to the nonexistent Figure 6d was removed. L170 provides statistics for the number of principal components per chromatophore, and L210 provides statistics for the number of components per MU. We do not think a sub-figure is necessary. We do agree, however, that “motor unit organisation” in L643 is potentially misleading, as we compared only the number of chromatophores belonging to a single MU and not the MU shape or distribution. We changed “organization” to “size (in chromatophores)”.

      (3) The text needs considerable revision. There are many typos (including multiple instances of "refs" instead of the actual references being inserted). These issues make the manuscript much more difficult to evaluate.

      Our apologies. We have now added the missing references.

      (4) It is not clear how convincing the chromatophore groups are. For instance, Figure 4h could alternatively be interpreted as a group of 5 chromatophores in a motor group that happen to co-vary with a sixth one at a great distance. Without seeing some of the raw data (videos), it's difficult to assess how convincing it is that these chromatophores belong to the same group. I recommend analyzing: when multiple chromatophores expand together, what is the likelihood that other chromatophores also happen to expand at the same time (given the frequency that they're all changing shape spontaneously)?

      We appreciate the reviewer’s concern. Chromatophores are assigned to the same cluster because their activity, or that of their slices, covaries consistently over time. It is, of course, possible that what appears as a single motor unit may reflect two or more motor neurons acting simultaneously during the recording. Longer video segments increase confidence in the integrity of inferred motor units, but in the absence of a ground truth for motor unit spatial organization in this species at this age, it is difficult to quantify the likelihood that two motor units are being conflated. Raw video data is provided in the Data Availability section. We note, however, that most of the time motor units cannot be readily discerned by eye, because individual chromatophores and their constituent slices fluctuate continuously, and motor-unit correlations are subtle and distributed across multiple chromatophores.
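
      One way to approach the reviewer's question about chance co-expansion is a circular-shift null test on pairs of activity traces. The sketch below is an illustration only and is not the analysis used in the manuscript or in CHROMAS: the function names, the synthetic traces, and every parameter value (trace length, event rate, noise level, number of shifts) are assumptions chosen for the example. It compares the observed correlation between two traces with the correlations obtained after rotating one trace in time, which preserves how often each trace fluctuates while breaking any shared timing.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def circular_shift_p_value(x, y, n_shifts=500, min_shift=10, seed=0):
    """Return (observed r, one-sided p-value) under a circular-shift null.

    Each surrogate rotates y by a random lag, keeping its event rate and
    autocorrelation but destroying its temporal alignment with x.
    """
    rng = random.Random(seed)
    n = len(x)
    observed = pearson(x, y)
    exceed = 0
    for _ in range(n_shifts):
        k = rng.randrange(min_shift, n - min_shift)
        shifted = y[k:] + y[:k]
        if pearson(x, shifted) >= observed:
            exceed += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (exceed + 1) / (n_shifts + 1)


def make_demo_traces(n=400, seed=1):
    """Synthetic traces (hypothetical): a and b share a drive, c is independent."""
    rng = random.Random(seed)
    drive = [1.0 if rng.random() < 0.1 else 0.0 for _ in range(n)]
    a = [d + rng.gauss(0, 0.1) for d in drive]
    b = [d + rng.gauss(0, 0.1) for d in drive]
    c = [(1.0 if rng.random() < 0.1 else 0.0) + rng.gauss(0, 0.1) for _ in range(n)]
    return a, b, c
```

      Applied to real recordings, a test of this kind would only indicate whether two slices covary more than their individual activity rates predict. It would not distinguish a single motor neuron from two neurons that happen to fire together, which is the ambiguity noted in the response above.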

      (5) The rationale for focusing on spontaneous activity is introduced relatively late in the manuscript and would benefit from being stated earlier. Examples should be provided of what this looks like (as opposed to regular chromatophore expansion). It would be valuable to see measurements across many experiments of how expanded the chromatophores are - what is the change in surface area? And what is the frequency of expansion for each chromatophore?

      Thank you for the remark. This is true. We have added a paragraph at the beginning of the Results section to clarify the rationale for focusing on spontaneous activity.

      This section now reads:

      “Because our primary aim was to describe the composition and coordination of chromatophore motor units, it was important to examine animals in the absence of the descending commands that occur during active behavior. Spontaneous activity, typically mild and “noisy”, was thus ideal to enable measurements of the motion correlations between chromatophores that reflected shared motor neuron drive, rather than shared correlations due to upstream motor neuron groupings by premotor circuits.”

      We added an example of video recording of spontaneous activity in our Data Availability section.

      While quantifying expansion magnitude and frequency across experiments would indeed be valuable, these questions fall outside the primary focus of the present study, which centers on resolving motor unit organization. In the section “Dynamics of chromatophore expansion and contraction,” we analyze the speed of expansion and contraction to demonstrate that such kinetic features can be reliably detected with the temporal resolution of our video imaging approach. By isolating single muscle activations, we establish a methodological framework that can be used in future work to quantify expansion amplitude, rate of change and frequency across preparations.

      (6) Chromatophore expansion was only measured in anesthetized E. berryi, and L679 states that chromatophore expansion was triggered by shining light on the skin. However, light-mediated chromatophore expansion may be mediated by a different mechanism, so chromatophore correlations do not necessarily reflect the underlying motor control.

      We agree that there is, in principle, a theoretical risk of direct light-mediated activation of chromatophores. Yet the kinetics of this light-mediated activation are very different, and are the subject of a separate, ongoing investigation by our groups. In our experiments, the illumination was applied to the whole animal rather than locally to the skin, ensuring that all chromatophores and the eyes were exposed to the same light source. By transitioning from darkness to light, we created a window in which chromatophores were partially expanded, since both fully contracted and fully expanded states would show little to no decorrelation. Within this window, we observed spontaneous fluctuations in chromatophore activity, which formed the basis for our correlation analyses. To our knowledge, direct light-mediated expansion of chromatophores has not been reported in E. berryi, although it may exist there. Finally, the size, shape, and orientation of the inferred motor units align with electrophysiological evidence, supporting the validity of our motor unit inferences.

      (7) Some figures might be better suited for the supplement. For instance, it's not clear what the significance of Figure 5 is (it's not currently sufficiently justified in the text).

      We have clarified the purpose of Fig. 5 in both the Results and Discussion sections. In the Results, we now explain that events are separated by amplitude to show that expansion–contraction kinetics can be reliably measured across a full range of chromatophore events, validating the precision of our videographic approach. In the Discussion, we highlight that this precision allows measurement of radial muscle speeds and opens avenues to study chromatophore biomechanics, including the contributions of intertwined forces such as radial muscles, elastic pigment sacs, and intercellular coupling.

      (8) Multiple chromatophores can belong to multiple clusters - this study reveals that this is because subsections of a chromatophore are controlled separately. But do the same sections (slices) of chromatophores ever belong to multiple clusters?

      Yes, it is possible. Dubas (1985) used videographic recordings to show that the same chromatophore muscle fibers could be activated by stimulation of different nerve bundles, supporting Florey’s (1969) electrophysiological evidence for polyneuronal excitatory innervation. From Dubas: "Usually, different muscle fibres were recruited by each nerve but sometimes a single muscle fibre responded to stimulation of each nerve. Variations of the stimulus voltage also produced gradation of the amplitude of shortening of individual muscle fibres. This supports the evidence above for multiple innervation of single muscle fibres."

      The petal-like distribution of motor-neuron influence shows overlapping territories, suggesting that some chromatophore sections may be influenced by multiple neurons. However, this overlap could arise from polyinnervation of individual muscles, the presence of gap junctions between muscles, or passive mechanical coupling due to the elastic properties of the pigment sac.

      With the present approach, it is not possible to disentangle the relative contributions of these mechanisms, which will require targeted physiological or anatomical experiments. For this reason, we adopted a hard clustering approach for individual chromatophore slices.

      (9) All time should be labeled in seconds, not in frames, and all distances should be measured in um or mm, not in pixels.

      We chose to present figures in pixels and frames to reflect the native units of our recordings and analyses, which preserves fidelity and reproducibility of the computational pipeline. For biological interpretation, corresponding values are converted to µm in the main text, providing the relevant real-world scale. A scale for conversion is provided in the figure legend.
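
      As an illustration of the conversion described above, the following minimal sketch maps native units (pixels, frames) to physical units (µm, seconds). It is not taken from the CHROMAS pipeline: the class name `RecordingScale` and the calibration values in the usage note are hypothetical placeholders, and the actual scale for each recording is the one given in the corresponding figure legend.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordingScale:
    """Calibration for one recording (both values are assumed inputs, not from the paper)."""
    um_per_pixel: float       # spatial scale of the image
    frames_per_second: float  # acquisition frame rate

    def pixels_to_um(self, pixels: float) -> float:
        """Convert a length from pixels to micrometres."""
        return pixels * self.um_per_pixel

    def frames_to_seconds(self, frames: float) -> float:
        """Convert a duration from frames to seconds."""
        return frames / self.frames_per_second

    def speed_to_um_per_s(self, pixels_per_frame: float) -> float:
        """Convert a speed from pixels/frame to micrometres/second."""
        return pixels_per_frame * self.um_per_pixel * self.frames_per_second
```

      For example, with placeholder values of 2.0 µm per pixel and 25 frames per second, `RecordingScale(2.0, 25.0).pixels_to_um(8)` gives 16.0 µm and `frames_to_seconds(50)` gives 2.0 s.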

      Specific comments:

      (1) L36: I'm not sure the description of virtual chromatophores here is clear enough to make sense to a more general audience.

      Addressed. We retained the concept of ‘virtual chromatophores’ in the abstract and added a brief clarifying phrase to indicate that these are functional groupings of adjacent chromatophore territories that act as single units.

      (2) L50: "Rimmed by" - consider rephrasing.

      Addressed. Replaced with “surrounded”.

      (3) L64: "refs" - actual references aren't inserted. There are multiple other examples of this.

      Addressed. Added missing references.

      (4) L100: This section could use rewriting. Some of the text reads more like a figure legend.

      Addressed. We have streamlined the main text to reduce redundancy with the figure legend.

      (5) L101: Consider the opening sentence/s providing a more general introduction to the question and approach.

      Addressed.

      (6) L104: This implies that the data presented are from 14 animals of many ages. This is only relevant if the pooled data is analyzed and presented.

      We agree that the original phrasing was ambiguous. We have modified the sentence for clarity, and explain in the Methods that 14 animals were filmed to refine the pipeline and explore experimental conditions, while the analyses shown are from a single animal.

      (7) L111: HDBSCAN should be defined.

      Addressed. The acronym has been expanded.

      (8) L173: Figure 6D doesn't exist.

      Addressed. The reference to the nonexistent Figure 6d was removed.

      (9) L193: "excluding negative (contraction) phases" This phrase requires clarification.

      Addressed. Added “see Methods” in the legend and added clarification on the reasoning in Methods.

      (10) L204: Should explain why the switch to affinity-propagation clustering was made when a different method was used for E. berryi.

      Addressed in discussion.

      (11) Figure 3: I recommend including a diagram or image of a whole cuttlefish and showing what the corresponding imaging area was in relation to the animal so the reader gets an intuitive sense of scale.

      Thank you. We have added a supplementary figure to give the reader a sense of scale.

      (12) L221/Fig 3b: These colors are supposed to represent clusters of 3 to 5 chromatophores? The clusters look much bigger.

      The figure shows clusters of 3 to 5 chromatophores, but many adjacent clusters were assigned the same color. We have changed the colors to remove this ambiguity.

      (13) Figure 3c: This would be more powerful if it represented the combined data of many experiments to draw a general conclusion. Also, shouldn't these cluster sizes match those in 2e, e.g. they get as big as 40?

      We assume the reviewer is referring to a comparison between Figures 3c and 2e. For visualization purposes, the graph in 3c was truncated to display over 90% of the data, which explains why the largest clusters appear smaller than in 2e. We modified the legend accordingly. We agree that the results would be strengthened by pooling data from additional experiments; however, acquiring high-quality, artifact-free recordings suitable for motor unit analysis is extremely challenging. We hope that our framework will enable future studies to extend this analysis.

      (14) Figure 4: I would show some of these examples earlier, to give the reader an intuitive sense of the data and claims (though it doesn't need its own figure - provide a couple of examples, and the diagram of how much of the mantle you're sampling) then put the rest in the supplement, and include some videos too.

      We agree that providing spatial context is important for readers to develop an intuitive understanding of the dataset. However, introducing examples of motor units earlier in the manuscript would, in our view, interrupt the logical progression of the Results, where motor unit identification builds on prior analyses. To address the reviewer’s concern, we have added a new supplementary figure (Fig. S1) illustrating the size and location of the sampled mantle region. In addition, we now provide representative videos in the Data Availability section to give readers direct visual access to the underlying dynamics.

      (15) Figure 4f: Is the location of the split color in each dot accurate? It's surprising that each one is split down the middle, and the pink side is always on the right - this is unintuitive given where the motor neuron is likely to be located.

      The dots and half dots represent the membership of a chromatophore to a particular cluster.

      (16) Figure 5: I didn't find this figure sufficiently justified in the text. I would move this to the supplement.

      Addressed in General point #7.

      (17) L350: States that 12 animals were patched, but the data isn't shown. It's important to show all of this data (some of which can be in the supplement).

      Addressed. We provided the data in the Data Availability Section.

      (18) Figure 5: I would quantify how many chromatophores were in each motor group across all the recording sessions, and compare this to the equivalent behavioral analysis.

      We assume the reviewer means Fig. 6. We calculated and stated the size of motor units across patching sessions.

      (19) Figure 5c: I recommend labeling each panel with a different number so you can refer to specific data.

      We assume the reviewer means Fig. 6c. We consider the figure layout clear enough to allow readers to follow the data without additional panel numbers.

      (20) L379: Typo: repeat of "quantitative"

      Addressed.

      (21) L576: Salinity should be 33-36 ppt, not %

      Addressed.

      (22) L877: The salinity units are sg? That should be stated. Though I would use the same units for salinity throughout.

      Addressed.

      Overall, this work introduces a potentially valuable quantitative framework for studying chromatophore dynamics. Addressing the points above would substantially strengthen the manuscript and clarify the scope and support for its conclusions.

      We thank the reviewer for these many helpful comments.

      Reviewer #2 (Recommendations for the authors):

      (1) Line 64 - missing references for chromatophore colour with age.

      Addressed. Added the missing references.

      (2) Line 64-65 - would be good to have a little more detail about what is meant by 'migrating through the skin'. Is this a lateral process, or depth in the skin?

      Addressed. Replaced “migrating in the thickness..” with “through the thickness..” to emphasize verticality.

      (3) Line 72 - typo, should read '...individual and groups...'

      Addressed.

      (4) Remove 'In Fig 1, ...' from line 104.

      Addressed.

      (5) Figure 1 - It's unclear why some chromatophores are uncoloured with a red dot in the centre. Are these chromatophores that do not share a cluster with neighbours? If so, wouldn't it make more sense to colour the chromatophore with a unique colour of its own? Or, at the very least, make a note in the caption to indicate that all white chromatophores are not clustered with neighbours.

      Segmented chromatophores are shown in white, with coloured slices highlighting cluster membership. Uncoloured slices represent outliers. Addressed in the figure legend.

      (6) Line 119 - the concept of a 'closed virtual chromatophore' needs a few more words of explanation. The way I interpret the text as it is, is that the motor units driving colour change are not necessarily the individual chromatophores, but a motor region containing a mixture of whole and partial chromatophores innervated by the same motor neuron. If this is the case, a few extra words of description would help here to remove any ambiguity as I think this is an important concept for the paper.

      Addressed. We added a sentence clarifying the concept.

      (7) Line 173 - Figure 6d doesn't exist in the paper. Was a different panel intended? If so, please make sure to number the figures in order of appearance in the manuscript.

      The reference to the nonexistent Figure 6d was removed.

      (8) Figure 3b is very difficult to see. Perhaps consider lightening the background image. Please also indicate whether the individual colours refer to individual clusters. If this is the case, then some of these clusters look much larger than the 3-5 suggested in the caption.

      This issue has been corrected.

      (9) Line 210 - remove the bold type.

      Addressed.

      (10) Line 211 - please specify which 'two groups' you are referring to here. Presumably, this is anaesthetised and non-anaesthetised.

      Addressed.

      (11) I think that the text is missing any indication of the pixel sizes involved in extracting slice metrics, particularly from the S. officinalis data. It would be great to include some data on how many pixels span the radius of an expanded chromatophore. There is some small indication of this in Figure 2a, but a panel or two with details about the pixel size of S. officinalis chromatophores and their slices would be welcome. This would help with the judgment of the robustness of the resolution of the analysis. Looking at the y-axis in Figure 5a, there is some indication that the chromatophore radius is only 1 to 8 pixels. Is this the case?

      Figure 5a does not show chromatophore radius; it shows the relative change in peak amplitude during an expansion event. At the peak, the chromatophore radius is likely larger than the plotted value, because it is the sum of the baseline radius and the peak amplitude.

      (12) Line 246-7 - reword this sentence to avoid referring to Figure 3d in the narrative. Include it in parentheses instead.

      Addressed.

      (13) Lines 408 and 409 - missing references.

      Addressed.

      (14) Line 576 - salinity should be reported in parts per thousand, not per cent.

      Addressed.

      (15) Line 593 - how were animals <50mm fed?

      Animals smaller than 50 mm were fed Neomysis spp. or small Palaemonetes spp., as noted a few lines above the description for animals larger than 50 mm.

      (16) Line 847 - typo - '...putative motor units' ramifications...'

      Addressed.

      (17) Line 854 - better to write out the [chrom_id, label] info as narrative text rather than using the variable names.

      Addressed.

      (18) Line 876 - two typos '...were reared in an artificial...'

      Addressed.

      (19) Line 877 - please use the same salinity metric as used in the earlier part of the methods.

      Addressed.

      (20) Section 898-910 - equipment details would ideally include the location of the company. E.g. (BX51W1, Olympus, Tokyo, Japan).

      Addressed.

      Reviewer #3 (Recommendations for the authors):

      I am left with a number of questions that arise from the authors' work, some of which the authors themselves briefly mention in the technical limitations section.

      (1) In relation to the first weakness, do the authors know if the recruitment patterns they identify are likely to be the same when octopi perform visually-mediated camouflage to their environment?

      Thank you for this comment. We assume the reviewer is referring to S. officinalis. There seems to be a misunderstanding: our approach is designed to reveal the smallest independent functional units—motor units—that together generate skin patterns. The technique is fully applicable to an animal displaying camouflage, but the results would necessarily differ. Camouflage patterns are composed of relatively large shapes compared to individual motor units and arise from the coordinated activation of multiple units. Disentangling motor units requires decorrelated activity, whereas visually-evoked camouflage inherently drives correlated motor-unit activation by premotor control. To use an analogy, if our goal were to map the distribution and wiring of pixels on a screen, it would be more informative to broadcast a noise signal rather than display coherent images, as the noise produces decorrelated activity that allows the underlying structure to be resolved. We have clarified this important point in the early results section.
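      The pixel analogy above can be illustrated with a toy simulation. This is only a minimal sketch, not the authors' pipeline: the four traces, the two hypothetical motor units, the noise level, and the function names are all assumptions chosen for illustration. When both units receive a shared drive (as in a coherent pattern), every pair of chromatophores is correlated and unit membership cannot be recovered; when the drives are independent, only chromatophores sharing a unit remain correlated.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def simulate(shared_drive, n_frames=2000, seed=0):
    """Simulate four chromatophore traces driven by two hypothetical motor units.

    Chromatophores 0 and 1 follow unit A; chromatophores 2 and 3 follow unit B.
    With shared_drive=True both units receive the same command (pattern-like
    activity); otherwise the two commands are independent (noise-like activity).
    """
    rng = random.Random(seed)
    traces = [[] for _ in range(4)]
    for _ in range(n_frames):
        unit_a = rng.gauss(0, 1)
        unit_b = unit_a if shared_drive else rng.gauss(0, 1)
        for i, drive in enumerate((unit_a, unit_a, unit_b, unit_b)):
            traces[i].append(drive + rng.gauss(0, 0.2))  # small measurement noise
    return traces


pattern = simulate(shared_drive=True)
noise = simulate(shared_drive=False)

within_unit_noise = pearson(noise[0], noise[1])       # same unit, independent drives
across_units_noise = pearson(noise[0], noise[2])      # different units, independent drives
across_units_pattern = pearson(pattern[0], pattern[2])  # different units, shared drive
```

      In this toy setting, the across-unit correlation is near zero only under independent drives, which is what makes the unit boundaries recoverable from pairwise correlations.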

      (2) The authors provide indirect evidence that motor neurons innervate multiple chromatophores. Can sets of radial muscles within a chromatophore be innervated by multiple motor neurons? Is there neuroanatomical evidence or experiments that could perhaps shed light on this?

      Addressed above. Same question as #1(8).

      (3) Are multi-innervated chromatophores evenly distributed across the octopus's body? For instance, could the authors compare chromatophore recruitment over multiple patches on the animal from multiple regions?

      At present, we do not have sufficient data to quantitatively compare motor-unit structure or the distribution of multi-innervated chromatophores across different body regions of cuttlefish. However, we would not necessarily expect uniformity across the skin, as distinct body regions are associated with characteristic pattern elements (e.g., the white square on the central mantle or the thicker zebra stripes along the sides). It is therefore plausible that different motor-unit geometries and densities are differentially represented across regions to support these region-specific patterns. Future recordings spanning multiple patches and body locations will be required to test this question directly.

      (4) Relatedly, is there any idea of whether chromatophore size or age corresponds with the number of motor units within a single chromatophore?

      At present, our analyses are limited to single developmental time points, and we therefore cannot directly assess whether chromatophore size or age correlates with the number of motor neurons innervating an individual chromatophore. However, this is a question that our analysis framework is explicitly designed to address. Our custom pipeline, CHROMAS (Ukrow, Renard et al., 2025), includes tools for longitudinal image alignment that allow chromatophores to be tracked within the same animal across development. Applying these scripts to developmental datasets will enable future analyses linking chromatophore growth or age to changes in the motor innervation of single chromatophores.

      I understand that a full resolution to the issues raised above may require substantial additional experiments. At a minimum, further discussion of these points with integration of existing literature would elevate the paper.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      (1) The rationale behind averaging sentence embeddings across multiple transformer models (with different architectures and training objectives) is unclear. These transformer-based models have different training paradigms and model architectures, which may result in misaligned semantic spaces. The averaging operation may dilute the distinct sentence representations learned by each model, potentially weakening the overall semantic encoding for sentences. Please clarify this choice or cite supporting methodology.

      The reviewer questions the rationale for averaging sentence embeddings across different models. However, our method involves computing correlations separately for each model, then averaging the correlations. We apologize for the confusion. We have clarified this on page 3:

      “Results for the ‘Transformers’ model are computed by computing correlations separately for five different transformer models and then taking a simple average of these correlations. Results for each individual transformer are presented in Supplementary Information Figure S2.”
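      The distinction drawn here (averaging correlations, not embeddings) can be sketched as follows. This is a minimal illustration under stated assumptions: the toy similarity values, the model names, and the use of Pearson correlation are hypothetical, and the manuscript may use a different correlation measure.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def mean_model_brain_correlation(model_similarities, brain_similarities):
    """Correlate each model's pairwise similarities with the brain separately,
    then take a simple average of the resulting correlations."""
    per_model = {
        name: pearson(sims, brain_similarities)
        for name, sims in model_similarities.items()
    }
    return sum(per_model.values()) / len(per_model), per_model


# Hypothetical similarities for four sentence pairs (illustrative values only).
brain = [0.9, 0.2, 0.5, 0.7]
models = {
    "model_a": [0.8, 0.1, 0.4, 0.9],
    "model_b": [0.6, 0.3, 0.5, 0.6],
}
average_r, per_model_r = mean_model_brain_correlation(models, brain)
```

      Because each model is compared with the brain in its own similarity space, no alignment between the models' embedding spaces is needed.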

      (2) All structure-sensitive models discussed incorporate semantics to some extent. Including a purely syntactic baseline, such as a model based on context-free grammar, would help confirm the importance of syntactic structures.

      Following the suggestion, we have implemented two syntactic models and discuss the results on page 10:

      “We also found that purely syntactic models based on constituency parses (see Benepar and CFG) show poor correlations with brain activity (see Supplementary Information Figure S2). Examining the corresponding RSA matrices (see Figure S1), this seems to be due to such models being overly sensitive to syntactic form, and relatively insensitive to which words are assigned to different nodes within the syntactic tree. This is most evident for the edit-distance similarity metric, and to a lesser extent also for the subtree similarity metric. This finding highlights the value of hybrid approaches designed to appropriately balance sensitivity to lexical, syntactic, and compositional information in representing semantic information at the sentence level.”
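      The insensitivity of purely syntactic similarity to word identity can be illustrated with a small sketch. The nested-tuple tree format, the hand-written parses, and the Jaccard overlap of production rules are assumptions made for illustration; they are not the Benepar or CFG pipelines, nor the edit-distance or subtree metrics used in the manuscript.

```python
def productions(tree, include_words=False):
    """Collect (parent label, child labels) rules from a nested-tuple parse.

    A tree is (label, child, child, ...); a leaf word is a plain string.
    With include_words=False, lexical rules such as N -> 'boy' are ignored.
    """
    label, *children = tree
    rules = set()
    child_labels = []
    for child in children:
        if isinstance(child, str):
            if include_words:
                child_labels.append(child)
        else:
            child_labels.append(child[0])
            rules |= productions(child, include_words)
    if child_labels:
        rules.add((label, tuple(child_labels)))
    return rules


def jaccard(a, b):
    """Jaccard overlap between two sets (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def parse(subject, verb, obj):
    """Hand-built constituency tree for 'the SUBJECT VERB the OBJECT'."""
    return ("S",
            ("NP", ("Det", "the"), ("N", subject)),
            ("VP", ("V", verb), ("NP", ("Det", "the"), ("N", obj))))


boy_kicked_ball = parse("boy", "kicked", "ball")
ball_kicked_boy = parse("ball", "kicked", "boy")
chef_cooked_meal = parse("chef", "cooked", "meal")

# Structure only: unrelated sentences with the same syntactic form look identical.
structure_only = jaccard(productions(boy_kicked_ball),
                         productions(chef_cooked_meal))
# Adding lexical rules separates them.
with_words = jaccard(productions(boy_kicked_ball, True),
                     productions(chef_cooked_meal, True))
# A bag of rules still cannot tell the swapped pair apart.
swapped = jaccard(productions(boy_kicked_ball, True),
                  productions(ball_kicked_boy, True))
```

      In this sketch, two unrelated sentences with identical structure are maximally similar when words are ignored, which mirrors the over-sensitivity to syntactic form described in the quoted passage.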

      (3) In Figure 2, human behavioral judgments show weak correlations with neural data, and even fall below those of computational models, suggesting the behavioral judgments may not reflect the sentence structures in a brain-like way. This discrepancy between behavioral and neural data should be clarified, as it affects the interpretation of the results.

      While the behavioural judgements are made by different participants and involve a different task than the neuroimaging results, nonetheless we agree the difference is surprising and warrants more detailed consideration. We have included a more detailed discussion of this issue on page 11:

      “Our study has several limitations. First, we found a surprisingly low correlation between behavioural ratings and brain activations (see Figure 2). This may be partly explained by differences in task structure. In the behavioural experiment, participants viewed many pairs of related sentences, and were explicitly asked to pay attention to differences in the words of each sentence. In contrast, in the fMRI task, participants read one sentence at a time without an explicit comparison. In addition, we suspect that presentation of so many sentence pairs with highly similar structures may have biased the way in which participants rated sentence similarity. Modifications to the behavioural task to mitigate these aspects may reduce the divergence between behavioural and brain findings.”

      (4) To better contextualize model and neural performance, sentence similarity should be anchored to a notion of semantic "ground truth", such as the matrix shown in Figure 1a. Comparing this reference with human judgments, brain responses, and model similarities would help establish an upper bound.

      While our design matrix served as the basis for constructing a set of stimuli with systematic modifications, we respectfully suggest that it should not be regarded as a ‘semantic ground truth’. Sentence pairs within each category will not have the same degrees of semantic similarity since the words and context differ across sentences in a graded manner. Furthermore, while we anticipated ‘different’ sentence pairs would be less similar than ‘swapped’ sentence pairs, and that within each of the six block diagonals the ‘modified’ or ‘substituted’ sentence pairs would be the most similar, we did not have any prediction about the magnitude of these differences. Our goal was to construct a set of sentence pairs which spanned a range of semantic similarities, and allowed for dissociation between lexical similarity and overall similarity in meaning. The design matrix is not intended to represent a ‘ground truth’ that human judgements or brain representations would be expected to conform with.

      (5) The structure of this paper is confusing. For instance, Figure 5 is cited early but appears much later. Reordering sections and figures would enhance readability.

      We agree that the placement of figures was not ideal in the previous draft. We have reworked the manuscript so that all figures appear closer to their mention in the text, and the figure in question (now Figure 3) appears in the correct order. We have also substantially revised the discussion and included subheadings to help guide the reader through the various issues we discuss.

      (6) While the analysis is broad and comprehensive, it lacks depth in some respects. For instance, it remains unclear what specific insights are gained from comparing across brain regions (e.g., whole brain, language network, and other subregions). Similarly, the results of simple-average and group-average RSA appear quite similar and may not advance the interpretation.

      We included both analyses in line with our preregistration, and also because we believe the fact that two distinct approaches to analyzing the data yield similar results strengthens our conclusions.

      (7) While explaining the grid-like pattern due to sentence length is important, this part feels somewhat disconnected from the central question of this paper (word order). It might be better placed in supplementary material.

      We believe that the grid-like pattern in the RSA results is an important unexpected finding that warrants discussion in the main manuscript.

      Reviewer #1 (Recommendations for the authors):

      (1) Consider including a purely syntactic baseline model. For instance, parse each sentence into a constituency tree and compute tree edit distances between pairs of trees. This would allow you to construct a sentence similarity matrix based solely on syntactic structure, and may clarify the role of syntax in sentence representations.

      See our response to Public Review comment 2.

      (2) Instead of averaging embeddings across different transformer-based models, I recommend reporting RSA results for each model individually. For instance, compare one sentence-level model (e.g., SentBERT or SimCSE) and one general-purpose language model (e.g., GPT-2 or Llama).

      See our response to Public Review comment 1.

      (3) I suggest revisiting the structure of the Results section to improve the clarity and impact of your key findings. Consider which results are most central to the paper's claims and ensure they are presented in the main text. Less central analyses (e.g., the analysis on the grid-like pattern) might be better suited for the supplementary information. Presenting behavioral results prior to neuroimaging results could also improve logical flow by first validating model similarity estimates behaviorally.

      As mentioned in our response to Public Review comment 5, we have revised the ordering of the figures to improve the flow of the main manuscript. We believe that the grid-like pattern in the RSA results is an important unexpected finding that warrants discussion in the main manuscript. In addition, we believe that presenting the neuroimaging results first is appropriate as this is the primary and most important contribution of our study.

      Reviewer #2 (Public review):

      (1) The stimuli are not fully controlled for lexical content across conditions. Residual lexical differences between sentences could still influence both brain and model similarity patterns. To more cleanly isolate syntactic effects, it would be useful to systematically vary only a single structural element while keeping all other lexical content constant (e.g., the boy kicked the ball / the ball kicked the boy). It would be better to engage more with the minimal pair paradigm, which is widely used in large language model probing research.

      The reviewer rightly argues that our stimuli do not fully control for lexical content across conditions, and that a more appropriate paradigm may be to utilise minimal pairs in which only a single variable of interest (such as sentence structure) is modified. We agree that most of our sentence pairs do not constitute minimal pairs; however, this was not our objective. Our study design aimed to synthesise traditional minimal pair approaches with more recent research paradigms using naturalistic stimuli. As such, we selected stimuli which are more complex and contain more variable features than traditional minimal pair studies, but which also are tailored to highlight differences which are of particular theoretical interest.

      Because we are interested in comparing the effects of multiple sentence elements and semantic roles, a systematic pairwise comparison of minimal pairs is not necessarily optimal. Instead, we designed our stimuli to leverage the advantage of fMRI in that we can measure the brain representations corresponding to each sentence, and hence can conduct a full series of pairwise comparisons of sentence representations. We do not claim this approach to be universally superior to a minimal pair approach, but we do believe our novel approach provides additional insights and a new perspective on semantic representation relative to minimal pair studies.

      We have added the following paragraph on pages 9-10 contrasting our approach to previous minimal-pair studies:

      “Another approach that has seen widespread use is the presentation of minimal sentence pairs that differ only in one specified aspect, for example, interchanging subject and object in a sentence (Frankland 2015, Wang 2016, Frankland 2020, Giglio 2024), or altering adjective-noun phrases to influence composition (Graves 2010, Schell 2017, Fyshe 2019, Ciapparelli 2025). Our approach is an extension of these approaches utilising more naturalistic and complex sentences, designed to facilitate comparison of a wider range of structural manipulations (see Table 1). In more completely characterising the representational structure of various computational models in response to different structural contrasts, we can more comprehensively evaluate their adequacy as models of semantic processing in the brain.”

      (2) The comparisons are done across fundamentally different model types, including static embeddings, graph-based parsers, and transformers. The inherent differences in dimensionality and training objectives might make the conclusion drawn from RSA inconclusive. Transformer embeddings typically occupy much higher-dimensional, anisotropic representational spaces, and their similarity structure may reflect richer, more heterogeneous information than models explicitly encoding semantic roles. A lower RSA correlation in this study does not necessarily imply that transformers fail to encode syntactic information; rather, they may represent additional aspects of meaning or context that diverge from the narrow structural contrasts probed here.

      The reviewer notes that low RSA correlations do not necessarily imply that transformers fail to encode syntactic information. We acknowledge this in our discussion (page 10), where we also highlight that our focus is not on whether transformers encode such information, but rather what transformer representations can tell us about how sentence structure is represented in the brain. Our results indicate that transformer embeddings do not have the same geometric properties as brain representations of sentence meaning, at least for certain types of sentences where lexical information is insufficient to determine overall meaning.

      The reviewer also notes that transformer embeddings are highly anisotropic; however, we adjust for this by normalising each feature as discussed on page 14. Finally, the reviewer notes that the transformers we examine differ in architecture and training objectives. This is not critical for our study because we are not seeking to determine which architecture or training objectives are best. Our goal is simply to compare a range of approaches and see which, if any, have similar sentence representations to those formed by the brain. In fact, our results indicate that architecture and training regime make relatively little difference for our stimuli, as shown by the pattern of results for all models in Figure S2.
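      The effect of per-feature normalisation on anisotropic embeddings can be sketched as follows. This is a minimal illustration that assumes "normalising each feature" means z-scoring each embedding dimension across sentences; the exact procedure described on page 14 of the manuscript may differ, and the toy embeddings are hypothetical.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def zscore_features(embeddings):
    """Standardise each feature (column) across sentences to mean 0 and sd 1.

    Constant features (sd of 0) are left centred but unscaled.
    """
    n = len(embeddings)
    columns = list(zip(*embeddings))
    means = [sum(c) / n for c in columns]
    sds = [sqrt(sum((x - m) ** 2 for x in c) / n) or 1.0
           for c, m in zip(columns, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)]
            for row in embeddings]


# Toy anisotropic embeddings: every sentence shares a large common offset,
# so raw cosine similarities are all close to 1.
embeddings = [
    [10.0, 0.1, 5.2],
    [10.2, -0.1, 5.0],
    [9.9, 0.2, 4.9],
]
raw = cosine(embeddings[0], embeddings[1])
normalised = zscore_features(embeddings)
adjusted = cosine(normalised[0], normalised[1])
```

      After standardisation, the shared offset no longer dominates the cosine, so differences between sentences become visible in the similarity values.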

      (3) The interpretation of the RSA correlation largely depends on the understanding of models. The authors suggest that because hybrid models correlate better than transformers, this implies that transformers are inferior at representing syntax. However, this is not a direct test of syntactic ability. Transformers may encode syntactic information, but it may not be expressed in a way that aligns with the RSA paradigm or the chosen stimuli. RSA does not reveal what the model encodes, and the models might achieve a good correlation for non-syntactic reasons (e.g., length of sentence, orthographic similarity, lexical features).

      The reviewer argues that RSA correlations do not measure the extent to which a model encodes syntactic information. This is very similar to the previous point. We do not claim that our results show that transformers do not encode syntactic information. Rather, our claim is that sentence embeddings derived from transformers have different geometric properties to brain representations, and that brain representations are better described by models explicitly representing key semantic roles. From this we conclude that, at least for the sentences we present, the brain is highly sensitive to semantic roles in a way that transformer representations are not (at least to the same extent). We have clarified this in a modified paragraph on page 11:

      “We emphasise that our results do not show that transformers fail to represent syntactic or semantic role information. Indeed, large language models show clear capabilities of correctly interpreting sentence structure (Chang 2024), and probing studies have found that transformers represent information about syntax and word order (Clark 2019, Manning 2020). This is consistent with our finding that directly prompting GPT-4 to rate sentence similarity yields very high correlations with human judgements (see Supplementary Information Figure S3). Nonetheless, the fact that transformers can encode and utilise structural information to perform linguistic tasks does not mean that they effectively utilise this information to construct a brain-like representation of sentence meaning.”

      We also respectfully disagree with the reviewer’s suggestions that sentence length and orthographic or lexical similarities may drive model correlations with brain activity. As we discuss on page 19, we explicitly control for differences in sentence length when computing correlations. Our process for constructing our sentence set also controls for lexical similarity by generating pairs of sentences with all or mostly the same words but different orderings. We did not explicitly address orthographic similarity, but this will be strongly correlated with lexical similarity.

      Reviewer #2 (Recommendations for the authors):

      (1) Model dimensionality: the interpretability of cosine similarity diminishes as the dimensionality increases, and there are some math tricks to work around it. To make a fair comparison among models with different dimensionalities, it would be better to apply some dimensionality-insensitive distance metrics.

      We thank the reviewer for this suggestion. We repeated all vector-based similarity calculations using the Dimension Insensitive Euclidean Metric (DIEM). As shown in Figure S9, the results are broadly similar, though with overall somewhat lower brain correlations for most transformers compared to cosine similarity.

      (2) Depending on the scope of the current study, if the authors would like to establish whether transformers are inferior to graph-based models in representing syntax, a linear classifier using the model embeddings would be sufficient. I think this would be a more direct assessment of model syntax ability than correlation with brain data.

      As we discuss in our previous responses, our objective in this study was not to assess how well transformers can represent syntax. Rather, the goal was to assess whether internal transformer representations have similar geometric properties to patterns of brain activation. Our results indicate that transformers do represent sentence structure, but in a different manner to the human brain.

      Reviewer #3 (Public review):

      (1) The interpretation of findings is nuanced. Although Transformers underperform as brain models on the critical subsets of controlled sentences, a Transformer outperforms all other models when evaluated on the union of all sentences when both word-level content and structure vary. Transformers also yield equivalent or better models of human behavioral data. Thus, although Transformers have demonstrable flaws as human models, which are pinpointed here, in the general case, (some) Transformers are more human-like than the other models considered.

      The reviewer argues that we overstate some of our conclusions, as several transformers achieve higher brain correlations than the hybrid model when computed over all sentence pairs, as well as on the behavioural data. In response, we first note that our primary interest in this paper is in the block diagonal sentence pairs, as these were specifically designed to interrogate how different models represent sentence structure. The analysis over all sentence pairs is presented for comparison but is not the primary focus of this paper, as also reflected in the pre-registered prediction that our VerbNet-CN hybrid model would show higher brain correlations than transformers over this block diagonal subset.

      Second, we have included a new analysis in the revised manuscript (Figure S9) where we compute brain correlations controlling for the pattern of similarities observed in the primary visual cortex (averaged over participants), as a way to control for visual similarity. This added control substantially reduces the brain correlations of the transformers, such that they all have lower correlations than VerbNet-CN and AMR-smatch even over the set of all sentence pairs. We provide interpretation of this result in the discussion.
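      One common way to control a model-brain correlation for a third similarity pattern is a first-order partial correlation. The sketch below is an assumption about the general technique, not a description of the authors' exact procedure; the toy values stand in for pairwise similarities from a model, a brain region, and primary visual cortex.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def partial_correlation(x, y, z):
    """Correlation between x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical pairwise similarities (illustrative values only). Both the model
# and the brain region track the visual-cortex pattern plus unrelated noise.
visual = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
model = [1.1, 1.9, 3.2, 3.8, 5.1, 5.9]
brain = [0.8, 2.1, 3.1, 4.2, 4.9, 5.9]

raw_r = pearson(model, brain)
controlled_r = partial_correlation(model, brain, visual)
```

      In this toy case the raw correlation is high only because both variables follow the visual pattern, and the partial correlation removes that shared component.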

      Third, we would like to note that one of the disadvantages of transformers as a model of mind or brain representations is that they are largely a ‘black box’ whose workings are poorly understood. One advantage of hybrid models like our simple semantic role model is that they can be much easier to interpret, thereby enabling them to be used to determine which features are most important for brain representations of sentence meaning, and what mechanisms are used to combine individual words into a full sentence. Given their relative simplicity and interpretability, we believe hybrid models have considerable value as scientific tools, even in cases where they achieve comparable correlations to transformers. We have added a short discussion of this issue in the revised manuscript (page 10).

      (2) There may be confounds between the critical sentence structure manipulations and visual representations of sentence stimuli. This is inconvenient because activation in brain regions that process semantics tends to partially correlate with visual cortex representations, and computational models tend to reflect the number of words/tokens/elements in sentences. Although the study commendably controls for confounds associated with sentence length, there could still be residual effects that remain. For instance, the Graph model correlates most strongly with the visual cortex despite these sentence length controls.

      We agree with the reviewer that this is a potential confound. As noted in the previous response, we have implemented a new control analysis in which we directly control for visual similarities as reflected in participant-averaged similarities of primary visual cortex activations in response to all stimuli. These results are shown in Figures S8-S11 in the SI. We show that transformer correlations are reduced much more than graph and hybrid models with this control. Also, we note that the AMR-smatch graph model shows high correlations with other brain regions even after removing correlations with the visual cortex (Figure S10). This indicates that the model represents a range of sentence features, including both superficial visual or length-related features, as well as semantic features that are represented in common with language and other cortical regions.

      (3) Sentence similarity computations are emphasized as the basis for unifying comparative analyses of graph structures and vector data. A strength of this approach is that correlation is not always the ideal similarity metric. However, a weakness is that similarity computations are not unified across models. This has practical consequences here because different similarity metrics applied to the same model produce positive or negative correlations with brain data.

      The reviewer notes that the method for computing similarities differs between the vector-based (mean and transformer) models, and the hybrid and syntax-based models, thereby potentially adding an additional confound to our results. We agree that this is a potential limitation, and our correlations should always be understood as applying to a model paired with a similarity metric. However, we believe that this is mostly unavoidable when comparing different formalisms. In the revised manuscript we have incorporated an entirely new similarity metric for vector-based models (DIEM similarity), as well as an extended discussion of the effect of different similarity metrics for graph and hybrid models.

      Reviewer #3 (Recommendations for the authors):

      (1) Compute separate RSAs on each sentence pair type (especially Swapped), to quantify how each sentence type manipulation contributed to the divergence between model and brain. Although the manuscript is already brimming with analyses, I think squeezing this in would be helpful because the results currently rely on qualitative inspection of group-average scatter plots to interpret how sentence pair manipulations contributed to the divergence between Transformers and humans. The Swapped condition would appear to be the centrepiece of the title and manuscript, and potentially the only condition for which confounds associated with the surface form of sentence are controlled for (because sentences should be the same words in different orders). Thus, this analysis might see to the inconvenient visual cortex correlations in Figures 3d/e.

      We respectfully disagree that computing a separate RSA for each sentence pair type would be a useful additional analysis. The motivation for the construction of our stimulus set was to provide a range of variants of a given base sentence that alter the semantic meaning and lexical content (somewhat) independently. The purpose of the ‘modified’ sentences, for instance, is to construct sentences with a similar overall meaning but lower lexical similarity due to the inclusion of many modifier words. It is precisely the comparisons across the different pair types that provide information about how each model represents sentence semantics, so restricting an analysis to a single subset would not be very informative. Another problem with this approach is that it would dramatically reduce the number of sentence pairs analysed, thereby decreasing statistical power. In the revised manuscript we have provided additional details regarding the motivation and rationale for how our stimulus set of 108 sentences was constructed, which should help to clarify this point. The following excerpt is from page 3:

      “Within each of the six subsets, we begin with a base sentence such as `the cameraman brought the equipment to the director', which we then systematically modified in various ways to create different combinations of lexical and compositional similarity, in order to dissociate these two aspects of meaning (see Table 1 for further details).”

      (2) Explaining the motivation for the sentence stimulus types. I appreciated the careful design of the dataset, but I couldn't immediately work out the motivation for all the different sentence types, and why this selection was ideal to identify divergences with Transformers. For instance, given the goal of (approximately) controlling for lexical similarity whilst varying sentence meaning, I couldn't immediately see why stimulus blocks weren't all built from rearranging the same content words (as in the Swapped condition). The negative RSA correlation with the Mean model also made me stop and think - it seems like the more similar the words in a sentence, the more different their structure, and vice versa, but I wasn't clear that this was a design feature. Thus, a few extra words motivating the conditions could be helpful for the reader, and these might helpfully lead them to anticipate the negative RSA correlation.

      As noted in the previous response, in the revised manuscript we have expanded our explanation of the rationale for the construction of our 108 sentences. In particular, Table 1 in the methods section now includes two additional columns which summarise the intended combinations of lexical and overall sentence similarity which our sentence pairs are intended to satisfy.

      (3) Explanation for why different implementations and similarity computations between variants of ostensibly equivalent Graph / Hybrid models yielded widely divergent positive vs negative brain correlations, despite both positively capturing behavioural ratings. This might incorporate a brief intuitive explanation of how Graph model similarities were computed (e.g., what SMATCH and WWLK do). In light of the above, why do different similarity algorithms applied to the Graph model yield positive and negative correlations on the same brain (e.g., Figure S2 - Graph / Graph-WL a,b, diag-pairs). Same goes for why Hybrid and Hybrid-AMR yielded positive vs negative correlations (e.g., Figure S2 - Graph / Graph-WL a,b, diag-pairs). Acknowledge that the brain results are sensitive to similarity computations in the Discussion.

      We appreciate this suggestion. We have added an extended consideration of these issues to the discussion (pages 10-11), as well as some additional details regarding the differences between the Smatch and WWLK metrics in the methods section (page 17).

      (4) Acknowledgement and explanation of why the human similarity ratings were poor at explaining brain data in Figure 2a,b (right column diag-pairs). The poor behaviour vs brain match is indirectly implied in the Discussion as "the comparison between behavioural and fMRI data is somewhat difficult owing to the difference in task structure." However, I would suggest being upfront and explicitly mentioning and explaining the poor brain match in Figures 2a and b, because the reader will notice and wonder - especially because the models correlate strongly with the behavioural data without the models doing the human behavioral task (though this could be a possibility, see later).

      As suggested, we have included a passing reference to this in the presentation of our main results in page 5, and a lengthier discussion on page 11:

      “Our study has several limitations. First, we found a surprisingly low correlation between behavioural ratings and brain activations (see Figure 2). This may be partly explained by differences in task structure. In the behavioural experiment, participants viewed many pairs of related sentences, and were explicitly asked to pay attention to differences in the words of each sentence. In contrast, in the fMRI task participants (who were not the same as the behavioural task participants) read one sentence at a time without an explicit comparison. In addition, we suspect that presentation of so many sentence pairs with highly similar structures may have biased the way in which participants rated sentence similarity. Modifications to the behavioural task to mitigate these aspects may reduce the divergence between behavioural and brain findings.”

      (5) Brief explanation of why model vs brain correlations tended to be strongest in the visual cortex (Figure 3d,e). Currently, this issue is only mentioned in passing, however, it seems worthy of further comment.

      We thank the reviewer for highlighting this issue. We have added discussion of the potential for visual confounds at several points in the revised manuscript, including the ‘Neuroscience of semantics’ subsection on page 11. As noted, we have also added a new analysis in which we compute correlations controlling for the average RSA similarities of the primary visual cortex. We find that this additional control significantly reduces correlations for most transformer models, but produces only a modest reduction in the correlations for most of the graph and hybrid models, particularly VerbNet-CN (see Figures S8-S11).
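
      The control analysis described above can be illustrated with a partial rank correlation. The following is a minimal sketch only: it assumes Spearman correlations over vectorised similarity values and a single control vector standing in for the primary visual cortex similarities. The function names are hypothetical and are not taken from the manuscript.

```python
from math import sqrt

def ranks(values):
    """Return 1-based ranks, assigning the average rank to tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out

def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def partial_spearman(model, brain, control):
    """Model-brain rank correlation with `control` partialled out.

    Assumes neither variable is perfectly rank-correlated with `control`
    (otherwise the denominator is zero).
    """
    r_mb = spearman(model, brain)
    r_mc = spearman(model, control)
    r_bc = spearman(brain, control)
    return (r_mb - r_mc * r_bc) / sqrt((1 - r_mc ** 2) * (1 - r_bc ** 2))

# Toy similarity vectors (one entry per sentence pair); values are illustrative.
control = [1, 2, 3, 4, 5, 6]   # stand-in for visual cortex similarities
model = [1, 2, 3, 4, 6, 5]     # tracks the control closely
brain = [2, 1, 3, 4, 5, 6]     # also tracks the control closely

raw = spearman(model, brain)
controlled = partial_spearman(model, brain, control)
```

      In this toy case the raw model-brain correlation is high (about 0.89) only because both vectors follow the control; once the control is partialled out, the correlation falls to roughly zero.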

      (6) Softening/clarifying some statements that could be misconstrued as suggesting Transformers were universally inferior models. Statements made in the Abstract/Discussion initially came over to me as implying that Transformers were universally inferior models when compared to the Graph/Hybrid models - but this appears only to be true when one looks at analyses conducted within block diagonal sentence subsets. Otherwise, when analyses are conducted on all sentences (between and within blocks, Figure 5) Llama 3 L2 provides by far the strongest brain model. Transformers also appear to yield the strongest accounts of the behavioural data, whether tested on block diagonal or all sentence pairs (Figure S3). To remedy this, I would suggest softening some statements in the Abstract/Discussion that could be misconstrued as suggesting that Transformers were universally inferior. I would also suggest explicitly acknowledging that when the entire dataset was analyzed, Transformers were most accurate, and that (some) Transformers best accounted for the behavioural data.

      We agree that there was some lack of precision in certain sections of the previous draft regarding the conclusions to be drawn regarding the representational capacities of transformers. We have revised the abstract and conclusion to better reflect our intended message, which is that transformers certainly can represent sentence structure and semantic roles, but that the way in which they do this (through vector representations in their hidden layers) is significantly different to how such features are represented in the human brain. In particular, we have included this new text on page 10:

      “We emphasise that our results do not show that transformers fail to represent syntactic or semantic role information. Indeed, large language models show clear capabilities of correctly interpreting sentence structure, and probing studies have found that transformers represent information about syntax and word order. This is consistent with our finding that directly prompting GPT-4 to rate sentence similarity yields very high correlations with human judgements (see Figure S3). Nonetheless, the fact that transformers can encode and utilise structural information to perform linguistic tasks does not mean that they effectively utilise this information to construct a brain-like representation of sentence meaning.”

      (7) Given that GPT-4 was already deployed to parse semantic roles for the hybrid model, and GPT-4 should be able to generate reasonable similarity ratings between sentence pairs, it struck me that an interesting addendum could be to use GPT-4 similarities derived from the human behavioral task to interpret both brain and human behavioral data. This might also help support the case for conducting analyses within a similarity-based framework.

      We appreciate this suggestion. We have added this model (GPT-4 ratings of sentence similarity) to the revised manuscript (see Figures S1-S3).

      Other changes

      As noted by reviewer 3, the full set of sentence pairs was missing from the previous draft. They have been added to the SI of the revised manuscript.

      We have renamed the Graph and Hybrid models in the manuscript to AMR-Smatch and VerbNet-CN respectively, to make clearer which models these terms refer to, and to better differentiate them from the newly added constituency parse graph models.

      We have thoroughly revised the discussion section, incorporating feedback from all reviewers regarding areas needing additional depth.

      We have added subsections to the discussion to aid the reader navigating the now lengthier section.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Joint Public Review

      This manuscript puts forward the provocative idea that a posttranslational feedback loop regulates daily and ultradian rhythms in neuronal excitability. The authors used in vivo long-term tip recordings of the long trichoid sensilla of male hawkmoths to analyze spontaneous spiking activity indicative of the ORNs' endogenous membrane potential oscillations. This firing pattern was disrupted by pharmacological blockade of the Orco receptor. They then use these recordings together with computational modeling to predict that Orco receptor neuron (ORN) activity is required for circadian, not ultradian, firing patterns. Orco did not show a circadian expression pattern in a qPCR experiment, and its conductance was proposed to be regulated by cyclic nucleotide levels. This evidence led the authors to conclude that a post-translational feedback loop (PTFL) clockwork, associated with the ORN plasma membrane, allows for temporal control of pheromone detection via the generation of multi-scale endogenous membrane potential oscillations. The findings will interest researchers in neurophysiology, circadian rhythms, and sensory biology. However, the manuscript has limited experimental evidence to support its central hypothesis and is undermined by several questionable assumptions that underlie their data analysis and model builds, as well as insufficient biological data, including critical controls to validate and/or fully justify the model the authors are proposing.

      We thank the reviewers for their thorough and thoughtful comments and believe that the manuscript is much stronger now after the revision which incorporates the requested changes. We added results of new experiments and additional analyses. Although these new insights did not change the previous conclusions, we significantly reworked the Discussion and added further references to clarify the conclusions we want to make.

      Please note that we used ORN as acronym for “olfactory receptor neuron” throughout the manuscript. ORNs contain odorant receptors (ORs), and in insects these ORs associate with the olfactory receptor co-receptor (Orco) to be trafficked to the membrane of the cilium of the ORN, where they can be contacted by pheromones and odorants. In Manduca sexta, evidence is accumulating for G-protein coupled metabotropic pheromone transduction and not for OR-Orco dependent ionotropic transduction, as shown for Drosophila melanogaster. In both insect species, besides its chaperone function, Orco can form leaky cation channels, which can regulate the spontaneous spiking activity of ORNs. In this study, we explored this role of Orco.

      Strengths:

      The study is notable for its combination of long-term in vivo tip recordings with computational modeling, which is technically challenging and adds weight to the authors' claims. The link between Orco, cyclic nucleotides, and circadian regulation is potentially important for sensory neuroscience, and the modeling framework itself - a stochastic Hodgkin-Huxley formulation that explicitly incorporates channel noise - is a solid and forward-looking contribution. Together, these elements make the study conceptually bold and of clear interest to circadian and olfactory biologists.

      Major weaknesses:

      At the same time, several limitations temper the conclusions. The pharmacological evidence relies on a single antagonist and concentration, without key controls. The circadian analysis is based on relatively small numbers of neurons, with rhythms detected only in subsets, and the alignment procedure used in constant darkness raises concerns of bias. The molecular evidence is sparse, with only three qPCR timepoints, and the model, while creative, rests on assumptions that are not yet fully supported by in vivo data.

      Please see our responses to the detailed comments.

      Detailed comments are provided below:

      (1) The role for Orco proposed in the authors' model largely stems from the effects seen following the administration of (a single dose) of the Orco antagonist, OLC15. However, this hypothesis is undercut by the lack of adequate pharmacological controls, including a basic multipoint OLC15 dose-response series in addition to the administration of blockers for the other channels that are embedded in their model, but which were ruled out as being involved in the modulation of biological rhythms. In addition, these studies would (ideally) also benefit from the inclusion of the same concentration (series) of an inactive OLC15 analog to better control for off-target effects.

      The Orco agonist VUAA1 (Jones et al., 2011) binds directly to Orco and increases the channel open time probability. In M. sexta hawkmoths, we have already published that VUAA1 increases the low spontaneous activity of ORNs in a dose-dependent fashion (Nolte et al., 2013). Chen and Luetje (2012) systematically varied the chemical structure of VUAA1 to identify new Orco ligands and discovered 22 Orco ligand candidates (OLCs) that either activated or inhibited Orco. In their heterologous expression system, Orco was most sensitive to inhibition by OLC15. Based on these results, we published a dose-response curve of OLC15 inhibition (1-100 µM) using in vivo tip recordings of pheromone-sensitive long trichoid sensilla of M. sexta (Nolte et al., 2016). There, we also demonstrated that OLC15 dose-dependently antagonizes the VUAA1-dependent activation of Orco.

      Furthermore, we tested other published Orco antagonists, which were characterized in heterologous assays, in primary cell cultures of hawkmoth ORNs, as well as in in vivo assays in intact hawkmoths. We focused on amiloride-derived antagonists, because we previously identified an amiloride-sensitive cation channel in hawkmoth ORNs. We found that, in contrast to OLC15, the amilorides HMA and MIA were not Orco-specific antagonists but instead affected different ion channel targets depending on the time of day (Nolte et al., 2016). Based on those experiments and the dose-response curves, we determined that the Orco agonist VUAA1 (Jones et al., 2011) and the Orco antagonist OLC15 (Chen and Luetje, 2012) worked best in hawkmoth ORNs to target Orco pharmacologically. Given these results and comparative tests with other published Orco antagonists, we have since used a dose of 50 µM OLC15 in all further experiments as the most suitable for antagonizing Orco function in Manduca. In the current study, we focus on Orco without excluding the possibility that other ion channels in the ORNs contribute to the control of membrane potential rhythms.

      We have clarified the Methods section accordingly.

      (2) The expression pattern of Orco was assessed using qPCR at only three timepoints. Rhythmic transcripts can easily be missed with such sparse sampling (Hughes et al., 2017). A minimum of six evenly spaced timepoints across a 24-hour cycle would be required to confidently rule out circadian transcriptional regulation. In addition, the use of the timeless mRNA control from another study is not acceptable. Furthermore, qPCR analysis measures transcript abundance, not transcription, as the authors repeatedly state. Transcriptional studies would require nuclear run-off or, more recently, can be done with snRNAseq analysis. Taken together, these concerns undermine the authors' desire to rule out TTFL-based control that directly led them to implicate a PTTF-based model.

      We agree with the referees that more time points and a direct comparison between timeless and Orco mRNA levels should be included in this manuscript. We included these additional qPCR experiments and edited the manuscript to make clear that we measure transcript abundance, but we will not perform snRNAseq analysis due to time and financial constraints.

      (3) The modelling presented is based on Orco as a ZT-dependent conductance tied to the cAMP oscillations that were reported by this group in the cockroach and from the presence and functionality in Manduca of homomeric Orco complexes that are devoid of tuning ORs. While these complexes have been generated in cell culture and other heterologous expression systems, as well as presumably exist in vivo in the Drosophila empty neuron and other tuning OR mutants, there is no evidence that these complexes exist in wild-type Manduca ORNs. While this doesn't necessarily undermine every aspect of their models, the authors should note the presence of Orco/OR complexes rather than Orco homomeric complexes.

      Our ELISAs found circadian oscillations in cAMP levels not only in antennae of the Madeira cockroach (Schendzielorz et al., 2014, 2012), but also in hawkmoth antennae (Schendzielorz et al., 2015). For clarification, we added the 2015 citation to the Modeling chapter in the Methods section.

      We agree with the referees that we cannot distinguish between Orco homo- and heteromers in the different compartments of our hawkmoth ORNs but we know that both are expressed in the pheromone-sensitive ORNs. Thus, as the referee suggests, we added text regarding the presence and localization of OR-Orco heteromers. Consistent data collected across different experiments (heterologous expression systems, primary cell cultures of hawkmoth ORNs, in vivo/in situ studies) support that Orco homomers are present in hawkmoth ORNs. In addition to co-expression of MsexOrco and MsexSNMP-1 with either MsexOr-1 or MsexOr-4 in a heterologous expression system, MsexOrco expression alone was already sufficient to increase intracellular Ca<sup>2+</sup> levels spontaneously as a result of its property as leaky, non-specific cation channel, and in response to VUAA1 application (Nolte et al., 2013). Both in developing hawkmoth pupae and differentiating primary cell cultures of hawkmoth ORNs, Orco expression started during a developmental time window where ORNs did not yet express pheromone receptors but where Orco affected spontaneous activity and intracellular Ca<sup>2+</sup> levels dependent on VUAA1 (Nolte et al., 2016). In vitro patch clamp studies of differentiating cultured hawkmoth ORNs during this time window of pupal development characterized ion channels/currents with properties of Orco as a leaky, non-specific cation channel/current that depends on protein kinase C and cyclic nucleotides (Dolzer et al., 2021, 2008; Krannich and Stengl, 2008; Stengl, 1993). Thus, Orco homomers are present in developing hawkmoth ORNs during a time window where ORNs already express spontaneous activity but they do not heteromerize with pheromone receptors. 
      However, we do not know whether and in what ratio homo- and heteromers of Orco and ORs are present in the respective sensillum compartments of adult hawkmoths, because none of the OR-specific antibodies tested worked in immunocytochemical studies of hawkmoth antennae (Nolte et al., 2013; Stengl, 1994; Stengl and Hildebrand, 1990). Our hypothesis of a differential distribution of Orco homomers in the soma and dendrite compartments, and of OR-Orco heteromers in the cilia, is based on the differential immunocytochemical localization of Drosophila ORs mainly in the cilia compartment (Benton et al., 2006).

      We clarified our manuscript accordingly.

      (4) Some aspects of the authors' models, most notably the decision to phase align/optimize their DD and OLC15 recordings, are likely to bias their interpretations.

      There is consensus that insects display daily and circadian rhythms in pheromone-dependent mating, odor-gated feeding, and egg-laying behavior that phase-lock to environmental rhythms, corresponding with daily/circadian rhythms of sensory neuron physiology (e.g., Merlin et al., 2007; Rymer et al., 2007; Schendzielorz et al., 2015, 2012). However, circadian rhythms can easily be masked by stress, such as the disturbances during an experimentally very challenging long-term recording over several days. In addition, we have observed over the years in our animal rearing facility that in 17:7 light-dark cycles the originally nocturnal hawkmoths M. sexta distribute their activity patterns over the course of the day, so that we find nocturnal as well as diurnal hawkmoths. Thus, light-dark cycles were not enough to ensure phase-synchronized behavioral rhythms, and it is very likely that the nocturnal hawkmoths, next to stress signals, rely heavily on pheromone/odor-dependent synchronization, as also found in other moth species (Ghosh et al., 2024). Because we focus on spontaneous activity and not on pheromone-dependent physiology in this study, we used isolated males that were never exposed to the female pheromones, taking phase dispersal into account. Therefore, it became necessary in free-running conditions to first determine the respective behavioral rhythm for each animal, and then to phase-align their activity patterns to allow for statistical analysis. Otherwise, circadian differences would average out in a phase-dispersed free-running population. As requested by the referees in point (7), we added RAIN to test for rhythmicity in each of our recordings and revised the manuscript accordingly.

      Furthermore, in preliminary experiments we briefly exposed hawkmoths to pheromone the night before the start of the experiment. However, we failed to obtain phase-synchronized spiking rhythms. Most likely, a circadian pattern of pheromone exposure would have been necessary as a zeitgeber, which could not be used here due to long-term pheromone-dependent effects on spiking activity. These results have been added as a supplementary figure to Figure 3.
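
      The argument in the response to point (4), that rhythms average out across a phase-dispersed population unless recordings are first phase-aligned, can be sketched numerically. The example below is an illustration under stated assumptions, not the authors' analysis pipeline: it uses noise-free synthetic cosine rhythms, a fixed 24 h period, and hourly bins, and all function names are hypothetical.

```python
from math import atan2, cos, pi, sin

PERIOD_H = 24  # assumed period in hourly bins; real free-running periods differ per animal

def simulate(phase_h, n_bins=48, mean=10.0, amp=5.0):
    """Synthetic firing-rate trace peaking at `phase_h` hours (hypothetical data)."""
    return [mean + amp * cos(2 * pi * (t - phase_h) / PERIOD_H) for t in range(n_bins)]

def acrophase_bin(series, period=PERIOD_H):
    """Peak time in bins, estimated from the Fourier component at `period`."""
    c = sum(v * cos(2 * pi * t / period) for t, v in enumerate(series))
    s = sum(v * sin(2 * pi * t / period) for t, v in enumerate(series))
    return round(atan2(s, c) / (2 * pi) * period) % period

def align(series, period=PERIOD_H):
    """Circularly shift a trace so that its estimated peak falls at bin 0."""
    shift = acrophase_bin(series, period)
    return series[shift:] + series[:shift]

def mean_trace(traces):
    """Bin-wise average across animals."""
    return [sum(col) / len(traces) for col in zip(*traces)]

def amplitude(series):
    """Half the peak-to-trough range."""
    return (max(series) - min(series)) / 2

# Four animals whose rhythms are evenly dispersed in phase.
raw = [simulate(p) for p in (0, 6, 12, 18)]
aligned = [align(trace) for trace in raw]

raw_amp = amplitude(mean_trace(raw))          # rhythms cancel in the average
aligned_amp = amplitude(mean_trace(aligned))  # rhythm is preserved
```

      With these inputs the unaligned average is flat, whereas the aligned average keeps the full amplitude of the individual rhythms. Note that aligning arrhythmic noise to its largest fluctuation would also tend to create an apparent peak at the alignment point, which is why a per-recording rhythmicity test such as RAIN matters before alignment.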

      (5) The tip recordings from long trichoid sensilla are critical aspects of this study. These recordings were carried out on upper sensillar tips located on the distal-most second annulus. Since there are approximately 80 annuli on the Manduca antennae, it is unclear whether the recordings are representative of the antennal response.

      We think the reviewers might have misinterpreted our description of the recording site. In the Methods, we state that we clip off the 20 most distal annuli (leaving a stump of about 60 annuli) and insert the reference electrode into the flagellum up to the second annulus from the cut end, i.e., the recording sites are located at 2/3 – 3/4 of the antenna length as seen from the head of the animal. We clarified this in the Methods section.

      In addition, our lab showed with antibody stainings against Orco that apparently all ORNs that innervate long and short trichoid sensilla along the whole flagellum express the same staining pattern (Nolte et al., 2016). Lee and Strausfeld (1990) mapped all types of antennal sensilla, and together with the pheromone-dependent tip recordings of Kaissling et al. (1989) it was shown that most of the male antennal sensilla are pheromone-sensitive long trichoid sensilla, with one of the two innervating ORNs always responding to bombykal, ensuring high sensitivity of pheromone detection. Furthermore, our patch clamp recordings of primary cell cultures of whole male antennae found largely overlapping ion channel populations across ORNs (review: Stengl, 2010). This would indicate that all ORNs, whether they express ORs sensitive to pheromone or general odorants, could potentially share the same Orco-dependent spontaneous activity rhythms. Furthermore, in our lab, different experimenters in different years who recorded from long trichoid sensilla on different annuli did not detect obvious differences in either the spontaneous activity or the pheromone responses (cf. Dolzer et al., 2003; Gawalek and Stengl, 2018; Schneider et al., 2025). Thus, it is very likely that we are reporting a general encoding mechanism that is not locally restricted along the antennal flagellum and is very likely shared by all types of OR-Orco expressing ORNs.

      (6.1) The authors do not provide any data in support of their cAMP/cGMP-based Orco gating…

      There are publications supporting cyclic nucleotide gating of Orco in Drosophila, but only after previous phosphorylation via protein kinase C (PKC; review: Wicher and Miazzi, 2021). Since Orco is highly conserved among insect species, it is likely that PKC- and cGMP/cAMP-dependent regulation of Orco is present in other insect species. To test this, we are currently characterizing the second messenger dependence of spontaneous spiking activity, which is the focus of a follow-up manuscript. Nevertheless, to provide more evidence for the hypothesis of the current manuscript, we added a new set of tip-recording experiments that demonstrate cAMP-dependent gating of Orco. Because of the addition of this figure, we merged figures 8-10 into Figure 8 and added the cAMP data as Figure 9.

      (6.2) … and the PTTF model proposed is somewhat disappointing.

      For a detailed introduction of our PTFL membrane clock hypothesis please see our opinion paper that we refer to in the manuscript (Stengl and Schneider, 2024). We added clarification of how Orco activation can influence cAMP levels. A more elaborate PTFL clock model including many more of the identified ion channels in hawkmoth ORNs is the focus of another manuscript to come.

      (6.3) The model seems to be influenced by their long-held proposal that insect olfactory signaling has a critical metabotropic component involving cyclic nucleotides, PKC, etc, a view that may be influenced by the use of Orco homomeric complexes generated in HEK cells.

      Indeed, we propose a metabotropic pheromone-transduction cascade, which in moths and cockroaches is based on G-protein-mediated activation of phospholipase C but not on adenylyl cyclase activation. Our hypothesis is not influenced by HEK cell heterologous expression studies of Orco but is supported by our own work comparing in vivo tip recordings of intact hawkmoths with patch clamp experiments on hawkmoth primary cell cultures of olfactory receptor neurons, which are able to respond to their species-specific pheromones in vitro (Schneider et al., 2025; Stengl, 2010; Stengl and Funk, 2013; Wicher and Miazzi, 2021). In addition, a multitude of publications by other laboratories with in vivo and in vitro studies using physiological, genetic, and immunocytochemical assays all support a metabotropic signal transduction cascade in insect olfaction (Stengl, 2010; Stengl and Funk, 2013; Takagi et al., 2025; Wicher and Miazzi, 2021). In contrast, the hypothesis suggesting a solely ionotropic pheromone- and general odor-dependent transduction cascade for all insect species is based on very sparse experimental evidence, primarily from studies in heterologous expression systems such as HEK cells that lack the insect’s WT molecular surroundings and thus cannot predict OR-Orco function in vivo. Furthermore, the ionotropic hypothesis is heavily based upon the argument that an inverse 7TM receptor cannot couple to G-proteins, which lacks careful backup via biochemical and structural studies. In addition, the ionotropic hypothesis lacks support from carefully performed physiological in vivo studies in different insect species that paid attention to analysis of the distinct kinetic components of ORNs’ odor/pheromone responses and that employed physiological concentrations and durations of odor/pheromone stimuli (please see our most recent publication by Schneider et al. (2025)). We added references to the possible odor transduction mechanisms to the introduction.

      (6.4) Nevertheless, structural studies on Orco do not support a cyclic nucleotide binding site, although PKC-based phosphorylation has been implicated in the fine-tuning/adaptation of olfactory signaling.

      While structural studies did not find evidence for conserved known cyclic nucleotide binding sites on Orco, this does not exclude indirect cAMP effects, e.g., via Orco subunits complexing with other molecules under direct cAMP control, such as other ion channel subunits. Furthermore, it does not exclude as yet unknown binding sites, or sites that become accessible only after a specific sequence of phosphorylations at the many phosphorylation sites on Orco. Indeed, physiological studies in Drosophila presented evidence for cyclic nucleotide dependence of Orco after previous PKC-dependent phosphorylation (Getahun et al., 2013). Our ongoing in vivo experiments in hawkmoths further corroborate a zeitgeber time-dependent PKC- and cyclic nucleotide-dependent modulation of Orco. These detailed studies will be published in a follow-up publication. In the revised version of this manuscript, we added tip-recording experiments that indicate cAMP involvement in Orco gating (new Figure 9).

      (7) Because only 5/11 LD and 7/10 DD animals showed daily rhythms, with averages lacking clear daily modulation, the methods are not sufficiently reliable enough to reveal novel underlying mechanisms of circadian rhythm generation. The reported results are therefore not yet reliable or quantifiable. To quantify their results, the authors should apply tests for circadian rhythmicity using methods such as RAIN, JTK CYCLE, MetaCycle, or Echo. The use of FFT and Wavelet is applauded, but these methods do not have tests of significance for rhythms and can be biased when analyzing data in which there could only be 1-3 circadian cycles. Because the conclusions appear to be based on 11-12 neurons that were recorded for 2-4 days, the reader is concerned that the methods are not yet perfected to provide strong evidence for circadian regulation of spontaneous firing of ORNs. The average data (e.g., Figure 3Bii and 3Cii) highlight the apparent lack of daily rhythms. In summary, the results would be more compelling if more than 50% of the recordings had significant circadian amplitudes and with similar periods and phases.

      The long-term tip recordings of intact hawkmoths are very challenging and take a very long time to accomplish; thus, we are very happy that we succeeded in obtaining so many of them (N=40). We are thankful for the reviewers’ suggestion to use RAIN, since this analysis revealed circadian rhythms in 7 of 11 LD recordings, 8 of 12 DD recordings, and 2 of 12 OLC15 recordings. Please see also our response to (4) above, which comments on the phase dispersal of activity rhythms observed in our experiments, as well as in the behavior of hawkmoth males in the mating cage.

      (8) The statement that circadian patterns of ORN firing are lost with the Orco antagonist (OLC15) is not strongly supported. The manuscript should be revised to quantify how Orco changed circadian amplitude in the 12 recorded neurons. Measures of circadian amplitude can avoid confusing/vague statements like Line 394 “low and high frequency bands appeared to merge during the activity phase around ZT 0 in the animals that showed clear circadian rhythms (N = 5 of 11 in LD)”. The conclusion that Orco blocks circadian firing appears to be contradicted by Figure 6, which indicates that ~6 of these neurons had circadian periods detected by wavelet. The manuscript would be strengthened with details about the specificity and reproducibility of the Orco antagonist. The authors quantify the gradual decrease in firing with the slope of a linear fit to estimate how the “effectiveness [of OLC15] increased over time.” They conclude that the drug “obliterated circadian rhythms and attenuated the spontaneous activity in several, but not all experiments (N = 8 of 12).” The report would be greatly strengthened with corroborating data from additional Orco antagonists and additional doses of OLC15 (the authors use only 50 uM OLC15).

      Following the valuable suggestions of the referees, we used RAIN to detect circadian rhythms in the spiking attributes of each individual animal. Since only 2 of 12 animals displayed a circadian rhythm in OLC15, statistical comparison of circadian amplitudes is not possible. We revised the results section accordingly and added to the figure legend to make it clearer that the heat maps in Fig 5 are each representative of one animal and are not averages across animals.

      As the reviewer states correctly in (7), wavelet results of circadian rhythmicity must be interpreted carefully because of the low number of circadian cycles in ~3-4 day recordings. Since the heatmaps in Figure 5 visually revealed the presence of ultradian rhythms, the main focus of the wavelet analysis in Figure 6 is in the detection and quantification of ultradian periods up to 20 h.
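
      The caution about short recordings can be made concrete with a small worked example. The sketch below lists the periods that sit exactly on the discrete Fourier frequency grid for a recording of a given length; the function name and the choice of 2 to 4 day recordings are illustrative assumptions, and the calculation refers to a plain, unpadded Fourier transform rather than to the wavelet analysis used in the manuscript.

```python
def resolvable_periods(recording_h, max_harmonic=6):
    """Periods (h) at the discrete Fourier frequencies k / recording_h."""
    return [recording_h / k for k in range(1, max_harmonic + 1)]


# Hypothetical recording lengths of 2, 3 and 4 days.
for days in (2, 3, 4):
    periods = resolvable_periods(24.0 * days)
    print(days, "d:", [round(p, 1) for p in periods])
```

      For a 3-day recording the grid points nearest to 24 h are 36 h and 18 h, so periods within several hours of 24 h cannot be separated from one another by the transform alone. Zero-padding interpolates between these points but does not add resolution.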

      We revised the Methods section to include references to previous experiments that characterized the effect of different doses of OLC15 and other Orco antagonists and agonists in M. sexta antennae (Nolte et al., 2016). Please see also our response to (1).

      (9) The manuscript includes several statements that are more speculation than conclusion. For example, there is no evidence for tuning or plasticity in this report. Statements like the following should be removed or addressed with experiments that show changes in odor response specificity or sensitivity: "ORN signalosomes are highly plastic endogenous PTFL clocks comprising receptors for circadian and ultradian Zeitgebers that allow to tune into internal physiological and external environmental rhythms as basis for active sensing." (Discussion Line 622). The paper concludes that (line 380) "mean frequency of spontaneous spiking and the frequency of bursting expressed daily modulation, and are both most likely controlled via a circadian clock that targets the leak channel Orco." This is too bold given the available results.

      We revised the manuscript accordingly and clarified which statements are supported via published evidence and which are predictions based upon our novel hypothesis published in our opinion paper (Stengl and Schneider, 2024).

      (10.1) Because Orco conductance is modulated by cyclic nucleotides, it remains highly plausible that circadian regulation occurs upstream at the level of signaling pathways (e.g., calcium, calcium-binding proteins, GPCRs, cyclases, phosphodiesterases).

      We agree with the referees that it is very likely that there are multiple layers of interconnected feedback cycles that control Orco localization and activity. Our novel hypothesis suggests interlocked TTFL and PTFL control of physiological circadian rhythms, not strictly hierarchical TTFL control, which would require a daily turnover of membrane proteins and transcriptional control via the established TTFL clock in insect ORNs. We are currently searching for TTFL control at all levels of odor/pheromone transduction using ZT-dependent transcriptomics in combination with qPCR and single-nucleus transcriptomics, involving also all the molecules suggested by the referees. These studies are ongoing, are very time- and money-consuming, and are beyond the scope of this manuscript. However, we added a set of experiments to this manuscript in which we demonstrate that the effect of increased cAMP on the spontaneous spiking activity is mediated by Orco (new Figure 9).

      (10.2) The possibility that circadian oscillations of cyclic nucleotides are generated by the canonical TTFL mechanism has not been excluded. In fact, extensive work in Drosophila has demonstrated that the TTFL-based molecular clock proteins are required for circadian rhythms in olfaction.

      Our experiments that test circadian TTFL control at different levels of the cAMP transduction cascade in hawkmoth antennae are under way and are part of another publication. In section 6.2 we already stated that our experiments do not exclude the possibility that Orco is under indirect control of the TTFL. We revised our discussion accordingly.

      The published experiments on TTFL-dependent control of Drosophila olfaction that we are aware of (Krishnan et al., 1999; Tanoue et al., 2004) do not exclude interlinked PTFL and TTFL clocks. Krishnan et al. (1999) demonstrated that the TTFL clock in antennal olfactory receptor neurons correlates with circadian rhythms in odor responses measured in electroantennogram (EAG) recordings, not in single sensillum recordings as in our experiments. EAG recordings comprise not only voltage responses of the olfactory sensory neurons but also voltage changes generated in non-neuronal antennal cells such as trichogen and tormogen cells that build, via vATPases, the transepithelial potential gradient that generates the high K<sup>+</sup> concentration in the sensillum lymph (Jain et al., 2024; Klein, 1992; Thurm and Küppers, 1980). In addition, EAG recordings most likely contain responses of afferent neurons originating from somata in the brain that maintain central control of the antennae. Thus, EAG recordings are difficult to interpret.

      (11) A defining feature of circadian oscillators is the feedback mechanism that generates a time delay (e.g., PERIOD/TIMELESS repressing their own transcription). While the authors describe how cyclic nucleotides can regulate Orco conductance, they do not provide a convincing explanation of how Orco activity could, in turn, feed back into the proposed PTFL to sustain oscillations. For these reasons, the authors should consider:

      (a) Providing a broader discussion of non-TTFL models of circadian rhythms (e.g., redox cycles, post-translational modifications).

      We revised the discussion accordingly.

      (b) Reassessing Orco expression using a higher-resolution temporal sampling (≥6 timepoints per 24 h).

      We added those experiments to the revised version of the manuscript (see our response to (2)).

      (c) Clarifying or revising the PTFL model to explicitly address how feedback would be achieved. Alternatively, the data may be more consistent with Orco conductance rhythms being regulated by post-translational mechanisms downstream of the canonical TTFL oscillator, as suggested by the Drosophila olfactory system literature.

      We added possible negative feedback elements to the Discussion to explain how our proposed PTFL could in principle work independently of the TTFL clock.

      Minor weaknesses:

      (1) The authors should compare the firing patterns of ORN neurons to the bursts, clusters, and packets of retinal efferent spikes reported in Liu JS and Passaglia CL (2011; JBR). By comparing measures in moths to measures in Limulus, the authors might be able to address the question: Is the daily firing pattern of ORN neurons likely a conserved feature of circadian control of sensory sensitivity?

      We have revised the discussion accordingly.

      (2) The methods need further details. For example, it is unclear if or how single neuron activity was discriminated and whether the results were compromised by the relatively large environmental fluctuations in temperature (21-27 °C), humidity (35-60%), or other cues known to modulate spontaneous firing.

      These large fluctuations stem from doing experiments in different seasons (higher temperature and humidity in the summer months, lower temperature and humidity in winter). Throughout each individual experiment, conditions were stable. We clarified the Methods section accordingly.

      Recommendations for the authors:

      The authors should post the code for their computational model to a repository like GitHub.

      The code for the computational model is now available at https://github.com/a-c-schneider/VijayanForlinoEtAl2025_Model.git

      References

      Benton R, Sachse S, Michnick SW, Vosshall LB. 2006. Atypical Membrane Topology and Heteromeric Function of Drosophila Odorant Receptors In Vivo. PLOS Biology 4:e20. DOI: https://doi.org/10.1371/journal.pbio.0040020

      Chen S, Luetje CW. 2012. Identification of New Agonists and Antagonists of the Insect Odorant Receptor Co-Receptor Subunit. PLOS ONE 7:e36784. DOI: https://doi.org/10.1371/journal.pone.0036784

      Dolzer J, Fischer K, Stengl M. 2003. Adaptation in pheromone-sensitive trichoid sensilla of the hawkmoth Manduca sexta. Journal of Experimental Biology 206:1575–1588. DOI: https://doi.org/10.1242/jeb.00302

      Dolzer J, Krannich S, Stengl M. 2008. Pharmacological Investigation of Protein Kinase C- and cGMP-Dependent Ion Channels in Cultured Olfactory Receptor Neurons of the Hawkmoth Manduca sexta. Chemical Senses 33:803–813. DOI: https://doi.org/10.1093/chemse/bjn043

      Dolzer J, Schröder K, Stengl M. 2021. Cyclic nucleotide-dependent ionic currents in olfactory receptor neurons of the hawkmoth Manduca sexta suggest pull–push sensitivity modulation. European Journal of Neuroscience 54:4804–4826. DOI: https://doi.org/10.1111/ejn.15346

      Gawalek P, Stengl M. 2018. The Diacylglycerol Analogs OAG and DOG Differentially Affect Primary Events of Pheromone Transduction in the Hawkmoth Manduca sexta in a Zeitgebertime-Dependent Manner Apparently Targeting TRP Channels. Frontiers in Cellular Neuroscience 12:218. DOI: https://doi.org/10.3389/fncel.2018.00218

      Getahun MN, Olsson SB, Lavista-Llanos S, Hansson BS, Wicher D. 2013. Insect Odorant Response Sensitivity Is Tuned by Metabotropically Autoregulated Olfactory Receptors. PLOS ONE 8:e58889. DOI: https://doi.org/10.1371/journal.pone.0058889

      Ghosh S, Suray C, Bozzolan F, Palazzo A, Monsempès C, Lecouvreur F, Chatterjee A. 2024. Pheromone-mediated command from the female to male clock induces and synchronizes circadian rhythms of the moth Spodoptera littoralis. Current Biology 34:1414-1425.e5. DOI: https://doi.org/10.1016/j.cub.2024.02.042, PMID: 38479388

      Jain K, Prelic S, Hansson BS, Wicher D. 2024. Expression of Drosophila melanogaster V-ATPases in Olfactory Sensillum Support Cells. Insects 15:1016. DOI: https://doi.org/10.3390/insects15121016

      Jones PL, Pask GM, Rinker DC, Zwiebel LJ. 2011. Functional agonism of insect odorant receptor ion channels. Proceedings of the National Academy of Sciences 108:8821–8825. DOI: https://doi.org/10.1073/pnas.1102425108

      Kaissling KE, Hildebrand JG, Tumlinson JH. 1989. Pheromone receptor cells in the male moth Manduca sexta. Archives of Insect Biochemistry and Physiology 10:273–279. DOI: https://doi.org/10.1002/arch.940100403

      Klein U. 1992. The insect V-ATPase, a plasma membrane proton pump energizing secondary active transport: immunological evidence for the occurrence of a V-ATPase in insect ion-transporting epithelia. Journal of Experimental Biology 172:345–354. DOI: https://doi.org/10.1242/jeb.172.1.345

      Krannich S, Stengl M. 2008. Cyclic Nucleotide-Activated Currents in Cultured Olfactory Receptor Neurons of the Hawkmoth Manduca sexta. Journal of Neurophysiology 100:2866–2877. DOI: https://doi.org/10.1152/jn.01400.2007

      Krishnan B, Dryer SE, Hardin PE. 1999. Circadian rhythms in olfactory responses of Drosophila melanogaster. Nature 400:375–378. DOI: https://doi.org/10.1038/22566

      Lee JK, Strausfeld NJ. 1990. Structure, distribution and number of surface sensilla and their receptor cells on the olfactory appendage of the male moth Manduca sexta. Journal of Neurocytology 19:519–538. DOI: https://doi.org/10.1007/BF01257241

      Merlin C, Lucas P, Rochat D, François M-C, Maïbèche-Coisne M, Jacquin-Joly E. 2007. An Antennal Circadian Clock and Circadian Rhythms in Peripheral Pheromone Reception in the Moth Spodoptera littoralis. Journal of Biological Rhythms 22:502–514. DOI: https://doi.org/10.1177/0748730407307737

      Nolte A, Funk NW, Mukunda L, Gawalek P, Werckenthin A, Hansson BS, Wicher D, Stengl M. 2013. In situ Tip-Recordings Found No Evidence for an Orco-Based Ionotropic Mechanism of Pheromone-Transduction in Manduca sexta. PLOS ONE 8:e62648. DOI: https://doi.org/10.1371/journal.pone.0062648

      Nolte A, Gawalek P, Koerte S, Wei H, Schumann R, Werckenthin A, Krieger J, Stengl M. 2016. No Evidence for Ionotropic Pheromone Transduction in the Hawkmoth Manduca sexta. PLOS ONE 11:e0166060. DOI: https://doi.org/10.1371/journal.pone.0166060

      Rymer J, Bauernfeind AL, Brown S, Page TL. 2007. Circadian rhythms in the mating behavior of the cockroach, Leucophaea maderae. Journal of Biological Rhythms 22:43–57. DOI: https://doi.org/10.1177/0748730406295462, PMID: 17229924

      Schendzielorz J, Schendzielorz T, Arendt A, Stengl M. 2014. Bimodal Oscillations of Cyclic Nucleotide Concentrations in the Circadian System of the Madeira Cockroach Rhyparobia maderae. Journal of Biological Rhythms 29:318–331. DOI: https://doi.org/10.1177/0748730414546133

      Schendzielorz T, Peters W, Boekhoff I, Stengl M. 2012. Time of Day Changes in Cyclic Nucleotides Are Modified via Octopamine and Pheromone in Antennae of the Madeira Cockroach. Journal of Biological Rhythms 27:388–397. DOI: https://doi.org/10.1177/0748730412456265

      Schendzielorz T, Schirmer K, Stolte P, Stengl M. 2015. Octopamine Regulates Antennal Sensory Neurons via Daytime-Dependent Changes in cAMP and IP3 Levels in the Hawkmoth Manduca sexta. PLOS ONE 10:e0121230. DOI: https://doi.org/10.1371/journal.pone.0121230

      Schneider AC, Schröder K, Chang Y, Nolte A, Gawalek P, Stengl M. 2025. Hawkmoth Pheromone Transduction Involves G-Protein–Dependent Phospholipase Cβ Signaling. eNeuro 12:ENEURO.0376-24.2024. DOI: https://doi.org/10.1523/ENEURO.0376-24.2024, PMID: 39880675

      Stengl M. 2010. Pheromone Transduction in Moths. Frontiers in Cellular Neuroscience 4:133. DOI: https://doi.org/10.3389/fncel.2010.00133

      Stengl M. 1994. Inositol-trisphosphate-dependent calcium currents precede cation currents in insect olfactory receptor neurons in vitro. Journal of Comparative Physiology A 174:187–194. DOI: https://doi.org/10.1007/BF00193785

      Stengl M. 1993. Intracellular-Messenger-Mediated Cation Channels in Cultured Olfactory Receptor Neurons. Journal of Experimental Biology 178:125–147. DOI: https://doi.org/10.1242/jeb.178.1.125

      Stengl M, Funk NW. 2013. The role of the coreceptor Orco in insect olfactory transduction. Journal of Comparative Physiology A 199:897–909. DOI: https://doi.org/10.1007/s00359-013-0837-3

      Stengl M, Hildebrand JG. 1990. Insect olfactory neurons in vitro: morphological and immunocytochemical characterization of male-specific antennal receptor cells from developing antennae of male Manduca sexta. Journal of Neuroscience 10:837–847. DOI: https://doi.org/10.1523/JNEUROSCI.10-03-00837.1990, PMID: 2319305

      Stengl M, Schneider AC. 2024. Contribution of membrane-associated oscillators to biological timing at different timescales. Frontiers in Physiology 14:1243455. DOI: https://doi.org/10.3389/fphys.2023.1243455

      Takagi S, Abuin L, Mermet J, Lee D, Benton R. 2025. A GPCR signaling pathway in insect odor detection. DOI: https://doi.org/10.1101/2025.10.03.680299

      Tanoue S, Krishnan P, Krishnan B, Dryer SE, Hardin PE. 2004. Circadian Clocks in Antennal Neurons Are Necessary and Sufficient for Olfaction Rhythms in Drosophila. Current Biology 14:638–649. DOI: https://doi.org/10.1016/j.cub.2004.04.009, PMID: 15084278

      Thurm U, Küppers J. 1980. Epithelial physiology of insect sensilla. In: Locke M, Smith DS (Eds). Insect Biology in the Future. Academic Press. p. 735–763. DOI: https://doi.org/10.1016/B978-0-12-454340-9.50039-2

      Wicher D, Miazzi F. 2021. Functional properties of insect olfactory receptors: ionotropic receptors and odorant receptors. Cell and Tissue Research 383:7–19. DOI: https://doi.org/10.1007/s00441-020-03363-x

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      This rigorous and creative study uses an elegant combination of metabolomics, transcriptomics, and budding yeast molecular genetics to discover that (i) activating AMPK to maintain mitochondrial respiration fueled by cytosolic Acetyl CoA and (ii) increasing fatty acid synthesis independent of respiration drive independent pathways that increase the fitness of replicatively-aged budding yeast cells, albeit without increasing their lifespan. This work will be of interest to scientists in the field of aging and metabolism. Some clarifications in the text would address the following concerns, which would increase the impact of the study:

      (1) What does activation of AMPK (via PGPD-SAK1 expression) do to the replicative lifespan? How many bud scars, in general, do the subpopulations that are older - yet have less Tom70 (increased mitochondrial fitness) - have, after the 48 hrs time point that they are examining? How many divisions occurred in this 48hr time period - i.e. is it long enough to have all cells reach the end of their replicative lifespan? This information is important to rule out that a subset of the mutant cells just divided faster and hence had more divisions within 48 hrs (growing faster and living longer are different things). Having identical growth curves doesn't indicate per se that they all divide at the same rate, as there may be a subpopulation that divides faster and a subpopulation that doesn't grow so well.

      Increasing AMPK activity increases replicative lifespan [PMID: 25869125], but given our finding that AMPK activation splits the population, such replicative lifespan assays are hard to interpret. Bud scar counts have a similar issue. Hence we restricted the lifespan and bud scar analyses to wt and A2A, which are more homogeneous (Figures S2 B and E). A2A cells at 48h have ~25% more bud scars than wt cells. Yes, by 48h most of the cells have lost viability (Figure 2E). The reviewer is correct that you can't properly compare the lifespan curves if the cells divide at different rates, hence our follow-up test comparing the viability of wt at 48h with that of A2A at 40h, after we had confirmed that these timepoints captured cells at equivalent replicative ages (Figure 2D,E). This shows that viability of A2A is slightly lower than wt at matched age, indicating a slightly shorter lifespan.

      (2) A2A cells do not have an extended replicative lifespan (RLS) but show an increase in the "low senescence" population (Figure 2). If the cells are not becoming senescent, why don't they have longer RLS? Not having a longer lifespan seems inconsistent with the statement that "bud scar counting confirmed that A2A cells reach a higher age than wild type", which comes back to how many times the cells can divide in the 48hr timepoint studied and their rate of cell division? Also, the lifespan curve shown is plotted against time, not cell division number, which does not take into account different division times of cells within the population (described above). It would be much more useful to show standard lifespan curves showing cell division numbers per lifespan per cell.

      Our observation that cells can reach the end of life without senescing is consistent with other studies that have studied the life course of individual cells by microscopy [PMID: 31291577, 32675375]. These studies always highlight some proportion of the cells that reach the end of life with no or minimal senescence, though this fraction varies with the experimental system. The question of why cells lose viability without senescing is a complete unknown in the field, but reflects a wider lack of consensus as to why yeast lose viability with replicative age.

      We are wary about making strong statements on lifespan for exactly the reason the reviewer picks out. In liquid culture we can only assess viability over time, and it is clear from the comparison of liquid and solid media lifespans performed by the Gottschling lab [PMID: 19652178] that the culture system has a huge effect on lifespan, with cells in classical microdissection-based lifespan assays living far longer than they do in liquid. This of course means that classical microdissection assays are not very useful for A2A, so we are left with an unsatisfactory approximation. We have therefore restricted our conclusion on lifespan to simply say that the lifespan of A2A cells is not extended, which our data in Figures 2D, E and S2B do support (see also answer to Q1). Therefore, with the majority of A2A cells showing low senescence marks and high fitness at 48h, we can conclude that lifespan and fitness loss must be separable.

      We will note these limitations of lifespan measurements in the manuscript.

      (3) Increased "fitness" of the old cells is implied from the increased size of the colonies that the old cells can make. However, this is a measure of the fitness of the daughters per se, not the old mother cells. Are the old mothers just passing on healthier mitochondria and more lipids to the daughters, such that they can divide more times? If the aged cells have an "increased fitness", why don't they divide more times themselves (i.e. live longer?).

      Yes, colony growth speed is defined by daughter cell replication, and as long as the daughters and subsequent generations divide at the same rate irrespective of whether they come from a young or old mother, the size of the colony after 24 hours varies based on the time it took the initial mother to produce a daughter. This is what the assay really measures. We note that aged wildtype mothers often do not divide at all in the first 24 hours after being put on an agar plate (hence the tiny reported colony size), even though they do eventually produce a daughter which then forms a colony, whereas A2A cells tend to produce the first daughter rapidly whether young or old. It is known that daughters of aged wildtype mothers also divide slower, which will also contribute to differences in colony size, and this may well result from a lipid and/or mitochondrial contribution, but the primary driver of colony size in 24 hours is the time the mother took to initially divide. We will add this detail to the manuscript.
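
      The reasoning behind the colony-size readout can be sketched with a toy calculation. This is an illustrative assumption and not the authors' analysis: it supposes that a colony is founded by one mother that starts dividing after some delay, after which the lineage doubles at a fixed rate. The function name, the 24 h window, and the 1.5 h doubling time are all hypothetical parameters, and the model deliberately ignores slower division of daughters from aged mothers.

```python
def colony_cells(lag_h, window_h=24.0, doubling_h=1.5):
    """Cells in a colony founded by one mother after window_h hours.

    lag_h is the delay before the founding mother starts dividing;
    afterwards the lineage is assumed to double every doubling_h hours.
    """
    if lag_h >= window_h:
        return 1.0  # mother has not started dividing within the window
    return 2.0 ** ((window_h - lag_h) / doubling_h)


# Hypothetical delays: a prompt mother, a slow mother, and one that
# does not divide within the 24 h window.
for lag in (2.0, 20.0, 30.0):
    print(f"lag {lag:>4.1f} h -> {colony_cells(lag):,.0f} cells")
```

      Under these assumptions, a difference of hours in the time to first division translates into orders of magnitude in colony size at 24 hours, which is why the readout is dominated by the founding mother.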

      As noted above, the mechanistic basis of lifespan is unknown, but although senescence can shorten lifespan, our work and that of others shows that lifespan is still limited in the absence of senescence.

      (4) The statement is made that "these experiments define two classes of aging cells with distinct metabolic needs, coherent with the model of two aging trajectories previously proposed (referencing Nan Hao's work)". However, the big difference here is that in Nan Hao's work, their two aging trajectories influenced the length of lifespan, but that does not appear to be the case here. That distinction should be made clear. Perhaps the authors could also speculate as to why the A2A yeast stops dividing after presumably the same number of cell divisions, even though they have an activated AMPK and activated fatty acid synthesis pathway.

      We will add this distinction. As noted above, we are wary of making strong statements regarding lifespan as the assays we can do in liquid culture are limited. We are therefore similarly wary about speculating about causes for the lack of lifespan difference because in reality all we can do is rule out a big effect. We would love to speculate on why the A2A cells don't have an extended lifespan, but at present we don't have any good ideas!

      (5) I am a bit confused by the use of the word "senescence" by this lab here and in their previous growth on galactose studies. If yeast don't senesce, which is usually defined as an irreversible arrest of the cell cycle where cells stop dividing, shouldn't the yeast that do not senesce still be dividing and hence have a longer lifespan? Should a different term be used rather than senescence? Such as "fitness late in life". The authors giving their definition of senescence may help reduce this apparent contradiction.

      We completely agree that this is confusing, and we noted this distinction in the Introduction. Use of the term senescence to mean a loss of fitness late in life in yeast stems from the classical definition of senescence as applied to whole organisms. However, the term senescence as applied to cells has a more specific meaning in terms of the cell cycle, as the reviewer notes. As an individual S. cerevisiae is both a cell and an organism, the terminology clashes. However, the marker we largely employ (Tom70-GFP), which in our hands is a very good proxy for fitness, was originally defined as marking the senescence entry point (SEP), so overall we feel we can't avoid the term.

      Reviewer #2 (Public review):

      Summary:

      In this study, the authors investigate how cytosolic acetyl-CoA metabolism influences replicative aging in budding yeast. They propose that acetyl-CoA regulates aging through three major pathways: (1) mitochondrial transport to support mitochondrial function, (2) fatty acid synthesis, and (3) global protein acetylation. The data show that AMPK activation promotes mitochondrial import of acetyl-CoA and partially mitigates mitochondrial decline in a subset of aging cells.

      Furthermore, the engineered A2A strain, which enhances mitochondrial acetyl-CoA utilization while relieving inhibition of fatty acid synthesis, increases the proportion of cells exhibiting a "low senescence" phenotype.

      Overall, this is a thoughtful and potentially impactful study that advances our understanding of metabolic control of aging. Addressing the points below, particularly by refining interpretations and, where feasible, incorporating additional analyses, will further strengthen the manuscript and its conclusions.

      Strengths:

      The study has several notable strengths. It addresses an important question by shifting the focus from lifespan to preservation of late-life fitness, which is highly relevant to aging biology. The work integrates metabolic, genetic, and functional analyses to link cytosolic acetyl-CoA flux with distinct aging outcomes, and the engineering of the A2A strain provides a clear and elegant demonstration of how coordinated pathway modulation can improve cellular fitness.

      Weaknesses:

      (1) While the manuscript focuses on mitochondrial transport and fatty acid synthesis, cytosolic acetyl-CoA is also a key regulator of histone acetylation and chromatin silencing. It would strengthen the study to consider whether acetyl-CoA depletion contributes to improved fitness through enhanced rDNA silencing. Given the well-established role of rDNA instability in yeast aging, additional experiments examining rDNA silencing and stability would be valuable. For example, monitoring rDNA copy number changes (not necessarily ERCs) under AMPK activation, oleic acid supplementation, and in the A2A strain, similar to approaches used in the authors' prior work, would help clarify whether chromatin regulation contributes to the observed phenotypes.

      We have data addressing this point that we will add to the manuscript. In short, we see no difference in gene expression from Sir2-repressed sub-telomeric regions or MAT loci, but the genome-wide gene expression dysregulation associated with age is partially suppressed in PGPD-SAK1. However, A2A does not suppress this further, so it is not critical for the suppression of senescence in A2A though we are following this up. ERC accumulation is higher in A2A at 48h, consistent with the cells being older, meaning that ERCs are unlinked to senescence onset as we have previously reported. There is a strong upregulation of transcripts from Sir2-repressed rDNA intergenic spacers with age in all genotypes, but we attribute this simply to the copy number increase of these regions on ERCs rather than a defect in silencing. We have previously looked for heritable changes in rDNA copy number arising during ageing and found (to our surprise) absolutely nothing, so we don't expect any changes under these conditions.

      (2) The current data do not fully distinguish whether AMPK activation and oleic acid supplementation act on distinct subpopulations of aging cells. An alternative explanation is that oleic acid supplementation enhances mitochondrial function and acts additively with AMPK activation, thereby increasing the fraction of cells in the "low senescence" state. Since this distinction is not central to the main conclusions, I suggest softening the language around subpopulation specificity. Emphasizing instead that the A2A strain coordinately modulates multiple branches of acetyl-CoA metabolism to improve late-life fitness would maintain the strength of the central message without overinterpretation.

      We agree that oleic acid and the lipids produced downstream of Acc1 in A2A may improve late life fitness via enhanced mitochondrial function, and in support of this the oxygen consumption rate is marginally (though significantly) higher in A2A than PGPD-SAK1. We will add these data to the manuscript. However, we disagree with the interpretation of an additive effect, as we report throughout the study that AMPK activation and lipid biosynthesis/supplementation affect different sub-populations of cells. We do not observe populations of intermediate senescence cells; rather, by flow cytometry and fitness assays we observe individual cells in binary low senescence or high senescence states.

      (3) The manuscript proposes that lipid starvation and excess acetyl-CoA are major drivers of senescence in distinct subpopulations of wild-type aging cells. This conclusion is not yet fully supported by the presented data. Direct measurements of age-dependent divergence in acetyl-CoA and fatty acid levels at the single-cell level would be needed to substantiate this model. Based on the current evidence, a more conservative interpretation would be that aging cells exhibit differential sensitivity to perturbations in acetyl-CoA and lipid metabolism. Accordingly, I recommend revising the statement in the Abstract ("We further implicate lipid starvation and excess acetyl coenzyme A availability as major drivers of senescence...") and the corresponding discussion text to better align with the data.

      We agree and will adjust the abstract to make it clearer that the lipid starvation / excess acetyl-CoA interpretation is a model.

      Reviewer #3 (Public review):

      Summary:

      These findings suggest that PGPD-SAK1 yeast show a subpopulation with lowered TOM70-GFP expression in high bud scar staining aged cells. Deletion of CAT2 or MLS1 reduces this effect. A PGPD-SAK1 acc1S1157A double mutant (called "A2A" here) shows an even larger effect of lowered tom70 expression in high bud scar staining aged cells. Utilization of various additional mutants involved in acetyl-CoA transport, carnitine shuttle, respiration, etc., leads the authors to conclude that these shifts in TOM70-GFP in aged cells are linked to the AMPK-fatty acid metabolic regulatory system.

      Strengths:

      These extensive and clearly described experiments reveal interesting changes in TOM70-GFP intensity in subsets of aged yeast in several mutants eventually identified as linked to the AMPK-fatty acid metabolic regulatory system.

      Weaknesses:

      (1) 3 biological replicates for mRNASeq is low.

      Thank you for pointing this out. We performed another replicate after posting the initial preprint but didn’t update the figure in the eLife-reviewed version. We will add this to the scatter plots and analysis in Figure 1; the findings have not changed.

      (2) While "Traditional conceptions of ageing implicate a progressive accumulation of damage leading to systemic degradation in performance until death, with evolutionary pressures acting to maximise early life fitness and fecundity at the expense of ageing health." is tangential perhaps to the data and conclusions of the study, both claims of this sentence are at best controversial, and the manuscript is no weaker for their omission.

      We actually feel that this sentence is very important to the message of the manuscript, which is that ageing does not necessarily have to involve a loss of fitness before death. Ageing is often described as the progressive wearing out of components leading to decline and death (with an old car often used as an analogy); in the ageing field this is certainly controversial, but outside the field this remains the normal understanding. We think it is important to state this widely held viewpoint with which our findings are hard to reconcile.

      Our interpretation that yeast are bet-hedging as a population growth strategy and this drives ageing in the long term is a classic antagonistic pleiotropy - we will add this term (from the citation that is already in the manuscript) and clarify in the discussion to make it obvious why we are introducing this concept in the introduction.

      (3) The statement that "Here, we determine the basis of senescence and fitness loss in replicatively ageing yeast" is a bit strong as a summary of the careful work presented here. If the authors had created yeast mutants that retained fitness indefinitely, this would be a more appropriate strength of claim to summarize the work.

      Indeed - we will refine this sentence.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this work the authors investigate the molecular dynamics of MinD, a component of the Bacillus subtilis Min system, in vitro and in vivo. In Escherichia coli the Min system is highly dynamic and displays rapid pole to pole oscillation whereby a time average minimum of the Min proteins at mid cell is established. However, in B. subtilis, this is not the case, and there is no MinE present. MinD in B. subtilis dynamically relocalizes from the poles to division sites, and binds to MinC and MinJ, which mediates its interaction with DivIVA. This paper reports biochemical characterization of B. subtilis MinD in vitro and dynamics of MinD variants in vivo, providing mechanistic insight into the mechanism of dynamic localization.

      Strengths:

      In the current study, the authors perform a detailed biochemical characterization of the in vitro ATPase activity of MinD and demonstrate that rapid hydrolysis is elicited by adding phospholipids. They further show using a collection of substitution mutants of MinD that both monomers and dimers bind to the membrane, and ATP occupancy changes the on and off rates. Identification, quantification, and tracking of discrete Halo-MinD populations was nicely done and showed that mutations in MinD alter dynamic localization, correlating with PL binding on and off rates in vitro.

      In the revised manuscript, the authors now demonstrate localization and tracking data for minC and minJ deletion strains, which suggest that MinJ impacts MinD membrane cycling, but MinC does not. Additional in vitro work showed that the PDZ domain of MinJ modifies MinD ATP hydrolysis rates, and the authors propose that MinJ may promote MinD dimer formation.

      Weaknesses of the revised version: No major weaknesses.

      We thank this reviewer for the positive evaluation of our manuscript and the precise summary of our findings.

      Reviewer #2 (Public review):

      Summary:

      Feddersen & Bramkamp determined important characteristics of how MinD protein binds/dissociates to/from the membrane, and dimerizes in relation to its ATPase activity. The presented data clearly shows the differences in function of MinD homologs from B. subtilis and E. coli.

      Strengths:

      The work presents well-executed experiments that lead to interesting conclusions and a new model of how Min system works during B. subtilis mid-cell division. Importantly, this model is supported by in vitro characterization of well-chosen mutants in the functional domains of MinD. Outstandingly, most of the in vitro data are confirmed by single-molecule localization microscopy.

      Weaknesses:

      The authors immobilized liposomes, for which they used E. coli total lipids, to measure ATPase activity and liposome association and dissociation of B. subtilis MinD. For these experiments it would be more suitable to use B. subtilis total lipids, as more biologically relevant data could be gained.

      Although the work is detailed and nicely compares the function of the B. subtilis Min system with the E. coli Min system, it lacks a comparison with Min system function in other rod-shaped Gram-positive bacteria. I would suggest including in the Discussion the complexity of other Min systems. This complexity is seen especially in other rod-shaped spore formers such as Clostridial species, in which one or both of these Min systems are present: an oscillating E. coli-type Min system and a more static one as in B. subtilis.

      Comments on revisions:

      I'm satisfied with the authors' response to my private recommendation points. However, I thought that they would also respond to the points in the Public Review, namely the weaknesses shown above, and update the revised version accordingly.

      We are very grateful to the reviewer for the positive comments and fully agree with the points raised. Due to the overall length of the manuscript, we initially omitted a discussion of the complexity of the Min system in certain Firmicutes. However, we agree that this aspect should be considered. Accordingly, we have now added a dedicated paragraph to the Discussion section addressing this point.

      We also agree that investigating different lipid compositions, including native membranes from Bacillus subtilis, represents a logical next step to further elucidate the influence of lipids on the MinD activity cycle. However, we consider this to constitute a separate project and therefore beyond the scope of the present study.

      Recommendations for the authors:

      Reviewing Editors:

      Some minor corrections are requested. In particular, the addition of a bit more detail to the Discussion about the complexity of Min systems in other bacteria, as suggested by Reviewer 2, would be very much appreciated.

      We thank the editors for their positive assessment and the clear recommendations. We have now added a dedicated paragraph to the Discussion section addressing the complexity of the Min system in Clostridioides.

      Reviewer #1 (Recommendations for the authors):

      The following corrections are requested:

      Abstract - Line 29 - Remove the word "solely" from this statement of the abstract. It would be wise to not be so rigid for a biological system that is only partially characterized and to allow for the possibility that biological factors, including local concentrations and/or other molecules, may yet be discovered to impact MinD activation under certain conditions.

      We agree and have amended the text to avoid an overly restrictive statement.

      Line 38 - Remove "do not require any unknown protein component" for the reason stated above. Currently, the experiments recapitulate activation suggesting the membrane binding and release controls dynamics without additional factors. This allows for the possibility that biological factors may yet be shown to impact MinD activation under certain conditions.

      We agree and have changed the text.

      Discussion - Line 526 - Thermus thermophilus is misspelt.

      Corrected.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study reports a dynamic association/dissociation between malate dehydrogenase (MDH1) and citrate synthase (CIT1) in Saccharomyces cerevisiae under different metabolic conditions that control TCA pathway flux rate. The research question is timely, the use of the NanoBiT split-luciferase system to monitor protein-protein interactions is innovative, and the significance of the findings is valuable. However, the strength of evidence needed to support the conclusions was found to be incomplete based on a lack of critical control and mechanistic experiments.

      We thank the editor for this thoughtful assessment of our work. We are encouraged that the research question, experimental approach, and overall significance were viewed positively.

      To address the concern regarding the strength of evidence, we have implemented additional controls in the revised manuscript. Specifically, we have repeated all MDH1-CIT1 interaction measurements alongside strains expressing full-length NanoLUC fusion proteins to assess MDH1 and CIT1 protein abundance. The resulting data, now included as supplementary figures (Figure 2 – figure supplement 2, Figure 2 – figure supplement 3, Figure 3 – figure supplement 1, Figure 4 – figure supplement 2), demonstrate the reproducibility of the findings and indicate that the observed changes in MDH1-CIT1 interaction are not attributable to protein abundance variations.

      We agree that a detailed mechanistic dissection of how the MDH1–CIT1 complex influences metabolic pathway flux is an essential piece of evidence for establishing the functions of the metabolon. However, such analyses require extensive additional investigation beyond the scope of the present study. Accordingly, we have clarified the aims of this work in the revised manuscript to emphasize that our primary objective is to characterize the dynamic behavior of the MDH1–CIT1 interaction under different metabolic conditions and to identify key factors associated with its regulation.

      We believe these revisions strengthen the rigor of the study, better define its scope, and provide a solid foundation for future mechanistic investigations.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study by the Obata group characterizes the dynamics of the canonical malate dehydrogenase-citrate synthase metabolon in yeast.

      Strengths:

      The study is well-written and appears to give clear demonstrations of this phenomenon.

      Studies of the dynamics of metabolon formation are rare; if the authors can address the concern detailed below, then they have provided such for one of the canonical metabolons in nature.

      We sincerely thank the reviewer for their positive assessment and for recognizing the value of our study in characterizing the dynamics of the MDH1-CIT1 metabolon. We appreciate the recognition that studies of metabolon dynamics are rare and that our work provides a clear demonstration of this phenomenon for a canonical metabolon. We have carefully addressed the methodological concerns regarding the NanoBiT system as detailed below to further strengthen the evidence for our findings.

      Weaknesses:

      There is a fundamental issue with the study, which is that the authors do not provide enough support or information concerning the split luciferase system that they use.

      We agree that a detailed description of the NanoBiT system is essential to ensure the reliability of the methodology. As suggested, we have added a dedicated paragraph to the Introduction (Lines 90–103) to clarify these technical aspects, supported by the foundational work of Dixon et al. (2016).

      Is the binding reversible or not? How the data is interpreted is massively influenced by this fact.

      Yes, the NanoBiT system is specifically designed to be reversible. The intrinsic affinity of the subunits is low (K<sub>D</sub> = 190 μM), and the association and dissociation rate constants (k<sub>on</sub> = 500 M<sup>-1</sup>s<sup>-1</sup>, k<sub>off</sub> = 0.2 s<sup>-1</sup>) are well outside the range of typical protein-protein interactions (Dixon et al., 2016). These kinetics ensure that the assembly and disassembly of the luminescent complex are dictated solely by the interaction characteristics of the target proteins (MDH1 and CIT1) and not by the tags themselves. This allows for real-time monitoring of both the association and dissociation phases.
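
      As a rough illustration of what the quoted rate constants imply, the sketch below computes the kinetic ratio k<sub>off</sub>/k<sub>on</sub> and the dissociation half-life of the tag pair. It assumes simple 1:1 binding with first-order dissociation, which is a simplification introduced here rather than a model taken from the manuscript; the function names are hypothetical.

```python
import math

# Rate constants as quoted above from Dixon et al. (2016) for the NanoBiT subunits.
K_ON = 500.0  # association rate constant, M^-1 s^-1
K_OFF = 0.2   # dissociation rate constant, s^-1


def kinetic_kd(k_on: float, k_off: float) -> float:
    """Dissociation constant (in M) implied by the rate constants: K_D = k_off / k_on.

    Assumes simple 1:1 binding (an assumption of this sketch).
    """
    return k_off / k_on


def dissociation_half_life(k_off: float) -> float:
    """Half-life (in s) of the complex under first-order dissociation: ln(2) / k_off."""
    return math.log(2) / k_off


kd_molar = kinetic_kd(K_ON, K_OFF)
half_life_s = dissociation_half_life(K_OFF)

print(f"kinetic K_D ~ {kd_molar * 1e6:.0f} uM")
print(f"dissociation half-life ~ {half_life_s:.1f} s")
```

      Under these assumptions the kinetic ratio comes out at a few hundred micromolar, the same order of magnitude as the equilibrium K<sub>D</sub> quoted above, and the tag pair on its own would fall apart within seconds, which is consistent with the system being reversible on the time scale of the assay.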

      What are the pros and cons of this method in comparison to, for example, FLIM-FRET?

      We have now explicitly addressed the pros and cons of our methodology compared to fluorescence-based systems:

      Pros: The NanoLUC-based reporter is 150 times brighter than conventional luciferases and has a significantly higher dynamic range (Hall et al., 2012), allowing detection of weak transient interactions. Importantly for this study, fluorescence-based methods such as FLIM-FRET and BRET are difficult to implement in yeast microplate assays due to the high levels of cellular autofluorescence. NanoBiT bypasses this issue, providing a high signal-to-noise ratio.

      Cons: Unlike FRET, NanoBiT requires the application of a substrate (furimazine). We did not include this disadvantage in the manuscript because it is not critical in a yeast study. Furimazine can be applied directly to the medium and readily permeates cells.

      The authors state that the method is semi-quantitative - can they document this?

      The semi-quantitative nature of the system is supported by its high dynamic range and the linear relationship between the luminescence signal and the amount of protein complex formed, as documented in Dixon et al. (2016). By using this system in a microplate setting, we were able to monitor relative increases or decreases in interaction levels over time across multiple metabolic conditions, providing a robust comparative analysis of metabolon dynamics.

      All of the conclusions are based on the quality of this method. I know that it has been used by others, but at least some preliminary documentation to address these questions is required.

      We acknowledge the reviewer’s concern regarding the reliance on the NanoBiT system. To ensure the reliability of our conclusions, we have included several lines of evidence to validate the method and demonstrate that the observed luminescence signals accurately reflect protein-protein interaction dynamics.

      To confirm the NanoBiT results using an independent biochemical approach, we performed an in vivo pull-down assay following glucose addition (Figure 2 – figure supplement 1A). The results demonstrate a reduction in the physical association between MDH1 and CIT1. This biochemical validation directly supports the reduction in interaction observed with the NanoBiT system during the Crabtree effect.

      We have provided protein abundance data for both MDH1 and CIT1 across the experimental conditions (Figure 2 – figure supplement 1&3; Figure 3 – figure supplement 1; Figure 4 – figure supplement 2). These results show only minor changes in protein levels, confirming that the fluctuations in the NanoBiT signal are independent of protein expression and represent genuine changes in metabolon assembly.

      To ensure the findings are reproducible, we have included MDH1-CIT1 interaction results from repeated independent experiments (Figure 2 – figure supplement 1&3; Figure 3 – figure supplement 1; Figure 4 – figure supplement 1). The consistency of the results across these trials confirms the robustness of the system in monitoring the metabolic regulation of this complex.

      We hope that these additional experimental validations, alongside the detailed technical description based on the established properties of the NanoBiT system (Dixon et al., 2016; Hall et al., 2012), provide the necessary documentation to satisfy the reviewer’s concerns regarding the quality and reliability of the method.

      Reviewer #2 (Public review):

      This study explores the dynamic association between malate dehydrogenase (MDH1) and citrate synthase (CIT1) in Saccharomyces cerevisiae, with the aim of linking this interaction to respiratory metabolism. Utilizing a NanoBiT split-luciferase system, the authors monitor protein-protein interactions in vivo under various metabolic conditions.

      Major Concerns:

      (1) NanoBiT Signal May Reflect Protein Abundance Rather Than Interaction Strength

      In Figure 1C, the authors report increased MDH1-CIT1 interaction under respiratory (acetate) conditions and decreased interaction during fermentation (glucose), as indicated by NanoBiT luminescence. However, this signal appears to correlate strongly with the expression levels of MDH1 and CIT1, raising the possibility that the observed luminescence reflects protein abundance rather than specific interaction dynamics. To resolve this, NanoBiT signals should be normalized to the expression levels of both proteins to distinguish between abundance-driven and interaction-driven changes.

      We agree that distinguishing between abundance-driven and interaction-driven changes is vital. To address this, we have included new data showing the relative protein levels of MDH1 and CIT1 across all experimental conditions. The protein levels were assessed using yeast lines expressing these proteins tagged with full-length NanoLUC luciferase (Figure 2 – figure supplement 1&3, Figure 3 – figure supplement 1, Figure 4 – figure supplement 2). Using the luminescence data of these relative protein levels, we have included plots showing normalized interaction index (Figure 2 – figure supplement 1G & 3D,H,L; Figure 3 – figure supplement 1D,H,L,P; Figure 4 – figure supplement 1D,H,L). This index was calculated by dividing the NanoBiT interaction signal by the product of the relative abundances of both proteins:

      Normalized interaction index = NanoBiT / (MDH1 × CIT1)

      In this formula, NanoBiT, MDH1, and CIT1 are the relative luminescence levels at each time point. This analysis clarified that the changes in the interaction signal significantly exceeded the fluctuations in protein levels, confirming that the dynamics are interaction-specific and not abundance-driven. To provide the most direct and transparent representation of the experimental measurements, we have chosen to keep the raw RLU data in the main figures and have moved the data related to protein abundance and normalization to figure supplements.
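
      The normalization described above (NanoBiT signal divided by the product of the two relative abundances, per time point) can be sketched as follows. This is a minimal illustration only: the function name and the example values are hypothetical and are not data from the study, and how each relative luminescence series is scaled (for example, to its own first time point) is left as an assumption.

```python
from typing import List, Sequence


def normalized_interaction_index(
    nanobit: Sequence[float],
    mdh1: Sequence[float],
    cit1: Sequence[float],
) -> List[float]:
    """Per time point: NanoBiT signal / (MDH1 abundance * CIT1 abundance).

    All inputs are relative luminescence levels sampled at the same time points.
    """
    if not (len(nanobit) == len(mdh1) == len(cit1)):
        raise ValueError("all three series must cover the same time points")
    index = []
    for n, m, c in zip(nanobit, mdh1, cit1):
        if m <= 0 or c <= 0:
            raise ValueError("relative abundances must be positive")
        index.append(n / (m * c))
    return index


# Made-up relative luminescence values for three time points (not study data).
nanobit_signal = [1.0, 0.6, 0.3]
mdh1_level = [1.0, 1.0, 0.9]
cit1_level = [1.0, 0.9, 1.0]

print(normalized_interaction_index(nanobit_signal, mdh1_level, cit1_level))
```

      In this toy series the index still falls markedly even though the abundances change by only about 10%, which is the kind of pattern that would point to an interaction-driven rather than abundance-driven change.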

      (2) Lack of Causal Evidence

      The study presents a series of metabolic perturbation experiments (e.g., arsenite, AOA, antimycin A, malonate) and correlates changes in metabolite levels with NanoBiT signals. However, these data are correlative and do not establish a functional role for the MDH1-CIT1 interaction in metabolic regulation. To demonstrate causality, the authors should implement approaches to specifically disrupt the MDH1-CIT1 interaction. One strategy could involve using a 15-residue peptide (Pept1) derived from the Pro354-Pro366 region of CIT1, previously shown to mediate the interaction, or introducing the cit1Δ3 (Arg362Glu) mutation, which perturbs binding. Metabolic flux analysis using <sup>13</sup>C-labeled glucose and mitochondrial respiration assays (e.g., Seahorse) could then assess functional consequences.

      We agree with the reviewer that the current dataset correlates metabolon assembly with metabolic states rather than establishing a direct causal proof of its functional role in regulating pathway flux.

      However, the primary objective of this manuscript was to establish the dynamic nature of the MDH1-CIT1 metabolon and to demonstrate the causal relationship between the changes in cellular conditions and metabolon dynamics through in vitro and in vivo assessments. Demonstrating that this canonical multienzyme complex undergoes reversible assembly and disassembly in vivo represents a major advance, as metabolon dynamics is a critical, yet previously unrevealed, factor involved in metabolic regulation. We aimed to define the specific environmental triggers that govern these dynamics, providing the necessary foundation for defining the functions of metabolons.

      We completely agree that establishing causality using interaction-deficient mutants coupled with metabolic flux analysis is another critical experiment to establish the functions of the TCA cycle metabolon. We have, in fact, been conducting these precise metabolic flux analyses on CIT1 mutants with disrupted interaction with MDH1. Because the functional consequences of complex disruption involve wide-reaching metabolic rerouting that requires extensive data presentation and modeling, this work forms a separate, comprehensive follow-up study that is currently in preparation for submission in the near future.

      To address this limitation in the current manuscript, we have carefully reviewed and revised the Abstract, Results, Discussion, and Conclusion sections (Lines 19-22; 205; 322-327; 341-342; 458-466). We have removed any language that may have inadvertently implied direct causality. We now explicitly state that our findings indicate the relationship between metabolon dynamics and respiratory conditions, and we have added a clear statement noting that the direct effects of this assembly on metabolic flux are the focus of our forthcoming studies.

      (3) Absence of Protein Expression Controls Under Perturbation Conditions

      In experiments involving acetate, arsenite, AOA, antimycin A, and malonate, the authors infer changes in MDH1-CIT1 association based solely on NanoBiT signals. However, no accompanying data are provided on MDH1 and CIT1 protein levels under these conditions. This omission weakens the conclusions, as altered expression rather than interaction strength could underlie the observed luminescence changes. Immunoblotting or quantitative proteomics should be used to confirm constant protein expression across conditions.

      As noted in our response to your first concern, we have now performed protein expression assessments for all experiments, including the perturbation conditions, such as acetate, arsenite, AOA (Figure 3 – figure supplement 1), antimycin A, cyanide, and malonate (Figure 4 – figure supplement 2). The results demonstrate that the protein levels of MDH1 and CIT1 remain relatively stable throughout these treatments and do not correlate with the large changes observed in the interaction signals. This is also demonstrated by the normalized interaction index, which confirms that the shifts in luminescence are driven by the dynamic assembly and disassembly of the MDH1-CIT1 metabolon rather than changes in protein concentrations.

      Conclusion:

      Although the central question is compelling and the use of NanoBiT in live cells is a strength, the manuscript requires additional experimental rigor. Specifically, normalization of interaction signals, introduction of causative perturbations, and validation of protein expression are essential to substantiate the study's claims.

      We sincerely thank the reviewer for recognizing the value of our central question and the strength of the live-cell NanoBiT system, as well as for your rigorous critique that has strengthened this manuscript. To address the concerns regarding experimental rigor, we have now provided extensive validation of MDH1 and CIT1 protein expression across all experimental conditions using yeast lines tagged with the full-length NanoLUC luciferase. These data demonstrate relatively stable protein expression, allowing us to calculate a normalized interaction index that substantiates that the observed luminescence shifts are driven by dynamic metabolon assembly rather than protein concentration. Regarding causative perturbations, we agree that introducing interaction-deficient mutants coupled with isotopic flux analysis is the critical next step to establish functional consequences. Because defining these pathway-wide rerouting events requires extensive modeling, this work will be reported in a follow-up study currently in preparation. Accordingly, we have carefully revised the manuscript to remove language implying direct causality, explicitly framing metabolon dynamics as an integral factor in metabolic regulation closely related to pathway activity and cellular metabolic states. We believe these new quantitative controls, normalizations, and textual clarifications thoroughly address the need for additional rigor and solidly substantiate our findings.

      Reviewer #3 (Public review):

      Summary:

      Metabolons are multisubunit complexes that promote the physical association of sequential enzymes within a metabolic pathway. Such complexes are proposed to increase metabolic flux and efficiency by channeling reaction intermediates between enzymes. The TCA cycle enzymes malate dehydrogenase (MDH1) and citrate synthase (CIT1) have been linked to metabolon formation, yet the conditions under which these enzymes interact, and whether such interactions are dynamic in response to metabolic cues, remain unclear, particularly in the native cellular context. This study uses a nanoBIT protein-protein interaction assay to map the dynamic behavior of the MDH1-CIT1 interaction in response to multiple metabolic stimuli and challenges in yeast. Beyond mapping these interactions in real time, the authors also performed GC-MS metabolomics to map whole-cell metabolite alterations across experimental conditions. Finally, the authors use microscale thermophoresis to determine components that alter the MDH1-CIT1 interaction in vitro. Collectively, the authors synthesize their collected data into a model in which the MDH1-CIT1 metabolon dissociates in conditions of low respiratory flux, and is stimulated during conditions of high respiratory flux. While their data largely support these models, some key exceptions are found that suggest this model is likely oversimplified and will require further work to understand the complexities associated with MDH1-CIT1 interaction dynamics. Nonetheless, the authors put forth an interesting and timely toolkit to begin to understand the interaction kinetics and dynamics of key metabolic enzymes that should serve as a platform to begin disentangling these important yet understudied aspects of metabolic regulation.

      We thank the reviewer for this thoughtful and constructive summary of our work. We appreciate the recognition of the novelty and utility of our experimental approach and the integrated analysis of MDH1–CIT1 interaction dynamics.

      We agree with the reviewer that, although our data largely support a model in which MDH1–CIT1 interaction correlates with respiratory activity, there are conditions that do not fully conform to this simplified framework. In the revised manuscript, we have addressed these apparent inconsistencies by providing detailed interpretations of the counterintuitive observations (e.g., ETC inhibition) and emphasizing that the MDH1–CIT1 interaction is modulated by changes in the mitochondrial matrix microenvironment associated with respiratory activity.

      Furthermore, we have revised the Discussion to highlight that the regulation of the MDH1–CIT1 interaction is likely multifactorial, involving the combined effects of pH, metabolites, and other unknown factors, which together enable fine-tuning of metabolic flux in fluctuating environments. This expanded perspective is now stated more clearly.

      We agree that identifying the precise molecular determinants of MDH1–CIT1 interaction dynamics will require additional mechanistic studies, such as systematic analyses using yeast mutants. While these experiments are an important next step, they are beyond the scope of the present study. We anticipate that the toolkit and framework established here will facilitate such future investigations.

      Strengths:

      (1) The authors address an important question: how do metabolon-associated protein-protein interactions change across altered metabolic conditions?

      (2) The development and validation of the MDH1-CIT1 nanoBIT assay provides an important tool to allow the quantification of this protein-protein interaction in vivo. Importantly, the authors demonstrate that the assay allows kinetic and real time assessment of these protein interactions, which reveal interesting and dynamic behavior across conditions.

      (3) The use of classic biochemical techniques to confirm that pH and various metabolites can alter the MDH1-CIT1 interaction in vitro is rigorous and supports the model put forth by the authors.

      We thank the reviewer for these positive and encouraging comments. We are pleased that the importance of the research question, the development of the MDH1–CIT1 NanoBiT assay, and the integration of in vivo and in vitro approaches were recognized. We especially appreciate the acknowledgment of the assay’s ability to capture dynamic and kinetic changes in protein–protein interactions, as well as the support provided by the biochemical analyses. We hope that the experimental framework established in this study will serve as a useful platform for further investigations into metabolon dynamics and metabolic regulation.

      Weaknesses:

      (1) Some of the data collected seem to be merely reported rather than synthesized and interpreted for the reader.

      We agree that explicitly synthesizing these findings is essential for clarity. To improve this, we have revised the Results section to include concise summary statements at the conclusion of each major experimental paragraph (Lines 190-191, 201, 218-219, 229-231, 241-242, 272-274, 282-283; 291-293). These additions interpret the data in relation to our main hypothesis. The discussion section was thoroughly revised to more precisely explain the logic supporting the model (Lines 381-393; 433-443, 458-466). Additionally, to bring together the entire dataset, we introduced a new summary schematic (Figure 6A). This figure visually and conceptually integrates our diverse findings, covering metabolic treatments, pH fluctuations, and complex metabolite profiles, showing how these signals work together to control multienzyme complex assembly.

      This is particularly true for data that seem to reflect more complex trends, such as the GC-MS experiments that map metabolites across multiple experiments, or treatments that show somewhat counterintuitive results, such as the antimycin A treatment, which promotes rather than disrupts the MDH1-CIT1 interaction.

      We agree that our complex datasets, including the metabolomics and the seemingly counterintuitive Antimycin A results, required deeper synthesis. To clarify the broader metabolic trends, we have added Figure 6A to visually map which factors, specifically pH, malate, fumarate, and aspartate, most consistently align with complex assembly. We revised the Discussion (Lines 390-393, 439-443) to explicitly conclude that no single variable predominantly governs the interaction, but it is coordinately regulated by multiple microenvironmental cues.

      Regarding the Antimycin A (and other ETC inhibitors) discrepancy, where the interaction is enhanced despite suppressed respiration, we have expanded our interpretation (Lines 346–358) to explain this as a transient response that is not directly reflected by steady-state respiratory activity. Specifically, we propose that acute perturbations of the mitochondrial matrix microenvironment, particularly changes in pH, temporarily promote MDH1–CIT1 interaction. Thus, under these conditions, transient microenvironmental changes can dominate over steady-state respiratory output in regulating metabolon assembly.

      The discussion paragraph about the imperfect relationship between pH and interaction has been revised to highlight our conclusion that mitochondrial matrix pH can be a contributing factor rather than the primary regulator (Lines 386-393).

      (2) Some of the assertions put forth in the manuscript are not substantiated by the data presented, and the authors are at times overly reliant on previous findings from the literature to support their claims. This is particularly notable for claims about "TCA cycle flux"; the authors do not perform flux analysis anywhere in their study and should be cautious when insinuating correlations between their observations and "flux".

      We appreciate the reviewer’s careful evaluation of our terminology and fully agree that claims regarding "flux" should be reserved for studies that employ direct isotopic flux measurements. In response to this constructive feedback, we have thoroughly reviewed the manuscript to ensure that our assertions are substantiated by the presented experimental data. We have carefully evaluated the use of the term "flux" throughout the Abstract, Introduction, and Discussion, replacing it with more accurate phrases such as "pathway activity," "respiratory activity," or "mitochondrial respiration" depending on the specific context (Lines 11; 20-21; 50; 111-112; 322-327; 329; 345; 349-350; 442-443; 458-466).

      We also removed a paragraph discussing the potential role of the MDH1-CIT1 metabolon in the malate-aspartate shuttle (Line 361). We realized that the paragraph was highly speculative and that our data do not directly support the hypothesis. The influence of the MDH1-CIT1 metabolon on the malate-aspartate shuttle is a major finding of an upcoming manuscript reporting its effects on metabolic network flux. We apologize for mixing up the results of two separate studies.

      Furthermore, we have revised our conclusions to avoid over-reliance on prior literature in making causal claims. We now explicitly frame the dynamic assembly of the MDH1-CIT1 metabolon as an integral factor in metabolic regulation, closely related to cellular metabolic states, rather than stating that it controls pathway flux (Lines 454-462). We believe these textual revisions accurately align our claims with our current observations and remove any unsubstantiated assertions.

      (3) The manuscript presentation could be improved. For figures, at times, the axes do not have intuitive labels (example, Figure 1A), data points and details about the number of samples analyzed are missing (bar graphs and box plots), and molecular weight markers are not reported on western blots. The authors refer to the figures out of order in the text, which makes the manuscript challenging to navigate as a reader.

      We thank the reviewer for these helpful suggestions to improve the clarity and presentation of the manuscript. We have made several revisions accordingly.

      First, axis labels have been revised throughout the figures to improve clarity and make them more intuitive. Second, we have added the number of biological replicates to the figure captions and updated bar graphs and box plots to display individual data points. Third, to improve the transparency of the immunoblot data, we have included molecular weight marker positions in Figure 1C and the corresponding full gel images in a new Figure 1 – figure supplement 2. Other immunoblot images have been moved to Figure 2 – figure supplement 1 since they lack molecular weight marker images.

      In addition, we have reorganized the figure panel labeling and corresponding text to improve the flow of the Results section. Specifically, figure subpanels are now arranged according to the measured parameters rather than treatment conditions, and the relevant sections describing TCA cycle manipulation and ETC inhibition have been revised to follow this updated figure order (Lines 208–231; 251–274). These changes improve the readability and logical progression of the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The grammar in the abstract, in the sentence which states "called metabolon", needs to be fixed.

      We thank the reviewer for pointing this out. We have revised the sentence in the Abstract to improve clarity. The revised sentence reads: “The tricarboxylic acid (TCA) cycle enzymes malate dehydrogenase (MDH1) and citrate synthase (CIT1) form a multienzyme complex, referred to as a metabolon, that channels intermediate oxaloacetate between their reaction centers.” (Lines 7-9)

      Reviewer #3 (Recommendations for the authors):

      Major points:

      (1) Much of the data reported in this manuscript reads as a summary of what was found, rather than distilling what the trends in the data mean or how they support the proposed model.

      We thank the reviewer for this comment. This concern overlaps with your previous point (Weakness 1), which we have addressed through revisions to improve synthesis and clarity. Specifically, we have added concise summary statements at the end of each major experimental section (Lines 190-191, 201, 218-219, 229-231, 241-242, 272-274, 282-283; 291-293), and we have included a new summary schematic (Figure 6A) that integrates the findings to illustrate how metabolic conditions and mitochondrial microenvironments relate to MDH1–CIT1 interaction. Together, these revisions improve the interpretation and clarify how the results support our model.

      For instance, in Figure 3, the authors use one metabolic treatment to activate the TCA cycle and two to inhibit the TCA cycle. In Figure 3M, GC-MS data are reported for select metabolites across these three conditions, as well as a control condition. However, these metabolites don't follow clean "trends" according to the predictions; as one example, malate is down in the TCA active (acetate) and one TCA inhibited condition (arsenite), whereas it is elevated in the second TCA inhibited (aminooxyacetate) condition. As an additional example, glutamate is down in the arsenite (inhibited) condition, slightly down in the acetate (activated) condition, but is unchanged in the AOA (inhibited) condition. Similar variability is seen in Figure 4M. What do these discrepancies mean? How do they support the model? As written, these data bring forth more questions than they answer.

      We appreciate the reviewer’s careful analysis of the metabolomics data in Figures 2E, 3M, and 4M. The reviewer notes that the levels of certain metabolites show complex patterns that do not simply reflect overall TCA cycle activity. We have acknowledged that our metabolomics dataset is a valuable resource for the research community and have added a brief paragraph to emphasize the complex metabolic phenotypes resulting from chemical treatments (Lines 422-431).

      As mentioned in the paragraph, this complexity is biologically expected. It likely arises from the distinct primary targets of each inhibitor, such as arsenite affecting redox-sensitive enzymes and AOA disrupting the malate-aspartate shuttle, as well as from off-target effects and the adaptive reorganization of intersecting metabolic networks to bypass local blockades. Rather than viewing these diverse metabolic phenotypes as discrepancies, we leveraged them to uncouple general respiratory suppression from specific metabolite pools, allowing us to independently assess their relationship with metabolon assembly.

      Furthermore, we note that our GC-MS analysis measures whole-cell metabolite levels, which represent the sum of multiple subcellular compartments and may not precisely reflect localized concentrations within the mitochondrial matrix that is directly affected by the TCA cycle. The description of this limitation of whole-cell metabolomics has been revised in Lines 417-420.

      (2) Why do the authors propose that antimycin A increases the interaction between MDH1 and CIT1 despite decreasing respiratory activity? Given the generalities proposed in Figure 6, this is important to address.

      We thank the reviewer for this comment. This point overlaps with Weakness 1, where we have addressed the apparent discrepancy associated with antimycin A (and other ETC inhibitors). Briefly, we have expanded our interpretation (Lines 349–360) to explain this effect as a transient response that is not directly aligned with steady-state respiratory activity. We propose that acute perturbations of the mitochondrial matrix microenvironment, particularly changes in pH, temporarily promote MDH1–CIT1 interaction. In addition, we have revised the Discussion (Lines 386–404) to clarify that mitochondrial matrix pH acts as a contributing factor rather than the primary regulator of the interaction. Together, these revisions reconcile the ETC inhibition by antimycin A with the overall model presented in Figure 6.

      (3) The authors use acetate to "activate" the TCA cycle; do other non-fermentable carbon sources also promote the MDH1-CIT1 interaction?

      We thank the reviewer for this insightful question. We have tested additional non-fermentable carbon sources and found that they did not significantly affect MDH1–CIT1 interaction (Figure 3 – figure supplement 1). We note that raffinose present in the medium likely provides a baseline carbon source supporting oxidative metabolism, which may limit the observable effects of these treatments (Lines 149-150).

      In addition, we performed a new experiment using ethanol. While ethanol treatment enhanced the MDH1–CIT1 interaction signal, it also increased the abundance of MDH1 and CIT1, resulting in a reduced interaction index. Because ethanol induces protein accumulation under our experimental conditions, this result is not straightforward to interpret. We have included this observation and its interpretation in the revised manuscript (Lines 208–211).

      (4) The authors show that the MDH1-CIT1 interaction is sensitive to pH. Is the MDH1-CIT1 interaction affected by uncouplers in vivo?

      We thank the reviewer for suggesting a meaningful experiment. We performed a new experiment examining the effect of the uncoupler CCCP on MDH1–CIT1 interaction in vivo (Figure 4—figure supplement 4). We found that CCCP treatment increased the interaction signal, consistent with the idea that acidification of the mitochondrial matrix promotes MDH1–CIT1 association.

      However, we observed that CCCP treatment also decreased the luciferase signals from MDH1 and CIT1 fused to full-length NanoLUC in an abnormal way, making the interaction index harder to interpret. Therefore, although these results support a possible role for pH in regulating the interaction, they should be viewed with caution, and we have included them as a figure supplement. This experiment and its interpretation have been added to the revised manuscript (Lines 276–283).

      (5) NADH is a potent suppressor of many enzymes within the TCA cycle, including MDH1 and CIT1. Can the authors modulate mitochondrial NADH through genetic manipulation of Ndi1, or through overexpression of mito-Lb-NOX (PMID: 27124460)?

      We thank the reviewer for this insightful suggestion. We agree that mitochondrial NADH is a potential regulator of the MDH1-CIT1 interaction, as it is a potent suppressor of many TCA cycle enzymes; indeed, we have previously shown that NADH inhibits the MDH-CS interaction in vitro (Omini et al., 2021; PMID: 34548590). For this reason, we investigated the mitochondrial matrix redox state, which is related to NADH levels, in the current study. The reviewer’s proposed strategy of using targeted genetic tools like mito-Lb-NOX or Ndi1 manipulation to specifically influence the NADH level is an elegant approach to isolate this variable. However, implementing this system requires generating, optimizing, and validating new yeast strains that harbor the targeted NADH-modulating constructs alongside the NanoBiT and full-length NanoLUC sensor systems. Because this extensive strain engineering and subsequent live-cell validation fall outside a feasible timeframe for the current manuscript revision, we must respectfully defer these experiments. We view the precise manipulation of the mitochondrial redox state via tools like mito-Lb-NOX as a complementary approach for our future work to systematically pinpoint the individual regulatory factors. We have expanded our Discussion (Lines 417-420; 462-465) to highlight the targeted genetic manipulation of the possible regulatory factors, including the NADH pool, as a critical future direction for dissecting these dynamics.

      (6) The authors should correct their figures:

      (a) Axes should be easy to interpret on graphs.

      (b) Individual datapoints should be shown on bar graphs and box plots. Minimally, the number of samples evaluated should be reported.

      (c) Molecular weight markers should be reported on blots.

      We thank the reviewer for these helpful suggestions. Points (a) and (b) overlap with Weakness 3, which we have addressed through revisions to improve figure clarity and data presentation. Specifically, axis labels have been revised to be more intuitive, the number of samples is now reported in the figure captions, and bar and box plots have been updated to include individual data points. For time-course data, we retained point-line plots, as alternative formats (e.g., bar or box plots) would reduce clarity due to the density of time points.

      For point (c), we have added molecular weight markers to the immunoblot data where available (Figure 1C). In the time-course experiment in the original Figure 2, molecular weight markers were absent from the gel images. Although we are confident in the identity of the detected signals, we have moved these data to a figure supplement (Figure 2—figure supplement 1C) to reflect this limitation. Similarly, the corresponding Co-IP data are now presented as a figure supplement (Figure 2—figure supplement 1A).

      Minor points:

      (1) In the last paragraph before the results, the authors refer to "the fluorescent biosensors", but start the paragraph discussing the nanoBIT PPI. After reading the manuscript, these seem to be distinct experimental setups, but that was not evident in the first read through of the paper.

      We thank the reviewer for pointing out this source of confusion. We apologize for the lack of clarity in distinguishing between the experimental approaches. In this study, the NanoBiT system was used to measure MDH1–CIT1 interaction, whereas fluorescent biosensors were used to assess mitochondrial matrix pH, redox state, and ATP levels. We have revised the paragraph to more clearly distinguish these methodologies and their respective roles in the study (Lines 105–112).

      (2) As mentioned above, referring to multiple figures out of order within the manuscript is very jarring for the reader. The authors should consider reworking the narrative or figures to be presented in order.

      We thank the reviewer for this comment. This concern overlaps with the previous comment regarding figure organization, which we have addressed by revising both the figure labeling and the corresponding text. Specifically, figure subpanels have been reorganized to follow the measured parameters rather than treatment conditions, and the Results sections describing TCA cycle manipulation and ETC inhibition have been revised to follow the updated figure order (Lines 208–231; 251–274). These changes improve the logical flow and readability of the manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      This study investigated how visuospatial attention influences the way people build simplified mental representations to support planning and decision-making. Using computational modeling and virtual maze navigation, the authors examined whether spatial proximity and the spatial arrangement of obstacles determine which elements are included in participants' internal models of a task. The study developed and tested an extension of the value-guided construal (VGC) model that incorporates features of spatial attention for selecting simpler mental task representations.

      Strengths:

      (1) Original Perspective:

      The study introduces an explicit attentional component to established models of planning, offering an approach that bridges perception, attention, and decision-making.

      (2) Methodological Approach:

      The combination of computational modeling, behavioral data, and eye-tracking provides converging measures to assess the relationship between attention and planning representations.

      (3) Cross-validated data:

      The study relies on the analysis of three separate datasets, two already published and an additional novel one. This allows for cross-validation of the findings and enhances the robustness of the evidence.

      (4) Focus on Individual Differences:

      Reports of how individual variability in attentional "spillover" correlates with the sparsity of task representations and spatial proximity add depth to the analysis.

      We thank the Reviewer for their overall positive assessment of our work and their helpful comments. We have addressed each point below.

      Weaknesses:

      (1) Clarity of the VGC model and behavioral task:

      The exposition of the VGC model lacks sufficient detail for non-expert readers. It is not clear how this model infers which maze obstacles are relevant or irrelevant for planning, nor how the maze tasks specifically operationalize "planning" versus other cognitive processes.

      The method for classifying obstacles as relevant or irrelevant to the task and connecting metacognitive awareness (i.e., participants' reports of noticing obstacles) to attentional capture is not well justified. The rationale for why awareness serves as a valid attention proxy, as opposed to behavioral or neurophysiological markers, should be clearer.

      We thank the reviewer for urging further clarity here. Our work builds closely on the previous maze navigation paradigm and VGC model developed and reported by Ho et al. (2022, Nature). We directly adopted variants of their maze stimuli, computational model and obstacle awareness measures, and married these with an investigation of the role of visuospatial attention. We agree that it would be useful for the reader to have a more in-depth description of the paradigm and model, and how it operationalises planning, without needing to refer back to the original Ho et al. paper. We have now added additional explanatory sections to the Introduction and Methods as follows:

      On page 4:

      “One elegant approach to forming such a simplified representation is to adaptively select the granularity of information required to complete the task (Ho et al., 2022), known as value-guided construal (VGC). Unlike previous accounts, which model human planning as a search over all items (e.g., tube lines), the VGC model predicts that a cognitively limited decision-maker selects a manageable subset of information over which to plan (i.e., a task representation), balancing utility and complexity (Ho et al., 2022). In our example, the VGC algorithm would plan over a few relevant tube lines rather than planning over all possible stations. To select the representation that achieves the best balance between utility and complexity, the model searches across all possible combinations of tube lines, computing the value (i.e., the plan’s utility minus its cost) of each representation for planning a specific journey. The algorithm then selects the representation with the highest value, which ensures that an ideal observer selects a representation which only includes the items (i.e., tube lines) that lead to successful planning while excluding as many items as possible to keep the plan as simple as possible. For our purposes, items included in the representation are considered task-relevant, while items that are not represented are considered task-irrelevant. This algorithm, therefore, provides a normative standard of an efficient plan to which we can compare people’s actual plans.”
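
      The selection step described in the passage above can be illustrated with a short Python sketch. It is a toy illustration under stated assumptions, not the implementation of Ho et al. (2022): the function names (`all_construals`, `select_construal`, `toy_utility`), the fixed per-item complexity cost, and the numbers in the example are all hypothetical. It shows only the enumerate-and-maximise logic, where value = utility minus cost.

```python
from itertools import combinations


def all_construals(items):
    """Yield every subset of the candidate items, including the empty subset."""
    for size in range(len(items) + 1):
        for subset in combinations(items, size):
            yield frozenset(subset)


def select_construal(items, utility, cost_per_item):
    """Return the subset with the highest value and that value.

    value = utility(subset) - cost_per_item * len(subset)

    `utility` is a caller-supplied function (hypothetical here) that scores how
    well a plan built from only this subset would perform on the real task.
    The linear complexity cost is an assumption made for illustration.
    """
    best, best_value = None, float("-inf")
    for construal in all_construals(items):
        value = utility(construal) - cost_per_item * len(construal)
        if value > best_value:
            best, best_value = construal, value
    return best, best_value


def toy_utility(construal):
    """Assumed toy task: only obstacles "A" and "C" actually block the route."""
    needed = {"A", "C"}
    # Each needed obstacle missing from the representation costs 5 units.
    return 10.0 - 5.0 * len(needed - construal)


chosen, value = select_construal(["A", "B", "C", "D"], toy_utility, cost_per_item=1.0)
```

      In this toy case the selected representation contains only the two obstacles that matter for the plan; adding either of the other obstacles lowers the value through the complexity cost, and dropping a needed one lowers it through the utility term. Exhaustive enumeration grows exponentially with the number of items, so it is practical only for small item sets such as this one.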

      On page 6:

      “We operationalized planning using a maze navigation paradigm, akin to our tube-related example, where participants were required to plan a route through the maze, avoiding obstacles that blocked their path. Obstacles predicted by the sVGC model to be included in the representation were considered task-relevant.”

      “At the end of every trial, participants reported their awareness of specific obstacles (see Methods for details). The level of awareness reported for different obstacles provides a read-out of what features of the environment individuals were subjectively representing while solving a particular maze. While other markers of attention and awareness (for instance, behavioural or neurophysiological variables) could also be used, here we focused on direct awareness reports in order to relate our findings both to those of Ho and colleagues and to the subjective awareness reports used in consciousness science (e.g. the Perceptual Awareness Scale (Barnett et al., 2024; Overgaard & Sandberg, 2021; Ramsøy & Overgaard, 2004; Samaha et al., 2015)). Participants were instructed to maintain central fixation while planning (see dataset dSC 1), in line with previous empirical work using this task (Ho et al., 2022).”

      To visualize our effects, we binarized the predictions of the sVGC model such that obstacles with a marginalized probability greater than 0.5 were considered task-relevant, while other obstacles were considered task-irrelevant (e.g., Figure 2b). We now clarify this point in the caption of Figure 2.
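
      The binarization rule can be written out as a minimal Python sketch. The function name `binarize_relevance`, the obstacle labels, and the probabilities are hypothetical; only the 0.5 threshold and the strict "greater than" comparison come from the text above.

```python
def binarize_relevance(marginal_probs, threshold=0.5):
    """Map each obstacle to True (task-relevant) if its marginalized
    probability is strictly greater than the threshold, else False."""
    return {obstacle: p > threshold for obstacle, p in marginal_probs.items()}


# Hypothetical model outputs for three obstacles in one maze.
probs = {"obs_1": 0.92, "obs_2": 0.50, "obs_3": 0.07}
labels = binarize_relevance(probs)
```

      Note that an obstacle sitting exactly at 0.5 is labeled task-irrelevant under this rule, because the comparison is strict.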

      (2) Attention framework:

      The account of attention is largely limited to the "spotlight" model. When solving a maze, participants trace the correct trail, following it mentally with their overt or covert attention. In this perspective, relevant concepts are also rooted in attention literature pertaining to object-based attention using tasks like curve tracing (e.g., Pooresmaeili & Roelfsema, 2014) and to mental maze solving (e.g., Wong & Scholl, 2024), which may be highly relevant and add nuance to the current work. This view of attention may be more pertinent to the task than models of simultaneously tracking multiple objects cited here. Prior work (notably from the Roelfsema group) indicates that attentional engagement in curve-tracing tasks may be a continuous, bottom-up process that progressively spreads along a trajectory, in time and space, rather than a "spotlight" that simply travels along the path. The spread of attention depends on the spatial proximity to distractors - a point that could also be pertinent to the findings here.

      Moreover, the tracing of a "solution" trail in a maze may be spontaneous and not only a top-down voluntary operation (Wong & Scholl, 2024), a finding that requires a more careful framing of the link to conscious perception discussed in the manuscript.

      Conceptualizing attention as a spatial spotlight may therefore oversimplify its role in navigation and planning. Perhaps the observed attentional modulation reflects a perceptual stage of building the trail in the maze rather than a filter for a later representation for more efficient decision making and planning. A fuller discussion of whether the current model and data can distinguish between these frameworks would benefit readers.

      We thank the reviewer for highlighting relevant findings in the attention literature that were missing from our discussion. We fully agree that a complete account of the interplay of planning, navigation, and attention is likely to recruit the kind of curve-tracing processes highlighted by the reviewer. However, we emphasise that our current focus is not on the process of navigation through a maze, but on the process of construing the maze itself. In other words, we are focused not on how people represent their path from A to B, but how they represent the maze itself, which they then use as a basis for planning between A and B. The VGC model predicts that a subset of obstacles will be included in this construal. We think that a spotlight model is a good starting point for this work, because attention is being deployed across the whole maze stimulus, and then becomes attached to particular objects located in particular positions. This is a distinct process from that involved in navigating the path itself. Accordingly, our stimuli were designed such that task-relevant obstacles could be presented either proximally or distally to the optimal path (e.g., Figure 1a and Supplemental Figures S1-6). An obstacle that blocks any possible path on one side of the maze is task-relevant but located a long way from the optimal path. The results of Ho and colleagues’ (2022) third experiment demonstrate how task-relevant yet distal obstacles are better remembered than task-irrelevant proximal obstacles (see Figure 4 of Ho et al., 2022). We also observed that obstacles further away from the navigation path were often represented by participants (see Figures S1-6), which cannot be explained by curve tracing alone.

      While these results cannot definitively rule out the possibility that participants automatically trace the path while also construing the maze, they suggest that the value-guided construal process is an independent predictor of participants’ representations beyond proximity to the navigated path. To make this distinction clearer, we now cite the papers alluded to by the reviewer, in the Discussion on pages 28-29, while also acknowledging the potential for investigating attention during the navigation process itself:

      “Future work may also wish to examine the relevance of visuospatial attention for the navigation process itself in this task. While our present findings speak to how individuals perceive the maze while planning, it remains unclear how attention is deployed during navigation along a path, such as how object-based attention progressively spreads along trajectories in time and space (Pooresmaeili & Roelfsema, 2014; Wong & Scholl, 2024).”

      There is also one additional nuance to the current spotlight model that we were inspired to consider by the reviewer’s comment. This is the idea that attentional effects may spread within or along the obstacles themselves. We cannot explore this in the current data because we asked for awareness of the entire obstacles, not parts of obstacles, but it may be possible to explore this in future work, for instance, with eye tracking measures.

      More generally, the growth-cone (i.e., zoom lens) model of attention for curve tracing proposed by Roelfsema and colleagues shares considerable similarities with the spotlight of attention model. Both models argue for the grouping of spatially proximal items based on attention. While the growth-cone model argues for varying sizes of zoom lenses (i.e., receptive fields of neurons) that facilitate the tracing of proximal items, both models predict that spatially proximal items are preferentially processed together because of attention. Indeed, the spotlight model could model these varying zoom lenses by altering the width of the attentional spotlight dynamically across the visual scene based on the spatial proximity of obstacles. Following related comments by Reviewer 2, we now investigate inter-individual differences in the attentional spotlight of participants and find that these differences significantly predict participants’ mental representations (see Attentional spotlight model of task representations). We have now updated the Discussion to include consideration of these alternative model frameworks:

      On page 27:

      “Second, in the current work we were unable to distinguish whether these attentional effects are driven by a fixed spotlight of attention, or whether attention operates akin to a zoom lens, shifting the ‘width’ of the focus of attention according to the task demands (Eriksen & St. James, 1986; Müller et al., 2003; Schad & Engbert, 2012). The latter view would be consistent with growth-cone models of attention in which the focus of attention expands and contracts in accordance with task demands, mirroring the various receptive field sizes in the visual hierarchy (Pooresmaeili et al., 2014; Pooresmaeili & Roelfsema, 2014). In partial support of this idea, we found significant inter-individual differences in the width of participants’ attentional spotlight (Figure S11). It is also possible that attention is deployed within or along parts of obstacles, rather than on entire obstacles. Future work using naturalistic measures of eye movements may be able to address these questions.”

      (3) Lateralization of attention:

      The analysis considers whether relevant information is distributed bilaterally or unilaterally across the visual display, but does not sufficiently address evidence for attentional asymmetries across the left and right visual fields due to hemispheric specialization (e.g., Bartolomeo & Seidel Malkinson, 2019). Whether effects differ for left versus right hemifield arrangements is not made explicit in the presented findings.

      We thank the reviewer for this suggestion. To address this point, we fitted a three-way interaction model between VGC model prediction, lateralization index, and side (left vs right hemifield). We did not find evidence for the three-way effect (β= 0.01, SE= 0.02, 95% CI [-0.03, 0.04], p = 0.738; ΔBIC = 58.30 in favour of the null effect; see table below), suggesting that the side to which participants lateralized their attention did not influence their task representations. This result is now reported on page 12:

      “This effect did not vary significantly as a function of the specific hemifield (i.e., left vs right) in which task-relevant information was presented (β= 0.01, SE= 0.02, 95% CI [-0.03, 0.04], p = 0.738; ΔBIC = 58.30 in favour of the null effect; see table S14).”

      We also explored inter-individual differences in participants’ tendency to lateralize their attention (see also the next point). We observed that participants tended to lateralize their attention slightly more to the right-hand side for non-lateralized maze stimuli, despite the normative sVGC model predicting that participants should not lateralize their attention for these stimuli (Figure 3c). These results may speak to potential asymmetries in lateralization, but given the exploratory nature of these analyses, they should be verified and replicated in future work.

      (4) Individual differences:

      Individual differences in attentional modulation are a strength of the work, but similar analyses exploring individual variation in lateralization effects could provide further insight, and the lack of such analyses may mask important effects.

      Thank you for this suggestion. In new analyses, we explored i) whether participants exhibited differences in their tendency to lateralize their awareness reports, and ii) whether the degree to which they tended to lateralize their awareness predicted their performance on a separate set of maze stimuli. In short, we observed substantial variation in participants’ tendency to lateralize their awareness (Figure S11) and found that this tendency reflected an inter-individual difference which was stable across maze types. We report these new findings on pages 14-16.

      “Inter-individual variation in lateralization of attention

      Next, we investigated participants’ tendency to pay attention to obstacles within a single hemifield (left vs right) regardless of the sVGC model predictions. To do so, we computed an awareness lateralization index (ALI) based on participants’ self-reported awareness reports of obstacles on each trial (Figure 3a). Large positive values indicate that participants were preferentially aware of the right hemifield, whereas negative values indicate preferential awareness of the left hemifield. Values close to zero indicate that participants paid attention to both hemifields equally (see Methods for details). We observed that participants’ tendency to lateralize their awareness varied greatly across the Ho datasets 1 and 2 (Figure 3b); some participants preferentially paid attention to a single hemifield, regardless of whether the sVGC model predictions were lateralized. For the dSC1 dataset, we observed that on some trials, participants significantly lateralized their awareness (|ALI| > 0.5; Figure 3c) even though the sVGC model predictions were non-lateralized. These findings suggest that participants’ tendency to pay attention to a single hemifield may represent an observable inter-individual difference in how they allocate their awareness to form task construals.”

      “To further explore these inter-individual differences, we tested whether participants’ tendency to lateralize their attention to a single hemifield was consistent across trials and maze stimuli. We observed that participants’ tendency to lateralize their attention to a single hemifield was similar for left and right lateralized maze stimuli (Spearman ⍴= 0.72, Figure 3d). This suggests that participants who preferentially attended to a single hemifield did so regardless of which hemifield they should attend to. More consequentially, the tendency for participants to lateralize their awareness on maze stimuli whose model predictions were also lateralized linearly correlated with participants’ tendency to lateralize their attention on non-lateralized maze stimuli (Spearman ⍴= 0.88, Figure 3d). Taken together, these findings emphasize that some individuals tend to preferentially attend to a single hemifield when planning. This tendency, importantly, represents an inter-individual difference in how participants allocate their attention across various maze types.”

      (5) Distinction between overt and covert attention:

      The current report at times equates eye movement patterns with the locus of attention. However, attention can be covertly shifted without corresponding gaze changes (see, for example, Pooresmaeili & Roelfsema, 2014).

      We fully agree, and thank the reviewer for prompting further reflection on this distinction. In the online experiments run by Ho and colleagues (i.e., datasets Ho1 and Ho2), participants’ eye movements were not tracked, so these datasets cannot disambiguate whether participants were engaging in covert or overt attention to sample maze obstacles. In our third experiment (i.e., dataset dSC1), we both recorded eye movements and explicitly instructed participants to fixate centrally while viewing the maze. This ensured that participants oriented their attention only covertly during planning (see Figures S13-14).

      We now elaborate on this important distinction in the Results section of the manuscript, page 12:

      “In addition, we monitored participants’ eye movements in dataset dSC 1 to ensure that attention shifts would be covert as opposed to overt—a distinction which could not be determined in the online samples of datasets Ho 1 and 2.”

      On page 28:

      “Importantly, while the visuospatial attention effects observed in the Ho 1 and 2 datasets are likely driven by both covert and overt shifts in attention, the findings presented in experiment 3 (i.e., dSC1 dataset) rule out the contribution of overt shifts in attention through the use of eye tracking (see Figures S13-14) (Carrasco, 2011; Pooresmaeili & Roelfsema, 2014).”

      The implications for interpreting the relationship between eye movement, memory, and attention in this setting are not fully addressed. The potential dynamics of attention along a maze trajectory and their impact on lateralization analysis would benefit from further clarification.

      We thank the reviewer for urging more clarity here. The attentional dynamics we document in our study concern how people perceive / construe the maze itself, rather than how they deploy their attention to guide active navigation. We have now sought to make this distinction clear at a number of points in the paper. The core idea is that attention acts as an early filter to select which obstacles are part of a task construal, which then affects both awareness and memory.

      We have now clarified the focus of our study in the introduction on pages 5-7:

      “Our focus in this study was to examine how participants perceive and represent their environment (the maze stimulus). This is a distinct process to how participants orient their attention during navigation itself, which is not part of our current study. To do so, we harness classical signatures of attentional selection to characterise how visuospatial attention shapes awareness of maze obstacles during planning.” … “Our focus in the present study was to examine attentional effects on participants’ perception of the maze stimulus. We did not quantify how individuals deploy their attention in the phase in which they were navigating through the maze.”

      We did not explicitly test for memory effects in our new experiments, but Ho and colleagues demonstrated that the sVGC model predicted not only awareness reports, but also participants’ memory of obstacles (see Ho et al., 2022). Indeed, task representations computed from memory or awareness reports were strikingly similar in their experiments (Spearman ⍴ = 0.86 between memory accuracy and awareness; ⍴ = 0.86 between confidence in memory and awareness). In relation to eye movements, we refer the reviewer back to our previous response, which details how eye movements were measured and controlled during maze construal.

      Figure 1 legend (b) --> (c)

      We have corrected this typo in the figure caption.

      Reviewer #2 (Public review):

      Summary:

      Castanheira et al. investigate the role of spatial attention for planning during three maze navigation experiments (one new experiment and two existing datasets). Effective planning in complex situations requires the construction of simplified representations of the task at hand. The authors find that these mental representations (as assessed by conscious awareness) of a given stimulus are influenced by (spatially) surrounding stimuli. Individual participants varied in the degree to which attention influenced their task representations, and this attentional effect correlated with the sparsity of representations (as measured by the range of awareness reports across all stimuli). Spatially grouping task-relevant information on either the left or right side of the maze led to mental representations more similar to optimal representations predicted by the value-guided construal (VGC) model - a normative model describing a theoretical approach to simplifying complex task information. Finally, the authors propose an update to this model, incorporating an attentional spotlight component; the revised descriptive model predicts empirical task representations better than the original (normative) VGC model.

      Strengths:

      The novelty of this study lies in the proposal and investigation of a cognitive mechanism through which a normative model like value-guided construal can enable human planning. After proposing attention as this mechanism, the authors make concrete hypotheses about mismatches between the VGC predictions and real human behavior, which are experimentally validated. Thus, not only does this study describe a possible mechanism for simplification of task information for planning, but the authors also propose a descriptive model, revising VGC to incorporate this attentional component.

      A strength of this paper is the variety of investigative approaches: analysis of existing data, novel experiment, and a computational approach to predict experimental findings from a theoretical model. Analyzing pre-existing datasets increases the size of the participant cohort and strengthens the authors' conclusions. Meanwhile, comparing the predictions of the existing normative model and the authors' own refined model is a clever approach to substantiate their claims. In addition, the authors describe several crucial controls, which are key to the interpretability of their results. In particular, the eye tracking results were critical.

      In summary, this paper constitutes an important step toward a more complete understanding of the human ability to plan.

      We thank the Reviewer for their thoughtful and positive assessment of our findings. We also appreciate the constructive feedback on our methodology, which we believe has substantially improved our manuscript.

      Weaknesses:

      (1) There is a critical conceptual gap in the study and its interpretation, mainly due to the reliance on a self-report metric of awareness (rather than an objective measure of behavioral performance).

      a. Awareness is tested by a 9-point self-report scale. It is currently unclear why awareness of task-irrelevant obstacles in this task would necessarily compromise optimal planning. There is no indication of whether self-reported awareness affects performance (e.g., navigation path distance, time to complete the maze, number of errors). Such behavioral evidence of planning would be more compelling.

      We thank the reviewer for prompting further reflection on the connection between construal and navigation performance. We wish to emphasise that the primary focus of our study was on measuring and modeling participants’ task construals using perceptual awareness judgments, building on the methods developed by Ho and colleagues, rather than on navigation performance itself. However, as the reviewer points out, there is a natural relationship between construal and performance – if you represent the wrong obstacles, plans may be disrupted.

      To explore the relationship between task construals and performance on the navigation task, we first regressed out the effects of the sVGC model on participants’ awareness reports and computed the mean squared residuals for each trial. We then used these values to predict participants’ navigation response times on each trial. We observed a significant negative relationship, suggesting that on trials where participants’ representations showed greater deviations from the normative model, they were in fact faster at navigating the mazes. This relationship was surprising, and at odds with the initial idea that adhering to normative VGC aids in task performance. However, we think that this direction of effect may make sense if one considers that a large part of the actual construal (rather than the normative prediction) in our data was in fact driven by effects such as lateralisation which are not accounted for by the sVGC model. If one is faster at harnessing inductive biases such as lateralisation, then one may be faster to complete the maze but also show a greater deviation from the predictions of the original model.

      To further explore these effects, we next focused on the distinction between lateralised and non-lateralised mazes. Here, we reasoned that the initial phase of lateralised attentional selection would lead to lateralised mazes being easier to navigate than non-lateralised ones. We conducted new analyses to determine whether participants navigated lateralized maze stimuli faster and with fewer moves than maze stimuli with non-lateralized model predictions. As detailed in Methods, we excluded trials in which participants significantly deviated from the optimal number of moves (9 or more moves) and took longer than 20 seconds to solve the maze. In line with our interpretation that attention operates as an inductive bias, participants were faster and deviated less from the optimal path on lateralized compared to non-lateralized mazes.

      We now report these new results on navigation performance on pages 20-21:

      “Maze navigation performance

      The previous analyses focused on participants’ task representations during planning. We next sought to explore links between participants’ task representations and maze navigation performance. Participants performed the maze navigation task near-ceiling: they solved 95% of maze stimuli in under 20 seconds, with minimal deviation from the optimal path (i.e., 9 moves or fewer). Notwithstanding this limited variance in task performance, we explored whether participants’ task construals may have impacted their navigation speed. To do so, we first regressed out the effects of the sVGC model from participants’ awareness reports and used the mean squared residuals for each trial to predict response times (see Methods for details). Surprisingly, we observed a negative relationship between mean squared residual variance and response times (β = -0.31, SE = 0.05, 95% CI [-0.41, -0.21], p < 0.001), indicating that participants were faster on trials where the sVGC model explained less variance in their awareness reports. In other words, trials in which participants deviated more from the sVGC model predictions were solved faster. We note that one reason for this may be the strong influence of the lateralisation effect on navigation performance (see paragraph below), which itself is not part of the sVGC model prediction.”

      “We then explored whether participant performance differed between lateralised and non-lateralised mazes. Here, we reasoned that the initial phase of lateralised attentional selection would lead to lateralised mazes being easier to navigate than non-lateralised ones. Consistent with this hypothesis, participants were faster (β = -0.04, SE = 5.91*10<sup>-3</sup>, 95% CI [-0.06, -0.03], p < 0.001) and followed the optimal path more closely (β = -0.59, SE = 0.09, 95% CI [-0.78, -0.40], p < 0.001) when maze stimuli were more lateralized.”
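
      The residual analysis quoted above can be sketched as follows. This is only an illustrative outline: it uses a pooled ordinary least squares fit rather than the hierarchical models described in the manuscript, the helper names `fit_line` and `trial_msr` are hypothetical, and the toy data are invented to show the mechanics rather than to reproduce any reported estimate.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def trial_msr(trials):
    """Mean squared residual per trial after regressing awareness on sVGC predictions."""
    # Pool all obstacles across trials for a single fit (an assumption;
    # the manuscript describes hierarchical models instead).
    x = [v for t in trials for v in t["vgc"]]
    y = [a for t in trials for a in t["awareness"]]
    intercept, slope = fit_line(x, y)
    return [
        mean([(a - (intercept + slope * v)) ** 2
              for v, a in zip(t["vgc"], t["awareness"])])
        for t in trials
    ]

# Toy data (invented for illustration only): two obstacles per trial.
trials = [
    {"vgc": [0.0, 1.0], "awareness": [0.0, 1.0], "rt": 5.0},
    {"vgc": [0.0, 1.0], "awareness": [0.0, 1.0], "rt": 5.0},
    {"vgc": [0.0, 1.0], "awareness": [1.0, 0.0], "rt": 3.0},
]
msr = trial_msr(trials)
_, beta = fit_line(msr, [t["rt"] for t in trials])
print(msr, beta)
```

      In this toy example the third trial deviates most from the model prediction and has the shortest response time, so the fitted slope is negative, which is the direction of effect described in the quoted passage.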

      And in the Discussion section, on page 23:

      “Mental representations and task performance

      We observed that participants were faster and deviated less from the optimal path on maze stimuli that were lateralized. This effect is not predicted by the original sVGC model but dovetails with the interpretation that early visuospatial attention operates as an inductive bias to guide the formation of simplified task representations. Surprisingly, we also observed that participants were faster to navigate mazes on trials where their simplified task representation deviated from the sVGC model prediction. We interpret this seemingly contradictory finding in the following way: there are several factors beyond the sVGC model – including, for instance, maze lateralisation – that predict both construal and performance on the maze navigation task. Further work is needed to understand how inductive biases such as lateralisation shape both construal and performance, and the real-world benefits that such strategies might afford for naturalistic stimuli.”

      b. Relatedly, it would have been more convincing to have an objective measure of awareness, for instance, how the presence or absence of a "task-irrelevant" obstacle affects performance (e.g., change navigation path distance or time to complete the maze), or whether participants can accurately recall the location of obstacles.

      We thank the reviewer for prompting further reflection on the validity and robustness of our awareness measures. We emphasise however that our focus is not (primarily) on maze navigation performance, but on task construal, which as noted in our previous response may come apart from navigation performance for a variety of reasons. Our primary goal is to measure participants’ subjective awareness of the maze as a marker of their idiosyncratic (conscious) mental representation on each trial. In doing so, we build on a rich tradition of measuring subjective awareness in consciousness and perception science (for instance, work using the Perceptual Awareness Scale, or detection judgments). In this sense, we think our awareness scale (following Ho et al.) represents a valid and straightforward way of assessing our target psychological construct. However, we also agree with the Reviewer that convergent evidence from other measures is always valuable. In Ho and colleagues’ original paper, they developed a variant of the maze task where participants had to recall the location of obstacles, as well as rate their awareness (Exp 3) and a variant in which participants could hover their mouse over hidden obstacles in the maze to reveal their location – an online metric of attentional deployment (Exp 4). These data afforded us the opportunity to validate the awareness reports against an objective measure of recall, as suggested by the Reviewer. In reanalysing these data, we observed that the obstacle awareness and memory/hover measures were strikingly correlated within two independent samples of participants (Spearman ⍴ = 0.86 between memory accuracy and awareness; ⍴ = 0.86 between confidence in memory and awareness; ⍴ = 0.76 between the probability of hovering over the obstacle and awareness; ⍴ = 0.65 between the duration of the mouse hovering and awareness). 

      These re-analyses are now reported on page 22 of our manuscript, to highlight the convergent validity of the awareness metric:

      “Finally, we examined the convergent validity of participants’ awareness reports by reanalyzing the memory recall data reported in Ho and colleagues’ experiment (Ho et al., 2022). We reasoned that participants should demonstrate similar task representations regardless of the measure used to probe the construal. In line with this prediction, we observed that the obstacle awareness reports and memory/hover measures were strikingly correlated within three independent samples of participants (Spearman ⍴ = 0.86 between memory accuracy and awareness; ⍴ = 0.86 between confidence in memory and awareness; ⍴ = 0.76 between the probability of hovering over the obstacle and awareness; ⍴ = 0.65 between the duration of the mouse hovering and awareness; see Tables S18 and S19).”

      c. Consequently, I'm not sure that we can conclude that the spatial context does impact participants' ability to plan spatial navigation or to "incorporate task-relevant information into their construal". We know that the spatial context affects subjective (self-reported) awareness, but the authors do not present evidence that spatial context affects behavioral performance.

      Following the line of argument above, we think it’s important to separate out task construal (the simplified representation of the maze, measured by awareness reports), and the impact of this on navigation and other aspects of behaviour. The awareness reports (and other convergent measures) show that task-relevant information (as predicted by the VGC) is incorporated into the construal, a process which is modulated by spatial context. These are the key targets of our modeling. Whether this impacts performance is a distinct question, and one that we now address in our response to point a above.

      d. Another concern that may complicate interpretation is the following: Figure 3c shows improved VGC model predictions (steeper slope) for mazes with greater lateralization. However, there are notable outliers in these plots, where a high lateralization index does not correspond to good model performance. There is currently no discussion/explanation of these cases.

      The Reviewer astutely points out some outliers in our analysis. While on average lateralized maze stimuli are represented more closely to the sVGC model, there are indeed some noticeable outlier mazes. These mazes represent stimuli in which participants tended to lateralize their attention to the ‘wrong hemifield’ (e.g., participants were more aware of obstacles in the right hemifield despite the sVGC model predicting that obstacles in the left hemifield were task-relevant). We believe this explains the poor sVGC model fits on these trials. We note, however, that on average participants were capable of attending to the correct hemifield without explicit instructions (i.e., 9 out of 12 mazes).

      We have now included a discussion of these outliers in the results section of the paper on page 12:

      “We note that for three maze stimuli whose model predictions were lateralized there was nevertheless a poor fit to the sVGC model (see Figure 2c, right panel). These outliers correspond to maze stimuli where participants, on average, lateralized their attention to the incorrect hemifield (i.e., the opposite hemifield to that predicted by the sVGC model).”

      (2) I noticed an issue with clarity regarding task-relevance. It is currently not fully clear which obstacles are "task irrelevant". Also, the term is used inconsistently, sometimes conflating with "awareness". For example, in the "Attentional spotlight model of task representations" section, the authors state that "task-relevant information becomes less relevant when surrounded by task-irrelevant information". But they really mean that participants become less aware of those task-relevant obstacles. I assume task-relevance is an objective characteristic related to maze organization, not to a participant's construal. Indeed, the following paragraph provides evidence of model predictions of awareness.

      We apologize for any confusion regarding the terminology of our manuscript. We indeed use the terms task-relevant and task-irrelevant to refer to obstacles that are objectively predicted by the normative sVGC model or the attentional spotlight model to be included in (>0.5) or excluded from (<0.5) task construals, respectively. This designation reflects the predictions from the computational model and does not reflect participants’ reported awareness. We then ran linear hierarchical models to predict participants’ awareness reports from these model predictions. The Reviewer is correct that the task-relevance of obstacles is indeed related to the maze’s organization, and not related to participants’ subjective reports of awareness. We have now clarified this point throughout the manuscript to better emphasize the difference between the model predictions of task-relevance and participants’ subjective reports.

      On page 17:

      “To achieve this, we computed the predictions of the existing VGC model for each obstacle’s task relevance in a given maze, and averaged these predictions within an attentional spotlight of 3 squares (Figure 4a & S8, see Methods for details). This process yielded novel model predictions, whereby some obstacles which were once predicted as task-irrelevant by the normative sVGC are now predicted as task-relevant by the attentional spotlight model. We depict the effects of this spatial spotlight in Figure 4a: task-irrelevant stimuli (plotted in grey; see middle left obstacle) neighbouring task-relevant obstacles (plotted in orange) become more task-relevant, whereas task-relevant information becomes less relevant when surrounded by task-irrelevant information (see bottom right orange obstacle). This deviation in model predictions from the normative sVGC model was used to predict participants’ awareness reports. We hypothesized that this spotlight-VGC model would predict participants’ reports better than the original VGC model, which does not account for spatial attention.”
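
      The averaging step described in this passage can be illustrated with a minimal sketch. Several details here are assumptions rather than the manuscript's implementation: each obstacle is reduced to a single grid position, distance is measured as Chebyshev distance in squares, the obstacle's own prediction is included in the average, and the function name `spotlight_predictions` is hypothetical.

```python
def spotlight_predictions(obstacles, width=3):
    """Average VGC predictions over all obstacles within `width` squares.

    obstacles: dict mapping an obstacle name to ((row, col), vgc_prediction).
    Returns a dict mapping each name to its spotlight-averaged prediction.
    """
    averaged = {}
    for name, ((row, col), _) in obstacles.items():
        # Collect predictions of every obstacle inside the spotlight,
        # including the obstacle itself (distance 0).
        inside = [
            pred
            for (r2, c2), pred in obstacles.values()
            if max(abs(row - r2), abs(col - c2)) <= width
        ]
        averaged[name] = sum(inside) / len(inside)
    return averaged

# Toy maze (invented): A is task-relevant, B is a nearby irrelevant obstacle,
# C is a distant task-relevant obstacle with no neighbours in the spotlight.
toy = {"A": ((0, 0), 1.0), "B": ((0, 2), 0.2), "C": ((8, 8), 1.0)}
print(spotlight_predictions(toy))
```

      In this toy case the prediction for B rises because it sits next to A, the prediction for A falls for the same reason, and C is unchanged, which mirrors the two effects described for Figure 4a.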

      (3) The behavioral paradigm has some distinct disadvantages, and the validity of the task is not backed up by behavioral data.

      a. I understand the need for central fixation, but it also makes the task less naturalistic.

      The fixation cross was required on every trial such that participants could maintain central fixation for our eye tracking experiment. While this design is less naturalistic, it allows us to examine the eye movements of participants. Requiring participants to fixate during the ‘planning’ phase of the experiment allowed us to isolate the effects of covert attention from changes in awareness due to overt shifts in attention. In other words, differences in participants’ awareness reports in the 3rd experiment cannot be explained by longer fixation times to specific obstacles.

      b. The task with its top-down grid view does not seem to mimic real human navigation. Though this grid may be similar to mental maps we form for navigation, the sensory stimuli corresponding to possible paths and to spatial context during real-life navigation are very different.

      We agree with the reviewer that while our task is engaging for participants and simple to follow, it does not mimic naturalistic navigation in humans. There is a natural tension in computational / experimental work in cognitive science in wanting to build closely on previous results and paradigms, while ensuring that results can generalise to real-world contexts. Here, our choice of paradigm and measures was closely built on previous papers using this task from Ho and colleagues (2022, 2023). While preparing this response, we learnt that the MIT group had also harnessed this same task to develop a novel dynamic variant of the VGC model (Chen et al., 2026) called the Just in Time model (JIT). The advantage of building on this prior work is that we are able to iteratively refine and expand the VGC approach, and (in our case) bring it into closer contact with work on modeling the deployment of spatial attention in human vision. The top-down aspect of the maze notably facilitated the study of the spatial deployment of attention. We now discuss the novel dynamic variant of the VGC model in our paper on page 27:

      “We close by reflecting on opportunities for further work in this area. First, an important next step is to explore the process by which task representations are formed, and how inductive biases might affect the process of task construal. The sVGC model is a normative model of the optimal task representation. Since its construction involves an exhaustive calculation over possible paths, it is not a plausible basis for a model of the psychological process by which participants actually construct task representations. More recently a process model of task construal has been proposed, the Just in Time model (JIT). The hypothesis of the JIT model is that participants’ task representations are built up over time by iteratively simulating possible paths through the maze, affording insight into the construal process (Chen et al., 2026). In future work, it would be of interest to ask whether the attentional effects we observe in our experiments could be meshed with a dynamic JIT account of construal. We speculate that visuospatial attention may operate as an early filter, limiting the space of potential construals based on coarse spatial features of the environment, constraining a dynamic selection of obstacles. Brain imaging techniques with high time resolution, such as M/EEG, may be able to shed further light on how task representations are formed as participants plan.”

      c. Behavioral performance is not reported, so it is unknown whether participants are able to properly complete the task. The task seems pretty difficult to navigate, especially when the obstacles disappear, and in combination with the central fixation.

      Behavioural performance is now reported in response to point 1a above.

      d. There is no discussion of whether/how this navigation task generalizes to other forms of planning.

      We fully agree that an important next step would be to generalise our results on construal to naturalistic forms of planning – for instance, using immersive VR mazes, and/or investigating cognitive rather than perceptual construals. We have now added a line to this effect to the Discussion on page 28.

      “An important next step to further our understanding of task representations would be to extend the current paradigm to other forms of planning and more naturalistic tasks, such as navigating immersive virtual reality (VR) environments, planning over cognitive rather than perceptual representations (e.g. planning over an abstract space), or internally-guided planning based on working memory.”

      Reviewer #2 (Recommendations for the authors):

      (1) There are, of course, benefits to simple tasks like the ones described, but it would be interesting to compare the results to a possible experiment in which a top-down grid/map is used for planning, but then task execution is carried out in a simulated environment corresponding to the map. Also, perhaps beyond the scope of the questions addressed in this paper, but I am curious how unexpected obstacles affect representations. For instance, if participants plan based on a top-down map and then begin "real" navigation but encounter an unexpected obstacle that was not indicated on the map, does this modulate representations/awareness of future obstacles (near vs. far)?

      We fully agree that all of these lines of investigation would be super interesting to pursue in future studies, and we have added a line to the discussion to that effect on page 28:

      “An important next step to further our understanding of task representations would be to extend the current paradigm to other forms of planning and more naturalistic tasks, such as navigating immersive virtual reality (VR) environments, planning over cognitive rather than perceptual representations (e.g. planning over an abstract space), or internally-guided planning based on working memory.”

      (2) Regarding self-reported awareness as a metric, an additional experiment could ask participants to recreate the maze (identify locations of obstacles after they disappear). This would be a more objective measure of awareness.

      Yes indeed, and as described above, this was a metric used by Ho and colleagues in their previous experiment. As we describe in more detail above, the task representations obtained via memory or awareness reports demonstrated striking similarity (⍴ = 0.86).

      (3) What is meant by "all possible orientations of the maze" in this Methods sentence: "For dataset dSC 1, participants solved each of these 24 mazes four times (i.e., all possible orientations of the maze)"?

      We thank the Reviewer for prompting more clarity here. We vertically and horizontally reversed mazes (i.e., left-right flipped) such that participants could not predict the location of the goal or start location. In this way, each maze stimulus had four potential orientations. This resulted in 96 trials of 24 unique mazes. We have clarified this point in the Methods section on page 30:

      Maze stimuli were vertically and horizontally reversed (i.e., left-right flipped) such that participants could not predict the location of the start or goal location. This resulted in four potential orientations of each maze across all 24 mazes, 96 trials in total.
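
      As a small illustration of how four orientations per maze yield 96 trials from 24 mazes, the sketch below generates the original layout, its left-right mirror, its up-down mirror, and the combination of both. Reading "vertically and horizontally reversed" as these two mirror flips is an assumption, and the function name `four_orientations` and the toy grid are hypothetical.

```python
def four_orientations(maze):
    """Return the original maze, its left-right flip, up-down flip, and both.

    maze: list of rows, each a string or list of cell symbols.
    """
    rows = [list(r) for r in maze]
    left_right = [r[::-1] for r in rows]    # mirror across the vertical meridian
    up_down = rows[::-1]                    # mirror across the horizontal meridian
    both = [r[::-1] for r in up_down]       # equivalent to a 180-degree rotation
    return [rows, left_right, up_down, both]

# Toy 3x3 grid (invented): S = start, G = goal, # = obstacle, . = free square.
toy_maze = ["S.#", "..#", "..G"]
orientations = four_orientations(toy_maze)
n_trials = 24 * len(orientations)  # 24 unique mazes, four orientations each
print(n_trials)
```

      Because the start and goal move with each flip, a participant cannot anticipate their location from the underlying layout alone.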

      (4) For lateralization, it was unclear until reading the Methods that the lateralization index was calculated using the VGC-predicted level of task-relevance. From the main text and Figure 2, I assumed you were just counting the number of task-relevant obstacles on each side, rather than also quantifying relevance. I understood after reading the Methods, but this could be clarified further.

      We agree with the Reviewer that this was not evident from the text. We have now updated the Results section of the manuscript to clarify this point on page 11:

      “To test this hypothesis, we derived a measure of task-relevant lateralization inspired by the attention literature (Ghafari et al., 2024; Keefe & Störmer, 2021; Vollebregt et al., 2015) (Figure 2a). Specifically, we separated maze stimuli across the vertical meridian and computed the ratio of task-relevant information presented on the left versus right side derived from the sVGC model. For example, the maze shown in Figure 2a has twice as much task-relevant information in the left hemifield as in the right (lateralization index = 1/3). A lateralization index of 0.0 indicates that both hemifields contain equal amounts of task-relevant information (i.e., non-lateralized). The lateralization index was computed using the continuous VGC predictions for each obstacle (see Methods).”
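
      The worked example in this passage (twice as much task-relevant information on the left, index of 1/3) is consistent with a normalized difference between hemifields. The sketch below assumes that form, |L − R| / (L + R), computed over continuous sVGC predictions; the exact formula is given in the Methods and may differ (for instance, it may be signed). The function name `lateralization_index` and the input format are hypothetical.

```python
def lateralization_index(obstacles):
    """Normalized left/right difference in summed task relevance.

    obstacles: list of (x, relevance) pairs, where x is the obstacle's
    horizontal position relative to the vertical meridian (negative = left)
    and relevance is the continuous sVGC prediction for that obstacle.
    Obstacles lying exactly on the meridian (x == 0) are ignored here.
    """
    left = sum(rel for x, rel in obstacles if x < 0)
    right = sum(rel for x, rel in obstacles if x > 0)
    total = left + right
    if total == 0:
        return 0.0  # no task-relevant information on either side
    return abs(left - right) / total

# Two fully relevant obstacles on the left, one on the right.
print(lateralization_index([(-2, 1.0), (-1, 1.0), (3, 1.0)]))
```

      Using the continuous predictions rather than a count of obstacles means that a single highly relevant obstacle can outweigh several weakly relevant ones on the opposite side.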

      (5) The explanation in the Methods of how the width of the attentional spotlight was chosen references Figure 1b and Supplementary Figure S2, but it seems that Supplementary Figure S8 explains this more in the caption. Also, I don't see how Figure S2 supports this.

      We apologize for this typo. The explanation of how we selected the width of the attentional spotlight should indeed reference Supplementary Figure S15 (previously Figure S8). We have now corrected this and elaborated on this choice in the Methods section on page 35:

      “We fixed the ‘width’ of the attentional spotlight to a distance of 3 squares based on the observation that the two neighbouring obstacles positively predicted the awareness of a probe. We observed that the mean and median distance between neighbouring obstacles of the 2nd rank (i.e., second closest) was 3 squares away for all mazes (Figure S15). We therefore opted to fix the value of the attention spotlight to 3 squares based on these observations. Future work utilizing this model should consider the statistics of their maze stimuli when deciding on the ‘width’ of the attentional spotlight.”

      (6) The attentional spotlight width was assumed to be 3 squares, based on the linear regression predictions of the effect of neighboring obstacles on stimulus awareness. Given the individual differences across participants, it would be interesting to choose a different attentional spotlight size for each participant. Would a participant-specific attentional spotlight width improve the predictions of the spotlight-VGC model?

      The Reviewer highlights a very interesting question: do individuals vary in terms of their attentional spotlight? To test this hypothesis, we first estimated the size of the attentional spotlight for each individual based on lateralized maze stimuli, and then used this to generate personalized attentional spotlight model predictions for each subject based on these values (Figure S11). We restricted this analysis to the dSC1 dataset, where we had substantially more trials (96 in total).

      In brief, we observed that indeed the personalized spotlight model fit participants’ awareness reports better than both a normative sVGC model and a group-level attentional spotlight model. We interpret these findings with some caution as i) a subset of individuals had flat attentional slopes and therefore were excluded from these analyses, and ii) we believe we require additional trials to ensure a robust model fit at the individual level. While our results are encouraging, we hope future investigations into inter-individual differences will extend these findings.

      We have included these additional analyses in the main text.

      On page 18:

      “To further explore inter-individual differences in task construal, we tested whether adjusting the attentional spotlight width to each participant’s awareness reports improved the predictions of the attentional spotlight model. To do so, we first determined the width of the attentional spotlight for each individual in the dSC1 dataset based on lateralized maze stimuli. We then generated person-specific attentional spotlight model predictions for the non-lateralized maze stimuli to avoid overfitting the data (Figure S11). We note that 7 participants had either flat attentional slopes or negative beta coefficients, which prevented the selection of an appropriate attentional spotlight width (see Methods for details). We observed a significant improvement in model fit for the person-specific attentional spotlight model relative to both the group-level attentional spotlight model (ΔBIC= -1487.39) and the normative sVGC model (ΔBIC= -1655.29). While the limited number of trials per participant in our current dataset warrants caution in interpreting these findings, they do encourage further research on inter-individual differences in attentional deployment during planning.”
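      For readers unfamiliar with the comparison metric, the sketch below shows how a ΔBIC of this kind is typically computed from the standard definition BIC = k·ln(n) − 2·ln(L). The sign convention (candidate minus reference, so that negative values favour the candidate) is inferred from the negative values reported as improvements above. The function names and the numbers in the example are illustrative and are not taken from the manuscript.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: k * ln(n) - 2 * ln(L). Lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def delta_bic(candidate_bic, reference_bic):
    """Candidate minus reference; negative values favour the candidate model."""
    return candidate_bic - reference_bic


# Made-up log-likelihoods for two models fitted to the same 500 observations.
reference = bic(log_likelihood=-320.0, n_params=3, n_obs=500)
candidate = bic(log_likelihood=-300.0, n_params=4, n_obs=500)
difference = delta_bic(candidate, reference)
```

      Because the penalty term grows with the number of parameters, a person-specific model only yields a negative ΔBIC when its gain in likelihood outweighs the cost of the extra parameters.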

      On pages 23-24:

      “Inter-individual differences in attention

      We also observed considerable inter-individual differences in attentional effects across participants (Figure 1c). While some participants were strongly influenced by the spatial context of neighbouring stimuli, others showed more limited evidence for an attentional effect (Figure 1b). Inter-individual differences in attention predicted the sparsity of participants’ simplified representations: participants with larger attention effects exhibited sparser representations. Moreover, these inter-individual differences in effects of spatial proximity could be incorporated into the attentional spotlight model by varying the width of the spotlight, resulting in better model predictions.”

      “Beyond these spatial proximity effects, we also observed that participants varied in their tendency to lateralize their attention to a single hemifield (Figure 3). This tendency was observed across all three datasets, including on maze stimuli whose value-guided model predictions were not lateralized. This suggests that although a strategy of allocating attention is sub-optimal for these maze stimuli, some individuals preferentially attend to a single hemifield in a heuristic-like fashion. This tendency to attend to a single hemifield was a robust inter-individual difference across maze stimuli (Figure 3d), and dovetails with individual-level variation in spatial proximity effects. Taken together, these findings offer novel insights into how people vary in the ways they allocate spatial attention to solve complex problems. Future research could explore how these individual differences constrain performance on other tasks that require planning and search in high-dimensional spaces.”

      On page 17 of the Supplemental Materials:

      (7) The supplementary text about lateralization effects, above Supplementary Table S8, references Table S6, but Table S6 does not seem to display lateralization results.

      We thank the Reviewer for pointing out this typo: we now refer to the correct supplementary table (S9).

      (8) Why does it matter that "the maze stimuli were not designed to test horizontal-meridian lateralization effects"? What is the effect on power? Is it because there is not a good enough range in lateralization indices? It would be good to clarify, or just remove that explanation, since the cortical retinotopy explanation seems more convincing.

      We did not specifically design the maze stimuli such that there is an equal number of obstacles above and below the horizontal meridian. As such, the lateralization index derived along the horizontal meridian does not control for the number of obstacles in each hemifield, which may influence participants’ awareness reports. In contrast, we designed maze stimuli such that this would not be a concern for the vertical meridian. We have clarified this point in the discussion on page 27.

      “Third, while we observed clear lateralization effects along the vertical meridian (i.e., left vs right hemifield), effects along the horizontal meridian were less clear (i.e., above vs below; see Tables S15-16). One potential explanation of this asymmetry is the retinotopic organization of the cortex, in which spatially adjacent stimuli can be retinotopically distant if presented on opposite sides of the vertical (but not horizontal) meridian, facilitating distractor inhibition. Importantly, while the visuospatial attention effects observed in the Ho 1 and 2 datasets are likely driven by both covert and overt shifts in attention, the findings presented in experiment 3 (i.e., dSC1 dataset) rule out the contribution of overt shifts in attention through the use of eye tracking (see Figures S13-14) (Carrasco, 2011; Pooresmaeili & Roelfsema, 2014).”

      (9) For Figure 2c, it would be helpful to directly state what each dot and line mean.

      We updated the caption of Figure 2c to clarify what we are plotting: each point represents an obstacle, and each line the linear fit for a maze stimulus.

      “Each point represents an obstacle in a maze, and each line represents the model fit for that specific maze stimulus.”

      (10) Figures and wording imply there is only a single probe obstacle per trial, but methods and model imply that participants are asked to report awareness for every obstacle. This should be clarified.

      We apologize for any confusion regarding the methodology of our study. The Reviewer is correct that participants reported their awareness of every obstacle presented on a given trial. We have clarified this in the Results section of the manuscript on page 7:

      “Note, participants reported their awareness of every obstacle presented on a given trial.”

      We have also updated the caption of Figure 1 to clarify this point:

      “Once participants finished navigating the maze, they were asked to report their awareness of every obstacle presented on a given trial in a random order.”

      (11) What is the reason for the exclusion of participants (33 for experiment 1 and 26 for experiment 2)?

      Participants were excluded from the Ho et al. datasets 1 and 2 based on their preregistered exclusion criteria, as detailed in the Methods section of their paper. In short, trials were excluded if participants took longer than 20 seconds to complete the trial, or if they spent longer than 5 seconds in the initial state. Participants were excluded if less than 80% of trials remained after reaction time exclusions or if they failed 2 out of 3 comprehension checks. We have elaborated on this point in the Methods section on page 31.

      “Participants were excluded from analyses based on pre-registered exclusion criteria as detailed in (Ho et al., 2022). In short, participants were excluded if 20% or more of their trials were removed based on reaction times, or if they failed 2 out of 3 comprehension checks.”

      (12) The supplemental figures are not referenced in order, and some are not referenced at all; this should be fixed.

      We thank the Reviewer for pointing this out and have reorganized our Supplementary materials accordingly.

      Reviewer #3 (Public review):

      Summary:

      The authors build on a recent computational model of planning, the "value-guided construal" framework by Ho et al. (2022), which proposes that people plan by constructing simple models of a task, such as by attending to a subset of obstacles in a maze. They analyze both published experimental data and new experimental data from a task in which participants report attention to objects in mazes. The authors find that attention to objects is affected by spatial proximity to other objects (i.e., attentional overspill) as well as whether relevant objects are lateralized to the same hemifield. To account for these results, the authors propose a "spotlight-VGC" model, in which, after calculating attention scores based on the original VGC model, attention to objects is enhanced based on distance. They find that this model better explains participant responses when objects are lateralized to different hemifields. These results demonstrate complex interactions between filtering of task-relevant information and more classical signatures of attentional selection.

      Strengths:

      (1) The paper builds on existing modeling work in a novel manner and integrates classic results on attention into the computational framework.

      (2) The authors report new and extensive analyses of existing data that shed light on additional sources of systematic variability in responses related to attentional spillover effects.

      (3) They collect new data using new stimuli in the original paradigm that directly test predictions related to the lateralization of task-relevant information, including eye tracking data that allows them to control for possible confounds.

      (4) The extended model (spotlight-VGC) provides a formal account of these new results.

      We thank the Reviewer for their positive assessment of our manuscript and their insightful comments, which have improved the clarity of our findings.

      Weaknesses:

      (1) The spotlight-VGC model has a free parameter - the "width" of the attentional spotlight. This seems to have been fixed to be 3 squares. It would be good if the authors could describe a more principled procedure for selecting the width so that others can use the model in other contexts.

      Our choice for this parameter was informed by the spatial effects reported in Figure 1b. We observed that the two closest neighbouring obstacles to a probe had similar awareness (i.e., positive beta weights). We therefore computed the mean and median distances between obstacle pairs that were the second closest obstacle to a probe. This distance was 3 squares away, as depicted in Figure S15. We fixed the width of the attentional spotlight across all studies based on this observation. We agree that future research utilizing this model may need to tune this hyperparameter depending on the mean distance between a probe and its neighbours.

      We have clarified this point in the methods section on page 35:

      “We fixed the ‘width’ of the attentional spotlight to a distance of 3 squares based on the observation that the two neighbouring obstacles positively predicted the awareness of a probe. We observed that the mean and median distance between neighbouring obstacles of the 2nd rank (i.e., second closest) was 3 squares away for all mazes (Figure S15). We therefore opted to fix the value of the attention spotlight to 3 squares based on these observations. Future work utilizing this model should consider the statistics of their maze stimuli when deciding on the ‘width’ of the attentional spotlight.”

      Following the suggestion of Reviewer 2 point 6, we now also explored inter-individual differences in this parameter. To do so, we first used the lateralized mazes in the dSC1 dataset to determine the optimal width of the attentional spotlight for each individual.

      Then, we used this spotlight to derive model predictions for each person. We observed that these personalized attentional spotlight model predictions fit participants’ awareness reports on non-lateralized mazes better than the fixed-width spotlight model. We believe this preliminary result suggests the importance of modelling inter-individual differences in attentional deployment during planning. We report these effects on page 17.

      (2) Have the authors considered other ways in which factors such as attentional spillover and lateralization could be incorporated into the model? The spotlight-VGC model, as presented, involves first computing VGC predictions and only afterwards computing spillover. This seems psychologically implausible, since it supposes that the "optimal" representation is first formed and then it gets corrupted. Is there a way to integrate these biases directly into the VGC framework, perhaps as a prior on construals? The authors gesture towards this when they talk about "inductive biases", but this is not formalized.

      We thank the reviewer for bringing up this very important point. We think that a full computational treatment of the inductive bias would be a distinct project, but now seek to expand our discussion on the mechanisms by which representations could be formed. In this context, we specifically highlight novel computational work from the MIT group that was published as a preprint in the time since we submitted our paper, and which proposes a new process account of construal, the “Just in Time” (JIT) model. We also elaborate on a possible mechanism by which visuospatial attention may aid the dynamics of the construal process. In short, we agree with the reviewer that spatial attention may bias individuals to search over a subset of potential representations based on low-level spatial characteristics of the obstacles (e.g., their spatial spread in the visual field), prior to (or in concert with) a dynamic JIT-like selection process. We now elaborate on these possibilities on pages 27-28:

      “We close by reflecting on opportunities for further work in this area. First, an important next step is to explore the process by which task representations are formed, and how inductive biases might affect the process of task construal. The sVGC model is a normative model of the optimal task representation. Since its construction involves an exhaustive calculation over possible paths, it is not a plausible basis for a model of the psychological process by which participants actually construct task representations. More recently, a process model of task construal has been proposed, the Just in Time model (JIT). The hypothesis of the JIT model is that participants’ task representations are built up over time by iteratively simulating possible paths through the maze, affording insight into the construal process (Chen et al., 2026). In future work, it would be of interest to ask whether the attentional effects we observe in our experiments could be meshed with a dynamic JIT account of construal. We speculate that visuospatial attention may operate as an early filter, limiting the space of potential construals based on coarse spatial features of the environment, constraining a dynamic selection of obstacles. Brain imaging techniques with high time resolution, such as M/EEG, may be able to shed further light on how task representations are formed as participants plan.”

      […]

      “Fourth, it will also be necessary to elaborate on how bottom-up and top-down aspects of attentional selection are combined to guide complex task representations and plans. Foundational questions remain unanswered, for instance: can multiple spatial locations be preferentially selected at once, i.e. are there multiple spotlights (Awh & Pashler, 2000; McMains & Somers, 2004; Pylyshyn & Storm, 1988; Shaw & Shaw, 1977)? There is also discourse on how spatial attention may move from one location to another: are the intervening visual regions between attended locations similarly selected (Dubois et al., 2009; Kr & Np, 1999; McMains & Somers, 2004, 2005)? Our findings tentatively suggest that individuals are able to attend to disparate spatial regions to form sparse task representations, yet there is substantial variability in how individuals orient their attention during the task. The present paradigm and computational modelling, in conjunction with carefully designed stimuli, may help resolve these outstanding questions.”

      (3) Can the authors rule out that the lateralization effects are the result of memory biases since the main measure used is a self-report of attention?

      We thank the reviewer for bringing up this important point. In our experiments, we sought to measure participants’ subjective awareness of the maze stimuli as a readout of their conscious task representation on each trial. This approach marries an extensive literature on measures of perceptual awareness in consciousness science (e.g., using the Perceptual Awareness Scale) with computational models of planning. Participants’ memory of (their awareness of) the obstacles is inherent to this approach, but just as with similar approaches in consciousness science (e.g. measures of iconic memory in the Sperling paradigm), we think it provides a reasonably “online” measure of awareness. It’s important of course to ensure that results obtained with awareness reports are not idiosyncratic, and generalise to other approaches to quantifying task representations.

      To further bolster the convergent validity of our awareness measure, we reanalyzed the data from Ho and colleagues. In their original paper, they developed a variant of the maze-navigation task where participants were asked to recall the location of obstacles as well as report their awareness (Exp 3) and a third variant of the task where participants could hover their cursors over hidden obstacles to reveal their locations (Exp 4). These data allowed us to validate the awareness reports against objective measures of recall and mouse-tracking data. We observed that the subjective awareness reports of participants were strikingly correlated with recall/hover measures across two independent samples of participants (Spearman ⍴ = 0.86 between memory accuracy and awareness; ⍴ = 0.86 between confidence in memory and awareness; ⍴ = 0.76 between the probability of hovering over the obstacle and awareness; ⍴ = 0.65 between the duration of the mouse hovering and awareness). We believe these findings validate participants’ awareness reports. These findings are now reported on page 22 of the manuscript.

      “Finally, we examined the convergent validity of participants’ awareness reports by reanalyzing the memory recall data reported in Ho and colleagues’ experiment (Ho et al., 2022). We reasoned that participants should demonstrate similar task representations regardless of the measure used to probe the construal. In line with this prediction, we observed that the obstacle awareness reports and memory/hover measures were strikingly correlated within three independent samples of participants (Spearman ⍴ = 0.86 between memory accuracy and awareness; ⍴ = 0.86 between confidence in memory and awareness; ⍴ = 0.76 between the probability of hovering over the obstacle and awareness; ⍴ = 0.65 between the duration of the mouse hovering and awareness; see Tables S18 and S19).”

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public review)

      Summary:

      In this manuscript the authors derive a mean-field model for a network of Hodgkin-Huxley neurons retaining the equations for ion exchange between the intracellular and extracellular space.

      The mean-field model derived in this work relies on approximations and heuristic arguments that, on the one hand, allow a closed-form derivation of the mean-field equations, and on the other hand restrict its validity to a limited regime of activity corresponding to quasi-synchronous neuronal populations. Therefore, rather than an exact mean-field representation, the model provides a description of a mesoscopic population of connected neurons driven by ion exchange dynamics.

      Strengths:

      The idea of deriving a mean-field model which relates the slow-timescale biophysical mechanism of ion exchange and transportation in the brain to the fast-timescale electrical activities of large neuronal ensembles.

      Weaknesses:

      The idea underlying this work is not completely implemented in practice.

      The derived mean-field model does not show a one-to-one correspondence with the neural network simulations, except in strongly synchronous regimes. The agreement with the in vitro experiment is hardly evident, both for the mean-field model and for the network model. The assumptions made to derive the closed-form equations of the mean-field model have not been justified on any biological grounds; they merely allow for the mathematical derivation. The final form of the mean-field equations does not clarify whether or not microscopic variables are used together with macroscopic variables in an inconsistent mixture.

      Comments on revisions:

      The main weaknesses I listed in the first report are still present, since the authors did not answer my questions on a solid basis. I report the list for completeness:

      (1) It seems that the reduction methodology that is employed is not the most suitable one for the single-neuron model they are considering.

      (2) The formulation of the mean-field derivation is unnecessarily complicated. It could be heavily simplified by following previously published approaches to derive biologically realistic neural masses.

      (3) The model seems to work only for highly synchronized situations and not for the standard asynchronous evolution usually observed in neural circuits.

      Therefore, my statement remains unchanged.

      Reviewer #2 (Public review)

      Summary:

      The authors aim to develop a neural mass model characterized by a few collective variables that mimics the dynamics of a network of Hodgkin-Huxley neurons encompassing ion-exchange mechanisms. They describe in detail the derivation of the mean-field model, then compare experimental results obtained for the hippocampus of a mouse with the neural network simulations and the mean-field results. Furthermore, they report a bifurcation analysis of the developed model and a simulation of a small network containing various coupled neural masses, thereby moving towards the simulation of an entire connectome.

      Strengths:

      The authors attempt to develop a mean-field model for a globally coupled network of heterogeneous Hodgkin-Huxley neurons with an explicit ion-exchange mechanism between the cell interior and exterior.

      Weaknesses:

      (1) They do not employ the reduction methodology more suited for the single neuron model they consider.

      (2) Their derivation of the neural mass model is based on several assumptions, not all of which are well justified.

      (3) Their formulation of the mean-field derivation is unnecessarily complicated; it can be strongly simplified by following previously published approaches to derive biologically realistic neural masses.

      (4) Their model seems to work only for highly synchronized situations and not for the standard asynchronous evolution usually observed in neural circuits.

      General Statements:

      The authors honestly declare the many limitations of their approach; once these are taken into account, the mean-field results are, as expected, somewhat inconsistent with the neural network simulations.

      The authors suggest employing this model for whole-connectome simulations to follow seizure propagation; however, I believe that a simpler model, such as the Epileptor, remains superior in this respect. The present model does include biophysical parameters, but their correspondence with those employed in the network dynamics remains elusive, owing to the many assumptions required to derive the mean-field model. Furthermore, since it is more complicated than the Epileptor, I do not think that the present model will be widely employed by the community.

      Comments on revisions:

      The authors have corrected the mistakes present in the manuscript and provided a correct list of references.

      However, they refuse

      (1) To simplify the formulation of the model. The model contains unnecessary complications, as I clearly wrote in my report; the authors agree, but they do not want to change the formulation;

      (2) To derive the mean-field model in a simpler way, which is possible and which I asked for many times in my Referee report. This would help readers understand the important aspects of the derivation without unnecessary and confusingly complicated formulations;

      (3) To compare direct simulations of the network with neural mass results in sub-section "Bifurcation analysis: emergent network states and multistability" to show bistability, as I asked.

      As a matter of fact, the modifications performed do not resolve my previous doubts about the validity of the results reported in the manuscript.

      Therefore, my previous assessments remain valid.

      We thank the editors and the two reviewers for their continued engagement with our manuscript. The three weaknesses retained from the first round are essentially identical between the two public reviews:

      (i) The reduction methodology is not the most suitable for the single-neuron model we consider;

      (ii) The mean-field derivation is unnecessarily complicated;

      (iii) The model works only in highly synchronous regimes and does not reproduce the asynchronous evolution typical of neural circuits.

      Both reviewers explicitly note that their assessments remain unchanged and we have decided not to alter the formulation of the model. We use this response to state—on the public record—exactly where we agree with the reviewers, where we disagree, and why.

      On point (i): the reduction methodology.

      We fully agree with the reviewers' technical observation: the Ott–Antonsen / Lorentzian-ansatz reduction in the form introduced by Montbrió, Pazó and Roxin (2015) is exact for canonical Type I neurons (QIF), whose membrane-potential equation is quadratic, and is not directly applicable to a Type II / Hodgkin–Huxley-type neuron whose voltage dynamics is cubic-like. On this point there is no disagreement.
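      For reference, the exact reduction referred to here is usually written as follows. This is a sketch in the common notation of Montbrió et al. (2015), with the membrane time constant set to one and instantaneous synapses; the symbols (J for coupling strength, I(t) for a common input, and a Lorentzian distribution of excitabilities with centre η̄ and half-width Δ) belong to that standard presentation and are not taken from the manuscript under review.

```latex
% Microscopic model: QIF neuron j with quadratic voltage dynamics
\dot{V}_j = V_j^{2} + \eta_j + J\, r(t) + I(t)

% Lorentzian ansatz for the conditional density of membrane potentials
\rho(V \mid \eta, t) = \frac{1}{\pi}\,
  \frac{x(\eta,t)}{\bigl[V - y(\eta,t)\bigr]^{2} + x(\eta,t)^{2}}

% Resulting exact mean-field equations for firing rate r and mean potential v
\begin{aligned}
\dot{r} &= \frac{\Delta}{\pi} + 2 r v, \\
\dot{v} &= v^{2} + \bar{\eta} + J r + I(t) - \pi^{2} r^{2}.
\end{aligned}
```

      The closure works because the quadratic right-hand side maps a Lorentzian density onto a Lorentzian density; a cubic-like voltage equation does not preserve this form, which is the technical point conceded above.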

      Where we differ is in the conclusion the reviewers draw from this observation. The reviewers read our work as applying an inappropriate reduction methodology to an inappropriate neuron model. We instead positioned our work, from the outset, as an extension of that methodology: we keep the biophysically detailed Hodgkin–Huxley substrate (because it is the only level at which extracellular ion concentrations, depolarization block, bursting and seizure-like events are biophysically grounded), and we adapt the reduction by approximating the cubic voltage nullcline as a piece-wise quadratic with two parabolas of opposite curvature. This is explicitly an approximate, not exact, mean-field. The Lorentzian ansatz is then applied on each branch of the piece-wise quadratic, with the limitations of this construction analyzed in the manuscript.

      The reviewers' alternative—starting from a Type I canonical model and grafting on biophysical features—would indeed yield an exact mean-field, but it would forfeit precisely what motivates our work: a tractable mesoscopic description in which the slow variables are physiologically interpretable ion concentrations rather than phenomenological parameters. The trade-off is that we give up exact rigour in order to construct a bridge between the Montbrió-style next-generation neural mass models on one side and the Epileptor on the other, with the additional benefit that the parameters of the resulting neural mass retain a biophysical correspondence (e.g., [K<sup>+</sup>]_bath, Δ[K<sup>+</sup>]_int, [K<sup>+</sup>]_g, the gating variable n) that the Epileptor does not afford.

      We therefore respectfully maintain our position: the methodology is not "the wrong reduction for a Type II neuron"; it is an extended reduction designed to be applicable beyond the Type I case, with explicitly characterized validity.

      On point (ii): the formulation is unnecessarily complicated.

      We agree with the reviewers that, given the assumptions we ultimately adopt, namely that the gating variable n and the potassium concentrations Δ[K<sup>+</sup>]_int and [K<sup>+</sup>]_g are treated as collective (mesoscopic) variables shared by the population, with n a function of the average membrane potential, the closed neural mass equations could be reached by the more direct path used by Guerreiro et al. (2022) and the related literature (R1–R7). In the revised manuscript we now state this explicitly, and we note that the same five-dimensional system arises under either derivation.

      Our choice to follow Chen and Campbell (2022) is motivated by the fact that it makes each approximation visible at the point where it is invoked. In particular, it exposes the moment-closure step (Eq. 19), the vanishing-flux boundary condition (Eq. 28), and the locations where microscopic and mesoscopic variables enter the description. We believe that for a reader trying to extend our framework, for instance to a setting with partial heterogeneity in the slow variables, or with stochastic gating, this is the more useful presentation. We have added a remark stating that the simpler Guerreiro-type derivation reaches the same equations under our assumptions, so that readers can take whichever route they find clearer.

      On point (iii): the model only works in highly synchronous regimes.

      Here we partially agree and partially disagree, and we would like the partial disagreement to appear on the public record.

      We agree that the Lorentzian ansatz is, strictly, valid in regimes where the population's membrane potential distribution is unimodal, that is, when essentially all neurons sit on the same side of the threshold V*. Where we disagree is with the implication that the mean-field model fails outside the strongly synchronous regime. The supplementary analysis in Fig. S2, added in the previous round, quantifies the error introduced by the first-moment approximation of n as a collective variable across the full range of [K<sup>+</sup>]_bath values, spanning quiescent, bursting, seizure-like, sustained ictal and depolarization-block dynamics. The fraction of neurons whose gating variable deviates from the population mean is below 2% for the parameters used throughout the manuscript, and the error becomes appreciable only during the brief transitions between sub- and supra-threshold states. These are precisely the moments at which the population is genuinely bimodal and the single-Lorentzian assumption is theoretically expected to leak. In other words, the error peaks coincide with the moments where our derivation tells us in advance that the assumption is locally invalid; the model "knows where it fails." Away from these transitions, the mean-field tracks the population average across all dynamical regimes shown in Fig. 3, not only in the most strongly synchronized ones.

      This is, in our view, the strongest argument we can make: we are not claiming exactness, and we are not unaware of the limitations. We have characterized them analytically (the construction of the piece-wise Lorentzian, and the theoretical reason a closed solution exists only when the two branches collapse onto one), and we have characterized them numerically (Fig. S2). The deviations are bounded, their location in parameter space is well identified, and they coincide with transitions where the underlying assumption is locally violated. We believe this constitutes a controlled approximation rather than an uncontrolled one, and we would like this distinction to be visible to readers of the Reviewed Preprint.

      We note, in this connection, that the reviewers' preferred reference point, the next-generation neural mass model of Montbrió et al. (2015), which is exact and one-to-one with its underlying network, is exact precisely because the underlying network is a network of QIF neurons. The corresponding statement for a network of Hodgkin–Huxley-type neurons with explicit ion exchange does not, to our knowledge, exist in closed form, and may not exist at all. The relevant question is therefore not whether our model matches the exactness of the QIF case, but whether the controlled approximation we provide is useful. Given the qualitative agreement with neural-network simulations across the full range of [K<sup>+</sup>]_bath, the qualitative agreement with the in vitro recordings, and the recovery of the expected bifurcation structure with new emergent regimes, we believe the answer is yes.

      Other outstanding points in the review.

      Reviewer 2 reiterates the view that the Epileptor remains superior for whole-connectome seizure-propagation simulations because it is simpler and better characterized. We do not dispute that the Epileptor is more thoroughly analyzed and more parsimonious. The complementarity we propose is not a replacement but a parameter-grounding, as the Epileptor's phenomenological parameters (excitability, slow permittivity) acquire, in the present framework, an interpretation in terms of measurable biophysical quantities (extracellular potassium, intracellular potassium variation, glial buffering).

      We thank the reviewers and editors once again for their careful reading, and we are grateful that the points of disagreement have been sharpened to a state where readers can judge them transparently.


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors derive a mean-field model for a network of Hodgkin-Huxley neurons retaining the equations for ion exchange between the intracellular and extracellular space.

      The mean-field model derived in this work relies on approximations and heuristic arguments that, on the one hand, allow a closed-form derivation of the mean-field equations, and on the other hand restrict its validity to a limited regime of activity corresponding to quasi-synchronous neuronal populations. Therefore, rather than an exact mean-field representation, the model provides a description of a mesoscopic population of connected neurons driven by ion exchange dynamics.

      We agree with the reviewer's characterization. Our manuscript describes the derivation as relying on "approximations and heuristic arguments" and states that "the derivation is not exact"; what we provide is a controlled, approximate mesoscopic description in which the slow variables are physiologically interpretable ion concentrations rather than phenomenological parameters. An exact closed-form thermodynamic limit is, to our knowledge, available only for canonical Type I (QIF) networks (Montbrió, Pazó and Roxin, 2015) and a few of their extensions; it is not currently known for a Hodgkin–Huxley-type network with explicit ion-exchange dynamics. We acknowledge that the original description of the regime of validity may have caused confusion on this point, and in the revised manuscript we have therefore replaced the looser formulation "strongly synchronous regimes" by the more accurate "regimes where the membrane-potential distribution is unimodal and can be reasonably approximated by a Lorentzian" throughout the manuscript.

      Strengths:

      The idea of deriving a mean-field model that relates the slow-timescale biophysical mechanism of ion exchange and transportation in the brain to the fast-timescale electrical activities of large neuronal ensembles.

      We thank the reviewer for recognizing the motivation behind our work. This explicit coupling between slow biophysical ion dynamics and fast electrical activity is precisely the feature we tried to preserve in the reduction, even at the cost of giving up exactness.

      Weaknesses:

      The idea underlying this work is not completely implemented in practice.

      We address this general statement through the four specific sub-points the reviewer raises in the paragraph that follows.

      The derived mean field model does not show a one-to-one correspondence with the neural network simulations, except in strongly synchronous regimes.

      We partially agree and partially disagree. We agree that the Lorentzian ansatz is strictly valid where the membrane-potential distribution is unimodal, i.e. when essentially all neurons sit on the same side of the threshold V*. We disagree with the implication that the mean-field fails outside this regime. To make this claim quantitative, we added a new supplementary figure (Fig. S2) that quantifies the deviation of individual neurons' gating variables from the population mean across the full range of [K<sup>+</sup>]_bath values—quiescent, bursting, seizure-like, sustained ictal and depolarization-block dynamics. The fraction of deviating neurons is below 2% for the parameters used in the manuscript, with localized peaks only during the brief, genuinely bimodal transitions between sub- and supra-threshold states—precisely the moments at which the theory predicts the assumption to be locally invalid. Away from these transitions, the mean-field tracks the population average across all dynamical regimes shown in Fig. 3, not only in the strongly synchronized ones.
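
      The deviation measure described above can be sketched in a few lines. This is a minimal illustration rather than the analysis code behind Fig. S2: the function name `deviating_fraction`, the array layout (time steps by neurons) and the tolerance `tol` that decides when a neuron counts as "deviating" are all assumptions introduced here.

```python
import numpy as np

def deviating_fraction(n, tol):
    """Fraction of neurons whose gating variable differs from the
    population mean by more than `tol`, at each time step.

    n   : array-like, shape (T, N); n[t, i] is the gating variable of
          neuron i at time step t (hypothetical layout).
    tol : absolute tolerance defining a "deviating" neuron (assumed).
    """
    n = np.asarray(n, dtype=float)
    # Population mean at each time step, kept 2-D for broadcasting.
    pop_mean = n.mean(axis=1, keepdims=True)
    # Boolean mask of deviating neurons, averaged over the population.
    return (np.abs(n - pop_mean) > tol).mean(axis=1)
```

      Plotting the returned time course against the mean membrane potential would show whether the peaks fall on the sub-to-supra-threshold transitions, which is the pattern the response describes.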

      The agreement with the in vitro experiment is hardly evident, both for the mean-field model and for the network model.

      We acknowledge that the experimental and simulated traces in the original Fig. 4 did not match quantitatively; a quantitative fit was never our intention. The figure and its caption have been reorganized in the revised manuscript to frame the comparison as qualitative: we aim to demonstrate the shared structure, i.e., the slow modulation of fast population activity by extracellular potassium fluctuations, rather than to claim a quantitative fit.

      We also added two clarifications that account for the residual differences: (i) the network simulations were intentionally run with rescaled biophysical parameters (membrane capacitance, gating time constants) to keep the computational cost feasible, a standard practice when the goal is to validate dynamical mechanisms rather than absolute timescales; (ii) the in vitro LFP recordings were AC-coupled, so the slow DC components visible in the mean-field traces are filtered out at acquisition.
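
      The effect of AC coupling in clarification (ii) can be illustrated with a first-order high-pass filter. The sketch below is not the acquisition chain used for the recordings: the function `ac_couple`, the 1 Hz cutoff, the sampling rate and the synthetic signal are assumptions chosen only to show how a slow (DC) component disappears while fast activity passes through.

```python
import numpy as np

def ac_couple(signal, fs, cutoff_hz):
    """First-order RC high-pass filter, a simple stand-in for AC coupling."""
    x = np.asarray(signal, dtype=float)
    dt = 1.0 / fs
    rc = 1.0 / (2.0 * np.pi * cutoff_hz)   # RC time constant of the filter
    alpha = rc / (rc + dt)
    y = np.zeros_like(x)
    for i in range(1, len(x)):
        # Only changes in the input are passed on; constant levels decay away.
        y[i] = alpha * (y[i - 1] + x[i] - x[i - 1])
    return y

fs = 1000.0                                # sampling rate in Hz (assumed)
t = np.arange(0.0, 5.0, 1.0 / fs)
dc_shift = 3.0                             # slow component, here a constant offset
fast = np.sin(2.0 * np.pi * 50.0 * t)      # fast activity at 50 Hz
filtered = ac_couple(dc_shift + fast, fs, cutoff_hz=1.0)
```

      After the initial transient, `filtered` oscillates around zero with nearly the full amplitude of the fast component, while the offset present in the input is absent.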

      The assumptions made to derive the closed-form equations of the mean-field model have not been justified by any biological reason, they just allow for the mathematical derivation.

      We agree that the modelling assumptions were scattered through the original derivation. In the revised manuscript, the three core assumptions are stated explicitly at the point of derivation: (i) the gating variable n is treated as a collective, population-averaged variable; (ii) the potassium concentrations Δ[K<sup>+</sup>]_int and [K<sup>+</sup>]_g are homogeneous across the population, biophysically justified by the rapid redistribution of ions through diffusion and electrochemical gradients, which enforces near-instantaneous equilibration at the mesoscopic scale; (iii) no heterogeneity is assumed at the level of ion dynamics. The meaning of "locally homogeneous" is now defined explicitly.

      On the biophysical motivation of the in vitro perturbation used in the experiment, we have added a new Methods subsection that explains how low extracellular Mg<sup>2+</sup> unblocks NMDARs and abolishes the divalent-cation stabilisation of the resting membrane potential, depolarising hippocampal neurons and increasing the driving force for outward K<sup>+</sup> currents. This provides a biophysical link between the experimental perturbation and the model's main control parameter, the extracellular potassium concentration. We also added a reference to the well-established model of epileptic discharges that underpins the experiment.

      The final form of the mean-field equations does not clarify whether or not microscopic variables are used together with macroscopic variables in an inconsistent mixture.

      We now explicitly acknowledge that in the spiking-network simulations the gating variable n is microscopic (each neuron has its own n_i), whereas in the mean-field derivation it is treated as mesoscopic and shared by the population. This asymmetry between modalities is discussed both in the Results and in the Limitations sections, and is identified as a likely source of some of the discrepancy between the two modalities.

      We have also made the notation in Eqs. (36)–(37) consistent (firing rate r used throughout, full current-based dV/dt̄ restored) and fixed the typos and broken equation/reference labels that contributed to the impression of inconsistency (Eqs. 18, 28, 29; the Fig. 2(c) [K<sup>+</sup>]_bath label; the lost reference at line 696).

      Reviewer #2 (Public review):

      Summary:

      The authors aim to develop a neural mass model characterized by a few collective variables mimicking the dynamics of a network of Hodgkin–Huxley neurons encompassing ion-exchange mechanisms. They describe in detail the derivation of the mean-field model, then they compare experimental results obtained for the hippocampus of a mouse with the neural network simulations and the mean-field results. Furthermore, they report a bifurcation analysis of the developed model and simulation of a small network containing various coupled neural masses, somehow moving towards the simulation of an entire connectome.

      We thank the reviewer for the accurate summary of the manuscript's structure and aims.

      Strengths:

      The author attempts to develop a mean-field model for a globally coupled network of heterogeneous Hodgkin-Huxley neurons with an explicit ion exchange mechanism between the cell interior and exterior.

      We thank the reviewer for recognizing this objective. The retention of Hodgkin–Huxley dynamics with explicit ion exchange is precisely the feature that distinguishes our framework from QIF-based reductions, and it is what enables the slow variables of the resulting mean-field to retain a direct biophysical interpretation.

      Weaknesses:

      (1) It seems that the reduction methodology that is employed is not the most suitable one for the single-neuron model they are considering.

      We agree, on technical grounds, with the observation: the Ott–Antonsen / Lorentzian-ansatz reduction is exact for canonical Type I neurons (QIF) and is not directly applicable to a Type II Hodgkin–Huxley-type neuron with a cubic-like voltage nullcline. Where we differ is in the conclusion. We did not apply an inappropriate reduction to an inappropriate neuron; we deliberately extended the methodology by approximating the cubic nullcline as a piece-wise quadratic with two parabolas of opposite curvature, and then applying the Lorentzian ansatz on each branch. The result is an explicitly approximate, biophysically grounded mean-field, with its regime of validity stated and quantified (Fig. S2).

      To make this positioning explicit, we have added a paragraph to the Introduction that situates our work within the next-generation neural mass literature (Byrne et al. 2020; Montbrió, Pazó & Roxin 2015; Guerreiro et al. 2022; Forrester et al. 2024; Perl et al. 2023; Gerster et al. 2021; and works on short-term plasticity, adaptation, conductance-based reductions, spike-timing-dependent plasticity, random connectivity and noise) and clarifies that we see our contribution as complementary to these approaches, not as a competitor to the exact QIF reductions.

      (2) The authors' derivation of the neural mass model is based on several assumptions, and not all well justified.

      We agree that, in the original submission, the modelling assumptions were scattered through the derivation. In the revised manuscript, the three core assumptions are stated explicitly at the point of derivation: (i) the gating variable n is treated as a collective population-averaged variable; (ii) the potassium concentrations Δ[K<sup>+</sup>]_int and [K<sup>+</sup>]_g are homogeneous across the population, biophysically justified by the rapid redistribution of ions through diffusion and electrochemical gradients, which enforces near-instantaneous equilibration at the mesoscopic scale; (iii) no heterogeneity at the level of ion dynamics is assumed. The meaning of "locally homogeneous" is now defined explicitly. In addition, we have added Fig. S2, which quantifies numerically the error introduced by the moment-closure assumption (deviation below 2% for the parameters used in the manuscript).

      (3) The formulation of the mean-field derivation is unnecessarily complicated. It could be heavily simplified by following previously published approaches to derive biologically realistic neural masses.

      We agree that, under the assumptions ultimately adopted in our model—namely that n, Δ[K<sup>+</sup>]_int and [K<sup>+</sup>]_g are mesoscopic—the final five-dimensional system can be reached by the more direct path used by Guerreiro et al. (2022) and the related literature. We now state this explicitly in the revised manuscript and note that the same system arises under either derivation, so that the reader can take whichever route they find clearer. Our choice to retain the Chen and Campbell (2022) formalism is pedagogical: it exposes the moment-closure step (Eq. 19), the vanishing-flux boundary condition (Eq. 28), and the locations where microscopic versus mesoscopic variables enter the description, which is the more useful presentation for a reader wishing to extend the framework (e.g. to partial heterogeneity in the slow variables or to stochastic gating). We also made the notation in Eqs. (36)–(37) consistent (firing rate r used throughout, full current-based dV/dt̄ restored) and fixed a number of typos and broken equation/reference labels.

      (4) The model seems to work only for highly synchronized situations and not for the standard asynchronous evolution usually observed in neural circuits.

      We partially agree and partially disagree. We agree that the Lorentzian ansatz is strictly valid where the membrane-potential distribution is unimodal; we have replaced "strongly synchronous regimes" by this more accurate formulation throughout the manuscript. We disagree, however, with the implication that the mean-field is useful only in those regimes. Fig. S2, added in this revision, explicitly quantifies the deviation across all dynamical regimes (quiescent, bursting, seizure-like, sustained ictal and depolarization-block dynamics): it remains below 2% for the parameters used in the manuscript, with localized peaks only during the brief sub-to-supra-threshold transitions where the population is genuinely bimodal. Away from these transitions, the mean-field tracks the population average across all dynamical regimes shown in Fig. 3.

      General Statements:

      The authors honestly declared the many limitations of their approach. It is assumed that the results of the mean-field are somehow inconsistent with the neural network simulations as expected.

      We thank the reviewer for acknowledging that the limitations are honestly declared. As detailed above and quantified in Fig. S2, the deviation from the network simulations is bounded and well characterized; it is not assumed but measured.

      The authors suggest employing this model for the simulations on the whole connectome to follow seizure propagation, however, I believe that the Epileptor remains superior in this respect to this model. That indeed includes biophysical parameters but their correspondence with the ones employed in the network dynamics remains elusive, due to the many assumptions required to derive this mean-field model. Furthermore, it is more complicated than the Epileptor, I do not think that the present model will be largely employed by the community.

      We do not propose our model as a direct replacement for the Epileptor and we do not dispute that the Epileptor is more thoroughly analyzed and more parsimonious. The complementarity we propose is not a replacement but a parameter-grounding: the Epileptor's phenomenological parameters (excitability, slow permittivity) acquire, in our framework, a concrete interpretation in terms of measurable biophysical variables (extracellular potassium, intracellular potassium variation, glial buffering). Retaining the Hodgkin–Huxley substrate is essential to ground these variables biophysically.

      To make this complementarity more visible, the Limitations and Discussion section has been expanded to discuss the choice of a purely excitatory network as a first step (with excitatory–inhibitory generalizations available via the synaptic reversal potential) and to point to additional biological ingredients (calcium and other ions, plastic synapses, random connectivity and noise, adaptation, spike-timing-dependent plasticity) that the framework can accommodate, with reference to the next-generation neural mass literature.

      We thank the reviewers and editors for their careful reading. We hope this public response makes our reasoning, the limits of our approach, and the concrete revisions made in this round transparent.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) In general, the writing is scattered. Every time a model is introduced, one starts from the general formulation only to find that a very simplified case is used with respect to that formulation, which is very confusing. Authors need to reduce unnecessary formulations that confuse the reader and make it clear which formulations are actually used.

      We thank the reviewer for this comment and understand the concern regarding the balance between general formulations and specific approximations. Our intention in including the more general equations and derivations (e.g., Eq. 7 and others) was pedagogical — to ensure completeness and transparency in the modeling steps, especially for readers less familiar with mean-field reductions of biophysically detailed models. These general forms also serve to clarify the assumptions underlying the simplifications we employ. In the latest version, we improved the clarity of core equations (e.g., Eq. 37), which form the basis of all simulations presented (see details below, in the answer to question 14).

      (2) The Introduction would benefit from a wider view of the literature. The literature on exact mean field models (i.e. derived from the Lorentzian Ansatz) has flourished in the last years. In particular, it would be worth considering the following papers, where exact neural mass models are applied to perform whole-brain and large-scale brain simulations:

      Forrester, M., Petros, S., Cattell, O., Lai, Y. M., O'Dea, R. D., Sotiropoulos, S., & Coombes, S. (2024). Whole brain functional connectivity: Insights from next generation neural mass modelling incorporating electrical synapses. PLOS Computational Biology, 20(12), e1012647.

      Perl, Y. S., Zamora-Lopez, G., Montbrio, E., Monge-Asensio, M., Vohryzek, J., Fittipaldi, S., Campo, C. G., Moguilner, S., Ibanez, A., Tagliazucchi, E., Yeo, B. T. T., Kringelbach, M. L., & Deco, G. (2023). The impact of regional heterogeneity in whole-brain dynamics in the presence of oscillations. Network Neuroscience, 7(2), 632-660.

      Byrne, Aine, James Ross, Rachel Nicks, and Stephen Coombes. "Mean-field models for EEG/MEG: from oscillations to waves." Brain topography 35, no. 1 (2022): 36-53.

      Gerster, M., Taher, H., Skoch, A., Hlinka, J., Guye, M., Bartolomei, F.,... & Olmi, S. (2021). Patient-specific network connectivity combined with a next generation neural mass model to test clinical hypothesis of seizure propagation. Frontiers in Systems Neuroscience, 15, 675272.

      Byrne, Aine, Reuben D. O'Dea, Michael Forrester, James Ross, and Stephen Coombes. "Next-generation neural mass and field modeling." Journal of neurophysiology 123, no. 2 (2020): 726-742.

      Benitez-Stulz, Sophie, Samy Castro, Gregory Dumont, Boris Gutkin, and Demian Battaglia. "Compensating functional connectivity changes due to structural connectivity damage via modifications of local dynamics." bioRxiv (2024): 2024-05.

      We have added the following paragraph:

      “Recently, a class of these models, called next-generation neural mass models [42], has been developed based on an analytical approach introduced by [25] that allowed for the exact derivation of mean field parameters for a population of quadratic integrate-and-fire (QIF) neurons. These can be linked to EEG/MEG oscillations [43], including epileptic seizures [43], and have been used to study various aspects of the whole-brain dynamics such as the low-dimensional manifold of the resting state [45,46], aging [47] and neural signatures of consciousness [48].”

      We have also modified the preceding paragraph of the introduction that now reads:

      “At the mesoscopic level, the observable properties of a neuronal ensemble are generally explained by statistical physics formalism of mean-field theory [19-22]. Mean-field models demonstrated a predictive value for studying the mesoscopic dynamics of neuronal populations [23], providing statistical descriptions of neuronal networks [2, 19, 24-29], which can be used to address questions related to network-level mechanisms [12, 24, 30].

      In general, neural mass models have a low enough number of parameters to be tractable and provide general intuitions regarding mechanisms underlying complex neuronal activity [31-36]. For example, statistical population measures, such as the firing rate, can be used to assess mesoscopic dynamics [1, 7, 31, 36-41].”

      (3) Moreover, conductance-based models have been already implemented in neural mass models not only in references [69, 71, 95], but also in:

      Guerreiro, I. C., Di Volo, M., & Gutkin, B. (2023). A new generation of reduction methods for networks of neurons with complex dynamic phenotypes.

      Capone, C., Di Volo, M., Romagnoni, A., Mattia, M., & Destexhe, A. (2019). State-dependent mean-field formalism to model different activity states in conductance-based networks of spiking neurons. Physical Review E, 100(6), 062413.

      We have added the following sentence:

      “Moreover, conductance-based couplings between the spiking neurons have been already implemented in neural mass models [58, 59, 91, 93, 121], but without an extracellular exchange mechanism.”

      (4) Sec. 1.1 As previously established in the literature, a system of all-to-all coupled neuronal equations can be solved exactly in the thermodynamic limit (i.e., infinite neurons limit) if the single neuron membrane potential equation is a quadratic function and if the instantaneous distribution of membrane potentials of neurons in a population is described by a Lorentzian [Montbrió, E., Pazó, D. & Roxin, A. Physical Review X 5 (2), 021028 (2015)]. This means that the thermodynamic limit can be performed for a Canonical Type I model like the quadratic integrate-and-fire.

      What is the biological justification and the reason to approximate a different neuron type (a type II neuron model), whose membrane potential equation resembles a cubic function, with a quadratic function? The fact that it can be solved in the quadratic approximation is not, in my opinion, a sufficient justification. It would be more correct to start from a type I neuron at the microscopic level with a quadratic function and then provide additional biological features.

      We thank the reviewer for raising this important point. We respectfully disagree with the notion that starting from a canonical Type I model (such as the quadratic integrate-and-fire neuron) would be a more biologically grounded approach. While the quadratic form is analytically convenient, it does not capture certain key features of neuronal excitability, particularly those related to bursting, seizure-like events, and depolarization block, which are closely tied to the cubic-like nullcline geometry arising in Hodgkin–Huxley-type models, especially in the presence of slow ion dynamics.

      Our work seeks to bridge biophysical realism with analytical tractability. The step-wise quadratic approximation we employ is specifically designed to mimic the cubic membrane potential profile that emerges from the full ion-exchange dynamics. While the Lorentzian Ansatz is not strictly justified in this case from first principles, we show that it yields a workable and biologically interpretable mean-field description, which aligns with single-neuron dynamics, population simulations, and even in vitro observations. To our knowledge, this is a novel contribution that extends mean-field modeling beyond currently available approaches, which are often restricted to simplified or phenomenological neuron models.

      In this context, using a quadratic approximation is not merely a mathematical convenience — it is a means to retain key dynamical features of more realistic (non-Type I) neurons within a tractable framework, enabling insights into complex behaviors like multistability and pathological bursting.
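
      For reference, the exact QIF reduction invoked in this point can be written compactly. The block below restates the Lorentzian ansatz and the resulting firing-rate equations of Montbrió, Pazó and Roxin (2015) in that paper's notation, with the membrane time constant set to one. The symbols (x, y, Δ, η̄, J, I) belong to that reference, not to the present manuscript, and are shown only to make visible the quadratic structure that the Lorentzian ansatz requires and that the piece-wise quadratic construction is meant to supply on each parabolic branch.

```latex
% Lorentzian ansatz for the conditional membrane-potential density
\rho(V \mid \eta, t) = \frac{1}{\pi}\,
  \frac{x(\eta,t)}{\bigl[V - y(\eta,t)\bigr]^{2} + x(\eta,t)^{2}},
\qquad r(\eta,t) = \frac{x(\eta,t)}{\pi}, \quad v(\eta,t) = y(\eta,t).

% Closed macroscopic equations for Lorentzian-distributed excitabilities \eta
% (centre \bar{\eta}, half-width \Delta), synaptic weight J, external input I(t)
\begin{aligned}
\dot{r} &= \frac{\Delta}{\pi} + 2\,r\,v, \\
\dot{v} &= v^{2} + \bar{\eta} + J\,r + I(t) - \pi^{2} r^{2}.
\end{aligned}
```

      The closure is exact here because the single-neuron equation is quadratic in V; for a cubic-like nullcline no such closed pair is known, which is why the reduction discussed above is approximate.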

      (5) Sec. 1.2 As shown in Figure 3, the mean-field equations do not show a one-to-one correspondence with the neural network simulations, except in strongly synchronous regimes. This represents a strong limitation in the model, especially because exact neural mass models (as shown in Reference [23]) perfectly fit the dynamics of the underlying network model both in the asynchronous and in the synchronized regime.

      We appreciate the reviewer’s observation and acknowledge that our original description may have caused confusion. The model's validity is not strictly limited to strongly synchronous regimes, but rather to regimes where the distribution of membrane potentials across the neuronal population remains unimodal and can be reasonably approximated by a Lorentzian. This includes, but is not restricted to, highly synchronized states.

      We agree that this distinction is important and have clarified it in the revised manuscript (e.g., “in strongly synchronous regimes” has been replaced by “in regimes where the membrane potentials' distribution is unimodal and can be reasonably approximated by a Lorentzian”).

      In contrast to exact mean-field reductions based on quadratic integrate-and-fire neurons (e.g., [23]), our model originates from a biophysically grounded HH-type neuron with ion exchange dynamics, and necessarily involves heuristic approximations to achieve a closed-form mean-field description. While this results in a less exact correspondence with network simulations in more heterogeneous or bimodal states, our goal was to retain biological interpretability and account for phenomena such as ion-driven bursting and seizure-like transitions, which are not captured by standard QIF-based neural masses.

      We see our contribution as complementary to existing exact reductions — offering a biophysically grounded alternative that remains tractable and informative in a relevant class of unimodal, mesoscopic dynamical regimes.

      (6) Sec. 1.3 In this section the authors show the comparison between in vitro experiments and simulations with both the network model and the neural mass model (Figure 4, panels a,b,c). The qualitative agreement that is supposed to be shown is hardly evident. The shape of the signals is different as is the type of bursting. The only agreement results in the fact that there are repeated spiking events at successive times in a periodic manner. However, the time scale of the simulations is different for neural network simulation and mean-field experiment, making it difficult to compare them. While the period of the bursting event is around 2 min for mean field simulation (in accordance with experiments), the time scale of the network simulation is 60 times smaller, thus meaning that we are considering completely different mechanisms and phenomena. The justification given by the authors, that "the parameters were modified to simulate shorter fluctuations (in the network of Hodgkin-Huxley neurons) for computational efficiency" is inappropriate.

      The poor agreement turns out to be even worse in the comparison between experiments and mean-field simulations shown in panels d and e of Figure 4. While the mean field simulation is characterized by a periodic behaviour both in the mean membrane potential and in the external potassium concentration, the in-vitro traces are not periodic and show an increasing irregular activity of the extracellular LFP in correspondence with increasing external potassium concentration.

      How is it possible to justify the implementation of this model if the working hypotheses are not supported by the results? The worst agreement of the network simulations with the experiments reinforces the doubt raised in the previous point: what is the reasoning underlying the choice of Hodgkin-Huxley as a single neuron model?

      We thank the reviewer for this detailed critique. We acknowledge that the comparisons in Figure 4 involve limitations, and we now provide a clearer rationale and context in the revised manuscript. First, we emphasize that our intention is not to claim a quantitative match between the experimental and simulated traces, but rather to demonstrate that our model, grounded in biophysical mechanisms such as ion exchange, is capable of qualitatively reproducing a key feature observed experimentally: the slow modulation of neuronal activity by extracellular potassium concentration. For example, both in vitro (Fig. 4a, 4d) and in our simulations (Fig. 4b, 4e), bursts of activity ride on slower oscillations of potassium, and the interplay of fast and slow dynamics is central to both.

      Regarding the discrepancy in timescales between the neural network and mean-field simulations: the network simulations were intentionally run with accelerated dynamics by rescaling biophysical parameters (e.g., membrane capacitance and gating time constants) to keep the computational cost feasible. We now clarify in the manuscript that this choice is standard practice in computational modeling when the primary goal is to validate dynamical mechanisms rather than replicate absolute timescales.

      On the shape of LFP signals: the experimental recordings were AC-coupled, and the DC components associated with slower shifts in membrane potential, such as those modeled in the mean-field simulations, are not captured in those recordings. This limits the visibility of key features like the underlying potential jumps. Additionally, no claim is made regarding a specific bursting classification in either data or simulation.

      We agree that the experimental trace in Fig. 4d shows more complex, non-periodic dynamics (e.g., slowing burst frequency and irregularity), which are not captured by our current deterministic model. These differences could plausibly arise from additional physiological processes (e.g., stochastic transitions between metastable regimes or variability in ion regulation) that are not modeled here. In future work, such phenomena may be captured by introducing noise or parameter variability (see, e.g., Saggio et al., A taxonomy of seizure dynamotypes, eLife 2020), or by allowing the parabola coefficients in the nullcline approximation to vary dynamically.

      Finally, regarding the choice of a Hodgkin–Huxley-type neuron: this model allows us to incorporate a biophysical description of ion exchange, which is central to the phenomena we study. While modeling the spiking mechanisms explicitly precludes certain mathematical simplifications available to very simplified neuron models with reset, it enables direct links between mesoscopic dynamics and measurable quantities such as extracellular potassium, an essential objective of our work. To summarize, we rearranged Fig. 4:

      Potassium can show periodic behavior with bursts of V riding on top (Fig. 4a). The model also shows this behavior at different timescales (Fig. 4b, c, e).

      The LFP recording is AC-coupled and therefore filtered, so the jump in V during the bursts may not be visible (we do not have DC recordings). We make no claim about the bursting class here.

      Potassium can also show more complex behavior (e.g., slowing of the burst frequency, Fig. 4d) that the deterministic model does not reproduce. Such behavior might be captured by letting parameters vary dynamically (e.g., the parabola coefficients or K_bath) or by adding noise that allows jumps between regimes (Saggio et al., eLife 2020).

      (7) Sec. 1.5 Here six neural masses are coupled via long-range structural connections with random weights. Simulations of the system are shown for two different values of the global coupling parameter (G = 0 and G = 100). How many realisations of the network have been considered?

      We thank the reviewer for pointing this out. The presented simulation was intended as a proof-of-concept demonstration to illustrate the model’s capacity to support network-level propagation of pathological activity. For this purpose, we considered a single representative realization of the structural connectivity with random weights. Given the deterministic nature of the model and the qualitative focus of the demonstration, additional realizations do not qualitatively change the observed behavior — namely, the transition from localized to network-wide bursting as coupling strength increases. We have now clarified this in the revised manuscript.

      “This simulation serves as a proof of concept to illustrate how local pathological activity can propagate through a network depending on the strength of coupling. We used a single representative realization of randomly weighted structural connectivity. While we did not perform a systematic exploration of different realizations or coupling strengths, we observed that the qualitative behavior, namely the emergence of network-wide bursting beyond a critical coupling threshold, remains robust across similar setups. The model is compatible with empirical connectome data and can be readily extended to simulations using realistic brain network architectures.”

      In future applications involving data-driven network architectures or variability analyses, we agree that exploring multiple realizations or empirical connectomes will be valuable.

      How do the results depend on the different choices of the random weights? What is the dependence of the emergent dynamics on G? What kind of dynamics can be observed varying smoothly the parameter G (e.g. from 0 to 100)?

      This section serves as a proof of concept to show that pathological activity in one node can propagate through the network when coupling is strong. We used a single random weight configuration and did not systematically explore variations in G or connectivity. While richer dynamics likely emerge across intermediate values of G, a full parameter sweep is beyond the scope of this study. We clarify this in the revised text (see answer above).

      (8) Sec. 2.1 In the description of the experiment it is mentioned that only Mg^{2+} is varied. What is the role played by Mg^{2+} variation in influencing the external potassium concentration variation? How the experiment can be linked to the model? How the hypothesis of introducing an equation for the potassium concentration current in the microscopic model is supported by the experiment and vice-versa?

      We thank the reviewer for this question. We have added a new subsection in the Methods explaining magnesium removal as a means to influence the external potassium dynamics:

      “The membrane of hippocampal neurons is equipped with N-methyl-D-aspartate type glutamate receptors (NMDARs). These receptors have a very high affinity for glutamate and can, in principle, be activated by ambient glutamate present at low concentrations in the brain extracellular fluid (ECF). Under normal physiological conditions, this activation does not occur because extracellular magnesium ions (Mg<sup>2+</sup>) block the NMDAR channel at membrane potentials more negative than about –50 mV; this voltage-dependent block prevents receptor activation at rest. When extracellular magnesium is removed, the block is relieved, allowing NMDARs to be activated, leading to neuronal depolarization toward the action potential threshold [117].”

      “In addition, as a divalent cation, Mg<sup>2+</sup> interacts with the negatively charged neuronal membrane, contributing to the stabilization of the resting membrane potential. Lowering extracellular magnesium concentration disrupts this effect, resulting in membrane depolarization [118].”

      “Consequently, magnesium removal not only facilitates NMDAR-dependent depolarization, but also directly depolarizes neurons. This depolarization increases the driving force for outward potassium currents through K<sup>+</sup> channels, meaning that variations in Mg<sup>2+</sup> can indirectly influence external potassium dynamics during neuronal activity.”
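
      The voltage-dependent Mg<sup>2+</sup> block described in the quoted Methods text is often summarized with the phenomenological expression of Jahr and Stevens (1990). The sketch below uses that commonly cited formula and its constants as an illustration only; it is not part of the model in the manuscript, and the function name is hypothetical.

```python
import math

def nmdar_unblocked_fraction(v_mv, mg_mm):
    """Fraction of NMDAR channels not blocked by Mg2+.

    Commonly used Jahr-Stevens form (an assumption here, not the authors' model):
        B(V) = 1 / (1 + ([Mg2+] / 3.57 mM) * exp(-0.062 * V / mV))
    v_mv  : membrane potential in mV
    mg_mm : extracellular Mg2+ concentration in mM
    """
    return 1.0 / (1.0 + (mg_mm / 3.57) * math.exp(-0.062 * v_mv))

# With 1 mM Mg2+ the channel is mostly blocked near rest and relieved on depolarization.
rest_blocked = nmdar_unblocked_fraction(-70.0, 1.0)
depolarized = nmdar_unblocked_fraction(-20.0, 1.0)

# With Mg2+ removed from the bath the block vanishes at every voltage.
mg_free = nmdar_unblocked_fraction(-70.0, 0.0)
```

      Under this expression, setting the Mg<sup>2+</sup> concentration to zero removes the block even at resting potentials, which corresponds to the first of the two depolarizing effects of magnesium removal described above.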

      (9) Sec. 2.6 The modified version of the continuity equation has been derived following Reference [95], where the authors consider a network of Izhikevich neurons, and each neuron is modelled by a two-dimensional system consisting of a quadratic integrate and fire equation plus an equation that implements spike frequency adaptation. In particular, in [95] the authors achieve a closed set of mean-field equations with the inclusion of the mean-field dynamics of the adaptation variable by using a Lorentzian ansatz combined with the moment closure approach. The moment closure condition is also assumed in the present manuscript (Eq. 19). Under which assumptions is the implementation of the moment closure condition justified?

      We thank the reviewer (and also Reviewer 2) for questioning the justification of the assumptions used in our formalism. We agree that the moment closure alone is not a sufficient justification for assuming that V depends on the mean n, which is necessary for the derivation of Eq. 20; in addition, we need the assumption that n can be treated as a collective variable, as is done in the works mentioned by Reviewer 2. We have also performed numerical simulations of the full system to calculate the error introduced by this approximation, and the results in the new Fig. S2 show that it is below 2% for each of the different dynamical regimes.

      We have hence modified the justification for Eq. (19), which now reads:

      “Next we assume a first-order moment closure condition for the variable n [59], justified by the numerical simulations of the full network (see Fig. S2), which show that for most of the neurons (close to 99% for the same value of ∆ as in the other simulations) the mean of the population captures the behavior of the single neurons well [122]. Finally, putting together these factors and assuming that n can be treated as a collective variable for each neuron (see the Limitations of the model section) we arrive at” and also

      “The validity of the first moment closure, Eqs. (19), as in [59], is supported by the numerical simulations, which show that, both, during the silent regime and when seizure-like events occur, n<sub>i</sub> for most neurons track the network averaged ⟨n | V, η⟩. In particular, it is less than 2% of the neurons that fire while the mean is low, and vice-versa, Fig. S2. In less synchronized scenarios (larger ∆ or smaller J), however, this value would increase, but the mean would always capture the qualitative behaviour of the population.”
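
      One possible way to quantify how well single-neuron gating variables track the population mean is sketched below. This is an illustrative assumption, not the analysis behind Fig. S2: the threshold separating "low" from "high" values, the function name, and the toy data are all hypothetical.

```python
import numpy as np

def mismatch_fraction(n, threshold):
    """Fraction of (time, neuron) samples on the opposite side of `threshold`
    from the population mean at the same time step.

    n : array of shape (time_steps, neurons) with single-neuron gating values n_i(t)
    """
    mean_high = n.mean(axis=1, keepdims=True) > threshold  # state of the mean, per time step
    neuron_high = n > threshold                            # state of each neuron
    return float(np.mean(neuron_high != mean_high))

# Toy data: two time steps, four neurons.
# Step 0: the mean is low, one neuron is high. Step 1: the mean is high, one neuron is low.
n_toy = np.array([
    [0.1, 0.1, 0.1, 0.9],
    [0.8, 0.9, 0.7, 0.2],
])
fraction = mismatch_fraction(n_toy, threshold=0.5)  # 2 mismatches out of 8 samples
```

      A small value of such a measure would indicate that replacing each n<sub>i</sub> by the population average changes the state of only a few neurons, which is the property the first-order closure relies on.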

      This is also now explicitly mentioned in the following paragraph:

      “Unlike the mean membrane potential ⟨V⟩ and the firing rate (r), which can be explicitly derived from the continuity equation under the Lorentzian assumption, the expression for ⟨n(t)⟩ in Eq. (26) is formal. In our mean-field model, the gating variable (n) is treated as a global population variable, evolving deterministically as a function of the average membrane potential. Therefore, ⟨n(t)⟩ corresponds to the collective gating variable assumed to be shared by all neurons, and is not computed by averaging distinct microscopic (n<sub>i</sub>) values.”

      (10) Considering also the comments reported above, I think that it would make more sense to start from an Izhikevich neuron model as microscopic model and add the equations for the ionic currents as mesoscopic variables (i.e. written as population average variables), instead of starting from the Hodgkin-Huxley single neuron model and trying to make hardly justifiable approximations and simplifications.

      We respectfully disagree. While the Izhikevich model is computationally efficient, it lacks the biophysical detail required to capture key ion-driven mechanisms such as depolarization block, slow ion accumulation, and specific burst-initiation dynamics, all of which are central to our study. The Hodgkin–Huxley framework, despite requiring approximation, provides the necessary physiological grounding to link microscopic ion exchange with emergent population behavior.

      (11) Sec. 2.7 What is the advantage of using six more parameters to fit, like R-,R+,c-,c+,I-,I+?

      This is in contradiction with the spirit of deriving a mean-field model, where the number of parameters should be reduced. What is the advantage of this mean-field derivation with respect to other mean-field derivations of Hodgkin-Huxley neurons, like the one in Reference [9]?

      The additional parameters (R±, c±, I±) are not arbitrary: they compactly parametrize the cubic-like nonlinearity of the membrane potential dynamics in our stepwise-quadratic approximation. This trade-off allows us to preserve essential biophysical features of HH neurons (e.g., bursting regimes, depolarization block) within a tractable analytic framework. Compared to alternative approaches such as that of ref. [9], which focus on phenomenological reductions and do not yield an ODE system, our model offers more direct interpretability in terms of ion dynamics, providing a closer link between microscopic mechanisms and mesoscopic activity patterns.

      (12) Sec. 2.11 The derivation of the mean-field dynamics for the gating variable is rather heavy and difficult to follow. This section could be simplified, whilst also better explaining the underlying approximations and the validity of these approximations, which is currently missing.

      We agree that the derivation is technical, but we chose to retain it for transparency, as it follows the Chen and Campbell approach and makes key approximations such as moment closure explicit. We have now added a clarification that n is treated as a collective variable. We hope that the current level of detail helps readers understand the assumptions underlying the gating variable dynamics.

      (13) Sec. 2.12 The derivation of Eqs. (36) is quite confusing and needs to be re-written in a clearer form. Why are both the variables x and r present in these equations, since they are proportional according to Eq. (25)?

      We thank the reviewer for pointing this out. We have adjusted the equations to improve clarity and now consistently express the firing rate in terms of a single variable. This removes the redundancy and simplifies the presentation.

      (14) Sec. 2.13 The derivation of Eqs. (37) is quite confusing and needs to be rewritten in a clearer form.

      Both the auxiliary variable x and the firing rate r are present in this equation, the same as in Eq. (36). Therefore it is presented as a set of equations for the auxiliary variable x and for the physical variable V. Moreover in the equation for dV/dt, the quadratic term in V has disappeared and it is not clear to me which are the variables corresponding to I- and I+. In particular, in Eqs. (36) there are two different current terms I-,I+ for the two equations related to dy/dt. In Eqs. (37) there is a single term (I_{cl} +I_{Na}+I_K+I_{pump})/C_m which is identical for both equations related to dV/dt. I was expecting two different terms also in Eqs. (37).

      We appreciate the reviewer’s close reading. To improve clarity, we now express the dynamics in terms of the firing rate r, replacing \dot{x} with \dot{r} in both Eq. (36) and Eq. (37) to avoid confusion.

      As for the current terms: in Eq. (37), we reverse the stepwise quadratic approximation and reintroduce the original ionic currents from Eq. (16). This is why the expressions involving I_{\text{cl}}, I_{\text{Na}}, I_K, and I_{\text{pump}} appear as a single summed term in \dot{V}, rather than the split I_-,I_+ terms used in the stepwise approximation. We now clarify this in the text.

      We also write V as \bar{V} to clarify that it refers to the average membrane potential of the neuronal population. Finally, we wrote the final equation in a more compact form to improve clarity (new Eq. (38)).

      (15) Moreover, while the equation for the gating variable n can be considered as a differential equation for a mesoscopic variable since n depends on average values only, it is not clear to me if the remaining variables 𝛥[K+]_{int}, [K+]_g can be considered mesoscopic or not. Since Eqs. (37) represent a mean-field model, I expect every variable to be a mean-field variable. This could be easily achievable for the extracellular potassium concentration, but I do not understand how a site-specific microscopic variable like the intracellular potassium concentration variation can be automatically inserted in a set of mean-field equations without any averaging or intermediate steps. This is a crucial point to be clarified for the validity of the neural mass equations.

      We thank the reviewer for raising this important point. In our model, we assume spatial homogeneity at the mesoscopic scale, meaning that ion concentrations, both intra- and extracellular, are uniformly distributed across the population. As a result, variables such as Δ[K+]int and [K+]g are treated as population-level averages, consistent with the mean-field framework.

      Moreover, the rate of change of intracellular potassium is tightly coupled to extracellular dynamics via ion exchange mechanisms, justifying its inclusion as a slow, mesoscopic variable. We now clarify this modeling assumption explicitly in the text.

      “By locally homogeneous, we mean that all neurons in the population are assumed to share the same extracellular and intracellular ionic environment and are connected with identical coupling rules, allowing us to treat the population as uniform with respect to ion dynamics and connectivity.”

      “These slow variables are in addition considered to be mesoscopic, meaning they are identical for every neuron in the population.”

      Minor points:

      (1) Figure 2, panel d. Please detail the variable on the y-axis, which is not reported in the figure.

      Done

      (2) Eq. (15) is cited in many parts of the manuscript, while it seems to me it would be more appropriate to reference Eq. (2). Is this a mistake or is there a reason to cite Eq. (15)?

      The reviewer is correct: the equation label was wrong, and we have now corrected it.

      (3) Figure 4 Would it be possible to show enlargements of the mean membrane potential traces to directly compare the different bursting types shown by the simulation of the different models?

      Panel d already contains an enlarged part of the membrane potential traces. For the rest, referring back to Q6, we stress again that our intention is not to claim a quantitative match between the experimental and simulated traces.

      (4) Figure 5 In the caption the author refers to "the generic model, single neuron model, and epileptor model". Could you please better explain the models referred to and why they are mentioned? Are the generic model and the single neuron model those that are presented in the Materials and Methods section? Or do you refer to completely different models, as for the epileptor?

      We have removed the reference to the generic model (we had in mind the canonical model for seizures by Saggio et al. 2017), since it is not mentioned in the paper, and we have clarified that the single neuron model and the epileptor model were used to simulate seizure-like events.

      (5) Sec 2.5 As already stated above, the authors need to reduce unnecessary formulations that confuse the reader. Here, for example, Eqs. (6) and (7) are unnecessary, in view of the fact that delta spikes are used (Eq. 8).

      We thank the reviewer for the suggestion, but we disagree, and we think it is better to start the derivations from the more general case, as done with Eqs. 6-7.

      (6) Sec. 2.6 Could you please better explain why in Eqs. (15) and (16), the variable V0 is introduced, while before and after this, the variable V is used?

      We thank the reviewer for the comment. In Eqs. (15) and (16), \dot{V}_0 denotes the free term of the membrane potential equation, i.e., the component driven solely by the intrinsic ionic currents and excluding the synaptic input I_syn. Only this \dot{V}_0 term (a function rather than an independent variable) is approximated by the piece-wise quadratic expression in Eq. (21). In contrast, V is the membrane potential variable, whose dynamics are obtained by combining \dot{V}_0 with the synaptic current contribution I_syn. In summary, there is no independent variable V_0; only the function \dot{V}_0 is introduced to represent the intrinsic (non-synaptic) component of the membrane potential dynamics. We have now clarified this in the text.

      (7) In the square brackets of the r.h.s. of Eq. (18), for all the intermediate steps, it appears G^n(V,n) ϱ^V, while there should be G^n(V,n) ϱ^n.

      We thank the reviewer for catching this typo. We have corrected this in the revised manuscript.

      (8) Sec. 2.8 Here the authors affirm that "a double-Lorentzian (or a piece-wise Lorentzian) could be a suitable form for ρ^V (t, V | η). However, it is not clear under which conditions such an assumption would allow a solution to the continuity equation". What are the problems underlying the implementation of the double Lorentzian? It seems to be a more correct form than the single Lorentzian actually implemented.

      We thank the reviewer for this thoughtful question. In principle, a double-Lorentzian ansatz for \rho^V can indeed be implemented in several reasonable ways–for example, by enforcing that the combined area of the two Lorentzian components is normalized to one (to preserve the probabilistic interpretation) and by imposing smoothness constraints at their boundaries. However, despite exploring these implementations, we were unable to obtain non-trivial solutions of the continuity equation under this parametrization. The only solvable case we found is the degenerate one in which the two Lorentzians collapse onto each other (i.e., (x_- = x_+) and (y_- = y_+)), which reduces the ansatz to the single-Lorentzian form used in the manuscript. For this reason, although the double-Lorentzian is conceptually appealing, it did not yield practically useful solutions within our framework.
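
      For orientation, the two forms discussed in this reply can be written out as follows. The first line is the standard single-Lorentzian ansatz with center y and half-width x; the second is one possible piece-wise generalization, given here only as a sketch. The weights a_\pm and the switching point V^* are hypothetical symbols introduced for illustration and may differ from the parametrizations that were actually explored.

```latex
% Single-Lorentzian ansatz (standard form): center y, half-width x
\rho^V(t, V \mid \eta) = \frac{1}{\pi}\,
  \frac{x(\eta,t)}{\bigl[V - y(\eta,t)\bigr]^2 + x(\eta,t)^2}

% Piece-wise (double) Lorentzian; a_\pm and V^* are hypothetical symbols
\rho^V(t, V \mid \eta) =
\begin{cases}
  \dfrac{a_-}{\pi}\,\dfrac{x_-}{(V - y_-)^2 + x_-^2}, & V < V^*,\\[2ex]
  \dfrac{a_+}{\pi}\,\dfrac{x_+}{(V - y_+)^2 + x_+^2}, & V \ge V^*,
\end{cases}
\qquad \int_{-\infty}^{\infty} \rho^V \, dV = 1 .
```

      In this sketch, setting x_- = x_+ and y_- = y_+ makes continuity at V^* force a_- = a_+, and normalization then gives a_\pm = 1, so the expression reduces to the single-Lorentzian form, consistent with the degenerate case described above.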

      (9) Eq. (28). The symbols used for the flux (especially those used in the second-to-last step once the inner integration is performed) are confusing and it is difficult to understand what they mean.

      We thank the reviewer for noting this issue. The problem was due to a LaTeX typo that prevented the vertical lines—indicating that the flux is evaluated at specific points—from rendering correctly. We have now corrected this.

      (10) Eq. (29) In the third step there are some misprints that impair comprehension.

      We thank the reviewer for noting this. We have corrected these misprints in the revised version.

      (11) Line 696. The reference is not displayed.

      Fixed.

      Reviewer #2 (Recommendations for the authors):

      As a really general remark, this manuscript is written in a confusing manner, the authors present their model in a general formulation and their analysis in a complicated way that in the end is not needed, as I will explain in detail in the following.

      Another general question is why the authors want to employ the neural mass reduction methodology developed in [23] to obtain exact mean-field evolution for quadratic neurons (like the quadratic integrate and fire (QIF)) for a model that reveals a cubic dependence on the membrane potential, as the FitzHugh-Nagumo neuron (that indeed is a 2d reduction of the Hodgkin-Huxley model), to obtain an approximate neural mass model that somehow works qualitatively only for synchronized dynamics? Why not use another approach more suited to derive the neural mass model for cubic nonlinearity, as the one suggested in [33] and [69] by Di Volo and co-authors? What is the rationale behind the choice of the authors?

      We appreciate the reviewer’s critical feedback and the opportunity to clarify our methodological choices. Our decision to base the mean-field model on Hodgkin–Huxley-type neurons stems from the need to retain ion channel dynamics, which are essential to capture the coupling between membrane activity and extracellular ionic concentrations. This biophysical link is central to our study and cannot be achieved using more abstract neuron models such as QIF or FitzHugh-Nagumo alone.

      Regarding the mean-field reduction method: while the Ott-Antonsen/Lorentzian framework is indeed exact for QIF neurons, we adopted a stepwise quadratic approximation to apply a similar formalism to the cubic-like dynamics of the HH model. This choice enables us to analytically capture a rich set of behaviors, including bursting, depolarization block, and seizure-like dynamics, in a tractable mean-field system.

      We considered the approach of Di Volo and colleagues [33, 69], but their methodology is tailored to asynchronous irregular regimes, whereas our model is specifically designed to capture dynamics in quasi-synchronous or bursting regimes — including epileptiform activity — which are not covered by the assumptions of the Di Volo framework.

      We now clarify these modeling choices more explicitly in the revised manuscript.

      "Unlike phenomenological or reduced models, the Hodgkin–Huxley framework allows us to retain explicit ion exchange dynamics, which are essential for linking membrane behavior to extracellular potassium fluctuations. This level of biophysical detail is crucial for modeling pathological regimes such as seizure onset and propagation."

      Furthermore, the derivation of the neural mass equations is unnecessarily complicated; as a matter of fact, they approximate all the variables (except the membrane potentials of the single neurons) as collective variables (i.e. the gating variable and the potassium concentration) common to all the neurons. The neural network model for which they derive the neural mass model presents microscopic evolutions of the membrane potential cubic-like plus other global variables equal for all neurons, that depend on collective variables such as the mean membrane potential or the mean firing rate. Once clarified, the derivation of the neural mass model is much simpler, and it is not necessary to follow the approach reported in Reference [95] [Chen, L. & Campbell, S. A. Exact mean-field models for spiking neural networks with adaptation. Journal of Computational Neuroscience 50 (4), 445-469 (2022)] which is unnecessarily complicated. The authors can follow a much simpler methodology as explained by Guerreiro et al. in Reference [R6] (cited below) where the authors consider the same model studied in [95]. Such a methodology has been applied in many cases already, to introduce realistic aspects in the neural mass model [23] (see References [R1-R7] below). I strongly encourage the authors to reformulate their approach in a simpler and clearer manner, by following the approach reported in [R1-R7]. The manuscript will become more readable and it will gain in comprehension.

      We thank the reviewer for this helpful suggestion. We agree that, given the assumptions made in our derivation (i.e., shared gating and ion concentration variables across neurons), the mean-field equations could alternatively be obtained using the simpler methodology proposed by Guerreiro et al. [R6] and related works [R1–R7]. However, we chose to follow the derivation presented by Chen and Campbell [95] because it makes the approximations (e.g., moment closure, flux boundary assumptions) explicit and generalizable to future extensions. We also acknowledge that the assumption that n can be treated as a collective variable is needed and, for clarity, we have now added a remark in the manuscript indicating that the same result could be recovered more directly using the approach of Guerreiro et al.

      “We note that, under the assumption of globally shared gating and ion concentration variables across the neuronal population, the resulting mean-field equations can also be derived using simpler methods as proposed by Guerreiro et al. [58]. In this work, we follow the more general formalism of Chen and Campbell [59], which makes the role of key approximations (e.g., moment closure, vanishing flux at boundaries) explicit. This also facilitates potential generalizations to settings with partial heterogeneity or dynamic gating distributions.”

      “Finally, putting together these factors and assuming that n can be treated as a collective variable for each neuron”

      “Unlike the mean membrane potential ⟨V⟩ and the firing rate (r), which can be explicitly derived from the continuity equation under the Lorentzian assumption, the expression for ⟨n(t)⟩ in Eq. (26) is formal. In our mean-field model, the gating variable (n) is treated as a global population variable, evolving deterministically as a function of the average membrane potential. Therefore, ⟨n(t)⟩ corresponds to the collective gating variable assumed to be shared by all neurons, and is not computed by averaging distinct microscopic (n<sub>i</sub>) values.”

      Now I will examine in detail all the manuscript and report comments/remarks/suggestions numbered as (Q#) on how to improve the present manuscript to render it easier to read and more comprehensible, these are not minor remarks, just detailed ones.

      Introduction

      (Q1) The Introduction section needs a part devoted to the reduction methodology developed in [23] for QIF neurons and a presentation of previous works dealing with the introduction of biologically realistic aspects in the neural mass model derived in [23]. Here is a non exhaustive list of such papers concerning the introduction of the following realistic aspects in the neural mass developed in [23]:

      (I) short-term synaptic plasticity :

      [R1] Exact neural mass model for synaptic-based working memory H Taher, A Torcini, S Olmi, PLOS Computational Biology 16 (12), e1008533 (2020)

      [R2] Bursting in a next generation neural mass model with synaptic dynamics: a slow-fast approach H Taher, D Avitabile, M Desroches, Nonlinear Dynamics 108 (4), 4261-4285 (2022)

      [R3] Mean-field approximations of networks of spiking neurons with short-term synaptic plasticity R Gast, K Thomas R, H Schmidt, Physical Review E 104 (4), 044310 (2021)

      (II) spike frequency adaptation:

      [R4] Gast, Richard, Helmut Schmidt, Thomas R. Knösche. "A mean-field description of bursting dynamics in spiking neural networks with short-term adaptation." Neural computation 32.9 (2020): 1615-1634.

      [R5] Population spiking and bursting in next-generation neural masses with spike-frequency adaptation, A Ferrara, D Angulo-Garcia, A Torcini, S Olmi, Physical Review E 107 (2), 024311 (2023).

      (III) conductance-based neuron with a slow current (Izhikevich model):

      [R6] A new generation of reduction methods for networks of neurons with complex dynamic phenotypes, IC Guerreiro, M Di Volo, B Gutkin, preprint arXiv:2206.10370 (2022)

      (IV) spike timing-dependent plasticity:

      [R7] Mean-field approximations with adaptive coupling for networks with spike-timing-dependent plasticity, B Duchet, C Bick, Á Byrne, Neural computation 35 (9), 1481-1528 (2023).

      (V) random connectivity and noise:

      [R8] Mean-field models of populations of quadratic integrate-and-fire neurons with noise on the basis of the circular cumulant approach, DS Goldobin, Chaos: An Interdisciplinary Journal of Nonlinear Science 31 (8) (2021)

      [R9] A reduction methodology for fluctuation-driven population dynamics DS Goldobin, M Di Volo, A Torcini, Phys. Rev. Lett. 127, 038301 (2021)

      [R10] Shot noise in next-generation neural mass models for finite-size networks VV Klinshov, SY Kirillov Physical Review E 106 (6), L062302 (2022)

      I think the authors should refer in the introduction to these previous papers, where realistic biological aspects have been already introduced in the neural mass model developed in [23].

      We have added a whole paragraph devoted to the next-generation neural mass models and in particular to the other works introducing biological realism in this class of models:

      “Recently, a class of these models, called next-generation neural mass models [42], has been developed based on an analytical approach introduced by [25] that allowed for the exact derivation of mean field parameters for a population of quadratic integrate-and-fire (QIF) neurons. These can be linked to EEG/MEG oscillations [43], including epileptic seizures [44], and have been used to study various aspects of the whole-brain dynamics such as the low-dimensional manifold of the resting state [45, 46], aging [47] and neural signatures of consciousness [48]. A number of works dealt with the introduction of biologically realistic aspects in the mostly phenomenological neural mass model derived in [25]. These included short-term synaptic plasticity [49–51], spike frequency adaptation [52, 53], spike timing-dependent plasticity [54], synaptic delay [29], random connectivity and noise [55–57], as well as an extension of the conductance-based neurons with a recovery variable [58–60].”

      (Q2) Line 117 - Please specify what you mean by locally homogeneous, here.

      Thank you for allowing us the opportunity to clarify this. We now report:

      "By locally homogeneous, we mean that all neurons in the population are assumed to share the same extracellular and intracellular ionic environment and are connected with identical coupling rules, allowing us to treat the population as uniform with respect to ion dynamics and connectivity."

      (Q3) In this sub-section the authors should clarify all the hypotheses they employ to derive the neural mass models, not only the Lorentzian approximation they did for a cubic model, but also the fact that they assume that the gating variable n is a global variable as well as that the potassium concentration are assumed to be the same for all neurons, that they assume no heterogeneity at this level. This is a fundamental aspect that should be clarified at this stage already.

      We thank the reviewer for this important observation. We agree and have revised the text in the derivation section to explicitly state all key assumptions. Specifically, we now clarify that:

      (1) The gating variable n is treated as a population-average (global) variable;

      (2) The potassium concentrations Δ[K+]int and [K+]g are assumed to be homogeneous across the neuronal population; and (3) No heterogeneity is assumed at the level of the ion dynamics.

      This assumption is biophysically motivated: ion concentrations — particularly extracellular potassium — tend to redistribute rapidly due to diffusion and electrochemical forces, leading to an effectively well-mixed environment at the mesoscopic scale. As such, assigning separate compartments to individual neurons is not justified in this modeling context. We now explicitly note this in the manuscript to avoid ambiguity.

      “3) We assume that the potassium concentrations, both intracellular (\( \Delta[K^+]_{\text{int}} \)) and extracellular (through the buffering variable \( [K^+]_g \)), are homogeneous across the neuronal population. This is justified physiologically by the rapid redistribution of ions through diffusion and electrochemical gradients, which enforce near-instantaneous equilibration at the mesoscopic scale. As such, assigning separate compartments to each neuron is neither practical nor biologically meaningful in this context; 4) We assume that the gating variable n, which governs potassium conductance, can be treated as a population-averaged variable. This allows us to describe the neuronal ensemble using a reduced set of collective (mean-field) variables.”

      Comparison with neural network simulations

      (Q4) The comparison the authors perform between the microscopic model and the neural mass is misleading. From what the authors wrote it seems that you are considering 4 variables for each neuron in the network model (this is unclear from how the model is written in Eq (9)): I guess one for the membrane potential, one for the gating variable and two for the potassium concentration. However, this is not the network model for which the neural mass has been developed. The neural mass has been obtained for a network made of N + 3 variables (N membrane potentials and 3 collective variables for the gate and the potassium concentrations); this is a sort of mesoscopic network model, analogous to what was done previously in references [R1,R3,R4] above and others. If the authors would compare their neural mass with this mesoscopic model the agreement among the two would be improved.

      We agree with the reviewer’s observation and we now acknowledge this issue in the Results and in the Limitations. We have already modified the text to explicitly state that for the mean-field derivations n is treated as a collective variable and we have added the following statements:

      “Also note that the gating variable n is treated as microscopic in the neural network, while in the derivations for the mean-field it is considered mesoscopic and identical for the whole population. This is likely responsible for some of the discrepancies between the two modalities.”

      “Moreover, the discrepancy between the two modalities would have likely been smaller if for the neural network we also adopted a gating variable that is mesoscopic and identical across the spiking neurons, as in similar works [49–51]. However, here we demonstrate the validity of the mean-field approximation even for the more natural, microscopic representation of the gating variable in the neural network.”

      Comparison with in vitro experiments

      (Q5) Experiment -- The experiment is performed in vitro on the intact Hippocampus of mice between postnatal days P5-P7. It is known [R1] that neuronal activity at an early developmental stage is provided in the Hippocampus by a network primarily driven by synchronized GABA_A that provides an excitatory action and generates giant depolarizing potentials (GDPs) [R11]. However, GDPs have frequencies in the range of 1 Hz - 0.1 Hz, not matching the oscillation frequencies reported by the authors. I have several questions here:

      (E1) At this stage P5-P7 are the interactions among neurons essentially excitatory? Or not, please explain why. Are the oscillations reported by the authors somehow related to GDPs?

      The depolarizing action of GABAergic transmission and the presence of GDPs during early rodent brain development, as described by Ben-Ari and some other researchers, are characteristics commonly observed in ex vivo brain preparations, but are not evident under physiological in vivo conditions (see doi: 10.3389/fphar.2012.00065).

      In our preparation (intact mouse hippocampus), GABAergic synaptic transmission is not depolarizing. This is evidenced by the fact that inhibition of ionotropic GABA_A receptors with bicuculline triggers interictal-like discharges, which are routinely used as a model of epileptiform activity (see doi: 10.1016/j.nbd.2014.12.013). Therefore, in our experiments at P5–P7, neuronal interactions are not purely excitatory, and the observed low-Mg2+-induced oscillations are not related to GDPs.

      (E2) What is the nature of the oscillations reported by the authors in Figure 4 ? Which is their origin, please explain in the text of the paper clearly.

      The model of epileptic discharges presented in our study was first introduced over 20 years ago and has since become a well-established paradigm for screening potential antiepileptic drugs and for research on the mechanisms of epileptic seizures. A detailed description of this model can be found in doi: 10.1046/j.1460-9568.2002.02143.x, and its pharmacological properties are reviewed in doi: 10.1046/j.1528-1157.2003.19503.x. These references have now been added to the manuscript for clarity.

      We have added the following:

      “The model of epileptic discharges presented in our study was first introduced over 20 years ago [115] and has since become a well-established paradigm for screening potential antiepileptic drugs and for research on the mechanisms of epileptic seizures [116].”

      (E3) How exactly does the concentration of extracellular potassium ions change, this is not clear even in Methods, please clarify.

      [R11] Excitatory actions of GABA during development: the nature of the nurture Y Ben-Ari, Nature Reviews Neuroscience 3 (9), 728-739 (2002).

      We have now added a new subsection in the Methods explaining how we use Mg2+ variation to influence the external potassium variation.

      “The membrane of hippocampal neurons is equipped with N-methyl-D-aspartate type glutamate receptors (NMDARs). These receptors have a very high affinity for glutamate and can, in principle, be activated by ambient glutamate present at low concentrations in the brain extracellular fluid (ECF). Under normal physiological conditions, this activation does not occur because extracellular magnesium ions (Mg<sup>2+</sup>) block the NMDAR channel at membrane potentials more negative than about –50 mV; this voltage-dependent block prevents receptor activation at rest. When extracellular magnesium is removed, the block is relieved, allowing NMDARs to be activated, leading to neuronal depolarization toward the action potential threshold [117]. In addition, as a divalent cation, Mg<sup>2+</sup> interacts with the negatively charged neuronal membrane, contributing to the stabilization of the resting membrane potential. Lowering extracellular magnesium concentration disrupts this effect, resulting in membrane depolarization [118].”

      “Consequently, magnesium removal not only facilitates NMDAR-dependent depolarization, but also directly depolarizes neurons. This depolarization increases the driving force for outward potassium currents through K<sup>+</sup> channels, meaning that variations in Mg<sup>2+</sup> can indirectly influence external potassium dynamics during neuronal activity.”
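
      The voltage-dependent block described above is often summarized by a phenomenological expression attributed to Jahr and Stevens (1990). The sketch below uses that commonly quoted form with its usual constants. It is an illustration only and is not part of the model in the manuscript; the function name and the test values are hypothetical.

```python
import math

def nmdar_unblocked_fraction(v_mv, mg_mm):
    """Fraction of NMDAR conductance not blocked by Mg2+.

    Assumed form (Jahr-Stevens style): 1 / (1 + [Mg]/3.57 mM * exp(-0.062 * V/mV)).
    v_mv is the membrane potential in mV, mg_mm the extracellular Mg2+ in mM.
    """
    return 1.0 / (1.0 + (mg_mm / 3.57) * math.exp(-0.062 * v_mv))

# Compare a resting potential with and without extracellular magnesium.
for mg in (1.0, 0.0):
    fraction = nmdar_unblocked_fraction(-70.0, mg)
    print(f"[Mg2+] = {mg} mM, V = -70 mV: unblocked fraction = {fraction:.3f}")
```

      With these assumed constants, most of the conductance is blocked near rest when magnesium is present, and the block vanishes when the magnesium term is set to zero, which is the qualitative effect the low-Mg<sup>2+</sup> protocol relies on.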

      (Q6) Lines 187-191 and Figure 4 -- The authors wrote : "In Figure 4.c we show the membrane potential and external potassium for a simulation of N = 3000 coupled HH-like neurons showing a similar behavior, although the parameters were modified to simulate shorter fluctuations for computational efficiency." This sentence is unclear. What is clear from Figure 4 is that the network simulations gave rise to collective oscillations on a completely different scale seconds with respect to minutes and also the profile of the potassium concentration has a clearly different evolution. From Figure 4 one can conclude that network simulations have nothing to do with the neural mass evolution and the experiment. I think the authors should better clarify and describe the results reported in Figure 4.

      We thank the reviewer for the observation. We have revised the relevant section of the manuscript to clarify the interpretation of Figure 4 and avoid any implication of quantitative matching. As stated in our response to Reviewer 1 (comment 6), the comparison is intended to highlight the shared qualitative structure across experimental data, the neural mass model, and the network simulation — specifically, the modulation of fast bursting by slow extracellular potassium fluctuations. The difference in timescale in the network simulation arises from rescaled parameters used for computational efficiency. We now explicitly state this and have updated the figure caption and accompanying text accordingly to reflect these points.

      (Q7) Why do the authors consider a purely excitatory network to describe the experimental results? What is the reason for this choice? Why they do not consider as usual balanced excitatory- inhibitory networks? Please clarify this point.

      We thank the reviewer for raising this point. We chose to model a purely excitatory network as a first step in isolating the role of extracellular potassium dynamics in generating population-level bursting. This allows us to focus on the ion-driven modulation mechanisms without introducing additional complexity from inhibitory feedback. Similar modeling choices have been made in previous studies of bursting and seizure-like dynamics (e.g., Gutkin et al.), where inhibition is omitted to emphasize intrinsic or modulatory mechanisms. We acknowledge that incorporating inhibitory populations is an important next step for capturing a broader range of dynamics, but for the current study, the excitatory-only network provides a minimal and interpretable framework aligned with our focus.

      (Q8) By comparing Figures 4 (a) and (b) it seems that the bursting activity observed in the experiment and in the mean-field simulations seem quite different, originating from different mechanisms and bifurcations, Can the authors comment on this?

      We thank the reviewer for this important observation. We have reorganized the presentation of Figure 4 and revised the accompanying text to better clarify the nature of the comparison (see also our response to Reviewer 1, point 6). Our aim is not to claim that the experimental and simulated bursts arise from identical bifurcation mechanisms, but rather to highlight shared qualitative features — in particular, slow modulation of population activity by extracellular potassium. We now also comment on the potential role of more complex or noise-driven bifurcations (see Saggio et al. 2020) in shaping experimental bursting dynamics, which are not fully captured by the current deterministic model.

      Bifurcation analysis: emergent network states and multistability

      (Q9) This sub-section will gain interest by reporting simulations of the network and of the neural mass model presenting bistable dynamics.

      We agree with the reviewer that this would be an important addition, but we believe that it goes beyond the scope of this work (for computational reasons, among others) and we leave it for future work. We have, however, updated the bifurcation analysis section.

      Limitations of the model

      (Q10) Lines 276- 280 -- I think that the parameters c+,c_,R+,R_ depend not only on the slow variables, potassium concentrations but also on the actual value of the gate variable n. This should be stressed.

      We thank the reviewer for this helpful observation. We agree and have clarified this in the revised manuscript. This reflects the mean-field assumption that n is treated as a collective variable, and we now make this dependency explicit in the text.

      “Furthermore, the parabola coefficients c_-, c_+, R_-, R_+ were fixed as constants; however, these coefficients could be made functions of the slow variables and the gating variable, which might unveil new dynamical regimes and extend the validity of the thermodynamic limit beyond the regimes described in this work. Also, in the case of constant values, an in-depth exploration of the parameter space is required to fully characterize the model and its bifurcation structure.”

      (Q11) The authors wrote: " Other limiting assumptions are the moment closure condition (19) and the assumptions that the functions (3) averaged across the neuronal population can be expressed as functions of the average membrane potential V and gating variable n (which is only true in the cases where the functions (3) can be reasonably approximated as linear functions in a range of V and n." Apart from that a parenthesis is lacking, I think that this last aspect has been already taken into account when performing the fit with 2 parabolas to the sum of the currents, or not? In case, please specify.

      We thank the reviewer for catching the missing parenthesis — this has been corrected in the revised manuscript. Regarding the modeling point: the two-parabola fit applies specifically to the membrane potential dynamics and captures the nonlinear dependence of the total current on V (eq.16). In contrast, the moment closure assumption involves approximating averages of nonlinear functions of both V and n, such as those appearing in the gating dynamics (e.g., n∞(V)). This is not directly accounted for by the parabola approximation, but is handled separately via the mean-field approximation of G^n as a function of the average variables (eq.15).
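
      The distinction can be illustrated numerically. The following is a minimal sketch and not the manuscript's code: the sigmoidal steady-state function `n_inf` and its parameters `v_half` and `slope` are hypothetical, and the membrane potentials are a deterministic, evenly spaced sample rather than a Lorentzian. It shows that replacing the population average of a nonlinear function by the function of the average is accurate only when the spread of V is small relative to the curvature of that function.

```python
import math

def n_inf(v, v_half=-30.0, slope=5.0):
    """Hypothetical sigmoidal steady-state gating function (illustrative parameters)."""
    return 1.0 / (1.0 + math.exp(-(v - v_half) / slope))

def closure_gap(v_center, half_width, n_samples=1001):
    """Return <n_inf(V)> - n_inf(<V>) for potentials evenly spaced around v_center."""
    step = 2.0 * half_width / (n_samples - 1)
    vs = [v_center - half_width + i * step for i in range(n_samples)]
    mean_v = sum(vs) / n_samples
    mean_of_function = sum(n_inf(v) for v in vs) / n_samples
    return mean_of_function - n_inf(mean_v)

# Narrow spread: n_inf is nearly linear over the sampled range, so the gap is tiny.
narrow_gap = closure_gap(v_center=-40.0, half_width=0.5)
# Wide spread: the curvature of n_inf matters and the closure error grows.
wide_gap = closure_gap(v_center=-40.0, half_width=20.0)
print(f"narrow spread: gap = {narrow_gap:.5f}")
print(f"wide spread:   gap = {wide_gap:.5f}")
```

      In this toy setting the gap is the error made by evaluating the gating function at the mean potential, which is the kind of approximation the closure relies on.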

      (Q12) A limitation that should be stressed is that the authors in the neural mass model consider the gate variable and the potassium concentrations, as global variable equal for all neurons, and where n depends on the mean membrane potential, to write that the moment closure (19) is a limiting assumption is honestly too clear, please be explicit here.

      We have now added the following two statements:

      “These slow variables are in addition considered to be mesoscopic, meaning they are identical for every neuron in the population.”

      “In our mean-field model, the gating variable (n) is treated as a global population variable, evolving deterministically as a function of the average membrane potential. Therefore, ⟨n(t)⟩ corresponds to the collective gating variable assumed to be shared by all neurons, and is not computed by averaging distinct microscopic (n<sub>i</sub>) values.”

      Discussion

      (Q13) The authors could discuss in this section the further biological ingredients they can introduce in their neural mass based on the previous works [R1-R9] that have already shown how to include plastic synapses, random connectivity, noise, adaptation, spike-timing-dependent plasticity, etc and which of these ingredients they consider more relevant for the whole brain dynamics.

      In order not to repeat the same statements from the Introduction, we have now added the following sentence:

      “This approach, taking into account key biophysical details, offers a first step in considering the role of the glia in neural tissue excitability. Following this direction, other ions, such as calcium should be taken into consideration, as well as other effects such as plastic synapses, random connectivity, noise, adaptation, spike-timing-dependent plasticity, as already discussed in the Introduction.”

      (Q14) The authors should also discuss why they limited their analysis to purely excitatory networks, and what would change by including excitatory-inhibitory interactions in each single mass and across neural masses, if this makes sense or not.

      As stated in our response to Q7, we chose to focus on purely excitatory networks as a first step to isolate and study the core role of extracellular potassium dynamics in driving bursting behavior. This modeling choice allows for a minimal system where the interaction between intrinsic ionic mechanisms and network coupling is most transparent.

      We also note that excitatory and inhibitory effects can be modeled within the same formalism by adjusting the synaptic reversal potential — for example, $E_{syn}=0$mV for excitatory, and $E_{syn}=-80$mV for inhibitory interactions. Including inhibitory populations would introduce additional complexity and richer dynamical regimes (e.g., oscillatory instabilities, balance states), which are certainly of interest but beyond the scope of this study.
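
      As a minimal sketch of this point (an illustration under an assumed sign convention, not the manuscript's implementation), a conductance-based synaptic drive of the form g_syn · s · (E_syn − V) changes sign with the reversal potential. The function name, the conductance `g_syn`, and the test potential of −60 mV are hypothetical.

```python
def synaptic_current(v_mv, s, e_syn_mv, g_syn=1.0):
    """Conductance-based synaptic drive; positive values depolarize (assumed convention)."""
    return g_syn * s * (e_syn_mv - v_mv)

v_test = -60.0  # hypothetical membrane potential in mV
excitatory = synaptic_current(v_test, s=0.5, e_syn_mv=0.0)     # E_syn = 0 mV
inhibitory = synaptic_current(v_test, s=0.5, e_syn_mv=-80.0)   # E_syn = -80 mV
print("excitatory drive:", excitatory)
print("inhibitory drive:", inhibitory)
```

      The same activation `s` therefore pushes the membrane potential up or down depending only on where V sits relative to E_syn.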

      Materials and Methods

      (Q15) Fig.2 - I think a plus is lost in panel (c) where it should be [K+bath];

      Thank you. We corrected the figure.

      (Q16) Caption of Figure 2- the authors wrote: "In the case where the derivative of the membrane potential is zero for V > V ⋆ (e.g., if the cubic function is shifted up by adding a constant current to the membrane potential derivative), the population is described by the red distribution in the steady state, and the continuity equation is governed by the negative parabola equation." This sentence is unclear, the authors mean in the case where the derivative of the membrane potential crosses zero at V > V*? Please clarify.

      We thank the reviewer for pointing this out. Yes, we refer to the case where the membrane potential derivative crosses zero at a point V>V∗. We have clarified this in the revised figure caption.

      (Q17) Lines 558-562 -- Eqs (6) and (7) are examples of unnecessary complications of which this manuscript is full of. Since the authors do not consider any synaptic dynamics and homogenous (equal) couplings, these equations are not needed, I strongly recommend removing Eqs (6) and (7) and limiting to the expression reported in Eq (8), which indeed should also be corrected see next remark.

      We appreciate the reviewer’s concern regarding clarity. As mentioned in our response to Reviewer 1, the inclusion of Eqs. (6) and (7) was intentional and serves a pedagogical purpose — to present the general structure of the network interactions before introducing simplifying assumptions. While we agree that Eq. (8) suffices for the simulations considered in this manuscript, we believe that showing the more general form helps clarify the model’s extensibility, for instance to cases with heterogeneous coupling or synaptic dynamics.

      (Q18) Eq (8) - line 562 - Since the authors assume no synaptic evolution, i.e. instantaneous post-synaptic potentials, they can clarify that Eq (8) represents the population firing rate that later will be one of the fundamental variables of the neural mass model and call it r, as in the following. Furthermore, $s_i$ does not depend on the neuron index $i$ in a fully coupled network with homogenous coupling, as in the present case, this quantity is the same for all neurons. Please drop the index and call it r since it is the population firing rate.

      We thank the reviewer for this useful suggestion. We now clarify in the text that under the assumptions of all-to-all homogeneous coupling and no synaptic dynamics, s_i is identical for all neurons and can be interpreted as the population firing rate r. This connection is made explicit in the revised manuscript.

      “Under the assumption of instantaneous synaptic transmission and homogeneous all-to-all coupling, the synaptic activation variable (s<sub>i</sub>) is the same for all neurons and corresponds to the population firing rate, which we denote by (r)”
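
      The statement can be checked on a toy spike raster. This is a hedged sketch under stated assumptions: spikes are binned, coupling weights are all equal to 1/N with self-coupling included, and the function names and the example raster are hypothetical rather than taken from the manuscript.

```python
def population_rate(spike_raster, dt):
    """Spikes per neuron per unit time in each bin; spike_raster[i][k] is 1 if neuron i spiked in bin k."""
    n_neurons = len(spike_raster)
    n_bins = len(spike_raster[0])
    return [sum(row[k] for row in spike_raster) / (n_neurons * dt) for k in range(n_bins)]

def synaptic_activation(spike_raster, dt, target):
    """Input to neuron `target` with instantaneous synapses and equal all-to-all weights.

    Every presynaptic weight is 1/N, so `target` never enters the sum:
    that is exactly why the result is the same for all neurons.
    """
    n_neurons = len(spike_raster)
    n_bins = len(spike_raster[0])
    weight = 1.0 / n_neurons
    return [sum(weight * row[k] for row in spike_raster) / dt for k in range(n_bins)]

raster = [
    [1, 0, 0],
    [1, 0, 1],
    [0, 0, 1],
    [1, 0, 0],
]
dt = 0.5
r = population_rate(raster, dt)
inputs = [synaptic_activation(raster, dt, i) for i in range(len(raster))]
print("population rate r:", r)
print("identical for every neuron:", all(s_i == r for s_i in inputs))
```

      With heterogeneous weights or synaptic dynamics the sum would depend on the target neuron, and the index on s could no longer be dropped.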

      (Q19) Line 564-567 - Here the network model is incomplete, it is not sufficient that the authors report the evolution equation for the membrane potential Eq (9). They should report the evolution equation for the gate variable n and for the potassium concentration as done in Eq (1). This request is fundamental because it is unclear from the present formulation which are the variables that are microscopic (associated with the single neuron evolution) and which are global (common to all the neurons). This is a fundamental aspect and it should be clarified. I guess that n will depend on the neuron index $i$, while the potassium concentration it is unclear how the authors will consider them, global or local. I guess that the internal density should depend on the neuron index $i$ or not ? Anyway, I would like to know exactly which network model has been simulated e.g. to obtain the results reported in Figure 3.

      We thank the reviewer for this essential clarification request. In the revised manuscript, we now explicitly state the full network model, including the evolution equations for the gating variable n_i and potassium variables. While in some simulations we consider the full microscopic model involving 4N variables (where each neuron has its own V_i, n_i, Δ[K+]int_i, [K+]g_i), for the mean-field reduction and mesoscopic comparisons we assume that the gating and potassium variables are shared across neurons. This assumption is consistent with prior work (e.g., Chen & Campbell) and is biophysically justified in the case of potassium due to its fast spatial equilibration in extracellular space. We also now mention this explicitly in the Limitations.

      (Q20) Continuity equation - Lines 568 - 597 - This part can be largely simplified and rewritten, as a matter of fact, the authors consider the gate variable n, the potassium concentrations as global (collective variables) depending on mean field values of <V> they can directly start from eq 20, by stating that they assume that the other variables (n, $\Delta[K^+]_{int}$, $[K^+]_g$) are collective variables, common to all the neurons, and that depends only on mean field variables as <V> or r. This has been done in many previous cases since the Ott-Antonsen Ansatz can be applied whenever the potential evolution is driven by quadratic terms and in the presence of mean field variables, the first indication of this was reported in 1993 by Watanabe and Strogatz for phase oscillators :

      [R12] Watanabe, Shinya, and Steven H. Strogatz. "Integrability of a globally coupled oscillator array." Physical review letters 70.16 (1993): 2391.

      Anyway, this approach has been previously employed to derive a neural mass model for networks of QIF neurons in the presence of various further neuronal variables (ranging from slow currents to plastic evolution of the couplings) describing more biologically realistic situations, see references [R1-R7] above. I strongly encourage the authors to reformulate their approach in a simpler and clearer manner, particularly interesting is for them the article [R6] by Guerriero et al, the authors examine exactly the same model as in Ref [95] [Chen, L. & Campbell, S. A. Exact mean-field models for spiking neural networks with adaptation. Journal of Computational Neuroscience 50 (4), 445-469 (2022)]. However, they solve the problem in a much more simple way, I encourage the authors to follow this approach.

      We thank the reviewer for the constructive suggestion. We acknowledge that, under the assumption that n, Δ[K+]int, and [K+]g are collective variables shared across the neuronal population, one could directly begin from Eq. (20) and proceed using the simpler approaches found in Guerriero et al. [R6] or related works [R1–R7]. However, we chose to retain the Chen & Campbell formalism, with additional clarification regarding the mesoscopic nature of the gating variable, as it explicitly highlights the key approximations used in the derivation, which may be beneficial for readers seeking to extend the method. See also general response to reviewer 2 at the beginning.

      (Q21) Eq (26) -- I do not think the authors can estimate explicitly <n(t)> from the equation (26), as they do for the mean membrane potential and the firing rate. This is just a formal expression representing a collective variable, I do not think that <n> will coincide with the average of the values of n_i for each neuron. Please discuss this point, and in this case show that <n> indeed coincides with the average of all of the values of the single neuron gate variable n_i.

      We thank the reviewer for raising this important point. We agree that Eq. (26) is more formal than operational, as ⟨n(t)⟩ is not directly derived from the continuity equation in the same way as ⟨V⟩ or the firing rate r. Rather, it reflects our mean-field assumption that the gating variable evolves as a collective population-averaged quantity, governed by the dynamics of the average membrane potential. In our formulation, n is treated as a global variable shared across neurons, and thus ⟨n(t)⟩ effectively is the gating variable in the neural mass model — rather than the result of averaging heterogeneous n_i. We have clarified this distinction in the text to avoid suggesting that Eq. (26) provides an explicit estimate of microscopic gating dynamics.

      “Unlike the mean membrane potential ⟨V⟩ and the firing rate (r), which can be explicitly derived from the continuity equation under the Lorentzian assumption, the expression for ⟨n(t)⟩ in Eq. (26) is formal. In our mean-field model, the gating variable (n) is treated as a global population variable, evolving deterministically as a function of the average membrane potential. Therefore ⟨n(t)⟩ corresponds to the collective gating variable assumed to be shared by all neurons, and is not computed by averaging distinct microscopic (n<sub>i</sub>) values.”

      (Q22) Mean-field dynamics for the gating variable - All this sub-section is in my opinion not useful, if the authors assume from the beginning that <n(t)> is a global variable. Indeed in the end they write for <n(t)> the evolution equation Eq (30) which is the same equation as for the single neuron gate variable (1) but for the mean values of n and <V>. I suggest removing this sub-section.

      We thank the reviewer for this suggestion. We agree that, under the assumption that n is a global collective variable, the resulting equation for ⟨n(t)⟩ is equivalent in form to the single-neuron gating equation, driven by the average membrane potential. However, we chose to retain this subsection to explicitly demonstrate how the gating dynamics enter into the mean-field formulation, especially for readers less familiar with this type of reduction. This step also mirrors the structure of the derivation used for other state variables in the model and maintains clarity for potential extensions where n may not be strictly global.

      (Q23) Line 696 - here an equation reference is lost.

      Thank you for pointing this out. We have corrected the text and restored the missing equation reference in the revised manuscript.

      (Q24) Eqs (36) -(37) -- Since the variables r and x entered in Eq (36) are essentially the same as Eq (25), apart from a constant R/pi, the use of two different names complicated in a useless manner an already complicated expression, Please decide to use everywhere r or x and then proceed consequently this applies also to Eq (37). This will also allow us to rewrite the equation in x or r in a more compact form.

      As noted in our response to Reviewer 1, point 14, we have revised Eq. (37) to ensure consistency in notation by replacing x with r throughout.

      (Q25) Eq (37) - This equation is written in a manner that is not careful enough, apart from that the authors are passed now from (x,y) to (pi*r/R,V) , therefore they should substitute everywhere x with r. Furthermore, the equation for the derivative of V is confusing, the authors should use the same approximate expression employed in eq (36) that makes explicit the quadratic dependence on V itself, otherwise, I believe that the equation is incorrect.

      In the same response to Reviewer 1, point 14, we also clarified the expression for \dot{V} in Eq. (37): we reintroduced the full current-based formulation (as in Eq. 16), reversing the quadratic approximation used earlier. This is now explicitly stated in the text, and we have improved the equation presentation to avoid confusion.

      (Q26) Eq (37) below line 708 - From this expression, it is clear that the gate variable n and the potassium variables are ruled exactly by the same equations as for the single neuron Eq (1) and that the Lorentzian Ansatz enter only in the rewriting of the evolution of the membrane potentials of the neurons in the network. In the end, the authors are doing exactly the same approximation made by many other authors [R1-R7], that these variables are collective, i.e. they are the same for all neurons, and in particular n=n(V) is a function of the mean membrane potential V. The mean field model that the authors derive corresponds to a microscopic model where the single neurons are heterogenous only in the intrinsic currents $\eta_i$, but they are all driven by collective variables, like n(V) and the potassium variables that are identical for all neurons. This should be clarified.

      We agree with the reviewer's conclusion and, as stated in the previous responses, we now explicitly acknowledge that n and the two slow variables are considered mesoscopic variables for the mean-field derivation, while for the spiking network, n remains microscopic.

    1. Author response:

      We are writing to provide our provisional response to the public reviews. We note that the reviewers’ comments focus primarily on strengthening technical rigor and quantitative interpretation. We have designed the planned revisions to directly address the reviewers’ major concerns and to strengthen the study’s evidentiary basis. We plan to submit a revised manuscript for the final Version of Record.

      For clarity, we summarize below the major new experiments and analyses that address the reviewers’ primary concerns:

      (1) Validation of Tracking Parameters (Reviewers 1 & 3): We will re-analyze our single molecule tracking data with tighter gap-time allowances (0 seconds) to demonstrate the robustness of our interpretations of short- and long-lived kinetics. We will also generate a supplementary movie with binding trajectories superimposed directly on detected molecules to visually confirm tracking robustness.

      (2) Photobleaching & Two-State Controls (Reviewers 1 & 3): We will report per-cell photobleaching lifetimes derived from our global fluorescence decay. To strengthen this analysis, we will include supplementary measurements using a H2B-HaloTag control under matched imaging conditions and perform single-molecule tracking of GATA2 zinc-finger deletion mutants (N-terminal, C-terminal, and double) as a binding-deficient functional control.

      (3) Protein Expression & Labeling Efficiency (Reviewers 1 & 2): To address concerns about transgene expression and competition with endogenous proteins, we will quantify Halo-GATA2 levels in G1E-ER4 and HPC7 cells and SNAP-GATA2 levels in primary cells using standardized titration methods with established Halo-CTCF and SNAP-RPB1 reference systems.

      (4) Integration of SMT and CUT&Tag (Reviewer 3): We have conducted a quantitative fold-change analysis of our existing CUT&Tag dataset to complement our single-molecule kinetics.

      However, as detailed in our specific response below (R3 point 5), we emphasize that directly integrating population-level genomic occupancy measurements with single-cell kinetic measurements is not straightforward. We will therefore frame the relationship between these datasets as a conceptual consistency check rather than a strict quantitative integration. This quantitative analysis supports and refines the Early-restricted peak set, identifying a high confidence strict subset consistent with the broader presence/absence-defined set described in Figure 5 of the manuscript (see Author response images 1–3 and our response to R3 point 7).

      (5) Characterization of the GATA2-SNAP Mouse (Reviewer 3): We have characterized hematopoietic populations in the homozygous knock-in mouse, including lymphoid (CD3<sup>+</sup>/CD4<sup>+</sup>/CD8<sup>+</sup>/B220<sup>+</sup>/CD19<sup>+</sup>), myeloid (CD11b<sup>+</sup>/Gr1<sup>+</sup>), and erythroid (Ter119<sup>+</sup>) compartments. These data, presented in Author response image 4, indicate that normal mature hematopoietic output is preserved across genotypes. Statistical caveats are described in the corresponding figure legend and in our response to R3 point 8.

      Public Reviews:

      Reviewer 1 (Public review):

      (1) Two binding states: justification and controls

      The authors propose two states of GATA2 binding. Are there only two states? Some long-binding duration distributions here are very long-tailed (e.g., Figure 2D middle), suggesting a possible third state. The authors must explain how they determined that two states provide the best fit and how they classified short versus long binding. Controls should be included for long-lived and short-lived binding (e.g., histone proteins, HaloTag-NLS, or a binding-deficient GATA2 mutant).

      Agreed in part; we will attempt the requested binding-deficient control using existing GATA2 deletion constructs, complemented by GRID and H2B-HaloTag controls.

      We will clarify that the two-state framework is an operational model rather than a claim that GATA2 can occupy only two physical states. This approach is widely used in SMT studies of chromatin-associated transcription factors and transcription machinery (Gebhardt et al., 2013; Liu et al., 2014; Hansen et al., 2017; Kenworthy et al., 2022). In particular, Ling et al. (Science, 2026) recently used two-exponential survival-probability fitting across 58 Halo-tagged transcription-associated proteins to distinguish transient and stable chromatin-binding populations, while explicitly noting that the simplified two-state model provides a tractable framework even when the underlying physical behavior may be more heterogeneous.

      We agree that our current two-state model may under-represent the diversity of GATA2 chromatin-binding populations in single cells. However, even within this simplified framework, the existing analysis already indicates increased upper-tail dispersion of kinetic measurements (e.g., residence time and/or percentage of stable events) at the single-cell level in early erythroid cells. To support the goodness-of-fit metrics from our two-state fitting, as Reviewer 3 recommends, we will provide a supplementary table containing confidence intervals for the rate parameters and an F-test metric describing the differences between one- and two-state fits.
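
      As a minimal sketch of the kind of one- versus two-state comparison mentioned above, the standard F statistic for nested least-squares fits can be written as follows. The function names, the parameterization of the survival curves, and the parameter counts (one rate for the single-exponential model; one fraction and two rates for the two-exponential model) are illustrative assumptions and are not taken from the STRAP pipeline.

```python
import math


def one_state_survival(t, k):
    """Single-exponential survival probability S(t) = exp(-k * t)."""
    return math.exp(-k * t)


def two_state_survival(t, frac_short, k_short, k_long):
    """Two-exponential survival: a short-lived plus a long-lived population."""
    return (frac_short * math.exp(-k_short * t)
            + (1.0 - frac_short) * math.exp(-k_long * t))


def residual_sum_of_squares(observed, predicted):
    """Sum of squared residuals between observed and fitted survival values."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))


def nested_f_statistic(rss_simple, n_params_simple,
                       rss_full, n_params_full, n_points):
    """F statistic for comparing a simple model nested inside a fuller one.

    F = ((RSS_simple - RSS_full) / (p_full - p_simple))
        / (RSS_full / (n_points - p_full))
    """
    df_num = n_params_full - n_params_simple
    df_den = n_points - n_params_full
    if df_num <= 0 or df_den <= 0:
        raise ValueError("degrees of freedom must be positive")
    return ((rss_simple - rss_full) / df_num) / (rss_full / df_den)
```

      A large F value relative to the F(p_full - p_simple, n_points - p_full) distribution would favor the two-exponential model; the residual sums of squares would come from whichever fitting routine is actually used.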

      To determine whether additional binding states exist, we will perform GRID (Genuine Rate Identification from Distributions), which does not bias the model toward a particular number of states and, in our experience across multiple proteins, yields fits with 3-5 binding populations. However, we have found that in many cases, GRID requires aggregating binding events from multiple cells to achieve consistently robust fits for the populations of relatively rare, long-lived (>~30 sec) binding events. Therefore, GRID will assess whether additional populations exist, but we will lose the ability to analyze changes in the cell populations at the single-cell level.

      We will include the multi-state analysis as a new supplementary figure. We will additionally clarify in the Results and Methods exactly how short- and long-lived binding events are classified (1-second threshold consistent with prior single-molecule frameworks for transcription-factor chromatin interactions; Gebhardt et al., 2013; Liu et al., 2014; Kenworthy et al., 2022) and direct the reviewer to these passages.

      For the requested controls, we will include H2B-HaloTag imaging under matched conditions as a long-lived reference for both photobleaching correction and as a positive control for stable chromatin association, addressing R1 point 2 and R3 point 1 simultaneously.

      We will also attempt to address the reviewer’s request for a binding-deficient control. We have lentiviral constructs in hand that encode GATA2 with a C-terminal zinc-finger deletion (which removes the primary DNA-binding domain), an N-terminal zinc-finger deletion, and a double deletion. We will perform single-molecule tracking of these mutants in the engineered cell systems and test whether removing GATA2’s specific DNA-binding capacity produces the predicted reduction in long-lived chromatin engagement, providing a functional perturbation control. The interpretation of these experiments will depend on the mutants expressing and localizing appropriately, which we will validate before drawing kinetic conclusions. We note that an analogous binding-deficient mutant cannot be examined in the physiological context of the Gata2SNAP knock-in mouse, and we will frame the cell-line mutant analyses accordingly. Together with GRID and the H2B-HaloTag control, these mutants provide complementary lines of validation for the two-state kinetic framework.

      (2) Photophysical and focal-plane artifacts

      The authors should exclude contributions from (i) photobleaching, (ii) blinking, and (iii) Z-axis motion. Describe and quantify the photobleaching correction. Provide analyses or controls that distinguish true dissociation events from photophysical blinking/bleaching or axial motion.

      Agreed.

      We will substantially expand the methodological description and provide three new pieces of supplementary analysis:

      - Photobleaching: A per-cell photobleaching-rate distribution will be plotted for each cell type and differentiation stage, and photobleach-corrected residence-time values will be reported alongside apparent values in the relevant figures. We will also perform H2B-HaloTag imaging under matched illumination, exposure, and dye conditions in each cell line as a long-lived chromatin-bound reference, establishing per-cell-type bleach lifetimes to which the GATA2 measurements can be referenced. This approach follows recent SMT precedent in which H2B decay was used to correct residence-time measurements for photobleaching, chromatin and nuclear motion, microscope drift, defocalization, and dye photophysics (Ling et al., Science 2026). The right-censoring photobleach-correction model used in our analysis will be described in detail in the revised Methods, including parameter values and per-cell handling.

      - Blinking: The STRAP single-particle tracking pipeline already accommodates fluorophore blinking when linking trajectories across successive frames, following the multiple-target-tracing framework of Sergé et al. (Nature Methods, 2008). This use of short gap-frame allowances to avoid artificially splitting trajectories due to fluorophore blinking or transient defocalization is consistent with recent live-cell SMT studies of chromatin-associated factors (Ling et al., Science 2026). We will add an explicit statement to the Methods describing how blinking-tolerant linkage parameters are set, and we will reanalyze representative datasets with stricter maximum off-frame settings to ensure this parameter does not drive our conclusions (also addressing R3 point 6).

      - Z-axis motion: Given our 500-ms exposure and the ~500-nm axial detection range of the HiLo configuration, axial loss is expected to be a minor contributor. We will quantify this indirectly by plotting, as a supplementary analysis, the maximum in-plane 2D spatial exploration of each binding trajectory, defined as the long-axis diameter of the 2D trajectory envelope. Although this does not directly measure z-position, it serves as a control for large apparent displacements that could reflect molecules moving out of the HiLo detection volume and demonstrates that observed dissociation events are not dominated by axial drift.
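
      One possible reading of the "long-axis diameter of the 2D trajectory envelope" described in the last item is the largest distance between any two localizations of a trajectory. The sketch below computes that quantity; the function name and the interpretation are assumptions for illustration, and the output is in whatever length unit the input coordinates use.

```python
import math
from itertools import combinations


def trajectory_extent(points):
    """Largest pairwise distance between the (x, y) localizations of one trajectory.

    Returns 0.0 for trajectories with fewer than two localizations.
    """
    if len(points) < 2:
        return 0.0
    return max(math.dist(p, q) for p, q in combinations(points, 2))
```

      For example, `trajectory_extent([(0, 0), (3, 4), (1, 1)])` evaluates to 5.0, the distance between the first two points.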

      Representative photobleaching traces from individual cells (lowest, highest, and median bleach rates) will be included to support the single-molecule interpretation (also addresses R1 point 5).

      (3) HiLo illumination and nuclear region sampled

      HiLo is sensitive to illumination angle: slight changes sample different nuclear regions. Explain how the HiLo angle was controlled and confirmed comparable across cells and conditions.

      Agreed.

      We will add a Methods subsection describing our HiLo illumination procedure. In brief, we started at a TIRF-supercritical angle and reduced it toward epifluorescence just enough to achieve high imaging depth while minimizing out-of-focus background signal. Within each biological system (cell line or primary cells), the TIRF angle was held constant across Basal, Early, and Late conditions to ensure direct comparability of kinetic measurements across stages.

      (4) Quantification of event counts and long-binding durations

      The number of binding events and the duration of long-binding events are influenced by imaging conditions. Provide a more detailed analysis of how these variables were controlled and assess the sensitivity of the results to detection and tracking parameters.

      Agreed.

      We will (i) normalize per-cell binding-event counts to nuclear cross-sectional area (extracted from the segmented nuclear masks already in the STRAP pipeline) to control for differences in nuclear size; (ii) report the tracking-parameter sensitivity sweep described above; and (iii) confirm in the revised Methods that all imaging conditions (laser power, exposure, dye concentration, sample preparation) were held constant across stages and cell types, consistent with the existing manuscript text. Per the Reviewing Editor’s guidance, the planned labeling-efficiency and absolute-molecule-quantification experiments will further constrain the interpretation of binding-event counts across conditions.

      (5) Evidence that spots are single molecules

      Provide evidence that spots represent single molecules.

      Agreed.

      We will include a small number of per-event intensity traces from our STRAP tracking output, selected to illustrate the single-step photobleaching behavior characteristic of single-molecule emission (intensity remains approximately constant during the binding event and then drops to background in a single step). The nuclear-fluorescence measurements from the planned labeling-titration experiment will also allow us to confirm that bound-spot densities are consistent with single-molecule occupancy at the labeled fraction used for tracking.

      (6) Description of the spot-analysis pipeline

      The Methods should include a detailed STRAP pipeline description.

      Partially agreed; the existing STRAP reference is appropriate, but the Methods will be expanded.

      STRAP (Haque & Coleman, 2025) is a consolidated, automated implementation of two well-established, previously published frameworks: SLIMfast / multiple-target tracing (Sergé et al., 2008) and evalSPT (Normanno et al., 2015), both of which are cited in the original manuscript. We will expand the Methods to describe the parameter set used in our analysis (detection thresholds, linking radii, gap-frame allowance, photobleaching correction model) so that readers can assess the analysis without referring exclusively to the STRAP manuscript and code repository, while preserving the cited STRAP reference for the full algorithmic description. We respectfully suggest that a complete pipeline description duplicating Haque & Coleman (2025) would not be appropriate in a primary research article.

      (7) Differences among cell systems

      The three cell systems yield notably different results. Provide a more detailed explanation for these differences.

      Agreed.

      We will also explicitly describe the caveats of the engineered systems versus the native GATA2-SNAP primary-cell system, in which endogenous GATA2-SNAP remains under physiological regulation. Specifically, we will discuss how variables such as the GATA1-null background, ectopic forced nuclear import of GATA1-ERT, and ectopic GATA2-Halo in G1E-ER4 cells, as well as ectopic GATA2-Halo, endogenous GATA1, and cytokine signaling in HPC7 cells, likely contribute to the observed differences in signatures.

      Reviewer 2 (Public review):

      (1) Expression levels of the GATA2-HaloTag transgene

      Determine the expression levels of the GATA2-HaloTag transgene over the course of differentiation under the conditions used for single-molecule imaging.

      Agreed.

      This is the central concern flagged by the Reviewing Editor. For each cell line (G1E-ER4 and HPC7), we will (i) measure total nuclear GATA2-Halo fluorescence per cell under matched acquisition conditions and (ii) convert this fluorescence intensity to absolute molecules per cell using a Halo-CTCF/U2OS reference standard (Cattoglio et al., 2019; absolute CTCF abundance quantification applied previously by our group). This will provide per-cell GATA2-Halo molecule counts at each differentiation stage (Basal, Early, Late). For the primary GATA2-SNAP cells, we will perform the analogous comparison against a SNAP-RPB1/U2OS standard.
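
      The conversion from fluorescence to absolute molecule number can be illustrated with a simple proportional calculation. This is a sketch under stated assumptions: fluorescence is taken to scale linearly with molecule number, the sample and the reference standard are assumed to be labeled and imaged identically, and the function name, arguments, and any numbers passed to it are hypothetical.

```python
def molecules_per_cell(sample_intensity, reference_intensity,
                       reference_molecules, background=0.0):
    """Estimate molecules per cell by scaling against a reference standard.

    sample_intensity, reference_intensity: integrated nuclear fluorescence
    (same arbitrary units, same acquisition settings).
    reference_molecules: known molecules per cell for the reference line.
    background: signal of unlabeled cells, subtracted from both intensities.
    """
    sample = sample_intensity - background
    reference = reference_intensity - background
    if reference <= 0:
        raise ValueError("reference intensity must exceed background")
    return reference_molecules * sample / reference
```

      With placeholder values, `molecules_per_cell(600, 1100, 100000, background=100)` gives 50000.0, that is, half the reference abundance for half the background-subtracted signal.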

      (2) Fraction of molecules labeled

      Carry out a titration of the HaloTag ligand and compare the amount of labeled protein under single-molecule imaging conditions to that of saturating labeling.

      Agreed.

      We will perform HaloTag-ligand and SNAP-tag-ligand titrations in each cell type, comparing nuclear fluorescence under the limiting-label conditions used for single-molecule tracking with that under saturating labeling. This will yield a per-cell-type labeled fraction and allow us to confirm that comparisons of binding-event counts across conditions are not confounded by differences in labeling efficiency. The labeled-fraction values will be reported in a new supplementary figure and incorporated into our quantification of binding-event rates.
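
      A labeled fraction of this kind is a ratio of background-subtracted intensities. The sketch below shows that ratio and one hypothetical way of using it to rescale event counts; the function names are illustrative, and the rescaling assumes that detected events scale in proportion to the labeled fraction, which the titration itself would need to support.

```python
def labeled_fraction(limiting_intensity, saturating_intensity, background=0.0):
    """Fraction of tagged protein labeled under limiting-ligand conditions."""
    saturating = saturating_intensity - background
    if saturating <= 0:
        raise ValueError("saturating intensity must exceed background")
    return (limiting_intensity - background) / saturating


def events_scaled_to_full_labeling(observed_events, fraction):
    """Rescale an observed event count to what full labeling would imply."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return observed_events / fraction
```

      Comparing the scaled counts across cell types would then separate differences in binding-event frequency from differences in labeling efficiency.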

      (3) Robust single-particle tracking

      Show images of particle trajectories or movies superimposing trajectories on imaging data.

      Agreed.

      We will generate visualizations of selected long-lived binding events with single-particle trajectories overlaid on the imaging data — using a multi-frame color overlay (e.g., five sequential frames in distinct colors superimposed) so that linkage of the spot across frames is visually unambiguous — and include them as a new supplementary figure or movie. Examples will be drawn from each cell system to demonstrate consistent tracking quality.

      Reviewer 3 (Public review):

      (1) Photobleaching correction; per-cell bleach lifetimes

      Report the per-stage (or per-cell) photobleaching lifetimes and the photobleach-corrected residence time values alongside apparent values, ideally with an H2B-Halo control.

      Agreed.

      Addressed by the photobleach-rate distribution and H2B-HaloTag control analyses described under R1 point 2. The supplementary figure will explicitly compare per-cell bleach lifetimes across stages, report photobleach-corrected residence-time values alongside apparent values and include H2B-HaloTag controls under matched conditions in each cell line.
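
      The relationship between apparent and photobleach-corrected residence times can be illustrated with the commonly used rate-subtraction form, in which the apparent loss rate is treated as the sum of the dissociation rate and the bleach rate. This is a generic sketch and not the right-censoring model referred to under R1 point 2; the function name and the example values are assumptions.

```python
def corrected_residence_time(tau_apparent_s, tau_bleach_s):
    """Photobleach-corrected residence time by rate subtraction.

    Assumes k_apparent = k_off + k_bleach, so
    tau_corrected = 1 / (1 / tau_apparent - 1 / tau_bleach).
    """
    k_apparent = 1.0 / tau_apparent_s
    k_bleach = 1.0 / tau_bleach_s
    k_off = k_apparent - k_bleach
    if k_off <= 0:
        # The apparent lifetime is bleach-limited; no finite correction exists.
        raise ValueError("apparent lifetime must be shorter than bleach lifetime")
    return 1.0 / k_off
```

      For instance, an apparent residence time of 10 s with a 40 s bleach lifetime corresponds to a corrected value of about 13.3 s, which shows why the correction matters most when the two lifetimes are of similar magnitude.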

      (2) Mechanistic differences across systems

      The three systems show qualitatively different signatures: residence time change in G1E-ER4, bound fraction expansion in HPC7 and primary cells. Reporting an on-rate proxy alongside k_off would help.

      Agreed.

      Addressed by the cross-system kinetic framing described under R1 point 7 and by the GRID state-spectrum analysis described under R1 point 1. We will explicitly frame the three systems in terms of underlying kinetic mechanism in both Results and Discussion, following the conceptual distinction emphasized by Ling et al. (Science 2026) in which residence time reports binding stability once engaged, whereas changes in bound fraction or event frequency can indicate altered association/recruitment efficiency. In this framework, the G1E-ER4 residence-time signature is consistent with reduced dissociation (a longer-lived bound state), while the long-lived-fraction expansion in HPC7 and primary cells is consistent with an increased target-search efficiency or specific-binding-competent pool. Alongside the GRID-derived state-spectrum analysis, we will report an apparent engagement-rate proxy calculated as binding events per unit imaging time normalized to detectable molecule number; this proxy is an approximation, not a direct k_on measurement, as accurate determination of k_on from single-molecule tracking requires concentration-dependent on-rate experiments that are outside the scope of the present study. We thank the reviewer for this suggestion, which we agree sharpens rather than alters the central message.

      (3) Per-cell GATA2 concentration and the uncoupling claim

      Quantify total nuclear GATA2-Halo signal per cell across stages; for primary cells, a western blot or quantitative immunofluorescence on flow-sorted populations would make the uncoupling argument more defensible.

      Agreed.

      For the cell lines, the per-cell nuclear GATA2-Halo quantification described in our response to R2 point 1 addresses this point.

      For primary cells, where the biological claim is strongest, we will exploit the endogenous Gata2-SNAP knock-in itself as a quantitative reporter of total GATA2 protein. Specifically, we will label flow-sorted CD71/Ter119 populations from Gata2-SNAP mouse bone marrow with SNAP-Cell 647-SiR at saturating concentration in a parallel acquisition to the limiting-label single-molecule tracking experiment. Total nuclear SNAP-GATA2 fluorescence at saturating labeling provides a measure of endogenous GATA2 abundance per cell at each erythroid stage, in the same chemistry used for our single-molecule measurements, and will be benchmarked against a SNAP-RPB1/U2OS reference standard for absolute molecule counting. This approach (i) measures the protein of interest in the labeling chemistry already established in this study; (ii) avoids reliance on quantitative immunofluorescence, which we have not been able to validate under our flow-sorted-cell conditions; and (iii) extends the same analytical framework (saturating versus limiting labeling, with U2OS reference standards) across cell lines and primary cells. Quantitative western blotting on flow-sorted populations remains an alternative we will consider if specifically requested by the reviewers.

      (4) Single-cell distribution analysis

      Distribution-based statistics (K-S test, mixture model) rather than (or alongside) mean-based ANOVA, particularly for the Early populations, which look potentially bimodal.

      Agreed.

      We will perform Kolmogorov–Smirnov and Gaussian mixture model analyses of the single-cell long-lived fraction and residence-time distributions across stages, reporting these alongside the existing Welch ANOVA results in a new supplementary figure. This analysis is consistent with the conceptual framework cited in the manuscript (Wheat et al., 2020; Palii et al., 2019) for probabilistic hematopoietic transitions and may reveal subpopulation structure underlying the Early-stage signal. The GRID analysis further complements this by formally testing whether multi-state mixture models are statistically preferred at each stage. However, GRID analysis requires aggregating binding events across cells, which limits our ability to monitor changes in population dispersion at the single-cell level.
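
      For reference, the two-sample Kolmogorov–Smirnov statistic is the largest vertical gap between two empirical cumulative distributions. The pure-Python sketch below computes only that statistic, with an illustrative function name; it does not compute a P value and does not attempt the mixture-model part of the analysis.

```python
from bisect import bisect_right


def ks_two_sample_statistic(sample_a, sample_b):
    """Two-sample K-S statistic D = max |ECDF_a(x) - ECDF_b(x)|.

    The maximum is attained at an observed value, so it is enough to
    evaluate both empirical CDFs at every data point.
    """
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d
```

      Applied to per-cell long-lived fractions from two stages, D ranges from 0 (identical empirical distributions) to 1 (no overlap). In practice a library routine such as `scipy.stats.ks_2samp` would also supply a P value.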

      (5) Quantitative integration of CUT&Tag with SMT

      Attempt a back-of-the-envelope calculation of whether the residence-time or fraction changes are quantitatively consistent with the acquisition of the 1,167 Early-restricted sites.

      Partially agreed; will attempt an order-of-magnitude framing.

      We thank the reviewer for this thoughtful suggestion. We agree that more explicit framing of the quantitative relationship between the two datasets will strengthen the integration. We will add a paragraph to the Discussion presenting an order-of-magnitude calculation linking the observed residence-time and long-lived-fraction changes to the steady-state occupancy increase predicted at competent regulatory sites, with explicit caveats regarding (i) the inherently semi-quantitative nature of CUT&Tag signal and (ii) the assumptions required to translate population-averaged occupancy into the genome-wide site count observed. For the G1E-ER4 cells, we observe relatively minor shifts in population-mean behavior as single-cell dispersion increases. Therefore, it may be difficult to directly link population-based measurements (e.g., CUT&Tag) with single-cell kinetic measurements (SPT). This distinction between occupancy and dynamics is consistent with recent systematic SMT analysis of the eukaryotic transcription machinery, in which factors appearing persistently associated in ensemble genomic assays were shown to exchange on second-scale timescales in living cells (Ling et al., Science 2026), emphasizing that population genomic occupancy and single-molecule residence time are complementary but not directly interchangeable measurements. Closing this gap rigorously is a major hurdle for the field and will require substantial technology development on quantitative single-cell CUT&Tag occupancy measurements. We will therefore frame our analysis as a consistency check rather than a strict quantitative integration. The reviewer notes that this analysis “does not change the central message; it sharpens it,” and we agree.

      (6) Short-lived kinetic interpretation and tracking parameters

      The 1.5 s gap allowance is long relative to the short-lived residence times in primary cells. A sensitivity analysis with tighter gap parameters would help. Also clarify how slowing of search reconciles with increased binding events at Early.

      Agreed.

      Addressed by the tracking-parameter sensitivity analysis described under R1 point 2. We apologize for the lack of clarity in our original description of the gap allowance. Our current maximum off-frame parameter is set to 2 frames, corresponding to a 0.5-s gap allowance. We will rerun the tracking analysis on representative datasets using a maximum off-frame parameter of 1, corresponding to no missed frames, and will report the resulting residence-time distributions alongside the original analysis to demonstrate robustness. We will also clarify in the Results and Discussion how changes in short-lived binding kinetics are reconciled with the increase in detectable binding events at the Early stage, drawing on the apparent engagement-rate proxy interpreted alongside the GRID-derived state-spectrum analysis.
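
      The relationship between the off-frame setting and the gap allowance can be made concrete with a short sketch. It assumes the convention implied above (a setting of 1 links only consecutive frames, so a setting of 2 tolerates one missed frame) and a 0.5-s frame interval; the function names are illustrative and do not correspond to STRAP code.

```python
def gap_allowance_seconds(max_off_frame, frame_interval_s):
    """Longest tolerated dark gap, assuming a setting of 1 means no missed frames."""
    if max_off_frame < 1:
        raise ValueError("max_off_frame must be at least 1")
    return (max_off_frame - 1) * frame_interval_s


def split_into_events(detected_frames, max_off_frame):
    """Group frame indices of one spot into binding events.

    Two detections are linked when their frame indices differ by at most
    max_off_frame; a larger jump starts a new event.
    """
    events = []
    for frame in sorted(detected_frames):
        if events and frame - events[-1][-1] <= max_off_frame:
            events[-1].append(frame)
        else:
            events.append([frame])
    return events
```

      With detections in frames 1, 2, 4, 5, and 8, a setting of 2 yields two events (frames 1 to 5 and frame 8), whereas a setting of 1 yields three, which illustrates how a tighter setting can only shorten or split apparent binding events.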

      (7) CUT&Tag peak definition and quantitative analysis

      Report (a) signal intensity distribution at the 1,167 sites across stages (scatter or density plot beyond the heatmap) or (b) differential binding analysis (e.g., DESeq2). State replicate count and overlap of Early-restricted sets across replicates.

      Agreed; normalized fold-change analysis completed, with replicate-aware differential binding analysis planned if additional replicates are generated.

      We have performed a normalized count-based fold-change analysis of the union peak set from the existing GATA2 CUT&Tag dataset (14,468 peaks) using the goodpeaks framework previously used in our group, yielding per-peak log2 fold-change values and discrete dynamic-status calls (Gained / Lost / Unchanged at |log2FC| ≥ 2) for each of the two transitions (Basal → Early at 0 vs 2 h, and Early → Late at 2 vs 24 h). This provides a conservative quantitative complement to the presence/absence peak-calling analysis presented in Figure 5; if additional replicate data are generated, we will perform replicate-aware differential binding analysis (DiffBind/DESeq2; Love et al., 2014; Stark & Brown, 2011) and report replicate overlap. This analysis addresses option (b) of the reviewer’s request and also enables the visualization requested in option (a) as a cross-stage scatter (Author response image 1). We present the quantitative analysis as a supplement to the presence/absence-defined Early-restricted set in Figure 5 of the manuscript, providing two orthogonal lines of evidence for the same biology. We note that the CUT&Tag experiments were initially performed as a validation step to confirm that the tagged GATA2-Halo constructs recapitulate endogenous chromatin-binding behavior, including appropriate genomic localization and expected GATA switch dynamics. This validation supports the conclusion that the observed single-molecule kinetics reflect physiologically relevant GATA2 engagement. Having established this, we subsequently extended the dataset to perform the quantitative analyses presented here.

      Quantitative findings.

      - 384 peaks were Gained (|log2FC| ≥ 2) at the Basal → Early transition.

      - 1,006 peaks were Lost over the same transition.

      - 178 peaks were Gained at Basal → Early and subsequently Lost at Early → Late, defining the strict differentially-restricted Early set (Author response image 1, red points). This set represents the higher-confidence subset of the manuscript’s broader presence/absence-defined Early-restricted set (n = 1,167; defined as MACS2 peaks at q < 0.01 present at Early but absent at Basal and Late).

      - 200 peaks were Gained at Early and retained at Late, indicating stable acquisition.

      - 49 peaks were acquired only at the Late stage.

      The discrepancy between the broader presence/absence set (1,167) and the strict differential set (178) reflects the analytical choice the reviewer raised: presence/absence calls based on a peak-significance threshold are sensitive to near-threshold peaks, whereas differential analysis with a fold-change cutoff captures only sites with quantitatively pronounced stage-restricted enrichment. We interpret these as two complementary definitions: the broader set captures all peaks meeting a stage-specific peak-calling criterion, and the strict subset isolates the most quantitatively dynamic core of that population.
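
      The fold-change classification described above can be sketched as follows. The |log2FC| ≥ 2 threshold and the Gained / Lost / Unchanged labels come from the text; the pseudocount, the function names, and the input format (one normalized count per stage) are assumptions for illustration and do not describe the goodpeaks implementation.

```python
import math


def log2_fold_change(count_later, count_earlier, pseudocount=1.0):
    """log2 of later over earlier normalized counts; positive means increased signal."""
    return math.log2((count_later + pseudocount) / (count_earlier + pseudocount))


def dynamic_status(log2fc, threshold=2.0):
    """Label one transition as Gained, Lost, or Unchanged."""
    if log2fc >= threshold:
        return "Gained"
    if log2fc <= -threshold:
        return "Lost"
    return "Unchanged"


def is_early_restricted(basal, early, late, threshold=2.0):
    """True when a peak is Gained at Basal -> Early and Lost at Early -> Late."""
    gained = dynamic_status(log2_fold_change(early, basal), threshold) == "Gained"
    lost = dynamic_status(log2_fold_change(late, early), threshold) == "Lost"
    return gained and lost
```

      A peak with hypothetical normalized counts of 9, 39, and 9 at Basal, Early, and Late has log2 fold changes of +2 and -2 and would therefore fall in the strict Early-restricted class, whereas a peak that is merely present at Early with a smaller fold change would not.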

      Importantly, the three named example loci shown in Figure 5D of the manuscript, Nono (promoter-proximal), Nr3c1 (intron 2), and Gata3 (distal intergenic), all survive the strict differential criterion (each shows |log<sub>2</sub>FC| ≥ 2 in both transitions, consistent with a clean Gained-then-Lost signature). The published example panel therefore represents the high-confidence intersection of both definitions, supporting the robustness of the manuscript’s selected illustrative cases.

      We will explicitly state the number of CUT&Tag replicates per stage in the revised Methods and figure legends. Where the differential analysis is currently based on a single replicate per stage, we will explicitly note this and treat the strict subset as a conservative confirmatory analysis. An additional replicate is under consideration for the full revision, and if performed, overlap of Early-restricted calls across replicates will be reported.

      Motif cross-validation against a matched-GC background using HOMER and/or MEME-ChIP is planned for the strict differential subset and will be reported alongside the original SeqPos analysis in the revised Figure 5F or its supplement.

      Author response image 1.

      Cross-stage log<sub>2</sub> fold-change scatter for GATA2 CUT&Tag peaks. Each point represents a single peak in the union peak set (n = 14,468). The x-axis shows the log2 fold change from Basal (0 h) to Early (2 h); the y-axis shows the log2 fold change from Early (2 h) to Late (24 h). The sign convention follows the field-standard direction (positive log2FC = increased signal at the later time point). Peaks are colored by dynamic-status classification: unchanged/other (gray; n = 9,794); Lost at Early (blue; n = 109); Gained at Early and retained at Late (orange; n = 200); acquired only at Late (teal; n = 49); and Early-restricted, defined as Gained at Early and Lost at Late with |log2FC| ≥ 2 in both transitions (red; n = 178). The Early-restricted population occupies the lower-right quadrant, consistent with a transient kinetic peak of GATA2 binding.

      Author response image 2.

      Density representation of GATA2 CUT&Tag peak dynamics with Early-restricted peaks highlighted.

      Author response image 2 is shown for illustrative reference and is not annotated with a separate legend; it presents the same data as Author response image 1 in a hexbin density format to emphasize the bulk of unchanged peaks at the origin and the spatial separation of the Early-restricted set.

      Author response image 3.

      Genomic-annotation comparison of newly acquired GATA2 binding at Early. Stacked-bar comparison of genomic annotations (ChIPseeker classification) for two definitions of the newly acquired GATA2 peaks at the Early erythroid stage: all peaks Gained at Basal → Early (orange; n = 384) and the strict Early-restricted subset (Gained then Lost; red; n = 178). Annotation categories shown: Promoter (≤1 kb of TSS), Intron, Distal Intergenic, and Other (Exon, 5′/3′ UTR, Downstream). Both peak sets contain substantial promoter-proximal and distal/intronic components, consistent with the two-subclass model described in Figure 5E–G of the manuscript (GATA2-only promoter-proximal peaks with GATA/RUNX motifs, and GATA2/GATA1 co-bound distal peaks with composite GATA/E-box motifs). The strict subset shows a higher proportion of intronic and distal-intergenic sites and a lower proportion of promoter-proximal sites than the full Gained set; this difference will be discussed transparently in the revised Results. Motif analysis (HOMER/MEME-ChIP, planned for the full revision) will be performed on both peak sets to confirm that the GATA/RUNX and GATA/E-box subclass signatures are preserved.

      (8) Knock-in mouse hematopoietic validation

      A brief characterization of basic hematopoietic parameters in homozygotes (CBC, LSK/HSPC frequencies, or colony assays) would confirm the tagged allele is physiological.

      Agreed; data acquired and analyzed.

      We have characterized mature trilineage hematopoietic populations in whole bone marrow from wild-type, heterozygous (Gata2Het), and homozygous (Gata2Homo) Gata2-SNAP knock-in mice (n = 5 per genotype). Bone marrow cells were stained for myeloid (CD11b<sup>+</sup> Gr1<sup>+</sup>), lymphoid (CD3<sup>+</sup>/CD4<sup>+</sup>/CD8<sup>+</sup>/B220<sup>+</sup>/CD19<sup>+</sup>), and erythroid (Ter119<sup>+</sup>) markers and analyzed by flow cytometry. Lineage frequencies are shown as percentages of live bone marrow cells in a new Figure Supplement in the revised manuscript.

      For myeloid and erythroid populations, omnibus one-way ANOVA detected no significant differences across genotypes (Myeloid: F(2,12) = 2.616, P = 0.1140; Erythroid: F(2,12) = 0.4943, P = 0.6219). Dunnett’s multiple-comparisons test against the WT control did not detect significant pairwise differences for either knock-in genotype (Myeloid: WT vs Het P = 0.1351, WT vs Homo P = 0.9926; Erythroid: WT vs Het P = 0.7017, WT vs Homo P = 0.9602).

      For the lymphoid compartment, although the omnibus ANOVA reached significance (F(2,12) = 6.690, P = 0.0112), no pairwise comparison against WT remained significant after multiple-comparisons correction (Dunnett’s adjusted P values: WT vs Het = 0.1217; WT vs Homo = 0.2078). We therefore interpret this result conservatively. Brown-Forsythe and Bartlett’s tests showed no significant differences in variance across genotypes (P = 0.1423 and P = 0.0908), so the result is not attributable to unequal variances. We do not interpret these data as indicating an unambiguous lymphoid phenotype in either heterozygous or homozygous Gata2-SNAP mice; this interpretation is consistent with the broader pattern across all three lineages, in which no pairwise comparison against WT survives multiple-comparisons correction. We will note in the figure legend and in the Results text that more granular HSPC-compartment analysis (LSK, MPP, lineage-restricted progenitor frequencies) and a complete blood count (CBC) remain valuable directions for future characterization of this resource and will be considered for the full revision if specifically requested.
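
      For readers who wish to check where the F(2,12) degrees of freedom come from, the omnibus statistic can be sketched as follows. This is a minimal illustration and not the analysis pipeline used for the data above: the function name `one_way_anova` and the three input lists are hypothetical placeholders, and the numbers are not measurements from this study. P values and Dunnett's adjustment would additionally require a statistics package (for example, recent SciPy releases provide `scipy.stats.f_oneway` and `scipy.stats.dunnett`).

```python
from statistics import mean

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of lists, one list of observations per genotype.
    """
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical placeholder values (percent of live cells), NOT study data
wt = [30.1, 28.4, 31.0, 29.5, 30.7]
het = [27.9, 29.2, 28.1, 26.8, 28.8]
homo = [30.4, 29.9, 31.2, 28.7, 30.0]

f_stat, df_b, df_w = one_way_anova([wt, het, homo])
print(f"F({df_b},{df_w}) = {f_stat:.3f}")
```

      With three genotypes and five mice per genotype, the between-group and within-group degrees of freedom are 3 − 1 = 2 and 15 − 3 = 12, matching the F(2,12) notation used in this response.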

      Author response image 4.

      Bone marrow trilineage frequencies in Gata2-SNAP knock-in mice. Bone marrow was harvested from the femurs and tibias of wild-type (WT), heterozygous (Gata2Het), and homozygous (Gata2Homo) Gata2-SNAP knock-in mice (n = 5 per genotype; mixed sex; 12–14 weeks). After ACK lysis, cells were stained for myeloid (CD11b<sup>+</sup> Gr1<sup>+</sup>), lymphoid (CD3<sup>+</sup>/CD4<sup>+</sup>/CD8<sup>+</sup>/B220<sup>+</sup>/CD19<sup>+</sup>), and erythroid (Ter119<sup>+</sup>) markers and analyzed by flow cytometry. Each dot represents one mouse, and horizontal bars indicate genotype means. Statistical results: Myeloid: ANOVA F(2,12) = 2.616, P = 0.1140; Dunnett’s adjusted P values WT vs Het = 0.1351, WT vs Homo = 0.9926. Lymphoid: ANOVA F(2,12) = 6.690, P = 0.0112 (omnibus); Dunnett’s adjusted P values WT vs Het = 0.1217, WT vs Homo = 0.2078. Erythroid: ANOVA F(2,12) = 0.4943, P = 0.6219; Dunnett’s adjusted P values WT vs Het = 0.7017, WT vs Homo = 0.9602. Brown-Forsythe and Bartlett’s tests for unequal variance were non-significant in all three lineages. Although the lymphoid omnibus ANOVA reached nominal significance, no pairwise comparison with WT remained significant after multiple-comparison correction; we therefore interpret this result conservatively (see response to R3 point 8).

      Summary

      We thank the editors and the three reviewers for the constructive and detailed assessment. The planned revisions consist of:

      - Four new experiments [planned] (HaloTag/SNAP labeling efficiency and absolute molecule counts via U2OS reference standards; H2B-HaloTag photobleaching reference; per-cell quantification of total endogenous GATA2 in flow-sorted primary Gata2-SNAP populations via saturating SNAP-tag labeling, benchmarked against a SNAP-RPB1/U2OS reference standard; single-molecule tracking of GATA2 N-terminal, C-terminal, and double zinc-finger deletion mutants in the engineered cell systems as a binding-deficient functional control).

      - Six analyses of existing data (GRID multi-state fitting [planned]; per-cell bleach-rate distributions and photobleach-corrected residence times [planned]; tracking-parameter sensitivity [planned]; nuclear-area normalization and total-displacement controls [planned]; normalized fold-change CUT&Tag analysis [completed; motif cross-validation planned], presented in Author response images 1–3; distribution-based single-cell statistics [planned]).

      - One previously-acquired dataset [completed] (trilineage hematopoietic flow cytometry of homozygous Gata2-SNAP knock-in mice; presented in Author response image 4 with full statistical detail).

      - Substantial revisions to text and figures [planned] to address statistical reporting, methodological description, mechanistic framing of cross-system differences, and refinement of the Figure 6 schematic.

      With respect to the requested binding-deficient single-molecule control, we will attempt to address this directly using sequence-validated lentiviral constructs in hand encoding GATA2 mutants lacking the C-terminal zinc finger, the N-terminal zinc finger, or both. These mutant analyses will be complemented by GRID multi-state analysis and H2B-HaloTag controls, providing converging lines of validation for the two-state kinetic framework. We note that an analogous mutant cannot be examined in the physiological context of the Gata2-SNAP knock-in mouse, and we will frame the cell-line mutant analyses accordingly.

      We believe these revisions directly address the editors’ specific guidance regarding labeling efficiency and methodological clarification. We thank the editors and reviewers for their time and look forward to submitting the revised manuscript.

      References cited in this response:

      References listed below are cited in this provisional response in support of the planned analyses and methodology.

      Cattoglio, C., Pustova, I., Walther, N., Ho, J. J., Hantsche-Grininger, M., Inouye, C. J., Hossain, M. J., Dailey, G. M., Ellenberg, J., Darzacq, X., Tjian, R., & Hansen, A. S. (2019). Determining cellular CTCF and cohesin abundances to constrain 3D genome models. eLife, 8, e40164. https://doi.org/10.7554/eLife.40164

      Gebhardt, J. C. M., Suter, D. M., Roy, R., Zhao, Z. W., Chapman, A. R., Basu, S., Maniatis, T., & Xie, X. S. (2013). Single-molecule imaging of transcription factor binding to DNA in live mammalian cells. Nature Methods, 10(5), 421–426. https://doi.org/10.1038/nmeth.2411

      Hansen, A. S., Pustova, I., Cattoglio, C., Tjian, R., & Darzacq, X. (2017). CTCF and cohesin regulate chromatin loop stability with distinct dynamics. eLife, 6, e25776. https://doi.org/10.7554/eLife.25776

      Haque, N., & Coleman, R. A. (2025). Dynamic transcription pre-initiation complex assembly governs initiation efficiency. bioRxiv. https://doi.org/10.1101/2025.05.07.652662

      Heinz, S., Benner, C., Spann, N., Bertolino, E., Lin, Y. C., Laslo, P., Cheng, J. X., Murre, C., Singh, H., & Glass, C. K. (2010). Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Molecular Cell, 38(4), 576–589. https://doi.org/10.1016/j.molcel.2010.05.004

      Kaya-Okur, H. S., Wu, S. J., Codomo, C. A., Pledger, E. S., Bryson, T. D., Henikoff, J. G., Ahmad, K., & Henikoff, S. (2019). CUT&Tag for efficient epigenomic profiling of small samples and single cells. Nature Communications, 10(1), 1930. https://doi.org/10.1038/s41467-019-09982-5

      Kenworthy, C. A., Haque, N., Liou, S.-H., Chandris, P., Wong, V., Dziuba, P., Lavis, L. D., Liu, W.-L., Singer, R. H., & Coleman, R. A. (2022). Bromodomains regulate dynamic targeting of the PBAF chromatin-remodeling complex to chromatin hubs. Biophysical Journal, 121(9), 1738–1752. https://doi.org/10.1016/j.bpj.2022.03.027

      Ling, Y. H., Liang, C., Wang, S., & Wu, C. (2026). Live-cell single-molecule dynamics of eukaryotic RNA polymerase machineries. Science, 391, eads0960. https://doi.org/10.1126/science.ads0960

      Liu, Z., Legant, W. R., Chen, B.-C., Li, L., Grimm, J. B., Lavis, L. D., Betzig, E., & Tjian, R. (2014). 3D imaging of Sox2 enhancer clusters in embryonic stem cells. eLife, 3, e04236. https://doi.org/10.7554/eLife.04236

      Loeffler, D., Wang, W., Hopf, A., Hilsenbeck, O., Bourgine, P. E., Rudolf, F., Martin, I., & Schroeder, T. (2018). Mouse and human HSPC immobilization in liquid culture by CD43- or CD44-antibody coating. Blood, 131(13), 1425–1429. https://doi.org/10.1182/blood-2017-07-794131

      Love, M. I., Huber, W., & Anders, S. (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology, 15(12), 550. https://doi.org/10.1186/s13059-014-0550-8

      Machanick, P., & Bailey, T. L. (2011). MEME-ChIP: motif analysis of large DNA datasets. Bioinformatics, 27(12), 1696–1697. https://doi.org/10.1093/bioinformatics/btr189

      Normanno, D., Boudarène, L., Dugast-Darzacq, C., Chen, J., Richter, C., Proux, F., Bénichou, O., Voituriez, R., Darzacq, X., & Dahan, M. (2015). Probing the target search of DNA-binding proteins in mammalian cells using TetR as model searcher. Nature Communications, 6, 7357. https://doi.org/10.1038/ncomms8357

      Palii, C. G., Cheng, Q., Gillespie, M. A., Shannon, P., Mazurczyk, M., Napolitani, G., Price, N. D., Ranish, J. A., Morrissey, E., Higgs, D. R., & Brand, M. (2019). Single-cell proteomics reveal that quantitative changes in co-expressed lineage-specific transcription factors determine cell fate. Cell Stem Cell, 24(5), 812–825.e5. https://doi.org/10.1016/j.stem.2019.02.016

      Sergé, A., Bertaux, N., Rigneault, H., & Marguet, D. (2008). Dynamic multiple-target tracing to probe spatiotemporal cartography of cell membranes. Nature Methods, 5(8), 687–694. https://doi.org/10.1038/nmeth.1233

      Stark, R., & Brown, G. D. (2011). DiffBind: Differential binding analysis of ChIP-Seq peak data. Bioconductor. http://bioconductor.org/packages/release/bioc/html/DiffBind.html

      Taylor, S. J., Stauber, J., Bohorquez, O., Tatsumi, G., Kumari, R., Chakraborty, J., Bartholdy, B. A., Schwenger, E., Sundaravel, S., Farahat, A. A., Dutta, A., Koche, R. P., Steidl, U., & Wheat, J. C. (2024). Pharmacological restriction of genomic binding sites redirects PU.1 pioneer transcription factor activity. Nature Genetics, 56(10), 2213–2227. https://doi.org/10.1038/s41588-024-01911-7

      Wheat, J. C., Salsman, J., Reekie, I., Mathhwala, A., Black, K. L., Tiedt, R., Shroff, H., & Steidl, U. (2020). Single-molecule imaging of transcription dynamics in somatic stem cells. Nature, 583(7816), 431–436. https://doi.org/10.1038/s41586-020-2432-4

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      This manuscript is very interesting and timely. By introducing the critical effects of desolvation barriers and solvent (water)-separated minima into the implicit-solvent potentials (of mean force, PMFs) for coarse-grained molecular dynamics simulations of biomolecular liquid-liquid phase separation (LLPS), this work fills a gap that should be apparent to researchers of protein folding in the past couple of decades but has so far escaped deserved attention such that these basic features of aqueous solvation have seldom, though not never, been invoked in recent studies of biomolecular condensates. Although the present paper deals almost exclusively with homopolymers, this work can be a foundation for the future development of a new, more physical coarse-grained interaction scheme for simulating amino acid sequence-dependent effects, which I presume is the authors' ongoing or next endeavor. The results presented in this manuscript are highly valuable.

      We thank the reviewer for all the insightful comments.

      However, there is room for improvement in the authors' description of (i) the broader impact of effects of desolvation barrier and solvent-separated minimum in the thermodynamics of biomolecular condensates, especially with regard to the ramifications on hydrostatic pressure-dependent effects; (ii) the physical implication of using a 20-parameter hydropathy scale rather than a 210-parameter pairwise amino acid interaction scheme; and (iii) temperature-dependent effects, including the authors' discussion of "enthalpic" and "entropic" contributions. In all these aspects, the authors' discussion should be put in a more comprehensive context of the existing literature. At a few other places, the description of the methods and results should be clarified as well. Accordingly, the authors should revise the manuscript to address the following items thoroughly within the revised manuscript (not merely in the response letter) with the additional references mentioned below included in the revised discussion:

      (1) In several places, e.g., on line 77 (p.2), the authors appear to suggest that "implicit-solvent representation" is the origin of the deficiency in commonly utilized coarse-grained potentials that this study is aiming to rectify. But desolvation barriers and solvent-separated minima are also features of implicit-solvent representations; they are just features that should be incorporated in more accurate implicit-solvent potentials. This point is stated quite clearly and accurately in the Abstract (p.1) but not consistently in the rest of the text. The authors should check the entire text carefully to ensure that a coherent, accurate perspective is presented.

      We thank the reviewer for the insightful comment and suggestion. Our intention in this work is not to depart from the implicit-solvent modeling framework but to incorporate desolvation effects within it. In the revised manuscript, we will revise the text to ensure this point is presented clearly and consistently throughout the paper.

      (2) In the discussion of the importance of desolvation barriers and solvent-separated minima in the Introduction (pp.1-3), connections should be drawn to recent works that utilize these PMF features to rationalize hydrostatic pressure (P)-modulated effects on biomolecular LLPS, including the P-dependent reentrant phase separation of alpha elastin; see Cinar et al. (2019) Chem Eur J 25:13049 (https://chemistry-europe.onlinelibrary.wiley.com/doi/full/10.1002/chem.201902210) and references therein, especially discussions around Figures 10, 11 & 13 in this reference.

      We thank the reviewer for bringing these references to our attention. The hydrostatic pressure modulated effects on LLPS provide important context for understanding the physical significance of desolvation barriers and solvent‑separated minima. In the revised manuscript, we will expand the literature discussion by incorporating previous studies on pressure‑modulated phase separation.

      (3) In the lower panels of Figures 2D, E (p.5), what do the differently colored small circles in the double-minimum free energy profiles represent? Does the color shading have the same meaning as that in the upper panels? If so, what do the positions of the circles on the free energy profile represent? The authors should clarify this.

      We thank the reviewer for the suggestion to improve the clarity of the figure. In the lower panels of Figures 2D and 2E, the colored dots were intended solely as a qualitative illustration of the populations of residue‑pair configurations along the effective energy surface. Their colors are not related to the color scale used in the phase diagrams shown in the upper panels. We will modify the color scheme to improve clarity.

      (4) The discussion regarding entropy and enthalpy around Figure 2 is quite confusing as it stands. What do the authors mean exactly by the association of entropy or enthalpy with the desolvation barrier of the solvent-separated minimum? Are they referring to conformational entropy?

      We apologize for the confusion. When the desolvation barrier is high, configurations with inter‑residue distances corresponding to the barrier region become difficult to access, thereby reducing the conformational entropy of the condensed phase. This interpretation is supported by Figure 2—figure supplement 1C, where increasing the desolvation barrier decreases the population in the barrier region of the radial distribution function, indicating that fewer residue‑pair configurations are sampled there. In contrast, increasing the depth of the solvent‑separated minimum makes the condensed phase more energetically favorable. In the revised manuscript, we will incorporate this discussion to improve clarity.
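
      The qualitative argument above can be illustrated with a short sketch. It assumes a hypothetical three-feature pair potential (a contact minimum, a desolvation barrier, and a solvent-separated minimum, each modeled as a Gaussian) and uses the low-density estimate g(r) ≈ exp(−U(r)/kT). The functional form, the parameter names, and the default values are illustrative assumptions and are not the potential used in the manuscript.

```python
import math

def pair_potential(r, eps_contact=1.0, eps_barrier=0.5, eps_ssm=0.3,
                   r_contact=1.0, r_barrier=1.3, r_ssm=1.6, width=0.1):
    """Hypothetical pair potential in units of kT (illustration only).

    Contact minimum at r_contact, desolvation barrier at r_barrier,
    solvent-separated minimum (ssm) at r_ssm.
    """
    def gauss(center):
        return math.exp(-((r - center) ** 2) / (2.0 * width ** 2))

    return (-eps_contact * gauss(r_contact)
            + eps_barrier * gauss(r_barrier)
            - eps_ssm * gauss(r_ssm))

def boltzmann_weight(r, **params):
    """Low-density estimate of g(r): exp(-U(r)/kT), with U already in kT."""
    return math.exp(-pair_potential(r, **params))

# Relative population at the barrier position for a low and a high barrier
low_barrier = boltzmann_weight(1.3, eps_barrier=0.5)
high_barrier = boltzmann_weight(1.3, eps_barrier=2.0)
print(low_barrier > high_barrier)
```

      In this toy picture, raising the barrier lowers the weight of configurations in the barrier region, whereas deepening the solvent-separated minimum raises the weight near `r_ssm`; dense-phase many-body effects are not captured by this pairwise estimate.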

      (5) Do the authors assume that the PMF (effective implicit-solvent potential) is a purely enthalpic term? It appears to be the authors' assumption. If so, the assumption has to be stated clearly in their discussion of "entropy" vs "enthalpy" around Figure 2.

      We thank the reviewer for highlighting this important point. In this work, the PMF profile is constructed from atomistic simulation results, and thus both entropic and enthalpic contributions shape the overall PMF. In the revised manuscript, we will clarify that the PMF represents a free‑energy profile along the intermolecular distance and therefore incorporates enthalpic and entropic contributions from the solute, solvent, and configurational degrees of freedom.

      (6) Closely related to points 3-5 above, it should be stated clearly that the "temperature" used in the authors' simulations does not represent experimental temperature if the authors are using purely enthalpic effective potentials because PMFs are in fact temperature-dependent. This clarification is necessary to avoid misunderstanding. In this regard, it should be noted that temperature-dependent effective interactions have been used for modeling biomolecular condensates in analytical theory (Lin, Song, Forman-Kay & Chan, J Mol Liq 2017, already in the citation list) as well as in coarse-grained molecular dynamics simulations [Dignon et al. (2019) ACS Cent Sci 5:821-830 (https://pubs.acs.org/doi/10.1021/acscentsci.9b00102); Chakravarti & Joseph (2025) Protein Sci 34:e70284 (https://onlinelibrary.wiley.com/doi/10.1002/pro.70284)]. The latter two studies, not cited currently, are particularly relevant and thus should be cited because the authors may wish to incorporate temperature-dependent features in their ongoing or future effort in constructing a more comprehensive coarse-grained interaction scheme for biomolecular LLPS simulation.

      We thank the reviewer for raising this important point. We agree that PMFs and the corresponding effective interactions should be temperature dependent, and therefore the simulation temperature in our current temperature-independent CG potential cannot be interpreted as a fully quantitative experimental temperature. In the revised manuscript, we will clarify the above point. We will also expand the discussion to include previous studies that introduced temperature-dependent effective interactions in analytical theories and coarse-grained simulations of biomolecular condensates.

      (7) In tackling "entropy" vs "enthalpy", it should be noted that the temperature dependence of the effective interactions entails an entropic contribution (which is itself temperature dependent) in addition to conformational entropy. As for the effective potential with desolvation barrier and solvent-separated minimum, it should be noted that the decomposition into entropic and enthalpic contributions at the direct contact, desolvation barrier, and solvent-separated minimum can be dramatically different, see, e.g., MacCallum et al. (2007) PNAS 104:6206-6210 (https://www.pnas.org/doi/full/10.1073/pnas.0605859104) and references therein.

      We agree that a temperature‑dependent PMF includes entropic contributions beyond the configurational entropy discussed around Figure 2. In the present manuscript, our discussion of entropy in that context refers specifically to the reduced accessible configurational space of residue‑pair states in the coarse‑grained ensemble, rather than to a full thermodynamic decomposition of the PMF. In the revised manuscript, we will make this distinction explicit. We will also note that the direct‑contact minimum, desolvation barrier, and solvent‑separated minimum may each have distinct enthalpic and entropic components, and that resolving these components would require additional temperature‑dependent PMF calculations. We will discuss this as a limitation of the current model and as a direction for future parameterization.
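
      For reference, the standard thermodynamic decomposition that such temperature-dependent calculations would target can be written as below. Here W(r;T) denotes the PMF at separation r and temperature T, and ΔT is a hypothetical temperature increment for a finite-difference estimate; this is a generic sketch of the textbook relations, not a procedure taken from the manuscript.

```latex
\begin{align}
\Delta S(r;T) &= -\left(\frac{\partial W(r;T)}{\partial T}\right)_{P}
  \approx -\frac{W(r;T+\Delta T) - W(r;T-\Delta T)}{2\,\Delta T}, \\
\Delta H(r;T) &= W(r;T) + T\,\Delta S(r;T).
\end{align}
```

      Evaluating these expressions separately at the direct-contact minimum, the desolvation barrier, and the solvent-separated minimum would show how the enthalpic and entropic components differ among the three features.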

      (8) P.7, line 340: The proportionality relation follows directly from the standard Flory-Huggins result T_c = T chi(T)/chi_c, thus the proportionality constant is exactly 1/chi_c. Is this the standard relation that the authors are invoking here? The authors should clarify this.

      We thank the reviewer for pointing this out. Yes, our argument uses the condition that chi_c is fixed at the critical point for a given chain length. We will revise the text to explicitly state this relation and add the missing intermediate step, so that the proportionality used in the manuscript is clearer.
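
      The intermediate step can be sketched as follows, under the assumption (ours, for illustration) that the interaction parameter is purely enthalpic, so that χ(T) = A/T with a temperature-independent constant A, and that N is the chain length in the standard Flory-Huggins polymer-solvent model:

```latex
\begin{align}
\chi(T) &= \frac{A}{T}, \qquad \chi(T_c) = \chi_c
  \;\Longrightarrow\; T_c = \frac{A}{\chi_c} = \frac{T\,\chi(T)}{\chi_c}, \\
\chi_c &= \frac{1}{2}\left(1 + \frac{1}{\sqrt{N}}\right)^{2}.
\end{align}
```

      Because χ_c depends only on N, the critical temperature is proportional to Tχ(T) with proportionality constant 1/χ_c for a fixed chain length.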

      (9) The study on dynamic consequences on pp.8-11 is interesting, but clarifications are necessary:

      (i) The vertical schematic in Figure 4A should be explained in detail in its entirety. As it stands, no explanation is provided either in the figure caption or in the text. In particular, what does "elasticity driven" refer to?

      (ii) The top snapshot in Figure 4A is labeled t_sim = 0 ns. Does it mean that the snapshot shown is the only chain configuration that the authors used to start the simulation, and that the snapshot does NOT represent the result of any time evolution, no matter how short the duration is? However, if that is the case, why is this snapshot identified with spinodal decomposition if it is not the product of a time evolution from a more homogeneous configuration?

      (iii) Related to (ii) - do the rectangular boxes shown represent the entire simulation box or just part of the box containing the polymer chains? One would imagine that if the top snapshot represents spinodal decomposition, the simulation would have been started at a more uniform distribution a short time prior? Why is this not the case?

      (iv) What precisely do the small yellow beads and black-colored springs in the zoom-in image of Figure 4E represent?

      We thank the reviewer for pointing out these unclear issues in Figure 4. In the revised manuscript, we will better explain the vertical schematic in Figure 4A, including the progression from the early growth of density fluctuations, to intermediate kinetic arrest, and finally to late-stage coarsening. We will also clarify that “elasticity driven” refers to the resistance to domain deformation caused by transient inter-chain network connectivity. We will clarify that t_sim = 0 denotes the time immediately after the temperature quench from the high-temperature homogeneous state to the low-temperature two-phase region. This snapshot is the post-quench initial configuration, while spinodal decomposition refers to the subsequent amplification of density fluctuations after the quench. The displayed snapshot is one representative trajectory, not the only initial configuration used in the simulations. The quantitative kinetic analysis was averaged over multiple independent trajectories. The rectangular box represents the entire simulation box. Although the system was equilibrated at high temperature before the quench, instantaneous density fluctuations remain, so the initial configuration is not perfectly uniform. In Figure 4E, the yellow beads represent interacting residue pairs. The black springs schematically represent the transient elastic network formed by these interactions, rather than a precise structural model.

      (10) In discussing dynamic effects, it is useful to draw connections to related works on the effect of chain flexibility on "aging" of condensate [Biswas & Potoyan (2024) PRX Life 2:023011 (https://journals.aps.org/prxlife/abstract/10.1103/PRXLife.2.023011)] and characterization of viscoelasticity in simulations of biomolecular condensates [Tejedor et al. (2023) J Phys Chem B 127:4441-4459 (https://pubs.acs.org/doi/10.1021/acs.jpcb.3c01292)], as the effects of desolvation can be explored further based on these prior works.

      We thank the reviewer for this helpful suggestion. In the revised Discussion, we will cite and discuss the related studies on condensate aging and viscoelasticity, including the effects of chain flexibility, sticker lifetime, desolvation, and transient network formation on condensate material properties. These works provide an important context for interpreting our dynamic results. We will clarify that desolvation may influence condensate dynamics not only by slowing local rearrangements, but also by modulating transient network connectivity, kinetic arrest, and viscoelastic relaxation.

      (11) Much of the present study is based on the original HPS formulation of Dignon et al. (2018). In this regard and also in anticipation of future development of improved interaction schemes, several issues should be stated and discussed, even if briefly:

      (i) The original HPS model has a basic shortcoming in accounting for the relative interaction strengths of, among others, arginine vs lysine residues [Das et al. (2020) PNAS 117:28795-28805 (https://www.pnas.org/doi/10.1073/pnas.2008122117)].

      (ii) Compared to 210-parameter pairwise interaction schemes, such as KH in Dignon et al. (2018) and Joseph et al. (2021), the 20-parameter interaction scheme is likely too restrictive to account for pairwise amino acid residue interactions [Wessén et al. (2022) J Phys Chem B 126:9222-9245 (https://pubs.acs.org/doi/10.1021/acs.jpcb.2c06181)].

      (iii) The height of the desolvation barrier may vary significantly for different amino acid residue pairs, see, e.g., Figure 11 of Cinar et al. (2019) mentioned above (and references therein). The authors should discuss these nuances in the revised version. They may also wish to take them into consideration in future investigations.

      We thank the reviewer for highlighting these important limitations of the original HPS-based framework. We agree that a 20-parameter hydropathy-scale model is limited in its ability to capture residue-pair-specific interactions, including well-established differences such as those between arginine and lysine. In the revised manuscript, we will explicitly discuss this limitation and cite the suggested studies on residue-specific and pairwise interaction schemes. We also agree that desolvation barriers and solvent-separated minima are likely to depend on amino-acid pair identity. In the present work, we employ a simplified, residue-independent desolvation parameterization to isolate the general thermodynamic and kinetic consequences of desolvation in coarse-grained LLPS simulations. In the revised Discussion, we will clarify this scope and emphasize that developing residue-pair-specific desolvation parameters, potentially within a 210-parameter interaction framework, is an important direction for future work.

      Reviewer #2 (Public review):

      Summary:

      This manuscript addresses an important and timely question in the molecular simulation of biomolecular condensates. Most residue-level coarse-grained models used for IDP phase separation employ implicit solvent and represent effective interactions through relatively simple pairwise potentials. While these models have been very useful, they usually do not explicitly distinguish direct contacts from solvent-separated interactions, nor do they include an energetic barrier associated with water removal. This manuscript attempts to address that limitation by introducing desolvation-inspired terms into coarse-grained models and examining their consequences for phase behavior, chain conformations, dense-phase packing, and dynamics.

      Strengths:

      The central idea is physically well motivated. Using a simple homopolymer model, the authors show that increasing the desolvation barrier suppresses phase separation, whereas stabilizing solvent-separated contacts enhances phase separation. They further show that solvent-separated interactions can reduce dense-phase over-compaction, which is a meaningful result given the known challenges in obtaining both accurate single-chain dimensions and realistic dense-phase properties from the same coarse-grained model. The finding that desolvation-like terms can reshape dense-phase packing without simply rescaling the overall interaction strength is interesting and could be useful for future model development. I also found the attempt to connect conformational changes across dilute and dense phases with thermal distance from the critical point to be intriguing. The dynamic analysis, including the FRAP-like simulations and the discussion of kinetic arrest during coarsening, adds another useful dimension to the work.

      We thank the reviewer for this positive and constructive assessment. We are encouraged that the reviewer found the central idea physically well motivated and recognized the value of introducing desolvation-inspired terms to distinguish direct contacts, solvent-separated interactions, and the energetic barrier associated with water removal in coarse-grained models of biomolecular condensates.

      Weaknesses:

      At the same time, there are several places where the manuscript would benefit from more careful framing. First, the desolvation terms are still effective coarse-grained parameters rather than a direct representation of water molecules. The language sometimes gives the impression that desolvation is being treated explicitly, whereas the model introduces desolvation-inspired effective interactions into an implicit-solvent framework.

      We agree that the current wording should more clearly reflect the nature of our model. The desolvation terms introduced in this work are effective coarse-grained interaction terms rather than an explicit molecular representation of water. In the revised manuscript, we will carefully revise the language throughout the article to describe the model as incorporating desolvation-inspired effective interactions within an implicit-solvent coarse-grained framework.

      Second, the conformational analysis is interesting, but the broader context of prior work on dilute-to-dense phase conformational reorganization of IDPs could be more clearly discussed. This would help clarify what is new in the present work, whether it is the conformational change itself, its dependence on desolvation terms, or the proposed scaling with distance from the critical point.

      We appreciate this suggestion. In the revised manuscript, we will place the conformational analysis in the context of prior work and discuss the observed conformational changes more explicitly from the perspective of desolvation-inspired interactions. We will also clarify the assumptions behind the scaling relation between conformational change and thermal distance from the critical point.

      Third, the dynamic results are potentially useful, but the manuscript should more clearly articulate what is nontrivial beyond the expected slowing of local rearrangements by an added barrier in the potential.

      We thank the reviewer for the suggestion. In the revised manuscript, we will clarify which aspects of the observed dynamics can be directly expected from the added desolvation barrier and which trends arise from the combined effects of desolvation on packing density, chain mobility, kinetic arrest, and coarsening.

      We again thank the editors and reviewers for their constructive comments and suggestions. We believe that the planned revisions will improve the precision of the model description, clarify the physical interpretation of the desolvation-inspired terms, expand the relevant literature context, and better define the scope and limitations of the current framework.

    1. Author response:

      We thank the Reviewers for their time and effort in reviewing our manuscript. We are particularly thankful for the literature recommendations of Reviewer 1 and the analysis ideas of Reviewer 2.

      We are glad that both Reviewers agree that the method we developed provides value to the field. We furthermore agree that our theoretical claims and conclusions could be supported by further analyses. Thus, we primarily plan to focus on this.

      We plan to strengthen our statements by:

      - Comparing our metrics to those of alternative learning processes and hypotheses

      - Running additional analyses, including ones using standardized learning scores, collapsed saccade likelihoods for learning-dependent and not-learning-dependent saccades, angular deviations instead of the binary update variable, and a breakdown of high-probability triplets into ones that end with a pattern element or a random one.

      - Adding further information regarding saccades, trials without saccades, and saccade starting points.

      Furthermore, we plan to strengthen our Methods section. Some of the Reviewers’ points potentially stem from our unclear description of the ASRT task, so the Task & Procedure section needs deeper and clearer explanations. Lastly, we will extend the Introduction, citing the literature recommended in the reviews, which could indeed provide further depth.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study presents valuable findings on the relationship between nutrient availability and NAD/NADH levels, which in turn regulate biomass production in cancer cells. The authors provide solid evidence to support their claims, offering insight into why it is difficult to predict which nutrients limit cancer cell growth: both cell type and nutrient availability together determine the oxidative capacity that constrains the synthesis of various metabolic intermediates. The manuscript will be of interest to researchers working in cancer and cell metabolism.

      We thank the eLife Editor for evaluating our manuscript and for the positive comments.

      Reviewer #1 (Public review):

      Summary:

      This manuscript investigates how cellular NAD/NADH ratios are controlled in cancer cell lines in vitro. The authors build on previous work, which shows that serine synthesis is sensitive to NAD/NADH ratios and PHGDH expression. Here, the authors demonstrate that serine synthesis is variable across a panel of cell lines, even when controlling for expression of serine synthesis enzymes such as PHGDH. The authors show that cellular NAD/NADH ratios correlate with the ability to synthesize serine and grow in serine-deprived environments when PHGDH levels remain constant. Investigating this variability in NAD/NADH ratios, the authors find that the cells that can positively respond to serine deprivation are able to increase oxygen consumption and cellular NAD/NADH ratios. Cells that do not increase oxygen consumption in response to serine deprivation do not increase NAD/NADH ratios and cannot grow well without serine. The authors go on to show that in cells with the ability to increase oxygen consumption upon serine deprivation, PHGDH expression alone is sufficient to fully restore growth-serine; in cells that cannot increase oxygen consumption, both PHGDH expression and interventions to increase NAD/NADH ratios are required to increase growth. Thus, cells need both PHGDH and NAD/NADH increases to maximize serine synthesis in response to serine deprivation. The authors previously showed that lipid synthesis likewise requires NAD regeneration. Interestingly, one cell line that does not increase oxygen consumption in response to serine limitation tends to increase oxygen consumption in response to lipid deprivation; accordingly, depriving this cell line of lipids increases the synthesis of serine. 
Together, these findings show that how cells respond to nutrient deprivation is highly variable and that the response to nutrient deprivation (for example, whether or not oxygen consumption is increased) will determine how well cells tolerate depletion of nutrients with related biosynthetic constraints. This work sheds light on the complexity of cancer cell metabolism and helps to explain why it is difficult to predict which nutrients will be limiting to any cancer cell type or environment.

      Strengths:

      (1) The authors use multiple interventions to manipulate NAD/NADH ratios in cells.

      (2) Experiments are well controlled and appropriately interpreted.

      Weaknesses:

      Overall the data support the conclusions of the manuscript. I have only two minor comments and suggestions:

      We thank Reviewer 1 for their insightful comments, which have helped us improve the manuscript.

      (1) Figure 2B/C: data are presented as relative to +serine, which shows how some cells respond to -serine, but may also be of interest to see how absolute (not relative) NAD/NADH levels correlate with serine synthesis and serine-independent proliferation. In other words, is it the dynamic increase in the ratio that is most important, or the absolute level of the ratio?

      We thank Reviewer 1 for raising this point about whether it is the absolute NAD+/NADH ratio, or the change in NAD+/NADH ratio, that is important for increasing serine synthesis and allowing proliferation under serine-depleted conditions. We reported relative ratios for accessibility to a general audience, but agree that the unnormalized ratios are informative and should be presented. We assessed the NAD+/NADH ratio using an enzymatic assay, which does not directly measure absolute concentrations of NAD+ or NADH (PMID: 26232225). However, we previously confirmed the assay is in the same linear range for both NAD+ and NADH, and thus is valid for assessing the NAD+/NADH ratio. We now provide the unnormalized NAD+/NADH ratio data in Supplementary Figure 2G of the revised manuscript. This shows that the cells considered exhibit a range of NAD+/NADH ratios, and redox responsive cells do not cluster in having a higher or lower NAD+/NADH ratio.

      To more formally answer Reviewer 1’s question about whether the absolute ratio or change in ratio is important for increasing serine synthesis, we measured the correlation coefficient between the unnormalized NAD+/NADH ratios and the proliferation rate of all examined cancer cells cultured with or without serine. These data are presented in Author response image 1. Of note, we find that there is a significant positive correlation between the raw values of the measured NAD+/NADH ratio and proliferation rate in both serine-replete (r = .371) and serine depleted (r = .562) conditions. However, this correlation is not strong, and when examining the cancer cells whose proliferation in serine depleted conditions cannot be fully explained by serine synthesis enzyme expression (Calu6, 8988T, A549, MIA PaCa-2, H1299, and HCT116), there is no significant correlation between the raw NAD+/NADH ratio and proliferation rate in serine depleted conditions. The association between the relative change in the NAD+/NADH ratio and proliferation rate is much stronger upon serine deprivation (r = .571), as presented in Figure 2C of the revised manuscript. This suggests that the dynamic increase in the ratio is more tightly linked to the change in serine synthesis rate and proliferation in serine depleted environments, and we discuss this point in the revised manuscript with the following text:

      “Of note, whether the NAD+/NADH ratio of a cell was more or less oxidized in serine-replete conditions was not predictive of response to serine withdrawal (Supplementary Figure 2G).” (Lines 163-165)

      Author response image 1.

      Correlations between unnormalized NAD+/NADH ratios and cell proliferation rates for (A) all cancer cells examined (Calu6, MCF7, MDA-MB-231, A549, 8988T, MIA PaCa-2, A375, H1299, HCT116, MDA-MB-231 with PHGDH overexpression) in serine-replete conditions, (B) all cancer cells examined in serine-depleted conditions, and (C) select cancer cells (labeled in gray) where serine synthesis enzyme protein expression does not fully explain proliferation in serine-depleted conditions. Pearson correlation coefficients and P values were calculated by simple linear regression, *p<0.05, **p<0.01. Data shown are means of three biological replicates ± SD.
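
      As an illustration of the statistic named in this caption, the following is a minimal sketch of a Pearson correlation coefficient computed in plain Python. The function name `pearson_r` and the input values are hypothetical and are not data from the study; the sketch does not reproduce the authors' analysis pipeline or their P-value calculation.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical values for illustration only (not data from the study).
ratios = [2.0, 3.5, 4.0, 5.5, 7.0]       # unnormalized NAD+/NADH ratios
doublings = [0.2, 0.5, 0.4, 0.8, 0.9]    # doublings per day
r = pearson_r(ratios, doublings)
```

      For a simple linear regression with one predictor, the regression's correlation coefficient is this same quantity, so a per-cell-line table of ratios and proliferation rates is all the input such a calculation needs.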

      (2) Line 177-178: the authors write, "We hypothesized that the elevated NAD+/NADH ratio represented a cellular response to make the NAD+/NADH ratio more oxidized to enable serine synthesis". I recommend modest edits to avoid anthropomorphizing. It is possible that the ratio responds for reasons yet to be determined and not necessarily because the cell is deliberately trying to enable serine synthesis.

      We thank Reviewer 1 for raising this point. We agree that our data do not show whether the ratio is elevated for the deliberate purpose of enabling serine synthesis and have edited the text accordingly with the following edit to that line of the revised manuscript:

      “We hypothesized that a more oxidized NAD+/NADH ratio could support greater serine synthesis and thus sought to identify the processes that increase the NAD+/NADH ratio in some but not all cancer cells.” (Lines 190-192)

      Reviewer #2 (Public review):

      In the manuscript "Cancer cells differentially modulate mitochondrial respiration to alter redox state and enable biomass synthesis in nutrient-limited environments", Chang et al investigate how cancer cells respond to the limitation of certain environmental nutrients by regulating the cellular NAD+/NADH ratio. They focus on serine and lipid metabolism, pathways known to be controlled by the NAD+/NADH ratio, and propose that changes in mitochondrial respiration in response to deprivation of these nutrients can influence the NAD+/NADH ratio, thereby impacting biomass synthesis.

      While the study is descriptive in nature and does not investigate specific molecular mechanisms that explain the crosstalk between nutrient availability and mitochondrial redox changes, the experimental component is robust, and the conclusions are well supported by the results. Some suggestions could further refine the conclusions and enhance the quality of the manuscript.

      We thank Reviewer 2 for their time and for their suggestions to improve the manuscript.

      Main critiques:

      (1) Throughout the manuscript, the authors utilise the number of cell doublings per day as an endpoint readout of cell proliferation. It would be advisable to include a quantification of the cell number and to display the proliferation rate over time. This would provide valuable insights into the timeline of cellular responses and avoid potential confounding effects associated with the use of Sulforhodamine B dye, an indirect measure of cell proliferation based on protein content, which may be influenced by some of the interventions. Furthermore, it will help determine whether specific treatments reduce cellular doublings resulting from cell death. This concern is particularly evident in treatments with rotenone, e.g., Fig. 1G, where the increase in doublings could be attributed to cell death.

      We thank the reviewer for this suggestion and agree that assessment of cell count provides additional information beyond Sulforhodamine B dye as an indirect measure of proliferation. To address this, we directly measured cell number over time using Incucyte Live-Cell imaging analysis applied to A549 and H1299 cells cultured with or without serine for 72 hours. Consistent with results using sulforhodamine B, A549 cells doubled at a rate of 0.874 per day and H1299 cells doubled at a rate of 1.034 per day in serine-replete conditions. In serine depleted conditions, A549 cells doubled at a rate of 0.205 per day while H1299 cells doubled at a rate of 0.544 per day. We have added the cell number measurements over time as well as the corresponding calculated doublings per day in Supplementary Figure 2D and Supplementary Figure 2E of the revised manuscript.
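
      For readers who want to relate cell counts to the doublings-per-day values quoted above, here is a minimal sketch assuming exponential growth between two counts. The function names and the example counts are hypothetical; only the 0.874 doublings per day figure comes from the response text.

```python
import math

def doublings_per_day(n_start, n_end, hours):
    """Population doublings per day between two cell counts, assuming exponential growth."""
    if n_start <= 0 or n_end <= 0 or hours <= 0:
        raise ValueError("counts and duration must be positive")
    return math.log2(n_end / n_start) / (hours / 24.0)

def doubling_time_hours(rate_per_day):
    """Convert a doublings-per-day rate into a doubling time in hours."""
    if rate_per_day <= 0:
        raise ValueError("rate must be positive")
    return 24.0 / rate_per_day

# Hypothetical counts: an 8-fold increase over 72 hours is 3 doublings in 3 days.
example_rate = doublings_per_day(10_000, 80_000, 72)

# Rate reported above for A549 cells in serine-replete conditions.
a549_doubling_time = doubling_time_hours(0.874)
```

      Under this assumption a rate of 0.874 doublings per day corresponds to a doubling time of roughly 27.5 hours.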

      We also agree with Reviewer 2 that serine deprivation and rotenone treatment could potentially impact cell viability, which might confound phenotypes, including NAD+/NADH ratio measurements. To assess whether serine deprivation and rotenone treatment cause cell death, we measured cell viability using Sytox Green after exposing cells to these conditions for 72 hours. We find that there is indeed more cell death in cells cultured without serine at most concentrations of rotenone. However, cell death did not exceed 4% in any of the conditions tested, suggesting this is not a major contributor to the cell doubling phenotypes. These data are now presented in Supplementary Figure 1C of the revised manuscript. However, in light of Reviewer 2’s comments, along with a comment from Reviewer 3 about whether rotenone induces ROS and cellular stress responses, we have decided to remove the proliferation data involving rotenone that were in Figure 1F and 1G of the original manuscript. The rationale is that the potential confounding impacts of rotenone on viability make interpreting the proliferation data difficult. Instead, we have focused Figure 1 of the revised manuscript on the observation that there is specifically a correlation between the cell NAD+/NADH ratio and serine synthesis.

      (2) The authors propose a model in which the deprivation of extracellular nutrients impacts mitochondrial respiration, which in turn increases the NAD+/NADH ratio and ultimately affects metabolic biosynthetic pathways that occur in the cytosol, such as serine biosynthesis. The mechanism by which nutrient availability is sensed and transmitted across different cellular compartments to regulate mitochondrial redox status remains unclear. This concern is particularly relevant for serine metabolism, as its synthesis occurs in the cytosol, but the authors connect it to mitochondrial respiration. Compartment-specific measurements of NAD+/NADH ratio would help to understand to what extent the redox state is affected by nutrients in the mitochondria and in the cytoplasm (see also minor critiques point 2). Moreover, the use of the genetic tool LbNox could be employed to manipulate the NAD+/NADH ratio in a compartment-specific manner, while also avoiding the toxicity of certain compounds, such as rotenone. This set of experiments would add depth to the investigation, which might otherwise appear too descriptive.

      (A) Compartment-specific measurements of NAD+/NADH ratio would help to understand to what extent the redox state is affected by nutrients in the mitochondria and in the cytoplasm

      The question of how nutrient availability is sensed and transmitted across cellular compartments to impact mitochondrial respiration is important. However, rigorous assessment of compartment-specific metabolism is quite challenging, as we are not aware of tools to accurately measure redox ratios in a compartment-specific manner. Direct assessment of cofactor levels in subcellular compartments requires long isolation times and is unlikely to be accurate (PMID: 27565352). Rapid immunopurification of mitochondria has been used to estimate metabolite levels and ratios, but accurate measurements are hindered by rapid oxidation of NADH to NAD+. The use of fluorescence lifetime imaging (FLIM) to monitor NADH levels does not allow for accurate monitoring of the NAD+/NADH ratio as NAD+ cannot be visualized and NADH cannot be distinguished from NADPH. Additionally, the resolution of FLIM to interrogate compartment-specific signals is limited (PMID: 38594590). Fluorescent sensors, such as SoNar, have been used to image the NAD+/NADH ratio in compartments, though SoNar is sensitive to pH changes, which vary across compartments, and it has been argued that these sensors are more suitable for qualitative, not quantitative, changes in the NAD+/NADH ratio (PMIDs: 25955212, 29181426). It has also been argued that sensors are not amenable to measurement of mitochondrial ratios, as the predicted ratios are too reduced for the range of the sensors. Given these technical limitations, we opted to attempt a rapid subcellular fractionation (~25 second process to separate cytoplasm and mitochondria) followed by enzyme-based measurements of the NAD+/NADH ratio (PMID: 36883551), acknowledging the limitations of this approach. We find that across both A549 and H1299 cells, the mitochondrial NAD+/NADH ratio is lower than the cytosolic NAD+/NADH ratio, as expected.
Using this approach, we find that in A549 cells, serine depletion leads to a decreased cytosolic NAD+/NADH ratio compared to serine-replete conditions while having no impact on the mitochondrial NAD+/NADH ratio. On the other hand, serine depletion leads to an elevated cytosolic NAD+/NADH ratio in H1299 cells while also having no impact on the mitochondrial NAD+/NADH ratio. In parallel, we used extracellular pyruvate exposure as a positive control, which should support cytosolic NAD+ regeneration, and rotenone as a negative control, which should suppress mitochondrial NAD+ regeneration. We show that pyruvate led to an elevated cytosolic NAD+/NADH ratio whereas rotenone treatment led to a decreased cytosolic NAD+/NADH ratio. Despite rotenone inhibiting complex I of the electron transport chain, we did not observe a change in the mitochondrial NAD+/NADH ratio (Author response image 2). This likely indicates that this assay is not sensitive enough to detect changes in mitochondrial NAD+/NADH, and we opted not to include these data in the revised manuscript given the limitations of the approach.

      Author response image 2.

      Rapid subcellular fractionation to examine compartment-specific NAD+/NADH ratios. (A) Cytosolic and mitochondrial NAD+/NADH ratios of A549 cells grown with or without serine for 24 hours, n=3. (B) Cytosolic and mitochondrial NAD+/NADH ratios of H1299 cells grown with or without serine for 24 hours, n=3. (C) Cytosolic and mitochondrial NAD+/NADH ratios of H1299 cells treated with either 1 mM pyruvate or 50 nM rotenone for 24 hours, n=3. P-values were calculated using a Student’s t-test, *p<0.05, **p<0.01. Data shown are means ± SD.

      We nevertheless draw the following conclusions from these data:

      (1) Changes to mitochondrial NAD+/NADH either do not occur or are not captured with this approach. Even rotenone treatment, which inhibits complex I and might be expected to change mitochondrial redox state, does not change the measured mitochondrial NAD+/NADH ratio.

      (2) The whole cell NAD+/NADH ratio most likely reflects changes in the cytosolic NAD+/NADH ratio. While observing no impact on the mitochondrial NAD+/NADH ratio after rotenone treatment, we still find the cytosolic NAD+/NADH ratio is decreased. Moreover, both pyruvate and serine depletion led to an elevated cytosolic NAD+/NADH ratio in H1299 cells, which we observe at the whole cell level.

      (3) H1299 cells depleted of serine elevate the cytosolic NAD+/NADH ratio, while rotenone treatment decreased the cytosolic NAD+/NADH ratio despite changes in mitochondrial respiration. This suggests that redox shuttles, such as the malate aspartate shuttle, play a role in communicating changes in mitochondrial redox dynamics to the cytoplasm. We test this hypothesis as described in response to Reviewer 2, point B, below.

      (B) The mechanism by which nutrient availability is sensed and transmitted across different cellular compartments to regulate mitochondrial redox status remains unclear

      Multiple known shuttles are involved in exchanging redox equivalents between the mitochondria and the cytosol. It is likely that multiple shuttles are involved, or could be involved in the right context, but one major shuttle is the malate aspartate shuttle (MAS), and the MAS has been shown previously to support de novo serine synthesis (PMID: 37647199). Thus, we hypothesized that the MAS is involved in the response involving elevated mitochondrial respiration in H1299 cells to increase the whole cell NAD+/NADH ratio upon serine deprivation. To test this, we used CRISPR/Cas9 to generate H1299 cells lacking MAS components GOT1, MDH1, or GOT2 and measured the cell NAD+/NADH ratio. We did not knock out MDH2 given its integral role in the TCA cycle. We find that when MDH1 and GOT2 are knocked out, H1299 cells no longer exhibit elevated whole cell NAD+/NADH ratios upon serine deprivation. Consistently, removing MDH1 and GOT2 also blunted the increase in oxygen consumption as well as the increase in serine synthesis upon serine deprivation. This suggests that MDH1 and GOT2 activity through the MAS supports the process by which mitochondrial NAD+ regeneration is transmitted to the cytoplasm to support serine synthesis. We have added these data as Supplementary Figure 7 in the revised manuscript.

      (C) Moreover, the use of the genetic tool LbNox could be employed to manipulate the NAD+/NADH ratio in a compartment-specific manner

      We thank Reviewer 2 for the suggestion to consider whether LbNOX might be used to manipulate the NAD+/NADH ratio in a compartment-specific manner. We expressed LbNOX in both the cytoplasm and the mitochondria of A549 (serine non-responsive) cells. We predicted that if LbNOX expression, either in the cytoplasm or the mitochondria, affected the NAD+/NADH ratio, proliferation in serine depleted conditions might be improved. However, we found that expressing LbNOX in the cytoplasm or the mitochondria of A549 cells had no effect on the NAD+/NADH ratio. Thus, LbNOX expression in either compartment also did not change proliferation in serine depleted conditions. These data are consistent with the known limitations of this genetic tool. While LbNOX can increase NADH oxidation in response to some interventions like rotenone, it does not necessarily change the NAD+/NADH ratio of unperturbed cells. This was reported in the original description of LbNOX (PMID: 27124460). We confirmed that LbNOX was successfully expressed via immunoblotting, and also confirmed that LbNOX functioned by showing either cytoplasmic or mitochondrial LbNOX expression improves cell proliferation following complex I inhibition. Thus, expressing LbNOX in A549 cells is not informative for understanding compartment specific metabolism following serine deprivation. Nevertheless, as this question is likely to come up for other readers, we have included these data as Supplementary Figure 6 in the revised manuscript.

      Reviewer #2 (Recommendations for the authors):

      Minor critiques:

      (1) It seems clear from the authors' data that the response to serine depletion in terms of cell proliferation is not determined exclusively by PHGDH levels. It would be useful to measure the levels of the other two enzymes in the serine synthesis pathway and also to measure serine uptake under normal conditions in the different groups of cells. This information could provide some insight into the different responses of cancer cell lines to serine deprivation.

      (A) It would be useful to measure the levels of the other two enzymes in the serine synthesis pathway

      Reviewer 2 raises a fair point, and we agree that measuring levels of other enzymes in the serine synthesis pathway is informative. Thus, we measured the expression of phosphoserine aminotransferase 1 (PSAT1) and phosphoserine phosphatase (PSPH) across all cancer cells examined and find that, similar to PHGDH protein expression, PSAT1 and PSPH protein expression is lower in many cancer cells that are more sensitive to serine withdrawal (e.g. MCF7). However, among the cancer cells where PHGDH protein expression did not explain the response to serine withdrawal, the protein expression of PSAT1 and PSPH also did not explain how well the cells proliferate without environmental serine. These data have been included in Supplementary Figure 2B of the revised manuscript.

      Of note, we measured serine synthesis enzyme expression for the six cancer cell lines whose proliferation in serine-depleted conditions better correlated with a change in the NAD+/NADH ratio than it did with PHGDH expression: Calu6, 8988T, A549, MIA PaCa-2, H1299, and HCT116. For these cells, we correlated proliferation upon serine depletion with PHGDH, PSAT1, and PSPH protein expression and found that, interestingly, there was a significant negative correlation between PHGDH protein expression and proliferation upon serine deprivation. This was not observed for PSAT1 expression, and a statistically significant positive correlation between proliferation and PSPH protein expression was noted, though the variation in PSPH protein expression was large. We have added these correlation data to the revised manuscript as Supplementary Figure 2F.

      (B) It would be useful to measure…serine uptake under normal conditions in the different groups of cells

      Per the Reviewer’s request, we performed absolute quantification of serine uptake rates in serine-replete conditions for three serine “non-responder” cancer cells (Calu6, 8988T, A549) and three serine “responder” cancer cells (MIA PaCa-2, H1299, HCT116). We did not observe a notable association between serine uptake rate and whether cells responded to serine deprivation. Additionally, with the exception of 8988T cells having a higher serine uptake rate than the other cells, there was no statistical difference in serine uptake across the cancer cells tested (Author response image 3).

      Author response image 3.

      Basal serine uptake rate of exponentially growing cells in serine-replete conditions. Serine levels were measured using GC-MS before and after 24 hours of serine depletion and normalized by area under the growth curve (PMID: 26954548). P-values were calculated using one-way ANOVA followed by a post-hoc Tukey HSD test, *p<0.05, **p<0.01.
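
      The normalization mentioned in this caption can be sketched as follows, under the assumption that cells grow exponentially between the two time points, so that the area under the growth curve has a closed form. The function names, units, and this specific formula are illustrative assumptions; the cited reference should be consulted for the exact procedure the authors followed.

```python
import math

def area_under_growth_curve(n_start, n_end, hours):
    """Integral of cell number over time (cell-hours), assuming exponential growth."""
    if n_start <= 0 or n_end <= 0 or hours <= 0:
        raise ValueError("counts and duration must be positive")
    if n_start == n_end:
        # No net growth: the curve is flat.
        return n_start * hours
    growth_constant = math.log(n_end / n_start) / hours
    # Integral of n_start * exp(k * t) from 0 to hours.
    return (n_end - n_start) / growth_constant

def uptake_rate(amount_start, amount_end, n_start, n_end, hours):
    """Metabolite consumed from the medium per cell-hour (amount units are the caller's)."""
    consumed = amount_start - amount_end
    return consumed / area_under_growth_curve(n_start, n_end, hours)
```

      Dividing by cell-hours rather than by a single cell count keeps fast-growing and slow-growing lines comparable, because a culture that doubles during the assay has more cells consuming the metabolite for part of the window.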

      (2) The authors experimentally demonstrated that some cancer cells respond to serine depletion with an increase in mitochondrial respiration, but the molecular mechanism behind this is not addressed. There is some evidence in the literature showing that serine acts as an activator of the glycolytic enzyme PKM, which is coherent with an increased mitochondrial respiration in the absence of serine (PMID: 23064226). The authors could discuss their findings in the context of this paper. Additionally, they could provide some insights about baseline mitochondrial activity in the different cell lines. Indeed, it seems that "redox responsive cells" might have an overall increased basal OCR.

      We appreciate the suggestion that pyruvate kinase M (PKM) may mediate the elevation in mitochondrial respiration in response to serine depletion. Given that serine is an allosteric activator of PKM, and PKM suppression can increase mitochondrial OCR, we discuss this possibility in the Discussion section of the revised manuscript using the following text:

      “Interestingly, serine is an allosteric activator of the glycolytic enzyme pyruvate kinase, which converts phosphoenolpyruvate to pyruvate and generates ATP (Chaneton, 2012). Thus, decreased environment serine availability in addition to differences in pyruvate kinase activity may yield lower glycolytic ATP, resulting in greater mitochondrial respiration in serine redox responder cancer cells.” (Lines 443-447)

      Additionally, we appreciate the reviewer’s observation that redox responsive cells may have an overall increased basal respiration rate. We directly measured mitochondrial dependent oxygen consumption in the same assay to test whether redox responsive cells exhibit higher mitochondrial respiration. We find that while the redox responsive H1299 and MIA PaCa-2 cells have higher mitochondrial respiration than non-responsive cells, HCT116 cells, which are also redox responsive to serine deprivation, did not exhibit higher mitochondrial respiration compared to redox non-responsive Calu6, 8988T, and A549 cells (Author response image 4). However, when comparing redox non-responders versus responders as a whole, there was a statistically significant difference in basal OCR. Together, this suggests that basal mitochondrial respiration rate in serine-replete conditions may be related in some cases to whether cancer cells elevate mitochondrial respiration and the NAD+/NADH ratio upon serine deprivation, but this cannot be the full explanation given the HCT116 cell data. We also acknowledge the reviewer’s statement that we do not understand the molecular mechanism by which respiration responds to serine deprivation and explicitly state this in the revised manuscript.

      Author response image 4.

      Basal oxygen consumption rate (OCR) of cancer cells in serine-replete conditions. (A) Kinetic OCR measurements of cancer cells before and after rotenone and antimycin injection, n=8. Data shown are means ± SD. (B) Quantified mitochondrial OCR (removing residual OCR), n=8. Values are averages obtained over three measurements. P-values were calculated via nested ANOVA, ****p<0.001.
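
      The quantification in panel B can be illustrated with a minimal sketch, assuming mitochondrial OCR is taken as the mean of the basal readings minus the mean of the residual readings recorded after rotenone and antimycin injection. The function name and the readings are hypothetical and are not values from the study.

```python
from statistics import mean

def mitochondrial_ocr(basal_readings, residual_readings):
    """Mean basal OCR minus mean residual (non-mitochondrial) OCR."""
    if not basal_readings or not residual_readings:
        raise ValueError("both reading lists must be non-empty")
    return mean(basal_readings) - mean(residual_readings)

# Hypothetical readings for one well, three measurements each (arbitrary OCR units).
basal = [120.0, 118.0, 122.0]
residual = [20.0, 19.0, 21.0]    # after rotenone and antimycin injection
mito = mitochondrial_ocr(basal, residual)
```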

      (3) There is a discrepancy between the basal values of the OCR from the same cell lines in different experiments, i.e., Figure 3A and Supp. Figure 3C, or in different experiments, Figure 3A, Figure 5E, and Figure 6A. The authors need to comment on/clarify that. Moreover, authors are encouraged to show ECAR values to support the conclusion that lactate production is not differentially affected by serine depletion, and thus, does not contribute to the increase in the NAD+/NADH ratio.

      We recognize the differences in basal OCR values across different experiments. Given experiment-to-experiment variation and the need for different cartridges for each Seahorse experiment, we have found that OCR values measured using Seahorse assays vary across experiments despite the same conditions. Additionally, while we aim to seed the same number of cells per assay, cell seeding and cell quantification after each Seahorse assay can contribute to variation. Given this variability on a per-assay basis, we performed a single experiment across all cancer cell lines considered to minimize variation in oxygen sensor calibration and to address the reviewer’s question about whether absolute differences might contribute to response. These data are shown in Author response image 4.

      Regarding the reviewer’s request to present ECAR data, we note that measuring ECAR depends on using unbuffered media, and for this reason we do not routinely measure ECAR. Our concern is that removing serum from the culture conditions can impact OCR measurements, and we instead prioritized maintaining the same media composition across all sets of experiments (i.e., cell proliferation assays, NAD+/NADH assays, kinetic tracing assays, and OCR measurements). Additionally, we point out that ECAR does not directly measure lactate. We refer the Reviewer to data included in the manuscript where GC-MS was used to directly measure lactate secretion over time for cells cultured with or without serine. These data are presented as Supplementary Figure 3B in the revised manuscript.

      (4) There seems to be also a discrepancy between the levels of M+2 citrate and the fraction labelled (Figure 5C versus Supplementary Figure 6C) in the H1299 cell line upon serine depletion, whereby the M+2 fraction seems unexpectedly lower in serine-deprived cells. In those conditions, H1299 cells showed an increased mitochondrial respiration, which is consistent with increased total citrate levels. This could be explained by a faster TCA cycle activity and the presence of higher-order isotopologues of citrate upon serine starvation. Is this the case? Showing the abundance of the different citrate isotopologues and their contribution to the total pool would help to interpret the results.

      We thank Reviewer 2 for this thoughtful comment regarding the discrepancy between M+2 citrate produced (normalized ion counts per cell) versus fraction of the total intracellular citrate pool that is M+2 labeled in serine depleted H1299 cells. In our kinetic U-<sup>13</sup>C-glucose tracing experiments, where we performed isotope labeling for up to 15 minutes, we only see a greater presence of M+3 citrate from fully labeled glucose without robust changes in M+4, M+5, or M+6 citrate (Author response image 5). An elevated M+3 citrate could represent pyruvate carboxylase activity, where M+3 labeled pyruvate is converted to M+3 oxaloacetate that then reacts with unlabeled acetyl-CoA to generate M+3 citrate.

      We also find that the total citrate pool in H1299 cells is elevated upon serine depletion (see Supplementary Figure 6C in the original manuscript). Thus, the fractional contribution of an isotopologue to the citrate pool may decrease despite an increase in the amount of that isotopologue. In the original manuscript, we included data from kinetic U-<sup>13</sup>C-glutamine tracing in H1299 cells cultured with or without serine (Supplementary Figure 6I,J of the original manuscript). We find that H1299 cells depleted of serine exhibit greater M+4 citrate (via oxidative decarboxylation) and greater M+5 citrate (via reductive carboxylation) compared to serine-replete H1299 cells. Thus, one other potential explanation for why M+2 citrate from kinetic U-<sup>13</sup>C-glucose tracing represents a lower fraction of the total citrate pool in serine depleted H1299 cells is that there is a larger contribution from glutamine to the citrate pool. While there was no difference in the fraction of the citrate pool that consists of M+4 citrate, there was a greater fraction of the citrate pool labeled by M+5 citrate upon kinetic U-<sup>13</sup>C-glutamine tracing in serine depleted H1299 cells (see Author response image 6A, B). There was also a greater fraction of the citrate pool from M+6 citrate upon kinetic U-<sup>13</sup>C-glutamine tracing in serine depleted H1299 cells (Author response image 6C). This would require M+3 pyruvate labeling from glutamine, which may be due to malic enzyme, which converts M+4 malate to M+3 pyruvate. M+3 pyruvate may also be formed by PEPCK, which could convert M+4 oxaloacetate to M+3 phosphoenolpyruvate, leading to M+3 pyruvate. While understanding the source of M+6 citrate from glutamine is out of the scope of this study, it may highlight an interesting metabolic shift in H1299 cells depleted of serine that could elevate the total intracellular citrate pool.
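
      The point that a labeled fraction can fall while the labeled amount rises is simple arithmetic, sketched below. All numbers are hypothetical values in arbitrary units chosen for illustration, not measurements from this study, and `fraction_labeled` is an illustrative helper name.

```python
def fraction_labeled(isotopologue_amount: float, total_pool: float) -> float:
    """Share of the total metabolite pool contributed by one isotopologue."""
    if total_pool <= 0:
        raise ValueError("total_pool must be positive")
    return isotopologue_amount / total_pool


# Hypothetical values (arbitrary units, e.g. normalized ion counts per cell).
replete = {"m2_citrate": 20.0, "total_citrate": 100.0}
depleted = {"m2_citrate": 30.0, "total_citrate": 200.0}

replete_fraction = fraction_labeled(replete["m2_citrate"], replete["total_citrate"])
depleted_fraction = fraction_labeled(depleted["m2_citrate"], depleted["total_citrate"])

# The M+2 amount rises while its share of the pool falls,
# because the total pool grows faster than the M+2 species.
print(f"M+2 amount:   {replete['m2_citrate']} -> {depleted['m2_citrate']}")
print(f"M+2 fraction: {replete_fraction:.2f} -> {depleted_fraction:.2f}")
```

      In this made-up case the M+2 amount increases by half while the total pool doubles, so the M+2 fraction drops from 0.20 to 0.15.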

      Author response image 5.

      Citrate isotopologues (A. M+3; B. M+4; C. M+5; D. M+6) from kinetic U-<sup>13</sup>C-glucose tracing in H1299 cells depleted of serine for 24 hours. For all measurements, citrate values were normalized to internal norvaline standard and cell number for each condition, n=3. Data shown are means ± SD.

      Author response image 6.

      Fraction of the citrate pool labeled by U-<sup>13</sup>C-glutamine in H1299 cells depleted of serine for 24 hours. (A) Fraction of the total citrate pool that is M+4 citrate (formed via oxidative decarboxylation), n=3. (B) Fraction of the total citrate pool that is M+5 citrate (formed via reductive carboxylation), n=3. (C) Fraction of the total citrate pool that is M+6 citrate, n=3. Data shown are means ± SD.

      (5) The lipid depletion part of the paper seems to be somewhat tangential. The effect of lipid depletion on the NAD+/NADH ratio in A549 cells is modest, and the effects of dual serine and lipid depletion on OCR and NAD+/NADH ratio are not consistent. Moreover, if the authors want to show that these different nutritional environments affect lipid synthesis, apart from glucose incorporation into citrate, they would need to show actual carbon incorporation into palmitate, probably at longer time points.

      We apologize for the lack of clarity for how mitochondrial respiration and the NAD+/NADH ratio play a role in governing glucose oxidation to citrate. To better highlight our logic and rationale for investigating alterations in NAD+/NADH homeostasis and citrate synthesis under lipid depletion, we have added the following text to the revised manuscript:

      “Oxidative biosynthetic reactions other than serine synthesis can also be constrained by the NAD+/NADH ratio. For example, cancer cells deprived of environmental lipids increase oxidative citrate production, and we have previously found that citrate synthesis, either through glucose oxidation or glutamine oxidation, is limited by NAD+ availability (Li, 2022) (Figure 5A, Supplementary Figure 8A). Thus, we sought to uncover whether the increase in the cell NAD+/NADH ratio by mitochondrial respiration in response to serine withdrawal specifically supports greater serine synthesis or also leads to greater oxidative citrate production.” (Lines 307-313)

      While we have previously shown that alterations to the NAD+/NADH ratio can modify both citrate production and palmitate synthesis under lipid depleted conditions (PMID: 35739397), we agree with Reviewer 2 that no conclusion can be made about lipid synthesis without direct measurements and have revised the manuscript accordingly.

      (6) In Figure 6C-6F, showing the results of the controls (+serine +lipids) will help to clarify the extent to which serine and citrate synthesis rates are affected by the different interventions.

      We thank the reviewer for the comment. Because we specifically asked how dual serine and lipid starvation impacted either serine or citrate synthesis compared to singular nutrient deprivation alone, we performed the experiments focusing on these conditions. We felt that conducting an experiment that specifically targeted our question would make the findings more accessible, as we had compared the +serine +lipid conditions to either serine or lipid depletion alone earlier in our manuscript (Figure 2D and Figure 5G,H of the revised manuscript).

      Reviewer #3 (Public review):

      Summary:

      The manuscript by Chang and colleagues provides new insights into how cancer cells adapt their metabolism under nutrient-deprived conditions. They find cells respond differentially to serine and lipid deprivation via oxidising the cell redox state, which enables biomass synthesis and cell proliferation. They identified mitochondrial respiration as the major mechanism that dictates the endogenous NAD+/NADH ratio. By incorporating a dual stress paradigm, serine and lipid deprivation, the study further suggests that the NAD+/NADH ratio can serve as a link to orchestrate the complex interplay between multiple nutrient changes in the tumour microenvironment.

      Strengths:

      A novel aspect of this study is the idea that cancer cells are not uniformly passive victims of nutrient limitation; some can actively invoke endogenous NAD+ regeneration to combat nutrient stress. The conclusion is well-supported by comparing multiple cell lines from different tissues and genetic backgrounds, which improves generalizability. While most of the smaller conclusions align with common reasoning and expectations, the step-by-step deduction that leads to a novel 'big picture' is commendable. Another notable strength is the integration of dual stress (lipid and serine deprivation), which better mimics the complex tumor microenvironment with multiple nutrient fluctuations, raising the translational potential of these findings. The observation that lipid-deprived cells can stimulate serine synthesis and support proliferation in a subset of cancer cell lines offers a novel perspective on metabolic plasticity under starvation conditions.

      We thank Reviewer 3 for their time and for their comments to help us improve the manuscript. We also thank them for highlighting the strengths and significance of our findings.

      Weaknesses:

      (1) Although the authors derive a novel and valuable overarching concept, the presentation of this "big picture" is not clearly articulated, making it less accessible to readers outside the immediate field. It would greatly enhance the manuscript to include a clearer summary of the overarching model and its implications. Additionally, discussing the potential clinical significance and applications of the findings would increase the relevance and broader impact of the work. Finally, the manuscript's clarity and credibility are undermined by inconsistent figure labeling and the lack of statistical analysis, particularly for the Western blot data.

      (A) It would greatly enhance the manuscript to include a clearer summary of the overarching model and its implications. Additionally, discussing the potential clinical significance and applications of the findings would increase the relevance and broader impact of the work.

      We appreciate Reviewer 3’s suggestion to help clarify the findings of this study. To better articulate our overarching model, we have added the following text to the end of the Results section of the revised manuscript:

      “Taken together, we propose a model where environmental nutrient availability can impact mitochondrial respiration based on the specific cancer. Because mitochondrial respiration is a major pathway that regenerates NAD<sup>+</sup>, changes to mitochondrial respiration can alter the cell NAD+/NADH ratio, influencing the activity of major NAD<sup>+</sup>-requiring metabolic reactions such as serine synthesis and citrate synthesis that can be important for proliferation. We further propose that changes to the cell NAD+/NADH ratio can impact all oxidative biosynthetic reactions if the enzyme machinery is present, but that specificity for how the cell NAD+/NADH ratio changes is dependent on both cell-intrinsic factors and cell-extrinsic factors (Figure 7).” (Lines 396-404)

      Additionally, a new model figure was added as Figure 7 in the revised manuscript, which may help with understanding for a general audience.

      To better highlight the potential clinical significance of these findings, we have added the following at the end of the Discussion section of the revised manuscript:

      “Better understanding the mechanisms cells use to alter respiration and adjust the NAD+/NADH ratio in response to available nutrients could inform the complex interplay between cell-intrinsic and cell-extrinsic factors that determine cancer metabolic dependencies. This is particularly important to consider when targeting metabolism for cancer treatment. Many newer therapies targeting metabolism have not been successful in part because of metabolic plasticity to nutrient shifts (Amoedo, 2017; Fendt, 2020; Xiao, 2023). Co-targeting mitochondrial function limits metabolic adaptations and may also help predict the tissue nutrient conditions that result in pathway dependencies for specific cancers. Thus, better understanding how the cell NAD+/NADH ratio responds to nutrient levels in different cancers could improve selection of patients for cancer therapies that impact metabolism.” (Lines 483-492)

      (B) “…the manuscript's clarity and credibility are undermined by inconsistent figure labeling and the lack of statistical analysis, particularly for the Western blot data.”

      We apologize to the reviewer for any inconsistency in data presentation. To address the comment related to inconsistent figure labeling, we ensured all figures in the revised manuscript are labeled to allow readers to recognize what cell lines are used, what conditions are tested, what parameters are measured, and how the data may or may not be normalized. To address the reviewer’s comments about lack of statistical analysis, in the revised manuscript we ensured that statistical analyses are included for data presented in each figure, when appropriate. We also include a section titled “Statistics and Reproducibility” in the Methods section. In our revised manuscript, we have ensured that the p-value threshold is consistent throughout all figures, and have removed “ns” across the manuscript for consistency as suggested by Reviewer 3 in their minor comments. We also removed any explicit p-values included in figures where the p-values were close to reaching the threshold for significance (α = 0.05). We have also performed additional statistical analyses where needed, including adding the p-values for linear regression analyses, and ensured new data added to the revised manuscript also included appropriate statistical analyses.

      For western blot data, we show representative immunoblots. However, we measured PHGDH, PSAT1, and PSPH protein expression in three biological replicates across examined cancer cells and quantified the average serine synthesis protein expression from each replicate performed with error bars that denote standard deviation (see Author response image 7). We performed a nested ANOVA to examine whether there was a statistically significant difference in PHGDH, PSAT1, and PSPH protein expression between non-responder and responder cancer cells. Interestingly, as noted in our response to Reviewer 2, we find a significant negative association between PHGDH protein expression and response to serine deprivation among the six cancer cells where PHGDH protein expression did not explain proliferation upon serine depletion.

      Author response image 7.

      Serine synthesis enzyme protein expression in serine-replete and serine depleted cancer cells. (A) Immunoblots examining the expression of PHGDH, PSAT1, and PSPH in cancer cells as shown. HSP90 was used as a loading control. Data are from two separate biological replicates. (B) Mean levels of PHGDH, PSAT1, and PSPH normalized to loading control HSP90 across cancer cells from three separate biological replicates. Yellow denotes cancer cells that do not elevate mitochondrial respiration in response to serine depletion (non-responders). Blue denotes cancer cells that do elevate mitochondrial respiration in response to serine depletion (responders). P-values were calculated with nested ANOVA comparing non-responders and responders, **p<0.01

      (2) While this study identifies changes in serine synthesis, mitochondrial respiration, PHGDH protein levels, and NAD+/NADH ratio in different cell lines, some of these relationships appear correlative rather than causally established (Figure 2; Figure 5; Figure 6). Some claims are thus overinterpreted. For example, the co-occurrence of increased NAD+/NADH ratio and citrate levels under lipid deprivation in A549 cells does not establish causality (Figure 5). Direct perturbation experiments that manipulate NAD+/NADH and assess downstream effects on citrate synthesis would substantially strengthen the conclusions.

      We agree with Reviewer 3 that corresponding changes in proliferation, mitochondrial respiration, and serine synthesis are correlated with the NAD+/NADH ratio. As shown in Figure 4, we perturbed the NAD+/NADH ratio with FCCP and rotenone to measure downstream effects on serine synthesis. We also agree with the reviewer that doing similar experiments in the lipid depletion condition would highlight the relationship between the NAD+/NADH ratio and citrate synthesis. However, we point out that these experiments were already published in a manuscript from our group specifically showing that the NAD+/NADH ratio is limiting for citrate synthesis (PMID: 35739397). In that manuscript, the NAD+/NADH ratio was perturbed using electron transport chain inhibitors, including complex I inhibitors, which decrease the cell NAD+/NADH ratio. Exogenous electron acceptors were used to rescue the NAD+/NADH ratio, and under those conditions, cell proliferation, the NAD+/NADH ratio, and glucose and glutamine oxidation to citrate were measured with and without lipid depletion. We showed that decreasing the NAD+/NADH ratio decreases citrate synthesis through both glucose and glutamine oxidation and also affects palmitate synthesis. We could rescue citrate and palmitate synthesis by supplementing cells with exogenous electron acceptors. We also showed that expressing cytosolic or mitochondrial NADH oxidase (LbNOX; PMID: 27124460) in mitochondrial complex I-inhibited cells rescues proliferation in lipid depleted conditions and that LbNOX expression raises oxidative citrate production at baseline. Given the extensive prior work showing the relationship between the NAD+/NADH ratio, oxidative citrate synthesis, and palmitate synthesis, efforts to repeat these same experiments for this manuscript were not warranted.

      We do show in the current manuscript that treating cells with AKB or FCCP, which raise the NAD+/NADH ratio, also increases glucose oxidation to citrate (Figure 5D of the original and revised manuscripts). We did this to confirm that the elevated M+2 citrate production from glucose in serine starved H1299 cells was related to an increase in the NAD+/NADH ratio as opposed to a specific response to serine depletion.

      The study focuses predominantly on mitochondrial respiration as a source of NAD+ regeneration. However, it will also be interesting to check other significant pathways, such as NAD+ salvage, which have been implicated in supporting serine biosynthesis. In addition, the subcellular distribution of NAD+ may distinguish whether some cells are truly redox-unresponsive. Mitochondrial NAD+ regeneration might counteract the cytosolic NAD+ consumption, rendering a relatively stable intracellular NAD+/NADH ratio. The malate-aspartate shuttle can be an interesting aspect.

      (A) The role of NAD+ salvage and serine biosynthesis

      Per the reviewer’s request, we investigated whether NAD+ salvage might be involved in supporting serine synthesis. The reviewer’s comments highlight an interesting question about whether NAD+ salvage may differentially contribute to serine synthesis between cancer cells that elevate mitochondrial respiration in response to serine depletion and cancer cells that do not change mitochondrial respiration in response to serine depletion. Specifically, we wondered whether cancer cells that do not elevate mitochondrial respiration in response to serine depletion depend more on NAD+ salvage to support proliferation in serine depleted conditions. To test this, we treated A549 and H1299 cells in serine depleted conditions with increasing doses of the nicotinamide phosphoribosyltransferase (NAMPT) inhibitor FK866. However, we found no statistically significant difference in sensitivity to FK866 upon serine depletion in these cells based on ANCOVA analysis (p=0.9332). Interestingly, we observe that A549 cells are more sensitive to FK866 treatment than H1299 cells in serine-replete media conditions (ANCOVA analysis, p=0.0004). This suggests that A549 cells at baseline may have greater dependence on NAD+ salvage compared to H1299 cells, though this is not specific to the response to serine depletion. We then asked whether nicotinamide mononucleotide (NMN), the product of NAMPT and the immediate precursor to NAD+ in the salvage pathway, would rescue the proliferation of A549 cells cultured without serine. We find that adding 100 µM NMN, a concentration that can impact PHGDH-driven serine synthesis (PMID: 30157431), does not change proliferation of A549 cells cultured without serine, unlike supplementing cells with AKB or FCCP, which increase NADH oxidation to NAD+. Together, these data suggest that NAD+ salvage does not play a major role in differentiating the redox response to serine deprivation between responder and non-responder cells.

      We have added these data as Supplementary Figure 3C,D of the revised manuscript.

      (B) The role of the malate-aspartate shuttle and serine biosynthesis

      The MAS has been shown to play an important role in serine synthesis (PMID: 37647199) and may facilitate elevation in mitochondrial respiration in response to serine depletion. As stated in response to Reviewer 2, measuring subcellular compartment-specific NAD+/NADH ratios accurately is not feasible, so we utilized a functional approach to interrogate the role of compartmentalization. Specifically, we tested a role for the malate-aspartate shuttle (MAS). Using CRISPR/Cas9, we generated GOT1, MDH1, and GOT2 deleted H1299 cells. We did not knock out MDH2 given its integral role in the TCA cycle. Using the knockout lines, we measured the whole cell NAD+/NADH ratio and found that MDH1 and GOT2 KO cells no longer exhibited an elevated cell NAD+/NADH ratio upon serine depletion compared to non-targeting controls (NTC). Consistently, MDH1 and GOT2 KO cells did not elevate OCR upon serine deprivation, nor did they exhibit greater serine synthesis rates compared to NTC cells. This suggests that MDH1 and GOT2 activity support the process by which mitochondrial NAD+ regeneration provides cytosolic NAD+ to support serine synthesis. We next asked whether MAS protein expression differed between cells that elevate respiration in response to serine depletion and cells that do not. While enzyme expression is not equivalent to activity, we wondered whether MAS protein expression would be lower in cells that do not increase their mitochondrial respiration upon serine depletion. However, we observed no major difference in GOT1, GOT2, MDH1, or MDH2 protein expression across the cancer cells examined (Author response image 8). Further experimentation is needed to measure MAS activity across lines and may reveal a mechanism by which mitochondrial respiration is governed by nutrient availability, such as levels of environmental serine.

      Author response image 8.

      Protein expression of the malate aspartate shuttle enzymes GOT1, MDH1, GOT2, and MDH2 in cancer cells cultured without serine for 24 hours. Membranes were first probed for GOT1 or GOT2 then stripped and re-probed for MDH1 or MDH2.

      (3) The authors should acknowledge the limitations of short-term isotope tracing in their experimental design. Differences in metabolic rates across cell lines can affect the kinetics of metabolite labeling, limiting the direct comparability of metabolic fluxes between them. As a result, observed changes may reflect transient adaptations rather than stable metabolic reprogramming. It is important to clarify that the study primarily captures short-term responses, and the conclusions may not extrapolate to longer-term adaptations or protein-level changes under sustained nutrient stress.

      We thank the reviewer for this comment. We apologize for any confusion around experimental approaches. We agree that in the case of acute changes in nutrient availability at the start of kinetic isotope tracing, the observed changes may reflect transient adaptations. However, cells are exposed to conditions for 24 hours prior to performing kinetic tracing. This approach allows us to examine changes that occurred in response to the nutrient condition, not acute changes. Additionally, we add fresh, prewarmed treatment media at least two hours prior to commencing kinetic isotope tracing. Upon analysis of kinetic isotope tracing, we examine whether cells were at metabolic steady state by monitoring metabolite levels over the course of tracing. For example, in the kinetic glucose tracing experiments in serine depleted cells, total serine levels are relatively stable throughout the experiment, and we find that total serine levels are greater in H1299 cells after 24 hours of serine starvation. Data showing total metabolite pools over the course of tracing are shown in the Supplementary Figures (for example, see Supplementary Figure 8C-H in the revised manuscript). The period of treatment prior to the start of kinetic isotope tracing is described in the figure legends and further detailed in the “Kinetic U-<sup>13</sup>C-Glucose Isotope Tracing Experiments” section of the Methods in the revised manuscript. To improve clarity, we added a kinetic graph showing total serine levels over time in Supplementary Figure 2I of the revised manuscript as this can address whether synthesis rates are captured while cells are at metabolic steady state. We also discuss these considerations better in the revised manuscript with the following text:

      “Importantly, we confirmed kinetic U-<sup>13</sup>C-glucose tracing was performed at metabolic steady state by ensuring metabolite levels were stable at each collected time point (Supplementary Figure 2I)” (Lines 178-180).
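
      One simple way to express the steady-state check described above is to ask whether the total metabolite pool varies only slightly across the tracing time points. The sketch below is an illustration under stated assumptions, not the procedure used in the manuscript: the coefficient-of-variation criterion, the 10% cutoff, the function name `is_steady_state`, and the example values are all hypothetical.

```python
from statistics import mean, stdev


def is_steady_state(levels: list[float], max_cv: float = 0.10) -> bool:
    """True when the pool's coefficient of variation is at or below max_cv.

    The 10% default is an arbitrary illustrative cutoff.
    """
    if len(levels) < 2:
        raise ValueError("need at least two time points")
    average = mean(levels)
    if average <= 0:
        raise ValueError("mean pool size must be positive")
    return stdev(levels) / average <= max_cv


# Hypothetical total serine levels (arbitrary units) at successive time points.
stable_pool = [1.00, 1.03, 0.98, 1.01, 0.99]
drifting_pool = [1.00, 1.20, 1.45, 1.70, 2.00]

print(is_steady_state(stable_pool))    # small fluctuations around a constant level
print(is_steady_state(drifting_pool))  # pool keeps growing during tracing
```

      A pool that passes such a check supports interpreting label accumulation as a synthesis rate; a drifting pool would instead suggest that the cells were still adapting during the tracing window.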

      Reviewer #3 (Recommendations for the authors):

      It is important to note that, in many cases, the data show only trends rather than statistically significant differences, or, if significance testing was performed, the results are not clearly labeled. For example, in Figure 1B, no p-value was denoted in the figure, and the scale bar is quite high, precluding the conclusion that "AKB and rotenone dose-dependently increased and decreased the cell NAD+/NADH ratio". In Figure 2E, no p-value was shown to support the result that "H1299 cells had higher serine level than A549 cells". Inconsistencies in how significance is denoted across figures (e.g., asterisks vs. numerical values; "ns" vs. no label) make interpretation difficult. Marginal significance (e.g., p = 0.06 in Figure B) can be reported explicitly, but all figures should clearly denote whether comparisons are significant or not. Conclusions drawn from nonsignificant trends should be appropriately stated.

      We thank Reviewer 3 for this important comment and for highlighting specific instances where the manuscript could be improved. Please see response to Reviewer 3, Major Comment 1B. We also agree with Reviewer 3 that it is integral to ensure that conclusions made from non-significant trends are appropriately stated. For example, we explicitly mention that there was no statistically significant difference between the serine synthesis rate of A549 cells depleted of serine versus A549 cells depleted of both serine and lipids (Line 375). As another example, we changed the phrase “Moreover lipid depletion led to a greater fraction of total serine derived from glucose in serine depleted A549 cells” to “Moreover, lipid depletion appeared to lead to a greater fraction…” (Line 376).

      Western blot data supporting PHGDH expression variability across cell lines (e.g., Supplementary Figure 2B, 3E) appear to rely on single experiments. At least three biological replicates are required to substantiate claims about discordance between PHGDH levels and serine sensitivity. Supplementary Figure 4G presents overexpression validation based on a single Western blot without quantification. Including statistical validation from biological replicates would strengthen this point.

      We thank Reviewer 3 for this suggestion. Western blots were repeated 3 times, although data from a representative blot is shown. Please see response to Reviewer 3, Major Comment 1B.

      Certain data visualizations (e.g., Figure 2C) lack annotation indicating which data points correspond to which cell lines, limiting interpretability. All figures should include clear labels, consistent statistical notation, and complete legends. The author uses different color labels (redox-responsive (blue) and unresponsive (yellow) cell lines), which provides mechanistic clarity; however, this classification was not consistently used across the manuscript (e.g., Figures 2d and 2e). To further improve reader comprehension, consider adding conceptual schematic diagrams before each main result section to illustrate experimental logic, and a final diagram summarizing the proposed mechanism.

      We apologize for any unclear data presentation. In the revised manuscript we have added greater clarity around what cell lines are used in each experiment and have added explicit labeling to specify cancer cell lines in Figure 2C of the revised manuscript. Throughout, we have ensured that any serine redox non-responder cell lines are labeled in yellow while serine redox responder cell lines are labeled in blue. We have also ensured that any lipid redox responder cells are labeled in green while lipid redox non-responder cells are labeled in dark purple, a change from the original manuscript. Finally, we have also added a schematic to summarize the proposed model in Figure 7 of the revised manuscript.

      Although the authors provide justification for using H1299 and A549 as representative cell lines to study serine depletion, it remains unclear whether these two lines are equally suitable for investigating lipid depletion. Additional rationale or supporting data would help clarify their appropriateness for the lipid-related experiments.

      We thank Reviewer 3 for this suggestion. We opted to study H1299 and A549 cells under lipid deprivation to assess their responses in relation to the response to serine deprivation. We specifically wanted to know whether these findings related to serine deprivation applied to other nutrient depleted conditions. We clarify this logic in the revised manuscript by adding the following text:

      “Oxidative biosynthetic reactions other than serine synthesis can also be constrained by the NAD+/NADH ratio. For example, cancer cells deprived of environmental lipids increase oxidative citrate production, and we have previously found that citrate synthesis, either through glucose oxidation or glutamine oxidation, is limited by NAD+ availability (Li, 2022) (Figure 5A, Supplementary Figure 8A). Thus, we sought to uncover whether the increase in the cell NAD+/NADH ratio by mitochondrial respiration in response to serine withdrawal specifically supports greater serine synthesis or also leads to greater oxidative citrate production.” (Lines 307-313)

      We have also included more detailed justification for focusing our studies on A549 and H1299 to study serine depletion by adding the following statements to the manuscript:

      “We performed focused comparisons between A549 and H1299 cells because they exhibit differences in proliferation upon serine deprivation that are not explained by PHGDH protein expression, demonstrate differing responses of the cell NAD+/NADH ratio upon serine deprivation, and have similar basal proliferation rates.” (Lines 171-175)

      The concentration of serine in replete media should be explicitly stated and justified. If the intention is to mimic physiological conditions, alignment with human plasma levels would increase translational relevance.

      We agree that explicitly stating the concentration of serine in replete media is important. In the revised manuscript, we explicitly state that DMEM contains 400 µM of serine and that we use this concentration for serine-replete conditions (Line 102). While an important application of our manuscript is to better explain metabolic changes that can occur in physiologic conditions, we acknowledge that we did not test levels found in different tissues. Rather, by examining extreme conditions of high and low serine, we hoped to dissect how cells adapt to nutrient conditions, and testing the more subtle responses based on tissue serine levels will require a dedicated study.

      Rotenone may elevate ROS levels and trigger cellular stress responses, potentially confounding proliferation assays. The authors should validate that concentrations used do not induce cytotoxicity or excessive oxidative stress, and ideally measure ROS levels to support interpretation.

      We thank Reviewer 3 for raising this important point. We explicitly measured cell viability with the doses of rotenone used in this manuscript in cells cultured with or without serine. We find that rotenone dose-dependently increases cytotoxicity in A549 cells grown in serine-replete conditions in a statistically significant manner as calculated by simple linear regression. However, the cytotoxicity from rotenone is low (at most 4% in serine depleted conditions) and does not explain differences in rotenone sensitivity with respect to serine synthesis. These data have been added to Supplementary Figure 1C of the revised manuscript.

      Evidence that lipid depletion can enhance serine synthesis in A549 cells is inadequate, given the marginal difference in NAD+/NADH ratio and the slight increase in M+3 serine levels. The statement "any perturbation that increases the NAD+/NADH ratio led to both elevated serine and citrate production, regardless of what nutrient was depleted from the environment" (introduction section) should be reworded.

      We thank Reviewer 3 for this suggestion. We have changed the above statement to the following:

      “Lastly, we find that any perturbation that increases the NAD+/NADH ratio, including lipid deprivation, could paradoxically improve the proliferation of cells in serine depleted conditions.” (Lines 90-92).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      The authors conducted a comprehensive benchmarking and evaluation of co-folding platforms, including AlphaFold3, Boltz-2, Chai-1, and the docking algorithm Dock3.7, which employs a physics-based scoring function that incorporates van der Waals interactions, electrostatics, and ligand desolvation energies. The system of interest was the SARS-CoV-2 NSP3 macrodomain (Mac1), an increasingly popular antiviral target, and the ligand sets comprised 557 unseen ligand poses (keeping the training for these co-folding platforms in mind). Additionally, the authors investigated whether the co-folding models could distinguish true ligands from non-binding small molecules. The study is thorough, with extensive statistical support and consensus across multiple metrics (chemoinformatics for quantifying ligand similarity and efficacy). The questions that the authors aim to address are whether the co-folding models struggle with memorization, whether they can distinguish between a true and a false binder, whether they replicate experimental binding affinities and efficacy, and how they compare to the physics-based docking algorithm (Dock3.7).

      We thank Reviewer 1 for this thoughtful summary of our work.

      Strengths:

      Overall, this is a scientifically solid paper. The work is highly detailed and well executed, featuring thorough data analysis and statistical assessment.

      Weaknesses:

      My main concern is that the study's aim is a bit unclear. Modern benchmarking studies comparing physics-based docking with deep learning-based co-folding approaches (e.g., AF3, Boltz-2, Chai-1, and others) are increasingly expected to go beyond aggregate performance metrics.

      Indeed, we have gone into several examples of failures and successes for each of these methods. As we are not developing these methods ourselves, we also think this dataset will be a valuable contribution for improving them further.

      In addition to rigorous dataset construction, transparent methodology, and appropriate statistical evaluation, high-impact benchmarks typically provide actionable guidance on when each method class is most appropriate, reflecting their distinct inductive biases and practical constraints. Failure-mode analyses that link performance differences to protein flexibility, ligand chemistry, or binding-site characteristics are particularly valuable, as they move comparisons beyond "scoreboard" assessments toward mechanistic understanding.

      Right now, we do not observe meaningful trends that separate the failure modes for any individual method. This is covered in Supplementary Figures 6 and 7.

      While full biological validation is not expected, qualitative interpretation grounded in physical and biological principles strengthens conclusions. Providing reproducible workflows or reference pipelines is not mandatory, but it is increasingly viewed as a best practice because it facilitates adoption and helps contextualize results for practitioners.

      We note that our code is available (https://github.com/jongbin99/Cofolding/) and all structural data will be publicly accessible in the PDB alongside publication (we held it back only for “blinding” during peer review to avoid contamination with any new deep learning methods).

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Kim et al. evaluates the performance of three modern AI-based methods in predicting complex structures and binding affinities between proteins and chemical compounds. An honest 'prospective' evaluation is achieved by studying benchmark structures and chemical compounds that did not exist in the PDB at the time the AI structure prediction models (AlphaFold3, Chai-1, Boltz-2) were trained.

      Strengths:

      (1) The study addresses an important question in modern computational biology and drug discovery, and establishes the strengths and limitations of the three tools in solving various computational chemistry tasks, including compound pose prediction, active-inactive discrimination, and potency ranking.

      (2) The conclusions are based on examination of four separate targets and respective compound datasets, where for one of the targets, the authors also obtained numerous X-ray structures to serve as experimental answers for the binding pose prediction task.

      (3) The study reports relationships between structure prediction confidence, predicted energies (DOCK3.7), and affinity predictions (Boltz-2) with the geometric accuracy of compound pose prediction as well as the experimentally measured potency.

      (4) One of the key findings is the limited ability of co-folding methods to predict conformational rearrangements, which does not correlate with their ability to predict binding poses of the compounds inducing these rearrangements.

      (5) The findings could serve as useful guidelines for computational chemists in selecting appropriate software and scoring schemes for each task.

      We appreciate Reviewer 2’s summary of the novelty of the dataset and analysis.

      Weaknesses:

      While I consider this a solid study, several aspects would need to be addressed to make it really strong:

      (1) DOCK3.7 docking and scoring experiments were performed using one experimental structure of Mac1, selected from dozens of structures based on a criterion that is not sufficiently well justified. For sigma2 receptor, dopamine D4 receptor, and AmpC β-lactamase, it is not clear which structures or models were selected for docking at all. It is well known that geometry predictions, scoring, and active-inactive ROC AUCs are all strongly influenced by the selected structure. It would be important to attempt Mac1 docking using all available experimental Mac1 structures, or at least against representative structures in various conformations; it would also be quite insightful to compare results to docking of the same compound sets to AF3, Boltz-2 and Chai-1 predicted structures of Mac1. Same goes for the docking studies of sigma2, D4, and AmpC β-lactamase.

      In any program, a decision has to be made as to which template will be used for docking. We justified our choice in the methods:

      “We used this structure because the inhibitor (Z5014193706) was the most potent molecule with a structure determined around the same time as the ligands in this dataset were tested.”

      We stand by this as a reasonable assumption. Similarly, for sigma2, D4, and AmpC β-lactamase, the template was chosen in the respective papers:

      a) The σ2 receptor bound to cholesterol (PDB ID: 7MFI) was used in the docking calculations.

      - This structure was determined in that paper; it was the first structure of sigma2 and is therefore a worthy template.

      b) The D4 receptor campaign used PDB 5WIU

      - This was one of two D4 structures available and was chosen because it was not bound to sodium.

      c) For AmpC, the campaign used the structure in the Protein Data Bank (PDB) 1L2S

      - This maximizes comparisons to other docking studies that used the same receptor template.

      The major goal of this study is to compare different methods under reasonable (but perhaps as the reviewer points out, not optimal) conditions, not to optimize docking score.

      (2) For binding affinity predictions, as a control, authors should consider compound co-folding with an unrelated protein, or even with a pseudo-peptide that consists of a few random single amino acids - this would provide an honest baseline for such predictions.

      This suggestion would be valuable for understanding the performance of these methods from the perspective of ligand specificity (a valuable, but separate, goal). Surely this will generate some number or some prediction, but what would this baseline mean and how would it be relevant for drug discovery? Therefore, we do not think this suggestion is relevant for the issues being investigated in this manuscript.

      (3) ROC curves Figure 3 and elsewhere should be shown, and AUCs quantified/reported on a log or square-root scaled x-axis, to emphasize early enrichment, which is the area of practical significance for these predictions. For example, Figure 3A currently suggests that the pose prediction performance of AF3 exceeds that of Boltz-2 whereas the early enrichment is clearly better for Boltz-2.

      We agree with this, and added a semi-logAUC plot for Figure 3A. For Figure 5, we also generated a semi-logAUC plot to see early ligand enrichment clearly, added as Supplementary Figure 11. We added the text:

      “Considering its early enrichment performance, Boltz-2 Ligand ipTM was the strongest predictor of pose accuracy based on normalized logAUC (20.5% above random, Fig. 3a). In contrast, although Boltz-2 pIC50 showed poor overall discrimination, it overestimated its ability to enrich true positive poses at low false positive rates, despite having a weak early enrichment behavior”
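For illustration, a semi-log AUC of the kind discussed above can be sketched as follows. This is a minimal sketch and not the analysis code from the linked repository: the function names, the lower FPR bound of 0.001, and the normalization by the maximum possible area are all assumptions, since the exact definition used for the "normalized logAUC" is not given here. It integrates the piecewise-linear ROC curve against log10(FPR), so the low-false-positive region carries most of the weight. Both classes are assumed to be present in the labels.

```python
import math

def roc_points(scores, labels):
    """Return ROC curve points (fpr, tpr); labels are 1 (true) or 0 (false)."""
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        j = i
        # Items with tied scores enter the curve together.
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            tp += pairs[j][1]
            fp += 1 - pairs[j][1]
            j += 1
        fpr.append(fp / n_neg)
        tpr.append(tp / n_pos)
        i = j
    return fpr, tpr

def log_auc(fpr, tpr, fpr_min=0.001):
    """Area under TPR vs log10(FPR) on [fpr_min, 1], as a fraction of the maximum."""
    area = 0.0
    for x0, y0, x1, y1 in zip(fpr, tpr, fpr[1:], tpr[1:]):
        lo, hi = max(x0, fpr_min), min(x1, 1.0)
        if hi <= lo:
            continue  # vertical segment, or segment entirely below fpr_min
        slope = (y1 - y0) / (x1 - x0)
        intercept = y0 - slope * x0
        # Exact integral of (intercept + slope * x) d(log10 x) over [lo, hi].
        area += (intercept * math.log(hi / lo) + slope * (hi - lo)) / math.log(10)
    return area / math.log10(1.0 / fpr_min)

def random_log_auc(fpr_min=0.001):
    """Semi-log AUC of the diagonal (random) ROC curve."""
    return log_auc([0.0, 1.0], [0.0, 1.0], fpr_min)

# Hypothetical toy scores: two true poses ranked above two false ones.
fpr, tpr = roc_points([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
adjusted = log_auc(fpr, tpr) - random_log_auc()  # enrichment above random
```

Subtracting the random baseline gives a value "above random", in the spirit of the figure quoted above; a perfectly ranked list gives the largest possible adjusted value under this definition.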

      (4) 'Trained set' in figures and text should probably be 'training set'? Or otherwise explain this new term the first time it is introduced.

      Thank you for pointing this out. ‘Training set’ is the correct term, and we have made the corresponding changes across all figures and text.

      (5) Figure 1 illustrates a projection onto the first two principal components of a space that apparently had only one (scalar) metric for each compound pair (% maximum common substructure or Tanimoto coefficient); the authors need to better explain the principle behind this analysis and visualization.

      This suggestion is valuable, since PCA is usually used to reduce dimensionality for more complex features. To clarify, we have a full pairwise similarity matrix for all tested Mac1 compounds for each of Tc and MCS%, so each compound is described by its vector of similarities to every other compound. The PCA for each of MCS% and Tc is a projection of the corresponding pairwise similarity matrix. We also changed the Figure 1 caption to make this point clearer:

      “projection of compounds represented by their full pairwise similarity vectors (by ECFP-4 Tc and MCS%)”
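The idea behind this caption can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: fingerprints are represented as hypothetical sets of on-bit indices rather than real ECFP-4 output, and the first principal component is found by plain power iteration instead of a library PCA. Each row of the pairwise Tanimoto matrix is one compound's similarity vector, and PCA is applied to those rows.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def similarity_matrix(fingerprints):
    """Full pairwise matrix; row i is compound i's similarity vector."""
    return [[tanimoto(a, b) for b in fingerprints] for a in fingerprints]

def first_component_scores(rows, iterations=100):
    """Project the rows onto their first principal component (power iteration)."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in rows]
    axis = [1.0 / (j + 1) for j in range(d)]  # arbitrary non-zero starting vector
    for _ in range(iterations):
        scores = [sum(x * w for x, w in zip(row, axis)) for row in centered]
        # Multiply by X^T X without forming the covariance matrix explicitly.
        update = [sum(scores[i] * centered[i][j] for i in range(n)) for j in range(d)]
        norm = sum(u * u for u in update) ** 0.5
        if norm == 0.0:
            break
        axis = [u / norm for u in update]
    return [sum(x * w for x, w in zip(row, axis)) for row in centered]

# Hypothetical toy fingerprints forming two chemotype clusters.
toy_fingerprints = [{1, 2, 3}, {1, 2, 4}, {7, 8, 9}, {7, 8, 10}]
sim = similarity_matrix(toy_fingerprints)
pc1 = first_component_scores(sim)
```

In this toy case the two clusters land on opposite sides of the first component, which is the behavior the projection in Figure 1 relies on. The sign of a principal component is arbitrary, so only the separation is meaningful.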

      Reviewer #3 (Public review):

      Summary:

      This study's core conclusions are well-supported by data. It is shown that co-folding outperforms docking in known ligand pose/affinity prediction (validated by RMSD and IC₅₀ correlation), struggles with false-positive discrimination in virtual screens (lower AUC values), and is complementary to docking (non-correlated errors, distinct strengths in drug discovery stages).

      Strengths:

      (1) Unprecedented prospective design with 557 novel Mac1-ligand complexes ensures rigorous, independent evaluation of co-folding methods.

      (2) Comprehensive comparison of 3 co-folding tools (AlphaFold3, Chai-1, Boltz-2) with DOCK3.7 across diverse targets and metrics enables nuanced performance assessment.

      (3) The study clearly demonstrates complementary roles of co-folding (superior pose/affinity prediction for known ligands) and docking (better hit prioritization), and addresses deep learning memorization concerns via ligand similarity analysis.

      We thank Reviewer 3 for pointing out the unprecedented and comprehensive nature of our study.

      Weaknesses:

      (1) Limited generalization to diverse protein families (e.g., no ion channels/transporters).

      We agree - we have not explored the entire proteome and these are important target classes that will surely be investigated by future studies. We focused on targets here where we had a large number of X-ray crystal structures (Mac1) and affinity/inhibition measurements from docking (the other three targets).

      (2) Ambiguity in the mechanism underlying co-folding's failure to predict rare conformational changes.

      Again, we agree. We are not the developers of these methods. We observe that these methods do not predict conformational changes with high fidelity and this weakness is an area that co-folding methods will surely prioritize in the future.

      (3) Virtual screen comparison is unbalanced (docking-prioritized hit lists bias results).

      We acknowledge this in the results: “An important caveat is that the hit-lists were composed of molecules prioritized by docking in the first place, giving it an advantage on these particular sets.” and discussion: “Finally, comparing co-folding to docking based on hit-lists themselves selected by docking is arguably unfair to co-folding. Counter-balancing this is the inclusion, in each of the three hit lists, of molecules that had mediocre and poor docking scores intentionally selected to test the correlation between docking score and hit-rate. Here too, the correlation between co-folding score and likelihood to bind, what we sometimes call a “dock-response-curve” was no better than docking’s, often worse (SFig.11).”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Here are suggestions for revisions:

      (1) The writing is at times obtuse and hard to follow.

      This happens sometimes when multiple authors are writing together. We apologize and are happy to respond to specific areas that can be streamlined to be easier to follow.

      (2) In the Results section, "A set of 557 previously unreported Mac1 ligand complexes", the authors have compared the ligand poses across different metrics such as Tc - a standard, highly effective method in chemo-informatics and MCS (maximum common substructures); these are standard metrics for quantifying the structural similarity between pairs of small molecules. This part of the analysis checks whether this is memorization; it is critical to compare the two metrics, but it is not sufficient to draw a conclusion.

      Thank you for raising the question of the structural similarity between the co-folded molecules and those present in the training set (resolved as Mac1 complexes and deposited in the PDB before the training dates). We have conducted an analysis in which we perform a pairwise similarity comparison for all ligands present in the PDB (regardless of the target), by both Tc and MCS, and overlay the clusters of ligands we tested (Mac1, AmpC, sigma2, D4). This shows where our tested benchmark datasets lie in the chemical space covered by the entire PDB. Each cluster (around 500 to 1300 compounds per target system) is overlaid on the cluster of all ligands deposited in the PDB (over 50,000 compounds), and each cluster was relatively diverse by both Tc and MCS.

      (3) In the "Co folding can accurately reproduce poses of ligands dissimilar to those trained." Subsection under Results, the authors' conclusions are hard to follow; they state that the co-folding models often mispredict or miss the alternative conformation, but they also predict poses that are distinct from the training set. What does that imply?

      Our interpretation is actually a somewhat unsettling one: co-folding gets the ligand pose right even when it gets the protein wrong, and even when the ligand is novel. This suggests the models may be anchoring on conserved pharmacophoric interactions (like the adenosine-mimicking purine scaffold) rather than truly modeling the physics of the full complex. We added to the results section:

      This result suggests that co-folding reliably recapitulates dominant ligand-binding interactions even in the absence of accurate protein conformational modeling, providing further support to the idea that they are learning specific interaction patterns rather than a deeper physics-based representation (Masters et al. 2025).

      (4) The Discussion section connects the results and conclusions, but it can be challenging to grasp the study's overall message.

      We think the final paragraph hits on three major points:

      - Co-folding accurately predicts ligand poses for known binders, but fails to capture conformational changes

      - Co-folding does not reliably distinguish true binders from false positives in virtual screening hit lists

      - Docking and co-folding are complementary rather than competing tools

      (5) The work is highly detailed and well executed, featuring thorough data analysis and statistical assessment. The value of the paper would be further enhanced by explaining how it differs from seemingly similar results reported in other studies, including the one cited in this manuscript (see https://www.biorxiv.org/content/10.64898/2025.12.04.692352v1).

      The Mac1 results are completely unique. However, the docking datasets are exactly the same as those analyzed in the Menon et al. manuscript. We don’t think our results differ from the conclusions of the Menon et al. manuscript, as we wrote: “These observations are supported by a fascinating study on some of the same ligand sets as investigated here, using AlphaFold3, reaching similar conclusions (Menon et al. 2025).”

      Reviewer #3 (Recommendations for the authors):

      (1) Expand target diversity to include ion channels, transporters, etc., beyond enzymes and GPCRs.

      (2) Investigate the cause of co-folding's failure in predicting rare conformational changes (e.g., adjust sampling, MSA inputs, or add experimental constraints).

      (3) Mitigate docking bias in virtual screens (e.g., re-analyze unbiased compound libraries).

      We addressed these three points in the public review above.

      (4) Test Boltz-2's affinity predictions without linear calibration and compare with FEP.

      The data without linear calibration are included in the manuscript. Comparing such a large number of compounds with FEP is currently beyond our capabilities.

      (5) Conduct proof-of-concept to test co-folding-docking integration for better hit rates.

      We think this is well beyond the scope of this manuscript - but look forward to testing this idea in the future.

      We also got one community review that we respond to below:

      Summary

      This manuscript evaluates the performance of co-folding models when tasked with 1) the recapitulation of a large number of experimentally determined co-crystal structures of Mac1 with a series of Mac1 ligands and 2) the rescoring of hits to identify false positives originally derived from a set of large docking-based virtual screens. The evaluation leverages a dataset of crystal structures and affinity data from high-throughput crystallographic and biophysical screens, respectively. These data uniquely enable this report to focus on the ability of co-folding models to handle ligands, resulting in an analysis that is particularly timely given the wide adoption of co-folding models and the relative scarcity of such ligand-focused benchmarks among existing evaluations, which have primarily focused on protein structure prediction or binder design.

      Thank you for this thoughtful summary of our work.

      Feedback

      The experiments and analyses in the manuscript are well thought-out and do not have any significant issues. There are a few high-level points that may improve the clarity and completeness of the results. Importantly, none of the suggested additional experiments will affect the conclusions of the paper, but rather help provide additional context for the results:

      The first section presents an exciting opportunity to frame the Mac1 ligands against ligands in the PDB more broadly. It would be informative to assess whether chemotypes that are easier or harder to predict accurately and confidently are over- or under-represented in the PDB as a whole. Note that this is not a recommendation that new scaffold similarity metrics be incorporated into the analysis, but rather that analyses similar to those already performed in the manuscript are performed using all ligands in the PDB. For example, PCA-based analyses similar to those in Fig. 1c could be used to examine Mac1 ligands in the context of all PDB ligands enabling questions such as whether similarity to a nearest PDB neighbor, cluster size in a Tc/MCS PCA space, or other frequency-based measures show any relationship with prediction vs. crystal structure RMSD. Such analyses could provide additional insight into how effectively models leverage ligand information present in the PDB overall, as opposed to biases arising specifically from scaffolds represented in Mac1 structures in the PDB, which are already well covered in the manuscript. The conclusion that Tc/MCS do not correlate with the ligand RMSDs for the ligands already associated with the Mac1 is well supported, and presumably suggests that a correlation would not exist against the backdrop of the PDB, but it would be interesting to see the data using analyses similar to those already done in the manuscript nonetheless.

      We are adding new figures in SFig.1 that show how the different clusters of ligands tested in our co-folding analysis are distributed across the chemical space of the PDB. This is done by computing the similarity between every ligand in the PDB and those tested in our analysis by Tc and MCS%, then plotting in PCA space for each metric. We are excited to see that each dataset covers a wide range of PCA space, but at the same time, there are areas of the PDB's chemical space that remain unexplored by co-folding.

      Similarly, even though the four proteins used in this manuscript are not themselves the primary focus of the analysis, it would be valuable to perform a high-level assessment of the precedent for each protein in the PDB (beyond the count of liganded structures in Table S6), either in protein sequence space (e.g., MSAs) or structural space (e.g., FoldSeek). An analysis like this would provide important context about whether any of the proteins in the study have close homologs with liganded structures in the PDB, or are generally overrepresented in the PDB. The fact that the AUC for L-pLDDT for AmpC is higher than σ2 and D4, for example, is notable given the relative abundance of liganded AmpC structures in the PDB (this raises potentially interesting questions related to where DOCK3.7 and AF3 actually place the ligands, given the orthosteric β-lactam binding pocket in AmpC, although this is outside of the scope of this manuscript).

      A high-level assessment of the precedent for each protein in the PDB would indeed help clarify whether the proteins we used have close homologs with liganded structures in the PDB. Our Supplementary Table 6 covers the extent to which these liganded structures were available by the cutoff dates for AF3, Chai-1 and Boltz-2. AmpC had more homologs than sigma2 and D4, and this may explain the better AUC for AF3 L-pLDDT specifically for this target.

      A discussion of the affinity probability results (`affinity_probability_binary`) from Boltz-2 is likely warranted in the second section in addition to the pIC50s that are already reported (`affinity_pred_value`). The former seems like it would be more applicable for section 2 of the manuscript, but both warrant inclusion—they should both be calculated by default when the affinity pipeline in Boltz-2 is turned on, so it wouldn't involve any more inference.

      Because the Boltz-2 affinity module outputs both a binary affinity probability and a predicted affinity value, we kept track of both metrics and tried re-ranking the hit lists using each. Where Boltz-2 performed better (sigma2, D4), the binary probability values were the more informative metric for differentiating true actives from non-binders. This was clearer in the semi-logarithmic ROC plots. However, for AmpC, both Boltz-2 scoring metrics performed similarly. This inconsistency made it difficult to draw conclusions.

      Minor points

      A more detailed description of the experimental methods used to generate the ground-truth data in the introduction (even though these have been explained in prior works) would help orient the reader early on, and ground the benchmarking aspect of the story. In general, the abstract and introduction would benefit from a more cohesive through-line to tie the two complementary but orthogonal sections of the paper together.

      We will include a more thorough description alongside the PDB depositions. As for the two sections, we have tried to tie them together from the perspective of drug discovery workflows…

      The cutoffs in the "Co-folding can accurately reproduce..." section shift between 2.5 Å (from the ligand center of mass) and 2.0 Å. Is there a reason for this? Along similar lines, mentioning cutoffs for true positives/negatives when introducing the ROC analyses later on in the Mac1 section seems unnecessary since no cutoff should be necessary here.

      We used a 2.5 Å distance to the ligand center of mass (COM) as a fast filter for “broadly the correct binding site”, and a 2.0 Å RMSD because that is the broadly accepted standard in the field for a “relatively correct binding pose”.
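The two cutoffs answer different questions, which a short sketch makes concrete. This is an illustration under stated assumptions rather than the authors' code: it uses an unweighted centroid in place of a mass-weighted center of mass, assumes both poses are already in the same reference frame with identical atom ordering, ignores ligand symmetry, and treats both cutoffs as inclusive. The function names are hypothetical.

```python
import math

def centroid(coords):
    """Unweighted centroid of (x, y, z) atom coordinates."""
    n = len(coords)
    return tuple(sum(c[k] for c in coords) / n for k in range(3))

def rmsd(pred, ref):
    """Heavy-atom RMSD without refitting; atoms must be in matching order."""
    sq = sum((p[k] - r[k]) ** 2 for p, r in zip(pred, ref) for k in range(3))
    return math.sqrt(sq / len(ref))

def assess_pose(pred, ref, site_cutoff=2.5, pose_cutoff=2.0):
    """Return (in_correct_site, correct_pose) for a predicted ligand pose."""
    in_site = math.dist(centroid(pred), centroid(ref)) <= site_cutoff
    good_pose = rmsd(pred, ref) <= pose_cutoff
    return in_site, good_pose

def translate(coords, dx):
    """Shift a pose along x by dx angstroms (toy perturbation)."""
    return [(x + dx, y, z) for x, y, z in coords]

# Hypothetical three-atom reference ligand, coordinates in angstroms.
ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
near = assess_pose(translate(ref, 1.0), ref)    # right site, right pose
offset = assess_pose(translate(ref, 2.2), ref)  # right site, wrong pose
far = assess_pose(translate(ref, 3.0), ref)     # wrong site
```

A rigid 2.2 Å shift passes the coarse site filter but fails the pose criterion, which is why the looser centroid cutoff is suited to fast filtering and the RMSD cutoff to judging pose accuracy.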

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      The nematode C. elegans is an ideal model in which to achieve the ambitious goal of a genome-wide atlas of protein expression and localization. In this paper, the authors explore the utility of a new and efficient method for labeling proteins with fluorescent tags, evaluating its potential to be the basis for a larger, genome-wide effort that is likely to be very useful for the community. While the evidence for the method itself is solid, carrying out this project at a large scale will require significant additional feasibility studies.

      We appreciate the editor’s recognition that the evidence for our method is solid and that a genome-wide protein atlas in C. elegans would be highly valuable to the community. However, we respectfully disagree that “significant additional feasibility studies” are required. Take the yeast proteome-wide GFP tagging project (Huh et al., Nature 2003). It achieved ~75% coverage of ~6,000 proteins directly from an established protocol without any prior significant feasibility studies, at least to our knowledge. While the C. elegans genome is about three times the size, we would argue that our tagging protocol may even be less labor intensive, as it does not involve any cloning and the screening is visual, requiring no molecular biology skills. Reviewer 3 notes: ‘They also provide convincing evidence that labelling the whole proteome is an achievable goal with relatively limited resources and time.’

      Our pilot study validates all key parameters for genome-wide scaling: editing efficiency at novel loci with untested reagents, viability of tagged worms, and detectability of multiple spectrally separated fluorophores across expression ranges. These address the core technical, biological, and practical challenges of large-scale endogenous tagging in a multicellular organism, leaving no fundamental barriers in our view.

      The proposed cost and timeline align quite favorably with established large-scale consortium projects: e.g., ENCODE pilot analyzed 1% of the human genome at ~$55 million over 4 years; Mouse Knockout Consortium scaled to ~20,000 genes over 20 years (ongoing) with ~$100 million; Human Protein Atlas mapped ~87% of proteins with antibodies in fixed cells (through much more labor intensive methods) over 20+ years at >$100 million. With ~8% of C. elegans genes already tagged (WormTagDB) and labs already tagging entire gene classes (PMID: 40463100), scaling our protocol to the proteome is feasible, potentially covering the genome in 5-6 years by a single lab or faster with distributed effort at a reagent cost of merely $2.2 million. The main barriers now are funding commitment and assembling collaborators, not further feasibility testing.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Eroglu and Hobert demonstrate that injecting CRISPR guides and repair constructs to target three genes at a time, tagging each with a different fluorescent protein, and selecting which gene to tag with which fluorophore based on genes' expression levels, can improve the efficiency of gene tagging.

      Strengths:

      This manuscript demonstrates that three genes can be targeted efficiently with three different fluorophores. It also presents some practical considerations, like using the fluorophore least complicated by agar/worm autofluorescence for genes with low expression levels, and cost calculations if the same methods were used on all genes.

      Weaknesses:

      Eroglu has demonstrated in a previous publication that single-stranded DNA injection can increase the efficiency of CRISPR in C. elegans while inserting two fluorescent proteins and a co-CRISPR marker into three loci. The current work is, therefore, an incremental advance. In general, I applaud the authors' willingness to think ahead to how whole proteome tagging might be accomplished, but I predict that the advance here will be one of many small advances that will get the field to that goal.

      Our manuscript indeed builds on prior multiplex editing (including our own co-CRISPR work), but the manuscript's primary contribution is not a novel technical breakthrough per se. Instead, our main goal was to pilot and strategize a feasible path to whole-proteome tagging in C. elegans and, most critically, test the following key parameters: (1) success rate of triple pools with previously untested reagents at novel targets; (2) utility of fluorophores across expression levels; (3) major effects on tagged protein function. In prior multiplexing, we used two targets that we already knew could be edited quite efficiently, and the third target was a point mutation with nearly 100% efficiency. Thus, it was not at all clear that picking three random genes and replacing the highly efficient third locus with another, less efficient large insertion would work or be sufficiently scalable for thousands of novel genes with unvalidated reagents at first pass.

      The title vastly oversells the advance in my view, and the first sentence of the Discussion seems a more apt summary of the key advance here.

      Some injections target genes on the same chromosome together, which will create unnecessary issues when doing necessary backcrossing, especially if the mutation rate is increased by CRISPR.

      We disagree with the reviewer’s assessment of the need for backcrossing, for two reasons: (1) Prior studies have shown that off-target mutations are not a serious concern in C. elegans (reviewed in PMID: 26336798). For instance, WGS of strains after CRISPR/Cas9 found negligible off-target effects (PMID: 25249454, PMID: 30420468 – using similar RNP/ssDNA method and multiple guides; PMID: 23979577, PMID: 27650892 using other methods). Targeted sequencing studies have reported similar findings, using various CRISPR/Cas9 methods, with essentially no mutations at sites other than the intended target (PMID: 23995389; PMID: 23817069). (2) If the goal is to tag the entire genome, the introduction of backcrossing should not reasonably be a routine part of the initial tagging.

      Lastly, if one really does want to backcross, the existence of tags on the same chromosome is actually an advantage because it permits selection for recombinants with wild-type chromosomes.

      Also, the need for backcrossing and perhaps sequencing made me wonder if injecting 3 together really is helpful vs targeting each gene separately, since only 5 worms need to be injected.

      Apart from our disagreement regarding backcrossing, we are puzzled by the reviewer’s comment. Why would one tag a single gene at a time, rather than three, if the whole point is to scale up tagging? It is important to keep in mind that the rate-limiting step for tagging the whole genome is the number of injections that can be done per day. Since there is no cloning to generate the repair templates/guides and all other reagents are commercially available and not sample specific, these can be prepared quite rapidly. Being able to isolate multiple lines (together or independently) from the same injection increases throughput 3-fold and in our view does not bring any disadvantages, as individual tags can be isolated independently if desired.

      Beyond the numerous technical advantages pooling provides (including lower cost and higher throughput for making injection mixes as well as imaging), our results show that it yields epistemic benefits as well: we would never have noted the subcellular pattern in Fig. 6B, C with different sets of mitochondria being marked by different mitochondrial proteins had we imaged them separately or even aligned to a pan-mitochondrial landmark. As we mentioned in the discussion, grouping proteins predicted to localize to the same compartment together can simultaneously test how uniform or differentiated such compartments are during the screen.

      The limited utility of current blue fluorescent proteins makes me wonder if it's worth using at all at this stage, before there are better blue (or far red) fluorescent proteins.

      We do not think that the utility of current BFPs is that limiting. At least the theoretical brightness of mTagBFP2 is comparable to that of EGFP (PMID: 30886412), which was useful for the bulk of currently tagged proteins. Due to modestly higher autofluorescence in the blue spectrum, the practical brightness is somewhat less ideal, but we have shown that many proteins are expressed high enough to be detected quite well with mTagBFP2 by eye at low magnification. We also note that many tags that are not visible by eye under a dissection scope become visible with long exposure cameras of widefield microscopes or modern confocal (GaAsP) detectors, so the list of genes detectable with mTagBFP2 is likely to be much higher. We routinely use mTagBFP2 to super-resolve subnuclear structures with endogenous tags (e.g., in the nucleolus), with some tags having lower annotated FPKMs than the genes tested here.

      Some literature reviews, particularly in the Introduction and Abstract, rely too much on recent examples from the authors' laboratory instead of presenting the state of the field. I'd like to have known what exactly has been done with simultaneous injection targeting multiple loci more thoroughly, comparing what has been accomplished to date by various laboratories' advances to date.

      We are not sure what the reviewer is referring to. In the Abstract, we do not refer to any literature. In the Introduction, we cite 28 papers, 6 of those from our lab (4 of which providing examples of protein tags). We do not believe that this can be fairly called an unbalanced presentation of the state of the field.

      This being said, we have gladly expanded our Introduction to provide more background on co-CRISPRing. Labs have routinely used co-conversion (“coCRISPR”) markers for picking out their intended edits (e.g., point mutations or insertions), as it has been shown by multiple groups that a CRISPR/Cas9 edit at one locus correlates with efficiency at other simultaneous targets (PMID: 25161212). Generally, making point mutations with the Cas9/RNP protocol is highly efficient, especially at specific loci such as dpy-10. However, multiple FP-sized insertions have not been routinely attempted. We and only one other group have successfully attempted it using previously working targets and reagents (e.g., 28% in PMID: 26187122). Importantly, the efficiency of such multiple insertions has never been assessed at scale and using entirely untested reagents at novel sites – critical parameters to determine for a whole genome approach. So, we test here (1) the efficiency of triple insertions and (2) the chance of getting them with new and untested guides and reagents.

      In our view, since we have to use some injection/coCRISPR marker anyway for those genes which are not expressed at dissecting-scope visible levels (likely most genes), using highly expressed intended targets as improvised markers in a pooled approach makes our approach much more efficient. It allows us to find the worms with the highest chance of yielding CRISPR insertions, which we can screen with higher power methods for the dimmer targets, while enabling us to co-isolate other intended targets. Insertions, being often heterozygous in F1, can be segregated independently if desired, or homozygosed together to facilitate maintenance then outcrossed individually by those interested in studying specific genes in more detail.

      In the revised version of this manuscript, we now discuss some of these points in the introduction section:

      “Currently, around 1554 proteins representing 8% of the proteome are estimated to have been endogenously tagged (Leyhr et al., 2025). However, at current rates, tagging the proteome is projected to take around 100 years and likely involve numerous duplicate attempts on a small number of commonly studied proteins (Leyhr et al., 2025). It will thus be crucial for the field to coordinate tagging efforts and scale up tagging protocols to enable coverage of the entire genome at a reasonable timescale and cost. Given the number of injections is a major time-limiting factor, pooling multiple injections into one would at minimum cut tagging time by a factor of 3. In C. elegans, screening for novel CRISPR/Cas9-induced genomic edits is already facilitated either by use of co-injection markers (i.e., plasmids that form extrachromosomal arrays) that yield phenotypes or fluorescence in progeny of successfully injected worms, or co-editing well characterized loci using established and highly efficient reagents which likewise yield visible phenotypes. In the latter approach, termed “co-CRISPR”, worms edited at the marker locus are most likely to also carry the intended edit (Arribere et al., 2014). Recent methods for CRISPR/Cas9 mediated genomic insertions have pushed efficiencies to sufficient levels to simultaneously insert multiple fluorophores (e.g., mNeonGreen and mScarlet) as well as a co-CRISPR marker (dpy-10) at three independent loci in a single injection (Eroglu et al., 2023; Paix et al., 2015). These attempts pooled reagents previously established to work efficiently and targeted genes that were known to yield functional fusion proteins when tagged. 
      Thus, while in principle current methods could allow tagging of at least 3 independent loci in one injection if a co-CRISPR marker is omitted, it is not known to what extent such an approach could be generalized across the genome with previously unvalidated reagents (i.e., guides and repair template homology arms) at novel loci to yield functional tags.”
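
      The scale argument in the quoted passage can be checked with a short back-of-the-envelope calculation. The sketch below uses only the figures quoted above (1554 tagged proteins, 8% of the proteome, a roughly 100-year projection, three loci per injection). The function names are illustrative, and the assumption that tagging time scales linearly with the number of injections is ours, not a result from the manuscript.

```python
def estimated_proteome_size(tagged: int, tagged_fraction: float) -> float:
    """Proteome size implied by the number and fraction of tagged proteins."""
    return tagged / tagged_fraction


def pooled_projection_years(projected_years: float, pool_size: int) -> float:
    """Projected tagging time if each injection targets `pool_size` loci.

    Assumes (hypothetically) that time scales linearly with injection count.
    """
    return projected_years / pool_size


if __name__ == "__main__":
    proteome = estimated_proteome_size(1554, 0.08)
    print(f"Implied proteome size: ~{round(proteome)} proteins")
    print(f"Proteins still untagged: ~{round(proteome) - 1554}")
    print(f"Projection with triple pooling: ~{pooled_projection_years(100, 3):.0f} years")
```

      Under these assumptions the quoted figures imply a proteome of roughly 19,400 proteins, and triple pooling would bring the projection down to about a third of the original estimate.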

      Reviewer #2 (Public review):

      The manuscript by Eroglu and Hobert presents a set of strains each harboring up to three fluorescently tagged endogenous proteins. While there is technically nothing wrong with the method and the images are beautiful, we struggled to appreciate the advance of this work - who is this paper for?

      We consider this paper to have two purposes: (1) to motivate the community to come together to consider such a genome-wide tagging approach; (2) to provide a reference point for funding agencies that such an aim is not unreasonable and will provide novel, interesting insights.

      As a technical method, the advance is minimal since the first author had already demonstrated that three mutations (fluorophore insertion and co-CRISPR marker) could be introduced simultaneously.

      We agree that the basic principle is similar. However, given the numbers in our original paper, it was not clear that pooling three novel large edits would work or that it would be scalable.

      The dpy-10 coCRISPR marker previously used is a highly efficient single site, with close to 100% hit rate. We also knew in the earlier study that the two pooled insertions already worked quite efficiently and did not disrupt the function of targeted proteins. Exchanging these plus dpy-10 for three novel tags was not guaranteed to succeed for many potential reasons, including both biological and technical. For instance, such a “marker free” approach necessitates that a significant number of targets in the genome should be expressed highly enough to be visible by fluorescence stereomicroscopy when tagged with current best fluorophores. The chance of disrupting gene function by tagging was also not explored in detail in C. elegans, nor whether one untested guide is generally sufficient. We think that establishing these parameters was meaningful and necessary for the goal of whole genome tagging. We have clarified some of these points in the text.

      As a pilot for creating genome-scale resources, it is not clear whether three different fluorophores in one animal, while elegantly designed and implemented, will be desired by the broader community. 

      The usage of three different fluorophores is largely driven by the ability to co-inject and therefore cut injection effort by a factor of three. Moreover, having all three fluorophores together facilitates imaging and maintenance. Lastly, co-labeling has the potential to reveal unexpected patterns of co-localization or lack thereof (example: two mitochondrial proteins that we found to not have overlapping distribution). We clarified this point in the revised text in both the results and discussion.

      Finally, the interpretation of the patterns observed in the created lines is somewhat lacking. A Table with all the observations must be included. This can replace the descriptions of the observations with the different lines, which could be somewhat laborious for the reader, and are often wrong. There are numerous mistaken expectations of protein expression here, but two examples include:

      We are not convinced that our expectations are mistaken. Below we respond to the reviewer’s specific examples, and we are open to hear from the reviewer about additional cases.

      (1) The expectation that ACDH-10 is enriched in the intestine and epidermal tissues (hypodermis).

      There are multiple paralogs of this protein (see WormPaths or WormFlux) that may share functions in different tissues. There is also no reason to assume that fatty acid metabolism does not occur in other tissues (including the germline). Finally, there are no published studies about this enzyme, so we really don't know for sure what it's doing.

      The expression of acdh-10 is annotated in multiple scRNA datasets as intestine and epidermal enriched (CeNGEN/Taylor et al., 2021, highest in epidermis; Ghaddar et al., 2023, highest in intestine). We did not mean to imply that fatty acid metabolism does not occur in the gonad, nor that a paralog of acdh-10 could not be performing the same function in tissues where acdh-10 is not expressed.

      However, this raises an important question: why have different paralogs doing the same thing? Duplicate genes with the same function are generally not evolutionarily stable (PMID: 11073452, PMID: 24659815). That there are such striking tissue specific expression patterns of an essential or widely expressed protein class suggests that paralogs of the gene likely differ in some meaningful parameter that might align with tissue-specific functional needs or regulation. The reviewer’s statement that ‘there are no published studies about this enzyme, so we really don't know for sure what it's doing’ is in fact an excellent demonstration of our point; finding out where the duplicates are expressed can provide a starting point to uncover potential differences between the paralogs. At the very least it can delineate to what degree paralogs diverge in their expression across the proteome and identify which such cases merit further study. In a more ideal scenario, prior information of protein function could indicate that the involved pathway requires tissue specific regulation.

      (2) The expectation that HXK-1 is ubiquitously expressed.

      Three paralogous enzymes are all associated with the same reaction, and we have shown that these three function redundantly in vivo, perhaps in different tissues (PMID: 40011787).

      The cited paper (PMID: 40011787) does not show where they are expressed. We discussed redundancy/paralogs above in point 1, and in our view the same applies here. They may perform the same reaction but are likely to differ in some meaningful way, be it regulation or rate of activity, for them to be stably maintained as functional genes over evolution.

      Moreover, single-cell RNA-seq data (PMID: 38816550) also show enrichment of hxk-1 in gonadal sheath cells.

      The Ghaddar et al. and CeNGEN/Taylor et al. datasets do not show this. The scRNA paper cited (PMID: 38816550) also shows enrichment in neurons, pharynx, coelomocyte and germ cells which we did not note. In our view, these in fact further support our goals: often, transcript datasets alone (frequently used to infer tissue function) do not sufficiently predict protein expression. One can post hoc find an scRNA-seq dataset that aligns somewhat with our protein observations, but how does one know which to trust a priori? Disagreements between transcript datasets will ultimately require resolution at the protein level, in our view.

      To clarify these points, we added the following to the discussion section:

      “We also noted unexpected cell type dependent distributions of proteins involved in broadly important metabolic processes such as ACDH-10, which was depleted from the germline compared to other tissues, and HXK-1, which was highly enriched in the gonadal sheath. Notably, for these as well as other cases, scRNA-seq datasets were not sufficient to deduce a priori the observed cell type specific differences at the protein level. Importantly, many genes encoding metabolic enzymes including acdh-10 and hxk-1 have paralogs that likely perform similar catalytic functions. Yet, duplicate genes with identical functions are generally not evolutionarily stable (Adler et al., 2014; Lynch and Conery, 2000); thus such genes are likely to differ in some meaningful parameter (e.g., regulation or activity) that might align with tissue-specific functional needs. Fully annotating the expression patterns of paralogs at the protein level could indicate which tissues require unique metabolic needs and indicate which paralogous genes have undergone sub- versus neo-functionalization. For those proteins that are less functionally understood, unexpected distributions might indicate which merit further study.”

      The table should have at least the following information: gene/protein name - Wormbase ID - TPM levels of single cell data assigned to tissues for L2, L4, and adult (all published) - tissues in which expression is observed in the lines presented by the authors.

      We added some of this information, such as annotated expression levels in young adults from various scRNA datasets (but not larval datasets, as we did not image these). We note that each of these studies uses a different pipeline and reports different metrics (scaled TPM/Z-score versus Seurat average expression versus TPM), so comparisons between them are not informative unless they are integrated and analyzed together.

      Reviewer #3 (Public review):

      Summary:

      The authors argue that establishing the expression pattern and subcellular localisation of an animal's proteome will highlight many hypotheses for further study. To make this point and show feasibility, they developed a pipeline to knock in DNA encoding fluorescent tags into C. elegans genes.

      Strengths:

      The authors effectively make the points above. For example, they provide evidence of two populations of mitochondria in the C. elegans germline that differ qualitatively in the proteins they express. They also provide convincing evidence that labelling the whole proteome is an achievable goal with relatively limited resources and time.

      We appreciate the referee’s recognition that whole proteome tagging is feasible.

      Weaknesses:

      Cell biology in C. elegans is challenging because of the small size of many of its cells, notably neurons. This can make establishing the sub-cellular localisation of a fluorescently tagged protein, or co-localizing it with another protein, tricky. The authors point out in their introduction that advances in light microscopy, such as diSPIM, STED, and ISM (a close relative of SIM), have increased the resolution of light microscopy. They also point out that recent advances in expansion microscopy can similarly help overcome the resolution limit.

      (1) Have the authors investigated if the three fluorescent tags they use are appropriate for super-resolution microscopy of C. elegans, e.g., STED or SIM? Would Electra be better than mTagBFP2? How does mScarlet3-S2 compare to mScarlet3?

      All three tags work for ISM (i.e., Airyscan). We previously tried Electra (not for the genes tested here) but could not isolate positive tags. Given Electra is not that much brighter on paper than mTagBFP2 we did not pursue it further, though we recognize that these may simply have been unlucky injections. mScarlet3-S2 is quite a bit dimmer than mScarlet3 on paper – the advantage is that it has higher photostability. In our view, the limiting factor will be having FPs that are bright enough to screen, image and scale to the whole genome, so brightness will likely provide an advantage over photostability at this stage.

      (2) Have the authors investigated what tags could be used in expansion microscopy - that is, which retain antigenicity or even fluorescence after the protocol is applied? It may be useful to add different epitope tags to the knock-in cassettes for this purpose.

      mSG and mSc3 retain fluorescence after fixing with formaldehyde. We have not tested mTagBFP2 fluorescence in fixed worms. We agree that adding different epitope tags would be useful.

      The paper is fine as it stands. The experiments above could add value to it and future-proof it, but are not essential. If the experiments are not attempted, the authors could refer to the points above in the discussion.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Merged figures appear saturated, and use colors that won't work for red-green colorblind viewers. 

      For all figures, we also show individual channels separately, which is common practice for making fluorescence images accessible to colorblind readers (PMID: 33788834). Figures highlighting non-overlap like 6B and C are already in accessible colors when merged (blue/green) and include a numerical quantification. 3-color RGB images preserve the greatest information for the highest number of individuals.

      (2) Targeting ubiquitously expressed genes as a proof of concept gives me some concern that this might underestimate the challenges that may be experienced with less widely expressed genes.

      While the genes were predicted to be ubiquitously expressed, many were not in practice, like HXK-1 and F54C8.1, which were also among the lower expressed genes on our list and highly cell type restricted. As discussed, the more tissue restricted a gene, the likelier that bulk RNA levels underestimate expression. Such genes are therefore more likely to be detected in a specific tissue. We routinely isolate tissue restricted endogenous tags, including those expressed in only a few neurons, with bulk FPKMs lower than the ranges tested in this manuscript.

      (3) Some results are not shown or referenced (autofluorescence, for example, is shown using a schematic in Figure 1C).

      We now provide representative images alongside what would be expected to be observed by eye during screening.

      (4) It would be useful to describe how to recover worms from what is shown in Figure 1A. 

      In the revised version, we added the following in the caption for Fig. 1A:

      “Selected worms expressing the brighter tag can be screened for dimmer tags by higher magnification and long exposure imaging. Worms can be recovered directly from slides if immobilized by levamisole as described (Ghanta et al., 2021). Alternatively, single hermaphrodite worms can be isolated, allowed to lay eggs, then screened.”

      (5) A blue bar of data must be missing from Figure 3B injection pool 5.

      As stated in the text, “All but one tag (cox-6B::mTagBFP2) was visible in the F1 generation of injected P0 animals, and these were subsequently isolated among F2 worms positive for the other tags in the pool.”

      To clarify that data points are not unintentionally omitted, we added the following text to the caption of Fig. 3B:

      “For group 5 including cox-6B::mTagBFP2, worms with detectable levels of mTagBFP2 fluorescence were not recovered in the F1 generation but were isolated among progeny of F1s positive for mStayGold and mScarlet3; we were thus unable to quantify efficiency for this locus at F1.”

      (6) Some expression or localization patterns were unexpected, but complications like germline silencing and protein mislocalization, with a small fraction localizing normally and rescuing function, were not presented as possibilities. Viability is used to confirm function, but without presenting whether this means 100% viability, less, or just the ability to maintain a strain.

      We already do discuss mislocalization and functionality issues in the Discussion, as well as tradeoffs of alternate methods. Any existing method to observe biological molecules, be it protein, RNA or DNA, has multiple drawbacks and sources of artifacts, which are unlikely to be fully eliminated in the foreseeable future.

      In regard to germline silencing of endogenously tagged genes in C. elegans, there is actually very little evidence for this. Collectively, various labs have now generated over 200 reporter alleles of germline-expressed genes (WormTagDB), with robust expression throughout the germline and retention of function. Likewise, many of our tags across fluorophores showed robust germline expression, including EEF-1A.1::mTagBFP2, Y22D7AL.10::mStayGold, and HAT-1::mScarlet3. In fact, overall transcript levels generally tended to underestimate germline enrichment at the protein level. We note that single-copy transgenes driven by the eef-1A.1/eft-3 promoter alone are frequently not expressed in the germline (PMID: 31064766); that we could detect EEF-1A.1 robustly in the germline when tagged endogenously is evidence that silencing is unlikely to be a widespread concern, and at the least is less of a concern than for single-copy transgenes. We appreciate that for a transgene, the presence or absence of specific sequence elements and the genomic locus play a role in expression, but an endogenous tag captures all such information at a given locus.

      Indeed, we found only two reports of endogenous tags being silenced in the germline, the first being a novel tag (not fluorophore) which initially prevented expression at the tagged locus (PMID: 30109984), but after making changes to the sequence to avoid silencing signals the authors could rescue expression and thereafter saw robust expression in various novel contexts with this tag. The second example (PMID: 34547227) leaves open the possibility that germline repression of that particular gene might be a part of its endogenous regulation.

      Nevertheless, given that germline silencing is probably rare, if it occurs at all, it will likely take a large-scale tagging effort to uncover such cases in sufficient numbers to study. In our view, this further justifies tagging at large, ideally genomic, scales. If we do discover that there are numerous annotated germline proteins which we don’t observe by tagging, that would be interesting to study on its own.

      (7) Halotag is presented in the Discussion as a small tag, but it is bigger than GFP.

      Thank you for catching this. We have removed the discussion of Halotag. Given the comparable size to FPs, it would be unlikely to alleviate issues of tag functionality.

      (8) It would be useful to include FPKMs and viability percentages in Table 1.

      FPKM is included in column 6, but the title for this column is cut off. In the revised table FPKM values are now shown more clearly across stages.

      We did not quantify viability percentage. In our view it does not yield an informative metric when there is little information about the protein’s required dosage for function, which was the case for most proteins here. A haplosufficient gene might yield a full brood size even if 50% of protein function is lost; conversely, a highly dose sensitive protein could yield penetrant and severe inviability with mild perturbation of function. It also is not actionable information at this stage if there is no alternate tagging strategy as a baseline of comparison. The worms we picked to image all have viable embryos as adults, so in those individuals the genes were likely to be sufficiently expressed and functional.

      (9) Because establishing that a guide works well is a limiting step for many CRISPR experiments (once a guide works well, it's easy to inject 5 worms and get lines), I wondered if testing that for many genes is what is really needed in the field at this stage. 

      Guide quality is rarely an issue in C. elegans, as for all the genes here we tried only one guide, all of which were previously untested. We now clarified this in the discussion section:

      “Notably, we find that previously untested guide RNAs and homology arms perform exceptionally well at novel loci, as we only tested one set of reagents for each locus which yielded satisfactory tagging rates.”

      (10) For a manuscript where the injection is so central to what was done, I was surprised to read in the Acknowledgments that all of the injections were done by someone who is not included as an author.

      We are likewise surprised by such a comment but gladly clarify: Chi Chen has been with us as an expert microinjection specialist for more than 25 years, and her very important technical contributions have been acknowledged in many dozens of papers. Multiple authorship guidelines, including COPE’s and ICMJE’s, state that technical contributions alone do not qualify for authorship.

      Reviewer #2 (Recommendations for the authors):

      (1) We would encourage the authors to provide systematic validation of the reported insertions. The manuscript reports that 24 of 30 tags were isolated and visible, but does not clearly state whether each isolated line was confirmed by sequence‑level validation to be correctly in‑frame and free of unintended mutations at the target locus.

      We appreciate the reviewer’s concerns on fidelity. These parameters have been assessed in prior published work (e.g., PMID: 30504364, PMID: 34748534) and in our hands are in the range of 80% whenever we sequence non-fluorescent tags of similar sizes. The efficiencies we observed are high enough that one can expect to recover numerous worms with the exact intended sequence for each target, though we would argue mutations within the FP reporter are less likely to matter if it retains high fluorescence.

      (2) The manuscript presents aggregated success counts (e.g., 8/10 mTagBFP2 tags, 9/10 mStayGold, 7/10 mScarlet3) and useful narrative descriptions of injection outcomes. We also suggest including per‑locus success rates.

      Figure 3B shows per locus success rate and source data is provided for this figure. Each dot is an individual injection and the Y axis is per locus rate. We now worded this more clearly in the figure’s caption.

      “Total insertion efficiencies per locus for the indicated targets across injection pools.”

      (3) For pools that required re‑injection after initial failures, we would like to see a description of the specific changes that were made to the injection mixes or procedures (e.g., new repair template prep, different Cas9 reagent lot, guide redesign). This will be useful troubleshooting information for others.

      We re-made the exact same injection mix, but used a Nanodrop spectrophotometer after each purification step to confirm that the purity of the repair templates, as assessed by absorbance ratios (A260/230 and A260/280), was sufficient. No other changes were made. This is now specified in the methods section in the following way:

      “For re-runs of pools 4, 6 and 10 which failed initially, we regenerated the repair templates and ensured that after each column purification, the A260/230 ratio of the purified DNA was ≥2.2 and A260/280 was 1.8 ± 0.05 when measured with a Nanodrop spectrophotometer.”
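
      The purity criteria quoted above can be written as a simple pass/fail check. This is a minimal sketch, assuming the two absorbance ratios have already been read from the spectrophotometer; the function name and the small floating-point tolerance are illustrative and not part of the authors' protocol.

```python
def passes_purity_qc(a260_230: float, a260_280: float) -> bool:
    """Return True if a repair template meets the quoted purity thresholds.

    Criteria taken from the methods text: A260/230 >= 2.2 and
    A260/280 within 1.8 +/- 0.05. A tiny epsilon guards against
    floating-point rounding at the boundary (e.g. 1.85).
    """
    eps = 1e-9
    ratio_230_ok = a260_230 >= 2.2 - eps
    ratio_280_ok = abs(a260_280 - 1.8) <= 0.05 + eps
    return ratio_230_ok and ratio_280_ok


if __name__ == "__main__":
    # Hypothetical readings, for illustration only.
    for reading in [(2.30, 1.82), (2.00, 1.80), (2.25, 1.90)]:
        print(reading, "pass" if passes_purity_qc(*reading) else "fail")
```

      A template that fails either ratio would be re-purified before being added to the injection mix.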

      (4) The authors state that the fluorophore sequences are codon-optimized for C. elegans. We suggest they provide the exact donor/tag sequences, specifically state whether the fluorophore sequences contain any synthetic/artificial introns, or whether other sequence modifications (e.g., silent PAM‑disrupting mutations) were included in the donor templates. 

      This information is provided in Supplementary Table 1.

      (5) Page 3: Include a reference for "The C. elegans genome encodes around 20,000 genes" 

      We added a reference to the most recent release of the genome (WS237, May 2013). Spieth et al., 2014.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors aimed to uncover novel therapeutic vulnerabilities in APC-mutant colorectal cancer (CRC), which constitutes the majority of CRC cases. They hypothesized that modulating oxygen-sensing pathways (via PHD inhibition) could disrupt adaptive stress responses in these tumours.

      Strengths:

      The study employs a powerful, two-pronged approach to identify Molidustat's targets. By using both Thermal Proteome Profiling (TPP) and an orthogonal chemical proteomic competition assay, the authors provide compelling evidence that GSTP1 is a genuine, direct off-target, effectively addressing the common limitation of indirect effects in proteomic screens.

      Weaknesses:

      (1) In Figure 1, the current data rely on a single guide RNA (sgRNA). To make the data solid, at least two independent sgRNAs targeting different regions of PHD2 should be used.

      We thank the reviewer for raising this. Clarity on the CRISPR strategy was missing from the original submission and we have now added the following to the Methods (Page 4). We did not use a single sgRNA. PHD2 was targeted with a pool of three chemically modified crRNAs:

      (IDT Alt-R; target sequences: 5'-TACAACCAGCATATGCTACA, 5'-GTGGCTGCCGAAGCCGAGCC, 5'-GATAAGATCACCTGGATCGA)

      These were delivered as in vitro assembled ribonucleoprotein complexes with high-fidelity Cas9. This format has been reported to achieve high on-target efficiency while minimising off-target cutting [1,2], such that any residual stochastic off-target events are distributed across the population and are not expected to manifest as a coherent phenotype at the population level. Working with pooled, unselected knockouts rather than single-cell clones also avoids the confounds of clonal heterogeneity that normally motivate the use of multiple independent guides and rescue experiments in single-clone workflows. We have previously validated this approach for GSTP1 knockout in a separate single-cell proteomics study [3], where loss of GSTP1 protein was observed in over 90% of single cells and GSTP1 was the most significantly altered protein between sgControl and sgGSTP1 populations.

      (2) Figure 3E: The Asn205 site should be mutated to test whether Molidustat inhibits GSTP1 activity via Asn205.

      This is a good suggestion, and we explored it in silico before concluding it was not tractable. We used PyMOL mutagenesis to model Molidustat binding to GSTP1 variants at the predicted contact residues: Asn205 was mutated to Ala, Gly and Ser; Trp39 (predicted to hydrogen-bond Molidustat) was mutated to Ala, Phe and Thr; and a Tyr8Phe/Asn205Ser double mutant was also modelled. In every case, Molidustat reoriented within the active site and adopted an alternative hydrogen-bonding configuration (most commonly with Tyr8), yielding a docking score equal to or better than that for native GSTP1 (Author response images 1–4). The model therefore does not predict any single or double point mutant that would ablate Molidustat binding in a clean, interpretable way, and we could not design a rational loss-of-interaction mutant on this basis. Given this limitation, and given that definitive mapping of the binding interface would require co-crystallography, which is beyond the scope of the present study, we have moved the docking model to the supplement and flagged it as predictive rather than definitive.

      Author response image 1.

      Molidustat in native GSTP1

      Author response image 2.

      Molidustat docking with mutated GSTP1, Asn205 mutated to Gln205

      Author response image 3.

      Molidustat docking with mutated GSTP1, Tyr39 mutated to Phe39

      Author response image 4.

      Molidustat docking with mutated GSTP1, Asn205 mutated to Ser205 and Tyr8 mutated to Phe8

      (3) Figure 5B and 5C: The metabolic imbalance phenotype observed upon dual knockout of PHD2 and GSTP1 requires rescue experiments to confirm on-target specificity.

      We thank the reviewer for this important point and agree that rescue experiments could represent the most direct demonstration of on-target specificity for the metabolic phenotype observed in Figures 5B and 5C. These rescue experiments are necessary when working with single clones, as they allow for comparing a knock-out clone with a reconstituted pool and sidestep the issue of clonal heterogeneity.

      In our case, we think that there is no advantage to doing so, as we work with pooled knockouts, so any clonal heterogeneity is diluted in the pool.

      One could even make the case that such a rescue experiment would introduce additional artefacts. Combined loss of PHD2 and GSTP1 leads to reduced cellular viability, with decreased proliferation and increased apoptosis, consistent with a synthetic lethal interaction. To devise a rescue experiment, we would have to isolate a single-cell clone (the pool is not a complete, 100% knockout, so WT cells would outgrow the knockout cells). A clone that has overcome the anti-proliferative insult of the double knockout is likely to have a phenotype distinct from that of the original pooled population, just as the rescued clone would differ from WT cells. For these reasons, we have not performed rescue experiments in the current study. We have added the absence of a rescue as a limitation of the study in the Discussion:

      “While genetic rescue experiments would provide definitive confirmation of on-target specificity, the pronounced loss-of-fitness and apoptotic phenotype observed upon combined PHD2 and GSTP1 loss limited the feasibility of establishing stable rescued double-knockout populations, and therefore represents a limitation of the current study.”

      Reviewer #2 (Public review):

      Summary:

      The authors aimed to determine Molidustat targets and the potential utility of these findings. They clearly demonstrate that Molidustat interferes with GSTP1 and some other proteins on top of PHD2. They also demonstrate that PHD2 deletion is not sufficient to recapitulate Molidustat effects in cells and proteomes. Finally, they demonstrate synthetic lethality in organoids for Molidustat and APC deletion.

      Strengths:

      The data on Molidustat proteomes, GSTP1 binding, inhibition and metabolic health of organoids is really clear. All biochemical, docking and omic data are really strong. The potential impact of these findings could be the use of Molidustat in APC null tumours and awareness of potential off-target effects.

      Weaknesses:

      A main but minor weakness is that Molidustat also inhibits other PHDs, although these are less expressed. PHD1 has been shown to control the cell cycle and be expressed in the colon, where it is needed for viability. Although this does not explain the lack of effect of other PHD inhibitors, it does warrant some discussion. The use of MTT is not very good to detect viability when it measures metabolism; this also needs to be discussed and perhaps supplemented with colony or cell number measurements.

      Great point. For this reason, we have assayed apoptosis throughout. In addition, we have added a clonogenicity assay with APC organoids. Organoid cells were treated with an acute dose of Molidustat. We subsequently measured the level of Lgr5 (a stem cell marker) and the ability of the cells to generate organoids (these data have been added as Figure 5F-G).

      Reviewer #3 (Public review):

      In this paper, the authors revealed that Molidustat can induce a dose-dependent increase in Caspase-3/7 activity in the HT29 cell line, which is an APC-mutant colorectal cancer cell line. More importantly, they found that targeting PHD2 alone cannot cause cell death. By using thermal proteome profiling (TPP) and orthogonal chemical proteomic competition assays, they determined GSTP1 as a previously undiscovered off-target of Molidustat. They also revealed that combined PHD2 and GSTP1 loss leads to an increase in intracellular ROS and apoptosis. Moreover, they evaluated the effects of Molidustat in colonic organoids and showed that Molidustat has a high selectivity for colonic organoids with activated WNT signaling and/or KRAS pathway alterations, and this effect is not reproduced by hydroxylase inhibition alone, providing a new potential approach to targeting both PHD2 and GSTP1 for the treatment of APC-mutant CRC.

      Specific comments:

      (1) What is the possible molecular mechanism of dual GSTP1/PHD2 loss, inducing cell death?

      This is an important question. Our data support a model in which combined loss of GSTP1 and PHD2 disrupts cellular redox homeostasis, leading to accumulation of reactive oxygen species, increased GSSG/GSH ratios, and depletion of antioxidant buffering capacity. This redox imbalance is accompanied by downregulation of pro-survival pathways. In this context, activation of apoptotic signalling, as evidenced by increased caspase-3/7 activity and proteomic enrichment of apoptosis-associated pathways, contributes to the observed cell death phenotype.

      While apoptosis is supported by our data, the magnitude of oxidative stress suggests that additional oxidative stress-associated cell death mechanisms may also contribute. We have clarified this point in the Discussion (Page 11).

      (2) Can the authors mutate the binding site of Molidustat on GSTP1 to verify the in silico docking results?

      This is a very important question. Currently, the model is of limited value. Reviewer #1 raised a similar point, and we refer the reviewer to our response to Reviewer #1, comment (2).

      (3) Evidence for Molidustat inhibiting PHD2 activity or stabilising HIF-1α should be provided.

      We thank the reviewer for this suggestion. Data showing HIF-1α stabilisation and evidence of downstream signalling have now been added to Supplementary Figure 1.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      I only have minor suggestions:

      Molidustat also inhibits other PHDs, although these are less expressed. PHD1 has been shown to control the cell cycle and be expressed in the colon, where it is needed for viability. Although this does not explain the lack of effect of other PHD inhibitors, it does warrant some discussion. The use of MTT is not very good to detect viability when it measures metabolism; this also needs to be discussed and perhaps supplemented with colony or cell number measurements.

      This is correct; PHD1 is of particular interest, given the effects that its inhibition or knockout has on the inflamed colon. We have added a new paragraph to the Discussion (Page 13) that addresses the isoform selectivity of Molidustat. We note that, although developed as a PHD2 inhibitor, Molidustat retains appreciable activity against PHD1 and PHD3 [4], and we discuss the non-redundant and in some contexts opposing roles of PHD1 and PHD2 in the colon: PHD1 loss is protective in DSS colitis [5] and restrains colitis-associated tumour growth, whereas PHD2 loss in the tumour and stroma is reported to inhibit metastasis and improve treatment response [6]. We further note that this pattern of isoform engagement is shared with other pan-PHD inhibitors that did not phenocopy Molidustat in our screens, indicating that PHD isoform profile alone is insufficient to explain Molidustat’s distinctive activity and pointing to GSTP1 off-target engagement as the key distinguishing feature. We argue that localised colonic delivery (as discussed earlier in the Discussion) would concentrate drug at the APC-mutant epithelium while limiting systemic exposure.

      We fully agree with the reviewer that MTT measures metabolic activity/NADH levels rather than viability in the strict sense, and that this is particularly relevant for a compound that perturbs redox metabolism. We have added a clonogenicity assay in APC organoids (Fig. 5F-G) to supplement the MTT and Cleaved Caspase 3 assays already present in the manuscript.

      (1) Lee, J. K. et al. Directed evolution of CRISPR-Cas9 to increase its specificity. Nat. Commun. 9, (2018).

      (2) Sakovina, L., Vokhtantsev, I., Vorobyeva, M., Vorobyev, P. & Novopashina, D. Improving Stability and Specificity of CRISPR/Cas9 System by Selective Modification of Guide RNAs with 2′-fluoro and Locked Nucleic Acid Nucleotides. Int. J. Mol. Sci. 23, (2022).

      (3) Makar, A. N., Holkham, J., Lilla, S., Wilkinson, S. & von Kriegsheim, A. Overcoming preservation challenges to enable single-cell proteomics of fixed cell and tissue samples with retained proteome integrity. Preprint at https://doi.org/10.1101/2025.03.10.642380 (2025).

      (4) Flamme, I. et al. Mimicking hypoxia to treat anemia: HIF-stabilizer BAY 85-3934 (molidustat) stimulates erythropoietin production without hypertensive effects. PLoS One 9, (2014).

      (5) Tambuwala, M. M. et al. Loss of prolyl hydroxylase-1 protects against colitis through reduced epithelial cell apoptosis and increased barrier function. Gastroenterology 139, (2010).

      (6) Leite de Oliveira, R. et al. Gene-Targeting of Phd2 Improves Tumor Response to Chemotherapy and Prevents Side-Toxicity. Cancer Cell 22, (2012).

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study used pupillometry to provide an objective assessment of a form of synesthesia in which people see additional color when reading numbers. It provides convincing evidence that subjective color ratings are matched by changes in pupil size that recapitulate brightness-mediated changes when exposed to the real color. The work provides a valuable contribution to the literature on both synesthetic perception and the use of pupillometry to probe perception and related psychological processes.

      We were pleased to learn that our manuscript was of interest to the reviewers and the editor. We thank the reviewers for their useful feedback and have addressed all their comments in the revised version. We here give the most prominent changes as quotes.

      We thank all reviewers for their very helpful input.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Knowing that small pupil-size variations accompany brightness variations (even when these are illusory), the authors asked whether pupil constrictions would accompany the synesthetic perception of a brighter color (compared with a darker one), induced by the presentation of a black-white character. This grapheme-colour synesthesia is only experienced by a few participants, sixteen of whom were enrolled in this study. The results reliably showed that a relative pupil constriction would "betray" the perception of a brighter color in these participants, while no such effect would be observed in control participants who were asked to report a color in association with each grapheme, even though they did not perceive any.

      Strengths:

      The main strength of the study lies in its combination of psychophysics (brightness ratings) and pupillometry, which allowed for showing clear-cut results.

      Weaknesses:

      Some relatively minor weaknesses concern the ancillary analyses, which tackle secondary questions and are not entirely convincing.

      (1) The linear mixed model approach is a powerful way to identify important variables, but it does not clarify whether the key factors are between-subject or between-trial variations. Some variables are inherently defined at a subject level (e.g., PA scores), others are not. I would strongly recommend an alternative visualisation of the results to examine inter-individual variability.

      Visualizing the highly idiosyncratic effects is indeed challenging. Addressing R1’s point 4 and a point brought up by R2, we updated all figures to now visualize pupil size in millimeters instead of arbitrary units. Furthermore, we added a supplementary figure (supplementary figure 4) that visualizes pupil size change without demeaning (please see reply to point 4).

      To get a better grasp of the interaction between lightness and coupling strength, we further included the supplementary figure 5 that splits by lightness and coupling strength in synesthetes.

      Furthermore, as this review and response will be publicly available, Author response image 1 provides participant-mean traces per lightness bin in addition to the overall means and hopefully makes the stability/variability of effects visually clearer (in addition to the strip plots that attempt this for the average response).

      Author response image 1.

      We hope that these additional visualizations make the effects of interest more transparent. Ultimately, however, the LME figure likely provides the information best, albeit at the cost of complexity.

      (2) It is not clear why taking the first derivative of pupil size in Figure 5 would isolate the effect of arousal, eliminating those of luminance and contrast changes (in fact, one could argue for the opposite, since arousal effects are generally constant for extended periods of time while contrast effects are typically more local and transient).

      First, please note that the results in 2.3.1 cannot be explained by task or context effects such as luminance and contrast: the exact same active color reporting task (same task and context) was presented to synesthetes and non-synesthetes.

      Indeed, the reviewer is correct that the first derivative does not eliminate other concurrent pupil-driving effects; this was expressed wrongly in our original text. Any stimulus-locked effect, such as the luminance and contrast effects but also the effort effect, will be reflected similarly in the derivative measure.

      We took the derivative because pupil responses driven by other, non-trial-related activity, such as increasing tiredness or excitement over the course of trials, differ almost by necessity between participants, thus creating variability. However, these effects most likely happen at a slower timescale and thus show less in the derivative measure. Accordingly, we previously found clearer response-locked effects when using a derivative measure (Douze et al., 2025; Ten Brink et al., 2024). In this way, we also hoped to reduce such between-participant variability for this between-participant analysis.

      A baseline-corrected analysis leads to the same conclusion. To test the same question without relying on the derivative, we compared baseline-corrected pupil sizes using per-time-point LMEs that take individual differences into account. Group (active control vs. synesthete) reached significance between ~1.7 s and 3 s, aligning with the derivative-based result.

      Author response image 2.

      t-values of a per-time-point LME predicting pupil response from group (synesthete/active control). Group reached significance.

      In sum, we deem the derivative more powerful/more appropriate in this context, but the interpretation of findings does not hinge on that analysis choice (as can be seen in the Author response image 2).

      We corrected the claims about the derivative as a measure that cleans out other effects, which were indeed oversimplified as they stood. We now write:

      “Mental effort presents in task-evoked pupil dilations, yet other factors simultaneously affect the pupil, such as luminance and contrast changes at trial onset, as well as slower trends across the session (e.g., fatigue). To reduce the influence of these slower, non-trial-locked fluctuations while retaining the trial-evoked dynamics, we calculated the first derivative of the pupil time course to assess the velocity of pupillary changes (Butterworth filter, 18 Hz, order 3, 2.5 Hz lowpass, following our previous works [60, 61]).”

      Douze, B. T., Ten Brink, A. F., Dijkerman, H. C., & Strauch, C. (2025). Pupil responses objectively index pharmacologically altered tactile sensitivity. Cortex, 193, 90-104.

      Ten Brink, A. F., Heiner, I., Dijkerman, H. C., & Strauch, C. (2024). Pupil dilation reveals the intensity of touch. Psychophysiology, 61(6), e14538.
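The reasoning behind the derivative measure in this response can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the sampling rate, and the sample values are hypothetical, and a centered moving average stands in for the Butterworth low-pass filter named in the quoted methods text. The sketch only shows that a first-derivative (velocity) trace is unaffected by a constant offset between participants, which is the kind of slow, non-trial-locked difference the response describes.

```python
def moving_average(samples, window):
    """Centered moving average (assumed stand-in for a low-pass filter)."""
    half = window // 2
    smoothed = []
    for i in range(len(samples)):
        lo = max(0, i - half)
        hi = min(len(samples), i + half + 1)
        smoothed.append(sum(samples[lo:hi]) / (hi - lo))
    return smoothed


def pupil_velocity(samples_mm, sampling_rate_hz):
    """First derivative in mm/s: central differences, one-sided at the edges."""
    n = len(samples_mm)
    if n < 2:
        raise ValueError("need at least two samples to compute a derivative")
    dt = 1.0 / sampling_rate_hz
    velocity = []
    for i in range(n):
        lo = max(0, i - 1)
        hi = min(n - 1, i + 1)
        velocity.append((samples_mm[hi] - samples_mm[lo]) / ((hi - lo) * dt))
    return velocity


# Hypothetical traces: the same 0.5 mm/s dilation at two different baselines.
RATE_HZ = 100
trace_a = [4.0 + 0.5 * (i / RATE_HZ) for i in range(10)]
trace_b = [sample + 1.2 for sample in trace_a]  # larger baseline pupil

velocity_a = pupil_velocity(trace_a, RATE_HZ)
velocity_b = pupil_velocity(trace_b, RATE_HZ)
```

Both velocity traces are the same up to floating-point error, even though the raw traces differ by 1.2 mm throughout. Stimulus-locked effects such as luminance or contrast responses would still appear in the velocity trace, consistent with the correction made in the response.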

      (3) It is a pity that responses to physical brightness modulations were only measured in the synesthete group, not in controls, as this would have allowed for ruling out differences in pupil reactivity across the two populations.

      The reviewer is correct that this would allow additional comparisons, but we argue that light responses in healthy control samples are very well documented and stereotypical. For instance, Bergamin & Kardon (2003) provide very systematic latency estimates: about 320 ms for low-luminance-change stimuli, accelerating to about 250 ms for very strong luminance changes. Responses to our relatively small luminance increments should thus be expected in this range. Indeed, this also describes well the response latencies we observed in synesthetes when exposed to the colored disks. While there is no detailed information about participants in Bergamin & Kardon (2003), data from previous studies show very similar pupil light response profiles in a healthy student control population that matches our synesthetes well demographically (Strauch, Romein et al., 2022, Figure 2a, from the same lab as the present study; Koevoet et al., 2025, Figure 3a). As noted in our further responses, baseline pupil size in millimeters also did not differ across groups.

      Together, we can safely conclude that pupil light responses in synesthetes are not different from pupil light responses in controls. We agree with the reviewer that this is a sensible point to also make in the manuscript:

      “Specifically, pupil size first responded significantly to physical luminance after 330 ms (see Supplementary Figure 7 for per-timepoint LME; in line with response latencies of similar control populations, see Bergamin & Kardon [52], Koevoet et al. [40], and Strauch et al. [53]), but only responded significantly to synesthetic lightness at about 870 ms (see also Figure 3c vs e and Figure 4 for per-timepoint LME)”.

      Bergamin, O., & Kardon, R. H. (2003). Latency of the pupil light reflex: sample rate, stimulus intensity, and variation in normal subjects. Investigative Ophthalmology & Visual Science, 44(4), 1546-1554.

      Koevoet, D., Naber, M., Strauch, C. & Van der Stigchel, S. Presaccadic Attention Shifts Up-and Downwards: Evidence From the Pupil Light Response. Psychophysiology 62, e70047 (2025).

      Strauch, C., Romein, C., Naber, M., Van der Stigchel, S., & Ten Brink, A. F. (2022). The orienting response drives pseudoneglect—Evidence from an objective pupillometric method. Cortex, 151, 259-271.

      (4) Another concern is with the visualisation of the pupil traces in Figure 3 (main results); these were heavily pre-processed (per-participant demeaned), losing any feature besides the effect of interest and generating the unrealistic expectation that perception of dark/bright colors generate a net dilation/constriction of the pupil - whereas perception-related modulations of pupil size are always relative and generally small compared to the numerous other effects registered in pupil size. It would be far better to see the actual profiles, preserving the unfolding of dilations and constrictions over time, especially since these are further analysed in Figures 4 and 5.

      Indeed, the expectation that any dark synesthetic experience would lead to pupil dilation whereas any bright synesthetic experience would lead to constriction is not warranted – it would only do that relative to the counterfactual of not having that experience.

      Many factors affect the pupillary signal at the same time, and often differently across individuals (think of tiredness etc.), making merely baseline corrected traces seemingly noisy. Our visualization highlights that there is a systematic part to that variation that lies in the synesthetic brightness experience.
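The per-participant demeaning at issue in this comment can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: it assumes that "demeaning" means subtracting, at every time point, each participant's mean trace across conditions, and the data layout, function name, and values are hypothetical.

```python
from statistics import mean


def demean_per_participant(traces):
    """Subtract each participant's across-condition mean trace.

    `traces` maps participant -> condition -> list of pupil samples (mm),
    with all lists for one participant having equal length (assumed layout).
    """
    demeaned = {}
    for participant, by_condition in traces.items():
        # Mean across conditions at each time point for this participant.
        mean_trace = [mean(column) for column in zip(*by_condition.values())]
        demeaned[participant] = {
            condition: [x - m for x, m in zip(trace, mean_trace)]
            for condition, trace in by_condition.items()
        }
    return demeaned


# Hypothetical data: two conditions, three time points, one participant.
raw = {"p1": {"bright": [4.0, 3.8, 3.9], "dark": [4.0, 4.0, 4.1]}}
relative = demean_per_participant(raw)
```

After demeaning, the two conditions mirror each other around zero, so the plot shows only the relative difference. The shared time course (overall dilation or constriction) is removed, which is why non-demeaned traces are a useful complement.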

      Visualizing the effects of idiosyncratic experiences that vary within and between participants is challenging. For the theoretical insight brought about through our paper in Figure 4 (synesthesia being sensory in nature), demeaning is favorable in our opinion, as it isolates the effect of interest in the visualization. However, for methodological reasons and to better show effect sizes, there is certainly use in additional transparency. We thus now provide non-demeaned traces in the supplementary material, as the reviewer suggested, and also refer to these in the main manuscript. Furthermore, all figures are now provided in millimeters, with all pupil-related analyses rerun and updated to this end (without qualitative changes to the results). This should further rectify possibly inflated expectations about the absolute size of effects and allows effects to be put into perspective across studies. We have now added:

      “Pupillary data were transformed from arbitrary eyelink units to millimeters using a conversion factor obtained with an artificial eye (see Hayes & Petrov, 2016).”

      Hayes, T. R., & Petrov, A. A. (2016). Mapping and correcting the influence of gaze position on pupil size measurements. Behavior research methods, 48(2), 510-527.
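The unit conversion described in the quoted methods sentence can be sketched as follows. This is a hedged illustration only: the function names and all numbers are hypothetical, and it assumes a single scaling factor derived from one recording of an artificial pupil of known diameter. If the tracker reports pupil area rather than diameter, diameter scales with the square root of the recorded value, which the `units_are_area` flag accounts for.

```python
import math


def conversion_factor(artificial_pupil_mm, recorded_units, units_are_area=False):
    """Millimeters per arbitrary unit, from an artificial pupil of known size."""
    measured = math.sqrt(recorded_units) if units_are_area else recorded_units
    return artificial_pupil_mm / measured


def to_millimeters(samples, factor, units_are_area=False):
    """Convert arbitrary-unit pupil samples to diameters in millimeters."""
    return [
        factor * (math.sqrt(sample) if units_are_area else sample)
        for sample in samples
    ]


# Hypothetical calibration: a 4 mm artificial pupil recorded as 1600 area units.
factor = conversion_factor(4.0, 1600.0, units_are_area=True)
diameters_mm = to_millimeters([900.0, 2500.0], factor, units_are_area=True)
```

A single factor of this kind ignores gaze-position effects on measured pupil size, which is the topic of the cited Hayes & Petrov (2016) paper.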

      Impact:

      Despite these weaknesses, and especially if they are adequately addressed in the review, this work is likely to improve our understanding of synesthesia, providing a new tool to quantify the subjective sensations; an interesting potential extension would be using pupillometry for tracking changes over time of the synesthetic experiences, opening up the possibility to evaluate the importance of learning for this peculiar experience.

      We were happy to read our manuscript was evaluated this positively and hope that our replies can address the remaining smaller concerns and make findings more transparent to the readers.

      Reviewer #2 (Public review):

      Synesthesia is a neurological condition where stimulation of one sensory channel leads to involuntary, automatic, and consistent experience of another, unrelated percept. For example, Sir Francis Galton (1880, Nature) famously described the robust tendency of some individuals (synesthetes) to associate numerals with a distinct color. Ever since, synesthesia has continued to attract a broad interest in the cognitive neurosciences in light of its implications for the study of domains such as perception, consciousness, and brain connectivity, among others.

      Strauch, Leenaars, and Rouw measured pupil size in a group of 16 grapheme-color synesthetes and two matched control groups. The participants were presented with gray digits - that is, visual stimuli having identical physical properties in terms of brightness. Each participant subsequently rated the corresponding evoked color and brightness: unlike controls, synesthetes did so in a very consistent and reliable fashion. Accordingly, this was also shown in their pupils: despite the same objective luminance, digits associated with brighter percepts caused their pupils to constrict, and digits associated with darker percepts caused their pupils to dilate more than controls. These results highlight how crossmodal correspondences are deeply rooted in synesthetes, and put forward pupillometry as a particularly appealing biomarker for some phenomenological experience (at least those grounded in "brightness").

      Further strengths of the technique are its temporal resolution and its responsiveness to several constructs. Across several tasks, the authors show, for example, that responses to synesthetic light are somewhat slower than responses to real light (i.e., they are likely mediated), but at the same time faster than responses to mental imagery. The role of mental imagery can also be reasonably dismissed when considering the second feature of pupil size: its responsiveness to mental effort and cognitive load. The pupils tend to dilate with demanding, challenging tasks, and this was the case when control participants were asked to report the color of a digit for which they did not consistently experience a synesthetic association. The same task was, instead, seemingly effortless for synesthetes, again speaking in favor of the automaticity of number-color correspondences in their case.

      Overall, the findings by Strauch, Leenaars, and Rouw are highly significant for the field and likely to be impactful. The strength of their evidence, when accounting for the relatively small sample size and the inherent variability of both phenomenology (color perception and subjective reporting) and physiology (pupil size), is adequate and sufficiently convincing.

      We were glad to read this overall very positive assessment of our work and thank the reviewer for the additional non-public suggestions for improvements.

      Reviewer #3 (Public review):

      Summary:

      In the present study, the authors examined pupillary responses to uncolored stimuli (number graphemes) among number-color synesthetes and non-synesthetes. After seeing a digit, the synesthetes and active control participants were asked to indicate which color they perceived using three dimensions of hue, saturation, and lightness. The lightness values were the primary independent variable for follow-up analyses. To see how the pupil responded to psychologically "bright" and "dark" digits, the authors split the reported lightness values at the median and plotted them. The synesthetes showed a pupillary constriction to digits they perceived as bright and dilation to digits they perceived as dark. Active control participants did not show that effect. In a subsequent block, only the synesthetes were shown the colors they reported perceiving as colored discs. Their pupillary responses were similar. The authors also found that the differences in pupillary responses between light and dark perceptions (with digits) were only slightly delayed in their onset to the perception of a colored disc, and therefore, the color perception accompanying a digit is unlikely to be effortful or a retrieved association, but occurs rather automatically.

      Strengths:

      The authors employed a well-controlled and designed quasi-experiment comparing color-grapheme synesthetes to non-synesthetes and showed convincingly that the color perceptions accompanying graphemes alter the physical perception of brightness. They also made a reasoned attempt to rule out the possibility that color associations are occurring effortfully via retrieved associations.

      We appreciate the positive assessment and useful suggestions for revision.

      Weaknesses:

      There are some areas in which the implications of these findings could be elaborated upon. I had the following questions:

      (1) Are the pupillary responses among synesthetes, which objectively do not seem to match the degree of physical stimulation entering the retina, in any way maladaptive for eye functioning? I understand the constriction/dilation of the pupil to not only benefit visual acuity but also to protect the retina from damage. Are synesthetes at any risk of retinal damage due to over-dilation of the pupil to brighter stimuli? Or are these effects of a magnitude that is too small to matter? As reported in arbitrary units, it was hard to know how large these effects were in terms of measurable changes in dilation (e.g., millimeters).

      This is an interesting point. Some argue that pupil size changes in a mid-range mildly affect optics thus affecting detection performance, contrast perception, and depth of field (Eberhardt et al., 2022, Mathôt & Ivanov 2019, Ruuskanen, Boehler, & Mathôt, 2025), rather than serving a protective role for the retina (Mathôt, 2018). Indeed, any effects reported here were quite small. We agree with the reviewer that this can be made more accessible by reporting effects in millimeters. We thus now adjusted all figures accordingly and write in the methods section:

      “Pupillary data were transformed from arbitrary eyelink units to millimeters using a conversion factor obtained with an artificial eye (see Hayes & Petrov, 2016).”

      Note that even the largest effects here (those elicited by physical luminance change in block 2 for the synesthetes) only caused differences in pupil size of about 0.3 mm. This lies below the maximal pupil dilations observable in response to maximal effort (about 0.5 mm), for instance, and substantially below the full range of pupil size changes elicited through strong luminance stimulation (several millimeters). We therefore deem the changes in pupil size obtained in our study too minor to be practically maladaptive for optics/perception.

      Eberhardt, L. V., Strauch, C., Hartmann, T. S., & Huckauf, A. (2022). Increasing pupil size is associated with improved detection performance in the periphery. Attention, perception, & psychophysics, 84(1), 138-149.

      Hayes, T. R., & Petrov, A. A. (2016). Mapping and correcting the influence of gaze position on pupil size measurements. Behavior research methods, 48(2), 510-527.

      Mathôt, S., & Ivanov, Y. (2019). The effect of pupil size and peripheral brightness on detection and discrimination performance. PeerJ, 7, e8220.

      Mathôt, S. (2018). Pupillometry: Psychology, physiology, and function. Journal of cognition, 1(1), 16.

      Ruuskanen, V., Boehler, C. N., & Mathôt, S. (2025). The Interplay of Spontaneous Pupil-Size Fluctuations and EEG Power in Near-Threshold Detection. Psychophysiology, 62(3), e70035.

      (2) Likewise, is the automatic synesthetic merging of two percepts something that could be learned such that natural synesthetes and "artificial" synesthetes would look similar? For example, if a group of non-synesthetic participants were to learn a color-grapheme association to automaticity, would you expect their pupillary responses to the graphemes to look similar to the synesthetes'? If so (or if not), what would this tell us about the phenomenology of synesthesia?

      We find this question most interesting. Likely, different synesthesia researchers would not even fully agree on the most plausible answers to these questions. Training studies have shown that non-synesthetes can be trained to associate particular colors with particular graphemes, as revealed in the synesthetic Stroop effect: interference effects of the learned color on reporting the typeface color of the grapheme. The degree to which non-synesthetes can be trained to become similar to synesthetes is, however, still a topic of debate.

      We now discuss as follows:

      “Future studies could examine to what degree training a non-synesthete to associate specific colors to particular inducers (e.g., digits), can provide similar patterns of results as genuine synesthesia (Bor et al., 2014, Colizoli et al., 2012, Rothen & Meier, 2014). Could learning produce similar brightness-related pupil effects in non-synesthetes? Similarly, would effort-linked responses diminish with increased training duration? The perhaps most interesting question relates to response latencies: Would a trained participant ever be able to produce brightness-related pupil effects as fast as a synesthete?”

      Bor, D., Rothen, N., Schwartzman, D. J., Clayton, S., & Seth, A. K. (2014). Adults can be trained to acquire synesthetic experiences. Scientific reports, 4(1), 7089.

      Colizoli, O., Murre, J. M., & Rouw, R. (2012). Pseudo-synesthesia through reading books with colored letters. PloS one, 7(6), e39799.

      Rothen, N., & Meier, B. (2014). Acquiring synaesthesia: insights from training studies. Frontiers in human neuroscience, 8, 109.

      (3) Do the synesthetic perceptions of digit graphemes merge in a sensible way? For example, if a synesthete sees a particular color with the digit 1, and a different color with the digit 9, what do they perceive when they see 19? or 1-9, or 1 9? Is there color blending, or an altogether different color perception?

      This is a very interesting question indeed. While each synesthete will have their own specific expression of synesthesia, there are regularities in how a combination of digits evokes synesthetic color. First, if asked about the color of a specific digit, each digit keeps its own color, as the color of a digit is linked to the identity of the digit (Dixon et al., 2006). Context effects are however possible, in particular when context alters the interpretation of the digit (Myles et al., 2003). A particularly common context in a multi-digit number is a dominant first digit, spreading its color to the subsequent digits in the number. However, as the digit color is linked to digit identity, what does ‘not’ happen is a mixing of colors into a qualitatively new color; for example, a yellow "1" and blue "9" do not merge into a green "19".

      Dixon, M. J., Smilek, D., Duffy, P. L., Zanna, M. P., & Merikle, P. M. (2006). The role of meaning in grapheme-colour synaesthesia. Cortex, 42(2), 243-252.

      Myles, K. M., Dixon, M. J., Smilek, D., & Merikle, P. M. (2003). Seeing double: The role of meaning in alphanumeric-colour synaesthesia. Brain and Cognition, 53(2), 342-345.

      Many thanks for the constructive assessment of our work.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) I am not sure I'd use the term 'cross-modal' given that the case considered here (grapheme-color) is purely visual.

      The reviewer is absolutely right: the term 'cross-modal' has a historical background rather than reflecting an exact factual accuracy. The term is still commonly used however, as it readily reflects how the induced additional experience is always of a different (sub)type than the inducing experience. There is a cross-over between experiences that might occur within the same sensory modality, or even induce awareness of a particular concept. But key to synesthesia is the crossover experience as the inducer and concurrent are different (sub)types of experiences. For example, seeing a letter can evoke a synesthetic experience of seeing a color, or evoke awareness of a particular gender or personality of that letter, but does not evoke another letter. To remain consistent with literature, we refer to 'cross-modality' when explaining the link to previous literature, but generally switched to using 'cross-over experience':

      “Therefore, synesthesia might provide a unique window into how the brain’s constructive processes can generate additional, conscious content, in cross-over experiences, often across modalities, going all the way down to the level of sensory phenomenology.”

      We adjusted throughout the manuscript accordingly.

      (2) I would not recommend focusing the introduction on the problem of qualia; this is a much more general and complex question than the one addressed in the study; the space of the introduction may be better used to present the actual object of study, giving a better picture of the synesthetic phenomenon and of previous work aimed at characterising it (behavioural, including PA scores and consistency measures, and neuroimaging). It is important to discuss how the pupillometric approach differs from the previously adopted neuroimaging techniques and what it can add to those.

      We agree that qualia is a very general and complex question. However, we respectfully disagree that this complex question is not the object of the study. What is remarkable about synesthesia is not the presence of an additional perceptual association per se, but the presence of a specific perceptual experience. As an illustration, think of a test where an unconscious color association to the word 'banana' was tested. While a generic 'yellow' could semantically be linked and would likely be obtained in the (e.g. priming) experimental results, a follow-up question of picking on a color wheel the exact shade of yellow for this association, or describing the perceptual sensation of the color, would be nonsensical to the participants.

      This sharply contrasts with the current study: synesthetes, but not non-synesthetes, indicate a perceptual sensation of additional colors, and the sensory properties of this percept (experienced brightness) indeed affect the objective reflection of this sensation (pupil size) in synesthetes but not in non-synesthetes. In our view, the presence of additional qualia is key in understanding what sets synesthetic apart from non-synesthetic associations, including so-called cross-modal correspondences (unconscious consistent associations across modalities, common to us all). We even believe that the reported qualia are what make synesthesia so interesting in the first place. We now explain this link to qualia more clearly in the introduction.

      "The most remarkable aspect of synesthesia is the subjective perceptual phenomenology of the induced colors, setting these sensations apart from color memory, thought, or amodal association. The contrast between synesthetes and non-synesthetes can thus offer an interesting doorway into examining qualia, the subjective perceptual phenomenology or first person (what's-it-like) perspective."

      We also improved the explanation of the synesthetic phenomenon, including a more detailed characterisation of behavioural measures (including consistency scores) and added neuroimaging studies. These changes have been incorporated into the text in response to previous comments (point 1- reviewer 1).

      Please note that we have chosen not to include more detailed discussion of PA scores. Our results show a trend but do not allow for a conclusive interpretation on PA scores, and we feel that placing greater emphasis on this topic might therefore be confusing or even misleading. Still, it would be a very interesting topic for follow-up research to examine how alterations in characteristics of the synesthetic experience influence pupil responses.

      The different synesthesia types all share the defining characteristics of an additional conscious and consistent experience. Synesthetes can verbally report their additional experience, and synesthetic sensations can be measured in behavioral paradigms such as the ’synesthetic Stroop’ effect, or brain activation patterns in sensory cortex [15]. Furthermore, test-retest paradigms show how synesthetic, but not non-synesthetic associations are highly specific and consistent [16-18]. Thus, over the past decades, research has established synesthesia as a ’real’ condition that can reliably be identified using behavior, neurophysiology, and neuroimaging [11, 13, 15–21]. The most remarkable aspect of synesthesia is the subjective perceptual phenomenology of the induced additional sensation, i.e., color in grapheme-color synesthesia. This sets synesthetic sensations apart from (color) memory, thought, or amodal association. Synesthesia can thus offer an interesting doorway into examining qualia, the subjective perceptual phenomenology or first person (what’s-it-like) perspective.

      We now discuss the pupillometric approach as it differs from the previously adopted neuroimaging techniques as follows:

      “Compared to neuroimaging studies [12,15,51], pupillometry may offer a more direct window into synesthetic phenomenology, as the directionality between pupil light reflex and perceived brightness is straightforward. Finally, improved understanding of the underlying processes can be obtained by contrasting responses to perceived versus actual (physical) brightness, given that the pupil light reflex is a well-characterised reflex arc involving few inferential steps.”

      This adds to the explanation that was already present on how the current approach differs from previous techniques, and what it can add to those techniques:

      "Instead, current paradigms capturing synesthesia employ objective measures, but fail to capture its phenomenology [16, 17, 21, 23]."

      (3) There are a few typos and word repetitions.

      Many thanks – we identified typos and repetitions after another set of careful reads and hope to have eradicated them completely now.

      Reviewer #2 (Recommendations for the authors):

      I am overall very supportive of this work, but addressing the following points may enrich it further:

      (1) Paragraph 2.2.1. Here, models do not seem to compare synesthetes versus controls but rather assess the effects of interest separately in the two groups. The fact that experimental effects are significant in synesthetes, but not in controls, does not tell us much about differences between groups. Controls (e.g., Figure 3) do show a similar trend, albeit clearly smaller. There is one passage in which this issue appears to be tackled (page 10): "Critically, in an LME ran on synesthetes and controls and using only graphemes and the interaction of group and lightness as predictors, we found lightness to predict pupil size in synesthetes (t = -2.754, p = 0.006), but not controls (t = -1.134, p = 0.257)." But I am not sure that the reported statistics belong to the interaction - they seem to refer to the lightness effect within each group, not the difference.

      This is an important point. Power for between-group comparisons is inherently limited with n = 16 per group (while still feasible for overall responses, things become trickier when fewer trials remain). A simple model of pupil ~ grapheme + group * lightness_scaled + (1 | participant) shows no significant interaction (despite one group showing a significant effect and the other not). The additional negative effect for group is in line with the effort-related effect reported later in the manuscript. Where does this leave us? Based on the lightness responses alone, the group difference can be characterized as a quantitative distinction, but the degree to which it is also a qualitative distinction cannot clearly be determined from the current data. We revised the manuscript to make sure that such an interaction is not implied and to point out that the interaction is not significant.
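      For readers less familiar with Wilkinson notation, the joint model named in this response can be written out as below. This is only a sketch under assumptions not stated in the response: grapheme is treated as a categorical fixed effect, the group indicator G is coded 0 for controls and 1 for synesthetes, and the random intercept and residual are taken to be Gaussian.

```latex
% i indexes trials, j indexes participants
% G_j: group indicator (assumed 0 = control, 1 = synesthete)
% L_{ij}: scaled lightness of the reported color
\begin{aligned}
\text{pupil}_{ij} &= \beta_0
  + \sum_{g} \gamma_g \,\mathbb{1}[\text{grapheme}_{ij} = g]
  + \beta_G\, G_j + \beta_L\, L_{ij} + \beta_{GL}\, G_j L_{ij}
  + u_j + \varepsilon_{ij}, \\
u_j &\sim \mathcal{N}(0, \sigma_u^2), \qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
\end{aligned}
```

      Under this coding, the between-group question is whether the coefficient on the product term differs from zero. A significant lightness slope in one group alone does not establish that.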

      The sensory nature of synesthetic color is supported by within-synesthete analyses, where coupling strength parametrically modulates the lightness-pupil relationship in a theoretically predicted manner. Importantly, the effort-related findings provide a complementary and statistically robust group comparison: synesthetes and controls performing the identical color-reporting task showed significantly different pupil dilation rates, directly demonstrating that the two groups differ in how they access color information. Together, these two independent pupillometric signatures, one tracking perceptual quality, one tracking effort, converge on the same conclusion and mutually reinforce the interpretation that synesthetic color constitutes genuine sensory phenomenology.

      Author response image 3.

      We now make this more explicit in the manuscript as follows:

      “We found significant modulations of pupil size by the lightness of the grapheme's synesthetic color - sustained and in the to-be-expected time window. Specifically, the pupil constricted more for brighter reported colors, and dilated more for darker reported colors, as predicted (Average pupil size 800-4000ms, t = -3.601, p < 0.001). In an LME run for synesthetes and controls and using only graphemes and lightness as predictors, we found lightness to predict pupil size in synesthetes (t = 2.844, p = 0.004), but not controls (t = 0.606, p = 0.544). However, when taking group as interacting factor in a joint LME, there was no interaction of lightness and group (t = -0.949, p = 0.342).”

      and

      “For controls a separate model was run, now without the PA score as predictor (not assessed for controls). Neither lightness (t = -0.815, p = 0.415), coupling strength (t = 0.438, p = 0.661), nor their interaction gained significance (t = -1.058, p = 0.290; all for average pupil size between 800 ms and 4000 ms). Critically, we also ran an LME with the three-way interaction of coupling strength, group, and lightness (Wilkinson notation: pupil = grapheme + group + lightness * group + coupling strength * lightness * group + (1 | participant)). This analysis revealed a significant three-way interaction between lightness, coupling strength, and group (F = 3.86, p = .021), indicating that the lightness × coupling strength effect on pupil size was not equivalent across groups. Decomposing this interaction by group, the lightness × coupling strength slope was significant in synesthetes (t = 2.59, p = .010) but not in controls (t = -1.01, p = .311), suggesting that reported lightness and its coupling strength were more consistently related to pupil size in synesthetes than in controls. Note, however, that this decomposition does not directly test whether the two slopes significantly differ from each other. Lastly, pupil size was marginally larger in controls than in synesthetes (t = 1.94, p = .062; see later sections for more in-depth analyses).”

      (2) The authors choose to analyze pupil size in arbitrary eye tracker units. This is fine, although I would recommend assessing and reporting whether the average pupil size (e.g., during the baseline) is roughly comparable between groups. The size of the effects may be difficult to compare between groups in the presence of very different baseline pupil size.

      Please see Author response image 4 for Baseline pupil sizes per group in millimeters. There were no differences between groups.

      Author response image 4.

      F(2, 45) = 0.707, p = 0.499 (one-way ANOVA).

      We now write:

      “Baseline pupil sizes did not differ between groups (F(2, 45) = 0.707, p = 0.499).”

      We agree with the reviewer that millimeters are a more intuitive measure and updated all figures throughout the manuscript and supplementary materials accordingly. We also added a brief note to the signal processing section that this conversion was applied.

      “Pupillary data were transformed from arbitrary EyeLink units to millimeters using a conversion factor obtained with an artificial eye (see Hayes & Petrov, 2016).”

      Hayes, T. R., & Petrov, A. A. (2016). Mapping and correcting the influence of gaze position on pupil size measurements. Behavior research methods, 48(2), 510-527.
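      The conversion described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the tracker was recording pupil diameter (so that a single linear scale factor is adequate; an area recording would call for a square-root step first), and the artificial-pupil diameter and recorded value are made-up numbers.

```python
def units_to_mm_factor(artificial_pupil_mm, recorded_units):
    """Return millimeters per arbitrary unit from one artificial-eye recording.

    artificial_pupil_mm: known diameter of the artificial pupil (hypothetical value).
    recorded_units: tracker output for that pupil in arbitrary units (hypothetical value).
    """
    if recorded_units <= 0:
        raise ValueError("recorded_units must be positive")
    return artificial_pupil_mm / recorded_units


def to_millimeters(samples, factor):
    """Scale a sequence of pupil samples from arbitrary units to millimeters."""
    return [s * factor for s in samples]


# Hypothetical calibration: a 4 mm artificial pupil read out as 2000 units.
factor = units_to_mm_factor(artificial_pupil_mm=4.0, recorded_units=2000.0)
pupil_mm = to_millimeters([1500.0, 2000.0, 2500.0], factor)
```

      Because the factor is a constant, the conversion rescales effect sizes without changing any test statistic computed on the data.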

      (3) If I understand correctly, the main task counted 120 trials overall (12 per digit). It seems, however, that only 3 and 4 participants remained with at least 50 trials (or 25 per median split by lightness) after preprocessing. This appears to be quite a massive data loss: is there a reason behind it? Please also clarify: the overall percentage of discarded trials; whether the median split by lightness was computed on all responses or only on those of the remaining, valid trials.

      This is an important point for clarification indeed. The exclusion of participants in Figure 3 applies only to that particular visualization, not to the statistical analyses. The linear mixed effects models (LMEs) used all available valid trials from all participants, with no participant-level exclusions. The figure-specific threshold (≥25 trials per median-split bin) was applied purely for display clarity, as plotting participants with very few trials per bin would produce unreliable/noisy and thus visually misleading traces (as we note in the figure caption and point readers to Supplementary Figure 1, which shows the same visualization without any exclusions).

      Since the paradigm required participants to repeat discarded trials until 120 valid trials were collected, all participants thus contributed exactly 120 valid trials to the analyses. There was therefore no data loss at the analysis level for the LME that is central to the claims of the manuscript (albeit more complex to grasp than the t-tests between bins).

      Why were there sometimes so few trials per brightness bin?

      First, participants differed in how dark or bright their (synesthetic or forced-report) colors were overall, meaning that differing proportions of reports would fall above or below the 0.5 cutoff, which represented the sample well overall (but not necessarily every single participant). Note that this median split was not performed per individual but across all color reports to allow an apples-to-apples comparison.

      Second, participants often reported colors that differed in Hue and Saturation, but not Lightness. This is in line with synesthetes picking certain colors more often than others, as compared with non-synesthetes (Rouw & Root, 2019; Ward et al., 2025).
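      The two points above explain why a shared cutoff can leave some participants with very few trials in one bin. A small sketch makes this concrete. The participant labels and lightness values are invented, and treating a lightness of exactly 0.5 as "dark" is an assumption; only the 0.5 cutoff and the 25-trials-per-bin display threshold come from the response.

```python
from collections import defaultdict

CUTOFF = 0.5       # shared lightness cutoff applied to all color reports
MIN_PER_BIN = 25   # display-only threshold used for the figure


def bin_counts(trials, cutoff=CUTOFF):
    """Count dark and bright trials per participant.

    trials: iterable of (participant, lightness) with lightness in [0, 1].
    Returns {participant: [n_dark, n_bright]}.
    """
    counts = defaultdict(lambda: [0, 0])
    for participant, lightness in trials:
        counts[participant][1 if lightness > cutoff else 0] += 1
    return dict(counts)


def plottable(counts, min_per_bin=MIN_PER_BIN):
    """Participants with enough trials in both bins to be shown in the figure."""
    return sorted(p for p, (dark, bright) in counts.items()
                  if min(dark, bright) >= min_per_bin)


# Hypothetical data: s01 reports balanced colors, s02 reports mostly bright ones.
trials = ([("s01", 0.2)] * 60 + [("s01", 0.8)] * 60
          + [("s02", 0.3)] * 10 + [("s02", 0.7)] * 110)
counts = bin_counts(trials)
shown = plottable(counts)
```

      Both hypothetical participants contribute 120 trials to a trial-level model, yet only the first would pass the display threshold.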

      We now include a new Supplementary Figure that visualizes responses on the Hue and Saturation dimensions of HSL space for both synesthetes and controls; fully saturated reports appear on the outer edge. We refer to the supplementary figure in the caption of Figure 2 as follows:

      “See Supplementary Figure 1 for color reports on the hue and saturation axes.”

      Rouw, R., & Root, N. B. (2019). Distinct colours in the ‘synaesthetic colour palette’. Philosophical Transactions of the Royal Society B: Biological Sciences, 374(1787).

      Ward, J., Maciel, S., Rouw, R., Simner, J., & Root, N. (2025). Synaesthesia is linked to differences in music preference and musical sophistication and a distinctive pattern of sound-color associations. Psychology of Music, 53(3), 453-473.

      Minor points:

      (1) "Building on this evidence, we hypothesized that the cross modal color phenomenology in synesthesia can, if truly sensory in nature, could likewise be (...)" -> may need rephrasing (can/could).

      Many thanks, fixed.

      (2) Caption of Figure 1: "Block 2 (synesthetes only): a colored disk and gray central patch, matching the average indicated color per digit, and the number and luminance of pixels of said digit were presented to assess externally triggered light responses." -> I find this sentence a bit hard to follow; perhaps consider rephrasing it.

      Agreed, we rephrased to:

      Block 2 (synesthetes only): a colored disk was presented, colored according to the synesthete's average indicated color for that digit. At its center sat a gray patch matching the luminance and pixel area of the original digit from Block 1, together allowing assessment of externally triggered light responses.

      (3) Figure 2 b: Consider truncating the y-axis to 1 if that improves the visualization.

      We adjusted the axis accordingly and added a bit more detail in the caption for the interpretation of the measure.

      (4) Caption of Figure 3 points to "see Supplementary Figure 1", but it should probably be SF2.

      Many thanks for spotting this; all references to supplementary figures have been checked and corrected.

      Elvio Blini

      Reviewer #3 (Recommendations for the authors):

      (1) As a minor comment, there are some terms that felt overused in the manuscript. For example, the words "extraordinary" and "exceptional" were used multiple times throughout. I believe I understand the authors to mean them in their descriptive sense (i.e., outside the realm of typical experience), but in context, those words make it seem like they are touting their own experiment as "exceptional" or "extraordinary," which I don't believe was their intention.

      We agree. We removed words such as exceptional and extraordinary when they do not directly refer to the sensation throughout the manuscript (which is indeed how we intended to use it). We hope that this removes unnecessary and convoluting hyperbole.

      (2) It seemed counterintuitive to me that the color consistency score would be reverse-coded. In this case, the scores actually seem to indicate inconsistency, rather than consistency. Perhaps the raw scores can be inverted for a more intuitive interpretation that aligns with the terminology. I understand that they were following a previous publication in their method (Rothen et al., 2013).

      This manner of coding is counter-intuitive indeed. However, there are both logical and practical reasons for this approach. Importantly, this is indeed the standard way of reporting color consistency in synesthesia research (Carmichael et al., 2015; Eagleman et al., 2007; Root et al., 2025; Rothen et al., 2013). The calculation is based on a simple logic: a higher number reflects a larger distance in color space. An additional advantage is the clear and intuitive zero reference: a score of zero implies choosing the exact same color. Finally, it intuitively reflects the distinction between synesthetes and non-synesthetes; there is by definition little variation across synesthetes (visualized at the bottom of the graph), then a 'cut-off line' (if consistency is used as a diagnostic tool), and then the height of the range shows how large the range in consistency is in that particular sample of non-synesthetes. In a way we therefore inherit a confusing definition/standard, but changing it would lead to new confusion instead. We now specifically clarify this in the caption as follows:

      “Note that higher consistency is reflected in lower color distance, hence lower values [17].”
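      The logic of a distance-based score can be illustrated with a toy calculation. This sketch is not the scoring used by the cited test batteries, which define their own color spaces and distance measures; it assumes RGB triples scaled to [0, 1] and a plain mean pairwise Euclidean distance, purely to show why zero means perfect consistency.

```python
from itertools import combinations
from math import dist


def consistency_score(repeats):
    """Mean pairwise Euclidean distance between repeated color choices.

    repeats: list of RGB triples in [0, 1] chosen for the same grapheme.
    Lower values mean more consistent choices; 0.0 means identical choices.
    """
    pairs = list(combinations(repeats, 2))
    if not pairs:
        raise ValueError("need at least two color choices")
    return sum(dist(a, b) for a, b in pairs) / len(pairs)


# Hypothetical choices across three repetitions of the same digit.
identical = [(0.9, 0.1, 0.1)] * 3
varied = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]

score_identical = consistency_score(identical)
score_varied = consistency_score(varied)
```

      The identical choices score exactly zero and the varied choices score higher, which is the direction the caption note describes.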

      Carmichael, D.A., Down, M.P., Shillcock, R.C., Eagleman, D.M., Simner, J., 2015. Validating a standardised test battery for synesthesia: does the synesthesia battery reliably detect synesthesia? Conscious. Cogn. 33, 375–385

      Eagleman, D.M., Kagan, A.D., Nelson, S.S., Sagaram, D., Sarma, A.K., 2007. A standardized test battery for the study of synesthesia. J. Neurosci. Methods 159 (1), 139–145.

      Root, N., Chkhaidze, A., Melero, H., Sidoroff-Dorso, A., Volberg, G., Zhang, Y., & Rouw, R. (2025). How “diagnostic” criteria interact to shape synesthetic behavior: The role of self-report and test–retest consistency in synesthesia research. Consciousness and Cognition, 129, 103819.

      Rothen, N., Seth, A.K., Witzel, C., Ward, J., 2013. Diagnosing synaesthesia with online colour pickers: maximising sensitivity and specificity. J. Neurosci. Methods 215 (1), 156–160.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      This work compiles a comprehensive atlas of ncORFs across mammalian tissues and cell types, derived from reanalysis of ~400 public ribosome profiling datasets. The authors then evaluate cross-species conservation and functional signatures, proposing that evolutionarily ancient ncORFs tend to have higher translation potential, stronger expression, and closer relationships with canonical coding sequences.

      Strengths:

      In general, the study provides a large-scale and timely resource of annotated ncORFs, which could be broadly useful for the community. The authors collected ~400 public ribosome profiling datasets for annotations of ncORFs, which, to my best knowledge, is the largest collection of data for such a purpose. The catalog could facilitate future investigations into ncORF biology and broaden understanding of the coding potential of the "non-coding" genome.

      We thank the reviewer for the positive evaluation of our manuscript and for recognizing the significance of our contribution.

      Weaknesses:

      Based on the ncORF catalog, some of the analyses were not properly done. Some of the results are descriptive.

      (1) Bias and representations of the data source. Public ribo-seq datasets are unevenly distributed across tissues and cell lines, raising concerns about heterogeneity and underrepresentation of certain contexts. This may limit the generalizability of the catalog.

      We agree with the reviewer that the uneven distribution of public Ribo-seq datasets across tissues can inevitably introduce bias in the ncORF composition of our catalog. This bias is likely more pronounced in humans due to the narrower tissue coverage. We have addressed this point in the Discussion section of the revised manuscript.

      (2) The discussion on modular domains of ncORFs is unclear, and the claim that they may originate via TE-related mechanisms is not well supported. Stronger evidence or clearer reasoning is needed.

      We thank the reviewer for highlighting this point. We have revised the manuscript to more clearly explain the rationale behind our analysis of ncORF modular domains and have adopted more cautious language regarding their potential transposable element–related origins, limiting interpretations to what is directly supported by the data.

      (3) The conservation comparisons are not fully convincing. Figure S7 shows only mild differences between ncORFs and CDS, and statistical significance is not clearly demonstrated.

      Comparisons with other non-coding RNAs should be added, and overlapping sequences between ncORFs and CDS should be excluded to avoid bias.

      We thank the reviewer for this comment and apologize for the lack of clarity in the original figure. Both CDSs and ncORFs show significant deviation from zero Gnocchi scores (two-sided Wilcoxon signed-rank tests), which is now stated explicitly in the revised legend and text. CDS-overlapping ncORFs were already excluded in the original analysis; this has been clarified to avoid confusion.

      As suggested, we have added lncRNAs for comparison. ncORFs display modestly higher Gnocchi scores than lncRNAs, and this difference persists when restricting the analysis to lncRNA-derived ncORFs and their corresponding full-length lncRNAs (see revised Fig. S7). These additions strengthen the conservation comparison while controlling for transcript context.

      (4) Figure 3 indicates that some ncORFs are subject to evolutionary constraints. This is not surprising. The authors should provide further analyses on more detailed features of these "conserved" ncORFs vs. the "non-conserved" ones. Some pretty informative works have been done in Drosophila, worms, mice, and humans. Figure 3 suggests some ncORFs are under evolutionary constraint, but this is not unexpected. More granular analyses contrasting "conserved" versus "non-conserved" ncORFs would be informative. In fact, small ORFs, especially uORFs, have been extensively studied for their functions and cross-species conservation. The authors should explicitly show what is new here in their analyses.

      We thank the reviewer for this insightful comment. We agree that cross-species conservation of ncORFs (particularly uORFs) has been extensively investigated in prior studies, including our own.

      However, most prior analyses have focused on conservation of start codons or overall ORF integrity, which does not distinguish selection acting on translational activity from selection acting on the encoded peptide sequence itself. In contrast, our analysis leverages codon-level periodic PhyloP signals across the full ORF. The observed three-nucleotide periodicity is consistent with selective constraint at the amino acid level, rather than merely preservation of initiation sites or translational potential. Furthermore, our newly developed branch-length statistic uncovers lineage-restricted conservation patterns among ncORFs, enabling resolution of evolutionary dynamics not captured by conventional conservation metrics.

      Thus, while the existence of conserved ncORFs is not unexpected, the conceptual advance of our study lies in demonstrating that a subset exhibits coding-like evolutionary constraint consistent with selection on their peptide products, as well as revealing lineage-specific conservation patterns. We have clarified this distinction in the revised Discussion.
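      The idea of a three-nucleotide periodicity in per-base conservation can be sketched as below. This is an illustration only and not the statistic used in the study: it assumes a list of per-nucleotide scores that starts at the first base of the ORF, and the score values are invented. The expectation that third codon positions are less constrained under selection on the peptide follows from the redundancy of the genetic code.

```python
def mean_by_codon_position(scores):
    """Average a per-nucleotide conservation track by codon position.

    scores: per-base scores starting at the first nucleotide of the ORF;
            the length must be a positive multiple of 3.
    Returns [mean at position 1, mean at position 2, mean at position 3].
    """
    if not scores or len(scores) % 3 != 0:
        raise ValueError("scores must cover a whole number of codons")
    return [sum(scores[k::3]) / len(scores[k::3]) for k in range(3)]


# Hypothetical scores for three codons: positions 1 and 2 high, position 3 low.
track = [2.0, 2.5, 0.5,
         1.8, 2.2, 0.1,
         2.1, 2.4, 0.3]
pos1, pos2, pos3 = mean_by_codon_position(track)
```

      A track with no such pattern would give similar means at all three positions, which is what conservation of a start codon or of translation alone would be expected to produce across the ORF body.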

      (5) Translation levels are reported using RPF counts. However, translation efficiency (normalized by RNA expression) is a more appropriate measure to account for expression heterogeneity.

      We agree that translation efficiency (TE), which normalizes ribosome footprint counts by RNA abundance, is in principle an appropriate metric. We initially calculated TE and compared ncORFs with CDSs. However, we found that TE estimates for short ncORFs were substantially inflated by RPF enrichment near start and stop codons, leading to unstable and potentially misleading values.

      For CDSs, this bias is commonly addressed by excluding the first and last 10 to 20 codons when quantifying RPF density. This strategy is not feasible for ncORFs because of their short length. We therefore used RPF counts in the final analysis, applying stringent positional filtering. Only RPFs whose P sites fall within the ORF body, excluding start and stop codons, were counted. RPFs overlapping the ORF but with P sites outside the annotated frame, likely derived from adjacent ORFs or initiation or termination pausing, were excluded.

      TE and RPF counts both measure translation but capture different aspects. TE reflects ribosome density relative to transcript abundance, whereas RPF counts quantify overall ribosome engagement. Given the short lengths of ncORFs, count-based quantification provides a more robust and conservative estimate of their translational activity.
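      The positional filter described above can be sketched in a few lines. This is a simplified illustration under stated assumptions: a single-exon ORF on the plus strand, 0-based coordinates with an exclusive end, and P-site positions already assigned to each footprint. It also reads "outside the annotated frame" as "outside the ORF interval", which is an interpretation. The function name and all coordinates are hypothetical.

```python
def count_body_rpfs(p_sites, orf_start, orf_end):
    """Count footprints whose P site falls in the ORF body.

    p_sites: iterable of 0-based P-site positions, one per footprint.
    orf_start: 0-based position of the first nucleotide of the start codon.
    orf_end: 0-based exclusive end, one past the last nucleotide of the stop codon.
    The start and stop codons themselves are excluded from the body.
    """
    body_start = orf_start + 3   # skip the start codon
    body_end = orf_end - 3       # skip the stop codon
    if body_end <= body_start:
        raise ValueError("ORF has no body between start and stop codons")
    return sum(1 for p in p_sites if body_start <= p < body_end)


# Hypothetical 10-codon ORF at positions 100..129 (start codon 100-102, stop codon 127-129).
p_sites = [98, 100, 103, 110, 126, 127, 129, 135]
n_body = count_body_rpfs(p_sites, orf_start=100, orf_end=130)
```

      In this example the footprints at 100, 127 and 129 sit on the start or stop codon and the ones at 98 and 135 lie outside the ORF, so only three of the eight are counted.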

      (6) The correlation analyses between ncORF translation levels and PhyloCSF are confusing and largely descriptive. These sections need sharper framing and clearer conclusions.

      We thank the reviewer for this comment. We agree that the original presentation lacked clear framing. The relationship between PhyloCSF scores and mean ncORF translation levels across tissues is influenced by both evolutionary age and tissue specificity. Older ncORFs with higher coding potential tend to exhibit stronger tissue-restricted expression. As a result, their mean translation levels across all tissues appear lower, not because they are weakly translated, but because their translation is concentrated in specific tissues. This point is addressed in the revised manuscript.
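      The reasoning above, that a tissue-restricted ORF can have a low mean across tissues despite strong translation where it is active, can be shown with a toy example. The translation levels are invented, and the tau index used here is one common tissue-specificity measure chosen for illustration; the response does not state which measure the authors used.

```python
def mean_level(levels):
    """Mean translation level across tissues."""
    return sum(levels) / len(levels)


def tau(levels):
    """Tissue-specificity index tau: 0 for uniform levels, 1 for a single-tissue signal."""
    n = len(levels)
    top = max(levels)
    if n < 2 or top <= 0:
        raise ValueError("need at least two tissues and a positive maximum")
    return sum(1 - x / top for x in levels) / (n - 1)


# Hypothetical levels in four tissues.
broad = [10.0, 10.0, 10.0, 10.0]     # translated everywhere at a moderate level
restricted = [30.0, 0.0, 0.0, 0.0]   # strongly translated in one tissue only

broad_mean, broad_tau = mean_level(broad), tau(broad)
restricted_mean, restricted_tau = mean_level(restricted), tau(restricted)
```

      Here the restricted ORF has the higher peak level but the lower mean, so comparing means alone would understate its translation in the tissue where it is active.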

      (7) Public ribo-seq datasets, generated by different research labs, are known for their strong batch effects. Representations of tissues and cells are also very unbalanced. Therefore, the co-translation analysis between ncORFs and canonical CDS is not well controlled. This should be done by referring to a recent large-scale ribo-seq meta-analysis (Nat Biotechnol. 2025. doi: 10.1038/s41587-025-02718-5).

      We thank the reviewer for highlighting this important study and for raising concerns regarding batch effects and tissue imbalance in public Ribo-seq datasets. We are aware that public Ribo-seq data generated by different laboratories are subject to substantial batch effects. During the ncORF annotation phase, we applied stringent quality-control criteria to minimize technical variability. For the co-translation analysis, inclusion criteria were relaxed to increase tissue and cell-type coverage. To partially mitigate representation bias, libraries derived from the same tissue or cell type were merged when quantifying ORF translation levels, thereby reducing overrepresentation from heavily sampled contexts.

      Nevertheless, we acknowledge that these measures cannot completely eliminate batch effects or imbalance inherent to public datasets. We agree that co-translation analysis would benefit from uniformly processed, high-quality datasets generated under standardized protocols with balanced tissue representation, representing a valuable direction for future research.

      Reviewer #2 (Public review):

      Summary:

      Chang et al. attempted to analyze a large number of ribo-seq datasets through a standardized pipeline, identifying novel non-canonical ORFs and elucidating their evolutionary and expression characteristics.

      Strengths:

      (1) The datasets analyzed by the authors are sufficiently comprehensive, and the use of standardized pipelines ensures excellent analytical consistency.

      (2) Their analyses of ORF evolution and co-expression further deepen our understanding of these ORFs.

      We thank the reviewer for the positive evaluation of our manuscript. It is encouraging to know that the analytical framework was found to be sound and appropriate.

      Weaknesses:

      (1) The authors primarily conducted analyses through bioinformatics, lacking sufficient wet-lab experimental evidence.

      We thank the reviewer for this comment and acknowledge this limitation. We agree that functional validation through wet-lab experiments would provide important mechanistic insight into individual ncORFs. However, this study was designed as a systematic, genome-wide computational analysis to characterize translated ncORFs across species and tissues. Our objective was to define global patterns of translation, conservation, and structural features using large-scale datasets. Given the breadth and scale of these analyses, experimental validation of specific ncORFs falls beyond the scope of the current study. We have clarified this point in the Discussion and noted that our results provide a framework for future targeted experimental investigation.

      (2) Regarding the evolution of non-canonical ORFs, a considerable amount of prior work already exists. The authors need to further clarify what new insights and discoveries they have made based on the analysis of such a large dataset.

      We thank the reviewer for this suggestion. Similar concerns were also raised by Reviewer #1. In response, we have revised the Discussion to more clearly delineate the conceptual advances enabled by our large-scale dataset.

      Recommendations for the authors:

      Reviewing Editor Comments:

      Several aspects of the downstream analyses would benefit from additional refinement. The heterogeneity and tissue imbalance inherent in public Ribo-seq datasets introduce potential biases in ncORF detection and inferences about co-translation. Given the breadth of the dataset, it would also be informative to quantify how consistently the newly identified ncORFs are detected across samples, distinguishing those observed broadly across tissues, those enriched in specific contexts, and those detected in only a few datasets. Such stratification would help differentiate reproducibly translated ORFs from candidates requiring further validation.

      We thank the editor for the helpful comments. We agree that heterogeneity and tissue imbalance in public Ribo-seq datasets can influence ncORF detection and downstream interpretations. We have added discussion of this limitation in the revised manuscript.

      Detection of ncORF translation depends not only on biological activity but also on sequencing depth and data quality. Although all ncORFs reported here were reproducibly identified by multiple methods across independent libraries, we agree that those detected in a larger number of datasets represent stronger candidates for functional validation. Accordingly, we now report the number of methods and libraries in which each ncORF was detected in the final catalog (Supplementary Table 3). Overall, 22.3–26.3% of ncORFs were detected in more than 10 libraries, whereas more than half were observed in only two to five libraries (Fig. S1B), enabling clearer stratification of broadly translated versus more context-specific candidates.
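      As an illustration only, the stratification by detection breadth described above could be sketched as follows. This is not the authors' code: the function names are hypothetical, and only the "more than 10" and "two to five" ranges come from the response; the intermediate 6-10 stratum is an assumption added to make the bins exhaustive.

```python
from collections import Counter


def detection_stratum(n_libraries):
    """Assign an ncORF to a stratum by the number of libraries detecting it.

    Only ">10" and "2-5" are quoted in the response; "6-10" is an assumed
    intermediate bin, and "<2" is a catch-all for unexpected input.
    """
    if n_libraries > 10:
        return ">10 libraries"
    if n_libraries >= 6:
        return "6-10 libraries"
    if n_libraries >= 2:
        return "2-5 libraries"
    return "<2 libraries"


def stratum_fractions(library_counts):
    """Return the fraction of ncORFs falling into each stratum."""
    if not library_counts:
        raise ValueError("library_counts must not be empty")
    counts = Counter(detection_stratum(n) for n in library_counts)
    total = len(library_counts)
    return {stratum: count / total for stratum, count in counts.items()}


# Toy input: number of supporting libraries for eight hypothetical ncORFs.
example_counts = [2, 3, 5, 7, 12, 20, 4, 2]
fractions = stratum_fractions(example_counts)
```

      Applied to the per-ncORF library counts of a catalog, such a summary gives the kind of breakdown reported in Fig. S1B.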

      Some evolutionary and functional interpretations are largely descriptive or consistent with established findings for small ORFs, and the authors should more clearly articulate what is novel in their analyses. The criteria separating "young," "old," and "ancient" ORFs require clearer definition, and conservation analyses would be strengthened by improved statistical rigor and explicit exclusion of regions overlapping annotated coding sequences. Evidence for modular domain features or transposable element-related origins is limited and warrants either stronger support or more cautious framing. Proteomics validation is currently minimal and could be substantially reinforced using existing public MS resources.

      We thank the reviewer for these constructive comments. In the revised manuscript, we more clearly delineate the novel insights derived from our evolutionary analyses of ncORFs, distinguishing them from established findings on small ORFs.

      We have clarified the criteria used to classify ORFs by evolutionary age in Fig. 6E and refined the terminology describing “young,” “old,” and “ancient” categories to ensure precise definition. The conservation analyses have been strengthened through more rigorous statistical treatment and by explicitly excluding regions overlapping annotated coding sequences.

      With respect to modular domain features and potential transposable element–related origins, we have adopted more cautious language and limited our interpretations to what is directly supported by the data. Finally, we acknowledge that current proteomic validation remains limited and have clarified this point in the manuscript, while outlining in the Discussion the potential for future integration of large-scale public mass spectrometry datasets.

      The authors additionally report an interesting observation that many ncORFs on mRNA co-translate with the main CDS of the same gene. Because canonical models often posit that uORF translation suppresses downstream CDS translation, further analysis would be valuable. In particular, it would be useful to determine whether patterns of co-translation differ among ORF types or evolutionary categories and to discuss possible regulatory mechanisms underlying these relationships.

      We thank the editor for this thoughtful comment. As noted in our response to Reviewer #2, uORF–CDS co-translation does not contradict the canonical model in which uORFs repress downstream CDS translation. Co-translation reflects concurrent ribosome occupancy, whereas repression concerns the fraction of initiating ribosomes that ultimately reach and translate the CDS. Following the editor’s suggestion, we further examined whether co-translation patterns differ across ORF types or evolutionary categories. We found that ncORFs co-translating with their corresponding main CDSs are predominantly uORFs. However, these uORFs do not show statistically significant differences in conservation metrics or evolutionary age compared with other non-overlapping uORFs. Thus, we did not detect clear subtype- or age-specific distinctions among co-translating ncORFs. We have clarified these analyses in the revised manuscript.

      Addressing these points would enhance the precision, interpretability, and robustness of the study's conclusions.

      Reviewer #2 (Recommendations for the authors):

      (1) The authors developed and refined a standardized pipeline to analyze nearly 400 ribo-seq datasets, identifying over 10,000 novel non-canonical ORFs in both human and mouse samples. Given the scale of this analysis, it is intriguing to consider how many of the newly identified non-canonical ORFs are consistently detected across multiple sample types (conservatively expressed ORFs), how many are restricted to specific tissues (tissue-specific ORFs), and how many were detected in only a single or very few samples (ORFs requiring further validation). Providing these data could offer new insights into understanding ORF translation.

      Thanks for this constructive suggestion. This information has been presented in the revised Supplementary Table 3 and in a newly added supplementary figure (Fig. S1B), which together provide a clearer overview of ncORF detection consistency and context specificity.

      (2) The authors' validation of MS data lacks specific details in the paper. Regarding the MS-supported ORF mentioned in Line 117, which dataset's MS data is being referenced? Or does it refer to the content in Reference 20? At present, substantial research exists in both public general proteomics studies (e.g., CPTAC) and MS investigations targeting non-canonical ORFs. We recommend the authors incorporate additional MS data or public MS-based databases to strengthen validation in this area (PMID: 34129944, 39794466, 37823596, 39413795).

      We thank the reviewer for this comment and for the helpful suggestions. The MS-supported ORFs mentioned in line 117 refer to the compilation reported in Reference 20, which integrates evidence from multiple independent proteomics studies. In addition, we examined MS-supported ORFs curated by GENCODE and PeptideAtlas, which are shown in Fig. 1E.

      We agree that incorporating additional MS datasets would further strengthen validation of ncORFs. Studies cited by the reviewer and recent community efforts such as the GENCODE and PeptideAtlas analyses (PMID: 39314370) provide valuable examples in this direction. However, performing a comprehensive reanalysis of more than 95,000 public human MS runs is computationally demanding and currently infeasible for our group given resource and funding constraints.

      To our knowledge, ongoing community-wide initiatives are working toward more comprehensive catalogs of translated human ncORFs. Large-scale, exhaustive MS searches will be particularly effective once a community consensus annotation framework for ncORFs is established. We have added discussion of these limitations and future directions in the revised manuscript.

      (3) The authors classified ncORFs into three groups ("Ancient," "Young," and "Old") based on their origin nodes. However, both the "Young" and "Old" groups appear to be "mammalian-specific," yet the specific criteria for their division remain unclear. It is recommended to more clearly define in the figure legend or main text how "Young" and "Old" are categorized (e.g., based on specific evolutionary nodes or distance thresholds from nodes to the end) to avoid reader confusion.

      In Fig. 5, “old” and “young” were intended as qualitative descriptors of relative evolutionary age based on the position of ncORF origination nodes along the phylogeny, as indicated on the x-axis. They were not meant to represent discrete categories. To avoid confusion, we have revised the manuscript to use “older” and “younger” throughout when referring to relative age differences. A binary classification is used only in Fig. 6E, where ncORFs are grouped into ancient (pre-mammalian) and younger (mammalian-specific) categories. This distinction is clearly defined in both the main text and the corresponding figure legend.

      (4) The authors observed an intriguing phenomenon: ncORFs on mRNA tend to co-translate with the main CDS of the same gene. However, the conventional view holds that uORF translation often inhibits the translation of the main CDS. I suggest the authors could refine their analysis in this section further. For instance, do different types of ORFs or ORFs at different evolutionary levels exhibit distinct levels of co-translation with the main CDS? Additionally, while observing this phenomenon, the authors should also propose hypotheses regarding the regulatory mechanisms involved in these processes.

      We thank the reviewer for these constructive suggestions. After excluding CDS-overlapping ORFs, we identified 258 human and 128 mouse ncORFs that co-translate with their corresponding main CDSs. With the exception of 10 human dORFs, all remaining cases were uORFs. We compared these co-translating ncORFs with other non-overlapping uORFs and dORFs but did not detect statistically significant differences in evolutionary age or conservation metrics. Because no clear distinguishing features emerged, we did not include these results in the manuscript.

      Importantly, the observation of uORF–CDS co-translation does not contradict the established repressive role of uORFs. Co-translation reflects concurrent ribosome occupancy, whereas repression concerns the proportion of initiating ribosomes that ultimately translate the CDS. For example, if two ribosomes initiate within a given interval and one translates the uORF while one translates the CDS, CDS output is reduced by 50% relative to a uORF-free transcript. If four ribosomes initiate under the same repressive regime, two may translate the uORF and two the CDS. In this case, absolute translation of both ORFs increases, while the fractional repression remains unchanged. Thus, co-translation is compatible with a regulatory model in which uORFs reduce CDS translation efficiency without abolishing it. This has been clarified in the revised manuscript.
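      The arithmetic in the example above can be sketched in a few lines. This is a toy model for illustration only, assuming (as the example does) that a fixed fraction of initiating ribosomes translates the uORF and the rest translate the CDS; the function names and the `uorf_fraction` parameter are hypothetical and do not come from the manuscript.

```python
def translation_split(initiating_ribosomes, uorf_fraction):
    """Split initiating ribosomes between the uORF and the main CDS.

    Assumes a constant fraction of ribosomes is captured by the uORF and
    that every remaining ribosome translates the CDS (toy model).
    """
    if initiating_ribosomes <= 0:
        raise ValueError("initiating_ribosomes must be positive")
    if not 0.0 <= uorf_fraction <= 1.0:
        raise ValueError("uorf_fraction must lie in [0, 1]")
    uorf = initiating_ribosomes * uorf_fraction
    cds = initiating_ribosomes - uorf
    return uorf, cds


def fractional_repression(initiating_ribosomes, uorf_fraction):
    """CDS output lost relative to a uORF-free transcript, where all
    initiating ribosomes would translate the CDS."""
    _, cds = translation_split(initiating_ribosomes, uorf_fraction)
    return 1.0 - cds / initiating_ribosomes


# Two initiating ribosomes versus four, same repressive regime (50%).
low = translation_split(2, 0.5)    # one ribosome on the uORF, one on the CDS
high = translation_split(4, 0.5)   # two ribosomes on the uORF, two on the CDS
```

      Doubling initiation doubles the absolute translation of both ORFs while `fractional_repression` stays at 0.5. That is the sense in which positive co-translation and repression can coexist.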

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) All outcomes are attributed specifically to L6b neurons, but the genetic manipulation is not specific to L6b neurons. The authors acknowledge this as a limitation, but in my view, this global manipulation is more than a limitation - it affects the overall interpretations of the data. The Hoerder-Suabedissen et al., 2018 paper shows sparse, but also dense, expression of Drd1a+ neurons in brain regions outside of the L6b. Given this issue, the results are largely overstated throughout the paper.

      We appreciate the reviewer’s careful reading and concern that some of our statements may have overstated the implications of our data. The Drd1a Cre mouse model used (FK164) has a relatively selective expression of Drd1a Cre in cortex, but indeed some expression is seen subcortically. This is an acknowledged limitation which is now explicitly addressed in the revised manuscript.

      (2) It is not clear to me that the "silencing" of Drd1a+ neurons was verified.

      In our previous publications, we showed confirmation of the loss of regulated synaptic vesicle release from the Cre-positive neuronal population (Marques-Smith et al., 2016; Hoerder-Suabedissen et al., 2018; Messore et al., 2024). This has now been described in the revised manuscript.

      (3) There were various discrepancies (and potentially misattributions) between the stated significant differences in Supplementary Table T1 data and Figure 3a & S2 spectral plots. This issue makes it difficult to effectively evaluate the main text and stated outcomes.

      We thank the reviewer for their careful attention to the statistical analyses and for noting the inconsistencies in how the results of the spectral analysis were presented: in the text we described two-way ANOVAs with corresponding posthoc tests, but in the figures significance markers were positioned based on multiple t-tests. We have now carefully revised the spectral results and implemented a consistent approach in statistical reporting and spectral plots. We have updated Supplementary Table T1, Figure 3a and S2 to ensure that all statistics are presented consistently throughout the manuscript, i.e. with two-way ANOVAs and accompanying posthoc tests. Please note that we performed all spectral analyses in the range between 0.5 and 128 Hz (excluding the range between 49-51.5 Hz due to electrical noise from the power grid) but only plot the range between 0.5-30 Hz as the spectral bands most relevant for sleep neurophysiology are contained in this range.
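      For concreteness, the frequency ranges described above can be written as a small bin-selection sketch. It is an illustration under stated assumptions, not the authors' analysis code: the 0.25 Hz bin width is taken from the marker description elsewhere in this response, while the alignment of bins to 0.5 Hz and the inclusive handling of the 49-51.5 Hz boundaries are assumptions.

```python
def analysis_bins(bin_width=0.25, f_min=0.5, f_max=128.0, notch=(49.0, 51.5)):
    """Frequency bins used for statistics: f_min..f_max minus the mains-noise notch.

    Assumes bins start at f_min and that both notch edges are excluded.
    """
    n_bins = int(round((f_max - f_min) / bin_width)) + 1
    freqs = [f_min + k * bin_width for k in range(n_bins)]
    return [f for f in freqs if not (notch[0] <= f <= notch[1])]


def plotted_bins(bins, plot_max=30.0):
    """Subset of analysed bins that is actually shown in the spectral plots."""
    return [f for f in bins if f <= plot_max]


bins = analysis_bins()
shown = plotted_bins(bins)
```

      Keeping the analysed and plotted ranges as two separate selections makes it explicit that the statistics cover more bins than the figures display.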

      Related, the authors stated that post hoc comparisons of EEG spectral frequency bins were not corrected for multiple testing. Instead, significance was only denoted if changes in at least two consecutive frequency bins were significant. However, there are multiple plots in which a single significance marker is placed over an isolated bin (i.e., 4c, 6, S5, S6). Unless each marker is equivalent to 2 consecutive frequency bins, these markers should be removed from the plots. Otherwise, please define the frequency and size of these markers in the main text.

      In line with the previous comment, we have adjusted markers to reflect the results from posthoc tests after two-way ANOVAs. Please note that Figure 6 and the related Supplementary Figures S5 and S6 have now been removed from the manuscript, as careful re-analysis indicated that the sample size was too low to support a strong conclusion regarding the comparison of orexin effects between genotypes. We stated in the text that we would only include posthoc significance when at least two consecutive bins were significant, but this was indeed not supported in our figure, where each marker reflects one 0.25 Hz bin. We have now adjusted our code to ensure that markers are plotted only when at least two consecutive bins are significant in bin-wise posthoc comparisons.
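      The consecutive-bin rule described above can be expressed as a simple filter over a per-bin significance mask. The following is a minimal sketch of that rule, not the authors' plotting code; the function name and the `min_run` parameter are hypothetical.

```python
def keep_consecutive_runs(significant, min_run=2):
    """Keep only significant bins that belong to a run of at least min_run bins.

    `significant` is a sequence of booleans, one per frequency bin, in
    ascending frequency order. Isolated significant bins are dropped.
    """
    kept = [False] * len(significant)
    run_start = None
    # A trailing False sentinel closes any run that reaches the last bin.
    for i, flag in enumerate(list(significant) + [False]):
        if flag and run_start is None:
            run_start = i
        elif not flag and run_start is not None:
            if i - run_start >= min_run:
                for j in range(run_start, i):
                    kept[j] = True
            run_start = None
    return kept


# Bins 0 and 5 are isolated and dropped; bins 2-3 form a run and are kept.
mask = keep_consecutive_runs([True, False, True, True, False, True])
```

      Markers would then be drawn only where the filtered mask is true, so a single marker can no longer appear over an isolated bin.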

      (4) A rainbow color scale, as in Figure 3, we've now learned, can be misleading and difficult to interpret. The viridis color scale or a different diverging color scale are good alternatives.

      Thank you for pointing this out, we have adjusted the colour scale.

      (5) How much time elapsed between vehicle/orexin A & B infusions?

      There were 2-4 non-infusion days between infusions. We have added this information to the Methods.

      (6) For Figure 6, there are statistical discrepancies between the main text and the plots (pg. 10):

      (a) The text claims post hoc differences for relative ORXA frontal EEG, but there are no significance markers on the plot.

      (b) The text states that there were no post hoc differences for the relative ORXA occipital EEG, but significance markers are on the plot.

      (c) The main test for the relative ORXB frontal EEG was not significant, but there are post hoc significance markers on the plot.

      (d) For relative ORXB occipital EEG, there are significant markers on the plot outside of the stated range in the text.

      We agree with the reviewer, and we decided to exclude this figure from the manuscript as the sample size for some key comparisons was too low to support any strong conclusions and therefore presenting this analysis is potentially misleading. We explain the rationale for excluding this analysis in the revised manuscript.

      (7) Some important details are only available in figure captions, making it difficult to understand the main text. For example, when describing Figure 3c in the main text on page 7, it is not clear what type of transitions are being discussed without reading the figure caption. Likewise, a "decrease," "shift," and "change" are mentioned, but relative to what? Similar comment for the EEG theta activity description on pages 7 - 8. Please add relevant details to the main text.

      We have adjusted the wording in the main text to reflect more precisely which comparisons are shown in the figures.

      (8) Statistical comparisons for data in Figure 3e, post hoc analyses for data in Figure S7a-b REM data, and post hoc analyses for Figure S7c (not b) occipital EEG should be included to support differences claims. Please denote these differences on the respective plots.

      Please note that the previously named Supplementary Figures S5 and S6 have been removed from the manuscript, and that the Supplementary Figure S7 in this comment refers to the figure currently named Supplementary Figure S5.

      We have added the statistical comparisons for Figure 3e, Supplementary Figure S5a and Figure S5b to the results section. In Figure S5c, there was an overall genotype difference, but there was no significant time x genotype interaction, so we have not performed posthoc tests and did not plot posthoc significance markers for this figure. We have adjusted the wording in the results section to make this clearer. We have also corrected the reference to Figure S5c, which was incorrect; thank you for your careful attention.

      (9) In the subsection titled "Layer 6b mediates effects of orexin on vigilance states (pg. 8)," there does not seem to be any stated differences between control and L6b silenced mice. A more accurate subtitle is needed.

      We agree with the reviewer and the title of this sub-section has now been changed accordingly.

      Reviewer #2 (Public review):

      Weaknesses:

      (1) Although the authors used a highly selective approach to silence layer 6b neurons, the observed changes in EEG oscillations cannot be solely attributed to layer 6b neurons because of the ICV route for orexin administration.

      We thank the reviewer for this important comment. The ICV route of orexin administration cannot guarantee that only cortical Drd1a-Cre–expressing neurons are reached by orexin, and the Drd1a-Cre driver line is highly selective but not entirely specific for layer 6b neurons (see also response to reviewer #1, comment 1). We have therefore changed the wording of the stated effects and addressed this consideration in the Limitations section of the manuscript. Please note that, as mentioned above, Figure 6 has now been excluded from the manuscript.

      (2) The rationale for using only male rats is not provided.

      We thank the reviewer for highlighting this omission. We now provide the rationale for using only male mice in the methods section as follows: “In the current study, only male mice were used, because our experimental protocol precluded the possibility of accurately monitoring the oestrous cycle, which has marked effects on brain activity, arousal and vigilance states. We therefore decided to use male mice only for the current study but are planning to use both sexes in future work.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Better descriptions of L6b connectivity will improve clarity in the second paragraph of the Introduction (pg. 3). For example, it is not explicitly stated that L6b projects to L5 before the authors describe L5. Therefore, the L5 description seems irrelevant.

      We thank the reviewer for this request for clarification. We mention the connectivity between L6b and L5 because L5 pyramidal neurons have recently been found to play a key role in sleep-wake regulation (Krone et al., Nat. Neurosci. 2021; Honjo et al., 2025; Wasilczuk et al., 2025; Krone et al., 2025). We have now amended the corresponding section of the introduction to emphasise the potential functional relevance of this connection as follows:

      “L5, the major output layer of the cortex, is also bidirectionally communicative with higher order thalamic nuclei (Hoerder-Suabedissen et al., 2018) as well as layer 5 pyramidal neurons (Zolnik et al., 2024). Since several subtypes of L5 pyramidal neurons have recently been shown to play important roles in distinct aspects of sleep-wake regulation (Krone et al., 2021, 2025; Hong et al. 2023; Wasilczuk et al. 2025; Honjo et al., 2025; Chouafeev et al., 2025); depth of anaesthesia (Wasilczuk et al. 2025), and the influence of stress on sleep (Chouafeev et al. 2025) the projections of orexin-sensitive L6b to L5 pyramidal neurons may be a key circuitry in the top-down regulation of brain states.”

      (2) There are plots where the y-axis tick label appears to be offset from the tick mark (4a, S5b, S6a).

      Thank you for spotting this graphical issue. We have removed the y-axis tick labels from Figure 4a to avoid confusion. Please note that we decided to remove Figure S5 and Figure S6, because after careful re-analysis we concluded that the group size was too small to draw conclusions on orexin spectra and that any results could be potentially misleading.

      (3) The 2-h time constant, I believe, is depicted in Figure 4H (not 4G).

      Thank you for spotting this. We have corrected the figure legends accordingly and double-checked that Figure 4G depicts the 2-h time constant and Figure 4H the 6-h time constant.

      (4) "...although there was an indication of a higher absolute theta-peak power in layer 6b silenced mice (Figure S6)," pg. 10. It is not clear to me how the data lead to this conclusion.

      Thank you for identifying this inconsistency, which resulted from a preliminary statistical analysis subsequently corrected. We have now improved the statistical analysis of spectral data (for more details see comments to both reviewers in public response) and removed this statement, which in fact is no longer supported by the data.

      (5) Exclusion of female mice is not listed as a limitation.

      We now discuss this limitation as follows:

      “In the current study, only male mice were used, because our experimental protocol precluded the possibility of accurately monitoring the oestrous cycle, which has marked effects on brain activity, arousal and vigilance states. We therefore decided to use male mice only for the current study but are planning to use both sexes in future work.”

      (6) A brief description of why Cplx3 and Tbr1 antibodies are being used will be helpful to include in the Methods (pg. 21) in addition to what is in the figure caption.

      We have added the following information to the methods section to clarify why we used these two antibodies: “rabbit α-Cplx3 to distinguish between L6a and L6b” “mouse α-Tbr1 to identify the L5-6 boundary”

      (7) Including a label/title for the Figure 2c spectral plots will be helpful. It is not immediately clear if these are light period & dark period data or frontal & occipital data.

      Thank you for pointing this out. We have updated the figure legend to clarify what is shown in this figure.

      Similar comments for S2 and S3a plots. Including a state label on the plots will be helpful in addition to the caption description.

      We have now added the state labels for Figure panels S2 and S3a for improved clarity.

      Reviewer #2 (Recommendations for the authors):

      This is a soundly conducted and well-written study that enhances our understanding of the cortical control of states of consciousness. I do not have any major concerns, but would like the authors to consider some alternate possibilities as suggested in my comments below:

      We thank the reviewer for this positive assessment of our manuscript and the helpful suggestions.

      (1) Given that the inactivation of layer 6b neurons did not affect the time spent in sleep-wake states, to me it appears that these neurons likely have a role in creating the background neural conditions/oscillations supportive of an activated state rather than a direct role in behavioral state control.

      We completely agree with the reviewer and have made the wording more consistent throughout the manuscript, now using “brain state control” rather than “behavioural state control” to clarify that the main effect observed in the L6b-silenced mouse model is a change in spectral characteristics reflecting brain oscillations, rather than effects on vigilance states, which were modest.

      (2) Does the observed shift in REM sleep-related theta-peak frequency in the occipital derivation suggest changes in local neural processes, or could it be just a matter of better signal detection because theta is most prominent at or around the hippocampal region, which is approximately the location of occipital electrodes in this study.

      The source of the shift in REM sleep–related theta peak frequency in the occipital derivation cannot be established with EEG recordings alone. Additional intracortical or intrahippocampal recordings would be necessary to distinguish between the two possible explanations proposed by the reviewer. We have discussed this further in the revised manuscript.

      (3) The orexinergic system innervates multiple subcortical sites and widely covers the cortex too, because of which the effect of ICV orexins cannot be attributed to just layer 6b neurons as described in the manuscript ("Layer 6b mediates effects of orexin on brain activity.").

      We agree with the reviewer that this is a limitation. We have now adjusted the subtitle of the paragraph describing the results from the ICV administration of orexin and further mention this important consideration in the ‘limitations’ section of the discussion.

      (4) While the current study is focused on sleep-wake mechanisms, the findings reported here have much broader implications for behavioral and/or brain state arousal and provide a mechanistic bridge between different states of consciousness, including general anesthesia. Therefore, the authors may consider tying these findings with the recent work on the role of the prefrontal cortex in arousal from general anesthesia and slow-wave sleep (PMID: 35436248, PMID: 29937348, PMID: 33328847).

      We thank the reviewer for this excellent recommendation. We are now citing these papers in the revised manuscript.

      (5) It's up to the authors, but I do not see the need for the section on Clinical Implications. It's very speculative, and it makes the entire discussion section heavy.

      We have considerably shortened the discussion of potential clinical implications to make the manuscript more concise.

      (6) Figure 1: It's difficult to compare the EEG power the way figures are set up right now. I think it would enhance clarity if the authors separate the plots based on state and show power from the control and silenced neuronal group in the same plot. Also, the colors are too similar (essentially a shade of green/blue) to provide effective visual resolution. This is especially true in panel d. Please consider changing the color scheme.

      This comment seems to refer to Figure 2 and subsequent figures with analysis of vigilance states and EEG spectra (Figure 1 contains histological images). We have selected the colour scheme for colour-blind individuals. Therefore, the main difference is in the saturation, not the colour of the plots. We have tested the visibility of the colour scheme on a high-resolution screen with the original image files and can reassure the reviewer that the genotype differences, which are slightly blurred in the reduced-resolution figures provided within the combined text file for the review process, are easily distinguishable in the final figure quality.

      (7) I don't understand the y-axis scale in Figure 1. How can this be 500% and if it is, then 500% of what?

      This comment also seems to refer to the analysis of slow wave activity (SWA) in Figure 2 rather than to Figure 1 (histology figure). The percentage of SWA is normalised to the average SWA across the recording. Since NREM sleep is characterised by considerably higher SWA than wakefulness and REM sleep, the level of SWA during NREM sleep is in the range of 200-300%, and can be even higher after long wake episodes which are followed by a rebound of NREM sleep SWA. Hence, the upper limit of the y-axis in these (and subsequent) plots of SWA is 500% (of the average SWA). We have amended the figure legend to clarify that SWA is presented here as percentage of average SWA across the recording.
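      The normalisation described above amounts to dividing each SWA value by the recording-wide mean and expressing the result as a percentage. A minimal sketch, with a hypothetical function name and toy values (not data from the study), assuming the mean is taken over all values of the recording regardless of vigilance state:

```python
def swa_percent_of_mean(swa_values):
    """Express each SWA value as a percentage of the mean SWA across the recording."""
    if not swa_values:
        raise ValueError("swa_values must not be empty")
    recording_mean = sum(swa_values) / len(swa_values)
    if recording_mean == 0:
        raise ValueError("mean SWA is zero; cannot normalise")
    return [100.0 * value / recording_mean for value in swa_values]


# Toy recording: two low-SWA epochs and one high-SWA (NREM-like) epoch.
normalised = swa_percent_of_mean([1.0, 1.0, 4.0])
```

      Because the denominator is an average over all states, epochs with high SWA can exceed 100% by a wide margin, which is why a y-axis limit of 500% is plausible.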

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      In this potentially valuable computational study, the authors conducted atomistic and coarse-grained simulations to probe the temperature-dependent phase behaviors of ELF3, a disordered component of the evening complex in plants. The results aim to highlight the role of polyQ tracts in modulating the temperature sensitivity. The level of evidence is considered incomplete, due to the lack of systematic calibration of the coarse-grained model and limited statistical uncertainty analysis, especially considering the relatively subtle nature of the differences due to temperature change.

      We agree that the subtle temperature dependence of ELF3-PrD condensation requires rigorous uncertainty reporting and careful interpretation of CG predictions. In the revised manuscript we therefore (i) report mean ± SEM across independent replicas for all CG observables and provide full time series in the Supplementary Information, and (ii) expand our CG analysis beyond cluster counting to include condensate stability (size), lifetime, internal mobility (D, α), dynamic heterogeneity (van Hove), and structural descriptors (anisotropy, single-chain compaction/density). These additions strengthen the robustness of the conclusions and even enable physical explanations of recent experimental measurements on ELF3-PrD condensates.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript explores the role of the Evening Complex (EC), specifically focusing on ELF3, a disordered protein component of the EC, and its temperature-dependent phase behavior. The study highlights the role of polyQ tracts in modulating temperature-sensitive condensate formation and provides a combination of computational approaches, including REST2 simulations and coarse-grained Martini simulations, to investigate how polyQ tract length and sequence context influence this behavior.

      Strengths:

      The study addresses a key question in plant biology - how temperature influences circadian clock-mediated growth regulation through protein phase behavior. The manuscript introduces the novel finding that polyQ tract length modulates the temperature-dependent formation of helices and condensates.

      Weaknesses:

      (1) Coarse-Grained Simulation Results Not Supported by Data:

      The results presented in Figure 6A of the manuscript do not seem to show a clear trend in the number of clusters formed as a function of polyQ tract length. This is particularly evident in the comparison between 0Q and 7Q polyQ lengths, which display statistically similar values in terms of the number of clusters. The lack of distinction between these values raises questions about the sensitivity of the coarse-grained simulations to polyQ tract length, which the authors claim as a key modulator of condensate formation. This discrepancy weakens the argument that polyQ length directly impacts the clustering behavior in the simulations.

      Suggested Analysis:

      A more detailed statistical analysis should be performed to assess whether the observed differences between polyQ lengths are significant. This could involve hypothesis testing or the use of error bars in the graphs to better communicate the variability in the data.

      Additionally, the authors should examine whether there are other features, such as cluster shape or internal structure, that might differentiate between different polyQ lengths, even if the total number of clusters is similar.

      We agree that the number of clusters in Fig. 6A does not show a strong or monotonic dependence on polyQ length (e.g., 0Q vs 7Q can overlap within uncertainty). The cluster number is highly sensitive to coarsening kinetics and rapidly approaches a late-time plateau, and therefore is not our primary discriminator of variant-dependent condensation behavior.

      To address the reviewer’s request for statistical rigor and additional differentiating features, we have revised the analysis in two ways. First, we now report mean ± SEM across independent replicas for all key CG observables and provide full replicate time series in the Supplementary Information to make variability and convergence/coarsening explicit.
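
      As a minimal sketch of the replica-level uncertainty reporting described above, the following computes a mean and standard error of the mean (SEM) from one value per independent replica. The function name and the example numbers are hypothetical; the authors' actual analysis scripts are not shown in the response.

```python
from math import sqrt
from statistics import mean, stdev

def mean_sem(replica_values):
    """Return (mean, SEM) for one observable measured once per independent replica."""
    n = len(replica_values)
    if n < 2:
        raise ValueError("SEM needs at least two replicas")
    # SEM = sample standard deviation (n - 1 denominator) divided by sqrt(n).
    return mean(replica_values), stdev(replica_values) / sqrt(n)

# Hypothetical late-time cluster counts from three replicas (illustration only).
avg, sem = mean_sem([2.0, 4.0, 6.0])
```

      Treating each replica as a single data point, rather than each frame, avoids understating the uncertainty when frames within a trajectory are strongly correlated.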

      Second, we shift our main CG conclusions away from “cluster number” and toward more diagnostic observables of condensate robustness and material state, including: (i) stability via the late-time mean largest-cluster size, (ii) persistence/lifetime via the fraction of frames with largest cluster size greater than 50, (iii) internal dynamics via MSD-derived D and anomalous exponent α, (iv) dynamic heterogeneity via self van Hove distributions relative to a Gaussian reference, and (v) morphology/internal structure via κ<sup>2</sup> and Rg distributions.

      Notably, the κ<sup>2</sup>/Rg distributions are broadly overlapping at 300 K, indicating that in our system variant differences are expressed more strongly in stability/persistence and internal dynamics (D/α/van Hove) than in a large shift in single-chain compaction at this temperature.

      This revised framing also aligns our interpretation with the experimental picture put forward by Hutin et al.: polyQ length modestly affects onset-like behavior but more strongly tunes condensed-phase regimes and dynamics.

      Relevant revisions have been made in the Results and the Discussion sections.

      (2) Inconsistency in Cluster Size Across Temperatures (Figure 6B):

      The results in Figure 6B show a striking difference in the size of the largest cluster between temperatures of 290K and 300K. This abrupt shift in behavior lacks a clear mechanistic explanation. Typically, phase transitions driven by temperature are more gradual, unless there is some underlying structural or chemical shift that the authors have not accounted for. Without a clear explanation, this sudden change in behavior reduces confidence in the simulation results.

      Suggested Analysis:

      The authors should explore possible explanations for the dramatic difference in cluster size between 290K and 300K. For example, they could investigate whether specific interactions (such as the breaking or formation of hydrogen bonds or hydrophobic contacts) might explain the behavior at higher temperatures.

      It is important to check whether the coarse-grained simulation model has been adequately parameterized and scaled for accurate temperature dependence. Atomistic simulations of monomers and dimers with varying polyQ tract lengths could be used to fine-tune the coarse-grained model, ensuring it accurately reflects molecular behavior. The gross estimate of a 10% scaling factor might be insufficient and could lead to inaccurate representations of cluster formation.

      We agree that the apparently sharp change in largest-cluster size between 290 K and 300 K requires clearer interpretation. In the revised manuscript, we clarify that this behavior does not imply an abrupt thermodynamic phase transition; rather, in a finite (~100-chain) simulation box, the largest cluster size is sensitive to both (i) proximity to a coexistence boundary and (ii) coarsening kinetics. Consistent with this, all systems rapidly coarsen early and then approach a late-time plateau, so the dominant cluster size can change steeply when conditions shift the balance between one system-spanning droplet versus multiple long-lived subclusters.

      To distinguish “true loss of condensation” from “differences in coarsening state,” we added replica-averaged stability and persistence metrics (mean ± SEM) and full time series. Importantly, the condensate lifetime (fraction of frames with largest aggregate-population > 50) is ~1 at both 290 K and 300 K, indicating that both temperatures correspond to a persistently condensed regime, not intermittent nucleation/dissolution. We therefore interpret the smaller dominant cluster at 290 K as reflecting slower coarsening / stronger kinetic arrest, where reduced chain mobility delays merger/annealing into a single large droplet within the simulated time window, leaving a larger satellite/dispersed population despite sustained condensation.

      We further support this interpretation with mechanistic and dynamical analyses added in the revision. As temperature increases from 290 K to 300 K, we observe increased internal mobility (higher effective diffusivity, D) that would accelerate rearrangements and coalescence. In parallel, contact/desolvation analyses show progressive loss of protein-water contacts and gain of protein-protein contacts as clusters mature, and a residue-resolved comparison indicates net contact increases at 300 K relative to 290 K concentrated in aromatic-rich “sticker” regions, consistent with a strengthened intermolecular contact network that promotes more complete annealing at 300 K.

      (We address the reviewer’s points regarding Martini temperature scaling/parameterization together with points (3)-(4) below.)

      (3) Scaling of Coarse-Grained Model with Atomistic Simulations:

      As mentioned, the coarse-grained model used in the study may not have been properly scaled against atomistic data. A simple scaling factor of 10% may not be appropriate for accurately capturing the behavior of polyQ tracts across different lengths, especially considering their sensitivity to subtle changes in temperature. Without rigorous validation against atomistic simulations, the coarse-grained model's predictions could be skewed.

      (4) To address this, the authors should compare the coarse-grained model with atomistic simulations of monomeric and dimeric forms of ELF3 with different polyQ tract lengths. By comparing key structural parameters (e.g., radius of gyration, contact maps, and clustering propensity), the authors could adjust the coarse-grained model to more accurately reflect the atomistic behavior. The authors have a wealth of atomistic simulation data that could afford such benchmarking and identification of a scaling factor.

      Additionally, the authors should investigate whether the assumed scaling factor of 10% is appropriate for each polyQ length or whether it needs to be refined based on specific properties, such as the number of hydrophobic interactions or secondary structure stability.

      We agree that temperature-dependent CG predictions must be interpreted carefully and that the interaction balance should be justified. In the revision, we therefore clarify both our calibration choice and the scope of inference.

      We use Martini 3 with a single, literature-motivated adjustment: protein-water Lennard-Jones interactions are strengthened by 10 percent, following an established strategy shown to improve IDP/multidomain protein behavior in Martini 3. This scaling is applied uniformly to all residues, polyQ lengths, and temperatures to avoid introducing construct-specific parameters and to preserve a controlled comparison across variants.

      We emphasize that our CG simulations are used in a comparative manner (how stability/dynamics/structure change with temperature and polyQ length under a fixed model), and we do not claim a quantitatively exact phase boundary or transition temperature for ELF3. In this spirit, and consistent with how Martini 3 has been used in prior work to probe thermally varying properties across temperature windows (while acknowledging documented limits to temperature transferability), we treat the temperature sweep as a comparative probe rather than an absolute calibration (https://doi.org/10.1063/5.0221199, 10.1021/acscentsci.5c00755, https://doi.org/10.1038/s41592-021-01098-3). Accordingly, we report replica uncertainty (mean ± SEM) for all CG observables and restrict conclusions to qualitative trends that are robust to replicate variability.

      Finally, while we do not undertake a full ELF3-specific reparameterization, we include qualitative checks linking atomistic and CG behavior: the CG model reproduces the same qualitative features of single-chain reorganization inferred from atomistic simulations, notably the radius of gyration (Fig. S8) and the rearrangement/exposure of aromatic “sticker” regions that correlate with strengthened intermolecular contacts in the condensate. We emphasize that these comparisons are intended as qualitative sanity checks on trend direction, not as a quantitative validation or calibration of an absolute phase boundary.

      (5) Lack of Analysis for Liquid-Like Behavior in Phase Separation:

      The simulations presented in the manuscript do not analyze the liquid-like behavior of ELF3 condensates, which is a key characteristic of liquid-liquid phase separation (LLPS). In LLPS systems, condensates are often dynamic, with chains exchanging between clusters, indicating liquid-like rather than solid-like behavior. The authors fail to probe this crucial aspect, which is necessary to support the claim that ELF3 undergoes phase separation.

      Suggested Analysis:

      The authors should conduct additional analyses to probe the liquid-like nature of the clusters formed by ELF3. One approach would be to analyze the dynamics of chain exchange between clusters, measuring how frequently chains leave one cluster and join another over time. This analysis would reveal whether the condensates behave as liquid-like, dynamic structures or more static, solid-like aggregates.

      Additionally, the temperature dependence of these exchange dynamics should be investigated. In true liquid-liquid phase separation, the rate of chain exchange is often sensitive to temperature. Observing how this rate changes between 290K and 300K, for instance, could help explain the abrupt shift in cluster size seen in Figure 6B.

      The authors should also analyze whether the internal structures of the condensates are consistent with a liquid-like phase. For example, radial distribution functions and contact lifetimes could be calculated to reveal whether the clusters exhibit liquid-like organization.

      We thank the reviewer for highlighting that liquid-like behavior is a key diagnostic for LLPS. We agree that our original manuscript did not explicitly quantify condensate material properties. In the revision, we therefore add several complementary analyses and figures that directly probe whether the condensed state in our simulations is liquid-like versus dynamically arrested, and how this depends on temperature and polyQ length.

      (i) Condensate persistence vs temperature (stability and lifetime).

      We now quantify two replica-averaged metrics with uncertainty (mean ± SEM): (a) stability, defined as the mean largest-cluster size over a late-time analysis window, and (b) lifetime, defined as the fraction of frames in which the dominant cluster exceeds a fixed size threshold. These results are shown in the new figures “Stability (Mean cluster size)” and “Lifetime (P [size > 50])”. In our system, both 290 K and 300 K correspond to a persistently condensed regime (lifetime ≈ 1 across variants), whereas at 340 K the lifetime drops substantially (≈0.3-0.5 depending on variant), indicating intermittent condensation / partial dissolution at high temperature. This directly demonstrates temperature-dependent persistence of the condensed state and clarifies that the key qualitative change at high temperature is loss of stability and intermittency, rather than a purely static cluster-size difference.
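
      The two metrics defined above can be sketched for a single replica as follows. This is an illustration under stated assumptions, not the authors' code: the input is a per-frame time series of the largest-cluster size, the stability is averaged over a late-time window starting at a chosen frame index, and the lifetime is evaluated over all supplied frames with the threshold of 50 quoted in the response. The function names and the example series are hypothetical.

```python
def stability(largest_cluster_sizes, window_start):
    """Mean largest-cluster size over the late-time window [window_start, end)."""
    window = largest_cluster_sizes[window_start:]
    if not window:
        raise ValueError("late-time window is empty")
    return sum(window) / len(window)

def lifetime(largest_cluster_sizes, threshold=50):
    """Fraction of frames in which the largest cluster exceeds the size threshold."""
    if not largest_cluster_sizes:
        raise ValueError("no frames supplied")
    above = sum(1 for size in largest_cluster_sizes if size > threshold)
    return above / len(largest_cluster_sizes)

# Hypothetical per-frame largest-cluster sizes (number of chains), illustration only.
sizes = [10, 30, 60, 80, 90, 40]
late_mean = stability(sizes, window_start=2)
persistence = lifetime(sizes)
```

      Per-replica values obtained this way would then be combined across replicas to give the mean ± SEM that the response reports.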

      (ii) Internal mobility and viscoelasticity (D and α).

      To probe liquid-like dynamics within the condensed state, we compute the internal mean squared displacement (MSD) and extract an effective internal diffusivity D(T) and anomalous exponent α(T) (new figures FIG X). In our system, D increases systematically with temperature for all variants, confirming that internal rearrangements accelerate at higher temperature. At the same time, α remains strongly subdiffusive (α ≈ 0.3-0.5), indicating constrained, non-Fickian motion rather than simple liquid diffusion. Importantly, we also observe variant-dependent mobility: around 300-320 K, 0Q exhibits markedly lower D than 19Q, consistent with stronger kinetic arrest in 0Q even when both variants are condensed. Together, these dynamics metrics show that our condensates are not ideal liquids, but instead occupy a viscoelastic / dynamically slowed regime with clear temperature dependence.
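
      One common way to obtain such quantities, shown here only as a hedged sketch, is a least-squares fit of the power law MSD(t) = K·t^α in log-log space. The response does not state the exact fitting procedure or the definition of the effective diffusivity, so the convention D_eff = K/6 (three-dimensional motion) used below is an assumption, as are the function name and the synthetic data.

```python
from math import exp, log

def fit_power_law(lag_times, msd_values):
    """Fit MSD(t) = K * t**alpha by linear regression of log(MSD) on log(t)."""
    xs = [log(t) for t in lag_times]
    ys = [log(m) for m in msd_values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    alpha = sxy / sxx                         # slope = anomalous exponent
    prefactor = exp(mean_y - alpha * mean_x)  # intercept gives K
    return alpha, prefactor

# Synthetic, noise-free subdiffusive MSD (illustration only): K = 1.2, alpha = 0.4.
lags = [1.0, 2.0, 4.0, 8.0]
msd = [1.2 * t ** 0.4 for t in lags]
alpha, prefactor = fit_power_law(lags, msd)
d_eff = prefactor / 6.0  # assumed 3D convention, MSD = 6 * D_eff * t**alpha
```

      An exponent α = 1 corresponds to ordinary Fickian diffusion, so fitted values well below 1, such as the 0.3-0.5 range quoted above, indicate subdiffusive motion.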

      (iii) Dynamic heterogeneity (self van Hove).

      We additionally compute the self van Hove displacement distributions (Fig. SX). In our system, the van Hove distributions deviate from a Gaussian reference matched to the MSD, with an excess of near-zero displacements relative to a simple Gaussian model. These non-Gaussian displacement statistics are consistent with heterogeneous/caging-like dynamics inside the condensed phase, further supporting a viscoelastic (gel-like) rather than purely liquid material state at the timescales accessible to simulation.
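
      The comparison against an MSD-matched Gaussian can be sketched in one dimension as follows. This is an illustrative simplification, assuming displacements along a single Cartesian component at a fixed lag time; the cutoff, function name, and example displacements are hypothetical. For a zero-mean Gaussian with variance σ², the probability of |Δx| < c is erf(c / (σ√2)), which serves as the reference.

```python
from math import erf, sqrt

def near_zero_excess(displacements, cutoff):
    """Compare the observed fraction of |dx| < cutoff with a Gaussian of equal MSD."""
    n = len(displacements)
    msd = sum(d * d for d in displacements) / n
    if msd == 0:
        raise ValueError("all displacements are zero")
    observed = sum(1 for d in displacements if abs(d) < cutoff) / n
    sigma = sqrt(msd)  # Gaussian reference matched to the same mean squared displacement
    gaussian = erf(cutoff / (sigma * sqrt(2)))
    return observed, gaussian, observed - gaussian

# Hypothetical caged population: most beads barely move, a few jump far (illustration only).
dx = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 3.0, -3.0]
observed, gaussian, excess = near_zero_excess(dx, cutoff=0.5)
```

      A positive excess means more near-immobile particles than a homogeneous Gaussian process with the same MSD would produce, which is the signature the response describes.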

      (iv) Internal structure and morphology (Rg and anisotropy).

      Finally, we add structural descriptors as requested. The new Rg distribution and shape anisotropy (κ<sup>2</sup>) violin plots quantify single-chain compaction and heterogeneity in morphology within the condensed phase. In our system these structural distributions are broadly overlapping at 300 K, indicating that differences among variants are more strongly expressed in dynamics (D/α/van Hove) and stability/lifetime, rather than in a large change in single-chain compaction at this temperature. We report these results transparently and include them in the SI as additional mechanistic context.
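
      Both descriptors can be obtained from the gyration tensor of a single chain, as in the sketch below. It assumes equal-mass beads and uses the standard relative shape anisotropy κ² = 1 − 3(λ₁λ₂ + λ₂λ₃ + λ₃λ₁)/(λ₁ + λ₂ + λ₃)², rewritten in terms of tensor traces so that no eigenvalue solver is needed. The function name and test geometries are hypothetical, and whether the authors used mass weighting is not stated in the response.

```python
from math import sqrt

def rg_and_kappa2(coords):
    """Return (Rg, kappa^2) for a list of (x, y, z) bead positions with equal masses."""
    n = len(coords)
    com = [sum(c[k] for c in coords) / n for k in range(3)]
    # Gyration tensor S_ab = <(r_a - com_a) * (r_b - com_b)>
    s = [[sum((c[a] - com[a]) * (c[b] - com[b]) for c in coords) / n
          for b in range(3)] for a in range(3)]
    trace = s[0][0] + s[1][1] + s[2][2]          # sum of eigenvalues = Rg^2
    if trace == 0:
        raise ValueError("all beads coincide")
    trace_sq = sum(s[a][b] * s[b][a] for a in range(3) for b in range(3))  # tr(S^2)
    kappa2 = (3.0 * trace_sq - trace * trace) / (2.0 * trace * trace)
    return sqrt(trace), kappa2

# Limiting cases (illustration only): a straight rod and an isotropic octahedron.
rod = [(-1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
octahedron = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
              (0.0, -1.0, 0.0), (0.0, 0.0, 1.0), (0.0, 0.0, -1.0)]
rg_rod, k2_rod = rg_and_kappa2(rod)
rg_oct, k2_oct = rg_and_kappa2(octahedron)
```

      κ² ranges from 0 for a spherically symmetric arrangement to 1 for a perfectly linear one, which is why broadly overlapping κ² distributions indicate similar chain shapes across variants.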

      We now explicitly frame our CG condensed phases as viscoelastic/dynamically slowed condensates rather than assuming fully liquid droplets. This interpretation is consistent with experimental observations on ELF3 PrLD that report very slow recovery/gel-like behavior under some conditions (i.e., condensates can age into low-mobility hydrogel states).

      (6) Lack of justification of polydispersity of polyQ:

      The authors don't provide any rationale for choice of different copies of polyQ used in the manuscript for their chain-growth simulation studies. It will be more apt if it can be motivated via some precedent experimental observations.

      We agree and have clarified our rationale in the revised manuscript. ELF3’s polyQ tract is a naturally polymorphic short tandem repeat in Arabidopsis, reported to vary from roughly ~7 to ~29 glutamines in natural populations, and this variation has been linked to ELF3-dependent phenotypes and temperature-responsive growth (Undurraga et al.; Jung et al.). Importantly, recent ELF3 PrLD thermosensing/condensation experiments explicitly compare multiple polyQ lengths (including Q0, short/WT-like constructs such as Q7, and expanded tracts around ~Q20) and show that polyQ length tunes temperature-responsive phase behavior and condensate properties (Jung et al.; Hutin et al.).

      Accordingly, for our chain-growth ensembles we chose a small, experimentally motivated set that brackets this range - 0Q (deletion), 7Q (WT-like short), and expanded lengths 13Q and 19Q (with 19Q closely matching the ~Q20 construct used experimentally), so that our simulations map onto established constructs and naturally occurring variation rather than arbitrary copy numbers.

      The manuscript draft has been modified in the Results and Methods sections.

      Jung J-H. et al. A prion-like domain in ELF3 functions as a thermosensor in Arabidopsis. Nature (2020).

      Undurraga S. et al. Background-dependent effects of polyglutamine variation in the Arabidopsis thaliana gene ELF3. PNAS (2012), DOI: 10.1073/pnas.1211021109.

      Hutin S. et al. Phase separation and molecular ordering of the prion-like domain of the Arabidopsis thermosensory protein EARLY FLOWERING 3. PNAS (2023).

      (7) Lack of initiative to connect to Experiments:

      While the computational models and simulations provide robust theoretical insights, the absence of direct experimental validation weakens the overall impact of the manuscript. For example, experimental data on how specific mutations in the polyQ tract influence ELF3 behavior in vivo would significantly bolster the authors' claims. The manuscript would benefit from either citing existing experimental studies that corroborate these findings or from suggesting future experimental directions.

      We agree that our original submission did not make the experimental connections explicit enough, and we have strengthened this in the revision by (i) explicitly anchoring our results to published ELF3 thermosensing/condensation measurements and (ii) articulating concrete, experimentally testable mechanistic predictions that follow from the simulations.

      (i) Explicit connection to published experimental benchmarks: We now cite and discuss key experimental studies that directly probe ELF3 temperature responsiveness and polyQ effects. Jung et al. demonstrated temperature-triggered ELF3 condensation/speckle formation in vivo and showed that polyQ length modulates thermoresponsive behavior. More recently, Hutin et al. compared ELF3 PrLD constructs spanning polyQ lengths (e.g., Q0, Q7, and ~Q20) and reported temperature-triggered phase separation, condition-dependent condensed-phase regimes (droplet-like versus more arrested/gel-/hydrogel-like), and reduced mobility/immobile fractions quantified by FRAP in some regimes. In the revised manuscript we explicitly map these observations onto our results: our coarse-grained simulations capture temperature-dependent condensation propensity, while our added condensate dynamics analyses (MSD-derived internal mobility D, anomalous exponent α, and self van Hove displacement statistics) indicate dynamically slowed/heterogeneous condensates rather than assuming ideal liquid droplets, consistent with experimentally observed slow FRAP recovery and arrested behavior under some conditions.

      (ii) Mechanistic Connections: While existing experiments establish that ELF3 condensation is temperature-triggered and tuned by polyQ length, they cannot directly resolve the molecular interaction changes that drive these macroscopic readouts. We therefore emphasize that our atomistic and coarse-grained analyses provide a mechanistic interpretation: temperature shifts reorganize and expose “sticker”-rich regions (notably aromatics), strengthening intermolecular contact networks that tune condensate stability and material properties. This framing aligns our conclusions with the experimental picture that polyQ length has modest effects on onset-like behavior but more strongly tunes condensed-phase robustness and dynamics (persistence, internal mobility, and arrest) across temperature.

      The modifications relevant to this are in the Discussion section.

      Reviewer #2 (Public review):

      Summary:

      The authors aimed to explore how a key protein in the circadian clock of plants, ELF3, responds to temperature changes by forming molecular condensates. They focused on understanding the role of a specific region of the protein, a polyQ tract, in promoting temperature-sensitive structural changes and regulating the formation of condensates. Through a series of computational simulations, they sought to uncover the molecular basis for ELF3's temperature responsiveness and its broader implications for plant growth and adaptation to environmental conditions.

      Strengths:

      The study's strength lies in its focus on an important biological question: how plants sense and respond to temperature changes at the molecular level. The authors employed a variety of computational techniques, including coarse-grained simulations, to explore the role of specific molecular features in this process. These methods provide a multi-scale view of protein behavior and offer valuable insights into how molecular structures may influence biological function.

      Weaknesses:

      However, there are notable weaknesses in the evidence provided. While the authors present trends in molecular changes, such as shifts in helical propensity and the formation of condensates, these results seem subtle and are not strongly substantiated by statistical analysis. The lack of error bars in the figures makes it difficult to distinguish between meaningful signals and potential noise in the data. Furthermore, the temperature-sensitive behavior appears to be influenced more by chain length than by sequence-specific effects of the polyQ region, raising questions about whether the findings truly capture the molecular mechanisms responsible for temperature sensing. Additionally, some simulations, particularly those related to the formation of condensates, do not appear fully converged, which casts further doubt on the robustness of the results.

      We appreciate the reviewer’s concerns regarding statistical support, sequence specificity, and convergence. In the revised manuscript we (i) report replicate-averaged means with uncertainty (mean ± SEM) for all key observables and add error bars/shaded bands to the relevant figures, (ii) provide the full time series plots in the Supplementary Information to make variability and equilibration transparent, and (iii) revise our interpretation to emphasize that polyQ length has only modest effects on onset-like metrics but more strongly tunes condensate stability and material state (lifetime, internal mobility (D), subdiffusion exponent (α), and non-Gaussian van Hove signatures). This revised framing is consistent with recent ELF3 PrLD experiments showing that polyQ variation can subtly affect onset while substantially modulating condensed-phase behavior and dynamics. Relevant changes to the main text have been made in the Results and Discussion section.

      Additional Context for Readers:

      Readers should interpret the results with caution, especially regarding the molecular mechanisms proposed for temperature sensing. While the study presents interesting trends, the evidence is not definitive, and the findings may be more reflective of general protein behavior (such as the effect of chain length on condensate formation) than specific sequence-driven responses to temperature. Further experimental studies and more converged simulations will be necessary to fully understand the role of ELF3 in temperature regulation.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I already have listed my possible recommendations for authors for revising their manuscript in the review. By addressing these issues, the authors could significantly improve the robustness of their conclusions and provide stronger evidence for ELF3's role in temperature-responsive phase separation.

  2. May 2026
    1. Author Response:

      Reviewer #1 (Public review):

      Summary:

      The authors present a novel approach to subcellular spatial proteomics by combining laser microdissection with expansion microscopy and LC-MS/MS analysis (SPEx). They implement two different workflows for LMD and LC-MS/MS quantification:

      (1) The standard approach, where an area of interest is cut out by LMD, subjected to proteomics analysis, and compared to the rest of the cell without the dissected ROI.

      (2) The subtraction approach, where ROIs are removed, and the remaining cellular material is compared to samples containing both the surrounding material and the ROI.

      The authors assess the technique by applying it to subcellular targets of various sizes, volumes, and protein compositions such as the nucleus, nucleoli, and Golgi. They demonstrate that SPEx can identify proteins enriched or reduced in ROIs.

      Strengths:

      The broad, relatively easy, and inexpensive applicability of this approach to potentially many cell types and subcellular areas of interest provides an exciting alternative to subcellular fractionation, native immunoprecipitation, or genetically encoded proximity labeling constructs. Moreover, by visually selecting ROIs for subsequent analysis, subcellular context or organelle morphology can be taken into account, as discussed by the authors in the discussion section.

      Weaknesses:

      While strongly supporting the sharing of this approach, we have a number of comments and questions that will improve the impact of the manuscript:

      We thank the reviewer for the careful evaluation of our manuscript and the generally positive assessment. We plan on improving our manuscript based on the reviewers’ comments.

      (1) General:

      a) The manuscript would benefit from restructuring and language revision. In its current form, the writing is sometimes dense and verbose (in particular, the Results section). This makes it difficult to follow the authors' arguments.

      We will improve readability and clarity of the results section in the revised manuscript.

      b) The authors mention the possibility of selecting organelles based on morphology. This is left for the discussion, but it seems like a missed opportunity - the authors could compare individual organelles in different morphological states, e.g., connected vs. fragmented mitochondria.

      The authors agree with the reviewers’ assessment that investigating the proteome of organelles based on morphology or cellular state is an exciting application of SPEx. While we plan experiments along this line in the future, we think that these experiments are beyond the scope of this manuscript, which is meant to describe the method and its general usefulness.

      (2) Technical:

      a) Why do the authors strive and optimize for a 10x expansion factor? Is SPEx compatible with a more standard 4x expansion, as e.g., used in the classic U-ExM approach (https://www.nature.com/articles/s41592-018-0238-1)? This could be added to the discussion.

      We aimed for 10x expansion solely because our ultimate goal is to cut out very small structures. Isolating structures as small as nucleoli would not be as reliable with a lower expansion factor (e.g., 4x). We did not assess compatibility with U-ExM. We would assume that SPEx would also work with U-ExM as the expansion method, although this would mean omitting the protease treatment. Still, we performed pilots with just 4x expansion (using TREx) in the early stages of optimization. We were able to isolate single cells and obtain similar protein coverage as with 10x expansion. We will further clarify our motivation to use 10x expansion in the discussion.

      We would also like to point out that whether U-ExM is the standard method is rather subjective. Even though TREx was published three years later, it is also very widely used. The original expansion microscopy method was published three years prior to U-ExM.

      b) The U-ExM approach shows improved ultrastructural preservation when using 3% FA with 0.1% glutaraldehyde fixation (GA). Is SPEx compatible with the use of low amounts of GA for fixation?

      We tried different fixation methods in the early stages of this study (where expansion was not yet close to 10x). We saw a mild negative effect of GA on the expansion factor, so we avoided it in the later experiments since it also did not seem necessary to preserve the structure of our organelles of interest. However, the use of GA would generally be compatible with SPEx, potentially at the cost of a mild negative effect on expansion factor (see Author response image 1) and proteome coverage. We can add this information to the discussion.

      Author response image 1.

      Fixation methods mini-screen. Cells were fixed with the indicated reagents for 10 minutes at 37°C. After TREx expansion, the diameter of the nucleus was measured (A) and the resulting expansion factor compared to the non-expanded control was determined (B).

      Related to the above, was the anchoring efficiency reduced only to achieve a 10x expansion factor or does this additionally affect the proteome coverage?

      We lowered the anchoring solely to allow for higher expansion factors. In earlier pilots we performed proteomic analysis on samples that were expanded just 4x using standard TREx expansion (also using the original anchoring strategy from the TREx publication, consisting of 0.2 mg/ml AcX overnight at RT). We presented the results of this pilot in Fig S1A. We still detected over 2,000 proteins from 10 cells, a coverage that is highly similar to what we found in the final experiments (Figure 2F), in which the anchoring was lower, yielding 10x expansion. Based on these data, we hypothesize that anchoring (and expansion factor!) has a negligible impact on protein coverage. We will clarify this in the manuscript.

      d) Have the authors considered using alternative anchoring approaches, such as GMA (https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0291506#pone.0291506.s001), which potentially increase the amount of sample retained in the hydrogel, thus allowing for better proteome coverage? This could be added to the discussion.

      We did not use alternative anchoring approaches. We modified the TREx protocol to fit our purposes and, since this was sufficient, we did not explore alternatives. However, anchoring approaches that retain higher amounts of sample in the gel might be beneficial for proteome coverage. We will keep this suggestion in mind for future experiments. Thank you for the suggestion!

      e) The limitation of the approach to near-2D samples should be mentioned, and alternative approaches for more 3D samples could be discussed.

      We agree that SPEx is limited to near-2D samples at this point. We suggest that SPEx could be applied to 3D samples (e.g. tissues) after cryosectioning. TREx has been shown to be compatible with sectioned tissue (Damstra et al., 2022). We will elaborate on this in the discussion.

      f) How are peptides that are directly anchored to the hydrogel dealt with during LC-MS/MS analysis? Are they excluded, or can they be identified during the spectral search? The latter would allow us to get a deeper structural understanding of how proteins are actually anchored into hydrogels, which so far has not been assessed.

      The reviewer raises an interesting point. In general, peptides carrying the anchoring modification are analysed by LC-MS, but we did not include these specific modifications in the database search. Overall, we assumed that the labeling would be low and stochastic and hence should, if at all, only minimally affect the detection of peptides. Nevertheless, in response to the reviewer’s comment, we searched the MS data again for the crosslinking reagent linked to lysine residues. However, we could not obtain a confident hit for any peptide containing this modification. Since we cannot exclude that the modification precludes identification of the corresponding peptides, we compared the numbers of peptides generated by tryptic cleavage after arginine and after lysine. As the human proteome contains similar proportions of both amino acids, one would expect similar numbers of both peptide types to be identified. Any modification of lysine by the anchoring reagent would prevent tryptic cleavage and thus reduce the number of lysine-terminating peptides. As shown in Author response image 2, the number of lysine-terminating peptides is only slightly lower than the number of arginine-terminating peptides. Notably, the proteomics results of a different fixed human tissue sample, directly extracted by laser capture microdissection without expansion, showed a very similar lysine-to-arginine peptide ratio. This indicates that the large majority of lysine residues are not modified by the hydrogel anchoring.

      Author response image 2.

      Number of peptides identified either terminating with lysine (K) or arginine (R) across all samples shown in Figure 5F.
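
      The comparison summarised in this image amounts to tallying identified peptides by their C-terminal residue. A minimal sketch of such a tally, assuming the identified peptides are available as plain one-letter amino acid sequences; the function name and the example sequences are hypothetical and are not taken from the dataset:

```python
from collections import Counter


def count_c_terminal_k_r(peptides):
    """Count peptides whose C-terminal residue is lysine (K) or arginine (R)."""
    counts = Counter(p[-1] for p in peptides if p and p[-1] in "KR")
    return counts["K"], counts["R"]


# Hypothetical peptide sequences, for illustration only.
peptides = [
    "AEDGSVLK",
    "LLQDSVDFSLADAINTEFK",
    "VATVSLPR",
    "GYSFTTTAER",
    "SAMPLEPEPTIDE",  # ends in neither K nor R, so it is not counted
]

k_count, r_count = count_c_terminal_k_r(peptides)
ratio = k_count / r_count
print(f"K-terminating: {k_count}, R-terminating: {r_count}, K/R ratio: {ratio:.2f}")
```

      A K/R ratio well below that of a non-expanded reference sample would point to substantial lysine modification, whereas similar ratios are consistent with sparse labeling.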

      An alternative approach to address this question would be to investigate if the peptide coverage of proteins detected by SPEx is enriched for peptides representing the folded core of proteins as opposed to the surface-exposed regions, which likely get more anchored into the hydrogel.

      Because of the negligible amounts of modified peptides, we did not investigate this potential bias of surface-exposed versus folded-core peptides.

      g) Same question regarding peptides with NHS labeling. Can they be identified, or do they just compete for ionization and thus negatively affect coverage and dynamic range of the LC-MS/MS approach?

      The reviewer raises a similar point as above for another lysine labeling used during the SPEx protocol. Again, we specifically looked for this modification by re-searching the raw MS data, but still could not identify any peptides carrying this modification on a lysine residue. Even though we cannot exclude that this rather large modification prevents detection, considering the high number of lysine-terminating peptides in our dataset (see Figure 2), we expect that this labeling step is also stochastic and affects only a minor proportion of the proteins.

      h) How are the primary and secondary antibodies affecting the proteomics analysis identified as contaminants?

      We thank the reviewer for this comment. Since antibodies bind to proteins in a non-covalent manner, they will be released during the denaturing steps of the protocol. Of course, the antibodies will stay in the sample, be digested and analyzed and could, if very abundant, affect the analysis of the proteins from the samples. To check this possibility, we re-searched the MS data including the sequences of the antibodies used. To our surprise, we could not detect any peptides of these antibodies. This suggests that the concentrations of the antibodies used are much lower than those of the sample proteins and thus should not have any impact on the proteomics results. We also interpret this result as a benefit of our method compared to organellar IP.

      i) Have the authors observed differences in proteomics coverage of only antibody vs NHS-labeling? Depending on the questions above, could pure antibody-based labeling increase proteomic coverage?

      We did not perform this comparative analysis, since we always used NHS dyes. In the experiments presented in this manuscript, NHS dyes allowed easy visualization of the whole cell without the use of antibodies. This NHS staining was essential for this particular setup for sample acquisition. We cut out entire cells, cells lacking the nucleus and cells lacking the Golgi apparatus, which served as critical controls. However, other ways of detecting cell boundaries could be used to avoid NHS staining. As shown above, both the anchor and the NHS labeling are sparse and stochastic. Moreover, we could not detect any impact of the antibody labeling on our results. Thus, we assume that both labeling procedures could be used.

      Reviewer #2 (Public review):

      Summary:

      This study introduces a method that combines physical expansion of cells, imaging-guided isolation of defined regions, and protein identification to enable compartment-resolved analysis of protein composition at the subcellular scale. The authors aim to address a central limitation in existing approaches, namely the loss of spatial information during sample preparation or the indirect nature of proximity-based labeling methods. Using several cellular compartments as examples, they demonstrate that their approach can recover compartment-enriched protein sets and identify candidate proteins with previously unassigned localization.

      Strengths:

      A major strength of this work is the conceptual simplicity and accessibility of the approach. By combining established techniques in a modular way, the method avoids the need for genetic manipulation or specialized labeling strategies, making it broadly adaptable across experimental systems. The ability to directly select regions of interest based on imaging represents a clear advantage over indirect enrichment strategies and allows flexible targeting of both membrane-bound and non-membrane-bound compartments.

      The experimental design is also a strong aspect of the study. The use of complementary comparison strategies (analyzing isolated compartments alongside matched "subtracted" controls) provides an internal framework for assessing enrichment and depletion, increasing confidence in spatial assignment. The application of the method across multiple organelles of different sizes and properties demonstrates versatility, and the reported specificity for several compartments is encouraging. In particular, the ability to profile small and biochemically challenging structures highlights a potentially important niche for the approach.

      Weaknesses:

      Despite these strengths, several methodological limitations constrain the interpretation of the results. The most important relates to spatial accuracy in three dimensions. While lateral resolution is improved through physical expansion, the lack of depth resolution introduces uncertainty regarding contributions from structures above and below the selected region. Although the authors argue that this does not substantially affect specificity, the current evidence is largely indirect, and a more rigorous quantification of potential contamination would strengthen this conclusion.

      Quantitative interpretation also remains challenging. Because the measurements reflect total protein abundance rather than local concentration, differences in compartment size and protein density can influence enrichment values, particularly for small structures embedded within larger volumes. This issue is evident in the analysis of smaller compartments and complicates direct comparison across conditions. Additional normalization or modeling would help clarify how to interpret these measurements.

      Another limitation concerns variability in the expansion process and its downstream consequences. Differences in expansion factor across samples may affect the definition of regions of interest and introduce variability in sampling, yet the impact of this variability is not fully explored. Similarly, the use of a modified chemical treatment to preserve proteins for downstream analysis is central to the workflow but is not extensively validated with respect to preservation of spatial organization.

      While the identification of previously unannotated proteins is an appealing aspect of the study, validation is limited to a small number of examples, and broader support from independent datasets or literature context is lacking. In addition, the study primarily focuses on steady-state measurements in a single cell type, and therefore does not yet demonstrate the ability of the method to capture dynamic or condition-dependent changes in protein localization.

      Finally, the positioning of the method relative to existing approaches could be more clearly articulated. Although qualitative comparisons are provided, a more systematic and quantitative benchmarking against alternative strategies would help readers better understand the specific advantages and trade-offs.

      We thank the reviewer for the careful evaluation of the manuscript and for the constructive feedback. We think the reviewer raises valid points and will address them in the revised manuscript.

      Reviewer #3 (Public review):

      Franziscus et al. describe an elegant approach for spatially specific proteome analysis. To achieve this, they expand fixed cells and subsequently use a laser to micro-dissect a region of interest, which is then analyzed by mass spectrometry.

      They demonstrate the effectiveness of their approach by analyzing the nucleus, nucleolus, and the Golgi, and benchmark their hits against previous datasets for these organelles.

      The manuscript is very well written and nicely guides the reader through the applied methods. The presented data is convincing, and I do not see the need for additional experimental verification of the protocol. The only minor concern is the novelty of the method and the presentation. A combination of expansion, laser microdissection, and proteomics has been applied in the past (PMID: 36450705, PMID: 39477916). In the manuscript, one of these studies is cited, though it does not become clear that this approach is already described. However, Franziscus et al. describe the approach better and make it more accessible to the reader, especially since the other studies described this methodology in combination with tissue expansion and not in combination with single cell expansion as it is done here. I would ask the authors to be clearer in the introduction about what others have already done and what their contribution is here. In general, I am convinced that the community will benefit from the presented protocol to analyze organelle proteomics in detail.

      We thank the reviewer for the careful evaluation of our manuscript and the overwhelmingly positive assessment. We apologize for the omission of the mentioned citations, and will adjust the introduction to make clearer what has already been done and what advance our method provides.

      References

      Damstra HG, Mohar B, Eddison M, Akhmanova A, Kapitein LC, Tillberg PW. 2022. Visualizing cellular and tissue ultrastructure using Ten-fold Robust Expansion Microscopy (TREx). eLife 11:e73775. DOI: https://doi.org/10.7554/eLife.73775

      Gambarotto D, Hamel V, Guichard P. 2021. Ultrastructure expansion microscopy (U-ExM). Methods in Cell Biology 161:57–81. DOI: https://doi.org/10.1016/bs.mcb.2020.05.006, PMID: 33478697

      Liffner B, Silva TLA e., Vega-Rodriguez J, Absalon S. 2024. Mosquito Tissue Ultrastructure-Expansion Microscopy (MoTissU-ExM) enables ultrastructural and anatomical analysis of malaria parasites and their mosquito. BMC Methods 1:13. DOI: https://doi.org/10.1186/s44330-024-00013-4

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the editor and reviewers for their constructive questions, valuable feedback, and for approving our manuscript. We truly appreciate the opportunity to improve our work based on their insightful comments. Before addressing the editor’s and each referee’s remarks individually, we provide below a point-by-point response summarizing the revisions made.

      Duplication of control groups across experiments

      We appreciate the reviewers’ concern regarding the potential duplication of control groups. In the revised manuscript, we have explicitly clarified that independent groups of control mice were used for each experiment. These details are now clearly indicated in the Materials and Methods section to avoid any ambiguity and to reinforce the rigor of our experimental design (Page 15, Line 453-455): “Furthermore, knockout animals and those treated with pharmacological inhibitors or neutralizing antibodies shared the same control groups (chow and HFCD), as required by the animal ethics committee.”

      Validation of the MASLD model

      To strengthen the metabolic characterization of our MASLD model, we have now included additional parameters, including liver weight, Picrosirius staining and blood glucose measurements. These data are presented as new graphs in the revised manuscript and support the metabolic relevance of the HFCD diet model (Supplementary Figure S1). The corresponding description has been added to the Results section (Page 5, Lines 116-117) as follows: “Mice fed HFCD showed no increase in liver weight and collagen deposition as evidenced by Picrosirius staining (Fig. S1A and Fig. S1C)”

      Assessment of liver injury in RagKO and anti-NK1.1 mice

      We fully agree that assessment of liver injury is essential for these models. For mice treated with anti-NK1.1, ALT levels are shown in Figure 4G, confirming increased liver injury after treatment. Regarding Rag⁻/⁻ mice, the animals exhibit exacerbation of liver injury when fed an HFCD diet and challenged with LPS (Page 7, Lines 183–184). The corresponding description has been added to the Results section (Page 7, Lines 175-176) as follows: “Interestingly, Rag1-deficient animals under the HFCD remained susceptible to the LPS challenge (Fig. 4C) with exacerbation of liver injury (Fig. 4D)”

      Discussion of limitations

      We have expanded the Discussion section to provide a more comprehensive and balanced perspective on the limitations of our model and experimental approach (Page 13-14, Lines 401–414): “Our study presents several limitations that should be acknowledged and discussed. First, we cannot entirely rule out the possibility that our mice deficient in pro-inflammatory components exhibit reduced responsiveness to LPS. However, our ex vivo analyses using splenocytes from these animals revealed a preserved cytokine production following LPS stimulation. These results suggest that the in vivo differences observed are primarily driven by the MAFLD condition rather than by intrinsic defects in LPS sensitivity. Second, the absence of publicly available single-cell RNA-seq datasets from MAFLD subjects under endotoxemic or septic conditions limited our ability to perform direct translational comparisons. To overcome this, we analyzed existing MAFLD patients and experimental MAFLD datasets, which consistently demonstrated upregulation of IFN-γ and TNF-α inflammatory pathways in MAFLD. In line with these findings, our murine model revealed TNF-α⁺ myeloid and IFN-γ⁺ NK cell populations, thereby reinforcing the validity and translational relevance of our results.” This revision highlights the constraints of the MASLD model, the inherent variability among in vivo experiments, and the interpretative limitations related to immunodeficient mouse strains.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) In Figure 4 the authors are showing the number of IFN+ positive CD4, CD8, and NK 1.1+ cells. Could they show from total IFNg production, how much it goes specifically on NK cells and how much on other cell populations since NK1.1 is NK but also NKT and gamma delta T cell marker? Also, in Figure 2E the authors see a substantial increase in IFNg signal in T cells.

      While we did not specifically assess IFNγ production in NKT cells or other minor populations, our data indicate that the NK1.1+CD3+ cells (NKT cells) described on Page 7, Lines 188-192 were essentially absent in the liver tissue of LPS-challenged animals, as shown in Supplementary Figures 3C and S10. The corresponding description has been added to the Results section (Page 7, Lines 188-192) as follows: “We observed that the number of NK cells increased in the liver tissue of PBS-treated MAFLD mice compared with mice fed a control diet (Fig. 4E). LPS challenge increased the accumulation of NK1.1+CD3− NK cells in the liver tissue of MAFLD mice and the absence of NK1.1+CD3+ NKT cells (Fig. S3C and 4E)”.

      This absence was consistent across all experimental conditions, corroborating our focus on NK1.1+CD3− cells as the primary source of NK1.1-associated IFNγ production. Furthermore, the data in Figure 2E show that IFNγ is present primarily in NK cells. Therefore, the observed IFNγ signal attributed to NK1.1+ cells predominantly reflects conventional NK cells, with minimal contribution from NKT or γδ T cells.

      (2) In Figure 4C, the authors state that the results suggest that T and B cells do not contribute to susceptibility to LPS challenge. However, they observe a drop in survival compared to chow+LPS. Are the authors certain there is no statistical significance there?

      The observed decrease in survival is consistent with our expectations, as T and B cells are not the primary source of interferon-gamma (IFNγ) in this context. Even in their absence, animals remain susceptible to LPS challenge due to the presence of other IFNγ-producing cells that drive the observed lethality. We have carefully re-examined the statistical analysis and confirm that it was correctly performed.  

      (3) Since the survival curve and rate are exactly the same (60%) in Figures 3F, 3G, 4C, 4F, 5G, and 5H I would just like to double-check that the authors used different controls for each experiment.

      The number of mice used in each experiment was carefully determined to ensure sufficient statistical power while fully complying with the limits established by our institutional Animal Ethics Committee. To minimize animal use, the same control group was shared across multiple survival experiments. Despite using shared controls, the total number of animals per experimental group was adequate to produce robust and reproducible survival outcomes. All groups were properly randomized, and the shared control data were rigorously incorporated into statistical analyses. This strategy allowed us to maintain both ethical standards and the scientific rigor of our findings.

      (4) In Figure 5 the authors are saying that it is neutrophils but not monocytes mediate susceptibility of animals with NAFLD to endotoxemia. However, CXCR2i depletion and CCR2 knock out mice affect both monocytes/macrophages and neutrophils. And in Figures 5E, 5G, and 5H they see that a) LPS+CXCR2i decreases liver damage more than LPS+anti Ly6G, b) HFCD mice challenged with LPS and treated with anti-LY6G do not rescue survival to levels of CHOW LPS and c) anti Ly6G treatment helps less than CXCR2i. Therefore, from both knock out mice and depletion experiments the authors can conclude that most likely monocytes (but potentially also other cells) together with neutrophils are substantial for the development of endotoxemic shock in choline-deficient high-fat diet model.

      While neutrophils express CCR2, our data clearly show that CCR2 deficiency does not impair neutrophil migration, as demonstrated in Supplemental Figures 5A and 5B (added to the manuscript, Page 8, Lines 213–217). The corresponding description has been added to the Results section (Page 8, Lines 213–217) as follows: “Interestingly, animals deficient in monocyte migration (CCR2-/-) showed a high mortality rate compared to wild type after LPS challenge and neutrophil migration is not altered (Fig. 5SA and Fig. 5SB)”. In contrast, CCR2 deficiency primarily affects monocyte recruitment, yet in our experimental conditions, monocyte depletion or CCR2 knockout did not significantly alter the severity of endotoxemic shock, indicating that monocytes play a minimal role in mediating susceptibility in HFCD-fed mice.

      To specifically investigate neutrophils, we used pharmacological blockade of CXCR2 to inhibit migration and antibody-mediated neutrophil depletion. Both approaches have consistently demonstrated that neutrophils are critical factors in endotoxemic shock.

      These findings support our conclusion that neutrophils are the primary cellular contributors to susceptibility in HFCD-fed mice during endotoxemia, with monocytes making a negligible contribution under the tested conditions.

      (6) In Figure 6A (but also others with PD-L1) did the authors do isotype control? And can they show how much of PD1+ population goes on neutrophils, and how much on all the other populations?

      To address this issue, we performed additional analyses to assess the distribution of PD-L1 expression on CD45+CD11b+ leukocytes. These new results, detailed on Page 9, Lines 245-250, and now presented in Supplemental Figure 6, demonstrate that PD-L1 expression is predominantly enriched in neutrophils compared to other immune subsets. This observation further reinforces our conclusion that neutrophils represent a major source of PD-L1 in our experimental model.

      To ensure the robustness of these findings, we also included FMO controls for PD-L1 staining in the newly added Supplemental Figure S6. These controls validate the specificity of our gating strategy and confirm the reliability of the detected PD-L1 signal. The corresponding description has been added to the Results section (Page 9, Lines 245-250) as follows: “First, we observed that only the MAFLD diet caused a significant increase in PD-L1 expression in CD45+CD11b+ leukocytes after LPS challenge (Fig. S6C). We observed that within this population, neutrophils predominate in their expression when compared to monocytes (Fig. 6SA, Fig. 6SB, and Fig. 6SD). Furthermore, PD-L1+ neutrophils showed an exacerbated migration of PD-L1+ neutrophils towards the liver (Fig. 6A and 6B)”

      (7) In Figure 6D it is interesting that there is not an increase in PD-L1+ neutrophils in LPS HFCD IFNg+/+ mice in comparison to LPS chow IFNg+/+ mice, since those should be like WT mice (Figure 6A going from 50% to 97%) and so an increase should be seen?

      The apparent difference between Figures 6A and 6D likely reflects inter-experimental variability rather than a biological discrepancy. Although the absolute percentages of PD-L1⁺ neutrophils varied slightly among independent experiments, the overall phenotype and trend were consistently maintained, namely that PD-L1 expression on neutrophils is enhanced in response to LPS stimulation and modulated by IFNγ signaling. Thus, the data shown in Figure 6D are representative of this consistent phenotype despite minor quantitative variation.

      (8) In Figure 7 do the authors have isotype control for TNFa because gating seems a bit random so an isotype control graph would help a lot as supplementary information, in order to make the figure more persuasive

      To address the concern regarding gating in Figure 7, we have included the FMO control for TNFα as a histogram in Supplementary Figure 8G. This control reaffirms the accuracy and reliability of our gating strategy for TNFα, further supporting the robustness of our data. The corresponding description has been added to the Results section (Page 9, Lines 272-274) as follows: “We observed an exacerbated TNF-α expression by PD-L1+ neutrophils from MAFLD when compared to control chow animals (Fig. 7A, Fig. 7B, Fig. 7D, and Fig8SG).”

      (9) Figure 6C IFNg+/+ mice on CHOW +LPS is same as Figure 8E mice chow +LPS but just with different numbers. Can the authors explain this?

      Although the data points in Figures 6C and 8E may appear similar, we confirm that they originate from entirely independent experiments and represent distinct datasets. To enhance clarity and avoid any potential confusion, we have adjusted the figure presentation and sizing in the revised manuscript. These changes make it clear that the datasets, while comparable, are derived from separate experimental replicates.

      (10) Figure 1E chow B6+LPS is the same as Figure 5D B6+LPS but should they be different since those should be two different experiments?

      We confirm that Figures 1E and 5D correspond to data obtained from independent experiments. Although the experimental conditions were similar, each dataset was generated and analyzed separately to ensure the reproducibility and robustness of our results.

      Reviewer #2 (Recommendations for the authors):

      (1) Why did you look at kidney injury in Figure 1D? I think this should be explained a little.

      We assessed kidney injury alongside ALT, a marker of liver damage, because both the liver and kidneys are among the primary organs affected during sepsis and endotoxemia. This rationale has been added to the manuscript (page 5, lines 129–131): “Remarkably, compared to the Chow group, HFCD mice exposed to LPS did not show greater changes in other organs commonly affected by endotoxemia, such as the kidneys (Figure 1D).” By evaluating markers of injury in both organs, we aimed to determine whether our physiopathological condition was liver-specific or indicative of broader systemic injury.

      (2) I know Figure 2C isn't your data, but why are there so few NK cells, considering NK cells are a resident liver cell type? Doesn't that also bring into question some of your data if there are so few NK cells? And the IFNG expression (2E) looks to mostly come from T-cells (CD8?).

      The data shown in Figure 2C were reanalyzed from a separate NAFLD model based on a 60% high-fat diet. Although this model differs from ours, the observed low number of NK cells is consistent with expectations for animals subjected solely to a hyperlipidic diet, which primarily provides an inflammatory stimulus that promotes recruitment rather than maintaining high baseline NK cell numbers.

      In our experimental model, these observations align with published data. Specifically, liver tissue from NAFLD animals typically exhibits low baseline NK cell numbers, but upon LPS challenge, there is a marked increase in NK cell recruitment to the liver. This dynamic illustrates the interplay between diet-induced inflammation and immune cell recruitment in our experimental context and supports the interpretation of our IFNγ data.

      (3) In your methods, I think you didn't explain something. You said LPS was administered to 5-6 week old mice, but that HFCD diet was started in 5-6 week old mice and lasted 2 weeks, then LPS was administered. So LPS administration happened when the mice were 7-8 weeks old, right?

      We thank the reviewer for pointing out this inconsistency in our Methods section. The reviewer is correct: the HFCD diet was initiated in 5–6-week-old mice, and LPS was administered after 2 weeks on the diet, such that LPS challenge occurred when the mice were 7–8 weeks old.

      We have revised the Methods section (Page 15-16, Lines 474–480) to clarify this timeline and ensure it is accurately described in the manuscript. The corresponding description has been added to the Materials and Methods section (Page 14, Lines 436-442) as follows: “Lipopolysaccharide (LPS; Escherichia coli (O111:B4), L2630, Sigma-Aldrich, St. Louis, MO, USA) was administered intraperitoneally (i.p.; 10 mg/kg) in C57BL/6, CCR2 -/-, IFN-/-, and TNFR1R2 -/- mice. The HFCD was initiated in 5–6 week-old mice, and LPS was administered after 2 weeks on the diet, meaning that LPS administration occurred when the mice were 7–8 weeks old, with body weights ranging from 22 to 26 g. LPS was previously solubilized in sterile saline and frozen at -70°C. The animals were euthanized 6 hours after LPS administration”.

      (4) Throughout the manuscript, I would consider changing the term NAFLD to something else. I think HFCD diet is a closer model to NASH, so there needs to be some discussion on that. And the field is changing these terms, so NAFLD is now MASLD and NASH is now MASH.

      We appreciate the reviewer’s comment regarding the terminology and disease classification. In our experimental conditions, the animals were subjected to a high-fat, choline-deficient (HFCD) diet for only two weeks, a period considered very early in the progression of diet-induced liver disease. At this stage, histological analysis revealed lipid accumulation in hepatocytes without evidence of hepatocellular injury, inflammation, or fibrosis. Therefore, our model more closely resembles the metabolic dysfunction-associated fatty liver disease (MAFLD, formerly NAFLD) stage than the more advanced metabolic dysfunction-associated steatohepatitis (MASH, formerly NASH).

      Indeed, prolonged exposure to HFCD diets, typically 8 to 16 weeks, is required to induce the inflammatory and fibrotic features characteristic of MASH. Since our objective was to study the initial metabolic and immune alterations preceding overt liver injury, we believe that using the term MAFLD more accurately reflects the pathological stage represented in our model. Accordingly, we have revised the text to align with the updated nomenclature and disease context.

      (6) I am concerned about over interpretation of the publicly available RNA-seq data in Figure 2. This data comes from human NAFLD patients with unknown endotoxemia and mouse models using a traditional high-fat diet model. So it is hard to compare these very disparate datasets to yours. Also, if these datasets have elevated IFNG, why does your model require LPS injection?

      We thank the reviewer for their thoughtful comments regarding the interpretation of the RNA-seq data presented in Figure 2. We would like to clarify that the human NAFLD datasets referenced in our study do not specifically include patients with endotoxemia; rather, they focus on individuals with NAFLD alone.

      Comparing data from human and murine MAFLD models, we observed that NK cells, T cells, and neutrophils are present and contribute to the hepatic inflammatory environment. Our reanalysis indicates that the elevated IFNγ in NAFLD is primarily derived from NK cells and T cells, and the elevated TNF from myeloid cells.

      In our experimental model, LPS administration was used to evaluate whether these immune populations, particularly NK cells, are further potentiated under a hyperinflammatory state, leading to exacerbated IFNγ production. This approach allows us to determine whether increased IFNγ contributes to worsening outcomes in NAFLD, providing mechanistic insights that cannot be obtained from static human or traditional mouse datasets alone.

      (7) The zoom-ins for the histology (for example, Figure 1E) don't look right compared to the dotted square. The shape and area expanded don't match. And the cells in the zoom-in don't look exactly the same either.

      We have thoroughly re-examined the histological sections and the corresponding zoom-ins, including the example in Figure 1E. Upon verification, we confirm that the zoom-ins accurately represent the highlighted areas indicated by the dotted squares. The apparent discrepancies in shape or cellular appearance are likely due to minor differences in orientation or cropping during figure preparation. Nevertheless, the content and regions depicted are consistent with the original sections.  

      (8) Did the authors measure myeloid infiltration in the CCR2-/- mice? Did you measure Neutrophil infiltration in the TNF-Receptor KO mice?

      Analysis of CD45+ cell migration in CCR2 knockout mice, as shown in Supplemental Figure 5C and 5D, demonstrates that the absence of CCR2 does not impair overall leukocyte migration. Similarly, assessment of neutrophil migration in TNF receptor (TNFR1/2) knockout mice, presented in Supplemental Figure 8A, shows that neutrophil trafficking is not affected in these animals. These results indicate that the respective knockouts do not compromise the migration of the analyzed immune populations, supporting the interpretations presented in our study.

      (9) Regarding Methods for RNA-seq Analysis. Was the Mitochondrial percentage cutoff 0.8%, because that seems low. And was there not a Padj or FDR cutoff for the differential expression?

      The mitochondrial percentage in our scRNA-seq analysis reflects the proportion of mitochondrial gene expression per cell, which serves as a quality control metric. A low mitochondrial gene expression percentage, such as the 0.8% cutoff used here, is indicative of highly viable cells.
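
      The metric described above can be illustrated with a minimal sketch. It assumes that the percentage is computed as mitochondrial counts divided by total counts per cell, that mitochondrial genes carry the mouse-style `mt-` prefix, and that cells at or below the cutoff are retained. The cell names and count values are hypothetical and are not taken from our dataset.

```python
def percent_mito(counts, mito_prefix="mt-"):
    """Percentage of a cell's total counts that map to mitochondrial genes."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    mito = sum(n for gene, n in counts.items()
               if gene.lower().startswith(mito_prefix))
    return 100.0 * mito / total


def passes_qc(counts, max_percent_mito=0.8):
    """Keep a cell only if its mitochondrial percentage is at or below the cutoff."""
    return percent_mito(counts) <= max_percent_mito


# Hypothetical per-cell gene counts, for illustration only.
cells = {
    "cell_a": {"mt-Co1": 3, "mt-Nd1": 2, "Alb": 700, "Apoa1": 295},    # 5 of 1000 counts
    "cell_b": {"mt-Co1": 40, "mt-Nd1": 10, "Alb": 600, "Apoa1": 350},  # 50 of 1000 counts
}

kept = [name for name, counts in cells.items() if passes_qc(counts)]
print(kept)
```

      With these illustrative values, only the first cell falls under the 0.8% threshold.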

      For differential gene expression analysis, we employed the FindMarkers function in Seurat with standard parameters: adjusted p-value (Padj) < 0.05 and log2 fold change > 0.25 for upregulated genes, and adjusted p-value < 0.05 with log2 fold change < -0.25 for downregulated genes. These thresholds ensure robust identification of differentially expressed genes while balancing sensitivity and specificity.
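
      As a rough illustration of how these thresholds partition a results table, the sketch below applies the same cutoffs in plain Python. It does not reproduce the Seurat call itself; the function name `classify_gene` and the gene entries are hypothetical.

```python
def classify_gene(log2_fc, padj, padj_cutoff=0.05, lfc_cutoff=0.25):
    """Label a gene as up, down, or not significant using the stated thresholds."""
    if padj >= padj_cutoff:
        return "not significant"
    if log2_fc > lfc_cutoff:
        return "up"
    if log2_fc < -lfc_cutoff:
        return "down"
    return "not significant"


# Hypothetical (gene, log2 fold change, adjusted p-value) rows.
results = [
    ("GeneA", 1.20, 0.001),   # large positive change, significant
    ("GeneB", -0.80, 0.020),  # large negative change, significant
    ("GeneC", 0.10, 0.001),   # significant but below the fold-change cutoff
    ("GeneD", 2.00, 0.200),   # large change but fails the adjusted p-value cutoff
]

calls = {gene: classify_gene(lfc, padj) for gene, lfc, padj in results}
print(calls)
```

      A gene must pass both the adjusted p-value and the fold-change criteria to be reported, which is why the third and fourth rows are excluded.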

      (10) Regarding Methods for Flow Cytometry. How were IFNG and TNF staining performed? Was this an intracellular stain? Did you need to block secretion? TNF and IFNG antibodies have the same fluorophore (PE), so were these stainings and analyses performed separately?

      Six hours after LPS challenge, non-parenchymal liver cells were isolated using Percoll gradient centrifugation. Because the animals were in a hyperinflammatory state induced by LPS, no in vitro stimulation was performed; all staining was carried out immediately after cell isolation. Detection of IFN-γ and TNF-α was performed via intracellular staining using the Foxp3 staining kit (eBioscience). Because both antibodies are conjugated to PE, IFN-γ and TNF-α staining and analyses were conducted in separate experiments. These distinct staining protocols and analyses are detailed in Supplemental Figures 10 and 11. The corresponding description has been added to the Materials and Methods section (Page 16, Lines 490-493) as follows: “As animals were already in a hyperinflammatory state, no additional in vitro stimulation was required. Intracellular detection of IFN-γ and TNF-α was conducted using the Foxp3 staining kit (eBioscience). Since both antibodies were conjugated to PE, staining and analyses were performed in separate experiments.”

      Reviewer #3 (Recommendations for the authors):

      (1) Achieving an NAFLD model/disease is the starting point of this study. I understand that a two-week HFCD diet period was applied due to the decrease in lymphocyte numbers. Was it enough to initiate NAFLD then? Or is it a milder metabolic disease? Which parameters have been evaluated to accept this model as a NAFLD model?

      Indeed, the two-week HFCD diet induces an early-stage form of NAFLD, characterized by initial fat accumulation in the liver without significant hepatic injury. While this represents a milder metabolic phenotype, it is sufficient to study the inflammatory and immune responses associated with NAFLD. To validate this model, we assessed multiple parameters: liver weight, blood glucose levels, and collagen deposition. These measurements confirmed the presence of early-stage NAFLD features in the animals, providing a relevant and reliable context for investigating susceptibility to endotoxemia and immune cell dynamics. They are shown in Supplementary Figure 1, and the following text was added to the manuscript (Page 5, Lines 116-117): “Mice fed HFCD showed no increase in liver weight and collagen deposition as evidenced by Picrosirius staining (Fig. S1A and Fig. S1C)”.

      (2) It is true that the CD274 gene (encoding PD-L1) and the IFNGR2 gene, corresponding to the IFNγ receptor, are among the upregulated genes when authors analyzed the publicly available RNAseq data but they are not the most significantly elevated genes. What is the reasoning behind this cherrypicking? Why are other high DEGs not analyzed but these two are analyzed?

      We highlighted the expression of the IFN-γ receptor (IFNGR2) and CD274 (encoding PD-L1) in the publicly available RNA-seq data to align and corroborate these findings with the key results observed later in our study. To avoid redundancy, we chose to present these genes in the initial figures as they are directly relevant to the subsequent analyses. Regarding the broader analysis of human RNA-seq data, our primary objective was to identify enriched biological processes and pathways, which served as a foundation for the focus and direction of this study.

      (3) Figures 3C-3G: I understand that IFNg-/- and NFR1R2a-/- mice are not showing elevated liver damage but it may simply be because of the non-responsiveness to the LPS challenge. I suggest using a different challenge or recovery experiments with the cytokines to show that the challenge is successful and results are caused by NAFLD, truly. The same goes for Figure 6: Looking at Figure 6D one may think that IFNg deficiency alters the LPS response independent of the diet condition (or NAFLD condition).

      We appreciate the reviewer’s insightful comment and fully understand the concern regarding the potential non-responsiveness of IFN-γ⁻/⁻ and TNFR1R2a⁻/⁻ mice to the LPS challenge. To address this point and confirm that these knockout animals are indeed responsive to LPS stimulation, we conducted an additional set of ex vivo experiments.

      Specifically, WT and cytokine-deficient (IFN-γ⁻/⁻) mice were fed either Chow or HFCD for two weeks, after which spleens were collected, and splenocytes were challenged in vitro with LPS. We then quantified TNF, IFN, and IL-6 production to confirm that these mice are capable of mounting cytokine responses upon LPS stimulation.

      Due to current breeding limitations and a temporary issue in colony maintenance of TNF-deficient mice, we were unable to include TNFR1R2a⁻/⁻ animals in this additional experiment. Nevertheless, we prioritized performing the analysis with the available knockout line to avoid leaving this important point unaddressed.

      These additional data demonstrate that IFN-γ-deficient mice remain responsive to LPS, reinforcing that the differences observed in vivo are related to the NAFLD condition rather than a lack of LPS responsiveness.

      (4) Figure 1 vs Figure 4: Rag-/- mice seem more susceptible to LPS-derived death even after normal conditions. But If I compare the survival data between Figure 1 and Figure 4, Rag-/- HFCD diet mice seem to be doing better than wt mice after LPS treatment. (1 day survival vs 2 days survival). How do you explain these different outcomes?

      We thank the reviewer for this insightful question regarding the survival data in Figures 1 and 4. Although there is a one-day difference in survival outcomes, Rag-/- mice consistently exhibit increased susceptibility to LPS-induced mortality; variation between experiments can influence the exact survival timing. Nonetheless, across all experiments, Rag-/- mice display a reproducible phenotype of heightened sensitivity to LPS challenge, which is supported by multiple independent observations in our study.

      (5) How do you explain Figure 4J in connection to the observation presented with Figure 7: TNFa tissue levels, even though significant, seem very similar between the conditions?

      We would like to clarify that the animals in this study are in a metabolic syndrome state, with early-stage NAFLD characterized by hepatic fat accumulation without significant tissue injury, as shown in Figure 1C.

      Under these conditions, the LPS challenge triggers an exacerbated inflammatory response, leading to increased secretion of IFN-γ and TNF-α, primarily from NK cells and neutrophils. While TNF-α levels may appear visually similar across conditions, the HFCD mice exhibit a heightened predisposition for an amplified immune response compared to chow-fed mice. This difference is consistent with the functional outcomes observed in our study and highlights the diet-specific sensitization of the immune system.

    1. Author response:

      We will extend and clarify the text of the paper according to the suggestions of the reviewers. In particular we will extend the description and discussion of the calcium chelator approach, re-patching and multiple probability fluctuation analysis. We will also include in the Results section that volume-averaged calcium signals were measured and extend the description about measurement of the resting calcium and variability between boutons. Literature will be included and discussed as suggested.

      In order to avoid any misunderstandings, we will also make it clearer that recordings from L5PN – L5PN synapses in S1 were published in our preceding papers (Bornschein et al., 2019a, b), but that these data were partially reanalyzed for the comparison with recordings from L5PN – L5PN synapses in PFC (this paper). We will also emphasize that the recordings from L2/3 to L5PN synapses in S1 and PFC were made directly in the present study. We will include a supplementary table, which explicitly shows for each figure which data are from Bornschein et al. (2019a, b) and which data were obtained in the present study.

      We will consider all points of the reviewers and the recommendations of the editors in detail in the revised manuscript and/or our pointwise response.

      We recognized one factual error in the public reviews:

      Reviewer 2, point 7: “Methods: The authors use Student's t-test for data comparison. The authors should verify that the data distribution was indeed normal, e.g. by using a Shapiro-Wilk test. If this is not the case, non-parametric tests should be used.”

      A detailed description of the statistics, including test for normality, is given in the Methods section. In particular we wrote in the Methods: “Normality was tested using the Shapiro-Wilk Test. (…) To compare pre- and post-treatment data the paired t-test or the Wilcoxon signed rank test (WSR) was used, depending on the distribution of the data. (…)”

      To further emphasize that the data was tested for normal distribution, we have also extended the description of the statistical tests in the figure legends.

      Bornschein G, Brachtendorf S, Schmidt H (2019a) Developmental increase of neocortical presynaptic efficacy via maturation of vesicle replenishment. Front Synaptic Neurosci 11:36.

      Bornschein G, Eilers J, Schmidt H (2019b) Neocortical high probability release sites are formed by distinct Ca2+ channel-to-release sensor topographies during development. Cell Rep 28:1410-1418 e1414.

    1. Author response:

      The following is the authors’ response to the current reviews.

      We thank the editors and reviewers for their assessment of this manuscript, and for the positive words highlighting the value of undertaking evaluation of small molecule drugs for snakebite in the neotropics, inclusive of the quality of this work and the value of the validated screening pipeline. We completely agree that the next steps for this work will be to evaluate the preclinical efficacy of the identified drugs in mouse models, though this considerable undertaking will form the basis of future work. Critically, the pipeline that we describe herein facilitates the selection of the most appropriate candidates to progress into such mouse studies, aligning with the 3Rs principles for minimising the need for animal research. The comment around insufficient venom characterisation seems somewhat misplaced – the objective of this project was not to characterise the venoms used, but to evaluate the in vitro inhibition of venom toxin family activities and identify the potential utility of specific repurposed drugs as therapeutics for snakebite in the neotropics. Venom characterisation of the diverse samples used in this project would represent an entire project and manuscript in its own right. We are pleased that the reviewers highlight the gap in research on serine protease inhibitors and the value this paper has in highlighting that more research is required in this area to identify a candidate that is more suitable for future clinical use than nafamostat.


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Small molecule therapeutics for snakebite have received a lot of attention for their potential to close the gap between bite and treatment, where antivenom is not immediately available.

      Strengths:

      There has been a lot of focus on Africa, Asia, and India, but very little work related to neotropical regions. The authors seek to begin filling this gap in the preclinical literature. The authors use well-developed methods for preclinical assessment.

      Weaknesses:

      A clearer and more focused discussion of the limitations of the overall present work would be desirable (e.g. protection vs. rescue, why marimastat over prinomastat for in vivo assays when both have been through clinical trials for other indications; real-world feasibility of nafamostat, which has a half-life of 1-2 minutes compared to camostat, which has a half-life of hours). All of this could be improved in a revision.

      We thank the reviewer for their shared opinion of the potential value of small molecules as snakebite envenoming therapeutics and their insight on the gap in focus in the neotropics, which this manuscript aims to address.

      Our work in this manuscript included standard practice of pre-incubation between drug and venom for all in vitro studies, and sequential (i.e. not co-incubation) administration in the egg model. In our revised manuscript we will make these distinctions clearer. Use of a ‘rescue’ approach in the in vitro assays is not feasible due to the rapid destruction of the substrates used for assay readouts. The clearest rationale for the use of rescue models relates to their power within in vivo preclinical models (i.e. murine envenoming models) which, following the in vitro characterisations presented in this paper, are the logical next step for evaluating small molecule drugs for inhibiting neotropical snake venoms.

      Although both marimastat and prinomastat are repurposed drugs that have undergone clinical evaluation for other indications, marimastat has been more extensively characterised preclinically than prinomastat for snakebite, and will soon enter Phase II clinical trial evaluation for this indication (https://www.ddw-online.com/ophirex-to-produce-snake-venom-inhibitor-for-lstm-study-40669-202602/). Marimastat also has a longer half-life in humans of 8-10 hours (Millar et al. 1998), compared to prinomastat (2-5h, Hande et al. 2004). We will more clearly highlight the rationale for selecting marimastat in the revised manuscript.
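
      To make the half-life comparison concrete, the sketch below estimates the fraction of a single dose remaining over time. It assumes simple first-order (single-compartment) elimination, which is an illustrative simplification rather than a pharmacokinetic model of either drug, and the 8-hour time point is an arbitrary choice. The function name is hypothetical; only the half-life ranges come from the references cited above.

```python
def fraction_remaining(hours_elapsed, half_life_hours):
    """Fraction of drug remaining after a given time, assuming first-order elimination."""
    return 0.5 ** (hours_elapsed / half_life_hours)


# Reported half-life ranges in hours (Millar et al. 1998; Hande et al. 2004).
half_lives = {"marimastat": (8, 10), "prinomastat": (2, 5)}

hours = 8  # arbitrary illustrative time point
for drug, (shortest, longest) in half_lives.items():
    low = fraction_remaining(hours, shortest)
    high = fraction_remaining(hours, longest)
    print(f"{drug}: {low:.1%} to {high:.1%} remaining after {hours} h")
```

      Under this simplification, a drug with an 8-hour half-life retains half of the dose at 8 hours, whereas one with a 2-hour half-life retains about 6%.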

      Although we appreciate the reviewer’s point regarding the short half-life of nafamostat (which is typically given by continuous IV infusion for this reason), in the manuscript we have already stated that we do not recommend the progression of nafamostat as a snake venom serine protease (SVSP) inhibitor candidate due to its low efficacy and off-target effects. We highlight the need for the community to identify other serine protease inhibitors that might have utility for snakebite.

      Reviewer #2 (Public review):

      Summary:

      The authors set out to test whether a defined set of small molecules can lessen damaging effects caused by venoms from several Bothrops species, and whether these effects are consistent enough to suggest a broadly applicable approach. They present a cross-venom dataset spanning in-vitro activity readouts and blood-based functional outcomes, and include a chicken embryo model to explore whether venom inhibition can translate into improved survival. The central message is that certain small molecules can reduce specific venom-driven effects across multiple samples, providing a comparative resource for the field and a basis for prioritizing future validation.

      Strengths:

      The main value of this work is the breadth and structure of the dataset, which places multiple venoms and multiple readouts into a single, comparable framework that should be useful for readers evaluating patterns across samples. The experimental flow is generally coherent, moving from activity measurements to functional outcomes and then to an in-vivo test, which helps the reader understand how the authors link mechanism-oriented assays to more integrated endpoints. The manuscript also provides practical information for the community by highlighting which readouts appear most consistently affected across venoms, which can help guide hypothesis generation and study design in follow-up work.

      Weaknesses:

      Several aspects of the study design and framing reduce the confidence with which readers can translate the findings beyond the specific experimental context presented. The evidence base is strongest in controlled in-vitro settings, while the bridge to real-world effectiveness remains limited, particularly for understanding performance under conditions that better reflect delayed treatment and systemic exposure. As a result, the manuscript is best interpreted as a well-organized comparative screening study with promising signals, rather than a definitive demonstration of a broadly effective, deployable intervention.

      We appreciate the reviewer’s opinion on the thorough and logical workflow we present in this manuscript and the value this pipeline provides the field for future and parallel work. We agree with the reviewer that this provides a well-organized comparative screening study applicable to different snake species or therapeutics. We also agree that this manuscript is not a definitive demonstration of a broadly effective, deployable intervention, and are happy to clarify that while the evidence presented in this manuscript is promising, there is much work still to do before such molecules are ready for deployment for treating snakebite. Ultimately, this manuscript supports the growing evidence of the promising utility of marimastat and varespladib, and extends this evidence to neotropical snake venoms in a comparative manner. The next step will be to evaluate the efficacy of these molecules within in vivo murine preclinical models, which will be crucial for further supporting the evidence base for onward translation.

      Reviewer #3 (Public review):

      In this work, the authors wanted to evaluate repurposed small molecule inhibitors for the treatment of envenomation by snakes of the Bothrops genus; one of the most medically relevant in the Americas. I believe the objectives of the research were clearly achieved, and compelling evidence for the ability of these molecules to neutralize enzymatic and toxic activities of metalloproteinases and phospholipases in all the tested venoms is provided. Furthermore, the work highlights the limited efficacy of the tested serine protease inhibitor, suggesting a need for drug discovery campaigns to address toxicity caused by this protein family. The methods are well designed and performed, and the use of both in vitro and in vivo methodologies makes this a thorough and robust work.

      These results are extremely relevant, since they take us one step further to a potential orally administered snakebite treatment. The existence of such a treatment could improve the outcomes for thousands of snakebite victims worldwide. I have a few comments and questions that I hope will be useful to the authors:

      We thank the reviewer for their high regard for the purpose and execution of this work. Their questions and insights have helped us improve the manuscript and raise useful discussion points for the field.

      During the introduction, the authors mention that small-molecule inhibitors can neutralize the localized tissue damage via cytotoxicity of some venoms, and cite PLA2s, SVMPs and/or cytotoxic 3FTxs as the main causing agents of this pathology. I am not aware of any direct effect described by small molecule inhibitors on cytotoxic 3FTxs alone. Has this been observed at all? Or is it more likely that the small molecule inhibitors act on the enzymatic toxins only, preventing synergistic effects with 3FTxs?

      We apologise for this error on our behalf. While inhibitory molecules have been described for cytotoxic 3FTxs, these are not small molecules as alluded to in the previous version of the manuscript. We have amended this text in our revised manuscript.

      I think it would be relevant to address the effects of non-enzymatic PLA2s, such as myotoxin II, which have been described in detail within Bothrops venoms. I believe there is some evidence of Varespladib also having a neutralizing effect on the myotoxicity caused by these non-enzymatic PLA2s. I suggest adding a comment about the contribution of these toxins in the discussion or in the section where PLA2 activity of the venoms is compared. In my opinion, right now it seems like these were overlooked.

      We thank the reviewer for highlighting this point. We agree that this is highly relevant and would benefit from discussion in the revised manuscript given the nature of our assays and the non-enzymatic mechanism of action of certain Bothrops PLA<sub>2</sub>s. We have added this to the discussion.

      Regarding Marimastat and the other MP inhibitors, are there any studies showing that they don't have an effect on endogenous MPs? I understand they have been approved for human use before, but is there any indication that they would not have an effect at the doses that would be required to treat envenomation?

      Most matrix metalloproteinase inhibitors will act on endogenous MPs to at least some extent (with variable potency on different MMPs). Marimastat has demonstrated activity against endogenous metalloproteinases, including MMP1, which was hypothesised to cause severe joint pain when used chronically (i.e. frequent dosing over many weeks) for indications such as cancer, though this effect was reversible within 8 weeks of cessation of drug administration (Wojtowicz-Praga, 1998). Thus long-term use of matrix metalloproteinase inhibitors can cause safety concerns. However, the anticipated duration of dosing for snakebite, which is an acute life-threatening condition, is a few days. It is therefore unlikely that prior safety concerns observed following chronic dosing in cancer studies would apply to its potential use as a snakebite field therapy.

      Regarding the quenched fluorescence substrate used for enzymatic activity. Is there a possibility that some of the SVMPs would not act on this substrate, and therefore their activity or neutralization is not observed? Would it be relevant to test other substrates, such as gelatin, collagen, or even specific clotting factors?

      It has been observed that certain SVMPs (specifically several PI SVMPs) are not active against this ES010 substrate in vitro. The substrate used in the in vitro SVMP assay is reported by the manufacturer as a substrate for a wide range of MMPs which target the extracellular matrix components mentioned by the reviewer, i.e. collagenases and gelatinases as well as matrilysins, stromelysins and elastase. This in vitro assay and the coagulation assays are complementary in covering the main targets of SVMPs (ECM and clotting cascade), prior to haemorrhagic assessment in the egg model. Thus, we are confident that activity for the broad range of SVMP isoforms will be captured through the screening pipeline we have developed.

      Finally, could the authors comment or provide some bibliography regarding the translatability of the chicken embryo model in the context of envenomation?

      Our current model is based on an earlier egg embryo model (Sells et al. 1997, Sells et al. 1998 and Sells et al. 2000) which described good correlations (p<0.01) with the standard WHO murine preclinical envenoming model. These studies have assessed correlations for minimal haemorrhagic doses (MHDs), LD50s and ED50s in both models for a selection of viper venoms. As chicken embryos at day 6 of development have incomplete neural arcs, the model is not well suited for assessing neurotoxic effects, but can be effectively used for addressing venom-induced haemorrhage and lethality and for testing therapeutics. In addition, a more recent study (Yusuf et al. 2023) reported almost identical LD50s for the venom of Bitis arietans between the two in vivo approaches. The model is also being pursued as a preclinical testing model by an antivenom manufacturer with the focus of reducing the use of rodents in batch release testing (Verity et al. 2021). We will provide further clarification on the rationale for using the egg model, including the supportive references outlined above, in the revised manuscript.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      The manuscript provides a useful comparative dataset across multiple Bothrops venoms and supports SVMP inhibition as a broadly effective lever in the authors' in-vitro work. However, the strength of the 'pan-Bothrops' and translational claims is currently limited by insufficient characterization of the exact venom samples tested and by experimental designs that fall short of clinically realistic rescue.

      Major comments:

      (1) The venoms used in this study are historical batches and are not formally characterized beyond SDS-PAGE and literature summaries, despite well-known intra- and inter-population venom variability; this weakens the generalization of the conclusions.

      To address this comment, we have clarified that our venom sources are historic. Because of this, source locality is not available beyond country of origin, with the exception of B. lanceolatus, which is endemic to Martinique. Figure 1 also makes clear that we agree with the reviewer that the variation is high within Bothrops species. We discuss this variation and the limitations of our sampling for making broad conclusions throughout the first paragraph of the discussion, with the final sentence stating: “Future proteomic characterisations of the specific venom samples used in this study, which were all sourced from a historical collection (except for B. lanceolatus), would be informative in this regard.” Although the venom composition of our samples has not been characterised, the focus of the manuscript is the characterisation of whole venom functional activity through a wide-ranging screening pipeline, and the generalisation of our findings is supported by the diversity of the venom samples (i.e. several species) despite them not being characterised (which is not critical for the focus of the study).

      (2) On a technical comment, the venom inhibition assays appear to rely on drug-first or preincubation conditions, which can easily overestimate efficacy compared with real snakebite envenomation, where toxins distribute and engage targets rapidly. Here, a translational gap is the clinical feasibility of the 'repurposed' inhibitors, as it is unclear whether the drugs central to the conclusions (especially marimastat, prinomastat and varespladib) are realistically available or stocked in hospitals or could be deployed in regions where Bothrops envenoming occurs. I think that the manuscript should clearly distinguish this from candidates with a plausible access and delivery pathway.

      Our work in this manuscript includes standard practice of pre-incubation between drug and venom for all in vitro studies, and sequential (i.e. not co-incubation) administration in the egg model. None of our methods administer drug-first. Throughout the methods and figure legends we have made these distinctions clearer. Use of a ‘rescue’ approach in the in vitro assays is not feasible due to the rapid destruction of the substrates used for assay readouts. The clearest rationale for the use of rescue models relates to their power within in vivo preclinical models (i.e. murine envenoming models), which would be the next step for this research programme.

      While the evidence presented in this manuscript is promising, there is much work still to do before such molecules are ready for deployment for treating snakebite, inclusive of the requirement to complete clinical trials, cost-benefit analysis and policy change and manufacturing/distribution feasibility assessments. Ultimately, this manuscript supports the growing evidence of the promising utility of marimastat and varespladib, and extends this evidence to neotropical snake venoms in a comparative manner. The next step will be to evaluate the efficacy of these molecules within rescue in vivo murine preclinical models, which will be crucial for further supporting the evidence base for onward translation. To further support this point we have included an additional section to the manuscript discussing the current preclinical and clinical progression of prinomastat and marimastat, which also incorporates the public comment on selection of marimastat over prinomastat.

      (3) In my opinion, the Nafamostat results and discussion need reframing, given weak SVSP inhibition and intrinsic anticoagulant behavior at 5 µM. Excluding it from certain analyses undermines interpretability, and it may be more appropriate to include it throughout as an explicit negative control condition (showing its baseline anticoagulant effect) rather than omitting it.

      Although we understand the reviewers opinion here, we disagree and believe that including nafomastat as a ‘negative control’ may present a negative reflection on the benefit that an efficacious serine protease inhibitor could provide. Furthermore, as the intrinsic anticoagulant effect of nafamostat cannot be de-coupled from direct SVSP toxin inhibition we were unable to interpret the activity which undermines the results. This can be seen in Figure 3b, which demonstrates that a false positive result would occur. For the serine protease assay, we do clearly discuss the lack of efficacy and justification of why EC<sub>50</sub> testing wasn’t appropriate within the guidance of our screening protocols.

      In the manuscript, we have now further justified our approach with respect to the limitations of nafamostat as a snake venom serine protease (SVSP) inhibitor candidate, namely its low efficacy and off-target effects. We also highlight the need for the community to identify other serine protease inhibitors that might have utility for snakebite.

      (4) The data presentation needs consistent statistical analyses (currently absent for multiple key figures, including Figures 2, 3, 4, 6 and 7) and a clearer explanation for the dose of venom and drugs you choose. For example, Figure 3 relies on a fixed 5 µM drug concentration and very different venom amounts (50-100-250 ng), but it is not discussed whether such exposures are achievable in vivo, or how these concentrations map onto expected pharmacokinetics in patients. Likewise, Bothrops venoms can contain both pro- and anticoagulant activities, so the authors should justify how their framework accounts for anticoagulant components and why the observed plasma phenotypes are interpreted as they are

      We thank the reviewer for flagging the need for consistent statistical analysis, which we have now included in Figures 3, 4, 6 and 7. Figure 2, however, is presented to display the variation between all the venoms and was used to select the most relevant doses for the later inhibition experiments, so statistical analysis is not relevant for this figure. The updated statistical analysis comprises the following, which has been added to the relevant figure legends and results sections:

      Figure 3 - Bars indicate significant results (p < 0.05) identified through one-way ANOVA with Dunnett’s multiple comparisons test against the DMSO control

      Figure 4 - Two-way ANOVA with Šídák's multiple comparisons test of each venom control compared to the matched venom treated with inhibitor

      Figure 6 - The CT and MCF data were analysed independently using one-way ANOVA with Tukey’s multiple comparisons test

      Figure 7 - Log-rank (Mantel-Cox) test with Holm-Šídák's multiple comparisons test comparing each treatment against the venom-only control
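
      For readers unfamiliar with the step-down correction named for Figure 7, the following is a minimal sketch of Holm-Šídák adjusted p-values as the procedure is commonly defined. It is not the authors' analysis code, and the input p-values are hypothetical placeholders rather than values from the manuscript.

```python
def holm_sidak(p_values):
    """Return Holm-Šídák step-down adjusted p-values, in the input order."""
    m = len(p_values)
    # Visit the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        remaining = m - rank  # hypotheses not yet stepped past
        candidate = 1.0 - (1.0 - p_values[idx]) ** remaining
        # Enforce monotonicity: an adjusted p-value never drops below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[idx] = min(1.0, running_max)
    return adjusted


# Hypothetical raw p-values for three treatment-vs-control comparisons.
hypothetical_p = [0.01, 0.04, 0.03]
adjusted_p = holm_sidak(hypothetical_p)
print(adjusted_p)
```

      A comparison is then called significant when its adjusted p-value falls below the chosen threshold (0.05 in the figure legends).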

      We have ensured that all figure legends clearly state the venom and drug doses, to provide the clarity the reviewer requested.

      The comment that “Figure 3 relies on a fixed 5 µM drug concentration and very different venom amounts (50-100-250 ng), but it is not discussed whether such exposures are achievable in vivo, or how these concentrations map onto expected pharmacokinetics in patients” is an understandable query. However, in vitro assessments such as those carried out in this manuscript are not designed to directly inform pharmacokinetic/pharmacodynamic (PK/PD) interpretations, largely because they do not replicate real-world envenoming (i.e. a venom and a treatment would not be preincubated). This is why, as stated, follow-on preclinical and clinical assessments are needed to progress these inhibitors and to inform dosing regimens that might achieve the exposures required for in vivo venom neutralisation. That being said, PK/PD work has been initiated within Phase I trials. For example, with DMPS, Abouyannis et al. 2025 demonstrated a plasma exposure of >10 µg/mL for single doses of 1,200 mg and higher. This is equivalent to 80 µM. Although this is lower than the EC<sub>50</sub> for some venoms in the clotting assay (Figure 3J), the venom dose (50 to 250 ng/50 µL, i.e. 1,000 to 5,000 ng/mL) is estimated to be >1000 times higher than in a natural envenoming by Bothrops atrox, at less than 1 ng/mL in serum (https://doi.org/10.1016/j.toxicon.2022.09.010). These extrapolations therefore indicate that the doses selected in our studies would have human clinical relevance.
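
      The venom-dose comparison in this response can be checked with a short unit conversion. The sketch below uses only the figures quoted above (50 to 250 ng of venom in 50 µL, and a serum level below 1 ng/mL); the function and variable names are illustrative.

```python
def ng_per_ml(mass_ng: float, volume_ul: float) -> float:
    """Concentration in ng/mL for a mass in ng dissolved in a volume in µL."""
    return mass_ng / volume_ul * 1000.0  # 1 mL = 1000 µL


assay_low = ng_per_ml(50, 50)    # lowest venom dose used in the assay
assay_high = ng_per_ml(250, 50)  # highest venom dose used in the assay

# Upper bound on serum venom concentration quoted in the response (ng/mL).
serum_upper_bound = 1.0

fold_low = assay_low / serum_upper_bound
fold_high = assay_high / serum_upper_bound
print(assay_low, assay_high, fold_low, fold_high)
```

      On these numbers the assay concentrations are 1,000 to 5,000 ng/mL. That is at least 1,000-fold above the quoted serum level, and more whenever the true serum concentration is below 1 ng/mL.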

      Finally, anticoagulant venom effects would be observed in our experimental approach either as reduced kinetic responses in the plasma clotting assay (as observed with nafamostat in Figure 3B) or as a prolonged clotting time in the thromboelastography assay (Figure 6). As stated in the results section “Comparison of coagulation profiles”, all of the venoms tested presented with a procoagulant effect. If underlying anticoagulant activity from PLA<sub>2</sub> toxins were to arise after inhibition of the procoagulant toxins (i.e. SVMPs by marimastat), as has been seen previously for certain other snake venoms, this would result in a percentage inhibition far greater than 100% in the plasma assay (Figure 3C to I) or in a prolonged clotting time in the thromboelastography assay. Neither of these anticoagulant profiles was observed with any venom tested in this study.
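
      To illustrate why an unmasked anticoagulant effect would appear as more than 100% inhibition, the sketch below assumes one common normalisation: inhibition is scaled between the venom-only signal (0%) and the no-venom plasma signal (100%). This formula is an assumption for illustration and is not taken from the manuscript, and the AUC values are hypothetical.

```python
def percent_inhibition(venom_only: float, venom_plus_drug: float, no_venom: float) -> float:
    """Assumed normalisation: 0% = venom-only signal, 100% = no-venom plasma signal."""
    window = venom_only - no_venom
    if window == 0:
        raise ValueError("venom-only and no-venom signals are identical")
    return 100.0 * (venom_only - venom_plus_drug) / window


# Hypothetical clotting AUC values (arbitrary units).
venom_only_auc = 200.0  # procoagulant venom accelerates clotting
no_venom_auc = 100.0    # normal plasma clotting

partial = percent_inhibition(venom_only_auc, 150.0, no_venom_auc)   # partly neutralised
complete = percent_inhibition(venom_only_auc, 100.0, no_venom_auc)  # back to plasma baseline
unmasked = percent_inhibition(venom_only_auc, 60.0, no_venom_auc)   # clotting below baseline
print(partial, complete, unmasked)
```

      Under this assumed normalisation, any treated signal that falls below the normal plasma baseline, as an anticoagulant would cause, yields a value above 100%.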

      (5) Finally, the in vivo evidence is limited to a chicken embryo model. To support your hypothesis, a conventional mouse model with delayed post-envenomation dosing (24-36 h monitoring) is needed to address both safety/toxicity and post-exposure efficacy, and to define a realistic therapeutic window, especially because venom toxins act very quickly and the timing of administration is central to the clinical utility of any small-molecule approach.

      We agree with the reviewer that the next important step for this research is to use murine preclinical models to validate the in vitro and preliminary in vivo findings described in this manuscript. However, as stated above, this study provides the initial evidence that the promising utility of marimastat, DMPS and varespladib as repurposed snakebite drugs extends to a range of neotropical viper venoms. Evaluating the safety, efficacy (in both preincubation and rescue approaches) and PK/PD relationships of these molecules to inform optimal dosing strategies will be a crucial next step for the field. These activities are far from trivial and will take several years of additional research, and they therefore fall outside the scope of this initial manuscript.

      To address the concern that the evidence is limited to a chicken embryo model, we have added sentences discussing the wider use of the egg model within snakebite research and how its findings translate to murine studies.

      Minor comments:

      (1) Figure 2D: How do you discuss the fact that "no venom" has SVSP activity?

      The data for all in vitro assays in Figure 2 are presented as AUC from the raw data (absorbance or fluorescence), for consistency across assays. Therefore, all assays (B to D) have a background signal in the absence of venom. The SVSP assay has a greater background signal than the others.

      (2) For better understanding, I would suggest adding a dedicated column in Figure 4A with Nafamostat SVSP data reported as "N/D" where applicable.

      As stated in the results, EC<sub>50</sub> assessment was not justified because of the weak inhibitory activity, so adding this column would be redundant.

      (3) The introduction is too long relative to the experimental content and would benefit from tightening to sharpen the motivation and unmet need.

      We thank the reviewer for their opinion and we have reviewed the introductory section again. While we made minor edits throughout, we decided not to make substantial modifications to it.

      Reviewer #3 (Recommendations for the authors):

      I only have some minor comments:

      (1) In line 100, the word "that" is repeated.

      We thank the reviewer for spotting this error, which we have corrected.

      (2) Line 433. I believe the word "compromising" should be substituted by "comprising" here.

      We thank the reviewer for spotting this.

      (3) Figure 1 and supplementary: Bothrops asper venom has been very thoroughly studied, and using only one study from Costa Rica might underestimate the venom variation within the species. I suggest looking at the following study: https://doi.org/10.1016/j.toxicon.2022.106983. Maybe it is not necessary to change anything, but worth looking into.

      We appreciate the reviewer flagging this paper. It has been added to the manuscript (reference 48) and has provided additional data for Figure 1 and Supplementary Table 1.

      (4) Methods: Given the intraspecies variation described for some of these species, I believe it is relevant to add the locality of origin of the venoms, and not only the country. I, of course, understand this is often unknown for historical samples.

      We have included the following sentence in the methods: “Due to the historic nature of the venom samples, the source locality is not available beyond country of origin, with the exception of B. lanceolatus which is endemic to Martinique.”

      (5) Figure 3: It is not very accurate to show an SD when the sample number is 2. I suggest, when possible, showing the mean and the two data points in the plots. This also applies to other figures where n=2. Also, in Figure 3D, does Marimastat seem to have an anticoagulant effect, or is this just within normal variation?

      In the statistics paragraph of the methods, we have removed the statement “Standard deviation (SD) for all kinetic reads and standard error for AUC is reported based on Prism v10” but kept the sentence “The sample sizes for HTS assays including the SVMP, PLA<sub>2</sub> and coagulation experiment are the average of the means from independent assays (n >2 within each independent assay).” We understand the reviewer’s point about the limited meaning of SD and SE for Fig 3 A to I, and have therefore changed the error bars to show the range, as we think that displaying the individual points would reduce visual and analytic clarity.

      In relation to the query about an anticoagulant effect of marimastat in Fig 4D: as shown in 4B, marimastat has no direct anticoagulant effect. The >100% inhibition for marimastat is likely to be normal variation, as this is a biological assay with high variability. However, it could also be that the strong inhibition of the SVMPs in B. asper, along with limited SVSP activity, has unmasked an anticoagulant effect of the remaining PLA<sub>2</sub> toxins, which have high activity in this venom. That being said, as B. asper has a similar profile, we would have expected to see a similar profile in B. atrox in both the plasma and TEG assays. Therefore, assay variation seems the most likely reason for this observation.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study presents a new Bayesian approach to estimate importation probabilities of malaria by combining epidemiological data, travel history, and genetic data through pairwise IBD estimates. Importation is an important factor challenging malaria elimination, especially in low transmission settings. This paper focuses on Magude and Matutuine, two districts in south Mozambique with very low malaria transmission. The results show isolation-by-distance in Mozambique, with genetic relatedness decreasing at distances larger than 100 km and no spatial correlation at distances between 10 and 100 km, but again a strong spatial correlation at distances smaller than 10 km. They report high genetic relatedness between Matutuine and Inhambane, higher than between Matutuine and Magude. Inhambane is the main source of importation in Matutuine, accounting for 63.5% of imported cases. Magude, on the other hand, shows smaller importation and travel rates than Matutuine, as it is a rural area with less mobility. Additionally, they report higher levels of importation and travel in the dry season, when transmission is lower. No association with importation was found for occupation, sex and other factors. These data have practical implications for public health strategies aimed at malaria elimination, for example, testing and treating travelers from Matutuine in the dry season.

      Strengths:

      The strength of this study lies in the combination of different sources of data (epidemiological, travel and genetic) to estimate importation probabilities, and in the statistical analyses.

      Weaknesses:

      The authors recognize the limitations related to sample size and the biases of travel reports.

      We appreciate the review and comments on the manuscript.

      Reviewer #2 (Public review):

      Summary:

      Based on a detailed dataset, the authors present a novel Bayesian approach to classify malaria cases as either imported or locally acquired.

      Strengths:

      The proposed Bayesian approach for case classification is simple, well justified, and allows the integration of parasite genomics, travel history, and epidemiological data.

      Weakness:

      While the authors aim to classify cases as imported or locally acquired, the work lacks a quantification of the contribution of each case type to overall transmission.

      Comments on revisions:

      All my questions and concerns were satisfactorily addressed.

      We appreciate the review and comments on the manuscript. Indeed, the approach is not intended to quantify the contribution of each case type to overall transmission. We state this in the discussion and refer to future work with that scope.

      Reviewer #3 (Public review):

      This work provides a novel statistical model to identify imported malaria cases, which are an important challenge for elimination, particularly in low-transmission areas. This tool was applied in Plasmodium falciparum populations in Mozambique and determined differences in importation rates in 2 low-transmission districts in the South.

      Strengths:

      The study has several strengths, mainly the development of a novel Bayesian model that integrates genomic, epidemiological, and travel data to estimate importation probabilities. The results showed insights into malaria transmission dynamics, particularly identifying importation sources and differences in importation rates in Mozambique. Finally, the relevance of the findings is to suggest interventions focusing on the traveler population to support efforts for malaria elimination.

      Weaknesses:

      The study also has some limitations, although the authors have plans to address them. The sample collection was not representative of some provinces, and not all samples had sufficient metadata for the risk factor analysis. Additionally, the authors used a proxy for transmission intensity and assumed some other conditions to calculate the importation probability for specific scenarios. They plan to conduct a new sample collection and include monthly malaria incidence estimates in the future.

      Comments on revisions:

      Delete "We added this text to the discussion" in line 302 (Discussion)

      I recommend adding the plans to address limitations indicated in the Response to Reviewers document in the Discussion. This would really strengthen the limitation section.

      Thank you for pointing out these aspects. We have deleted the sentence mentioned. In the discussion section, we now end the paragraph on limitations with the future work proposed to address them.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      (1) The method is an extension of the current state-of-art methods, not a fundamentally new one.

      We respectfully disagree with this characterization. While TopoMetry is inspired by the theory of spectral geometry, it is not a simple extension of existing dimensionality reduction methods such as Diffusion Maps. Instead, TopoMetry introduces a new framework for single-cell analysis that:

      Iteratively approximates manifold geometry by constructing refined diffusion operators on spectral scaffolds (“the geometry of the geometry”), a procedure not present in existing methods.

      Provides a unified workflow for dimensionality estimation, clustering, visualization, imputation, lineage inference, and diagnostics, all within the same geometric framework.

      Introduces operator-native fidelity scores and Riemannian diagnostics to single-cell analysis, enabling researchers to evaluate and trust embeddings—functionality absent in prior methods.

      Thus, TopoMetry represents a new paradigm for geometry-aware single-cell analysis, not merely a reimplementation of existing algorithms.

      (2) The paper contains a lot of jargon.

      We have thoroughly simplified the text throughout the manuscript. We now introduce geometric concepts in accessible terms, avoiding technical details where they are not essential for biological interpretation. For example, references to the Laplace–Beltrami operator and its eigenfunctions have been reduced and reframed in terms of “geometry,” “diffusion,” and “spectral scaffolds,” which are more intuitive for a general audience.

      Reviewer #1 (Recommendations for the authors):

      (1) What happens if the LBO is approximated more than twice? As the main idea of the method is an iterative approach to approximate LBO more precisely, then the authors would have already considered this. If so, this could be additionally discussed in the manuscript.

      We thank the reviewer for this important point. Indeed, TopoMetry’s design naturally supports iterating the Laplace–Beltrami operator (LBO) approximation beyond two steps. However, additional iterations (three or more) lead to only marginal improvements in final results while significantly increasing computational cost. In some tested cases, additional iterations could even over-smooth the data, reducing the resolution of fine-scale structure. The revised manuscript avoids an excessive focus on iterative LBO approximations and instead centers the narrative around representing and evaluating the underlying geometry of single-cell data.

      (2) As the paper describes the method in a very comprehensive way, as a result, it contains a lot of mathematical equations and jargon. This could hinder the visibility of the whole manuscript to biologists who do not have a background in mathematics. Thus, I strongly recommend that the authors consider moving a considerable amount of text to the supplementary material, and the main text should focus on the benchmarking results and the possible applications.

      We appreciate this recommendation and have substantially revised the manuscript to make it more accessible to a broad biological audience. In the revised version:

      We moved detailed mathematical derivations and operator definitions to the Methods section, keeping only the most essential concepts in the main text.

      We reframed technical terms (e.g., Laplace–Beltrami operator, eigenfunctions) in simpler and more intuitive language in the main text. 

      The Results section now emphasizes benchmarking outcomes and biological applications.

      Reviewer #2 (Public review):

      (1) To encourage the single-cell community to adopt this method, the authors should more clearly demonstrate its advantages over existing methods. Many single-cell analysis algorithms have been proposed for each task, and some of them are widely used by biologists. However, the comparison in this work is somewhat limited. For example, even the methods mentioned in the relevant work paragraph (2nd paragraph) on page 2 are not all compared, and the reason why they are not included is not discussed. Also, I am curious how PC dimensions are determined. The choice of 300 PCs on page 11 seems arbitrary. Furthermore, the usefulness of dimension-reduced data also depends a lot on the preceding processing steps, such as highly variable gene selection. I understand it is hard to control all those factors, but I think there is room for improvement.

      We have substantially expanded the benchmarking and discussion of competing methods. These additions more clearly demonstrate TopoMetry’s advantages and robustness compared to widely adopted alternatives. In the revised manuscript:

      We now benchmark TopoMetry against 68 diverse single-cell datasets, far exceeding the scope of the original version.

      We explicitly compare TopoMetry with PCA→UMAP, standalone UMAP, and scVI. These workflows represent the de facto current standard in single-cell analysis. While numerous other approaches exist, a comprehensive benchmark of every possible workflow lies beyond the scope of this study and would itself warrant a dedicated report.

      We adopt the exact same preprocessing steps for all evaluated workflows to ensure a fair comparison, except for scVI, which requires gene counts data and performs its own internal preprocessing.

      We adjust the number of PCs used for each dataset using the commonly adopted ad hoc “elbow point” heuristic.
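
      The response does not say which elbow rule is used. As one possible illustration, the sketch below picks the component whose explained variance lies farthest from the straight line joining the first and last points of the scree curve, after rescaling both axes to [0, 1]. The function name and the variance values are hypothetical.

```python
def elbow_index(values):
    """Index of the point farthest from the chord joining the first and last points."""
    n = len(values)
    if n < 3:
        return 0
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0  # flat curve: no elbow to find
    # Rescale both axes to [0, 1] so the result does not depend on units.
    xs = [i / (n - 1) for i in range(n)]
    ys = [(v - lo) / (hi - lo) for v in values]
    dx, dy = xs[-1] - xs[0], ys[-1] - ys[0]
    norm = (dx * dx + dy * dy) ** 0.5
    best_index, best_distance = 0, -1.0
    for i in range(n):
        # Perpendicular distance from point i to the chord.
        distance = abs(dy * (xs[i] - xs[0]) - dx * (ys[i] - ys[0])) / norm
        if distance > best_distance:
            best_index, best_distance = i, distance
    return best_index


# Hypothetical explained variance per principal component (scree curve).
explained_variance = [10.0, 5.0, 2.0, 1.5, 1.2, 1.0, 0.9]
n_components = elbow_index(explained_variance) + 1  # convert 0-based index to a count
print(n_components)
```

      Other elbow definitions, such as thresholds on cumulative variance or second differences, can give different counts, which is why the heuristic is described as ad hoc.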

      (2) The paper lacks experiments that validate the results. It would be beneficial to see additional evaluation settings with better-established ground truths to more strongly demonstrate the method's effectiveness.

      We agree that validation is crucial and have strengthened this aspect:

      We introduce new geometry-preservation metrics and validate that TopoMetry outperforms current de facto standards.

      We demonstrate that TopoMetry resolves well-established ground-truth structures, such as the cell cycle in pancreas development and T cell proliferation, which PCA→UMAP fails to capture (Suppl. Fig. S3).

      We validate the biological relevance of novel T cell subpopulations by linking them to TCR clonotypes and clonal expansion patterns using datasets with paired VDJ information (ECCITE-TCR, TICA).

      We show that TopoMetry faithfully recovers expected lineage trajectories in atlas-scale datasets (MOCA).

      These analyses demonstrate that TopoMetry not only preserves geometry but also recovers biologically meaningful ground-truth structures. Further experimental investigation of biological insights obtained from the presented examples exceeds the scope of the presented methodological work.

      (3) The effect of various parameters, such as those involved in k-nearest neighbors (KNN) or choosing the appropriate Laplacian operator, is not comprehensively explored. How can we ensure the analysis is not overly sensitive to these parameters?

      We now explicitly address parameter robustness and show that results are stable across a wide range of k values (30–200) in the neighborhood graph (Suppl. Fig. S1e).

      The range of possible Laplacian operators was a design choice aimed at increasing user freedom, but we agree with the reviewer that this option could confuse readers and users. TopoMetry now only uses the appropriate operator (density-normalized graph Laplacian, a.k.a. diffusion operator), reducing variability and improving usability.
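
      For readers unfamiliar with the term, the sketch below shows one standard construction of a density-normalized diffusion operator, following the diffusion-maps normalisation with exponent 1. It uses a dense Gaussian kernel with a fixed bandwidth on hypothetical points. It is not TopoMetry's implementation, whose kernel, neighbourhood graph and bandwidth choices are not described in this response.

```python
import math


def diffusion_operator(points, bandwidth=1.0):
    """Row-stochastic diffusion operator built from a density-normalized Gaussian kernel."""
    n = len(points)
    # Gaussian affinity between every pair of points.
    kernel = [
        [
            math.exp(-sum((a - b) ** 2 for a, b in zip(points[i], points[j])) / bandwidth ** 2)
            for j in range(n)
        ]
        for i in range(n)
    ]
    # Kernel density estimate at each point.
    density = [sum(row) for row in kernel]
    # Density normalization: divide out the sampling density at both endpoints.
    normalized = [
        [kernel[i][j] / (density[i] * density[j]) for j in range(n)]
        for i in range(n)
    ]
    # Row normalization turns the affinities into transition probabilities.
    row_sums = [sum(row) for row in normalized]
    return [[normalized[i][j] / row_sums[i] for j in range(n)] for i in range(n)]


# Hypothetical 2-D points forming two small groups.
toy_points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (1.0, 1.0), (1.1, 1.0)]
transition = diffusion_operator(toy_points, bandwidth=0.5)
print([round(sum(row), 6) for row in transition])
```

      The corresponding graph Laplacian is the identity minus this transition matrix. Dividing out the density makes the operator reflect the geometry of the data rather than how densely each region was sampled.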

      (4) Batch effects are prevalent in single-cell data. The paper does not adequately address this issue.

      Several of the datasets we analyzed include cells from multiple donors and experimental batches, and TopoMetry successfully recovers consistent biological structure across these.

      TopoMetry’s spectral scaffolds can be integrated with data integration methods such as Harmony and Scanorama, which are employed to correct the latent PCA space in current practice.

      Reviewer #2 (Recommendations for the authors):

      (1) The paper introduces technical jargon without sufficient explanation abruptly many times. This makes it difficult for readers from a biological background to follow. Even I, with a more computational background, struggled to grasp some parts.

      We thank the reviewer for this feedback and have streamlined terminology throughout the manuscript, replacing jargon with more intuitive language and providing brief explanations when technical terms are first introduced. This makes the text more accessible to both computational and biological audiences.

      (2) There is no comparison of the computational cost of this method with existing approaches, which is an important factor for practical adoption. Including a benchmarking section on this would be useful.

      We thank the reviewer for this suggestion and have now included a runtime benchmark against PCA→UMAP, PHATE, and scVI (Suppl. Fig. 1f), showing that while TopoMetry is slightly slower than PCA→UMAP, it scales more favorably than alternative geometry-aware methods (PHATE) and neural networks (scVI).

      (3) TopOMetry allows users to obtain and evaluate dozens of possible representations. However, I wonder if this could introduce a user burden, increasing uncertainty and subjectivity, as users should examine them manually. I think this should be clarified.

      We appreciate this concern and have streamlined the workflow to minimize user burden. As shown in the original manuscript, representations learned with different TopoMetry kernels and Laplacian variants converge to highly similar results. Based on this, TopoMetry now defaults to the best-performing kernel and the most appropriate Laplacian operator, yielding only two scaffold representations (fixed-time and multiscale) and corresponding visualizations rather than dozens of alternatives. This removes the need for manual selection while retaining flexibility for advanced users. In addition, we introduced a single-line command that runs the entire analysis and generates a comprehensive PDF report, allowing users to evaluate results in a standardized and user-friendly way. Together, these changes eliminate unnecessary subjectivity and ensure consistent outputs across analyses.

      (4) Formatting. There are errors in figure numbering within the main text. For instance, it should be Figure 4 instead of Figure 3 on page 11. Some figures are not concise. For example, Figure 2 contains too much text, which detracts from its visibility. I recommend trimming the figures to improve clarity. A color map is missing in Figure 2, which could help better interpret the data.

      We have thoroughly adjusted the manuscript and figures for improved visibility and clarity.

      Broader Impact and Reception

      Since our preprint, TopoMetry has been used by Hale et al. (Science, 2024), where it helped reveal morphological T cell subpopulations, and in a recent preprint by Tedeschi et al. (2025). These independent applications highlight the utility and impact of TopoMetry beyond our group, supporting its relevance to diverse biological contexts. In addition, two independent studies performing multimodal integration of RNA and TCR data (Zhang et al., 2023 and Drost et al., 2024) have identified a diversity of T cell subpopulations that resembles the clusters identified by TopoMetry using only RNA data.

    1. Author response:

      eLife Assessment

      This study reports the relative importance of Tie1 and Tie2 signaling for atrial versus ventricular trabeculation. It is an important study and is one of the few works to date that have carefully and simultaneously analyzed these two processes. In line with a previous study in zebrafish, the authors demonstrate key differences between atrial and ventricular trabeculation. While the imaging and quantitative data were conducted with solid and validated methodology throughout the manuscript, the work would benefit from more rigorous approaches where Tie1/2 signaling is disrupted prior to the onset of atrial/ventricular trabeculation, to allow for a more direct comparison.

      We thank the editors for the eLife assessment. We would like to request that the following statement be modified: “…the work would benefit from more rigorous approaches where Tie1/2 signaling is disrupted prior to the onset of atrial/ventricular trabeculation, to allow for a more direct comparison”. We request this change for the following reasons:

      We utilized two distinct types of genetic mouse model in this study (as summarized in Fig. 7I), comprising conventional knockouts (Tie1<sup>tm1a/tm1a</sup>, Tie1<sup>ΔICD/ΔICD</sup> and Tie1<sup>ΔICD/ΔICD</sup>;Tek<sup>+/-</sup>) and inducible gene deletion models (Tek<sup>iECKO</sup>, Tie1ICD<sup>iECKO</sup>, and Tie1ICD<sup>iECKO</sup>;Tek<sup>+/-</sup>) [1-3]. The Tie1<sup>tm1a/tm1a</sup> line is equivalent to the previously published Tie1<sup>-/-</sup> mouse line, as demonstrated in our prior work and by others [1, 2, 4-6]. Therefore, the Tie1 or Tek alleles were inactivated prior to the onset of atrial and ventricular trabeculation, as shown in Fig. 1, Fig. 2, Fig. 3, Fig. 5A-D, and Supplemental Fig. 3. Based on these findings, we propose that TIE1 is differentially required for atrial versus ventricular morphogenesis, and acts synergistically with TIE2 during cardiac trabeculation.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Ding et al. use genetic mouse models to demonstrate that atrial trabeculation is more dependent on Tie1/Tie2 signaling than ventricular trabeculation. With additional experimentation that would support the current claims, the results may hold significant value, as atrial trabeculation remains an understudied phenomenon in cardiac biology with potential implications for atrial cardiomyopathy and atrial fibrillation.

      Strengths:

      Detailed characterization of atrial versus ventricular trabeculation across different developmental timepoints, and the use of appropriate animal models to address the scientific question at hand.

      Weaknesses:

      The authors have consistently treated mice with tamoxifen after ventricular, but not atrial, trabeculation has already started. As such, the observed cardiac phenotypes - where predominantly atrial trabeculation is affected - might be a mere consequence of the precise time window in which Tie1/2 signaling was impaired, rather than a direct measurement of its relative importance for atrial versus ventricular trabeculation. The conclusions of the paper may thus be significantly strengthened by depleting Tie1/2 signaling prior to the onset of ventricular trabeculation, as is done for atrial trabeculation.

      We thank the reviewer for the comments.

      Regarding the timeline of gene deletion and tamoxifen treatment, we would like to provide the following clarification.

      Fig. 1-3: As described in the Methods and Materials, Tie1<sup>tm1a/tm1a</sup> is a knockout-first mouse model established from EUCOMM embryonic stem cells (EPD0735-3B07) targeting the Tie1 gene. Therefore, the Tie1<sup>tm1a/tm1a</sup> line is equivalent to the previously published Tie1 null mice (Tie1<sup>-/-</sup>). The Tie1<sup>Flox/Flox</sup> mouse line (with exon 8 floxed) was generated by excising the lacZ reporter and neo cassette using FLPeR mice.

      Fig. 5A-D: To investigate the synergy of TIE1 and TIE2 in cardiac trabeculation, we crossbred the Tek<sup>+/-</sup> and Tie1<sup>ΔICD/+</sup> mouse lines to generate double mutant mice harboring a homozygous Tie1 mutation and a single null Tek allele (Tie1<sup>ΔICD/ΔICD</sup>;Tek<sup>+/-</sup>). Although no obvious defects were observed in atrial or ventricular structures following Tie1 deficiency alone at E10.5, both atrial and ventricular development were disrupted in Tie1<sup>ΔICD/ΔICD</sup>;Tek<sup>+/-</sup> mutants at the same stage (Fig. 5A-D).

      Supplemental Fig. 3: To verify the role of TIE1 in atrial development, we employed an alternative knockout mouse line targeting the Tie1 intracellular domain by floxing exons 15 and 16 (Tie1ICD<sup>Flox/Flox</sup>). Mutants harboring these null alleles are designated as Tie1<sup>ΔICD/ΔICD</sup>. As detailed in the previous publication [2], this line is also equivalent to the previously published Tie1 null mice (Tie1<sup>-/-</sup>). The cardiac phenotypes shown in Supplemental Fig. 3 are indeed similar to those of Tie1<sup>tm1a/tm1a</sup> mutant mice.

      For the inducible knockouts targeting Tie1, Tek, or both, the results are shown in Fig. 4, Fig. 5E-H, Fig. 6, and Fig. 7.

      Fig. 4: As mice homozygous for the Tek mutation (Tek<sup>-/-</sup>) die before E10.5 [3, 7], we performed studies using the inducible knockout line targeting Tek (Tek<sup>Flox/-</sup>;Cdh5-Cre<sup>ERT2</sup>, referred to as Tek<sup>iECKO</sup>), as shown in Fig. 4.

      Fig. 5-7: To investigate the synergy of TIE1 and TIE2 in cardiac trabeculation at later stages of embryogenesis (Fig. 5E-H, Fig. 6) and at the postnatal stage (Fig. 7), we used the inducible knockout models targeting Tie1/Tek, including Tie1ICD<sup>iECKO</sup> (Tie1ICD<sup>Flox/-</sup>;Cdh5-Cre<sup>ERT2</sup>) and Tie1ICD<sup>iECKO</sup>;Tek<sup>+/-</sup> (Tie1ICD<sup>Flox/-</sup>;Cdh5-Cre<sup>ERT2</sup>;Tek<sup>+/-</sup>).

      Reviewer #2 (Public review):

      Summary:

      Ding et al. examine the role of TIE1 in cardiac chamber morphogenesis using genetic mouse models targeting Tie1, Tek, or both, and analyzing endocardial cell-mediated chamber formation across multiple embryonic developmental and postnatal stages, supported by analysis of published single-cell datasets and new bulk RNA seq analyses of murine cardiac tissue. The authors find that Tie1 and Tek expression is higher in atrial than ventricular endocardial cells. Notably, endothelial Tie1 is required for atrial trabeculation at E12.5, but is less critical in ventricular trabeculation. TIE1 also acts synergistically with TIE2 during atrial trabeculation. While Tie1 deficiency alone does not cause defects at E10.5, combined heterozygous deletion of Tek disrupts both atrial and ventricular development at E10.5. This synergy is further supported by analyses at later embryonic stages and in postnatal hearts.

      Strengths:

      The study is well-designed, clearly written, and supported by high-quality figures. The performed experiments demonstrate a previously unrecognized role for Tie1 in cardiac development and identify synergistic control of cardiac morphogenesis by Tie1 and Tie2. This synergy is consistent with the previously identified roles of Tie1 and Tek in venous development and with Tie1 involvement in angiopoietin-dependent postnatal vascular and lymphatic remodeling. Together, these findings support a role for Tie1 as a contributor to Ang1-Tie2 signaling during heart development.

      Weaknesses:

      The manuscript does not include direct mechanistic studies; however, RNA seq analysis of atria and ventricles showed reduced expression of Tek, Dll1, and Notch1 upon Tie1 deficiency in developing hearts. Although previously reported mechanisms, such as TIE1-TIE2 heterodimer formation and effects on endothelial junctions, migration, or survival are discussed, no direct mechanistic experiments are performed. Addressing some of these mechanisms would have clarified the basis of Tie1-Tie2 synergy. As two distinct Tie1 models are used, including one targeting the kinase domain, the authors should state whether phenotypes differed or were similar between models.

      We thank the reviewer for the comments. In this study, we have provided genetic evidence that TIE1 is differentially required for atrial versus ventricular trabeculation. Although the precise molecular mechanisms underlying TIE1 function require further investigation, we have provided compelling genetic evidence of its synergistic role with TIE2 during this process. The two genetic models targeting Tie1 (Tie1<sup>tm1a/tm1a</sup>, Tie1<sup>ΔICD/ΔICD</sup>) produced consistent cardiac and vascular phenotypes as shown in this study and our previous work [1, 2].

      Reviewer #3 (Public review):

      Summary:

      Ding et al. investigate the roles of TIE1 and TEK (Tie2) in mouse cardiac development, with a particular focus on atrial trabeculation. The authors employ multiple genetic models, including Tie1ICDflox/flox (with Cdh5-CreERT2), a knockout-first allele (EUCOMM, Tie1 tm1a/tm1a), and a Tek deletion model.

      Based on the dataset from Feng et al. 2022 Nat Commun, the authors report increased expression of Tie1 and Tek transcripts in atrial endocardial cells compared to ventricular cells at embryonic day (E) 14.5. Loss of Tie1 leads to early atrial trabeculation defects detectable at E12.5, whereas ventricular defects appear later and are less pronounced at E14.5. Chamber-specific RNA sequencing reveals stronger transcriptional changes in atrial tissue.

      Conditional deletion of Tek results in a similar phenotype, with more pronounced atrial defects. Combined deletion of Tie1 and Tek (Tie1 ΔICD/ΔICD; Tek+/-) leads to earlier and more severe defects in both atrial and ventricular trabeculation and results in embryonic lethality around E12.5, suggesting a synergistic interaction between the two genes.

      Conditional endothelial deletion of Tie1 combined with heterozygous global Tek at later embryonic stages allows analysis at later time points and again shows more severe defects in atrial trabeculation. Postnatal analysis of this model reveals reduced heart-to-body weight ratios and potential mild atrial abnormalities.

      Strengths:

      (1) The authors address chamber-specific signaling mechanisms underlying atrial versus ventricular trabeculation, an area of high developmental and clinical relevance.

      (2) The study provides a comprehensive temporal analysis across multiple embryonic stages.

      (3) The use of multiple genetic models strengthens the overall conclusions and allows comparative interpretation.

      (4) While focusing on trabeculation, the authors also include observations on coronary vessel development, increasing the broader relevance of the work. The findings are therefore of interest to the wider cardiovascular research community.

      Weaknesses:

      (1) Timing of recombination vs. trabeculation onset

      Ventricular trabeculation begins earlier than atrial trabeculation. Since tamoxifen (in contrast to 4-hydroxytamoxifen) requires metabolic activation, Cre-mediated recombination will occur with a delay. This suggests that atrial trabeculation may be targeted before its onset, whereas ventricular trabeculation may already be underway for 2-3 days at the time of effective gene deletion.

      How do the authors account for this discrepancy in their interpretation?

      Have earlier induction time points been tested to better capture the onset of ventricular trabeculation? This limitation should be explicitly discussed.

      (2) Clarity of genetic models and experimental design

      The study employs several genetic constructs. It would improve clarity if, for each experiment, the specific genetic model and tamoxifen regimen were clearly described before presenting the results.

      We thank the reviewer for the detailed and constructive comments. For studies employing the inducible gene deletion mouse models, the genetic models and tamoxifen treatment schemes are provided in the related figures. For the remaining studies, we used the conventional knockouts targeting Tie1 and Tek (Tie1<sup>tm1a/tm1a</sup>, Tie1<sup>ΔICD/ΔICD</sup> and Tie1<sup>ΔICD/ΔICD</sup>;Tek<sup>+/-</sup>), as detailed above.

      (3) Tie1 tm1a/tm1a phenotype vs. known global knockout

      Previous studies (PMID: 8846781, 7596437) show that complete Tie1 loss leads to severe edema, vascular rupture, and embryonic lethality around E13.5-E14.5.

      How does the Tie1 tm1a/tm1a allele differ, given that animals appear to survive longer? Is this allele hypomorphic rather than a full knockout?

      This point requires clarification.

      Tie1<sup>tm1a/tm1a</sup> is equivalent to the full knockout (Tie1<sup>-/-</sup>). As demonstrated in our prior work, the Tie1<sup>ΔICD/ΔICD</sup> model produced lymphatic and blood vascular phenotypes similar to those of Tie1<sup>-/-</sup> mutants [1, 2, 5, 6].

      (4) Limited mechanistic insight

      While the authors aim to investigate underlying mechanisms, the current study is largely descriptive and based on mRNA expression and genetic interaction analyses (Tie1/Tek co-deletion). Direct mechanistic insights into signaling pathways remain limited. However, the dataset provides a valuable foundation for future mechanistic studies, which should be more clearly acknowledged in the discussion.

      We thank the reviewer for the comments. The manuscript will be revised accordingly, and a detailed response will be provided in our final submission.

      References

      (1) Cao, X., et al., Endothelial TIE1 Restricts Angiogenic Sprouting to Coordinate Vein Assembly in Synergy With Its Homologue TIE2. Arterioscler Thromb Vasc Biol, 2023. 43(8): p. e323-e338.

      (2) Shen, B., et al., Genetic dissection of tie pathway in mouse lymphatic maturation and valve development. Arterioscler Thromb Vasc Biol, 2014. 34(6): p. 1221-30.

      (3) Chu, M., et al., Angiopoietin receptor Tie2 is required for vein specification and maintenance via regulating COUP-TFII. Elife, 2016. 5:e21032.

      (4) Rodewald, H.R. and T.N. Sato, Tie1, a receptor tyrosine kinase essential for vascular endothelial cell integrity, is not critical for the development of hematopoietic cells. Oncogene, 1996. 12(2): p. 397-404.

      (5) D'Amico, G., et al., Loss of endothelial Tie1 receptor impairs lymphatic vessel development-brief report. Arterioscler Thromb Vasc Biol, 2010. 30(2): p. 207-9.

      (6) Qu, X., et al., Abnormal embryonic lymphatic vessel development in Tie1 hypomorphic mice. Development, 2010. 137(8): p. 1285-95.

      (7) Dumont, D.J., et al., Dominant-negative and targeted null mutations in the endothelial receptor tyrosine kinase, tek, reveal a critical role in vasculogenesis of the embryo. Genes Dev, 1994. 8(16): p. 1897-909.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary

      In this review paper, the authors describe the concept of neural correlates of consciousness (NCC) and explain how noninvasive neuroimaging methods fall short of being able to properly characterise an unconfounded NCC. They argue that intracranial research is a means to address this gap and provide a review of many intracranial neuroimaging studies that have sought to answer questions regarding the neural basis of perceptual consciousness.

      Strengths

      The authors have provided an in-depth, timely, and scholarly contribution to the study of NCCs. First and foremost, the review surveys a vast array of literature. The authors synthesise findings such that a coherent narrative of what invasive electrophysiology studies have revealed about the neural basis of consciousness can be easily grasped by the reader. The review is also, to the best of my knowledge, the first review to specifically target intracranial approaches to consciousness and to describe their results in a single article. This is a credit to the authors: as it becomes ever harder to apply strict tests to theories of consciousness using methods such as fMRI and M/EEG, it is important to have informative resources describing the results of human intracranial research so that theorists will have to constrain their theories further in accordance with such data. As far as the authors were aiming to provide a complete and coherent overview of intracranial approaches to the study of NCCs, I believe they have achieved their aim.

      We appreciate the reviewer's positive feedback on our work.

      Weaknesses

      Overall, I feel positive about this paper. However, there are a couple of aspects to the manuscript that I think could be improved.

      (1) Distinguishing NCCs from their prerequisites or consequences

      This section in the introduction was particularly confusing to me. Namely, in this section, the authors' aim is to explain how intracranial recordings can help distinguish 'pure' NCCs from their antecedents and consequences. However, the authors almost exclusively describe different tasks (e.g., no-report tasks) that have been used to help solve this problem, rather than elaborating on how intracranial recordings may resolve this issue. The authors claim that no-report designs rely on null findings, and invasive recordings can be more sensitive to smaller effects, which can help in such cases. However, this motivation pertains to the previous sub-section (limits of noninvasive methods), since it is primarily concerned with the lack of temporal and spatial resolution of fMRI and M/EEG. It is not, in and of itself, a means to distinguish NCCs from their confounds.

      As such, in its current formulation, I do not find the argument that intracranial recordings are better suited to identifying pure NCCs (i.e. separating them from pre- or post-processing) convincing. To me, this is a problem solved through novel paradigms and better-developed theories. As it stands, the paper justifies my position by highlighting task developments that help to distinguish NCCs from prerequisites and consequences, rather than giving a novel argument as to why intracranial recordings outperform noninvasive methods beyond the reasons they explained in the previous section. Again, this position is justified when, from lines 505-506, the authors describe how none of the reported single-cell studies were able to dissociate NCCs from post-perceptual processing. As such, it seems as if, even with intracranial recording, NCCs and their confounds cannot be disentangled without appropriate tasks.

      The section 'Towards Better Behavioural Paradigms' is a clear attempt to address these issues and, as such, I am sure the authors share the same concerns as I am raising. Still, I remain unconvinced that the distinguishing of NCCs from pre-/post- processing is a fair motivation for using intracranial over noninvasive measures.

      We agree that distinguishing proper NCCs from their prerequisites or consequences is primarily a matter of experimental design and theoretical framework, not merely of recording modality. We did not mean to imply that intracranial recordings inherently solve this dissociation. This is now explicitly stated at the beginning of this section. Instead, we argued that the high signal-to-noise ratio and spatiotemporal accuracy of sEEG offer a stronger "testing ground" for the null findings often relied on by no-report paradigms. This is now also further clarified in the revised section “Limits of noninvasive measures”.

      We also explicitly acknowledge, as the reviewer noted, that even the most precise recordings require careful task dissociations to distinguish NCCs from their prerequisites and consequences.

      (2) Drawing misleading conclusions from certain studies

      There are passages of the manuscript where the authors draw conclusions from studies that are not necessarily warranted by the studies they cite. For instance:

      Lines 265 - 271: "The results of these two studies revealed a complex pattern: on the one hand, HGA in the lateral occipitotemporal cortex and the ventral visual cortex correlated with stimulus strength. On the other hand, it also correlated with another factor that does not appear to play a role in visibility (repetition suppression), and did not correlate with a non-sensory factor that affects visibility reports (prior exposure). These results suggest that activity in occipitotemporal cortex regions reflecting higher-order visual processing may be a precursor to the NCC but not an NCC proper."

      It's possible to imagine a theory that would predict HGA could correlate with stimulus strength and repetition suppression, or that it would not correlate with prior exposure (e.g. prior exposure could impact response bias without affecting subjective visibility itself). The authors describe this exact ambiguity in interpretation later in the article (line 664), but in its current form, at least in line 270 (when the study is most extensively discussed), the manuscript heavily implies that HGA is not an NCC proper. This generates a false impression that intracranial recordings have conclusively determined that occipitotemporal HGA is not a pure NCC, which is certainly a premature conclusion.

      We agree that our interpretation of these studies (lines 265–271 of the previous version of the manuscript) was presented too definitively. We have modified the text (now lines 314-317) to soften this conclusion and align it with the more nuanced discussion later in the manuscript. Specifically, we now frame this as a "suggested dissociation" rather than a conclusive finding (line 730), and we explicitly acknowledge that alternative interpretations remain viable.

      Line 243: "Altogether, these early human intracranial studies indicate that early-latency visual processing steps, reflected in broadband and low gamma activity, occur irrespective of whether a stimulus is consciously perceived or not. They also identified a candidate NCC: later (>200 ms) activity in the occipitotemporal region responsible for higher-order visual processing."

      The authors claim in this section that later (>200ms) activity in occipitotemporal regions may be a candidate for an NCC. However, the Fisch et al. (2009) study they describe in support of this conclusion found that early (~150ms) activity could dissociate conscious and unconscious processing. This would suggest that it is early processing that lays claim to perceptual consciousness. The authors explicitly describe the Fisch et al results as showing evidence for early markers of consciousness (line 240: '...exhibited an early...response following recognized vs unrecognised stimuli.) Yet only a few lines later they use this to support the conclusion that a candidate NCC is 'later (>200ms) activity in the occipitotemporal region' (line 245). As such, I am not sure what conclusion the authors want me to make from these studies.

      This problem is repeated in lines 386-387: "Altogether, studies that investigated the cortical correlates of visual consciousness point to a role of neural responses starting ~250 ms after stimulus onset in the non-primary visual cortex and prefrontal cortex."

      This seems to be directly in conflict with the Fisch et al results, which show that correlates of consciousness can begin ~100ms earlier than the authors state in this passage.

      We thank the reviewer for pointing out this inconsistency. We agree that stating ">200 ms" conflicts with the findings of Fisch et al. (2009), who observed dissociations as early as ~150 ms. Our goal was to contrast the very early, stimulus-driven responses with the later responses that reflect consciousness. However, as the reviewer correctly notes, the exact "onset" of these signals varies across studies and paradigms. To address this, we have removed the specific ">200 ms" mentioned in line 245 of the previous version of the manuscript and updated the timing in line 284 to "starting 150 ms" to better reflect the results of Fisch et al. We also clarify that while the exact latency depends on the paradigm, a consistent finding is that activity representing conscious contents in higher-order visual cortex follows an initial wave of unconscious processes (lines 809-810).

      (3) Justifying single-neuron cortical correlates of consciousness

      The purpose of the present manuscript is to highlight why and how intracortical measures of neural activity can help reveal the neural correlates of perceptual consciousness. As such, in the section 'Single-neuron cortical correlates of perceptual consciousness', I think the paper is lacking an argument as to why single-neuron research is useful when searching for the NCC. Most theories of consciousness are based around circuit or system-level analyses (e.g., global ignition, recurrent feedback, prefrontal indexing, etc.) and usually do not make predictions about single cells. Without any elaboration or argument as to why single-cell research is necessary for a science of consciousness, the research described in this section, although excellent and valuable in its own right, seems out of place in the broader discussion of NCCs. A particularly strong interpretation here could be that intracranial recordings mislead researchers into studying single cells simply because it is the finest level of analysis, rather than because it offers helpful insight into the NCCs.

      It is true that many prominent theories of consciousness were developed based on macroscopic observations, largely due to the prevalence of non-invasive recordings in humans. However, we argue that recording single-unit activity (SUA) is important for several reasons, and we made this clearer in the revised version. First, signals like fMRI, EEG (or even LFP) often conflate multiple distinct neural populations. SUA allows us to dissociate neurons representing the percept from neighboring neurons involved in task-related confounds (e.g., motor preparation or arousal) that would otherwise be blurred together. In addition, some percepts might be represented by sparse coding involving a small, specific population of "concept" or "percept" cells. Electrophysiological studies in animal models reveal that various cognitive processes are encoded within neuronal subspaces that only emerge when single-unit activity is analyzed as lower-dimensional projections of the broader neural activity manifold (Mante et al., 2013; Ebitz & Hayden, 2021; Jazayeri & Afraz, 2017). Importantly, many neural computations are only discernible through the lens of population dynamics (i.e., with single-neuron activity) (Vyas et al., 2020). We believe that providing high granularity through SUA recordings prevents over-aggregation of data, ensuring that even system-level theories can build on biologically accurate foundations.

      Moreover, some theories are defined at the cellular level. For instance, the Dendritic Integration Theory (Bachmann et al., 2020) posits that the integration of feedforward and feedback signals occurs at the level of individual pyramidal neurons. Without SUA, these cellular mechanisms remain untestable. Beyond spatial granularity, SUA also provides excellent temporal granularity, which is crucial for testing theories that rely on the precise timing of spikes (e.g., neural synchrony). As LFPs reflect average activity across populations, only SUA can confirm whether individual neurons lock their spikes to a specific phase, a mechanism hypothesized to bind features into a conscious whole.

      We added these points to a new section in the revised manuscript. References:

      Bachmann, T., Suzuki, M., & Aru, J. (2020). Dendritic integration theory: A thalamo-cortical theory of state and content of consciousness. Philosophy and the Mind Sciences, 1(II).

      Ebitz, R. B., & Hayden, B. Y. (2021). The population doctrine in cognitive neuroscience. Neuron, 109(19), 3055-3068.

      Jazayeri, M., & Afraz, A. (2017). Navigating the neural space in search of the neural code. Neuron, 93(5), 1003-1014.

      Mante, V., Sussillo, D., Shenoy, K. V., & Newsome, W. T. (2013). Context-dependent computation by recurrent dynamics in prefrontal cortex. Nature, 503(7474), 78-84.

      Vyas, S., Golub, M. D., Sussillo, D., & Shenoy, K. V. (2020). Computation Through Neural Population Dynamics. Annual Review of Neuroscience, 43(1), 249-275.

      (4) No mention of combined fMRI-EEG research

      A minor point, but I was surprised that the authors did not mention any combined fMRI-EEG research when they were discussing the limits of noninvasive recordings. Intracortical recordings are one way to surpass the spatial and temporal resolution limits of M/EEG and fMRI respectively, but studies that combine fMRI and EEG are also an alternative means to solve this problem: by combining the spatial resolution of fMRI with the temporal resolution of EEG, researchers can - in theory - compare when and where certain activity patterns (be they univariate ERPs or multivariate patterns) arise. The authors do cite one paper (Dellert et al., 2021 JNeuro) that used this kind of setup, but they discuss it only with respect to the task and ignore the recording method. The argument for using intracranial recordings is weaker for not mentioning a viable, noninvasive alternative that resolves the same issues.

      We thank the reviewer for this point. We have added a discussion of fMRI-EEG to the "Limits of noninvasive measures" section (lines 167-171). While we acknowledge that fMRI-EEG is a powerful non-invasive tool for bridging spatial and temporal scales, we note that it relies on merging an indirect metabolic signal with a weak electrophysiological one filtered by the skull, which is computationally complex and often noisy. In contrast, intracranial recordings provide direct measures of both local field potentials and spiking activity within the same neural population, offering interpretability and signal-to-noise ratio that non-invasive combinations cannot match. In our view, this is not just an alternative to these methods, but a unique means of accessing the underlying neuronal ground truth.

      Reviewer #2 (Public review):

      Summary:

      In this work, the authors review the study of the neural correlates of consciousness (NCCs). They discuss several of the difficulties that researchers must face when studying NCCs, and argue that several of these difficulties can be alleviated by using intracranial recordings in humans.

      They describe what constitutes an NCC, and the difficulty of distinguishing an NCC proper from the prerequisites and consequences of conscious processing.

      They also describe the two main types of experimental designs used to study NCCs. These are the contrastive approach (with its report and non-report variants), and the supraliminal approach, each with its own merits and pitfalls.

      They discuss the limitations of non-invasive methods, such as fMRI, EEG and MEG, as well as the limitations of the use of invasive recordings in non-human animals.

      After setting the stage in this way, the authors provide an extensive review of the knowledge acquired by using invasive recordings in humans. This included population-level measurements in vision and in other sensory modalities, as well as single-neuron level studies. The authors also discuss studies of subcortical NCCs.

      The second half of this work discusses the theoretical insights gained through the use of intracranial recordings, as well as their limitations, and a perspective for future work.

      Strengths:

      This work offers an impressive review, which will serve as a useful reference document, both for newcomers to the study of NCC and for experienced researchers. The inclusion of non-visual and subcortical NCCs is of particular merit, as these have been understudied.

      Besides serving as a review, this work includes a perspective, exploring several directions to pursue for the progress of the field.

      We thank the reviewer for acknowledging the strength of our work.

      Weaknesses:

      The intention of the authors is to argue how some of the problems faced when studying NCCs are alleviated by the use of intracranial recordings in humans. But in some cases, the link between the problems related to the study of NCCs and the advantages of intracranial recordings over non-invasive methods is not clear.

      For example, the authors explain the difficulties in distinguishing true NCCs from their prerequisites and consequences. This constitutes a difficult conceptual problem that plagues all recording techniques. The authors don't provide a convincing explanation of how intracranial recordings offer advantages over EEG or MEG when dealing with these problems.

      We agree that the distinction between proper NCCs and their prerequisites or consequences is a fundamental challenge that affects all recording modalities. We did not intend to imply that intracranial recordings are a "silver bullet" for solving this conceptual problem in isolation, and we now explicitly state that at the beginning of this section (line 101).

      We have revised the section on "Distinguishing NCCs from their prerequisites or consequences" to clarify that intracranial recordings are a powerful tool when used in conjunction with appropriate experimental designs, rather than a standalone solution to these conceptual difficulties.

      For example, the authors explain how the use of non-report designs to rule out post-perceptual processing relies on null results, which, according to them, are harder to interpret given the low resolution of non-invasive methods. But the interpretation of null results is actually more complicated in the case of intracranial recordings. As the coverage achieved by the electrodes is sparse, if a null result is attested, it remains possible that a true effect was present in a nearby patch of cortex out of coverage.

      It is true that a null result in an intracranial study may simply reflect that the relevant neural population was not sampled by the specific electrode implantation scheme. However, we argue that interpreting null results is equally, if not more, complicated in non-invasive methods, albeit for different reasons. While M/EEG offers broader coverage, it is blind to many cortical sources because of their orientation (radial sources in MEG) or their location in deep sulci and subcortical structures. The signal-to-noise ratio of M/EEG is also much lower than that of intracranial EEG, making it more likely that null results obscure the existence of subtle effects (Parvizi & Kastner, 2018).

      To address this, we revised the manuscript to clarify that intracranial recordings provide high local certainty within the sampled regions (lines 224-227), whereas non-invasive methods provide broader coverage (lines 247-249). We now explicitly emphasize that drawing conclusions from null results based on intracranial recordings requires caution regarding electrode placement. We also point out that these approaches are complementary: M/EEG can identify large regions of interest, while sEEG can then provide high-resolution "ground truth" to confirm whether those regions are part of the NCC.

      Reference: Parvizi, J., & Kastner, S. (2018). Promises and limitations of human intracranial electroencephalography. Nature Neuroscience, 21(4), 474-483. https://doi.org/10.1038/s41593-018-0108-2

      The authors argue that the spatial resolution of intracranial recordings is better than that of EEG and MEG. While this is technically true (especially compared to EEG), the true spatial scale of the NCCs is unknown. If NCCs' span is in the mm range, then the additional spatial resolution of intracranial recordings might not be an advantage.

      We agree with the reviewer that the exact spatial scale of the NCC remains a topic of ongoing debate. However, we believe that the advantage of intracranial recordings holds true whether the NCC spans millimeters or centimeters. The main spatial limitation of non-invasive electrophysiology (M/EEG) is not just its spatial resolution but also the inverse problem. Since scalp sensors detect a mixture of signals from across the brain, different cortical configurations can produce identical scalp patterns. This makes it challenging to precisely locate the NCC or distinguish it from nearby activity (e.g., motor or attentional signals). When recording intracortically, a widespread NCC could be captured across multiple adjacent channels with high accuracy. Conversely, if the NCC is focal, it can be isolated with high spatial resolution. In either case, intracranial recordings eliminate the spatial ambiguity inherent in scalp recordings. We have revised the Introduction (lines 158-164) to clarify that the "spatial advantage" of intracranial recordings also pertains to the inverse problem, not merely to the ability to record from smaller cortical areas.

      Another factor that should be taken into consideration when assessing the spatial resolution of intracranial recordings is that while the listening zone of individual intracranial contacts is small, coverage is sparse and defined by clinical criteria (something that the authors discuss). In practice, the activity recorded by contacts is usually attributed to anatomically defined ROIs with a scale in the cm range. Given the sparse and uneven (across regions and patients) coverage afforded by intracranial recordings, the advantage of intracranial recordings in terms of spatial resolution is overstated.

      We thank the reviewer for raising this point regarding how intracranial data is often aggregated into regions of interest. We agree that if researchers generalize findings to large anatomical regions without accounting for single-channel recordings, some of the spatial benefits of intracranial recordings are indeed reduced. We toned down some of the original claims accordingly, and acknowledged more clearly that clinical constraints of sEEG lead to sparse coverage (lines 245-249).

      However, we maintain that even when using an ROI-based approach, intracranial recordings offer a clear advantage over non-invasive methods, in that they represent a direct measure from a specific patch of tissue, rather than a statistical estimate that may be contaminated by "leakage" from distant sources. To address the reviewer’s concern, we have updated the manuscript (lines 244-245) to emphasize the importance of relying on MNI coordinates and individual anatomy rather than solely on broad ROI labels.

      Appraisal of whether the authors achieved their aims:

      In this work, the authors have gathered an impressive review and have discussed several important problems in the field of study of NCCs, as well as provided a perspective on how the field could move forward.

      What is less clear is how the use of intracranial recordings per se holds potential to overcome problems such as the distinction between true NCCs and the prerequisites and consequences of conscious processing.

      Discussion of the likely impact of the work on the field:

      This work has the potential of becoming a must-read for anyone working in the field of consciousness research.

      Reviewer #3 (Public review):

      Summary:

      This narrative review provides a clear, well-structured, and comprehensive synthesis of intracerebral recording work on the neural correlates of consciousness. It is written in an accessible manner that will be useful to a broad community of researchers, from those new to iEEG to specialists in the field.

      Strengths:

      The manuscript successfully integrates methodological and theoretical perspectives and offers a balanced overview of current, sometimes contradicting evidence. As such, the manuscript is important as it calls for a concerted and better exploration of NCCs using iEEG in the future.

      We thank the reviewer for stating the importance of our work and its potential contribution to the field.

      Weaknesses:

      The manuscript extensively discusses the use of "report" as a criterion for identifying conscious perception and its limitations for separating between correlates of consciousness and post-consciousness processes, yet the term is not defined at the outset. The authors should specify what they mean by "report" (e.g., verbal report, nonverbal self-report, or any meta-cognitive indication of experience). Importantly, this definition should be explicitly linked to the theoretical landscape: whether the authors adopt an access-consciousness perspective in which (self) reportability is central, or whether the review also aims to address phenomenal consciousness. Making this conceptual grounding explicit at the beginning will help readers interpret the empirical work surveyed throughout the review.

      We agree that a clear definition of report is essential for the reader to interpret the empirical findings presented. We have added a definition to the Introduction (lines 108-111), specifying that we use "report" to refer to any explicit behavioral response (whether verbal, manual, or otherwise) that communicates a subject’s subjective state.

      Regarding the conceptual distinction between Phenomenal and Access consciousness, we refer to recent work from some of the co-authors (Mudrik et al., 2025), which suggests that P and A should not be seen as two types of consciousness, but rather as two necessary conditions for conscious experience. While a full discussion of this distinction is beyond the scope of this review, we now clearly state that our focus is on identifying neural activity that reflects the subjective experience itself, regardless of the downstream requirements of report.

      Reference: Mudrik, L., Faivre, N., Pitts, M., & Schurger, A. (2025). On a confusion about there being two types of consciousness. Trends in Cognitive Sciences. https://doi.org/10.1016/j.tics.2025.11.012

      In addition, the review would benefit from an earlier introduction of the distinction between states and contents of consciousness. This distinction becomes important in the later section on anaesthesia, sleep, and epileptic seizures, where the focus shifts from content-specific NCCs to alterations in global states. Presenting these definitions upfront and briefly explaining how states and contents interact would strengthen the coherence of the manuscript.

      We agree that clarifying the distinction between contents and levels of consciousness early on provides a stronger framework for the paper.

      We have added a brief clarification in the Introduction (lines 63-76): "It is also helpful to distinguish between levels of consciousness, defined as a global level of arousal or wakefulness (e.g., being awake vs. under anesthesia), and the contents of consciousness, defined as the specific subjective experiences one has while conscious (e.g., perceiving a visual stimulus; Bayne et al., 2016; Laureys, 2005). While the majority of this review focuses on 'content-specific' NCCs, the two dimensions are intrinsically linked, as global states typically set the conditions for the occurrence of specific conscious contents."

      Overall, this is an excellent and timely review. With clearer initial theoretical definitions of consciousness, the manuscript will offer an even stronger conceptual framework for interpreting intracerebral studies of consciousness.

      We thank the reviewer again for this highly positive assessment of the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I would like to reiterate that I believe this is a very scholarly piece of writing, and I congratulate the authors on producing such a useful and timely manuscript. Below, I suggest just a few ways the authors may resolve some of the issues I raised in the public review. However, I would like to emphasise that these are merely suggestions - the authors may think of different and better ways to address these comments that are more in line with either their thinking or writing style, and I would certainly encourage the authors to follow their own preferences if they feel they are at odds with my suggestions.

      For the longer comment questioning whether intracranial recordings are really a way to isolate NCCs from their pre- and post-processing, there are two ways the authors could resolve this. One is that they collapse the section distinguishing NCCs from their prerequisites and consequences into the previous section regarding limits of noninvasive measures. For instance, they could make the point that null results are easier to interpret with intracranial recordings in this previous section. Then they could discuss how specific intracranial studies have been able to resolve questions of pre-/post- processing confounds when they introduce studies later in the manuscript. At the moment, the Distinguishing NCCs from their prerequisites and consequences section, at least to me, undermines the argument of why intracranial recordings are important because it spends too much time describing how tasks are the core component of isolating pure NCCs, and not the recording method.

      Alternatively, the authors could keep the structure as it is. In this case, I would urge the authors to emphasise the role of intracortical recordings here and to make the argument that this is a problem that intracortical recordings (rather than novel tasks) can solve more convincingly. Citing specific studies that combined intracortical recordings with no-report paradigms and emphasising how the invasive recording allowed the researchers to reach a conclusion that would not have been possible with noninvasive measures would also be helpful.

      We thank the reviewer for these useful suggestions and agree that we would not want readers to take from this paper that design issues can be fixed by using invasive recordings. Because confounding issues are crucial in research on the NCC, we believe it is important to include a section on this topic in the Introduction. However, as we explained in our response to the public review, we revised the section introducing Human intracranial electrophysiology to reflect that intracranial recordings are a complementary tool that improves the interpretability of no-report paradigms, rather than a “silver bullet” solution for confound issues. We also explicitly say now that this problem is relevant to all techniques in the study of consciousness, including intracranial recordings (line 101). Additionally, based on the reviewer’s suggestion, we have added a more detailed explanation of how studies that pair intracranial recordings with no-report paradigms provide a unique insight in the Temporal Insights section (lines 822-823).

      For my comment: Drawing misleading conclusions from certain studies, I think the public review speaks for itself. I would recommend that the authors make sure they are drawing correct conclusions from the studies they cite, and make clear from the outset where there is ambiguity in interpretation.

      We thank the reviewer for bringing these ambiguities to our attention. As explained in the response to the public review, we have modified the text accordingly.

      Finally, with regard to the single-cell analyses, I would imagine that most readers will share at least some scepticism around single neurons being the appropriate level of analysis for revealing the basis of perceptual experience. As such, I think it would strengthen the manuscript greatly if the authors could provide a brief argument as to how such work can either inform theories of consciousness or contribute more generally to the study of NCCs, given that the field and its theories are mostly biased towards studying system-level neural processes. I think single-cell analyses are extremely valuable to NCC research, and the authors have a good opportunity to frame these studies accordingly.

      We agree. As detailed in the response to the public review, we now specify (1) how a higher level of granularity in electrophysiological measurements can distinguish between awareness-related signals and confounds, (2) that these measurements provide an opportunity to study neuronal population dynamics where various cognitive processes have been shown to emerge in animals, and (3) that single-neuron measurements are necessary to test predictions of theories that are defined at the cellular level.

      Reviewer #2 (Recommendations for the authors):

      Recommendations for improving the writing and presentation:

      My compliments for having written an impressive review. Overall, I think that this is a beautiful piece of work that will be of great use to the community. My only concern is that the advantages of intracranial recordings over non-invasive methods in solving the difficulties faced in the study of NCCs are overstated.

      Here I provide more precise comments for your consideration.

      (1) On page 5, lines 100 to 102, you argue that "Scalp EEG and MEG have limited anatomical resolution due to the overlap of deep and superficial brain signals at the scalp level and, in the case of EEG, the scattering of the adjacent electrical signals through the scalp". It would be good to provide precise estimates of the spatial resolutions of EEG, MEG and intracranial recordings, with accompanying references. Consider also that MEG is relatively insensitive to deep sources. I recommend this paper: Piastra et al. 2020 https://onlinelibrary.wiley.com/doi/10.1002/hbm.25272

      We thank the reviewer once again for their positive evaluation of our work. As detailed in the response to the public reviews, we now clarify that intracranial recordings provide high local certainty within the sampled regions (lines 224-227), whereas non-invasive methods provide broader coverage (lines 247-249). We thank the reviewer for their additional suggestions and have clarified our concern about the anatomical conclusions that can be drawn from scalp EEG and MEG data (lines 158-164).

      (2) On page 11, you describe work showing that activity in the occipitotemporal cortex might reflect a precursor to consciousness, but not an NCC proper, except for the case of faces, in which the fusiform seems to behave like a true NCC. Could you discuss how these seemingly contradictory results could be reconciled?

      One possibility is that activity in some parts of the occipitotemporal cortex instantiates content-specific NCCs, i.e., correlates that are only specific to certain stimulus types (in this case: faces), while activity in other parts instantiates precursors of the NCCs. Because faces have been extensively studied, we might have uncovered the content-specific NCCs for these stimuli but not for others. This is now discussed in the text on lines 342-344. Based on reviewer 1’s suggestion, we have also toned down our claim about occipitotemporal activity being a precursor to the NCC.

      (3) From line 322, you start to discuss connectivity analyses. Adding a subheading might improve readability.

      We appreciate the suggestion; however, adding a subheading to a single paragraph would require restructuring the entire section, which could disrupt the flow. We believe the current format maintains clarity and cohesion.

      (4) In line 329, you write "It remains unclear to what extent these connectivity patterns reflect post-perceptual processing and how the signals associated with perceptual consciousness in the occipitotemporal cortex interact with frontoparietal regions." But it's not clear why this is the case.

      We meant to make two separate points: (1) these studies did not control for report-related activity using no-report paradigms and (2) there has been no investigation so far of the interaction between occipitotemporal and frontoparietal signals associated with perceptual consciousness. These two points have been clarified in the text (lines 378-381).

      (5) In line 692, it would be good to clarify that Pereira 2021 is a single-neuron study.

      This has been clarified in the text.

      (6) The phrase "more research/work is needed" is repeated several times.

      Thank you for pointing this out. To avoid redundancy, we have deleted the second mention of this phrase.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      Amyotrophic lateral sclerosis (ALS) affects nerve cells in the brain and spinal cord. The authors' approach to use genetic code expansion to tag two ALS proteins associated with stress granules has value and should be useful in the ALS field. Parts of the work are well done, but there are concerns that the evidence is incomplete overall, and additional controls would strengthen the study.

      We thank the editors and reviewers for their thoughtful assessment and for highlighting the potential value of applying genetic code expansion (GCE) to study ALS-associated proteins involved in stress granule biology. Our goal in this work was to establish and validate a minimally perturbative labeling strategy using the noncanonical amino acid Anap to monitor the localization and stress-dependent behavior of TDP-43 and G3BP1.

      We agree that additional controls can further strengthen the conclusions. In the revised manuscript, we have clarified the experimental design and added essential controls to better support the reliability of the Anap labeling approach (Supplementary Fig. 1).

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors utilize genetic code expansion to tag TDP-43 and G3BP1, and evaluate this protein tagging system (ANAP) compared to antibodies, and evaluate protein trafficking and stress granule formation in response to stress with sodium arsenite treatment. They find similar staining to antibodies in HeLa cells, mouse embryonic stem cells, and primary mouse cortical neurons. This is a useful study that demonstrates the utility of ANAP tagging to evaluate ALS proteins.

      We sincerely thank the reviewer for the positive assessment of our work and for recognizing the utility of the Anap-based GCE system for studying ALS-associated proteins.

      Strengths:

      Rescue of cell survival by ANAP-tagged TDP-43 is compelling

      We appreciate the reviewer’s highlighting of this point. Demonstrating that TDP-43-Anap can rescue cell survival was an important validation in our study, as it indicates that incorporation of the noncanonical amino acid does not substantially disrupt the biological function of TDP-43. We also tested whether TDP-43-Anap restores RNA splicing function. As shown in Fig. 1K and 1L, expression of PFKP, a protein whose transcript undergoes cryptic exon inclusion when TDP-43 function is lost [1], recovered when TDP-43-Anap was expressed in TDP-43 knockout HeLa cells.

      Weaknesses:

      While the ANAP-tagged proteins had similar distributions to antibody staining, there were some discrepancies that may be more explained by the technique than by novel findings, as the authors suggested. The inclusion of additional controls to evaluate this would be helpful.

      This is a helpful suggestion. To ensure that the fluorescence signal observed in our experiments was specifically derived from site-specific Anap incorporation rather than background fluorescence, we performed three control conditions. Specifically, we tested: (1) cells cultured with Anap supplement, (2) cells expressing the Anap incorporation system with the addition of Anap, and (3) cells expressing both the TAG-mutated protein plasmid and the Anap incorporation system but without the addition of Anap. These control experiments were performed for both TDP-43 and G3BP1, and no observable fluorescence signal was detected under any of these conditions (Supplementary Fig. 1). We have clarified this control experiment in the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Chen and colleagues describe a novel means of labeling two RNA-binding proteins, G3BP1 and TDP-43, using genetic code expansion. Overexpressed constructs that incorporate the intrinsically fluorescent non-canonical amino acid Anap redistribute to cytoplasmic granules upon application of external stressors such as sodium arsenite. Similar labeling and redistribution of overexpressed G3BP1 and TDP-43 were observed in cultures of mouse primary neurons.

      We are grateful for the reviewer’s accurate summary of our study and recognition of the value of the GCE strategy for labeling the RNA-binding proteins G3BP1 and TDP-43.

      Strengths:

      Genetic code expansion and non-canonical amino acid labeling have quite a few advantages over traditional fusion proteins for tracking protein redistribution in living cells. The authors show that they are able to label exogenous G3BP1 and TDP-43 with the non-canonical amino acid Anap and follow labeled proteins in living cells with and without stress.

      We acknowledge the reviewer’s comment on the advantages of GCE-based noncanonical amino acid labeling for studying protein dynamics in living cells.

      Weaknesses:

      The authors do not convincingly leverage the advantages of genetic code expansion in the current study. There is no specific question posed by the authors that can be or is answered using this approach, and several of the experiments lack critical controls. This is also not the first example of TDP-43 labeling by genetic code expansion (see PMID: 38290242). As a result, the study as a whole adds little to our understanding of protein trafficking and behavior under stress.

      We thank the reviewer for raising these important points. As the reviewer notes, genetic code expansion has previously been applied to TDP-43 [2]. However, that study mainly used a photocaged lysine incorporation system for optogenetic control of TDP-43 translocation, and the protein was still labeled with mRuby. Our study has a different goal: to establish and validate a minimally perturbative labeling strategy using the intrinsically fluorescent noncanonical amino acid Anap to monitor the localization and stress-dependent behavior of both TDP-43 and G3BP1. Our work extends this approach in several important ways.

      First, we demonstrate that Anap incorporation enables visualization of stress-dependent redistribution of both TDP-43 and G3BP1, two key proteins involved in stress granule biology. Importantly, we validate this approach across multiple cellular systems, including HeLa cells, mouse embryonic stem cells, and primary mouse cortical neurons, which broadens the applicability of this labeling strategy.

      Second, we provide functional validation of the Anap-tagged protein, showing that TDP-43-Anap rescues both cell survival and RNA splicing activity in TDP-43 knockout cells, including restoration of PFKP expression, a known cryptic exon target of TDP-43. These results support that Anap incorporation does not substantially disrupt protein function.

      We performed additional control experiments to ensure the specificity of the labeling system. Specifically, we tested three control conditions: (1) cells cultured with Anap supplement, (2) cells expressing the Anap incorporation system with the addition of Anap, and (3) cells expressing both the TAG-mutated protein plasmid and the Anap incorporation system but without the addition of Anap. These control experiments were performed for both TDP-43 and G3BP1, and no observable fluorescence signal was detected under any of these conditions (Supplementary Fig. 1).

      We agree that the manuscript would benefit from clearer articulation of the advantages of genetic code expansion in this context. Accordingly, we have revised the manuscript to more explicitly emphasize how Anap labeling provides a minimally perturbative alternative to large fluorescent protein fusions, which can alter the phase behavior and localization of stress granule proteins.

      “Conventional fluorescent protein tags have enabled visualization of TDP-43 and G3BP1 in living cells; however, these approaches can perturb the native biophysical properties of the proteins being studied. For example, GFP or other fluorescently tagged TDP-43 usually requires additional modifications, such as deletion of the nuclear localization signal (NLS) [3, 4], to induce cytoplasmic inclusion formation. Such manipulations introduce non-physiological conditions that may alter the native trafficking and aggregation behavior of TDP-43. As for G3BP1, tags like GFP may also cause unexpected effects on the phase separation or other dynamics of the protein. In contrast, the Anap-based GCE strategy allows the minimally perturbative labeling and visualization of protein localization and stress-induced redistribution while preserving native protein architecture and function of both proteins. Importantly, the approach provides a generalizable genetically encoded platform for quantitatively examining the behavior of ALS-associated proteins in living cells. By enabling faithful monitoring of protein trafficking and stress-granule dynamics without extensive protein engineering, Anap-based GCE can offer a powerful strategy for probing molecular-scale mechanisms underlying ALS-linked proteinopathies.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Figure 1A

      The authors report that the nuclear staining of G3BP1 by ANAP labeling shows the presence of nuclear pools of G3BP1 that aren't detected with antibody staining. However, unspecific nuclear staining by aminoacylated tRNAs bound to synthetases has been described. It would be important to have a control to evaluate for this possibility.

      This is an important point. We agree that the nuclear ANAP signal should be carefully controlled to exclude the possibility of nonspecific staining arising from the Anap incorporation machinery itself, such as aminoacylated tRNAs and/or synthetases.

      To address this concern, we note in the Materials and Methods section that, after DPBS washes to remove excess Anap, cells were incubated in fresh medium for 2 hours to allow sufficient time for the decay of unstable aminoacylated tRNAs, which are generally cleared within minutes to tens of minutes [5].

      Also, we performed three control conditions for both TDP-43 and G3BP1: (1) cells cultured with Anap supplement, (2) cells expressing the Anap incorporation system with the addition of Anap, and (3) cells expressing both the TAG-mutated protein plasmid and the Anap incorporation system but without the addition of Anap. Under all three conditions, we observed no detectable fluorescence signal (Supplementary Fig. 1).

      In addition, as shown in Fig. 1I, the nuclear signal of G3BP1-Anap partially colocalizes with the nuclear signal of TIA-1 in several condensate-like structures. This observation further supports that the nuclear Anap signal reflects protein-associated localization rather than nonspecific fluorescence, as it overlaps with a known RNA-binding protein that can form nuclear condensates under certain conditions.

      (2) Figure 1A, 1B

      Anap labeling appears to stain fewer cytoplasmic structures compared to antibody staining for both G3BP1 and TDP-43 after sodium arsenite treatment. Quantification would be useful to address whether this is the case. If so, might this be due to unincorporated/truncated proteins competing with Anap-labeled proteins?

      We appreciate the reviewer’s helpful suggestion. To address this point, we performed quantitative colocalization analysis using Fiji/ImageJ, calculating the Pearson correlation coefficient (R) for regions of interest between the Anap signal and antibody staining. These analyses indicate a strong overall agreement between the two detection methods under stress conditions, supporting that Anap labeling reliably reports the localization of both G3BP1 and TDP-43 (see Fig. 1A, B).
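
      For readers unfamiliar with the metric, the Pearson coefficient used in this kind of colocalization analysis can be sketched as follows. This is only an illustrative sketch, not the authors' Fiji/ImageJ workflow: the function name `pearson_r` and the intensity values are hypothetical, and the sketch assumes the two channels have already been reduced to equally sized lists of pixel intensities from the same region of interest.

```python
import math


def pearson_r(channel_a, channel_b):
    """Pearson correlation between two equally sized lists of pixel intensities."""
    if len(channel_a) != len(channel_b) or len(channel_a) < 2:
        raise ValueError("channels must have the same length (at least 2 pixels)")
    mean_a = sum(channel_a) / len(channel_a)
    mean_b = sum(channel_b) / len(channel_b)
    # Covariance and variances are left unnormalised; the 1/n factors cancel in the ratio.
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(channel_a, channel_b))
    var_a = sum((a - mean_a) ** 2 for a in channel_a)
    var_b = sum((b - mean_b) ** 2 for b in channel_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant-intensity channel")
    return cov / math.sqrt(var_a * var_b)


# Hypothetical ROI intensities (not data from the study).
anap_roi = [10, 40, 80, 120, 200]
antibody_roi = [12, 35, 90, 110, 210]
r = pearson_r(anap_roi, antibody_roi)
print(f"Pearson R = {r:.3f}")
```

      Values of R close to 1 indicate that the two channels rise and fall together across the region, while values near 0 indicate no linear relationship between them.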

      Regarding the possibility that truncated or unincorporated proteins could influence the observed signal, we note that fluorescence from Anap depends on successful amber suppression and incorporation of Anap at the engineered TAG site. Proteins that fail to incorporate Anap, such as truncated products generated by premature termination, would not produce fluorescence, and therefore would not contribute to the Anap signal. Thus, the Anap fluorescence selectively reports the population of successfully labeled full-length proteins, whereas antibody staining detects both labeled and unlabeled protein pools. This difference may partially explain why antibody staining appears to label a larger number of cytoplasmic structures.

      (3) Figure 1F

      FRAP of G3BP1-GFP in stress granules is slower than in previous publications. The underlying reasons for this should also be addressed.

      We thank the reviewer for this important observation. Differences in FRAP recovery kinetics of G3BP1 in stress granules may arise from several experimental variables that are known to influence stress granule dynamics. These include differences in cell type, expression levels of G3BP1-GFP, and imaging or photobleaching parameters. In our experiments, FRAP measurements were performed under specific conditions optimized for our experimental system, which may lead to recovery kinetics that differ from those reported in previous studies.

      (4) Figure 1H

      A full-size Western blot would be useful to evaluate for amount of truncated protein for G3BP1 and TDP-43. Could truncated proteins be competing with and altering ANAP-tagged G3BP1 and TDP-43 localization in response to stress? This should be addressed.

      We acknowledge this important point. Full-size Western blotting can provide information on the overall presence of truncated species in the transfected population; however, it represents a bulk measurement and does not capture cell-to-cell variability in amber suppression efficiency at the single-cell level. We therefore cannot exclude the possibility that truncated products are present at varying levels in individual cells and may contribute, directly or indirectly, to differences between antibody staining and Anap fluorescence.

      Importantly, we observe that cells with successful Anap incorporation consistently exhibit strong antibody staining for TDP-43 or G3BP1, indicating that full-length protein is the predominant species in these cells. Because Anap fluorescence depends on successful amber suppression, it selectively reports the full-length protein population, whereas truncated products are not detected in the imaging assay. The concordance between Anap fluorescence and antibody staining therefore argues against a major contribution of truncated species to the observed localization patterns (Supplementary Fig. 1).

      Accordingly, we interpret the Anap signal as reflecting the localization of successfully labeled full-length protein, while acknowledging that heterogeneity in suppression efficiency is an important limitation of the current approach.

      (5) Figure 3

      This is a well-designed diagram.

      We are grateful for the reviewer’s positive feedback on the diagram and are pleased that the schematic effectively illustrates the experimental design and the principles of the genetic code expansion strategy used in this study.

      Reviewer #2 (Recommendations for the authors):

      The authors present a one-sided viewpoint concerning the connection between stress granules and disease (lines 45-46). A more balanced discussion is recommended, including data arguing against a role for abnormal stress granules in neurodegeneration.

      This is an important suggestion. We agree that the relationship between stress granules and neurodegeneration remains an active area of investigation and that evidence both supporting and questioning a causal role of stress granules in disease has been reported. In the revised manuscript, we have modified the Introduction to provide a more balanced discussion of this topic.

      “Altered stress-granule dynamics have been associated with ALS/FTD [6, 7]; however, whether stress granules directly drive neurodegeneration remains debated, as several studies suggest that stress granules primarily function as protective stress responses [8].”

      (1) A central rationale for the study is missing. The authors state only that G3BP1 and TDP-43 'undergo dynamic stress-dependent redistribution, making them ideal candidates for minimally invasive, site-specific fluorescent labeling.' Is there a controversy or question that can be resolved using these approaches?

      We thank the reviewer for raising this important point. The central motivation of this study is that the dynamic behavior and phase separation properties of stress-granule proteins are highly sensitive to protein modifications and tagging strategies.

      “Conventional fluorescent protein tags have enabled visualization of TDP-43 and G3BP1 in living cells; however, these approaches can perturb the native biophysical properties of the proteins being studied. For example, GFP or other fluorescently tagged TDP-43 usually requires additional modifications, such as deletion of the nuclear localization signal (NLS) [3, 4], to induce cytoplasmic inclusion formation. Such manipulations introduce non-physiological conditions that may alter the native trafficking and aggregation behavior of TDP-43. As for G3BP1, tags like GFP may also cause unexpected effects on the phase separation or other dynamics of the protein.”

      (2) Related to this, there is little context for how or why genetic code expansion is utilized for these studies

      We agree that the rationale for using genetic code expansion should be more clearly explained. In this study, genetic code expansion was employed to enable site-specific incorporation of the small fluorescent noncanonical amino acid Anap, allowing minimally perturbative labeling of proteins of interest.

      “The Anap-based GCE strategy allows minimally perturbative labeling and visualization of protein localization and stress-induced redistribution while preserving the native architecture and function of both proteins. Importantly, the approach provides a generalizable genetically encoded platform for quantitatively examining the behavior of ALS-associated proteins in living cells. By enabling faithful monitoring of protein trafficking and stress-granule dynamics without extensive protein engineering, Anap-based GCE can offer a powerful strategy for probing molecular-scale mechanisms underlying ALS-linked proteinopathies.”

      (3) The justification for the criteria for selecting the site for incorporation of non-canonical amino acids in G3BP1 or TDP-43 is missing.

      We acknowledge this important comment and agree that the rationale for selecting the incorporation sites should be stated more clearly.

      “For TDP-43, the incorporation site was selected to avoid the major functional domains involved in RNA binding, nuclear localization, and aggregation-related behavior, thereby reducing the likelihood that Anap incorporation would perturb its native trafficking or function. For G3BP1, the selected site was chosen to minimize interference with domains important for stress granule assembly, RNA binding, and protein-protein interactions. More generally, we aimed to place the ncAA at positions likely to be solvent-accessible and tolerant of substitution, while avoiding highly conserved or functionally essential residues.”

      (4) Studies in Figures 1 and 2 lack essential controls, including background signal from Anap in non-transfected cells, or those transfected with plasmids lacking the tRNA or tRS.

      This is an important point, also raised by Reviewer 1. To evaluate potential background fluorescence arising from Anap or the labeling system, we performed several control experiments. Specifically, we examined three conditions: (1) cells cultured with Anap supplement, (2) cells expressing the Anap incorporation system with the addition of Anap, and (3) cells expressing both the TAG-mutated protein plasmid and the Anap incorporation system but without the addition of Anap. Under all three conditions, we observed no detectable fluorescence signal (Supplementary Fig. 1).

      (5) Another marker of stress granules should be used for confirming the identity of G3BP1-Anap (+) or TDP-43-Anap (+) structures, including TIA1, TAF15, or polyA RNA.

      We appreciate this helpful suggestion. To further confirm the identity of the stress granule structures observed in our experiments, we performed colocalization analysis with TIA-1, a well-established marker of stress granules. The results have been included in the revised manuscript.

      “Additionally, we examined the colocalization of G3BP1-Anap with TIA-1, another established stress granule marker. Under stress conditions, G3BP1-Anap largely colocalized with TIA-1 within stress granules. Interestingly, under basal conditions, the nuclear signal of G3BP1-Anap, which was not detected by antibody staining, appeared to partially colocalize with TIA-1 in several condensate-like structures (Fig. 1I).”

      (6) There is no information on the number of granules bleached or the number of cells selected for FRAP studies. There is no information on the shaded areas in Figure 1F or 1G, and no information on statistical comparisons between regressions in Figure 1F.

      We thank the reviewer for pointing out these omissions. We have revised the figure legends to clarify these details.

      “One granule from each of three independent cells was selected and photobleached for FRAP analysis.”

      “Here, error bars with filled area are used for better data presentation. FRAP recovery curves were compared using two-way ANOVA.”

      (7) Protein dynamics measured by FRAP are highly dependent on the concentration and/or expression level of each protein. Because of this, the authors need to control for expression level in all FRAP studies.

      We agree that protein concentration and expression level can influence FRAP recovery kinetics. Because Anap incorporation relies on amber suppression, and the suppression rate varies from cell to cell, it is difficult to control the expression of Anap-labeled proteins directly. To minimize this potential effect, we performed FRAP measurements on cells exhibiting comparable fluorescence intensities, which served as a proxy for similar expression levels of the labeled proteins. In addition, FRAP analyses were conducted on individual granules within cells expressing moderate levels of the protein, avoiding cells with unusually high fluorescence intensity that might reflect overexpression.

      Furthermore, fluorescence recovery was normalized to the pre-bleach intensity of the selected granules, which reduces variability arising from differences in overall expression levels between cells.
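
      As an illustration only (this is not the authors' analysis code), the normalization described above can be sketched as follows. The function name `normalize_frap`, the number of pre-bleach frames, and the two intensity traces are hypothetical; the sketch simply divides each trace by its own mean pre-bleach intensity.

```python
from statistics import mean


def normalize_frap(intensities, n_prebleach):
    """Scale a FRAP trace so that its mean pre-bleach intensity equals 1.0.

    `intensities` is the granule fluorescence per frame; the first
    `n_prebleach` frames are assumed to be acquired before bleaching.
    """
    if not 1 <= n_prebleach <= len(intensities):
        raise ValueError("n_prebleach must be between 1 and the trace length")
    baseline = mean(intensities[:n_prebleach])
    if baseline <= 0:
        raise ValueError("pre-bleach baseline must be positive")
    return [value / baseline for value in intensities]


# Hypothetical traces: two granules with a 4-fold difference in brightness
# but the same relative recovery after bleaching at frame 2.
dim_granule = [100.0, 100.0, 20.0, 35.0, 45.0]
bright_granule = [400.0, 400.0, 80.0, 140.0, 180.0]

dim_norm = normalize_frap(dim_granule, n_prebleach=2)
bright_norm = normalize_frap(bright_granule, n_prebleach=2)
print(dim_norm)
print(bright_norm)
```

      After normalization the two hypothetical traces coincide, which is why this step reduces, but does not eliminate, the influence of expression level: it removes differences in absolute brightness, not any concentration dependence of the recovery kinetics themselves.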

      (8) There is no point of reference for TDP-43-Anap FRAP results in Figure 1G. Additional studies using variants harboring a mutated NLS (mNLS) can be used in place of TDP43-YFP.

      This is a helpful suggestion. In response, we have performed additional FRAP experiments using TDP-43<sup>ΔNLS</sup>, a commonly used construct that promotes cytoplasmic localization and facilitates analysis of TDP-43 granules. The results from TDP-43<sup>ΔNLS</sup> have now been included as a reference for the FRAP measurements of TDP-43-Anap in the revised manuscript (Fig. 1D, 1G).

      “We then used YFP-tagged nuclear localization signal (NLS)-deleted TDP-43 (TDP-43<sup>ΔNLS</sup>-YFP) as a reference and performed FRAP analysis to compare the mobility of TDP-43-Anap and TDP-43<sup>ΔNLS</sup>-YFP. Fluorescence recovery of TDP-43-Anap reached ~45% within 20 s after photobleaching, consistent with liquid-like dynamics. In contrast, TDP-43<sup>ΔNLS</sup>-YFP showed only ~22% recovery, suggesting more solid-like dynamics (Fig. 1D, 1G). These results are consistent with previous reports describing relatively immobile aggregates formed by TDP-43<sup>ΔNLS</sup> [4] and illustrate the advantage of Anap-based labeling, which preserves native protein properties and enables real-time assessment of protein dynamics without introducing disruptive mutations.”

      (9) There is no point of reference for comparing FRAP results from G3BP1-GFP to G3BP1-Anap. What is the 'gold standard'? Without this, it is difficult to conclude that "... Anap labeling better preserved the native mobility and biophysical properties of G3BP1 than the conventional GFP tag."

      We acknowledge this important point and agree that there is currently no definitive gold standard for measuring the native mobility of endogenous G3BP1 within stress granules in living cells. Our intention was not to claim that the Anap-labeled protein definitively represents the native state, but rather to compare the relative effects of different labeling strategies.

      We have therefore rewritten the sentence as follows: “These results suggest that G3BP1-Anap displays higher mobility compared with G3BP1-GFP, indicating that Anap labeling may provide a less perturbative approach for monitoring G3BP1 dynamics.”

      (10) The WB in Figure 1H is overexposed, making it difficult to compare expression levels between WT and V100Anap-transfected cells. In addition, there is no similar assay for confirming G3BP1-Anap expression.

      Thank you for pointing this out. In the revised manuscript, we have replaced the image with a properly exposed Western blot to allow clearer comparison of protein expression levels.

      In addition, we have now included a corresponding western blot analysis to confirm the expression of G3BP1-Anap in G3BP knockout U2OS cells (Fig. 1H). These results verify that the Anap-labeled proteins are expressed at detectable levels and support the interpretation of the imaging and FRAP experiments.

      (11) Although survival studies in Figures 1I and J are promising, a more convincing demonstration of functional replacement of TDP-43 would involve an assessment of cryptic exon splicing, comparing WT to TDP-43 KO, V100Stop- and V100Anap-transfected cells.

      This is a valuable suggestion.

      “We also evaluated TDP-43-dependent RNA splicing activity by examining the expression of PFKP, a well-established target that undergoes cryptic exon inclusion upon loss of TDP-43 function [17]. As shown in Figures 1K and 1L, expression of TDP-43-Anap in TDP-43 knockout HeLa cells restored PFKP expression, indicating that the Anap-labeled protein retains functional RNA splicing activity. These results demonstrate that TDP-43-Anap is capable of functionally compensating for endogenous TDP-43, supporting that the incorporation of Anap does not substantially disrupt the protein’s biological function.”

      (12) Tuj1 staining in Figure 2 is inconsistent and often fails to confirm neuronal identity.

      We thank the reviewer for this important comment. We acknowledge that Tuj1 staining in Figure 2 is variable and, in some cases, does not clearly delineate neuronal identity. Notably, the reduced Tuj1 signal is primarily observed in neurons that express Anap-labeled proteins under sodium arsenite treatment, which likely reflects the combined effects of transfection-associated stress and oxidative stress on neuronal morphology and cytoskeletal integrity.

      In addition, transfection efficiency in primary neurons is inherently low and variable, and cells that successfully express the constructs may represent a more stress-sensitive subpopulation, further contributing to variability in staining quality. Despite optimization efforts, these technical constraints limit the consistency of Tuj1 labeling under these experimental conditions.

      (13) Close-up images and correlation scatter plots in Figures 1 and 2 do not add very much information.

      We thank the reviewer for this comment. To address the reviewer’s concern, we have revised the figure legends to better clarify the purpose of these panels and how they support the quantitative analysis presented in the manuscript.

      For the scatter plots, we added: “Colocalization threshold analysis was performed in Fiji/ImageJ to calculate the Pearson correlation coefficient (R) for each region of interest (A, B, I, J). The X- and Y-axes represent the fluorescence intensity values of the red and green channels, respectively. When signals are colocalized, pixels with high intensity in one channel correspond to high intensity in the other, forming a diagonal distribution. In contrast, non-colocalized signals cluster along the axes. A higher R value indicates a greater degree of colocalization. Scale bar, 3 μm.”
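
      To make the reading of R concrete, here is a minimal sketch of the plain Pearson coefficient computed over paired pixel intensities. It is not the Fiji/ImageJ implementation and omits the thresholding step that tool applies; the function name `pearson_r` and the pixel values are hypothetical.

```python
from math import sqrt


def pearson_r(red, green):
    """Pearson correlation between paired pixel intensities of two channels."""
    if len(red) != len(green) or len(red) < 2:
        raise ValueError("channels must have the same length (>= 2 pixels)")
    n = len(red)
    mean_red = sum(red) / n
    mean_green = sum(green) / n
    cov = sum((r - mean_red) * (g - mean_green) for r, g in zip(red, green))
    var_red = sum((r - mean_red) ** 2 for r in red)
    var_green = sum((g - mean_green) ** 2 for g in green)
    if var_red == 0 or var_green == 0:
        raise ValueError("a channel with constant intensity has no defined R")
    return cov / sqrt(var_red * var_green)


# Hypothetical region where bright red pixels are also bright green pixels
# (points fall near the diagonal of the scatter plot).
r_coloc = pearson_r([10, 50, 120, 200], [12, 55, 118, 210])

# Hypothetical region where each pixel is bright in only one channel
# (points cluster along the two axes of the scatter plot).
r_exclusive = pearson_r([200, 180, 0, 0], [0, 0, 190, 210])

print(r_coloc, r_exclusive)
```

      In this toy example the diagonal case gives R close to 1 and the axis-hugging case gives a strongly negative R, matching the qualitative description in the legend.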

      The same information was added to the legend of Figure 2.

      For the scheme, please see lines 412-413 in the revised manuscript.

      References:

      (1) Rothstein, J.D. et al. Sporadic ALS induced pluripotent stem cell derived neurons reveal hallmarks of TDP-43 loss of function. Nature Communications 16, 7092 (2025).

      (2) Shadish, J.A. & Lee, J.C. Genetically encoded lysine photocage for spatiotemporal control of TDP-43 nuclear import. Biophys Chem 307, 107191 (2024).

      (3) Gasset-Rosa, F. et al. Cytoplasmic TDP-43 De-mixing Independent of Stress Granules Drives Inhibition of Nuclear Import, Loss of Nuclear TDP-43, and Cell Death. Neuron 102, 339–357.e337 (2019).

      (4) Yan, X. et al. Intra-condensate demixing of TDP-43 inside stress granules generates pathological aggregates. Cell 188, 4123–4140.e4118 (2025).

      (5) Walker, S.E. & Fredrick, K. Preparation and evaluation of acylated tRNAs. Methods 44, 81–86 (2008).

      (6) Kassouf, T. et al. Targeting the NEDP1 enzyme to ameliorate ALS phenotypes through stress granule disassembly. Science Advances 9, eabq7585 (2023).

      (7) Van Nerom, M. et al. C9orf72-linked arginine-rich dipeptide repeats aggravate pathological phase separation of G3BP1. Proceedings of the National Academy of Sciences 121, e2402847121 (2024).

      (8) Wolozin, B. & Ivanov, P. Stress granules and neurodegeneration. Nat Rev Neurosci 20, 649–666 (2019).

    1. Author response:

      The following is the authors’ response to the previous reviews

      eLife Assessment

      This important work begins to understand how BDNF regulates the phosphorylation and activity of LRRK2. The overall strength of evidence has been assessed as compelling, though some claims are only partially supported. The work will be of interest for those that might pursue specific LRRK2 interactions and mutational effects on these pathways as the work continues to develop.

      We thank the editors and reviewers for the constructive feedback. We have revised the manuscript to improve clarity, strengthen statistical analysis and increase the western blot sample size in drebrin KO mice.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      LRRK2 protein is familially linked to Parkinson's disease by the presence of several gene variants that all confer a gain-of-function effect on LRRK2 kinase activity.

      The authors examine the effects of BDNF stimulation in immortalized neuron-like cells, cultured mouse primary neurons, hIPSC-derived neurons, and brain tissue from genetically modified mice. They examine a LRRK2 regulatory phosphorylation residue, LRRK2 binding relationships, and measures of synaptic structure and function.

      Strengths:

      The study addresses an important research question: how does a PD-linked protein interact with other proteins, and contribute to responses to a well-characterized neuronal signalling pathway involved in the regulation of synaptic function and cell health.

      They employ a range of good models and techniques to fairly convincingly demonstrate that BDNF stimulation alters LRRK2 phosphorylation and binding to many proteins. In this revised manuscript, aspects are well validated e.g., drebrin binding, but there is a disconnect between these findings and alterations to LRRK2 substrates. A convincing phosphoproteomic analysis of PD mutant Knock-in mouse brain is included. Overall the links between LRRK2, LRRK2 activity, and the changes to synaptic molecules, structures, and activity are intriguing.

      We thank this Reviewer for appreciating our work including the new experiments performed during the revisions.

      Weaknesses:

      The data sets remain disjointed, conclusions are sweeping, and not always in line with what the data is showing. Validation of 'omics' data is light. Some inconsistencies with the major conclusions are ignored. Several of the assays employed (western blotting especially) are underpowered, findings key to their interpretation are addressed in only one or other of the several models employed, and supporting observations are lacking.

      We understand the Reviewer’s points and agree that it is important to increase the sample size (animals) for western blot. In particular, we acknowledge that the initial experiments with the Dbn1 KO mice included only 3 mice, which was insufficient to draw any definitive conclusion on the effect, especially regarding pRab8 levels. In response to this, we have collected additional animals and repeated the experiment with N=7 wild-type and N=7 KO mice (2 months old). Despite a high degree of interindividual variability, we have confirmed that drebrin KO mouse brains have reduced levels of pLRRK2 (Author response image 1). In the new Figure 2H, we included all the replicates (N=7+3 per genotype) for pLRRK2. However, we removed the western blot for pRab8 because a new batch of pRab8 antibody did not yield specific results, making it impossible to reassess.

      Author response image 1.

      Western blot analysis of N=7 WT and N=7 drebrin KO whole brains.

      Main Conclusions of Abstract:

      (1) Increase in pLRRK2 Ser935 and pRAB after BDNF in SH-SY5Y & mouse neurons

      Well supported, but only for pLRRK2 in neurons, why not pERK pAkt & pRab?

      The response of pERK and pAKT in neurons is shown in Figure 4C. We have repeatedly tried to detect pRab (both pRab8 and pRab10) in primary neurons, but with no success. In support of the difficulty in detecting pRab in primary neurons, we are not aware of studies in the literature reporting western blot analysis of pRabs in primary neuronal cultures. This is likely due to the high levels of PPM1H in neurons, as discussed in Berndsen et al., eLife, 2019 (PMID: 31663853).

      (2) Omics Proteome remodelling of LRRK2 interactome with BDNF & different in G2019S mouse neurons.

      Supports that the phosphoproteome of G2019S is different. Drebrin interaction with LRRK2 very well supported. Link between drebrin and LRRK2 activity somewhat supported (pS935 site), but the consequence (non-specific pRab8) not supported, as there is no evidence of a change in LRRK2 substrate(s).

      As discussed above, we removed the pRab8 western blot from Figure 2H because we could not confirm the result with the new set of mice and a new batch of pRab8 antibody.

      (3) Golgi 1 month LKO mouse altered dendritic spines, transient at 1m not older.

      Supported but very small transient change in spines, disconnected to other results (e.g., drebrin).

      We agree with the Reviewer that the observed effect is modest; still, we believe it is important to report. As noted in the Discussion, one plausible explanation for the limited magnitude of the effect is functional compensation by LRRK1.

      (4) iPSC-derived neurons BDNF increases mEPSC frequency (transient at 70 not 50 or 90 days) in WT not KO "which appear to bypass this regulation through developmental compensation"

      Weak, not clear what is being bypassed.

      We reviewed the statistical analysis as described below.

      Main Conclusions Based on Old and New Figure / Data:

      (1) Increase in pLRRK2 Ser935 and pRAB after BDNF in SH-SY5Y & mouse neurons

      Well supported, but only for pLRRK2 in neurons, why not ERK Akt & Rab?

      The response of pERK and pAKT in neurons is shown in Figure 4C. We have repeatedly tried to detect pRab (both pRab8 and pRab10) in primary neurons, but with no success. In support of the difficulty in detecting pRab in primary neurons, we are not aware of studies in the literature reporting western blot analysis of pRabs in primary neuronal cultures. This is likely due to the high levels of PPM1H in neurons, as discussed in Berndsen et al., eLife, 2019 (PMID: 31663853).

      (2) BDNF promotes LRRK2 interaction with "post-synaptic actin cytoskeleton components"

      Tone down, only one postsynaptic validated - drebrin strong BUT CONTRADICTORY; link between drebrin and LRRK2 activity (pS935 site) supported, consequence (non-specific pRab8) broken, no evidence of change in LRRK2 substrate.

      As suggested, we toned down the paragraph title and changed it as follows: “BDNF stimulates LRRK2 interaction with drebrin, an actin cytoskeletal-associated protein enriched at the postsynapse”. As mentioned above, the pRab8 data have not been included.

      (3) LRRK2 G2019S striatal phosphoproteome is different from WT.

      It is different. Where is link to BDNF or Drebrin?

      We found that drebrin S339 phosphorylation is 3.7-fold higher in G2019S KI mice than in WT mice, suggesting a potential functional connection between LRRK2 and drebrin. However, differences in phosphorylation do not necessarily translate into physiological effects, so further validation is required. To test whether BDNF can induce drebrin S339 phosphorylation in a LRRK2-dependent manner, we plan an in vivo experiment in which BDNF is acutely administered to WT vs G2019S-KI mice +/- MLi-2 to control for LRRK2 dependency. This is an important experiment to establish the mechanistic link, though it will require sufficient time because of the ethical authorization needed to administer BDNF in the mouse brain.

      (4) BDNF signaling is impaired in Lrrk2 knockout neurons

      TrkB changes seem higher in SHSY5Y. pAKT impaired, pERK not convincing. Primary neurons Akt slower but it and Erk mostly intact. MLi-2 did not block pAkt or pErk in WT or KO (higher in latter). Whatever is happening in KO, Mli-2 not really blocking effect in WT. If we are to assume that studying the KO was a means to understand LRRK2 function, the authors data should explain why we care if an effect is absent in LKO, if LRRK2 isn't doing the same job in WT?

      To further support the conclusion that this effect is reproducible and dependent on LRRK2 kinase activity acting upstream of AKT and ERK signaling, we probed the membranes shown in Figure 1H for phosphorylated and total AKT and ERK1/2. Consistent with our hypothesis, the inhibition of LRRK2 with MLi-2 significantly reduced BDNF-induced AKT and ERK1/2 phosphorylation (Author response image 2).

      Author response image 2.

      Western blot (same experiments as in Figure 1) was performed using antibodies against phospho-Thr202/185 ERK1/2, total ERK1/2, phospho-Ser473 AKT, and total AKT. Retinoic acid-differentiated SH-SY5Y cells were stimulated with 100 ng/mL BDNF for 0, 5, 30, or 60 min. MLi-2 was used at 500 nM for 90 min to inhibit LRRK2 kinase activity.

      BDNF increases synaptic puncta in WT not LKO (which start higher?). Is this BDNF increase blocked by LRRK2 inhibition?

      This is an important experiment that we plan to investigate in a future study.

      (5) Postsynaptic structural changes in Lrrk2 knockout neurons

      Golgi impregnation shows some very small spine changes at 1m. Not sustained over age. mRNA changes are very small (10% not even a fold... very weak and should be written as so). Derbrin levels reduced clearly at 1m, but probably also at 4 & 18. Underpowered, disconnected time course from the spine changes.

      Although the differences are small, they have been observed in independent sets of mice through qPCR, histology, WB and TEM, supporting the consistency of the effect. For clarity, we rescaled the qPCR graphs so that the axes start at 0.

      (6) An effect on "spontaneous electrical activity" at Div70

      Weak. What is so special at 70 days that means we should be confident in the differences, or be satisfied that the other time points are legitimately ignored? These are 10-11 cells from 3 cultures assayed at 3 time points but only one is presented (rest in supplement). This should be a 2 (time) or 3 way (+culture RM) ANOVA. As it stands, in WT there is a little - no activity at 50 days, little to no at 70 days, and variable to lots or none at 90. BDNF did nothing at 50 or 90 but may have at 70. In KO low activity stable at 50 & 70, tanks at 90. BDNF would seem to have a similar effect on KO at 90 as WT at 70, but as there are only 7 cells it remains inconclusive. Thus the conclusion that BDNF signalling is broken in LKO is not well supported by the ephys data, nor is the BDNF effect in WT cells (even at the 70 day time point) shown to be susceptible to LRRK2 inhibition.

      We thank the Reviewer for suggesting a more comprehensive analysis of the data. Following this suggestion, we performed separate two-way ANOVAs (DIV × treatment) for WT and LRRK2 KO neurons. This analysis revealed significant main effects of DIV and BDNF treatment in WT neurons, indicating that synaptic activity increases with neuronal maturation and is globally enhanced by BDNF. In contrast, neither DIV nor BDNF treatment reached statistical significance in LRRK2 KO neurons, and no DIV × treatment interaction was observed. These results indicate that BDNF-dependent enhancement of synaptic activity is preserved in WT neurons but is lost in the absence of LRRK2. We have now incorporated this analysis into the main figure and removed the individual DIV50 and DIV90 plots from the supplementary material. We also revised the title of the last paragraph to reflect the outcome of this analysis and toned down our interpretation (page 12).
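
      For readers unfamiliar with the DIV × treatment design, the following is a minimal sketch of a two-way ANOVA with interaction for one genotype. It is not the authors' analysis: it assumes a balanced design (equal numbers of cells per condition, which the study may not have had), it ignores culture as a repeated-measures factor, it reports F statistics without p-values, and the function name `two_way_anova` and all frequency values are hypothetical.

```python
from statistics import mean


def two_way_anova(cells):
    """Balanced two-way ANOVA with interaction.

    `cells` maps (level_a, level_b) -> list of observations. Every
    combination of levels must be present with the same number (>= 2)
    of replicates. Returns F statistics and degrees of freedom.
    """
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))
    for a in a_levels:
        for b in b_levels:
            if n < 2 or len(cells.get((a, b), [])) != n:
                raise ValueError("need a balanced design with >= 2 replicates per cell")

    grand = mean(v for obs in cells.values() for v in obs)
    a_mean = {a: mean(v for b in b_levels for v in cells[(a, b)]) for a in a_levels}
    b_mean = {b: mean(v for a in a_levels for v in cells[(a, b)]) for b in b_levels}
    cell_mean = {key: mean(obs) for key, obs in cells.items()}

    na, nb = len(a_levels), len(b_levels)
    # Sums of squares for the two main effects, the interaction, and the error.
    ss_a = nb * n * sum((a_mean[a] - grand) ** 2 for a in a_levels)
    ss_b = na * n * sum((b_mean[b] - grand) ** 2 for b in b_levels)
    ss_ab = n * sum(
        (cell_mean[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
        for a in a_levels
        for b in b_levels
    )
    ss_err = sum((v - cell_mean[key]) ** 2 for key, obs in cells.items() for v in obs)

    df_a, df_b = na - 1, nb - 1
    df_ab, df_err = df_a * df_b, na * nb * (n - 1)
    ms_err = ss_err / df_err
    return {
        "F_a": (ss_a / df_a) / ms_err,
        "F_b": (ss_b / df_b) / ms_err,
        "F_interaction": (ss_ab / df_ab) / ms_err,
        "df": (df_a, df_b, df_ab, df_err),
    }


# Hypothetical mEPSC frequencies (Hz): factor a = DIV, factor b = treatment.
wt_neurons = {
    (50, "ctrl"): [1.0, 1.2], (50, "BDNF"): [1.1, 1.3],
    (70, "ctrl"): [1.5, 1.7], (70, "BDNF"): [2.4, 2.6],
    (90, "ctrl"): [2.0, 2.2], (90, "BDNF"): [2.1, 2.3],
}
print(two_way_anova(wt_neurons))
```

      Running the same function separately on a WT table and a KO table mirrors the "separate two-way ANOVAs" described above; a single three-way model with genotype as a factor would be needed to test the genotype difference directly.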

      Furthermore, we have added a paragraph to the Discussion section highlighting the limitations of this study. These include the variability observed in protein content and phosphorylation analyses by western blot, as well as the necessity to confirm the electrophysiological findings in larger datasets, including in dopaminergic neurons.

      Reviewer #2 (Public review):

      The data show that BDNF regulates the PD-associated kinase LRRK2, they place LRRK2 within well-described BDNF pathways biochemically, and they show that LRRK2 can play a role mediating BDNF-driven synaptic outcomes at excitatory synapses. The chief strength is that the data provide a potential focal point for multiple observations that have been made across many labs. The findings will be of broad interest because LRRK2 has emerged as a protein that is likely to be part of Parkinson's pathology and its normal and pathological actions remain poorly understood.

      We thank this Reviewer for appreciating our work and acknowledging that our findings will be of broad interest.

      A major strength of the study is the multiple approaches that were used (biochemistry, bioinformatics, light and electron microscopy and electrophysiology) across different experimental models (cells, primary neurons, human neurons, mice) to identify and examine the impact of BDNF on LRRK2 signaling and functions. Noteworthy is also the employment of LRRK2KO preparations to validate outcomes and to place LRRK2 actions up or downstream.

      We thank the Reviewer for this comment.

      The demonstration that LRRK2 and drebrin interact directly is important and suggests that other interacting proteins identified biochemically and bioinformatically in the paper will be important to pursue.

      We agree with this statement.

      Some data from different models do not fit well with one another (like mouse and human neurons). This is likely due to inherent differences in the preparations. Since different experiments were carried out on the different preps, however, it is not possible to cross compare. The lack of this information is viewed more as an open question than a cause for concern.

      We thank the Reviewer for raising this point. In response, we have added a new section to the Discussion explicitly addressing the limitations of the study.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      MLi2 pretreatment experiment is nice. They state in legends BDNF treatment prior to MLi2, they mean prior MLI2 treatment. Or MLi2 pretreatment, prior to BDNF. However, this experiment is hard to interpret as it has no control (non BDNF treated) time course following MLi2, could this be (at least in part) a rebound effect produced by relief of inhibition? This should be discussed if not addressed directly by experiments.

      The non-BDNF-treated group represents the 0 time point; we have now specified this in the figure legend. We exclude a rephosphorylation rebound after MLi-2 relief because pRabs increase significantly at 5 minutes, far exceeding control levels. This observation gives us confidence that the effect is BDNF-dependent.

      (1) "As suggested, we performed qPCR and observed that 1 month-old KO midbrain and cortex express lower levels of Dbn1 as compared to WT brains (Figure 5G). This result is in agreement with the western blot data (Figure 5H)."

      There is no Fig 5H? 5F? In 5F effect sizes are exaggerated with axes not crossing zero. There is a 10% reduction in mRNA (normally >1 or 2 fold changes would be considered biologically important?). This isn't much change, and should be presented as such. 1 month old WB in G are much more convincing of a reduction of drebrin levels, but what brain region is this from?

      We apologize for the error in the rebuttal: we incorrectly referred to Figure 5G where the correct panel is 5F, and what we called 5H is in fact 5G. We have checked the labeling in the manuscript text, and it is correct.

      Following the Reviewer’s important suggestion, we rescaled all plots to start at zero. Although some differences appear relatively modest, they are statistically significant. Importantly, all brains used for qPCR analyses (N = 6 per genotype) were obtained from independent mice. In addition, independent cohorts of mice were used for spine morphology analyses (N = 3 per genotype), TEM analyses (N = 4), and western blot experiments (N = 3). Thus, the overall sample size across approaches is substantial.

      WB are from whole brain, now indicated in the figure legend.

      All blots are underpowered, especially given what appears to be an age dependent loss of drebrin in both genotypes beset by high variability

      (i) Western blots looking at pSer935 and pRab8 (pan Rab) in Dbn1 WT and knockout brains.

      "As reported and quantified in Figure 2I, we observed a significant decrease in pSer935 and a trend decrease in pRab8 in Dbn1 KO brains. This finding supports the notion that Drebrin forms a complex with LRRK2 that is important for its activity, e.g. upon BDNF stimulation."

      Non-sig data in Fig2I/H and especially Fig5G are important data but hard to interpret because the experiment is underpowered. I am surprised the authors designed studies on an n=3 western blot.

      For fig 2 this is a problem if they wish to correlate LRRK2 activity with drebrin. The KO have a clear 50% decrease in LRRK2 pS935 but no change to pRab8(pan).

      As discussed above, we increased the sample size by 7 additional mice per genotype (a total of 10 brains analyzed per genotype).

      Why not look at Rab10, and certainly redo with a higher n than three. Of special confusion is the observation that the WT with the highest drebrin levels, is the animal with the lowest pS935 & pRab

      As discussed above, neither pRab8 nor pRab10 returned convincing results in the new round of western blots. We acknowledge that future experiments should explore the phosphorylation levels of Rab12, which is emerging as a more reliable readout of LRRK2 kinase activity in the brain.

      (ii) "Reverse co-immunoprecipitation of YFP-drebrin full-length, N-terminal domain (1-256 aa) and C-terminal domain (256-649 aa) (plasmids kindly received from Professor Phillip R. Gordon-Weeks, Worth et al., J Cell Biol, 2013) with Flag-LRRK2 co-expressed in HEK293T cells. As shown in supplementary Fig. S2C, we confirm that YFP-drebrin binds LRRK2, with the N terminal region of drebrin appearing to be the major contributor to this interaction"

      CoIP with drebrin (and fragments) is very convincing.

      We thank the Reviewer for this comment.

      Ephys data, presentation, and response to review is weak.

      We reanalyzed the data as suggested by the Reviewer and reviewed the text and interpretation.

      Reviewer #2 (Recommendations for the authors):

      p. 12, last paragraph. "sealing" should be "ceiling"

      We corrected the misspelled word

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This important study sought to investigate the role that early childhood malaria exposure plays in the development of antibody responses to unrelated pathogens and vaccine-derived antigens in Kenyan children. In this natural experiment, the authors compare antibody levels among children who have been exposed to different levels of malaria transmission by using protein microarray technology. Although the findings are of importance, the evidence remains incomplete, and the analysis would benefit from a more in-depth evaluation of potential confounders. With the appropriate analysis, the findings will be of great interest for global health, immunology, and vaccine development.

      We thank the editors for highlighting the need for a more comprehensive evaluation of potential confounding. We agree that this is a critical aspect of the study and have now undertaken additional analyses to address this directly.

      The original longitudinal cohort was designed to investigate the acquisition of naturally acquired immunity to malaria and did not include systematic collection of anthropometric/nutritional, environmental or socioeconomic data, precluding direct adjustment for these factors within the primary dataset. However, to assess whether there were population-level differences in these factors, we leveraged contemporaneous hospital-based surveillance data from the same geographic regions, which includes measurements of anthropometry and nutritional status (muac, weight-for-age, and height-for-age) and detailed infection diagnostics.

      Using this independent dataset, we fitted mixed-effects regression models adjusting for age, calendar year, and concurrent infections (RSV, parainfluenza, influenza A, human metapneumovirus, OC43). For all three anthropometric indices, we found no evidence of systematic differences between children from Junju and Ngerenya. Adjusted differences were small and centred around zero (muac: −0.12, 95% CI −0.38 to 0.15, weight-for-age: −0.05, −0.28 to 0.19, height-for-age: 0.08, −0.17 to 0.33), with no consistent directional effect.

      As the longitudinal cohort was randomly selected from these underlying populations, these findings suggest that the groups were broadly comparable with respect to nutritional status and that there were no differences in their exposure to the infections included in the analysis. We have incorporated these analyses into the revised manuscript (adding a new figure focussed on this analysis, fig. 6, and updating the statistical analysis and discussion sections), and believe they substantially strengthen the evidence by addressing a key source of potential confounding.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study shows that childhood malaria can weaken the antibody response to other vaccines and infections. This suggests that early exposure to P. falciparum may have a long-lasting effect on immunity, with implications for vaccine efficacy in endemic areas.

      Strengths:

      This study stands out for its longitudinal design, the use of robust immunological techniques, and the comparison between areas with different levels of malaria exposure. Its findings reveal that early malaria can weaken the response to childhood vaccines, with important implications for public health in endemic regions.

      We thank the reviewer for this comment

      Weaknesses:

      One of the study's main limitations is the lack of functional data confirming the clinical impact of the low antibody levels. Furthermore, although multiple immune responses were measured, other important components, such as cellular immunity, were not assessed. Furthermore, the results may not be generalizable to other regions.

      We thank the reviewer for this important comment and agree that the absence of functional immunological assays is a limitation of the current study. Our analysis was designed to determine whether early-life malaria exposure is associated with durable alterations in antibody responses to unrelated pathogens and vaccine antigens, rather than to establish the downstream functional consequences of these differences. As such, the study is able to demonstrate a broad and persistent attenuation of humoral responses but cannot directly determine whether the lower antibody levels observed translate into reduced neutralising capacity or diminished protection at the individual level.

      We have revised the manuscript to make this distinction more explicit. In the revised discussion, we now state that although reduced antibody titres to vaccine-preventable pathogens may have implications for long-term protection, the clinical significance of these differences remains to be established in future studies incorporating functional assays and clinical outcome data.

      Reviewer #2 (Public review):

      Summary:

      The authors investigated whether early-life malaria exposure has long-term effects on immune responses to unrelated antigens. They leveraged a natural experiment in coastal Kenya where two adjacent communities (Junju and Ngerenya) experienced divergent malaria transmission patterns after 2004. Using 15 years of longitudinal data from 123 children with weekly malaria surveillance and annual serological sampling, they measured antibody responses to multiple pathogens using a protein microarray technology and ELISA.

      Strengths:

      (1) Extensive longitudinal data collection with weekly malaria surveillance, enabling precise exposure classification.

      (2) Use of a natural experiment design that allows for causal inference about malaria's immunological effects.

      (3) Broad panel of antigens tested, demonstrating generalized rather than antigen-specific effects.

      (4) Within-cohort analysis in Ngerenya controls for geographic and environmental factors.

      (5) Validation of key findings using both serologic microarray and ELISA.

      (6) Important public health implications for vaccine strategies in malaria-endemic regions.

      We thank the reviewer for these comments

      Weaknesses:

      (1) Lack of participants' characteristics (socio-economic, nutritional, physical).

      We thank the reviewer for this important comment. We have now included a detailed summary of participant characteristics in Table 1 to provide context for the study population. This includes key demographic and longitudinal variables stratified by cohort (Junju and Ngerenya), including sex distribution, age at study entry and exit, duration of follow-up, number of visits per participant, and total serum samples analysed. Detailed data on socio-economic status, nutritional status, and other environmental or physical characteristics were not consistently available across all participants and time points, and therefore could not be included. This has now been explicitly stated as a limitation in the discussion.

      (2) Somewhat limited sample size (longitudinal analysis of 123 children total), with further subdivision reducing statistical power for some analyses.

      We thank the reviewer for this important observation. The study is based on an intensively followed cohort with weekly malaria surveillance and repeated serological measurements throughout childhood, allowing detailed characterisation of early-life exposure and subsequent immune trajectories. This depth of longitudinal sampling provides resolution that is not achievable in larger cross-sectional studies. We acknowledge that subdivision of the cohort reduces statistical power for some analyses. Nevertheless, the key findings were consistent in several independent comparisons, including a reduction in antibody levels for a broad panel of antigens in the malaria-endemic setting, within-cohort analyses in Ngerenya that replicated this observation, and the confirmation on the ELISA platform of results generated on the protein microarray. The consistency of these findings across analytical approaches and measurement platforms reduces the likelihood that the observed effects are driven by small-sample variability. We have clarified this point in the revised discussion to emphasise that the strength of the study lies in the depth and longitudinal resolution of the data rather than the absolute sample size.

      (3) Potential confounding by unmeasured socioeconomic, nutritional, or environmental factors between communities.

      We thank the reviewer for this important point and agree that residual confounding between communities must be considered. As outlined in response to the editorial assessment, we have undertaken additional analyses using contemporaneous population-level data from the same regions and found no evidence of systematic differences in anthropometric indices between children from Junju and Ngerenya after accounting for age, calendar year, and concurrent infections, with effect estimates small and crossing zero. In addition, the within-Ngerenya analysis provides an internal comparison within a shared geographic and environmental setting, reducing the likelihood that unmeasured socioeconomic or environmental differences between communities account for the observed associations. The new analyses suggest that major population-level differences in nutritional status or infection burden are unlikely to explain the observed patterns. We have clarified this point in the revised discussion and explicitly acknowledge the possibility of residual confounding.

      (4) Lack of ability to determine the direction of the associations found between malaria exposure and other IgG levels to unrelated pathogens.

      We agree that, as an observational study, our analysis cannot definitively establish the direction of the association between malaria exposure and antibody responses to unrelated antigens. However, several features of the study design strengthen the inference that early-life malaria exposure contributes to the observed differences. First, malaria exposure was characterised prospectively through intensive weekly surveillance, allowing precise temporal definition of exposure in early childhood. Second, within the Ngerenya cohort, where children were exposed to different levels of malaria due to a rapid decline in transmission, those with even limited early-life exposure exhibited lower antibody responses at 10 years of age than malaria-naïve peers, despite residing in the same geographic and environmental context. In addition, we now show that these differences are not confined to a single timepoint but are evident across the full longitudinal follow-up after adjustment for age and repeated measurements. While we cannot exclude the possibility of residual confounding or bidirectional relationships, the convergence of evidence from the natural experiment design, within-cohort contrasts, and age-adjusted longitudinal analyses supports early-life malaria exposure as a key contributor to long-term differences in antibody responses. We have clarified this in the discussion.

      (5) Despite good longitudinal data, the main analysis was conducted as a cross-sectional analysis at age 10 for many comparisons, which limits the understanding of temporal dynamics.

      We thank the reviewer for highlighting this point. While age 10 was initially used as a standardised reference point for cross-sectional comparisons, the underlying dataset is longitudinal, with repeated antibody measurements across childhood. To address this more directly, we have now complemented these analyses with antigen-specific mixed-effects regression models incorporating all available longitudinal data, with adjustment for age and a random intercept for repeated measurements within individuals. These models demonstrate that the differences between cohorts are not confined to the age-10 cross-section but are evident in an age-adjusted longitudinal framework for multiple antigens. We have retained the age-10 comparisons for reference, but the primary inference is now based on the longitudinal mixed-effects analyses. These changes are reflected in the revised results and statistical analysis sections. We thank the reviewer for this astute point, which we think has substantially improved the manuscript.
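
      For readers who want the structure of such a model spelled out, a random-intercept specification of the kind described above can be written as follows. This is an illustrative sketch only: the symbols, the linear age term, and the normality assumptions are ours and are not taken from the manuscript, which may use a different age parameterisation or response transformation.

```latex
% y_{ij}: antibody response to a given antigen for child i at visit j (assumed notation)
% cohort_i: indicator for the child's cohort (e.g. Junju vs Ngerenya)
% u_i: child-specific random intercept capturing repeated measurements
\begin{aligned}
y_{ij} &= \beta_0 + \beta_1\,\mathrm{cohort}_i + \beta_2\,\mathrm{age}_{ij} + u_i + \varepsilon_{ij},\\
u_i &\sim \mathcal{N}(0, \sigma_u^2), \qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2).
\end{aligned}
```

      Under this formulation, $\beta_1$ is the average age-adjusted difference between cohorts across the whole follow-up period, which is the quantity the longitudinal inference rests on.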

      (6) Statistical analysis is limited to univariable comparisons without consideration for confounders or adjusting for multiple comparisons.

      We agree that the original analyses relied primarily on univariable comparisons. In the revised manuscript, we have extended the analytical framework to include mixed-effects regression models that account for age effects and repeated measurements within individuals. These models estimate the average age-adjusted difference in antibody responses between cohorts across the full follow-up period. We have also applied false discovery rate (FDR) correction to account for multiple antigen testing. For multiple antigens, the direction and magnitude of cohort differences remain consistent under this approach, strengthening the robustness of the findings beyond the original univariable comparisons. These analyses have been incorporated into the revised results and statistical analysis sections.
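
      As an illustration of the multiple-testing step, the sketch below applies a false discovery rate adjustment to a set of per-antigen p-values. It assumes the Benjamini-Hochberg procedure, which the response does not specify, and the p-values shown are hypothetical placeholders rather than results from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Assumption: BH is used here as a common FDR procedure; the response
    only states that an FDR correction was applied.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # stay monotone (never larger than those of higher-ranked tests).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-antigen p-values, for illustration only.
example_p = [0.001, 0.008, 0.039, 0.041, 0.60]
example_q = benjamini_hochberg(example_p)
significant = [q <= 0.05 for q in example_q]
```

      Reporting both the unadjusted and the adjusted values, as the response describes, lets readers see which antigen-level differences survive the correction.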

      (7) No mechanistic understanding of how early malaria exposure creates lasting immunosuppression.

      We agree that this study does not directly resolve the mechanistic basis underlying the observed long-term differences in antibody responses. The primary aim of this work was to identify and characterise durable alterations in humoral immune profiles associated with early-life malaria exposure, rather than to define the cellular or molecular pathways involved. However, our findings are consistent with a growing body of experimental and clinical literature suggesting that malaria infection can induce sustained perturbations in B cell and T cell compartments, including the expansion of atypical memory B cells, altered germinal centre responses, and increased regulatory immune activity. These mechanisms have been proposed to impair the generation and maintenance of effective humoral immunity. In the revised discussion, we have clarified that the mechanistic basis of this phenomenon remains to be fully defined and have expanded the discussion of plausible pathways in light of existing literature. We now explicitly position our findings as providing population-level evidence of a durable immunological phenotype that warrants further mechanistic investigation.

      (8) No understanding of the clinical Implications of the reduced IgG levels observed in the area with high malaria exposure.

      We agree that this study does not directly establish the clinical consequences of the reduced antibody levels observed in malaria-exposed children. The primary objective of this study was to characterise long-term differences in humoral immune profiles associated with early-life malaria exposure, rather than to assess downstream clinical outcomes or functional antibody activity. We have clarified this limitation in the revised discussion. Nevertheless, the breadth and consistency of the observed differences for multiple vaccine-preventable and infectious antigens raise the possibility that early-life malaria exposure may have implications for long-term immune protection. We now emphasise in the revised discussion that future studies incorporating functional assays and clinical outcome data will be required to determine whether these serological differences translate into altered susceptibility to infection or reduced vaccine effectiveness.

      Assessment of Claims:

      The data appear to support the authors' primary claims, but the strength of the evidence is limited, and the results should be interpreted with caution. Together with the currently available evidence of P. falciparum's impact on the host's immune function, this natural experiment design provides further evidence for a relationship between early malaria exposure and reduced antibody responses. The within-Ngerenya analysis controls for geographic factors and thus enhances the quality of the evidence, however, it still fails to account for the physical, nutritional, and socio-economic factors that may have driven the observed changes. Additionally, the mechanism underlying this effect remains unclear, and the clinical significance of reduced antibody levels is not established.

      We thank the reviewer for this assessment and for recognising the strengths of the natural experiment design and within-cohort analyses. We agree that, as an observational study, our findings should be interpreted appropriately. In the revised manuscript, we have undertaken additional analyses and clarifications to strengthen the evidential basis of our conclusions and to address the points raised. To address potential confounding by nutritional and related factors, we analysed contemporaneous hospital-based surveillance data from the same geographic populations since nutritional and socioeconomic variables were not consistently collected during the course of longitudinal follow up. For three independent anthropometric indices of nutrition status (muac, weight-for-age, and height-for-age), we found no evidence of systematic differences between children from Junju and Ngerenya after adjustment for age, calendar year, and concurrent infections. As the longitudinal cohort subjects were randomly drawn from these populations, these findings suggest that the two groups were broadly comparable with respect to early-life growth and nutritional status.

      We agree that the mechanistic basis of the observed differences is not resolved in this observational study. We have clarified this point in the revised discussion and expanded our consideration of plausible biological pathways based on existing literature, including perturbations in B cell and T cell compartments. Similarly, we now explicitly state that the clinical implications of reduced antibody levels remain to be determined and will require studies incorporating functional assays and clinical outcomes. We believe these revisions strengthen the manuscript by providing a more comprehensive interpretation of the data.

      Impact and Utility:

      This work has fundamental implications for understanding vaccine effectiveness in malaria-endemic regions and may contribute to informing vaccination strategies. The findings, if strengthened, would suggest that children in areas of high malaria transmission may require modified immunization approaches. The dataset provides a valuable resource for future studies of malaria's immunological legacy.

      We thank the reviewer for this comment

      Context:

      This study builds on prior work showing acute immunosuppressive effects of malaria but uniquely attempts to demonstrate the durability of these effects years after exposure. The natural experiment design addresses limitations of previous observational studies by providing a more controlled comparison.

      We thank the reviewer for this comment

      Recommendations for the authors:

      Reviewing Editor Comments:

      We suggest that further analyses of potential confounders such as anthropometric indices, socioeconomic status, and comorbidities would render the evidence more robust.

      We thank the Reviewing Editor for this important suggestion. We agree that careful consideration of potential confounding factors is critical to the interpretation of these findings, and have undertaken additional analyses to address this.

      Because anthropometric and related socioeconomic measurements were not collected systematically within the original longitudinal malaria cohort, we assessed potential population-level differences using hospital-based surveillance data from the same geographic regions. This dataset includes measurements of anthropometry (mid-upper arm circumference, weight-for-age, and height-for-age) as well as detailed infection diagnostics in childhood. Using these data, we fitted regression models adjusting for age, calendar year, and concurrent, clinically diagnosed infections. For all three anthropometric indices, we found no evidence of systematic differences between children from Junju and Ngerenya, with effect estimates small and crossing zero (fig. 6). As the longitudinal cohorts were randomly selected from these populations, these findings suggest that the groups were broadly comparable with respect to nutritional status and infection exposure. With respect to socioeconomic status and comorbidities, detailed individual-level data were not available within the longitudinal cohort. However, the within-Ngerenya analysis, where children with differing early-life malaria exposure were compared within the same geographic and environmental setting, provides a complementary control for these factors. We have incorporated these additional analyses and clarifications into the statistical analysis and discussion sections of the revised manuscript and believe they strengthen the robustness of the findings by addressing key sources of potential confounding.

      Reviewer #1 (Recommendations for the authors):

      The manuscript is well written, with clear and informative figures that effectively support the findings.

      We thank the reviewer for this comment

      Suggestions:

      (1) Although the study well controlled for malaria exposure, other environmental or infectious factors that influence immunity could be considered:

      Nutritional status in childhood (malnutrition impacts immune response), co-infections (helminths, respiratory viruses), socioeconomic differences, or differences in access to health services, even minimal, between Junju and Ngerenya.

      We thank the reviewer for highlighting the potential influence of environmental, infectious, and socioeconomic factors on immune responses. We agree that these are important considerations in the interpretation of observational data. To address nutritional status and concurrent infectious exposures, we analysed contemporaneous hospital-based surveillance data from the same geographic populations. This dataset includes measurements of anthropometric indices (mid-upper arm circumference, weight-for-age, and height-for-age) alongside detailed clinical diagnostics for common childhood infections. Using regression models adjusting for age, calendar year, and concurrent infections, we found no evidence of systematic differences in anthropometric profiles between children from Junju and Ngerenya (fig. 6). These findings suggest that the populations from which the longitudinal cohorts were randomly selected were comparable with regard to early-life growth and nutritional status. We agree that we cannot fully exclude the influence of unmeasured factors such as helminth infections, socioeconomic variation, or subtle differences in healthcare access. However, the within-Ngerenya analysis, where children with differing early-life malaria exposure were compared within the same geographic, environmental, and healthcare setting, provides an internal control for many of these factors. The persistence of similar patterns within this setting supports malaria exposure as a key contributor to the observed differences. We have clarified these considerations in the revised discussion and believe that the additional analyses and within-cohort comparisons strengthen the robustness of our conclusions while acknowledging the limitations inherent to observational studies.

      (2) Measurement of other immunological markers:

      In addition to IgG, include: B cell subpopulations (naive, memory, atypical), cytokine levels (IL-10, IFN-γ) to characterize the immunological microenvironment.

      You could include these recommendations in the text for future studies.

      We thank the reviewer for this thoughtful suggestion. We agree that detailed immunophenotyping, including characterisation of B cell subpopulations and cytokine profiles, would provide important insight into the mechanisms underlying the observed differences in antibody responses. In the revised manuscript, we have expanded the discussion to highlight these important avenues for future work, including the potential role of altered B cell subsets and of regulatory or inflammatory cytokine environments in shaping long-term humoral responses.

      Reviewer #2 (Recommendations for the authors):

      The manuscript is well-written.

      We thank the reviewer for this comment

      (1) Methodological Clarifications:

      Do the authors have any information regarding the characteristics of these children that could be of use in understanding their immune responses better? (e.g., weight, height, BMI, socioeconomic status, HB level, access to health care, etc.).

      We thank the reviewer for highlighting the importance of participant characteristics in interpreting immune responses. Anthropometric and related clinical measures were not collected systematically within the original longitudinal malaria cohort, as the study was designed to investigate the acquisition of naturally acquired immunity to malaria.

      To address this, we analysed contemporaneous hospital-based surveillance data from the same geographic populations, which include measurements of anthropometric indices (mid-upper arm circumference, weight-for-age, and height-for-age) alongside detailed infection diagnostics. Using regression models adjusting for age, calendar year, and concurrent infections, we found no evidence of systematic differences in anthropometric profiles between children from Junju and Ngerenya (fig. 6). Detailed individual-level data on socioeconomic status, haemoglobin levels, and healthcare access were not available within the longitudinal cohort, precluding direct adjustment for these factors. However, the within-Ngerenya analysis, where children with differing early-life malaria exposure were compared within the same geographic and healthcare setting, provides an internal control for many of these factors. These considerations are now clarified in the revised discussion.

      Could the authors provide more detailed statistical analysis, including power calculations and multiple comparison corrections?

      In the revised manuscript, we have extended the statistical analysis and now include antigen-specific mixed-effects regression models incorporating all available longitudinal measurements, which are comprehensively described in the statistical analysis section. We have also applied false discovery rate (FDR) correction to account for multiple testing across antigens, and report both unadjusted and FDR-adjusted significance in the revised results. With respect to power, the sample size was determined by the number of children in the long-term surveillance cohorts who met the inclusion criteria, namely the availability of a sufficient number of longitudinal samples. We have clarified this in the revised manuscript.

      Clarify the criteria for selecting the 123-child subset from the larger surveillance cohorts.

      We thank the reviewer for this comment. The 123 children included in this analysis were selected from the larger surveillance cohorts based on the availability of sufficiently dense longitudinal serum sampling as described above. Specifically, children were required to have at least eight longitudinal samples available in the archive, enabling robust assessment of within-individual antibody trends over time. This criterion was applied to ensure adequate temporal resolution to examine the long-term stability of malaria-associated effects on antibody responses. Children with fewer available samples were therefore excluded, as limited sampling would not allow reliable characterisation of longitudinal patterns. We have clarified these inclusion criteria in the revised manuscript.

      (2) Additional Analyses and Data Presentation:

      The authors could consider dose-response analyses relating malaria episode frequency/timing to degree of immunosuppression or even AMA-1 IgG levels and degree of immunosuppression. How do they associate over time?

      We thank the reviewer for this suggestion. To address this, we examined the relationship between malaria exposure (using cumulative febrile malaria episode count derived from longitudinal surveillance data) and the magnitude of heterologous antibody responses. In mixed-effects models adjusting for age and repeated antibody measurements, higher malaria episode burden was associated with lower antibody responses against multiple antigens (fig. 7).
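
      To make the exposure variable concrete, the sketch below shows one way a cumulative febrile malaria episode count could be derived for each serum sample from surveillance records. The function name, the use of age in years as the time scale, and the choice to count episodes occurring on the sampling date are all assumptions made for illustration; the response does not describe the exact derivation.

```python
from bisect import bisect_right


def cumulative_episodes(episode_ages, sample_ages):
    """Count malaria episodes up to and including each sampling age.

    episode_ages: ages (years) at which febrile malaria episodes were recorded
    sample_ages:  ages (years) at which serum samples were taken
    Both inputs are hypothetical per-child records.
    """
    episodes = sorted(episode_ages)
    # bisect_right gives the number of episodes with age <= sampling age.
    return [bisect_right(episodes, age) for age in sample_ages]


# Hypothetical child: four episodes, four annual samples.
burden = cumulative_episodes([0.8, 1.5, 1.6, 4.2], [1.0, 2.0, 5.0, 10.0])
```

      A count of this kind can then enter the mixed-effects model as a time-varying covariate alongside age.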

      Analyze whether the effects vary by specific age at malaria exposure.

      We agree that age at exposure is an important consideration. We have now assessed how the relationship between malaria burden and antibody responses varies with age by including age as a non-linear term and modelling interactions between malaria exposure and age as described above. These analyses did not suggest substantial heterogeneity in the association over age, and therefore we have retained the simpler presentation for clarity.

      Provide correlation analyses between different antibody responses to assess whether suppression is generalized.

      We have addressed this by modelling responses jointly across a panel of heterologous antigens and by examining antigen-specific associations. The direction of effect was consistent for the majority of antigens, with no evidence of opposing trends, supporting a broad rather than antigen-specific effect.

      The authors could consider moving Figures 2a and b to the supplementary material.

      We thank the reviewer for this suggestion. We carefully considered whether panels 2a and 2b could be moved to the supplementary material. However, we have retained them in the main text because they provide a simple, intuitive illustration of how AMA1 antibody responses track with malaria exposure at the individual level, complementing the population-level analysis shown in fig. 2c. We feel that this helps establish the biological validity of the microarray platform in a way that is immediately interpretable to the reader, and therefore supports the interpretation of subsequent analyses.

      The authors could consider replacing Figures 3a and b with IgG levels from ALL vaccinated children and ALL non-vaccinated children.

      We thank the reviewer for this suggestion. We would like to retain these figures for the same reasons that have been articulated above for figures 2a and b.

      (3) Discussion Enhancements:

      The authors should consider expanding the discussion to address the limitations of the data more thoroughly, particularly regarding the potential differences between cohorts that could have contributed to the results.

      We have expanded the discussion to more explicitly address potential differences between cohorts that could contribute to the observed findings, including nutritional, socioeconomic, and environmental factors.

      The discussion needs to acknowledge the lack of directionality for the associations observed. As stated above, although I agree in general terms with the observations that the authors have made, it is not possible to distinguish between a suppressive effect of malaria on immune responses to infection-derived pathogens or a protective effect of malaria that leads to less exposure to infection-derived pathogens (and consequently lower IgG levels). The mechanisms behind these could include things like different health-seeking behaviors or social interactions from kids who have malaria versus those who don't, for example.

      We agree that, as an observational study, we cannot definitively establish the direction of the association between malaria exposure and antibody responses to unrelated antigens. We have now clarified this limitation explicitly in the discussion. We acknowledge the alternative interpretations raised by the reviewer, including the possibility that differences in exposure to other pathogens, potentially driven by behavioural, environmental or healthcare-related factors, could contribute to the observed patterns. At the same time, we note that the natural experiment design, prospective malaria exposure classification, and within-Ngerenya comparisons support early-life malaria exposure as a key contributing factor. We have revised the discussion to reflect this balance.

      Extend the discussion of potential biological mechanisms underlying durable immunosuppression.

      We thank the reviewer for this suggestion. We have expanded the discussion to more fully consider potential biological mechanisms that could underlie the observed long-term differences in antibody responses. Specifically, we now discuss evidence from prior studies indicating that malaria infection can induce sustained alterations in B cell and T cell compartments, including expansion of atypical memory B cells, disruption of germinal centre responses, and increased regulatory immune activity. We position our findings as providing population-level evidence of a durable immunological phenotype, while noting that targeted mechanistic studies will be required to define the underlying pathways.

      Extend the discussion around the clinical implications of the observed antibody level differences.

      In the revised discussion, we highlight that studies incorporating functional assays and clinical outcome data will be required to determine whether these serological differences translate into altered susceptibility to infection or reduced vaccine effectiveness.

      (4) Technical Issues:

      Could the authors please:

      (1) Clarify microarray data processing and quality control procedures.

      We thank the reviewer for this request. We have expanded the methods section to provide additional detail on microarray data processing and quality control procedures.

      (2) Provide information on inter-assay variability and batch effects.

      We have expanded the methods section to clarify how these were evaluated and addressed. Inter-assay variability was monitored using pooled adult serum included on every slide as a consistent positive control. This allowed us to assess slide-to-slide consistency in signal detection across the full antigen panel. In addition, fluorophore-conjugated IgG and IgA controls were printed directly onto each miniarray to confirm scanner performance independently of antigen–antibody interactions. At the sample level, each specimen was assayed on two independent miniarrays per slide, generating four spatially separated replicate measurements per antigen. Technical variability was quantified using the coefficient of variation (CV), and measurements with CV >20% were excluded from downstream analyses.
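
      The CV-based exclusion rule described above can be illustrated with a minimal sketch. This is not the authors' pipeline: the antigen names and intensities are hypothetical, and the use of the sample standard deviation and of the replicate mean as the summary value are assumptions. Only the 20% threshold and the four replicates per antigen come from the response.

```python
from statistics import mean, stdev

CV_THRESHOLD_PERCENT = 20.0  # exclusion threshold stated in the response


def coefficient_of_variation(replicates):
    """Return the CV (%) of replicate intensities: 100 * SD / mean."""
    m = mean(replicates)
    if m == 0:
        return float("inf")  # assumption: a zero-mean signal is treated as unusable
    return 100.0 * stdev(replicates) / abs(m)  # assumption: sample SD (n - 1)


def filter_by_cv(measurements, threshold=CV_THRESHOLD_PERCENT):
    """Keep antigens whose replicate CV is at or below the threshold.

    Returns {antigen: mean intensity} for the retained antigens.
    """
    return {
        antigen: mean(reps)
        for antigen, reps in measurements.items()
        if coefficient_of_variation(reps) <= threshold
    }


# Hypothetical intensities: four replicate spots per antigen.
example = {
    "antigen_A": [1000.0, 1050.0, 980.0, 1020.0],  # tight replicates
    "antigen_B": [200.0, 900.0, 450.0, 1200.0],    # noisy replicates
}
kept = filter_by_cv(example)
```

      In this toy input, antigen_A is retained and antigen_B is dropped because its replicates disagree far beyond the 20% cut-off.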

      (3) Include details on how missing data were handled in longitudinal analyses.

      We thank the reviewer for highlighting this point. We have added clarification in the statistical analysis section describing how missing data were handled. Specifically, mixed-effects models were used, which accommodate unbalanced longitudinal data without requiring imputation, allowing all available observations to contribute to the analysis.

      (4) Include details of the parameters of the LOWESS analysis shown in Figure 1.

      We have expanded the Figure 1 legend to include the parameters used for the loess smoothing, including the smoothing span.

      (5) Include details of the samples used for Figure 3d (Negative and Pooled Adult Serum).

      We have clarified in the methods the nature and purpose of the samples used in Figure 3d. The negative control consisted of phosphate-buffered saline applied to a full miniarray in place of serum, allowing assessment of background and non-specific signal in the absence of antibody binding. The pooled adult serum comprised a composite of sera from multiple healthy adults from the same setting and was included as a positive reference sample, expected to contain a broad repertoire of antigen-specific antibodies. These controls were included on each slide to enable interpretation of assay performance, with the negative control defining baseline signal and the pooled adult serum providing a consistent reference for antigen recognition across the microarray.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We sincerely thank the reviewers for their thoughtful and constructive comments. We fully agree that when two independent variables (genotype and age) are being evaluated, the statistical analysis must appropriately account for both factors and their potential interaction. We appreciate the reviewers’ guidance in strengthening the statistical rigor of our study.

      In response to this concern, we have carefully reanalyzed the relevant datasets using two-way ANOVA to properly assess the effects of genotype, age, and their interaction. The manuscript, figures, and figure legends have been revised accordingly. Specifically:

      Figure 1:

      The quantification of p16 expression in Fig. 1F has been reanalyzed using two-way ANOVA. The figure has been replotted, and the corresponding legend has been updated to reflect the revised statistical approach.

      Figure 2:

      The quantification of AUC in Fig. 2F has been reanalyzed using two-way ANOVA. The figure and legend have been updated accordingly.

      Figure 3:

      The quantification of F4/80 in Fig. 3C and 3D has been reanalyzed using two-way ANOVA. The figures and corresponding legends have been revised to reflect this updated analysis.

      Public Reviews:

      Reviewer #1 (Public review):

      Sebag et al. addressed the role of ADH5 in BAT in the development of aging and metabolic disarrangements associated with it. This is a follow-up study after the authors' demonstration of the role of BAT ADH5 in glucose homeostasis, obesity, and cold tolerance. By ablating ADH5 specifically in brown adipocytes or pharmacologically modulating ADH5 through activation of its transcription factor, the authors conclude that preservation of BAT function is crucial for healthy aging and ADH5 is causally involved in this process. The topic is appealing given the rise in the aging population and the unclear role of BAT function in this process. Overall, the study uses several techniques, is easy to follow, and addresses several physiological and molecular manifestations of aging. However, the study lacks an appropriate statistical analysis, which severely affects the conclusions of the work. Therefore, interpretation of the findings is limited and must be done with caution.

      We sincerely thank the reviewer for their thoughtful and constructive comments. We fully agree that when two independent variables (genotype and age) are being evaluated, the statistical analysis must appropriately account for both factors and their potential interaction. We appreciate the reviewers’ guidance in strengthening the statistical rigor of our study.

      Reviewer #2 (Public review):

      Weaknesses:

      (1) Sex needs to be considered as a biological variable, at a minimum in the reporting of the phenotypes observed in this manuscript, but also potentially by further experimentation. The only mention of sex I could find is that the authors reported the general protein SNO status in BAT is increased with age in male C57Bl/6J mice. Is this also true in female mice?

      We thank the reviewer for this insightful comment. In response, we examined whether aging affects Hsf1 and Adh5 transcript levels in wild-type female mice (3 months vs. 19 months). Our analysis did not reveal significant age-associated changes in the expression of either gene. These results have now been incorporated into the revised manuscript and are presented in Figure 4A.

      (2) It would be helpful to know the extent of ADH5 loss in the adipose tissue of knockout mice, either by mRNA or by immunoblotting for ADH5. It could also be helpful to know if ADH5 is deleted from the inguinal adipose tissue of these mice, especially since they seem to accumulate fat mass as they age (Figure 2B).

      We thank the reviewer for this suggestion. Indeed, we have previously measured ADH5 expression in both brown adipose tissue (BAT) and inguinal white adipose tissue (iWAT). These data were published in Cell Reports (PMID: 3478865).

      (3) For Figure 4D, it's not clear how these BAT samples were treated with HSF1A - was this done in vivo or ex vivo?

      We thank the reviewer for their thoughtful comment. We have now provided additional methodological details in the revised manuscript. In Figure 4D (current Figure 4E), BAT was collected from wild-type mice and cultured ex vivo as explants. The BAT explants were treated for 24 hours with HSF1A (an HSF1 activator; 20 µM). Following treatment, mRNA levels of the indicated genes were measured by RT-qPCR.

      (4) I didn't understand what was on the y-axis in Figure 5A, nor how it was measured.

      We apologize for not making these critical points clearer in the initial submission. Figure 5A shows the release profiles of HSF1A from collagen gels with nanoclay (Collagen–NC–HSF1A) and without nanoclay (Collagen–HSF1A), determined using an established standard curve method (Hu et al., PMID: 33225042).

      The concentration of HSF1A was quantified by UV–Vis spectroscopy. Briefly, a standard curve for HSF1A was generated by measuring the UV–Vis spectra of HSF1A at known concentrations (1.25, 2.5, 5, 10, and 20 µM) prepared in phosphate-buffered saline (PBS). Collagen gels with or without nanoclay were then fabricated to evaluate the release profile. At predetermined time points (1, 5, 9, 14, and 21 days), the PBS supernatant from each sample was collected and analyzed by UV–Vis spectroscopy. The amount of released HSF1A was calculated using the previously established standard curves. A brief description has now been included in the figure legend.
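
      The standard-curve calculation described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis: a linear absorbance-concentration relationship is assumed, and the absorbance values are hypothetical. Only the standard concentrations (1.25 to 20 µM) come from the response.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def concentration_from_absorbance(absorbance, slope, intercept):
    """Invert the standard curve to estimate concentration (µM)."""
    return (absorbance - intercept) / slope


# Standard concentrations (µM) listed in the response.
standard_conc_uM = [1.25, 2.5, 5.0, 10.0, 20.0]

# Hypothetical absorbance readings, generated to be exactly linear
# (0.02 absorbance units per µM plus a 0.01 baseline) for illustration.
standard_abs = [0.02 * c + 0.01 for c in standard_conc_uM]

slope, intercept = fit_line(standard_conc_uM, standard_abs)

# Hypothetical supernatant reading at one time point.
released_uM = concentration_from_absorbance(0.17, slope, intercept)
```

      With real data the fit would not be exact, so the quality of the curve would need to be checked before the concentrations are interpreted.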

      (6) Figure 1B: What is the age of the positive (ADH5BKO) and negative (Adh5 fl) mice?

      We regret that we did not describe our results clearly in the first submission and have included detailed information in the revised manuscript.

      (7) Figure 1F: Can you clarify what I'm looking at in the P16ink4a panels? The red staining? Is the blue staining DAPI? This is also a problem in Figures 3C, 3D and 5G, and 5I. Figure 4B looks great - maybe this could be used as an example?

      We regret that we did not present results clearly in the first submission and have provided detailed information in these figures in the revised manuscript.

      (8) Figure 3B looks a bit odd. Can the approach to measuring IL-1β be clarified, and could the authors explain why they can't show units of mass for IL-1β levels?

      We have provided information in the revised manuscript.

      (9) What are the levels of nitric oxide synthase in the BAT of the aging model? Since protein S-nitrosylation is regulated by a balance of both, the attribution of greater protein S-nitrosylation to ADH5 is incomplete without determining nitric oxide synthase.

      We thank the reviewer for this thoughtful comment. In response, we have now included the analysis of iNOS transcript expression levels in the revised manuscript. These data are presented in Figure 1C.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (2) Presentation of metabolomics is not appropriate. The authors described, using color coding, the metabolites up- or downregulated in the experimental design. However, the current approach does not allow the reader to detect sample size, magnitude of changes, variability of the data, p-values, etc. This approach does not follow the standard practices of scientific rigor and should be modified. Metabolomic data could be uploaded as supplementary data, or a table with all necessary information to allow a full interpretation of the data should be provided.

      We have now provided the metabolomic data in table format as Figure 3I.

      (6) What are the levels of nitric oxide synthase in the BAT of the aging model? Since protein S-nitrosylation is regulated by a balance of both, the attribution of greater protein S-nitrosylation to ADH5 is incomplete without determining nitric oxide synthase.

      We thank the reviewer for their thoughtful comment. We have now included iNOS transcript expression levels in the revised manuscript (Figure 1C).

      Minor Comments:

      (1) The conclusion of the abstract is somewhat vague. I suggest the authors rewrite it to better recapitulate what was found in the study.

      We thank the reviewers for this helpful suggestion. In response, we have revised the Abstract to improve the specificity and clarity of our conclusions.

      (2) In the introduction, the authors mention that an increased level of mitochondrial ROS activates UCP1. Given that the evidence for this statement is circumstantial and not supported by the current state-of-the-art (PMID: 28710335), where it is accepted that UCP1 activation diminishes ROS production, I suggest that the authors tone down this statement or at least acknowledge conflicting findings and interpretations.

      We thank the reviewer for this insight and have included this important point in the introduction.

      (3) Figure 2H - It is unclear what this figure (and statistical analysis) represents. Please, improve the description of the experiment and how the data were plotted to reach such a conclusion.

      We regret that we did not present the results clearly in the first submission. The trend lines show the relationship between body weight and time on the rotarod. The P value compares the slopes of the lines for Adh5 BKO mice and Adh5 fl/fl mice. The data indicate that the heavier the BKO mouse, the less time it spends on the rotarod.

      (4) Figure 2M - The unit of LV thickness is missing. Please, provide it. In addition, I am missing the other cardiac parameters obtained from the echocardiogram.

      We have included this information in Figure 2M in the revised manuscript.

      (5) Figure 2G - I believe force is not the right unit for the grip strength test. Please, revise accordingly.

      We regret that we did not describe our results clearly in the first submission. We have corrected this unit in the revised figure.

      (6) Figure 3H - What is the unit when reporting mitochondrial area?

      We regret that we did not describe our results clearly in the first submission. We have added this information in the revised figure.

      (7) Is HSF1 also downregulated in iWAT?

      We thank the reviewer for this thoughtful comment. In response, we measured Hsf1 expression in iWAT from young and aged wild-type male mice. Our analysis did not reveal any significant age-associated changes in Hsf1 expression in iWAT. These results have now been included in the revised manuscript (Figure 4C).

      (8) Can the authors explain how HSF1 expression increases upon HSF1 activation? I understand ADH5 is controlled by HSF1, but what would control HSF1 itself? Off targets?

      We thank the reviewer for this insightful comment. At present, we do not have direct mechanistic evidence to definitively support this notion, and we cannot exclude the possibility of off-target effects of HSF1A. However, previous studies have reported that the HSF1 promoter contains heat shock elements (HSEs) in humans and HSE-like domains in mice. Based on this, we speculate that activated HSF1 may enhance its own transcription through an autoregulatory or positive feedback mechanism.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Strengths:

      This work sets a benchmark for integrative 3D genomics in oncology. Its methodological sophistication and conceptual advances establish a new paradigm for studying nuclear architecture in disease.

      We appreciate the very kind words.

      Weaknesses:

      Major Issues

      (1) Functional tests would strengthen the observed links between structure and gene changes. For example, the COL12A1 gene loop formation correlates with its increased expression. Disrupting this loop using CRISPR-dCas9 at chr6 position 75280 kb could prove whether the loop causes COL12A1 activation. Such experiments would turn strong correlations into clear mechanisms.

      We agree that targeted disruption of specific loops such as COL12A1 will be important for functional validation of the causal relationships between enhancer-promoter loop formation/dissipation and changes in gene expression. However, the intent of our current study was to profile changes in genome organization at a global scale to deduce general features of cancer progression-associated changes in genome organization, rather than to explore specific loop interactions. The current findings are a foundation for more targeted functional follow-up studies.

      (2) The H3K27ac looping idea needs deeper validation. Data suggests H3K27ac loss weakens loops without affecting CTCF. Testing how cohesin proteins interact with H3K27ac-modified sites would clarify this process. Degron systems could rapidly remove H3K27ac to observe real-time effects. Also, the AP-1 motifs found at dynamic loop sites deserve functional tests. Knocking down AP-1 factors might show if they control loop formation.

      We agree that modulating histone modifications or transcription factors would provide insights into the underlying mechanisms driving the changes we observed. However, such studies utilizing degrons or small molecule inhibitors that globally knock down either H3K27ac or specific transcription factors are often difficult to interpret. For example, assessing the role of AP-1 factors, as suggested, would be complicated by the variety of AP-1 proteins. In addition, H3K27ac reduction could inhibit loop strength either directly (i.e. by reducing cohesin recruitment) or indirectly (i.e. by reducing gene expression which could in turn affect loop strength). Parsing out the exact relationships between these features will require extensive follow-up work and falls outside of the scope of the current study.

      (3) Connecting findings to patient data would boost clinical relevance. The MCF10 model is excellent for controlled studies. Checking if TAD boundary weakening occurs in actual patient metastases would show real-world importance. Comparing primary and metastatic tumor samples from the same patients could reveal new structural biomarkers. If tissue is scarce, testing cancer cells with added stroma cells might mimic tumor environment effects.

      We have leveraged publicly available datasets to link the observations from the progression model to clinical samples. Specifically, we have compared our datasets to chromatin organization data in non-cancerous mammary epithelial cells (HMEC), five cell lines representing distinct cancer subtypes ranging from less (luminal) to more aggressive (triple negative, TNBC), as well as tissue samples from TNBC patients with contralateral normal controls. We explored the conservation of both loops and TADs identified in the MCF10 progression system in each of these maps, paying particular attention to how features that are differential between MCF10 cells differ across other cancer cell types. We observe a high degree of conservation of static loops and TAD boundaries among the cancer samples, as well as some degree of cell-specific changes among loops and boundaries that change during MCF10 progression. These findings are included in Supplemental Figures 3 and 4 and are discussed on page 7.

      Minor Issues

      (1) Adding a clear definition for static loops would help readers. For example, state that static loops show less than 10 percent contact change across replicates.

      Static loops are defined as loops with a fold-change of 1.5 or more between any two MCF10 cell lines and an adjusted p-value of less than 0.025 considering change across biological and technical replicates. This definition is stated on page 6.

      (2) In the ABC model analysis, removing promoter regions from the enhancer list would focus results on true long-range interactions.

      The ABC model already excludes the promoter of each gene. Only self-promoters are excluded, whereas the model allows promoters of other genes to act as potential long-range enhancers of the target gene. We have added text to make this clear (see page 11).

      (3) Briefly noting why this study sees TAD weakening while other cancer types show different patterns would provide useful context.

      Neither the mechanism of TAD boundary weakening in the MCF10 model nor the reason for the apparently different behavior among cancers is known. We expanded the text on this point slightly, but we refrain from making any definitive claims. We do note that differences in the types of cancer studied or in the methods used for detecting changes in TADs (i.e. different sensitivities and thresholds for detecting change) could be responsible (see page 15). We also mention that the loss of insulation at many TAD boundaries detected in our study reflects subtle changes in intensity that could be missed by methods tailored to find more drastic changes in TAD architecture.

      Reviewer #2 (Public review):

      While the conclusions are broadly supported, methodological and analytical refinements are required.

      We appreciate these comments.

      (1) Model representativeness. The long-term culture-adapted MCF10 genome harbours extensive aneuploidies and translocations. Validation of key COL12A1/WNT5A loop dynamics in an independent breast-cancer line (e.g., MDA-MB-231, T47D) or in patient-derived organoids/PDX models would strengthen generalizability.

      Although the generation of Micro-C datasets in additional cell lines is outside of the scope of this study, we used publicly available Hi-C data from triple negative breast cancer (TNBC) progression and patient samples (Kim, Han & Chun et al. 2022) to assess generalizability of the MCF10 model findings. While these maps are lower resolution than the Micro-C maps used in our study, they are of sufficient depth to detect loops at a similar resolution (10 kb). We report these findings in Supplemental Figures 3 and 4 and discuss them on page 7.

      We find that chromatin loops and TAD boundaries detected across the MCF10 system are highly conserved across all other mammary epithelial lines studied. Chromatin loops that were more prominent in MCF10AT1 and MCF10CA1a lines were also significantly stronger in TNBC cells. Insulation score boundaries that were weakened in MCF10CA1a showed strong insulation across all cell lines in TNBC. These findings highlight that different model systems indeed have distinct profiles of structural change, just as they have distinct gene expression profiles.

      It is worth noting that direct comparison at individual loci is complicated by variations in gene expression profiles between the MCF10 model and the TNBC progression model; for example, COL12A1 is not significantly upregulated between normal and TNBC tissues in this study (unlike in the TCGA-BRCA data) and is downregulated between HMEC and TNBC cell lines. Regardless, our analysis provides some indication of conserved and divergent features in the various model systems.

      (2) The study remains purely correlative; no perturbation experiments are conducted to demonstrate causal roles of chromatin loops on gene expression. CRISPR interference (CRISPR-Cas9-KRAB/HDAC) or enhancer deletion/inversion should be applied to 3-5 pivotal loops (e.g., COL12A1, WNT5A) to test their impact on target-gene expression and cellular phenotypes (e.g., proliferation, migration).

      We agree that targeted disruption of specific loops such as COL12A1 will be important for understanding the causal relationships between enhancer-promoter loop formation/dissipation and changes in gene expression. However, the intent of our current study was to profile changes in genome organization at a global scale to deduce general features of cancer progression-associated changes in genome organization, rather than exploring specific loop interactions. The current findings are a foundation for more targeted follow-up functional studies.

      (3) The manuscript lacks integration with clinical datasets. Integrate TCGA-BRCA data to assess whether elevated COL12A1/WNT5A expression associates with overall survival (OS) or distant metastasis-free survival (DMFS).

      To assess clinical significance of specific loci, we have queried expression of all differentially expressed genes in the MCF10 progression system among TCGA-BRCA expression data. We summarize our findings in Supp. Fig. 5E and discuss them on page 8.

      We found that roughly 25% of genes that change in our model also change significantly in breast cancer, but only roughly half of those genes change in the same direction (i.e. up-regulated in MCF10CA1a vs MCF10A, and up-regulated in tumor vs normal samples). Interestingly, there was a higher degree of directional agreement between late-changing genes (i.e. genes that change in MCF10CA1a compared to MCF10A and MCF10AT1) than early-changing genes (i.e. genes that change in MCF10AT1 and MCF10CA1a compared to MCF10A).

      We have also explored the impact of select highlighted genes on overall survival (OS). We present these data in Supp. Fig. 6 and discuss them on page 8. While not all genes showcased in this study have a significant impact on overall survival, most trend in the same direction as their differential expression would suggest (i.e. genes more highly expressed in tumor vs normal samples also have a hazard ratio above 1).

      Reviewer #3 (Public review):

      The differential topology analysis and its integration with transcription is very well done- one of the best versions of this I have read in the 3D genome field!

      We appreciate the reviewer’s endorsement.

      However, the paper is framed largely as a cancer biology study, and it teaches us much less about this. I am worried that some of the trends for each topologic feature are not going to be consistent across the pre-malignant-malignant-metastatic spectrum and would like the authors to soften some of their claims a bit regarding how this clarifies our understanding of cancer evolution.

      We agree that the strength of the study lies in its deep mapping of chromatin architecture and the landscape of enhancers and differentially expressed genes, which we hope to use to better understand the relationship between chromatin structure and gene expression, regardless of their cancer relevance. To better relate the findings in the progression system to cancer, we have added new data from direct comparisons of the MCF10 progression system with multiple patient-derived cancer cell lines and cancer tissues. These data are shown in Supp. Fig. 3 and 4 and discussed on p. 7. Regardless, we have softened the claims regarding cancer progression throughout the manuscript.

      Weaknesses:

      Major Concerns:

      (1) The integration of gene expression and chromatin loops is intriguing. The authors' differential analysis, however, omits consideration of genes that are on and simply further upregulated versus genes that transition on/off or off/on. It would be nice to see the authors break out looping patterns for these two different patterns of regulation, as it may be instructive regarding the rules for how EP loops govern transcription.

      To address different types of gene expression patterns, we analyzed 108 genes that went from an unexpressed or “off” state (2 or fewer read counts) in one cell line to an expressed “on” state (100 or more read counts) in another, and 111 genes that went from “on” to “high” (1000 or more read counts). We present these data in Supp. Fig. 8 and discuss the findings on page 9. While neither gene set was enriched for differential loops, a large number of these genes overlap with loop anchors. We found a relationship between loop strength and gene expression levels; genes that are more strongly expressed are more likely to overlap with the anchor of a chromatin loop. All gene sets show similar strong trends at distal regulatory regions.
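
      The read-count thresholds above can be sketched as a simple classifier. This is an illustrative reading rather than the authors' code: the cell-line counts are hypothetical, the "intermediate" label for counts between the thresholds is an assumption, and the direction of the change between cell lines is ignored.

```python
OFF_MAX = 2      # "off": 2 or fewer read counts
ON_MIN = 100     # "on": 100 or more read counts
HIGH_MIN = 1000  # "high": 1000 or more read counts


def state(count):
    """Assign an expression state to one read count."""
    if count <= OFF_MAX:
        return "off"
    if count >= HIGH_MIN:
        return "high"
    if count >= ON_MIN:
        return "on"
    return "intermediate"  # assumption: counts of 3-99 are left unassigned


def transition(counts_by_line):
    """Classify a gene from its read counts across cell lines.

    Returns "off_to_on" if one line is off and another has 100 or more
    counts, "on_to_high" if one line is on and another is high, and
    None otherwise.
    """
    states = {state(c) for c in counts_by_line.values()}
    if "off" in states and ("on" in states or "high" in states):
        return "off_to_on"
    if "on" in states and "high" in states:
        return "on_to_high"
    return None


# Hypothetical read counts for three genes across the MCF10 lines.
gene_1 = {"MCF10A": 1, "MCF10AT1": 150, "MCF10CA1a": 120}
gene_2 = {"MCF10A": 150, "MCF10AT1": 300, "MCF10CA1a": 2500}
gene_3 = {"MCF10A": 50, "MCF10AT1": 60, "MCF10CA1a": 70}
labels = [transition(g) for g in (gene_1, gene_2, gene_3)]
```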

      (2) Given the paucity of differential loops at the majority of genes whose expression changes, the authors should examine chromatin subcompartments, as these may associate more with differential transcription.

      Our CALDER compartment calls are qualitative rather than quantitative, so we examined how subcompartments change genome-wide and at specific promoters. We show these data in Supp. Fig. 9 and discuss the findings on pages 10-11. We see that between any two cell types, a majority of changes occur between closely related subcompartments, i.e. from A.2.2 to A.2.1 (1 step more A-like) or B.1.1 (1 step more B-like). The promoters of differentially expressed genes have minimal subcompartment changes, but genes that shift from on to off have larger changes. Promoter shifts across multiple subcompartments are associated with significant differences in fold-change, whereas smaller shifts have minimal impact on gene expression. In summary, small changes in subcompartments are very common and have little impact on gene expression, while larger changes are infrequent and correlate more strongly with changes in gene expression.
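
      The notion of a subcompartment "step" can be made concrete with a short sketch. It assumes an eight-level ordering from most A-like to most B-like that is consistent with the examples given above (A.2.2 to A.2.1 is one step more A-like; A.2.2 to B.1.1 is one step more B-like). The function name and sign convention are hypothetical.

```python
# Assumed ordering of subcompartments, from most A-like to most B-like.
SUBCOMPARTMENT_ORDER = [
    "A.1.1", "A.1.2", "A.2.1", "A.2.2",
    "B.1.1", "B.1.2", "B.2.1", "B.2.2",
]
RANK = {name: i for i, name in enumerate(SUBCOMPARTMENT_ORDER)}


def step_shift(before, after):
    """Return the signed number of steps between two subcompartment calls.

    Negative values mean a shift toward A (more active-like);
    positive values mean a shift toward B.
    """
    return RANK[after] - RANK[before]


more_a_like = step_shift("A.2.2", "A.2.1")
more_b_like = step_shift("A.2.2", "B.1.1")
largest_shift = step_shift("A.1.1", "B.2.2")
```

      Under this convention, a one-step change corresponds to the small, common shifts described above, and a multi-step change corresponds to the infrequent, larger shifts.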

      (3) The authors could push their TAD analysis further by integrating it with transcription. Can they look at genes and their enhancers that span these altered boundaries to see if these shifts impact transcription?

      We started, as suggested, by looking at genes with distal enhancers (as determined by the ABC model) that span a single TAD boundary. However, the number of genes that fit this definition was relatively small, so we expanded the analysis to any genes with promoters in the proximity (50 kb) of differential insulation score boundaries, for which we saw the same trends with a more robust signal. Our findings are shown in Supp. Fig. 9 and discussed on page 10. We found that genes near weakened boundaries are not enriched for differentially expressed genes, while those near strengthened boundaries are. Comparing the fold-change of genes near strengthened, weakened, and static boundaries showed a significant inverse correlation between boundary strength and gene expression, although effect sizes were small. These results show that changes in TAD boundary insulation have small but noticeable impacts on gene expression.
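
      The proximity criterion can be sketched as a simple distance test. This is a hedged illustration, not the authors' implementation: gene identifiers and coordinates are hypothetical, a promoter is reduced to a single TSS position, and a boundary is reduced to a single coordinate. Only the 50 kb window comes from the response.

```python
WINDOW_BP = 50_000  # proximity window stated in the response (50 kb)


def genes_near_boundaries(promoters, boundaries, window=WINDOW_BP):
    """Return the sorted genes whose TSS lies within `window` bp of a boundary.

    promoters:  {gene: (chromosome, tss_position)}
    boundaries: iterable of (chromosome, boundary_position)
    """
    by_chrom = {}
    for chrom, pos in boundaries:
        by_chrom.setdefault(chrom, []).append(pos)
    return sorted(
        gene
        for gene, (chrom, tss) in promoters.items()
        if any(abs(tss - pos) <= window for pos in by_chrom.get(chrom, []))
    )


# Hypothetical promoters and one hypothetical differential boundary.
promoters = {
    "gene_1": ("chr1", 1_000_000),  # 40 kb from the boundary
    "gene_2": ("chr1", 2_000_000),  # far from the boundary
    "gene_3": ("chr2", 1_040_000),  # same coordinate, other chromosome
}
boundaries = [("chr1", 1_040_000)]
nearby = genes_near_boundaries(promoters, boundaries)
```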

      (4) The progression of cancer critically goes from a benign -> pre-malignant -> malignant -> metastatic series of steps. The AT1 line is described as 'premalignant' and thus the authors' series omits a malignant line. While I think adding such a sample is an unreasonable request at this point (as it would have had to have been studied in 'batch' with these other samples), the authors should acknowledge that they omit this step and spend some time discussing the genetic, morphologic, and phenotypic features for their 3 conditions. The images in Figure 1S aren't particularly useful- they don't tell the reader that these cells are malignant/benign. The karyotypic data are intriguing but not fully analyzed, so it is hard to know what true phenotype these cells represent. For example, malignant means DCIS/invasive carcinoma - so then what does this pre-malignant cell model represent? The described alteration in the AT1 line is a Ras oncogene, so in some sense, the transition to this line really is just +/- Ras. The authors could spend some time thinking about the effects of Ras specifically on the 3D genome.

      We have expanded our discussion of the relevance of the MCF10 model on page 4, and of the limitations of the model on page 17. The MCF10 progression model has been extensively used by many laboratories, and its properties have been discussed in detail (e.g. Polizzotti et al. 2012). Critically, the MCF10AT1 cell line is not only the product of Ras oncogene expression; it was also derived from a 100-day-old precancerous lesion that formed a squamous carcinoma in a mouse, and over this time it accumulated additional changes. The MCF10AT1 line is considered pre-malignant as it has accrued critical changes that prepare it for the metastatic transition, but it does not immediately form tumors when injected back into mice. Unlike the MCF10DCIS cell line, which is malignant but not metastatic, the more aggressive MCF10CA1a is classified as both malignant and highly metastatic, forming tumors that quickly metastasize to the lungs in mouse xenografts. While both MCF10AT1 and MCF10CA1a are tumorigenic, we acknowledge the lack of a nonmetastatic malignant cell line in the discussion on page 17. We have also provided updated karyotype characterization of the cell lines used in this study in Supp. Fig. 1B and now include full composite karyotypes in the Methods section (page 18).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The reviewer’s recommendations are the same as their public review comments. See our response to the review comments above.

      Reviewer #2 (Recommendations for the authors):

      (1) If conditions permit, it is recommended to include primary human mammary epithelial cells (HMECs) to distinguish immortalisation-specific from malignancy-specific 3D changes.

      Micro-C data of equal resolution is not available for HMECs. We have, however, incorporated analysis of publicly available deeply sequenced Hi-C data of HMECs into several figures that explore the conservation of loops and TADs in these cells (Supp. Fig. 3 and 4).

      We find that chromatin loops and TAD boundaries detected across the MCF10 system are highly conserved across all other mammary epithelial lines studied. Chromatin loops that were more prominent in MCF10AT1 and MCF10CA1a lines were also significantly stronger in TNBC cells. Insulation score boundaries that were weakened in MCF10CA1a showed strong insulation across all cell lines in the TNBC system. These findings highlight that different model systems indeed have distinct profiles of structural change, just as they have distinct gene expression profiles.

      (2) The relationship between loop alterations and copy-number variations (CNVs) is not explored. If conditions permit, it is recommended to overlay differential loops with SNP/Indel/CNV data to exclude spurious differences arising from structural alterations.

      While we have not conducted an in-depth SNP analysis, we have clarified our discussion of the karyotype analysis on pages 21 and 23 and how we mitigated these effects when identifying differential loops between cell lines.

      (3) The horizontal and vertical coordinates of the diagram are difficult to view; it is recommended that the size of the text on the picture be adjusted to ensure that it is clear to read. Some of the text coordinates of the figure are labeled in gray; it is recommended that they be in black.

      The clarity of the figures has been improved.

      Reviewer #3 (Recommendations for the authors):

      I really like this paper. I think if the cancer focus can be down-emphasized (because I'm not fully clear what we've really learned about cancer), then it represents a nice dataset and a thoughtful, comprehensive analysis.

      We greatly appreciate the kind words and helpful feedback. The cancer focus has been toned down throughout the manuscript, as suggested.

      Minor Concerns:

      (1) The authors present a nice summary of the topological changes across samples. However, summary statistics can mask noise/bias and also don't fully convey the effect size of the reported changes. Highlighting individual loci and visualizing these would strengthen the paper and participate in maintaining a high standard for our genomic studies of topology, in which we summarize, but also provide representative examples. I would appreciate seeing more example plots at distinct loci (even if in the supplemental information).

      We have included several more example regions in Supp. Fig. 7 and 12, including four looped genes that change similarly between the MCF10 series and TCGA-BRCA data (2 stably looped genes and 2 differentially looped genes, 2 up-regulated and 2 down-regulated), and six differentially looped and differentially expressed genes (3 which change in the same direction as the loops, and 3 which change in the opposite direction).

      (2) "To identify loops that changed significantly during cancer progression, we assessed changes in contact frequency among every loop in each cell type, correcting for karyotypic differences that result in differences in coverage between cell lines (see Methods)." The Methods section is not adequately explained. Also, could you go a bit deeper to define if these large-scale changes shift the 3D genome specifically? This is hard, but there may be some low-hanging fruit given the otherwise fairly isogenic features in your model.

      We have added more detail to the Methods section on pages 21 and 23 on how karyotypic abnormalities were included in our analysis and differential loop calling. A deeper analysis of how large-scale karyotypic changes affect chromatin organization (i.e. through the formation of neoloops and TADs through translocations) is indeed an attractive subject, but due to its complexity requires a separate dedicated study.

      (3) "Approximately half of chromatin loops featured some combination of active gene promoters and enhancers within 10kb of loop anchors". The authors have high-resolution topology data and should be more stringent; these features should have to overlap loop anchors or at least use a distance less than 10kb, which, in some sense, forfeits the advantages of high-resolution topology data.

      The threshold of 10kb was chosen for several specific reasons: First, the loop sizes detected here are large enough that this relatively large region still represents a small fraction of the loop span, and these regions are reasonably considered anchor-proximal. Second, the loops we detect are non-punctate, both in aggregate analysis (Figure 1H, bottom) and at individual loci (see example regions), showing increased contact frequency among several 5kb or 10kb bins. Therefore, adding 10kb to either side (2 pixels on 5kb maps and 1 pixel on 10kb maps) ensures that the full region of increased contact frequency is included. Finally, ultra-resolution Hi-C data has also shown that loops remain diffuse even with 1kb resolution maps (albeit they do get smaller than the 30kb used here) (Harris & Gu 2023). We have added a brief justification of this overlap size to the text on page 24.
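      The anchor-proximity rule described above can be illustrated with a minimal sketch. It assumes half-open genomic intervals in base pairs; the function names (`pad`, `overlaps`, `anchor_proximal`, `flank_in_pixels`) and the example coordinates are hypothetical and are not taken from the authors' pipeline.

```python
from typing import List, Tuple

# (chromosome, start, end), half-open, in base pairs
Interval = Tuple[str, int, int]


def pad(interval: Interval, flank: int) -> Interval:
    """Extend an interval by `flank` bp on each side, clamped at position 0."""
    chrom, start, end = interval
    return chrom, max(0, start - flank), end + flank


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def anchor_proximal(anchor: Interval, features: List[Interval],
                    flank: int = 10_000) -> List[Interval]:
    """Return the features that fall within `flank` bp of a loop anchor."""
    window = pad(anchor, flank)
    return [f for f in features if overlaps(window, f)]


def flank_in_pixels(flank: int, bin_size: int) -> int:
    """Express a flank in whole contact-map bins (pixels)."""
    return flank // bin_size


# Hypothetical 5 kb anchor with three candidate regulatory elements.
anchor = ("chr1", 100_000, 105_000)
features = [
    ("chr1", 112_000, 113_000),  # 7 kb downstream: inside the 10 kb window
    ("chr1", 120_000, 121_000),  # 15 kb downstream: outside the window
    ("chr2", 100_000, 101_000),  # different chromosome: never overlaps
]
proximal = anchor_proximal(anchor, features)
```

      With these inputs only the first feature is kept, and a 10 kb flank corresponds to 2 pixels on a 5 kb map and 1 pixel on a 10 kb map, matching the figures quoted in the response.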

      (4) "These results show that not only changes in either contact frequency and enhancer activity correlate with increased gene expression, but they also correlate with each other, suggesting a potentially linked functional role during enhancer-promoter communication." The authors could use this opportunity to disentangle the contributions of loops and chromatin modifications a bit more. The exceptions are of interest - e.g., loop is stable, gene expression changes or loop changes, gene expression does not. Highlighting exemplar cases for these exceptions (rather than just a genomics summary) would be nice to see.

      The additional example regions we have included in Supp. Fig. 7 and 12 now showcase a wider variety of scenarios; in addition to more examples of static loops with gene expression changes (Fig. 2, Supp. Fig. 7E-F) and differential loops with matching gene expression changes (Fig. 4, Supp. Fig. 7C-D, Supp. Fig. 12A-C), we now also feature examples of differential loops where gene expression changes in the opposite direction (i.e. a strengthened loop at a down-regulated gene, Supp. Fig. 12D-F).

    1. Author response:

      eLife Assessment

      This study reports a novel function for syntaxin 11, a specialized SNARE protein critical for the immune system whose mutations cause familial hemophagocytic lymphohistiocytosis type 4. The data convincingly show that depletion of STX11 impairs store-operated calcium entry in Jurkat T cells and that this defect is recapitulated in primary cells from a patient suffering from the disease; the authors further show that the syntaxin interacts with the pore subunit of the ORAI1 channel and propose that it primes the channel by promoting the assembly of multimers before activation by its endogenous ligand, the ER Ca2+ sensing protein STIM1. This is a conceptually important claim that challenges the prevailing view that all structural transitions in ORAI1 are STIM-driven. The data are high-quality and broadly consistent with the interpretation, but alternative mechanisms for the defects are not considered; additional work should rule out vesicular trafficking, discuss other mechanisms, and address methodological issues.

      We thank the editor and reviewers for assessing our work. Although a significant amount of data in this paper already rules out any potential defects in the vesicular trafficking of Orai1 in cells lacking STX11, we will still include the additional suggested experiments. In the revised version, we will include the various experiments that we had already performed to measure vesicular trafficking and ER-PM junctions in STX11-depleted cells. We will discuss any remaining alternate explanations, include missing methods, quantifications and calibrations, where applicable, and provide a response to each of the reviewers’ comments.

      Public Reviews:

      Reviewer #1 (Public review):

      Weaknesses:

      For readers to appreciate the value of patient experiments derived from a single individual, the authors should quote prior studies showing that STX11 protein levels are abolished in all known human STX11 mutations. The priming model, while functionally well-supported, rests on indirect structural evidence, and the precise conformational transition involved remains to be defined. These are acknowledged limitations, but alternate mechanisms have not been explored and formally excluded. More direct evidence should be provided to exclude the possibility that STX11 could act as a conventional SNARE and sustain calcium fluxes by promoting the delivery of additional ORAI1 channels from vesicles.

      In the revised version, we will include references for the prior STX11 human mutations that have been biochemically characterized to date (Bryceson, Rudd et al. 2007; Muller, Chiang et al. 2014; Macartney, Weitzman et al. 2011; Marsh, Satake et al. 2010). As the reviewer has correctly pointed out, the STX11 protein levels were almost completely abolished in these studies. Therefore, the prior mutations are essentially comparable to the frameshift mutation characterized in this study, in terms of the mechanisms underlying the phenotypic defects reported here versus earlier. From a mechanistic point of view, we believe that our data from even a single FHL4 patient, where STX11 levels were severely depleted, and additional knockdown studies across three different cell lines, are representative of all STX11 patients that have been reported thus far.

      Regarding the Reviewer’s concern that the absence of STX11 as a conventional SNARE could affect Orai1 channel delivery from intracellular vesicles, we would like to point out the following:

      (1) In Miao et al. 2013 (Miao, Miner et al. 2013), Figure 3C-D, we conclusively showed that expression of a dominant negative mutant of NSF, a non-redundant protein in vesicle trafficking, impaired vesicle trafficking but did not impair SOCE. This experiment had essentially ruled out a role for vesicle trafficking in SOCE. In the same paper, we had also shown that Orai1 levels in the PM do not increase post-store depletion (Figure 3, figure supplement 2).

      (2) In this manuscript (Supplementary Figure 3B), we have shown that U2OS cells stably expressing Orai1-BBS-YFP have identical levels of Orai1 in the PM with and without STX11 depletion. This shows that the biosynthesis or delivery of Orai1 to the PM is not affected by depletion of STX11, another protein broadly classified as a member of the vesicle trafficking machinery. The levels were also assessed in store-depleted U2OS cells but not included here because in Miao et al. 2013 we had already shown that levels of PM Orai1 are essentially equal in resting and store-depleted cells. In our revised submission, we will include the data from store-depleted U2OS cells and also repeat this experiment in the other cell types used in this paper. In addition, in our revised submission, we will include three different vesicle trafficking assays performed in STX11-depleted cells.

      (3) Most importantly, in Figure 7I-J of this manuscript, we showed that calcium influx from a constitutively active mutant Orai1 (Orai1 H134S) is identical between STX11-depleted and scramble control cells. If wildtype Orai1 was indeed stuck in vesicles in STX11-depleted cells, then how would H134S Orai1 be able to rescue the defect in SOCE? In fact, the Orai1 mutant calcium flux assays were done using a 20X water objective, to visualize and confirm whether the expression of mutant and WT Orai1 was comparable in the PM. We will include the quantification of PM levels of Orai1 mutants relative to WT Orai1 in the revised paper.

      (4) We have generated and been using HEK293, U2OS and Jurkat cell lines that stably express fluorescently tagged Orai1 for most of our experiments (Miao, Miner et al. 2013; Li, Miao et al. 2016; Ramanagoudr-Bhojappa, Miao et al. 2021). In each case, we have never observed Orai1 in intracellular vesicles with or without store depletion. In all cases, it is constitutively and stably expressed in the PM.

      In summary, a significant amount of data in this paper already rules out any potential reduction in the PM levels of Orai1 in cells lacking STX11. We will still do the additional experiments suggested by Reviewer 1.

      Regarding the precise STX11-induced conformational transition, we are trying to set up collaborations with scientists who might be able to visualize this in vivo.

      Readers should note that purification of isolated pore subunits of ion channels followed by crystallization or expression in membranes for cryo-EM is currently considered a gold standard in the analysis of ion channel pore subunits. However, we have shown that ion channels are dynamic macromolecular complexes in vivo (Li, Miao et al. 2016), where synaptic proteins dynamically bind to induce conformational changes and affect their stoichiometry (Li, Miao et al. 2016). Please also see (Chorev, Baker et al. 2018) and (Dorwart, Wray et al. 2010). More advanced in vivo approaches therefore need to be developed to enable visualization of the dynamics of ion channel macromolecular complexes in the native environment. In the absence of such approaches, the structural insights obtained from detergent-purified subunits will remain incomplete and biased.

      Reviewer #2 (Public review):

      Weaknesses:

      The authors conclude that Syntaxin 11 directly binds Orai1. This conclusion is well supported by a multifaceted approach, including co-immunoprecipitation (co-IP), molecular dynamics simulations, co-localization/FRET assays, and targeted mutational analysis, all of which are thoroughly executed. While the interaction appears reasonably strong in co-IP experiments, the STX11-Orai1 interaction is comparatively weaker in pull-down assays, which the authors attribute to instability of the purified His-STX11 protein. A remaining gap is direct evidence of interaction in live cells; this is understandably challenging given that fluorescent tagging of STX11 is not feasible. Fully resolving this question lies beyond the scope of the present study and will require more advanced approaches to capture STX11 binding dynamics.

      We thank the reviewer for acknowledging that the above studies will require standardization of advanced techniques which are beyond the scope of the present study. We plan to continue developing methods that will allow us to visualize the binding and unbinding of STX11 to Orai1 in vivo.

      References:

      Bryceson, Y. T., E. Rudd, C. Zheng, J. Edner, D. Ma, S. M. Wood, A. G. Bechensteen, J. J. Boelens, T. Celkan, R. A. Farah, K. Hultenby, J. Winiarski, P. A. Roche, M. Nordenskjold, J. I. Henter, E. O. Long and H. G. Ljunggren (2007). "Defective cytotoxic lymphocyte degranulation in syntaxin-11 deficient familial hemophagocytic lymphohistiocytosis 4 (FHL4) patients." Blood 110(6): 1906-1915.

      Chorev, D. S., L. A. Baker, D. Wu, V. Beilsten-Edmands, S. L. Rouse, T. Zeev-Ben-Mordehai, C. Jiko, F. Samsudin, C. Gerle, S. Khalid, A. G. Stewart, S. J. Matthews, K. Grunewald and C. V. Robinson (2018). "Protein assemblies ejected directly from native membranes yield complexes for mass spectrometry." Science 362(6416): 829-834.

      Dorwart, M. R., R. Wray, C. A. Brautigam, Y. Jiang and P. Blount (2010). "S. aureus MscL is a pentamer in vivo but of variable stoichiometries in vitro: implications for detergent-solubilized membrane proteins." PLoS Biol 8(12): e1000555.

      Li, P., Y. Miao, A. Dani and M. Vig (2016). "alpha-SNAP regulates dynamic, on-site assembly and calcium selectivity of Orai1 channels." Mol Biol Cell 27(16): 2542-2553.

      Macartney, C. A., S. Weitzman, S. M. Wood, D. Bansal, M. Steele, M. Meeths, M. Abdelhaleem and Y. T. Bryceson (2011). "Unusual functional manifestations of a novel STX11 frameshift mutation in two infants with familial hemophagocytic lymphohistiocytosis type 4 (FHL4)." Pediatr Blood Cancer 56(4): 654-657.

      Marsh, R. A., N. Satake, J. Biroschak, T. Jacobs, J. Johnson, M. B. Jordan, J. J. Bleesing, A. H. Filipovich and K. Zhang (2010). "STX11 mutations and clinical phenotypes of familial hemophagocytic lymphohistiocytosis in North America." Pediatr Blood Cancer 55(1): 134-140.

      Miao, Y., C. Miner, L. Zhang, P. I. Hanson, A. Dani and M. Vig (2013). "An essential and NSF independent role for alpha-SNAP in store-operated calcium entry." eLife 2: e00802.

      Muller, M. L., S. C. Chiang, M. Meeths, B. Tesi, M. Entesarian, D. Nilsson, S. M. Wood, M. Nordenskjold, J. I. Henter, A. Naqvi and Y. T. Bryceson (2014). "An N-Terminal Missense Mutation in STX11 Causative of FHL4 Abrogates Syntaxin-11 Binding to Munc18-2." Front Immunol 4: 515.

      Ramanagoudr-Bhojappa, R., Y. Miao and M. Vig (2021). "High affinity associations with alpha-SNAP enable calcium entry via Orai1 channels." PLoS One 16(10): e0258670.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript addresses an important methodological issue, the fragility of meta-analytic findings, by extending fragility concepts beyond trial-level analysis. The proposed EOIMETA framework provides a generalizable and analytically tractable approach that complements existing methods such as the traditional Fragility Index and Atal et al.'s algorithm. The findings are significant in showing that even large meta-analyses can be highly fragile, with results overturned by very small numbers of event recodings or additions. The evidence is clearly presented, supported by applications to vitamin D supplementation trials, and contributes meaningfully to ongoing debates about the robustness of meta-analytic evidence. Overall, the strength of evidence is moderate to strong.

      Strengths:

      (1) The manuscript tackles a highly relevant methodological question on the robustness of meta-analytic evidence.

      (2) EOIMETA represents an innovative extension of fragility concepts from single trials to meta-analyses.

      (3) The applications are clearly presented and highlight the potential importance of fragility considerations for evidence synthesis.

      Reviewer #3 (Public review):

      (1) The manuscript would benefit from a clearer explanation of in what sense EOIMETA is generalizable. The author mentions this several times, but without a clear explanation of what they mean here.

      This is a point I was remiss not to better elucidate. With regards to generalisation, the text has been modified to explicitly state that generalisability in this context means no specific study dependence, just a net number of subjects required to flip a result. The text reads:

      “Atal's method is highly useful, but one possible objection is that it has the downside of non-generalisability, as it finds very specific combinations of trials and patients that would have to be re-coded (events classified as non-events and vice-versa) for results to become insignificant. For example, an Atal meta-analytic fragility of 4 pertains to a specific and often unique circumstance when 4 patients could be recoded from a specific study or combinations thereof to change outputs, but this does not generalise to any 4 patients in that meta-analysis. This makes this definition of meta-analytic fragility useful but not general, and perhaps less intuitive to interpret than a typical RCT fragility metric. In this work, we establish a generalizable meta-analytic fragility metric, based upon Ellipse of Insignificance (EOI) analysis for dichotomous outcome trials. This method creates a pool of events and non-events in both arms, adjusted for weighting, and answers the general question of how many patients would have to be effectively recoded in a meta-analysis for results to flip, without requiring specific study identification.”
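      For orientation, the "typical RCT fragility metric" that the quoted passage contrasts with can be sketched in a few lines. This is a minimal illustration of the traditional single-trial Fragility Index only; it is not EOIMETA, not Atal et al.'s algorithm, and not the author's code. It assumes a two-sided Fisher's exact test at alpha = 0.05, and the function names are hypothetical.

```python
from math import comb


def fisher_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    r1, r2, c1, n = a + b, c + d, a + c, a + b + c + d
    denom = comb(n, c1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x events in the first arm.
        return comb(r1, x) * comb(r2, c1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Sum every table at least as extreme (as improbable) as the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def fragility_index(e1: int, n1: int, e2: int, n2: int,
                    alpha: float = 0.05) -> int:
    """Count non-event-to-event recodings in the arm with fewer events
    needed to push a significant result to p >= alpha.
    Returns 0 if the result is not significant to begin with."""
    if fisher_two_sided(e1, n1 - e1, e2, n2 - e2) >= alpha:
        return 0
    flips = 0
    first_arm = e1 <= e2  # recode patients in the arm with fewer events
    while True:
        if first_arm:
            if e1 == n1:
                break
            e1 += 1
        else:
            if e2 == n2:
                break
            e2 += 1
        flips += 1
        if fisher_two_sided(e1, n1 - e1, e2, n2 - e2) >= alpha:
            break
    return flips
```

      As a small worked case, a hypothetical trial with 0/10 events in one arm and 5/10 in the other is significant (p = 504/15504, about 0.033), and recoding a single patient raises p above 0.05, so `fragility_index(0, 10, 5, 10)` returns 1.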

      (2) The authors mentioned the proposed tools assume low between-study heterogeneity. Could the author illustrate mathematically in the paper how the between-study heterogeneity would influence the proposed measures? Moreover, the between-study heterogeneity is high in Zhang et al's 2022 study. It would be a good place to comment on the influence of such high heterogeneity on the results, and specifying a practical heterogeneity cutoff would better guide future users.

      This is a very fair observation, and I need to better explain myself here! So there are effectively two measures of heterogeneity considered in this work; the typical value from a meta-analysis and the measure of divergence between the crude and the inverse-variance weighted adjusted – when these differ by small amounts, one could conceivably use either measure. I’ve changed the text to better reflect this, including:

      “This modification is akin to pooled in a meta-analysis, and adjusts for study-level heterogeneity. After this modification, a standard EOI analysis can then be applied to the vector . In addition, we can also employ ROAR analysis to the same vector, yielding the raw number of patients in either or both arms who could be added in a given direction to change the result, and the exact combination of control and experimental group redactions required to change the result from a significant finding to a null one. Caveats for implementation and interpretation are outlined in the discussion section.”

      (3) I think clarifying the concepts of "small effect", "fragile result", and "unreliable result" would be helpful for preventing misinterpretation by future users. I am concerned that the audience may be confusing these concepts. A small effect may be related to a fragile meta-analysis result. A fragile meta-analysis doesn't necessarily mean wrong/untrustworthy results. A fragile but precise estimate can still reflect a true effect, but whether that size of true effect is clinically meaningful is another question. Clarifying the effect magnitude, fragility, and reliability in the discussion would be helpful.

      This is an excellent suggestion – I’ve tried to do it with percentages, as in Table 2, but these are minute in the case of the vitamin D trials, partially I suspect because they are extraordinarily weak. The Cohen’s H for these meta-analyses yields tiny values, which I think might be tied to the virtually negligible percentages we obtain for number needed to flip. With stronger data, it might be worth expanding this into a useful heuristic measure for robustness, though I don’t think vitamin D data as in this work is going to help us much.

      In light of the reviewer’s excellent comment, I added lines 230-240 in the revised manuscript.

      (4) Comments on revisions:

      I am unable to find the author's responses to my previous round comments (Reviewer #3) in the revision package, though replies to the other reviewers are present. I will provide my updated feedback once these responses are available for review.

      My sincere apologies, I neglected the specific comments in error – this document should address them now, thank you again for giving this your time and consideration!

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study presents evidence that addition of the two GTPases EngA and ObgE to reactions comprised of rRNAs and total ribosomal proteins purified from native bacterial ribosomes can bypass the requirements for non-physiological temperature shifts and Mg2+ ion concentrations for in vitro reconstitution of functional E. coli ribosomes.

      Strengths:

      This advance allows ribosome reconstitution in a fully reconstituted protein synthesis system containing individually purified recombinant translation factors, with the reconstituted ribosomes substituting for native purified ribosomes to support protein synthesis. This represents a significant development in the long-term effort to produce synthetic cells.

      Weaknesses:

      The authors carried out additional experiments indicating that ~60% of the reconstituted ribosomes are functional and that a significant proportion are capable of synthesizing GFP from the correct initiation codon to the correct stop codon, and also of producing an enzymatically active protein at appreciable levels. Their SDS-PAGE and MS analyses of N-terminally tagged GFP are also quite useful but did not assess the frequency of initiation at the wrong start codon, termination at the incorrect stop codon, or the frequency of frameshifting during elongation. This would require examining additional reporters designed to examine dependence on a Shine-Dalgarno sequence or the impact of an in-frame stop codon to assess the fidelity of initiation and termination events, respectively, and one with a programmed frameshift site to assess the elongation fidelity of their reconstituted ribosomes.

      In response to the reviewer’s comment, we expanded the MS analysis and performed additional analyses against amino acid sequences corresponding to all three reading frames (updated Supplementary Data 2). As a result, only a single peptide fragment likely derived from the +1 frame was detected, but its intensity was approximately 1/1000 of that of peptide fragments detected from the normal frame. No other out-of-frame peptides were detected, and no evidence of stop-codon readthrough was found. We consider that these results suggest that this kind of deterioration in ribosome function is not occurring in the reconstituted ribosomes. Because this analysis cannot completely rule out abnormal translation events such as initiation from internal start codons or termination at internal stop codons, we also added a statement acknowledging that further analyses will be required to examine all aspects of the translation reaction.

      Reconstitution studies in the past have succeeded by using all recombinant, individually purified RPs that, if successful here, would have eliminated the possibility that one or more unknown ribosome assembly factors that co-purify with native ribosomes were added to their reconstitution reactions.

      A discussion of the issue raised by the reviewer was already added at the end of the Discussion in the previous revision. We fully agree with the reviewer’s point and we are currently continuing research in our laboratory aimed at achieving a more fundamental understanding of ribosome assembly.

      Reviewer #2 (Public review):

      This study has developed a single-step method to assemble active bacterial ribosomes under near-physiological conditions by using the GTPase factors EngA and ObgE. These factors eliminate the need for the traditional, harsh manipulations of temperature and magnesium levels. This integration is an important step toward the bottom-up construction of synthetic cells.

      Comments on revisions:

      The authors have addressed my concerns in the previous round of review.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The authors are urged to acknowledge that more sophisticated reporter assays would be required to compare the frequencies of errors occurring at each step of translation using their reconstituted versus native ribosomes.

      As described in our response to Reviewer #1, we performed additional MS analyses, updated Supplementary Data 2, and added a statement acknowledging the reviewer’s comment.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      The paper from Hudait and Voth details a number of coarse-grained simulations as well as some experiments focused on the stability of HIV capsids in the presence of the drug lenacapavir. The authors find that LEN hyperstabilizes the capsid, making it fragile and prone to breaking inside the nuclear pore complex.

      I found the paper interesting. I have a few suggestions for clarification and/or improvement.

      (1) How directly comparable are the NPC-capsid and capsid-only simulations? A major result rests on the conclusion that the kinetics of rupture are faster inside the NPC, but are the numbers of LENs bound identical? Is the time really comparable, given that the simulations have different starting points? I'm not really doubting the result, but I think it could be made more rigorous/quantitative.

      (2) Related to the above, it is stated on page 12 that, based on the estimated free-energy barrier, pentamer dissociation should occur in ~10 us of CG time. But certainly, the simulations cover at least this length of time?

      (3) At first, I was surprised that even in a CG simulation, LEN would spontaneously bind to the correct site. But if I read the SI correctly, LEN was parameterized specifically to bind to hexamers and not pentamers. This is fine, but I think it's worth describing in the main text.

      Comments on revisions:

      I found that the authors addressed my concerns satisfactorily. The other reviewer raised a number of important points regarding the nuances of the model and the interpretation of the simulations, which the authors rebutted. I think the paper in its current form now is a worthwhile addition to the literature.

      Reviewer #3 (Public review):

      I have carefully reviewed the manuscript, the two referee reports, and the authors' detailed responses. I appreciate the substantial effort the authors have invested in addressing the reviewers' comments, and I also recognize the strength and ambition of the work. This is a technically sophisticated study that integrates coarse-grained modeling with live-cell imaging to address an important and timely question regarding HIV-1 capsid inhibition by lenacapavir.

      Embedded within Reviewer #2's report are several substantive points that warrant careful consideration, particularly with respect to framing, terminology, and engagement with the broader literature. I view my role here as distinguishing those issues from claims that I do not find to be supported.

      We thank Reviewer 3 for the positive assessment of our work.

      First, I do not agree with Reviewer #2's central assertion that the manuscript lacks novelty or fails to present meaningful new findings. While individual elements of the system studied here (capsid docking at the NPC, lenacapavir-induced capsid hyperstabilization, capsid rupture, and competition with FG-nucleoporins) have been observed previously, this work provides a coherent, mechanistic account of how these elements are coupled. In particular, the proposed sequence linking LEN-induced lattice hyperstabilization, preferential pentamer loss at the narrow end, NPC-induced mechanical stress, and failure of nuclear import represents a nontrivial integration that goes beyond prior phenomenological observations. I therefore do not view this work as redundant with existing literature.

      We thank Reviewer 3 for the positive assessment of our work.

      That said, Reviewer #2 is correct to note that the manuscript would benefit from broader and more explicit engagement with recent independent studies, including computational and hybrid modeling efforts that address capsid mechanics, nuclear entry, and LEN effects using different frameworks. While the authors' bottom-up coarse-grained approach is clearly distinct and, in many respects, more systematically derived, eLife readers would benefit from a clearer discussion of how the present results relate to, complement, or differ from these other approaches. I strongly encourage the authors to add a short discussion paragraph situating their work within this broader context, without disparaging alternative models.

      We have now added several sentences describing papers that use two other CG models that are of some relevance to our work at the beginning of the fourth paragraph of the Introduction, and we have also highlighted the distinguishing features of our work at the end of that paragraph.

      Second, I find that some mechanistic claims in the manuscript would benefit from more careful language distinguishing model-conditioned interpretation from de novo prediction. This applies in particular to discussions of LEN binding heterogeneity and stoichiometry, as well as to conclusions drawn from biased enhanced-sampling simulations. While I agree with the authors that parameterization does not invalidate mechanistic insight, it is important to be precise about what aspects of the behavior emerge from the simulations versus what is constrained by prior experimental knowledge. Modest tightening/revising of language (e.g., "suggests," "is consistent with," "within the model") would address this concern without weakening the scientific conclusions.

      We have revised and softened the language in several places as suggested. However, we do still assert that our overall CG modeling approach is quite rigorous. The use of limited “top down” information on LEN binding is not problematic and is in fact warranted in this problem.

      Third, Reviewer #2 raises a legitimate semantic issue regarding the use of the term "elasticity." The manuscript infers changes in capsid mechanical response using heterogeneous elastic network models, which quantify effective stiffness and deformability rather than elasticity in the macroscopic materials sense. I recommend that the authors clarify this definition explicitly in the text to avoid confusion and unnecessary debate.

      We have now added a clarification at the end of the third paragraph of the subsection entitled “LEN binding to the capsid results in hyperstabilized lattice domains”. We have also added text in the second paragraph of the Discussion. Our view is that our perspective is more useful for this problem than a “macroscopic” perspective as the capsid is, in fact, a mesoscopic object and not a macroscopic one.

      Finally, I note that several of Reviewer #2's objections, particularly those asserting circular reasoning, misuse of enhanced sampling methods, or invalidity of coarse-grained predictions, reflect a misunderstanding of contemporary bottom-up coarse-grained modeling rather than genuine methodological flaws. I do not believe these points require further rebuttal or revision beyond what the authors have already provided.

      We agree.

      In summary, in my view, the manuscript represents a solid contribution to the field, provided that the authors undertake a limited set of targeted revisions aimed at improving framing, clarity, and engagement with the broader literature. Addressing these points will strengthen the manuscript and ensure that its contributions are clearly and fairly communicated to the community.

      We have done so as suggested by the reviewer.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This valuable study examines the cleavage of motor neuron nucleoporins by proteases 2A and 3C of enterovirus D68, a pathogen associated with acute flaccid myelitis. The evidence supporting the effects of EV-D68 proteases on nuclear import and export is solid and confirms previous results on the specific targeting of nucleoporins by proteases from other enteroviruses. However, the claim that cleavage of nucleoporins by EV-D68 2A is neurotoxic, though intriguing, is incomplete, as the evidence is largely indirect.

      We appreciate that the reviewers highlighted multiple strengths of our manuscript, including its detailed mechanistic dissection of the disrupted composition and function of the nuclear pore complex during EV-D68 infection, the finding that the viral 2A protease is toxic to motor neurons, and the several novel hypotheses on the pathogenesis of acute flaccid myelitis that are raised by our work.

      It appears that two independent eLife Assessments were made regarding the strength of evidence in our manuscript. The evidence supporting the impact of EV-D68 proteases on the NPC was felt to be solid.

      A second assessment was made as to whether our data support that “the cleavage of nucleoporins by EV-D68 2A is neurotoxic”. We would like to clarify that we did not intend to make this second claim in our manuscript and thought that we had been careful not to do so. In response to reviewer and editorial feedback, we have edited the text to improve the clarity on this issue. Although our data show that 2A<sup>pro</sup> is toxic to motor neurons, it cannot yet be determined whether this toxicity is mediated via 2A<sup>pro</sup>’s effects on the NPC. That is a logical hypothesis that arises from our manuscript, which we are testing through ongoing work that will require a significant volume of experiments that are outside the scope of the present study. We view this manuscript as an important first step towards a comprehensive understanding of the role of the 2A protease in the pathogenesis of AFM. Please see the response to point # 3 of Reviewer 2 below for a more detailed discussion of this issue and the changes we have made to the text in response. We respectfully request that a judgement on the role of nucleoporin cleavage as the mechanism of neurotoxicity not be included in the eLife Assessment.

      Also in response to reviewer feedback that our data was too reliant on the expression of recombinant viral proteins in isolation, we have added additional experiments extending our results into the context of live virus infection of cell lines and motor neurons. We feel that our revised manuscript has been improved as a result of the reviewers’ and editor’s input, and provides strong support for the following claims: (1) NPC composition and function are disrupted during EV-D68 infection, (2) 2A<sup>pro</sup> is primarily responsible for functional disruption, and (3) 2A<sup>pro</sup> is neurotoxic.

      We appreciate your review of this revised manuscript. Detailed responses to each of the reviewers’ comments are provided below.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Zinn and colleagues investigated the role of proteases 2A and 3C of enterovirus D68 (EVD68), an emerging pathogen associated with outbreaks of acute flaccid myelitis (AFM), a polio-like disease, on the nucleocytoplasmic trafficking in different systems, including human neurons derived from pluripotent cells. They found that 2A specifically cleaved Nup98 and POM121. Using reporter proteins and RNA synthesis and trafficking assays in cells expressing viral proteases, they showed that 2A induces broad loss of the nuclear pore barrier function, but, surprisingly, the RNA export appears to be minimally affected. Since nucleocytoplasmic trafficking defects are known to be associated with neuropathologies, they propose a hypothesis that 2A-dependent cleavage of nucleoporins in motoneurons underlies the development of EVD68-induced AFM. They further show that a 2A-specific inhibitor increases the survival of human neurons differentiated from stem cells upon EVD68 infection.

      Strengths:

      Use of multiple methods to investigate the effect of 2A and 3C expression on nucleoporin cleavage and nucleocytoplasmic trafficking.

      We thank the reviewer for their detailed and accurate review of our manuscript and recognition of these strengths.

      Weaknesses:

      Overall, the paper follows multiple others that extensively investigated the cleavage of nucleoporins by enterovirus 2As, so the results are of limited novelty. The hypothesis that infection of motoneurons is the cause of EVD68-induced neurological complications so far is supported by only one autopsy report. Other data suggest that infection of other cell types, such as astrocytes, and/or inflammatory cell infiltration in the CNS, are likely to be responsible for the symptoms. In any case, the claim that EVD68 is specifically neurotoxic because of the 2A-dependent cleavage of nucleoporins in neurons is unfounded, as the virus will be just as "toxic" for other infected cell types.

      While we agree that other papers have investigated this pathway in other enteroviruses, we note that our work is the first to do so in Enterovirus D68 and is the most comprehensive study in terms of the number of nucleoporins studied. As we reviewed in paragraph 5 of the introduction section, the activities of enterovirus proteases against specific nucleoporins vary from strain to strain, and it is important to understand any strain-specific effects before determining whether this pathway is relevant to toxicity in AFM.

      The infection of motor neurons is strongly supported not only by the aforementioned autopsy data [1], but also by mouse model data demonstrating replication of EV-D68 within motor neurons in the anterior horn of the spinal cord [2]. There are also numerous reports of electromyography and nerve conduction studies from human AFM patients demonstrating that the site of pathology is the spinal motor neuron [3-10].

      By contrast, infection of astrocytes has been demonstrated only in primary murine astrocyte cultures in which no neurons were present [11]. Therefore, while the available data suggest that EV-D68 infection of astrocytes is possible, in the in vivo context of human and mouse spinal cord, tropism to motor neurons appears to be preferential. The relative toxicity of neuron-autonomous vs non-autonomous processes, such as glial dysfunction and inflammatory cell infiltration, remains to be elucidated, and these processes are not mutually exclusive.

      The paper also requires a more convincing presentation of the data.

      We are uncertain what other specific changes the reviewer would like to see based on this comment, but feel that the revisions have improved the presentation of the data.

      Reviewer #2 (Public review):

      Summary:

      This manuscript investigates the role of EV-D68 proteases 2A and 3C in nuclear pore complex (NPC) dysfunction and their contribution to motor neuron toxicity. The authors demonstrate that both proteases cleave only a limited number of nucleoporins, with 2A<sup>pro</sup> showing the strongest impact by inhibiting nuclear import and export of proteins and disrupting NPC permeability without affecting RNA export. Importantly, treatment with the 2A<sup>pro</sup> inhibitor telaprevir reduced neuronal cell death in a dose-dependent manner, achieving neuroprotection at concentrations below those required to inhibit viral replication. The study addresses a relevant mechanism underlying EV-D68-induced neuropathology and explores a potential therapeutic intervention.

      Strengths:

      (1) Provides significant mechanistic insight into how EV-D68 proteases alter NPC function and contribute to neuronal toxicity.

      (2) The use of recombinant 2A and 3C proteins allows clear dissection of the specific contribution of each protease.

      (3) Demonstrates a therapeutic effect of telaprevir, with neuroprotection independent of viral replication inhibition, adding translational value to the findings.

      (4) The topic is highly relevant given the association of EV-D68 with acute flaccid myelitis.

      We thank the reviewer for their insightful comments and recognition of these strengths in our study.

      Weaknesses:

      (1) Most experiments were performed with recombinant proteases, lacking validation in the context of viral infection, where both proteases act simultaneously.

      In response to this concern, we have added additional experiments in the context of viral infection. We show that POM121 and Nup98 are also cleaved in motor neurons infected with EV-D68 and that their cleavage is inhibited by telaprevir (Fig 4A). We also repeated the EU pulse-chase RNA export assay in EV-D68-infected RD cells and again found no effect on RNA export (Fig 3D-E).

      (2) The conclusion that RNA export is unaffected requires confirmation during actual infection.

      As above, we have repeated this experiment in EV-D68-infected RD cells, showing no effect of EV-D68 infection on RNA export.

      (3) The reduction of neurotoxicity by telaprevir does not fully demonstrate that the protective effect is solely mediated through NPC preservation; additional analyses of eIF4G cleavage, nucleoporin integrity, and stress granules are needed.

      We agree that while the evidence in our manuscript raises the hypothesis that telaprevir-mediated neuroprotection is mediated via NPC preservation, it does not fully demonstrate this to be the case. As discussed above, we have been careful to state only the following conclusions: (1) NPC composition and function are disrupted during EV-D68 infection, (2) 2A<sup>pro</sup> is primarily responsible for functional disruption, and (3) 2A<sup>pro</sup> is neurotoxic.

      Future work will determine the extent to which NPC dysfunction contributes to 2A<sup>pro</sup>-mediated motor neuron toxicity versus other potential targets of 2A<sup>pro</sup>, as suggested by the reviewer. This work is already underway in our lab, and it is clear to us that the additional experiments required will be extensive, likely amounting to 1-2 additional manuscripts. These experiments are therefore beyond the scope of the present study, which represents a key first step in this line of inquiry.

      We specifically acknowledged in the Discussion that “A significant limitation of our study, however, is that we cannot exclude potentially toxic effects of 2A<sup>pro</sup> on aspects of host neuronal biology aside from the NPC.” We have also made the following adjustments to the text to make it more clear that this remains an open question:

      Changed the title to more clearly separate the effects of 2Apro on NPC function and motor neuron toxicity as independent events: “Enterovirus D68 2A protease causes nuclear pore complex dysfunction and independently contributes to motor neuron toxicity”

      In the abstract, shortened the following sentence: “We therefore sought to determine the impact of EV-D68 proteases on NPC composition and function” to avoid any implicit connection that a mechanistic link has been established between these two concepts. Neurotoxicity is now introduced later in the abstract by saying “Independently, we show…” instead of “We further show…”

      Removed language in the last paragraph of the Results section that may have been construed to suggest a mechanistic linkage: “Because similar deficits have been reported to contribute to neurotoxicity in neurodegenerative disease…” and simply stated “We next sought to determine the extent to which 2Apro activity independently contributes to motor neuron injury during EV-D68 infection.”

      Edited the opening sentence of the discussion, where it was ambiguous whether the word “their” was referring to the enterovirus protease (which was our intent) or to NPC disruption as the cause of motor neuron toxicity. We removed the discussion of toxicity from this paragraph entirely to remove such confusion.

      Edited the final paragraph of the discussion to include “We have also demonstrated that 2A<sup>pro</sup> activity contributes to nucleocytoplasmic transport dysfunction and separately to cell death in motor neurons infected with EV-D68”. We then go on to discuss the hypothesis that this toxicity might be mediated partially or entirely through NPC dysfunction, and propose that this be a focus of further study.

      (4) The study would be strengthened by including another 2A inhibitor (e.g., boceprevir) to confirm the specificity of telaprevir's protective effects.

      While we would like to be able to include multiple pharmacologic inhibitors of 2A<sup>pro</sup>, unfortunately telaprevir is the only known inhibitor of EV-D68 2A<sup>pro</sup>. The same study that identified telaprevir as an EV-D68 2A<sup>pro</sup> inhibitor also evaluated boceprevir and determined that its inhibitory activity against 2A<sup>pro</sup> is minimal [12].

      Reviewer #3 (Public review):

      Summary:

      The authors showed that expression of the viral proteases 2Apro and 3Cpro of EV-D68 cleaved specific components of the nuclear pore complex (Nup98 and POM121 by 2Apro), and that 2A but not 3C expression altered nuclear import and export. Similar nucleocytoplasmic transport deficits are observed in EV-D68-infected RD cells and iPSC-derived motor neurons (diMNs). The 2A inhibitor telaprevir partially rescued the nucleocytoplasmic transport deficits and suppressed neuronal cell death after infection. While it's clear that 2A can cleave NPC proteins and affect nuclear transport, the link to neurotoxicity after EV-D68 infection is less convincing.

      This study opens up a very intriguing hypothesis: that EV-D68 2Apro could be directly responsible for motor neuron cell death, mediated by POM121 and possibly Nup98 cleavage, that ultimately results in paralysis known as acute flaccid myelitis. This hypothesis notably does run counter to other published data showing that human neuronal organoids derived from iPSCs can support productive EV-D68 infection for weeks without cell death and that EV-D68-infected mice can have paralysis prevented by depletion of CD8 T cells, still with EV-D68 infection of the spinal cord. However, even if 2Apro is not ultimately responsible for motor neurons dying in human infections, that does not exclude the possibility that cleavage of nups could still disrupt motor neuron function. Notably, most children with AFM have some amount of motor function return after their acute period of paralysis, but most still have some residual paralysis for years to life. It is possible that 2Apro could mediate the acute onset of weakness, while T cells killing neurons could determine the amount of long-term, residual paralysis.

      We thank the reviewer for their thoughtful comments. As discussed above, we agree that the present data demonstrate that 2A<sup>pro</sup> causes NPC dysfunction and is toxic in motor neurons, but have not proven that the mechanism of neurotoxicity is via NPC dysfunction.

      We appreciate the commentary on novel hypotheses opened by our work. Our recent thinking on this topic has been similar and we look forward to addressing these ideas further in future studies. Motor neuron dysfunction and motor neuron death may ultimately prove to have separate causes. The infection of motor neurons is likely the initiating event, with multiple downstream consequences which may be neuron-autonomous, or mediated by glial and inflammatory responses, or a mixture thereof.

      Strengths:

      The characterization of nuclear pore complex components that appear to be targets of both poliovirus and EV-D68 proteases is quite thorough and expansive, so this data set alone will be useful for reference to the field. And the process by which the authors narrowed their focus to EV-D68 2Apro reducing Nup98 and POM121 as consequential to both import and export of nuclear cargo but not RNA was technically impressive, thorough, and convincing. As will be detailed below, when the authors move from studying over-expressed proteases in transformed cell lines to studying actual virus infection in both transformed cell lines and iPSC-derived neurons, some of the data only indirectly support their conclusions; however, the quality of the experiments performed is still high. So even if the claim that 2Apro causes neurotoxicity is circumstantial, the data certainly are intriguing and certainly justify further study of the effects of EV-D68 2Apro on the NPC and how this impacts pathogenesis. This is a convincing start to an intriguing line of inquiry.

      We appreciate the reviewer’s recognition of our comprehensive evaluation of NPC disruption and our approach to arriving at a mechanistic understanding of this process. We agree with the reviewer’s viewpoint that the present study represents a beginning, rather than a conclusive end to this line of inquiry. For technical reasons, we were able to achieve more rigorous and mechanistic data in cell lines expressing recombinant proteins than in neurons infected with live virus. In response to the reviewers’ comments, as described above, we have added additional experiments in this revision in which we further evaluate nucleoporin cleavage and RNA export during live virus infection, and performed these experiments in iPSC-derived neurons whenever it was technically feasible to do so.

      Weaknesses:

      This study falls a bit shy of actually showing that 2Apro effects are causing motor neuron toxicity because the evidence of this is fairly indirect. At points, the authors do admit these limitations, but at other times, they claim to have shown the link directly. The following are reasons why these claims are only indirectly supported:

      We agree that we have shown direct toxicity of 2A<sup>pro</sup> in motor neurons, but have not shown that the mechanism is via NPC dysfunction. We felt that we were careful to frame our conclusions as such. However, we have revised the text to improve the clarity on this point as described above.

      (1) Cleavage of Nup98 and POM121 after EV-D68 infection in RD cells and diMNs is never demonstrated.

      We have added data showing the cleavage of POM121 and Nup98 in EV-D68 infected diMNs (Figure 4A).

      (2) Telaprevir was able to rescue nucleocytoplasmic transport in RD cells at low concentrations (Figure 4A). It is not shown if this correlates with its antiviral effect in RD cells, or could this correlate with inhibition of 2A cleavage of Nup98 or POM121, which is never measured.

      In the aforementioned new experiment in Figure 4A, we have also included a dose-response curve for telaprevir showing its inhibition of POM121 and Nup98 cleavage.

      (3) Building off of the prior point, the authors' claim that the neuroprotective effect of telaprevir is independent of its antiviral effect is not well-founded. Figure 4E (neuroprotection) was done with MOI 5, and Figure 4G (virus growth) was MOI 0.5. Telaprevir neuroprotection is not shown at MOI 0.5, nor is the neuroprotective effect correlated with inhibition of 2A cleavage of Nup98 or POM121.

      The selection of MOIs for these two experiments was limited by technical considerations. If the viral growth curve were to be performed at MOI 5, it would be confounded by cell death. Further, a low MOI is required in order to allow multiple rounds of infection, and is therefore more sensitive for assaying the effect of telaprevir on viral replication. On the other hand, at MOI 0.5 diMN death is very gradual, and the neuroprotection assay would have lacked the statistical power to determine whether a rescue of this small magnitude of toxicity is significant. The EC<sub>50</sub> of telaprevir is not expected to vary at different MOIs.

      We have also now correlated the inhibition of 2A<sup>pro</sup> cleavage of Nup98 and POM121 with the neuroprotective effect at comparable concentrations of telaprevir, as described above.

      (4) The use of mixed virus isolates only in the diMNs is problematic because different EV-D68 isolates are known to have drastically different effects on pathogenesis in mice. Since all initial data were generated with the MO isolate, adding the additional MD isolate to the diMN experiments actually adds uncertainty to the conclusions. It is not clear if the authors infected different cultures with the different isolates and combined the data or infected all cultures with a mixture of the two isolates. If the former, then the data should be reported separately to see the effect of each individual strain, which would be interesting to EV-D68 virologists. If the latter, then there is no way to know from these data whether one of the two isolates had increased fitness over the other and exerted a dominant effect. If the MD isolate overtook the MO isolate, from which all other data in this manuscript are derived, then we have much less of an idea how much the data from the first three figures supports the final figure.

      We apologize for the lack of clarity in describing this experiment. The MO/2014 and MD/2018 isolates were not mixed. These were performed in separate experiments, each with four biologically independent replicates. The original figure showed the mean and SEM for these 8 replicates together. To improve clarity, we separated each viral strain into its own panel of the figure. We have also increased the rigor of the statistical analysis in this experiment by using Cox proportional hazard regression instead of ANOVA.

      Recommendations for the authors:

      Reviewing Editor Comments:

      Please consider both public reviews above and recommendations for the authors below. The general consensus among reviewers is that more evidence is needed to support the claim that 2A causes motor neuron toxicity during infection.

      Reviewer #1 (Recommendations for the authors):

      Most of the conclusions are made upon analysis of images, yet the images themselves are seldom shown. It is difficult to evaluate the validity of conclusions without seeing the material that was analyzed.

      (1) Figure 1. Representative Western blots should be shown.

      We considered including representative Western blots in this already large figure; however, the figure size and complexity became unmanageable because the figure summarizes the quantification of 246 Western blots. In the original submission, we uploaded a supporting data file that included complete un-cropped Western blots for all experiments, including ladders, loading controls, and clear labeling of the samples. We believe these data allow the reader to assess the quality and reliability of our Western blot experiments while maintaining the approachability of the figures and data presentation. We have also included these supporting data again in the revised manuscript.

      (2) Figure 3. Representative images should be shown. This is especially important for the ethynyl-uridine labeling experiment. It would be highly surprising that RNA transcription and processing would proceed normally in 2A-expressing cells on the background of a major redistribution of nuclear proteins. One possible explanation for that would be that cells that can be analyzed express a relatively small amount of 2A, which is known to be toxic, and thus may not fully represent the cellular changes upon infection. The results from bona fide infected cells would be much more convincing.

      Representative images have been added for the ethynyl-uridine pulse-chase experiment, and this experiment has been repeated in RD cells infected with EV-D68. Transfection of proteases or infection of the cells utilized the same protocols and timeframes upon which nucleoporin cleavage and disruption of protein transport were found to be present. The timepoint for all of these experiments was selected to precede the onset of toxicity, and the representative images demonstrate normal cellular morphology. We also selected for analysis only GFP+ cells with normal morphology, ensuring that only viable 2A<sup>pro</sup>-GFP-expressing cells were included in the analysis. The new experiments again showed no effect on RNA export. We were as surprised as the reviewers by this outcome. However, as we note in the text, disruption of RNA export has not been uniformly present across all enteroviruses previously studied.

      (3) Figure 4 A-D. Similarly, representative images should be shown.

      We have added representative images for these experiments, which are now Fig 4B-E.

      (4) Figure 4G. The demonstration that the "neuroprotective" effect of 2A inhibitor is not related to the inhibition of viral replication requires a control showing that a similar inhibition of viral replication by an inhibitor with another target would not similarly diminish cell toxicity.

      Neuronal survival experiments showed inhibition of toxicity with concentrations of telaprevir as low as 0.3 μM, a concentration at which there was no significant effect on viral replication. Telaprevir had only a marginal inhibitory effect on viral replication at 10 μM (achieving statistical significance in only one of two strains), and no consistent effect on replication at lower concentrations. Therefore, the suggested control experiment would not be possible, because the neuroprotective concentration of telaprevir does not inhibit viral replication.

      Reviewer #2 (Recommendations for the authors):

      Major concerns:

      (1) Most of the experiments were performed with recombinant 2A and 3C proteins. While these experiments are highly informative for dissecting the role of each protease in NPC dysfunction, it would be important to also perform experiments in the context of infection. How are import and export processes affected when both proteases are present during infection? How is passive transport modified under these conditions?

      Thank you for this important comment. Please see the above discussion of additional experiments that we added utilizing live virus infection to complement the experiments that used recombinant proteins.

      (2) The results regarding RNA export in the presence of recombinant 2A and 3C proteases suggest that RNA export is not altered. It would be important to confirm this finding during infection.

      We agree that this is an important experiment, and have done so as described above.

      (3) While the background information suggests that NPC dysfunction contributes to neurotoxicity, the observed reduction of neurotoxicity by telaprevir does not demonstrate that this effect is solely due to the action of 2A on the NPC. It would be important to evaluate the integrity of eIF4G, nucleoporins, and stress granules during treatment.

      We agree that additional experiments would be required to determine the extent to which the toxicity of 2A<sup>pro</sup> is mediated through its effects on the NPC versus other potential targets. Please see above discussion for more details.

      (4) Including another 2A inhibitor (e.g., boceprevir) would strengthen the conclusions by confirming the results obtained with telaprevir.

      Please see above discussion of boceprevir.

      Reviewer #3 (Recommendations for the authors):

      (1) Preferred ICTV nomenclature abbreviates rhinovirus as RV instead of HRV, so the authors should change their abbreviations appropriately. See Simmonds et al., Archives of Virology (2020) 165:793-797, https://doi.org/10.1007/s00705-019-04520-6

      We have updated these abbreviations accordingly.

      (2) There is no mention of Figures 1C and 1D in the text.

      These have been added in the appropriate locations.

      (3) In the section "2A protease alters nucleocytoplasmic trafficking of protein substrates" it would be very helpful to just directly state what each construct is meant to demonstrate. Along the lines of "NLS-tdTomato should be located in the nucleus, so seeing more signal in the cytoplasm would indicate a defect in nuclear import." And something equivalent for the other two constructs.

      Thank you for the suggestion. We have added descriptions of the use and interpretation for each construct.

      (4) The following sentence would be more accurate with the addition of "partially" because the effect is not returned to normal levels: "The mislocalization of NLS-tdTomato was partially rescued by 3μM telaprevir."

      We have edited this as recommended.

      (5) SNAP29 is probably a typo and meant to be CREB in the legend of Figure 1B.

      Thank you for catching this. We have corrected this to CREB.

      (6) "Panel A" should likely be "Panel E" in the Figure 4F legend.

      We have corrected this to refer to the appropriate panel, which has also been re-lettered due to the addition of new panels to this figure.

      (7) The authors should at least show representative Western blot data used to determine the data for Figure 1 in a supplemental figure.

      As discussed above, these Western blots were included as supplemental data in the original submission, and have also been included in the revised version.

      (8) As suggested in the public comments, if the diMNs were infected separately with the MO and MD strains of EV-D68, those data should be separated from each other and reported individually. In any case, whatever was done (combined virus inoculum or separate inocula) needs to be clarified.

      These data are now reported separately. Please see above discussion for details.

      References:

      (1) Vogt MR, Wright PF, Hickey WF, De Buysscher T, Boyd KL, Crowe JE, Jr. Enterovirus D68 in the Anterior Horn Cells of a Child with Acute Flaccid Myelitis. N Engl J Med. May 26 2022;386(21):2059-2060. doi:10.1056/NEJMc2118155

      (2) Hixon AM, Yu G, Leser JS, et al. A mouse model of paralytic myelitis caused by enterovirus D68. PLoS Pathog. Feb 2017;13(2):e1006199. doi:10.1371/journal.ppat.1006199

      (3) Andersen EW, Kornberg AJ, Freeman JL, Leventer RJ, Ryan MM. Acute flaccid myelitis in childhood: a retrospective cohort study. Eur J Neurol. Aug 2017;24(8):1077-1083. doi:10.1111/ene.13345

      (4) Elrick MJ, Gordon-Lipkin E, Crawford TO, et al. Clinical Subpopulations in a Sample of North American Children Diagnosed With Acute Flaccid Myelitis, 2012-2016. JAMA Pediatr. Feb 1 2018;173(2):134-139. doi:10.1001/jamapediatrics.2018.4890

      (5) Hovden IA, Pfeiffer HC. Electrodiagnostic findings in acute flaccid myelitis related to enterovirus D68. Muscle Nerve. Nov 2015;52(5):909-10. doi:10.1002/mus.24738

      (6) Knoester M, Helfferich J, Poelman R, et al. Twenty-Nine Cases of Enterovirus-D68 Associated Acute Flaccid Myelitis in Europe 2016; A Case Series and Epidemiologic Overview. Pediatr Infect Dis J. Jan 2018;38(1):16-21. doi:10.1097/INF.0000000000002188

      (7) Martin JA, Messacar K, Yang ML, et al. Outcomes of Colorado children with acute flaccid myelitis at 1 year. Neurology. Jul 11 2017;89(2):129-137. doi:10.1212/WNL.0000000000004081

      (8) Saltzman EB, Rancy SK, Sneag DB, Feinberg Md JH, Lange DJ, Wolfe SW. Nerve Transfers for Enterovirus D68-Associated Acute Flaccid Myelitis: A Case Series. Pediatr Neurol. Nov 2018;88:25-30. doi:10.1016/j.pediatrneurol.2018.07.018

      (9) Van Haren K, Ayscue P, Waubant E, et al. Acute Flaccid Myelitis of Unknown Etiology in California, 2012-2015. JAMA. Dec 22-29 2015;314(24):2663-71. doi:10.1001/jama.2015.17275

      (10) Natera-de Benito D, Berciano J, Garcia A, E MdL, Ortez C, Nascimento A. Acute Flaccid Myelitis With Early, Severe Compound Muscle Action Potential Amplitude Reduction: A 3-Year Follow-up of a Child Patient. J Clin Neuromuscul Dis. Dec 2018;20(2):100-101. doi:10.1097/CND.0000000000000217

      (11) Rosenfeld AB, Warren AL, Racaniello VR. Neurotropism of Enterovirus D68 Isolates Is Independent of Sialic Acid and Is Not a Recently Acquired Phenotype. Mbio. 2019;doi:10.1128/mBio

      (12) Musharrafieh R, Ma C, Zhang J, et al. Validating Enterovirus D68-2A(pro) as an Antiviral Drug Target and the Discovery of Telaprevir as a Potent D68-2A(pro) Inhibitor. J Virol. Jan 23 2019;doi:10.1128/JVI.02221-18

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Fogel & Ujfalussy report an extension of a visualization tool that was originally designed to enable an understanding of detailed biophysical neuron models. Named "extended currentscape", this new iteration enables visual assessment of individual currents across a neuron's spatially extended dendritic arbor with simultaneous readout of somatic currents and voltage. The overall aim was to permit a visually intuitive understanding of how a model neuron's inputs determine its output. This goal was worthwhile and the authors achieved it. Their manuscript makes two additional contributions of note: (1) a clever algorithmic approach to model the axial propagation of ionic currents (recursively traversing acyclic graph subsections) and (2) interesting, albeit not easily testable, insights into important neurophysiological phenomena such as complex spike generation and place field dynamics. Overall, this study provides a valuable and well-characterized biophysical modeling resource to the neuroscience community.

      Strengths:

      The authors significantly extended a previously published open-source biophysical modeling tool. Beyond providing important new capabilities, the potential impact of "extended currentscape" is boosted by its integration with preexisting resources in the field.

      The code is well-documented and freely available via GitHub.

      The authors' clever partitioning algorithm to relate dendritic/synaptic currents to somatic output yielded multiple intriguing observations regarding when and why CA1 pyramidal neurons fire complex spikes versus single action potentials. This topic carries major implications for how the hippocampus represents and stores information about an animal's environment.

      Weaknesses:

      While extended currentscape is clearly a valuable contribution to the neuroscience community, this reviewer would argue that it is framed in a way that oversells its capabilities. The Abstract, Introduction, Results, and Methods all contain phrases implying that extended currentscape infers dendritic/synaptic currents contributing to somatic output, i.e., backwards inference of unknown inputs from a known output. This is not the case; inputs are simulated and then propagated through the model neuron using a clever partitioning algorithm that essentially traverses a biologically undirected graph structure by treating it like a time series of tiny directed graphs. This is an impressive solution, but it does not infer a neuron's input structure.

      We are sorry if our text could be interpreted as if we were inferring unobserved inputs from the known outputs. This was not intentional and we were unaware of the possibility of such interpretation.

      In fact, at the beginning of the Results, we started the description of the extended currentscape method by explicitly stating that we need to measure the input currents: “Our method … requires measuring the membrane and axial currents throughout the dendritic tree of a neuron (in every node of the circuit)”.

      To further clarify that our method starts with measuring the input currents, we made this information explicit already in the abstract (“Our approach relies on the iterative decomposition of the axial current flowing between neighbouring compartments in proportion to the underlying membrane currents measured in the model.”), and in the Introduction (“Even if the membrane currents are known, studying the impact of particular ion channels on the neuronal response in such a dynamical system under in vivo conditions is hindered by two major obstacles”). We also rewrote several parts of the text to remove any phrases that could imply the inference of the inputs (line 568). We believe that after clarifying this at the beginning of the paper, the readers will not misinterpret our descriptions later in the text.
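
      The proportional decomposition described above can be illustrated with a toy sketch. This is not the authors' implementation: the `Compartment` class, the `partition` function, and all current values are hypothetical, and the sketch assumes a single time step in which every listed current is an inward (source) current. It only shows how an axial current leaving a compartment can be split recursively among the membrane currents of that compartment and of the compartments feeding it.

```python
from dataclasses import dataclass, field


@dataclass
class Compartment:
    """Hypothetical compartment: inward membrane currents plus upstream children."""
    name: str
    membrane: dict                      # current type -> inward membrane current (nA)
    children: list = field(default_factory=list)
    # each entry of `children` is (child Compartment, axial current from child, nA)


def partition(comp, axial_out):
    """Split `axial_out` among all upstream membrane currents, proportionally."""
    total_in = sum(comp.membrane.values()) + sum(i_ax for _, i_ax in comp.children)
    shares = {}
    if total_in <= 0:
        return shares
    # Membrane currents of this compartment take their proportional share.
    for kind, i_mem in comp.membrane.items():
        shares[(comp.name, kind)] = axial_out * i_mem / total_in
    # The share carried by each incoming axial current is decomposed recursively.
    for child, i_ax in comp.children:
        shares.update(partition(child, axial_out * i_ax / total_in))
    return shares


# Assumed example values: a dendritic tip feeding a trunk that feeds the soma.
tip = Compartment("tip", {"AMPA": 2.0, "Na": 1.0})
trunk = Compartment("trunk", {"NMDA": 1.0}, [(tip, 1.0)])

# Decompose 1 nA of axial current flowing from the trunk into the soma.
shares = partition(trunk, axial_out=1.0)
for source, value in shares.items():
    print(source, round(value, 4))
```

      In this sketch the shares always sum to the axial current being decomposed, so the result can be stacked in the same way as a conventional currentscape plot.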

      Because a directed acyclic graph architecture is shown in Figure 2, it is unintuitive that the authors can infer bidirectional current flow, e.g. Figure 3 showing current flowing from basal dendrites and axon to soma, and further towards the apical dendrites. This is explained in Methods, but difficult to parse from Results amidst lots of rather abstract jargon (target, reference, collision, compartment). Figure 2 would have presented an opportunity to clearly illustrate the authors' partitioning algorithm by (1) rooting it in the exact morphology of one of their multicompartmental model neurons and (2) illustrating that "target" and "reference" have arbitrary morphological meanings; they describe the direction of current flow which is reevaluated at each time step.

      We thank the Reviewer for this comment. We agree that the concepts introduced here to explain our method are rather abstract and could be difficult to understand. To help the reader, we followed the Reviewer's suggestions and redesigned Fig. 2 to provide a step-by-step explanation of the extended currentscape method. In particular,

      We used a simpler model where the structure of the graph can be directly related to the morphology of the model.

      We show that the target node can connect multiple subtrees with axial currents flowing in different directions. We explain that in this case the inward and the outward subtrees are pruned and partitioned separately.

      We provide a glossary in Table 1 to ensure that the readers can follow our description and do not get lost amidst lots of rather abstract jargon.

      We also clarified that although the target compartment is chosen arbitrarily by the user, it remains the same for all time points throughout the analysis.

      Analyses in Figure 7, C and D, are insightfully devised and illuminating. However, they could use some reconciliation with Figure 5 regarding initiation of individual APs versus CSBs within place fields.

      We thank the reviewer for the positive comments and also for pointing out the potential source of misunderstanding. We slightly changed the text at Fig 5 to emphasize that this is a single example trial, and we added the following sentence to the paragraph describing Fig 7CD: “Consequently, the somatic current dynamics before the iAP and the CSB presented in Fig 5Cc-Dd can be regarded as illustrative samples from a broad distribution, but the differences observed between them are not representative.”

      The intriguing observations generated by extended currentscape also point to its main weakness, which the authors openly acknowledge: as of now, no experimental methods exist to conclusively test its predictions.

      We agree with the Reviewer that not being able to apply our extended currentscape method to reveal the current types driving real neurons recorded in vivo is currently a weakness of our approach. However, we would like to emphasize that it may be feasible to use it to estimate the spatial distribution of the membrane currents driving the cell based on in vivo voltage imaging data, as we briefly outline in the discussion.

      Reviewer #2 (Public review):

      Summary

      The electrical activity of neurons and neuronal circuits is dictated by the concerted activity of multiple ionic currents. Because directly investigating these currents experimentally isn't possible with current methods, researchers rely on biophysical models to develop hypotheses and intuitions about their dynamics. Models of neural activity produce large amounts of data that is hard to visualize and interpret. The currentscape technique helps visualize the contributions of currents to membrane potential activity, but it's limited to model neurons without spatial properties. The extended currentscape technique overcomes this limitation by tracking the contributions of the different currents from distant locations. This extension allows tracking not only the types of currents that contribute to the activity in a given location, but also visualizing the spatial region where the currents originate. The method is applied to study the initiation of complex spike bursts in a model hippocampal place cell.

      Strengths.

      The visualization method introduced in this work represents a significant improvement over the original currentscape technique. The extended currentscape method enables investigation of the contributions of currents in spatially extended models of neurons and circuits.

      Weaknesses.

      The case study is interesting and highlights the usefulness of the visualization method. A simpler case study may have been sufficient to exemplify the method, while also allowing readers to compare the visualizations against their own intuitions of how currents should flow in a simpler setting.

      We thank the reviewer for this comment. In fact, we had also considered including a simpler case study to illustrate the extended currentscape method in the original submission. In accordance with the comments from Reviewer 1, we now use a simple model to introduce the concepts in Figure 2 and provide a few examples where the reader can compare the results with their own intuition in simpler cases.

      Recommendations for the authors:

      Reviewing Editor Comments:

      (1) Model complexity vs. intuition/validation. The case study relies on a very complex CA1 model, making it difficult to build intuition about current flow and to validate the visualization. Inclusion of a simpler benchmark (e.g., soma plus a dendrite with two branches, fewer compartments) is recommended to demonstrate how the extended currentscape behaves in a more tractable setting.

      Inspired by the suggestions of the Reviewers, we modified Figure 2 and now first use a simple model with a soma and a dendrite with two branches to introduce the concepts of our analysis. We start with a few examples where the reader can compare the results with their own intuition in simpler cases.

      (2) Rationale and citations for input structure. The in vivo-like input design (untuned inhibition; 12 co-tuned excitatory clusters with large conductances; the goal of generating place fields) would benefit from a more explicit rationale and substantially more literature support. Alternative plausible scenarios (e.g., distributed co-tuned inputs and homosynaptic plasticity) should be articulated, and choices situated within the experimental literature on CA1 excitation/inhibition, including tuning and anti-tuning results.

      We extended the paragraph in the Results describing the input structure and added the most important references there. We added further references to the Methods section where we argue that “Reliable place cell tuning can be achieved by functional synaptic clustering without increased excitatory drive in the place field (Ujfalussy and Makara 2020) or via strong excitatory drive without input clustering (Grienberger et al., 2017, Ujfalussy and Makara, 2020). However, experimental data indicates that both of these mechanisms are present and contribute to the activity of place cells (Adoff et al., 2021, Tasciotti et al., 2025)” and “although interneurons can display spatial tuning, they typically have a broad tuning with low selectivity (Ego-Stengel et al., 2007, Dupret et al., 2013, Geiller et al., 2020). A weak disinhibition within the place field can also contribute to the selective firing of place cells (Geiller et al., 2022, Valero et al., 2022), this was not necessary for place cell activity in novel environments (Geiller et al., 2022) and the overall inhibitory input to place cells is largely untuned (Grienberger et al., 2017).”

      (3) Scope of PCA-based claims. The interpretations derived from the PCA analysis appear broader than warranted, given subcellular heterogeneity and the dominance of somatic action potential variance. These claims should be tempered with more explicit statements about what PCA can and cannot resolve in this context.

      We thank the Reviewer for the opportunity and encouragement to clarify this part of the text. We agree with the Editor and the Reviewers that the results of the PCA analysis cannot be used to support claims regarding the presence or the absence of independent dendritic events. In fact, we aimed to use it as an illustration that global activity tends to dominate PCA analysis even when the “neuron is mainly driven by strong, functionally clustered synaptic inputs to a few dendritic branches”. We acknowledge that we did not formulate this point clearly in the original submission. Therefore, we substantially rewrote this part of the Results and performed additional analysis to clarify that there is a substantial amount of soma-independent dendritic activity in our model that remains invisible to a PCA-based analysis.
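
      The claim that global activity can dominate a PCA even when most events are local can be illustrated with a toy simulation. The sketch below is not the authors' model or analysis: the matrix size, event counts, amplitudes, and noise level are invented for illustration. It builds a synthetic branch-by-time voltage matrix in which 90% of events are confined to two branches and 10% involve every branch, and then measures how much variance the first principal component captures.

```python
import numpy as np

rng = np.random.default_rng(0)
n_steps, n_branches = 5000, 100          # assumed toy dimensions

# Baseline: small independent noise in every branch.
vm = rng.normal(0.0, 0.1, size=(n_steps, n_branches))

# Global events (10% of all events): every branch depolarizes together.
global_times = rng.choice(n_steps, size=50, replace=False)
vm[global_times, :] += 10.0

# Local events (90% of all events): each is confined to two random branches.
local_times = rng.choice(n_steps, size=450, replace=False)
for t in local_times:
    branches = rng.choice(n_branches, size=2, replace=False)
    vm[t, branches] += 10.0

# PCA via SVD of the mean-centered matrix.
centered = vm - vm.mean(axis=0)
singular_values = np.linalg.svd(centered, compute_uv=False)
explained = singular_values**2 / np.sum(singular_values**2)

print(f"variance explained by PC1: {explained[0]:.2f}")
```

      Under these assumptions the first component is driven almost entirely by the shared, global events, although local events are nine times more frequent. This is the sense in which a variance-based decomposition can leave frequent local activity nearly invisible.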

      Reviewer #1 (Recommendations for the authors):

      Major concerns:

      (1) Depolarization-inactivated K+ may be an important consideration to model burst-firing.

      Our current model includes two kinds of transient K+ channels that show inactivation after depolarization: a proximal and a distal type, as in the original model of Jarsky et al., 2005. We have now made this explicit in the main text (line 178).

      (2) Description of the in vivo-like model's excitatory and inhibitory input structure needs many more citations of biological studies to communicate rationale for the authors' decisions, e.g. untuned inhibitory neurons, organization of a subset of excitatory inputs into 12 functional synaptic clusters with co-tuned presynaptic neurons and outsized synaptic conductances. The goal is clearly to create CA1 pyramidal neurons with place fields, which would be helpful to state upfront. But additionally, (a) place fields could arise from homosynaptic potentiation of distributed co-tuned excitatory inputs (e.g., Bittner, et al. 2017 study describing BTSP made no assumptions) and (b) CA1 inhibitory interneurons can be spatially tuned (Ego-Stengel & Wilson, 2006; Wilent & Nitz, 2007; Geiller, et al. 2020) and even anti-tuned (Geiller, et al. 2021).

      We thank the Reviewer for pointing out the lack of appropriate references in this section. We made the following changes in the manuscript:

      (1) Stated explicitly that the goal was to create place cell activity.

      (2) Added references to the main text to justify our choices of the inputs (lines 234-241).

      (3) We included a longer rationale for the choice of synaptic clusters and the lack of inhibitory (anti-)tuning in the Methods section, describing the neuron model. In brief, Adoff et al., 2021 reported more clustering of excitatory inputs within the place field. In our model, the degree of clustering is somewhat larger than the clusters reported. Although inhibitory neurons can be tuned, their tuning is much weaker than that of place cells and seems to play only a minor role in the generation of place fields (Grienberger et al., 2017). The presence of inhibitory anti-tuning is controversial: although Geiller et al., 2021 reported weak (~10%) anti-tuning, they did not find it in novel environments, indicating that it is not needed for spatially selective activity (lines 628-646).

      (3) Interpretation of principal component-based analyses shown in Figure 4 could be toned down. As written in section "CSBs in the CA1 pyramidal neuron", it sounds like CA1 pyramidal neuron dendrites display minimal autonomous activity. However, PCA does not seem well-suited to address the heterogeneity of subcellular voltage dynamics over physiologically relevant timescales. Somatic action potentials, and their backpropagation/modulation of dendritic voltage, would of course explain a very large fraction of variance. However, if local dendritic events summate over fine timescales to initiate somatic firing, it is hard to imagine this important nuance being detected. On the other hand, it is hard to imagine single dendritic branches driving robust somatic firing except in the relatively extreme situation in which large numbers of synapses synchronously drive the same branch to initiate a local Ca2+ spike (Figure 3, A-C).

      We agree with the reviewer that PCA cannot reveal the potential dendritic origin of somatic APs, and thus is not suitable to assess the role of local dendritic spikes in shaping the output of the cell. We wanted to highlight here that even in cells with excitable dendrites driven by strong, local input clusters, exhibiting frequent local dendritic spikes, the dendritic membrane potential dynamics will be dominated by global fluctuations with surprisingly little sign of local dynamics in the PCA components. As the reviewer also pointed out, this may not be surprising as local events either remain spatially restricted and thus contribute little to the overall variability of the dendritic Vm or they initiate somatic APs and will thus be counted as global events.

      To demonstrate the high propensity of local dendritic events, we analysed local Vm peaks in dendritic branches and found that ~7.6% of the peaks were not coupled to somatic APs.

      Although this number could seem low, we emphasize that most of the 92.4% of the dendritic peaks coupled to APs potentially reflect the backpropagation of the same somatic events to multiple dendritic sites. To confirm this, we performed an additional analysis measuring the spatial extent (number of branches involved) of the individual dendritic events. We found that 90% of the events remained local, restricted to a few dendritic branches, while 10% of the events were global, associated with BAPs and involving the majority of the dendritic tree. Interestingly, these global events dominate the PCA analysis and are responsible for >90% of the dendritic Vm peaks. These results are included in a new panel in Figure 4H.

      We conclude that, “this way, although only 10% of the dendritic Vm events were associated with bAPs, they were ~60-times larger than local events and they dominated the PCA analysis even in the presence of local regenerative dendritic events driven by strong, functionally clustered synaptic inputs.” We believe that this model and analysis could serve as an important benchmark for future experimental studies investigating the structure of membrane potential correlations in in vivo voltage imaging data (Lee et al., 2026).
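
      The relationship between the share of events and the share of branch-level Vm peaks can be checked with a back-of-the-envelope calculation. The sketch below uses only the rounded figures quoted above (10% of events global, global events roughly 60 times larger than local ones); the function name and the choice to express event size in units of one local event are assumptions made for illustration. Because the inputs are rounded, the result only approximates the peak fractions reported in the response.

```python
def global_peak_share(frac_global_events, size_global, size_local):
    """Fraction of branch-level Vm peaks that belong to global events.

    Sizes are the number of branches (or any proportional unit) involved
    in one event of each kind.
    """
    frac_local_events = 1.0 - frac_global_events
    global_peaks = frac_global_events * size_global
    local_peaks = frac_local_events * size_local
    return global_peaks / (global_peaks + local_peaks)


# Rounded values from the text: 10% global events, ~60x the size of a local event.
share = global_peak_share(frac_global_events=0.10, size_global=60.0, size_local=1.0)
print(f"share of Vm peaks from global events: {share:.2f}")
```

      The calculation makes the asymmetry explicit: a small minority of events can account for the large majority of detected peaks when each such event recruits most of the dendritic tree.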

      (4) One suggestion would be to display more data as shown in Figure 4F, with a longer X axis to clarify the temporal relationship between local dendritic spikes and the first somatic action potential.

      We added a few more examples, including the CSBs presented in Fig. 8G-I, as a new supplementary Figure S4. We also slightly extended the x-axis in this supplementary figure as the reviewer requested.

      If the models indicate that passively filtered EPSPs drive most somatic action potentials, as seems to be the case in Figure 5, then this would also be helpful to show as in Figure 4F.

      In Fig 5 we showed two examples of isolated APs. The first AP was indeed driven by passively filtered EPSPs. The second one was preceded and possibly caused by a dendritic spike, as highlighted by the black arrowhead labelled c in Fig. 5Cc. We further analysed the currents driving iAPs in Fig 7B and C, and found that there is considerable heterogeneity in the magnitude of the dendritic Na currents driving the soma before action potentials. Figure 8 and Figure S3 (now Fig. S5) show further examples for iAPs driven either by passively filtered EPSPs or dendritic spikes. We also included these examples in the new supplementary Figure S4.

      (5) Another suggestion would be to use one-hot vectors containing onset times of different event types, since this would divorce the amplitude/duration of events from their influence over total variance.

      In this paper our goal was to illustrate the ability of the extended currentscape method to reveal the origin of the axial currents driving neuronal activity. In Fig. 4, our primary intention was to characterize the membrane potential response of the model in a way that is easily comparable with experimental data. To further quantify the frequency of local events, we added a new panel showing the spatial extent of dendritic events (Fig. 4H). To make our model more comparable with recent publications, we also calculated two additional metrics used to evaluate the relationship between somatic and dendritic activity (Fig 4I-J). We hope that these additional analyses help the reader to characterize the prevalence and impact of local dendritic events on somatic activity.
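
      For readers unfamiliar with the encoding suggested in comment (5), the sketch below shows one way to turn event onset times into binary onset vectors, so that each event contributes a single 1 regardless of its amplitude or duration. It is an illustration only and was not part of the reported analysis: the function name, the 1 ms bin width, and the example onset times are hypothetical.

```python
import numpy as np


def onset_vector(onset_times_ms, duration_ms, dt_ms=1.0):
    """Return a binary vector with a 1 in each time bin that contains an onset."""
    n_bins = int(round(duration_ms / dt_ms))
    vec = np.zeros(n_bins, dtype=np.uint8)
    idx = np.floor(np.asarray(onset_times_ms, dtype=float) / dt_ms).astype(int)
    idx = idx[(idx >= 0) & (idx < n_bins)]   # drop onsets outside the window
    vec[idx] = 1
    return vec


# Hypothetical onset times (ms) for three event types in a 100 ms window.
events = {
    "local_dendritic_spike": [12.0, 40.5],
    "somatic_AP": [41.0],
    "CSB": [80.0],
}
onsets = np.stack([onset_vector(times, duration_ms=100.0) for times in events.values()])
print(onsets.shape, onsets.sum(axis=1))
```

      Each row of `onsets` could then be used in place of the raw voltage trace in a variance-based analysis, which removes the influence of event amplitude and duration on the result.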

      (6) From section "Input conditions for complex spike burst generation", paragraph 2: "Note that synapse density, the ion channel mechanisms and the input statistics are identical for tuft and oblique branches,...". The authors should justify this parameterization given the numerous known differences between tuft and oblique branches in both of these regards and acknowledge accompanying interpretational caveats.

      We agree with the reviewer that experimental data demonstrated several significant differences between the tuft and oblique branches regarding both the inputs they receive and the way they process them. However, in the present paper we chose not to include these differences for several reasons:

      Here we aimed to focus on the abilities of the dendritic currentscape methods and use CSBs as a case study to illustrate how dendritic currentscape can reveal the membrane currents underlying complex neuronal responses.

      Currently there is no CA1PN model that would be able to reproduce all data regarding tuft and oblique integration and would be able to fire calcium spikes. We only wanted to make minimal modifications to the existing CA1PN model to make it capable of generating Ca-spikes and CSBs. We are currently working towards developing and extensively testing a new model, examining the role of these regional differences in CSB generation.

      Although there is information regarding input statistics and dendritic physiology in the literature, many of the relevant parameters are underconstrained. We wanted to avoid overfitting by keeping the model simple.

      By maintaining identical inputs and ion channel distribution we can distinctly highlight the special role of tuft morphology in CSB generation. Altering the inputs or the ion channel density for the tuft would make the interpretation more ambiguous, and elucidating the specific role of the different factors in CSB generation is the subject of future investigations.

      In sum, although we acknowledge that our model does not reflect the full complexity of CA1 PNs and its inputs, we regard this simplicity as a useful feature of the model. We added a section discussing potential future extensions of the model and highlighting interpretational caveats in the discussion (lines 482-490).

      (7) Given the debate in the field regarding the level of functional autonomy present in dendrites, the authors' finding that dendritic voltage largely tracks that of the soma (though see concern above re: PCA), and their access to specific currents, the authors have an important opportunity to investigate the divergence between Ca2+ and voltage sensors as reporters of dendritic activity.

      For instance, why have some studies reported relatively common isolated dendritic Ca2+ transients in CA1 pyramidal neurons while other studies, including voltage imaging studies, have reported the opposite?

      We thank the Reviewer for the opportunity to highlight a few important points regarding functional autonomy of dendrites based on the analysis of our model. We would like to first note that only parallel calcium and voltage imaging studies will be able to ultimately resolve this debate. Nevertheless, below we briefly summarize our take on this issue.

      (1) In general, most Ca2+ imaging studies found that soma-independent dendritic events are rare. “Isolated dendritic transients (no coincident somatic event; see fig. S6, C and D, for example) were overall rare. Isolated apical dendritic Ca2+ transients, which have not previously been reported in CA1PNs, were larger and more frequent than those observed in basal dendrites.” (O’Hare et al., 2022). “Activity in the ... basal dendrites ... along the track but outside of the place field was rarely observed” (Sheffield and Dombeck, 2014) and “overall, isolated dendritic transients were similar in size but occurred far less frequently than coincident dendrite-soma transients”, or “data indicate that spatially reliable dendritic firing was almost exclusively yoked to somatic tuning, likely reflecting strong backpropagation of burst firing during traversals of the somatic PF” (Rolotti et al., 2022). Consistent with this observation, a dendritic Vm peak chosen randomly from any branch has a ~93% probability of being related to a bAP in our model. However, it is also true that ~90% of events in the model are local events, simply because isolated events involve ~60-times fewer branches (1.8 on average) than events associated with bAPs (114 branches) in the model. If the spatial extent of typical local events is similarly small in real neurons as in the model, then even rare occurrences of dendritic events may reveal substantial dendritic independence. We added a section quantifying the functional autonomy of dendrites in the model in the main text, around Fig 4H.

      (2) Ca2+ indicators are slower and nonlinear and thus they are somewhat unreliable reporters of dendritic voltage events, especially in distal dendrites (Wu et al., 2026; Gonzalez et al., 2026). To illustrate this, we calculated three metrics in our model that were also reported in recent dendritic Ca2+ imaging studies (Rolotti et al., 2022, Sheffield et al., 2014, 2017). First, we calculated the fraction of bAPs detected in a branch (called dendrite-soma coupling in Rolotti et al., 2022, see their Fig. 2C) as a function of the distance of the branch from the soma (our new Fig. 4I). In the Ca2+ imaging data, this was essentially constant at ~30% at distances of 5-100 µm from the soma. In contrast, the fraction of bAPs detected in the model was 100% in this range, as bAP propagation failures did not occur before 100 µm. This is also consistent with a recent voltage imaging study showing that even low-transmission bAPs reliably propagate to the proximal dendrites (Lee et al., 2026, Fig 3G). The low and distance-independent dendrite-soma coupling reported by Rolotti et al. can only be reconciled with the known biophysics of neurons if the recorded calcium signal is an unreliable reporter of the underlying voltage. Indeed, it has been reported that Ca signals associated with bAPs can be absent in some dendritic branches (Landau et al., 2022) or that local, nonlinear Ca signals can appear in the absence of a local regenerative voltage response (Weber et al., 2016, Tran-Van-Minh et al., 2016) and that the Ca signals are highly variable across cells (Eltes et al., 2019).

      Second, we calculated the fraction of local events as a function of the distance from the soma (our Fig 4J; see also Fig. 2F in Rolotti et al.). When averaged across all branches, this was somewhat lower in the model (18%) than in the data (38%) which, again, could be explained by the low reliability of detecting global voltage events in all compartments based on the calcium signal.

      Third, the range of branch-spike-prevalence (BSP) values in our model (0.5-0.9; Fig. 4H) at first seems consistent with the reported range (0.4-0.8; Fig 4C of Sheffield et al., 2014; Fig 2 of Sheffield et al., 2017). However, we note that there are several important differences: for technical reasons, Sheffield et al. reported BSP for place field traversals and not for individual events, and they measured Ca2+ dynamics in the basal dendrites. Since bAPs are almost always present in all basal dendrites in the model (basal BSP > 0.9 for all events with somatic spikes) and place field traversals were always accompanied by somatic APs, BSP for basal dendrites would be nearly 1 in the model. Thus, the lower BSP values reported by Sheffield et al. could be explained by the limited reliability of the Ca2+ indicators in reporting regenerative voltage events in neuronal processes.

      We briefly discussed these differences in the Discussion (lines 474-478).

      (3) Finally, to our knowledge, there are 3 relevant in vivo voltage imaging studies in CA1 PNs. Liao et al., 2024 found that in induced place cells the tuning of dendritic events (presumably local or back-propagating Na-spikes) was similar to the somatic tuning, which is consistent with our model, where dendritic activity and tuning are dominated by bAPs. However, they did not acquire simultaneous signals from the dendrites and the soma, so they could not study the independence of the dendritic events. Lee et al. (2026) found that only 10% of the dendritic events are not associated with a somatic spike, which is lower than the number of independent events in the model. However, the events they found were generated in the distal apical trunk (their Fig 3D) and they could not record from the most distal branches, where most of the isolated events were generated in our model. Gonzalez et al., 2026 measured voltage and calcium in selected locations within the dendritic tree, and could not reliably estimate the fraction of isolated events throughout the cell. (Gonzalez et al, 2024 measured voltage only in single spines and soma, but did not quantify independent dendritic events; Wong-Campos et al., 2023 measured dendritic integration and bAPs in L23 branches; Wu et al. 2026 recorded in CA2 neurons.)

      We added a paragraph in the discussion comparing the level of functional autonomy present in the model dendrites to recent Ca- and voltage-imaging studies (lines 467-474).

      Minor concerns:

      (1) Abstract:

      There is a need to explain what currentscape is - even at the cost of not invoking its name. To a reader not familiar with currentscape, the abstract is extremely difficult to understand.

      We reworded the title and the abstract to make them more accessible to readers not familiar with the term currentscape.

      (2) "Currentscape analysis of place field dynamics" section:

      It would be helpful to emphasize upfront that dendritic determinants of individual somatic APs versus CSBs will be discussed separately. Since somatic action potentials are discussed before CSBs, I found this section initially confusing as I attributed those findings to CSBs until reading the next paragraph.

      We added a sentence to clarify that we analysed subthreshold responses, APs and CSBs separately.

      (3) Bottom of p2 discussing mixed literature on what drives CSBs in CA1 PCs:

      Overall accurate and useful point, but an important nuance is glossed over which misportrays state of field. References ex vivo studies that fail to drive CSBs with somatic current injection and in vivo study successfully doing so. These aren't really conflicting results. In vivo current injection co-occurs with spontaneous synaptic input, which is high in CA1 and results in PCs that are significantly depolarized at rest relative to those in acute slices. Bittner 2017 ex vivo results are consistent with this: CSBs driven by Cs+-based internal solution to block K+ channels (partially, using strategy of purposefully high series resistance). Similar situation in vivo given that A-type K+ channels are inactivated by depol. Resulting increase in input resistance lowers input threshold to CSB. This is clarified in Results, p.5: "Under in vivo-like synaptic input conditions (see below and Methods), dendritic Ca2+-spikes could also be evoked by somatic current injection (Fig. S1E), as in Bittner et al. (2015).", which makes p. 2 feel especially awkward.

      We agree with the Reviewer that these are not necessarily conflicting results. We rephrased this section, emphasizing that the role of the different input pathways in the initiation of CSBs is not clear.

      (4) Abbreviating "pyramidal neuron" with PC is confusing:

      PC often means place cell. The authors could change this, such that PC refers to "pyramidal cell", or else use PN as an abbreviation. It is important to avoid confusion, especially because place cell dynamics feature prominently in the manuscript.

      Thanks for the suggestion. We replaced PC with PN throughout the manuscript.

      (5) Only apical dendritic parameters are described in section 2 of Results, but the full morphology is shown in Figure 3B with basal currents shown in panels C and F. Some clarification is needed - either what currents were considered for basal dendrites and why, or else why basal dendritic current parameters were not considered for this simulation using apical dendritic current injection but nonetheless examining basal dendritic currents.

      We clarified in the text that the original model contained a standard set of Na and K channels (line 178).

      (6) Clarify "i" and "s" in the Figure 3C legend - "intrinsic" and "synaptic" white letterings are small/hard to see in the bottom subpanels.

      We now spell out intrinsic and synaptic in the Figure and increased the contrast of the letterings.

      (7) Regarding the computational benefit of recursively decomposing axial currents along an adaptively truncated acyclic graph, it would be useful to (a) include a supplemental figure benchmarking this approach to standard approaches to quantify the described gain in computational efficiency and (b) describe computing hardware in the Methods.

      We included an estimated benefit of the pruning process (line 758) as well as the utilised computing hardware and the simulation times in the Methods (line 776).

      Reviewer #2 (Recommendations for the authors):

      The manuscript is in great shape, it is well organized, and the figures are gorgeous. I believe that the extended currentscape is a great extension of the original currentscape method. In particular, the possibility of partitioning currents by the spatial location of their sources is a great addition. 

      Recommendations:

      (1) The method is applied in the context of an interesting case study that highlights its usefulness. However, the model in the study is so complex that it is difficult to develop an intuition of how currents should be flowing, and this makes it hard to intuitively validate the visualization method. I think that applying the extended currentscape in a simpler model - maybe a soma with a dendrite with two branches, fewer compartments - would be instrumental in developing this intuition. 

      We now first use a simple model with a soma and a dendrite with two branches to introduce the concepts in Figure 2 and provide a few examples where the reader can compare the results with their own intuition in simpler cases. We also added the currentscape analysis of a standard, two-compartmental model from Pinsky and Rinzel, 1994 as Supplementary Figure 1.

      (2) I found a number of typos and minor stylistic details you may want to fix in a revised version of the manuscript.

      (a) Abstract, line 12. I believe the word "recursive" is a bit technical at this point. Its meaning in this context becomes clear after one goes through the details of the algorithm (Figure 2).

      We replaced the word “recursive” with “iterative”. We hope that this will make the abstract clearer for the readers. In fact, we realized that the word “iterative” is a better description of the algorithm, so we replaced “recursive” with “iterative” consistently throughout the manuscript.

      (b) Figure 1, caption. "Since we included the capacitive current, the magnitude of the inward and the outward currents is identical (Kirchhoff's law)." This sentence can be confusing. If the inward and outward currents are the same, the membrane potential doesn't change. I believe that you are including the capacitive current in the inward (or outward) currents.

      Indeed, we included the capacitive current in the inward or outward currents. We changed the text to clarify this.

      (c) Lines 92-93. I do not fully understand this sentence. Are you making an assumption? What does 'continuous flow of axial current' mean?

      By ‘continuous flow of axial current’ we meant a spatially continuous stream of axial currents flowing from the reference to the target. To clarify this, we added the explanatory sentence: “i.e., if the axial current is not blocked or reversed between the reference and the target.”

      (d) Equation (1.) Why summing axial currents over j? Is this for the case of a branching point?

      The compartment can be (1) part of a continuous segment of a dendritic branch, where axial currents can flow from the distal and the proximal direction (sum over 2); (2) a branch point with 3 axial currents; or (3) a leaf compartment with only one axial current, in which case the summation is not relevant. We clarified this in the text.
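
      The three cases can be illustrated with a short sketch in which the sum over j simply runs over however many neighbours a compartment has. The function names, voltages and axial resistances are hypothetical, and the ohmic form of the axial current is the standard compartmental-model assumption rather than a quotation of Equation (1).

```python
def axial_currents(v_target, neighbours):
    """Axial current (nA) entering the target compartment from each neighbour.

    neighbours is a list of (voltage_mV, axial_resistance_MOhm) pairs.
    Positive values flow into the target, negative values flow out of it.
    """
    return [(v_n - v_target) / r_ax for v_n, r_ax in neighbours]


def total_axial_current(v_target, neighbours):
    """Sum over j of the axial currents; j indexes the neighbours."""
    return sum(axial_currents(v_target, neighbours))


v = -65.0  # voltage of the target compartment (mV), illustrative value

leaf = [(-60.0, 10.0)]                                        # one neighbour
segment = [(-60.0, 10.0), (-70.0, 10.0)]                      # two neighbours
branch_point = [(-60.0, 10.0), (-70.0, 10.0), (-62.0, 5.0)]   # three neighbours

for name, nbrs in [("leaf", leaf), ("segment", segment), ("branch point", branch_point)]:
    print(name, len(nbrs), total_axial_current(v, nbrs))
```

      In the segment example, the current entering from one side equals the current leaving on the other, so the net axial current is zero even though each individual term is not.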

      (e) Figure 2, caption. Typo. "When the axial currents flows…" Should it be 'current'? - Figure 3, caption. Typo in (C) "Extended currentscape" 

      Corrected.

      (f) Figure 4. I cannot see the grey lines or the dotted lines mentioned in the caption. 

      We added an arrow highlighting the gray and the dotted lines in the figure.

      (g) Figure 5, caption. "Red boxes highlight regions analyzed in panels B-D." Because this is a spatially extended model, region may be confused with spatial location, but you are highlighting a temporal interval.

      We rephrased the caption, which now refers to temporal intervals.

      (h) Line 341. This is a numerical experiment, correct? 

      We clarified in the text that it was indeed a simulation experiment.

      (i) Line 349. Should it be 'distributions'? 

      Corrected.

      (j) Line 422. Typo. Missing space 'in vivousing'

      Corrected.

      (k) Line 537. "Preprocessing membrane…" I found this entire subsection a bit confusing and hard to read.

      We rephrased this subsection to clarify it and facilitate reading.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) A few of the claims made are not supported by the references provided. For instance, line 76 states MgIG has hepatoprotective properties and improved liver function, but the reference provided is in the context of myocardial fibrosis.

      Thank you for the correction. We have made the revision on page 4, line 74.

      (2) MgIG is clinically used for the treatment of liver inflammatory disease in China and Japan. In the first line of the abstract, the authors noted that MgIG is clinically approved for ALD. In which countries is MgIG approved for clinical utility in this space?

      Thank you for this important comment. MgIG has been recommended for the treatment of alcoholic liver disease (ALD) in Chinese clinical guidelines (2018). We have clarified this point in the manuscript (Page 5, Line 79-80).

      (3) Serum TGs are not an indicator of liver function. Alterations in serum TGs can occur despite changes in liver function.

      Thank you for this important comment. We fully agree that serum triglycerides (TGs) are not a direct indicator of liver function. ALT and AST are more appropriate markers for hepatocellular injury, whereas TG and TC primarily reflect systemic and hepatic lipid metabolism status. We have made the necessary revisions as suggested on page 12, lines 285-288.

      (4) There are discrepancies in the results section and the figure legends. For example, line 302 states Idi1 is upregulated in alcohol fed mice relative to the control group. The figure legend states that the comparison for Figure 2A is that of ALD+MgIG and ALD only.

      We thank the reviewer for the valuable suggestion. Accordingly, we have revised the legend for Figures 2A and 2B as suggested.

      (5) Oil Red O staining provided does not appear to be consistent with the quantification in Figure 1D. ORO is nonspecific and can be highly subjective. The representative image in Figure 1C appears to have a much greater than 30% ORO (+) area.

      Thank you for this insightful comment. We acknowledge that Oil Red O (ORO) staining can be influenced by background signal and may appear subjective in representative images. In our quantification, only well-defined lipid droplets with strong positive staining were included, while diffuse background staining (e.g., light reddish hue) was excluded. This may explain the apparent discrepancy between the representative image and the quantified ORO-positive area. To further strengthen the reliability of our findings, we additionally measured hepatic triglyceride (TG) and total cholesterol (TC) levels. These biochemical assays yielded results consistent with the ORO quantification, thereby supporting our conclusion regarding lipid accumulation. Please refer to page 12, lines 285-288. As requested, we have added the required information to Figure 1G.

      (6) The connection between Idi1 expression in response to EtOH/PA treatment in AML12 cells with viability and apoptosis isn't entirely clear. MgIG treatment completely reduces Idi1 expression in response to EtOH/PA, but only moderate changes, at best, are observed in viability and apoptosis. This suggests the primary mechanism related to MgIG treatment may not be via Idi1.

      Thank you very much. We agree that although MgIG almost completely reverses Idi1 expression induced by EtOH/PA, the improvements in cell viability and apoptosis are only moderate, suggesting a potential discrepancy between these observations. This may indicate that Idi1 functions as a permissive factor, rather than the sole mediator, in this pathological process. In other words, while modulation of Idi1 contributes to the protective effects of MgIG, additional pathways are likely involved in mediating its overall impact on hepatocyte viability and apoptosis. We have clarified this point in the revised manuscript (Page 12, Lines 325–335), stating that MgIG exerts its protective effects against ethanol-induced hepatocellular injury, at least in part, through the regulation of Idi1.

      (7) The nile red stained images also do not appear representative with its quantification. Several claims about more or less lipid accumulation across these studies are not supported by clear differences in nile red.

      Thank you. We acknowledge that Nile Red staining can be influenced by imaging conditions and may appear less distinct in representative images, which could affect visual interpretation. To minimize subjectivity, all images were analyzed using a consistent and standardized thresholding method across groups. We agree that the visual differences in Nile Red staining alone may not be sufficiently pronounced to fully support the quantitative conclusions. Therefore, to strengthen the reliability of our findings, we have included additional biochemical measurements, including serum TG and TC levels, as well as hepatic TG and TC content. These independent assays consistently support the observed changes in lipid accumulation. The corresponding data have been added to the revised manuscript (page 12, lines 285-288).

      (8) The authors make a comment that Hsd11b1 expression is quite low in AML12 cells. So why did the authors choose to knockdown Hsd11b1 in this model?

      Thank you for this important comment. Although the basal expression of Hsd11b1 in untreated AML-12 cells is relatively low, we observed that it is inducible upon EtOH/PA stimulation, indicating its functional relevance under stress conditions. Therefore, knockdown experiments were performed to assess its contribution to EtOH/PA-induced hepatocellular injury. We have clarified this point in the revised manuscript (page 15, lines 281-382).

      (9) Line 380 - the claim that MgIG weakens the interaction between HSD11b1 and SREBP2 cannot be made solely based on one Western blot.

      Thank you for this important comment. We agree that the conclusion that MgIG weakens the interaction between HSD11B1 and SREBP2 should not be based solely on a single co-IP/Western blot experiment. In the revised manuscript, we have therefore toned down this statement to more appropriately reflect the data. Specifically, we now describe this result as a preliminary observation suggesting a potential modulation of the interaction, rather than a definitive conclusion. Please refer to Page 15, line 391.

      (10) It's not clear what the numbers represent on top of the Western blots. Are these averages over the course of three independent experiments?

      Thank you for this helpful comment. We apologize for the lack of clarity in the original figure presentation. The numbers shown above the Western blot bands represent the densitometric quantification of protein expression normalized to GAPDH, calculated from three independent experiments. However, this information was not clearly specified in the original figure, which may have led to confusion. To address this concern, we have now revised the manuscript by explicitly clarifying the meaning of these values in the figure legends. In addition, we have added bar graphs showing the quantified results from three independent experiments for Figures S3A, S4D, S6B, and S8H to improve transparency and data presentation.

      (11) The claim in line 382 that knockdown of Hsd11b1 resulted in accumulation of pSREBP2 is not supported by the data provided in Figure 6D.

      Thank you for pointing out this issue. We sincerely apologize for the incorrect description in the original manuscript. This was a wording error. We have made the revision on page 15, lines 394-396.

      (12) None of the images provided in Figure 6E support the claims stated in the results. Activation of SREBP2 leads to nuclear translocation and subsequent induction of genes involved in cholesterol biosynthesis and uptake. Manipulation of Hsd11b1 via OE or KD does not show any nuclear localization with DAPI.

      Thank you for this important comment. We agree that the original description was not sufficiently clear, which may have led to misunderstanding of the results. To clarify, Figure 6E includes two experimental contexts. Under basal (physiological) conditions in AML-12 cells, manipulation of Hsd11b1 (overexpression or knockdown) does not significantly affect the subcellular distribution of SREBP2. However, under EtOH/PA-induced stress conditions, Hsd11b1 overexpression promotes both nuclear and cytoplasmic levels of SREBP2, whereas Hsd11b1 knockdown reduces SREBP2 expression in both compartments. We have made the revision on page 16, line 399.

      (13) The entire manuscript is focused on this axis of MgIG-Hsd11b1-Srebp2, but no Srebp2 transcriptional targets are ever measured.

      We sincerely appreciate this great suggestion. We have made the necessary revisions as suggested on page 12, lines 285-288, line 292 by adding the mRNA changes of Lcn2 and Ldlr, which are SREBP2 target genes. As requested, we have added the required information to Figures 1F and 1H.

      (14) Acc1 and Scd1 are Srebp1 targets, not Srebp2.

      Thank you for this important comment. We agree that Acc1 and Scd1 are well-established downstream target genes of SREBP1 rather than SREBP2. To better support our proposed SREBP2-related mechanism, we further examined canonical SREBP2 downstream target genes, including Lcn2 and Ldlr. The results are consistent with activation of SREBP2 signaling in our model. These data have now been included in the revised manuscript (Page 12, Lines 285–288 and 292; Figures 1F and 1H).

      (15) A major weakness of this manuscript is the lack of studies providing quantitative assessments of Srebp2 activation and true liver lipid measurements.

      Thank you for this important comment. We acknowledge the concern regarding the lack of direct quantitative assessment of SREBP2 activation in the original version of the manuscript. To address this limitation, we have strengthened the evidence supporting SREBP2 activation using multiple complementary approaches. Specifically, we assessed the expression of canonical SREBP2 downstream target genes (Page 12, Lines 285–288 and 292; Figures 1F and 1H), together with Western blot analysis (Figure 6D) and immunofluorescence staining (Figure 6F), which collectively support activation of SREBP2 signaling in the EtOH/PA-induced ALD model.

      In addition, to provide a more comprehensive evaluation of hepatic lipid accumulation, we measured serum TG and TC levels, as well as hepatic TG and TC content. These biochemical analyses further confirm the presence of significant lipid accumulation in our model. We have made the necessary revisions as suggested on page 12, lines 285-288 (Figure 1G).

      Reviewer #2 (Public review):

      (1) In Supplemental Figure 1A, all the treatment arms (A-control, MgIG-25 mg/kg, MgIG-50 mg/kg) showed body weight loss compared to the untreated controls. However, Figure 1E showed body weight gain in the treatment arms (A-control and MgIG-25 mg/kg), why? In Supplemental Figure 1A, the mice with MgIG (25 mg/kg) showed the lowest body weight, compared to either A-control or MgIG (50 mg/kg) treatment. Can the authors explain why MgIG (25 mg/kg) causes bodyweight loss more than MgIG (50 mg/kg)? What about the other parameters (ALT, ALS, NAS, etc.) for the mice with MgIG (50 mg/kg)?

      We agree that this observation does not strictly follow a dose-dependent pattern. In vivo responses to pharmacological interventions, particularly in metabolic and liver disease models, are not always linear. The relatively greater body weight reduction observed in the 25 mg/kg group may be influenced by inter-individual variability, differences in metabolic adaptation, or sample size–related variation. Importantly, these differences in body weight were not statistically significant. Therefore, we selected the 50 mg/kg dose for subsequent animal experiments, as it demonstrated more consistent and stable improvements across multiple parameters, including body weight, ALT, AST, TG, and TC.

      (2) IL-6 is a key pro-inflammatory cytokine significantly involved in ALD, acting as a marker of ALD severity. Can the authors explain why MgIG 1.0 mg/ml shows higher IL-6 gene expression than MgIG (0.1-0.5 mg/ml)? Same question for the mRNA levels of lipid metabolic enzymes Acc1 and Scd1.

      Thank you for this important comment. We agree that IL-6, as well as lipid metabolism–related genes such as Acc1 and Scd1, are key indicators in ALD. The relatively higher expression observed at 1.0 mg/mL MgIG compared to lower concentrations (0.1–0.5 mg/mL) may be related to experimental constraints associated with the MgIG formulation used in this study.

      Specifically, to maintain consistency with our in vivo experiments, we used a clinically available liquid formulation of MgIG (5 mg/mL), which is approved for intravenous administration in China. Due to its relatively low stock concentration, achieving higher working concentrations (e.g., 1.0 mg/mL) in vitro required a larger volume of the MgIG solution, thereby proportionally reducing the volume of culture medium. This reduction in effective culture conditions may adversely affect hepatocyte viability and function.

      Supporting this, our CCK-8 and LDH assays indicated that higher MgIG concentrations were associated with subtle cytotoxicity or impaired cell status.

      (3) For the qPCR results of Hsd11b1 knockdown (siRNA) and Hsd11b1 overexpression (plasmid) in AML-12 cells (Figure 5B), what is the description for the gene expression level (Y axis)? Fold changes versus GAPDH? Hsd11b1 overexpression showed non-efficiency (20-23, units on Y axis), even lower than the Hsd11b1 knockdown (above 50, units on Y axis). The authors need to explain this. For the plasmid-based Hsd11b1 overexpression, why does the scramble control inhibit Hsd11b1 gene expression (less than 2, units on the Y axis)? Again, this needs to be explained.

      Thank you for this important comment, and we apologize for the lack of clarity in the Y-axis labeling, which may have led to misunderstanding.

      As shown in Figures 5A and 5B, we have revised the Y-axis description to clearly indicate that gene expression levels are presented as relative expression normalized to GAPDH (fold change relative to the control group).
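
      For readers unfamiliar with this type of axis, "relative expression normalized to GAPDH (fold change relative to the control group)" is commonly obtained with the 2^-ΔΔCt method. The sketch below assumes that method and 100% amplification efficiency; the response does not state which calculation was used, and the function name and Ct values are hypothetical.

```python
def relative_expression(ct_target, ct_reference, ct_target_ctrl, ct_reference_ctrl):
    """Fold change of a target gene versus the control group (2^-ddCt).

    Assumes perfect amplification efficiency. The reference gene
    (here GAPDH) normalizes for differences in input amount.
    """
    delta_ct_sample = ct_target - ct_reference             # normalize sample to reference
    delta_ct_control = ct_target_ctrl - ct_reference_ctrl  # normalize control to reference
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: the target amplifies two cycles earlier than in
# the control, at equal GAPDH Ct, which corresponds to a four-fold increase.
print(relative_expression(24.0, 18.0, 26.0, 18.0))  # → 4.0
```

      On such an axis the control group itself has a value of 1, values above 1 indicate induction or overexpression, and values below 1 indicate knockdown.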

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Use terms that show directionality to help the readers comprehend the data. For instance, Line 295 states MgIG treatment also modulated the expression.... In reality, MgIG treatment reduced the expression of those genes relative to ethanol-fed control mice.

      Thank you very much for this precious suggestion. We have thoroughly revised this part as ‘In line with the observed histological and physiological improvements, MgIG treatment also reduced the expression of genes involved in lipid synthesis metabolism (Srebp1, Srebp2, Acc1, and Scd1, Lcn2, and Ldlr), inflammation (Tnf-α and Il-6), and pro-apoptosis (Bax) while restored the level of anti-apoptotic gene (Bcl2) in the liver tissue of EtOH mice (Fig. 1G-1H).’. Please refer to page 12, lines 290-294.

      (2) Oil Red O staining is subjective and nonspecific. The authors make a claim that serum TGs are an indicator of liver function; however, measurement of hepatic TGs would be a better measure here and more consistent with the ORO staining.

      We sincerely appreciate this great suggestion. We have made the necessary revisions as suggested on page 12, lines 285-288 as ‘Notably, significant differences were observed between the EtOH group and the MgIG-treated (EtOH+M) group in serum levels of liver enzymes (ALT and AST), serum lipid parameters (TG and TC), as well as liver TG and TC contents, key indicators of liver function and lipid metabolism.’ As requested, we have added the required information to Figure 1G.

      (3) The focus of the paper is on this SREBP2 axis. However, in Figure 1, the authors do not show any SREBP2 target genes. This would be helpful in interpreting SREBP2 activity. Further, hepatic free cholesterol levels would also strengthen these data.

      We sincerely appreciate this great suggestion. We have made the necessary revisions as suggested on page 12, lines 285-288, line 292 by adding the mRNA changes of Lcn2 and Ldlr, which are SREBP2 target genes. As requested, we have added the required information to Figures 1F and 1H.

      (4) Labels showing directionality on the volcano plots in Figures 2A, B would be of great help here. It's unclear which groups are on the left or right.

      Thank you very much! We have revised Figures 2A-C as requested. Please refer to the new version of Figures 2A-C.

      (5) Ensure consistency in what is written in the results and the figure legends. See Figure 2 volcano plots for examples. The volcano plot in Figure 2B has no figure legend.

      We thank the reviewer for the valuable suggestion. Accordingly, we have revised the legend for Figure 2B as suggested.

      (6) Ensure consistency in the nomenclature. In some cases, the authors use ALD+MgIG, and in others, they just use MgIG. My recommendation would be to use Ctrl, EtOH, EtOH+M.

      We sincerely appreciate this great suggestion. We have made the necessary revisions as suggested on page 6, lines 111-112, page 11, line 280 and page 12, lines 282, 284, 293, 298, 301.

      (7) The gene enrichment analysis in Figure 2C should also include some text about directionality, either in the figure or the figure legend. Upregulated DEGs in the MgIG group? It's unclear.

      We thank the reviewer for the valuable suggestion. Accordingly, we have revised the legend for Figure 2C as suggested.

      (8) The authors should consider shuffling the order of some of the figures for better transitions from one panel to the next. For instance, Figure 3B, C shows cell viability responses before showing the siRNA and OE are effective in knocking down and overexpressing their protein of interest.

      We thank the reviewer for the valuable suggestion. Accordingly, we have revised the legend for Figures 3B and 3C as suggested.

      (9) The authors need to be consistent in the colors that are used in the figures. It's incredibly hard to follow, as presented.

      We appreciate the reviewer's comment regarding color consistency. In response, we have carefully revised all figures to ensure consistent use of colors across the manuscript. The updated versions are shown in Figures 3, 6, and 7.

      (10) For Nile Red staining, multiple images at a lower objective need to be shown and/or cellular triglycerides and cholesterol levels should be quantified.

      We appreciate the reviewer's insightful comment regarding the Nile Red staining. In response, we have quantified triglyceride and total cholesterol levels in the cell supernatant, which are now presented on page 12, lines 285-287 and in Figure 2F. Furthermore, we have included additional Nile Red staining images at a lower objective in Supplementary Figures 2D, 3B, and 4C to better illustrate the lipid droplet distribution.

      (11) Line 362 refers to Figure 4 when it should refer to Figure 5.

      Thank you very much! We have corrected this on page 14, line 364.

      (12) qPCR should be performed on canonical Srebp2 targets throughout the manuscript to tie in the MgIG treatment with changes in sterol sensing and Srebp2.

      Thank you for your valuable suggestion. The results are now included on page 12, lines 292 and 311, and the corresponding data in Figures 1H and 2G have been enhanced accordingly.

      Reviewer #2 (Recommendations for the authors):

      (1) The statement, figure labeling, and figure legend for Figure 1A-C are confused. The MgIG dosing on the X-axis for Figure 2D is missing.

      Thank you for the correction. We have revised this problem. Please refer to the new version of Figure 1A-C and Figure 2D.

      (2) Figure 3E is not well described in the main text and figure legend. What are those numbers on top of the blotting bands? It was guessed that the numbers were the mean for each group. But where is the SD or SE for each group? It is hard to tell the statistical significance without showing SD or SE. The same question applies to Figure 5E, Figure 6C-6D, and Figure 7G.

      We sincerely appreciate this great suggestion. We have made the necessary revisions as suggested on page 13, lines 317-322. As suggested, we have added the required information to Figures S3A, S4D, S6B and S8H.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The researchers aimed to identify which neurotransmitter pathways are required for animals to withstand chronic oxidative stress. This work thus has important implications for disease processes that are caused/linked to oxidative stress. This work identified specific neurotransmitters and receptors that coordinate stress resilience, both prior to and during stress exposure. Further, the authors identified specific transcriptional programs coordinated by neurotransmission that may provide stress resistance.

      Strengths:

      The manuscript is very clearly written with a well-formulated rationale. Standard C. elegans genetic analysis and rescue experiments were performed to identify key regulators of the chronic oxidative stress response. These findings were enhanced by transcriptional profiling that identified differentially expressed genes that likely affect survival when animals are exposed to stress.

      Weaknesses:

      Where the gar-3 promoter drives expression was not discussed in the context of the rescue experiments in Fig 7.

      Comments on revisions:

      This issue has now been appropriately addressed in the revision.

      We thank the reviewer for their time and constructive feedback.

      Reviewer #2 (Public review):

      In this paper, Biswas et al. describe the role of acetylcholine (ACh) signaling in protection against chronic oxidative stress in C. elegans. They showed that disruption of ACh signaling in either unc-17 or gar-3 mutants led to sensitivity to toxicity caused by chronic paraquat (PQ) treatment. Using RNA-seq, they found that approximately 70% of the genes induced by chronic PQ exposure in wild type failed to upregulate in these mutants. The overexpression of gar-3 selectively in cholinergic neurons was sufficient to promote protection against chronic PQ exposure in an ACh-dependent manner. The study points to a previously undescribed role for ACh signaling in providing organism-wide protection from chronic oxidative stress, likely through the transcriptional regulation of numerous oxidative stress-response genes. The paper is well-written, and the data are robust, though some conclusions seem preliminary and are not fully supported by the current data (see below). While the study identifies the muscarinic ACh receptor gar-3 as an important regulator of the response to PQ, the specific neurons in which gar-3 functions were not unambiguously identified, and the sources of ACh that regulate GAR-3 signaling and the identities of the tissues targeted by gar-3 were not addressed.

      Comments on revisions:

      The authors addressed my comments adequately in their revised submission. Please include representative images to accompany the quantification of the new results presented in Fig S4A.

      We thank the reviewer for their time and constructive feedback. We now include representative images as requested.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      In the manuscript "Conformational Variability of HIV-1 Env Trimer and Viral Vulnerability", the authors study the fully glycosylated HIV-1 Env protein using an all-atom forcefield. It combines long all-atom simulations of Env in a realistic asymmetric bilayer with careful data analysis. This work clarifies how the CT domain modulates the overall conformation of the Env ectodomain and characterizes different MPER-TMD conformations. The authors also carefully analyze the accessibility of different antibodies to the Env protein.

      Strengths:

      This paper is state-of-the-art, given the scale of the system and the sophistication of the methods. The biological question is important, the methodology is rigorous, and the results will interest a broad audience.

      Weaknesses:

      The manuscript lacks a discussion of previous studies. The authors should consider addressing or comparing their work with the following points:

      (1) Tilting of the Env ectodomain has also been reported in previous experimental and theoretical work: https://doi.org/10.1101/2025.03.26.645577

      (2) A previous all-atom simulation study has characterized the conformational heterogeneity of the MPER-TMD domain: https://doi.org/10.1021/jacs.5c15421

      (3) Experimental studies have shown that MPER-directed antibodies recognize the prehairpin intermediate rather than the prefusion state: https://doi.org/10.1073/pnas.1807259115

      (4) How does the CT domain modulate the accessibility of these antibodies studied? The authors are in a strong position to compare their results with the following experimental study: https://doi.org/10.1126/science.aaa9804

      Based on the Reviewer’s comments and suggestions, we have added a discussion related to each previous study mentioned above.

      (1) Tilting of the Env ectodomain has also been reported in previous experimental and theoretical work: https://doi.org/10.1101/2025.03.26.645577

      At the end of the third paragraph (originally the second paragraph) in the Discussion section we added:

      “Shehata et al. also built a model of full-length gp120–gp41 trimer embedded in a lipid bilayer and performed all-atom simulations, in which a tilting motion of the ectodomain was observed. Based on the analysis of accessible surface area using different probe radii, they reported that antibody epitopes on the ectodomain are largely shielded by glycans, while the MPER epitope is mainly occluded by the membrane with tilt angles above 30° required to achieve greater MPER exposure (Shehata et al., 2025).”

      (2) A previous all-atom simulation study has characterized the conformational heterogeneity of the MPER-TMD domain: https://doi.org/10.1021/jacs.5c15421

      In the middle of the first paragraph in the Discussion section we added:

      “This is consistent with the all-atom simulations of MPER–TMD–CT and MPER–TMD in an asymmetric membrane conducted by Majumder et al., which likewise show multiple different conformational states of MPER and TMD (Majumder et al., 2025).”

      (3) Experimental studies have shown that MPER-directed antibodies recognize the prehairpin intermediate rather than the prefusion state: https://doi.org/10.1073/pnas.1807259115

      The paper mentioned by the Reviewer mainly reports the NMR structure of the MPER and TMD. In this study, the authors experimentally examined a series of MPER mutations to assess whether alterations in the MPER affect epitope accessibility in other regions of the Env ectodomain. This study did not investigate whether MPER-directed antibodies recognize the prehairpin intermediate. Instead, it cited prior studies (Frey et al., 2008; Alam et al., 2009; and Chen et al., 2014) reporting that MPER-directed antibodies target the prehairpin intermediate conformation. We had already cited two of them (Alam et al., 2009 and Chen et al., 2014) in the original preprint, and we have now added the third (Frey et al., 2008) in the revised manuscript.

      In the middle of the third paragraph (originally the second paragraph) in the Discussion section we added:

      “This is consistent with experimental studies indicating that MPER-targeting antibodies bind effectively only after the gp120–gp41 trimer undergoes major conformational rearrangements toward a fusion-intermediate or post-fusion state (Frey et al., 2008; Alam et al., 2009; Chen et al., 2014; Lee et al., 2016).”

      (4) How does the CT domain modulate the accessibility of these antibodies studied? The authors are in a strong position to compare their results with the following experimental study: https://doi.org/10.1126/science.aaa9804

      At the beginning of the second paragraph in the Discussion section we added:

      “Comparison of the full-length and CT-truncated systems shows that the primary difference arises from changes in the lipid bilayer, particularly in the exoplasmic leaflet, whereas differences in protein conformation and dynamics are less evident. Previous experimental studies have reported that mutations of the TMD residue and CT truncation can substantially affect antigenicity of ectodomain (Edwards et al., 2002; Chen et al., 2015; Dev et al., 2016). However, the ectodomain remains relatively rigid in our simulations for both full-length and CT-truncated systems. It is unclear whether this behavior reflects insufficient conformational sampling or artifacts associated with the model structures. Structural information for the CT is very limited, and the NMR structure (PDB ID: 7LOI) was the only available CT structure at the time the simulation systems were constructed. As a result, the extent to which this structure represents the native CT conformation remains uncertain. Additional experimental structural characterization of the CT will be important for achieving a more complete understanding of its functional role.”

      Reviewer #1 (Recommendations for the authors):

      A minor point: The RMSD values in Figure 3-figure supplement 1, seem a little too small. Please check the units.

      Figure 3-figure supplement 1 shows the RMSD of the ectodomain. Prior to RMSD calculation, the snapshots extracted from each trajectory were aligned to the initial structure using the ectodomain as the reference to avoid falsely high RMSD values arising from different orientations of the ectodomain. The relatively small RMSD values therefore reflect the intrinsic structural stability of the ectodomain, indicating that its internal conformation remains stable even though it undergoes substantial tilting motions.
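
      The effect of the alignment step described above can be illustrated with a minimal sketch. It assumes NumPy and a Kabsch superposition; the response does not state which software or fitting algorithm was used, and the function name `kabsch_rmsd` and the toy coordinates are hypothetical. A rigidly tilted and shifted copy of a structure gives a large RMSD before fitting and an essentially zero RMSD after fitting, which is why a tilting but internally rigid ectodomain can show small RMSD values.

```python
import numpy as np

def kabsch_rmsd(mobile, reference):
    """RMSD between two (N, 3) coordinate sets after optimal rigid-body superposition."""
    # Remove translation by centering both sets on their centroids.
    p = mobile - mobile.mean(axis=0)
    q = reference - reference.mean(axis=0)
    # Kabsch: the SVD of the covariance matrix yields the optimal rotation.
    u, _, vt = np.linalg.svd(p.T @ q)
    # Guard against an improper rotation (reflection).
    d = np.sign(np.linalg.det(u @ vt))
    rot = u @ np.diag([1.0, 1.0, d]) @ vt
    diff = p @ rot - q
    return float(np.sqrt((diff ** 2).sum() / len(q)))

# Toy coordinates (hypothetical, in Å) standing in for the ectodomain atoms.
rng = np.random.default_rng(0)
ref = rng.normal(scale=10.0, size=(50, 3))

# Rigid 40-degree tilt about the x axis plus a 5 Å shift: no internal change.
theta = np.deg2rad(40.0)
tilt = np.array([[1.0, 0.0, 0.0],
                 [0.0, np.cos(theta), -np.sin(theta)],
                 [0.0, np.sin(theta), np.cos(theta)]])
tilted = ref @ tilt.T + np.array([5.0, 0.0, 0.0])

raw_rmsd = float(np.sqrt(((tilted - ref) ** 2).sum() / len(ref)))  # no fitting
fit_rmsd = kabsch_rmsd(tilted, ref)                                 # after fitting
print(f"unaligned RMSD: {raw_rmsd:.2f} Å, aligned RMSD: {fit_rmsd:.2e} Å")
```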

      Reviewer #2 (Public review):

      Summary:

      In this work, the authors aim to elucidate how a viral surface protein behaves in a membrane environment and how its large-scale motions influence the exposure of antibody-binding sites. Using long-timescale, all-atom molecular dynamics simulations of a fully glycosylated, full-length protein embedded in a virus-like membrane, the study systematically examines the coupling between ectodomain motion, transmembrane orientation, membrane interactions, and epitope accessibility. By comparing multiple model variants that differ in cleavage state, initial transmembrane configuration, and presence of the cytoplasmic tail, the authors aim to identify general features of protein-membrane dynamics relevant to antibody recognition.

      Strengths:

      A major strength of this study is the scope and ambition of the simulations. The authors perform multiple microsecond-scale simulations of a highly complex, biologically realistic system that includes the full ectodomain, transmembrane region, cytoplasmic tail, glycans, and a heterogeneous membrane. Such simulations remain technically challenging, and the work represents a substantial computational and methodological effort.

      The analysis provides a clear and intuitive description of large-scale protein motions relative to the membrane, including ectodomain tilting and transmembrane orientation. The finding that the ectodomain explores a wide range of tilt angles while the transmembrane region remains more constrained, with limited correlation between the two, offers useful conceptual insight into how global motions may be accommodated without large rearrangements at the membrane anchor.

      Another strength is the explicit consideration of membrane and glycan steric effects on antibody accessibility. By evaluating multiple classes of antibodies targeting distinct regions of the protein, the study highlights how membrane proximity and glycan dynamics can differentially influence access to different epitopes. This comparative approach helps place the results in a broader immunological context and may be useful for readers interested in antibody recognition or vaccine design.

      Overall, the results are internally consistent across multiple simulations and model variants, and the conclusions are generally well aligned with the data presented.

      Weaknesses:

      The main limitations of the study relate to sampling and model dependence, which are inherent challenges for simulations of this size and complexity. Although the simulations are long by current standards, individual trajectories explore only portions of the available conformational space, and several conclusions rely on pooling data across a limited number of replicas. This makes it difficult to fully assess the robustness of some quantitative trends, particularly for rare events such as specific epitope accessibility states.

      In addition, several aspects of the model construction, including the treatment of missing regions, loop rebuilding, and initial configuration choices, are necessarily approximate. While these approaches are reasonable and well motivated, the extent to which some conclusions depend on these modeling choices is not always fully clear from the current presentation.

      Finally, the analysis of antibody accessibility is based on geometric and steric criteria, which provide a useful first-order approximation but do not capture potential conformational adaptations of antibodies or membrane remodeling during binding. As a result, the accessibility results should be interpreted primarily as model-based predictions rather than definitive statements about binding competence.

      Despite these limitations, the study provides a valuable and carefully executed contribution, and its datasets and analytical framework are likely to be useful to others interested in protein-membrane interactions and antibody recognition.

      Based on the Reviewer’s comments, we have revised the Discussion section to emphasize the limitations related to model construction and the analysis of antibody accessibility.

      In the middle of the second paragraph in the Discussion section we added:

      “Similar limitations apply to other modeled regions where structural information is incomplete, including missing loops in the ectodomain, the cleavage site and heptad repeat 2 where two PDB structures (IDs: 6B0N and 7LOI) were merged. These regions introduce additional uncertainty, and the extent to which they influence the interpretation of our results remains an open question.”

      In the middle of the third paragraph (originally the second paragraph) in the Discussion section we added:

      “In addition, this analysis is based on geometric and steric criteria without accounting for potential conformational adaptations of gp120–gp41, antibodies, or the membrane; therefore, the calculated frequency of antibody accessibility should be interpreted as an approximation rather than a definitive indicator of binding competence.”

      Reviewer #2 (Recommendations for the authors):

      (1) Lines 45-47: The phrase "A major breakthrough was the design of ..." may be confusing. The gp140 trimer refers to a naturally occurring form of the HIV envelope protein rather than a structure designed de novo. If this statement refers to the development of a specific experimental construct or model system, this should be clarified to avoid misunderstanding.

      We have revised the sentence to clarify that the statement refers to soluble gp140 trimer constructs developed to stabilize the prefusion Env ectodomain for structural and immunological studies.

      At the beginning of the second paragraph in the Introduction section, we have modified the following:

      “A major advance was the development of soluble gp140 trimers, comprising gp120 and the ectodomain portion of gp41, designed to stabilize the prefusion Env trimer for structural and immunological characterization.”

      (2) Figure 1A: The figure displays a model structure lacking the cytoplasmic tail. Given that the full-length model is central to the study, the authors may wish to explain why the truncated structure is shown here or consider displaying the full-length model to better reflect the complete system analyzed.

      We have combined Figure 1 and Figure 1—figure supplements 1 to show both full-length and CT-truncated models in one figure. We have also added an explanation of why the CT-truncated model was used as the primary system for analysis.

      In the middle of the third paragraph in the Introduction section we added:

      “However, structural information for the CT remains limited, leading to uncertainty in its conformational organization. To reduce potential bias arising from this uncertainty, we also generated a CT-truncated model and used it as the primary system for analysis (Figure 1, Figure 1—figure supplements 1).”

      We have modified Figure 1.

      We have removed Figure 1—figure supplement 1.

      (3) Line 106: The probability distributions of θEC and θTM are cited in support of the statement that the angles "typically range from ... with occasional tilting." Providing explicit quantitative measures (for example, means, percentiles, or fractions of time spent in different angular regimes) would strengthen this claim.

      We have revised the text to explicitly indicate that only 0.7‰ of the sampled θ<sub>EC</sub> values are greater than 40°.

      In the middle of the first paragraph in the subsection “The ectodomain maintains a rigid internal structure and tilts independently of the TMD” we have modified the following:

      “Across trajectories, θ<sub>EC</sub> typically ranges from 0° to 40°, with only 0.7‰ exceeding 40°”.
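
      For readers less used to the per-mille notation, the reported quantity can be sketched as a simple fraction of sampled frames. The function name and the toy sample below are hypothetical and are not the authors' data; the sketch only shows how a value such as 0.7‰ relates to frame counts.

```python
def per_mille_above(angles_deg, threshold_deg=40.0):
    """Fraction of samples strictly above the threshold, in parts per thousand."""
    if not angles_deg:
        raise ValueError("no samples")
    n_above = sum(1 for a in angles_deg if a > threshold_deg)
    return 1000.0 * n_above / len(angles_deg)

# Hypothetical toy sample: 10,000 snapshots, 7 of which exceed 40 degrees.
toy_angles = [20.0] * 9993 + [45.0] * 7
print(per_mille_above(toy_angles))  # prints 0.7, i.e. 0.7 per mille = 0.07%
```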

      (4) Figure 2: The meaning of the contour lines is not clearly explained. If these represent probability density estimates of angular values over the trajectory, this should be stated explicitly. In addition, because the angles may evolve over time, it would be helpful to clarify how temporal drift is accounted for in the contour representation.

      We have clarified in both the main text and the figure caption that the contour lines in Figure 2B represent the joint probability density of the ectodomain and TMD tilt angles. We have also added Figure 2—figure supplements 5–8 showing the temporal evolution of the ectodomain and TMD tilt angles.

      In the middle of the first paragraph in the subsection “The ectodomain maintains a rigid internal structure and tilts independently of the TMD” we have modified the following:

      “The temporal evolution of θ<sub>EC</sub> and θ<sub>TM</sub> is additionally shown in Figure 2—figure supplements 5–8. For the CT-truncated systems, the joint probability densities of θ<sub>EC</sub> and θ<sub>TM</sub> calculated from the final 0.5 µs of each trajectory are shown in Figure 2B, while those for the full-length systems are shown in Figure 2—figure supplement 9.”

      In the caption of Figure 2 we have modified the following:

      “(B) Probability densities of ectodomain and TMD tilt angles, calculated from CT-truncated systems with various initial configurations.”

      We have added Figure 2—figure supplements 5–8.

      We have modified the following:

      “The original Figure 2—figure supplements 5 has been renumbered as Figure 2—figure supplements 9.”
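
      One way such a joint probability density could be computed from the final 0.5 µs of a trajectory is sketched below. This is an illustration under stated assumptions, not the authors' analysis code: the response does not say whether the contours come from a binned histogram or a kernel density estimate, and the bin width, angular range, frame spacing, and synthetic angles are all hypothetical.

```python
import numpy as np

def joint_density(theta_ec, theta_tm, times_us, t_start_us=0.5,
                  bin_deg=5.0, max_deg=60.0):
    """Binned joint density of (theta_EC, theta_TM) for frames at or after t_start_us."""
    theta_ec = np.asarray(theta_ec, dtype=float)
    theta_tm = np.asarray(theta_tm, dtype=float)
    # Keep only the late part of the trajectory (assumed here: the last 0.5 µs).
    keep = np.asarray(times_us, dtype=float) >= t_start_us
    edges = np.arange(0.0, max_deg + bin_deg, bin_deg)
    # density=True normalizes so that the density integrates to 1 over the bins.
    density, _, _ = np.histogram2d(theta_ec[keep], theta_tm[keep],
                                   bins=[edges, edges], density=True)
    return density, edges

# Synthetic 1-µs trajectory with one frame per ns (hypothetical values).
rng = np.random.default_rng(1)
times = np.linspace(0.0, 1.0, 1001)
ec = rng.uniform(0.0, 40.0, size=times.size)
tm = rng.uniform(10.0, 50.0, size=times.size)

density, edges = joint_density(ec, tm, times)
bin_area = (edges[1] - edges[0]) ** 2
print(density.shape, float(density.sum() * bin_area))
```

      Contour lines would then be drawn at chosen levels of `density`; restricting the input to the late frames is what keeps early drift out of the plotted distribution.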

      (5) Figure 2 (supplements): Some datasets are shown using scatter plots, while others are presented as contour plots. Using a consistent visualization style across panels or clearly explaining the rationale for the different representations would improve clarity.

      The contour plots in Figure 2B and Figure 2—figure supplements 9 show the joint distribution of the ectodomain and TMD tilt angles during the final 0.5 µs of each trajectory, whereas the scatter plots in Figure 2—figure supplements 1–4 illustrate the variations of the tilt angles across different time intervals. Each 1-µs trajectory was divided into four 0.25-µs intervals, indicated by light gray, dark gray, black, and red respectively, as shown in the legends of Figure 2—figure supplements 1–4. We have clarified in the main text that the multi-colored scatter plots are intended to demonstrate that large conformational changes predominantly occurred during the first 0.5 µs of each trajectory.

      In the middle of the first paragraph in the subsection “The ectodomain maintains a rigid internal structure and tilts independently of the TMD” we have modified the following:

      “Each 1-µs trajectory is divided into four consecutive 0.25-µs intervals, and data points from each interval are distinguished by four different colors (Figure 2—figure supplements 1–4). The variations of θ<sub>EC</sub> and θ<sub>TM</sub> over time show that large conformational changes predominantly occurred during the first 0.5 µs, followed by convergence of the θ<sub>EC</sub> and θ<sub>TM</sub> distributions during the second 0.5 µs in most trajectories.”
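
      The interval-based view described above can be sketched as follows. The code is a minimal illustration with hypothetical names and a synthetic tilt-angle series, and it summarizes each window by its mean, which is an assumption; the response itself compares the scatter of points per interval rather than a single statistic.

```python
from statistics import mean

def interval_means(times_us, values, n_intervals=4, total_us=1.0):
    """Mean of `values` within each of n consecutive, equal-length time windows."""
    width = total_us / n_intervals
    buckets = [[] for _ in range(n_intervals)]
    for t, v in zip(times_us, values):
        # Clamp so that a frame at t == total_us falls in the last window.
        idx = min(int(t / width), n_intervals - 1)
        buckets[idx].append(v)
    return [mean(b) for b in buckets]

# Synthetic series (hypothetical): the angle drifts during the first 0.5 µs,
# then stays at a plateau of 30 degrees during the second 0.5 µs.
times = [i / 1000 for i in range(1000)]
angles = [30.0 * min(t / 0.5, 1.0) for t in times]

means = interval_means(times, angles)
print([round(m, 2) for m in means])
```

      Similar means (or, more generally, similar distributions) in the third and fourth windows, after changing values in the first two, are the pattern the response describes as convergence during the second 0.5 µs.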

      (6) As noted in Line 97, θEC and θTM tilt independently. In this context, presenting time series plots of θEC and θTM separately would be highly informative. Such plots would help readers distinguish between equilibration behavior, drift from initial conditions, and equilibrium fluctuations.

      We have added Figure 2—figure supplements 5–8 showing the temporal evolution of the ectodomain and TMD tilt angles, as noted in our response to comment (4).

      (7) Figure 3A: It is not immediately clear which panels correspond to top views and which correspond to side views. Explicitly labeling these views in the figure or caption would reduce ambiguity.

      We have added labels in Figure 3A to clearly denote the top-view and side-view panels.

      (8) Figure 3B: The description "...by solid and transparent colors..." is ambiguous, as it is unclear whether this refers to color intensity or transparency. The caption would benefit from explicitly stating the visual encoding used (for example, darker/lighter colors or left/right bars).

      We have revised the figure caption to clarify which boxes correspond to cleaved systems and which correspond to uncleaved systems.

      In the caption of Figure 3 we have modified the following:

      “For each residue, the distribution from cleaved systems is shown in dark color (left), and that from uncleaved systems is shown in light color (right).”

      (9) Figure 4H: The definition of "frequency" expressed as a percentage is unclear. If this represents the fraction of snapshots in which two atoms fall within a specified distance range, this should be stated explicitly. The authors should also clarify whether the reported quantity is a probability or a rate, and ensure that the units and terminology are consistent.

      We have revised the figure caption to clarify that the frequency represents the fraction of snapshots in which the heavy atoms of a TMD residue and the interacting component are within 5 Å.

      In the caption of Figure 4 we have modified the following:

      “For each TMD residue–interacting component pair, the frequency represents the fraction of snapshots in which the heavy atoms of the TMD residue and the corresponding component are within 5 Å. Bar shading reflects this fraction, with fully filled bars indicating 100% and empty bars indicating 0%.”
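
      The contact frequency defined in the caption can be sketched in a few lines. This is an illustration only: it assumes that "within 5 Å" means that at least one heavy-atom pair is at a distance of 5 Å or less, and the function name and toy coordinates are hypothetical rather than taken from the authors' analysis.

```python
from math import dist

def contact_frequency(residue_frames, partner_frames, cutoff=5.0):
    """Fraction of snapshots in which any residue heavy atom lies within
    `cutoff` Å of any heavy atom of the interacting component."""
    n_contact = 0
    for res_atoms, partner_atoms in zip(residue_frames, partner_frames):
        if any(dist(a, b) <= cutoff for a in res_atoms for b in partner_atoms):
            n_contact += 1
    return n_contact / len(residue_frames)

# Toy example (hypothetical): one residue atom at the origin in four snapshots,
# with a single partner atom at a different position in each snapshot.
residue = [[(0.0, 0.0, 0.0)]] * 4
partner = [[(3.0, 0.0, 0.0)],
           [(4.0, 0.0, 0.0)],
           [(10.0, 0.0, 0.0)],
           [(0.0, 4.9, 0.0)]]

freq = contact_frequency(residue, partner)
print(freq)  # prints 0.75: contact in 3 of 4 snapshots
```

      The result is a dimensionless fraction of snapshots (a probability estimate), not a rate, which matches the clarification requested by the reviewer.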

      (10) Line 170: The manuscript describes a "rapid rearrangement" of the transmembrane domain at early simulation times. It would be helpful to clarify whether this regime is considered equilibration and whether it is excluded from subsequent analyses. Plotting time series of the relevant tilting angles and transmembrane rearrangement metrics could help address this point.

      We have clarified that the TMD underwent conformational changes early in the equilibration stage to enable R696 to interact with lipid headgroups, ions, or CT residues, and these interactions were largely maintained throughout the production stage. The time series of TMD tilting angles are now shown in Figure 2—figure supplements 5–8. Notably, the TMD exhibits heterogeneous conformational changes, including tilting, bending, and partial loss of helical structure. Therefore, no single metric or limited set of metrics can comprehensively capture the full extent of TMD conformational variability.

      In the middle of the first paragraph in the subsection “The energetically unfavorable R696 in the hydrophobic core results in asymmetric, kinked TMD conformations and disrupts membrane integrity” we have modified the following:

      “Early in the equilibration stage, the TMD rapidly rearranged to allow R696 residues to interact with more favorable partners, including negatively charged lipid headgroups from either leaflet, ions and water molecules diffusing into the bilayer center, as well as polar and positively charged groups in the CT when present. Once the interactions between R696 residues and their binding partners (lipid headgroup, ions or CT residues) were established, they remained stable with minimal changes throughout the production stage.”

      (11) Line 213: As with earlier sections, time series plots of θEC and θTM, similar to those shown in Figure 3-figure supplement 1, would greatly aid interpretation by showing whether these angles drift or fluctuate around stable values.

      The time series of θ<sub>EC</sub> and θ<sub>TM</sub> are now shown in Figure 2—figure supplements 5–8. Line 213 refers to the conformational variability of the MPER. For the same reason discussed in our response to comment (10), the MPER exhibits even greater conformational heterogeneity than the TMD, and therefore cannot be adequately described by a single or small set of geometric metrics such as tilt or bending angles.

      (12) Lines 216-222: The term "trajectories" may be misleading in this context. It is unclear whether the differences discussed arise from different trajectories of the same system or from different systems altogether. Clarifying this distinction would improve interpretability.

      In this paragraph, we describe MPER conformational variations observed across all trajectories from all systems. A preceding sentence has been modified to emphasize that all trajectories from all systems are included. In addition, we have clarified which specific trajectory is referred to when discussing each example.

      At the beginning of the first paragraph in the subsection “MPER adopts diverse conformations, and its exposure depends on both MPER and TMD conformations” we have modified the following:

      “…, and a wide variety of conformations were sampled across all trajectories from all systems.”

      “Such conformation and orientation were maintained in some trajectories such as CL<sup>ΔCT</sup>3 (the third trajectory of the cleaved, CT-truncated system with the low TMD position, Figure 4—figure supplement 2C). In other trajectories, such as CL<sup>CT</sup>1, the helix-turn-helix MPER in one protomer shifted into a horizontal orientation parallel to the membrane surface (Figure 4—figure supplement 6A). In UL<sup>ΔCT</sup>1, the entire MPER adopted a more vertical arrangement, with both MPER-N and MPER-C tilted outward (Figure 4E, Figure 4—figure supplement 4A). We also observed in UH<sup>ΔCT</sup>3 and UL<sup>ΔCT</sup>3 that the HR2 helix in the ectodomain, MPER, and TMD merged into a continuous long helix (Figure 4C, F, Figure 4—figure supplement 3C, 4C). In addition, loss of helical structure within the MPER was common, particularly in the MPER-C region, which often transitioned to a random coil.”

      (13) Lines 280 and 287: Similar concerns apply to the use of the term "trajectories." If observations differ primarily between systems rather than between trajectories within a system, revising the wording accordingly would avoid confusion.

      We have revised the text to clarify that all trajectories from all systems are considered collectively.

      In the middle of the second paragraph in the subsection “Ectodomain epitopes are conditionally accessible, whereas MPER epitopes are virtually inaccessible in the closed prefusion state” we have modified the following:

      “When considering all trajectories from all systems collectively, approximately half of them exhibited at least one protomer with >35% accessibility (Supplementary file 1–Supplementary Table 2).”

      (14) Figure 5B: Providing a time series of the distance dF673, at least in the Supporting Information, would help assess sampling and equilibration. Such plots would complement the probability distributions and increase confidence in the reported trends.

      We have added Figure 5—figure supplement 1 showing the time series of the distance d<sub>F673</sub> to complement the probability distribution in Figure 5B.

      In the middle of the second paragraph in the subsection “MPER adopts diverse conformations, and its exposure depends on both MPER and TMD conformations”, we have modified the following:

      “In the initial ‘low’ and ‘high’ TMD configurations, d<sub>F673</sub> was 6.1 Å and 9.1 Å, respectively, but across simulations it spanned a wide range from -15 Å to 20 Å (Figure 5A, B, Figure 5—figure supplement 1).”

      We have added Figure 5—figure supplement 1.

      Reviewer #3 (Public review):

      Summary:

      This study uses large-scale all-atom molecular dynamics simulations to examine the conformational plasticity of the HIV-1 envelope glycoprotein (Env) in a membrane context, with particular emphasis on how the transmembrane domain (TMD), cytoplasmic tail (CT), and membrane environment influence ectodomain orientation and antibody epitope exposure. By comparing Env constructs with and without the CT, explicitly modeling glycosylation, and embedding Env in an asymmetric lipid bilayer, the authors aim to provide an integrated view of how membrane-proximal regions and lipid interactions shape Env antigenicity, including epitopes targeted by MPER-directed antibodies.

      Strengths:

      A key strength of this work is the scope and realism of the simulation systems. The authors construct a very large, nearly complete Env-scale model that includes a glycosylated Env trimer embedded in an asymmetric bilayer, enabling analysis of membrane-protein interactions that are difficult to capture experimentally. The inclusion of specific glycans at reported sites, and the focus on constructs with and without the CT, are well motivated by existing biological and structural data.

      The simulations reveal substantial tilting motions of the ectodomain relative to the membrane, with angles spanning roughly 0-30° (and up to ~50° in some analyses), while the ectodomain itself remains relatively rigid. This framing, that much of Env's conformational variability arises from rigid-body tilting rather than large internal rearrangements, is an important conceptual contribution. The authors also provide interesting observations regarding asymmetric bilayer deformations, including localized thinning and altered lipid headgroup interactions near the TMD and CT, which suggest a reciprocal coupling between Env and the surrounding membrane.

      The analysis of antibody-relevant epitopes across the prefusion state, including the V1/V2 and V3 loops, the CD4 binding site, and the MPER, is another strength. The study makes effective use of existing experimental knowledge in this context, for example, by focusing on specific glycans known to occlude antibody binding, to motivate and interpret the simulations.

      Weaknesses:

      While the simulations are technically impressive, the manuscript would benefit from more explicit cross-validation against prior experimental and computational work throughout the Results and Discussion, and better framing in the introduction. Many of the reported behaviors, such as ectodomain tilting, TMD kinking, lipid interactions at helix boundaries, and aspects of membrane deformation, have been described previously in a range of MD studies of HIV Env and related constructs (e.g., PMC2730987, PMC2980712, PMC4254001, PMC4040535, PMC6035291, PMC12665260, PMID: 33882664, PMC11975376). Clearly situating the present results relative to these studies would strengthen the paper by clarifying where the simulations reproduce established behavior and where they extend it to more complete or realistic systems.

      A related limitation is that the work remains largely descriptive with respect to conformational coupling. Numerous experimental studies have demonstrated functional and conformational coupling between the TMD, CT, and the antigenic surface, with effects on Env stability, infectivity, and antibody binding (e.g., PMC4701381, PMC4304640, PMC5085267). In this context, the statement that ectodomain and TMD tilting motions are independent is a strong conclusion that is not fully supported by the analyses presented, particularly given the authors' acknowledgment that multiple independent simulations are required to adequately sample conformational space. More direct analyses of coupling, rather than correlations inferred from individual trajectories, would help align the simulations with the existing experimental literature. Given the scale of these simulations, a more thorough analysis of coupling could be this paper's most seminal contribution to the field.

      The choice of membrane composition also warrants deeper discussion. The manuscript states that it relies on a plasma membrane model derived from a prior simulation-based study, which itself is based on host plasma membrane (PMID: 35167752), but experimental analyses have shown that HIV virions differ substantially from host plasma membranes (e.g., PMC46679, PMC1413831, PMC10663554, PMC5039752, PMC6881329). In particular, virions are depleted in PC, PE, and PI, and enriched in phosphatidylserine, sphingomyelins, and cholesterol. These differences are likely to influence bilayer thickness, rigidity, and lipid-protein interactions and, therefore, may affect the generality of the conclusions regarding Env dynamics and antigenicity. Notably, the citation provided for membrane composition is a laboratory self-citation, a secondary source, rather than a primary experimental study on plasma membrane composition.

      Finally, there are pervasive issues with citation and methodological clarity. Several structural models are referred to only by PDB ID without citation, and in at least one case, a structure described as cryo-EM is in fact an NMR-derived model. Statements regarding residue flexibility, missing regions in structures, and comparisons to prior dynamics studies are often presented without appropriate references. The Methods section also lacks sufficient detail for a system of this size and complexity, limiting readers' ability to assess robustness or reproducibility.

      With stronger integration of prior experimental and computational literature, this work has the potential to serve as a valuable reference for how Env behaves in a realistic, glycosylated, membrane-embedded context. The simulation framework itself is well-suited for future studies incorporating mutations, strain variation, antibodies, inhibitors, or receptor and co-receptor engagement. In its current form, the primary contribution of the study is to consolidate and extend existing observations within a single, large-scale model, providing a useful platform for future mechanistic investigations.

      Following the Reviewer’s comments and suggestions, we have revised the manuscript accordingly.

      While the simulations are technically impressive, the manuscript would benefit from more explicit cross-validation against prior experimental and computational work throughout the Results and Discussion, and better framing in the introduction. Many of the reported behaviors, such as ectodomain tilting, TMD kinking, lipid interactions at helix boundaries, and aspects of membrane deformation, have been described previously in a range of MD studies of HIV Env and related constructs (e.g., PMC2730987, PMC2980712, PMC4254001, PMC4040535, PMC6035291, PMC12665260, PMID: 33882664, PMC11975376). Clearly situating the present results relative to these studies would strengthen the paper by clarifying where the simulations reproduce established behavior and where they extend it to more complete or realistic systems.

      We have added a summary of the prior computational studies in the Introduction section.

      At the beginning of the third paragraph in the Introduction section we added:

      “Molecular dynamics (MD) simulations have been employed to investigate the stability and conformational properties of monomeric and trimeric helical TMD in both aqueous and lipid bilayer environments since the late 2000s (Kim et al., 2009; Gangupomu et al., 2010; Baker et al., 2014; Baker et al., 2014; Hollingsworth et al., 2018). Early studies were constrained by limited computational resources, and their simulation times were therefore relatively short. Subsequent work employed metadynamics to probe rare events (Gangupomu et al., 2010; Baker et al., 2014), and simulations performed on Anton supercomputers extended sampling to multi-microsecond time scales (Baker et al., 2014). Piai and coworkers determined the NMR structure of a construct comprising the MPER, TMD, and CT, and carried out MD simulations to assess the structural stability of the trimeric MPER–TMD–CT complex (Piai et al., 2021). Majumder et al. subsequently simulated the same MPER–TMD–CT complex and applied a machine learning-based approach to classify its conformational ensemble (Majumder et al., 2025). Maillie et al. combined conventional MD, steered MD, and coarse-grained simulations to examine interactions between MPER-targeting antibodies and membrane lipids (Maillie et al., 2025). In addition, MD simulations have been extensively applied to the well-studied ectodomain. Despite these advances, it remains challenging to investigate the gp120–gp41 trimer as an intact entity considering its structural complexity.”

      We have also added a discussion of previous MD simulation studies to the Results section regarding interactions of the TMD residue R696 with ions and lipid headgroups.

      At the end of the first paragraph in the subsection “The energetically unfavorable R696 in the hydrophobic core results in asymmetric, kinked TMD conformations and disrupts membrane integrity”

      “Previously, Kim et al. reported that the inter-chain interactions between protonated R696 gradually diminished over a short simulation time (23 ns), leading to increased crossing angles and reduced bundle length (Kim et al., 2009). Gangupomu et al. and Baker et al. observed that R696 snorkeled toward either exoplasmic or endoplasmic headgroups in simulations of the TMD monomer, resulting in TMD tilting and membrane thinning due to water penetration and lipid headgroups interacting with R696 (Gangupomu et al., 2010; Baker et al., 2014; Baker et al., 2014). These observations are consistent with our findings. Hollingsworth et al. also reported membrane thinning; however, they attributed this effect to interfacial interactions of R683 and R707 with both leaflets and proposed that R696 only interacted with water and ions permeating into the center of the TMD trimer (Hollingsworth et al., 2018).”

      A related limitation is that the work remains largely descriptive with respect to conformational coupling. Numerous experimental studies have demonstrated functional and conformational coupling between the TMD, CT, and the antigenic surface, with effects on Env stability, infectivity, and antibody binding (e.g., PMC4701381, PMC4304640, PMC5085267). In this context, the statement that ectodomain and TMD tilting motions are independent is a strong conclusion that is not fully supported by the analyses presented, particularly given the authors' acknowledgment that multiple independent simulations are required to adequately sample conformational space. More direct analyses of coupling, rather than correlations inferred from individual trajectories, would help align the simulations with the existing experimental literature. Given the scale of these simulations, a more thorough analysis of coupling could be this paper's most seminal contribution to the field.

      We have added a discussion of the coupling between TMD, CT and Env antigenicity, and the independent motion of ectodomain and TMD in our simulation.

      In the middle of the second paragraph in the Discussion section

      “Our analysis of the ectodomain and TMD coupling indicates that the motions of these two domains are largely independent. This observation does not contradict experimental studies demonstrating functional coupling between the TMD, CT, and the antigenic profiles of Env (Chen et al., 2015; Dev et al., 2016). Munro et al. proposed that unliganded Env is intrinsically dynamic, transitioning among three distinct prefusion conformations: a closed ground state (predominant), a transient state, and a CD4-/co-receptor-stabilized state. Both laboratory-adapted and clinically isolated strains can spontaneously transition among these three states, although their relative occupancies differ (Munro et al., 2014). It is therefore possible that TMD mutations or CT truncation also alter the equilibrium distribution among the three states, thereby affecting epitope exposure, particularly for epitopes that are occluded in the closed ground state but exposed in the CD4-/co-receptor-stabilized state. However, transitions among the three states occur on millisecond-to-second timescales. Our simulations on microsecond timescales primarily capture conformational variations within the closed ground state and suggest that the MPER acts as a hinge, providing substantial flexibility that enables the ectodomain and TMD to move independently while Env remains in the closed ground state.”

      We have also calculated the dynamical cross-correlation maps showing very weak correlations between the ectodomain and the TMD.

      At the end of the first paragraph in the subsection “The ectodomain maintains a rigid internal structure and tilts independently of the TMD”

      “We also calculated the dynamical cross-correlation maps (Ichiye et al., 1991) of Cα atoms for all systems using CPPTRAJ (Roe et al., 2013). The results indicate only very weak correlations between the ectodomain and the TMD (Figure 2—figure supplements 10–13).”

      We have added Figure 2—figure supplements 10–13.
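
      For readers unfamiliar with the quantity, a dynamical cross-correlation map reports, for each pair of atoms i and j, the normalized covariance of their displacements, C_ij = <Δr_i · Δr_j> / sqrt(<Δr_i²> <Δr_j²>), with values near 0 indicating uncorrelated motion. The following is a minimal NumPy sketch of that definition only; it is not the CPPTRAJ implementation used in the study. The function name `dccm` and the synthetic trajectory are hypothetical, and the sketch assumes the frames have already been aligned to a common reference.

```python
import numpy as np

def dccm(coords):
    """Dynamical cross-correlation map for pre-aligned coordinates.

    coords: array of shape (n_frames, n_atoms, 3).
    Returns an (n_atoms, n_atoms) matrix with entries in [-1, 1].
    Assumes every atom fluctuates (non-zero variance).
    """
    coords = np.asarray(coords, dtype=float)
    # Displacement of each atom from its time-averaged position
    disp = coords - coords.mean(axis=0)
    # Frame-averaged dot product <dr_i . dr_j>
    cov = np.einsum("tix,tjx->ij", disp, disp) / coords.shape[0]
    # Normalize by the root-mean-square fluctuation of each atom
    rms = np.sqrt(np.diag(cov))
    return cov / np.outer(rms, rms)

# Synthetic example (hypothetical data): atoms 0 and 1 move together,
# atom 2 moves in exactly the opposite direction.
rng = np.random.default_rng(0)
step = rng.normal(size=(200, 3))
traj = np.stack([step, step + 5.0, -step], axis=1)  # shape (200, 3, 3)
c = dccm(traj)
```

      In this toy trajectory the entry for atoms 0 and 1 is +1 and the entry for atoms 0 and 2 is -1; the "very weak correlations" reported between the ectodomain and the TMD correspond to inter-domain entries close to 0.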

      The choice of membrane composition also warrants deeper discussion. The manuscript states that it relies on a plasma membrane model derived from a prior simulation-based study, which itself is based on host plasma membrane (PMID: 35167752), but experimental analyses have shown that HIV virions differ substantially from host plasma membranes (e.g., PMC46679, PMC1413831, PMC10663554, PMC5039752, PMC6881329). In particular, virions are depleted in PC, PE, and PI, and enriched in phosphatidylserine, sphingomyelins, and cholesterol. These differences are likely to influence bilayer thickness, rigidity, and lipid-protein interactions and, therefore, may affect the generality of the conclusions regarding Env dynamics and antigenicity. Notably, the citation provided for membrane composition is a laboratory self-citation, a secondary source, rather than a primary experimental study on plasma membrane composition.

      We have added references to primary experimental studies on plasma membrane composition (van Meer et al., 2008; Sampaio et al., 2011), as well as the prior simulation study proposing the lipid and cholesterol distributions (Ingolfsson et al., 2014).

      At the beginning of the Membrane subsection in the Materials and methods section

      We have modified the following:

      The full-length and CT-truncated gp120–gp41 models were embedded into an asymmetric lipid bilayer with the lipid composition corresponding to a mammalian plasma membrane (van Meer et al., 2008; Sampaio et al., 2011; Ingolfsson et al., 2014; Pogozheva et al., 2022),

      We have also clarified the limitations associated with the choice of lipid composition and emphasized the need to investigate its influence in future studies.

      At the end of the second paragraph in the Discussion section we added:

      “In addition to the limitations inherent to protein structure modeling, the choice of lipid composition remains an open question. In this work, we selected an asymmetric mammalian plasma membrane because it is one of the 18 complex biomembrane systems we previously studied (Pogozheva et al., 2022), and among them, it provides the closest available approximation to the HIV membrane. Nevertheless, experimental studies have reported differences in lipid composition between HIV virions and the host plasma membrane (Aloia et al., 1993; Brugger et al., 2006; Huarte et al., 2016; Mucksch et al., 2019; Tomishige et al., 2023). Although we do not anticipate that our main conclusions regarding Env domain motions and MPER flexibility would change substantially, evaluating the influence of lipid composition represents an important direction for future work.”

      Finally, there are pervasive issues with citation and methodological clarity. Several structural models are referred to only by PDB ID without citation, and in at least one case, a structure described as cryo-EM is in fact an NMR-derived model. Statements regarding residue flexibility, missing regions in structures, and comparisons to prior dynamics studies are often presented without appropriate references. The Methods section also lacks sufficient detail for a system of this size and complexity, limiting readers' ability to assess robustness or reproducibility.

      We have corrected the error in which PDB structure 7LOI was described as a cryo-EM structure; it is in fact an NMR structure. We have also verified that all PDB structures are properly cited at their first occurrence in the manuscript.

      We have clarified that the modeling of palmitoylation sites, glycans, and lipid bilayers is done in an automated fashion by different modules in CHARMM-GUI, and added Supplementary file 1–Supplementary Table 8 showing the simulation settings for the equilibration and production stages.

      At the end of the subsection “Modeling of full-length gp120–gp41 trimer” we have modified the following:

      “Two mutations (S764C and S837C) were introduced in the CT to restore the palmitoylation sites, and lipid tails oriented towards the hydrophobic core of the bilayer were then attached to the palmitoylation sites using the PDB Manipulation module in CHARMM-GUI (Jo et al., 2008; Jo et al., 2014; Park et al., 2023) (Figure 1D).”

      At the end of the subsection “Glycosylation” we added:

      “The selected glycan sequences were represented in the Glycan Reader Sequence format (Jo et al., 2011; Park et al., 2017) and added to the corresponding glycosylation sites using the Glycan Reader & Modeler graphical interface.”

      In the middle of the subsection “Membrane” we added:

      “Membrane systems were constructed using CHARMM-GUI Membrane Builder, which provides a user-friendly graphical interface for selecting lipid types and defining their numbers in each leaflet (Jo et al., 2007; Jo et al., 2009; Wu et al., 2014; Lee et al., 2016; Lee et al., 2019).”

      In the middle of the subsection “Simulation details” we have modified the following:

      “Positional and dihedral restraints were applied to proteins, glycans, and lipids, with force constants progressively reduced over successive intervals (Supplementary file 1–Supplementary Table 8).”

      We added Supplementary file 1–Supplementary Table 8.

      Reviewer #3 (Recommendations for the authors):

      Major concerns:

      (1) Strengthen analysis of conformational coupling: Consider analyses that more directly assess coupling between the TMD/CT and ectodomain, such as residue-residue correlation networks, comparisons to smFRET-defined conformational states, or data-driven (e.g., machine learning-based) trajectory analyses. Machine-learning analysis would be particularly helpful in understanding otherwise elusive allosteric networks that could govern large-scale behavior. Discuss how, due to the apparent local minima that occur after ~0.5 μs, enhanced sampling methods might be employed to better cover the Env conformational landscape.

      We have calculated the dynamical cross-correlation maps showing very weak correlations between the ectodomain and the TMD.

      At the end of the first paragraph in the subsection “The ectodomain maintains a rigid internal structure and tilts independently of the TMD”

      “We also calculated the dynamical cross-correlation maps (Ichiye et al., 1991) of Cα atoms for all systems using CPPTRAJ (Roe et al., 2013). The results indicate only very weak correlations between the ectodomain and the TMD (Figure 2—figure supplements 10–13).”

      We added Figure 2—figure supplements 10–13.

      We have also noted in the Discussion section that enhanced sampling methods could be employed to better explore the conformational landscape of the Env trimer, including fluctuations within the closed state as well as transitions among the closed ground, transient, and CD4-/co-receptor-stabilized states proposed in a previous experimental study (Munro et al., 2014).

      In the middle of the second paragraph in the Discussion section we added:

      “Enhanced sampling methods could be applied to more thoroughly explore the conformational landscape, including not only variations within the closed ground state but also transitions among the closed ground, transient and CD4-/co-receptor-stabilized states.”

      (2) Qualify strong independence claims: Rephrase or further support statements asserting independence of ectodomain and TMD motions, particularly in light of known experimental evidence for coupling (PMC4701381, PMC4304640, PMC5085267).

      In addition to adding the dynamical cross-correlation maps showing very weak correlations between the ectodomain and the TMD, we have added a discussion of the coupling between TMD, CT, and Env antigenicity, and the independent motion of ectodomain and TMD in our simulation.

      In the middle of the second paragraph in the Discussion section we added:

      “Our analysis of the ectodomain and TMD coupling indicates that the motions of these two domains are largely independent. This observation does not contradict experimental studies demonstrating functional coupling between the TMD, CT, and the antigenic profiles of Env (Chen et al., 2015; Dev et al., 2016). Munro et al. proposed that unliganded Env is intrinsically dynamic, transitioning among three distinct prefusion conformations: a closed ground state (predominant), a transient state, and a CD4-/co-receptor-stabilized state. Both laboratory-adapted and clinically isolated strains can spontaneously transition among these three states, although their relative occupancies differ (Munro et al., 2014). It is therefore possible that TMD mutations or CT truncation also alter the equilibrium distribution among the three states, thereby affecting epitope exposure, particularly for epitopes that are occluded in the closed ground state but exposed in the CD4-/co-receptor-stabilized state. However, transitions among the three states occur on millisecond-to-second timescales. Our simulations on microsecond timescales primarily capture conformational variations within the closed ground state and suggest that the MPER acts as a hinge, providing substantial flexibility that enables the ectodomain and TMD to move independently while Env remains in the closed ground state.”

      (3) Clarify membrane composition assumptions: Provide a clearer rationale for the chosen lipid composition, and explicitly discuss how differences between host plasma membranes and HIV virions (e.g., PS, sphingomyelin, and cholesterol enrichment) may affect the conclusions.

      We have clarified the limitations associated with the choice of lipid composition and emphasized the need to investigate its influence in future studies.

      At the end of the second paragraph in the Discussion section we added:

      “In addition to the limitations inherent to protein structure modeling, the choice of lipid composition remains an open question. In this work, we selected an asymmetric mammalian plasma membrane because it is one of the 18 complex biomembrane systems we previously studied (Pogozheva et al., 2022), and among them, it provides the closest available approximation to the HIV membrane. Nevertheless, experimental studies have reported differences in lipid composition between HIV virions and the host plasma membrane (Aloia et al., 1993; Brugger et al., 2006; Huarte et al., 2016; Mucksch et al., 2019; Tomishige et al., 2023). Although we do not anticipate that our main conclusions regarding Env domain motions and MPER flexibility would change substantially, evaluating the influence of lipid composition represents an important direction for future work.”

      (4) Address citation and reference issues: Replace PDB-only references with proper citations, correct mischaracterizations of structure determination methods, and ensure all supplementary citations are fully referenced.

      We have corrected the error in which PDB structure 7LOI was described as a cryo-EM structure; it is in fact an NMR structure. We have also verified that all PDB structures are properly cited at their first occurrence in the manuscript.

      (5) Expand the Methods section: Provide additional detail on system construction, glycan modeling, lipid asymmetry, equilibration, sampling, and limitations, including a discussion of potential benefits of enhanced-sampling approaches.

      We have clarified that the modeling of palmitoylation sites, glycans, and lipid bilayers is done in an automated fashion by different modules in CHARMM-GUI, and added Supplementary file 1–Supplementary Table 8 showing the simulation settings for the equilibration and production stages.

      At the end of the subsection “Modeling of full-length gp120–gp41 trimer” we have modified the following:

      “Two mutations (S764C and S837C) were introduced in the CT to restore the palmitoylation sites, and lipid tails oriented towards the hydrophobic core of the bilayer were then attached to the palmitoylation sites using the PDB Manipulation module in CHARMM-GUI (Jo et al., 2008; Jo et al., 2014; Park et al., 2023) (Figure 1D).”

      At the end of the subsection “Glycosylation” we added:

      “The selected glycan sequences were represented in the Glycan Reader Sequence format (Jo et al., 2011; Park et al., 2017) and added to the corresponding glycosylation sites using the Glycan Reader & Modeler graphical interface.”

      In the middle of the subsection “Membrane” we added:

      “Membrane systems were constructed using CHARMM-GUI Membrane Builder, which provides a user-friendly graphical interface for selecting lipid types and defining their numbers in each leaflet (Jo et al., 2007; Jo et al., 2009; Wu et al., 2014; Lee et al., 2016; Lee et al., 2019).”

      In the middle of the subsection “Simulation details” we have modified the following:

      “Positional and dihedral restraints were applied to proteins, glycans, and lipids, with force constants progressively reduced over successive intervals (Supplementary file 1–Supplementary Table 8).”

      We added Supplementary file 1–Supplementary Table 8.

      The discussion of potential benefits of enhanced-sampling approaches is included in our response to major concern (1).

      (6) Data availability: In addition to code, deposit all MD trajectories for re-analysis. The scale of this simulation was likely costly (GPU time), and so data availability is imperative.

      We have deposited the MD simulation trajectories in Zenodo.

      At the end of the section “Data availability” we added:

      “The simulation trajectories can be found at https://doi.org/10.5281/zenodo.18853902, https://doi.org/10.5281/zenodo.18854615, and https://doi.org/10.5281/zenodo.18854639.”

      Minor:

      (1) Stylistic: Suggested to revise Figure 1 to provide a clearer overview of all constructs with consistent nomenclature (e.g., "full-length" versus "ΔCT") and explicit domain boundaries. With a better overview figure, the current figures could comprise the Figure 1 associated with Figures 1 and 2.

      We have combined Figure 1 and Figure 1—figure supplement 1 to show both full-length and CT-truncated models in one figure.

      We have modified Figure 1.

      We have removed Figure 1—figure supplement 1.

      (2) Explicitly cross-validate against prior studies: Integrate comparisons to existing MD simulations and experimental studies (e.g., PMC2730987, PMC2980712, PMC4254001, PMC4040535, PMC6035291, PMC4701381, PMC5085267) directly into the Results and Discussion.

      We have added a discussion of previous MD simulation studies to the Results section regarding interactions of the TMD residue R696 with ions and lipid headgroups.

      At the end of the first paragraph in the subsection “The energetically unfavorable R696 in the hydrophobic core results in asymmetric, kinked TMD conformations and disrupts membrane integrity” we have modified the following:

      “Previously, Kim et al. reported that the inter-chain interactions between protonated R696 gradually diminished over a short simulation time (23 ns), leading to increased crossing angles and reduced bundle length (Kim et al., 2009). Gangupomu et al. and Baker et al. observed that R696 snorkeled toward either exoplasmic or endoplasmic headgroups in simulations of the TMD monomer, resulting in TMD tilting and membrane thinning due to water penetration and lipid headgroups interacting with R696 (Gangupomu et al., 2010; Baker et al., 2014; Baker et al., 2014). These observations are consistent with our findings. Hollingsworth et al. also reported membrane thinning; however, they attributed this effect to interfacial interactions of R683 and R707 with both leaflets and proposed that R696 only interacted with water and ions permeating into the center of the TMD trimer (Hollingsworth et al., 2018).”

      The discussion of PMC4701381 and PMC5085267 is included in our response to major concern (2).

      (3) "In the cryo-EM structure (PDB ID: 7LOI)": This is an NMR model and lacks citation.

      We have corrected this error and added the citation at the first occurrence of PDB ID: 7LOI in the Results section.

      In the middle of the first paragraph in the subsection “The energetically unfavorable R696 in the hydrophobic core results in asymmetric, kinked TMD conformations and disrupts membrane integrity” we have modified the following:

      “In the NMR structure (PDB ID: 7LOI) (Piai et al., 2021),”

      (4) "Higher RMSF values were observed in the residues missing from the cryo-EM structure": This is lacking citation, as there are multiple cryo-EM structures and several dynamics studies using NMR.

      The missing residues here specifically refer to those absent in the cryo-EM structure (PDB ID: 6B0N) used for model building, rather than all cryo-EM structures in the PDB. We have revised the text to clarify this distinction.

      In the middle of the second paragraph in the subsection “The ectodomain maintains a rigid internal structure and tilts independently of the TMD” we have modified the following:

      “Higher RMSF values were observed in the residues missing from the cryo-EM structure (PDB ID: 6B0N) (Sarkar et al., 2018), which was used for the ectodomain in model building (these missing residues are highlighted in red in Figure 1A, B),”

    1. Author response:

      eLife Assessment

      This study provides fundamental insights by demonstrating that the Nanog mRNA coding sequence (CDS) and 3′UTR domains are spatially segregated and functionally distinct in pluripotent stem cells and blastocysts, with 3′UTR-enriched border cells primarily influencing morphogenesis and CDS-enriched inner cells largely regulating transcription and epigenetic programs. The work opens a novel conceptual avenue for understanding how separable mRNA domains can differentially control cell behavior and differentiation. However, the evidence is incomplete, as key aspects of the molecular nature, biogenesis, and precise characterization of the separated 3′UTR and CDS RNA species, as well as causal links between their perturbation and the observed phenotypes (e.g., via rescue and deeper characterization of 3′UTR elements), remain to be fully established.

      We thank the editors and the three reviewers for their careful and constructive engagement with our manuscript. We greatly appreciate the reviewers’ recognition of the conceptual significance of the study and their thoughtful suggestions for strengthening the mechanistic and molecular characterization of the work. We have carefully considered all points raised and outline below the revisions planned for the revised manuscript.

      The phenomenon of differential CDS and 3’UTR expression is not unique to Nanog. Independent 3’UTR and CDS expression and differential CDS/3’UTR usage have been observed across multiple genes, tissues, and developmental contexts, including in genome-wide (Mercer et al., 2011) and transcriptome-scale studies (Kocabas et al., 2025; Ji et al., 2021). Prior studies have proposed that isolated 3’UTRs may arise through regulated RNA processing pathways coupled to exonucleolytic degradation and, in some cases, recapping mechanisms (Malka et al., 2017; Haberman et al., 2024). While the precise molecular mechanisms underlying the generation of isolated Nanog CDS and 3’UTR species remain unresolved, the observations presented here support regulated RNA processing models. Our original submission included a brief discussion of this topic; however, the revised manuscript will include substantially expanded analyses and discussion of the generation of isolated Nanog CDS and 3’UTR species.

      The revised manuscript will address the major concerns regarding:

      (1) The molecular nature, biogenesis, and precise characterization of the separated 3′UTR and CDS mRNA species

      (2) The causal relationship between perturbation of these RNA species and the observed phenotypes, including additional rescue experiments and deeper computational characterization of putative, functional 3′UTR elements.

      Specifically:

      (A) New supplementary analyses and schematics designed to further clarify the conceptual and mechanistic framework of the study, including:

      (i) Computational examination of the Nanog 3’UTR across all reading frames for open reading frames (ORFs).

      (ii) As suggested by Reviewers 1 and 3, single cell traces of Nanog mRNA expression from the full-length mESC dataset used in this study, illustrating distinct transcript isoforms and CDS/3’UTR expression patterns across individual cells, complementing the color-coded tSNE analyses currently presented in Fig. 2.

      (iii) Expanded schematic model and analyses addressing possible mechanisms underlying the generation of isolated Nanog CDS and 3’UTR enriched RNA species, including transcript architecture, predicted RNA structural barriers, and exonucleolytic processing models.

      (iv) Expanded discussion of the predominantly nuclear localization of the Nanog 3’UTR signal and its implications for transcript biogenesis, processing, and potential noncoding functions.

      (B) Correction of all minor labeling errors.

      (C) Additional experimental analyses, including:

      - Expansion of Nanog 3’UTR overexpression and rescue experiments to include cell spreading assays.

      - Expanded analysis of the effects of ROCK pathway inhibitors on colony morphology and cytoskeletal organization.

      - Examination of the ability of ROCK inhibition to restore normal embryoid body formation.

      Collectively, these planned revisions are intended to strengthen the mechanistic framing, molecular characterization, and broader significance of the study while clarifying the interpretation and scope of the conclusions.
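
      As an illustration of the kind of computational ORF examination listed under (A)(i), the following is a minimal sketch of a scan for ATG-initiated open reading frames. It is not the authors' planned analysis: the function name `find_orfs`, the `min_codons` threshold, the restriction to the three forward frames of the transcript, and the toy sequence are all assumptions made for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=1):
    """Return (frame, start, end) tuples for ATG...stop ORFs.

    Scans the three forward reading frames only (an assumption:
    the input is treated as the sense strand of the transcript).
    start/end are 0-based, end-exclusive, and include the stop codon.
    min_codons counts codons before the stop, including the ATG.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA letters
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first in-frame start codon
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None                   # resume search after the stop
    return orfs

# Toy sequence (hypothetical, not the Nanog 3'UTR):
# frame 2 contains ATG AAA TAG.
toy_orfs = find_orfs("CCATGAAATAGGG")
```

      Raising `min_codons` filters out very short ORFs, which are expected by chance in any sequence; the threshold to use is a judgment call that this sketch does not settle.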

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      There is evidence that some genes encode mRNAs from which separate processed transcripts may arise, separating the coding sequence (CDS) from the 3'-UTR, and with both mRNA elements remaining stable in the cell. However, the functional consequences of these mRNA fragments have not been firmly established. In the manuscript by Yang et al., the authors probe the mRNA domain architecture of Nanog in the context of embryonic stem cell colonies and blastocysts. The authors detect spatial separation of Nanog CDS-containing mRNA from abundant Nanog 3'-UTR RNAs depending on the cell position in 2D embryonic stem cell colonies or in blastocysts.

      Strengths:

      The phenotypic analyses of the Nanog mRNA hold promise for revealing distinct roles for the Nanog encoded protein and a separate RNA encompassing the Nanog 3'-UTR.

      Weaknesses:

      There are a number of questions about the molecular nature of the mRNA species that the authors should address in order for the results to be firmly established, as noted below.

      (1) It is not clear how the authors verified that their probes are specific for Nanog CDS or 3'-UTR regions. Especially for the 3'-UTR probe, it is confusing why colonies show green only regions, suggesting only the CDS is present. I would expect the CDS and 3'-UTR probes to colocalize in the interior cells. Is it possible that the 3'-UTR probe is targeting another RNA?

      We thank the reviewer for raising the important question of probe specificity. We realize that the observation underlying this concern is the absence of colocalization between the CDS and 3’UTR probes in colony border cells.

      The absence of CDS/3’UTR colocalization in colony border cells is not due to probe failure but instead reflects the principal observation underlying the study. If Nanog CDS and 3’UTR sequences were present exclusively as intact full-length transcripts in a strict stoichiometric ratio, Nanog positive cells would be expected to be positive for both probes (appearing yellow). Instead, border cells exhibit strong 3’UTR signal with minimal or absent CDS signal, while adjacent interior cells show the opposite pattern.

      The fact that both probes robustly detect signal within the same sample, but in spatially distinct cell populations, argues that both probes are functional and that the observed differential localization reflects genuine biological differences in the levels of transcript components.

      The CDS probe targets ~300 bp within the coding region, while the 3’UTR probe targets ~300 bp within the proximal region of the Nanog 3’UTR. Hybridization specificity was validated as described in the Methods and in our previous studies (Kocabas et al 2015; Ji et al 2021), including negative controls. We additionally now provide a supplemental figure (New Figure 1-figure supplement 2A), highlighting that the Nanog 3’UTR and CDS probes label cell populations distinct from each other, further indicating their specificity.

      In addition, full-length scRNA-seq datasets from both mouse and human ESCs demonstrate differential CDS/3’UTR expression patterns for Nanog and many other genes. To further clarify this point, the revised manuscript will include single-cell transcript traces from mESCs illustrating the distinct Nanog isoforms detected across individual cells (New Figure 2-figure supplement 1A).

      (2) It would help for the authors to include a graphic similar to Figure 3, Figure Supplement 1A, that diagrams the location of the CDS and 3'-UTR probes (this should also be done for Oct4 and Sox2). This graphic could also show all potential polyadenylation signals.

      We agree that additional schematic clarification would improve readability. The revised manuscript will include schematics showing the locations of the CDS and 3’UTR probes for Nanog, Sox2 and Oct4 (New Fig. 1- figure supplement 1A).

      (3) I think, based on the fluorescence patterns, there is evidence that the signal for the Nanog 3'-UTR probe is nuclear (images with DAPI staining), but this is not commented on that I could find. This should be discussed, as nuclear retention has implications for the noncoding function of the 3'-UTR fragment.

      The reviewer is correct that the Nanog 3’UTR signal is mostly nuclear. While this was noted in (the original) Figure 1-figure supplement 2A, we agree that its mechanistic and functional implications were not sufficiently discussed in the original manuscript. The revised manuscript will include an expanded discussion of the relationship between nuclear localization, transcript processing, and potential noncoding functions of isolated Nanog 3’UTR species.

      (4) Figure 2, Figure Supplement 1A needs a better explanation. It's not clear how the reads map to the different regions of the Nanog mature mRNA. The authors should show examples at different ratios of CDS to 3'-UTR. Do the reads have a sharp boundary at the junction of where the isolated 3'-UTR is thought to occur?

      We thank the reviewer for this suggestion. The revised manuscript will include new single cell read maps across the Nanog locus from full length mESC scRNA-seq datasets (New Figure 2-figure supplement 1A), illustrating distinct CDS enriched and 3’UTR enriched transcript isoforms across individual cells.

      These analyses indicate that some CDS-dominant transcripts contain 3’UTR sequence, while many appear to contain little or no detectable 3’UTR sequence. Conversely, many 3’UTR-enriched transcripts contain only minimal or truncated CDS sequence. Importantly, full CDS and 3’UTR mRNA components are frequently not present in a strict 1:1 ratio, either within individual cells or across cell populations.

      The revised manuscript will also include expanded supplementary analyses integrating transcript architecture, predicted RNA structural barriers, polyadenylation analysis, and single cell coverage patterns to further examine possible mechanisms underlying the generation of isolated Nanog CDS and 3’UTR species (New Figure 2-figure supplement 1B,C).

      (5) I looked in the Zenbu browser at human NANOG CAGE mapping in the FANTOM5 dataset. I could not see evidence for substantial capping of a 3'-UTR fragment when filtering for embryonic cell types. Given the strong signal for the 3'-UTR in border cells, I would expect to see evidence for capping if the RNA were indeed capped. This suggests that if it exists, it is likely uncapped and (as noted in point 3) is likely nuclear retained.

      Prior studies have reported isolated uncapped and recapped 3’UTR species in multiple systems (Malka et al, 2017; Haberman et al, 2024). We agree that the predominantly nuclear localization and lack of a strong CAGE signal for Nanog are important observations and will expand discussion of these points in the revised manuscript.

      (6) Are there predicted polyadenylation signals near the end of the CDS that would generate a short 3'-UTR, and are these signals conserved across mammals?

      Computational analysis of the mouse Nanog 3'UTR identifies a single canonical PAS (AATAAA) at position 1074, located at the 3’ end of the annotated 3’UTR, and this terminal PAS is conserved across mammals. These analyses will be included as a supplementary figure and discussed further in the revised manuscript section addressing Nanog transcript biogenesis.

      (7) It would help to see a zoomed-in view of the region targeted by one of the guide RNAs in the 3'-UTR, and where that site is relative to the polyadenylation signal. Is the polyadenylation signal upstream, i.e., CDS proximal?

      This will be provided in the revised manuscript (New Figure 2-figure supplement 1C,i). Two guide RNAs were used to generate the Nanog 3’UTR deletions. The downstream guide lies upstream of the terminal polyadenylation signal at nt 1074, which preserves polyadenylation of the remaining Nanog CDS-containing transcript.

      Consistent with this, all Nanog 3’UTR knockout lines retain normal Nanog protein levels. The revised manuscript will include supplementary schematics showing guide RNA positions relative to the CDS, 3’UTR probes, and terminal PAS.

      (8) A final note, the use of green and red together will be challenging for those who are colorblind. Providing a different false color palette would be helpful. 

      We appreciate this attention to accessibility. The red/green color combination was chosen to provide the highest contrast between CDS and 3’UTR signals in the in situ hybridization experiments, which is important for visualizing their differential spatial localization. We will ensure that figure legends clearly indicate channel assignments throughout the manuscript.

      I am refraining from comments on the cell biology and morphological insights, as they are remote from my core expertise.

      Reviewer #2 (Public review):

      Summary:

      This manuscript shows that the coding sequence (CDS) and 3' untranslated region (3'UTR) of mRNA transcripts from the Nanog gene have distinct expression patterns and functions. In both human and mouse embryonic stem cells colonies and blastocysts, these domains are spatially segregated, with 3'UTR-enriched cells occupying the borders and CDS-enriched cells residing in the interior. CDS mRNA expression is correlated with the expected regulation of transcription and epigenetics associated with the Nanog protein. Interestingly, expression of the 3'UTR appears to play an independent role in cell behavior and colony morphogenesis. Indeed, deletion of the 3'UTR causes specific defects in cell spreading and protrusive activity, with alteration in the localization of adhesion and cytoskeleton-associated proteins. Remarkably, a large proportion of those defects are rescued upon ROCK inhibition. Deletion of either Nanog CDS or 3'UTR leads to distinct modifications in the differentiation competence.

      Strengths:

      The independent role of 3'UTR mRNA domains, although identified in neurosciences a couple of years ago, is a novel and exciting field relatively unexplored in early development.

      The manuscript offers a multilayer series of experiments, in ES cells colony, blastocysts, and embryoid bodies, including imaging, -omics, genetic and pharmacological challenges, and differentiation experiments, thereby unveiling very convincingly the role of Nanog 3'UTR in morphogenesis.

      Weaknesses:

      The pathways leading to the generation of those distinct transcript domains are unknown. Although the differential functional roles are well demonstrated, whether the expression patterns are a cause or a consequence of the cells' localization in the embryo remains to be explored.

      We thank the reviewer for these thoughtful comments and for recognizing the potential significance of independent 3’UTR functions in early developmental systems.

      Regarding the mechanisms underlying generation of distinct CDS and 3’UTR transcript domains, the revised manuscript will include new supplementary analyses and schematic models addressing possible Nanog transcript processing pathways, as outlined above.

      We agree that the relation between spatial location and Nanog 3’UTR expression is an important question. Specifically, it remains unclear whether cells first acquire high Nanog 3’UTR expression and subsequently localize to the colony border or whether border position itself promotes high Nanog 3’UTR expression.

      Our current data suggest that both processes may contribute. Deletion of the Nanog 3’UTR does not prevent colonies from establishing a border/interior pattern, indicating that high Nanog 3’UTR expression is not strictly required for the border pattern itself. At the same time, Nanog 3’UTR overexpression and rescue experiments increased the likelihood of border localization, suggesting that elevated Nanog 3’UTR expression promotes behaviors associated with border occupancy.

      Reviewer #3 (Public review):

      Summary:

      In this manuscript, Yang et al reported distinct functions of the protein-coding sequence (CDS) and the 3' untranslated region (UTR) in the Nanog mRNA in pluripotent stem cells. They first observed different localization patterns for the CDS and 3' UTR in embryonic stem cells and in blastocyst embryos, and this pattern correlates with cell populations in different pluripotent states based on single-cell sequencing data. To characterize the potentially distinct functions of these regions, the authors generated knockout (KO) cell lines in which either the CDS or the 3' UTR was genetically ablated. These deletions led to different phenotypes in multiple assays. These results provided evidence that the CDS and 3' UTR of an mRNA could have distinct functions. Although these results are potentially interesting, several questions need to be addressed before the validity of their conclusion can be confirmed.

      Strengths:

      This study provides evidence for distinct functions of the protein-coding sequence and 3' untranslated region of an mRNA in pluripotent stem cells. The concept could be more broadly applied.

      Weaknesses:

      The initial observation (distinct localization of CDS and 3' UTRs) and the causal relationship between the KO and phenotype need further validation.

      Major points:

      (1) The authors showed distinct localization patterns of the CDS and 3' UTRs in human and mouse ESCs and blastocysts, and the overlap between their signals was minimal (Figure 1). Does this mean that the CDS and 3' UTR RNAs exist separately? For example, in cells that only showed signals for 3' UTRs, do these RNAs only contain 3' UTRs and lack CDS? Was this confirmed by RNA-seq experiments? If so, how are they generated (i.e., by transcription from a novel promoter or partial degradation of the full-length mRNAs)? This is a key question. Without a clear characterization of these RNAs, the rest of the study cannot be substantiated.

      We thank the reviewer for raising this important question, which overlaps substantially with several key points raised by Reviewer #1 concerning the molecular nature and characterization of the Nanog CDS and 3’UTR species.

      Colony border cells exhibit strong Nanog 3’UTR signal with minimal detectable CDS signal, while adjacent interior cells show the reciprocal pattern. These observations strongly suggest the existence of distinct Nanog transcript species rather than exclusively full-length transcripts containing stoichiometric amounts of both CDS and 3’UTR sequence.

      This conclusion is independently supported by full-length Smart-seq2 scRNA seq datasets from both mouse and human ESCs, which provide transcript coverage across both CDS and 3’UTR regions.

      (2) To confirm that the phenotypes of CDS or 3' UTR KO cells were caused by the deleted regions instead of other artifacts, rescue experiments should be performed.

      Rescue experiments were included in the original submission (Fig. 4). The revised manuscript will expand these analyses to include cell spreading. We will also include additional ROCK pathway modulation experiments.

      (3) As over-expression of the 3' UTR showed a phenotype, important regions within it should be identified, and also the possibility that the 3' UTR contains open reading frame(s) and is translated should be tested.

      The revised manuscript will also include supplementary computational analyses of the Nanog 3’UTR, including open reading frame prediction, Kozak scoring, and evolutionary conservation analysis (New Figure 2-figure supplement 1B). These analyses identify no evidence for strongly supported coding potential within the 3’UTR. Further, isolated Nanog 3’UTR transcripts are largely confined to the nucleus, making active translation unlikely.

      The revised manuscript will include new supplementary analyses addressing Nanog transcript structure and possible biogenesis mechanisms (New Figure 2-figure supplement 1C).

      References:

      ViennaRNA RNAfold - Lorenz et al. (2011) Algorithms Mol Biol 6:26 - RNA secondary structure (stem-loop) and minimum free energy (MFE) prediction

      NCBI BLASTP - Altschul et al. (1990) J Mol Biol 215:403 - ORF conservation, protein sequence similarity search

      NCBI Entrez/Biopython - Cock et al. (2009) Bioinformatics 25:1422 - sequence retrieval

      PhastCons/UCSC multiz alignments - Siepel et al. (2005) Genome Res 15:1034 - evolutionary conservation scoring

      UCSC Genome Browser - Kent et al. (2002) Genome Res 12:996-1006 - conservation track access

      Eaton et al. (2020) Mol Cell 78:439 - stall model

      Brannan et al. (2012) Genes Dev 26:2621 - stall model

      Addition to Methods.

      ORFs (≥10 amino acids) were identified in all three forward frames according to Kozak (1987). Evolutionary conservation was assessed by BLASTP (Altschul et al., 1990) against RefSeq proteins. Poly(A) signals were identified by pattern matching for canonical and non-canonical hexamers. Conserved sequence blocks were obtained from UCSC PhastCons tracks (Siepel et al., 2005). RNA secondary structures were predicted using ViennaRNA RNAfold (Lorenz et al., 2011) with a sliding 80-nt window. The stall model for isolated transcript generation follows Eaton et al. (2020).
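
      The three sequence scans described in this Methods addition (forward-frame ORF detection, poly(A) hexamer matching, and 80-nt sliding windows) can be illustrated with a minimal sketch. This is not the pipeline used in the study: the function names, the 20-nt window step, and the two-hexamer list are assumptions made for illustration, and the structure-prediction call itself (RNAfold) is left out, so the code only produces the windows that would be passed to it.

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}
# Illustrative hexamer list (assumption): the canonical AATAAA plus one common variant.
PAS_HEXAMERS = ("AATAAA", "ATTAAA")


def find_forward_orfs(seq, min_aa=10):
    """Return (frame, start, end, n_aa) for ATG-to-stop ORFs in the three forward frames."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_aa = (i - start) // 3  # residues encoded, stop codon excluded
                if n_aa >= min_aa:
                    orfs.append((frame, start, i + 3, n_aa))
                start = None
    return orfs


def find_poly_a_signals(seq, hexamers=PAS_HEXAMERS):
    """Return sorted (position, hexamer) hits, including overlapping matches."""
    seq = seq.upper()
    hits = []
    for hexamer in hexamers:
        for match in re.finditer(f"(?={hexamer})", seq):
            hits.append((match.start(), hexamer))
    return sorted(hits)


def sliding_windows(seq, size=80, step=20):
    """Return (offset, subsequence) windows; the step size is an assumption."""
    if len(seq) <= size:
        return [(0, seq)]
    return [(s, seq[s:s + size]) for s in range(0, len(seq) - size + 1, step)]


if __name__ == "__main__":
    # Toy sequence (hypothetical, not Nanog): one 13-codon ORF, then one AATAAA.
    toy = "CC" + "ATG" + "GCT" * 12 + "TAA" + "GG" + "AATAAA" + "C" * 60
    print(find_forward_orfs(toy))
    print(find_poly_a_signals(toy))
    print(len(sliding_windows(toy)))
```

      Positions are 0-based offsets into the input sequence, so they would need to be shifted by one to compare with 1-based coordinates such as the PAS position quoted in the response.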

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Dalben et al. grafted the fusion loop mature (FLM) modification, based on a previously reported D2-FLM, to another serotype DENV4, and adapted them to replicate in Vero cells for live attenuated vaccine (LAV) manufacturing while retaining favorable antigenic profiles, generating two new strains: D2-vFLM and D4-vFLM. Deep sequencing revealed adapted mutations at the junction of envelope domains I and II (EDI and EDII), and both D2-vFLM and D4-vFLM showed no evidence of ADE in the presence of FL-targeting Abs. Sera from D2-vFLM immunized mice displayed strong homotypic and reduced heterotypic neutralization compared to wild-type viruses, with minimal to no ADE potential in vitro. Moreover, D2-vFLM immunization completely protected AG129 mice from lethal challenge with mouse-adapted D220. They demonstrate that the FLM modification platform is transferable across serotypes and yields strains with favorable immunogenicity and reduced ADE risk. The FLM approach provides a promising path toward the development of a safer tetravalent DENV LAV.

      Strengths:

      The authors carried out a series of experiments to generate and characterize two new strains (D2-vFLM and D4-vFLM) of FLM-modified viruses, and showed their antigenic and immunogenic profiles. The observation that the FLM modification platform is transferable across serotypes and yields strains with favorable immunogenicity and reduced ADE risk is interesting.

      We thank reviewer 1 for the encouraging comments for our work.

      Weaknesses:

      However, one concern is the total number of mutations (including originally introduced and compensatory mutations) in this FLM vaccine platform, and it is not clear regarding the future directions for the proof-of-concept vaccine in this study.

      Author response table 1.

      We summarize the mutations in the FLM platform below.

      The maturation mutations are located at the furin cleavage site, which is buried within the membrane or virion. As a result, only five mutations are surface exposed, two of which are in the fusion loop region targeted for removal. Therefore, for a proof-of-concept study, the total number of mutations remains well within the genetic diversity observed among DENV genotypes.

      Compensatory mutations may affect overall DENV antigenicity. Notably, one such mutation, K204R, has been reported to alter antigenicity and could contribute to the improved safety profile of the vaccine. However, we have also shown that multiple adaptive pathways can support Vero cell adaptation, and our data indicate that K204R is not absolutely required for this process.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, YR Dalben et al describe the generation of DENV2 and DENV4 strains with mutations in the fusion loop (FL) of the E protein and pre-membrane (prM) protein to limit potential antibody-dependent enhancement (ADE) resulting from vaccination with live-attenuated vaccines and adapted these strains for growth in Vero cells. They show that the DENV2 version D2-vFLM is immunogenic and generates neutralizing serum against DENV2 and DENV4 after 2 boosts and is protective against lethal challenge. Serum from D2-vFLM also showed no ADE against DENV4.

      Strengths:

      Overall, the paper is well written and presented, and the data presented support most of the conclusions made. Grafting D2-FLM mutations to DENV4 and adapting both to growth in Vero cells is a good step to show that this method could be used to generate production-level LAV. The growth and stability data are clear and well-conducted.

      We thank reviewer 2 for the encouraging comments for our work.

      Weaknesses:

      However, there are several weaknesses, mostly in regard to the immunogenicity data, that limit the overall impact. The FLM mutations were only grafted to DENV4 but not to the other Dengue serotypes. The authors acknowledge that this is a proof-of-concept, but generating mutants of the other serotypes would strengthen the idea that this could be used to develop a tetravalent LAV.

      We selected DENV2 and DENV4 because they are the most genetically divergent. Our current data support that the FLM mutations can be grafted onto both DENV2 and DENV4 and likely extend to their corresponding genotypes. We agree that the FLM mutations should be evaluated in additional serotypes. We also have promising preliminary data for FLM mutation grafting in DENV1 and are currently applying the same approach to DENV3. We hope to include these results, whether positive or negative, in the revised manuscript.

      Immunizations in mice were only performed for D2-vFLM but not D4-vFLM. Immunogenicity data for D4-vFLM would strengthen this work if it shows that it can be immunogenic, protective, and limit ADE, as is shown for D2-vFLM.

      We are currently immunizing AG129 mice with DV4 and D4-vFLM, followed by heterotypic challenge with D220. Because DENV vaccine-related hospitalization in clinical trials typically occurs 3 - 4 years after vaccination, we are cautious about whether this experimental design will fully capture the added safety benefit of the FLM mutations. We are also developing a passive immunization model in AG129 mice using diluted DENV4 serum to better mimic long-term waning antibody titers. We will include the future findings in the revised manuscript.

      ADE from D2-vFLM was only tested against DENV4; does it also limit ADE from the other serotypes? This would better show that these mutations do limit ADE across serotypes and not just a single one.

      We are trying to keep the scope of the paper within DENV2 and DENV4; however, we will perform ADE and neutralization assays for all four serotypes in the revised manuscript.

      Additionally, some of the immunization data likely need to be repeated:

      The authors should describe why they pooled the sera from the mice and whether they purified total IgG or not (Figure 5).

      We used pooled serum, consisting of equal volumes from each mouse, rather than purified IgG. In Figure 5, our goal was to show the overall increase in serum titer after each immunization using cheek-bleed samples from individual animals. Because the available sample volume was limited, we pooled the sera for this analysis. We also measured end-point serum titers for each individual animal.

      They should also probably repeat the challenge experiment since it was 4 mice (D2) against 5 (D2-vFLM), and it is unclear if there is a statistical difference between the results obtained. It is not even mentioned in the Results section (D2 result vs D2-FLM), and thus unclear if using D2-FLM is an improvement in the way the data is currently presented.

      This experiment was designed to determine whether D2-vFLM protects AG129 mice against homotypic challenge as effectively as DV2-WT. Although the sample size was small, the results support our conclusion. However, we agree with the reviewer that the study should include more animals, and we will increase the group size to n > 8 to 10 in the revised experiment.

    1. Author response:

      We thank the editors and reviewers for their time and feedback. We are encouraged by the feedback that the purpose and abstractions of the model are well articulated and justified, that the explicit control of bursting characteristics is useful, and that the circuit-level validations are convincing.

      Before responding to individual reviewer comments, we would like to address the framing in the current assessment that the model "appears to have limited neurobiological relevance and utility but may be useful as a controller for an artificial system, such as in neuro-robotics applications." We respectfully suggest that this framing understates the model's relevance to neuroscience. Specifically, a growing body of literature aims to understand biological motor control by building embodied simulations. Yet, these simulations either use overly simple artificial neural network (ANN) units without dynamics or computationally intensive biophysical ones that are difficult to train. Our model is not intended as a biophysical account of how individual neurons generate bursts at the level of ionic mechanisms or spikes; that goal is already well served by the conductance-based and reduced biophysical models we cite. Rather, its contribution is to make intrinsic bursting dynamics readily incorporable into neural circuit models that can be used in complex settings, with parameters that map directly onto quantities that circuit-level neuroscience most often measures and tunes in models (burst duration, duty cycle, amplitude, shape, input dependence). Indeed, Reviewer #1 notes that: "The purpose of the model and applied abstractions are well articulated and justified [...] This allows modelers to focus on circuit interactions and is especially useful when details of intrinsic currents and bursting mechanisms are unknown. One could even imagine a scenario where this model would help identify predictions on key underlying burst generation mechanisms."

      We see our work as a neuroscience contribution as much as a neuro-robotics one. Bringing tractable, controllable bursting into this regime allows circuit modelers to study how intrinsic bursting interacts with circuit connectivity without committing to specific biophysical mechanisms, and it lets ANN-style models incorporate a class of dynamics that is biologically pervasive but currently underrepresented. We validated the model against two well-studied biological CPGs (the crustacean pyloric circuit and the mammalian locomotor circuit) precisely because the target use case is biological circuit modeling.

      While we remain committed to the belief that bringing bio-inspired neurons with interpretable intrinsic dynamics into ANN-style modeling of biological control systems is a useful contribution as an eLife Methods paper, the reviews have made clear that we have not situated our work clearly enough within the literature. In revision, we will sharpen this positioning in the Introduction and Discussion, and better situate the model relative to both the long tradition of non-spiking relaxation-oscillator and piecewise-linear modeling in neuroscience and also to current trends in simulated control.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) Formal analysis

      The paper heavily relies on numerical demonstrations but does not provide a formal analysis of stability, bifurcations, or entrainment. While appropriate for the intended purposes, a more formal footing could strengthen the model.

      We agree that a formal dynamical-systems treatment would deepen the work, and we appreciate the reviewer's acknowledgment that the numerical-only approach may nevertheless be appropriate for the intended purposes. Because the model is hybrid (continuous dynamics combined with discrete switching rules), a full formal analysis is non-trivial, and we view it as a substantial follow-up rather than something to fold into the present manuscript. In revision, we will discuss more explicitly the opportunities such formal analysis presents.

      (2) Parameter tuning and parameter-space characterization

      It is less clear how model parameterization was chosen, how behavior depends on parameterization, and in what parameter ranges certain behavior can be expected.

      We agree that this would substantially improve usability, and we will expand this aspect of the paper. The revision will include: (a) more detail on how parameters map onto observable features of the bursting waveform, (b) recommended parameter ranges and the qualitative behaviors expected at their boundaries, and (c) practical guidance for tuning the model to match observations or to embed it into circuits.

      (3) Locomotor CPG interneuron ablation and noise

      The correspondence of these silencing/ablation of neuron classes has not been shown by the model. Importantly, though, it appears that authors didn't show how the model in general behaves under the influence of noise.

      The reviewer is right that the cited work establishes validity of the circuit model in large part through silencing/ablation experiments, and we did not reproduce those experiments. We understand those gait expression phenomena to be arising from non-bursting interneuron activations and a robust solution found for connection weights between them. The half-center bursting neurons only see a time-varying input signal, and their response is well-characterized by the constant, pulse, and periodic analyses we perform. As such, we chose to reproduce a few key experiments to retain a focus on our simplified neuron model. We will rephrase the relevant passages to make this scope explicit and ensure that our reproduction claims are appropriately stated. We will also expand on how the model interfaces with noise together with the proposed parameter-space characterization.

      Reviewer #2 (Public review):

      (1) Biological relevance

      Central pattern generators and other bursting neurons use specific physical principles to generate their bursts of activity. These principles place constraints on the tuning of these bursts, including relationships between active and silent phase durations and other properties. By discarding these relationships, the proposed model risks losing key constraints that affect performance in biologically relevant scenarios.

      We agree that biophysical models impose constraints that arise from underlying mechanisms. For instance, as input alters the curved shape of nullcline-v in Figure 1, the active/quiet phase durations and duty cycle change in constrained ways. The question seems to be whether our model is too flexible, for instance by making it too easy to achieve desired phase durations, duty cycles, and other input-dependent responses. We see this as a valuable feature of our model, not a bug. Firstly, even if our model may be expressive enough to achieve a variety of response profiles (as in Figure 3-figure supplement 3), the careful modeler will ensure matching to experimental observations. Moreover, in many circuit systems, the relevant biophysical details are often unknown for the specific neurons being modeled, as noted by Reviewer #1, and the modelers' primary goal is to reproduce circuit-level activity. This can be achieved easily with a simplified model, and also with a biophysical model as data become available. Finally, we should note that modelers can and do tune the parameters of biophysical models within determined ranges in order to achieve desired phase durations and duty cycles, relaxing constraints somewhat in order to reproduce appropriate activity.

      It is also important to note that spikes within bursts can be important and of interest. [...] The authors' model is specific to square-wave bursting.

      We agree that spikes are important and interesting in many settings, and we believe that biophysical models would be most appropriate in these cases. In many cases, too, some abstraction and simplification is desirable, and this would not necessarily detract from the model's biological relevance. As we discuss in our high-level comments, we aim to bring intrinsic bursting dynamics into the ANN-style modeling regime that typically neglects intrinsic dynamics altogether. While the simplified model may be limited in some ways, it is nevertheless useful for many common biologically relevant scenarios, as validated by our circuit experiments. Finally, we would note that many of the raised limitations (no intra-burst spike structure, restricted bursting class, abstracted constraints) are shared by the relaxation-oscillator and piecewise-linear traditions that the reviewer cites approvingly, which suggests that our model lies along a familiar abstraction continuum rather than outside it. In revision, we will explicitly acknowledge that the model captures a basic/regular form of bursting within a broader taxonomy, and clarify the conditions under which abstracting the biophysical constraints is appropriate.

      (2) Practicality

      The model makes use of various cut-off functions and other aspects that are implemented as rules. Combining rules with differential equations makes for an awkward modeling framework

      On the modeling framework, we would defend the hybrid formulation (rules + ODE) because our aim is to prioritize usability by modelers, not the simplicity or elegance of the equations. While a "pure-ODE" FitzHugh-Nagumo-style polynomial (with dv/dt = av^3 + bv^2 + cv + d and parameters a, b, c, d, as the reviewer has pointed out) may seem simple and elegant, a lot of complexity can arise from it. Tuning these parameters is far from intuitive, as small changes can produce nonlinear effects and qualitative shifts in behavior. Achieving the right phase durations, input-dependent scaling, waveform amplitude and shape, phase delays, and other characteristics simultaneously to match experimental data is quite cumbersome in the elegant models, not to mention the biophysical models. In contrast, these characteristics are easy to control in our model, because we translate complex dynamical behavior from implicit to explicit and surface a set of interpretable and tunable parameters.
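      The point about small parameter changes producing qualitative shifts can be illustrated with a minimal sketch. It is not taken from the manuscript: it treats the cubic dv/dt = av^3 + bv^2 + cv + d as a one-dimensional equation (any recovery variable held fixed) and counts its equilibria using the standard cubic discriminant. The function names and the numerical values are hypothetical and chosen only for illustration.

```python
def cubic_discriminant(a, b, c, d):
    """Discriminant of a*v**3 + b*v**2 + c*v + d (requires a != 0)."""
    return (18 * a * b * c * d - 4 * b**3 * d + b**2 * c**2
            - 4 * a * c**3 - 27 * a**2 * d**2)


def count_equilibria(a, b, c, d):
    """Number of distinct real roots of dv/dt = 0 for the cubic above."""
    if a == 0:
        raise ValueError("a must be non-zero for a cubic")
    disc = cubic_discriminant(a, b, c, d)
    if disc > 0:
        return 3  # three distinct real equilibria
    if disc < 0:
        return 1  # one real equilibrium, two complex roots
    # disc == 0: repeated root; a triple root occurs exactly when b^2 == 3ac
    return 1 if b * b == 3 * a * c else 2


# Illustrative (hypothetical) values: dv/dt = -v^3 + v + d.
# Shifting d by 0.01 crosses the fold near d = sqrt(4/27) ~ 0.385.
before = count_equilibria(-1.0, 0.0, 1.0, 0.38)
after = count_equilibria(-1.0, 0.0, 1.0, 0.39)
print(before, after)
```

      In this toy setting, a change of 0.01 in the constant term removes two of the three equilibria, which is the kind of abrupt, non-intuitive dependence on parameters described above.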

      The authors argue for their model based on the idea that more biophysical models are difficult to tune, yet they compare their model to a biophysical one that they were able to tune to achieve the various patterns that they study. They do not give any indication of how easy or hard it was to tune their own model [...] The biophysical model seems to have 22 parameters, whereas the simplified one has 21 in Table 2, which is essentially the same number.

      To clarify, we did not tune the biophysical model, but rather copied its parameters from the cited work. We will make this more explicit in the relevant Methods section.

      We could not simply specify or tune these parameters because they have complex biological priors that must be derived from experimental data: for example, the membrane capacitance (20 pF), ionic conductance and reversal potentials (4.5 nS, -62.5 mV), and many gating kinetics parameters (slopes, midpoints, time constants for sigmoid/bell curves).

      It is often the case that such parameters must be estimated in specific preparations then reused and refined over many years. For instance, the biophysical model we compare to borrowed parameters from (Kim et al. 2022), which retuned time constants relative to (Danner et al. 2017), which altered NaP conductance from (Danner et al. 2016), which retuned duty cycles from (Molkov et al. 2015), which adapted from respiratory networks of (Rubin et al. 2008), which used gating kinetics parameters from (Butera et al. 1999). Similarly, the crustacean pyloric circuit model we compare to is from (Alonso and Marder 2020), which augmented the circuit and parameters of (Prinz et al. 2004), which sampled from a database of procedurally generated parameters from (Prinz et al. 2003), which developed parameter priors from the lobster STG experimental work of (Turrigiano et al. 1995). These brief descriptions of the multi-decade lineage of parameter sets omit the substantial parallel and preceding work related to their development, but they suffice to demonstrate the incredible science and effort that goes into building biophysical models for particular circuits. Such data is often unavailable and such detail is often undesirable for different research goals, in which case our simplified model is a valuable and practical tool.

      The key parameters of our simplified model are observable quantities like active/quiet durations (in seconds), input-dependent duration scaling (as a fraction of intrinsic durations), input strength that induces tonic firing, etc. As such, tuning the bursting neuron parameters for circuit models was easy, with manual tuning from scratch taking less than 1 day. As Table 3 shows, the resulting parameters are often simple, elegant numbers and can be derived directly from observations. For instance, the pyloric PD active and quiet durations (200 ms and 800 ms, respectively) are set using the exact target values that (Alonso and Marder 2020) encode in their objective for a genetic algorithm to tune their model’s biophysical parameters (or rather, a subset of them for tractability).

      Thus, the 22-vs-21 comparison is not very informative, because the parameters are not comparable in kind. However, to make it easier to tune our model, we will revise the manuscript to include: (a) more details describing how parameters map onto observable features of the bursting waveform, (b) recommended parameter ranges and the qualitative behaviors expected at their boundaries, and (c) practical guidance for tuning the model to match observations or embed it into circuits.

      (3) Originality

      What the authors fail to acknowledge is that Rinzel, Terman, Kopell, and others did seminal work on neuronal activity [...] The authors do not cite the substantial existing work on piecewise linear models [...] I don't see any advantage of the proposed framework over the earlier relaxation oscillator setting, where many important mechanistic principles have already been analyzed, including extensions to networks.

      We thank the reviewer for these pointers and apologize for the gap in our literature coverage. While we had cited McKean, FitzHugh-Nagumo, Izhikevich, et al. as representative examples of different model classes, we agree that the broader relaxation-oscillator and piecewise-linear traditions deserve more comprehensive treatment including Rinzel, Terman, Kopell, et al. on relaxation-oscillators; and Hahnloser, Coombes, Aguirre, et al. on piecewise-linear models. We will expand the related work discussion and clarify how our contribution is novel and valuable.

      To be clear, we do not claim to be the first to use piecewise-linear models for neurons. Our intended contribution is twofold: the specific construction (a rectangular limit cycle whose horizontal/vertical decoupling permits a closed-form mapping from interpretable parameters to burst features), and the demonstration that this construction integrates cleanly into firing-rate circuit models of biological CPGs, which we believe will provide realism for more complex models with learned components.

      Moreover, in contrast to many other relaxation-oscillator models, including the elegant FitzHugh-Nagumo-style model we discussed above, our model is not aimed at establishing mechanistic principles or being simple enough to analyze formally. It is a practical tool that affords precise control of many bursting characteristics, which is important for closer alignment between firing-rate circuit models and biological activity. We will state this contribution more precisely in the revision so it is not conflated with a broader novelty claim.

      Reviewer #3 (Public review):

      (1) Novelty of piecewise-linear approximation

      The use of piecewise linear approximations to explicitly estimate properties of biophysical neurons is a well-known and common technique. This study adds nothing to the technique in terms of novelty.

      We agree that piecewise-linear approximations of neurons are not themselves novel, and we have not intended to claim otherwise: We cite the McKean model as a direct predecessor and, prompted by Reviewer #2, we will substantially expand citations to the relaxation-oscillator and piecewise-linear traditions (Rinzel, Terman, Kopell, Hahnloser, Coombes, Aguirre, et al.). Our intended contribution is not the use of piecewise-linear pieces per se but the specific construction: a rectangular limit cycle whose horizontal/vertical decoupling permits a closed-form, interpretable mapping from burst features (duration, duty cycle, amplitude, shape, input dependence) to dynamics, and clean integration into firing-rate circuit models of biological CPGs. We will revise the relevant passages so this contribution and the boundaries of our novelty claim are stated precisely.

      (2) Dynamical system mechanism

      This is no better than having a look-up table [...] The neuron is restricted to what the user puts in, and therefore, calling it a dynamical system is entirely wrong.

      We would like to take the opportunity to clarify this point, because the model's behavior is much richer than the lookup-table characterization suggests. The model is closed-loop: trajectories evolve through coupled state variables whose response to time-varying input depends on current state, not on a precomputed table of input-to-output values.

      Specifically:

      (a) The input represents the net time-varying synaptic drive, not a clamped voltage level;

      (b) The adaptation and voltage variables evolve according to coupled differential equations both on and off the limit cycle;

      (c) The duration and scale parameters only constrain active/quiet durations at input endpoints (-1, 0, +1), while the response at intermediate inputs is determined by the dynamics and other parameters such as the adaptation time constant, which can qualitatively reshape the constant-input response curve (Figure 3—figure supplement 3);

      (d) The response to a transient input depends on the current state: for example, excitatory pulses early in the active phase have little effect, as in the biophysical model.

      This is a direct result of the simplified model using a similar limit cycle and nullcline structure as the biophysical model’s dynamical system (Figure 1).

      (3) PRC usage

      The phase resetting curves are used incorrectly. PRCs are useful when the perturbation is weak (soft) [...] A hard PRC would always reset the cycle to the fixed offset from the perturbation phase and is therefore uninformative in understanding dynamics.

      We appreciate this point and would like to clarify what we show and why. We present finite (non-infinitesimal) PRCs across a range of input strengths and signs, spanning both the "soft" (weak-perturbation) regime as well as the "hard" (strong-perturbation) regime, rather than focusing on the "hard" regime alone. Importantly, even in the strong-perturbation regime we do not see that pulses "always reset the cycle to the fixed offset from the perturbation phase". In Figure 4, we see that the active phase exhibits a non-resetting region whose size and location depend on parameters. This region governs entrainability and phase-locking offset, and is thus a key aspect of the neuron's dynamics. Moreover, the strong-perturbation regime is also biologically relevant in our circuit examples. For instance, the inhibitory connections within the pyloric CPG are strong enough to cause hard resets, and these resets shape the circuit-level dynamics we reproduce. We will revise the pulse-input section to state these points more explicitly so the rationale is clear for showing PRCs across a range of inputs.
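      The distinction between the weak-perturbation and strong-perturbation regimes can be sketched with a deliberately simple stand-in. The code below does not implement the model from the manuscript; it assumes a leaky integrate-and-fire unit (dv/dt = drive - v, threshold 1, reset 0), chosen only because its finite PRC has a closed form. The function names and the default `drive` value are hypothetical.

```python
import math


def lif_period(drive):
    """Unperturbed period of dv/dt = drive - v with threshold 1, reset 0."""
    if drive <= 1.0:
        raise ValueError("drive must exceed the threshold (1.0) to oscillate")
    return math.log(drive / (drive - 1.0))


def finite_prc(phase, pulse, drive=1.5):
    """Phase advance (fraction of a cycle) caused by an instantaneous
    voltage kick of size `pulse` delivered at `phase` in [0, 1)."""
    period = lif_period(drive)
    t_kick = phase * period
    v_before = drive * (1.0 - math.exp(-t_kick))  # voltage just before the kick
    v_after = v_before + pulse
    if v_after >= 1.0:
        t_spike = t_kick  # strong kick: threshold crossed immediately
    else:
        # time still needed to climb from v_after to threshold
        t_spike = t_kick + math.log((drive - v_after) / (drive - 1.0))
    return (period - t_spike) / period


# Sweep a weak excitatory, a weak inhibitory, and a strong excitatory pulse.
for pulse in (0.05, -0.05, 1.0):
    curve = [round(finite_prc(k / 10, pulse), 3) for k in range(10)]
    print(pulse, curve)
```

      In this toy unit, weak pulses give graded, phase-dependent advances or delays, whereas a sufficiently strong excitatory pulse triggers an immediate spike at every phase. The non-resetting region of the active phase described above is a property of the authors' model that this stand-in does not reproduce.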

      (4) Defining active/quiet phases

      The definition of the active and quiet parts of a burst is often less clear than what the authors suggest. Bursting neurons often do multiple bursts in a cycle, and therefore, substituting the burst envelope is a subjective matter. This is even more problematic in bursting neurons in the brain, where there is often no quiet period.

      We agree that waveform envelope can be subjective in some preparations, and we can add this caveat to the discussion.

      On neurons with no quiet period, we note that this behavior is in fact already supported in our model, as seen in Figure 3: under strong excitatory input, both the biophysical and simplified models enter a regime in which firing rate never reaches zero. As the model can generally be viewed as an abstract limit cycle that maps onto periodic waveforms through the firing function, the quiet phase need not correspond to literal silence.

      On more complex waveforms, we could imagine different firing functions that produce richer burst shapes including multi-peak bursts, but we have not tried this explicitly. Of course, for research questions concerned with irregular bursting or spike-to-burst transitions, a lower-level biophysical model would be more appropriate. In revision, we will expand on how the firing function could produce more complex burst shapes.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We sincerely thank the Reviewers for their careful reading and insightful critiques, which have helped make the manuscript clearer and more impactful.

      In response to the Reviewers, we substantially revised the manuscript to improve clarity, framing, and accessibility for readers outside the Drosophila connectomics community, while keeping the core conclusions unchanged. We clarified the study’s scope (defining parallel circuit architecture rather than testing sufficiency for reconstructing grooming sequence order), restructured the last Introduction paragraph, several Results sections, and the Discussion to foreground the main findings and their relevance to the parallel hierarchical-suppression model. We also added key methodological clarifications for non-specialist readers, including how BMN classes were identified in FAFB by a correlative approach (with type-level, not single-bristle, resolution), how FlyWire/Codex synapse counts are defined (contacts vs T-bars), how sensory BMNs can have postsynaptic sites, and what is meant by ascending vs descending neurons in a brain-only dataset. Across the Results, we improved terminology and definitions (e.g., projection zones, hemilineage 23b, BMN nomenclature such as BM-InOm), clarified what derives from prior work (Eichler et al., 2024) versus new analyses, strengthened interpretation of BMN→motor connections as likely modulatory, and expanded explanation of postsynaptic partner categories. We also revised figures and legends to better highlight overlap/segregation and somatotopy, moved the cosine-similarity matrices into the main figures (new Figure 9), added a new graphical summary figure (new Figure 15), and explicitly acknowledged key limitations, including one-hemisphere analysis and lack of VNC coverage in FAFB.

      In addition, in response to the suggestion of a rank-order test relating BMN→second-order wiring to the grooming hierarchy, we clarified throughout the revised manuscript that this study does not aim to test whether connectivity alone is sufficient to reconstruct grooming sequence order, and we removed wording that could imply such a claim. As detailed in our response to that specific critique below, sequence sufficiency is outside the scope of this study, and a simple linear ordering based on aggregate synapse weights is not straightforward to interpret in this system (e.g., BM-Taste vs. BM-InOm output strength does not track grooming order, BMNs likely contribute to multiple behaviors, and head grooming order is not resolved at sufficient granularity). We therefore respectfully request that the sentence in the eLife Assessment suggesting that the paper is weakened by not including this analysis be removed. As currently written, it frames an out-of-scope analysis as a missing test of the manuscript’s main claims and may mislead readers about the paper’s intended contribution: a synaptic-resolution anatomical definition of parallel BMN circuit architecture and motifs consistent with hierarchical suppression.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Calle-Schuler et al. reconstruct all the pre- and post-synaptic neurons to the bristle mechanosensory neurons on the adult fly head to understand how neural circuits determine the sequential motor patterns during fly grooming. They find that most presynaptic neurons, interneurons, and excitatory postsynaptic neurons are also somatotopically organized, such that each neuron is more connected to bristle mechanosensory neurons that are closer on the head and less connected to bristle mechanosensory neurons that are further away. These include the direct BMN-BMN circuits, excitatory interneurons, as well as the inhibitory networks. They also identify that the entire hemi-lineage 23b forms excitatory postsynaptic circuits with BMNs, highlighting how these circuits and hence their function could be developmentally determined.

      Strengths:

      This is a complete map of all the neurons that make 5 or more pre- and post-synaptic connections of the fly head BMNs. Using this, the authors have identified various trends, such as ascending neurons providing most of the GABAergic inhibitory input, which could provide the presynaptic inhibition essential for the parallel model for sequential grooming generation. Moreover, they identified that the entire cholinergic hemilineage 23b is postsynaptic to BMNs.

      Weaknesses:

      Although the somatotopic organization is an elegant mechanism to generate sequential motor sequences during grooming, none of the analyses in the paper directly demonstrate that this somatotopic connectivity is sufficient to generate hierarchical suppression and reconstruct the grooming sequence. If somatotopic organization is sufficient, then hierarchical clustering should recover the grooming sequence. Their detailed connectome enables the authors to test if some networks are more crucial for grooming sequence than others: to what extent can each network individually (ascending neurons-BMN alone) or a combination (BMN-BMN, ascending-BMN, BMN-descending, etc.) recover the sequence observed during grooming. If all the pre- and post-synaptic neurons put together cannot explain the sequence, then the sequence is probably determined by individual synaptic strengths or other key downstream neurons.

      We appreciate the Reviewer’s interest in how BMN connectivity relates to the grooming sequence, and agree that understanding how mechanosensory circuits contribute to hierarchical action selection is an important direction. In this study, however, our goal was not to test whether connectivity alone is sufficient to reconstruct the full grooming sequence. Rather, we focused on defining the parallel circuit architecture underlying individual grooming movements and on identifying anatomical features—most notably extensive presynaptic inhibition—that are consistent with previously proposed models of hierarchical suppression.

      We recognize that aspects of the Introduction and the references cited there to prior work on the grooming sequence may have led some readers to expect a direct sequence-prediction analysis. To address this, we revised the Introduction and Results to clarify the scope of the study and adjusted language to avoid implying that we aimed to derive the grooming order from connectivity. Consistent with this framing, the Abstract mentions the sequence only in the context of presynaptic inhibition, which provides anatomical support for existing models of hierarchical suppression. We therefore do not draw conclusions about the ordering of grooming movements from the connectome itself. Details of the specific manuscript revisions are provided below in the Recommendations for authors section.

      The Reviewer suggests testing whether somatotopic organization is sufficient to recover the grooming sequence by clustering BMN connectivity or by examining whether specific subnetworks (e.g., BMN → ascending, BMN → descending, or BMN→BMN pathways) reproduce the sequence. We carefully considered these possibilities. However, several factors currently limit the interpretability of such analyses.

      First, synaptic weight alone does not align with known features of the grooming sequence. For example, BM-Taste neurons contribute the majority of BMN synaptic output, yet proboscis grooming is not the first head grooming movement, whereas BM-InOm neurons contribute less than 9% of total output despite eye grooming occurring first. As we now clarify in the Results, global synapse number therefore does not predict the order of grooming movements.

      Second, BMNs likely distribute signals across multiple behavioral pathways beyond grooming, including circuits involved in feeding and escape behaviors. Because the connectome aggregates all postsynaptic targets, analyses based solely on connectivity strength cannot isolate the subset of circuits specifically responsible for grooming-related action selection.

      Third, the head grooming sequence itself has not been resolved at the spatial granularity required for such analyses across head regions. While eye grooming is well characterized as the first head movement, the relative ordering among antennae, proboscis, and other head bristle regions remains less clearly defined, making it difficult to evaluate correspondence between connectivity-derived rankings and behavioral order.

      Because of these limitations, we concluded that clustering or network-based analyses aimed at reconstructing the grooming sequence from connectivity alone would be difficult to interpret and therefore chose not to include them. Accordingly, we have deliberately avoided claiming that the connectome is sufficient to generate the grooming sequence. Instead, we interpret the somatotopic architecture and inhibitory circuitry described here as anatomical features consistent with previously proposed models of hierarchical suppression, while leaving the question of sufficiency for future studies that integrate connectomics with functional and behavioral analyses.

      Given that we do not claim sufficiency of the connectome for producing the grooming sequence, we respectfully request that the eLife Assessment avoid framing the manuscript around this expectation, as wording that implies the manuscript should reconstruct the sequence from connectivity could misrepresent the intended scope of the study and potentially mislead readers about its primary contributions.

      Reviewer #2 (Public review):

      Summary:

      Schuler et al. present an extensive analysis of the synaptic connectivity of mechanosensory head bristles in the brain of Drosophila melanogaster. Based on the previously described set of bristle afferent neurons (BMNs) located on the head, the study aims to provide a complete, quantitative assessment of all synaptic partners in the ventral brain. Activation of head bristles induces grooming behavior, which is hierarchically organized, and hypothesized to be grounded in a parallel cellular architecture in the central brain. The authors found evidence that, at the synaptic level, neurons downstream of the BMN afferents, namely the postsynaptic LB23 interneurons and recurrent GABAergic neurons (involved in sensory gain control), are organized in parallel, following the somatotopic organization described for the BMN afferents. This study, therefore, represents an important step towards a better understanding of the cellular circuits that govern the hierarchical order of sequentially organized grooming behavior in Drosophila melanogaster.

      The study is well done, the images are well designed and extensive in number, but the account is challenging to read and digest for the reader outside the Drosophila/connectome community. It is amazing what can be done with the connectome nowadays using the up-to-date FAFB dataset, the analytical and visual tools (as in FlyWire), in combination with known anatomy/physiology/behavior in DM. I suggest that the authors provide more detail on hemilineages, their relationship to the FAFB connectome, the predicted neurotransmitter identity, and the statistical CatMAID tools used in some of the Figures.

      A graphical summary at the end of the study would be very useful to highlight the important findings focusing on neuron populations identified in this study and their position in the hypothesized parallel central circuitry of BMNs.

      We thank the Reviewer for the thoughtful and constructive comments. In response, we substantially revised the manuscript to improve clarity and accessibility, particularly for readers outside the Drosophila connectomics community. We rewrote portions of the Introduction, Results, and Discussion to better foreground the main findings, reduce density, and more clearly distinguish prior work from the new analyses presented here. We also added methodological clarification throughout, including how BMN classes were identified in the FAFB dataset using a correlative, type-level approach, how FlyWire/Codex synapse counts are defined, and clarified terminology related to projection zones, pre- versus postsynaptic structure, and partner classes. To address the Reviewer’s request for more developmental context, we added a more explicit definition of hemilineages at first mention in the Abstract and Results. In addition, we revised figures and legends to make the somatotopic and parallel organization of the circuitry easier to interpret, including moving the cosine-similarity matrices into the main figures. Finally, in direct response to the Reviewer’s suggestion for a higher-level synthesis, we added a new graphical summary figure (Figure 15) at the end of the manuscript to highlight the principal neuron populations identified in the study and their proposed positions within the parallel central BMN circuitry. Together, we believe these revisions have made the manuscript clearer, more accessible, and better framed for a broad readership while preserving its core conclusions. Details of these changes are provided in the Recommendations for the authors section.

      Reviewer #3 (Public review):

      Summary:

      The authors set out to extend their previous mapping of Drosophila head mechanosensory neurons (Eichler et al., 2024) by reconstructing their full second-order connectome. Their aim is to reveal how bristle mechanosensory neurons (BMNs) interface with excitatory and inhibitory partners to generate location-specific grooming movements, and to identify the circuit motifs and developmental lineages that support this transformation.

      Strengths:

      The strengths of this work are clear. The authors present a comprehensive synaptic-resolution connectome for BMNs, identifying nearly all of their pre- and postsynaptic partners. This dataset reveals important circuit motifs:

      (1) BMNs provide feedforward excitation to descending neurons, feedforward inhibition to interneurons, and are themselves strongly regulated by GABAergic presynaptic inhibition.

      (2) These motifs together support the idea that BMN activity is locally gated and hierarchically suppressed, fitting well with known behavioural sequences of grooming.

      (3) The study also shows that connectivity preserves somatotopy, such that BMNs from neighbouring bristle populations converge onto shared partners, while distant BMNs remain segregated.

      (4) A developmental analysis reveals both primary and secondary partners, suggesting a layered scaffold plus adult-specific elaborations.

      (5) Finally, the identification of hemilineage 23b (LB23) as a core postsynaptic pathway - incorporating previously described antennal grooming neurons (aBN2) - provides a striking link between developmental lineage, anatomical connectivity, and behavioral output.

      (6) Together, the dataset represents a valuable resource for the neuroscience community and a foundation for future functional studies.

      Weaknesses:

      There are also some weaknesses that mostly only limit clarity.

      (1) The writing is dense, with results often presented in a cryptic fashion and the functional implications deferred to the discussion. As a result, the significance of circuit motifs such as BMN→motor or reciprocal inhibitory loops is sometimes buried, rather than highlighted when first described.

      We thank the Reviewer for this helpful suggestion. In response, we revised several sections of the Results to improve clarity and more clearly highlight the functional significance of key circuit motifs when they are first introduced. Specifically, we streamlined dense passages and added brief explanatory statements linking motifs such as reciprocal inhibitory loops to their potential roles in the proposed parallel circuit architecture. Additional details of these revisions are provided in the Recommendations for the authors section below.

      (2) Some assumptions require more explanation for non-specialist readers - for example, how bristle identity is inferred in EM in the absence of cuticular structures, or what is meant by "ascending" and "descending" in a dataset that does not include the ventral nerve cord. While some of this comes from the earlier paper, it would help readers of this one to explain this.

      In response, we added clarifying text describing how BMN types were identified in the FAFB dataset using a correlative approach based on stereotyped projection morphologies and prior light-level anatomical data, and we explicitly state the limits of this type-level assignment in the absence of cuticular bristles in the EM volume. We also expanded the explanation of partner categories, including what is meant by “ascending” and “descending” neurons in a brain-only dataset. Additional details of these revisions are provided in the Recommendations for the authors section.

      (3) Visualization choices also sometimes obscure key conclusions: network graphs can be visually appealing but do not clearly convey somatotopy or BMN-type differences; heatmaps or region-level matrices would make the parallel, block-like organization of the circuit more evident.

      We incorporated connectivity matrices (cosine-similarity heatmaps) into the main figures to more clearly illustrate the somatotopic and parallel organization of BMN connectivity, complementing the network graph visualizations (new Figure 9). These matrices make the block-like structure of BMN partner relationships more apparent and help highlight differences among BMN types; additional details are provided in the Recommendations for the authors section.
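      For readers unfamiliar with the measure, the following minimal sketch shows how a cosine-similarity matrix over connectivity profiles can be computed. It is an illustration only, not the authors' analysis pipeline: the type labels and synapse counts are invented toy values, and the function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length connectivity vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention: a type with no synapses is similar to nothing
    return dot / (norm_u * norm_v)


def similarity_matrix(profiles):
    """Pairwise cosine similarity between every pair of named profiles."""
    names = list(profiles)
    return {a: {b: cosine_similarity(profiles[a], profiles[b]) for b in names}
            for a in names}


# Toy (invented) synapse counts onto four postsynaptic partners.
toy_counts = {
    "type_A": [10, 8, 0, 0],
    "type_B": [9, 7, 1, 0],
    "type_C": [0, 0, 12, 5],
}
sim = similarity_matrix(toy_counts)
for name, row in sim.items():
    print(name, {k: round(v, 2) for k, v in row.items()})
```

      In a matrix of this kind, types that share partners (here type_A and type_B) score close to 1 and types with disjoint partners (type_A and type_C) score 0, which is what produces a block-like appearance when rows are ordered by body region.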

      (4) The data might also speak to roles beyond grooming (e.g., mechanosensory modulation of posture or feeding), and a brief acknowledgement of this would broaden the impact.

      We added text acknowledging that BMNs contribute to additional behaviors beyond grooming, such as feeding and other mechanosensory-guided actions. These roles are supported by prior studies of bristle function and are also consistent with the diverse downstream circuits revealed in the connectome. This clarification broadens the interpretation of the dataset while maintaining the primary focus of the study on grooming-related circuitry.

      (5) The restriction to one hemisphere should be explicitly acknowledged as a limitation when framing this as a 'comprehensive' connectome.

      We thank the Reviewer for this suggestion. We now explicitly acknowledge this limitation in both the Results and Discussion.

      In the Results section entitled “The BMN connectome” we added a sentence at the end of the paragraph that mentions the limitations. This sentence reads: “In addition, because our analysis was restricted to BMNs entering the left hemisphere, the complete right-side BMN connectome is not included, limiting assessment of bilateral symmetry, inter-hemispheric coordination, and variability across sides.”

      The last paragraph of the first Discussion section describes limitations to our ‘comprehensive’ connectome. The text in this paragraph pertaining to the left/right variability reads: “Second, the analysis focuses only on BMNs from the left hemisphere. Although contralateral neurons synapsing with left-side BMNs are included, the absence of the right-side BMN connectome limits assessment of bilateral symmetry, interhemispheric coordination, and side-to-side variability.”

      Overall, the authors achieve their main goal: they convincingly show that BMNs connect into parallel, somatotopically organized pathways, with LB23 providing a key lineage-based link from sensory input to grooming output. The dataset is carefully analyzed, and while the presentation could be streamlined, the connectome will be a valuable resource for researchers studying sensory processing, motor control, and the logic of circuit organization.

      Recommendations for the authors:

      Reviewing Editor Comments:

      We enjoyed this work and are enthusiastic about its contribution: the resource is valuable, and the anatomical evidence is solid. Most of our suggestions concern clarity and visualization, detailed below.

      In addition, the editors and reviewers felt one focused analysis would materially strengthen the paper: please use the BMN→second-order synapse weights to produce a similarity-based, one-dimensional order of BMN types and test its agreement with the known grooming sequence (e.g., via a rank correlation). A positive result would support sufficiency of the mapped wiring for the sequence; if not, the claims can be framed as "consistent with" rather than "sufficient for."
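
      To make the suggested agreement test concrete, the rank-correlation step could be sketched as below. This is only an illustration under stated assumptions: the region labels and both orderings are hypothetical placeholders, not results from the manuscript, and the step that derives a one-dimensional order from the synapse weights is assumed to have been done already.

```python
def spearman_rho(order_a, order_b):
    """Spearman rank correlation between two orderings of the same items.

    Uses the no-ties formula rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)),
    where d is the difference in rank position of each item.
    """
    if sorted(order_a) != sorted(order_b):
        raise ValueError("both orderings must contain the same items")
    n = len(order_a)
    if n < 2:
        raise ValueError("need at least two items to correlate")
    rank_in_b = {item: position for position, item in enumerate(order_b)}
    sum_d_squared = sum(
        (position - rank_in_b[item]) ** 2
        for position, item in enumerate(order_a)
    )
    return 1 - 6 * sum_d_squared / (n * (n * n - 1))


# Hypothetical placeholders: neither list reflects real data.
connectivity_order = ["region_1", "region_2", "region_3", "region_4"]
behavior_order = ["region_1", "region_3", "region_2", "region_4"]

rho = spearman_rho(connectivity_order, behavior_order)
```

      Because an order derived from similarity alone has no intrinsic direction, such a comparison would presumably consider the magnitude of the correlation rather than its sign.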

      We appreciate the Reviewers’ interest in how BMN connectivity relates to the grooming sequence, and agree that understanding how mechanosensory circuits contribute to hierarchical action selection is an important direction. In this study, however, our goal was not to test whether connectivity alone is sufficient to reconstruct the full grooming sequence. Rather, we focused on defining the parallel circuit architecture underlying individual grooming movements and on identifying anatomical features—most notably extensive presynaptic inhibition—that are consistent with previously proposed models of hierarchical suppression.

      We recognize that references in the Introduction to prior work on the grooming sequence may have led some readers to expect a direct sequence-prediction analysis. To address this, we revised the Introduction and Results to clarify scope and adjusted language to avoid implying that we aimed to derive the grooming order from connectivity. Consistent with this framing, the Abstract mentions the sequence only in the context of presynaptic inhibition, which provides anatomical support for existing models of hierarchical suppression. We do not draw conclusions about the ordering of grooming movements from the connectome itself.

      The Reviewer-suggested analysis—using BMN-to-partner synaptic weights to derive a linear ordering of BMN types—is conceptually reasonable, but its interpretability is limited at present. First, synaptic weight alone does not align with known features of the grooming sequence: BM-Taste neurons contribute the majority of BMN synaptic output, yet proboscis grooming is not the first head movement, whereas BM-InOm neurons contribute less than 9% of output despite eye grooming occurring first. Second, BMNs likely project to multiple pathways supporting distinct behaviors, such as feeding and escape, complicating any attempt to infer a single grooming hierarchy from aggregate connectivity. Third, the head grooming sequence itself has not been resolved at the granularity required for such an analysis, particularly among the antennae, proboscis, and other head bristle regions. Accordingly, we have deliberately refrained from making claims that connectivity is sufficient to generate the grooming order.

      Given that we do not claim sufficiency of the connectome for producing the grooming sequence, we respectfully request that this point be removed from the public eLife Assessment, as its current wording implies an unmet expectation outside the intended scope of the study and could mislead readers about the manuscript’s primary contributions. We appreciate the opportunity to clarify our framing and to ensure that the goals and outcomes of the work are accurately represented.

      Revisions.

      (1) Gave the last paragraph of the Introduction more structure to clearly state the main findings of the study in the context of what we learned about the circuit architecture proposed by the parallel model of hierarchical suppression.

      New paragraph: “Here, we define the synaptic connectivity of head BMNs by mapping nearly all of their pre- and postsynaptic partners—including other BMNs, ascending and descending neurons, interneurons, and motor neurons—within the FAFB dataset. Consistent with a parallel model, we find that both presynaptic and postsynaptic partners are somatotopically organized, preserving the spatial layout of the bristle map and revealing a set of parallel mechanosensory pathways that correspond to distinct head regions. Within the postsynaptic population, we identify the developmentally-related cholinergic hemilineage 23b (LB23), whose members exhibit region-specific BMN connectivity and include neurons previously shown to elicit aimed head grooming movements when activated. This demonstrates how LB23 neurons participate in parallel postsynaptic pathways that may drive discrete components of head grooming. On the input side, BMNs receive substantial presynaptic inhibition from predominantly GABAergic partners, providing strong feedback and feedforward control over mechanosensory signaling. This inhibitory architecture is consistent with hierarchical-suppression models in which inhibition regulates sensory gain and prioritizes competing actions in the grooming sequence. Together, this mechanosensory connectome reveals core organizational principles—parallel somatotopic architecture, region-specific excitatory pathways, and strong inhibitory regulation—that are thought to constitute foundational circuit motifs supporting head grooming.”

      (2) In the Results section entitled “BMN synapses show large quantitative variation across types”, we added text to the third paragraph that makes it clear that raw synapse numbers alone do not predict the sequence, if one just compares the first movement (eye grooming) and a later movement in the sequence (proboscis grooming).

      That text reads: “Notably, if grooming order were driven simply by relative sensory drive—i.e., by BMN types with the strongest synaptic output eliciting cleaning of their corresponding locations first—then synapse number should track the grooming sequence. Instead, differences in synapse number do not align with the order of the grooming sequence: BM-Taste neurons account for the majority of BMN output, yet proboscis grooming is not the first head grooming movement performed, whereas BM-InOm neurons contribute only a small fraction of output despite eye grooming occurring first (Figure 1E, Figure 2A,B). This indicates that global synapse number alone is not a reliable predictor of the grooming sequence.”

      (3) In the Results section entitled “BMN postsynaptic partners are excitatory and inhibitory”, we added text to two sentences to better link the results to the predictions of the parallel model of hierarchical suppression that we are testing.

      Modified sentence 1: “This excitation is hypothesized in the parallel model to help form BMN feedforward circuits that elicit aimed grooming of specific body locations, while feedforward inhibition could mediate suppression of competing grooming movements (Figure 1 – figure supplement 1A, B).”

      Modified sentence 2: “Taken together, the BMN postsynaptic partners include a diverse set of neurons that mediate both feedforward excitation and inhibition and feedback inhibition, features predicted by the parallel model.”

      (4) In the Results section entitled “BMNs and LB23 neurons form somatotopic pathways that elicit aimed grooming”, we added text to the first sentence that better ties the section to the overall goals of the manuscript.

      That text now reads: “In accordance with the parallel model of grooming, we hypothesize that BMNs connect with somatotopically organized excitatory parallel pathways eliciting aimed grooming of specific head locations (Figure 1 – figure supplement 1A, C).”

      Reviewer #1 (Recommendations for the authors):

      (1) The connectivity matrix (like that in Lesser et al., 2024, Nature, and also in Figure 9, Figure Supplement 1 of this paper) is an easier-to-digest representation of the various connections shown in Figure 2.

      We agree that connectivity matrices provide a clearer and more accessible representation of these data. Based on the context of this and other comments, we understand the Reviewer to be referring to Figure 9 rather than Figure 2. In response, we have moved the cosine-similarity connectivity matrices previously shown in Figure 9 – figure supplement 1 into the main manuscript, where they now appear as Figure 9.

      These matrices depict similarity among BMN postsynaptic partners. At present, we are unable to generate equivalent matrices for presynaptic partners due to recent personnel constraints in the lab. For this reason, we have retained the original network-graph representation (now Figure 10) to display the full pre- and postsynaptic connectome structure.
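
      For readers unfamiliar with the measure, the cosine-similarity computation behind such matrices can be sketched as follows. This is a minimal illustration and not the analysis code used in the study: it assumes each BMN type is represented by a vector of synapse counts onto a shared list of postsynaptic partners, and the type names and counts are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two synapse-count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a type with no synapses is treated as dissimilar to all
    return dot / (norm_u * norm_v)


# Hypothetical toy data: synapse counts from three BMN types onto the
# same four postsynaptic partners (made-up numbers for illustration).
toy_outputs = {
    "type_A": [10, 0, 5, 0],
    "type_B": [20, 0, 10, 0],  # same partner profile as type_A, twice the counts
    "type_C": [0, 8, 0, 4],    # no partners shared with type_A
}

names = sorted(toy_outputs)
similarity = {
    (a, b): cosine_similarity(toy_outputs[a], toy_outputs[b])
    for a in names
    for b in names
}
```

      In this toy case, type_A and type_B score close to 1 because the measure depends on the relative distribution of synapses across partners rather than on total synapse number, while type_A and type_C score 0 because they share no partners.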

      We hope this compromise addresses the Reviewer’s request while clearly presenting the available analyses.

      (2) Again, "Cosine based clustering is essential to demonstrate the somatotopic organization": the data in Figure 9 - Figure Supplement 1 demonstrate this better than the main Figure 9. This supplementary figure would be a great addition to the main manuscript.

      Please see the preceding response for details on the changes that we made to address this reviewer comment.

      (3) Figure 9 - Figure Supplement 1A: Can the authors explain why the InOm occur in two clusters (red in top and bottom)? Do InOm neurons show two different kinds of connectivity patterns?

      This is a great question! We had written a possible explanation for this in the Discussion section entitled “A synaptic resolution connectome of a head somatotopic map”.

      “One notable exception to this pattern is the BM-InOm population, which occupies a central position in network diagrams and exhibits broad connectivity similarity with BMNs from across the head (Figure 9A, Figure 10A-E). This likely reflects the large surface area of the compound eyes, which span dorsal, ventral, and posterior regions and neighbor multiple bristle populations. Consistent with previous work showing morphological diversity among BM-InOm neurons (Eichler et al., 2024), our output connectivity analysis suggests the presence of multiple BM-InOm subtypes defined by distinct partner profiles (Figure 9A). Future work will be needed to determine how this heterogeneity relates to spatial organization within the eye.”

      Reviewer #2 (Recommendations for the authors):

      All further comments for the authors are aimed at a better understanding of the text and for clarity. The manuscript needs revision.

      (1) Ventral brain:

      Please specify this term. Is it the SEG, or the gnathal ganglion? Throughout the paper, 'ventral brain' and 'brain' are the only anatomical terms you use. Are all pre-/post- partners of BMNs located in this region? I understand that you provide a statistical analysis on a network level here, but as far as I know, the neuropil regions in Drosophila are reported in more detail on the macroscopic level (see, e.g., Itoh).

      Based on our understanding of the Ito et al. (2014) reference, the term SEG was “retired” in that manuscript in favor of gnathal ganglia. We considered using the term subesophageal zone (SEZ) in the manuscript, but ultimately chose not to adopt it. In the Drosophila brain nomenclature (Ito et al., 2014), the SEZ is defined as a region below the esophagus that encompasses multiple neuropils, such as the gnathal ganglia (GNG) and saddle (SAD), rather than a single anatomically discrete structure.

      In our dataset, the GNG are the ventral-most neuropil containing the BMN projections and the highest density of BMN-related synapses, and we therefore refer to this structure explicitly where appropriate. However, BMN pre- and postsynaptic partners are not confined to the GNG or to the SEZ as a whole; some partner neurites extend dorsally into additional neuropils. As a result, the term SEZ does not accurately capture the full spatial extent of the BMN connectome analyzed here.

      For clarity and consistency across analyses that span multiple adjacent neuropils, we therefore use the broader functional descriptor “ventral brain”, while explicitly identifying the gnathal ganglia and other neuropils when discussing neuropil-level synapse distributions. We believe this approach most accurately reflects both the anatomical organization of the circuit and the scope of our analysis.

      Given this Reviewer’s comment, we anticipate that not mentioning the SEZ in this manuscript might result in similar confusion among readers of our manuscript. Therefore, we now mention the SEZ and the supraesophageal zone (SPZ) at the end of the Results section entitled “Synapses of BMN partners are mostly concentrated in the ventral brain”. We also added the SEZ and the SPZ to the new last summary figure (Figure 15) to help clarify the locations of the BMNs and their second order connectome.

      That text reads: “Thus, while most neuropils containing synapses of second-order BMN partners are located below the esophagus (in the subesophageal zone, SEZ), we found more limited involvement of neuropils in the supraesophageal zone (SPZ; above the esophagus), suggesting relatively limited direct top-down control.”

      (2) Please provide greater clarity in your use of the terms synapse-presynapse-pre- and postsynaptic partners:

      In insects, synapses are polyads. It is therefore essential to distinguish whether by presynaptic (pre) you mean 1. the number of T-bars (presynaptic sites) or 2. the number of (outgoing) synaptic contacts made by a single presynaptic T-bar site. For example, a synapse configured as a tetrad (a polyad) consists of one presynaptic T-bar opposed to four postsynaptic profiles and can be counted either as one synapse (one presynaptic site, one T-bar, in CATMAID: a presynaptic connector) OR as four (outgoing) synaptic connections since the single T-bar connects to four different postsynaptic profiles. This distinction is crucial for quantifying synaptic networks in insects. Thus, the "number of synapses" may refer to 1. The number of presynaptic sites = number of T-bars = number of polyads formed by a particular neuron. 2. the number of actually outgoing synaptic contacts, a number that also reflects the degree of polyadicity. 3. number of postsynaptic sites (that is easy).

      This distinction (regarding the counts of presynapses) was reported in previous connectome studies (e.g., Horne, 2018; Gruber, 2025; Schlegel, 2023). Schlegel notes: 'Insect synapses are polyadic, i.e., each presynaptic site can be associated with multiple postsynaptic sites. In contrast to the Janelia hemibrain dataset, the synapse predictions used in FlyWire do not have a concept of a unitary presynaptic site associated with a T-bar. Therefore, presynapse counts used in this paper do not represent the number of presynaptic sites but rather the number of outgoing connections.' End of citation from Schlegel.

      We thank the Reviewer for highlighting this important distinction. We now clarify in the Materials and methods that synapse counts are based on Codex/FlyWire annotations, which report individual pre- and postsynaptic contacts rather than unitary presynaptic sites (T-bars), consistent with prior FlyWire-based connectome studies (e.g., Schlegel et al.). We also added a brief clarification in the Results indicating that pre- and postsynaptic numbers refer to incoming and outgoing contacts.

      We added a sentence to the first section of the Materials and methods entitled “Connectome data and neuron meshes”. This text reads: “Synapse counts throughout this study are based on FlyWire/Codex synapse annotations and represent the number of individual pre- to postsynaptic contacts (incoming or outgoing connections), rather than the number of presynaptic active sites (T-bars); thus, presynaptic counts reflect polyadic connectivity as described previously (Schlegel et al., 2023).”
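
      The difference between the two counting conventions can be illustrated with a small sketch. The data structure and partner names below are hypothetical and serve only to show why one tetrad counts as a single presynaptic site but as four outgoing connections; this is not how FlyWire/Codex stores synapses.

```python
# Each inner list is one presynaptic site (T-bar) and names the
# postsynaptic profiles apposed to it. Hypothetical toy data.
toy_tbars = [
    ["partner_1", "partner_2", "partner_3", "partner_4"],  # tetrad
    ["partner_1", "partner_5"],                            # dyad
    ["partner_2"],                                         # monad
]


def count_presynaptic_sites(tbars):
    """Convention 1: one count per T-bar, regardless of polyadicity."""
    return len(tbars)


def count_outgoing_connections(tbars):
    """Convention 2: one count per pre- to postsynaptic contact."""
    return sum(len(postsynaptic_profiles) for postsynaptic_profiles in tbars)


sites = count_presynaptic_sites(toy_tbars)
connections = count_outgoing_connections(toy_tbars)
mean_polyadicity = connections / sites
```

      The counts reported in this study correspond to the second convention, so they increase with the degree of polyadicity.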

      (3) In your study, a potential misunderstanding of this distinction arises when comparing statements on line 168 versus line 184:

      On line 168, you state: '... each BMN type having .... more postsynaptic than presynaptic sites'. However, on line 184 you state: 'There were significantly more postsynaptic than presynaptic partners, in agreement with the BMNs containing more presynaptic than postsynaptic structures.' These are contradictory: the statement on line 168 seems to refer to the number of presynaptic T-bars, while on line 184 you refer to the number of actual outgoing connections (which more accurately reflects the degree of polyadicity). Since BMNs are sensory afferents, they are indeed expected to have more outgoing synapses into the central brain.

      We thank the Reviewer for identifying this mistake. We have revised the sentence at former line 168 to now read: “In addition to differing in total synapse number, BMN types vary in their pre- versus postsynaptic composition: all BMNs contain both (Eichler et al., 2024), with presynaptic sites outnumbering postsynaptic sites by ~2× to ~9× across types (mean ≈5:1 output-to-input ratio, Figure 2 – figure supplement 1A, B, Supplementary file 2, Supplementary file 3).”

      (4) Identification of bristle sensory afferents in the brain:

      This is explained in more detail in the Eichler paper, but not here. I do not understand how you identified these neurons in the FAFB dataset. The number and distribution of bristles in the individual fly used for the FAFB EM dataset are not known, and because there is variability in the number of bristles in individual flies, the true number of bristle neurons for synaptic analysis can only be estimated. The correlative approach necessary to find the bristle sensory neurons in the FAFB dataset is still unclear to me. See also my comments on Figure 1.

      We thank the Reviewer for raising this point. We agree that our original draft did not clearly explain the correlative approach used to identify head BMNs in the FAFB dataset, and we have revised the manuscript to make this workflow explicit.

      In our prior work (Eichler et al., 2024), we quantified the number of bristles in each head bristle population and assessed the extent to which populations are invariant versus variable across individuals. This established an expected range for BMN counts by bristle population and clarified the level of variability that can be expected biologically.

      We then identified BMN types corresponding to specific bristle populations using different techniques, such as dye fills and light microscopy, which allowed us to define the characteristic projection morphologies and CNS entry routes associated with each population. These light-level anatomical signatures provided the basis for locating the corresponding axons in the FAFB EM volume and reconstructing the same neuron classes in EM. Importantly, because bristles themselves are not present in the EM volume, this approach supports type-level assignment (bristle population/BMN class) rather than single-bristle resolution, and we now state this explicitly to avoid overinterpretation.

      To ensure this is clear to readers who have not read Eichler et al., we have added explanatory text in the Results and expanded the Figure 1 legend describing: (i) how BMN types were identified and matched, (ii) what can and cannot be resolved given natural bristle-number variability, and (iii) how this impacts interpretation of “completeness” at the level of BMN types rather than individual bristles.

      In paragraph 1 of the first Results section, entitled “BMN synapses are somatotopically distributed in the ventral brain”, we added text that briefly describes the previous linkage of the head BMNs to the FAFB dataset. That text reads: “In prior work (Eichler et al., 2024), we showed that head bristle populations are innervated by specific BMN types whose axons project to distinct, spatially localized regions (projection zones) in the ventral brain (Figure 1C,D, left, Figure 1 – figure supplement 2A-E). This was determined using dye fills and light-microscopy-based tracing to identify BMN types innervating defined head bristle populations and to establish their characteristic brain projection morphologies. Bristle population counts and their variability across individuals provided expectations for BMN number per type. This quantitative constraint, combined with the highly stereotyped projection morphologies, provided a correlative anatomical framework to locate and reconstruct nearly all BMNs in the FAFB serial-section EM volume and map their projections into the CNS. Because FAFB does not include the head cuticular bristles, individual BMNs could not be linked to single bristles. Therefore, these assignments are necessarily correlative and provide type-level (population) rather than single-bristle resolution. Nevertheless, this level of resolution was sufficient to define somatotopically organized projection zones."

      (5) Results:

      (a) Line 102: explain hemilineage 23 B

      We added text in the manuscript to better define hemilineages.

      In the Abstract, we added to a sentence that highlights that the LB23 neurons are developmentally related. That sentence now reads: “We identified an excitatory cholinergic hemilineage (hemilineage 23b), a developmentally related group of neurons that elicits aimed head grooming and exhibit differential connectivity with BMNs from distinct head locations, revealing a lineage-based somatotopically organized parallel circuit architecture.”

      In the Results section entitled “The entire cholinergic hemilineage 23b (LB23) is postsynaptic to BMNs”, we added a sentence that defines hemilineage at its first mention. We also made slight modifications to the preceding and following sentences. That text reads: “To identify neurons crucial for establishing the BMN-postsynaptic parallel pathways that elicit head grooming movements, we focused on secondary hemilineages. In the Drosophila CNS, a hemilineage refers to the cohort of neurons derived from a single stem cell-like neuroblast that share a common developmental origin, stereotyped morphology, and are thought to have related functional roles within a circuit (Harris et al., 2015; Wreden et al., 2017). This focus was motivated by earlier findings that neurons whose activation elicited head grooming had morphologies consistent with specific hemilineages (Hampel et al., 2015; Seeds et al., 2014).”

      (b) Lines 151-171: It is not clear to me what a projection zone is.

      We thank the Reviewer for raising this point. We agree that the term “projection zone” benefits from a brief clarification. We have made minor edits at two locations to explicitly state that projection zones refer to spatially localized regions of BMN axonal arborization and synaptic distribution corresponding to specific head locations.

      Changes made in the manuscript:

      A sentence that first introduces the term in the fourth paragraph of the Introduction now reads: “Indeed, the BMN axon projections in the central nervous system (CNS) show a somatotopic arrangement, where distinct projection zones—spatially localized regions of axonal arborization and synaptic output—correspond to specific head and body locations (Eichler et al., 2024; Johnson and Murphey, 1985; Murphey et al., 1989; Newland, 1991; Newland et al., 2000; Tsubouchi et al., 2017).”

      In a sentence in the first paragraph of the first Results section, we added a brief clarifying definition of “projection zones” at their first mention in the Results. That sentence reads: In prior work (Eichler et al., 2024), we showed that head bristle populations are innervated by specific BMN types whose axons project to distinct, spatially localized regions (projection zones) in the ventral brain (Figure 1C,D, left, Figure 1 – figure supplement 2A-E).

      (c) Input-output versus presynapse-postsynapse?

      A revised sentence in the last sentence of the Results section makes this distinction clear: In addition to differing in total synapse number, BMN types vary in their pre- versus postsynaptic composition: all BMNs contain both (Eichler et al., 2024), with presynaptic sites outnumbering postsynaptic sites by ~2× to ~9× across types (mean ≈5:1 output-to-input ratio, Figure 2 – figure supplement 1A,B, Supplementary file 2, Supplementary file 3).

      (6) Figures:

      For clarity, it would be helpful if you indicated by the arrow the name of the sensory location (antenna, eye, etc.).

      We appreciate this suggestion. Major sensory locations corresponding to different head bristle populations are indicated in Figure 1 – figure supplement 1C. We explored adding these labels directly to Figure 1A, but found that doing so made the panel overly crowded and less clear. To improve visibility while keeping the main figure uncluttered, we now explicitly direct readers to this figure supplement in the Introduction.

      Specifically, we added a reference to Figure 1 – figure supplement 1C in the following sentence in the Introduction: Dust-induced head grooming is performed by the forelegs that start with the eyes and progress to other locations such as the proboscis and antennae (major head locations shown in Figure 1 – figure supplement 1C) (Seeds et al., 2014).

      (a) Figure 1:

      A: the presence of bristle types on the head. Are the JO afferents you mention in the text reported here?

      Figure 1 does not include the JONs, which were described in detail in our previous study (Hampel et al., 2020).

      The JONs are mentioned in the legend of Figure 1 – figure supplement 1. We have added text to this legend to indicate that the JONs are not the subject of this study. This text reads: “(C) Mechanosensory neurons from different head locations project to distinct, somatotopically organized zones in the ventral brain and elicit aimed grooming of those locations, including the antennae (via JONs [Johnston’s organ neurons; not analyzed in this study] and BMNs), eyes (BMNs), and proboscis (BMNs).”

      Are the reconstructions shown 1 B-D also from the Eichler paper?

      We regret that this was not explicitly stated in the figure legend, and have revised the legend to distinguish between what was previously published and what is new to this study.

      In the Figure 1 legend, we revised the following sentence: (C, D) Reconstructed BMN projections in the ventral brain (left, previously described in (Eichler et al., 2024)) and their corresponding pre- and postsynaptic sites (right, this study), colored by type according to the bristles that they innervate.

      To make this clearer in the main text, we have rewritten the first sentence in the first paragraph of the Results: In prior work (Eichler et al., 2024), we showed that head bristle populations are innervated by specific BMN types whose axons project to distinct, spatially localized regions (projection zones) in the ventral brain (Figure 1C,D, left, Figure 1 – figure supplement 2A-E).

      Are the dots symbolic, or do they represent the number of bristles? The number of bristles cannot be identified, and thus stems from the FAFB dataset.

      The dots are symbolic and do not represent the number of bristles in the FAFB dataset. As noted in response to a related reviewer comment above, the numbers and variability of head bristles were quantified in our prior work (Eichler et al., 2024). We also used dye fills and light-microscopy approaches, which provided the framework for linking BMN types to bristle populations. We have clarified this point in the revised manuscript, as described in the response above.

      Synapse number of bristle afferents: number of all pre-and postsynaptic contacts?

      We have addressed this point above.

      (b) Figure 2:

      Again, does the term synapses refer to all pre- and postsynaptic contacts?

      The Figure 2 legend indicates that synapse numbers include both input and output synapses. Additionally, the first reference to Figure 2 now indicates that the numbers refer to both input and output synapses.

      (c) Figure 2:

      Supplement presynaptic/postsynaptic means pre- and post partner?

      Presynaptic: number of BMNs that were connected with at least 5 synapses to any given presynaptic partner (n), the numbers of synaptic inputs to BMNs (inputs), and the number of presynaptic partners (partners). Postsynaptic: number of BMNs that were connected with at least 5 synapses to any given postsynaptic partner, the numbers of synaptic outputs to postsynaptic partners, and the number of postsynaptic partners.
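
      The three quantities in this legend can be illustrated with a short sketch. It assumes connectivity is available as a list of (BMN, partner, synapse count) edges and that the synapse totals are summed over thresholded edges only; both are assumptions made for illustration, and all names and counts are hypothetical.

```python
MIN_SYNAPSES = 5  # connection threshold stated in the legend

# Hypothetical toy edge list: (BMN, partner, number of synapses).
toy_edges = [
    ("bmn_1", "partner_x", 12),
    ("bmn_1", "partner_y", 3),   # below threshold, dropped
    ("bmn_2", "partner_x", 7),
    ("bmn_3", "partner_z", 4),   # below threshold, dropped
]


def summarize_connections(edges, min_synapses=MIN_SYNAPSES):
    """Summarize the edges that meet the synapse threshold."""
    kept = [edge for edge in edges if edge[2] >= min_synapses]
    return {
        "n": len({bmn for bmn, _, _ in kept}),                 # connected BMNs
        "synapses": sum(count for _, _, count in kept),        # summed contacts
        "partners": len({partner for _, partner, _ in kept}),  # distinct partners
    }


summary = summarize_connections(toy_edges)
```

      In this toy case, two of the three BMNs remain connected to a single partner once edges with fewer than 5 synapses are excluded.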

      (d) Figure 3:

      Explain downstream-upstream

      Downstream refers to postsynaptic while upstream refers to presynaptic partners or pathways.

      Comparing the right side of the Sankey diagram in A with your diagram in B, judging by eye, I see more descending (post) than interneuron (post) partners in A. However, in B, there are clearly more postsynaptic interneurons than postsynaptic descending neurons. There are no numbers in Figure 3A.

      This is a great point! Figure 3A (the Sankey diagram) summarizes the fraction of BMN synaptic output distributed across partner classes, normalized within each BMN type. In this representation, descending neurons occupy a larger fraction because, across BMN types, they collectively receive a higher proportion of BMN output synapses.

      In contrast, Figure 3B (the sunburst plot) summarizes the number of distinct postsynaptic partner neurons in each category. Here, interneurons are more numerous than descending neurons, even though individual interneurons tend to receive fewer BMN synapses on average.

      Thus, the two plots are consistent: descending neurons are fewer in number but receive more synapses per neuron, whereas interneurons are more numerous but receive fewer synapses per neuron on average. When postsynaptic synapse counts are summed (as in the bottom plots), the totals for descending neurons and interneurons can therefore appear similar, despite their different representations in the Sankey diagram.

      We have added text in the Results section entitled “BMN synaptic partners in the CNS: ascending, descending, and interneurons”. Text was added here because it also nicely responds to another Reviewer comment below for more description of the postsynaptic partners. That added text reads: “Interneurons are more numerous as distinct partner neurons, whereas descending neurons receive a larger fraction of BMN output synapses across BMN types (Figure 3A,B). Thus, descending neurons are fewer in number but tend to receive more BMN synapses per neuron on average, while interneurons are more numerous but often receive fewer synapses per neuron.”
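
      The distinction between the two summaries can be made concrete with a toy calculation. The numbers below are invented purely to illustrate the arithmetic and are not taken from the dataset; the sketch also pools synapses across BMN types, whereas Figure 3A normalizes within each BMN type.

```python
# Hypothetical toy data: (partner class, BMN synapses received) per neuron.
toy_partners = [
    ("descending", 40), ("descending", 35),
    ("interneuron", 10), ("interneuron", 8), ("interneuron", 12),
    ("interneuron", 6), ("interneuron", 9),
]


def summarize_by_class(partners):
    """Per class: neuron count, synapse total, output fraction, mean per neuron."""
    total_synapses = sum(count for _, count in partners)
    summary = {}
    for partner_class, count in partners:
        entry = summary.setdefault(partner_class, {"neurons": 0, "synapses": 0})
        entry["neurons"] += 1
        entry["synapses"] += count
    for entry in summary.values():
        entry["fraction"] = entry["synapses"] / total_synapses      # Sankey-style
        entry["per_neuron"] = entry["synapses"] / entry["neurons"]  # mean load
    return summary


by_class = summarize_by_class(toy_partners)
```

      Here the interneurons outnumber the descending neurons (5 versus 2, the quantity a sunburst plot of partner counts would show), yet the descending neurons receive the larger share of synapses (75 of 120, the quantity a Sankey diagram of output fractions would show).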

      (e) Figure 10: I cannot see colored circles. I found Figure 10 very hard to understand. Is this a visualization created in CATMAID? As I mentioned before, a graphical summary highlighting the information flow and architecture of the circuits analyzed in this study would be useful. In such a diagram, you could combine the findings of your study, the open question, and the undeciphered pathways. In short, a schematic of the current knowledge of the potentially parallel and recurrent architecture of the BMN circuitry.

      Figure 10 (now Figure 11) is intended to specifically examine neurons that are both pre- and postsynaptic to BMNs, rather than to summarize the full connectome. The goal of this figure is to highlight two features of pre/post neurons: their somatotopic connectivity with BMN types and the presence of bilaterally symmetric neuron pairs that connect to common BMN populations.

      This visualization was generated from connectome-derived connectivity data and not from CATMAID, although it uses neuron reconstructions and synapse annotations from the FAFB dataset. The colored nodes represent BMN types and are now consistently referred to as “dots” rather than “circles” to better match their appearance. We have simplified the figure legend to clarify these points.

      In response to this and related comments, we also added a new graphical summary figure (Figure 15) at the end of the manuscript that schematically summarizes the information flow and parallel, recurrent architecture of the BMN circuitry at a higher level.

      (7) Discussion:

      I found the first part of your discussion hard to read; the second part is better. You can condense the discussion by mentioning the results/hypothesis of previous work once, and avoiding repetitions, such as the uniqueness of the BMN connectome/FAB dataset.

      In response to this comment, we condensed the opening portion of the Discussion by reducing repetition of background and prior findings, particularly references to earlier BMN work and the uniqueness of the FAFB dataset. We streamlined overlapping sections, mentioned prior hypotheses and results only once, and focused the revised text more directly on the new contributions of this study—namely, the synaptic-resolution organization, somatotopic connectivity, and circuit principles revealed by the BMN connectome.

      There are several cases of vague sentences, e.g.: a) Line 827: 'Head BMNs project from bristles to somatotopically organized zones in the brain (? ventral brain ?), with those innervating neighboring populations (? of bristles ?) occupying overlapping zones (Figure 1A-D)'.

      We made this suggested change: Head BMNs project from bristles to somatotopically organized zones in the ventral brain, with those innervating neighboring bristle populations occupying overlapping zones (Figure 1A-D).

      A remark: maybe you should indicate in Figure 1D the overlapping and segregated zones. The resolution is very low in these images.

      We thank the Reviewer for this comment and agree that overlap versus segregation of projection zones was not sufficiently guided in the original presentation. Rather than adding arrows to Figure 1C,D, which we felt would reduce clarity, we now explicitly describe how overlap and segregation can be identified based on color mixing of BMN synapses in the text and figure legend. In addition, we highlight these features more clearly in Figure 1 – figure supplement 3, which provides higher-resolution, multi-view visualizations of BMN synapses where overlap and non-overlap are most evident.

      Results:

      Segregation between projection zones is apparent where synapses of distinct BMN types occupy non-overlapping regions with little or no color mixing, whereas overlap between projection zones is visible as spatial intermixing of differently colored synapses from neighboring BMN types (Figure 1C, D, right, Figure 1 – figure supplement 3A-E).

      Figure 1 legend:

      Overlapping projection zones are evident where synapses of different BMN types spatially intermingle, whereas segregated zones show little or no color mixing.

      Figure 1 – figure supplement 3 legend:

      These views highlight both overlapping projection zones, visible as intermingled synapses of different colors from neighboring BMN types, and segregated zones, where synapses from distinct BMN types remain spatially separated with minimal color mixing.

      (b) Line 860: What is: 'location groomed'?

      Added a clarification to this sentence: Thus, the location groomed (i.e. antennae) corresponds to the location of the majority of BMN inputs.

      (c) Line 944: 'The sensory to motor resolution' What do you mean, here?

      We have revised this sentence to “The spatial resolution of the sensory-to-motor transformation in this parallel circuit architecture remains to be tested.”

      (d) The term: 'neighboring bristles' is unclear. Does it mean 'neighbor relates to members within the same bristle type (antennae)', or 'bristles of different types', e.g. antennae and eye bristles?

      We thank the Reviewer for raising this point. Throughout the manuscript, the term “neighboring bristles” is used primarily to refer to neighboring bristle populations (i.e., bristles from different anatomical groups that are spatially adjacent on the head). In some contexts, the term is also used more generally to describe spatial proximity, regardless of whether the bristles belong to the same or different populations. Importantly, in both cases, the usage reflects the same underlying observation: BMNs innervating bristles that are spatially closer—whether within or between populations—show greater similarity in their postsynaptic connectivity than BMNs innervating more distant bristles.

      (e) Avoid abbreviations, or briefly explain the term under discussion: line 725: BMlnOm?

      We thank the Reviewer for pointing out that the BMN nomenclature was not sufficiently clear. BMNs are named according to the bristle population they innervate (e.g., BM-Ant neurons innervate antennal bristles; BM-InOm neurons innervate interommatidial eye bristles), as defined in the Figure 1 legend. To improve clarity, we ensured that the first occurrences of these terms in the Results explicitly include the corresponding head location (e.g., “eye BM-InOm neurons”), and we added brief contextual reminders at later points where this abbreviation appears. These changes clarify the meaning of BM-InOm and related abbreviations without introducing additional terminology.

      Changes made:

      Figure 1 legend: clarified that BMNs are named according to the bristle population they innervate (e.g., BM-Taste neurons innervate Taste bristles).

      Results, early first section (second paragraph): added head-location qualifiers at first mention (e.g., “eye BM-InOm neurons,” “proboscis BM-Taste neurons”) in sentences such as: “35 BM-Taste neurons innervating Taste bristles on the proboscis…” and “405 eye BM-InOm neurons innervating the interommatidial bristles on the eyes…”.

      Later Results text where the abbreviation appears (including the sentence addressing the 5-synapse cutoff): added “eye” before BM-InOm for context (e.g., “although 555 eye BM-InOm neurons are present… only 405 meet the five-synapse threshold”).

      (f) LB23 hemilineage: what was that again?

      We added text in the manuscript to better define hemilineages. This is described above in response to another Reviewer suggestion.

      (g) Line 732: What are ascending neurons?

      We had already included a definition of ascending neurons in the second Results section entitled “The BMN connectome”. Since this was not clear to the Reviewers, we expanded on this section. There is now a new paragraph in this same section. This paragraph reads:

      “Partners were grouped into five morphological categories—interneurons, descending neurons, ascending neurons, BMNs, and motor neurons—following FlyWire annotations (Dorkenwald et al., 2024). Interneurons were defined as neurons whose soma and all neurites were confined to the brain. Descending neurons were defined as neurons whose somata are located in the CNS and whose neurites extend into the descending tracts toward the ventral nerve cord (VNC). Conversely, ascending neurons were identified as neurons whose neurites enter the brain through the cervical connective and whose somata lie outside the FAFB imaged volume, resulting in only their neurites being visible in the dataset.”

      (h) Line 896: What is lineage matching?

      We thank the Reviewer for pointing this out. We realized that this sentence did not add clarity and contributed little to the manuscript, so we removed the sentence that used “lineage matching” from the manuscript.

      (i) Line 926: The Previous work ... sentence makes no sense to me.

      The sentence was reworked and now reads: “The mechanosensory neurons hypothesized from the parallel model that elicit the Drosophila grooming sequence were identified in previous work (Eichler et al., 2024; Hampel et al., 2020a, 2017, 2015; Mueller et al., 2019; Seeds et al., 2014; Zhang et al., 2020).”

      (j) The FAB-dataset is indeed unique, but the fact that it is repeated several times in your discussion does not ensure understanding of the obviously complex circuit architecture potentially underlying behavior. Please, focus on your discussion strictly and condense your arguments to the specific contribution and outcome of the data in the current manuscript.

      In response to this comment, we condensed the opening portion of the Discussion by reducing repetition of background and prior findings, particularly references to earlier BMN work and the uniqueness of the FAFB dataset. We streamlined overlapping sections, mentioned prior hypotheses and results only once, and focused the revised text more directly on the new contributions of this study—namely, the synaptic-resolution organization, somatotopic connectivity, and circuit principles revealed by the BMN connectome.

      (k) At some parts of the discussion, it is not clear to me if you refer to results of the actual study or refer to previous studies (Hampel, Eichler), e.g., 'Our work has shown ...' on line 872, or '...we find ... LB23 neuron elicit antennal grooming....', or line 909: 'Our work reveals ......'

      Sentence at former line 872 was revised and now reads: “While our past and present work together reveal that a subpopulation of LB23 neurons elicits antennal grooming, we also find evidence that other LB23 neurons in the hemilineage elicit additional head grooming movements.”

      Sentence at former line 909 was revised and now reads: “Our previous work and the present study reveal that the antennal grooming circuit receives inputs from two different classes of antennal mechanosensory neurons, the BMNs and JONs.”

      Reviewer #3 (Recommendations for the authors):

      All my comments are mostly only for clarity.

      (1) It would help readers if the manuscript explicitly stated how a sensory neuron can be postsynaptic - i.e., that BMN axons receive inhibitory inputs in the CNS - since this may not be intuitive to a broader audience.

      We appreciate this comment and added the following text to the last paragraph of the first Results section: As expected for sensory afferents, BMNs provide synaptic output to downstream circuits; however, the presence of postsynaptic sites may be less intuitive, and reflects that BMNs can also receive synaptic input onto their central axons within the CNS.

      (2) Figure 1 is a helpful context, but since much of it is directly reused from Eichler et al., 2024, it would strengthen the presentation if you clarified what is new here (e.g., the synapse quantification) versus what is recap. In addition, for readers less familiar with EM connectomics, it would be valuable to spell out how bristle neurons are assigned to classes in the absence of bristles themselves in the volume - i.e., that classification rests on stereotyped nerve entry and projection zones, which allow type-level but not single-bristle resolution. Explicitly flagging these methodological boundaries up front would make it clearer what information comes from the current work, what derives from previous reconstructions, and what the limits of resolution are.

      We have addressed this recommendation above for a similar suggestion by Reviewer 2 (see above for details). In brief, we inserted an overview of the methodology used to identify BMN types in the FAFB dataset, and we now explicitly state the limitations of this correlative approach. We added a sentence in the first paragraph of the Results section that states, “Because FAFB does not include the head cuticular bristles, individual BMNs could not be linked to single bristles. Therefore, these assignments are necessarily correlative and provide type-level (population) rather than single-bristle resolution.” In addition, we revised the Figure 1 legend to more clearly distinguish panels and reconstructions that were previously reported in Eichler et al. (2024) from synapse quantification and analyses that are new to the present study.

      (3) BMNs from neighboring bristle populations converge onto shared partners, while distant BMNs remain segregated - while the overlap was clear, the segregation was not visually clear in the first figure.

      We thank the Reviewer for this suggestion. We have addressed this point in our response to a similar comment from Reviewer 2 (see above), where we clarified how overlap versus segregation can be identified in Figure 1 and strengthened the text and figure legends to guide readers to these features without adding clutter to the figure.

      (4) The identification of direct BMN → motor neuron synapses is intriguing, but since these inputs make up only a small fraction of motor neuron synapses, it would help if the authors explicitly cautioned readers that these are likely modulatory contributions rather than stand-alone reflex arcs. This would prevent over-interpretation of the sensory-motor link. Similarly with the BMN>BMN connections.

      We thank the Reviewer for this suggestion. We revised the Results section “BMN postsynaptic motor neurons” to more explicitly caution that the direct BMN → motor neuron connections are likely modulatory rather than stand-alone reflex arcs, consistent with their small contribution to total motor neuron input. The revised text reads: “However, BMN inputs accounted for only a small fraction of total synapses onto each motor neuron (≦6.28% of total inputs/BMN type, Figure 4 – figure supplement 1, Supplementary file 7), suggesting a modulatory contribution rather than direct sensory-driven motor activation.”
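The percentage quoted in the revised text is a fraction of one motor neuron's total input synapses contributed by a given BMN type. A minimal sketch of that calculation follows; the function name and the synapse counts are hypothetical and serve only as an illustration.

```python
def input_fraction_by_type(bmn_synapses_by_type, total_input_synapses):
    """Percentage of a neuron's total input synapses contributed by each BMN type."""
    if total_input_synapses <= 0:
        raise ValueError("total_input_synapses must be positive")
    return {
        bmn_type: 100.0 * n / total_input_synapses
        for bmn_type, n in bmn_synapses_by_type.items()
    }


if __name__ == "__main__":
    # Hypothetical motor neuron with 1000 input synapses in total.
    example = input_fraction_by_type({"BM-Taste": 60, "BM-Ant": 12}, 1000)
    for bmn_type, pct in example.items():
        print(f"{bmn_type}: {pct:.2f}% of total input")
```

Reporting the value per BMN type, rather than pooled across types, matches the "per BMN type" wording of the quoted sentence.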

      (5) Since the FAFB dataset only includes the brain, it would be helpful to clarify what is meant by "ascending" and "descending" partners in this context - namely that ascending neurons are VNC-derived axons entering the brain, while descending neurons are brain-derived neurons projecting out toward the VNC. Explicitly stating this will prevent confusion, given that all BMNs themselves terminate in the SEZ.

      We had already included definitions in the second Results section entitled “The BMN connectome”. Since this was not clear to the Reviewers, we expanded on this section. There is now a new paragraph in this same section. This paragraph reads: Partners were grouped into five morphological categories—interneurons, descending neurons, ascending neurons, BMNs, and motor neurons—following FlyWire annotations (Dorkenwald et al., 2024). Interneurons were defined as neurons whose soma and all neurites were confined to the brain. Descending neurons were defined as neurons whose somata are located in the CNS and whose neurites extend into the descending tracts toward the ventral nerve cord (VNC). Conversely, ascending neurons were identified as neurons whose neurites enter the brain through the cervical connective and whose somata lie outside the FAFB imaged volume, resulting in only their neurites being visible in the dataset.

      (6) In the section titled "BMN synaptic partners in the CNS: ascending, descending, and interneurons", the balance of explanation is skewed toward presynaptic input to BMNs. It would strengthen clarity if you expanded equally on the postsynaptic side (i.e., BMN outputs) or explicitly signposted why the focus here is on inputs. That way, readers won't be left wondering whether outputs are less important or just deferred to later figures.

      We have revised the section that was previously skewed toward presynaptic BMNs. This section also addresses some confusion about interpreting Figure 3, from a critique from Reviewer 2. The section now reads: “Postsynaptic connections were predominantly interneurons (56%), with significant contributions from descending (28%) and ascending (16%) neurons (Figure 5D, F,H,J). Interneurons are more numerous as distinct partner neurons, whereas descending neurons receive a larger fraction of BMN output synapses across BMN types (Figure 3A, B). Thus, descending neurons are fewer in number but tend to receive more BMN synapses per neuron on average, while interneurons are more numerous but often receive fewer synapses per neuron. Together, these partner categories underscore the strong integration of BMNs with local brain circuitry (interneurons), and with pathways linking the brain and ventral nerve cord (VNC), through ascending neurons that provide VNC-derived synaptic input and descending neurons that carry BMN output toward the VNC.”

      (7) The network diagrams in Figure 9 convey clustering, but a complementary heatmap of BMN type × partner connectivity could highlight the parallel organization more clearly. This would make the block-like separation of dorsal, ventral, and posterior subnetworks more immediately apparent, reinforcing the conclusion of parallel somatotopy-based processing. This section would also benefit from drawing the functional message more explicitly: that BMNs form largely independent, somatotopically aligned pathways with regional overlap, supporting the idea of parallel grooming circuits. Right now, the text reads as a connectivity catalog, and the key concept of parallel regional architecture risks being underemphasized.

      We agree that connectivity matrices provide a clear and accessible representation of these data. We have moved the cosine-similarity connectivity matrices previously shown in Figure 9 – figure supplement 1 into the main manuscript, where they now appear as Figure 9. These matrices depict similarity among BMN postsynaptic partners. For this reason, we have retained the original network-graph representation (now Figure 10) to display the full pre- and postsynaptic connectome structure.
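For readers unfamiliar with cosine-similarity connectivity matrices, the following sketch shows how such a matrix can be computed from per-type output vectors. The BMN type names are taken from the text, but the synapse counts and the choice of four postsynaptic partners are hypothetical; this is not the authors' analysis code.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(connectivity):
    """Return sorted type names and the pairwise cosine-similarity matrix."""
    types = sorted(connectivity)
    matrix = [
        [cosine_similarity(connectivity[a], connectivity[b]) for b in types]
        for a in types
    ]
    return types, matrix


# Hypothetical synapse counts onto four postsynaptic partners (columns).
connectivity = {
    "BM-Ant": [10, 5, 0, 0],
    "BM-InOm": [8, 6, 1, 0],
    "BM-Taste": [0, 0, 7, 9],
}

if __name__ == "__main__":
    types, matrix = similarity_matrix(connectivity)
    for name, row in zip(types, matrix):
        print(name, [round(x, 2) for x in row])
```

In this toy example, the two types that share partners score close to 1, while the type with disjoint partners scores 0 against BM-Ant, which is the kind of block structure a similarity matrix is meant to expose.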

      Based on the Reviewer’s suggestion to clearly state the key concepts of the parallel architecture, we added a sentence to the end of the Results section entitled: Somatotopy-based connectivity among BMN synaptic partners in the CNS. That text reads: “Thus, the BMNs form largely independent, somatotopically aligned pathways with regional overlap, supporting the idea of parallel grooming circuits.”

      (8) It would help if the authors explained the somatotopy logic more explicitly (that reciprocal inhibition preserves local head regions, ensuring that suppression and gain control act locally). At present, the narrative is buried in network-graph detail - a heatmap or simple region-level summary would make this organizational principle much clearer to readers.

      We thank the Reviewer for this suggestion. To make the somatotopy logic of pre/post feedback inhibition clearer and less buried in network-graph detail, we revised the text in this Results section to more explicitly distinguish (i) reciprocal, head-region–localized inhibitory loops that could support local gain control from (ii) non-reciprocal cross-type inhibitory pathways that could contribute to heterotypic suppression between head regions. In addition, we modified the figure to more clearly convey somatotopy by adding text on the plot and updating the legend to state: “Bold text indicates the general head location of BMNs on the plot, revealing somatotopy-based connectivity with pre/post neurons (i.e. ventral, dorsal, posterior, and the ventral/dorsal transition).”

      (9) Please adjust the section title, "LB23 hemilineage member neurons elicit aimed head grooming movements" to avoid implying new functional experiments. For example:

      (a) "LB23 neurons include previously defined antennal grooming command neurons" or

      (b) "LB23 hemilineage anatomically corresponds to grooming-related neurons".

      This would make it clear that the contribution here is anatomical linkage, not fresh functional data.

      We changed the section title to the Reviewer-suggested title b: LB23 hemilineage anatomically corresponds to grooming-related neurons

      (10) The current network graphs in Figure 13B are not very intuitive - it is hard to visually extract the somatotopy. A connectivity heatmap or matrix (BMN types on one axis, LB23 neurons or subgroups on the other, with synapse strength as colour) would make the block-like, region-specific mapping immediately clear. A coarse-grained version (e.g., dorsal/ventral/posterior BMNs vs LB23 subgroups) could further highlight the parallel, somatotopically organized pathways. This would better support the central claim of Figure 13 than the current spring-layout graphs. Figure 13F does this for BMN inputs onto aBN2 neurons. (But it is presented only in binary form; could the authors not add a graded colour scale proportional to synapse number?)

      The binary form was necessary because the results are from different sources (i.e. CATMAID versus FlyWire synapse counts) with different synapse numbers.
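One way to see why a binary representation makes counts from different sources comparable is the sketch below. The threshold of one synapse and the two count matrices are assumptions made for illustration; the response does not state how the binarization was performed.

```python
def binarize(counts, threshold=1):
    """Map a matrix of synapse counts to 1 (connected) or 0 (not connected).

    The default threshold of 1 is an assumption for this sketch.
    """
    return [[1 if c >= threshold else 0 for c in row] for row in counts]


# Hypothetical counts for the same connections from two sources whose
# absolute synapse numbers differ in scale.
source_a_counts = [[12, 0, 3], [0, 7, 0]]
source_b_counts = [[40, 0, 11], [0, 25, 0]]

if __name__ == "__main__":
    print(binarize(source_a_counts))
    print(binarize(source_b_counts))
```

The graded values differ between the two hypothetical sources, but the presence or absence of each connection is identical, so only the binary pattern can be compared directly.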

      We modified Figure 13B to more clearly convey somatotopy by adding text on the plot and updating the legend to state: “Bold text indicates the general head location of BMNs on the plot, revealing somatotopy-based connectivity with LB23 neurons (i.e. ventral, dorsal, and posterior head).” We hope that this modification satisfies the Reviewer.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary

      The strength of this manuscript lies in the behavior: mice use a continuous auditory background (pink vs brown noise) to set a rule for interpreting an identical single-whisker deflection (lick in W+ and withhold in W− contexts) while always licking to a brief 10 kHz tone. Behaviorally, animals acquire the rule and switch rapidly at block transitions and take a few trials to fully integrate the context cue. What's nice about this behavior is the separate auditory cue, which shows the animals remain engaged in the task, so it's not just that the mice check out (i.e., become disengaged in the W- context). The authors then use optical tools, combining cortex-wide optogenetic inactivation (using localized inhibition in a grid-like fashion) with widefield calcium imaging to map what regions are necessary for the task and what the local and global dynamics are. Classic whisker sensorimotor nodes (wS1/wS2/wM/ALM) behave as expected with silencing reducing whisker-evoked licking. Retrosplenial cortex (RSC) emerges as a somewhat unexpected, context-specific node: silencing RSC (and tjS1) increases licking selectively in W−, arguing that these regions contribute to applying the "don't lick" policy in that context. I say somewhat because work from the Delamater group points to this possibility, albeit in a Pavlovian conditioning task and without neural data. I would still recommend the authors of the current manuscript review that work to see whether there is a relevant framework or concept (Castiello, Zhang, Delamater, 'The retrosplenial cortex as a possible 'sensory integration' area: a neural network modeling approach of the differential outcomes effect of negative patterning', 2021, Neurobiology of Learning and Memory).

      The widefield imaging shows that RSC is the earliest dorsal cortical area to show W+ vs W− divergence after the whisker stimulus, preceding whisker motor cortex, consistent with RSC injecting context into the sensorimotor flow. A "Context Off" control (continuous white noise; same block structure) impairs context discrimination, indicating the continuous background is actually used to set the rule (an important addition!) Pre-stimulus functional-connectivity analyses suggest that there is some activity correlation that maps to the context presumably due to the continuous background auditory context. Simultaneous opto+imaging projects perturbations into a low-dimensional subspace that separates lick vs no-lick trajectories in an interpretable way.

      In my view, this is a clear, rigorous systems-level study that identifies an important role for RSC in context-dependent sensorimotor transformation, thereby expanding RSC's involvement beyond navigation/memory into active sensing and action selection. The behavioral paradigm is thoughtfully designed, the claims related to the imaging are well defended, and the causal mapping is strong. I have a few suggestions for clarity that may require a bit of data analysis. I also outline one key limitation that should be discussed, but is likely beyond the scope of this manuscript.

      Major strengths

      (1) The task is a major strength. It asks the animal to generate differential motor output to the same sensory stimulus, does so in a block-based manner, and the Context-Off condition convincingly shows that the continuous contextual cue is necessary. The auditory tone control ensures this is more than a 'motivational' context but is decision-related. In fact, the slightly higher bias to lick on the catch trials in the W+ context is further evidence for this.

      (2) The dorsal-cortex optogenetic grid avoids a 'look-where-we-expect' approach and lets RSC fall out as a key node. The authors then follow this up with pharmacology and latency analyses to rule out simple motor confounds. Overall, this is rigorous and thoughtfully done.

      (3) While the mesoscale imaging doesn't allow for cellular resolution, it allows for mapping of the flow of information. It places RSC early in the context-specific divergence after whisker onset, a valuable piece that complements prior work.

      (4) The baseline (pre-stim) functional connectivity and the opto-perturbation projections into a task subspace increase the significance of the work by moving beyond local correlates.

      Key limitation

      The current optogenetic window begins ~10 ms before the sensory cue and extends 1 s after, which is ideal for perturbing within-trial dynamics but cannot isolate whether RSC is required to maintain the context-specific rule during the baseline. Because context is continuously available, it makes me wonder whether RSC is the locus maintaining or, instead, gating the context signal. The paper's results are fully consistent with that possibility, but causality in the pre-stimulus window remains an open question. (As a pointer for future work, pre-stimulus-only inactivation, silencing around block switches, or context-omission probe trials (e.g., removing the background noise unexpectedly within a W+ or W- context block), could help separate 'holding' from 'gating' of the rule. I'm not suggesting these are needed for this manuscript, but they would be interesting for future studies.)

      We thank the reviewer for the comprehensive summary of our work.

      We also thank the reviewer for highlighting the work from the Delamater group (Castiello et al., 2021), and we now briefly discuss this paper on P. 14 Lines 434-437 writing: “RSC was shown to contribute to negative patterning in behavioral tasks requiring rats to learn that the simultaneous presentation of two stimuli lead to an opposite outcome than each individual stimulus (Castiello et al., 2021).”

      We also agree with the reviewer’s noted ‘Key limitation’ regarding the role of RSC as either maintaining context representation or serving a gating function. The reviewer proposes an exciting set of further experiments inactivating RSC at different time points to investigate when RSC activity is needed. We hope to carry out such experiments in the future. We now include a brief discussion of this interesting point on P. 14-15 Lines 455-459 writing: “First, further inactivation experiments would shed light on the timing at which RSC activity is necessary for the integration of contextual information. Specifically, it would be of great interest to inactivate RSC at different time points such as during the intertrial interval or at the transition between contexts.”

      We have of course also addressed each of the more detailed comments from the “Recommendations for the authors” section, please see below.

      Reviewer #2 (Public review):

      Summary:

      The authors aim to understand the neural basis of context-dependent sensory processing and decision-making.

      Strengths:

      They used an innovative behavioral paradigm where the action-outcome association changes independent of the sensory stimulus. This theoretically allows the authors to disentangle the effect of behavioral context on sensory processing. Using this approach combined with optogenetic silencing, they discover that RSC activity is necessary for suppressing a lick response when the stimulus switches to the unrewarded context.

      Weaknesses:

      Sensory processing appears to be entangled with jaw/tongue movement initiation. Activity in M1 and RSC during auditory-evoked lick responses appears to be identical to activity during whisker-evoked lick responses, indicating that movement initiation is the main driver of M1/RSC activity, rather than changes in the flow of sensory information. If sensory information were the main driver of the initial M1/RSC response, then auditory evoked responses should have a longer latency. Perhaps this is beyond the resolution of the calcium indicator or imaging frame rate. It is not clear from the data shown if differences in S1 activity when comparing W+ and W- stimulation are caused by context-sensitive sensory processing or whisker movement following whisker deflection.

      We thank the reviewer for the comments on our work and we agree that separating sensory processing and movement initiation is very important. In the revised manuscript, we have carried out several new analyses to specifically address the points of the reviewer. The most important point is that context-dependent activity in RSC emerges at ~50 ms after the whisker stimulus, which precedes any differences in movements of the jaw or whisker. Although sensory and motor representations become increasingly entangled after stimulus delivery, we think that the first ~100 ms after the whisker stimulus is a relatively safe period for analysing sensory processing and decision making before overt context-dependent differences in movements.

      Addressing the specific point “Activity in M1 and RSC during auditory-evoked lick responses appears to be identical to activity during whisker-evoked lick responses, indicating that movement initiation is the main driver of M1/RSC activity, rather than changes in the flow of sensory information.” - We have now directly compared the pattern of cortical activity evoked by whisker and auditory stimuli in correct trials in the W+ context (new Figure 3 – figure supplement 2). As expected, activity in wS1/wS2 and A1 is stronger in whisker and auditory trials respectively, following their sensory modalities. However, we also observed a stronger response of wM1/wM2 in whisker trials as early as 40 to 60 ms following the stimulus, showing the specificity to the whisker system. We also observed a stronger response of RSC to the whisker stimulus than to the auditory stimulus. The auditory and whisker evoked responses are therefore different.

      Addressing the specific point “If sensory information were the main driver of the initial M1/RSC response, then auditory evoked responses should have a longer latency. Perhaps this is beyond the resolution of the calcium indicator or imaging frame rate.” – As stated above, the responses to auditory and whisker stimuli are different.

      Addressing the specific point “It is not clear from the data shown if differences in S1 activity when comparing W+ and W- stimulation are caused by context-sensitive sensory processing or whisker movement following whisker deflection.” - We think that the data shown in Figure 3F-H indicate that differences in S1 activity when comparing W+ and W- stimulation are not directly caused by context-sensitive sensory processing. On P. 9 Lines 270273 we write: “Early after stimulus onset, whisker deflection evoked similar activation of primary and secondary whisker somatosensory cortices (wS1 and wS2) in both W+ and W− contexts.” Indeed, context separation in wS1/wS2 only emerged later than 100 ms, which is indeed confounded by the difference in movement evoked by the sensory stimulus (now quantified in new Figure 3 – figure supplement 4). On the contrary RSC and wM1/2 responses to the whisker stimulus were different in W+ and W- at early time points (~50 ms for RSC and ~80 ms for wM1/2) which is consistent with context dependent sensory processing. At least 2 hypotheses could explain the absence of early difference in whisker evoked activity in wS1/wS2 between W+ and W-. The first one is that sensory activity in wS1/wS2 is not modulated by contextual information at all, while the alternative option would imply that sensory activity is mediated by different neuronal populations depending on context with an overall similar average response. We think this is an interesting question which we hope to address in future experiments using Neuropixels recordings and multiphoton cellular imaging to address the single neuron representation of whisker stimulus in wS1/wS2 according to context in the task presented here.

      We have of course also addressed each of the more detailed comments from the “Recommendations for the authors” section, please see below.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Suggestions to strengthen the manuscript (no new data collection)

      (1) The block-switch dynamics were clearly demonstrated behaviorally. It would be very powerful to mirror this with an analysis of neural data around the block-switch: how do the various areas adjust immediately after a shift in the continuous contextual sound? Does the RSC show any evidence of changing activity patterns? How does the within-trial activity dynamic look as a function of the number of trials from the context switch? This could be done with the data collected for Figure 3 (for within-trial dynamics), but also for the pre-stimulus baseline activity data (Figure 4A-B).

      We thank the reviewer for raising this interesting point. We have now investigated the change of cortical activity at the transition between contexts (new Figure 3 – figure supplement 5). At the context transition, both to W+ and to W- contexts, we observed a rapid activation of the auditory cortex (new Figure 3 – figure supplement 5A). In addition, there appeared to be a slightly higher activation of RSC when transitioning to W- rather than to W+ (new Figure 3 – figure supplement 5A). In the future, it will be of great interest to further investigate this phenomenon.

      We also evaluated the whisker deflection-evoked responses of the different cortical regions according to the number of whisker trials from context switch (new Figure 3 – figure supplement 5B&C). This analysis revealed that while the sensory responses in wS1 and wS2 were constant over the time course of a context block, the responses of wM1/2 and especially RSC became progressively lower in the W- context, consistent with the behavioral results in Figure 1 supporting time-dependent contextual integration.

      Overall, these results strengthen the role of RSC and wM1/2 in integrating contextual information to guide the response to the whisker stimulus, and we thank the reviewer for raising this important point.

      (2) It might be useful to state 'earliest among the imaged dorsal cortical areas,' and briefly acknowledge potential subcortical contributors (since those were not explored and could be earlier than cortical areas).

      We agree with the reviewer. In the Summary, on P. 2 Line 39-40 we now write: “Widefield calcium imaging revealed that retrosplenial cortex was the first dorsal cortical area to show context discrimination in response to whisker stimulation”. On P. 8 Lines 257-258, we now write: “To investigate the spatiotemporal neural dynamics underlying task execution, we recorded calcium activity across the dorsal cortex in transgenic mice”. On P. 13 Lines 416-420 we now write: “Functional imaging of cortical activity with two different genetically-encoded calcium indicators each showed similar spatiotemporal dynamics of whisker sensory processing with the earliest context-dependent divergence in signalling being detected in RSC, out of the imaged dorsal cortical areas (Figure 3).” On P. 15 Lines 470-473, we now write: “Finally, it is of course important to note that many subcortical regions (as well as non-dorsal cortical regions, which were not imaged) are likely to contribute importantly to context-dependent task performance.”

      (3) Fit a simple exponential/logistic to lick probability vs time-since-switch (your Figure 1H-style analysis) to report a time constant with CIs; it will help quantify the integration of the continuous cue.

      We thank the reviewer for this suggestion. We have fitted an exponential to the grand average data to quantify the time constants for integration of contextual information before the presentation of the first whisker stimulus of the block (see new Figure 1H). On P. 6 Lines 170-173 we now write: “To assess whether this temporal integration would differ between contexts we fitted an exponential to the time evolution of the lick probability. This suggested a faster transition to the W+ context than to the W- context (W+ time constant: 9.4 s, W- time constant: 15.5 s) (Figure 1H).”
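
      To make the fitting step concrete, the following is a minimal sketch of one way to estimate such a time constant, assuming a single-exponential model p(t) = p_inf + amp * exp(-t / tau). The function name, the grid-search approach, and the synthetic data are illustrative assumptions and are not taken from the authors' analysis code.

```python
import math


def fit_exponential(times, probs, tau_grid):
    """Fit p(t) = p_inf + amp * exp(-t / tau) by grid search over tau.

    Hypothetical helper for illustration. For each candidate tau the model
    is linear in (p_inf, amp), so those two parameters are obtained in
    closed form by ordinary least squares; the tau with the smallest sum
    of squared errors is returned.
    """
    n = len(times)
    mean_y = sum(probs) / n
    best = None
    for tau in tau_grid:
        x = [math.exp(-t / tau) for t in times]
        mean_x = sum(x) / n
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        if sxx == 0:
            continue  # degenerate regressor, skip this tau
        sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, probs))
        amp = sxy / sxx
        p_inf = mean_y - amp * mean_x
        sse = sum((p_inf + amp * xi - yi) ** 2 for xi, yi in zip(x, probs))
        if best is None or sse < best[0]:
            best = (sse, tau, p_inf, amp)
    _, tau, p_inf, amp = best
    return tau, p_inf, amp


# Synthetic, noise-free example (values are made up for illustration):
# lick probability rising towards 0.8 after a context switch.
times = list(range(0, 61, 2))  # seconds since the switch
probs = [0.8 - 0.6 * math.exp(-t / 9.4) for t in times]
tau_grid = [0.1 * k for k in range(10, 401)]  # candidate taus, 1 s to 40 s
tau, p_inf, amp = fit_exponential(times, probs, tau_grid)
```

      Confidence intervals, as the reviewer requested, could in principle be obtained by refitting on bootstrap resamples of mice or sessions; that step is not shown here.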

      (4) Because catch-trial false alarms are higher in W+ than W−, report per-context d′ and criterion for whisker trials (using signal detection theory); this separates sensitivity from bias and makes the behavioral shift more interpretable. It is also further proof that the behavior is contextual (versus a compound stimulus, for example).

      We have computed the d’ and criterion for the whisker trials in the W- and W+ contexts (see new Figure 1 - figure supplementary 1D). As suggested by the reviewer, this further supports that the behavior is driven by contextual information.
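
      For readers unfamiliar with these signal detection measures, a minimal sketch of the standard computation follows. It assumes that hits are licks on whisker trials and false alarms are licks on catch trials within one context, and it uses a log-linear correction for extreme rates; both choices are assumptions for illustration, not a description of the authors' pipeline.

```python
from statistics import NormalDist


def dprime_and_criterion(hits, misses, false_alarms, correct_rejections):
    """Return (d', criterion c) from trial counts in one context.

    Hypothetical helper. Adding 0.5 to each count (log-linear correction)
    keeps the rates away from 0 and 1, where the inverse normal CDF
    would be infinite.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf  # standard normal quantile function
    d_prime = z(hit_rate) - z(fa_rate)           # sensitivity
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))  # response bias
    return d_prime, criterion


# Made-up counts: an unbiased observer and a more liberal (lick-prone) one.
d_unbiased, c_unbiased = dprime_and_criterion(80, 20, 20, 80)
d_liberal, c_liberal = dprime_and_criterion(90, 10, 40, 60)
```

      A negative criterion indicates a bias towards licking, so comparing d′ and c between W+ and W− separates a change in sensitivity from a change in bias.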

      (5) For the pre-stimulus seed-correlation analysis, can you regress out the pupil/jaw/whisker activity to confirm whether the context modulation is (or is not) movement-driven? It would be helpful to better understand whether the baseline correlation is driven by differences in low-level factors between the contexts, versus the higher-level decision rule/context.

      The reviewer raises an interesting point. However, we did not find a straightforward way to regress out movements, and thus we leave this point for future in-depth analysis. On P. 11 Lines 354-357 we now write: “It is important to note that these context-dependent changes in resting-state functional connectivity could relate to the overt context-dependent movements in the prestimulus baseline (Figure 1I&J) and/or a manifestation of higher-level internal rule representations.”

      (6) For the earliest divergence analysis, is this consistent across animals and across sessions within animals? Can you show per-mouse distributions of first-crossing times (d′>2) for RSC vs wM1/2/wS2? This would help provide confidence in this key finding.

      The d’ presented in Figure 3H is computed as the discriminability between contexts at the population level, meaning that at each timepoint (from Figure 3F) we compared the 2 distributions built on N=6 mice. As such if the divergence between context was not consistent across animals this d’ would be low. That said, as suggested by the reviewer, we further investigated this context divergence at single mouse level and single session level. Our analysis supporting the main finding (Figure 3F-H) is shown in new Figure 3 – figure supplement 3.

      First, we show the results for a single mouse across sessions in Figure 3 – figure supplement 3A. We show the stimulus aligned activity in correct whisker trials in both contexts for the 3 recording sessions. For each session we quantified the main effect size defined as the difference of the trial average between contexts. Plotting the difference of mean response, we consistently observed that RSC ramps-up before wM1/2 for the 3 sessions.

      Second, across all individual mice: we further aggregated the session average responses to show discriminability between context for each region at the single mouse level (Figure 3 – figure supplement 3B). We show that RSC is the first region to exhibit context separation in 4 out of the 6 mice that we recorded. In 2 other mice all regions seemed to show context separation but without clear temporal ordering.

      Finally, when averaging across mice, we observed a clear separation and first discrimination in RSC (Figure 3F-H and Figure 3 – figure supplement 3C).

      Overall, these further analyses suggest that the early divergence of RSC activity appears to be robust with a consistent mean difference in single sessions and single mice, as well as across the population of mice. We think this analysis has strengthened our manuscript and we thank the reviewer for the valuable suggestion.
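
      The population-level discriminability described above can be sketched as follows. This is a minimal illustration that assumes d′ is the difference of the across-mouse means divided by the pooled standard deviation at each timepoint, with the first crossing defined by the reviewer's threshold of 2; the function names and toy values are hypothetical and the exact definition used in the manuscript may differ.

```python
from statistics import mean, variance


def dprime_timecourse(ctx_a, ctx_b):
    """Per-timepoint d' between two contexts.

    ctx_a, ctx_b: lists over mice, each a list of responses over timepoints.
    d' = (mean_a - mean_b) / sqrt((var_a + var_b) / 2), computed across mice.
    """
    n_time = len(ctx_a[0])
    out = []
    for t in range(n_time):
        a = [mouse[t] for mouse in ctx_a]
        b = [mouse[t] for mouse in ctx_b]
        pooled_sd = ((variance(a) + variance(b)) / 2) ** 0.5
        out.append((mean(a) - mean(b)) / pooled_sd if pooled_sd > 0 else 0.0)
    return out


def first_crossing(dprimes, times, threshold=2.0):
    """Return the first time at which |d'| exceeds the threshold, or None."""
    for t, d in zip(times, dprimes):
        if abs(d) > threshold:
            return t
    return None


# Toy data for three mice and three timepoints (values are made up).
w_plus = [[0.0, 1.0, 3.0], [0.1, 1.2, 3.2], [-0.1, 0.8, 2.8]]
w_minus = [[0.0, 0.9, 1.0], [0.1, 1.1, 1.2], [-0.1, 0.7, 0.8]]
times_ms = [0, 50, 100]
dprimes = dprime_timecourse(w_plus, w_minus)
crossing = first_crossing(dprimes, times_ms)
```

      Because the denominator is the across-mouse spread, an effect that is inconsistent between animals yields a low d′, which is the point made in the response.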

      (7) For the opto mapping data, could you provide P(lick) effect sizes with CIs per grid site? It would also be nice to summarize the qualitative dichotomy: RSC/tjS1 increases licking in W−; canonical wS1/wS2/wM/ALM decreases licking across contexts (to my understanding).

      We now provide the P(lick) effect sizes for the main cortical areas studied in the paper in Figure 2 – figure supplement 1C. This shows the relative change in lick probability in optogenetic trials compared to control trials for each mouse.

      Reviewer #2 (Recommendations for the authors):

      (1) Do mice move their whiskers after stimulus onset? If so, are these movements dependent on behavioral context? What causes the increase in S1 activity during auditory-evoked response trials?

      To answer the reviewer’s questions we have further investigated whisker movements following the sensory stimuli (whisker and auditory correct trials) in both contexts. The results of this analysis are presented in new Figure 3 – figure supplement 4.

      We find that mice move their whiskers shortly after the whisker stimulus in both contexts. The time course of whisker angle in correct whisker trials is similar in both contexts with a discriminability index (d’) consistently below 1. The whisker speed in response to stimulus is slightly higher in the W+ context compared to W- with a d’ slightly above 1 after ~100 ms. We also observed evoked whisker movements in auditory trials independent of context. Thus, whisker movements are indeed evoked by the sensory stimuli, but the overall context-dependent modulation of whisker movements is weak. The early differences in whisker-evoked cortical activity in W+ compared to W- contexts are therefore more likely related to the integration of contextual information than to differences in evoked movements.

      The reviewer is correct to point out that wS1 activity increases in auditory trials (Figure 3E). The response is initially very weak, but becomes more prominent after ~100 ms following the auditory tone. We do not know the underlying mechanisms, but there are several likely explanations. First, as discussed above, there are indeed some whisker movements evoked in response to the auditory stimulus (Figure 3 – figure supplement 4), which could result in sensory input to wS1. Equally, the increase could relate to licking, given the broad representation of movements in cortex and an appropriate reaction time in auditory trials (Figure 3C). Alternatively, wS1 activity in auditory trials could also be related to input connectivity from auditory cortex, top-down input from frontal cortex or subcortical regions such as high-order POm.

      (2) What do the authors think is causing the W+ vs W- difference in S1/S2 activity approximately 100ms after whisker deflection?

      The late W+ vs W- difference in wS1/wS2 activity could be explained by several factors. First this could be due to the difference in whisker movements after ~100 ms as shown in Figure 3 – figure supplement 4. Second this could be driven by the lick vs no lick activity (see reaction time in Figure 3C for whisker trials ~110 ms). Finally, this could be partly due to some movement independent top-down contextual information reaching wS1/wS2 at late time points. Overall, our claim in the paper is that there was no contextual difference in whisker primary and secondary cortices at early time points (before movement). On P. 9 Lines 270-273 we explicitly write: “Early after stimulus onset, whisker deflection evoked similar activation of primary and secondary whisker somatosensory cortices (wS1 and wS2) in both W+ and W− contexts.” In contrast, our main findings are grounded in the divergence of cortical activity in RSC and wM1/2 at early time points (<100 ms).

      (3) The choice of PC3 seems arbitrary. Is there no task-relevant information in PC1 and PC2?

      We appreciate the point raised by the reviewer and have clarified the reasoning leading to PC3 selection in the main text, where on P. 12-13 Lines 384-391 we now write: “The loadings of the first principal components were uniformly distributed and could reflect a late movement driven activation distributed across all cortical areas (Figure 4 – figure supplement 2C&D). PC2 loadings show variation along the anteroposterior axis that could reflect differences between sensory and motor regions but its time course does not separate between lick and no lick in control conditions (Figure 4 – figure supplement 2C&D). The loadings of PC3 highlighted task-related cortical regions and its time course exhibited clear differences comparing lick and no-lick trials.” In addition, we now also show the time courses for PC1 and PC2 in Figure 4 – figure supplementary 2D.

      Overall, the reasoning is the following:

      PC1 has spatially-homogeneous positive loadings (Figure 4 – figure supplementary 2C) and activity along PC1 gradually ramps up following sensory stimulation (Figure 4 – figure supplementary 2D). It is likely driven by widespread activation of the cortex following the whisker stimulus and the lick response. As such we believe that the task-related information captured by PC1 is movement related and not necessarily informative about processing of whisker and context.

      PC2 has loadings varying along the antero-posterior axis (Figure 4 – figure supplementary 2C), which could be relevant for the task, but its time course does not discriminate between lick and no lick in either W+ or W- (Figure 4 – figure supplementary 2D).

      PC3 has both loadings that vary between several cortical regions involved in the task (Figure 4 – figure supplementary 2C) and a time course that separates between lick and no lick in both contexts (Figure 4 – figure supplementary 2D). We thus focus on PC3 to investigate the effect of optogenetic inactivation on whisker stimulus evoked activity.

      The remaining components beyond PC3 contain a very small fraction of variance and were thus not considered.

      (4) Figure 3 - Supplement 1: What explains the change in fluorescence in GFP/tdT mice during W+ stimulation? Is it brain movement on the z-dimension? Could this explain differences in calcium imaging results?

      We thank the reviewer for this question. The nature of intrinsic signals is a complex topic, but brain movement is unlikely to contribute importantly, because under similar behavioral conditions we (and others) typically find brain movements to be on the scale of a few microns. The three most widely-reported contributions to intrinsic optical changes in cortex relate to:

      (i) Light scattering – as neurons integrate synaptic inputs and fire action potentials, the neuronal elements swell slightly due to the ionic and water fluxes (see for example Vincis et al. Cell Reports 2015, doi: 10.1016/j.celrep.2015.06.016). This reduces the refractive index mismatch between the intracellular and extracellular space. This in turn reduces light scattering, which could result in fluorescence increases.

      (ii) Hemodynamics – changes in blood volume and changes in oxygenation/deoxygenation will change the absorption of light at different wavelengths, in an activity-dependent manner (also forming the basis of BOLD fMRI signals).

      (iii) Flavoproteins – endogenous fluorescent proteins, such as flavoproteins present at high levels in mitochondria, have been reported to change their fluorescence depending upon neuronal activity, presumably in relationship to increased mitochondrial activity.

      We therefore think it is very important to image GFP/tdTomato-expressing mice as controls, and we would suggest that this should be carried out more commonly in the field. Indeed, similar to our results, another study (Yogesh et al., eLife 2025, doi: 10.7554/eLife.104914) recently reported upon the importance of carefully examining intrinsic fluorescence changes, which were found to be present in both wide-field and two-photon imaging of GFP expressing mice.

      Our results reported in Figure 3 – figure supplement 1 show that GFP/tdTomato signals over the first ~120 ms following whisker stimulation were much smaller than the equivalent changes in GCaMP6f/jRGECO1a-expressing mice, and therefore would only have a minor contribution to our analyses. However, we refrained from analysing fluorescence changes at later post-stimulus times, because the intrinsic signals indeed become increasingly prominent as the mice initiate licking.

    1. Author response:

      The following is the authors’ response to the original reviews

      General note

      We have issued a new release of the general Peekbank database, 2026.1, which includes more data integrity checks and several more datasets. As a result of this release, the underlying dataset we use in our paper has shifted slightly. The shifts represent a relatively small proportion of the total data and thus these changes have caused only relatively minor changes to our numerical results. We also highlight that we now include a small amount of data regarding children younger than 12 months, increasing the developmental range of our analysis (see Figure 1).

      Reviewer 1 (Public review):

      The limitations of the study are acknowledged to some extent, but need to be improved and ensured that they run throughout the manuscript. Thus, in the discussion, the authors note that the approach is observational and exploratory, and highlight for me a key alternative explanation of the findings, namely that faster children could be faster due to their larger vocabulary, rather than faster children learning more words. Indeed, the latter explanation for the relationship is called into question, given that growth in speed was not related to growth in vocabulary. Here, the authors note that the null result may be related to the fact that they do not have sufficiently precise estimates of growth slopes, rather than taking the alternative explanation seriously that there may not be as causal a link between being a faster word learner and a better word learner (learn more words).

      Thank you very much for your challenging and thoughtful comments. In hindsight we did not realize that the way we were writing about our results was ambiguous between several interpretations (one of which we endorse and one of which we do not).

      We respond below to the specific suggestions about causal directionality in the longitudinal analysis, but we certainly believe that we cannot draw strong conclusions about causality from our dataset and have attempted throughout the paper to remove causal language that might have crept into our interpretation.

      In response to your comments, we have made a number of key revisions aimed at qualifying and clarifying our points:

      • The abstract now prominently notes that our design is observational: “In an observational study…”

      • The abstract notes a positive and a negative result in the relationship between word recognition and vocabulary: “Further, across a range of longitudinal models, speed, accuracy, and vocabulary were coupled. Children with overall faster word recognition tended to show faster vocabulary growth, though developmental growth in word recognition skill was not specifically associated with growth in vocabulary.”

      • The abstract removes potential causal language in the final sentence: “... these findings support the view that word recognition is a skill that develops gradually across early childhood and that this skill is deeply intertwined with early language learning.”

      • A new paragraph in the Results introduces the potential hypotheses investigated via the longitudinal models.

      • The final paragraph of the Results section sharpens the contrast between two possible growth hypotheses: “However, we did not find evidence for the stronger version of this claim: in neither the non-linear growth model nor the linear SEM did we find evidence that increases in speed were related to increases in vocabulary size. Thus, our findings do not support a ‘virtuous cycle’ model in which increases in recognition specifically lead to increases in vocabulary size.”

      We hope these changes lead to a manuscript that better aligns with the limitations of the study.

      This is especially since, but correct me if I’m wrong here, the current vocabulary size is not taken into consideration in the model examining vocabulary growth. Given the increasing number of studies showing that current vocabulary knowledge predicts vocabulary growth (Laing, Kalinowski et al, Siew & Vitevitch), one simple alternative explanation is that current vocabulary knowledge predicts both current word recognition skill and later vocabulary knowledge. Is there anything in the data speaking against this hypothesis?

      We think the reviewer’s overall point is generally correct, as we described above, but we want to clarify a specific statistical point. The non-linear longitudinal model of vocabulary growth does in fact take into account a child’s average vocabulary size. (This point feels tricky in a non-linear model but it’s actually quite similar to a linear model for the purposes of this discussion). Basically, vocabulary (at all timepoints) is modeled as a function of age, with both main effects and interactions with age. Critically, each participant is also modeled as having a random intercept capturing their deviation from the average growth pattern across ages (as expressed by the fixed effects). In this model, the “main effect” (here captured by the intercept for the logistic curve in the model) that we observe for speed indicates that vocabulary growth for individuals is predicted to be faster (their curve is shifted left) if their RTs are fast. The presence of the random effects in this model thus “controls” for the fact that some participants have overall higher vocabularies (and are shifted up relative to the average growth curve).

      But, we note that this model does not show an “interaction effect” (here captured by the null effect of RT on the slope parameter in the logistic model). That’s one of the null effects that we now call out much more prominently in the abstract and end of the results (per our response above).

      Equally, while the SEM examines vocabulary growth controlling for age, I wonder about the other way around. What would happen to the effect of age on word recognition skill (in the LME model, S8) if one were to add concurrent vocabulary size? So does chronological age explain word recognition skill or vocabulary knowledge? Right now, the manuscript describes this effect purely related to chronological age, but is it age per se or other cognitive abilities, including a key change across development, namely, vocabulary size? Thus, the presentation of the skill learning hypothesis suggests that age is a proxy for experience, while you actually have here a very nice proxy for experience in terms of children’s vocabulary size.

      Again, thank you for engaging with this tricky set of issues. Overall, our goal is to adjust the manuscript to reflect points of agreement; in particular, we agree that age is a proxy for language experience, vocabulary, and other cognitive changes, and we have stated this explicitly now in the intro to the factor analyses: “In our prior analyses, chronological age acts as a proxy for greater language experience and larger vocabulary as well as a host of other correlated developmental changes in cognition. Now we explicitly explore relations to vocabulary growth and the triadic relationship between age, word recognition, and vocabulary.”

      On the statistical side, we do think that the NLME (non-linear mixed effects; the logistic growth model) effectively controls for average vocabulary size, as described above. The longitudinal SEM also relates vocabulary growth to growth in word recognition skill. In both models, we find no evidence for coupled growth; instead the evidence points to children with higher baseline word recognition skill showing faster growth in vocabulary (speed intercept significantly related to vocabulary slope, -.14, p < .01) but not the reverse (vocabulary intercept not strongly related to speed slope; -.01, ns).

      More generally, we hope our edits to the paper, detailed above, both clarify this tricky set of issues and also remove inappropriate causal language throughout.

      Critically, while the discussion is more nuanced, the way the abstract is concluded and the way the Introduction is phrased suggest that the study is able to answer a causal question, which, as the authors themselves note, is not possible. The abstract, for instance, states that word recognition becomes faster, more accurate and less variable...consistent with a process of skill learning. And also that this skill plays a role in supporting early language learning, which is very causal language. I don’t think you can really claim that you are testing the two hypotheses you suggest here. The work is definitely embedded in the context of these hypotheses, but are you really able to test them? My worry is that while the discussion is more nuanced, the extent to which this study will then be cited down the line as showing that children learn more words down the line because they are faster at recognizing words, and anything that you can do to temper such interpretations would be good for the literature. For me, this should not just be relegated to the discussion but should be touched upon in the abstract and Introduction.

      Thanks for pushing us to be more precise with how we frame and describe our findings. We agree with the reviewer that our findings do not warrant strong conclusions about the causal role of word recognition skill in vocabulary growth. Per our response above, we have now tried to carefully revise our language throughout the paper (in particular, in the abstract and introduction, as noted by the reviewer).

      Finally, it would help to talk more about the mechanisms at work in any relationship between word recognition and language learning. It seems to me that this would rely on some predictive processing framework, given the description on page 4, and it would be good to make this clear (faster and more accurately you can recognize a ball, better use this evidence to infer the speaker’s intended meaning).

      Thanks, this is a great point. We’ve revised this text and added references to predictive processing, unpacking a problematic paragraph into two:

      “Familiar word recognition -- as measured by LWL -- is hypothesized to play a key role in language learning (19). The idea, in a nutshell, is that the faster and more accurately a child can process incoming words, the more opportunities they have for learning. Consider a child hearing the utterance "Can you put the ball in the crate?" The better the child can recognize the word "ball", the better they can use this evidence to help infer the speaker's intended meaning, allowing possible inferences about the meaning of the less familiar word, "crate" (20).

      “Real time language processing, including word recognition, relies heavily on predictive processing, in which comprehenders integrate expectations from prior linguistic context with noisy and ephemeral incoming signals (21, 22). The more input a child receives, the better their predictions are likely to be, and hence the more they can learn (19, 23). Indeed, measurements of children's language input at home are consistently associated with their vocabulary size (24, 25). And, in line with this predictive processing framework, one important study found that children's word recognition speed mediated the longitudinal relationship between home language input and vocabulary growth (26). Thus, word recognition is thought to be a key support for ongoing word learning.”

      Equally, when referring to word recognition, it would be good to clarify what this refers to - how well a child knows what a word refers to (and in the context of LWL, what it does not refer to) or how quickly it directs attention to what is referred to.

      Thanks, we’ve added a capsule definition in the second paragraph, and added the sentence “This procedure [LWL] measures the general construct of word recognition by operationalizing knowledge of a meaning as visual attention to a specific named referent.” We hope this clarifies the relationship between LWL and word recognition.

      With regards to the data, I wonder if there is a clustering of kids past 24 months that is happening here, looking at Figures 1 and 2, where it seems like there is less change past the 24-month point. Is there any way to look at whether the effect of age or vocabulary on word recognition is not linear but asymptotic?

      Thanks for pointing this out; we do see what you are talking about but think it’s being handled appropriately in the analysis. In Figure 1 it clearly looks like changes to RT are asymptotic – this is why we analyze the logarithm of RT throughout the paper. In Supplement S6 we show that reaction time is indeed best fit by a log-log function. Your question about Figure 2 asks whether there is further structure beyond the log-log fit; in Supplement S7 we show some analyses that suggest a polynomial fit is not better than the log-log fit; there is some small additional linear effect of age over and above the log-log fit, but it’s minor and pretty hard to interpret in our view.
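
      As an illustration of what a log-log fit means here, the following minimal sketch regresses log(RT) on log(age) by ordinary least squares. The function name and the synthetic power-law data are assumptions for illustration only; the actual models in the supplement may include additional terms and random effects.

```python
import math


def fit_log_log(ages, rts):
    """OLS fit of log(rt) = intercept + slope * log(age).

    Hypothetical helper. A negative slope with a good fit corresponds to a
    power-law decrease in RT that flattens out at older ages.
    """
    x = [math.log(a) for a in ages]
    y = [math.log(r) for r in rts]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Synthetic, noise-free example: RT (ms) falling as a power of age (months).
ages = [12, 18, 24, 36, 48]
rts = [3000.0 * a ** -0.5 for a in ages]
intercept, slope = fit_log_log(ages, rts)
```

      On the raw scale this relationship looks asymptotic, with most of the change before 24 months, while on the log-log scale it is a straight line.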

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Page 3. Word production may manifest in overt behaviour but need not reflect complete knowledge. A child can say the word dog and use it to refer to a cat.

      This is a good point. Since we are not able to speak to the precision of meaning representations (an important issue in its own right), we have omitted the phrase “with incomplete knowledge.”

      Page 4. The first two sentences of the paragraph beginning with word recognition ability... don’t go together. The second sentence does not support the claim that word recognition plays a role in language learning.

      Thanks, we’ve tried to smooth out this transition as part of unpacking the role of predictive processes.

      Page 4. “predicts children’s standardized test scores years later” - make clear what test scores are here.

      We added some additional details. The specific tests were the CELF (expressive language) and the KABC (IQ), but we thought too much detail might be distracting.

      Page 5. I love Table 1, but would like for the data to be weighted somehow. So, given that some studies had a lot more trials and more children, what percentage of the data did this study contribute? That allows a clearer view of how biased the sample is in certain studies. The x in CDIS and longitudinal could be aligned to the right. I kept wondering why there was an x near some trials.

      Thanks, we’ve adjusted the table to add the percentage of the total dataset (in trials) due to each study and fixed the alignment issue.

      Page 6. 12 million individual samples: what samples are these? Individual data points per trial per time point. Making this clear would be great.

      Clarified, thanks.

      Page 9. Your accuracy measures only seem to consider the target. From what I remember of my preferential looking days, this measure usually also includes the distractor. Why do you not do this? This is especially since you have such a wide age range, so if a 12-month-old only looks for about 50 per cent of the trial and spends that time looking at the target, that is very different from a child who looks at the screen all of the trial and spends less time looking at the target here.

      Sorry for any lack of clarity: we do in fact compute accuracy as the ratio of looking to target over looking to target plus looking to distractor. We have added this information to the parenthetical referenced above: “... accuracy (more target looking; computed as the ratio of target to target plus distractor looking)”.
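
      The ratio described above can be written as a short sketch. The per-sample labels ("target", "distractor", anything else) and the function name are hypothetical and do not reflect the actual Peekbank coding scheme.

```python
def trial_accuracy(looks):
    """Proportion of target looking out of target plus distractor looking.

    looks: sequence of per-sample gaze labels. Samples that are neither
    "target" nor "distractor" (e.g. looking away or missing) are ignored.
    Returns None if the child never looked at either image.
    """
    target = sum(1 for s in looks if s == "target")
    distractor = sum(1 for s in looks if s == "distractor")
    total = target + distractor
    return target / total if total else None


# A child who attends for only half the trial but only to the target ...
half_trial = ["target"] * 4 + ["away"] * 4
# ... versus a child who attends throughout and splits looking.
full_trial = ["target"] * 6 + ["distractor"] * 2
acc_half = trial_accuracy(half_trial)
acc_full = trial_accuracy(full_trial)
```

      Because time spent looking away drops out of the denominator, the measure is not penalized by overall attention to the screen, which addresses the reviewer's concern about younger children.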

      Page 12. I only found out that age was in this model by looking at S9.

      Thanks for mentioning this omission, we’ve clarified in the text: “We initially add age as an additional variable to our models to explore whether this factor structure relates to age; later we treat age as a predictor of latent factors.”

      Page 12. Isn’t it trivial that speed and accuracy show negative covariance, especially given how you measure accuracy? Thus, if I take longer to fixate the target, I have less time to look at the target during the trial. If, however, I included the distractor in my accuracy measure, then I could still take longer to look at the target, but still look more at the target than the distractor.

      Thanks for mentioning that this covariance is not the key result of interest; that observation didn’t come out in the text. Now we note that this covariation is “... as expected since they [speed and accuracy] are derived from the same data.” Note per above that accuracy is computed as target / (target + distractor) looking; even so, your observation is correct: slower looking at the target means lower accuracy at least to some degree.

      Page 19. If you excluded data from trials with less than 50% of timepoints, how did this vary across age? Arguably, your study has to worry less about this, given your sample size, but it would be nice to know, which you could include in the percentage of data that each study contributed to the final sample.

      Thanks, we’ve added this information to a new table in S1.

      Reviewer #2 (Public review):

      First, I wasn’t entirely clear about what the authors meant by “word recognition ability”. For much of the manuscript (including the use of the term “word recognition ability” itself), this comes across as an intrinsic ability or skill that improves with development. Alternatively, the speed and accuracy metrics taken from studies in Peekbank might capture children’s increasing knowledge of the common, concrete words typically used in these studies. To me, this is a somewhat different construct from a general skill at recognizing words. It would be helpful if the authors could clarify which construct they intend to capture, or if it is not possible to distinguish between these constructs from the Peekbank data.

      In response to this comment and related comments above, we’ve added text to the first two paragraphs trying to clarify the general construct that we’re talking about – recognizing the meaning of a word in real-time language comprehension. We’ve also clarified several times throughout the introduction that we’re talking about familiar word recognition, that is, the ability to recognize specific known words. Further, we directly acknowledge the issue above in the introduction:

      “Critically, most word recognition paradigms use words that children at the target age are reported to understand and produce. They are thus not indices of vocabulary size but rather measures of how quickly and accurately the child can recognize a familiar spoken word and use it to guide their visual attention to a referent. However, it is unknown the extent to which specific responses reflect an individual child's general speed of language processing versus their familiarity of specific words.”

      Second, and relatedly, if the source of the age-related improvements is increasing experience with the common concrete words used in the Peekbank studies, then one might expect word recognition and improvements with age to be related to word frequency, given that more frequent words are experienced more often. Word frequency predicts word knowledge when assessed using CDI data. Can effects of frequency be detected in Peekbank word recognition metrics? If not, why? Similarly, is the speed and accuracy of word recognition in Peekbank data related to CDI-derived word age of acquisition, and again, if not, why?

      This is a fascinating set of ideas, and one that we’ve pursued extensively using the Peekbank data. Unfortunately, we think it is out of scope for the current paper, which focuses on child-level metrics (including vocabulary and processing measures). Right now the current paper doesn’t include any analysis of individual words.

      Just to expand a bit on the problem here: unfortunately, modeling word recognition as a simple linear function of (log) word frequency is only possible in the case that distractors are held constant (e.g., “ball” always has “book” as its distractor), because distractor frequency plays an important role in the recognition process. However, in our dataset, words are paired with many different distractors across studies. This property means a fairly complex model of the LWL decision process would be necessary for a model to successfully predict effects for individual words. While such a model is an exciting research goal, it’s not something we can include in the current manuscript.

      Finally, there is a bit of a risk of the main findings of this paper coming across as a foregone conclusion. I.e., how could it be otherwise that word recognition improves with development?

      Reviewer #2 (Recommendations for the authors):

      Regarding the feedback about the risk of the findings coming across as a foregone conclusion - perhaps a primary place in the paper where it would be useful to clarify this point is on page 6, in the paragraph beginning, “We investigate two specific hypotheses here. First, one influential theory...”. Here, it might be worth clarifying whether there are alternative ideas about the emergence of word recognition in childhood that predict different patterns, so that the findings of the current paper can be framed as shedding new light on word recognition in development, rather than a confirmation of the common-sense idea that word recognition must improve over development.

      Thanks, we appreciate this feedback and it’s something we’ve struggled with in this project. Our conclusion is that this paper does not constitute a binary hypothesis test of e.g., whether word recognition is linked to vocabulary development. Instead, we lean into the idea that there are empirical issues (rather than hypotheses) that have not been quantified sufficiently. Thus, we end the revised introduction with the following paragraph:

      “Across both of these issues, the contribution of our work here lies in the detailed quantitative description of development. Nearly every theory of language learning assumes some role for continuous developmental change in word recognition, but these assumptions have not previously been anchored to specific measurements. Hence neither the functional form of the assumed changes nor their concurrent and predictive relationships to vocabulary have been quantified. We leverage the Peekbank dataset to accomplish these goals.”

    1. Author response:

      The following is the authors’ response to the original reviews

      The following revisions have been made to address most of the publicly available suggestions made by the Reviewers.

      We have also corrected formatting issues in two figure panels:

      Fig.1B: embryo ages added over placenta images.

      Fig. 4D: fixed a truncated label.

      Reviewer #1 (Public review):

      The study would benefit from clearer evidence and additional experiments that would help to establish the molecular and cellular mechanisms underlying the brain phenotype, the central topic of the work.

      We agree that additional experiments are necessary to elucidate the mechanism(s) by which EML3 deficiency causes the observed developmental phenotypes. However, as no further experimentation is possible due to the closure of our laboratory, we are committed to sharing available materials including custom antibodies and cryopreserved sperm from our mouse lines. We include previously generated experimental data not presented in the original submission. While these additional data do not reveal the mechanisms, we believe that sharing hypotheses that were experimentally ruled out will benefit the scientific community.

      M&M: we have added a section listing several tissue-specific Eml3 KOs generated. All of the generated cKO mice were indistinguishable from Eml3<sup>wt</sup> controls.

      Supp. Fig. 2 with staining for major PBM components has been added. We have added antibody information to M&M.

      Reviewer #2 (Public review):

      (1) While the manuscript presents valuable data, there are also several weaknesses that limit the overall impact of the study. Most notably, there is no clear mechanistic link established between the loss of Eml3 function and the observed phenotype, leaving the biological significance of the findings somewhat speculative, as it is not straightforward how a microtubule-associated protein can have an impact on the stability of the pial basement membrane. In this respect, but also in general for the whole manuscript, there seems to be a considerable amount of experimental work that has been conducted but is not presented, possibly due to the negative nature of the results. At least some of those results could be shown, particularly (but not only) the stainings for the composition of the ECM components.

      We agree that additional experiments are necessary to elucidate the mechanisms at play. While we cannot conduct further experiments, we provide additional existing data, including a new Supp. Fig. 2 showing ECM component staining. As this reviewer rightly anticipated, these results might not clarify the mechanism but sharing the hypotheses that were already experimentally tested will be helpful.

      (2) Additionally, the phenotype reported appears to be dependent on the genetic background, as it is absent in the CD1 strain. This observation raises concerns as to how robust the results are and how much they can be generalized to other mouse strains, but, more importantly, to humans.

      Indeed, we have determined that genetic background greatly influences the manifestation of developmental defects caused by absence or mutation of the EML3 protein in mice. Modifier genes appear to play a significant role in phenotypic expression. In humans, the presence or absence of such modifiers may result in a broad spectrum of outcomes from no clinical relevance, as seen in CD1 mice, to potential intrauterine mortality. We agree that this underscores the challenge of translating mouse model findings to human implications. Future studies could include a search for EML3 non-coding regulatory mutations and expanded analysis of neuronal development defects, such as COB, as well as cases of intrauterine growth restriction (IUGR).

      (3) There is no data included in the manuscript about the generation and analysis of the Eml3AAA/AAA mouse line. This is an important omission, especially as no details on the validation or phenotypic characterization of this additional mouse line are provided. Including these elements would greatly strengthen the rigor and interpretability of the work, especially if that mouse line is to be shared with the scientific community.

      We acknowledge this oversight and have added a Materials and Methods section describing the generation of Eml3 TQT86AAA mice. Validation of the Eml3 TQT86AAA mice included showing absence of EML3-DYNLL binding in our co-IP MS data in Table 3. We state that the validated Eml3 TQT86AAA mice were phenotypically indistinguishable from Eml3<sup>wt</sup> control mice.

      Reviewer #3 (Public review):

      (1) Besides the data provided in the figures, the authors report a significant amount of experiments/results as "Data not shown". Negative data is still important data to report, and the authors may want to choose some crucial "not shown data" to report in the manuscript.

      We have incorporated key datasets previously omitted, with priority given to those specifically requested by Reviewer #2.

      (2) Results in Figure 3A apparently contradict results in 3B. A better explanation of the results should improve understanding of the data. Even though the conclusion that the "onset and progression of neurogenesis is normal in Eml3 null mice" seems logical based on the data, the final numbers are not (Figure 3A) and this should be acknowledged, as well.

      We provide further explanations for the data presented in Figures 3A and 3B to better convey that the two datasets are not contradictory. In essence, since Eml3 null mice are developmentally delayed (as determined by the number of somites at a specific age, Fig. 1C), the milestones in neurogenesis are reached at a later age in Eml3 null mice; thus, at embryonic age E11.5, Eml3 null mice have fewer TBR2-positive cells (Fig. 3A). However, Eml3 null mice have reached the same neurogenesis milestones as their WT counterparts when they have the same number of somites (Fig. 3B).

      Results section for Fig. 3: we provide additional explanations that reconcile the results shown in Fig. 3A and Fig. 3B.

      (3) The authors should define which cell types are identified by SOX1 and PAX6.

      We have defined the expression timing and cell identity marked by SOX1 and PAX6 in neural progenitors during cortical development.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We would like to express our deep appreciation to the editor and reviewers for their constructive comments and suggestions, which have significantly improved the quality of our manuscript. In response, we have carefully revised the manuscript, addressed all comments, and performed additional experiments and analyses to strengthen our findings.

      (1) We repeated retrograde tracing using CTB-647 to verify precise targeting of SPN and DGC neurons, as shown in the new Figure 7.

      (2) We performed dual retrograde tracing combined with fiber photometry or optogenetic activation to investigate the role of PMC dual-projecting neurons in the control of urination, as shown in Figure supplements 11 and 12.

      (3) We conducted new experiments activating PMC<sup>ESR1+</sup> neurons after PDNx to assess their role in urination, as shown in new Figure 6.

      (4) We added a more detailed analysis of the dynamics of neural responses in PMC<sup>ESR1+</sup> neurons in Figure supplements 3F-3G.

      (5) We analyzed peak Ca<sup>2+</sup> signals in the PMC during and after the onset of EMG bursting, as shown in Figure supplement 4F.

      (6) We added a comparison of spontaneous and light-induced spikes in PMC<sup>ESR1+</sup> neurons, as shown in Figure supplements 3B–3C.

      (7) We expanded the Discussion to address how PMC<sup>ESR1+</sup> neurons coordinate bladder contraction and sphincter relaxation to control both the initiation and suspension of urination.

      We hope these revisions meet the reviewers' expectations and contribute to the improvement of our manuscript.

      Reviewer #1 (Public review):

      Summary:

      Urination requires precise coordination between the bladder and external urethral sphincter (EUS), while the neural substrates controlling this coordination remain poorly understood. In this study, Li et al. identify estrogen receptor 1-expressing neurons (ESR1+) in Barrington's nucleus as key regulators that faithfully initiate or suspend urination. Results from peripheral nerve lesions suggest that BarEsr1 neurons play independent roles in controlling bladder contraction and relaxation of the EUS. Finally, the authors performed region-specific retrograde tracing, claiming that distinct populations of BarEsr1 neurons target specific spinal nuclei involved in regulating the bladder and EUS, respectively.

      Strengths:

      Overall, the work is of high quality. The authors integrate several cutting-edge technologies and sophisticated, thorough analyses, including opto-tagged single unit recordings, combined optogenetics, and urodynamics, particularly those following distinct peripheral nerve lesions.

      We are grateful for your insightful and constructive comments, which affirmed the importance and technical depth of our work. Thank you for dedicating your expertise and time to reviewing our manuscript. Guided by your suggestions, we have revised the paper as detailed below.

      Weaknesses:

      (1) My major concern is the novelty of this study. Keller et al. 2018 have shown that BarEsr1 neurons are active during urination and play an essential role in relaxing the external urethral sphincter (EUS). Minimally, substantial content that merely confirms previous findings (e.g. Figures 1A-E; Figures 3A-E) should be moved to the supplementary datasets.

      Thank you for this valuable and constructive comment. We fully agree that the novelty of our study relative to Keller et al., 2018 must be made explicit. Keller et al. established that PMC<sup>ESR1+</sup> neurons are active during socially evoked urine-marking behavior (voluntary urination) and demonstrated their essential role in relaxing the EUS. Their study mainly focused on behavioral context and EUS relaxation. In contrast, our work addresses a distinct, mechanistic question: how these same neurons participate in reflexive, physiological urination and coordinate both bladder detrusor contraction and EUS relaxation.

      Novel aspects of the present study:

      (1) Temporal dynamics of PMC<sup>ESR1+</sup> neurons during reflexive micturition.

      Using opto-tagging and single-unit recordings, we reveal the precise firing pattern of PMC<sup>ESR1+</sup> neurons during reflexive voiding. Simultaneous fiber photometry, cystometry, and EUS-EMG recordings demonstrate that population-level activity of PMC<sup>ESR1+</sup> neurons precedes and tightly correlates with both bladder contraction and EUS relaxation, a coordination not previously demonstrated.

      (2) Causal role in reflexive urination.

      Manual closed-loop optogenetic inhibition at the onset of reflexive voiding acutely terminates EUS bursting and bladder contraction, immediately halting urine release.

      (3) Dual control of bladder and EUS.

      Optogenetic activation combined with selective pelvic or pudendal nerve transection shows that PMC<sup>ESR1+</sup> neurons drive both bladder contraction and EUS relaxation, revealing a coordinating role beyond EUS relaxation alone.

      (4) Anatomical substrate for coordinated control of bladder contraction and EUS relaxation in reflexive urination.

      Retrograde tracing identifies three spinal-projecting sub-populations: SPN-only, DGC-only, and dual-targeting neurons, providing a circuit-level explanation for the simultaneous control of bladder and EUS.

      Following your suggestion, panels that merely replicate Keller et al. (former Figures 1A–1E and Figures 3A–3E) have been moved to new Figure Supplements 1 and 7, respectively, so that the main figures now emphasize the new mechanistic findings.

      (2) I also have concerns regarding the results showing that the inactivation of BarEsr1 neurons led to the cessation of EUS muscle firing (Figures 2G and S5C). As shown in the cartoon illustration of Figure 8, spinal projections of BarEsr1 neurons contact interneurons (presumably inhibitory) that innervate motor neurons, which in turn excite the EUS. I would therefore expect that the inactivation of BarEsr1 should shift the EUS firing pattern from phasic (as relaxation) to tonic (removal of relaxation), rather than stopping their firing entirely. Could the authors comment on this and provide potential reasons or mechanisms for this finding?

      Thank you for this crucial comment. We apologize that the representative EUS-EMG traces in Figures 2G and S5C were too small to be clearly seen and that the corresponding description of the results was not sufficiently accurate. We have now replaced these EMG traces with enlarged versions (revised Figures 2G and S5C) and revised the corresponding Results section (lines 184, 197, 340-341). Based on the enlarged traces, we found that acute photoinhibition of PMC<sup>ESR1+</sup> neurons at the onset of phasic EUS-EMG bursting shifted the EUS firing pattern from large-amplitude phasic bursts to low-amplitude tonic firing. This suggests that ongoing activity of PMC<sup>ESR1+</sup> neurons is required to maintain phasic EUS bursting. A similar shift from phasic to tonic EUS-EMG activity during optogenetic silencing of PMC<sup>ESR1+</sup> neurons was reported by Keller et al., 2018 (Figure supplement 8C), confirming the reproducibility of the phenotype. We propose that this low-amplitude tonic activity may be mediated in part by a spinal reflex pathway that prevents urination (the guarding reflex), whereby the loss of PMC<sup>ESR1+</sup> neuron-mediated supraspinal facilitation reduces inhibition of spinal interneurons, leading to enhanced baseline excitability of EUS motor neurons in response to bladder afferent input during bladder distension (de Groat et al., Comprehensive Physiology. 2015, PMID: 25589273).

      (3) Current evidence is insufficient to support the claim that the majority of BarEsr1 neurons innervate the SPN but not DGC. The current spinal images are uninformative, as the fluorescence reflects the distribution of Esr1- or Crh-expressing neurons in the spinal cord, along with descending BarEsr1 or BarCrh axons. Given the close anatomical proximity of these two nuclei, a more thorough histological analysis is required to demonstrate that the spinal injections were accurately confined to either the SPN or the DGC.

      Thank you for raising this important concern. To rigorously verify that our spinal injections were confined to either the SPN or the DGC, we performed new retrograde-tracing experiments in ESR1-Cre and CRH-Cre mice. We injected a mixture of AAV-Retro-DIO-mCherry or AAV-Retro-DIO-EGFP with the retrograde tracer CTB-647 specifically into the SPN or DGC (Methods, lines 465-466). Only animals in which CTB-647 fluorescence was strictly limited to the target nucleus, without detectable spread to the adjacent region, were included in the analysis (new Figures 7A and 7E). These results confirm our original observation that PMC<sup>ESR1+</sup> neurons comprise three distinct spinal-projection subpopulations: one (19.0%) targeting the SPN, one (52.2%) innervating the DGC, and a third (28.8%) projecting to both regions (Results, lines 304–306; new Figures 7F–7H). In addition, the majority of PMC<sup>CRH+</sup> neurons project to the SPN but not the DGC (new Figures 7B–7D; Results, lines 297–301). We have assembled new Figure 7 using the newly acquired spinal images and the validated data.

      Reviewer #1 (Recommendations for the authors):

      From the abstract: "Anatomically, PMCESR1+ cells possess two subpopulations projecting to either the pelvic or pudendal nerve". I don't think these neurons directly project to either nerve.

      Thank you for this precise comment. We apologize for incorrectly stating that PMC<sup>ESR1+</sup> cells project directly to the pelvic or pudendal nerves. In the revised Abstract (lines 32–36) we have rephrased the sentence to clarify the actual anatomy: “Anatomically, PMC<sup>ESR1+</sup> neurons consist of three distinct spinal-projection-based subpopulations: one targeting the sacral parasympathetic nucleus (SPN), one innervating the dorsal gray commissure (DGC), and a third that projects to both regions, thereby enforcing the coordination of bladder contraction and sphincter relaxation in a rigid temporal sequence.” We trust this revision now accurately reflects the anatomical findings.

      Reviewer #2 (Public review):

      Summary:

      The authors have performed a rigorous study to assess the role of ESR1+ neurons in the PMC to control the coordination of bladder and sphincter muscles during urination. This is an important extension of previous work defining the role of these brainstem neurons, and convincingly adds to the understanding of their role as master regulators of urination. This is a thorough, well-done study that clarifies how the Pontine micturition center coordinates different muscle groups for efficient urination, but there are some questions and considerations that remain.

      Strengths:

      These data are thorough and convincing in showing that ESR1+PMC neurons exert coordinated control over both the bladder and sphincter activity, which is essential for efficient urination. The anatomical distinctions in pelvic versus pudendal control are clear, and it's an advance to understand how this coordination occurs. This work offers a clearer picture of how micturition is driven.

      We sincerely thank you for highlighting the rigor of our study and for recognizing the advance in understanding how PMC<sup>ESR1+</sup> neurons exert coordinated, anatomically segregated control over bladder and sphincter. We also appreciate the constructive suggestions that helped us further improve clarity, which we address point-by-point below.

      Weaknesses:

      The dynamics of how this population of ESR1+ neurons is engaged in natural urination events remains unclear. Not all ESR1+ neurons are always engaged, and it is not measured whether this is simply variation in population activity, or if more neurons are engaged during more intense starting bladder pressures, for instance. In particular, the response dynamics of single and doubly-projecting neurons are not defined. Additionally, the model for how these neurons coordinate with CRH+ neuron activity in the PMC is not addressed, although these cell types seem to be engaged at the same time. Lastly, it would be interesting to know how sensory input can likely modulate the activity of these neurons, but this is perhaps a future direction.

      Thank you for this insightful comment. First, we agree that not all ESR1+ neurons are consistently engaged during urination (Figure 1B). Because bladder pressure was not measured during the opto-tagging experiments, we cannot determine whether this reflects trial-to-trial variability in population activity or pressure-dependent recruitment of additional neurons. We speculate that stronger starting bladder pressures may recruit a larger subset of ESR1+ neurons, analogous to graded, pressure-dependent recruitment observed in peripheral sensory neurons (Bruns et al., J Neural Eng. 2011, PMID: 21878706; Marshall et al., Nature. 2020, PMID: 33057202).

      Second, using fiber photometry recording and optogenetic activation, we examined the dynamics of dual-projecting neurons in the PMC that were retrogradely labeled from the SPN and DGC. Their activity correlated with bladder contraction and sphincter relaxation, and optogenetic activation sequentially induced these events to trigger urination (see Recommendation #8). Although retrograde labeling captured only a subset of dual-projecting neurons, the results indicate that they coordinate bladder and sphincter activity.

      Third, previous studies suggest that PMC<sup>CRH+</sup> cells are associated with bladder contraction and likely serve as an integration center for context-dependent micturition behavior (Hou et al., Cell. 2016, PMID: 27662084; Ito et al., Elife. 2020, PMID: 32347794). We therefore propose that PMC<sup>CRH+</sup> cells establish the baseline conditions and contextual readiness for voiding, whereas PMC<sup>ESR1+</sup> cells act as the executive command to reliably initiate and execute the event.

      Finally, we agree that sensory inputs likely modulate PMC<sup>ESR1+</sup> neuron activity. Although this falls beyond the scope of the present study, it represents an important avenue for future investigation.

      Reviewer #2 (Recommendations for the authors):

      (1) In the introduction, the authors write that Keller 2018 only showed this ESR1 population to induce EUS relaxation, but those results also do show bladder contraction with photostimulation of this population. While the authors' work extends this finding in important ways, this should be acknowledged (line 60).

      Thank you for this important correction. We have now revised the Introduction to explicitly acknowledge that stimulation of neurons expressing estrogen receptor 1 (ESR1) in the PMC (PMC<sup>ESR1+</sup>) contributes to sphincter relaxation and increased bladder pressure (Introduction, lines 60-62), as originally reported by Keller et al., 2018.

      (2) I think a more detailed analysis of the dynamics of neural responses in the PMC ESR1 neurons would be valuable. For example: are the same cells always engaged before micturition, or do different populations activate on different trials? Can the authors comment on the half of the opto-tagged ESR1 population that is not firing during urination? Do they ever fire? A cell-by-cell analysis of which neurons are engaged over multiple trials would be very valuable to understand the dynamics of population activity. Figure 1H shows cumulative sessions, but what do single sessions look like?

      Thank you for these valuable comments. In response, we have performed refined single-trial analyses of neuronal activity, as detailed in the point-by-point replies below.

      For example: are the same cells always engaged before micturition, or do different populations activate on different trials?

      Among 11 PMC<sup>ESR1+</sup> units that showed urination-related excitation, 8 units exhibited a consistent firing increase in every voiding trial, whereas the remaining 3 increased their discharge in >78% of trials (Figure 1B; new Figure supplement 3F). Thus, the same PMC<sup>ESR1+</sup> cells are recruited repeatedly, rather than distinct populations being activated on different trials. We have added this clarification to Results (lines 106–108).

      Can the authors comment on the half of the opto-tagged ESR1 population that is not firing during urination? Do they ever fire? A cell-by-cell analysis of which neurons are engaged over multiple trials would be very valuable to understand the dynamics of population activity.

      Approximately half of the opto-tagged PMC<sup>ESR1+</sup> cells showed no increase in firing rate during urination, yet exhibited spontaneous spikes at other times (new Figure supplement 3G), confirming their electrical competence. Because the PMC also participates in defecation, uterine activity, and other pelvic functions (Rouzade-Dominguez et al., Eur J Neurosci. 2003, PMID: 14686905; Schellino et al., Frontiers in Neuroanatomy. 2020, PMID: 33013330; Quaghebeur et al., Auton Neurosci. 2021, PMID: 34391125), these ESR1+ neurons may serve functions other than urination. We have now added this cell-by-cell analysis and discussion to the manuscript (Results, lines 108-112).

      Figure 1H shows cumulative sessions, but what do single sessions look like?

      As shown in new Figure supplements 3F–3G, single-session raster plots reveal that PMC<sup>ESR1+</sup> neurons display consistent firing patterns across individual trials. Neurons whose firing rate increased during urination did so in most trials (Figure supplement 3F), whereas neurons unrelated to voiding remained silent or showed no discernible rate change during voiding across trials (Figure supplement 3G). These single-session observations are consistent with the cumulative population analysis shown in Figure 1H (new Figure 1B).

      (3) Supplemental Figure 4: It seems clear from this figure that NVCs are only occurring when the sphincter fails to engage. Can the authors quantify how often this is the case?

      Thank you for this important point. We have now quantified the occurrence of non-voiding contractions (NVCs) across all 229 bladder contraction events from 3 mice shown in Supplemental Figure 4. NVCs were observed exclusively when the external urethral sphincter failed to relax, accounting for 62/229 events (27.1%), whereas coordinated voiding contractions (VCs) occurred in the remaining 167 events (72.9%). These new data are presented in Figure supplement 4C.
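
      For transparency, the reported proportions can be re-derived from the event counts quoted above with a short arithmetic check (a sketch only; the variable names are ours for illustration and this is not the analysis code used for the figure):

```python
# Event counts as quoted in the response above.
total_events = 229   # all bladder contraction events from 3 mice
nvc_events = 62      # non-voiding contractions (sphincter failed to relax)

# Voiding contractions are the remainder.
vc_events = total_events - nvc_events

# Percentages rounded to one decimal place, as reported.
nvc_pct = round(100 * nvc_events / total_events, 1)
vc_pct = round(100 * vc_events / total_events, 1)

print(f"NVC: {nvc_events}/{total_events} = {nvc_pct}%")
print(f"VC:  {vc_events}/{total_events} = {vc_pct}%")
```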

      (4) Continuing from the above point: the authors say that the insufficient top-down drive or strength of activity from PMC ESR1 neurons is why NVCs occur. In looking closely, it also seems there is a small hump and subsequent increase in the calcium signal when the EUS bursting begins (particularly clear in Supplementary Figure 4). Could this instead mean that the bursting/urethral activity itself is feeding back onto the PMC to continue/enhance its activity, and it is instead the lack of sphincter bursting that results in the NVC? Could the authors analyze the signal during and after bursting starts? This model is consistent with one of the classic reflexes defined by Barrington, in which urethral fluid flow/activation enhances bladder contraction. The Figure 4 transection experiments do not fully answer this, as the authors are driving activity in the PMC at this time, but they could test this using PDN transection with fiber photometry recording.

      Thank you for this important point. We fully agree that EUS bursting may provide excitatory feedback to the PMC that sustains or even amplifies its activity, and that the absence of such feedback could underlie NVCs. To test this possibility, we re-analyzed the fiber-photometry traces aligned to the onset and offset of each EUS bursting (new Figure supplement 4). A small but consistent hump in the Ca<sup>2+</sup> signal appeared before bursting onset and the Ca<sup>2+</sup> signal continued to rise throughout the bursting (Figure supplement 4B, yellow arrow). The amplitude at bursting offset was significantly higher than both the NVC peak and the level recorded at bursting onset. These observations support the interpretation that urethral fluid flow/activation supplies excitatory feedback that reinforces PMC activity and bladder contraction, consistent with Barrington’s classic reflex. We have incorporated these new analyses into the revised manuscript (lines 145–155 and Figure supplement 4F).

      We agree that the positive-feedback loop described by Barrington’s classic urethra-to-bladder reflex is an intriguing mechanism. However, the PDN-transection experiment in Figure 4 was designed to determine if bladder contractions triggered by PMC<sup>ESR1+</sup> cells can proceed in the absence of sphincter bursting, not to evaluate this reflex. Incorporating simultaneous fiber-photometry recording into the PDN-transection experiment would therefore go beyond the scope of the present study. In future work we are keen to combine PDN transection with fiber photometry to further determine whether the urethra-to-bladder reflex contributes to the sustained PMC activity observed in our paradigm.

      (5) In Figure 4, is the timing of sphincter engagement different with ChR2 stimulation from what normally occurs? It appears that the bursting happens immediately upon activation whereas bladder contraction is a bit delayed.

      Thank you for this important observation. We have carefully re-examined the EMG traces from all animals shown in Figure 4. We confirm that the onset of sphincter bursting activity during ChR2 stimulation is indeed more rapid than during natural reflex voiding; nevertheless, the onset of phasic sphincter bursting during ChR2 stimulation remained delayed relative to the intravesical pressure rise (see Figure 8B).

      The immediate sphincter discharge visible in some trials was tonic EUS discharge or rare irregular bursting, not the typical EUS bursting. This tonic pattern corresponds to the spinal guarding reflex that suppresses urine leakage (Fowler et al., Nature Reviews Neuroscience. 2008, PMID: 18490916; Keller et al., Nature Neuroscience. 2018, PMID: 30104734). These segments were identified by their amplitude and spectral content and excluded from burst-onset analysis. Our analysis protocol therefore distinguishes tonic guarding activity from true phasic bursting, ensuring that only the latter was used to determine burst timing.

      (6) The explanation on line 299 about how spinal reflexes are impinging on this circuit is confusing. I agree that the bladder contraction stopping later than the EUS signal likely has something to do with spinal reflexes, but it seems this could instead be feedback from the urethral fluid flow, which continues bladder contractions (urethra-detrusor facilitative reflex). Could the authors clarify their thoughts here?

      Thank you for highlighting this ambiguity. We agree that the delayed cessation of bladder contraction could equally reflect either (1) the urethra-to-bladder facilitative reflex driven by ongoing urethral fluid flow or (2) spinal reflexes that we described. In the revised manuscript (Results, lines 343–349), we have re-worded the paragraph to make this dual possibility explicit, thereby avoiding an overly strong emphasis on spinal mechanisms alone.

      (7) A note on phrasing: the authors frequently say PMCESR1 cells drive sphincter relaxation, but then show an effect on sphincter bursting. Experienced readers might realize that relaxation and bursting are connected, but this might be confusing for readers and should be clarified in the text.

      Thank you for highlighting the potential ambiguity. We agree that the sentence “PMC<sup>ESR1</sup> cells drive sphincter relaxation” can seem paradoxical when our data show increased EUS bursting. In adult mice, the EUS does not remain continuously relaxed during voiding; instead, it generates rhythmic bursting composed of high-frequency spike clusters (active periods) alternating with low tonic activity (silent periods), resulting in rhythmic contractions and relaxations of EUS. This phasic activity acts as a pump that facilitates urine flow through the narrow rodent urethra (Kadekawa et al., Am J Physiol Regul Integr Comp Physiol, 2016, PMID: 26818058). The EUS bursting activity we recorded is consistent with the results reported in previous studies (Keller et al., Nat Neurosci, 2018, PMID:30104734; Ito et al., Elife, 2020, PMID:32347794).

      Consequently, when PMC<sup>ESR1</sup> neurons initiate bursting, they simultaneously generate the relaxation phases that separate the spikes. To make this explicit we have replaced the phrase “PMC<sup>ESR1+</sup> cells drive sphincter relaxation” with “PMC<sup>ESR1</sup> neurons trigger EUS bursting, which generates rhythmic sphincter contractions and relaxations.” (Results, page 7, lines 219-221). We have applied similar clarifications throughout the revised manuscript (Results, lines 125-129). We hope this revision eliminates any apparent contradiction.

      (8) The question remains as to which neurons (dual projecting, single projecting, or all?) are active in natural urination. This is possible to do through dual injection of retrograde virus in SPN and DGC that could coordinately turn on Gcamp, but this challenging experiment is perhaps beyond the scope of this paper. Even still, the authors could discuss their model for whether the dual- and single-projecting neurons are all engaged at once in a natural urination event. Do the authors have any data that could provide insight as to when these sub-populations are active? Results from the opto-tagging in Figure 1 (and comment #2 about single neuron firing properties) might provide a foundation for hypotheses or insights.

      Thank you for this valuable suggestion. We have now performed the experiment you proposed: dual injection of retrograde viruses (AAV-Retro-Cre and AAV-Retro-DIO-GCaMP6s) into SPN and DGC was used to selectively label PMC dual-projecting neurons, and a 200-µm optic fiber was implanted above the PMC to record their Ca<sup>2+</sup> dynamics during natural urination (Figure supplement 11A and Methods, lines 470–474, 652-655). Dual-projecting neurons exhibited robust activation throughout the entire voiding phase that was tightly correlated with intravesical pressure rise and EUS bursting (Figure supplements 11A–11H). However, technical limits of current retrograde tools preclude selective isolation of single-projecting (SPN-only or DGC-only) subsets for independent fiber-photometry recordings; injection restricted to one target unavoidably labels both single- and dual-projecting cells. We now state this technical limitation explicitly (Discussion, lines 426-430).

      Accordingly, in the revised Discussion (lines 389-406), we integrate fiber-photometry Ca<sup>2+</sup> signals with single-unit data from opto-tagged recordings to propose several testable, non-mutually-exclusive models for how dual- and single-projecting PMC<sup>ESR1+</sup> neurons are engaged during natural urination: “Based on population dynamics obtained by fiber photometry (Figures 1D-1H, Figure supplements 1A-1F, and Figure supplements 11A-11H) and single-neuron firing properties recorded via optrode (Figures 1A-1C), we propose several mechanistic models for the engagement of dual- and single-projecting PMC<sup>ESR1+</sup> neurons during natural micturition. One possibility is that all three populations (dual-projecting, SPN-projecting and DGC-projecting neurons) are co-activated, with the dual-projecting subset acting as a “bridging amplifier” that sustains rising bladder pressure while coordinating EUS relaxation. Alternatively, SPN-projecting neurons may be recruited first to initiate bladder contraction, followed by DGC-projecting neurons that evoke EUS bursting and facilitate urine entry into the urethra; once flow begins, the urethro-detrusor facilitative reflex could recruit dual-projecting neurons to further enhance voiding efficiency. In addition, contextual or state-dependent urination—such as scent-marking behavior characterized by multiple voiding events with smaller volumes than reflexive urination—may predominantly rely on sequential and cooperative activation of single-projecting neurons. Other recruitment sequences remain conceivable. Future studies combining diverse urination-related behavioral paradigms with simultaneous recordings from projection-specifically labeled PMC neurons will be required to validate and refine these models.”

      Reviewer #3 (Public review):

      Summary:

      The paper by Li et al. explored the role of estrogen receptor 1 (Esr1)-expressing neurons in the pontine micturition center (PMC), a brainstem region also known as Barrington's nucleus (Hou et al 2016, Keller et al 2018). First, the authors conducted bulk Ca2+ imaging and unit recording from PMCESR1 neurons to investigate how their activity correlates with voiding behavior in conscious mice and with bladder pressure and external urethral sphincter muscle activity in urethane-anesthetized mice. Next, the authors conducted optogenetic inactivation and activation of PMCESR1 neurons to confirm their contribution to voiding behavior, and also combined peripheral nerve transection with optogenetic activation to confirm the independent control of bladder pressure and the urethral sphincter muscle.

      We sincerely thank you for providing a thoughtful summary and insightful comments on our study.

      Weaknesses:

      (1) The study demonstrates that pelvic nerve transection reduces urinary volume triggered by PMC ESR1+ cell photoactivation in freely moving mice. Could the role of pudendal nerve transection also be examined in awake mice to provide a more comprehensive understanding of neural involvement?

      Thank you for this valuable suggestion. We conducted an additional experiment to determine the contribution of the pudendal nerve to PMC<sup>ESR1+</sup> neuron-driven voiding in awake mice. Bilateral pudendal nerve transection (PDNx) reduced the optogenetically evoked urine volume compared with sham-operated controls, yet photoactivation of PMC<sup>ESR1+</sup> neurons still reliably induced urination after PDNx (new Figure 6). Thus, bilateral integrity of the pudendal nerve is required for efficient PMC<sup>ESR1+</sup> neuron-driven voiding, most likely by transmitting the signals that entrain rhythmic EUS bursting. These data and experimental details have been incorporated into Figure 6, Results (lines 272–276), and Methods (lines 542–545).

      (2) While the paper primarily focuses on PMCESR1+ cells in bladder-sphincter coordination, the analysis of PMCESR1+-DGC/SPN neural circuits - given their distinct anatomical projections in the sacral spinal cord - feels underexplored. How do these circuits influence bladder and sphincter function when activated or inhibited? Also, do you have any tracing data to confirm whether bladder-sphincter innervation comes from distinct spinal nuclei?

      Thank you for this critical comment. To determine how PMC<sup>ESR1+</sup> neurons that target distinct sacral nuclei influence bladder–sphincter coordination, we first focused on the dual-projecting subset in a new experiment (Figure supplement 11 and Methods, lines 470–477, 652-655, 669-673). Dual retrograde virus injections into SPN and DGC selectively labelled PMC dual-projecting neurons, a subset of which are ESR1+. Fiber-photometry recordings showed that these cells were active during bladder contraction and sphincter relaxation (Figure supplements 11E-11H), whereas optogenetic activation reliably initiated urination: bladder pressure rose immediately and was followed by rhythmic EUS bursting (Figure supplements 11I-11N and 12B; Results, lines 309-313, 332-335). Thus, the dual-projecting sub-population is sufficient to coordinate bladder contraction with sphincter relaxation. Current retrograde tools do not allow selective isolation of single-projecting (SPN-only or DGC-only) subsets; injecting only one target unavoidably labels both single- and dual-projecting cells. Consequently, we cannot yet compare the functional impact of pure SPN-only versus DGC-only PMC populations. This limitation is now stated explicitly in the revised Discussion (lines 426–430).

      In our 2025 paper (Yan et al., Commun Biol, 2025, PMID: 40259086), we used PRV-based retrograde tracing to show that SPN and DGC constitute two separate spinal nuclei controlling the bladder and the EUS, respectively. Classic studies have reached the same conclusion (Yao et al., Nat Neurosci, 2018, PMID: 30361547; Karnup & De Groat, IBRO Reports, 2020, PMID: 32775758; Karnup, Auton Neurosci, 2021, PMID: 34391124). These citations and a concise summary have been added to the Results (lines 289–294).

      (3) Although the paper successfully identifies the physiological role of PMCESR1+ cells in bladder-sphincter coordination, the study falls short in examining the electrophysiological properties of PMC ESR1+-DGC/SPN cells. A deeper investigation here would strengthen the findings.

      Thank you for this thoughtful suggestion. While a detailed electrophysiological characterization of PMC<sup>ESR1+-DGC/SPN</sup> neurons would provide complementary information, the primary goal of the present study was to define the in vivo functional dynamics and behavioral role of these neurons during natural urination. As you suggested, further electrophysiological analysis of PMC<sup>ESR1+-DGC/SPN</sup> neurons will be an important direction for our future work.

      (4) The parameters for photoactivation (blue light pulses delivered at 25 Hz for 15 ms, every 30 s) and photoinhibition (pulses at 50 Hz for 20 ms) vary. What drove the selection of these specific parameters? Moreover, for photoactivation experiments, the change in pressure (ΔP = P5 sec - P0 sec) is calculated differently from photoinhibition (Δpressure = Ppeak - Pmin). Can you clarify the reasoning behind these differing approaches?

      Thank you for this opportunity to clarify our experimental design. The photoactivation protocol (25 Hz, 15 ms pulses) was chosen because PMC<sup>ESR1+</sup> neurons faithfully follow this frequency without depolarisation block and it reliably triggers voiding (Keller et al., Nat Neurosci, 2018, PMID:30104734). For photoinhibition we originally stated “50 Hz, 20 ms pulses”, but this was an error. Consistent with the same study (Keller et al., Nat Neurosci, 2018, PMID:30104734), we used continuous light (constant illumination) to maintain sustained suppression. The Methods section has been corrected (lines 659-661, 690-691).

      The ΔP formula was tailored to the temporal profile of each manipulation. For activation, ΔP (P<sub>5 sec</sub> - P<sub>0 sec</sub>) captures the rapid pressure rise after light onset; the same window was used in (Hou et al., Cell. 2016, PMID: 27662084). For inhibition, because saline infusion produces rhythmic reflex voiding, we delivered light at the onset of EUS bursting (i.e. when pressure was already at ~peak). Inhibition abruptly stops the bladder contraction, so the bladder cannot return to its pre-void baseline. The Δpressure (P<sub>peak</sub> – P<sub>min</sub>) was therefore used to quantify the extent to which the ongoing pressure wave was aborted by photoinhibition. P<sub>min</sub> is the lowest value reached before the next infusion-driven upswing, making the metric insensitive to the slow baseline drift produced by continuous infusion. These clarifications have been added to the Methods (Methods, lines 676-677, 679-680, 692-693).
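      The two pressure metrics described above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' analysis code: the function names, the nearest-sample lookup, and the `next_upswing` argument (the time at which the next infusion-driven rise begins, supplied by the user) are hypothetical, and the traces are made-up numbers.

```python
def delta_p_activation(times, pressure, onset, window=5.0):
    """Pressure change from light onset to `window` seconds later (P_5s - P_0s)."""
    def value_at(t):
        # Use the sample closest in time to t.
        idx = min(range(len(times)), key=lambda i: abs(times[i] - t))
        return pressure[idx]
    return value_at(onset + window) - value_at(onset)


def delta_p_inhibition(times, pressure, light_on, next_upswing):
    """Peak minus minimum pressure (P_peak - P_min) between light onset
    and the start of the next infusion-driven upswing."""
    segment = [p for t, p in zip(times, pressure) if light_on <= t <= next_upswing]
    if not segment:
        raise ValueError("no samples between light_on and next_upswing")
    return max(segment) - min(segment)


# Made-up traces sampled once per second (arbitrary pressure units).
t_act = list(range(11))
p_act = [5, 5, 8, 12, 18, 25, 30, 31, 30, 28, 25]
dp_act = delta_p_activation(t_act, p_act, onset=0)

t_inh = list(range(9))
p_inh = [30, 28, 20, 12, 8, 6, 7, 9, 12]
dp_inh = delta_p_inhibition(t_inh, p_inh, light_on=0, next_upswing=5)

print("activation dP:", dp_act)
print("inhibition dP:", dp_inh)
```

      Because the inhibition metric is referenced to the minimum within the same cycle rather than to a fixed pre-void baseline, a slow drift in baseline across cycles does not enter the difference, which matches the rationale given in the response.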

      (5) The discussion could further emphasize how PMCESR1+ cells coordinate bladder contraction and sphincter relaxation to control urination, highlighting their central role in the initiation and suspension of this process.

      Thank you for this valuable comment. We have revised the Discussion to emphasize that PMC<sup>ESR1+</sup> neurons coordinate urination by sequentially driving bladder contraction followed by sphincter relaxation through their dual projections to the SPN and DGC. We also emphasized that this coordination is essential for the initiation and effective execution of voiding (Discussion, lines 369-388). In addition, in the revised Discussion (Discussion, lines 389-406), we integrate fiber-photometry Ca<sup>2+</sup> signals with single-unit data from opto-tagged recordings to propose several testable, non-mutually-exclusive models for how PMC<sup>ESR1+</sup> cells are engaged during natural urination.

      (6) In Figure 8, the authors analyze the temporal sequence of bladder pressure and EUS bursting during natural voiding and PMC activation-induced voiding. It would be acceptable to consider the existence of a lower spinal reflex circuit; however, the interpretation of the data contains speculation. It is hard to say that bladder pressure measurement reflects efferent pelvic nerve activity in real time. (As a biological system, bladder contraction is mediated by smooth muscle and does not reflect real-time efferent pelvic nerve activity. As an experimental set-up, bladder pressure measurement has some delay because of the tubing, whereas EUS bursting has no delay.) Especially for the inactivation experiment, these factors would contribute to the interpretation of the data. This reviewer recommends rewriting the section with these limitations in mind. Most of the section is suitable for the Results.

      We agree with the reviewer that bladder pressure, mediated by smooth muscle contraction, provides an indirect measure of efferent pelvic nerve activity and is subject to both physiological and experimental delays. Regarding potential delay from the tubing system, pressure propagates in fluid at approximately 1000 m/s (Kela & Pekka, Proceedings of World Academy of Science Engineering & Technology, 2009, DOI: 10.5281/zenodo.1080526). Given that the total tubing length in our setup is 0.5-1 m, the estimated transmission delay is only 0.5-1 ms, which is negligible compared with the observed time difference (~700 ms) between the cessation of EUS bursting and the termination of bladder contraction. Pressure transmission through the tubing is therefore not expected to introduce a meaningful temporal delay. However, we cannot exclude the possibility that the pressure measurement itself imposes a delay, because bladder pressure does not necessarily reflect efferent pelvic nerve activity in real time. Future studies using simultaneous recordings of bladder pressure and pelvic nerve discharges will help clarify whether a true temporal delay exists. We also agree that additional physiological or peripheral factors may contribute to this difference in timing. As suggested by the reviewer, we have revised the discussion to consider the potential influence of other factors, such as the urethra-detrusor facilitative reflex (Results, lines 343-349).
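      The delay estimate above is a simple length-over-speed calculation. A minimal sketch, using only the values quoted in the response (wave speed of about 1000 m/s, tubing length of 0.5-1 m, observed gap of about 700 ms), is given here; the function name is illustrative, and the calculation covers only the propagation time of the pressure wave, not any other property of the recording system.

```python
def transmission_delay_ms(tubing_length_m, wave_speed_m_per_s=1000.0):
    """Time for a pressure wave to travel the tubing, in milliseconds."""
    return 1000.0 * tubing_length_m / wave_speed_m_per_s


observed_gap_ms = 700.0  # approximate gap between end of EUS bursting and end of contraction

for length_m in (0.5, 1.0):
    delay_ms = transmission_delay_ms(length_m)
    share = 100.0 * delay_ms / observed_gap_ms
    print(f"{length_m} m tubing: {delay_ms:.2f} ms delay ({share:.2f} % of the observed gap)")
```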

      Reviewer #3 (Recommendations for the authors):

      (1) In opto-tag experiments, a comparison of average AP waveform during behavior and during light stimulation should be included as criteria. It should be mostly the same waveform.

      Thank you for bringing this to our attention. We have now added this comparison as an inclusion criterion in the revised manuscript. Figure supplement 3B shows representative examples of the average waveforms, and Figure supplement 3C displays the distribution of correlation coefficients between spontaneous and light-evoked spikes for all recorded PMC<sup>ESR1+</sup> units, all of which exhibited r > 0.8.
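      A waveform-similarity criterion of this kind can be sketched as a Pearson correlation between the mean spontaneous and mean light-evoked spike waveforms of a unit. The code below is an illustration under assumptions: the response states only that all units had r > 0.8, so the use of Pearson correlation on mean waveforms, the function names, and the example waveforms are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("waveforms must have the same number of samples")
    mean_a, mean_b = mean(a), mean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def passes_waveform_criterion(spontaneous, evoked, threshold=0.8):
    """True if the two mean waveforms correlate above the threshold."""
    return pearson_r(spontaneous, evoked) > threshold


# Made-up mean waveforms (arbitrary units), eight samples each.
spontaneous = [0.0, -0.2, -1.0, -0.4, 0.3, 0.5, 0.2, 0.0]
evoked = [0.0, -0.25, -0.9, -0.35, 0.25, 0.45, 0.2, 0.05]
unrelated = [0.0, 0.5, 0.3, -0.2, -1.0, -0.4, 0.1, 0.0]

print("same unit:", passes_waveform_criterion(spontaneous, evoked))
print("different shape:", passes_waveform_criterion(spontaneous, unrelated))
```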

      (2) Optical fiber implantation seems to be done in two different methods. In Figure 1 and Figure 2, the fiber tip is positioned just above PMC but in Figure 3 it seems to be angled. The information should be included in the Methods section.

      Thank you for this important comment. We have now clarified in the Methods that for Figures 1 and 2, the optical fibers were implanted vertically above the PMC, whereas for Figure 3, the left optical fiber was implanted at a 33° lateral angle targeting the PMC (Methods, lines 499-503).

      (3) In the closed-loop inhibition experiments of Figure 2, the parameters to start closed-loop photo-inactivation were not described in the method. If it is a manual closed loop, it should be described clearly.

      Thank you for raising this important point. We apologize for omitting these details in the original Methods. We have now added a complete description of the manual closed-loop photo-inhibition protocol, including the triggering criteria and operator-controlled timing, in the revised Methods section (lines 602–605).

      (4) In Figure 7A/E the authors provide a spinal cord image to show the injection site, but the image is misleading. The figure only shows AAV-infected CRH/ESR1 neurons in the spinal cord section. It does not indicate the AAV injection site or the terminal distribution.

      Thank you for your important comment. We apologize for providing a spinal cord image that did not accurately depict the injection site. To rigorously verify that our spinal injections were confined to SPN or DGC, we performed new retrograde-tracing experiments in ESR1-Cre and CRH-Cre mice. A mixture of AAV-Retro-DIO-mCherry or AAV-Retro-DIO-EGFP with the retrograde tracer CTB-647 was injected specifically into SPN or DGC. Only animals in which CTB-647 fluorescence was strictly limited to the target nucleus, without spread to the adjacent region, were included (new Figures 7A and 7E). These data confirmed our original observations and have been pooled in Figure 7. The manuscript and figure have been updated accordingly (Results, lines 297-301, 304-306; Methods, lines 465–466).

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank both editors and the three reviewers for their constructive criticism of our work. As a result of these comments, we have made several significant revisions to the paper that we believe strengthen and clarify our major results:

      (1) Following suggestions from Reviewers #1 and #3, we have improved our introduction to the different fitness concepts (lines 105–148) and streamlined the discussion of the logit encoding (lines 175–190). In particular, we have moved the most technical points to the SI (Sec. S3).

      (2) Based on criticisms of our usage of the population dynamics model from Reviewers #1 and #3, we significantly revised our explanation of the motivation and interpretation of this model (lines 284–310 and 323–336) and our discussion of the generalizability of these results (lines 678–728), including the possible effects of interactions besides resource competition.

      (3) Following a request from Reviewer #3, we have expanded our analysis of epistasis to systematically test all possible double mutants between qualitative types of trait perturbations in the model. We have added a new main text figure (Fig. 3), new SI figures (Figs. S9–S15), a new subsection in the Results (lines 344–395), and corresponding new sections in the Methods (lines 864–892) and SI (Sec. S8).

      (4) Following concerns from Reviewers #2 and #3 about the limited empirical data, we have expanded our analysis of the LTEE data (new main text Fig. 4, revised text on lines 416–439, and revised SI Figs. S16–S18) and have analyzed two new benchmarking datasets for bulk fitness to test our predictions (new main text Fig. 6, new Results subsection on lines 561–590, and new SI Figs. S24 and S25).

      (5) Following the criticism of Reviewer #3 about the lack of a clear recommendation on fitness quantification that provides the greatest value for a given scientific question, we have better explained what we think the scientific consequences of fitness are as a motivation for our analysis (lines 82–88, 319–322, and 615–630) and replaced the final flowchart figure with a step-by-step guide in the Methods to implement our recommendations in practice (lines 964–982).

      Reviewer #1 (Public review):

      The authors point out that the fitness estimates obtained from different experimental assays (monoculture, pairwise competition, or bulk competition) are not generally equivalent, not even with regard to the fitness ranking of different genotypes. Using a computational model based on experimentally measured growth phenotypes for knockout strains in yeast, as well as data from Lenski’s Long Term Evolution Experiment (LTEE), they derive a set of best practice rules aimed at extracting the optimal amount of information from such experiments.

      The study is very complete on a technical level and I have no suggestions for further analyses. However, I feel the readability and the conceptual focus of the manuscript could be significantly improved by rearranging the material with regard to the contents of the main text vs. the Methods and the Supplement. Detailed recommendations:

      (1) Regarding readability, the large number of references to material in the Methods and Supplement fragment the main text and make it difficult to follow.

      We understand the challenges these references pose to the flow of the main text; we have attempted to keep those references to a minimum, while ensuring that technical details of the work are fully documented and referenced for completeness.

      (2) Conceptually, it seems to me that the current presentation obscures the reasons why we should care about fitness in the first place. In the first paragraph of Results, the authors define fitness “as any number that is sufficient to predict the genotype’s relative abundance x(t) over a short-time horizon”. To me, this seems like an extremely narrow and not very interesting definition. Instead, I view fitness as an intrinsic property of a genotype that allows us to predict its performance under a range of conditions, including in particular conditions that are different from the experimental setup that was used to obtain the fitness estimates. The latter viewpoint is well expressed in Supplementary Section S1, where the authors discuss the notion of fitness potential. I would recommend to move at least part of this discussion to the main text.

      We appreciate the reviewer’s viewpoint and have moved that conceptual discussion from the SI to the beginning of the Results section to give readers a broader perspective on fitness (lines 105–148). We use “potential” in analogy with potential energy in physics and have clarified this on lines 126–135.

      What we call fitness potential, like the other notions of fitness we discuss in this paper (relative and absolute fitness), is still specific to an environmental condition. Fitness as a property intrinsic to a genotype and independent of any environment, as the reviewer mentions, is an interesting concept but beyond the scope of this paper, which is focused on analyzing fitness measurements that are inevitably environment-specific; we have clarified this on lines 142–148. While it is true that this definition of fitness is narrow, it is what can be empirically measured directly, and thus we believe it is crucial to understand how best to interpret such data.

      By comparison, the arguments in favor of the logit encoding that currently opens the Results section are rather straightforward and could be shortened significantly.

      We agree and have condensed this section (lines 175–192).
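      For readers unfamiliar with the logit encoding of relative abundance, a minimal sketch follows. It assumes one common form of a relative fitness statistic, namely the change in logit relative abundance per unit time; the exact statistic used in the manuscript is not given in this response, so the function names and the example abundances are illustrative only.

```python
from math import log


def logit(x):
    """Logit of a relative abundance x, defined for 0 < x < 1."""
    if not 0.0 < x < 1.0:
        raise ValueError("relative abundance must lie strictly between 0 and 1")
    return log(x / (1.0 - x))


def relative_fitness_logit(x_start, x_end, elapsed):
    """Change in logit relative abundance per unit of elapsed time (assumed form)."""
    return (logit(x_end) - logit(x_start)) / elapsed


# A genotype rising from 50 % to 60 % of the population over one time unit.
s = relative_fitness_logit(0.5, 0.6, 1.0)
print(f"logit-encoded relative fitness: {s:.4f}")
```

      Under this encoding, a genotype at abundance x and its competitor at abundance 1 - x receive values of equal magnitude and opposite sign, which is one reason the logit is a convenient scale for pairwise comparisons.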

      (3) Similarly, the modeling strategy used in this work is quite subtle and needs to be explained more fully in the main text. The authors use growth traits (lag time, growth rate, and yield) extracted from monoculture experiments on a yeast knockout collection and feed them into a specific mathematical model to simulate pairwise and bulk competition scenarios. Since a key claim of the work is that monoculture experiments are generally poor predictors of competitive fitness, the basis for this conclusion and the assumptions on which it is based need to be described clearly in the main text. In the current version of the manuscript, this information has been largely relegated to the Methods section.

      We agree that our motivation for the population dynamics model and growth curve data was not clearly explained. We have significantly revised this section of the Results in the main text (lines 284–310).

      In particular, we recognize the potential for misunderstanding this material: we do not intend the relative fitness values calculated from this model to be interpreted as predictions of the true relative fitness between yeast deletion strains. Rather, we use the population dynamics model for our proof of principle: that the most basic features of microbial population dynamics in laboratory experiments, as captured by this model (resource competition, lag phase, growth phase, saturation), are sufficient to create discrepancies between common fitness statistics used in these experiments (different encodings, time scales, choices of reference subpopulations). We have added a statement to highlight existing work on monoculture predictors for competition outcomes [32, 34, 36, 37] on lines 453–459.

      Reviewer #1 (Recommendations for the authors):

      In the discussion of the LTEE in Section S8, the authors write on page 8 that “we couldn’t fit the fitted values a,b in ref. 29 so we were unable to check it”. I don’t understand this sentence - is the claim that the fit in ref. 29 was incorrect?

      We have clarified this point in the SI (now Sec. S9). Our point was not that the fit in Wiser et al. 2013 is incorrect, but merely that we could not find the exact values of the fitted parameters they obtained documented in their paper, so we could not compare our own fitted parameters directly to theirs.

      Also, at the end of the section, the authors refer to theory work on the long-term fitness trend in the LTEE. Here, two early references arguing for a logarithmic increase in fitness could be mentioned as well:

      International Journal of Modern Physics B 12:361-391 (1998) Evolution and Extinction Dynamics in Rugged Fitness Landscapes Paolo Sibani, Michael Brandt, and Preben Alstrøm

      J. Stat. Mech. (2008) P04014 Evolution in random fitness landscapes: the infinite sites model Su-Chan Park and Joachim Krug

      We thank the reviewer for providing these two references and have added them to the list of previous works on long-term fitness trends at the end of the section (now Sec. S9).

      Reviewer #2 (Public review):

      Summary:

      The manuscript “Quantifying microbial fitness in high-throughput experiments” provides a comprehensive analysis of the various approaches to quantifying fitness in microbial evolution, focusing on three primary factors: encoding of relative abundance, time scale of measurement, and the choice of reference subpopulation. The authors systematically explore how these choices impact fitness statistics and provide recommendations aimed at standardizing practices in the field. This manuscript aims to highlight the impact of differing fitness definitions and the methodologies utilized for analysis and how that can significantly alter interpretations of mutant fitness, affecting evolutionary predictions and the overall understanding of genetic interactions in the experiments. Although this manuscript focuses on a critical issue in the quantification of fitness in high-throughput experiments, it relies heavily on only one experimental dataset (Warringer et al. 2003) and one organism, i.e., yeast (Saccharomyces cerevisiae) grown in a defined medium, so the environmental influence is not completely captured. While the theoretical framework is strong, more experimental examples with more organisms (i.e., more datasets) in their analysis and comparison would enhance the manuscript, especially its conclusion.

      We have expanded our analysis of competition data from the Long-Term Evolution Experiment in E. coli (lines 416–439), including adding a main text figure (Fig. 4) along with the three SI figures (Figs. S16–S18). We have also added two completely different data sets that directly test our predicted discrepancies in fitness estimates from bulk competition experiments. From this data we have added a new main text figure (Fig. 6), two new SI figures (Figs. S24 and S25), and a new section at the end of the Results (lines 563–590).

      We wish to clarify, though, that the aim of this study is to develop theory on fitness quantification choices and minimal examples to demonstrate the potential for discrepancies between these choices. While we appreciate the reviewer’s interest in understanding how discrepancies in fitness statistics vary across organisms and environments, that is an empirical question beyond the scope of this paper.

      Strengths:

      The choices for quantifying fitness in evolution experiments are critical and highly relevant given the increasing prevalence of high-throughput experiments in evolutionary biology. The authors methodically categorize fitness statistics and their implications, providing clarity on a complex subject. This structured approach aids in understanding the nuances of fitness measurement. The manuscript effectively highlights how different choices in fitness measurement can influence fitness rankings and the understanding of epistasis, which is important for modeling evolutionary dynamics.

      Weaknesses:

      The theoretical framework is robust, but the manuscript could benefit from more empirical examples to illustrate how different fitness quantification methods lead to varied conclusions in experiments.

      Please see our response to the previous comment on this point.

      The discussion on the choice of reference subpopulation could be expanded with the influence of the environment or the condition. Different types of reference groups might yield different implications for fitness calculations, and further elaboration would enhance this section.

      While we agree that studying how environmental conditions affect fitness is an important and interesting problem, it goes beyond the scope of this paper, which focuses on the basic theory of quantifying microbial fitness from high-throughput experiments. Applications of this theory to empirical questions about environmental variation would be best served by their own studies. We have added a statement clarifying this goal (lines 144–148).

      We are unsure how the choice of reference subpopulation is related to this issue. In our view, if the goal of a mutant fitness measurement is to predict how that mutant would behave when arising spontaneously and competing against its immediate ancestor, the gold-standard reference subpopulation must always be the mutant’s immediate ancestor, or another mutant that is known to be phenotypically equivalent to the ancestor (e.g., neutral mutants in the case of a large mutant library). Other choices of reference subpopulations would not provide directly meaningful information in this regard.

      The authors overgeneralize some findings; for instance, the implications of fitness measurement choices could vary significantly across different microbes or experimental conditions. A more detailed discussion would strengthen the conclusion.

      We certainly agree that the consequences of fitness quantification choices could vary significantly across organisms and environments; our goal for this paper is to demonstrate what discrepancies are possible in principle and in particular how they depend on basic features of microbial population dynamics (e.g., variation in yield). We have added two separate paragraphs in the Discussion section to address the generalizability of our results in the context of pairwise (lines 678–710) and bulk fitness measurements (lines 711–728).

      Overall, this manuscript is a significant contribution to the field of evolutionary biology, addressing a critical issue in the quantification of fitness but lacks more experimental support to make it a wider claim. By systematically exploring the factors that influence fitness measurements, the authors provide valuable insights that can guide future research - the framework is computationally thorough but needs a more detailed explanation of concepts instead of generalizing.

      We have improved our explanation of several of the important concepts. In particular, we have significantly revised our explanation of the population dynamics model (lines 284–310) to emphasize its role as a null model to demonstrate how fundamental aspects of microbial growth are sufficient to cause discrepancies between fitness statistics. We have also revised two paragraphs on the generalizability of our results in the Discussion section (lines 678–728).

      Further work is needed, particularly to incorporate empirical examples and expand certain discussions to include environmental variation and their impact, which would improve clarity and applicability.

      We have added a sentence at the beginning of the Results section to acknowledge the environmental dependence of fitness (lines 142–148). We believe further discussion of that issue is beyond the scope of this paper, as it would require a significant amount of additional data and/or environmental modeling.

      Reviewer #2 (Recommendations for the authors):

      In addition to the comments from the previous sections, other specific comments:

      (1) Figure 5 needs to be populated with additional parameter details. For example, include brief descriptions of each parameter involved in the encoding, time scale, and reference choices. This will help users understand the implications of each choice. Adding these details will make the flow diagram more comprehensive, aiding researchers in implementing these steps more clearly.

      Following this comment and another comment about this figure from Reviewer #3, we decided to replace this figure with a new Methods section with step-by-step instructions (lines 964–982).

      (2) Duplication in Line 620: “Nevertheless, the fact that we see the fact that we see...” This redundancy needs to be corrected.

      We thank the reviewer for pointing this out; we have rewritten this paragraph.

      (3) More experimental data comparisons and their assessment concerning various microbial systems and multiple environmental conditions are recommended to support the claim.

      Please see our responses to the related public comments.

      Reviewer #3 (Public review):

      Summary:

      The authors present analyses of different fitness measures derived from empirical data from yeast knockout mutants and the long-term evolution experiment (LTEE) with Escherichia coli to explore discrepancies and identify preferred methods to estimate relative fitness in high-throughput experiments. Their work has three components. They first discuss the different “encodings” of relative abundance data and conclude that logit transformations are preferred because they transform nonlinear abundance trajectories into linear trajectories with greater predictive power. Next, they compare per-generation with per-growth cycle relative fitness estimates inferred from simulations of pairwise competitions based on published growth traits for the yeast strains and on published pairwise competition measurements for the LTEE data. Both data sets show quantitative and qualitative (i.e. rank order) discrepancies of estimates across different time scales, which are highlighted by considering possible underlying causes (i.e. trade-offs between growth traits) and consequences (i.e. epistasis among mutations affecting different growth traits). Finally, the authors compare simulated pairwise and bulk (i.e. where many mutants compete during a growth cycle in a single environment) competition assays based on the yeast knock-out mutants and demonstrate an optimal ratio of collective mutants to wild-type strains that minimizes both sampling error and overestimation of fitness estimates when compared with pairwise competitions.

      Strengths:

      The study deals with a highly relevant topic. Fitness is central to general evolutionary theory, but also poorly defined and implies different traits for different organisms and conditions. For microbes, which are often used in evolution experiments, high-throughput experiments may yield different measures to quantify abundance over time, from individual growth traits to bulk competition experiments. Hence, it is relevant to consider discrepancies among those measures and identify preferred measures with respect to predicting population dynamics and evolutionary processes. The present study contributes to this aim by (i) making readers aware of differences among commonly used fitness estimates, (ii) showing that simulated (yeast) and calculated (E. coli) competitive fitness may differ across time scales, and (iii) showing that bulk competitions may yield relative fitness estimates that are systematically higher than pairwise competitions. The study is rather thorough on the theory side, with extensive derivations and analyses of various fitness measures using their resource competition model in the Supplementary Information. The study ends with a few practical recommendations for preferred methods to infer relative fitness estimates, that may be useful for experimentalists and stimulate further investigations.

      Weaknesses:

      The study has several limitations. Perhaps the most apparent limitation is the lack of a clear answer to the question of which fitness measure is best “in the light of first principles”. The authors show clear discrepancies between fitness estimates across different time scales or using different reference genotypes in bulk competition and provide useful recommendations based on practical considerations (e.g. using pairwise competitions as the “golden standard”), but it remains unclear whether these measures provide the greatest value for the questions researchers may want to answer with them (e.g. predict shifts in genotype frequencies).

      We agree on the importance of considering the scientific questions researchers want to answer in determining the best way to quantify fitness. We have revised both the Introduction (lines 82–88) and the Discussion (lines 615–630) to more clearly explain possible downstream questions researchers may wish to answer with fitness data, and thus why discrepancies in that data based on analysis choices may be important.

      We believe that the text does provide a specific recommendation (second subsection of the Discussion, lines 635–658) for how to quantify relative fitness: using the logit encoding (rather than other encodings), measuring fitness per-cycle (rather than per-generation), and using the wild-type or a phenotypically-equivalent proxy as reference subpopulation to calculate pairwise fitness in a bulk competition (rather than using the mutant library as a whole). This recommendation is based on first principles: the logit encoding is based on the principle of the logistic equation as the null model of relative abundance dynamics (lines 635–637), the choice of the per-cycle timescale is based on the principle that in non-steady state environments the time scale for measuring selection should not depend on the wild-type growth (lines 640–645), and the choice of reference population is based on the principle that a mutant’s fitness should serve as a predictor of its dynamics when arising de novo at low frequency and competing against its wild-type (lines 648–653).
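
      As a rough illustration of the first two parts of this recommendation, the following minimal sketch computes a logit-encoded relative fitness per growth cycle and then rescales it per wild-type generation. It assumes that per-cycle fitness is the change in logit-encoded relative abundance over one cycle and that the number of wild-type generations is the wild-type log fold-change divided by ln 2; the function names and the numerical values are hypothetical and are not taken from the manuscript.

```python
import math


def logit(x: float) -> float:
    """Log-odds of a relative abundance x, with 0 < x < 1."""
    return math.log(x / (1.0 - x))


def fitness_per_cycle(x_start: float, x_end: float) -> float:
    """Change in logit-encoded relative abundance over one growth cycle."""
    return logit(x_end) - logit(x_start)


def fitness_per_generation(s_cycle: float, wt_log_fold_change: float) -> float:
    """Rescale per-cycle fitness by the number of wild-type generations.

    Assumption: wild-type generations = wild-type log fold-change / ln 2.
    """
    wt_generations = wt_log_fold_change / math.log(2.0)
    return s_cycle / wt_generations


# Hypothetical measurement: the mutant frequency rises from 1% to 2% in one
# cycle, during which the wild-type grows 100-fold.
s_cycle = fitness_per_cycle(0.01, 0.02)
s_gen = fitness_per_generation(s_cycle, math.log(100.0))
print(f"per cycle: {s_cycle:.4f}, per generation: {s_gen:.4f}")
```

      In this sketch the per-generation value depends on how much the wild-type grew during the cycle, whereas the per-cycle value depends only on the mutant's relative abundance at the start and end of the cycle.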

      A second limitation is that the authors analyse fitness differences arising solely from resource competition, whereas microbes often interact via other mechanisms, e.g. the production of anticompetitor toxins, cross-feeding of metabolites, or lack of growth to enhance their persistence in stress conditions. Without simulations of these processes, understanding discrepancies among fitness measures is necessarily limited.

      We agree that other interactions are important in many microbial ecosystems and could affect measurements of fitness. We discuss the possibility of these other interactions and their potential consequences for fitness on lines 697–710.

      We focus on resource competition in this paper, however, for two reasons. One is that we are using it as a null model: resource competition is always present, and thus it provides an important baseline for discrepancies in fitness statistics in the absence of any other assumptions. Indeed, our results are that this minimal assumption alone is sufficient to produce a wide range of significant discrepancies, which provides the proof of principle that choices of fitness quantification matter. We have clarified this in a revised explanation of the population dynamics model on lines 294–304.

      The second reason is that fitness measurements of the type discussed in this paper are typically performed on mutants that have only small genetic differences with their ancestor (e.g., a point mutation or gene deletion). While more complex interactions between such similar genotypes are not impossible, we expect them to be rare, in which case resource competition is the only interaction. Explicit modeling of other interactions is an important question for future work, but would require more detailed models and data of those phenomena, and thus would go beyond the scope of the present study. We have added a sentence to explain our emphasis on resource competition on lines 298–301 and 690–697.

      In addition, the analysis of trade-offs between growth traits causing these discrepancies during resource competition seems confounded by biases in measurement error or parameter estimation, at least for growth rate and lag time (Figure 2B), where the replicate estimates for the wildtype show a similar negative correlation.

      The tradeoff between growth traits was only an incidental observation and is not necessary for the fitness statistic discrepancies we analyze in this paper; the only important pattern in the growth traits is the existence of mutants with reduced yields (so as to reduce the wild-type log fold-change in a competition) as well as variation in one other trait under selection (lag time or growth rate in this model). We have clarified this mechanism on lines 328–336, which is demonstrated by Fig. S7. Since these tradeoffs are not relevant to the results and we agree that their significance may be unreliable due to the noisiness of the data, we have removed mention of them.

      Third, the study does not validate relative fitness predictions from growth traits (as is done for the yeast mutants) with measured relative fitness estimates using competition assays, while such data are available, e.g. for the LTEE. This would strengthen their inferences about preferred fitness measures.

      The goal of our modeling with the yeast growth trait data is not to test the ability to predict competition experiments from monoculture data; that has been the focus of previous studies [32, 34, 36, 37]. Rather, we use the population dynamics model for a proof of principle: that the most basic features of microbial population dynamics in laboratory experiments, as captured by this model (resource competition, lag phase, growth phase, saturation), are sufficient to create discrepancies between common fitness statistics used in these experiments (different encodings, time scales, choices of reference subpopulations). The yeast growth curve data merely provides realistic parameters for this model, to ensure we are studying a biologically relevant regime of the dynamics. To avoid this misconception, we have revised our explanation of this model and the data on lines 284–310.

      Fourth, the analysis of epistasis between mutations affecting different growth traits (shown in Figure 3) based on the LTEE data could be better introduced and analysed more comprehensively. Now, the examples given in panels C-F seem rather idiosyncratic and readers may wonder how general these consequences of using fitness estimates based on different time scales are.

      We agree that this analysis was incomplete and missed an opportunity to emphasize this important consequence of fitness quantification. We have thus expanded this analysis into a systematic test of all possible double mutants between qualitative types of trait perturbations in the model. We have added a new main text figure (Fig. 3), new SI figures (Figs. S9–S15), a new subsection in the Results (lines 346–395), and corresponding new sections in the Methods (lines 864–892) and SI (Sec. S8).

      Finally, the study is generally less accessible to experimentalists due to the extensive and principled treatment of specific population dynamic models and fitness inferences. This may distract from the overarching aim to identify fitness measures that are most accurate and useful for predictions of population dynamics and evolutionary processes.

      We appreciate this concern as we do hope to make the paper as broadly accessible as possible, especially to experimentalists who measure microbial fitness. To this end, we have reduced the technical discussion of encodings in the first section of the Results (lines 164–187); revised explanations of the population dynamics model (lines 284–310), importance of growth trait variation (lines 328–336), and epistasis (lines 346–395) to better emphasize the conceptual intuition of these parts; and added a step-by-step guide for our recommended best practices of quantifying fitness in bulk competition experiments (lines 964–982).

      In this light, the motivation for the initial discussion of the importance of how to best encode relative abundance (Figure 1) is unclear. Also, the conclusion, that logit encoding is preferred, because it linearizes logistic growth dynamics and “improves the quality of predictions”, is not further motivated. Experimentalists using non-linear models to infer fitness from growth curves or competition assays may miss the relevance of this discussion.

      The motivation for the discussion of encodings is that it is one of the choices made differently by researchers, mainly using either the logit (more common in experimental evolution and population genetics studies) or log encoding (more common in TnSeq analyses). As such we believe it is important to explain where this choice comes from (a transformation of relative abundance data to make it approximately linear in time, and thus amenable to characterization by a single slope parameter) and why we believe the logit encoding is more logical in most cases. We have streamlined and revised this subsection to make it clearer (lines 164–187).

      Our argument for favoring the logit encoding in most cases is based on the logistic model being a null model for relative abundance dynamics (Sec. S3). In light of the reviewer’s comments, we have realized this may be confusing because there are two common usages of logistic dynamics that are biologically distinct. What we mean by logistic model is the dynamics of relative abundance x of a mutant in competition with other genotypes:

      Here s turns out to be the relative fitness under the logit encoding. On the other hand, researchers also use a logistic ODE to describe the dynamics of absolute abundance N of a single strain in monoculture (e.g., as in a growth curve):

      We believe the reviewer’s last point refers to Eq. (2), whereas our argument about the logit encoding is based on Eq. (1). We have added a note to clarify this distinction for the reader (lines 192–196).
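
      The distinction between the two logistic models can be illustrated with a short sketch. It assumes the standard forms dx/dt = s x (1 - x) for the relative abundance x and dN/dt = r N (1 - N/K) for the absolute abundance N; the symbols r and K and all numerical values are illustrative assumptions, not quantities from the manuscript. Under the first model, the logit of x changes linearly in time with slope s, which is why a single slope parameter can summarize the trajectory.

```python
import math


def relative_abundance(x0: float, s: float, t: float) -> float:
    """Closed-form solution of dx/dt = s * x * (1 - x) with x(0) = x0."""
    growth = x0 * math.exp(s * t)
    return growth / (1.0 - x0 + growth)


def absolute_abundance(n0: float, r: float, k: float, t: float) -> float:
    """Closed-form solution of dN/dt = r * N * (1 - N / k) with N(0) = n0."""
    return k / (1.0 + (k / n0 - 1.0) * math.exp(-r * t))


def logit(x: float) -> float:
    """Log-odds of a relative abundance x, with 0 < x < 1."""
    return math.log(x / (1.0 - x))


# Illustrative values only.
x0, s = 0.05, 0.3
times = [0.0, 1.0, 2.0, 5.0, 10.0]
logits = [logit(relative_abundance(x0, s, t)) for t in times]

# Slopes between consecutive time points of the logit-encoded trajectory.
slopes = [
    (logits[i + 1] - logits[i]) / (times[i + 1] - times[i])
    for i in range(len(times) - 1)
]
print(slopes)  # every entry should be numerically equal to s

# The monoculture model instead saturates at the carrying capacity k.
print(absolute_abundance(n0=1e3, r=0.5, k=1e9, t=200.0))
```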

      Reviewer #3 (Recommendations for the authors):

      In addition to my general comments in the public review, I have several more specific recommendations:

      (1) Line 183-189: unclear why logit-based relative fitness is preferred. Abundance data are not typically binomial.

      We agree this claim about abundance data was incorrect and have removed it. We have revised the section to focus on motivating the logit encoding from logistic dynamics of relative abundance as a null model for most systems (main text lines 175–187 and Sec. S3).

      (2) Line 205: it may be mentioned that s(logit) is the same as the “selection rate constant” often used in microbial studies.

      We have added a sentence clarifying the equivalence of the logit-encoded relative fitness to the selection coefficient in population genetics (lines 188–190).

      (3) Line 368: why do mutations that increase biomass yield also increase WT LFC? Is this, because they grow slower and hence allow the WT more time to grow?

      Mutants with higher yield allow the wild-type to achieve higher log fold-change because those mutants consume fewer resources per cell, which frees up more resources for the wild-type to consume and increase its overall growth. It’s not about growth rate or time, as this would occur even for mutants whose growth rates are identical to the wild-type’s. We have revised our explanation of how variation in growth traits differentially affects fitness statistics (lines 323–340) and epistasis (lines 361–378).
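
      This mechanism can be sketched with a minimal resource competition model. The sketch assumes that each strain waits for a lag time, then grows exponentially until a shared resource is exhausted, and that each strain consumes resource in proportion to its new biomass divided by its yield. The parameter values and function names are hypothetical; the mutant is given the same lag time and growth rate as the wild-type so that only the yield differs.

```python
import math


def consumed(t, strains):
    """Total resource consumed by time t.

    Each strain is a tuple (n0, lag, rate, yield_): initial abundance,
    lag time, exponential growth rate, and cells produced per unit resource.
    """
    total = 0.0
    for n0, lag, rate, yield_ in strains:
        n_t = n0 * math.exp(rate * max(0.0, t - lag))
        total += (n_t - n0) / yield_
    return total


def saturation_time(strains, resource):
    """Time at which the shared resource runs out, found by bisection."""
    lo, hi = 0.0, 1.0
    while consumed(hi, strains) < resource:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if consumed(mid, strains) < resource:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def wild_type_lfc(mutant_yield):
    """Wild-type log fold-change when competing against one mutant."""
    wild_type = (1e5, 1.0, 1.0, 1e8)
    mutant = (1e5, 1.0, 1.0, mutant_yield)  # same lag and rate as wild-type
    t_sat = saturation_time([wild_type, mutant], resource=1.0)
    _, lag, rate, _ = wild_type
    return rate * max(0.0, t_sat - lag)


print(wild_type_lfc(1e8))  # mutant yield equal to the wild-type yield
print(wild_type_lfc(2e8))  # mutant with doubled yield
```

      With these assumed parameters, the higher-yield mutant consumes less resource per cell, so saturation occurs later and the wild-type reaches a larger log fold-change even though the two growth rates are identical.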

      (4) Line 382-386: you may want to cite Ram et al. (2019, 10.1073/pnas.1902217116), who also did such analyses for experimental data from E. coli.

      We have cited this work as Ref. [34].

      (5) Line 415: perhaps use “bulk relative fitness” instead of “total relative fitness”, to contrast with “pairwise relative fitness”.

      We acknowledge the language in this section can be subtle. However, “bulk” is not a sufficient identifier for the concept of total relative fitness as bulk competition experiments (with many genotypes competing simultaneously) can be used to measure either total relative fitness or pairwise relative fitness. (In pairwise competition experiments with only two genotypes, these two types of fitness are identical.) As such we adhere to our original language but have added words to clarify which type of experiment (bulk or pairwise) we are talking about in a given context (e.g., on lines 495–504).
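
      The difference between the two quantities can be made concrete with a small sketch. It assumes that total relative fitness is the change in the logit of a genotype's share of the whole population, and that pairwise relative fitness is the change in the log ratio of the genotype to a single reference genotype; these definitions are our shorthand for this illustration, and all abundances and genotype names are hypothetical.

```python
import math


def logit(x):
    """Log-odds of a relative abundance x, with 0 < x < 1."""
    return math.log(x / (1.0 - x))


def total_relative_fitness(start, end, genotype):
    """Change in logit of the genotype's share of the whole population."""
    x0 = start[genotype] / sum(start.values())
    x1 = end[genotype] / sum(end.values())
    return logit(x1) - logit(x0)


def pairwise_relative_fitness(start, end, genotype, reference):
    """Change in log ratio of the genotype to one reference genotype."""
    ratio_end = end[genotype] / end[reference]
    ratio_start = start[genotype] / start[reference]
    return math.log(ratio_end) - math.log(ratio_start)


# Pairwise competition experiment: only two genotypes are present.
pair_start = {"wt": 500.0, "mutA": 500.0}
pair_end = {"wt": 60000.0, "mutA": 40000.0}

# Bulk competition experiment: the wild-type is a minority of the library.
bulk_start = {"wt": 100.0, "mutA": 450.0, "mutB": 450.0}
bulk_end = {"wt": 30000.0, "mutA": 50000.0, "mutB": 20000.0}

pair_total = total_relative_fitness(pair_start, pair_end, "mutA")
pair_pairwise = pairwise_relative_fitness(pair_start, pair_end, "mutA", "wt")
bulk_total = total_relative_fitness(bulk_start, bulk_end, "mutA")
bulk_pairwise = pairwise_relative_fitness(bulk_start, bulk_end, "mutA", "wt")

print(pair_total, pair_pairwise)  # identical with two genotypes
print(bulk_total, bulk_pairwise)  # different in a bulk experiment
```

      In the two-genotype case the two functions return the same value, whereas in the hypothetical bulk data the same mutant has a positive total relative fitness but a negative pairwise relative fitness against the wild-type.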

      (6) Line 451-453: why does a population in bulk competition consume resources more slowly than in pairwise competitions?

      Mutant libraries used in bulk competition experiments usually include a large number of deleterious mutants, which grow more slowly than the wild-type. Thus these populations typically consume resources more slowly than a population in a pairwise competition would, where a large part of the population is the wild-type.

      (7) Line 565: I don’t understand how one can compare relative fitness to other timescales.

      Relative fitness, as we’ve defined it, has units of rate, since it describes the rate of change of relative abundance (or an encoding of it) over some time scale (e.g., a batch growth cycle or a generation). Therefore it can be compared to other time scales of the system, such as the rate of new mutations arising or the rate of genetic drift fluctuations, as long as they are measured in the same units. This comparison is important to population genetics analyses, such as determining whether the population is in the strong selection-weak mutation limit or the clonal interference regime.

      (8) Line 620 repeats text.

      Thank you, we have revised this paragraph and removed the typo.

      (9) Figure 1C+D: the link between the scenarios on the left and the graphs on the right may be better explained. For example, it may help to make explicit that the 4 scenarios in panel C show the same relative fitness per cycle and that mutant and wildtype have the same growth rate, but different growth periods in both scenarios in panel D. It is also unclear whether the grey dot links to the upper scenario in D.

      We have clarified this issue in the caption and changed the colors to avoid this confusion.

      (10) Figure 2E: it is unclear why “mutants with equal fitness are assigned the lowest rank”.

      This was a technical comment about how to handle ties in our analysis of mutant rankings, but it is moot since no exact ties actually occur in our simulations. We have removed this remark to avoid confusion.

      (11) Figure 2F: the axis labels are confusing, as for the WT estimates no LFC mutant exists. It would also help to make explicit in the legend against which WT replicate/reference strain each strain has competed.

      We agree the inclusion of wild-type replicates in this plot was confusing and unnecessary, so we have removed them. The mutants compete against a wild-type with traits defined by their median values across all wild-type replicates; this is noted in Fig. 2A and the Methods section on our analysis of this data (lines 809–813).

      (12) Figure 5: I am not sure this is needed, as its information is rather limited.

      We agree and have removed this figure.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This is a useful study presenting solid data indicating that the bacterial GTPases EngA and ObgE enable single-step reconstitution of functional 50S ribosomal subunits under near-physiological conditions. The study elegantly bridges the gap between the non-physiological aspects of the previous two-step reconstitution method and the extract-dependent iSAT system to enable ribosome assembly under translation-compatible conditions; however, it is limited by reliance on rRNA and proteins extracted from native ribosomes and does not achieve a true bottom-up reconstruction from all synthetic components. The evidence is incomplete in not characterizing the spectrum of reporter polypeptides produced and not comparing their rate and yield of synthesis from reconstituted ribosomes to that obtained with pure native ribosomes; and the impact of the study is limited by not including reporters to examine the fidelity of initiation, elongation or termination achieved with the reconstituted ribosomes.

      As described below, based on the comments from the public reviewers, we have summarized at the end of the Discussion how this study contributes toward true bottom-up reconstruction from fully synthetic components, as well as the aspects that will require further development. In addition, we have newly provided data characterizing the reporter polypeptides from multiple perspectives, demonstrating that the assembled ribosomes do not exhibit issues such as reduced fidelity (Fig. 6, 7, Supplementary Data 2, 3). We believe that these data adequately address the limitations that were pointed out in the eLife Assessment.

      Public Reviews:

      Reviewer #1 (Public review):

      This study presents evidence that the addition of the two GTPases EngA and ObgE to reactions comprised of rRNAs and total ribosomal proteins purified from native bacterial ribosomes can bypass the requirements for non-physiological temperature shifts and Mg+2 ion concentrations for in vitro reconstitution of functional E. coli ribosomes.

      Strengths:

      This advance allows ribosome reconstitution in a fully reconstituted protein synthesis system containing individually purified recombinant translation factors, with the reconstituted ribosomes substituting for native purified ribosomes to support protein synthesis. This work potentially represents an important development in the long-term effort to produce synthetic cells.

      Weaknesses:

      While much of the evidence is solid, the analysis is incomplete in certain respects that detract from the scientific quality and significance of the findings:

      (1) The authors do not describe how the native ribosomal proteins (RPs) were purified, and it is unclear whether all subassemblies of RPs have been disrupted in the purification procedure. If not, additional chaperones might be required beyond the two GTPases described here for functional ribosome assembly from individual RPs.

      Native ribosomal proteins (RPs) were prepared from native ribosomes, according to the well-established protocol described by Dr. Knud H. Nierhaus [Nierhaus, K. H. Reconstitution of ribosomes in Ribosomes and protein synthesis: A Practical Approach (Spedding G. eds.) 161-189, IRL Press at Oxford University Press, New York (1990)]. In this method, ribosomal proteins are subjected to dialysis in 6 M urea buffer, a strong denaturing condition that may completely disrupt ribosomal structure and dissociate all ribosomal protein subassemblies. To make this point clear, we have described the detailed RP preparation procedure in the manuscript, rather than merely referring to the book.

      In addition, we would like to clarify one point related to this comment. The focus of the present study is to show that the presence of two factors is required for single-step ribosome reconstitution under translation-compatible, cell-free conditions. We do not intend to claim that these two factors are absolutely sufficient for ribosome reconstitution. Hence, we have revised the manuscript to more explicitly state what this work does and does not conclude.

      (2) Reconstitution studies in the past have succeeded by using all recombinant, individually purified RPs, which would clearly address the issue in the preceding comment and also eliminate the possibility that an unknown ribosome assembly factor that co-purifies with native ribosomes has been added to the reconstitution reactions along with the RPs.

      As noted in the response to Comment (1), the focus of the present study is the requirement of the two factors for functional ribosome assembly. Therefore, we consider that it is not necessary to completely exclude the possibility that unknown ribosome assembly factors are present in the RP preparation. Nevertheless, we agree that it is important to clarify what factors, if any, are co-present in the RP fraction. To address this, we performed proteomic analysis of the TP70 preparation (Supplementary Data 3) and noted the possibility that other factors are included.

      We also agree that additional, as-yet-unidentified components, including factors involved in rRNA modification, could plausibly further improve assembly efficiency. We also consider that such studies may contribute to extending the system to the use of in vitro-transcribed rRNA and fully recombinant ribosomal proteins, which would essentially be the next step of this study. We have noted the possibility of as-yet-unidentified components and these future perspectives in the Discussion.

      (3) They never compared the efficiency of the reconstituted ribosomes to native ribosomes added to the "PURE" in vitro protein synthesis system, making it unclear what proportion of the reconstituted ribosomes are functional, and how protein yield per mRNA molecule compares to that given by the PURE system programmed with purified native ribosomes.

      Following this suggestion, we measured the sfGFP synthesis rate from the increase in fluorescence over time under conditions where the template mRNA is in excess, and compared this rate directly between reconstituted and native ribosomes. We consider that this comparison provides insight into what fraction of ribosomes reconstituted in our system are functionally active (Fig. 6).

      As noted in the provisional responses, quantifying protein yield per mRNA molecule is substantially more challenging. The translation system is complex, and the apparent yield per mRNA can vary depending on factors such as differences in polysome formation efficiency. In addition, the PURE system is a coupled transcription–translation setup that starts from DNA templates, which further complicates rigorous normalization on a per-mRNA basis. Because the main focus of this study is to determine how many functionally active ribosomes can be reconstituted under translation-compatible conditions, we addressed this comment by comparing the sfGFP synthesis rate only.

      (4) They also have not examined the synthesized GFP protein by SDS-PAGE to determine what proportion is full-length.

      We added an affinity tag to the sfGFP reporter, purified the synthesized product from the reaction mixture, and analyzed it by SDS–PAGE (Fig. 7a).

      (5) The previous development of the PURE system included examinations of the synthesis of multiple proteins, one of which was an enzyme whose specific activity could be compared to that of the native enzyme. This would be a significant improvement to the current study. They could also have programmed the translation reactions containing reconstituted ribosomes with (i) total native mRNA and compared the products in SDS-PAGE to those obtained with the control PURE system containing native ribosomes; (ii) specific reporter mRNAs designed to examine dependence on a Shine-Dalgarno sequence and the impact of an in-frame stop codon in prematurely terminating translation to assess the fidelity of initiation and termination events; and (iii) an mRNA with a programmed frameshift site to assess elongation fidelity displayed by their reconstituted ribosomes.

      Following the recommendation, we selected DHFR as an enzymatically active protein and used it as a reporter, confirming that it exhibited enzymatic activity comparable to that observed when synthesized by native ribosomes (Fig. 7c). In addition, MS analysis of the purified sfGFP used for SDS-PAGE analysis showed that nearly all peptide fragments were detected, covering almost the entire sequence from the initiator amino acid to the amino acid immediately preceding the stop codon (Fig. 7b, Supplementary Data 2). These results suggest that protein synthesis by the newly assembled ribosomes proceeds smoothly from initiation to termination, with no apparent problem in fidelity, and therefore indicate that functional ribosomes were successfully reconstituted.

      Reviewer #2 (Public review):

      This study presents a significant advance in the field of in vitro ribosome assembly by demonstrating that the bacterial GTPases EngA and ObgE enable single-step reconstitution of functional 50S ribosomal subunits under near-physiological conditions, specifically at 37 °C and with total Mg<sup>2+</sup> concentrations below 10 mM.

      This achievement directly addresses a long-standing limitation of the traditional two-step in vitro assembly protocol (Nierhaus & Dohme, PNAS 1974), which requires non-physiological temperatures (44-50 °C) and high Mg<sup>2+</sup> concentrations (~20 mM). Inspired by the integrated Synthesis, Assembly, and Translation (iSAT) platform (Jewett et al., Mol Syst Biol 2013), leveraging E. coli S150 crude extract, which supplies essential assembly factors, the authors hypothesize that specific ribosome biogenesis factors, particularly GTPases present in such extracts, may be responsible for enabling assembly under mild conditions. Through systematic screening, they identify EngA and ObgE as the minimal pair sufficient to replace the need for temperature and Mg<sup>2+</sup> shifts when using phenol-extracted (i.e., mature, modified) rRNA and purified TP70 proteins.

      However, several important concerns remain:

      (1) Dependence on Native rRNA Limits Generalizability

      The current system relies on rRNA extracted from native ribosomes via phenol, which retains natural post-transcriptional modifications. As the authors note (lines 302-304), attempts to assemble active 50S subunits using in vitro transcribed rRNA, even in the presence of EngA and ObgE, failed. This contrasts with iSAT, where in vitro transcribed rRNA can yield functional (though reduced-activity, ~20% of native) ribosomes, presumably due to the presence of rRNA modification enzymes and additional chaperones in the S150 extract. Thus, while this study successfully isolates two key GTPase factors that mimic part of iSAT's functionality, it does not fully recapitulate iSAT's capacity for de novo assembly from unmodified RNA. The manuscript should clarify that the in vitro assembly demonstrated here is contingent on using native rRNA and does not yet achieve true bottom-up reconstruction from synthetic parts. Moreover, given iSAT's success with transcribed rRNA, could a similar systematic omission approach (e.g., adding individual factors) help identify the additional components required to support unmodified rRNA folding?

      We fully recognize the reviewer’s point that our current system has not yet achieved a true bottom-up reconstruction. Although we intended to state this clearly in the manuscript, the fact that this concern remains indicates that our description was not sufficiently explicit. We therefore added a paragraph to ensure that this limitation is clearly communicated to readers.

      (2) Imprecise Use of "Physiological Mg<sup>2+</sup> Concentration"

      The abstract states that assembly occurs at "physiological Mg<sup>2+</sup> concentration" (<10 mM). However, while this total Mg<sup>2+</sup> level aligns with optimized in vitro translation buffers (e.g., in PURE or iSAT systems), it exceeds estimates of free cytosolic [Mg<sup>2+</sup>] in E. coli (~1-2 mM). The authors should clarify that they refer to total Mg<sup>2+</sup> concentrations compatible with cell-free protein synthesis, not necessarily intracellular free ion levels, to avoid misleading readers about true physiological relevance.

      We agree that this is a very reasonable point and have revised the manuscript to clarify that we are referring to the total Mg<sup>2+</sup> concentration compatible with cell-free protein synthesis, rather than the intracellular free Mg<sup>2+</sup> level under physiological conditions. We also changed the term “physiological” to “near-physiological” to avoid misunderstanding.

      In summary, this work elegantly bridges the gap between the two-step method and the extract-dependent iSAT system by identifying two defined GTPases that capture a core functionality of cellular extracts: enabling ribosome assembly under translation-compatible conditions. However, the reliance on native rRNA underscores that additional factors - likely present in iSAT's S150 extract - are still needed for full de novo reconstitution from unmodified transcripts. Future work combining the precision of this defined system with the completeness of iSAT may ultimately realize truly autonomous synthetic ribosome biogenesis.

      Recommendations for the authors:

      Reviewing Editor Comments:

      Recommendations for improvement:

      (1) Assess the length distribution of GFP polypeptides being produced using SDS-PAGE.

      SDS-PAGE was performed in response to Comment 4 of Reviewer #1 (Fig. 7a). Please refer to our response to that comment.

      (2) Compare the rate and yield of GFP synthesized per mRNA using their reconstituted ribosomes to that obtained with pure native ribosomes.

      The efficiency of the reconstituted ribosomes was compared to that of native ribosomes in response to Comment 3 of Reviewer #1 (Fig. 6). Please refer to our response to that comment.

      (3) Expand the panel of reporter mRNAs being examined to compare the fidelity of initiation, elongation or termination achieved with reconstituted ribosomes to that obtained using native ribosomes.

      DHFR synthesis was examined and MS analysis of the synthesized sfGFP was performed in response to Comment 5 of Reviewer #1 (Fig. 7b, c). Please refer to our response to that comment.

      (4) Revise the manuscript to clarify that the in vitro assembly demonstrated here is contingent on using native rRNA and thus does not achieve a true bottom-up reconstruction from synthetic parts.

      We added to the Discussion a paragraph summarizing the findings of this study, its limitations, and future perspectives, in response to Comments 1 and 2 of Reviewer #1 and Comment 1 of Reviewer #2. Please refer to our responses to those comments.

      (5) Revise the manuscript to clarify that they are referring to total Mg<sup>2+</sup> concentrations compatible with cell-free protein synthesis, not necessarily intracellular free ion levels, to avoid misleading readers about the physiological relevance of the reconstitution.

      We revised the manuscript to clarify this point in response to Comment 2 of Reviewer #2. Please refer to our response to that comment.

      (6) Revise the text to fully describe how the native ribosomal proteins (RPs) were purified and indicate whether all subassemblies of RPs were disrupted in the purification procedure.

      In response to Comment 1 of Reviewer #1, we revised the Methods section to clarify how the native RPs were purified and that all RP subassemblies were disrupted.

      (7) Revise the text to indicate that achieving ribosome reconstitutions using all recombinant, individually purified RPs is required to achieve a true bottom-up reconstruction from all synthetic components.

      As in our response to Comment 4, we have added this point at the end of the Discussion as a future perspective toward true bottom-up reconstruction from all synthetic components.

      (8) Consider conducting a similar systematic omission approach (e.g., adding individual factors) to help identify the additional components required to support unmodified rRNA folding.

      As in our responses to Comments 4 and 7, we have added this point at the end of the Discussion as a future perspective toward identifying additional essential factors for true bottom-up reconstruction.

      Reviewer #1 (Recommendations for the authors):

      (1) Assessing the spectrum of GFP polypeptides being produced by SDS-PAGE and comparing the rate and yield of GFP produced to that obtained with pure native ribosomes would seem to be essential additional measurements needed to bolster the evidence supporting the main conclusions of the work.

      SDS-PAGE and MS analysis of the synthesized sfGFP were performed (Fig. 7a, b). A comparison of the assembled ribosomes with native ones was also performed (Fig. 6).

      (2) Examining translation of other reporter mRNAs designed to compare the fidelity of initiation, elongation or termination achieved with reconstituted ribosomes to that produced by native ribosomes in the PURE system would be required to elevate the scientific quality of the work and its significance to the field.

      DHFR synthesis and its activity measurement were performed (Fig. 7c). Also, MS analysis of the purified sfGFP showed that nearly all peptide fragments were detected, covering almost the entire sequence from the initiator amino acid to the amino acid immediately preceding the stop codon (Fig. 7b). We consider that these findings indicate that there is no apparent problem with fidelity.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This is an important study that describes the consequences of the DNMT3A mutation in human neuronal development for the first time. The selective impact of DNMT3A function on GABAergic interneurons is interesting and an important feature for future therapeutics. The claims made in the manuscript are, for the most part, supported by strong evidence, and the data are generally of high quality and presented well.

      Strengths:

      The strengths of the work include: Characterization of multiple DNMT3A loss-of-function alleles, including two missense variants, R882H and P904L, and a deletion allele. The missense mutation lines both include an ideal control with the same genetic background. CRISPRi-mediated DNMT3A knockdown has also been included. The study identifies the mTOR-PI3K pathway as a factor in the overgrowth found in the mutant organoid. Bulk mRNA sequencing and whole-genome bisulfite sequencing identify hypomethylated genomic regions associated with gene expression repression. Again, this is more pronounced in the ventral organoid compared to the dorsal organoid. In addition, the extensive electrophysiological characterizations with a high-density microelectrode array support the more mature status of mutant interneurons.

      Weaknesses:

      Although a strong study overall, some weaknesses are noted. These include:

      (1) The lack of validation data for the generated iPSCs and hESCs, such as the chromosomal contents, ploidy, and pluripotency states.

      We thank the reviewer for their constructive feedback. We previously validated our 882 models with whole genome sequencing and teratoma formation upon mouse fat pad injection, while the parental human embryonic stem cell line (WA01 hESCs) used for P904L variant knock-in was validated by our Genome Engineering Stem Cell (GESC) core upon derivation of that variant knock-in model. We have now added both karyotyping and pluripotency staining (SOX2/OCT4) for all other hPSC lines as (new) Supplementary Figure S17 and included further description in our Methods section under “hPSC Model Generation and Culture”.

      New Data: Supplemental Figure S17 (SOX2/OCT4 staining in hPSCs and karyotyping of all lines used)

      Text edits: Additional language confirming hPSC line validation will be added to the Methods section under “hPSC Model Generation and Culture” on page 18.

      (2) Other weaknesses relate to data interpretation and insufficient discussion of related matters, as detailed in the recommendations to the authors.

      We thank the reviewer for their insightful suggestions and have detailed our responses in the “recommendations to the authors” section.

      (3) Also, some errors are noted and detailed in the recommendation section.

      We thank the reviewer for catching these errors and have since corrected them, with detailed responses below.

      Reviewer #2 (Public review):

      Summary:

      Chapman, Determan et al. investigate how pathogenic mutations in DNMT3A, which cause Tatton-Brown-Rahman Syndrome (TBRS), disrupt human cortical developmental processes using a comprehensive panel of human pluripotent stem cell models spanning DNMT3A loss-of-function severity. The authors aim to identify the cellular and molecular mechanisms underlying TBRS-associated brain overgrowth and intellectual disability, and to test whether mechanistic convergence exists between TBRS and other overgrowth-intellectual disability disorders (OGIDs) caused by mutations in EZH2 (Weaver syndrome) or PIK3CA pathway components. Their central conclusion is that GABAergic interneuron development is selectively vulnerable to DNMT3A mutation, where reduced DNA methylation causes premature de-repression of neuronal and synaptic genes, driving precocious neuronal maturation and hyperactivity sufficient to disrupt neuronal network synchrony. This report adds to a growing literature supporting the vulnerability of GABAergic interneurons in NDDs and further provides a mechanistic view of this vulnerability, potentially convergent across OGIDs. The mechanistic claims around H3K27me3 compensation and mTOR-based therapeutic convergence, while promising, rest on more preliminary evidence and would benefit from the distinction between correlation and mechanism being made more explicit in the text. Overall, this is a compelling study with a rigorous experimental design and novel findings with a potential impact on a better understanding of the OGID pathophysiology.

      Strengths:

      (1) A major strength of this work is the breadth and rigor of the disease modeling approach. Four independent TBRS model systems are used in tandem: a patient-derived iPSC line with isogenic CRISPR-corrected control (R882H), a knock-in hESC model (P904L) with its wild-type isogenic, patient deletion iPSC lines (Del1/2), and CRISPRi knockdown models (G1/G2), collectively spanning a range of DNMT3A loss-of-function that correlates with phenotypic severity. This allelic series design substantially strengthens causal inference beyond what any single isogenic pair could provide.

      (2) The multi-omic integration across matched developmental stages provides a strong mechanistic foundation for the cellular phenotyping and significantly enhances novelty. RNA-seq, whole-genome bisulfite sequencing, and H3K27me3 CUT&Tag are combined in the same cell types and timepoints to show that DNMT3A loss reduces CG methylation at neuronal and synaptic gene loci, leading to premature transcriptional activation.

      (3) The selective vulnerability of ventral (GABAergic) versus dorsal (glutamatergic) progenitors is one of the study's most important findings. This lineage specificity is consistently observed across all model systems and in both 2D and organoid formats, where ventral NPCs show increased proliferation, premature neuronal gene expression, and increased neurogenesis, while dorsal NPCs are largely unaffected at the transcriptomic and cellular level despite exhibiting comparable DNA methylation changes. This adds to a body of emerging work showing GABAergic interneuron vulnerability in NDDs where ubiquitously expressed genes such as chromatin modifiers are perturbed, and provides additional molecular insights into potential mechanisms of "resilience" of dorsal populations.

      (4) The functional characterization follows a logical progression from single-neuron electrophysiology (demonstrating GABAergic hyperactivity with increased action potential amplitude and firing rate) to network-level analysis using high-density multi-electrode arrays. The HD-MEA experimental design - pairing TBRS or control GABAergic neurons with a constant background of control iGlut neurons - cleanly isolates GABAergic dysfunction as the driver of network hypersynchrony.

      Weaknesses:

      (1) The concomitant induction of proliferation and differentiation in TBRS V-NPCs is conceptually striking, since these are generally considered antagonistic developmental programs. The authors partially address this tension by noting that DNMT3A LOF alone is insufficient to initiate neuronal differentiation, i.e., V-NPCs upregulate neuronal and synaptic genes while retaining progenitor identity, implying that transcriptomic priming and commitment to differentiation are decoupled. However, the relationship between the proliferative phenotype and the epigenetic priming phenotype remains mechanistically unresolved. The manuscript documents mTOR pathway upregulation at the protein level and identifies shared DEGs that include proliferative regulators, but it does not establish whether mTOR-driven proliferation and mCG-loss-driven neuronal gene de-repression/enhanced differentiation are causally linked or represent two independent consequences of DNMT3A LOF.

      We thank the reviewer for their comment and agree that this phenotype, whereby progenitors exhibited both increased proliferation and hallmarks of gene expression associated with neuronal differentiation, is striking and interesting, given that these are typically antagonistic programs during normal development.

      We documented that these phenotypes involve upregulated expression of both neuronal/synaptic and proliferative genes in V-NPCs (Figure 2d), with concomitant loss of repressive DNA methylation at regulatory elements associated with these genes (Figure 2f, Supplemental Data 5). In this work, DNMT3A mutation had a more prominent role in de-repressing neuronal and synaptic gene expression to promote hallmarks of neuron differentiation, while playing a relatively less central role in direct regulation of proliferation genes, as seen from the relative prominence of neuronal/synaptic- versus proliferation-related GO terms in our Supplemental Data 5 table.

      To examine the mechanisms underlying increased V-NPC proliferation in our TBRS models, we assessed a potential relationship with the PIK3/AKT/mTOR pathway, as this is implicated in increased proliferation resulting from DNMT3A-associated mutation in myeloid leukemia (Dai et al., 2017, PMID: 28461508). In our work, DNMT3A mutation increased the expression and/or phosphorylation of mTOR signaling pathway targets specifically in V-NPCs (Figure 1q-r, Supplemental Figure S3a-d). However, while TBRS mutation directly affected repressive DNA methylation at a suite of cell proliferation-related genes, these did not include the PIK3/AKT/mTOR pathway genes themselves, suggesting an indirect relationship between altered DNA methylation and increased mTOR signaling.

      Text Edits: We will incorporate further discussion of how DNMT3A-mediated gene repression and levels of PIK3/AKT/mTOR pathway signaling may be interacting, providing a framework for future studies to identify how these related OGID gene mutations may converge mechanistically.

      (2) Relatedly, the rapamycin rescue experiment is a valuable proof-of-concept for the PIK3/AKT/mTOR convergence but is limited to a single dose in a single model (882) with a single readout (Ki67+ proliferation). Given the prominence of mTOR pathway convergence in the manuscript as a potential shared therapeutic avenue across OGIDs, the data supporting this claim are somewhat preliminary. It remains unknown whether mTOR inhibition rescues downstream phenotypes (neurogenesis, gene expression, neuronal maturation) or whether less severe TBRS models respond similarly. This might also help tackle the first comment above. e.g., if mTOR inhibition rescued proliferation but not the transcriptomic priming, that would support two independent mechanisms.

      We thank the reviewer for their comment. We explored both the overall levels and phosphorylation of proteins involved in PIK3/AKT/mTOR signaling in the 882, 904, Del1, Del2, and KO V-NPC models (Figure 1q-r, Supplementary Figure S3a-d), finding specific increases of all proteins. We showed that rapamycin addition reversed the increased proportion of KI67+ proliferating cell nuclei resulting from 882 mutation in V-NPCs in main Figure 1s, while demonstrating that rapamycin also reduced the proportion of KI67+ nuclei observed in both less severe 904 and Del1 V-NPC models (Supplementary Figure S3e-f).

      We agree that understanding whether rapamycin treatment can rescue TBRS neuronal phenotypes would be very interesting, as previous work on Tuberous Sclerosis Complex has utilized rapamycin and other mTOR inhibitors to effectively reverse TSC-related alterations of neuronal morphology and neuronal hyperexcitability (Buttermore et al., 2025, PMID: 40792287). Future studies examining convergent mechanisms and therapeutics for OGIDs should examine how similarly targeting this and related pathways rescues altered neuronal morphology, maturation, and function, as we have demonstrated that TBRS mutation has subsequent consequences for V-IN differentiation, maturation, and function. This point has been detailed in the discussion section on pages 15-16.

      (3) The claim that H3K27me3 compensates for mCG loss is an important mechanistic point, but the current data do not distinguish between active compensation, in which EZH2 is recruited in response to methylation loss, and functional redundancy, in which H3K27me3 is independently established and becomes the dominant repressive mark once DNA methylation is reduced. The EZH2 knockdown/inhibition experiments show that H3K27me3 is sufficient to maintain repression at hypo-DMR sites, but they do not establish that H3K27me3 gain is itself a response to methylation loss. Because H3K27me3 profiling was performed only in the severe 882 model, it is also unclear whether H3K27me3 gain scales with DNMT3A LOF severity, as a compensatory model would predict. Finally, the EZH2 overexpression rescue is performed in V-NPCs, whereas the compensation model is developed primarily in D-NPCs, making it difficult to assess whether the same mechanism operates in the lineage where it was originally inferred.

      We thank the reviewer for the opportunity to clarify our findings and experimental reasoning. A previous study using a conditional Dnmt3a knockout mouse model (Li et al., 2022, PMID: 35604009) demonstrated increased expression of multiple PRC2 components following the loss of Dnmt3a. This study demonstrated that sites which lost DNA methylation gained H3K27me3 in postnatal neurons upon Dnmt3a loss. Therefore, we hypothesize that the gain of H3K27me3 likely occurs in response to loss of DNMT3A methylation.

      While we did not perform CUT&Tag for H3K27me3 in our less severe models, we did validate gene expression changes following EZH2 knockdown and inhibition in both the R882H (Figure 4g-h) and P904L (Supplementary Figure S8b) models, finding that gene expression was unchanged in the model with the less severe DNMT3A mutation (P904L). Based upon these findings, we hypothesized that compensatory H3K27me3 may occur only upon severe DNMT3A loss, as seen in the dominant-negative R882H model. Furthermore, as H3K27me3 compensation was more prominent in D-NPCs, we hypothesized that this might be sufficient to prevent de-repression and aberrant neuronal gene expression upon loss of DNMT3A-mediated repression in D-NPCs. However, since TBRS mutation caused the most prominent de-repression of neuronal gene expression in V-NPCs, we also tested whether EZH2 overexpression could reverse this, finding that it partially suppressed this dysregulated neuronal gene expression. To better clarify this logic and the findings, we will make text edits to this results section.

      Text edits: We will clarify the reasoning for performing the EZH2 overexpression experiments in V-NPCs and reference Li et al., 2022 in both the results (pg. 9-10) and discussion.

      (4) The narrative framing of dorsal neuron development as unaffected by DNMT3A LOF is somewhat at odds with the data presented. The 882 D-NPCs show substantial DNA methylation changes, and TBRS D-INs exhibit what the authors describe as "substantive transcriptomic differences" involving persistent expression of pluripotency and progenitor genes, which seems to be a distinct but potentially significant phenotype. The impact of DNMT3A loss between ventral and dorsal lineages might be more accurately framed as divergent in nature rather than specific to a certain population.

      We thank the reviewer for their comment. While TBRS mutations appear to have a significantly stronger effect on V-NPCs and subsequently V-INs, both transcriptomic and methylation alterations do also occur upon TBRS mutation in D-NPCs and D-INs, as noted in Supplemental Figure S4d, S11, and Supplemental Data 2. However, we observed substantially greater molecular alterations in V-NPCs/V-INs, a lack of overt cellular phenotypes in D-NPCs where assayed, and a lack of functional consequences in matured D-INs, suggesting a more significant requirement for DNMT3A in regulating the differentiation and subsequent maturation of cortical inhibitory interneurons during embryonic and early pre-natal development, the developmental periods that we can readily model in hPSC-derived neurons.

      It should also be noted that these hPSC differentiation models do not recapitulate post-natal deposition of non-CpG (mCA) DNA methylation, a mechanism disrupted postnatally by TBRS-associated mutations in our prior work in murine models (Harrison Gabel; e.g. Beard et al., 2023, PMID: 37952155). Therefore, we hypothesize that if we could sufficiently mature D-INs to a state that modeled postnatal development and recapitulated this non-CpG methylation, we might be able to detect cellular and functional phenotypes in later stage D-INs. To avoid misinterpretation, we will alter the language in the results section to confirm that there are both transcriptomic and methylation changes in our D-NPCs/D-INs, but that these are not accompanied by cellular phenotypes or neuronal dysfunction.

      Text edits: We will better clarify that there are transcriptomic and methylation changes in D-NPCs/D-INs, but that these changes are minimal compared to those in V-NPCs/V-INs, as supported by the lack of cellular and functional phenotypes seen in D-NPCs/D-INs.

      (5) SST stainings are not entirely convincing. They appear mostly nuclear, and in some instances localized to rosettes in organoids, whereas the protein is largely confined to processes and is expected to be found outside progenitor-rich zones like rosettes.

      We agree that the perinuclear SST staining detected in these young ventral telencephalic-patterned organoids at day 30 differs somewhat from the more process-localized and cytosolic signal seen in later stage organoids in other studies. This may be related to the use of different commercial SST antibodies across studies but also likely reflects SST immunoreactivity in newborn neurons near the onset of SST expression. For example, immature SST-immunoreactive neurons in the early postnatal rat cortex exhibit predominant SST staining in perinuclear cytoplasm and short processes (e.g. Fig. 3 in Lee et al, PMID: 9664223) while acquiring more cytosolic and process-localized staining as postnatal neuron maturation occurs. Evaluation of immunopositivity for other markers of neurogenesis (ASCL1) and immature neurons (TUJ1) is also congruent with these findings for SST, with TBRS-associated mutations increasing the fraction of cells in V-NPCs/V-ORGs that express these three markers.

      Reviewer #3 (Public review):

      Summary:

      In this manuscript, the authors investigated TBRS etiology by using new human pluripotent stem cell models, modeling varying levels of TBRS-associated loss of DNMT3A function. They identified increased lineage-specific proliferation of precursors in TBRS ventral MGE-like progenitors, which they propose was related to increased signaling through the PIK3/AKT/mTOR pathway. Furthermore, they show that reduced DNA methylation during MGE-like progenitor differentiation into GABAergic interneurons can cause a premature expression of neuronal and synaptic genes, triggering precocious neuronal maturation. In conclusion, they propose that TBRS-derived GABAergic neurons exhibit hyperactivity that can alter the development and structure of neuronal networks.

      Strengths:

      Overall, the data presented is convincing, from an early developmental point of view, given that the iPSC-derived 2D cultures or organoids used do not get to reach a mature state. Nonetheless, the data clearly show the effects that deleterious mutations in TBRS can cause during the period of neurogenesis, which was missing in the field.

      Weaknesses:

      (1) Li et al., 2022 (referred to in the manuscript) seems to already show the interplay between H3K27me3 and Dnmt3a discussed in this study i.e., that in the absence of DNA methylation, there is an expansion of polycomb-like repression. These data should be better acknowledged in the paragraph 'Repressive H3K27me3 compensates for severe loss of DNA methylation' (page 9), given it supports the data presented in this manuscript and suggests this as a common mechanism in the interplay between these two repressive marks, as it is well established in the literature.

      We thank the reviewer for this suggestion and will incorporate this reference into both the results and the discussion when discussing the respective roles of DNMT3A and PRC2-mediated repression.

      Text edits: We will add Li et al., 2022 to both the results section (pg. 9-10) and our discussion section.

      (2) The authors should acknowledge that the omics data come from a mixed population of cells.

      We thank the reviewer for their comment. We have validated that the established 2-D differentiation methods we used in this study generate cell populations with >85-90% enrichment for the desired progenitor and neuronal cell type, based upon marker expression, but acknowledge that these are bulk -omics data obtained from cells that may represent a mixed population and have now detailed this in the methods section under “Sequencing”.

      Text edits: We will add language acknowledging that our omics data (bulk) was generated from mixed populations of cells.

      (3) The authors are encouraged to further discuss whether the overgrowth observed in ventral GABAergic cultures or organoids compares to the overgrowth observed in diseased patients. One expects MRIs to have been performed in patients and that these could be harnessed to discern if overgrowth occurs in the cortex or ventral regions of the brain.

      We thank the reviewer for their suggestion and do note that at least one published study documents increased cortical thickness in the MRIs of TBRS patients (Jiménez de la Peña et al., 2024, PMID: 37795572); however, to our knowledge studies have not examined regional or cell type-selective overgrowth of cortical tissue in TBRS patients. Future clinical studies examining the nature of the neuronal progenitor overgrowth and resulting consequences for patient brain imaging would be of interest to better understand TBRS-associated etiology of brain overgrowth and its manifestations.

    1. Author response:

      We sincerely appreciate the efforts of the Senior and Reviewing Editors, as well as the three reviewers, for their careful evaluation of our manuscript and their insightful comments. Previous studies have suggested that smooth muscle activity contributes to gut elongation; however, these studies do not directly demonstrate that peristaltic movements per se drive elongation. For example, studies in mice have primarily focused on residual stress of smooth muscle (Yang et al., 2021), rather than the dynamic spatiotemporal nature of peristalsis. In chickens, inhibition of peristalsis by nifedipine has been interpreted as evidence for a role of peristalsis in gut elongation (Khalipina et al., 2019). However, because nifedipine broadly affects calcium-dependent cellular processes, these experiments cannot distinguish whether the observed effects arise specifically from loss of peristalsis or from other cellular perturbations. In our current study, we aimed to overcome this limitation by combining pharmacological inhibition with optogenetic reactivation. This approach allows us to selectively restore peristaltic movements under conditions in which endogenous peristalsis is suppressed. Based on these experiments, we provide evidence supporting a causal contribution of peristalsis to anisotropic gut growth. We agree with the reviewers that the positioning of our study relative to previous work should be clarified. In the revised manuscript, we will more clearly distinguish between static mechanical tension and endogenous peristaltic movements, and better define the conceptual advance of our study. In addition to macroscopic growth analysis, we identified cellular dynamics associated with elongation, including circumferentially oriented cell division and peristalsis-dependent longitudinal cell rearrangement. We agree that the mechanistic link between peristalsis and downstream cellular behaviors remains incompletely understood. In the revised manuscript, we will clarify this limitation and outline future directions, including experiments to test the role of mechanical cues (e.g., mechanical perturbation and pharmacological manipulation of mechanotransduction pathways).

      Public Reviews:

      Reviewer #1 (Public review):

      The mechanism by which peristalsis and the cell rearrangement are mediated

      We appreciate this important point. As suggested, the possibility that mechanical aspects of peristalsis contribute to gut elongation is highly plausible. To address this, we plan to perform additional experiments aimed at isolating the mechanical component of peristalsis. Furthermore, we will investigate the involvement of mechanotransduction pathways, including the Piezo-mediated pathway, using pharmacological approaches. We will revise the manuscript to better discuss these possibilities and clarify the current limitations of our study.

      The novelty and positioning of our study

      We appreciate this comment and have addressed this point in the General response above. In the revised manuscript, we will more clearly position our study relative to the previous studies.

      Reviewer #2 (Public review):

      Longitudinal separation of daughter cells even without peristalsis

      We appreciate this insightful and important comment. As noted, daughter cells can exhibit longitudinal separation even under nifedipine treatment, whereas the divergence index (DI) shows a clear increase only in the control (with peristalsis) condition. We interpret this as follows: immediately after cell division, the two daughter cells occupy nearly identical positions along the longitudinal axis, and stochastic fluctuations may cause them to separate from each other. Such local separation does not necessarily reflect population-level cell rearrangement. In contrast, DI captures collective dispersion of a cell population, which reflects organized tissue-level rearrangement associated with elongation. We will revise the manuscript to clarify this distinction between local cell behavior and population-level dynamics, and to better explain how DI reflects elongation-related processes.

      Contributions from other gut layers and ECMs

      We agree that contributions from other tissue layers and extracellular matrix (ECM) components might be important. To address this, we plan additional experiments including targeted ablation of specific tissue layers and pharmacological manipulation of ECM remodeling (e.g., using MMP modulators). We will also expand the Discussion to better acknowledge these factors.

      Reviewer #3 (Public review):

      (1) We agree that experiments based solely on nifedipine treatment cannot fully exclude potential off-target effects. To address this limitation, we plan to perform additional experiments that rescue the mis-rearrangement of cells by applying mechanical forces.

      (2) We agree that more elaborate analyses of cell proliferation and apoptosis are needed. In the revised manuscript, we will incorporate additional analyses using appropriate markers and methods suitable for developing gut tissue.

      (3) In Figure 2, we had already tested an increased frequency of peristaltic contractions (30 s intervals, Fig. 2i, j, k, n). This did not result in a significant increase in elongation or widening compared to the control condition (120 s intervals). This suggests that the effect of peristalsis on elongation may reach a plateau at a certain frequency. We will revise the manuscript to clarify this interpretation and discuss its implications.

      (4) We appreciate this important comment and have addressed the issue of novelty and positioning in the General response shown above.

      References

      Yang, Y. et al. Ciliary Hedgehog signaling patterns the digestive system to generate mechanical forces driving elongation. Nat. Commun. 12, 7186 (2021).

      Khalipina, D., Kaga, Y., Dacher, N. & Chevalier, N. R. Smooth muscle contractility causes the gut to grow anisotropically. J. R. Soc. Interface 16, 20190484 (2019).

    1. Author response:

      [Editors' note: The authors included an author response to reviews from another journal]

      Reviewer #1 (Comments to the Authors):

      In this manuscript the authors describe that cells in collective movements adopt a superdiffusive behavior to outpace individual cells. This behavior is regulated by cell-cell junctional stability and force transmission. The authors state that speed is regulated by vinculin through mechanosensitivity.

      While it makes intuitive sense that cells may move more efficiently collectively as it reduces their exploratory space and therefore increases their efficiency of movement,

      We agree that this is an intuitive explanation. However, previous literature had shown that confluent cells may or may not migrate depending on conditions that do not solely depend on the space available per cell, but also involve the intrinsic activity of the cell, its cortical tension, and its adhesion with its neighbors, with sometimes counterintuitive effects (doi: 10.1016/J.CEB.2021.07.011). This was the reason that motivated us to investigate how these various ingredients affected space exploration efficiency on different time scales.

      Our results indeed refute the intuition that cells move more efficiently when their exploratory space is reduced by showing that the outcome depends on the time scale considered (Fig. S3B). Specifically, on short time scales (less than 3 hours), the area explored by individual MDCK cells is larger than that explored by MDCK cells at confluence. On a longer time scale (greater than 3 hours), however, the area explored by confluent MDCK cells is larger. This switch is a direct consequence of the change in migratory behavior from persistent random walk to superdiffusion. Moreover, its position in time depends on the cell line: extrapolation of our results on RPE-1 cells suggests that it should theoretically occur after approximately 300 hours, if this time scale were experimentally accessible (Fig. S3F).
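The timescale-dependent switch described above can be illustrated with a minimal sketch. It assumes that isolated cells follow the standard two-dimensional persistent random walk (Fürth) formula, that confluent cells follow a pure power law, and that the mean squared displacement (MSD) can stand in for explored area. All parameter values (speeds, persistence time, sampling interval, exponent) and all function names are hypothetical and chosen only for illustration; they are not the values measured in the manuscript.

```python
import math


def msd_persistent_random_walk(t, speed, persistence):
    """Fürth formula for a 2D persistent random walk (units: length^2)."""
    return 2.0 * speed**2 * persistence * (
        t - persistence * (1.0 - math.exp(-t / persistence))
    )


def msd_superdiffusive(t, step_speed, tau, alpha):
    """Power-law MSD, normalised so that MSD(tau) = (step_speed * tau)^2."""
    return (step_speed * tau) ** 2 * (t / tau) ** alpha


def crossover_time(f_single, f_confluent, t_lo, t_hi, iterations=200):
    """Bisection in log-time for the lag at which the two MSD curves cross."""

    def diff(t):
        return f_single(t) - f_confluent(t)

    if diff(t_lo) * diff(t_hi) > 0:
        raise ValueError("the two curves do not cross inside the bracket")
    for _ in range(iterations):
        t_mid = math.sqrt(t_lo * t_hi)  # geometric midpoint
        if diff(t_lo) * diff(t_mid) <= 0:
            t_hi = t_mid
        else:
            t_lo = t_mid
    return math.sqrt(t_lo * t_hi)


# Hypothetical parameters (hours and micrometres), for illustration only.
def single(t):
    return msd_persistent_random_walk(t, speed=30.0, persistence=0.5)


def confluent(t):
    return msd_superdiffusive(t, step_speed=20.0, tau=1.0 / 6.0, alpha=1.5)


t_star = crossover_time(single, confluent, 1.0 / 6.0, 1000.0)
print(f"illustrative crossover lag: {t_star:.1f} h")
```

Before the crossover the faster, isolated walker covers more ground; after it the slower but superdiffusive population wins. The position of the crossover is very sensitive to the chosen speeds, persistence time, and exponent, which fits the cell-line dependence noted above.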

      …the role of junctions specifically is less clear.

      We are sorry that we were not able to clearly convey the roles of junctions. We have substantially rewritten our text to address this and all the changes are highlighted in orange. As summarized in Fig. 6F, junctions have three roles. The first role is on persistence, through velocity coordination between neighbors, the second is on speed, through the stability of junctions, and the third role is on directionality, through the sensitivity of the monolayer to the wound edge.

      The first role is evidenced thanks to the comparison of the MSD between single cell and confluent migration assays and the use of the alpha-catenin KD cell line. Alpha-catenin depletion is known to be the most potent disruptor of adherens junctions (DOI: 10.1091/mbc.e06-05-0471, DOI: 10.1126/science.aaf7119, DOI: 10.1073/pnas.1002662107, DOI: 10.1073/pnas.1119313109), and we show that it significantly alters the superdiffusive behavior that emerges in the confluent migration assay (Fig. 3E,F, 5C). Therefore, junction integrity is critical for the control of cell persistence.

      Moreover, alpha-catenin depletion induces a loss of velocity coordination between neighbors (Fig. S3E), and we show through numerical simulations that this coordination is what induces superdiffusion (Fig. 3G). By contrast, E-cadherin KO and vinculin mutants have no effect on the superdiffusion of confluent cells (Fig. 3E, 4A). Therefore, the critical molecular ingredient is the link provided by alpha-catenin to the cytoskeleton that provides junction integrity.

      The second role of junctions is evidenced thanks to the comparison of cell speeds between single and confluent migration assays with the vinculin mutants (Fig. S4A). Results show that cell speed is reduced by about 10 µm/h at confluence, regardless of the mutant except for YE, whose only difference with other mutants is its lower stability (Fig. 4F). This supports that junction stability, and not the other effects of mutants, controls cell speed (we provide a detailed demonstration in the response to the following question). As expected, junction integrity is required as well, as seen from the higher cell speed of the alpha-catenin KD cell line compared to WT (first MSD point in Fig. 3B, E).

      The third role of junctions is evidenced thanks to the comparison between confluent and directed migration assays (Fig. 6A). Results show that the wound healing rate is proportional to cell speed at confluence, regardless of the mutant except for YE, which displays no tension gradient in junctions from front to back cells (Fig. 6C). This supports that such a gradient is required for cells to identify on which side the wound edge lies. As expected, junction integrity is required as well, as seen from the loss of directional bias of the alpha-catenin KD cell line (Fig. 5F).

      The authors chose vinculin as the basis by which to manipulate tensions at cell-cell junctions, but this comes with considerable drawbacks. Namely, since vinculin appears at both cell-cell and cell-matrix junctions, its role and the role of its mutations is not clear here. The authors state that the collective migration speed is related to junctional stability, but because vinculin is also at FA, how can this be concluded?

      We apologize for the lack of clarity. We hope that the highlighted changes in the revised manuscript will improve this point. As exemplified above, comparing cell migration between isolated cells and confluent cells is essential to enable us to distinguish between the contributions of AJs and FAs. Indeed, since isolated cells lack AJs, the impact of vinculin mutants on single cell migration can only be explained by their effects on FAs. This is how we first determine the effects of vinculin mutants on migration that depend on FAs. Because confluent cells also have FAs, we expect that the effects of vinculin mutants on the migration of isolated cells will still be present in confluent cells, to which will be added the effects of these mutants on AJs and their consequences on migration, if any.

      Therefore, when compared to WT cells, if a given mutant decreases or increases migration speed in individual cells, and does so in confluent cells in the same proportion, then its effects at confluence can be entirely explained by its effects in individual cells, and there are no additional effects of that mutant from AJs. This is indeed what we observe for all mutants except the YE mutant (Fig. S4C), leading us to conclude that none of the vinculin mutants, except the YE mutant, have an effect on migration at confluence that results from AJs. In contrast, the YE mutant has effects on migration at confluence that cannot be explained by its effect on individual cell migration. Therefore, the effects of YE at confluence depend on AJs, whether they result from alterations in AJs, FAs, or both. To distinguish between these scenarios, we proceed by elimination, comparing the effects of YE to those of other mutants on force transmission and adhesion stability, and how these two factors associate with migration speed, as explained below. In FAs, YE alters force transmission differently in individual cells and at confluence, but we already know from Fig. 2 that force transmission in FAs cannot alone explain the speed of migration. This result rules out an indirect effect of AJs on cell migration at confluence through FAs. Furthermore, in AJs, YE affects stability and force transmission, but TL has the same effect on force transmission as YE and we already know that none of the effects of TL on migration depend on AJs (Fig. 3, S4C). This result rules out an effect of force transmission in AJs on migration speed at confluence. We conclude that stability at the AJ level, which is the remaining property specifically impaired by YE, is what regulates migration speed at confluence.

      The manuscript's logic and flow are not clear in some places, making the story hard to follow. As one example, the FRAP data, which the authors suggest is used to investigate vinculin's combined role does not help in this capacity as the interpretation and its connection to the bigger story are not clear.

      We are sorry again for the lack of clarity. We used FRAP data to evaluate the effects of vinculin mutants on adhesion stability. Indeed, mutants have different effects on adhesion stability (Fig. 2E, 4F). In addition, they also have different effects on force transmission (Fig. 2D, 4D,E). The partial overlap in functional alterations caused by the mutants allows us to test the involvement of the overlapping function (here stability) in the overall migration outcome. For example, if two mutants have a similar effect on adhesion stability but different effects on migration speed (such as TL and T12), we can then rule out that speed results from adhesion stability. Similarly, if two mutants have different effects on stability but a similar effect on speed (such as TL and YE), we can also rule out that speed results from stability. We applied the same reasoning to force transmission to conclude that neither adhesion stability nor force transmission alone is sufficient for cells to migrate rapidly. However, the combination of the two enables rapid migration.

      As another example, the information derived from the use of the mutants is not clear in the context of the message in the manuscript since they affect cell-cell and cell-matrix junctions and in some places show results that are counter intuitive and not well-explained, to which the authors admit they are surprising but then do not explain their meaning.

      As such, it is very hard to follow the logic with regard to the information resulting from the mutant experiments.

      We provide above a detailed break-down of our strategy to analyze the results. We regret that our manuscript did not adequately convey our conclusions and we hope that the new version of the manuscript improves this point.

      Proliferation has been shown to play a role in wound healing. Does proliferation change in the various conditions?

      This is an important point. The average speed of cells at confluence is approximately 20 µm/h (Fig. 4B), which means that each cell moves approximately its own size in one hour. During this time, assuming a 16-hour cell cycle, 6% of the cells would have divided, each of them likely pushing one of its neighbors a distance equivalent to the size of a cell. Therefore, cell proliferation accounts for at most a few percent of the total cell movement. For this reason, we can assume that growth does not account for a large part of the movement we observe. This is consistent with previous work showing that proliferation does not contribute significantly to wound healing (DOI: 10.1073/pnas.0705062104, DOI: 10.1083/jcb.201207148).
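The estimate above can be written out as a short calculation. This is a minimal sketch of the arithmetic only: the 20 µm cell size is inferred from the statement that a cell moving at 20 µm/h covers roughly its own size in one hour, divisions are assumed to be spread uniformly over the cell cycle, and the function name is hypothetical.

```python
def division_share_of_motion(speed_um_per_h, cell_size_um, cycle_h, window_h=1.0):
    """Return (fraction of cells dividing, division-driven share of displacement).

    Assumes divisions are spread uniformly over the cell cycle and that each
    division pushes exactly one neighbour by one cell size.
    """
    fraction_dividing = window_h / cycle_h
    # Displacement caused by divisions, averaged over every cell in the sheet.
    division_displacement = fraction_dividing * cell_size_um
    # Displacement caused by migration over the same window.
    migration_displacement = speed_um_per_h * window_h
    return fraction_dividing, division_displacement / migration_displacement


fraction, share = division_share_of_motion(
    speed_um_per_h=20.0,  # average speed at confluence quoted in the response
    cell_size_um=20.0,    # inferred: one cell size travelled per hour
    cycle_h=16.0,         # cell-cycle length assumed in the response
)
print(f"cells dividing per hour: {fraction:.2%}")
print(f"division-driven share of displacement: {share:.2%}")
```

Under these assumptions both numbers come out at 1/16, or about 6%, which matches the "few percent" bound stated in the response.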

      Minor comments:

      The authors should provide a better description of the mutants: what does a tailless mutant not bind, or bind differently? More context is needed to help interpret the results. While the mutants have all been published on before, it would be helpful to have more information here so that the manuscript is easier to follow.

      We are sorry that the information we provided was insufficient. We have now detailed the mutations to help the reader understand how interactions are altered.

      Figure 1A is not necessary. Figure 1 overall is fairly predictable as there have been many papers using the persistent random walk as the best model for single cell migration (dating back to the early 1990's). The authors define a new term, angular memory, which they show decreases with increasing delta t as one would predict.

      We acknowledge that persistent random walks have already been observed for individual cells, as in references 3-4 cited in the introduction. Nevertheless, we believe that Figure 1 is important because not all cells migrate as persistent random walkers when isolated. Some migrate in a more exotic manner, resulting in superdiffusive behavior, as in references 5-8 cited in the introduction. Since we observe superdiffusive behavior at confluence (Figure 2), it was therefore necessary to show whether or not single cells were superdiffusive too. We also use this figure to introduce angular memory, a measure that, to our knowledge, has never been used before. As intuition suggests, it decreases to 0 for persistent random walkers, just as a related measure, velocity autocorrelation, would do. However, the angular memory of fractional Brownian walkers does not vanish with increasing delta t (Fig. 3D), whereas their velocity autocorrelation would, just like that of persistent random walkers. This difference makes angular memory much more appropriate for distinguishing between the two migration behaviors, and prompted us to introduce it in the first figure as a reference.
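One reason the single-cell check matters is that a persistent random walk looks superdiffusive at lags shorter than its persistence time, so only the long-lag behavior separates it from genuine superdiffusion. The sketch below evaluates the local log-log slope of the MSD predicted by the standard Fürth formula. It is an illustration under that assumption, it does not implement angular memory (whose definition is not given in this response), and the function name is hypothetical.

```python
import math


def prw_local_exponent(t, persistence):
    """Local slope d ln(MSD) / d ln(t) for a persistent random walk.

    Derived from MSD(t) = 2 S^2 P [t - P (1 - exp(-t / P))]; the speed S
    cancels out, so only the ratio t / P matters.
    """
    x = t / persistence
    decay = -math.expm1(-x)  # 1 - exp(-x), computed stably for small x
    return x * decay / (x - decay)


# The slope starts near 2 (ballistic) and relaxes to 1 (diffusive).
# A genuinely superdiffusive walker would instead keep a constant slope
# between 1 and 2 at all lags.
for ratio in (0.01, 0.1, 1.0, 10.0, 100.0):
    slope = prw_local_exponent(ratio, persistence=1.0)
    print(f"t / P = {ratio:>6}: local MSD exponent = {slope:.3f}")
```

A measured exponent above 1 therefore indicates superdiffusion only if it persists at lags well beyond the persistence time.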

      In the wound healing assay, which cells were measured? Leading edge or interior, and does it matter?

      Figure 5A shows that cells behave differently depending on their distance from the wound. This is because the traces shown correspond to the first few hours of the movie, during which the cells at the front begin to move first. Figure S5A shows the speed of the cells over time after the wound and indicates that the cells reach a stable speed after approximately 3 to 4 hours. Figure S5B shows the speed of the cells as a function of distance from the wound at steady state. These results show that the speed of the cells no longer depends on the distance from the wound at this stage. As indicated in the “Materials and Methods” section, we only considered time points beyond this stage for subsequent analyses of population-averaged MSD and velocity presented in Figure 5, so the location of cells at the front or rear was irrelevant.

      Reviewer #2 (Comments to the Authors):

      To migrate, cells must spatially explore their environments, a process that is guided by intrinsic signals (adhesive and mechanical properties, etc) and extrinsic (gradient cues) signals. This exploration can occur on the single or multicellular level. In this study, the authors examine the effect of cell-cell interactions, guidance cues, and cell mechanics in the exploratory capacity of MDCK cells. The authors show that cell-cell adhesion provides an "infinite directional memory for migration" and cell speed is dependent upon the focal adhesion stability, cell mechanics, and the mobility of adherens junctions; these processes are modulated by vinculin.

      My three major concerns with the manuscript are as follows:

      (1) While there is potential new information about the role cell-cell junctions and guidance cues play in cell migration, there is not enough NEW insight presented. Rather the role of vinculin in these processes is expected given what is already known about its ability to control focal adhesion stability, mechanics, and adherens junctions.

      We agree that our cell migration results make sense based on the effects of vinculin mutants on the stability and force transmission of adhesions. Nevertheless, we argue that this was not the only possible scenario. Indeed, we find that none of the effects of vinculin mutants on AJs (except YE) have an impact on cell migration (Fig. S4C). One might have expected that the increased stability provided by the TL and T12 mutants would reduce the speed of collective cell migration, just as the YE mutant increased cell speed due to its altered stability. This is not what we found, and this reveals a nonlinear relationship between AJ stability and migration speed that could be investigated more thoroughly in future studies. Another example is that the effects of the mutants on force transmission in AJs do not impact migration speed at confluence but do impact directed collective migration (Fig. 6). One might have expected that vinculin-mediated force transmission in AJs would impact collective migration, whether directed or not.

      More importantly, we show that the role of intercellular adhesion in cell migration is more complex than expected. Indeed, it depends on the timescale considered: intercellular adhesion is detrimental to short-term spatial exploration and beneficial in the long term (Fig. S3B). Such a timescale-dependent behavior is impossible to predict from previously known effects of the mutants or other molecular considerations. Furthermore, we show that this behavior can be fully explained by the coordination of velocities between neighbors, which depends on intact connections between AJs and the cytoskeleton via alpha-catenin, but is independent of vinculin mutants that connect AJs to the cytoskeleton in parallel with alpha-catenin. One might have expected these connections to also have an impact on velocity coordination, and thus on spatial exploration, but we show that this is not the case (Fig. 3). Finally, we show that directed collective migration has a negligible impact on cell exploration at our experimental timescale (Fig. 5), whereas we initially expected the wound to make migration more ballistic. This reveals that such a directional signal affects spatial exploration at much longer timescales than expected.

      Overall, our results quantify the outcome of competing effects and provide timescales at which one effect outweighs the other in influencing cell migration. We believe this is an original approach that provides substantial new insights into collective cell migration.

      (2) The phenotypes of the cells expressing the mutant vinculins vary greatly. These phenotypes are not addressed despite the fact that they could potentially complicate the analyses. For example, there are dramatic differences between focal adhesion numbers and sizes in the cells expressing the different vinculin mutants; cell spreading is also dramatically altered. Likewise, the T12 mutant vinculin has previously been reported to have increased adhesive strength, increased traction forces, and cell spreading. How does this knowledge change the interpretation?

      We agree that vinculin mutants may have effects on the size and number of FAs, cell spreading, and traction forces that we do not examine here. These consequences can be explained by the effects of these mutants on force transmission in FAs and on their stability, which we report in our work. They do not affect our interpretations. Here, we provide a predictive model of migration speed based on the combination of two consequences of vinculin function, namely stability and force transmission. An interesting avenue for future research would be to assess whether these combinations can be reduced to a single higher-level effect of vinculin on the cellular phenotype that would be sufficient to predict migration speed. This work remains to be done, as neither FA size and number, cell spreading, adhesion force, nor traction forces alone are sufficient to predict migration speed.

      Along the same lines, it has previously been established that tagged versions of vinculin do not efficiently integrate into adherens junctions. Published work from the Nelson laboratory suggests that GFP-vinculins do not localize to cell-cell junctions and work from other laboratories suggests localization occurs only when the endogenous vinculin is silenced.

      We are aware that some GFP-vinculin constructs may not localize as well as the endogenous protein at AJs. This is due to the localization of the GFP tag on the head of vinculin and depends on the length of the linker between GFP and the head of vinculin. The longer the linker, the easier the interaction with AJ partners. Unlike these constructs, the vinculinTSMod sensors we use in our work do not carry a GFP on the head and do not suffer from the same limitations.

      Furthermore, vinculin recruitment to AJs depends on force, with little or no recruitment when tension on the AJs is relaxed (DOI: 10.1038/ncb2055). Vinculin recruitment has in fact already been used as an indicator of AJ tension in Drosophila (DOI: 10.1038/s41467-018-07448-8). Consequently, the amount of vinculin visible at the AJs varies depending on the tension exerted on the AJs, which our results confirm: vinculin is more difficult to detect at the AJs in cells located at the front of a wound than in those located at the back (Fig. 6B), which is consistent with the difference in vinculin tension between front and back cells (Fig. 6C) and with the E-cadherin tension gradient between front and back cells (DOI: 10.1083/jcb.201706013). Overall, these results show that vinculin is not always easy to detect at AJs, but this is due to the properties of vinculin, which the constructs we use reproduce better than previous constructs (see also below).

      The images in figure S2 and the prebleach images in figure S4 do not show convincing localization of the mutant vinculins to cell-cell adhesions. This is of special concern given that YE mutant protein hardly has any discernable localization to cell-cell junctions; additionally, none of the mutant proteins were tested for their ability to co-localize with adherens junction components. This raises the question if the parameters being examined and the conclusions drawn from them are affected by a difference in localization.

      We agree that the recruitment of vinculin at intercellular contacts may be difficult to see.

      Besides the force-dependent effects mentioned above, other factors are involved. The images shown in Figures S2 and S4 are from live cells in which cytoplasmic vinculin is still present, and its level is proportional to the mobility of vinculin. Indeed, the TL and T12 mutants show a more marked contrast between intercellular contacts and the cytoplasm, which is consistent with their greater stability at AJs (Fig. 4F). Conversely, YE shows lower contrast, which is consistent with the lower stability of this construct at AJs (Fig. 4F). The FL construct lies between the two. As a result, the cytoplasmic content can variably mask vinculin recruitment at the AJs depending on the mutant.

      We have now performed additional quantifications of mutant recruitment at intercellular contacts as a function of distance from the basal surface of the cells and relative to their recruitment in FAs, in live cells. Results are shown in the new Fig. S4F. We find that all the constructs are recruited to intercellular contacts with a density that is at most half of that in FAs and that varies along the height. FL shows the highest density, localized more apically, consistent with the localization of an AJ-bound actin belt. The mutants appear to be more homogeneously distributed along the height of the lateral surface, which may be explained by their impaired autoinhibition (TL, T12), or mechanosensitivity (YE). This variability also contributes to the difficulty in seeing vinculin recruitment in all cells in a single z-slice.

      To confirm the proper recruitment of vinculin constructs to AJs we have performed immunofluorescence against alpha-catenin and phalloidin on each of the stable cell lines. Results are shown in the new Fig. S4D and E. In these experiments, cell permeabilization allows for the release of some of the cytoplasmic pool of vinculin, which highlights the recruitment of all vinculin constructs to intercellular contacts. There, all vinculin constructs colocalize with alpha-catenin and F-actin, as expected. Additionally, images displayed are maximum intensity projections to mitigate recruitment variability along the height.

      Overall, our results clearly support the localization of vinculin at intercellular contacts, and the differences between the constructs are consistent with the effects of their mutations.

      (3) There is a lack of new mechanistic insight. Conclusions are made about a role of vinculin dimerization. This conclusion appears to be based upon the usage of the mutant version of vinculin Y1065. Did the authors directly measure the ability of this mutant protein to dimerize? Is actin binding also affected?

      The binding properties of the Y1065E mutant, including its dimerization and binding to actin, have already been characterized by other researchers (ref. 40 in our manuscript, as well as DOI: 10.1111/j.1432-1033.1997.01136.x or DOI: 10.1016/j.febslet.2013.02.042). We assumed that these properties are now well established and can be used to explain higher-level phenotypes that we show for the first time, to our knowledge.

      Reviewer #3 (Comments to the Authors):

      Canever et al. tracked two epithelial cell lines on collagen-coated glass and showed that isolated (non-confluent) cells move as persistent random walkers, whereas confluent monolayers migrate superdiffusively, with long-range directional memory. By systematically perturbing adhesion machinery they found that focal adhesion mutations mainly tune the speed of single-cell tracks but cannot create long-range memory, while force-bearing adherens junctions are essential for the superdiffusive regime: genetically perturbing them collapses collective memory. These interesting results identify junctional tension as important to switch epithelial cells/sheets between individual and collective search modes - an important quantitative insight that is of clear relevance to cell biologists.

      - The presented data is nicely quantitative and convincing, but I have subtle concerns about the generality of the findings. While the authors show that the differential behavior they describe is not cell-line specific (MDCK, RPE), there are no experiments evaluating the generality of their conclusions across different matrix conditions. How are the measured migration parameters affected by matrix stiffness? Cell migration on collagen-coated glass coverslips is a relatively narrow and artificial condition. How is the collective directional memory expected to behave on softer substrates? The generality of the conclusions could be strengthened by repeating measurements using hydrogels of varying stiffness. Further, it should be discussed to which tissues in the body the selected matrix conditions and migration modes plausibly apply.

      We agree that the generality of our results and the relevance of glass-rigid substrates is an important point. In vivo, epithelial cells rest on a basement membrane with a typical stiffness of approximately 10 MPa, as demonstrated by experimental evaluations on various tissue explants, including renal glomeruli and Bruch's membrane, which are relevant to MDCK and RPE-1 cells (DOI: 10.1111/j.1742-4658.2007.05823.x, DOI: 10.1172/JCI106898, DOI: 10.1038/eye.1987.35). We have added these references to the manuscript to support our experimental strategy. In vitro, the most significant effects of substrate stiffness on FAs and cell migration generally occur at much lower stiffnesses, between 0.2 and 100 kPa, and cell phenotypes generally plateau at levels comparable to those observed on glass, even below 100 kPa (DOI: 10.1242/jcs.133645, DOI: 10.1038/ncb3268, DOI: 10.1039/c5ib00307e, DOI: 10.1039/c9sm01893j). Furthermore, substrate stiffness has much more moderate effects on confluent cells than on isolated cells. For example, it has been previously demonstrated that confluent layers of MCF10A epithelium showed no change in velocity coordination in the range of 3 to 65 kPa (DOI: 10.1083/jcb.201207148). Therefore, collagen-coated glass appears to be a reasonable model for the basement membrane. Overall, we believe that we have conducted our experiments under physiological conditions, and that our results apply to a wide range of substrate stiffnesses.

      - It would be nice to see how long it takes confluent cell layers to close rectangular wounds of defined size when cells migrate as individual (adherens junctions perturbation) versus collective (wt) (on substrates of different stiffness). Presumably, there should be faster wound closure under the collective regime, at least for simple shaped wounds.

      This is an interesting question, which our results indirectly address. In our study, we measured the wound healing speed of the WT MDCK cell line as well as lines expressing mutant vinculin constructs (Fig. 6A). These results show that this speed ranges from 5 to 15 µm/h depending on the construct expressed (and for reasons that we explain in the manuscript). These values make it easy to estimate the time required to close a wound based on its width. For example, it would take 5 hours to close a 100 µm wide wound for the WT cell line, which has a rate of 10 µm/h (on both sides of the wound).
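
      The estimate above can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: it assumes two opposing fronts that advance at a constant speed until they meet, and the function name and parameters are hypothetical rather than taken from the manuscript.

```python
def closure_time_hours(width_um: float, front_speed_um_per_h: float, fronts: int = 2) -> float:
    """Estimate the time for advancing fronts to close a wound of the given width.

    Assumes every front moves at the same constant speed (a simplification).
    """
    if width_um < 0 or front_speed_um_per_h <= 0 or fronts < 1:
        raise ValueError("width must be >= 0, speed > 0, and fronts >= 1")
    # Each front covers width / fronts before the edges meet.
    return width_um / (fronts * front_speed_um_per_h)


# Speeds spanning the 5 to 15 um/h range quoted in the response, for a 100 um wound.
for speed in (5, 10, 15):
    print(f"{speed} um/h -> {closure_time_hours(100, speed):.1f} h")
```

      With these assumptions, the 10 µm/h case gives the 5 hours quoted above, and the slowest and fastest constructs bracket it at roughly 10 and 3.3 hours.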

      Wound closure for cells with disrupted adhesive junctions has already been documented (DOI: 10.1083/jcb.200910041). The results show that wound closure is indeed slower than with WT cells. Although this previous study does not reveal the underlying causes, our work now shows that there are two factors: weaker directional memory due to impaired intercellular coordination and, in the longer term, an additional lack of sensitivity to the guidance signal provided by the wound.

      - Akin to substrate stiffness variation, I am missing experiments that test the effect of cytoskeletal tension on these migration modes. Experiments with Rho kinase or myosin inhibitors could meaningfully broaden the scope of this study.

      Rho kinase or myosin inhibitors applied to cells during the time required to assess migration patterns (a movie recorded overnight is necessary to obtain a statistically reliable calculation of MSD over 3 to 4 hours) are likely to affect many other cellular processes in addition to the cytoskeletal tension directly involved in migration. We believe that the accumulation of these effects will make interpretation of the results very difficult. For example, it has been shown that inhibition of ROCK by Y27 promotes healing of corneal endothelial lesions by affecting proliferation through cyclin D and p27 (DOI: 10.1167/iovs.13-12225), or by improving respiration, which would provide the energy necessary for migration (DOI: 10.1096/fj.202101442RR). Consistently, another study on HaCaT epidermal cells confirms that myosin phosphatase accelerates wound healing through proliferation (DOI: 10.1016/j.bbadis.2018.07.013). In contrast, in HUVEC cells, ROCK inhibition significantly impaired the proliferation and migration of vascular endothelial cells in vitro in a dose-dependent manner (DOI: 10.1097/ICO.0000000000000493).

      Furthermore, previous studies have highlighted that differential contractility at the subcellular level is important for collective migration (DOI: 10.1038/ncb2133, DOI: 10.1083/jcb.201706013), which is not possible to examine with global activation or inhibition of contractility. This prompts the development of more refined and specific measurement and disruption strategies to assess the respective impact of cytoskeletal tension on cell-cell and cell-matrix adhesion mechanisms. Our work, which uses biosensors to assess how this tension differentially affects cell-cell and cell-matrix adhesions, is a step in this direction. The localized spatio-temporal activation or inhibition of myosin subtypes or Rho GTPase regulators specific to these adhesion structures will likely answer these questions in the future, but we believe that the development and application of these approaches will require a substantial amount of work that goes beyond the scope of our study.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #2 (Public review):

      Summary:

      This study uses dental traits of a large sample of Chinese mammals to track evolutionary patterns through the Paleocene. It presents and argues for a 'brawn before bite' hypothesis -- mammals increased in body size disparity before evolving more specialized or adapted dentitions. The study makes use of an impressive array of analyses, including dental topographic, finite element, and integration analyses, which help to provide a unique insight into mammalian evolutionary patterns.

      Strengths:

      This paper helps to fill in a major gap in our knowledge of Paleocene mammal patterns in Asia, which is especially important because of the diversification of placentals at that time. The total sample of teeth is impressive and required considerable effort for scanning and analyzing. And there is a wealth of results for DTA, FEA, and integration analyses. Further, some of the results are especially interesting, such as the novel 'brawn before bite' hypothesis and the possible link between shifts in dental traits and arid environments in the Late Paleocene. Overall, I enjoyed reading the paper and I think the results will be of interest to a broad audience.

      Weaknesses:

      For the original draft of the manuscript, I had four major concerns with the study, especially related to the sampling, diet, and evidence for the 'brawn before bite' hypothesis. I still believe that the original issues that I raised may be weaknesses of the study. For example, there is still limited discussion on diets (even though the dental topographic analyses used in the study are designed for inferring diets). And I find the results a little challenging to interpret because teeth of multiple positions are included in the same samples, which seems problematic. That said, the authors have addressed each of my previous concerns and have made major revisions, including running new analyses, and thus I support the paper.

      This revised submission includes only minor changes aimed at clarifying the main text.

      Reviewer #2 (Recommendations for the authors):

      I appreciate that the authors made many improvements to their study based on reviewers' comments. I don't have any remaining major issues with the paper, but I do have several minor comments.

      Thank you for taking the time to provide additional helpful feedback on our study. We have made minor revisions to the manuscript based on your suggestions. Please see our point-by-point response below.

      Lines 48-50. I reiterate my suggestion in my previous review to explicitly state which clade is being discussed, which is important because several major mammal groups beyond placentals (metatherians, multituberculates, dryolestoids, gondwanatherians) survived the K-Pg and had very different diversification patterns. You mention "mammal taxonomic diversity" but in the next sentence say "This initial placental mammals diversification ..." and later mention "stem placental/eutherian lineages." To stay consistent, you might replace "mammal" (L48) and "placental mammals" (L50) with "eutherian(s)" (usually defined as stem + crown placentals). If you follow this suggestion, then elsewhere in the paper I recommend replacing "mammals" with "eutherians" for consistency.

      Thank you for this suggestion. We revised the text so that “mammals” is used only as a general reference to the group; specific mentions of the dataset analyzed now read “eutherians.”

      Lines 75-83. I respect the authors' hesitancy to reconstruct specific diets for the fossil taxa (L75-83), especially considering that dental topographic analyses (DTAs) often struggle to differentiate diets in extant taxa (e.g., Pineda-Munoz et al. 2016 Methods Ecol Evol). I still think that the authors might be able to interpret dietary trends from their results (e.g., an increase in average OPCR values indicating a shift toward more herbivorous diets) - I think discussing dietary trends would be an interesting discussion topic later in the paper. That said, I also recognize that different DTA results seem to show conflicting dietary trends (based on my limited knowledge of those metrics) so maybe that complicates things too much.

      We concur with Reviewer 2 that dietary inferences from DTA data are premature, especially given the ongoing controversies over its use in studies of extant mammal teeth. We have kept the current scope of our discussion unchanged.

      Lines 75-77. "early mammals ... are beyond the reach of conventional phylogenetic bracketing approaches to dietary reconstruction." But your fossils (eutherians) are certainly within 'phylogenetic brackets' of modern clades (therians, i.e. Eutheria + Metatheria). Maybe you're alluding to the fossils being stem lineages of extant subgroups like Ungulata, which means we can't bracket them specifically within those eutherian subgroups? So, I recommend revising or expanding your statement for clarity. Also, the considerable phylogenetic uncertainty for Paleocene groups (e.g., Halliday et al. 2015) complicates this issue, which you could mention.

      We modified the sentence to now say “Additional complications with ecomorphological analysis of these stem eutherians include the uncertainty in their dietary ecology, having diverged prior to the crown radiation, and uncertainty in phylogenetic positions of Paleocene taxa [7]; thus, they are beyond the reach of conventional phylogenetic bracketing approaches to dietary reconstruction.”

      Line 84. "We investigated dental topography-performance shifts ...". You haven't introduced dental topography or even mentioned teeth yet, and "performance shifts" is vague. So, this phrase might confuse readers. Maybe you can just erase it and start the sentence with "We investigated the timing of ecomorphological ..."?

      We made the recommended revision.

      Lines 104-105 (and elsewhere). "Dental traits paralleled Paleocene global and regional environmental conditions" and "We found that dental topographic trait variability in Paleocene mammals in south China tracked global and regional climatic changes". These conclusions seem a little too assertive to me. Your sample is grouped into 3 rough time bins (of somewhat uncertain ages) and is from a relatively small geographic range - that seems like very limited information for inferring links between dental patterns and climatic changes, especially global patterns. I think it's worth HYPOTHESIZING that dental traits are linked to environmental/climatic changes (with results like those in Figure 2A & B as evidence to support that hypothesis), but I wouldn't make that claim with any confidence. So, I recommend that you temper your relevant conclusion statements. For example, for Line 105, you could replace "We found ..." with "We posit ..." (L105). I would make similar changes to similar statements throughout the paper (e.g., L243).

      Thank you for this suggestion to temper our phrasing. We edited throughout the text to make our interpretations less assertive.

      Figure 1 (and your response to reviewers). Why was the timescale changed to 65.5 Ma for the K-Pg boundary? The K-Pg is 66 Ma (not 65.5), which is the age you mention in the text (e.g. Pg 3 L39) and is well established in the literature - see recent papers from the Paul Renne lab for a more exact age.

      We revised the figure to have the K-Pg at 66 Ma.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Meiotic recombination at chromosome ends can be deleterious, and its initiation (the programmed formation of DSBs) has long been known to be suppressed. However, the underlying mechanisms of this suppression remained unclear. A bottleneck has been the repetitive sequences embedded within chromosome ends, which make them challenging to analyze using genomic approaches. The authors addressed this issue by developing a new computational pipeline that reliably maps ChIP-seq reads and other genomic data, enabling exploration of previously inaccessible yet biologically important regions of the genome.

      In budding yeast, chromosome ends (~20 kb) show depletion of axis proteins (Red1 and Hop1) important for recruiting DSB-forming proteins. Using their newly developed pipeline, the authors reanalyzed previously published datasets and data generated in this study, revealing heretofore unseen details at chromosome ends. While axis proteins are depleted at chromosome ends, the meiotic cohesin component Rec8 is not. Y' elements play a crucial role in this suppression. The suppression does not depend on the physical chromosome ends but on cis-acting elements. Dot1 suppresses Red1 recruitment at chromosome ends but promotes it in interior regions. Sir complex renders subtelomeric chromatin inaccessible to the DSB-forming machinery.

      The high-quality data and extensive analyses provide important insights into the mechanisms that suppress meiotic DSB formation at chromosome ends. To fully realise this value, several aspects of data presentation and interpretation should be clarified to ensure that the conclusions are stated with appropriate precision and that remaining future issues are clearly articulated.

      (1) To assess the chromosome fusion effects on overall subtelomeric suppression, the authors should explain how to read the data presented in Figure 2b-c. Based on the authors' definition of the terminal 20 kb as the suppressed region, SK1 chrIV-R and S288c chrI-L would be affected by the chromosome fusion, if any. In addition, I find it somewhat challenging to draw clear conclusions from inspecting profiles to compare subtelomeric and internal regions. Perhaps applying a quantitative approach, such as a bootstrap-based analysis similar to those presented earlier, would be easier to interpret.

      The reviewer is correct that we could not simply fuse two ends but had to create translocations that also removed variable amounts of subtelomeric sequence. Targeted translocations require unique sequences, and thus the extent to which telomeric sequences were deleted varied based on the availability of such sequences. As noted by the reviewer this necessarily limits the conclusions that can be drawn. We have expanded the description of this experiment and also explicitly state the limitations of this assay. To improve clarity, we have also included a schematic to better highlight which chromosomal sequences were removed.

      To further probe our finding that subtelomeric axis protein enrichment may largely be encoded in cis, we now compared axis protein enrichment between S288c and SK1, as suggested by reviewer 2. For this analysis, we took advantage of a dataset we had produced previously that measures Red1 enrichment in SK1/S288c hybrid strains, which provide a powerful internally controlled setup that eliminates effects caused by differential timing and synchrony between samples. As now shown in Supplementary Fig. 5, SK1 and S288c differ substantially in their subtelomeric architecture at many ends, including extensive differences in the number and distribution of Y’ elements. Importantly, axis protein distribution was very consistent between SK1 and S288c when correcting for the differences in length of individual chromosome ends, supporting the conclusion that axis protein enrichment levels are primarily encoded in cis. This analysis is now shown in Fig. 2c. These data also indicate that the presence of a Y’ element does not affect axis protein levels beyond displacing the axis-recruiting sequences further into the chromosome interior.

      (2) The relationship between coding density and Red1 signal needs clarification. An important conclusion from Figure 3 is that the subtelomeric depletion of Red1 primarily reflects suppression of the Rec8-dependent recruitment pathway, whereas Rec8-independent recruitment appears similar between ends and internal regions. Based on the authors' previous papers (references 13, 16), I thought coding (or nucleosome) density primarily influences the Rec8-independent pathway. However, the correlations presented in Figure 2d-e (also implied in Figure 3a) appear opposite to my expectation. Specifically, differences in axis protein binding between chromosome ends and internal regions (or within chromosome ends), where the Rec8-dependent pathway dominates, correlate with coding density. In contrast, no such correlation is evident in rec8Δ cells, where only the Rec8-independent pathway is active and end-specific depletion is absent. One possibility is that masking coding regions within Y' elements influences the correlation analysis. Additional analysis and a clearer explanation would be highly appreciated.

      Thank you for pointing this out. We now also included Y’ elements in the analysis in Fig 2d. Including the Y’ elements yielded an increase in average coding density near the very ends of the chromosomes. This increase matches the higher level of axis protein binding seen in rec8 mutants in Fig. 3a and is consistent with the previously noted link between coding density and axis protein deposition. We now provide further description in the text and the figure legends.

      We do not have an explanation for why there is no correlation with coding density in the EARs but assume that this reflects the unique regulation of this region (as also implied by Supplementary Fig. 4d). At present, the signals that establish the EARs remain unknown although our data indicate that the Hop1-CBR as well as Dot1 are important for axis protein enrichment in the EARs.

      (3) The Dot1-Sir3 section starting from L266 should be clarified. I found this section particularly difficult to follow. It begins by stating that dot1∆ leads to Sir complex spreading, but then moves directly to an analysis of Red1 ChIP in sir3∆ without clearly articulating the underlying hypothesis. I wonder if this analysis is intended to explain the differences observed between dot1∆ and H3K79R mutants in the previous section. I also did not understand the concluding statement that Dot1 counteracts Sir3 activity. As sir3Δ alone does not affect subtelomeric suppression, it is unclear what Dot1 counteracts. Perhaps explicitly stating the authors' working model at the outset of this section would greatly clarify the rationale, results, and conclusions.

      Thank you for this comment. We reworked the introduction to this paragraph to be more focused on Sir3 rather than Dot1. We hope that this introduction is less confusing and more in line with the data presented in this paragraph. We also expanded the conclusion to suggest the alternative possibility that the Sir complex only becomes a regulator of axis proteins in the absence of Dot1.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Raghavan and his colleagues sought to identify cis-acting elements and/or protein factors that limit meiotic crossover at chromosome ends. This is important for avoiding chromosome rearrangements and preventing chromosome missegregation.

      By reanalyzing published ChIP datasets, the researchers identified a correlation between low levels of protein axis binding - which are known to modulate homologous recombination - and the presence of cis-acting elements such as the subtelomeric element Y' and low gene density. Genetic analyses coupled with ChIP experiments revealed that the differential binding of the Red1 protein in subtelomeric regions requires the methyltransferase Dot1. Interestingly, Red1 depletion in subtelomeric regions does not impact DSB formation. Another surprising finding is that deleting DOT1 has no effect on Red1 loading in the absence of the silencing factor Sir3. Unlike Dot1, Sir3 directly impacts DSB formation, probably by limiting promoter access to Spo11. However, this explains only a small part of the low levels of DSBs forming in subtelomeric regions.

      Strengths:

      (1) This work provides intriguing observations, such as the impact of Dot1 and Sir3 on Red1 loading and the uncoupling of Red1 loading and DSB induction in subtelomeric regions.

      (2) The separation of axis protein deposition and DSB induction observed in the absence of Dot1 is interesting because it rules out the possibility that the binding pattern of these proteins is sufficient to explain the low level of DSB in subtelomeric regions.

      (3) The demonstration that Sir3 suppresses the induction of DSBs by limiting the openness of promoters in subtelomeric regions is convincing.

      Weaknesses:

      (1) The impact of the cis-encoded signal is not demonstrated. Y' containing subtelomeres behave differently from X-only, but this is only correlative. No compelling manipulation has been performed to test the impact of these elements on protein axis recruitment or DSB formation.

      Thank you for this comment. Our data indeed appeared contradictory because XY’ ends showed overall lower axis protein enrichment, yet our analysis of chromosome fusions, which also eliminated Y’ elements at some of the fused ends, provided no evidence for an effect of Y’ elements at those ends. As also noted in the response to reviewer 1, we now compared axis protein enrichment between S288c and SK1, which differ substantially in their number and distribution of Y’ elements (Supplementary Fig. 5). We found that axis protein distribution and enrichment were very consistent between SK1 and S288c when correcting for the displacement caused by the presence of Y' elements and other subtelomeric sequences (now shown in Fig. 2d). These data support the conclusion that axis protein enrichment levels are primarily encoded in cis and indicate that the presence of Y’ elements does not affect axis protein levels beyond displacing the axis-recruiting sequences further into the chromosome interior (giving rise to the apparently lower axis protein enrichment on XY’ ends).

      (2) The mechanism by which Dot1 and Sir3 impact Red1 loading is missing.

      Although we do not yet understand the precise molecular details of these effects, we nevertheless believe we have obtained several important insights into this mechanism. First, our data indicate that the suppressive effect of the ends primarily impacts the Rec8-dependent loading of Red1, whereas loading via the Hop1-CBR is largely unaffected. The effect of Dot1 thus likely occurs via the Rec8-Red1 interaction. Second, the increase in Red1 recruitment is fully rescued by deletion of Sir3, suggesting that Sir3 becomes a promoter of axis protein recruitment in the absence of Dot1. These dependencies are now outlined in the model in Fig. 9. We would also like to note that the Sir complex was previously shown to impact cohesin in mitotic cells. Thus, a connection between the Sir complex and cohesin is not without precedent.

      (3) Sir3's impact on DSB induction is compelling, yet it only accounts for a small proportion of DSB depletion in subtelomeric regions. Thus, the main mechanisms suppressing crossover close to the ends of chromosomes remain to be deciphered.

      Thank you, we absolutely agree. We had raised this point in the discussion, but we now also state it explicitly in the abstract and have expanded our treatment of these findings in the results and discussion.

      Reviewer #3 (Public review):

      Summary:

      The paper by Raghavan et al. describes pathways that suppress the formation of meiotic DNA double-strand breaks (DSBs) for interhomolog recombination at the end of chromosomes. Previously, the authors' group showed that meiotic DSB formation is suppressed in a ~20kb region of the telomeres in S. cerevisiae by suppressing the binding of meiosis-specific axis proteins such as Red1 and Hop1. In this study, by precise genome-wide analysis of binding sites of axis proteins, the authors showed that the binding of Red1 and Hop1 to sub-telomeric regions with X and Y' elements is dependent on Rec8 (cohesin) and/or Hop1's chromatin-binding region (CBR). Furthermore, Dot1 functions in a histone H3K79 trimethylation-independent manner, and the silencing proteins Sir2/3 also regulate the binding of Red1 and Hop1 and also the distribution of DSBs in sub-telomeres.

      Strengths:

      The experiments were conducted with high quality and included nice bioinformatic analyses, and the results were mostly convincing. The text is easy to read.

      Weaknesses:

      The paper did not provide any new mechanistic insights into how DSB formation is suppressed at sub-telomeres.

      We respectfully disagree with this assessment. We show that the Sir complex suppresses DSB formation at a number of cryptic hotspots in the X elements and the adjacent subtelomeric sequences by causing chromatin compaction. The role of the Sir complex in transcriptional silencing, chromatin accessibility, and DSB formation had not previously been analyzed in the meiotic subtelomeres. That being said, Sir-dependent suppression is clearly not the only mechanism that suppresses DSBs in the subtelomeres, as we only observed DSB formation at a small number of hotspots. This was in and of itself a surprise, in particular given the large-scale effect on chromatin compaction. In the abstract and in the discussion, we now emphasize more strongly that additional layers of regulation must exist.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Evidence for cis-acting suppression by Y' elements requires further support. The authors propose that Y' elements act in cis to suppress axis protein association at chromosome ends. While this is an attractive model, the current analyses do not yet provide sufficient support for it.

      Thank you for this comment. Our data indeed appeared contradictory, because XY’ ends showed overall lower axis protein enrichment, yet our analysis of chromosome fusions, which also eliminated Y’ elements at some of the fused ends, provided no evidence for an effect of Y’ elements at those ends. As noted above, we now compared axis protein enrichment between S288c and SK1, which differ substantially in their number and distribution of Y’ elements (Supplementary Fig. 5). We found that axis protein distribution and enrichment were very consistent between SK1 and S288c when correcting for the displacement caused by the presence of Y' elements and other subtelomeric sequences. These data support the conclusion that axis protein enrichment levels are primarily encoded in cis and indicate that the presence of Y’ elements does not affect axis protein levels beyond displacing the axis-recruiting sequences further into the chromosome interior (giving rise to the apparently lower axis protein enrichment on XY’ ends).

      (1a) In Figure S4c, the authors masked Y' elements to rule out the possibility that reduced binding within Y' elements themselves accounts for the overall underrepresentation in subtelomeric regions. However, since the authors propose that Y' elements suppress axis protein binding in surrounding regions in cis, it is appropriate to perform this analysis specifically on chromosome ends containing XY'.

      Thank you for this suggestion. We agree that this would specifically affect the XY’ ends. However, given that we did not see a change even with all the ends, we do not expect a change with just the XY’ ends.

      (1b) In Figure 2b-c, the authors conclude that removal of Y' elements by chromosome fusion does not reveal a long-range suppressive effect. However, the spatial extent of Y'-mediated suppression is not defined, making it unclear whether this experiment can test the proposed model. Perhaps plotting the averaged axis protein profile as a function of distance from Y' elements could help define the effective range of suppression and clarify whether the fusion experiment is informative in this context.

      Thank you. As noted above, we have now compared SK1 and S288c ends, which provided further evidence that Y’ elements do not affect axis protein enrichment beyond displacing binding sites further into the chromosome interior. In addition, we substantially expanded the description of the chromosome fusion experiment to more clearly outline the setup and the limitations of this experiment.

      (2) L402: "one of the first pieces of direct evidence that nucleosomes block meiotic DSB formation in vivo" sounds overstated, considering past publications (e.g., ref 45 and S. pombe ade6-M26 papers).

      We toned this down and added the references.

      (3) Figure 2e and other scatter plots: Correlation coefficients are reported without p-values. If the authors prefer to use confidence intervals from linear regression instead, they should justify this approach.

      We added p-values to all scatter plots.

      (4) Figure 7f. Explain blue dots.

      We apologize for this oversight (which also applies to Supplementary Fig. 10). The blue dots are measurements within 5 kb of an X element. The red dots are the rest of the genome. We have now included a legend in the panel to clarify this notation.

      (5) Figure 8d. To assess whether the conclusion can be generalized, the authors could plot the MNase and TrAEL-seq signal fold changes (sir3Δ/SIR3) for hotspots within 5 kb of X elements.

      We attempted various analyses in this direction. However, the range of the MNase-seq effect in sir3 mutants is much greater than the effect on DSBs, making it difficult to make any correlative statements. There are clearly additional layers of DSB suppression in the telomeric regions, and loss/gain of nucleosomes is not sufficient to switch hotspots on/off at most hotspots. We have now included a statement to this effect in the abstract and discuss this notion further in the discussion.

      (6) Figure S1c. The apparent difference in X-element distribution may be influenced by bin size. This could be tested by repeating the analysis using smaller bins, comparable in size to the X elements, for all regions.

      We thank the reviewer for this thoughtful suggestion. To address this concern, we repeated the analysis using smaller bins comparable in size to X elements (450 bp) across all region types. Specifically, X elements were analyzed per annotated element, while Y′ elements, subtelomeric 20 kb regions, and internal regions were subdivided into fixed 450 bp windows, and mean input coverage was calculated for each window using the same width-weighted approach.

      This reanalysis did not materially alter the overall distribution patterns observed in Figure S1c. We observed only minor shifts in absolute values, which are expected when changing bin granularity.

      Any residual differences likely reflect underlying copy number of X elements at chromosome ends. Importantly, all ChIP signals in the manuscript are normalized to their corresponding input (ChIP/Input), which mitigates potential biases arising from local copy number variation.
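
      The fixed-window reanalysis described in this response can be sketched roughly as follows. This is an illustration only, not the authors' pipeline: the function names, the bedGraph-like `(start, end, value)` interval layout, and the choice to average over covered bases only are all assumptions made for the example.

```python
def fixed_windows(start, end, width=450):
    """Split [start, end) into consecutive windows of `width` bp.

    The last window may be shorter than `width` if the region length
    is not an exact multiple of it.
    """
    return [(s, min(s + width, end)) for s in range(start, end, width)]


def width_weighted_mean(intervals, win_start, win_end):
    """Mean coverage in [win_start, win_end), weighted by overlap width.

    `intervals` is a list of (start, end, value) tuples (hypothetical,
    bedGraph-like). Bases without any interval are ignored (assumption).
    """
    total = 0.0
    covered = 0
    for s, e, value in intervals:
        overlap = min(e, win_end) - max(s, win_start)
        if overlap > 0:
            total += value * overlap
            covered += overlap
    return total / covered if covered else float("nan")


# Made-up input coverage for a 900 bp region.
coverage = [(0, 300, 2.0), (300, 900, 4.0)]
windows = fixed_windows(0, 900)
means = [width_weighted_mean(coverage, s, e) for s, e in windows]
```

      Weighting by overlap width keeps a short interval from counting as much as a long one within the same window, which is why the first window above averages to a value between 2.0 and 4.0 rather than to their simple mean.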

      (7) Figure S2. X elements are difficult to find (e.g., chrVII-L).

      We have now included arrowheads at locations with full-length X elements. Partial X elements are marked with stars.

      (8) Figure S7. Please indicate the endpoints of spreading.

      As apparent in this figure and also indicated in the quantification in Supplementary Fig. 9a, spreading of the Sir complex is in most cases quite limited. The example in Supplementary Fig. 9b is one of the largest spreads we observed. The scale of the spreading is hard to meaningfully visualize in Supplementary Fig. 8 given the relatively large genomic distances shown in these profiles. We therefore refer the reader to the analyses shown in Supplementary Fig. 9a, which shows chromosome-resolved extent of spreading.

      Reviewer #2 (Recommendations for the authors):

      To go beyond the correlation between the presence of Y' elements and low levels of protein axis binding, subtelomeres could be easily truncated. Analyzing strains with different distributions of Y' elements would also be informative. The correlative analysis could also be expanded to compare how far the influence of Y' elements goes and whether the number of Y' impacts the extent of protein axis depletion.

      We respectfully disagree with the assertion that subtelomeres could easily be truncated. The high repetitiveness of these sequences makes targeted manipulations of the extreme ends, where the Y’ elements are located, essentially impossible and is the main reason for the limitations associated with the analysis of the chromosome fusions, as outlined in the response to reviewer 1.

      However, we would like to thank the reviewer for their suggestion to analyze different strain backgrounds. We have now compared axis protein enrichment between S288c and SK1. For this analysis, we took advantage of a dataset we had produced previously that measures Red1 enrichment in SK1/S288c hybrid strains, which provide a powerful internally controlled setup that eliminates effects caused by differential timing and synchrony between samples. As now shown in Supplementary Fig. 5, SK1 and S288c differ substantially in their subtelomeric architecture at many ends, including extensive differences in the number and distribution of Y’ elements. Importantly, axis protein distribution was very consistent between SK1 and S288c when correcting for the differences in length of individual chromosome ends, supporting the conclusion that axis protein enrichment levels are primarily encoded in cis. This analysis is now shown in Fig. 2c. These data also indicate that the presence of Y’ elements does not affect axis protein levels beyond displacing the axis-recruiting sequences further into the chromosome interior.

      Given the separation between protein axis loading and DSB induction, it would be interesting to test whether the presence of Y' elements influences the frequency and position of DSB induction.

      We agree that this experiment would be very interesting. However, given the experimental challenges associated with targeted manipulation of Y’ elements as outlined above, we believe that this experiment lies outside the scope of this study. Our observations that Y’ elements do not grossly influence axis protein enrichment in their vicinity may also make an effect on DSB formation less likely.

      The effect of Dot1 on Red1 loading is intriguing because it is at least partially independent of its main known target H3K79, yet fully dependent on Sir3. However, this effect extends far beyond Sir3 binding as detected by ChIP. This is surprising because Dot1 has a limited effect on Sir3 binding as detected by ChIP, and SIR3 deletion has no impact on Red1 binding. However, Dot1 was shown to limit Sir3 spreading to 20 kb on average when overexpressed (Katan-Khaykovich and Struhl 2005; Hocher et al, 2018). It would be interesting to test whether the regions affected by DOT1 deletion coincide with the zone covered by Sir3 upon overexpression (Extended Silent Domains: ESDs, Hocher et al., 2018).

      We agree that this would be an interesting analysis. Unfortunately, the available data on the extended silent domains were not obtained in SK1 and, as noted above, the chromosome end structure differs substantially between the strains, preventing direct comparisons without repeating all the relevant analyses in S288c. In addition, the available data was collected in vegetative cells, although this may be less of an issue given that our analyses show similar spreading in vegetative and meiotic cells. However, short of repeating SIR3 overexpression in meiosis (which also would require a different overexpression regimen as galactose interferes with meiosis), we are not in a position to do this analysis.

      As mentioned in the manuscript, the interplay between the Sir complex and Dot1 has been shown to affect checkpoint regulation during meiotic recombination. However, a discussion on how this relates to the observations reported here is missing.

      Thank you. We included a discussion of this role and its relation to our observations.

      Also, it is unclear why the authors did not investigate the impact of Dot1 and Sir3 on the binding of Hop1 rather than Red1, given that Hop1 is currently « the most upstream regulator of recombination known to be depleted about 20 kb from chromosome ends. »

      We changed this statement in the introduction to avoid confusion and also included a model figure that specifically highlights the Rec8-dependent recruitment as a regulatory target.

      Our data show that most of the telomere-proximal effects seem to act through the Rec8-dependent recruitment pathway for which Red1 is the most upstream regulator known. So, although the most upstream factor known before this study was Hop1, our data now identify the interaction between Red1 and Rec8 as the most upstream regulatory node.

      Sir3's impact on DSB induction is compelling, yet it only accounts for a small proportion of DSB depletion in subtelomeric regions. Thus, the main mechanisms suppressing crossover close to the ends of chromosomes remain to be deciphered. This should be acknowledged and discussed.

      In addition to the explicit statement of this conclusion in the results, we have now added another statement in the abstract and expanded the discussion of the fact that there are clearly additional levels of regulation that remain to be discovered.

      Reviewer #3 (Recommendations for the authors):

      Major points:

      It would be nice to show a schematic summary of the authors' main conclusion.

      Thank you. We have now included a model schematic as Fig. 9.

      Minor points:

      (1) Supplemental Figure 2: A small box for the X element is marked with the same color as the Y' element, and so it is very hard to find the X element. Please use the clearer color, and it would be nice to show the chromosome ends without the X element (lines 129-130).

      We have now included arrowheads at locations with full-length X elements. Partial X elements are marked with stars. This notation also makes it obvious which ends lack annotated X elements.

      (2) Line 156-163, Figure 2b: In the main text, "chromosome fusion between chromosome IV right arm and chromosome I left arm" should be mentioned. Moreover, it isn't very clear to have the data in the S288C background. The fusion points are different between S288C and SK1 (the structures of these ends are quite different). Please explain the authors' logic in the text. 

      To improve clarity, we have included a schematic to better highlight which chromosomal sequences were removed. We have also substantially expanded the description of this experiment and explicitly state the limitations of this assay.

      (3) Supplemental Figure 6: Since the sir3 mutation affects the binding of Red1 EARs (and centromeres), it would be nice to show similar sets for the HML, MAT, and HMR loci (and intergenic regions as a control).

      We are unfortunately statistically underpowered to perform a meta-analysis of just HML, HMR and MAT. However, we have now indicated the positions of HML and HMR in Supplementary Fig. 2 and 8, so the binding of the axis proteins and Sir3 can be inspected directly. MAT is not within 50 kb of a chromosome end and thus was not captured in these analyses.

      (4) Line 322-, the section: From here, the authors switched their story from the sir3 to the sir2. It would be nice to provide the logic with a small introduction on the relationship between Sir2 and Sir3.

      We apologize for this confusion. We are not switching our story to Sir2 but rather are taking advantage of an available dataset that analyzed DSBs in sir2 mutants. We then return to Sir3 to also analyze DSBs in the sir3 mutant and analyze its interaction with a dot1 mutation. To better support the logic, we now briefly reiterate that Sir2 and Sir3 are part of the same complex at the beginning of this section.

      (5) Line 330-331, Figure 8a (and also Supplemental Fig. 8c): Would you explain a bit more about matched strain in the text or figure legend? Each dot represents a strain. If so, please show the strains used here.

      Each dot refers to an individual X or Y’ element that is shown matched in WT and mutant to highlight the trends at the level of individual elements. This is noted in the figure legend.

      (6) Supplemental Figure 7 (and 2): It would be nice to show the position of the HML, MAT, and HMR loci as well as the centromeres in the Figure.

      We have now indicated the positions of HML and HMR in Supplementary Fig. 2 and 8. MAT and the centromeres are not located within 50 kb of chromosome ends.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank you for the time you took to review our work and for your feedback! The main changes to the manuscript are:

      (1) We have performed additional experiments to increase the number of recordings from frontal and occipital electrodes (previously 51 (occipital: O1+O2) and 26 (frontal: Fp1+Fp2), now 133 and 102). The additional data have strengthened many of our results, including for example the trend for a latency difference between occipital and frontal electrodes that was likely underpowered and is now significant (Figure 3E). We have updated all relevant figures to include the additional data (Figures 2–6, Figure S4, Figure S5). None of the main conclusions have changed.

      (2) As suggested by reviewer 1, we have conducted additional experiments to rule out the possibility that the observed effects were driven by the temporal order of open and closed loop sessions (new Figure S6). We also found another 9 participants who were willing to go on the ‘vomit comet’ of six degrees of freedom (6DOF) playback (previously 5, now 14). These data have further strengthened our conclusion that playback halt responses in 4DOF and 6DOF playback are not substantially different (Figure S4).

      (3) To address the point of reviewers 2 and 3, that mismatch negativity (MMN) responses would be larger on temporal electrodes, we conducted additional experiments in which we also recorded from temporal electrodes T3–T6. We have now added a comparison of visuomotor mismatch and MMN responses on T3–T6 electrodes as Figures S8–S9. On all electrodes, visuomotor mismatch responses were larger than MMN responses.

      (4) As suggested by reviewer 1, we have added an analysis of the experience-dependent changes in mismatch responses comparing frontal and occipital responses early and late in the session (new Figure 4).

      (5) As suggested by reviewer 2, we conducted additional experiments in an independent cohort of participants (note, without concurrent EEG) to measure eye movements triggered by visuomotor mismatches. We found eye-movement speed and blink/eye-closure changes, but these had longer latency than visuomotor mismatch responses (Figure S7).

      (6) Finally, as suggested by reviewers 2 and 3, we applied independent component analysis (ICA) and time–frequency analyses to the EEG data. We show these results and explain why they are not applicable or useful in our case in the responses below.

      Please note, during the revision, we found that a part of our analysis used a bandpass of 0.2-100 Hz while a 1-100 Hz bandpass filter was used elsewhere. This has now been standardized to a 1-100 Hz bandpass filter, and the corresponding methods were updated. This resulted in no relevant changes to the figures. Additionally, the 50 Hz band-stop filter was erroneously described in the methods as 49-51 Hz. The filter used was 40-60 Hz, and the methods have been updated to reflect this.

      Reviewer #1 (Public review):

      In this paper, the authors wished to determine human visuomotor mismatch responses in EEG in a VR setting. Participants were required to walk around a virtual corridor, where a mismatch was created by halting the display for 0.5s. This occurred every 10-15 seconds. They observe an occipital mismatch signal at 180 ms. They determine the specificity of this signal to visuomotor mismatch by subsequently playing back the same recording passively. They also show qualitatively that the mismatch response is larger than one generated in a standard auditory oddball paradigm. They conclude that humans therefore exhibit visuomotor mismatch responses like mice, and that this may provide an especially powerful paradigm for studying prediction error more generally.

      Asking about the role of visuomotor prediction in sensory processing is of fundamental importance to understanding perception and action control, but I wasn't entirely sure what to conclude from the present paradigm or findings. Visuomotor prediction did not appear to have been functionally isolated. I hope the comments below are helpful.

      (1) First, isolating visuomotor prediction by contrasting against a condition where the same video stream is played back subsequently does not seem to isolate visuomotor prediction. This condition always comes second, and therefore, predictability (rather than specifically visuomotor predictability) differs. Participants can learn to expect these screen freezes every 10-15 s, even precisely where they are in the session, and this will reduce the prediction error across time. Therefore, the smaller response in the passive condition may be partly explained by such learning. It's impossible to fully remove this confound, because the authors currently play back the visual specifics from the visuomotor condition, but given that the visuomotor correspondences are otherwise pretty stable, they could have an additional control condition where someone else's visual trace is played back instead of their own, and order counterbalanced. Learning that the freezes occur every 10-15 s, or even precisely where they occur, therefore, could not explain condition differences. At a minimum, it would be nice to see the traces for the first and second half of each session to see the extent to which the mismatch response gets smaller. This won't control for learning about the specific separations of the freezes, but it's a step up from the current information.

      In theory, it is correct that the open loop (playback) session is predictable. However, this is relatively unrealistic. The open loop session is a 5-minute sequence that participants have only experienced once before, when they were generating it in the closed loop session a couple of minutes earlier. It is unlikely that participants would remember the entire sequence to a precision of less than a second, which is what they would need to predict the mismatch event. However, the reviewer is correct that it is possible that the mismatch events lose salience with time, for example as a consequence of participants losing interest in the task with time, or by undergoing some form of adaptation. To address this, we repeated the experiments with the sequence of closed and open loop sessions reversed (Figures S6A-S6C), and we analyzed the responses as a function of time within the session (Figures S6D and S6E), as suggested.

      The reversed-order design consisted of (1) open loop session: a playback, in which participants viewed the recorded closed loop session of a previous participant. This was followed by (2) a closed loop session, in which participants actively walked through the tunnel and experienced visuomotor mismatch events. Using this design, we again found that responses in the closed loop session were significantly larger than in the open loop session (Figures S6A-S6C).

      In addition, we analyzed both new and previously collected data as a function of time in the session. We computed moving average responses across 10 mismatch or playback halt trials at different percentages of progress through the paradigm (Figures S6D and S6E). This analysis revealed no consistent experience-dependent changes that could account for the observed differences between closed and open loop session. While there was indeed some form of experience dependent attenuation of visuomotor mismatch responses (see new Figure 4), the difference at the transition from mismatch to playback halt (and vice versa) far exceeded these adaptation effects (Figures S6D and S6E). This analysis was performed only on data from participants for whom we had both closed and open loop sessions and met our inclusion criteria.
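
      The moving-average analysis described in this paragraph can be illustrated with a small sketch. It is not the authors' code: the function names, the use of one scalar response per trial, and the definition of session progress as the window midpoint are assumptions for the example.

```python
def moving_average(values, window=10):
    """Average each run of `window` consecutive trial responses."""
    if window <= 0:
        raise ValueError("window must be positive")
    n_windows = len(values) - window + 1
    return [sum(values[i:i + window]) / window for i in range(n_windows)]


def window_progress(n_trials, window=10):
    """Session progress (in percent) at the midpoint of each window."""
    n_windows = n_trials - window + 1
    return [100.0 * (i + window / 2) / n_trials for i in range(n_windows)]


# Made-up per-trial response amplitudes, smoothed over 2 trials here
# only to keep the example short.
responses = [1.0, 2.0, 3.0, 4.0]
smoothed = moving_average(responses, window=2)
progress = window_progress(len(responses), window=2)
```

      Plotting `smoothed` against `progress` for mismatch and playback halt trials separately would show whether a gradual drift or an abrupt step at the session transition dominates.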

      We used a similar analysis to test whether early and late responses within a session systematically differed (new Figure 4). Here, to maximize the chance of finding a difference, we compared early (first five) and late (last five) trials. Behaviorally, participants reduced their walking speed following mismatch events, with a significantly larger reduction during early trials (14.3%) than during late trials (5.7%) (Figure 4A). Neural responses mirrored this pattern primarily on frontal electrodes: frontal activity showed a clear attenuation from early to late trials (Figure 4B), consistent with the reduction in behavioral responses. In contrast, changes on occipital electrodes were much smaller between early and late trials (Figure 4C-4D). Thus, experience-related modulation is substantially stronger in frontal compared to occipital regions.
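
      As a hedged illustration of the early-versus-late comparison, the sketch below averages the first and last five trials of a session. The helper name and the example values are hypothetical, and the statistical test used in the study is not reproduced here.

```python
from statistics import mean


def early_late_means(trial_values, n=5):
    """Return (mean of first n trials, mean of last n trials)."""
    if len(trial_values) < 2 * n:
        # Require non-overlapping early and late sets (assumption).
        raise ValueError("need at least 2*n trials")
    return mean(trial_values[:n]), mean(trial_values[-n:])


# Made-up per-trial walking-speed reductions (percent), for illustration.
speed_reduction = [16, 15, 14, 13, 12, 9, 8, 7, 6, 5, 4, 3]
early, late = early_late_means(speed_reduction)
```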

      In sum, we do not believe that the difference between visuomotor mismatch responses and playback halt responses can be explained by differences in the predictability of mismatch and playback halt events.

      (2) Second, the authors admirably modified their visual-only condition to remove nausea from 6 df of movement (3D position, pitch, yaw, and roll). However, despite the fact it's far from ideal to have nauseous participants, it would appear from the figures that these modifications may have changed the responses (despite some pairwise lack of significance with small N). Specifically, the trace in S3 (6DOF) and 2E look similar - i.e., comparing the visuomotor condition to the visual condition that matches. Mismatch at 4/5 microvolts in both. Do these significantly differ from each other?

      Yes, the 6DOF playback halt response shown in the previous Figure S3 and the mismatch response shown in previous Figure 2E are significantly different (Author response image 1).

      Author response image 1.

      Comparison of visuomotor mismatch response (A) and 6DOF playback halt response (B) from the original submission with statistics of the comparison (C).

      Nevertheless, to strengthen this conclusion, we collected additional data in the 6DOF condition. We show the comparison for participants for whom both closed loop (active) and open loop sessions (6DOF) were recorded within the same recording session (14 participants) in Figure S4. Consistent with our previous findings, visuomotor mismatch responses were significantly larger than 6DOF playback halt responses (Figures S4A-S4C). We also found no evidence of a difference between 6DOF and 4DOF playback halt responses (Figures S4D and S4E).

      (3) It generally seems that if the authors wish to suggest that this paradigm can be used to study prediction error responses, they need to have controlled for the actions performed and the visual events. This logic is outlined in Press, Thomas, and Yon (2023), Neurosci Biobehav Rev, and Press, Kok, and Yon (2020) Trends Cogn Sci ('learning to perceive and perceiving to learn'). For example, always requiring Ps to walk and always concurrently playing similar visual events, but modifying the extent to which the visual events can be anticipated based on action. Otherwise, it seems more accurately described as a paradigm to study the influence of action on perception, which will be generated by a number of intertwined underlying mechanisms.

      We are not entirely sure we understand the point here correctly. If the reviewer is suggesting that visuomotor coupling is not describable by the ideas of predictive processing, we disagree. However, given that the papers the reviewer is pointing to are premised on what seems to be a somewhat unorthodox interpretation of predictive processing when it comes to cortical circuits, we suspect this is contributing to the misunderstanding here. Let us briefly explain. In the two papers, Press and colleagues argue that most experiments cannot distinguish between “predictive cancellation” and “gated suppression”. This is indeed relatively tricky, even when one has single neuron data. The question is, does movement simply suppress sensory feedback (as is likely the case e.g. in the famous example of the cricket), or does movement result in a precise removal of only the self-generated sensory reafference? The first good evidence of the latter happening in any system is quite recent (Keller and Hahnloser, 2009). The premise the authors build their argument on is that the theory posits that “the brain predictively ‘cancels’ expected action outcomes from perception” (from the abstract of one of the papers). This is incomplete. The minimum circuit for predictive processing is composed of 3 neuron types: positive prediction error neurons, negative prediction error neurons, and internal representation neurons. Only the positive prediction error neurons have the predictive cancellation property the authors discuss. This is not the case for either negative prediction error neurons, or for the internal representation neurons. Negative prediction error neurons are excited by predictions and suppressed by sensory input (i.e. if anything, they are “predictively amplified”). This circuit is relatively well characterized in mouse cortex – for a brief summary see (Keller and Mrsic-Flogel, 2018).

      Note, this is not our idea of course – the original formulation of predictive processing (Rao and Ballard, 1999) was built to explain end-stopping. These are responses to the absence of an expected line that were stronger than would be expected from classical theories (i.e. negative prediction error responses). In mouse visual cortex, we know that a sudden break in the coupling between locomotion and visual flow selectively activates layer 2/3 negative prediction error neurons. Thus, if human cortex also implements a predictive processing like circuit with positive and negative prediction error neurons, we would expect a break in visuomotor coupling to drive a measurable response in visual cortex (by exciting the population of negative prediction error neurons – this is also why we are quite excited by the phase reversal of visual and mismatch responses as this could indicate that mismatch activates negative prediction error neurons first and positive prediction error neurons later, and vice versa for visual stimulation – negative prediction error neurons are more superficial in cortex (O’Toole et al., 2023)). We do indeed find a response over occipital cortex consistent with the negative prediction error response we observe in mouse cortex. The difficulty in distinguishing “predictive cancellation” and “movement driven suppression” comes only when looking at positive prediction error type responses (that are suppressed by predictive inputs) but does not apply to negative prediction error responses. The predictive processing circuit we are testing is the one described by (Keller and Mrsic-Flogel, 2018; Rao and Ballard, 1999), and here the break in visuomotor coupling is a stimulus that drives negative prediction error responses. Note, other authors who have thought about cortical implementations of predictive processing (e.g. (Bastos et al., 2012)) have glossed over the problem that individual neurons cannot trivially encode both positive and negative errors.

      Prediction errors are a signed quantity. If neurons signal prediction errors in firing rates and are close to zero firing rate at baseline (as is the case in layer 2/3 of cortex), they cannot (short of rather exotic ideas) encode a signed prediction error. Hence such proposals are not very useful for thinking about prediction error responses in cortex. For these reasons, we see no problem with referring to the response as a prediction error response. This is in line with a large body of mouse research (using a nearly identical paradigm) on the topic.
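
      The point about signed errors and non-negative firing rates can be made concrete with a toy sketch. It is a deliberately simplified rate model under stated assumptions (scalar inputs, rectified-linear units, unit gains, hypothetical function names), not a description of the actual cortical circuit.

```python
def rectify(x):
    """Firing rates cannot go below zero."""
    return max(0.0, x)


def positive_pe(sensory, prediction):
    """Driven by sensory input, suppressed by the prediction."""
    return rectify(sensory - prediction)


def negative_pe(sensory, prediction):
    """Driven by the prediction, suppressed by sensory input."""
    return rectify(prediction - sensory)


def signed_error(sensory, prediction):
    """The signed error is recovered only from the two populations together."""
    return positive_pe(sensory, prediction) - negative_pe(sensory, prediction)


# Visuomotor mismatch: locomotion predicts visual flow (1.0), flow halts (0.0).
mismatch = (positive_pe(0.0, 1.0), negative_pe(0.0, 1.0))

# Unpredicted visual stimulus: flow appears (1.0) without a prediction (0.0).
visual_onset = (positive_pe(1.0, 0.0), negative_pe(1.0, 0.0))
```

      In this toy model a halt of visual flow during locomotion activates only the negative prediction error unit, whereas an unpredicted stimulus activates only the positive one; neither unit alone carries the sign of the error.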

      One could of course argue that gated suppression could also mean that movement relieves suppression. Thus, one could assume that some neurons are suppressed by movement while others are enhanced. If one allows for enough neuron and stimulus specificity in the precision of the movement related suppression and enhancement of responses, the two models (predictive processing and gated suppression) become equivalent, and the discussion becomes semantic. See (Vasilevskaya et al., 2023) for an extended discussion on this point, and the reasons why we think predictive processing is a more useful model than gated suppression (keep in mind, gated suppression only explains the data if we allow for stimulus/neuron specific gain factors of the suppression, in which case the two models are equivalent).

      More minor points:

      (1) I was also wondering whether the authors may consider the findings in frontal electrodes more closely. Within the statistical tests of the frontal electrodes against 0, as displayed in Figure 3c, the insignificance of the effect of Fp2 seems attributable to the small included sample size of just 13 participants for this electrode, as listed in Table S1, in combination with a single outlier skewing the result. The small sample size stands out especially in comparison to the sample size at occipital electrodes, which is double and therefore enjoys far more statistical power. It looks like the selected time window is not perfectly aligned for determining a frontal effect, and also the distribution in 3B looks like responses are absent in more central electrodes but present in occipital and frontal ones. I realise the focus of analysis is on visual processing, but there are likely to be researchers who find the frontal effect just as interesting.

      That is correct; our data in frontal electrodes was likely underpowered. The reason we have fewer data in frontal electrodes is that eye-blink artifacts are particularly strong in frontal channels, resulting in a larger proportion of trials failing to meet our data inclusion criteria. We have now added more data from frontal and occipital electrodes by including additional experimental sessions. In addition, we applied less stringent trial-exclusion criteria, requiring that no artifacts occur within the time window −0.5 to 1 s relative to the event trigger (instead of −0.5 to 2 s). This adjustment allowed us to retain a larger number of trials. As anticipated by the reviewer, this increase in data was sufficient to confirm a significant response to the visuomotor mismatch event at both frontal electrodes (Figure 3C). The expanded dataset also revealed a significant difference in response onset times between occipital and frontal electrodes (Figure 3E), an effect that was not significant previously. In addition, we have included an analysis comparing early and late mismatch responses in frontal and occipital electrodes (Figure 4).
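
      The effect of shortening the artifact-free window can be sketched as follows. This is only an illustration: representing artifacts as time stamps in seconds and the function name `keep_trial` are assumptions, and the artifact detection itself is not shown.

```python
def keep_trial(event_time, artifact_times, window=(-0.5, 1.0)):
    """Keep a trial only if no artifact falls inside the window.

    `window` is given in seconds relative to the event trigger.
    """
    lo = event_time + window[0]
    hi = event_time + window[1]
    return not any(lo <= t <= hi for t in artifact_times)


# Made-up example: an eye blink 1.5 s after an event at t = 10 s.
artifacts = [11.5]
kept_new = keep_trial(10.0, artifacts, window=(-0.5, 1.0))
kept_old = keep_trial(10.0, artifacts, window=(-0.5, 2.0))
```

      With the shorter window the example trial is retained, whereas the longer window would have excluded it, which is how the relaxed criterion increases the trial count.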

      (2) It is claimed throughout the manuscript that the 'strongest predictor (of sensory input) - by consistency of coupling - is self-generated movement'. This claim is going to be hard to validate, and I wonder whether it might be received better by the community to be framed as an especially strong predictor rather than necessarily the strongest. If I hear an ambulance siren, this is an especially strong predictor of subsequent visual events. If I see a traffic light turn red, then yellow, I can be pretty certain what will happen next. Etc.

      This is a statistical argument. Every movement – throughout life – is directly and immediately coupled to sensory feedback and has been throughout evolutionary history. The vast majority of visual input you receive (we estimate, well above 99%) is the consequence of your own movements (e.g. every few hundred milliseconds your eye movements cause a full field change in your visual input). The same is likely true of proprioceptive and somatosensory input – the vast majority is the direct consequence of your own movements (not other people poking you). This is likely different in the auditory system where a much larger fraction of the input is externally driven (depending a bit on how much one likes to talk). But even here the best predictor is self-motion (most non-self-generated sounds one experiences in life are very difficult to predict with millisecond precision). The example the reviewer gives is a good illustration of this. Take the siren that heralds the appearance of an ambulance. The siren tells us that an ambulance will appear, but not how it will look, not when exactly it will appear, and with only very low resolution as to where it will appear. Incidentally, if you ask people to draw an ambulance they tend to draw a WWII style white square vehicle with a red cross on the side – a style of ambulance they likely have not ever seen in life. Their visual predictions of what they are about to see are very low resolution. We catastrophically fail at making pixel perfect predictions from learned stimulus associations of this nature. The traffic light example is difficult to compare to visual feedback control of movement as it is a much simpler prediction of a single bit in the form of a change in color of an existing object.

      In addition, consider how often in life you have seen an ambulance after hearing it. 100 times maybe? Maybe less. How often have you seen traffic lights change? 10 000 times? 100 000 times? Now consider how often you have experienced the visual consequences of moving your head or eyes to the left (keep in mind this includes microsaccades). At a conservative estimate of once per second, that is somewhere on the order of 1 000 000 000 times. This is not even in the same ballpark. Our brains can certainly learn to make the ambulance and traffic light type predictions - to some extent - but by far the best predictor of sensory feedback (simply by virtue of the physics of how our body interacts with the world) is self-motion.
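
      The order-of-magnitude comparison above can be checked with a few lines of arithmetic. This is only a sketch: the age of 30 years and the 16 waking hours per day are illustrative assumptions that do not appear in the response, and the function names are hypothetical.

```python
import math

SECONDS_PER_HOUR = 3600
DAYS_PER_YEAR = 365.25


def lifetime_event_count(rate_per_s, age_years, waking_hours_per_day=16.0):
    """Count events that occur at a fixed rate during waking hours.

    age_years and waking_hours_per_day are illustrative assumptions.
    """
    waking_seconds = (
        age_years * DAYS_PER_YEAR * waking_hours_per_day * SECONDS_PER_HOUR
    )
    return rate_per_s * waking_seconds


def order_of_magnitude(count):
    """Return the nearest power of ten as an integer exponent."""
    return round(math.log10(count))


# One head or eye movement per second (the conservative rate used in the text).
movements = lifetime_event_count(rate_per_s=1.0, age_years=30)

print(order_of_magnitude(movements))  # self-generated visual feedback
print(order_of_magnitude(100))        # ambulance seen after siren
print(order_of_magnitude(100_000))    # traffic light changes (upper guess)
```

      Under these assumptions the self-motion count lands four or more orders of magnitude above the learned associations, which is the gap the argument relies on.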

      We think this is an argument we can make based on first principles, and one that is frequently overlooked in the field, as experiments often focus on training people or animals to learn novel associations that, especially in the case of mice, we often have no idea whether cortical circuits can even learn. We should focus experiments on the predictive systems our brains have evolved since long before the evolutionary appearance of ambulances and traffic lights. We understand that the reviewer may disagree with this, but unless the reviewer has a concrete example of an even stronger predictor (as measured by frequency of experience, consistency in coupling, and precision in timing – we can’t think of one), it is a point we will make.

      (3) The checkerboard inversion response at 48 ms is incredibly rapid. Can the authors comment more on what may drive this exceptionally fast response? It was my understanding that responses in this time window can only be isolated with human EEG by presenting spatially polarized events (cf. c1, e.g., Alilovic, Timmermans, Reteig, van Gaal, Slagter, 2019, Cerebral Cortex).

      We don’t know, but it is not inconsistent with previous reports. For example, compare the “standing” and “fast walking” target ERP responses in Figure 5 of (Gramann et al., 2010). Both here and in our data, the fast response peak is only really apparent in the direct comparison of visual responses recorded while participants were walking to those when they were stationary.

      While we have taken great care to calibrate the timing of the visual display with the EEG recording, one could be worried that the alignment is off by as much as tens of milliseconds. However, even if this were so, one could use P1 as a reference and determine that the fast peak precedes P1 by about 40 ms, which again would result in a latency of about 50 ms for the fast walking peak (assuming P1 peaks at about 90 ms). In sum, we have added a reference to the previous work (that we found thanks to the reviewer’s comment) but fear we have nothing intelligent to say beyond that.

      Reviewer #2 (Public review):

      Summary:

      This study investigates whether visuomotor mismatch responses can be detected in humans. By adapting paradigms from rodent studies, the authors report EEG evidence of mismatch responses during visuomotor conditions and compare them to visual-only stimulation and mismatch responses in other modalities.

      Strengths:

      (1) The authors use a creative experimental design to elicit visuomotor mismatch responses in humans.

      (2) The study provides an initial dataset and analytical framework that could support future research on human visuomotor prediction errors.

      Weaknesses:

      (1) Methodological issues (e.g., volume conduction, channel selection, lack of control for eye movements) make it difficult to confidently attribute the observed mismatch responses to activity in visual cortical regions.

      (2) A very large portion of the data was excluded due to motion artefacts, raising concerns about statistical power and representativeness. The criteria for trial inclusion and the number of accepted trials per participant appear arbitrary and not justified with reference to EEG reliability standards.

      (3) The comparison across sensory modalities (e.g., auditory vs. visual mismatch responses) is conceptually interesting, but due to the choice of analyzing auditory mismatch responses over occipital channels, it has limited interpretability.

      We have responded to these points in the more detailed itemization below.

      The authors successfully demonstrate that visuomotor mismatch paradigms can, in principle, be applied in human EEG. However, due to the issues outlined above, the current findings are relatively preliminary. If validated with improved methodology, this approach could significantly advance our understanding of predictive processing in the human visual system and provide a translational bridge between rodent and human work.

      Reviewer #2 (Recommendations for the authors):

      Overall, the study addresses an interesting and underexplored question (translation of the visuomotor mismatch responses observed in rodents to humans). Below, please find a list of specific suggestions for improvement.

      Introduction:

      (1) "updating internal representations and internal models" - what is the difference between the two, and why is it relevant to this study?

      In a nutshell, an internal model is the synaptic weight matrix that transforms between coding spaces. An internal representation is the activity pattern coding for the current representation. See (Aizenbud et al., 2025; Keller and Mrsic-Flogel, 2018) for more lengthy elaborations. The fact that the mechanism used for representation update can also be used to update internal models (i.e. solve the credit assignment problem) is likely the prime advantage of predictive processing (see work from the Bogacz lab). The relevance to the current study is justifying why predictive processing is a reasonable hypothesis for the function of cortex.

      (2) "Certain stimuli can be predicted from the preceding sensory input" vs. "Predictions can also be based on memory" - how are these two different? Do you mean specific (e.g., long-term associative or episodic) memory types in the latter?

      Correct, this is an arbitrary distinction that primarily makes sense in the light of experimental approaches. In this particular case, we were talking about spatial memory. We made this explicit to increase clarity.

      (3) "the strongest predictor - by consistency of coupling - is self-generated movement"

      (a) Externally induced movement, while not self-generated and therefore not predicted, will also generate sensory coupling, so is it really only about consistency?

      Externally induced movement (as in somebody else moving one’s arm; we are not sure this is what the reviewer means) will induce sensory-sensory coupling but not sensorimotor coupling. We might be misunderstanding the point. In case the reviewer means stimuli that trigger movement, as in us asking participants to walk, or a sudden startle stimulus that makes them jump, in all such cases there are of course sensorimotor predictions. Sensorimotor predictions are driven by efference copies of the motor command; thus, all movements, whether ‘voluntarily’ executed or triggered by an external stimulus, will drive sensorimotor predictions. (All of this of course assumes that the predictive processing theory is correct.)

      (b) Do you mean temporal consistency (minimal lags), statistical contingencies (same movements linked to the same sensory inputs), or both? How does it differentiate sensorimotor/visuomotor mismatch responses from responses to incongruent stimuli in sensory modalities (e.g. audiovisual)?

      Both. We have rephrased the sentence to try to make this clearer. See also response to reviewer 1 minor point 2 above.

      How does it differentiate sensorimotor/visuomotor mismatch responses from responses to incongruent stimuli in sensory modalities (e.g. audiovisual)?

      Most cross-modal associations are much less consistent (the exact sound of a glass shattering is always slightly different and impossible for us to predict), and orders of magnitude less frequently experienced, than sensorimotor associations. Again, see also response to reviewer 1 minor point 2 above.

      (4) "Every movement is directly coupled to sensory feedback throughout life"

      This may be the case for proprioceptive and/or somatosensory feedback, but not necessarily for visual feedback (e.g., a mouse moving its tail), which is the topic of the study.

      Correct, there are movements that can be disconnected from visual feedback. Most of the time, however, most movements are not, and we are studying one of the more prominent ones that is clearly not decoupled: locomotion. The contrast we aim to highlight here very prominently is that there is still this vague idea in the field that you can take a participant, or a mouse, and expose them/it to a few tens or hundreds of trials of some sensory stimulus contingency and then probe for prediction error responses to a pattern only recently, if at all, learned. Given the life-long experience of subjects and mice, is it really surprising that oddball responses are less strong than a sensorimotor mismatch?

      (5) "However, the overall level of this motor-related activity is much higher than one would expect simply from predictions of visual feedback that are compared against visual input."

      Could you please clarify what one would expect in this case, and/or back it up with citations?

      This is in reference to the fact that there are very strong movement-related signals in the mouse visual cortex that persist even when the mouse is in complete darkness. In darkness, movements should not trigger any visual feedback change; hence, the activity is difficult to explain as a movement-related prediction of visual flow. We have rephrased this section of the introduction to make this clearer.

      (6) "The more precise the prediction and comparison, the less motor-related activity should be detectable in visual cortex."

      I think this conflates two issues. A good match between prediction and input would indeed result in sensory attenuation. However, sensory precision, at least in active inference, can upregulate prediction error responses. Since predictions cannot be assumed to be perfect (due to external or internal noise), increased precision may therefore augment activity. See e.g. https://doi.org/10.1007/s10339-013-0571-3

      We agree with the reviewer – the phrasing here was misleading. We do not mean precision in the predictive processing sense, but the precision of sensorimotor control necessary for the behavior. We have rephrased the corresponding section of the manuscript.

      (7) Neither the introduction nor the discussion refers to previous human EEG studies on sensorimotor mismatch responses, where sensory feedback doesn't match motor actions (e.g. https://doi.org/10.3758/s13423-021-01992-z ; https://www.sciencedirect.com/science/article/pii/S0028393214003777 ; https://www.sciencedirect.com/science/article/pii/S0028393219301265).

      The studies cited by the reviewer primarily test how discrete violations of learned action–outcome associations are represented in the brain, whereas our visuomotor mismatch paradigm probes violations of continuous sensorimotor coupling during ongoing action. The paradigms are conceptually different both in how strong the coupling is (lifelong vs. learned in the experiment), and in how prediction errors are likely used (visuomotor control vs. stimulus detection). We have added a brief part to our introduction discussing this.

      Results:

      (1) A very large proportion of the dataset was excluded due to movement artefacts. This is rather problematic as

      (a) the rationale behind finding mismatch responses is that motion-related (neural) signals should affect visual cortical activity, so it's essential to disentangle these neural signals from artefacts;

      Correct, we excluded 21.7% of the total data for the visuomotor mismatch paradigm. Note that this percentage is comparable to that of other similar studies of EEG recordings during movement (Oliveira et al., 2016). By “problematic”, we assume the reviewer means the fact that we have artefacts, not that we exclude trials with artefacts. The movement artefacts are typically caused by the acceleration during stepping in participants with a heavy gait. None of these movement artefacts are time locked to any of the responses we investigate. Thus, they should just appear as increased levels of noise if not excluded. We don’t understand why the reviewer thinks this is particularly problematic for our analysis/conclusions (beyond the trivial consequence of increasing noise levels that would only cause us to underestimate the strength of the mismatch signals we report).

      (b) the criterion for the number of trials of 15 triggers (per condition?) is arbitrary and lower than widely used in the literature, so authors should demonstrate that this is a sufficient number to observe a measurable ERP even for those participants with 15 triggers;

      We have between 16 and 25 visuomotor mismatch events per participant. Author response image 2 is a selection of single participant examples with different numbers of trials. The number of mismatch events is limited by the fact that we introduce them approximately every 10 - 15 s and have a total duration of the closed loop session of 5 minutes. Thus, on average, we expect to have 24 mismatch events. But we are not sure we understand the logic of the comment: if we set the exclusion criterion too low, we just risk losing a response in the noise. And we clearly have stronger and higher signal-to-noise mismatch responses with an average of 20 trials compared to visual responses during movement with an average of 40 trials or MMN responses with an average of 28 trials.
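
      The expected count of 24 events follows directly from the session length and the inter-event interval. A minimal sketch, assuming the intervals are spread evenly between 10 s and 15 s so that the mean interval is the midpoint (the response does not state the distribution); the function name is hypothetical.

```python
def expected_event_count(session_s, min_interval_s, max_interval_s):
    """Expected number of events in a session of the given length.

    Assumes the mean inter-event interval is the midpoint of the range,
    which holds for a uniform distribution (an assumption, not stated
    in the response).
    """
    mean_interval_s = (min_interval_s + max_interval_s) / 2.0
    return session_s / mean_interval_s


# 5-minute closed loop session, one mismatch roughly every 10 to 15 s.
print(expected_event_count(session_s=5 * 60, min_interval_s=10, max_interval_s=15))
```

      Any trials lost to artefact rejection then reduce the per-participant count below this value, consistent with the reported range of 16 to 25.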

      Author response image 2.

      Reliable ERPs can be observed with as few as 16 trials across EEG channels. (A) Histograms showing the distribution of the number of valid mismatch trials per participant for each electrode pair (Fp1–2, C3–4, P3–4, O1–2). (B) Representative EEG responses to visuomotor mismatch events from a single participant, recorded at electrode pairs Fp1–2, C3–4, P3–4, and O1–2. Waveforms were computed using the indicated number of trials (shown above each trace). Dashed vertical red lines are onset and offset of the visuomotor mismatch.

      (c) it seems that the seemingly static "visual" condition resulted in a larger proportion of data rejected due to movement (or, as later mentioned, nausea) than the "visuomotor" condition, which is counterintuitive and needs further explanation;

      This is a misunderstanding: the ‘visual paradigm’ the reviewer is referring to comprises the experiments shown in Figure 1. Here we record visual responses in both sitting and walking participants. In this experiment, as in others, exclusion was primarily driven by the part of the paradigm in which the subjects were moving. To make this clearer, we have added Table S2 to the manuscript, which provides an overview of trials excluded by paradigm and session.

      (d) authors mention eye movements as a potential issue, which should be possible to detect from frontal channels. Additionally, it's not entirely clear how many datasets were discarded (the results section mentions 19/48 in the visual condition, then 4+11 in the playback condition - isn't this the same condition?)

      The visual paradigm corresponds to the data shown in Figure 1, in which participants viewed a flipping checkerboard in both a walking and a stationary session. The open loop session is part of the visuomotor paradigm shown in Figure 2, where participants were exposed to a replay of the visual flow that had been self-generated during the preceding closed loop session, including the visual flow halts that constituted visuomotor mismatches in the closed loop session. Please note, to avoid such confusion, we have attempted to standardize the usage of paradigm (visual vs. visuomotor) and session (sitting vs. walking, and closed loop vs. open loop) throughout. In addition, we have added a table to summarize the number of excluded trials by paradigm and session as Table S2 to the manuscript.

      In comments 1 and 2 of the public review, the reviewer also points out that we did not control for eye movements and, we presume relatedly, claims that we did not use common EEG reliability standards. Regarding the first point, we performed additional experiments in an independent cohort of participants to test whether eye movements could account for the visuomotor mismatch responses. We recorded eye movements during closed loop sessions and found that changes in eye speed (Figure S7A) or blink rate (Figure S7B) following the mismatch stimulus had a longer latency than visuomotor mismatch responses in EEG. This suggests that the visuomotor mismatch response cannot be explained by eye blinks or changes in eye movement speed. Regarding the second point, we are not sure we understand. Trial exclusion based on a fixed voltage threshold of 100 µV is relatively common, and our rejection rates are on par with, and on occipital electrodes in particular even lower than, those of other work in EEG recordings during locomotion or movement (see e.g. (Oliveira et al., 2016)).
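
      For readers unfamiliar with fixed-threshold trial exclusion, the rule can be sketched as follows. This is an illustrative reading, not the authors' code: it assumes the threshold applies to the absolute amplitude of each sample in a trial from one channel, and the function names are hypothetical. Details such as baseline correction or peak-to-peak criteria are not specified in the response and are left out.

```python
def exceeds_threshold(trial_uv, threshold_uv=100.0):
    """True if any sample in the trial exceeds the threshold in absolute value."""
    return max(abs(v) for v in trial_uv) > threshold_uv


def reject_trials(trials_uv, threshold_uv=100.0):
    """Split trials into kept trials and the fraction rejected.

    trials_uv: list of trials, each a list of voltages in microvolts
    for a single channel (assumed layout).
    """
    kept = [t for t in trials_uv if not exceeds_threshold(t, threshold_uv)]
    rejected_fraction = 1.0 - len(kept) / len(trials_uv)
    return kept, rejected_fraction


# Toy data: the second and third trials contain samples beyond 100 uV.
trials = [
    [1.0, -5.0, 20.0],
    [0.0, 150.0, 3.0],
    [-101.0, 0.0, 0.0],
    [100.0, -100.0, 0.0],
]
kept, rejected_fraction = reject_trials(trials)
print(len(kept), rejected_fraction)
```

      Because the rule is applied per trial and per channel, rejection rates can differ between electrode pairs, as in the per-electrode participant counts reported for the response images.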

      Nevertheless, we did attempt to apply independent component analysis (ICA) based filtering to the EEG data (Delorme and Makeig, 2004). However, these methods were designed for high channel density recordings. With only 8 channels, ICA is unable to reliably isolate eye movement or motion artefact components of the EEG. To illustrate this, we tested two artifact-rejection strategies. In the first approach, components associated with non-neural artifacts (e.g., muscle activity, line noise, eye movements) were removed only if at least 90% of the component’s variance was assigned to a single artifact class (Author response image 3A). In the second, more permissive approach aimed specifically at reducing eye movement artifacts, components were removed if artifact-related activity exceeded 90% for non-eye artifacts, while the threshold for eye-related components was lowered to 60% (Author response image 3C). We lowered the threshold for excluding eye-related components to ensure that EEG signals influenced by eye movements were effectively removed. In both cases - whether the eye-component threshold was set to 90% or 60% - the averaged responses to visuomotor mismatch trials remained largely similar to the previously reported data, despite higher noise in some traces. Interestingly, when we then followed the ICA filtering by our voltage-threshold-based exclusion with a threshold of 100 µV, the resulting traces closely resembled the patterns described in the paper (Author response image 3B and 3D). Thus, we conclude that the non-ICA-filtered responses are easier to interpret, free of any potential ICA filtering artifacts, and far less dependent on the parameter choices of the ICA filtering.
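
      The two component-removal strategies differ only in the threshold applied to eye-related components. A minimal sketch of that selection rule, assuming each component comes with a per-class fraction between 0 and 1 from some component classifier; the class names, the dictionary layout, and the function name are hypothetical and do not describe the authors' pipeline.

```python
ARTIFACT_CLASSES = ("muscle", "line_noise", "eye")  # hypothetical class labels


def components_to_remove(class_fractions, default_threshold=0.90, eye_threshold=0.90):
    """Return indices of components flagged as artifacts.

    class_fractions: one dict per ICA component, mapping a class name to
    the fraction attributed to that class (assumed layout).
    A component is removed if any single artifact class reaches its threshold.
    """
    removed = []
    for idx, fractions in enumerate(class_fractions):
        for name in ARTIFACT_CLASSES:
            threshold = eye_threshold if name == "eye" else default_threshold
            if fractions.get(name, 0.0) >= threshold:
                removed.append(idx)
                break
    return removed


components = [
    {"brain": 0.80, "eye": 0.20},
    {"eye": 0.70, "brain": 0.30},
    {"muscle": 0.95, "brain": 0.05},
    {"eye": 0.92, "brain": 0.08},
]

strict = components_to_remove(components)                         # 90% for all classes
permissive = components_to_remove(components, eye_threshold=0.60)  # 60% for eye only
print(strict, permissive)
```

      In this toy example the permissive rule additionally removes the component that is 70% eye-related, which illustrates how the outcome depends on the chosen threshold.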

      Author response image 3.

      Removal of artifacts identified with ICA does not change the visuomotor mismatch responses. (A) Visuomotor mismatch responses recorded from occipital electrodes after artifact correction. Components associated with non-neural artifacts (e.g., muscle activity, line noise, eye movements) were removed only if ≥90% of the component’s variance was attributed to a single artifact class. Solid black line represents the mean, and shading indicates the SEM across participants. Dashed vertical red lines are onset and offset of the visuomotor mismatch. (B) As in A, but excluding trials with amplitudes exceeding 100 µV. (C) As in A, but components were removed if artifact-related activity exceeded 90% for non-ocular artifacts, while the threshold for eye-related components was lowered to 60%. (D) As in C, but excluding trials with amplitudes exceeding 100 µV.

      (2) The finding that mismatch responses are observed at all channels, with differences in amplitudes but not latencies, indicates that volume conduction may affect the results. I would strongly suggest accounting for this using a method appropriate for the very small number of channels, e.g., phase lag index.

      We are not sure we understand. The phase lag index is a method to estimate functional connectivity in a way that corrects for volume conduction (using phase lag). We make no claims about functional connectivity; thus, we are not sure what the reviewer is suggesting we do. The fact that the visual and visuomotor mismatch responses were measurable on all electrodes could indeed be in part explained by volume conduction, but we see no way to estimate the volume conduction contribution. From mouse calcium imaging data, we know that both visual and visuomotor mismatch responses spread across large parts of dorsal cortex (including frontal regions like the ACC).
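
      For context, the phase lag index is commonly defined as the absolute value of the mean sign of the phase difference between two signals. The sketch below assumes the instantaneous phases have already been extracted (for example with a Hilbert transform, not shown); the function name is hypothetical. It illustrates why the measure concerns connectivity between two channels rather than the amplitude of an evoked response.

```python
import math


def phase_lag_index(phases_a, phases_b):
    """Phase lag index between two equally long series of phases (radians).

    Computes |mean(sign(sin(phase_a - phase_b)))|. Zero-lag coupling,
    as expected from volume conduction, contributes a sign of 0.
    """
    if len(phases_a) != len(phases_b) or not phases_a:
        raise ValueError("phase series must be non-empty and of equal length")
    total = 0
    for pa, pb in zip(phases_a, phases_b):
        s = math.sin(pa - pb)
        total += (s > 0) - (s < 0)  # sign of the phase difference
    return abs(total) / len(phases_a)


reference = [0.1 * i for i in range(100)]
lagged = [p - 0.5 for p in reference]      # consistent non-zero lag
print(phase_lag_index(reference, lagged))     # consistent lag gives a high index
print(phase_lag_index(reference, reference))  # zero lag gives an index of 0
```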

      With the addition of new data, the latency difference between occipital and frontal electrodes - previously observed only as a trend - is now statistically significant (Figure 3E). Occipital responses emerge earlier than frontal responses, suggesting that mismatch-related activity likely originates in sensory visual regions and subsequently propagates to more frontal areas, similar to what has been reported in mouse cortex (Heindorf and Keller, 2024).

      (3) The authors compare different types of mismatch responses (including auditory oddballs) in the same set of (occipital) channels, but doesn't this undermine the spatial specificity of the results? Classical auditory mismatch negativity is typically observed over central channels, so weaker amplitudes of auditory mismatch responses in occipital channels are likely trivially explained by modality differences. As such, I'm not convinced that this comparison is informative even in a qualitative manner.

      To address this point, we conducted additional auditory oddball experiments with recordings over the auditory cortex (channels T3, T4, T5, and T6). Given our central reference, these channels should capture the strongest mismatch negativity. The amplitude of the visuomotor mismatch response exceeded that of mismatch negativity on all tested channels (new Figures S8 and S9).

      (4) On a similar note, is the polarity reversal found for visual vs. mismatch responses specific to occipital channels?

      Thank you for this interesting question. In fact, polarity reversal was consistently observed across all recorded channels; this has now been added as a main figure to the manuscript (Figure 5).

      (5) Figure S4C seems to cut off one outlier, and I don't see this outlier included in the boxplot.

      Correct, that is why we describe the boxplots in the figure legend as: “Boxes mark median, quartiles, and range of data not considered outliers.” The axes have now been adjusted to include all data points.

      Discussion:

      "A central tenet of the cortical circuit for predictive processing is the split into separate populations of neurons that compute positive and negative prediction errors (Keller and Mrsic-Flogel, 2018; Rao and Ballard, 1999)" - this may be the case for visuomotor mismatch signals or reward prediction errors, but signed PEs do not play a central role in other proposed microcircuits for predictive processing in the perceptual domain (e.g. Bastos)

      Signed prediction errors do not play a central role in proposed cortical microcircuits for predictive processing that do not burden themselves with making a concrete proposal for the implementation of the prediction error computation. The (Bastos et al., 2012) work is a good example of this. The equation for the error term provided in that paper is clearly signed (nothing stops the error from going negative), but no proposal is made for how layer 2/3 excitatory neurons are supposed to signal this quantity. With baseline activity levels close to zero in layer 2/3, there really is only one way to do this, and that is separate populations of negative and positive prediction error neurons. With non-zero baseline firing rate, one could do this bidirectionally around a mean firing rate (as is typically thought of dopaminergic RPE neurons). There are more abstract Bayesian implementations that assume logarithmic transformations that could also implement a prediction error-like system without negative firing rates. But given the absence of any physiological evidence, we will refrain from discussing these. However, most importantly, there is now considerable evidence for the existence of both negative and positive prediction error neurons in layer 2/3 of mouse visual cortex. Thus, by “cortical circuit for predictive processing” we here mean those that make biologically plausible proposals for prediction error computations. Also note, the (Rao and Ballard, 1999) model is probably the prime example for what the reviewer calls a proposed microcircuit for predictive processing in the “perceptual domain”.

      Reviewer #3 (Public review):

      Summary:

      Solyga, Zelechowski, and Keller present a concise report of an innovative study demonstrating clear visuomotor mismatch responses in ambulating humans, using a mobile EEG setup and virtual reality. Human subjects walked around a virtual corridor while EEGs were recorded. Occasionally, motion and visual flow were uncoupled, and this evoked a mismatch response that was strongest in occipitally placed electrodes and had a considerable signal-to-noise ratio. It was robust across participants and could not be explained by the visual stimulus alone.

      Strengths:

      This is an important extension of their prior work in mice, and represents an elegant translation of those previous findings to humans, where future work can inform theories of e.g., psychiatric diseases that are believed to involve disordered predictive processing. For the most part, the authors are appropriately circumspect in their interpretations and discussions of the implications. I found the discussion of the polarity differences they found in light of separate positive and negative prediction errors, intriguing.

      Weaknesses:

      The primary weaknesses rest in how the results are sold and interpreted.

      Most notably, the interpretation of the results of the comparison of visuomotor mismatches to the passive auditory oddball induced mismatch responses is inappropriate, given suboptimal electrode choices, unclear matching of trial numbers, and other factors. To clarify, regarding the auditory oddball portion in Figure 5, the data quality is a concern for the auditory ERPs, and the choice of occipital electrodes is a likely culprit. Typically, auditory evoked responses are maximal at Cz or FCz, although these contacts don't seem to be available with this setup. In general, caution is warranted in comparing ERP peaks between two different sensory modalities - especially if attention is directed elsewhere (to a silent movie) during one recording and not during the other. The authors discuss this as a purely "qualitative" comparison in the text, which is appreciated, and do acknowledge the limitations within the results section, but the figure title and, importantly, the abstract set a different tone. At least, for comparisons between auditory mismatch and visuomotor mismatch, trial numbers need to be equated, as ERP magnitude can be augmented by noise (which reduces with increased numbers of trials in the average).

      To address this point, we conducted additional auditory oddball experiments with recordings over the auditory cortex (channels T3, T4, T5, and T6). Given our central reference, these channels should capture the strongest mismatch negativity. Nevertheless, the amplitude of the visuomotor mismatch response exceeded that of mismatch negativity on all tested channels (these results are now shown in the new Figures S8 and S9), and the response power was significantly greater for the visuomotor mismatch than for mismatch negativity. Independent of the electrode we test, the visuomotor mismatch response has a power 5 to 10 times higher than that of the MMN response. In addition, the number of trials per participant that met quality criteria was comparable between the visuomotor mismatch paradigm (mean = 23 trials) and the auditory mismatch paradigm (mean = 28 trials) (Author response image 4).

      Author response image 4.

      The number of trials included for analysis is comparable between the visuomotor and oddball paradigms. (A) Histogram showing the distribution of the number of valid trials per participant for the O1-2 electrode pair in the visuomotor mismatch paradigm. (B) Same as in A but for deviant stimulus presentations in the oddball paradigm.

      And more generally, the size of the mismatch event at the scalp does not scale one-to-one with the size at the level of the neural tissue. One can imagine a number of variables that impact scalp level magnitudes, which are orthogonal to actual cortex-level activation: the size, spread, and polarity variance of the activated source (which all would diminish amplitude at the scalp due to polyphasic summation/cancelation); the variance of phase to a stimulus across trials (cross trial phase locking) vs magnitude of underlying power (the former, in theory, relates to bottom-up activity and the latter can reflect feedback, which has more variability in time across trials); and the distance of the scalp electrode from the activated tissue (which, for the auditory system, would be larger (FCz to superior temporal gyrus) than for the visual system (O1 to V1/2)). None of this precludes the inclusion of the auditory mismatch, which is a strength of the study, but interpretations about this supporting a supremacy of sensory-motor mismatch - regardless of validity - are not warranted. I would recommend changing the way this is presented in the abstract.

      We agree with the point that the EEG response does not need to reflect the total cortical activation. However, the discussion in the abstract (and elsewhere) is in the context of clinical experiments where the underlying cortical activity pattern is irrelevant if it does not trigger a clinically measurable (by EEG in this case) response. The abstract only makes a comparison to MMN implicitly, in this sentence: “Second, a paradigm that can trigger strong prediction error responses and consequently requires shorter recording times could simplify experiments in a clinical setting.” We are not sure how to phrase this even more carefully – the statement at face value is a truism. The reviewer, we assume, takes exception to the unstated implication that visuomotor prediction errors trigger stronger responses than MMN. Given the data we have, we assume most authors would not consider it an overstatement to make that claim outright.

      Otherwise, the data are of adequate quality to derive most of their conclusions.

      The authors claim that the mismatch responses emanate from within the occipital cortex, but I would require denser scalp coverage or a demonstration of consistent impedances across electrodes and across subjects to make conclusions about the underlying cortical sources (especially given the latencies of their peaks). In EEG, the distribution of voltage on the scalp is, of course, related to but not directly reflective of the distribution of the underlying sources. The authors are mostly careful in their discussion of this, but I would strongly recommend changing the word choice of "in occipital cortex" to "over occipital cortex" or even "posteriorly distributed". Even with very dense electrode coverage and co-registration to MRIs for the generation of forward models that constrain solutions, source localization of EEG signals is very challenging and not a simple problem. Given the convoluted and interior nature of human V1, the ability to reliably detect early evoked responses (which show the mismatch in mouse models) at the scalp in ERP peaks is challenging - especially if one is collapsing ERPs across subjects. And - given the latency of the mismatch responses, I'd imagine that many distributed cortical regions contribute to the responses seen at the scalp.

      This is an excellent point; we have rephrased throughout to “over occipital cortex” instead of “in occipital cortex”.

      I think that Figure 3C, but as a difference of visual mismatch vs halting flow alone (in the open loop) might be additionally informative, as it clarifies exactly where the pure "mismatch" or prediction error is represented.

      We performed the analysis as suggested (Author response image 5). Visuomotor mismatch responses are stronger on all electrodes compared to playback halt responses. This difference is also larger in data recorded on occipital electrodes.

      Author response image 5.

      Comparison of the difference between visuomotor mismatch and playback halt on all electrodes. Average response strength was calculated within a 100 ms window centered on the peak of the average visuomotor mismatch response across all electrodes. Boxes mark median, quartiles, and range of data not considered outliers. Each circle represents data from one participant. **: p<0.01, *: p<0.05, Fp1-2: 20 participants, C3-4: 31 participants, P3-4: 35 participants, O1-2: 32 participants.
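The response-strength measure described in this caption can be sketched as follows. This is an illustration only, not the authors' analysis code: it assumes the peak is the largest absolute deflection of the grand-average trace and that the same 100 ms window is then applied to each participant's trace. The function names and the toy numbers are hypothetical.

```python
def peak_time(trace, times):
    """Time of the largest absolute deflection (assumed definition of the peak)."""
    idx = max(range(len(trace)), key=lambda i: abs(trace[i]))
    return times[idx]


def window_mean(trace, times, center, width=0.1):
    """Mean of the samples whose time lies within width/2 of center (seconds)."""
    half = width / 2
    # Small tolerance so that samples exactly on the window edge are included.
    samples = [v for v, t in zip(trace, times) if abs(t - center) <= half + 1e-12]
    return sum(samples) / len(samples)


# Toy data (made up): 9 samples spaced 25 ms apart, from 0 to 0.2 s.
times = [i * 0.025 for i in range(9)]
grand_average = [0, 1, 3, 6, 8, 6, 3, 1, 0]   # average across electrodes
participant = [0, 2, 2, 5, 9, 7, 2, 0, 0]     # one participant, one electrode

center = peak_time(grand_average, times)              # peak of the grand average
strength = window_mean(participant, times, center)    # 100 ms window around it
```

Taking the window location from the grand average rather than from each individual trace avoids selecting each participant's own noise peak.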

      As a suggestion, the authors are encouraged to analyse time-frequency power and phase locking for these mismatch responses, as is common in much of the literature (see Roach et al 2008, Schizophrenia Bulletin). This is not to say that doing so will yield insights into oscillations per se, but converting the data to the time-frequency domain provides another perspective that has some advantages. It fosters translations to rodent models, as ERP peaks do not map well between species, but e.g., delta-theta power does (see Lee et al 2018, Neuropsychopharmacology; Javitt et al 2018, Schizophrenia research; Gallimore et al 2023, Cereb Ctx). Further, ERP peaks can be influenced by the actual neuroanatomy of an individual (especially for quantifying V1 responses). Time frequency analyses may aid in interpreting the "early negative deflection with a peak latency of 48 ms " finding as well.

      We have performed time–frequency power and phase-locking analyses for both visual responses (Author response image 6 and Author response image 7) and visuomotor mismatch and playback halt responses (Author response image 8 and Author response image 9), as suggested. We report the results of these analyses here rather than in the manuscript, as they are not yet fully developed. We may add these to a future publication, for which we would want to properly quantify the stability of these effects.

      In brief, time–frequency representations of power did identify potentially interesting differences between walking and sitting sessions in the visual paradigm. Inter-trial phase coherence (ITPC) revealed an early increase in alpha-band synchronization suggesting that phase alignment of alpha oscillations may contribute to the early differences in visual responses between walking and sitting. The same analyses were applied to visuomotor mismatch and playback halt responses. Time–frequency power analysis revealed an increase in delta-band power during visuomotor mismatch, consistent with previous reports linking delta activity to prediction error processing, including reward prediction errors (Cavanagh, 2015), unexpected final words (Webb and Sohoglu, 2025), and visual deviance detection (West et al., 2024). Notably, it appears as if the increase in delta power emerged first over occipital electrodes and appeared later over more frontal electrodes, forming a spatiotemporal gradient of onset across the scalp.

      Delta power changes were markedly reduced in the playback halt responses at the time of visual flow cessation. While some power changes were observed, they occurred primarily at visual flow onset rather than at flow offset. Inter-trial phase coherence analysis further revealed delta-band synchronization over occipital electrodes following visuomotor mismatch, whereas the playback halt response showed strong phase synchronization in both delta and theta bands following visual flow onset.
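For readers unfamiliar with the measure, inter-trial phase coherence at a single time-frequency point is commonly defined as the length of the mean unit phase vector across trials. The sketch below illustrates that standard definition only. How the phases are extracted (for example by a wavelet transform) is not shown, the authors' actual settings are not stated in this response, and the example phase values are made up.

```python
import cmath
import math


def itpc(phases):
    """Length of the mean unit phase vector across trials (0 = random, 1 = aligned)."""
    mean_vector = sum(cmath.exp(1j * p) for p in phases) / len(phases)
    return abs(mean_vector)


# Hypothetical single-trial phases (radians) at one time-frequency point.
aligned = [0.1, 0.1, 0.1, 0.1]           # identical phase on every trial
opposed = [0.0, math.pi, 0.0, math.pi]   # phases cancel across trials

itpc_aligned = itpc(aligned)
itpc_opposed = itpc(opposed)
```

Because each trial contributes a unit-length vector, the measure depends only on phase consistency and not on single-trial amplitude.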

      Author response image 6.

      Time–frequency representations of EEG power changes during the visual paradigm. (A) Time–frequency maps showing changes in spectral power relative to baseline for electrodes Fp1–2, C3–4, P3–4, and O1–2 following checkerboard reversal in the sitting session. The dashed red vertical line indicates the time of the checkerboard reversal (0 s). (B) As in A, but recorded while participants were walking.

      Author response image 7.

      Inter-trial phase coherence (ITPC) for visual trials during sitting and walking. (A) ITPC across trials for electrode pairs Fp1–2, C3–4, P3–4, and O1–2 following checkerboard reversal in the sitting session. The dashed red vertical line marks the time of the checkerboard reversal (0 s). (B) As in A, but recorded during walking.

      Author response image 8.

      Time–frequency representations of EEG power changes during visuomotor mismatch and playback halt responses. (A) Time–frequency maps showing changes in spectral power relative to baseline for electrodes Fp1–2, C3–4, P3–4, and O1–2 following visuomotor mismatch presentation. Dashed vertical red lines are onset and offset of the visuomotor mismatch. (B) As in A, but for playback halts.

      Author response image 9.

      Inter-trial phase coherence (ITPC) for the visuomotor mismatch and playback halt responses. (A) ITPC across trials for electrode pairs Fp1–2, C3–4, P3–4, and O1–2 following visuomotor mismatch presentation. Dashed vertical red lines are onset and offset of the visuomotor mismatch. (B) As in A, but for playback halts.

      Finally, the sentence in the abstract that this paradigm " can trigger strong prediction error responses and consequently requires shorter recording times would simplify experiments in a clinical setting" is a nice setup to the paper, but the very fact that one third of recordings had to be removed due to movement artifact, and that hairstyle modulates the recording SNR, is a reason that this paradigm, using the reported equipment, may have limited clinical utility in its current form. Further, auditory oddball paradigms are of great clinical utility because they do not require explicit attention and can be recorded very quickly with no behavioral involvement of a hospitalized patient. This should be discussed, although it does not detract from the overall scientific importance of the study. The authors should reconsider putting this statement in the abstract.

      We have added a paragraph to the discussion to address these points. Note that we get robust and strong responses with very few trials (Author response image 2). The fact that we need to discard up to 21.7% of trials due to movement/eye blink artefacts does little to change the fact that we need far fewer trials and have larger and more robust responses compared to other EEG paradigms. Finally, we understand that sometimes not needing participants to pay attention to the task is useful. However, having a paradigm that is engaging and fun for participants and takes 5 minutes of recording time is probably just as often an advantage.

      Reviewer #3 (Recommendations for the authors):

      Minor points:

      (1) In the Introduction, I'm not sure that the logic comes through as to what the authors aim to illustrate by comparing mice to humans, in terms of precision and "movement modulation". In some cases, the precision of the comparison is referred to, and in others, the precision of the prediction (I think?). I'm not sure if they mean for this to be different or not. Similarly, on line 81, "If indeed the precision of visuomotor coupling determines the amount of motor modulation of visual responses" - here I'm a little confused, as in "amount of motor modulation", to me, the term "modulation" refers to a conditional modifier (if moving, then suppress visual movement responses; if not moving, then amplify visual movement responses) rather than movement-driven activity. The way I'm reading it, the authors mean the latter, but I could be misunderstanding.

      We have rephrased this section of the introduction.

      (2) I think it could be helpful, in the sentence starting on line 65, to reiterate that this observation of higher-than-expected motor activity in V1 is in mice (if I'm understanding it correctly). I also found myself tangled up in the difference between motor-related activity in V1 and motor-modulation in V1 in this paragraph.

      We have rephrased this section of the introduction.

      (3) For signal power, was the amplitude squared on individual trials prior to averaging, or after averaging? If prior, it would help with separating amplitude modulations from phase variance.

      In our previous analysis, power was computed by squaring the amplitude after trial averaging (Author response image 10A). We repeated the analysis using the alternative approach in which power was calculated for individual trials and then averaged (Author response image 10B). Although this method yields substantially higher absolute power values, the overall pattern of results remains unchanged: visuomotor mismatch responses continue to show significantly higher power than visual responses. To examine phase variance, we additionally analyzed inter-trial phase coherence (Author response image 7 and Author response image 9).

      Author response image 10.

      Visuomotor mismatch responses have more power compared to visual responses. (A) Comparison of power between visuomotor mismatch and visual responses, calculated within a 0 - 0.5 s time window following stimulus onset. Power was computed by squaring the amplitude after trial averaging. Boxes indicate the median and interquartile range, with whiskers showing the range excluding outliers; circles represent data from individual participants. ***p < 0.001. (B) Same comparison as in (A), but with power calculated by squaring the amplitude of individual trials prior to averaging.
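The difference between the two orders of operation compared in panels A and B can be sketched as follows. This is a toy illustration, not the authors' code: power is taken here as the mean squared amplitude over the samples in the window (an assumption, since the exact normalization is not stated), and the trial values are made up.

```python
def power_after_averaging(trials):
    """Average trials first, then square: keeps only the phase-locked part."""
    n_trials = len(trials)
    n_samples = len(trials[0])
    average = [sum(trial[i] for trial in trials) / n_trials for i in range(n_samples)]
    return sum(v * v for v in average) / n_samples


def power_before_averaging(trials):
    """Square each trial first, then average: also keeps non-phase-locked activity."""
    per_trial = [sum(v * v for v in trial) / len(trial) for trial in trials]
    return sum(per_trial) / len(per_trial)


# Two hypothetical trials: the first two samples cancel across trials,
# the third sample is consistent across trials.
trials = [[1.0, -1.0, 2.0],
          [-1.0, 1.0, 2.0]]

evoked_power = power_after_averaging(trials)   # 4/3: only the consistent sample survives
total_power = power_before_averaging(trials)   # 2.0: the cancelling samples still count
```

Squaring before averaging can never give a smaller value than squaring after averaging, which is consistent with the substantially higher absolute power values reported for panel B.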

      (4) The "the world suddenly flew forward!" response from the participant, I understand, and I believe that it is useful to illustrate a point. I do not understand the "Are you printing this? - Hi Mom! " part of the participant response, and I'm not sure it adds to the paper, beyond amusement, which seems inappropriate.

      One of the authors (the one who did none of the experiments) finds this endlessly hilarious and as the reviewer notes, it might add amusement more generally. “Inappropriate” might be a bit harsh – according to our favorite AI chatbot: “Amusement provides significant mental, physical, and social value by offering a necessary escape from routine, reducing stress, and fostering a connection. It enhances well-being through endorphin-releasing experiences and encourages social bonding, learning, and joy.” Nevertheless, we have censored the offending passage.

      Aizenbud, I., Audette, N., Auksztulewicz, R., Basiński, K., Bastos, A.M., Berry, M., Canales-Johnson, A., Choi, H., Clopath, C., Cohen, U., Costa, R.P., Filippo, R.D., Doronin, R., Errington, S.P., Gavornik, J.P., Gillon, C.J., Granier, A., Hamm, J.P., Hertäg, L., Kennedy, H., Kumar, S., Ladd, A., Ladret, H., Lecoq, J.A., Maier, A., McCarthy, P., Mei, J., Mejias, J., Mikulasch, F., Mudrik, N., Najafi, F., Nejad, K., Nejat, H., Oweiss, K., Petrovici, M.A., Priesemann, V., Rudelt, L., Ruediger, S., Russo, S., Salatiello, A., Senn, W., Sennesh, E., Sima, S., Uran, C., Vasilevskaya, A., Vezoli, J., Vinck, M., Westerberg, J.A., Wilmes, K., Xiong, Y.S., 2025. Neural mechanisms of predictive processing: a collaborative community experiment through the OpenScope program. https://doi.org/10.48550/arXiv.2504.09614

      Bastos, A.M., Usrey, W.M., Adams, R.A., Mangun, G.R., Fries, P., Friston, K.J., 2012. Canonical microcircuits for predictive coding. Neuron 76, 695–711. https://doi.org/10.1016/j.neuron.2012.10.038

      Cavanagh, J.F., 2015. Cortical delta activity reflects reward prediction error and related behavioral adjustments, but at different times. NeuroImage 110, 205–216. https://doi.org/10.1016/j.neuroimage.2015.02.007

      Delorme, A., Makeig, S., 2004. EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. J. Neurosci. Methods 134, 9–21. https://doi.org/10.1016/j.jneumeth.2003.10.009

      Gramann, K., Gwin, J.T., Bigdely-Shamlo, N., Ferris, D.P., Makeig, S., 2010. Visual evoked responses during standing and walking. Front. Hum. Neurosci. 4, 202. https://doi.org/10.3389/fnhum.2010.00202

      Heindorf, M., Keller, G.B., 2024. Antipsychotic drugs selectively decorrelate long-range interactions in deep cortical layers. eLife 12, RP86805. https://doi.org/10.7554/eLife.86805

      Keller, G.B., Hahnloser, R.H.R., 2009. Neural processing of auditory feedback during vocal practice in a songbird. Nature 457, 187–190. https://doi.org/10.1038/nature07467

      Keller, G.B., Mrsic-Flogel, T.D., 2018. Predictive Processing: A Canonical Cortical Computation. Neuron 100, 424–435. https://doi.org/10.1016/j.neuron.2018.10.003

      Oliveira, A.S., Schlink, B.R., Hairston, W.D., König, P., Ferris, D.P., 2016. Proposing Metrics for Benchmarking Novel EEG Technologies Towards Real-World Measurements. Front. Hum. Neurosci. 10, 188. https://doi.org/10.3389/fnhum.2016.00188

      O’Toole, S.M., Oyibo, H.K., Keller, G.B., 2023. Molecularly targetable cell types in mouse visual cortex have distinguishable prediction error responses. Neuron 111, 2918-2928.e8. https://doi.org/10.1016/j.neuron.2023.08.015

      Rao, R.P.N., Ballard, D.H., 1999. Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. Nat. Neurosci. 2, 79–87. https://doi.org/10.1038/4580

      Vasilevskaya, A., Widmer, F.C., Keller, G.B., Jordan, R., 2023. Locomotion-induced gain of visual responses cannot explain visuomotor mismatch responses in layer 2/3 of primary visual cortex. Cell Rep. 42, 112096. https://doi.org/10.1016/j.celrep.2023.112096

      Webb, J.M., Sohoglu, E., 2025. Cortical tracking of prediction error during perception of connected speech. https://doi.org/10.1101/2025.07.18.665498

      West, C.L., Bastos, G., Duran, A., Nadeem, S., Ricci, D., Groves, A.M.R., Wargo, J.A., Peterka, D.S., Leeuwen, N.V., Hamm, J.P., 2024. A lasting impact of serotonergic psychedelics on visual processing and behavior. https://doi.org/10.1101/2024.07.03.601959

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Rolland and colleagues investigated the interaction between Vibrio bacteria and Alexandrium algae. The authors found a correlation between the abundance of the two in the Thau Lagoon and observed in the laboratory that Vibrio grows to higher numbers in the presence of the algae than in monoculture. Timelapse imaging of Alexandrium in coculture with Vibrio enabled the authors to observe Vibrio bacteria in proximity to the algae and subsequent algae death. The authors further determine the mechanism of the interaction between the two and point out similarities between the observed phenotypes and predator prey behaviours across organisms.

      Strengths:

      The study combines field work with mechanistic studies in the laboratory and uses a wide array of techniques ranging from co-cultivation experiments to genetic engineering, microscopy and proteomics. Further, the authors test multiple Vibrio and Alexandrium species and claim a wide spread of the observed phenotypes.

      Comments on revisions:

      I thank the authors for their additional work on the manuscript. My comments were addressed to my satisfaction.

      Dear Reviewer #1, we thank you for your careful evaluation of our manuscript and for the time and effort you dedicated to this review. We are pleased that the revised version has addressed your concerns to your satisfaction.

      Reviewer #2 (Public review):

      Goal summary

      The authors sought to (i) demonstrate correlations between the dynamics of the dinoflagellate Alexandrium pacificum and the bacterium Vibrio atlanticus in natural populations, (ii) demonstrate the occurrence of predation in laboratory experiments, (iii) demonstrate that predation is induced by predator starvation, and (iv) test for effects of quorum sensing and iron-uptake genes on the predation process.

      Strengths include

      - Data indicating correlated dynamics in a natural environment that increase the motivation for study of in vitro interactions

      - Experimental design allowing clear inference of predation based on population counts of both prey and predators in addition to microscopy-based evidence

      - Supplementation of population-level data with molecular approaches to test hypotheses regarding possible involvement of quorum sensing and iron uptake in predation

      Weaknesses include

      - A quantitative analysis of effects of manipulating V. atlanticus density on rates of predation would have been valuable

      - Lack of clarity in some of the methodological descriptions

      Appraisal

      The authors convincingly demonstrate that V. atlanticus can prey on A. pacificum, provide strongly suggestive evidence that such predation is induced by starvation and clearly demonstrate that both iron availability and correspondingly the presence of genes involved in iron uptake strongly influence the efficacy of predation.

      Discussion of impact

      This paper will interest those interested in the diversity of forms of microbial predation and how microbial predatory behavior responds to environmental fluctuations. It will also interest those investigating bacteria-algae interactions and potential ecological controls of algal blooms. It may also interest researchers of microbial cooperation in light of the suggestion of communication between predator cells.

      Dear Reviewer #2, we sincerely thank you for the time you devoted to this second review of our manuscript. We greatly appreciate your thoughtful comments, which helped us further improve the clarity and precision of the manuscript. All your additional recommendations have been carefully considered and addressed in the revised version and in our responses below.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (2) The authors' reference to Fig. 4a did not address our concern about density potentially affecting the outcomes shown in Fig. 3. Fig. 4a does not provide any quantitative effects of manipulating Vibrio density. But the new density numbers the authors added in response to point (33) do seem to address our concern, because Vibrio densities become lower in the older cultures, excluding the possibility that the increased predation in older cultures might have been due to higher Vibrio densities. We think this should be stated explicitly.

      (33) See point (2) above. We think the authors should explicitly state in the text that the increased predation in older cultures was not due to higher Vibrio densities in those older cultures, referring to their data.

      As recommended by Reviewer#2, we added the sentence “Importantly, Vibrio densities decreased with culture age, ruling out the possibility that the stronger predation observed in older cultures was driven by higher bacterial densities” in the results section “Attack of A. pacificum ACT03 is activated by V. atlanticus LGP32 starvation.”

      (45) Is it known that bacterial predators collectively feed more on other bacteria than on microbial eukaryotes in natural habitats? While this certainly seems most likely, it's stated as fact, and so the statement should either be supported with relevant citations or phrased as a likely hypothesis.

      As suggested, we rephrased this sentence “Predatory bacteria are found in a wide variety of environments and are commonly described as feeding on other bacteria, although some cases of predation on microbial eukaryotes have also been hypothesized” in the discussion section.

      (46) Perhaps "Conceiving predators as free-living organisms that kill other organisms and feed on them, this study suggest that Vibrios engage in a novel form of predation in which they kill and feed on algae."

      The reference to 'developing' a predator behavior is not clear. What is meant by 'develop'? It seems unnecessary.

      The use of italics when writing Vibrio is inconsistent.

      We agree that the reference to “developing” a predatory behavior was unclear and unnecessary. We therefore revised the sentence as follows: “Conceiving predators as free-living organisms that kill other organisms and feed on them, this study suggests that Vibrio engages in a novel form of predation in which it kills and feeds on algae.” We also corrected the inconsistent use of italics for Vibrio throughout the manuscript.

      (48) The authors might wish to revise this sentence, as although M. xanthus does have a contact-dependent killing mechanism, it is our understanding that both Lysobacter and myxobacteria can kill some prey at a distance with diffusible secretions.

      The sentence “These bacteria must be in close proximity to their prey in order to cause lysis and utilize their biomass, regardless of the prey's species” was replaced by “These bacteria may require close proximity to their prey to cause lysis and utilize their biomass, although some can also kill prey at a distance through diffusible secretions”.

      (50) Why not directly say 'predatory behavior'?

      We totally agree and have reworded the sentence.

      Line by line feedback:

      28 '...the phycosphere, an interface ...'

      We agree and have revised the wording.

      24 'In the attack stage, Vibrios...'

      This sentence has been rephrased as recommended.

      35 surrounds -> surround

      The correction has been done.

      36 The lysis is induced by the cells not by the 'stage'. We would rephrase to 'in which the lysis and consumption of the dinoflagellates occurs'

      This sentence has been rephrased as recommended.

      41 'a new mechanism that could to be involved' -> 'a new mechanism that could be involved ...'

      The correction has been done.

      61 forms

      The correction has been done.

      98 'the role...in'

      The suggested correction has been performed.

      103 'Qpcr' -> 'qPCR'

      Thank you for spotting this typo. “Qpcr” was corrected to “qPCR” in the manuscript.

      125 Misplaced punctuation

      The punctuation was corrected.

      152 The use of '.' vs 'x' to indicate multiplication when writing numbers is inconsistent. In some cases both are missing.

      Numbers have been corrected throughout the manuscript.

      231 I would rephrase 'poor nutrient stress' to 'little nutrient stress' or 'no nutrient stress'

      The rephrasing was carried out as suggested.

      310 R and used packages are not cited

      We added the citation (R Core Team, 2024). Linear models, QQ plots (which are part of linear models), tests, and AICs are included in R by default and are credited to the R Core Team.

      The sentence “Statistical analyses were performed using R 3.6.3 software” was replaced by “Statistical analyses were performed using R 3.6.3 software (R Core Team, 2024) using Rstudio”.

      358 'are capable of simultaneously attacking'

      The expression “are capable of simultaneously attacking” was revised in the manuscript to improve clarity and readability.

      366 'exponential growth phase'

      We have corrected the wording to “exponential growth phase” in the revised manuscript.

      430 The large difference in incubation time between the sea-water vs nutrient-rich treatments and use of different media are unfortunate. These additional variables compromise the ability to directly ascribe observed differences to starvation.

      We agree, the sentence “The comparative analysis of the proteome of V. atlanticus LGP32 incubated 60 h in artificial seawater (ENSW) versus V. atlanticus LGP32 grown 12 h in Zobell nutrient-rich medium revealed 10 proteins modulated by nutrient stress (Fig. S2)” was replaced by “The comparative analysis of the proteome of V. atlanticus LGP32 incubated 60 h in artificial seawater (ENSW) versus V. atlanticus LGP32 grown 12 h in Zobell nutrient-rich medium revealed 10 proteins that were differentially abundant under these two contrasting conditions (Fig. S2)”

      443 Somewhat unclear sentence. I would rephrase this to "Remarkably, of the 10 proteins identified by proteomic analysis and eliminated by mutation, only elimination of PvuB prevented V. atlanticus from attacking A. pacificum ACT03."

      To clarify this point, the sentence “Remarkably, among the 10 proteins identified by proteomic analysis only V. atlanticus LGP32 mutant lacking pvuB failed to attack A. pacificum ACT03 (Fig. 4C; ANOVA p <0.001)” was replaced by “Remarkably, of the 10 proteins identified by proteomic analysis and eliminated by mutation, only elimination of PvuB prevented V. atlanticus from attacking A. pacificum ACT03 (Fig. 4C; ANOVA p <0.001).”

      445 'attack simultaneously' -> 'simultaneously attack'

      The suggested modification has been done.

      450 H3BO4 is written as Boron later, it would be good to call it boron here as well so that it is easier to make the connection for the reader.

      We agree, we modified the manuscript and called it boron.

      459 'no linked' -> 'no link'

      The text was modified accordingly.

      483 'which induces' -> 'which induce'

      The correction has been made.

      519 The use of Vibrio atlanticus and V. atlanticus is inconsistent within the text.

      We have checked and modified the manuscript in accordance with the recommendations.

      807-808 The use of the phrase 'Akaike information criterion (AICc) models' is confusing. Aren't these models just generalized linear models? It should be rephrased to make clear that the AICc is just a test that is used to select which model to use.

      We clarified this point by revising the Figure 1 legend. The sentences “(C) Result of Akaike information criterion (AICc) models tested to explain the mean value of degraded Alexandrium cells (dead cells) in spring. (D) Wald test of the AICc model attributing the mean value of degraded cells of Alexandrium in spring to free Vibrio” were replaced by “(C) Results of the Akaike Information Criterion (AICc) test conducted to select a model for explaining the mean value of dead Alexandrium (degraded cells) in spring. (D) Wald test of the AICc model explaining the mean value of dead Alexandrium in spring by free Vibrio”.
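To make the reviewer's distinction concrete, the sketch below shows the small-sample corrected Akaike criterion used as a score for choosing among already-fitted candidate models, rather than as a model in itself. The formula is the standard one; the model names, log-likelihoods, parameter counts, and sample size are invented for illustration and are not the values from this study.

```python
def aicc(log_likelihood, k, n):
    """Corrected AIC for a model with k parameters fitted to n observations."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined unless n > k + 1")
    aic = 2 * k - 2 * log_likelihood
    return aic + (2 * k * (k + 1)) / (n - k - 1)


# Hypothetical candidates: name -> (maximized log-likelihood, number of parameters).
candidates = {
    "model_a": (-40.0, 3),
    "model_b": (-45.0, 3),
    "model_c": (-39.5, 4),
}
n_observations = 30  # made-up sample size

scores = {name: aicc(ll, k, n_observations) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)  # lowest AICc is preferred
```

In this toy case the four-parameter model fits slightly better than `model_a`, but not by enough to offset the penalty for its extra parameter.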

      827 The chronological sequence of snapshots is not very clear. Perhaps it would be clearer if pictures over a shorter timeframe were used to clearly show the gathering of the V. atlanticus cells near the algal cells.

      To address this point, we removed the first and the last 14 seconds of the snapshots to clearly show the gathering of the V. atlanticus cells near the algal cells, and we added an arrow on Fig. 2D to indicate the chronological order.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We would like to thank the editors and the reviewers for the thorough and insightful comments and suggestions. Addressing them has strengthened our manuscript. We have carefully addressed all reviewer comments, as described in detail below, as well as additional comments we received from others. In addition, we made two substantive updates to the manuscript:

      (1) We improved the estimation of uncertainty in the model predictions by computing 95% confidence intervals using 120 bootstrapped datasets (instead of the 100% of 10 bootstrapped datasets in the original submission) to match the number of bootstraps used for the validation dataset.

      (2) We selected a slightly different hyperparameter value based on follow-up analyses suggested by Reviewer 1, which provided very useful information.

      Importantly, none of these changes alter the main results or conclusions of the paper.

      Beyond these changes and those outlined below, we also worked to improve the clarity of the prose throughout and added various additional citations to the literature.
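The bootstrapped confidence intervals mentioned in point (1) above can be illustrated with a generic percentile bootstrap. This is a sketch under stated assumptions, not the procedure used in the paper: there, each bootstrapped dataset is presumably refit with the WPPM, whereas here the "statistic" is just a mean of made-up numbers, and the interpolated percentile rule is one common choice among several.

```python
import random


def mean(values):
    return sum(values) / len(values)


def percentile(sorted_values, q):
    """Linearly interpolated percentile of an already sorted list, q in [0, 100]."""
    position = (len(sorted_values) - 1) * q / 100
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] * (1 - fraction) + sorted_values[upper] * fraction


def bootstrap_ci(data, statistic, n_boot=120, level=95, seed=0):
    """Percentile confidence interval from n_boot resamples drawn with replacement."""
    rng = random.Random(seed)
    estimates = sorted(
        statistic([rng.choice(data) for _ in data]) for _ in range(n_boot)
    )
    tail = (100 - level) / 2
    return percentile(estimates, tail), percentile(estimates, 100 - tail)


# Hypothetical measurements (placeholder for per-trial data).
data = [0.8, 1.1, 0.9, 1.3, 1.0, 0.7, 1.2, 1.0]
low, high = bootstrap_ci(data, mean)  # 95% interval from 120 bootstrapped datasets
```

With only 10 bootstrapped datasets the 2.5th and 97.5th percentiles cannot be resolved, which is one reason a larger number of resamples is needed for a 95% interval.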

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper presents an ambitious and technically impressive attempt to map how well humans can discriminate between colours across the entire isoluminant plane. The authors introduce a novel Wishart Process Psychophysical Model (WPPM) - a Bayesian method that estimates how visual noise varies across colour space. Using an adaptive sampling procedure, they then obtain a dense set of discrimination thresholds from relatively few trials, producing a smooth, continuous map of perceptual sensitivity. They validate their procedure by comparing actual and predicted thresholds at an independent set of sample points. The work is a valuable contribution to computational psychophysics and offers a promising framework for modelling other perceptual stimulus fields more generally.

      Strengths:

      The approach is elegant and well-described (I learned a lot!), and the data are of high quality. The writing throughout is clear, and the figures are clean (elegant in fact) and do a good job of explaining how the analysis was performed. The whole paper is tremendously thorough, and the technical appendices and attention to detail are impressive (for example, a huge amount of data about calibration, variability of the stim system over time, etc). This should be a touchstone for other papers that use calibrated colour stimuli.

      Weaknesses:

      Overall, the paper works as a general validation of the WPPM approach. Importantly, the authors validate the model for the particular stimuli that they use by testing model predictions against novel sample locations that were not part of the fitting procedure (Figure 2). The agreement is pretty good, and there is no overall bias (perhaps local bias?), but they do note a statistically-significant deviation in the shape of the threshold ellipses. The data also deviate significantly from historical measurements, and I think the paper would be considerably stronger with additional analyses to test the generality of its conclusions and to make clearer how they connect with classical colour vision research. In particular, three points could use some extra work:

      (1) Smoothness prior.

      The WPPM assumes that perceptual noise changes smoothly across colour space, but the degree of smoothness (the eta parameter) must affect the results. I did not see an analysis of its effects - it seems to be fixed at 0.5 (line 650). The authors claim that because the confidence intervals of the MOCS and the model thresholds overlap (line 223), the smoothing is not a problem, but this might just be because the thresholds are noisy. A systematic analysis varying this parameter (or at least testing a few other values), and reporting both predictive accuracy and anisotropy magnitude, would clarify whether the model's smoothness assumption is permitting or suppressing genuine structure in the data. Is the gamma parameter also similarly important? In particular, does changing the underlying smoothness constraint alter the systematic deviation between the model and the MOCS thresholds? The authors have thought about this (of course! - line 224), but also note a discrepancy (line 238). I also wonder if it would be possible to do some analysis on the posterior, which might also show if there are some regions of color space where this matters more than others? The reason for doing this is, in part, motivated by the third point below - it's not clear how well the fits here agree with historical data.

      Thank you for raising this important point. We have now added analyses of the effects of the two smoothness-related hyperparameters, ε and γ (see Appendix 10).

      First, we swept a range of values for each hyperparameter (ε: 0.1 – 1; γ: 0.000001 – 0.003) and evaluated model performance using 5-fold cross-validation of the dataset used to fit the WPPM, quantifying predictive accuracy on held-out test data. We used the mean negative log likelihood averaged across the held-out data in the cross validation as our measure of predictive accuracy (Figs. S27-31).

      The two hyperparameters affect cross-validation accuracy in a similar manner. With γ fixed at 0.0003, predictive accuracy is highest for ε in the range of approximately 0.3–0.5 and drops quite rapidly for ε < 0.3. We attribute this drop to oversmoothing. Cross-validation accuracy also decreases, albeit more gradually, for ε > 0.5. We attribute this to increased variance due to undersmoothing relative to the power of our datasets. Similarly, with ε fixed at 0.4, predictive accuracy is highest for γ values between approximately 0.0001 and 0.001, declines rapidly for smaller γ (oversmoothing), and more slowly for larger γ (undersmoothing).
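As an illustration only (this is not the authors' code), a k-fold sweep of this kind can be sketched as follows. The sketch assumes binary trial outcomes and scores each hyperparameter value by the mean negative log likelihood on held-out trials. The names `cross_validated_nll`, `toy_fit`, and `toy_predict` are hypothetical, and the toy model (a single success probability shrunk toward 0.5) merely stands in for the WPPM.

```python
import numpy as np

def kfold_indices(n_trials, k=5, seed=0):
    """Split trial indices into k shuffled, non-overlapping folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_trials), k)

def cross_validated_nll(responses, hyper, fit, predict, k=5, seed=0):
    """Mean negative log likelihood per held-out trial, averaged over folds."""
    folds = kfold_indices(len(responses), k, seed)
    fold_scores = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        params = fit(responses[train_idx], hyper)
        # Clip predicted probabilities so the log is always finite.
        p = np.clip(predict(params, test_idx), 1e-9, 1 - 1e-9)
        y = responses[test_idx]
        nll = -(y * np.log(p) + (1 - y) * np.log(1 - p))
        fold_scores.append(nll.mean())
    return float(np.mean(fold_scores))

# Hypothetical stand-in model: one success probability, shrunk toward 0.5.
# Larger `strength` means a stronger prior (more regularization).
def toy_fit(train_responses, strength):
    return (train_responses.sum() + 0.5 * strength) / (len(train_responses) + strength)

def toy_predict(p_hat, test_idx):
    return np.full(len(test_idx), p_hat)

# Simulated binary outcomes (made-up data, true success rate 0.8).
rng = np.random.default_rng(1)
responses = (rng.random(500) < 0.8).astype(float)

# Sweep the hyperparameter and keep the value with the lowest held-out NLL.
sweep = {s: cross_validated_nll(responses, s, toy_fit, toy_predict)
         for s in (0.0, 10.0, 1000.0)}
best = min(sweep, key=sweep.get)
```

In this toy case an overly strong prior pulls the estimate toward 0.5 and raises the held-out loss, which mirrors the oversmoothing regime described above. The variance penalty of a weak prior would only appear with a more flexible model than this one.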

      Second, we examined how the hyperparameter ε affected the agreement between the WPPM fit and the MOCS validation data. Specifically, at each ε, for each participant, we computed the linear regression between WPPM thresholds and validation thresholds at 25 reference locations. Then, we examined the slope and correlation coefficient of all participants as a function of ε. We found a classic bias–variance tradeoff. Excessive smoothness introduces bias by failing to capture structure in the data, whereas insufficient smoothness increases variance in model predictions. These results further support a choice of ε = 0.4 as lying near the optimal balance between bias and variance (Fig. S32).
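The per-participant agreement measure can be sketched as below. This is a minimal illustration with made-up numbers. It assumes that validation thresholds are regressed on model thresholds with a free intercept, a choice the response does not specify, and the function name `slope_and_correlation` is hypothetical.

```python
import numpy as np

def slope_and_correlation(model_thresholds, validation_thresholds):
    """Regress validation thresholds on model thresholds (assumed direction)."""
    x = np.asarray(model_thresholds, dtype=float)
    y = np.asarray(validation_thresholds, dtype=float)
    slope, _intercept = np.polyfit(x, y, 1)   # least-squares line with intercept
    r = np.corrcoef(x, y)[0, 1]               # Pearson correlation coefficient
    return float(slope), float(r)

# Synthetic thresholds at 25 reference locations (made-up values).
rng = np.random.default_rng(0)
model = rng.uniform(0.01, 0.05, size=25)
validation = 1.1 * model + rng.normal(0.0, 0.002, size=25)
slope, r = slope_and_correlation(model, validation)
```

Under this convention, a slope that departs from 1 would point to systematic bias in the model thresholds, while a low correlation would point to variance. Repeating the computation at each ε traces out the tradeoff.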

      Based on these analyses, we selected for the final analysis ε = 0.4, slightly smaller than the preregistered value used in the original submission (0.5), while retaining the original value of γ (0.0003).

      We now discuss these reasons for changing this value in the revision, as well as provide a more general discussion of the importance and practicalities of hyperparameter choice in Bayesian approaches to analyzing data (Discussion / Prior specification).

      (2) Comparison with simpler models. It would help to see whether the full WPPM is genuinely required. Clearly, the data (both here and from historical papers) require some sort of anisotropy in the fitting - the sensitivities decrease as the stimuli move away from the adaptation point. But it's >not< clear how much the fits benefit from the full parameterisation used here. Perhaps fits for a small hierarchy of simpler models - starting with isotropic Gaussian noise (as a sort of 'null baseline') and progressing to a few low-dimensional variants - would reveal how much predictive power is gained by adding spatially varying anisotropy. This would demonstrate that the model's complexity is justified by the data.

      In the 5-fold cross-validation analysis described above (and now presented in Appendix 10), we found that when ε or γ is small, the stronger smoothness constraint leads to threshold ellipses that are nearly identical to each other across color space. Under these conditions, model predictions show poor accuracy on held-out test data and lead to poor predictions of the validation data. This observation addresses the underlying point raised by the reviewer, albeit in a different way than suggested: it shows that a degree of spatially varying anisotropy is necessary to capture the structure of the data. We now make this point in the paper (Discussion / Prior specification).

      More broadly, we employed the WPPM as a prior that imposes smoothness but little other structure, and used it to learn about the psychometric field. We are currently working to understand how best to use our current data to improve the prior we would apply to future measurements. There are a number of approaches to this. One would be to seek a parametric mechanistic model that can describe the current data and, to the extent this is possible, to formulate prior distributions over its parameters. The results reported here thus provide a foundation for deriving and evaluating more structured priors that would leverage future datasets even more efficiently while imposing more structure. We have added this perspective to the paper (Discussion / Extensions of the WPPM framework).

      (3) Quantitative comparison to historical data. The paper currently compares its results to MacAdam, Krauskopf & Karl, and Danilova & Mollon only by visual inspection. It is hard to extract and scale actual data from historical papers, but from the quality of the plotting here, it looks like the authors have achieved this, and so quantitative comparisons are possible. The MacAdam data comparisons are pretty interesting - in particular, the orientations of the long axes of the threshold ellipses do not really seem to line up between the two datasets - and I thought that the orientation of those ellipses was a critical feature of the MacAdam data. Quantitative comparisons (perhaps overall correlations, which should be immune to scaling issues, axis-ratio, orientation, or RMS differences) would give concrete measures of the quality of the model. I know the authors spend a lot of time comparing to the CIE data, and this is great.... But re-expressing the fitted thresholds in CIE or DKL coordinates, and comparing them directly with classical datasets, would make the paper's claims of "agreement" much more convincing.

      Although we are sympathetic to this request, we have chosen not to implement the sort of quantitative comparison requested by the reviewer. The reason is that an important feature of color thresholds is that they depend on the spatial (e.g. Kelly, 1974; Poirson & Wandell, 1996; Danilova & Mollon, 2025) and temporal (e.g. Kelly, 1974) properties of the stimuli, and on the observer’s state of adaptation (e.g. Loomis & Berger, 1979; Krauskopf & Gegenfurtner, 1992). Because (as the reviewer notes below) the spatial and temporal properties of our stimuli were not matched to those of the comparison datasets, our purpose in making these comparisons was to examine qualitative agreement, as well as to situate our results in the literature and to demonstrate that our approach allows us to read out thresholds around the references and in the color spaces used in other studies. We would not expect detailed quantitative agreement with the current dataset because of differences in stimuli.

      As a consequence of this, we think we would be overreaching to quantify the differences between our data and classic datasets. This consideration is particularly important for the MacAdam measurements, where because of the matching adjustment procedure used, the observer’s state of adaptation is likely to have varied (by amounts that are difficult to estimate) from one reference to the next (e.g. Danilova & Mollon, 2025). We have clarified the manuscript with respect to these points (Results / Comparison with previous measurements).

      An important and interesting future direction that emerges from our work is to develop efficient methods to characterize the dependence of the full discrimination field on ancillary variables, such as those that describe spatial and temporal properties and/or the state of adaptation. We now also mention this in the paper (Discussion / Implications for the mechanisms of color perception). Although not the primary motivation, doing so would enable comparison of data with a wider range of studies.

      We do agree that the comparisons to CIELAB predictions work better when we express them in CIELAB, and have now done so (Fig. 3D; Fig. S24-S26).

      Kelly, D. H. (1974). "Spatio-temporal frequency characteristics of color-vision mechanisms." Journal of the Optical Society of America 64(7): 983–990.

      Poirson, A. B. and B. A. Wandell (1996). "Pattern-color separable pathways predict sensitivity to simple colored patterns." Vision Research 36(4): 515–526.

      Danilova, M. V. and J. D. Mollon (2025). "Effect of stimulus size on chromatic discrimination." Journal of the Optical Society of America A 42(5).

      Loomis, J. M. and T. Berger (1979). "Effects of chromatic adaptation on color discrimination and color appearance." Vision Research 19(8): 891–901.

      Krauskopf, J. and K. R. Gegenfurtner (1992). "Color discrimination and adaptation." Vision Research 32(11): 2165–2175.

      Overall, this is a creative and technically sophisticated paper that will be of broad interest to vision scientists. It is probably already a definitive method paper showing how we can sample sensitivity accurately across colour space (and other visual stimulus spaces). But I think that until the comparison with historical datasets is made clear (and, for example, how the optimal smoothness parameters are estimated), it has slightly less to tell us about human colour vision. This might actually be fine - perhaps we just need the methods?

      Related to this, I'd also note that the authors chose a very non-standard stimulus to perform these measurements with (a rendered 3D 'Greebley' blob). This does have the advantage of some sort of ecological validity. But it has the significant disadvantage that it is unlike all the other (much simpler) stimuli that have been used in the past - and this is likely to be one of the reasons why the current (fitted) data do not seem to sit in very good agreement with historical measurements.

      As the reviewer notes, our stimuli head in the direction of ecological validity (see also Hedjar et al., 2025) and indeed this was a consideration when we chose them, at the cost of limiting the degree of comparison we can make with prior studies (as discussed above). Another reason we chose our stimuli is that they enable the current data to be used as a basis of comparison with stimuli where we add specularity, change object shape, and vary object pose in the future. These manipulations are not possible with flat matte patches. Such experiments are of interest to us, as they will tell us about how effectively color may be used to differentiate stimuli in cases where other ecologically important variables co-vary. We now mention this motivation in the paper (Results / Task and Stimuli).

      Hedjar, L., M. Toscani and K. R. Gegenfurtner (2025). "Importance of hue: color discrimination of three-dimensional objects and two-dimensional discs." Journal of the Optical Society of America A 42(5).

      Reviewer #2 (Public review):

      Summary:

      Hong et al. present a new method that uses a Wishart process to dramatically increase the efficiency of measuring visual sensitivity as a function of stimulus parameters for stimuli that vary in a multidimensional space. Importantly, they have validated their model against their own hold-out data and against 3 published datasets, as well as against colour spaces aimed at 'perceptual uniformity' by equating JNDs. Their model achieves high predictive success and could be usefully applied in colour vision science and psychophysics more generally, and to tackle analogous problems in neuroscience featuring smooth variation over coordinate spaces.

      Strengths:

      (1) This research makes a substantial contribution by providing a new method to very significantly increase the efficiency with which inferences about visual sensitivity can be drawn, so much so that it will open up new research avenues that were previously not feasible. Secondly, the methods are well thought out and unusually robust. The authors made a lot of effort to validate their model, but also to put their results in the context of existing results on colour discrimination, transforming their results to present them in the same colour spaces as used by previous authors to allow direct comparisons. Hold-out validation is a great way to test the model, and this has been done for an unusually large number of observers (by the standards of colour discrimination research). Thirdly, they make their code and materials freely available with the intention of supporting progress and innovation. These tools are likely to be widely used in vision science, and could of course be used to address analogous problems for other sensory modalities and beyond.

      Weaknesses:

      It would be nice to better understand what constraints the choice of basis functions puts on the space of possible solutions. More generally, could there be particular features of colour discrimination (e.g., rapid changes near the white point) that the model captures less well.

      This comment bears conceptual similarity to Reviewer 1’s question about the hyperparameters of our prior, as it is basically asking whether we might be oversmoothing through the choice of form and number of basis functions. The hyperparameter sweeps we now present suggest that within the choice of basis functions we used, we are operating at a reasonable point on the bias-variance tradeoff curve - we can see bias emerging with a smoother prior, and variance increasing with a less smooth prior. Our expectation is that varying the smoothness of the prior in other ways, such as by varying the form and number of the basis functions, would lead to similar tradeoffs.

      We did perform one additional check that shows, within our current framework, that adding more basis functions is unlikely to change things much. This was to plot the fit weights as a function of Chebyshev basis order (Figure S4 in Appendix 2). These decline to near zero at the highest order we used, suggesting that adding more would not alter the inferred psychometric field, given our hyperparameter choices. Although we could explore this question further by explicitly fitting the data using more basis functions along with different hyperparameter choices, or different functional forms for the basis functions, we decided not to pursue this in favor of performing the other additional analyses we now present.
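A check of this kind can be illustrated with NumPy's Chebyshev utilities. The sketch below is not the authors' analysis: it fits a made-up smooth 2D field by ordinary least squares rather than fitting the WPPM, and it assumes that the "order" of a 2D basis function T_i(x)T_j(y) is max(i, j). It only shows how weight magnitude can be summarized by order.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

deg = 8                                   # highest Chebyshev order per axis (assumed)
grid = np.linspace(-1.0, 1.0, 41)
X, Y = np.meshgrid(grid, grid)
x, y = X.ravel(), Y.ravel()
target = np.exp(-(x**2 + y**2))           # hypothetical smooth field on [-1, 1]^2

# Design matrix whose column (deg + 1) * i + j holds T_i(x) * T_j(y).
V = C.chebvander2d(x, y, [deg, deg])
coef, *_ = np.linalg.lstsq(V, target, rcond=None)
W = coef.reshape(deg + 1, deg + 1)        # W[i, j] weights T_i(x) * T_j(y)

# Summarize mean absolute weight at each order k = max(i, j).
order = np.maximum.outer(np.arange(deg + 1), np.arange(deg + 1))
mean_abs_by_order = np.array(
    [np.abs(W[order == k]).mean() for k in range(deg + 1)]
)
```

If the mean absolute weight at the highest order is a small fraction of that at the lowest, adding further basis functions would be expected to change the fit very little, which is the logic of the check described above.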

      We share the reviewer’s concern that assuming smoothness, both by assuming that isoperformance contours are elliptical and by assuming that these vary smoothly with reference, might cause us to miss features of the true underlying field in cases where that field varies rapidly or the isoperformance contours are asymmetric or non-elliptical. Our approach to this was to measure the validation thresholds and demonstrate that any bias in our WPPM-inferred field is small for these measurements. Because we shared the reviewer’s intuition that the adapting point is a candidate location where there might be less smooth variation, we measured a validation threshold at this reference for every subject. Nonetheless, we only measured in one direction around the adapting reference for each subject. We considered validation approaches where we measured full ellipses at a set of validation references, but we were worried about effects of uncertainty reduction and perceptual learning, which might distort thresholds at highly sampled locations.

      It is the case that if one wanted to study the discrimination field in more detail around a particular reference, one could concentrate trials in a smaller model space around that reference, and for the same number of trials use a prior with less smoothness relative to the underlying stimulus space. Indeed, simply halving the size of the stimulus space that maps onto the [-1,1] model space and keeping the same prior over the model space effectively halves the degree of smoothness expressed with respect to the stimulus space. Thus our methods could prove useful in studying more rapid variations in the discrimination field if one hypothesized that they might occur around particular reference choices, but this would still rest upon the elliptical assumption. To relax that assumption, one could use the threshold field estimation methods implemented in AEPsych, which incorporate a smoothness assumption but do not assume elliptical isoperformance contours. Weakening the prior in this way would, however, increase trial demand to obtain similar measurement precision.
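The scaling argument can be made concrete with a small sketch. It assumes an affine map from a stimulus range [lo, hi] onto the [-1, 1] model space, and it uses a "length scale" (the model-space distance over which the prior keeps the field similar) as a hypothetical proxy for the degree of smoothness. Neither function name comes from the authors' code.

```python
def to_model_space(stimulus, lo, hi):
    """Affine map from the stimulus range [lo, hi] onto model space [-1, 1]."""
    return 2.0 * (stimulus - lo) / (hi - lo) - 1.0

def stimulus_length_scale(model_length_scale, lo, hi):
    """Stimulus-unit distance spanned by a given model-space distance."""
    return model_length_scale * (hi - lo) / 2.0

# Same prior (same model-space length scale), two stimulus ranges.
full_range = stimulus_length_scale(0.5, lo=-1.0, hi=1.0)
half_range = stimulus_length_scale(0.5, lo=-0.5, hi=0.5)
```

With the prior held fixed in model space, halving the stimulus range halves the stimulus-unit distance over which the field is constrained to vary smoothly, so finer structure around the chosen reference becomes representable.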

      As a general matter, we don’t think it is possible to leverage smoothness for trial efficiency on the one hand and at the same time be completely sure that there isn’t some aspect to the underlying ground truth that has been smoothed over. Carefully choosing the degree of prior smoothness together with the number of experimental trials in the context of a particular content problem is an important part of bringing the WPPM and related methods to bear, and one where simulation and held-out data both play an important role.

      We now bring these points out more fully in the paper (Discussion / Extensions of the WPPM framework; Discussion / Prior specification).

      Chen, C.-C., J. M. Foley and D. H. Brainard (2000). "Detection of chromoluminance patterns on chromoluminance pedestals I: threshold measurements." Vision Research 40(7): 773–788.

      The substantial individual differences evident in Figure S20 (comparison with Krauskopf and Gegenfurtner, 1992) are interesting in this context. Some observers show radial biases for the discrimination ellipses away from the white point, some show biases along the negative diagonal (with major axes oriented parallel to the blue-yellow axis), and others show a mixture of the two biases. Are these genuine individual differences, or could the model be performing less accurately in this desaturated region of colour space?

      We agree that these differences are interesting. We have now added more complete bootstrapped confidence regions to these figures (Appendix 8) and to the other comparison figures (Appendices 6, 7, and 9), so that an estimate of measurement precision is directly available. These confidence regions suggest that the individual differences in this region of color space are real. A longer-term goal is to develop more mechanistic models that can account for individual subject data through parameter choice. This might lead to insight into what differs in the visual system across individuals.

      Reviewer #3 (Public review):

      Summary:

      This study presents a powerful and rigorous approach for characterizing stimulus discriminability throughout a sensory manifold, and is applied to the specific context of predicting color discrimination thresholds across the chromatic plane.

      Strengths:

      Color discrimination has played a fundamental role in studies of human color vision and for color applications, but as the authors note, it remains poorly characterized. The study leverages the assumption that thresholds should vary smoothly and systematically within the space, and validates this with their own tests and comparisons with previous studies.

      Weaknesses:

      The paper assumes that threshold variations are due to changes in the level of intrinsic noise at different stimulus levels. However, it's not clear to me why they could not also be explained by nonlinearities in the responses, with fixed noise. Indeed, most accounts of contrast coding (which the study is at least in part measuring because the presentation kept the adapt point close to the gray background chromaticity, and thus measured increment thresholds), assume a nonlinear contrast response function, which can at least as easily explain why the thresholds were higher for colors farther from the gray point. It would be very helpful if a section could be added that explains why noise differences rather than signal differences are assumed and how these could be distinguished. If they cannot, then it would be better to allow for both and refer to the variation in terms of S/N rather than N alone.

      We agree with the reviewer. We are measuring SNR and attributing it to noise, but cannot identify from the data whether changes in SNR across color spaces are due to changes in noise, to a nonlinear relationship between stimulus space and the observer’s response space with noise in the response space held fixed, or both. We now make this point where we introduce the Results / Wishart Process Psychophysical Model and reiterate it in the Discussion / Extensions of the WPPM framework.
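A one-dimensional toy calculation illustrates why the two accounts cannot be separated from thresholds alone. It assumes a local signal-detection approximation in which the threshold is proportional to the noise standard deviation divided by the local gain (slope) of the response function. The response functions and noise profiles below are hypothetical and chosen only to make the point.

```python
import numpy as np

def threshold(gain, noise_sd, criterion=1.0):
    """Stimulus increment needed to reach a criterion d': criterion * sd / gain."""
    return criterion * noise_sd / gain

x = np.linspace(0.0, 2.0, 5)              # distance from the adapting point

# Account A: linear response (gain 1 everywhere), noise grows with x.
thr_noise = threshold(gain=np.ones_like(x), noise_sd=1.0 + x**2)

# Account B: compressive response f(x) = arctan(x), so gain = 1 / (1 + x^2),
# with the noise held fixed.
thr_signal = threshold(gain=1.0 / (1.0 + x**2), noise_sd=np.ones_like(x))
```

Both accounts yield the same thresholds at every x, so discrimination data of this kind constrain only the ratio of gain to noise, that is, the signal-to-noise ratio.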

      Related to this point, the authors note that the thresholds should depend on a number of additional factors, including the spatial and temporal properties and the state of adaptation. However, many of these again seem to be more likely to affect the signal than the noise.

      We don’t disagree. Indeed, as we noted in our response to a comment by Reviewer 1 and above in the context of individual differences, we are very interested in developing a mechanistically plausible model that accounts for the data. If we or others are able to do so, that would provide a basis for parsing performance into separate signal and noise effects. And if such a model has natural ways in which additional variables affect its predictions, measuring the effects of these variables would be a way to provide evidence in favor of the model (Discussion / Implications for the mechanisms of color perception; Discussion / Extensions of the WPPM framework).

      An advantage of the approach is that it makes no assumptions about the underlying mechanisms. However, the choice to sample only within the equiluminant plane is itself a mechanistic assumption, and these could potentially be leveraged for deciding how to sample to improve the characterization and efficiency. For example, given what we know about early color coding, would it be more (or less) efficient to select samples based on a DKL space, etc?

      The more we are willing to assume about the structure of the psychometric field, the more efficiently we can measure it. As the reviewer correctly notes, this principle applies to trial placement as well. We are currently using an adaptive method (AEPsych) that starts with a fairly weak smoothness prior and attempts to place trials using heuristics that aim to minimize the expected uncertainty in the posterior. As we learn more about the discrimination field, we should be able to leverage stronger priors to increase trial efficiency. This point is closely related to one we made above about developing stronger priors that capture what we have learned in this study. Such priors could also help improve trial placement. For a prior that has a relatively small number of parameters, for example, perhaps a mechanistic prior, methods such as Quest+ (Watson, 2017) may be used for trial placement.

      Watson, A. B. (2017). "QUEST+: A general multidimensional Bayesian adaptive psychometric method." J Vis 17(3): 10.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I do not think that the authors need to perform additional experiments. However, I would like to see some additional analyses regarding the assumptions made in the fitting procedure and how they affect the final maps.

      I also think some more quantitative comparisons with historical data would be valuable - at the moment, a lot of the comparisons are simply 'by eye'.

      It would have been nice to have the code and data available during the review procedure - I'm sure these will be released with excellent documentation?

      We addressed the first two points in the public review section. The code and data are now available online, and links are provided in the paper (Methods and Materials / Data and code availability).

      Reviewer #2 (Recommendations for the authors):

      Minor points

      I have a few suggestions for additions and small changes.

      (1) Several examples of covariance matrix fields are shown in Figure 1, 4, but these are for simulated examples. It would be nice to see the fields actually fit the data! I would be interested in seeing this for all participants in an Appendix, and maybe for participant CH in the main paper?

      We have made the changes (see Figure 4 and Figure S3).

      (2) I have not worked through all the math in the appendices line by line, but it seems to be complete, and the model validation results speak for themselves. I think the authors have done a pretty good job of explaining the model conceptually (not easy), but I struggled with the 'weighted sum' step in Figure 4 and the main text. I would appreciate a bit more hand-holding here, e.g, why is an 'overcomplete' representation needed as an intermediate, and providing an intuition of why there are 12 matrices in the overcomplete representation and what each matrix in this representation represents.

      We have now added more explanations in the figure legend and text (Fig. 4 and Methods and Materials / The Wishart Process Psychometric Model).

      (3) Individual differences: There is a section on this in the manuscript, and it's concluded that there are only "modest" individual differences. However, in Figure S20, the individual differences, I think, are huge and place observers almost in qualitatively different categories! Some observers show a radial bias in discrimination ellipses, others seem to show basically a bias along the negative diagonal, and others a mixture of both biases. These ellipses are at a desaturated part of colour space - is it possible that there are some rapid changes in the underlying noise in this region that the Wishart fit has not captured due to relatively sparse sampling or the fact that the basis functions are all fairly low spatial frequency? I wondered whether the results are constrained by the choice of Cartesian rather than polar basis functions, e.g, polar basis functions may have better allowed fine-grained changes near the white point but slower changes at higher saturations away from the white point.

      We agree that the individual differences are meaningful and, in some cases, quite pronounced. Our intent in describing the differences as “modest” was to emphasize that the overall structure of the psychometric fields remains broadly consistent across observers. We have revised the Results to note and more fully describe these differences.

      Regarding the possibility that sharp changes in the underlying noise near the achromatic point might not be fully captured by the current model, we agree that this is an important consideration. The current implementation uses relatively low-order Chebyshev basis functions that primarily capture smooth global variations in the psychometric field. While validation analyses indicate that these basis functions capture the dominant structure in the data, they may be less sensitive to sharp local variations such as those that could occur near the white point. Future work could address this by mapping the model space to a smaller region around the achromatic reference or by exploring alternative basis sets (e.g., polar or Zernike functions) that may better capture such localized structure. This is discussed above in this response and now addressed in Discussion / Extensions of the WPPM framework.

      On sampling, I wondered if the results might have been biased by the strongly biased ellipse that occurs at the grey point. If not, and the model is accurate in this region of colour space, I think this figure does show some large individual differences, and it would be good to comment on these in the individual differences section of the manuscript.

      Based on our analysis of trial placement (Fig. S1), the adaptive algorithm does not appear to have disproportionately concentrated trials near the gray point. In fact, more trials were allocated to the edges of the stimulus space than to the center. This suggests that the WPPM estimates are unlikely to be driven primarily by performance in the gray region. In addition, we examined the threshold ellipses around the gray reference in DKL space and found that they are broadly consistent across participants (Figs. S22–S23). Together, these analyses suggest that the anisotropy observed near the gray point reflects a genuine property of the psychometric field rather than an artifact of the sampling procedure.

      As noted just above, we have added additional text about individual differences in the Results and referenced it in the Discussion.

      (4) The manuscript seems unusually free of typographical errors, but I noticed that in many places "Krauskopf and Karl 1992" is cited! Also, I think something has gone wrong with the legend to Figure 2 - perhaps the order of panels was swapped around, but the legend was not fully updated. There is a repeated reference to the "summary of regression slopes" which seems to be in 2 positions, after C and G. It would make more sense to label panel G as D and progress from there, or switch the order of the panels so that G is on the bottom row.

      Thank you for catching those errors. They are now fixed.

      Reviewer #3 (Recommendations for the authors):

      A minor point (or perhaps major if your last name is Gegenfurtner) is that the reference to Krauskopf and Karl is incorrect.

      This is now fixed.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      (1) The proposed compensatory mechanism is inferred primarily from transcriptional changes and metabolite levels; direct measurements of gluconeogenic flux are lacking.

      We agree that isotopic tracer experiments would provide the most direct evidence for gluconeogenic flux. While such experiments are beyond the scope of the current revision, we will explicitly acknowledge this as a key limitation and clearly state it as an important direction for future research. We note, however, that the convergent evidence from multiple independent lines (transcriptional upregulation of PEPCK and G-6-Pase, declining protein levels, altered amino acid profiles, and maintained trehalose levels) collectively supports gluconeogenic activation, even though each individual line is indirect. In the revised manuscript, we will present this evidence more cautiously, framing it as “consistent with gluconeogenic compensation” rather than as definitively establishing metabolic flux.

      (2) Alternative glycogen degradation pathways (α-amylase, glycogen debranching enzymes) are proposed but not experimentally examined.

      We have now directly addressed this by measuring, via RT-qPCR, the expression of glycogen branching enzyme (GBE) and α-amylase following PxGP knockdown. Our preliminary results reveal a striking and informative pattern:

      GBE was significantly upregulated at 24 h (+29.24%), 48 h (+16.78%), and 96 h (+44.46%) post-injection, indicating transcriptional activation of an alternative glycogen-metabolizing enzyme in response to GP suppression.

      α-Amylase showed no significant change at any time point, suggesting that the compensatory response is pathway-specific rather than a generalized upregulation of all starch/glycogen-degrading enzymes.

      This differential response, with GBE upregulated and α-amylase unchanged, provides the first direct evidence that P. xylostella selectively activates specific alternative glycogen catabolic pathways when GP function is compromised. These data will be incorporated into the revised manuscript as a new figure panel.

      (3) Physiological consequences of the proposed metabolic compensation (fitness costs) are not explored.

      We have now assessed fitness consequences of PxGP knockdown by measuring feeding rate, larval body weight, and pupal weight. The results reveal a transient but significant fitness cost:

      Feeding rate: no significant difference between dsGP and dsGFP groups across all time points (24–120 h), indicating that the observed metabolic changes are not attributable to reduced food intake.

      Larval weight: significantly reduced at 24 h (−29.10%) and 48 h (−25.38%) in the dsGP group, demonstrating that metabolic compensation carries a measurable short-term cost.

      Pupal weight: no significant difference, indicating that larvae recover from the transient weight deficit before pupation.

      This pattern, transient larval weight loss with full pupal recovery, is consistent with our proposed model: GP suppression triggers protein catabolism to fuel gluconeogenesis (explaining the weight loss), but the compensatory mechanism is sufficiently effective to restore metabolic homeostasis before the pupal transition. Adult wing area and female fecundity measurements are currently in progress and will be included in the revised manuscript.

      (4) Enzyme activity is not measured in RNAi-treated insects; only transcript-level knockdown is reported.

      We have now measured GP enzyme activity (GPa) in crude extracts from RNAi-treated larvae using the coupled-enzyme spectrophotometric assay. The results provide important new insights:

      Per-larva GP activity was significantly reduced at 24 h (−27.57%) and 48 h (−29.28%), confirming that RNAi-mediated transcript suppression translates to reduced enzyme function in vivo.

      Per-protein GP activity showed a significant reduction only at 48 h (−10.35%). This apparent discrepancy is explained by a substantial decrease in total protein concentration at 24 h (−44.48%), which then gradually recovered. When enzyme activity is normalized to a declining protein pool, the per-protein reduction appears smaller.

      Importantly, the 44.48% decline in total protein at 24 h provides independent biochemical confirmation of our proposed protein catabolism: it is consistent with the mobilization of protein stores to supply amino acids for gluconeogenesis, directly supporting the compensatory mechanism described in our manuscript.

      These enzyme activity data will be presented alongside the existing transcript-level data in the revised manuscript, providing a complete picture from gene expression through enzyme function.
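
      The effect of the normalization choice described above can be illustrated with a minimal sketch. The numbers below are hypothetical and chosen only for illustration; they are not measurements from this study, and the helper name percent_change is likewise an assumption. The sketch shows that when total protein falls alongside activity, the per-protein change is smaller in magnitude than the per-larva change.

```python
def percent_change(treated: float, control: float) -> float:
    """Percent change of a treated value relative to its control."""
    return (treated - control) / control * 100.0


# Hypothetical values for illustration only (not data from the manuscript).
control_activity = 100.0  # GP activity units per larva, control group
treated_activity = 70.0   # GP activity units per larva, knockdown group
control_protein = 10.0    # mg total protein per larva, control group
treated_protein = 7.5     # mg total protein per larva, knockdown group

# Per-larva comparison: activity is compared directly.
per_larva = percent_change(treated_activity, control_activity)

# Per-protein comparison: activity is first divided by total protein,
# so a shrinking protein pool partly offsets the drop in activity.
per_protein = percent_change(
    treated_activity / treated_protein,
    control_activity / control_protein,
)

print(f"per-larva change:   {per_larva:.2f}%")
print(f"per-protein change: {per_protein:.2f}%")
```

      With these assumed inputs, the per-larva reduction is larger than the per-protein reduction, which is the qualitative pattern described above.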

      (5) Conclusions regarding BPU class may require testing additional compounds beyond diflubenzuron.

      We agree and will explicitly limit our conclusion to diflubenzuron in the revised manuscript. The relevant text will be revised to state that “DFB does not inhibit PxGP” rather than making broader claims about the BPU class as a whole.

      (6) Structural evidence that GPI can bind PxGP in a comparable manner to its mammalian target is lacking.

      We have performed molecular docking and binding free energy analysis to address this concern directly. The PxGP homodimer structure was modeled using SWISS-MODEL with the rabbit muscle GP–acyl urea co-crystal structure (PDB: 2ATI; Klabunde et al., 2005) as the template. Molecular docking and MM/GBSA calculations were performed using Cresset Flare V11.

      Key findings:

      GPI exhibited substantially stronger binding to PxGP (ΔG = −34.63 kcal/mol) compared to DFB (ΔG = −29.29 kcal/mol), with a ΔΔG of −5.34 kcal/mol.

      Energy decomposition revealed that van der Waals interactions are the primary driver of selectivity (ΔG<sub>VDW</sub> = −11.49 kcal/mol), reflecting superior shape complementarity of GPI within the binding pocket.

      GPI was predicted to bind at the allosteric site at the dimer interface, engaging seven residues across both subunits (Asn44 and Val45 from chain A; Trp67, Gln71, Tyr75, Arg193, and Asp227 from chain B), a binding mode consistent with the experimentally determined site of acyl urea inhibitors in mammalian GP.

      DFB contacted only six residues, primarily from a single subunit, and its difluorobenzoyl moiety remained entirely solvent-exposed without productive protein contacts, explaining its inability to achieve effective target engagement.

      These structural data, together with the biochemical inhibition data (IC<sub>50</sub> = 2.96 nM for GPI; no inhibition by DFB), provide a comprehensive molecular explanation for the observed selectivity. The results will be presented as a new figure and table in the revised manuscript.
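
      For clarity, the ΔΔG quoted above is the difference between the two computed binding free energies. The sign convention (GPI minus DFB, so that a negative value indicates stronger predicted binding of GPI) is assumed here from the reported values:

```latex
\[
\Delta\Delta G
  = \Delta G_{\mathrm{GPI}} - \Delta G_{\mathrm{DFB}}
  = (-34.63) - (-29.29)
  = -5.34\ \text{kcal/mol}
\]
```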

      (7) Dietary carbohydrates could mask the metabolic effects of GP inhibition.

      Our new data showing no difference in feeding rate between dsGP and dsGFP groups address this concern from one angle: the metabolic changes we observe are not attributable to altered food intake. We will also add a discussion of the potential contribution of dietary carbohydrates to glucose homeostasis and acknowledge this as a caveat in interpreting the metabolite data.

      Minor points: All terminology errors (“gluconeogenolysis” → “gluconeogenesis”), typographical errors (“over over four decades”), and formatting inconsistencies will be corrected. We will clarify the metabolite normalization approach and improve figure labeling and pathway schematics.

      Reviewer #2 (Public review):

      (1) The central premise — that structural similarity among acylurea compounds implies shared molecular targets — is not supported by existing evidence.

      We agree that the original manuscript overstated the significance of the shared acylurea core as a predictor of common biological activity. In the revised manuscript, we will substantially restructure the Introduction to:

      (1) Explicitly acknowledge the compelling genetic evidence from CRISPR/Cas9 experiments (Reference 5) establishing CHS as the primary site conferring BPU resistance.

      (2) Reframe the study’s objective: rather than proposing to “resolve” the BPU target controversy, the revised manuscript will focus on the systematic evaluation of GP as an independent insecticidal target and the discovery of a gluconeogenic compensation mechanism, questions that have scientific value independent of the BPU mechanism debate.

      (3) Remove the claim that the study “resolves the primary hypothesis.” The conclusion will instead state that our biochemical data demonstrate DFB does not inhibit PxGP, adding enzyme-level evidence to the existing genetic framework.

      (2) Target selectivity among acylurea compounds is determined by side-chain composition, not the shared core.

      We fully agree, and our new structural data now provide a molecular explanation for this principle at the atomic level. Molecular docking reveals that both GPI and DFB anchor to PxGP through their common acylurea carbonyl groups (forming hydrogen bonds with Arg193), but diverge dramatically in their side-chain engagement: GPI’s methoxyphenyl-methylurea moiety engages five additional residues across the dimer interface, while DFB’s difluorobenzoyl group remains entirely solvent-exposed. The van der Waals energy difference (ΔΔG<sub>VDW</sub> = −11.49 kcal/mol) quantitatively reflects this differential shape complementarity. These data directly support Reviewer 2’s point and will be presented as new evidence in the revised manuscript.

      (3) References 6–9 did not express CHS in cell-free assays.

      We will revise the relevant passage for greater precision. Our revised text will distinguish between (a) the absence of direct biochemical evidence for BPU-mediated CHS inhibition in cell-free systems and (b) the technical challenge of expressing and purifying functional CHS for such assays. This distinction will be stated more carefully to avoid any mischaracterization of the cited literature.

      (4) The term “dataology” is non-standard.

      This term has been removed and replaced with “data.” In accordance with eLife’s policy on the use of AI tools and technology, we will include a statement in the Materials and Methods section declaring that AI-based language editing tools were used for English grammar and style refinement. All scientific content was generated entirely by the authors.

      Author response table 1.

      We are confident that the substantial new experimental data and restructured narrative will meaningfully strengthen the manuscript.

    1. Author response:

      We thank the Reviewing Editor, Senior Editor, and both reviewers for their constructive evaluation of our manuscript. We are encouraged that the reviewers found the central question, whether donor dietary conditioning modulates FMT efficacy in ALD, compelling and the multi-omics framework a strength. Their critiques converge on a shared theme: the manuscript's mechanistic claims around caproic acid and PPARα signaling currently rest on associative and pathway-level evidence, and would benefit from more direct causal testing and more guarded language. We agree, and we outline below the revisions we plan to undertake.

      Public Reviews:

      Reviewer #1 (Public review):

      While the proteomic and gene expression data suggest activation of pathways related to fatty acid β-oxidation, these measurements do not directly demonstrate increased metabolic flux. The use of the PPARα antagonist GW6471 provides important functional support for the involvement of this pathway; however, this approach primarily establishes pathway dependency rather than directly confirming enhanced β-oxidation activity. The role of caproic acid as a central mediator is plausible but not definitively established. Finally, the link between microbiota composition, metabolic function, and host signaling remains partly correlative.

      We thank the reviewer for this thoughtful assessment. We agree that the GW6471 inhibition experiments primarily support pathway dependency rather than direct activation of PPARα by caproic acid, and we will revise the manuscript accordingly to avoid overstating mechanistic conclusions. We would like to clarify, however, that the objective of the current study was not to directly quantify metabolic flux. We agree that the term “metabolic flux” should not be used here, and we will modify the text to make clear that we measured mitochondrial β-oxidation as a response to caproic acid.

      To functionally assess alterations in fatty acid β-oxidation capacity, we performed Seahorse Mito Fuel Flex assays, which demonstrated altered dependency and utilization of fatty acid oxidation pathways in response to caproic acid treatment. We will further clarify this distinction in the revised manuscript.

      In addition, we agree that the role of caproic acid as a central mediator and the relationship between microbiota composition, metabolite production, and host signaling remain partly correlative. Therefore, we will moderate the interpretation throughout the manuscript and incorporate additional correlation analyses between microbial taxa, caproic acid levels, and disease-associated metabolic parameters to strengthen the microbiota-metabolite-host association while acknowledging the associative nature of these findings.

      Reviewer #2 (Public review):

      (1) While the Methods section states that each recipient mouse group consisted of 16 animals, microbiome sequencing was performed on only 4 samples per group. This sample size is insufficient, and the high inter-individual variability observed reduces the statistical power and representativeness of the data. I recommend increasing the sequencing sample size or, at a minimum, explicitly acknowledging the risk of false positives due to the small sample size in the Discussion.

      We thank the reviewer for this important comment. We would like to clarify that microbiome sequencing was performed on 6 samples per group and not on 4 samples per group, and we will revise the Methods section to improve clarity regarding the number of biological replicates analyzed. The 4 samples were used only for whole proteome analysis.

      In addition, several previously published murine microbiome studies investigating gut microbial alterations in liver disease and FMT interventions have used comparable sample sizes (typically 5–8 animals per group) for 16S rRNA sequencing analyses [1–3]. Nevertheless, we agree that inter-individual variability may influence microbiome analyses, and therefore we will explicitly acknowledge this limitation and the possibility of reduced statistical power in the revised Discussion section. We will also ensure that interpretations derived from microbiome compositional analyses are presented more cautiously.

      (2) The layout of Figure 4 should be adjusted. Panel A should be enlarged for better visibility, while Panel B should be reduced in size to balance the figure composition.

      We thank the reviewer for this suggestion. We will revise the layout of Figure 4 accordingly by enlarging Panel A for improved visibility and reducing the size of Panel B to achieve a more balanced figure composition.

      (3) A rationale should be provided for the selection of egg white protein as the animal protein control. Does this adequately represent animal proteins in general? Could the results differ if casein or whey protein were used? The current choice limits the generalizability of the conclusions, and this limitation should be addressed.

      We thank the reviewer for this important suggestion. In the revised manuscript, we will provide additional rationale for selecting egg albumin as the animal-derived protein source. Egg albumin was chosen because it is a well-characterized protein with high biological value, rapid digestibility, and standardized composition, and because it has been used in our previous ALD-related dietary intervention studies, which allows experimental consistency [4].

      We agree that egg albumin does not represent all animal protein sources. Due to its rapid digestion and absorption, relatively less substrate may reach the distal gut for microbial fermentation compared with more complex proteins. In contrast, proteins such as casein or whey may generate distinct microbial and metabolite profiles and potentially different host responses.

      Accordingly, we will explicitly acknowledge this limitation in the revised manuscript and clarify that our findings should not be generalized to all animal-derived proteins.

      (4) The ALD model was established over 12 weeks, yet the FMT intervention consisted of only 3 administrations with a 1-week observation period. In the context of such a severe liver injury model, a 1-week recovery period appears insufficient to observe genuine fibrosis reversal, which typically requires a longer timeframe. The authors should discuss whether short-term FMT can truly induce structural remodeling or if the observed effects are transient.

      We thank the reviewer for this important and thoughtful observation. We agree that a one-week post-FMT observation period appears insufficient to conclude complete structural remodeling or durable fibrosis reversal in a chronic 12-week ALD model. It should be noted, however, that the results achieved with the one-week intervention suggest otherwise in this animal model of ALD. As can be observed from the immunohistochemistry of the abstinence and treatment groups, which was further quantified for steatosis and fibrosis, there is a __% and __% reduction, respectively, in the treatment group. Thus, we can safely conclude that, in the given animal model, alternate-day FMT for 3 doses can reverse steatosis and fibrosis.

      In the revised manuscript, we will explicitly clarify this distinction.

      (5) The results rely heavily on PICRUSt2 for functional prediction. As prediction does not equate to factual validation, the authors should exercise caution in their wording within the Discussion. Alternatively, I recommend supplementing the study with shotgun metagenomic sequencing to verify the existence of these pathways rather than relying solely on predictive algorithms.

      We thank the reviewer for this important suggestion and agree that PICRUSt2-based analyses represent predictive functional inference rather than direct validation of microbial metabolic activity. We will explicitly acknowledge in the Results and Discussion that PICRUSt2 outputs are inferences rather than measurements, and we will integrate our metabolomics data to show where predicted microbial pathways (fatty acid salvage, β-oxidation related pathways) coincide with measurable metabolite shifts, providing observational support for the predictions.

      We would prefer not to perform metagenomic analysis to substantiate the PICRUSt2 findings, primarily because metagenomic analysis would provide information on the set of genes each species carries, not on the functional state of the resulting pathways. To read out the pathways, we would be left with the same two options: PICRUSt2 or metabolome analysis. Transcriptome analysis could indicate which pathways are active, but that readout is likely to be similar to the one we obtain from the end result of these pathways, the metabolome.

      (6) Although Egg-FMT was less effective than Veg-FMT, it performed better than the standard FMT or abstinence groups. Why is the effect of egg white protein intermediate? Is this due to rapid digestion resulting in insufficient substrate, or differences in metabolite production? A deeper comparative analysis of the Egg-FMT group is required, rather than treating it merely as a negative control.

      We thank the reviewer for this insightful observation. We agree that the Egg-FMT group demonstrated an intermediate phenotype and should not be interpreted merely as a negative control. In the revised manuscript, we will report the outcomes with egg protein wherever they are missing, modify the language accordingly, and expand the Discussion.

      (7) “Relying solely on the ‘inhibitor blocking effect’ proves only that Caproic acid's function is dependent on the PPARα pathway, not that it directly acts on PPARα. To claim direct activation, the authors must demonstrate direct binding between Caproic acid and the PPARα protein (e.g., via SPR or MST assays). Alternatively, a luciferase reporter assay driven specifically by PPARα response elements (PPRE) should be conducted. If Caproic acid induces luminescence, it would confirm transcriptional activation of PPARα rather than mere downstream activation.”

      We thank the reviewer for this important and insightful suggestion. We agree that the current inhibitor-based experiments primarily support the involvement of the PPARα pathway and do not definitively establish direct interaction or transcriptional activation of PPARα by caproic acid. Accordingly, in the revised manuscript, we will moderate our interpretation and avoid statements implying direct activation based solely on the current data.

      We also agree that direct validation experiments such as SPR/MST-based binding assays or PPRE-driven luciferase reporter assays would substantially strengthen the mechanistic conclusions. We are currently planning additional experiments to further evaluate the direct action of caproic acid on PPARα and will incorporate these analyses in future revisions and follow-up studies.

      Given the pending experiments, we kindly request that the Editors allow us about 2 months to return the revised manuscript.

      References:

      (1) Mitsinikos, F. T., Chac, D., Schillingford, N. & DePaolo, R. W. Modifying macronutrients is superior to microbiome transplantation in treating nonalcoholic fatty liver disease. Gut Microbes 12, 1792256.

      (2) Ferrere, G. et al. Fecal microbiota manipulation prevents dysbiosis and alcohol-induced liver injury in mice. J. Hepatol. 66, 806–815 (2017).

      (3) Zhang, Y., Li, P., Chen, B. & Zheng, R. Therapeutic effects of fecal microbial transplantation on alcoholic liver injury in rat models. Clin. Res. Hepatol. Gastroenterol. 48, 102478 (2024).

      (4) Mittal, A. et al. Protein supplementation differentially alters gut microbiota and associated liver injury recovery in mouse model of alcohol-related liver disease. Clin. Nutr. 46, 96–106 (2025).

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This Review Article provides a compendium of advice for MD-PhD students to consider when deciding which, if any, clinical field they will select for residency training. It is grounded in published data and effectively considers factors including the potential for clinical disciplines to sustain research integration, provide mentorship, meet lifestyle expectations, and foster a long-term career as a research-focused physician-scientist.

      We thank the editors for this positive assessment. We have revised the manuscript to sharpen the decision-making framework and make the advice more actionable, as detailed below.

      Public reviews:

      Reviewer #1 (Public review):

      This brief piece by Swartz and colleagues outlines the complexities surrounding the choice of clinical specialty for physician-scientists. It is, in general, clear and well-written, and it will be useful to research-oriented medical students choosing a path and to the mentors who are guiding them.

      We thank Reviewer #1 for these supportive comments.

      Strengths:

      The writing is clear. The points made are not profound, but they are important and will be of use to the intended audience.

      We appreciate this assessment and agree that the value of this piece lies in consolidating practical, experience-based guidance in one resource for trainees and mentors.

      Weaknesses:

      I have only minor suggestions for improvement. There are some areas of redundancy where the article could be tightened up by consolidating.

      We agree and have made substantial revisions to reduce redundancy throughout the manuscript. Specifically, we have streamlined the Introduction by removing a lengthy paragraph that previewed the article’s contents in a way that overlapped with later sections. The revised Introduction now concisely introduces five core decision-making factors (alignment between clinical and research interests, the structure of clinical work, availability of mentorship and research pathways, institutional culture, and financial sustainability) and directs readers to the new Table 1 and Figure 1 as organizing frameworks.

      We have also consolidated overlapping discussions of research alignment, protected time, and clinical demands. The sections on clinical workload and protected research time have been tightened to minimize repeated points about specialty-specific demands, and we now cross-reference Table 1 rather than re-stating the same considerations in multiple places. Prose has been revised throughout for concision and clarity.

      Reviewer #2 (Public review):

      This article is a useful compendium of advice for MD/PhD students (and research-focused MD students) to consider when it is time to decide on a clinical field for residency training. The authors are a distinguished group of physician-scientists and program directors who are drawing on published data and their own experience as mentors to provide advice and resources to students about to make what can be a career-defining choice.

      We thank Reviewer #2 for this generous and thoughtful evaluation.

      Strengths:

      (1) A lot has been written about physician-scientists as an endangered species. Given the important role that physician-scientists can play if they engage in research that is informed by experience in patient care, not nearly enough has been written about the choices that students make during training that can keep them on track or throw them off.

      We share this perspective and appreciate the reviewer’s recognition of this gap in the literature. Our goal was precisely to address the decision-making process itself, which is often under-discussed in formal publications despite being a frequent topic in mentoring conversations.

      (2) The article provides not only general advice, but specific information in the 2 tables that can help trainees to weigh their priorities and consider their options.

      Thank you. We have further strengthened the tabular content in this revision by adding a new Table 1 (described below) and renumbering the original tables accordingly.

      (3) Among the best advice is to weigh clinical demands, maintenance of procedural skills, recognition of the impact of research time on salary, and the impact of high salaries on the tension between research effort and clinical effort in clinical departments, which is where most physician-scientists in academia are employed.

      We appreciate this feedback and have made this advice more prominent by incorporating these factors explicitly into the new Table 1 framework and by adding a more direct statement in the text about how specialty-specific structural differences affect the ease of sustaining a research career.

      Area for Improvement

      (1) Some of the most useful pieces of advice are scattered through the text when they might be more impactful if focused. For example, what are the 4 or 5 most essential factors that someone in an MD/PhD or an MD program should weigh when they are deciding between clinical disciplines? There are also published data on the experience of past graduates in achieving a research-focused career in each clinical discipline. How should that data be applied by trainees? What are the factors that should be weighed in deciding where to work as a research-focused physician once training has been completed?

      We agree that the most critical decision-making factors were insufficiently distilled. To address this, we have made two major changes.

      First, we have added a new Table 1: “Key Decision Factors for Physician-Scientists Choosing a Clinical Specialty.” This table identifies five essential factors—(i) Alignment of Clinical Specialty with Research Focus, (ii) Structure of Clinical Work and Its Impact on Research Time, (iii) Availability of Structured Research Pathways and Mentorship, (iv) Institutional Environment and Culture, and (v) Financial Model and Long-Term Sustainability—and for each provides columns describing Why It Matters, What to Look For, and Potential Red Flags. This table is designed to be directly actionable for trainees comparing specialties and programs.

      Second, the Introduction now explicitly names these five factors as the organizing framework for the article and directs readers to Table 1 as a synthesis tool. The prior introductory paragraph, which previewed the article’s structure in a general way, has been replaced with a more focused synthesis.

      Regarding the published outcomes data: we have retained the specialty-specific outcomes data in what is now Table 2 (previously Table 1) and have added context in the text about how trainees should interpret these data—specifically, that published graduation and career outcome data provide a useful baseline but should be weighed alongside institutional context, since the same specialty can look very different at different institutions.

      Regarding factors for evaluating post-training positions: we have added a new paragraph in the section on Protected Research Time that addresses how trainees can evaluate the institutional environment at the faculty level, including specific metrics trainees can examine (see response to Points #4 and #5 below).

      (2) Some clinical fields at academic institutions have proved to be much more hospitable to careers as research-focused physicians than others. Published data highlight the challenges. I believe the authors have tried very hard to present a balanced perspective, but in the process, they have, I believe, missed an opportunity to guide trainees and make them aware of what they should look for to avoid making a decision that may prove incompatible with their long-term goals.

      We appreciate this candid observation and agree that our prior draft was overly cautious in this regard. In the revision, we have added a more explicit statement acknowledging that while successful physician-scientists exist across all specialties, the structural ease of sustaining a research-intensive career varies substantially by field. Specifically, we have added the following language to the section on Balancing Clinical and Research Responsibilities:

      “In practice, specialties with high procedural demands and unpredictable clinical schedules are often more challenging environments for sustaining research-intensive careers unless strong institutional protections are in place. While successful physician-scientists exist across all specialties, the structural ease of sustaining a research-intensive career varies substantially by field, and trainees should approach certain specialties with a clear understanding of the additional negotiation and institutional support required.”

      Additionally, the new Table 1 includes a “Potential Red Flags” column that gives trainees concrete warning signs to watch for when evaluating specialties and programs (e.g., departments primarily driven by clinical revenue with limited research infrastructure; absence of physician-scientists in leadership roles; inability to reduce clinical effort).

      (3) Where will be the jobs for physician-scientists who have an MD ± PhD and want to do research and discovery? How many openings will there be for physician-scientists in academia 5–10 years from now? In industry? How are recent events in Washington affecting the continuation of those jobs?

      After careful consideration, we believe that a detailed treatment of labor market projections, industry trends, and the effects of federal funding policy on the physician-scientist workforce falls outside the scope of this article, which is focused on the decision-making process for specialty selection. We note that the workforce question has been the subject of several recent analyses and commentaries (e.g., Milewicz et al., ASCI/AAP/APSA workforce reports) and feel that a thorough treatment would warrant a dedicated manuscript. We have not added this content but acknowledge the reviewer’s point in our thinking about future work.

      (4) Should one of the “smart choices” in the article’s title be where you do the residency, and not just which residency you do? How important is it to be at a successful, research-intensive medical center/university, both during and after residency and fellowship training? If being in an institution where there are numerous very successful physician-scientists and scientists improves the likelihood of being able to sustain a physician-scientist career, how should graduating students improve their chances of being at one of those institutions?

      This is an excellent point, and we agree that institutional environment is at least as important as specialty choice itself. We have made several changes to address this.

      In the Introduction, we have added the statement: “Importantly, the ability to sustain a physician-scientist career is often determined as much by the institutional environment and training program as by the specialty itself.” This signals early in the manuscript that “where” is as critical as “which.”

      In the new Table 1, we have included a row on “Institutional Environment and Culture” as one of the five key decision factors, with the explicit note that institutional commitment is often more determinative than specialty alone in enabling long-term success as a physician-scientist.

      We have also added a dedicated paragraph advising trainees to assess the broader institutional environment by examining: (i) the number of R01-funded investigators within the department, (ii) the presence of institutional training grants (e.g., T32 programs), and (iii) the track record of trainees transitioning from mentored (K) awards to independent (R) funding. We direct trainees to publicly available resources such as NIH RePORTER and the Blue Ridge Institute for Medical Research rankings.

      Finally, we have added a concluding sentence to the protected time section: “Taken together, these factors reinforce that institutional environment and departmental culture are often as determinative as specialty choice itself in shaping a sustainable physician-scientist career.”

      (5) In every clinical discipline, there are departments that value physician-scientists more than other departments and invest accordingly. What advice would the authors give to help graduating students identify those departments?

      This point is closely related to Point #4, and we have addressed it through the same set of revisions. The new paragraph on evaluating institutional environments provides concrete, actionable guidance for trainees on how to assess departmental commitment to physician-scientists, including specific metrics (R01 density, T32 presence, K-to-R transition rates) and publicly accessible tools (NIH RePORTER, Blue Ridge Institute rankings).

      The new Table 1 “Potential Red Flags” column highlights warning signs that a department may not be supportive of physician-scientist careers, including: departments primarily driven by clinical revenue (RVUs) with limited research infrastructure; lack of protected time enforcement; minimal NIH funding; and absence of physician-scientists in leadership roles.

      We have also expanded the existing discussion in the section on mentorship and residency selection, where we already noted the value of identifying departments with T32 grants and active physician-scientist mentors. The revised text now more explicitly connects these markers to the departmental evaluation process.

      We believe these revisions substantially strengthen the manuscript and are grateful for the reviewers’ constructive feedback.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      In the paper, the authors compare the performance of their new version to two previous approaches. Figure 2b shows that the new toolbox performs similarly to the previous deep-learning-based toolbox, but requires only an anatomical scan, which is a significant improvement. They also compare it to an older method that uses an atlas without requiring deep learning. For eccentricity and pRF size predictions, both deep-learning methods perform better than the older approach. For polar angle, a critical parameter for delineating visual field maps, the gain is substantially less. Moreover, the comparison to the atlas method (Benson2014) is not entirely fair, as, to our knowledge, there is also a more advanced atlas version that uses Bayesian fitting methods and already performs better than the old method. To better understand the gain of using deep learning, it would be beneficial if the authors also made the comparison to this more recent atlas-based approach. Moreover, it would be useful to know the correlations for the representative participant. Some examples of relatively "bad" maps would also be useful to have (and could be provided as supplementary information).

      We thank the reviewer for their constructive feedback. We plan to expand our benchmarking section to include the Bayesian model comparison. Note, however, that the additional accuracy gain afforded by the Bayesian model of retinotopy (Benson and Winawer, 2018) results from combining anatomical data with retinotopic maps estimated from a few minutes of functional data. The Bayesian model of retinotopy without such functional data is equivalent to Benson14. We plan to report the correlations (between predicted and empirical maps) for the representative participant shown in Figure 2 and include an additional supplementary figure showing retinotopic map predictions for a participant whose predictions deviate the most from empirical maps, as suggested by the reviewer.

      Figure 2b shows that the toolbox is quite good at estimating eccentricity and polar angle parameters, but less good at estimating the population receptive field (pRF) size. I will return to this latter point.

      An interesting feature is that while the toolbox is trained on a specific data set (HCP), it can, "out-of-the-box", be applied to different existing data sets, without the need to retrain the model. This is quite important for the general utility of the method. The results for this are shown in Figure 3. Again, in panel b, it can be seen that the toolbox does a good job at estimating eccentricity and polar angle values, but performs rather poorly for pRF size: the deepRetinotopy toolbox has a strong tendency to only estimate very small pRFs, particularly when applying it across different datasets. For this reason, at the moment, these estimates appear hardly useful. It would be very helpful for readers if the authors could clarify or elaborate on this point, particularly regarding the limitations of pRF size predictions. They explain that this could be due to the use of different types of stimuli, but even within the same (HCP) dataset, the predictions primarily suggest tiny pRFs, even though the training dataset also contains larger ones (which can be better seen in supplementary Figure 4). Showing the predictions for higher-order brain areas, which have larger pRFs on average, could serve a similar evaluation purpose. Presumably, the underlying reasons are complex and could relate to the use of different stimuli, different analysis toolboxes, and how the deep learning model is currently being trained. Possibly, the abundance of small pRFs at lower eccentricity in the training set (which is usually the case in any empirical analysis) has given the model a very strong bias toward predicting small pRFs.

      There would be various ways to verify which of these components is critical. For example, the model could be trained only on the bar stimuli of the HCP dataset, or the pRFs for all stimuli and datasets could be estimated using the same software tool. The latter seems important. For example, Supplementary Figure 4 indicates a high correlation between the Stanford and NYU cohorts that have used the same stimulus and analysis package, despite having different resolutions and scanners. Further investigation into the underlying reasons for these discrepancies would strengthen the paper. It would also provide valuable guidance for users of the toolbox on which toolbox predictions to trust and which not, as well as how well the model generalizes to other stimulus types, scanners, and image resolutions.

      We will expand our discussion of the limitations of pRF size prediction, highlighting that differences in visual stimuli, analysis toolboxes used to estimate pRF parameters from empirical data, and the current training of deepRetinotopy affect prediction accuracy. As the reviewer pointed out, the underlying reasons are complex, and it is difficult to isolate all the potential contributing factors. However, in addition to our expanded discussion, we also intend to present results from additional experiments that assess the impact of different loss functions on the range of predicted pRF sizes (to explain how training may partly account for the differences observed in the HCP dataset). We will also perform pRF fitting on at least one dataset using the same software/encoding model as in the HCP dataset (the training data) to illustrate that the lower performance in pRF size prediction in out-of-distribution datasets is also partly explained by differences in how the empirical maps were obtained.

      An aspect that is not directly apparent from the title, abstract, and introduction is that the deepRetinotopy toolbox does not by itself produce estimates of visual area labels or boundaries. It predicts only polar angle and eccentricity values. To predict labels and boundaries, the authors combine the toolbox with an atlas (the aforementioned Bayesian atlas). For visual areas V1 - V3, it does a very good job, in that the predictions are as good as the empirical ones. Notably, the authors indicate that the predictions for V2 and, in particular, V3 are worse than for V1, but Figure 4 clearly shows that predictions are as good as the empirical ones. More cannot be expected from a model that is trained on such empirical data.

      We will edit the introduction and abstract to make it clearer that the deepRetinotopy toolbox does not yet produce estimates of visual boundaries on its own.

      Irrespective of the limitations with respect to predicting pRF size, the toolbox opens up functionally oriented analyses of very large cohorts of healthy participants, of which only anatomical data is available. The authors present an example of this by confirming the existence of differences in horizontal and vertical asymmetries in the field maps of the visual cortex of children and adults. While Figure 5 confirms the existence of differences, the analysis could be expanded to provide deeper insights, such as normalized developmental trajectories for both asymmetries, given the size of the dataset. This would better highlight the true power of their approach.

      Although providing insights into developmental trajectories for horizontal and vertical asymmetries is beyond the scope of the current work, as it would require aggregating datasets such that individuals’ ages span a larger range (the ABCD dataset only contains individuals between 9-11 years old and the HCP Young Adult dataset between 22-36 years old), we plan to provide some complementary analyses (differences across ages and sex within the ABCD dataset).

      While the authors address limitations with respect to studying experience-dependent atypical functional organization, they do not address how the deepRetinotopy toolbox would handle (acquired) brain lesions. Addressing this, even if only speculative, would be welcome. Another welcome addition would be to see the predictions for additional brain areas, even if those would (presumably) be worse at present. Such information would nevertheless be essential for users considering applying this toolbox. Moreover, this could be a valuable resource serving as a benchmark for future iterations of either deepRetinotopy or other approaches.

      We plan to expand and report performance evaluation across other visual areas (using Wang atlas’ parcels) to serve as a benchmarking resource. Moreover, we will expand our discussion on how deepRetinotopy would handle brain lesions.

      Reviewer #2 (Public review):

      (1) The weak point of the contribution is the choice to limit anatomical quality assessments and error quantifications to just three early regions, V1-V3, even though the deepRetinotopy toolbox can delineate over 20 regions (including parietal, ventral, and lateral regions, such as IPS0-5, hV4, VO1-2, V3A, PHC1-2, LO1-2, and TO1-2).

      (2) The limit is fine for their large-scale application of the toolbox to age groups, as here, a clear hypothesis on early cortex variability was tested.

      (3) However, the introduction of the toolbox itself warrants quality assessments and comparisons to prior models and ground truth beyond V1-V3, just like the authors did in their prior publication of the predecessor model.

      (4) This is important as the vast majority of applications of this toolbox will likely go beyond V1-V3 to delineate dorsal, ventral, and lateral regions.

      (5) For the present paper, this will require only 1 or 2 additional figures, or extending their present figures 2 and 4 along the lines of their previous figure 7 (Ribeiro et al 2021), which included error measures for high-level regions. Ideally, you provide sub-graphs separately for early visual, dorsal, ventral, and lateral regions.

      (6) Going beyond V1-V3 is important for several reasons: first, future studies applying the software beyond V3 will need quantification for reassurance and justification. Second, for the sake of transparency, even if results are noisy or on par with prior models. Third, as a benchmark or reference point for future approaches.

      We thank the reviewer for their constructive feedback, and we agree that expanding our performance assessment beyond V1-3 would be a valuable benchmarking resource. Thus, we plan to evaluate retinotopic map prediction accuracy across visual areas defined by the Wang atlas’ parcels, expanding on the results reported in Figure 2, and provide it as a supplementary figure. However, performance estimation ultimately depends on the quality of the dataset used for evaluation. The empirical maps, although treated as ground truth, may themselves misrepresent the underlying retinotopic organization. As a matter of fact, the quality of the empirical data (HCP dataset and others) is indeed lowest in some of the higher-order visual areas.

      It may be unclear from the text that the deepRetinotopy toolbox does not yet produce estimates of visual boundaries on its own. Accordingly, we illustrate how deepRetinotopy toolbox’s predictions can be combined with another tool [the Bayesian model of retinotopy from Benson and Winawer (2018)] to obtain visual area boundaries automatically. We will edit the introduction and abstract to make it clearer. Given the availability of empirical labels (currently only for V1-3) and the segmentation tool (which was only assessed for V1-3), we cannot expand Figure 4 to other visual areas as suggested.

      Reviewer #3 (Public review):

      Quantification of the Analysis: My main concern is that the analysis relies heavily on global summary measures such as correlation and Dice score. Those measures are useful, but the paper would be more informative if it also quantified boundary differences in millimeters, especially for comparisons such as the V1/V2 boundary in Figure 2. That kind of analysis would help readers understand how large the errors are in physically meaningful terms.

      We thank the reviewer for their constructive feedback. Following the reviewer’s suggestion, we plan to expand our segmentation evaluation to quantify the extent to which boundary predictions from deepRetinotopy’s maps deviate from those from empirical maps, in millimetres.
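
      The two kinds of measure discussed here, a global overlap score and a boundary deviation in physical units, can be sketched as follows. This is a minimal illustration and not the authors’ evaluation code: the function names, the toy coordinates, and the use of straight-line (Euclidean) distance are all assumptions. On a cortical surface, a geodesic distance would likely be the more appropriate choice.

```python
import math

def dice(a, b):
    """Dice overlap between two sets of vertex indices (hypothetical label sets)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty labels are treated as identical
    return 2 * len(a & b) / (len(a) + len(b))

def mean_boundary_distance(pred, emp):
    """Symmetric mean nearest-neighbour distance between two boundaries.

    Each boundary is a list of (x, y, z) points; the result is in the same
    units as the coordinates (e.g. millimetres). Euclidean distance is an
    assumption made for simplicity.
    """
    def one_way(src, dst):
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)
    return 0.5 * (one_way(pred, emp) + one_way(emp, pred))

# Toy example: a predicted boundary shifted 1.5 mm from the empirical one.
emp_boundary = [(0.0, float(y), 0.0) for y in range(5)]
pred_boundary = [(1.5, float(y), 0.0) for y in range(5)]

overlap = dice(range(0, 100), range(10, 110))  # 90 shared vertices out of 100 + 100
shift_mm = mean_boundary_distance(pred_boundary, emp_boundary)
print(f"Dice = {overlap:.2f}, mean boundary deviation = {shift_mm:.2f} mm")
```

      In this toy case a Dice score of 0.9 coexists with a 1.5 mm boundary shift, which illustrates why the reviewer asks for both kinds of measure.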

      Model fitting methods: I also think the discussion of prediction failures for pRF size should be more explicit. The mismatch is likely influenced by the fact that the training data and several evaluation datasets were fit with different models and different analysis software. In particular, the network was trained on non-linear size estimates from the HCP data, while the comparison datasets were derived using other packages and, in some cases, different model assumptions. That likely contributes to the spread in Figure 3b and should be discussed more directly. It is important to discuss that the pRF parameters were derived using different software tools.

      We will expand our discussion of the limitations of pRF size prediction, highlighting that differences in visual stimuli, different encoding models for estimating pRF parameters from empirical data, and the current training of deepRetinotopy affect prediction accuracy. In addition to our expanded discussion, we intend to also present results from additional experiments that assess the impact of those factors on pRF size prediction performance.

      Clarifying Model Accuracy: If deepRetinotopy generates a true "noise-removed" representation of functional mapping based on anatomy, then fitting it to one fMRI measurement should predict a second, independent fMRI run better than the noisy data from the first run does.

      The authors possess the exact data for this test. For the HCP dataset, the empirical fMRI data were explicitly separated into two halves: "fit 2" (the first half of the fMRI runs) and "fit 3" (the second half). They correlated these two halves to establish a "noise ceiling," the maximum possible reliability of the data. Looking at their results in Figure 2b, the correlation of the deepRetinotopy predictions falls below this noise ceiling. This means that the noisy functional Half 1 actually predicts functional Half 2 better than the anatomical model does.

      The authors should state this explicitly. A side-by-side plot of Half 1 predicting Half 2 versus deepRetinotopy predicting Half 2 would show that the anatomical model regularizes map location well, but misses reliable subject-specific variation that anatomy alone cannot capture.

      We will expand our benchmarking section to make these comparisons (“Half 1 predicting Half 2 versus deepRetinotopy predicting Half 2”) more explicit. It is important to highlight that there is subject-specific variation that is currently not captured by our model, and this comparison can also serve as a benchmarking resource for future model versions and newer approaches.
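
      The logic of the split-half comparison can be sketched with synthetic numbers. Everything below is simulated and hypothetical: the variance values, the variable names, and the use of a plain Pearson correlation are assumptions (a linear quantity such as eccentricity is assumed, since polar angle would call for a circular statistic). It does not reproduce any result from the manuscript.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

random.seed(0)
n_vertices = 2000

# Structure shared across people (what an anatomy-only model could learn).
group_map = [random.uniform(0, 8) for _ in range(n_vertices)]
# Reliable subject-specific deviation that anatomy alone is assumed to miss.
subject_dev = [random.gauss(0, 1.0) for _ in range(n_vertices)]
true_map = [g + d for g, d in zip(group_map, subject_dev)]

# Two independent noisy measurements of the same subject.
half1 = [t + random.gauss(0, 0.5) for t in true_map]
half2 = [t + random.gauss(0, 0.5) for t in true_map]

# Hypothetical anatomical prediction: shared structure only.
anatomical_pred = group_map

noise_ceiling = pearson(half1, half2)
model_vs_half2 = pearson(anatomical_pred, half2)
print(f"Half 1 vs Half 2: {noise_ceiling:.3f}")
print(f"Model  vs Half 2: {model_vs_half2:.3f}")
```

      Under these assumptions the noisy first half predicts the second half better than the noise-free but anatomy-only prediction does, because the first half still carries the reliable subject-specific component.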

      The Hemodynamic Response Function: The assumptions used to generate the original empirical maps are permanently baked into the deep learning model. However, the authors explicitly mention the hemodynamic response function (HRF) only once, noting in the Methods that the modeled time series was "convolved with a canonical hemodynamic response function."

      Beyond this single mention, there is no direct discussion of how the assumption of a single canonical HRF across all 161 HCP training subjects might have systematically impacted or biased the network's predictions. The authors address cross-dataset differences broadly under the umbrella of "experimental design" and "fMRI preprocessing pipeline" biases, but the HRF is a core biological property that mediates the connection between the anatomy and the data. The authors should explicitly discuss how this canonical assumption limits or biases the resulting deepRetinotopy network.

      As Reviewers 3 and 1 have noted, the observed limitations in pRF size prediction stem from multiple underlying factors. One of those factors is indeed the HRF assumed in the encoding models. We will expand our discussion about factors that may introduce biases into deepRetinotopy predictions, including the HRF.
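
      For readers unfamiliar with the step the reviewer refers to, the following sketch shows what convolving a modeled time series with a single canonical HRF involves. It is an illustration only: the double-gamma form with shape parameters 6 and 16 and an undershoot ratio of 1/6 is a commonly used choice that is assumed here, and the sampling interval, stimulus timing, and function names are hypothetical rather than taken from the manuscript or the HCP pipeline.

```python
import math

def double_gamma_hrf(t, peak_shape=6.0, under_shape=16.0, under_ratio=1 / 6):
    """Double-gamma HRF at time t (seconds); parameter values are assumptions."""
    if t <= 0:
        return 0.0
    peak = t ** (peak_shape - 1) * math.exp(-t) / math.gamma(peak_shape)
    undershoot = t ** (under_shape - 1) * math.exp(-t) / math.gamma(under_shape)
    return peak - under_ratio * undershoot

def convolve(signal, kernel):
    """Full discrete convolution of two lists."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out

tr = 1.0  # hypothetical sampling interval in seconds
hrf = [double_gamma_hrf(i * tr) for i in range(32)]

# Hypothetical neural drive: stimulus overlaps the receptive field for 10 s.
stimulus_drive = [0.0] * 5 + [1.0] * 10 + [0.0] * 25

# The same HRF is applied to every vertex and subject: the "canonical" assumption.
predicted_bold = convolve(stimulus_drive, hrf)[: len(stimulus_drive)]

print("HRF peaks at sample", hrf.index(max(hrf)))
print("Predicted response peaks at sample", predicted_bold.index(max(predicted_bold)))
```

      Because one fixed kernel links the stimulus to the predicted signal, any subject whose true response is faster or slower than this kernel would have the mismatch absorbed into the fitted pRF parameters, which is the kind of bias the reviewer asks to be discussed.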

      Scoping the Input Data and Normative Use: The authors use FreeSurfer to generate a mean curvature map for the entire midthickness cortical surface. This full-hemisphere curvature map is resampled to a standard template surface space (32k_fs_LR), acting as the data frame that feeds input features into the neural network. However, while the network receives the full geometric structure of the hemisphere, it is explicitly trained to predict retinotopic parameters only within a restricted posterior ROI, based on the Wang et al. atlas and containing roughly 3,200 vertices per hemisphere.

      A useful experiment to try, and perhaps the authors have already considered this, would be to restrict the input features exclusively to the posterior vertices. Including all anterior vertices may make it harder for the network to fit the localized visual data. A brief commentary on why the full hemisphere was retained as input could be highly informative for researchers adapting this geometric deep learning pipeline.

      Thanks for this suggestion. We have not performed a systematic evaluation of using ROIs that span a larger portion of the cortex (including the full hemisphere). It is a great idea to do so and report it in our manuscript to inform other researchers interested in adapting our pipeline. We intend to also update our toolbox by retraining our models to take all posterior vertices as suggested, which would improve the coverage of current predictions.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their careful consideration of our work and constructive comments. We are glad that reviewers appreciated the rigor and value of our work. In response to the reviewer comments we have made the following changes:

      (1) Addition of new experiments on EndoA localization at the Drosophila NMJ (Fig. 2).

      (2) Addition of new experiments on Dap160 localization at the Drosophila NMJ (Fig. 2).

      (3) Addition of new experiments to validate Dynamin, Dap160 and EndoA antibodies (Fig. 2 – figure supplement 1).

      (4) Assessment of the activity-dependence of EndoA and Dap160 localization at the Drosophila NMJ (Fig. 3).

      (5) Assessment of the liprin-dependence of EndoA and Dap160 localization at the Drosophila NMJ (Fig. 8).

      (6) Addition of a limitations section to the discussion to directly address that spontaneous release was not fully ablated in our studies and might contribute to recruitment.

      (7) Addition of an outlook to the same section on what experimental avenues could address the limitations in the future.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Emperador-Melero et al. seek to determine whether recruitment of endocytic machinery to the periactive zone is activity-dependent or tethered to delivery of active zone machinery. They use genetic knockouts and pharmacological block in two model synapses - cultured mouse hippocampal neurons and Drosophila neuromuscular junctions - to determine how well endocytic machinery localizes after chronic inhibition or acute depolarization by super-resolution imaging. They find that acute depolarization in both models has minimal to no effect on the localization of endocytic machinery at the periactive zone, suggesting that these proteins are constitutively maintained rather than upregulated in response to transient activity. Interestingly, chronic inhibition slightly increases endocytic machinery levels, implying a potential homeostatic upregulation in preparation for rebound depolarization. Using genetic knockouts, the authors show that localization of endocytic machinery to periactive zones occurs independently of proper active zone assembly, even in the absence of upstream organizers like Liprin-α. Overall, they propose that the constitutive deployment of endocytic machinery reflects its critical role in facilitating rapid and reliable membrane internalization during synaptic functions beyond classical endocytosis, such as regulation of the exocytic fusion pore and dense-core vesicle fusion. Although many experiments reveal limited changes in the localization or abundance of endocytic machinery, the findings are thorough, and data substantially support a model in which endocytic components are organized through a pathway distinct from that of the active zone. This work advances our understanding of synaptic dynamics by supporting a model in which endocytic machinery is constitutively recruited and regulated by distinct upstream organizers compared to active zone proteins. It also highlights the utility of super-resolution imaging across diverse synapse types to uncover functionally conserved elements of synaptic biology.

      We thank the reviewer for the positive assessment of our study.

      Strengths:

      The study's technical strengths, particularly the use of super-resolution microscopy and rigorous image analyses developed by the group, bolster their findings.

      We thank the reviewer for highlighting the technical strength of our work.

      Weaknesses:

      One notable limitation, however, is the absence of interrogation of endocytic proteins previously suggested to be recruited in an activity-dependent manner, in particular, endophilin.

      We thank the reviewer for the suggestion. We have added experiments to assess the localization of two more proteins at Drosophila NMJs. These proteins are EndoA and Dap160, both of which have been reported to traffic between the synaptic vesicle cloud and the plasma membrane in response to stimulation [1-3]. In line with these studies, we observed that EndoA and Dap160 partially co-localize with a synaptic vesicle marker and with a periactive zone marker, indicating localization to both compartments (Fig. 2). However, neither high frequency stimulation nor expression of TeNT changed the levels or the distribution of these two proteins at the periactive zone (Fig. 3). Similarly, the deployment of these proteins at the periactive zone at the Drosophila NMJ was not dependent on the active zone scaffold Liprin-α (Fig. 8). Our data indicate that deployment of EndoA and Dap160 to the periactive zone does not require evoked synaptic activity.

      We believe that there are multiple plausible explanations for our findings compared to previous work on Endophilin, which we discuss on lines 407-410: “Increased synaptic enrichment was also observed for Endophilin at nematode NMJs in mutants with disrupted exocytosis (Bai et al., 2010). We do not see such large shifts in Endophilin following similar manipulations, which might reflect distinct synaptic architectures in the C. elegans dorsal cord versus Drosophila NMJ terminals.” Further, this study finds that a plasma membrane-tethered Endophilin strongly colocalizes with endocytic machinery and largely rescues function. This suggests that the plasma membrane is the primary functional compartment for Endophilin. Together with our work, these data suggest that Endophilin constitutively, but not completely, localizes to the periactive zone.

      Reviewer #2 (Public review):

      Summary:

      This study examines whether the localization of endocytic proteins to presynaptic periactive zones depends on synaptic activity or active zone scaffolds. Using a combination of genetic and pharmacological perturbations in Drosophila and mouse neurons, the authors show that proteins such as Dynamin, Amphiphysin, AP-180, and others are still recruited to periactive zones even when evoked release or active zone architecture is disrupted. While the results are mostly negative, the study is methodologically solid and contributes to a more nuanced understanding of synaptic vesicle recycling machinery.

      We thank the reviewer for deeming our work solid and for highlighting its importance for the field.

      Strengths:

      (1) The experimental design is careful and systematic, covering both fly and mammalian systems.

      (2) The use of advanced genetic models (e.g., Liprin-α quadruple knockout mice) is a notable strength.

      (3) High-resolution imaging (STED, Airyscan) is well used to assess spatial localization.

      (4) The findings clarify that certain core assumptions - such as strict activity dependence of endocytic recruitment - may not hold universally.

      We thank the reviewer for pointing out these strengths.

      Weaknesses:

      (1) The study would benefit from a clearer positive control to demonstrate activity-dependent recruitment (e.g., Endophilin).

      We have added experiments to measure the localization of Endophilin, a protein previously reported to localize to the synaptic vesicle cloud [1], in Drosophila NMJs (Figs. 2 and 3). We observed that EndoA localized both to the synaptic vesicle cloud and to the periactive zone area. While stimulation did not enhance levels in either compartment, this outcome is not inconsistent with shuttling of protein between compartments during activity. Nevertheless, our data support a model in which EndoA, like the other tested endocytic proteins, is present at the periactive zone at rest.

      (2) The reliance on Tetanus toxin in the Drosophila NMJ experiments in my eyes is a limitation, as it does not block all presynaptic fusion events; this should be discussed more directly.

      We agree with the reviewer’s point. To discuss it more directly, we have included a “Limitations and Outlook” section in the revised version. We state that “conclusions that can be drawn on the roles of spontaneous release in periactive zone assembly remain limited” (lines 514-515). We further state that, while the manipulations that we included result in decreased spontaneous release, “it is possible that the remaining spontaneous release supports periactive zone assembly” (lines 518-519) and that “Future studies might test manipulations with strong effects on miniature release including those affecting SNARE proteins and their regulators, with the caveat that these manipulations might have effects on upstream trafficking and in some cases on cell survival (Kaeser and Regehr, 2014; Santos et al., 2017).” (lines 519-523).

      (3) The potential role of Dynamin in organizing other periactive zone proteins is not addressed and could be an important next step.

      We agree with the reviewer that this is an interesting possibility. On lines 454-455, we make the broad point that “interactions between endocytic proteins may further contribute to the anchoring of this apparatus”, and on lines 459-460, we specifically suggest a role for Dynamin by stating that “perturbing interactions between Dynamin-1 and Endophilin-A1 increases the distance between these proteins (Imoto et al., 2024), suggesting their binding has a scaffolding function.”

      (4) Some small changes in protein levels upon silencing are reported; their biological meaning (e.g., compensation vs. variability) is not fully clarified.

      These changes might include homeostatic adaptations. In the revised version of the manuscript, this is addressed on lines 135-137 and 405-407. We think it is overall difficult to assign biological meaning to small-magnitude changes, and chose to highlight the main point that there are no large-magnitude changes.

      (5) While alternative organizing mechanisms (actin, lipids, adhesion molecules) are mentioned, a more forward-looking discussion of how to test these models would be helpful.

      Following the reviewer’s suggestion, we have added an outlook section to the discussion where we provide suggestions for future studies (lines 510-543).

      (6) The authors should consider including, or at least discussing, a well-established activity-dependent endocytic protein (e.g., Endophilin) as a positive control to help contextualize the negative findings.

      We have included new experiments on EndoA at the fly neuromuscular junction (Fig. 2, Fig. 3, Fig. 8, Fig. 3 – figure supplement 1) and have added appropriate discussion of these findings as outlined above.

      Reviewer #3 (Public review):

      Summary:

      This study examines how synaptic endocytic zones are positioned using a combination of cultured neurons and the Drosophila neuromuscular junction. The authors test whether neuronal activity, active zone assembly, or liprin-α function is required to localize endocytic zone markers, including Dynamin, Amphiphysin, Nervous Wreck, PIPK1γ, and AP-180. None of the manipulations tested caused a coordinated disruption in the localization or abundance of these markers, leading to the conclusion that endocytic zones form independently of synaptic activity and active zone scaffolds.

      We thank the reviewer for reviewing our work.

      Strengths:

      The work is systematic and carefully executed, using multiple manipulations and two complementary model systems. The authors consistently examine multiple molecular markers, strengthening the interpretation that endocytic zone positioning is robust to changes in activity and structural assembly.

      We thank the reviewer for pointing out these strengths.

      Weaknesses:

      The main limitation is that the study does not test whether the methods used are sensitive enough to detect subtle functional disruption, and no condition tested produces clear disorganization of the endocytic zone. As a result, the conclusion that these zones assemble independently is supported by negative data, without a strong positive control for disassembly or mislocalization.

      We are confident that our methods are sensitive enough to detect changes within synaptic compartments. First, for mouse neurons assessed with STED microscopy, we have demonstrated that we can distinguish between the N- and the C-termini of the presynaptic protein Bassoon, which are positioned only a few tens of nanometers apart [4]. We have subsequently been consistently able to resolve the localization of pre- and postsynaptic proteins that also localize a few tens of nanometers apart and have established that genetic manipulations of active zone proteins induce detectable disruptions as assessed by STED microscopy [4-12]. Given that the periactive zone is larger than the distances that we can resolve, we are confident that we can detect changes in this area with enough sensitivity. Second, for Drosophila NMJs, we use a carefully validated workflow that allows assessing the distribution of periactive zone proteins and can detect subtle changes [13]. Unfortunately, there are no known manipulations that lead to periactive zone disassembly that could serve as a positive control, which reflects the little knowledge available in this field. We acknowledge that there may be subtle changes in protein localization that escape the resolution of our microscopy methods or experimental design, but this would not undermine the conclusion that the periactive zone remains assembled across the manipulations that we have tested. Overall, none of the manipulations we test induces a detectable disruption of the periactive zone. Naturally, we cannot exclude milder effects and have added a limitations section to discuss this possibility and some of the subtle changes we observe.

      This paper addresses a longstanding question in synaptic biology and provides a well-supported boundary on the types of mechanisms that are likely to govern endocytic zone localization. The conclusions are well justified by the data, though additional evidence would be needed to define the assembly mechanism itself.

      We thank the reviewer for the support of the conclusion of our study.

      Recommendations for the authors:

      Reviewing Editor Comments:

      This is a rigorous study that, while presenting largely negative data, delimits the processes that control periactive zone organization. In addition to the interpretive and technical comments below, we encourage the authors to consider extending this study in two areas. First, examining the activity dependence of the recruitment of Endophilin, and perhaps other factors, to the PAZ, where previous research has indicated a positive role for activity. Second, further characterizing the role of miniature release events in potentially contributing to PAZ organization. Overall, this was a rigorous and well-executed study.

      We thank the reviewing editor for this positive assessment of our work.

      Reviewer #1 (Recommendations for the authors):

      (1) The rationale for comparing chronic inhibition to acute depolarization could be more clearly articulated. While this approach may be grounded in prior studies, the physiological consequences of chronic silencing differ markedly from those of transient activity, and these distinctions should be more explicitly addressed in the interpretation of results. For example, might lower intensity, chronic stimulation be a better comparison? Since fixation takes place immediately after stimulation, the time window to capture changes in protein recruitment may be curtailed.

      We thank the reviewer for this comment. The introduction of the manuscript now includes a rationale on lines 110-112. By inhibiting evoked synaptic vesicle fusion throughout the lifespan of neurons, we assessed whether this process is necessary for periactive zone assembly and concluded that it is not a requirement. By acutely depolarizing neurons with 50 mM KCl or with a 40 Hz train of action potentials, we were able to test whether synaptic vesicle fusion triggers the rapid recruitment of endocytic proteins to the periactive zone and concluded that this is not the case for most of the endocytic proteins that we studied. While these results indicate that a constitutive pathway must exist to assemble the periactive zone, we remain agnostic as to whether stimulation paradigms not tested in our study can enhance the deployment of endocytic proteins, especially over long periods of time. This may be the case for low, chronic stimulation, as suggested by the reviewer. We clarify these limitations in a “limitations and outlook” section of the discussion (lines 510-543).

      (2) Amphiphysin stood out as the only protein showing a notable change in opposite directions under either active zone protein knockout/blockers and Liprin-α knockout. Given the predominance of negative results, it would be valuable to devote more discussion to why Amphiphysin behaves differently. What functional role might it play in this context that sets it apart from other endocytic components?

      As suggested by the reviewer, we have extended the discussion on Amphiphysin. One possible reason why Amphiphysin responds differently to different genetic manipulations or changes in stimulation is that different endocytic proteins might belong to different endocytic submachineries. This is addressed on lines 421-424. On lines 444-449, we further discuss the subtle decrease in the levels of Amphiphysin and AP-180 in Liprin-α mutants. We suggest that the actin cytoskeleton may be the link between the active zone and the endocytic apparatus, and that this link may be partially disrupted in Liprin-α mutants. Overall, we note that Amphiphysin is still localized to the periactive zone at rest, and hence that it fits with the overall model of constitutive deployment that we propose.

      (3) The claim of activity-independence may need to be nuanced. Although the data suggest no recruitment in response to acute stimulation, the subtle changes following chronic inhibition complicate this interpretation, especially when considering redundancy. If activity-dependence is considered bidirectional, these findings might reflect a more complex regulatory mechanism. The interpretation in lines 188-190 more accurately captures this complexity than earlier generalizations.

      We agree with the reviewer that the dependence on activity should be discussed in a nuanced fashion. We have scrutinized the manuscript on this point and state throughout that recruitment is independent of evoked activity and not necessarily of any kind of activity. We believe that this interpretation is accurate because evoked release of neurotransmitter was ablated by the pharmacological and genetic manipulations that we used. Furthermore, we have included a “Limitations of the study” section in the discussion where we openly address that spontaneous fusion of synaptic vesicles cannot be ruled out as a potential mechanism to sustain periactive zone assembly (lines 514-523). Finally, we have expanded on the complexity of periactive zone assembly relative to activity. In particular, homeostasis may contribute to increased levels of endocytic proteins upon chronic blockade of evoked transmission (lines 404-406).

      (4) Given published work on endophilin's role in activity-dependent endocytic recruitment, adding endophilin (at least in the Drosophila NMJ experiments) would be highly informative.

      We thank the reviewer for the suggestion. We have added experiments to assess the localization of two more proteins at Drosophila NMJs. These proteins are EndoA and Dap160, both of which have been reported to traffic between the synaptic vesicle cloud and the plasma membrane in response to stimulation [1-3]. In line with these studies, we observed that EndoA and Dap160 partially co-localize with a synaptic vesicle marker and with a periactive zone marker, indicating localization to both compartments (Fig. 2). However, neither high frequency stimulation nor expression of TeNT changed the levels or the distribution of these two proteins at the periactive zone (Fig. 3). Similarly, the deployment of these proteins at the periactive zone at the Drosophila NMJ was not dependent on the active zone scaffold Liprin-α (Fig. 8). Our data indicate that deployment of EndoA and Dap160 to the periactive zone does not require evoked synaptic activity.

      We believe that there are multiple plausible explanations for these findings compared to previous work on Endophilin [3], which we discuss on lines 407-410:

      “Increased synaptic enrichment was also observed for Endophilin at nematode NMJs in mutants with disrupted exocytosis (Bai et al., 2010). We do not see such large shifts in Endophilin following similar manipulations, which might reflect distinct synaptic architectures in the C. elegans dorsal cord vs Drosophila NMJ terminals.” Further, this study finds that a plasma membrane-tethered Endophilin strongly colocalizes with endocytic machinery and largely rescues function. This suggests that the plasma membrane is the primary functional compartment for Endophilin. Together, all data are compatible with a model in which Endophilin constitutively, but not completely, localizes to the periactive zone.

      (5) Line 57 might have a typo in the citation.

      We thank the reviewer for pointing this out. The citations now include: Bai et al., 2010; Jiang et al., 2024; Koh et al., 2007; Winther et al., 2013 and Winther et al. 2015. Please note that these two last citations are grouped as Winther et al. 2013, 2015 following our formatting style.

      (6) Line 208 might be missing a citation that justifies parameters.

      In the revision, this information is discussed on lines 222-224, where we cite our prior work describing these data: “Each unit is divided into ‘mesh’ and ‘core’ regions, where the periactive zone mesh is a ~175 nm wide area localized at ~330 nm from the center, and the ‘core’ region is the interior to this mesh (Del Signore et al., 2023)”.

      Reviewer #2 (Recommendations for the authors):

      (1) Please consider including, or at least discussing, a well-established activity-dependent endocytic protein (e.g., Endophilin) as a positive control to help contextualize the negative findings.

      We thank the reviewer for the suggestion. We have added experiments to assess the localization of two more proteins at Drosophila NMJs. These proteins are EndoA and Dap160, both of which have been reported to traffic between the synaptic vesicle cloud and the plasma membrane in response to stimulation [1-3]. In line with these studies, we observed that EndoA and Dap160 partially co-localize with a synaptic vesicle marker and with a periactive zone marker, indicating localization to both compartments (Fig. 2). However, neither high frequency stimulation nor expression of TeNT changed the levels or the distribution of these two proteins at the periactive zone (Fig. 3). Similarly, the deployment of these proteins at the periactive zone at the Drosophila NMJ was not dependent on the active zone scaffold Liprin-α (Fig. 8). Our data indicate that deployment of EndoA and Dap160 to the periactive zone does not require evoked synaptic activity.

      We believe that there are multiple plausible explanations for our findings compared to previous work on Endophilin [3], which we discuss on lines 407-410: “Increased synaptic enrichment was also observed for Endophilin at nematode NMJs in mutants with disrupted exocytosis (Bai et al., 2010). We do not see such large shifts in Endophilin following similar manipulations, which might reflect distinct synaptic architectures in the C. elegans dorsal cord vs Drosophila NMJ terminals.” Further, this study finds that a plasma membrane-tethered Endophilin strongly colocalizes with endocytic machinery and largely rescues function. This suggests that the plasma membrane is the primary functional compartment for Endophilin. Together, all data are consistent with a model in which Endophilin constitutively, but not completely, localizes to the periactive zone.

      (2) Expand the discussion of TeNT's limitations (specifically, that it does not block spontaneous fusion or alternative fusion pathways) and consider referencing more stringent tools (e.g., Botulinum toxins or SNARE mutants), even if they weren't used here.

      Following the reviewer’s suggestion, we have included a “Limitations and Outlook” section in the revised version. We state that “conclusions that can be drawn on the roles of spontaneous release in periactive zone assembly remain limited” (lines 514-515). We further state that, while the manipulations that we included result in decreased spontaneous release, “it is possible that the remaining spontaneous release supports periactive zone assembly” (518-519) and that “Future studies might test manipulations with strong effects on miniature release including those affecting SNARE proteins and their regulators, with the caveat that these manipulations might have effects on upstream trafficking and in some cases on cell survival (Kaeser and Regehr, 2014; Santos et al., 2017)” (520-523).

      (3) We encourage the authors to briefly discuss whether Dynamin might contribute to periactive zone structure beyond its role in membrane fission. Loss-of-function data could be particularly informative in future work.

      We agree with the reviewer that this is an interesting possibility. On lines 454-455, we make the broad point that “interactions between endocytic proteins may further contribute to the anchoring of this apparatus”, and on lines 459-460, we specifically suggest a role for Dynamin by stating that “perturbing interactions between Dynamin-1 and Endophilin-A1 increases the distance between these proteins (Imoto et al., 2024), suggesting their binding has a scaffolding function.”

      (4) Clarify the interpretation of increased endocytic protein levels upon chronic silencing - are these interpreted as homeostatic responses or experimental variability?

      We suggest that these changes might include homeostatic adaptations. We note that this increase is of the same magnitude as the increase in active zone proteins following a similar pharmacological manipulation on lines 405-406, where we state that “a mechanism for this effect might be a homeostatic response (Wen and Turrigiano, 2024) similar in magnitude to the increase in active zone protein levels following activity blockade (Held et al., 2020).”

      (5) The Discussion could be strengthened by sketching out more concrete experimental approaches to test candidate mechanisms (e.g., roles for actin, lipids, adhesion molecules) in organizing periactive zones.

      The potential roles of the cell adhesion molecules (lines 430-440), cytoskeleton and lipids (442-452) are addressed in the discussion. Furthermore, following the reviewer’s suggestion, we have added the following statement (lines 541-543): “This work builds a foundation to assess alternative mechanisms and models of periactive zone assembly, including roles of the cytoskeleton, lipids, adhesion molecules, and intrinsic endocytic protein interactions”. We hope that the reviewer agrees that the discussion of our paper is not the right format to provide a concrete experimental plan for future work. In our view, the discussion should put the findings of our experiments in the context of the field.

      Reviewer #3 (Recommendations for the authors):

      (1) At a spine synapse, the endocytic zone is estimated to be between 100-200 nm from the active zone. The focus of the authors' analysis is largely outside of this region (0-150 nm), raising the question of whether the area studied may be outside of the area affected by the manipulations made. While STED systems claim ~80 nm resolution, this is rarely achieved in practice, and the authors do not report the effective resolution of their system. Reporting the resolution achieved would address this issue. In addition, super-resolution imaging does not appear to have been used at the Drosophila NMJ. The authors should clarify whether resolution limitations influenced the choice of analysis region and whether their imaging approach is sufficient to detect changes in the endocytic zone.

      We believe that it is unlikely that the relevant signals were missed. First, in mouse synapses, most signal corresponding to endocytic proteins was detected inside the selected region of interest. We selected this area because expanding the region analyzed would have reduced the sensitivity of our approach, as averaging over a larger area would dilute the signal. The resolution of our microscopy should not be a limitation either. In our previous work, we demonstrated that STED microscopy allows discriminating between the N- and C-termini of the presynaptic scaffold Bassoon, which are positioned only a few tens of nanometers apart [4]. This establishes that we can resolve differences at tens of nanometers in a biological context, which is more relevant than the resolution measured with fluorescent beads (which we have repeatedly assessed to be ~80 nm laterally). Subsequently, we have also been consistently able to resolve the localization of pre- and postsynaptic proteins that also localize a few tens of nanometers apart [4-12]. Given that the periactive zone spans a larger area than the distances that we can resolve experimentally in the examples above, we are confident that our measurements are sensitive enough to detect changes in this area.

      Second, for Drosophila NMJs, the region of interest was chosen and the overall analysis was performed following a workflow validated in our previous work [13]. This method analyzes regions both immediately adjacent to and more distant from the active zone, and does not exclude any region based on distance from the active zone, as described on lines 222-224: “Each unit is divided into ‘mesh’ and ‘core’ regions, where the periactive zone mesh is a ~175 nm wide area localized at ~330 nm from the center, and the ‘core’ region is the interior to this mesh (Del Signore et al., 2023).” In our previous study, we analyzed the distribution of periactive zone proteins at rest with STED microscopy and with Airyscan confocal microscopy. The resolution provided by Airyscan is reported to be ~175 nm in XY and ~400 nm in Z, which is sufficient to assess localization to the periactive zone compartment and is not inferior to imaging methods previously used to report changes in the distribution of endocytic proteins; for examples, see [1,2]. In the revised manuscript, we have added new data measuring the levels and distribution of EndoA and Dap160 using STED microscopy (Figure 3 – figure supplement 1). The results acquired with STED microscopy and with Airyscan confocal microscopy are consistent with one another.

      Overall, the accuracy of the imaging methods and analyses used in this study is sufficient to assess periactive zone structure given its size and organization.

      (2) Interestingly, in a number of cases, the authors observe significant differences in endocytic markers (Figure 1q, 4k, 6k, 6r). However, little is made of these differences. The authors should provide more discussion of these changes and how they make sense of them alongside their claims of a lack of effect from their manipulations.

      The reviewer raises a good point. We interpret these changes in two different ways. First, we suggest that changes observed in response to block of action potentials or disassembly of the active zone might be homeostatic. This is addressed on lines 135-137. Second, we discuss that the actin cytoskeleton may be the link between the active zone and the endocytic apparatus. Several active zone proteins interact with the actin cytoskeleton. One of them is Liprin-α. This interaction may explain the decrease in the level of Amphiphysin and AP-180 at the periactive zone in Liprin-α null neurons. This is addressed on lines 444-449. We hope that the reviewer agrees that overall, we should focus on the main conclusion that deployment of endocytic proteins persists over a number of manipulations and synapse types.

      (3) The graphs in Figures 1c, 1g, 3g, 4c, 4e, 6c, and 6g do not appear to be identical. If the solid line represents the mean and the lighter color represents the distribution of these data, these data appear to be different from one another. It is surprising that these differences are not significant. What statistical tests were used to determine whether the differences in these graphs are not significant? Is the issue that a relatively low number of synapses were examined (30-60)? Did the authors conduct a power analysis?

      We apologize if the display of our data and analyses was not clear. We do not perform statistical analyses on the line profiles. Instead, we perform them on two values that are extracted from the line profiles. These values are (1) the distance between the peak intensity positions of the protein of interest and the marker and (2) the peak intensity values. For example, in Figure 1, distances are quantified and statistically analyzed in panel j, and the peak levels are quantified and statistically analyzed in panel k. We have clarified this in the legend of current Figures 1, 4, 5, and 7.
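      To illustrate the two quantities described above, the following is a minimal sketch, not the analysis code used in the study. It assumes that each synapse yields one intensity line profile per channel, sampled at known positions in nanometers. The function name `peak_metrics`, the sampling grid, and the synthetic Gaussian profiles are hypothetical and chosen only for illustration.

```python
import numpy as np

def peak_metrics(positions_nm, protein_profile, marker_profile):
    """Return (peak-to-peak distance in nm, peak intensity of the protein).

    Hypothetical helper: the distance is the position of the protein peak
    minus the position of the marker peak along the same line profile.
    """
    positions_nm = np.asarray(positions_nm, dtype=float)
    protein = np.asarray(protein_profile, dtype=float)
    marker = np.asarray(marker_profile, dtype=float)

    i_protein = int(np.argmax(protein))  # index of the protein peak
    i_marker = int(np.argmax(marker))    # index of the marker peak

    distance_nm = positions_nm[i_protein] - positions_nm[i_marker]
    return distance_nm, protein[i_protein]

# Synthetic example (assumed values): marker peak at 0 nm,
# protein peak at 100 nm with a peak level of 0.8 (arbitrary units).
x = np.arange(-200.0, 401.0, 10.0)
marker = np.exp(-0.5 * (x / 40.0) ** 2)
protein = 0.8 * np.exp(-0.5 * ((x - 100.0) / 60.0) ** 2)

distance_nm, peak_level = peak_metrics(x, protein, marker)
print(distance_nm, peak_level)
```

      Under these assumptions, each synapse contributes one distance and one peak level, and group comparisons are then run on those per-synapse values rather than on the averaged profiles.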

      (4) The authors clearly state that their experiments address the role of evoked activity in endocytic zone positioning, but they do not examine whether spontaneous vesicle fusion might play a role. Given the availability of Drosophila mutants that decrease (Doc2, Dunc-13) or increase (syt1) spontaneous release, this is a notable omission. Ideally, these mutants should be examined. And at a minimum, the authors should discuss whether spontaneous release could contribute to endocytic zone organization.

      We agree with the reviewer that spontaneous fusion of synaptic vesicles may contribute to periactive zone organization. Many of the genetic manipulations that we used in mouse neurons result in a significant decrease in spontaneous release. This includes Ca<sub>V</sub>2 triple knockouts with a ~60% decrease in spontaneous fusion [10], RIM+ELKS quadruple knockouts with a ~70% decrease in spontaneous fusion [9] and Liprin-α quadruple knockouts with a ~50% decrease in spontaneous fusion [7]. We cannot rule out that the spontaneous release that is left is sufficient to mediate assembly functions. The conclusive way to address this possibility is using a manipulation that ablates spontaneous release without altering other pathways. However, to our knowledge, this is not available. The manipulations suggested by the reviewer might suffer from similar limitations, as they would change the frequency of spontaneous release without fully ablating it, and they would also affect evoked release. We have included a limitations section in the discussion where we address this (lines 514-523), specifically stating “conclusions that can be drawn on the roles of spontaneous release in periactive zone assembly remain limited. While many of the manipulations used here, including Ca<sub>V</sub>2 knockout (Held et al., 2020), RIM+ELKS knockout (Tan et al., 2022; Wang et al., 2016) and Liprin-α knockout (Emperador-Melero et al., 2024) in hippocampal neurons, and TeNT expression in fly NMJs (Sweeney et al., 1995), result in 50% to 70% decreased spontaneous release rates, it is possible that the remaining spontaneous release supports periactive zone assembly. Future studies might test manipulations with strong effects on miniature release including those affecting SNARE proteins and their regulators, with the caveat that these manipulations might have effects on upstream trafficking and in some cases on cell survival (Kaeser and Regehr, 2014; Santos et al., 2017).” We hope that the reviewer agrees that assessing these mutants should be a topic of future studies, given that we already test many mutants in the paper.

      (5) In Figures 1 and 6, the authors assess presynaptic protein localization in cultured neurons, but it is unclear whether these are synaptic sites. Many presynaptic proteins traffic together and can accumulate at sites lacking postsynaptic specializations. The authors should validate that the observed spatial organization occurs at bona fide synapses, ideally by co-labeling with postsynaptic markers as done in Figure 4. If methods like these were used, providing more details on how synapses were identified and selected would be useful to the reader.

      While we understand the reviewer’s point, we are confident that the structures analyzed are bona fide synapses for three reasons, as we have established before across many papers [4-8,10-12,17].

      First, the diameter of the structures detected using the synaptic vesicle marker Synaptophysin aligns much more closely with the size of the large vesicle clusters found at presynaptic terminals than with that of a few transport vesicles.

      Second, in side-view synapses, the bar-like distribution of the active zone marker (Bassoon or Munc13-1) indicates that active zone proteins are organized at one edge of the vesicle cluster, consistent with the architecture of synapses.

      Third, Synaptophysin is one of our key markers for detecting synapses. In our cultures, most of the Synaptophysin signal colocalizes with postsynaptic markers (either PSD-95 or Gephyrin), as we have established across many studies [4,7-12]. This indicates that the markers used here are sufficient to select synapses. Furthermore, the frequency at which synapses were identified using an active zone marker as the second marker was similar to that observed when using a postsynaptic marker, suggesting that we were not randomly including unrelated structures.

      (6) Many of the images, particularly of the Drosophila NMJ, are of low quality and are shown at a very small size. In addition, the quality of the images throughout the paper makes it difficult to assess the authors' analysis and results. The authors should provide larger, higher-quality images that show examples of the means for each of the examples shown. This is an issue for most of the figures, but is particularly prominent in the dNMJ. A minor additional point is that the authors should be clear whether the dNMJ images are collected at super-resolution or using a conventional microscope.

      We believe that the quality of our images is sufficient for the assessments made for the following reasons:

      These images were acquired with enough spatial resolution to assess levels at the PAZ as discussed in response to this reviewer’s first comment. In our previous work, we used images acquired at the same resolution and presented in the same manner for both mouse hippocampal synapses [6,7] and Drosophila NMJs [13,18]. In those previous studies, we drew conclusions at a similar level of detail as in the current study.

      In our view, our representative images are not inferior in quality to other papers in the field addressing similar questions [1,2,19,20].

      We have selected sample images based on the quantified mean values per condition. Hence, we strived to select panels that are objectively representative regarding the quantified parameters.

      We have specified microscopy methods in the figure legends. Specifically, for Drosophila NMJs, we used Airyscan confocal microscopy and STED microscopy. For each experiment, it is now stated which microscopy method was used in the corresponding legend.

      References:

      (1) Winther, Å. M. E. et al. An Endocytic Scaffolding Protein together with Synapsin Regulates Synaptic Vesicle Clustering in the Drosophila Neuromuscular Junction. J Neurosci 35, 14756–14770 (2015).

      (2) Winther, Å. M. E. et al. The dynamin-binding domains of Dap160/intersectin affect bulk membrane retrieval in synapses. J Cell Sci 126, 1021–1031 (2013).

      (3) Bai, J., Hu, Z., Dittman, J. S., Pym, E. C. G. & Kaplan, J. M. Endophilin functions as a membrane-bending molecule and is delivered to endocytic zones by exocytosis. Cell 143, 430–441 (2010).

      (4) Wong, M. Y. et al. Liprin-alpha3 controls vesicle docking and exocytosis at the active zone of hippocampal synapses. Proc Natl Acad Sci U S A 115, 2234–2239 (2018).

      (5) Emperador-Melero, J., de Nola, G. & Kaeser, P. S. Intact synapse structure and function after combined knockout of PTPδ, PTPσ, and LAR. Elife 10, (2021).

      (6) Emperador-Melero, J. et al. PKC-phosphorylation of Liprin-α3 triggers phase separation and controls presynaptic active zone structure. Nat Commun 12, 3057 (2021).

      (7) Emperador-Melero, J. et al. Distinct active zone protein machineries mediate Ca2+ channel clustering and vesicle priming at hippocampal synapses. Nat Neurosci 1–15 (2024) doi:10.1038/s41593-024-01720-5.

      (8) Tan, C., Wang, S. S. H., de Nola, G. & Kaeser, P. S. Rebuilding essential active zone functions within a synapse. Neuron 110, 1498-1515.e8 (2022).

      (9) Wang, S. S. H. et al. Fusion Competent Synaptic Vesicles Persist upon Active Zone Disruption and Loss of Vesicle Docking. Neuron 91, 777–791 (2016).

      (10) Held, R. G. et al. Synapse and Active Zone Assembly in the Absence of Presynaptic Ca(2+) Channels and Ca(2+) Entry. Neuron 107, 667-683.e9 (2020).

      (11) Chin, M. & Kaeser, P. S. The intracellular C-terminus confers compartment-specific targeting of voltage-gated calcium channels. Cell Rep 43, 114428 (2024).

      (12) Nyitrai, H., Wang, S. S. H. & Kaeser, P. S. ELKS1 Captures Rab6-Marked Vesicular Cargo in Presynaptic Nerve Terminals. Cell Rep 31, 107712 (2020).

      (13) Del Signore, S. J., Mitzner, M. G., Silveira, A. M., Fai, T. G. & Rodal, A. A. An approach for quantitative mapping of synaptic periactive zone architecture and organization. Mol Biol Cell 34, (2023).

      (14) Sweeney, S. T., Broadie, K., Keane, J., Niemann, H. & O’Kane, C. J. Targeted expression of tetanus toxin light chain in Drosophila specifically eliminates synaptic transmission and causes behavioral defects. Neuron 14, 341–351 (1995).

      (15) Kaeser, P. S. & Regehr, W. G. Molecular mechanisms for synchronous, asynchronous, and spontaneous neurotransmitter release. Annu Rev Physiol 76, 333–363 (2014).

      (16) Santos, T. C., Wierda, K., Broeke, J. H., Toonen, R. F. & Verhage, M. Early Golgi Abnormalities and Neurodegeneration upon Loss of Presynaptic Proteins Munc18-1, Syntaxin-1, or SNAP-25. Journal of Neuroscience 37, 4525–4539 (2017).

      (17) de Jong, A. P. H. et al. RIM C2B Domains Target Presynaptic Active Zone Functions to PIP2-Containing Membranes. Neuron 98, 335-349.e7 (2018).

      (18) Del Signore, S. J. et al. An autoinhibitory clamp of actin assembly constrains and directs synaptic endocytosis. Elife 10, (2021).

      (19) Imoto, Y. et al. Dynamin 1xA interacts with Endophilin A1 via its spliced long C-terminus for ultrafast endocytosis. EMBO Journal https://doi.org/10.1038/S44318-024-00145-X

      (20) Imoto, Y. et al. Dynamin is primed at endocytic sites for ultrafast endocytosis. Neuron 110, 2815-2835.e13 (2022).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors investigate the relationship between 3D chromatin architecture and innate immune gene regulation in monocytes from patients with alcohol-associated hepatitis (AH). Using Hi-C technology, they attempt to identify structural changes in the genome that correlate with altered gene expression. Their central claim is that genome restructuring contributes to the hyper-inflammatory phenotype associated with AH.

      Strengths:

      (1) The manuscript employs Hi-C technology, which, in principle, is a powerful approach for studying genome organization.

      (2) The focus on disease-relevant genes, particularly innate immune loci, provides a contextually important angle for understanding AH.

      Weaknesses:

      (1) Sample Size: The study relies on an exceptionally small cohort (4 AH patients and 4 healthy controls), rendering the results statistically underpowered and highly susceptible to variability.

      (2) Hi-C resolution and lack of pairing with RNA-seq: The data are presented at a resolution of 100 kb, which is insufficient to uncover meaningful chromatin interactions at the level of individual genes. In addition, the Hi-C data are not paired with the RNA-seq data.

      (3) Functional Validation: The manuscript lacks experiments to directly link changes in chromatin architecture with gene expression or monocyte function, leaving the claims speculative.

      (4) Data Integration: The lack of integration of Hi-C with ATAC-seq and RNA-seq data handicaps the analysis and makes it superficial. In short, it does not convincingly demonstrate a functional relationship.

      (5) Confounding Factors: The manuscript neglects critical confounding variables such as comorbidities, medications, and lifestyle factors, which could influence chromatin structure and gene expression independently of AH.

      Appraisal of the Aims and Results:

      The manuscript sets out to establish a connection between chromatin architecture and AH pathology. However, the study fails to achieve its stated aims due to inadequate methods and insufficient data. The conclusions drawn from the Hi-C analyses alone are poorly supported, and the lack of functional validation undermines the credibility of the proposed mechanisms. Overall, the results do not provide compelling evidence to substantiate the authors' claims.

      Impact on the Field and Utility to the Community:

      The work, in its current form, is unlikely to have a meaningful impact on the field. The limited scope, methodological shortcomings, and lack of robust data significantly diminish its potential utility. Without addressing these critical gaps, the study does not offer new insights into the role of genome architecture in AH or provide useful methodologies or datasets for the community.

      Additional Context:

      The manuscript would benefit from a more comprehensive analysis of potential mechanisms underlying the observed changes, including the interplay between chromatin architecture and epigenetic modifications. Furthermore, longitudinal studies or therapeutic interventions could provide insights into the dynamic aspects of genome restructuring in AH. These considerations are entirely absent from the current study.

      Conclusion:

      The manuscript does not achieve its stated goals and does not present sufficient evidence to support its conclusions. The limitations in sample size, resolution, and experimental rigor severely hinder its contribution to the field. Addressing these fundamental flaws will be essential for the work to be considered a meaningful addition to the literature.

      Reviewer #2 (Public review):

      Summary:

      Dr. Adam Kim and collaborators study the changes in chromatin structure in monocytes obtained from patients with alcohol-associated hepatitis (AH) when compared to healthy controls (HC). Using high-throughput chromatin conformation capture technology (Hi-C), they collected data on contact frequencies between both contiguous and distal DNA windows (100 kb each), mainly within the same chromosome. From the analyses of those data in the two cohorts, the authors describe frequent pairs of regions subject to significant changes in contact frequency across cohorts. The accumulation of these pairs in specific regions of the genome (referred to as hotspots) motivated the authors to narrow down their analyses to these disease-associated regions, in many of which, the authors claim, a number of key innate immune genes can be found. Ultimately, the authors try to draw a link between the changes observed in chromatin architecture in some of these hotspots and the differential co-expression of the genes lying within those regions, as ascertained in previous single-cell transcriptomic analyses.

      Strengths:

      The main strength of this paper lies in the generation of Hi-C data from patients, a valuable asset that, as the authors emphasize, offers critical insights into the role of chromatin architecture dysregulation in the pathogenesis of alcohol-associated hepatitis (AH). If confirmed, the reported findings have the potential to highlight an important, yet overlooked, aspect of cellular dysregulation (chromatin conformation changes), not only in AH but potentially in other immune-related conditions with a component of pathological inflammation.

      Weaknesses:

      The two weaknesses that I regard as most important are more methodological than conceptual. The first of these issues concerns the insufficient level of description provided on the definition of some key types of genomic regions, such as topologically associated domains, DNA hotspots, or even DNA loci showing significant changes in contact frequency between AH and HC. In spite of the importance of these concepts in the paper, no operational, explicit description of how they are defined, from a statistical point of view, is provided in the current version of the manuscript.

      Without these definitions, some of the claims that authors make in their work become hard to sustain. Some examples are the claim that randomizing samples does not lead to significant differences between cohorts; the claim that most of the changes in contact frequency happen locally; or the claim that most changes do not alter the structure of TADs, but appear either within, or between TADs. In my viewpoint, specific descriptions and implementation of proper tests to check these hypotheses and back up the mentioned specific claims, along with the inclusion of explicit results on these matters, would contribute very significantly to strengthening the overall message of the paper.

      The second notable weakness of the study pertains to the characterization of the changes observed around immune genes in relation to genome-wide expectations. Although the authors suggest that certain hotspots contain a high number of immune-related genes, no enrichment analysis is provided to verify whether these regions indeed harbor a higher concentration of such genes compared to other genomic areas. It would be important for readers to be promptly informed if no such enrichment is observed, for in that case, the presence of some immune genes within these hotspots would carry more limited implications.

      Additionally, the criteria used to define a hotspot are not clearly outlined, making it difficult to assess whether the changes in contact frequencies around the immune genes highlighted in figures 5-8 are truly more pronounced than what would be expected genome-wide.

      Reviewer #3 (Public review):

      In this manuscript, the authors use HiC to study the 3D genome of CD14+ CD16+ monocytes from the blood of healthy individuals and of patients with alcohol-associated hepatitis (AH).

      Overall, the authors perform a cursory analysis of the HiC data and conclude that there are a large number of changes in 3D genome architecture between healthy and AH patient monocytes. They highlight some specific examples that are linked to changes in gene expression. The analysis is of such a preliminary nature that I would usually expect to see the data from all figures in just one or two figures.

      In addition, I have a number of concerns regarding the experimental design and the depth of the analyses performed that I think must be addressed.

      (1) There is a myriad of literature that describes the existence of cell type-specific 3D genome architecture. In this manuscript, there is an assumption by the authors that the CD14+ CD16+ monocytes represent the same population from both healthy and diseased patients. Therefore, the authors conclude that the differences they see in the HiC data are due to disease-related changes in the equivalent cell types. However, I am concerned that the AH patient monocytes may have differentiated due to their environment so that they are in fact akin to a different cell type and the 3D genome changes they describe reflect this. This is supported by published articles for example: Dhanda et al., Intermediate Monocytes in Acute Alcoholic Hepatitis Are Functionally Activated and Induce IL-17 Expression in CD4+ T Cells. J Immunol (2019) 203 (12): 3190-3198, in which they show an increased frequency of CD14+ CD16+ intermediate monocytes in AH patients that are functionally distinct.

      I suggest that if the authors would like to study the specific effects of AH on 3D genome architecture, then they should carefully FACS-sort the equivalent monocyte populations from the healthy and AH patients.

      (2) The analysis of the HiC data is quite preliminary. In the 3D genome field, it is usual to report the different scales of genome architecture, for example, compartments, topologically associated domains (TADs), and loops. I think that reporting this information and how it changes in AH patients in the appropriate cell types would be of great interest to the field.

      We thank the reviewers for their careful and thorough examination of our manuscript. We agree with all of their comments regarding the limitations of the study. Many of the criticisms focus on the small sample size of our study (n=4 for healthy controls and disease patients) in both Hi-C and single-cell RNA-seq experiments, and that these experiments are unpaired, or in other words, PBMCs came from different patients for each experiment.

      Unfortunately, these experiments are fairly complicated to perform, requiring patient cells and very expensive deep sequencing. We are not currently in a position to increase sample size easily or cost-effectively. In the case of Hi-C, we still believe our study to be of value, as Hi-C is not a commonly used technique to study disease effects on chromatin, and very few studies have employed a large enough sample size to perform statistical comparisons. Additionally, analyzing the data at a higher resolution would require deeper sequencing, and unfortunately we do not have the resources to sequence these libraries more deeply. Regarding the single-cell RNA-seq data, this dataset was generated for an earlier study [1] focusing on gene expression responses to LPS, and we were unable to get PBMCs from exactly the same patients to perform the Hi-C study.

      We disagree that our study has limited scientific value. Our study is the first to use Hi-C to show that the 3D genome architecture of primary monocytes is changed in a disease context. The only other study to follow a similar approach performed Hi-C in monocytes from 2 healthy individuals and 2 systemic lupus erythematosus (SLE) patients, and in that study the data from both patients were combined prior to comparison. No statistics were performed, and the authors concluded that there were no differences in genome architecture due to disease. They did find differences between primary monocytes and the THP1 monocytic cell line, but this comparison lacked statistical analysis. Their conclusion was that inflammatory disease may not lead to genome-wide changes in architecture. Our study, though it examines a very different disease than SLE, shows statistically significant differences between AH and healthy controls. We believe our study lays the groundwork for how Hi-C can be used to study genome architecture in human disease, and the possible downstream effects.

      Confounding Factors: The manuscript neglects critical confounding variables such as comorbidities, medications, and lifestyle factors, which could influence chromatin structure and gene expression independently of AH.

      This is an interesting suggestion. This dataset contains only 4 AH patients, for whom we have included basic clinical data in Supplemental Table 1, including Age, HCA1c, Bilirubin, AST, ALT, Creatinine, Albumin, and MELD score. Three of the four patients have severe AH, while one (AH2) has moderate AH. Despite one patient being moderate, all four AH patients had similar correlations with each other, suggesting that the disease-specific differences we observed are not indicative of severity. More patient samples are needed to determine if genome architecture changes throughout disease progression. We have added this important discussion to the manuscript (page 12, lines 5-14).

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      The criteria used to determine which pairs of regions exhibit significant differences in contact frequency between alcohol-associated hepatitis (AH) and healthy controls (HC) are not disclosed. It would be beneficial for the authors to provide this information, including details such as the number of pairs tested, the nature of the statistical tests conducted, the method of multiple testing correction applied, as well as the significance thresholds used, and the number of loci-pairs below these thresholds for each chromosome. This information would greatly enhance the reader's understanding of the relevance of the reported findings.

      Thank you for this comment, though we are not sure we totally understand. All of our statistics were performed using multiHiCcompare [2], where we input all 8 datasets (.hic files from Juicer), then measured statistical differences between defined groups (HC vs AH). For our randomization studies, we randomized the group comparisons, so each group contained a mix of HC and AH.

      Second, a formal statistical definition of what constitutes a hotspot would be valuable for clarity.

      Thank you for this suggestion. Initially, hotspots were defined simply as regions of the genome with a high frequency of highly significant differential contacts. We have now adopted a more formal definition of “hotspot” based on similar criteria: a hotspot is defined by both adjusted p value and frequency of locations. First, we filtered all pair-wise chromosomal interactions by a very stringent threshold (padj < 0.0000001) to focus on only the most changed coordinates (Supplemental Table 4). Then we looked for regions of the genome with a high frequency of these differential locations. Borders for each hotspot were determined more liberally by looking at the full list of differential spots (padj < 0.05). Finally, we used code to list the genes within each interacting region. We have added these important details to the Methods (page 14, lines 11-14).
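
      The two-threshold procedure described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' code: the record layout, the 1 Mb window, the `min_hits` cutoff, and the function name `find_hotspots` are hypothetical. Only the two padj thresholds are taken from the response.

```python
from collections import Counter

STRICT_PADJ = 1e-7  # stringent threshold quoted in the response
LOOSE_PADJ = 0.05   # liberal threshold used to draw hotspot borders


def find_hotspots(contacts, window=1_000_000, min_hits=3):
    """Locate windows dense in highly significant differential contacts.

    contacts: iterable of (chrom, pos1, pos2, padj) tuples (hypothetical layout).
    window and min_hits are illustrative assumptions, not values from the study.
    Returns {(chrom, window_index): (start, end)}, where the borders span every
    contact anchor in that window with padj < LOOSE_PADJ.
    """
    contacts = list(contacts)

    # Step 1: count anchors of contacts passing the stringent threshold.
    strict_counts = Counter()
    for chrom, pos1, pos2, padj in contacts:
        if padj < STRICT_PADJ:
            for pos in (pos1, pos2):
                strict_counts[(chrom, pos // window)] += 1

    # Step 2: keep dense windows and widen borders with the liberal threshold.
    hotspots = {}
    for (chrom, idx), hits in strict_counts.items():
        if hits < min_hits:
            continue
        anchors = [
            pos
            for c, pos1, pos2, padj in contacts
            if c == chrom and padj < LOOSE_PADJ
            for pos in (pos1, pos2)
            if pos // window == idx
        ]
        hotspots[(chrom, idx)] = (min(anchors), max(anchors))
    return hotspots
```

      Separating the two thresholds keeps hotspot detection conservative while letting the reported borders reflect the full set of nominally significant contacts.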

      Third, a clear definition of the criteria used to identify different topologically associated domains (if these were indeed defined in the data and/or utilized in the analyses) would also be a helpful addition.

      Thank you for this suggestion. We did not identify or utilize TADs in any of these analyses.

      Likewise, several statements throughout the paper lack support from specific analyses, although it should be feasible to implement such analyses (or at least present them if they have already been conducted) to substantiate these claims:

      If randomizing samples does not result in significant differences between (randomized) cohorts, it would be beneficial to provide insights into the number of loci pairs that exhibit differences in frequency when using both the actual and randomized cohorts.

      Thank you for asking this question, as this is an important point. Using multiHiCcompare, if we compare WT (n=4) to AH (n=4), we get the results in the figures and supplementary data, but if we randomize Group 1 (WT, WT, AH, AH) vs Group 2 (WT, WT, AH, AH), we get almost 0 significant changes in contact frequency. To show this more robustly, we performed 5 randomized comparisons and found far fewer changes in contact frequency between groups. This shows that the changes in contact frequency we report are not random, but rather reflect a real difference in AH. This point has been added to the Results (page 6, lines 15-17) and Methods (page 14, lines 16-21).
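
      The randomized design described above mixes controls and patients evenly across the two groups. A minimal sketch of how such balanced splits could be enumerated is given below. The sample labels and the function name `balanced_splits` are placeholders, and each split would still need to be passed to the differential-contact test in place of the true grouping; that step is not shown.

```python
from itertools import combinations


def balanced_splits(controls, patients, per_group=2):
    """Yield (group1, group2) splits in which group1 holds `per_group`
    controls and `per_group` patients, and group2 holds the remainder."""
    everyone = controls + patients
    for chosen_controls in combinations(controls, per_group):
        for chosen_patients in combinations(patients, per_group):
            group1 = chosen_controls + chosen_patients
            group2 = tuple(s for s in everyone if s not in group1)
            yield group1, group2


# Placeholder labels for the 4 control and 4 AH samples.
controls = ("HC1", "HC2", "HC3", "HC4")
patients = ("AH1", "AH2", "AH3", "AH4")
splits = list(balanced_splits(controls, patients))
```

      With four samples per cohort there are 36 labelled splits (18 if Group 1 and Group 2 are interchangeable), so five randomized comparisons sample only part of the available space.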

      If most changes in contact frequency occur locally, it would be useful to visualize the relationship between effect sizes and/or significance levels for the observed differences in frequency in relation to the distance between the involved loci. Additionally, comparing these results to the average baseline contact intensities as a function of distance would be informative. This comparison could help determine whether the distance decay in effect size/significance for the differences between AH and HC is faster or slower than the decay rates for baseline contact frequencies.

      This is a good suggestion. In our initial analysis, we made a number of figures relating chromosome positions, distance between loci, and statistics regarding the differential contact frequency. In the initial submission, we only showed Figure 3, which shows the logFC (log fold change) for the differential contact frequency by chromosomal position on both sides. To address this question, we have added a supplemental figure showing logFC as a function of the distance between two loci (new Supplemental Figure 3).

      Similarly, the assertion that most changes do not affect the structure of topologically associated domains (TADs) but occur either within or between TADs should be supported by specific testing, or otherwise removed.

      Thank you. Yes, we have adjusted the language in the Discussion.

      Furthermore, the authors should clarify whether differences in chromatin conformation are more pronounced around immune genes compared to genome-wide expectations. If this is not the case, it would be helpful to quantify the intensity of these differences around the highlighted genes in relation to the rest of the genome. To achieve this, I would suggest the following:

      Conduct enrichment analyses on the genes located within the most prominent hotspots to determine whether they are significantly enriched in immune genes (and/or, alternatively, in any other functional category).

      Estimate the average absolute fold change in contact frequency within all topologically associated domains (TADs) identified in the study. This would allow for the identification of immune gene-containing TADs highlighted in Figures 5-8, providing readers with a quantitative understanding of how anomalously different these genomic regions are with regard to the magnitude of their alterations in AH, compared to the rest of the genome.

      While some of the selected gene clusters appear to co-localize well with topologically associated domains (e.g., Figures 5A, 8A), others seemingly encompass either multiple TADs (Figure 6) or only portions of them (Figure 7). This should be clarified.

      Thank you, this is a great suggestion. In order to be as unbiased as possible, we took all genes present in the regions with the most significant changes in the genome (Supplemental Table 4), which we used to identify the hotspots. You are correct: we do in fact see enrichment of genes involved in innate immune signaling. This has been added to Results (page 7, lines 19-25) and Figure 4.
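
      The response does not state which enrichment method was used. As one common option, a one-sided hypergeometric test asks whether a gene category is over-represented among the genes found in the hotspot regions. The sketch below is illustrative only: the function name and all counts passed to it are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(total_genes, category_genes, region_genes, overlap):
    """One-sided hypergeometric p-value P(X >= overlap).

    total_genes: size of the gene universe.
    category_genes: genes in the universe annotated to the category.
    region_genes: genes found in the regions of interest.
    overlap: genes that are both in the regions and in the category.
    """
    upper = min(category_genes, region_genes)
    denominator = comb(total_genes, region_genes)
    tail = sum(
        comb(category_genes, k)
        * comb(total_genes - category_genes, region_genes - k)
        for k in range(overlap, upper + 1)
    )
    return tail / denominator
```

      When many categories are tested, the resulting p-values would still need a multiple-testing correction, and the choice of gene universe (all annotated genes versus genes covered by the Hi-C bins) affects the result.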

      Finally, there are several minor issues concerning the figures that could be easily addressed to substantially enhance their readability:

      Font sizes in most figures should be increased, particularly for some axis labels and tick marks. This issue affects most figures; for instance, in Figure 4, it hinders the reader's ability to interpret the ranges of the data presented.

      Thank you, the figures have been adjusted.

      Figures 5 to 8 (panels A and B) would benefit significantly from a more consistent format. Specifically, the gene cluster boxes should also be included in the right panels, and the gene locations should be displayed on the left in a uniform format across all figures (e.g., formatting Figures 7 and 8 to match the style of Figures 5 and 6).

      Figures 5 and 6 have a similar structure to each other because we were focusing on all of the genes in that chromosomal region. Figures 7 and 8 are different because we are focusing on how the region around a certain hotspot of interest changes.

      It is also important to note that the genes plotted in Figures 8C and 8D are not the same. Concerning these two panels, it would be valuable to clarify whether the data presented pertains exclusively to monocytes. If so, information regarding the number of cells analyzed and the number of donors from which they were drawn would also be beneficial.

      These figures are generated using scRNA-seq data. They represent all of the genes expressed in that region of the genome, in their chromosomal position. If a gene is not expressed in the scRNA-seq data, then it is not shown. I have debated with myself a lot on how to show gene expression in a region of the genome, but I think this is the clearest way to show this; including the genes that have no expression would make it more confusing. But yes, if you compare HC and AH, you see some differences in the list of genes. We have added more clarity to the figure legend for this figure.

      References

      (1) Kim, A., Bellar, A., McMullen, M. R., Li, X. & Nagy, L. E. Functionally Diverse Inflammatory Responses in Peripheral and Liver Monocytes in Alcohol-Associated Hepatitis. Hepatol Commun 4, 1459-1476 (2020). https://doi.org/10.1002/hep4.1563

      (2) Stansfield, J. C., Cresswell, K. G. & Dozmorov, M. G. multiHiCcompare: joint normalization and comparative analysis of complex Hi-C experiments. Bioinformatics 35, 2916-2923 (2019). https://doi.org/10.1093/bioinformatics/btz048

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript by Wang et al. describes the development of an optimized soluble ACE2-Fc fusion protein, B5-D3, for intranasal prophylaxis against SARS-CoV-2. As shown, B5-D3 conferred protection not only by acting as a neutralizing decoy, but also by redirecting virus-decoy complexes to phagocytic cells for lysosomal degradation. The authors showed complete in vivo protection in K18-hACE2 mice and investigated the underlying mechanism by a combination of Fc-mutant controls, transcriptomics, biodistribution studies, and in vitro assays.

      Strengths:

      The major strength of this work is the identification of a novel antiviral approach with broad-spectrum and beyond simple neutralization. Mutant ACE2 enables broad and potent binding activity with the S proteins of SARS-CoV-2 variants, while the fused Fc part mediates phagocytosis to clear the viral particles. The conceptual advance of this ACE2-Fc combination is convincingly validated by in vivo protection data and by the completely abrogated protection of Fc LALA mutant.

      We thank the reviewer for his recognition and positive comments on our study.

      Weaknesses:

      Some aspects could be further modified.

      (1) A previously reported ACE2 decamer (DOI: 10.1080/22221751.2023.2275598) needs to be mentioned and compared in the Discussion part.

      We thank the reviewer for pointing out this weakness.

      Indeed, previous studies reported that the ACE2-IgM decamer, taking advantage of the decameric structure of IgM, exhibited higher avidity to spikes and greater potency for viral neutralization [1-3]. In particular, the study by Guo et al. has demonstrated a broad-spectrum neutralization ability of the ACE2-IgM decamer against multiple SARS-CoV-2 variants and reported the efficacy of intranasal prophylaxis in preventing lethal SARS-CoV-2 challenge in K18-hACE2 mice.

      We agree with the reviewer that our B5-D3 design could plausibly benefit from switching to the IgM isotype. However, the distinct biological features imposed by IgM Fc, including short serum half-life and restricted tissue penetration [4], may complicate the study design and divert our focus.

      In our current study, we focus on the IgG1 Fc-based decoy design, while inactivating the enzyme activity of ACE2 to avoid disturbing the renin-angiotensin system. This design allowed us to compare diverse administration routes and regimens and to gain useful insights into the potential of the sACE2-Fc decoy in combating SARS-CoV-2 in vivo.

      We appreciate the reviewer's insightful suggestion. In the revised manuscript, we have included additional discussion regarding the ACE2-IgM decamer, addressing the relevant concern on page 17 lines 409–414.

      (2) Limitations of this study, such as off-target binding and potential immunogenicity, should also be discussed.

      We thank the reviewer for his insightful comments and agree that off-target activity is a major concern for designing the ACE2 decoy.

      (1) In our study, the representative sACE2-Fc decoy candidate B5-D3 contains H374N mutation (D3) that is designed to inactivate ACE2 enzyme activity by causing dyscoordination of Zn2+. Our in vitro enzymatic activity assay has demonstrated that the H374N mutation (D3), as well as other three single mutations D1, D4 and D5, in either WT sACE2-Fc or B5 mutant, could effectively abolish the hACE2 enzyme activity (Supplementary Fig. 2e, h).

      (2) To further address the concern on off-target activity, we performed AAV-based overexpression experiments in K18-hACE2 mice and examined serum levels of RAS hormones, using ELISA methods that specifically detect serum renin, Angiotensin II (Ang II), and Ang (1-7). Our data from WT sACE2-Fc overexpression revealed significantly elevated serum renin and Ang II, indicating a disruption of the RAS (Supplementary Fig. 4d, e). In contrast, the double mutants examined, including B5-D3, showed negligible change in any of these metabolite levels, demonstrating no off-target effect and minimal disturbance to RAS activity in K18-hACE2 mice (Supplementary Fig. 4d–f).

      (3) Moreover, in this experiment, after prolonged overexpression of all these molecules in K18-hACE2 mice, histological examination of multiple organs showed no evidence of immune cell infiltration or tissue damage, and no difference was observed between the mice receiving WT sACE2-Fc and those receiving B5-D3 (Supplementary Fig. 4g).

      In the revised manuscript, we have included the results from the AAV-delivered in vivo overexpression of WT sACE2-Fc and three most promising double mutants (B5-D3, B5-D4 and B5-D5) on page 5 lines 118–122 and on page 6 lines 123–135 in the main text. The relevant data were presented in the new Supplementary Fig. 4.

      Reviewer #2 (Public review):

      Summary:

      Wang et al. engineered an optimized ACE2 mutant by introducing two mutations (T92Q and H374N) and fused this ACE2 mutant to human IgG1-Fc (B5-D3). Experimental results suggest that B5-D3 exhibits broad-spectrum neutralization capacity and confers effective protection upon intranasal administration in SARS-CoV-2-infected K18-hACE2 mice. Transcriptomic analysis suggests that B5-D3 induces early immune activation in lung tissues of infected mice. Fluorescence-based biodistribution assay further indicates rapid accumulation of B5-D3 in the respiratory tract, particularly in airway macrophages. Further investigation shows that B5-D3 promotes viral phagocytic clearance by macrophages via an Fc-mediated effector function, namely antibody-dependent cellular phagocytosis (ADCP), while simultaneously blocking ACE2-mediated viral infection in epithelial cells. These results provide insights into improving decoy treatments against SARS-CoV-2 and other potential respiratory viruses.

      Strengths:

      The protective effect of this ACE2-Fc fusion protein against SARS-CoV-2 infection has been evaluated in a quite comprehensive way.

      We thank the reviewer for his recognition and positive comments on our study.

      Weaknesses:

      (1) The paper lacks an explanation regarding the reason for the combination of mutations listed in Supplementary Figure 2b. For example, for the mutations that enhance spike protein binding, B2-B6 does not fully align with the mutations listed in Table S1 of Reference 4, yet no specific criteria are provided.

      We thank the reviewer for pointing out this negligence.

      We constructed the B2-B6 mutants based on the study by Chan et al. [5] (Reference 4 in the previous version), referring mainly to their Fig. 1A rather than to their Table S1. In Chan’s study, each of the proposed mutations was discovered as a single mutation in monomeric sACE2 molecules based on its enrichment in target cell-binding. T92 was a notable hot spot for enriched mutations in their Fig. 1A.

      Since monomeric and dimeric forms of sACE2 showed dramatically different kinetics for ACE2-RBD interaction, we selected five proposed mutations and further examined their affinity and activity in dimeric sACE2-Fc in our study. We chose not only the combinations of mutations, such as B3, B4, and B6 proposed in their Table S1, but also explored less-complicated mutation(s) like B2 (T27Y/L79T) and B5 (T92Q) in their Fig. 1A, which were in silico predicted to enhance ACE2-RBD binding but not tested in sACE2-Fc in Chan’s study.

      Interestingly, although our results confirmed enhanced viral neutralization by all these mutations, the activity increase compared to WT ACE2-Fc was rather limited. Hence, we chose not to explore other mutations but to focus on B2–B6 to construct an enhanced ACE2-Fc decoy as a representative, to investigate the potential of ACE2-Fc decoys in combating SARS-CoV-2 infections.

      In the revised manuscript, we have further amended the writing on page 4 lines 84–87 to enhance readability. For conciseness, however, we did not describe in detail how we selected the mutations to be tested.

      Second, for the mutations that abolished enzymatic activity, while D1 and D2, D3, D4, and D5 are cited from References 12, 11, and 33, respectively, the reason for combining D3 and D4 into A2, and D1 and D2 into A3 remains unexplained. It is also unclear whether some of these other possible combinations have been tested. Furthermore, for the B5-derived mutations, only double-mutant combinations with D1-D5 are tested, with no attempt made to evaluate triple mutations involving A2 or A3.

      We thank the reviewer for pointing out this negligence.

      A2 and A3 mutations were originally proposed as double mutations [6,7]. A2 (H374N/H378N) was first reported by Guy et al. [6] (Reference 11 in the previous version), while A3 (R273G/T445G) was originally proposed in Payandeh et al.’s study [7] (Reference 33 in the previous version).

      In this study, we further split the two mutations in each of A2 and A3 to generate the single enzyme-deactivating mutations: D1 and D2 from A3, and D3 and D4 from A2. Among these single mutations, D2 failed to inactivate ACE2 enzymatic activity (Supplementary Fig. 2e) and was excluded from subsequent analyses.

      D5 (H345L) was a single mutation directly adopted from the report by Glasgow et al. [8] (Reference 12 in the previous version).

      After combining B5 with the enzyme-deactivating mutations (A2, A3, D1, D3, D4, D5), our neutralization assay results showed that the simpler compound mutants with only two mutations, like B5-D1, B5-D3, B5-D4 and B5-D5, exhibited stronger neutralization capacity than B5-A2 and B5-A3 with triple mutations. Moreover, since fewer mutations are more favorable for reducing the risks of altering protein structure and evoking host immunity, we then focused on the sACE2-Fc double mutants B5-D3, B5-D4 and B5-D5 in the subsequent neutralization and overexpression assays (Supplementary Fig. 3 and 4), and examined B5-D3 as a representative candidate in the in vivo infection tests and follow-up analysis (Figure 2–6, and Supplementary Figures 5–18).

      We agree that the lack of explanation for splitting A2 and A3 into D1 to D4 single mutations made the rationale unclear. In the revised manuscript, we have included our previous test results on B5-A2 and B5-A3, cited Lei et al.’s study using A2 in ACE2 decoy [9], and explained the rationale for splitting A2 and A3 into D1 to D4 mutations. Relevant revision was made on page 4 lines 94–97 in the main text, while the design and data for B5-A2 and B5-A3 were included in the revised Figure 1b and Supplementary Figure 2b, f–h.

      (2) Figures 1b, 1d, and 1e lack statistical analyses, making it difficult to determine whether B5 and D3 exhibit significant advantages. For Wuhan-Hu-1 strain, B2 and B5 are similar, and for D614G strain, B2, B3, B4, B5, and B6 display comparable results. However, only the glycosylation-related single mutant B5 is chosen for further combinatorial constructs. Moreover, for VOC/VOI strains, B5 is superior to B5-D3; for the Alpha strain, B5-D4 and B5-D5 are superior to B5-D3; and for the Delta and Lambda strains, B5-D5 is superior to B5-D3. These observations further highlight the need for a clearer explanation of the selection strategy.

      We agree with the reviewer’s insightful observations.

      Indeed, although our results confirmed enhanced viral neutralization by these reported mutations, the activity increases compared to WT ACE2-Fc were generally limited. Importantly, these observations were largely consistent with other reports (including the study by Chan et al. [5]), suggesting limited potential of mutagenesis in enhancing the ACE2-RBD/Spike interaction. Therefore, we chose to selectively examine B2-B6 to construct an enhanced ACE2-Fc decoy with reasonable performance, as a representative candidate to study the application potential of ACE2-Fc decoy.

      The IC<sub>50</sub> values in Figures 1b, 1d, and 1e were calculated from neutralization curves that measured infection reduction at multiple concentrations in duplicate, and are therefore presented with statistical support. Across the multiple neutralization assays, B5-D3 consistently ranked among the top performers (Figure 1, Supplementary Fig. 2f,g, and Supplementary Fig. 3).
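
      As an illustration of how an IC<sub>50</sub> can be read off a neutralization curve, the sketch below averages duplicate readings and interpolates the 50% crossing on a log-concentration axis. It is a simplified stand-in, not the fitting procedure used in the manuscript (which is not described here); the function name, concentrations, and readings are all hypothetical.

```python
from math import log10

def ic50_log_interpolation(concs, pct_infection):
    """Estimate IC50 by linear interpolation on a log10 concentration axis.

    concs: concentrations in ascending order.
    pct_infection: mean % infection (relative to untreated control) per concentration.
    Returns None if the curve never crosses 50%.
    """
    points = list(zip(concs, pct_infection))
    for (c_lo, y_lo), (c_hi, y_hi) in zip(points, points[1:]):
        # Find the pair of neighbouring concentrations that brackets 50% infection
        if y_lo >= 50.0 >= y_hi and y_lo != y_hi:
            frac = (y_lo - 50.0) / (y_lo - y_hi)
            log_ic50 = log10(c_lo) + frac * (log10(c_hi) - log10(c_lo))
            return 10 ** log_ic50
    return None

# Hypothetical duplicate readings (% infection), for illustration only
concs = [0.01, 0.1, 1.0, 10.0]   # assumed units, e.g. ug/mL
rep1 = [98.0, 82.0, 22.0, 4.0]
rep2 = [96.0, 78.0, 18.0, 2.0]
means = [(a + b) / 2 for a, b in zip(rep1, rep2)]
ic50 = ic50_log_interpolation(concs, means)
```

      A nonlinear four-parameter logistic fit would normally replace the interpolation step when confidence intervals on the IC<sub>50</sub> are needed.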

      We agree that B2 and B5 performed comparably well in neutralization assays, but B2 contains two mutations (T27Y/T92Q) while B5 carries a single mutation (T92Q). Hence, we decided to focus on B5 due to its lowest mutational burden and least potential risk.

      We agree that for VOC/VOI strains, B5 was superior to B5-D3 in pseudovirus-neutralization assays. However, B5-D3 was enzymatically inactive, which is essential for generating a safe ACE2 decoy and therefore justifies our use of B5-D3 over B5.

      We agree with the reviewer that, taken together, B5-D3 did not show significant advantages over other top performers such as B5-D4 and B5-D5. B5-D3 was selected as a representative that performed as well as the other top candidates, rather than as the single most outstanding one, for subsequent examination of efficacy, safety, and mechanistic insights.

      We thank the reviewer for his valuable feedback. In the revised manuscript, we have further amended our description of B5-D3, as a “representative” candidate, to improve the readability. Relevant changes can be found on page 4 line 84, page 5 line 109, page 14 line 333 and page 15 line 360.

      (3) Figure 1e does not specify the construct form of the control hIgG1, namely whether it is an hIgG1 Fc fragment or a full-length hIgG1 protein. If the full-length form is used, the design of its Fab region should be clarified to ensure the accuracy and comparability of the experimental control.

      We thank the reviewer for pointing out this oversight.

      In this study, we used the in vivo grade recombinant human IgG1 isotype control antibody in its full length (Syd labs, #PA007125) as the negative control. It is the 4F17 clone, which is widely used and showed low or no specific binding to any human samples [10] (Human IgG1 Isotype Control Antibody | Recombinant, in vivo Grade - Syd Labs). We have added the relevant information in the MATERIALS AND METHODS on page 23 lines 548–549.

      (4) In Figure 2a, all three PBS control mice died, whereas in Figure 2f, three out of five PBS control mice died, with the remaining showing gradual weight recovery. This discrepancy may reflect individual immune variations within the control groups, and it is necessary to clarify whether potential autoimmune factors could have affected the comparability of the results. Also, the mouse experiments suffer from insufficient sample sizes, which affects the statistical power and reliability of the results. In Figure 2a, each group contains only 4 replicates, one of which was used for lung tissue sampling. As a result, body weight monitoring data is derived from only 3 mice per group (the figure legend indicating n=4 should be corrected to n=3). Such a small sample size limits the robustness of the conclusions. Similarly, in Figure 2f, although each group has 5 replicates, body weight data are presented for only 4 mice, with no explanation provided for the exclusion of the fifth mouse. Furthermore, the lung tissue experiments in Figure 3a include only 3 replicates, which is also inadequate.

      We thank the reviewer for his valuable feedback.

      Figure 2a was the first in vivo infection experiment of this study, and we performed the test in aged female K18-hACE2 mice at 10–12 months old. For the subsequent experiments in Figure 2f and Figure 3, we changed to young female K18-hACE2 mice at 2–3 months old because of the limited supply of aged mice. In Figure 2a, all four aged mice (not three) in the PBS control group died within 7 dpi, whereas the results of Figure 2f and Figure 3 consistently showed heterogeneous responses among young mice in the PBS control groups. Since increased susceptibility to SARS-CoV-2 infection has been broadly observed among aged human populations and is also supported by a mouse study [11], we attribute the observed discrepancy to the age difference between the two cohorts in Figure 2a and 2f. In the revised manuscript, we have further elucidated this observation in the results (on page 7 lines 163–167) and included a new reference for better clarification (page 7 line 167).

      Furthermore, the PBS control mice that died in both Figure 2a and 2f did so within 7 dpi, which was too soon for autoimmune factors to take effect. Moreover, we have performed AAV-based prolonged overexpression experiments in K18-hACE2 mice (new Supplementary Fig. 4), which showed no tissue damage in either WT sACE2-Fc or B5-D3 treated mice, suggesting low immunogenicity. Collectively, autoimmune factors are unlikely to explain the different survival between PBS controls in Figure 2a and 2f.

      We thank the reviewer for pointing out the weakness regarding small sample sizes in our study.

      (1) In Figure 2a–c, the experiment was performed in an aged cohort at 10–12 months old, starting with 5 mice in each virus-inoculated group and 4 mice in the mock control group. At 4 dpi, we sacrificed one mouse from each group for tissue analysis. Therefore, in the survival analysis, there were 4 mice in each virus-inoculated group and 3 mice in the mock control group, whose survival and body weight changes were presented in Figure 2b, c.

      Despite the relatively small sample sizes in Figure 2b, c, all 4 PBS control mice died, while all 4 mice in 6-hour B5-D3 IN prophylaxis group survived, demonstrating 100% survival and no sign of body weight loss. The survival and body weight data were highly consistent, strongly supporting that B5-D3 intranasal prophylaxis could protect the mice from lethal SARS-CoV-2 infection.

      To enhance clarity, in the revised manuscript, we have added the sample size information in chart legends in Figure 2a–c.

      (2) In Figure 2f–h, the experiment was performed in a young cohort at 2–3 months old, and the body weight and survival data were presented for 5 mice in each group (not for 4 mice). Notably, although 2 out of 5 young mice in the PBS control group eventually survived the viral infection, they suffered significant weight loss during 4–7 dpi, similar to the mice that died. In contrast, all 5 mice in the -6 h B5-D3 IN prophylaxis group showed no sign of weight loss. Hence, these data were highly consistent with Figure 2b, c, supporting the efficacy of B5-D3 IN prophylaxis in protecting against SARS-CoV-2 infection.

      We noticed that some data points in Figure 2g, h were very close to each other, making it difficult to distinguish the data line for individual mice. To enhance clarity, in the revised manuscript, we have added sample-size information in chart legends in Figure 2g and 2h.

      (3) In Figure 3a, we aimed to examine the lung tissues at early time points. For each treatment, 3 mice were sacrificed at each selected time point. Hence, a total of 9 mice were examined in the PBS control group and in the B5-D3 IN group, yielding results at 1 dpi, 2 dpi and 4 dpi that consistently supported each other. Moreover, the viral titers and the S and N expression analyses all showed significant differences among the groups. Therefore, the differences between treatment groups were large enough to support the conclusion.
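
      To put the survival counts from point (1) in perspective, the sketch below computes a two-sided Fisher's exact test for the aged cohort (0 of 4 PBS mice surviving versus 4 of 4 in the 6-hour B5-D3 IN prophylaxis group). This is an illustrative calculation on the counts stated above, not an analysis reported in the manuscript; the helper function is hypothetical and uses only the Python standard library.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

# Rows: PBS control, B5-D3 IN (-6 h); columns: survived, died
p_value = fisher_exact_two_sided(0, 4, 4, 0)
print(f"two-sided p = {p_value:.4f}")
```

      With only four animals per group, the most extreme possible table gives p = 2/70, so this is the smallest p-value such a design can produce.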

      (5) Compared to 6 hours, intranasal administration of B5-D3 at 24 hours before viral infection results in reduced protective efficacy. However, only survival and body weight data are provided, with no supporting evidence from virological assays such as viral titer measurement. Therefore, the long-term effectiveness lacks sufficient experimental validation.

      In Figure 2f–h, we aimed to compare the efficacies of IN administration of B5-D3 at different time points, focusing mainly on body weight change and survival over the course of infection and recovery. As indicated by the earlier data in Figure 2d, viruses were largely cleared by 4 dpi in mice treated with B5-D3 prophylaxis. Therefore, in this test, we did not examine virus titers in the recovered animals at the end of observation at 14 dpi. Instead, we examined plasma levels of virus-neutralizing antibodies in the survivors at the endpoint, which indeed supported that the 6-hour and 24-hour IN B5-D3 prophylaxis provided effective protection against SARS-CoV-2 infection and resulted in minimal levels of neutralizing antibodies in plasma, as shown in Figure 2i.

      Collectively, the body weight, survival, and antibody data all supported that 6-hour IN B5-D3 prophylaxis achieved the best efficacy. Hence, we performed comprehensive viral titer and profiling analyses at early time points (1 dpi, 2 dpi, and 4 dpi), focusing only on the 6-hour IN B5-D3 prophylaxis. This work also included a B5-D3-LALA control to examine viral titers, host immune responses, and underlying mechanisms (Figures 3 and 4).

      We agree with the reviewer that it would be more comprehensive if our experiments included in-depth analysis of the 24-hour IN B5-D3 prophylaxis group. However, due to the limited capacity of our animal service, we chose to focus on the best-performing group as a representative treatment to study the underlying mechanisms.

      (6) In Figures 3b and 3c, viral spike (S) and nucleocapsid (N) RNA relative expression levels are quantified by qPCR. The results show significant individual variation within the B5-D3-LALA treatment group: one mouse exhibits high S and N expression, while the other two show low expression. Viral load levels are also inconsistent: two mice have high viral loads, and one has a low viral load. Due to this variability, the available data are insufficient to robustly support the conclusion.

      We understand the reviewer’s concern on the variability within the B5-D3-LALA group. However, we have some reservations about the importance of further increasing the sample sizes in this test.

      First, since viral gene transcription and viral particle levels represent different phases of the viral life cycle, they may follow different kinetics during infection progression and lead to variability. Second, we used different parts of the lung tissue from each mouse for extracting RNA and preparing tissue homogenates, which were then used for detection of S/N expression and viral load levels, respectively. Uneven viral infection in the lung might also contribute to the variability. Furthermore, in this test, both our qPCR and viral load data consistently demonstrated that B5-D3-LALA was less effective than B5-D3, indicating that Fc function played an important role in supporting full protection by B5-D3 against lethal SARS-CoV-2 infection. This observation is also supported by other studies [12].

      We appreciate the valuable feedback from the reviewer. In the revised manuscript, we have further clarified these observations on page 8, lines 192–194, and included alveolar thickening data on page 9, lines 202–204.

      (7) Figure 3e: "H&E staining indicated alveolar thickening in all groups," including the Mock group. Since the Mock group did not receive virus or active drug treatment, this observed change may result from local tissue reaction induced by the intranasal inoculation procedure itself, rather than specific immune activation. A control group (no manipulation) should be set to rule out potential confounding effects of the experimental procedure on tissue morphology, thereby allowing a more accurate assessment of the drug's effects.

      We thank the reviewer for his insightful comments and suggestions.

      We have further examined our H&E staining and quantified alveolar thickening in the different treatment groups. Indeed, the data suggested a transient alveolar thickening in the mock group at 1 dpi, which had improved by 2 dpi. This observation supports the view that the intranasal procedure itself caused a transient alveolar thickening that was evident at 1 dpi but had disappeared by 2 dpi.

      Notably, moderate alveolar thickening persisted in the B5-D3-treated mice until the end point at 4 dpi. In contrast, the PBS groups with intensive SARS-CoV-2 infection progressively developed severe structural damage and showed much stronger alveolar thickening than the B5-D3 or mock groups at 4 dpi. Consistent with the partial protection by B5-D3-LALA, histological analysis of lung samples in this group revealed more severe yet heterogeneous alveolar thickening. These observations suggested that -6h IN B5-D3 treatment prevented infection-induced tissue damage with minimal yet effective immune activation.

      In the revised manuscript, we have included the quantitation results of alveolar thickening on page 9, lines 200–204 and presented the data in new Supplementary Fig. 7.

      (8) In Supplementary Figure 11b, a considerable number of alveolar macrophages (AMs) are observed in both the PBS and B5-D3 groups. This makes it difficult to determine whether the observed accumulation is specifically induced by B5-D3.

      We thank the reviewer for pointing out this issue.

      In this experiment, the cell populations examined in previous Supplementary Fig. 11b and Fig. 5h are different, though graphs appear similar.

      Supplementary Fig. 11b (new Supplementary Fig. 12b) showed the analysis among CD45+ immune cells, regardless of B5-D3-AF750 signal. The dominance of AMs among immune cell populations is a normal physiological feature of BALF cells. To make this clear, we have added new data of BALF cells from untreated mice in the revised manuscript and new Supplementary Fig. 12b.

      Fig. 5h displayed the cell-type analysis among the CD45+ B5-D3-AF750+ cells, that is, only the CD45+ immune cells that took up the AF750-labeled B5-D3.

      To enhance clarity, in the revised manuscript, we have amended the labels as CD45+ B5-D3-AF750+ in Figure 5h (and similarly in revised Supplementary Fig. 13), to differentiate the data from that in CD45+ cells shown in the revised Supplementary Fig. 12b.

      (9) In the flow cytometry experiment shown in Figure 5, the PBS control group is not labeled with AF750, which necessarily results in a value of zero for "B5-D3+ cells" on the y-axis. An appropriate control (e.g., hIgG1-Fc labeled with AF750) should be included.

      We thank the reviewer for his valuable question.

      In this experiment, we intended to analyze all immune cells with positive AF750 signals, to identify the major immune cell types that took up AF750-B5-D3 as the candidate cells responsible for the observed activation of innate immunity. Hence, here we deliberately set PBS vehicle treatment without AF750 signal as the control group for gating.

      This analysis aimed to provide an overall picture of the immune cell types that actively take up the ACE2 decoy, likely via Fc receptor-mediated binding. Control IgG1 labeled with AF750, which also carries an Fc region, may show a similar profile and biodistribution among BALF immune cells and was therefore not examined as a gating control.

      Instead, in the revised manuscript, we have added new analysis results comparing the efficiencies of B5-D3 and IgG1 in mediating pseudovirus uptake in THP-1-derived macrophages. IgG1 isotype control was examined to address ACE2-specific effect. Indeed, we observed no pseudovirus uptake based on p24 signal, in the IgG1 treated samples, indicating that the presence of B5-D3 is crucial for efficient pseudovirus uptake in macrophages due to the sACE2-spike affinity. These results have been added on page 13 lines 310–316 in the main text, and the relevant data was presented in new Supplementary Fig. 17.

      (10) The Methods section: a more detailed description of the experimental procedures involving HIV p24 and SARS-CoV-2 should be included.

      We thank the reviewer for pointing out this weakness.

      In the revised manuscript, we have provided further details of the relevant experimental procedures in the Materials and Methods part, on page 21, lines 507–517.

      Reviewer #3 (Public review):

      Strengths:

      The core strength of this study lies in its innovative demonstration that an engineered sACE2-Fc fusion redirects virus-decoy complexes to Fc-mediated phagocytosis and lysosomal clearance in macrophages, revealing a distinct antiviral mechanism beyond traditional neutralization. Its complete prophylactic protection in animal models and precise targeting of airway phagocytes establish a novel therapeutic paradigm against SARS-CoV-2 variants and future respiratory viruses.

      We thank the reviewer for his recognition and positive comments on our study.

      Weaknesses:

      The study attributes complete antiviral protection to Fc-mediated phagocytic clearance, a central claim that requires more rigorous experimental validation. The observation that abrogating Fc functions compromises protection could be confounded by potential alterations in the protein's stability, half-life, or overall structure. To firmly establish this mechanism, it is crucial to include a control molecule with a mutated Fc region that lacks FcγR binding while preserving the Fc structure itself. Without this critical control, the conclusion that phagocytic clearance is the primary mechanism remains inadequately supported.

      We thank the reviewer for his insightful comments and suggestions.

      The L234A/L235A mutations in human IgG1 Fc region are most widely used to abolish its FcγR binding and Fc effector functions [13]. In this study, we have used B5-D3-LALA in the in vivo infection experiments in K18-hACE2 mice, as the control molecule that lacks FcγR binding while preserving the Fc structure (Figure 3, 4).

      To address the reviewer’s concern, we further performed a new analysis comparing the efficiencies of different versions of B5-D3 in mediating pseudovirus uptake in THP-1-derived macrophages. In this test, B5-D3-LALA and B5-D3 were examined side by side to address the role of Fc effector functions in the phagocytosis process. Meanwhile, an IgG1 isotype control was examined to address the ACE2-specific effect. Indeed, we detected a significant reduction of pseudovirus uptake, based on p24 signal, in the B5-D3-LALA treated samples compared to those receiving B5-D3. This decreased pseudoviral uptake correlated with the loss of Fc-mediated effector functions in B5-D3-LALA, indicating the involvement of Fc functions in efficient macrophage uptake of the B5-D3-virus complex.

      In the revised manuscript, we have included these results on page 13 lines 310–316 in the main text and presented relevant data in Supplementary Fig. 17.

      The strategy of deliberately targeting virus-decoy complexes to phagocytes via Fc receptors inherently raises the question of Antibody-Dependent Enhancement (ADE) of disease. While the authors demonstrate a lack of productive infection in macrophages, this only addresses one facet of ADE. The risk of Fc-mediated exacerbation of inflammation (ADE) remains a critical concern. The manuscript would be significantly strengthened by a direct discussion of this risk and by including data, such as cytokine profiling from treated macrophages, to more comprehensively address the safety profile of this approach.

      (1) We thank the reviewer for his insightful comments and suggestions regarding the ADE issue.

      Indeed, Antibody-Dependent Enhancement (ADE) of viral infection is a critical concern when developing the ACE2 decoy strategy. In this study, we have carefully examined the relevant risk based on our data from various in vitro and in vivo assays.

      In our in vivo infection experiments, all B5-D3 prophylaxis and treatment groups, regardless of the administration times and routes, showed improved outcomes such as less body-weight loss and better survival compared to the PBS control groups (Figure 2). None of these treatment groups demonstrated worsened infections, indicating that ADE either did not occur or did not play a major role during the B5-D3 treatments. Instead, moderate immune activation was observed in the lungs of B5-D3 treated mice, which occurred much earlier but was milder than that in the PBS groups, and may reflect responses that lead to the efficient early clearance of viruses without observable symptoms (Figures 3 and 4).

      In our in vitro assays shown in Figure 6, B5-D3 treatment in epithelial or non-immune cell models (hACE2-Calu-3 and hACE2-293T) significantly blocked pseudovirus entry into cells and yielded much lower luciferase signals (Figure 6d–g). In the THP-1-derived macrophages, by contrast, although the presence of B5-D3 greatly enhanced the entry of SARS-CoV-2 pseudovirus into cells (Figure 6a,b), it did not result in active infection and produced no luciferase signal (Figure 6g). These results were robustly reproducible, indicating that the pseudoviruses did not successfully release their genomic RNA and viral proteins (such as reverse transcriptase and integrase) after entering macrophages. Instead, colocalization analysis of p24 (pseudovirus), sACE2-Fc (B5-D3), and LAMP1 (lysosome) signals suggested that the pseudoviruses were likely degraded in endosomes/lysosomes after cell entry (Figure 6a,c). Consistently, examination of the macrophages that had taken up pseudovirus showed that the Spike (S) proteins from the pseudovirus particles were not cleaved to release the S2’ fragment, which would appear at a distinct smaller size (Figure 6h). Because cleavage of the S protein in host cells is critical for effective membrane fusion, it is regarded as a hallmark of successful viral entry and escape from the endosome. Collectively, these data consistently indicated that the SARS-CoV-2 pseudoviruses were degraded directly in lysosomes after entering macrophages, showing no sign of ADE.

      (2) We thank the reviewer for his valuable suggestion and have performed RNA-seq analysis to profile immune responses in the treated macrophages.

      We performed RNA-seq analysis to investigate major transcriptional changes in THP-1-derived macrophages after pseudovirus infection, with or without B5-D3 treatment. Although no individual genes met the cutoff threshold for significant up- or down-regulation, we observed antiviral responses in the pseudovirus-B5-D3 treated samples by GSEA (new Supplementary Fig. 18). This observation indicated that the B5-D3 treatment and the subsequent entry of pseudovirus-B5-D3 complexes into macrophages induced immune activation at moderate levels, without evoking strong immune responses that could be harmful to the host.

      In the revised manuscript, we have included the new RNA-seq analysis results on macrophage infection tests on page 13 lines 317–322 and page 14 lines 323–325 in the main text and presented the relevant data in the new Supplementary Fig. 18. Furthermore, we agree that ADE is a critical issue and have further enriched our discussion on page 17 lines 415–417, to emphasize that the risk for ADE should be thoroughly evaluated to further develop the decoy strategy for human use.

      The exclusive use of the K18-hACE2 mouse model, which exhibits severe disease, limits the generalizability of the findings. The "complete protection" observed may not translate to models with more robust and naturalistic immune responses or to human physiology.

      We thank the reviewer for pointing out the limitation of the mouse model used.

      (1) Given that wild type mice are not susceptible to SARS and SARS-CoV-2 infection, transgenic mice have been generated to express hACE2, through various designs and strategies, serving as models for viral infection and drug development. However, many of these hACE2 transgenic mouse models exhibit mild infections due to moderate hACE2 levels, failing to develop the severity observed in SARS and COVID patients [14].

      (2) The K18-hACE2 transgenic mouse line (B6.Cg-Tg(K18-ACE2)2Prlmn/J, Jackson Laboratory) used in our study carries multiple copies of the K18-hACE2 transgene cassette [15]. Compared to other hACE2 transgenic mouse models, this K18-hACE2 line shows higher expression of hACE2 in airway and other epithelia and supports more severe infections by both SARS and SARS-CoV-2 viruses, successfully causing lethality [16]. Hence, K18-hACE2 mice are a widely used model for studying SARS and SARS-CoV-2 infections and drug development.

      (3) We agree that K18-hACE2 mice are a relatively weak transgenic line with poor productivity. However, this line demonstrates the best susceptibility to SARS-CoV-2 infection among established mouse models. In this study, we observed robust responses to SARS-CoV-2 infection in both aged and young cohorts, with all infected mice consistently demonstrating significant body weight loss from 4 dpi to 7 dpi (the PBS groups in Figure 2b, g).

      We agree with the reviewer that it would be more convincing to assess the efficacy of B5-D3 using additional animal models. However, we have some reservations about the importance of these additional tests. First, the generality of the ACE2-Fc decoy concept and its efficacy have been reported in other studies using various models [17,18]. Moreover, different transgenic mice or animal models exhibit distinct kinetics in pathogenesis and in immune responses to SARS-CoV-2 infection, and each differs from human patients in various respects. Hence, given the limited capacity of our animal facility, we chose to focus on the K18-hACE2 mice, which have demonstrated the most robust and convincing infection data, to investigate the potential of B5-D3 administered through various strategies as well as the mechanisms underlying the full protection observed in IN prophylaxis.

      In the revised manuscript, we have further enriched our discussion regarding this limitation, on page 17 lines 417–422.

      Furthermore, the lack of data on circulating SARS-CoV-2 variants is a concern

      We thank the reviewer for his valuable comment.

      In this study, we have demonstrated the viral neutralization capacity of B5-D3, as a representative of the enhanced sACE2 decoy, using multiple pseudoviruses and authentic SARS-CoV-2, which collectively covered eleven variants (up to Omicron strains). Our results from both in vitro neutralization and PRNT experiments confirmed the robust resilience of B5-D3 against viral evolution (Figure 1c–g). This observation aligns well with other studies and is broadly supported by various investigations, as was pointed out below by the reviewer.

      Furthermore, studies on viral evolution have observed a robust trend that later-emerging SARS-CoV-2 variants exhibit a higher affinity for the ACE2 receptor, enhancing their infectivity and transmissibility [19]. Therefore, a newly emerged SARS-CoV-2 variant is unlikely to escape B5-D3-mediated neutralization.

      Collectively, all evidence consistently supports the principle of decoy design: B5-D3 (or other effective ACE2 decoys) possesses the intrinsic ability to neutralize new circulating SARS-CoV-2 variants, as long as the variants rely on the ACE2 receptor for cell entry. Hence, although further tests on circulating viral variants would strengthen our study, the significance of these additional data may be limited.

      In the revised manuscript, we have further addressed this concern in the discussion, on page 16 lines 394–397.

      The concept of sACE2-Fc fusion proteins as decoy receptors is not novel, and numerous similar constructs have been previously reported. The manuscript would benefit from a clearer demonstration of how the optimized B5-D3 mutant represents a significant advance over existing sACE2-Fc designs.

      We thank the reviewer for his valuable comments.

      Indeed, previous research has reported multiple ACE2 mutations to enhance its binding to spike proteins and neutralization against SARS-CoV-2. However, combining ACE2 mutations based on in silico predictions to both enhance spike binding and eliminate the ACE2 enzymatic activity resulted in accumulated burdens. For instance, ACE2 decoy candidates with up to five mutations like K31F/N33D/H34S/E35Q/H345L [8] and L79F/M82Y/Q325Y/H374A/H378A [12] have demonstrated excellent potency to neutralize SARS-CoV-2 in both in vitro and in vivo assays. However, the extensive mutations could be associated with structural instability and reduced production efficiency [8,12]. Furthermore, the high mutation loads increase risks for immunogenicity, which is a critical issue in future clinical applications. Corroboratively, Urano et al. detected in vitro T cell stimulation elicited by the L79F mutation, whereas the T92Q mutation (included in our decoy design) showed much lower immunogenicity and enhanced spike binding affinity [20].

      In our ACE2 decoy design, we incorporated only two mutations (such as T92Q and H374N in B5-D3) to enhance neutralization potency while eliminating enzymatic activity, resulting in the simplest ACE2 mutants suitable for engineering an enhanced decoy. B5-D3, as one representative, exhibited not only minimal mutation-related risks (Supplementary Fig. 2i) but also top-level neutralization potencies among all candidate mutants tested (Figure 1, Supplementary Fig. 2f,g and Supplementary Fig. 3). To further address the safety of B5-D3 for in vivo use, we have performed prolonged in vivo overexpression of the B5-D3 ACE2 decoy through AAV delivery in immune-competent K18-hACE2 mice, which indeed showed no sign of RAS disturbance or of immune infiltration causing tissue damage. (In the revised manuscript, we have included these new results on page 5 lines 118–122 and page 6 lines 123–135 in the main text and presented the data in new Supplementary Fig. 4.)

      Therefore, instead of demonstrating an advantage over existing sACE2-Fc designs, our study used the optimized B5-D3 as a representative of top-performing ACE2 decoys to systematically examine various administration strategies as well as the mechanisms underlying the full protection observed in IN prophylaxis. In line with this effort, our study identified 6-hour IN prophylaxis as the most effective regimen to confer complete protection against SARS-CoV-2 infection in K18-hACE2 mice. Further investigation through transcriptomics, biodistribution, and phagocytosis analysis revealed that IN-delivered B5-D3 not only neutralized viruses but also engaged airway phagocytes to promote early viral clearance and host immune activation, uncovering a distinct antiviral mechanism for the universal “decoy strategy” to combat unknown airborne respiratory viruses in the future.

      In the revised manuscript, we have further clarified our focus on using B5-D3 as a “representative” of ACE2 decoy on page 4 line 84, page 5 line 109, page 14 line 333, and page 15 line 360.

      A direct comparative analysis with previously published benchmarks, particularly in terms of neutralizing potency, Fc effector function strength, and in vivo efficacy, is necessary to establish the incremental value and novelty of this specific agent.

      We thank the reviewer for his valuable comments.

      Indeed, our study has aimed to address this concern and made partial progress through in vitro neutralization assays (Figure 1b and Supplementary Fig. 2c,d,f,g). Our results from the limited yet meaningful comparisons with sACE2 lacking the Fc domain and with selected sACE2-Fc mutants published or proposed previously clearly demonstrated “substantial enhancement through Fc-fusion” (Supplementary Fig. 1d) and “modest improvement from protein mutagenesis at the ACE2-Spike interaction interface” (Figure 1b and Supplementary Fig. 2c,d,f,g).

      Based on the results from our various neutralization assays, we chose B5-D3 as a representative enhanced decoy for the in vivo infection experiments, which identified 6-hour IN prophylaxis as conferring complete protection against infection and demonstrated the significant impact of administration strategy on the in vivo efficacy of B5-D3 (Figure 2). Subsequent analysis further uncovered intriguing phenomena regarding the cellular distribution of IN-administered B5-D3 and the early immune activation triggered in the lung, which underlie the full protection by IN prophylaxis and represent an important novelty of this study.

      We agree with the reviewer that further analysis with additional benchmark versions would enhance the value of this study, but we have reservations regarding its importance. To enhance clarity, in the revised manuscript, we have further emphasized our focus on using B5-D3 as a representative ACE2 decoy throughout the text and enriched the discussion on page 15 lines 348–365.

      References

      (1) Ku Z, Xie X, Hinton PR, Liu X, Ye X, Muruato AE, Ng DC, Biswas S, Zou J, Liu Y, Pandya D, Menachery VD, Rahman S, Cao Y-A, Deng H, Xiong W, Carlin KB, Liu J, Su H, Haanes EJ, Keyt BA, Zhang N, Carroll SF, Shi P-Y & An Z. Nasal delivery of an IgM offers broad protection from SARS-CoV-2 variants. Nature 595, 718-723 (2021).

      (2) Liu J, Mao F, Chen J, Lu S, Qi Y, Sun Y, Fang L, Yeung ML, Liu C, Yu G, Li G, Liu X, Yao Y, Huang P, Hao D, Liu Z, Ding Y, Liu H, Yang F, Chen P, Sa R, Sheng Y, Tian X, Peng R, Li X, Luo J, Cheng Y, Zheng Y, Lin Y, Song R, Jin R, Huang B, Choe H, Farzan M, Yuen KY, Tan W, Peng X, Sui J & Li W. An IgM-like inhalable ACE2 fusion protein broadly neutralizes SARS-CoV-2 variants. Nat Commun 14, 5191 (2023).

      (3) Guo H, Cho B, Hinton PR, He S, Yu Y, Ramesh AK, Sivaccumar JP, Ku Z, Campo K, Holland S, Sachdeva S, Mensch C, Dawod M, Whitaker A, Eisenhauer P, Falcone A, Honce R, Botten JW, Carroll SF, Keyt BA, Womack AW, Strohl WR, Xu K, Zhang N, An Z, Ha S, Shiver JW & Fu T-M. An ACE2 decamer viral trap as a durable intervention solution for current and future SARS-CoV. Emerging Microbes & Infections 12, 2275598 (2023).

      (4) Keyt BA, Baliga R, Sinclair AM, Carroll SF & Peterson MS. Structure, Function, and Therapeutic Use of IgM Antibodies. Antibodies 9, 53 (2020).

      (5) Chan KK, Dorosky D, Sharma P, Abbasi SA, Dye JM, Kranz DM, Herbert AS & Procko E. Engineering human ACE2 to optimize binding to the spike protein of SARS coronavirus 2. Science 369, 1261-1265 (2020).

      (6) Guy JL, Jackson RM, Jensen HA, Hooper NM & Turner AJ. Identification of critical active-site residues in angiotensin-converting enzyme-2 (ACE2) by site-directed mutagenesis. The FEBS Journal 272, 3512-3520 (2005).

      (7) Payandeh Z, Rahbar MR, Jahangiri A, Hashemi ZS, Zakeri A, Jafarisani M, Rasaee MJ & Khalili S. Design of an engineered ACE2 as a novel therapeutics against COVID-19. Journal of Theoretical Biology 505, 110425 (2020).

      (8) Glasgow A, Glasgow J, Limonta D, Solomon P, Lui I, Zhang Y, Nix MA, Rettko NJ, Zha S, Yamin R, Kao K, Rosenberg OS, Ravetch JV, Wiita AP, Leung KK, Lim SA, Zhou XX, Hobman TC, Kortemme T & Wells JA. Engineered ACE2 receptor traps potently neutralize SARS-CoV-2. Proceedings of the National Academy of Sciences 117, 28046-28055 (2020).

      (9) Lei C, Qian K, Li T, Zhang S, Fu W, Ding M & Hu S. Neutralization of SARS-CoV-2 spike pseudotyped virus by recombinant ACE2-Ig. Nature Communications 11, 2070 (2020).

      (10) Maciuba S, Bowden GD, Stratton HJ, Wisniewski K, Schteingart CD, Almagro JC, Valadon P, Lowitz J, Glaser SM, Lee G, Dolatyari M, Navratilova E, Porreca F & Riviere PJM. Discovery and characterization of prolactin neutralizing monoclonal antibodies for the treatment of female-prevalent pain disorders. MAbs 15, 2254676 (2023).

      (11) Dwivedi V, Shivanna V, Gautam S, Delgado J, Hicks A, Argonza M, Meredith R, Turner J, Martinez-Sobrido L, Torrelles JB & Kulkarni V. Age associated susceptibility to SARS-CoV-2 infection in the K18-hACE2 transgenic mouse model. Geroscience 46, 2901-2913 (2024).

      (12) Chen Y, Sun L, Ullah I, Beaudoin-Bussières G, Anand SP, Hederman AP, Tolbert WD, Sherburn R, Nguyen DN, Marchitto L, Ding S, Wu D, Luo Y, Gottumukkala S, Moran S, Kumar P, Piszczek G, Mothes W, Ackerman ME, Finzi A, Uchil PD, Gonzalez FJ & Pazgier M. Engineered ACE2-Fc counters murine lethal SARS-CoV-2 infection through direct neutralization and Fc-effector activities. Science Advances 8, eabn4188 (2022).

      (13) Lund J, Winter G, Jones PT, Pound JD, Tanaka T, Walker MR, Artymiuk PJ, Arata Y, Burton DR, Jefferis R & Woof JM. Human Fc gamma RI and Fc gamma RII interact with distinct but overlapping sites on human IgG. The Journal of Immunology 147, 2657-2662 (1991).

      (14) Lutz C, Maher L, Lee C & Kang W. COVID-19 preclinical models: human angiotensin-converting enzyme 2 transgenic mice. Hum Genomics 14, 20 (2020).

      (15) McCray PB, Pewe L, Wohlford-Lenane C, Hickey M, Manzel L, Shi L, Netland J, Jia HP, Halabi C, Sigmund CD, Meyerholz DK, Kirby P, Look DC & Perlman S. Lethal Infection of K18-hACE2 Mice Infected with Severe Acute Respiratory Syndrome Coronavirus. Journal of Virology 81, 813-821 (2007).

      (16) Oladunni FS, Park JG, Pino PA, Gonzalez O, Akhter A, Allue-Guardia A, Olmo-Fontanez A, Gautam S, Garcia-Vilanova A, Ye C, Chiem K, Headley C, Dwivedi V, Parodi LM, Alfson KJ, Staples HM, Schami A, Garcia JI, Whigham A, Platt RN, 2nd, Gazi M, Martinez J, Chuba C, Earley S, Rodriguez OH, Mdaki SD, Kavelish KN, Escalona R, Hallam CRA, Christie C, Patterson JL, Anderson TJC, Carrion R, Jr., Dick EJ, Jr., Hall-Ursone S, Schlesinger LS, Alvarez X, Kaushal D, Giavedoni LD, Turner J, Martinez-Sobrido L & Torrelles JB. Lethality of SARS-CoV-2 infection in K18 human angiotensin-converting enzyme 2 transgenic mice. Nat Commun 11, 6122 (2020).

      (17) Urano E, Itoh Y, Suzuki T, Sasaki T, Kishikawa JI, Akamatsu K, Higuchi Y, Sakai Y, Okamura T, Mitoma S, Sugihara F, Takada A, Kimura M, Nakao S, Hirose M, Sasaki T, Koketsu R, Tsuji S, Yanagida S, Shioda T, Hara E, Matoba S, Matsuura Y, Kanda Y, Arase H, Okada M, Takagi J, Kato T, Hoshino A, Yasutomi Y, Saito A & Okamoto T. An inhaled ACE2 decoy confers protection against SARS-CoV-2 infection in preclinical models. Sci Transl Med 15, eadi2623 (2023).

      (18) Higuchi Y, Suzuki T, Arimori T, Ikemura N, Mihara E, Kirita Y, Ohgitani E, Mazda O, Motooka D, Nakamura S, Sakai Y, Itoh Y, Sugihara F, Matsuura Y, Matoba S, Okamoto T, Takagi J & Hoshino A. Engineered ACE2 receptor therapy overcomes mutational escape of SARS-CoV-2. Nature Communications 12, 3802 (2021).

      (19) Cho MJ, Been NR & Son H. From Alpha to Omicron: Structural Insights into SARS-CoV-2 RBD Evolution and ACE2 Binding. European Journal of Public Health 35 (2025).

      (20) Urano E, Itoh Y, Suzuki T, Sasaki T, Kishikawa J-i, Akamatsu K, Higuchi Y, Sakai Y, Okamura T, Mitoma S, Sugihara F, Takada A, Kimura M, Nakao S, Hirose M, Sasaki T, Koketsu R, Tsuji S, Yanagida S, Shioda T, Hara E, Matoba S, Matsuura Y, Kanda Y, Arase H, Okada M, Takagi J, Kato T, Hoshino A, Yasutomi Y, Saito A & Okamoto T. An inhaled ACE2 decoy confers protection against SARS-CoV-2 infection in preclinical models. Science Translational Medicine 15, eadi2623 (2023).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study by Lemen et al. represents a comprehensive and unique analysis of gene networks in rat models of opioid use disorder, using multiple strains and both sexes. It provides a time-series analysis of Quantitative Trait Loci (QTLs) in response to morphine exposure.

      Strengths:

      A key finding is the identification of a previously unknown morphine-sensitive pathway involving Oprm1 and Fgf12, which activates a cascade through MAPK kinases in D1 medium spiny neurons (MSNs). Strengths include the large-scale, multi-strain, sex-inclusive design; the time-series QTL mapping, which provides dynamic insights; and the discovery of an Oprm1-Fgf12-MAPK signaling pathway in D1 MSNs, which is novel and relevant.

      Weaknesses:

      (1) The proposed involvement of Nav1.2 (SCN2A) as a downstream target of the Oprm1-Fgf12 pathway requires further analysis/evidence. Is Nav1.2 (SCN2A) expressed in D1 neurons?

      The authors mentioned that SCN8A (Nav1.6) was tested as a candidate mediator of Oprm1-Fgf12 loci and variation in locomotor activity. However, the proposed model supports SCN2A as a target rather than SCN8A. This is somewhat unexpected since SCN8A is highly abundant in MSN.

      Can the authors provide expression data for SCN2A, Oprm1, and Fgf12 in D1 vs. D2 MSNs?

      Author response image 1.

      We generated Author response image 1 to show both Scn2a and Scn8a are ubiquitously expressed in MSN and GABAergic neurons.

      (2) The authors should consider adding a reference to FGF12 in Schizophrenia (PMC8027596) in the Introduction.

      This is a relevant reference. We have cited it in the Discussion rather than the Introduction because we felt it is more relevant there.

      (3) There is recent evidence supporting the druggability of other intracellular FGFs, such as FGF14 (PMC11696184) and FGF13 (PMC12259270), through their interactions with Nav channels. What are the implications of these findings for drug discovery in the context of the present study? Could FGF12 be considered a potential druggable therapeutic target for opioid use disorder (OUD)?

      The recent success in targeting FGF14 and FGF13 protein-protein interactions with sodium channels suggests that FGF12 could indeed be a druggable target for OUD. We have added a section to the Discussion exploring the potential for developing small-molecule modulators of the FGF12-Nav interface as a novel therapeutic strategy.

      Reviewer #2 (Public review):

      Summary:

      This highly novel and significant manuscript re-analyzes behavioral QTL data derived from morphine locomotor activity in the BXD recombinant inbred panel. The combination of interacting behavioral-pharmacology (morphine and naltrexone) time course data, high-resolution mouse genetic analyses, genetic analysis of gene expression (eQTLs), cross-species analysis with human gene expression and genetic data, and molecular modeling approaches with Bayesian network analysis produces new information on loci modulating morphine locomotor activity.

      Furthermore, the identification of time-wise epistatic interactions between the Oprm1 and Fgf12 loci is highly novel and points to methodological approaches for identifying other epistatic interactions using animal model genetic studies.

      Strengths:

      (1) Use of state-of-the-art genetic tools for mapping behavioral phenotypes in mouse models.

      (2) Adequately powered analysis incorporating both sexes and time course analyses.

      (3) Detection of time and sex-dependent interactions of two QTL loci modulating morphine locomotor activity.

      (4) Identification of putative candidate genes by combined expression and behavioral genetic analyses.

      (5) Use of Bayesian analysis to model causal interactions between multiple genes and behavioral time points.

      Weaknesses:

      (1) There is a need for careful editing of the text and figures to eliminate multiple typographical and other compositional errors.

      We have performed a thorough review of the manuscript and corrected typographical errors, including "ddactivates" and other compositional issues.

      (2) There are multiple examples of overstating the possible significance of results that should be corrected or at least directly pointed out as weaknesses in the Discussion. These include:

      (a) Assumption that the Oprm1 gene is the causal candidate gene for the major morphine locomotor Chr10 QTL at the early time epochs. Oprm1 is 400,000 bp away from the support interval of the Mor10a QTL locus, and there is no mention as to whether the Oprm1 mRNA eQTL overlaps with Mor10a.

      We have clarified this in the text. While Oprm1 is located proximal to the peak, its massive size and the presence of a strong mRNA cis-eQTL in the NAc and hippocampus that precisely overlaps with the Mor10a QTL support interval provide robust evidence for its candidacy. We have added this detail to the Results section.

      (b) Although the Bayesian analysis of possible complex interactions between Oprm1, Fgf12, other interacting genes, and behaviors is very innovative and produces testable hypotheses, a more straightforward mediation analysis of causal relationships between genotype, gene expression, and phenotype would have added strength to the arguments for the causal role of these individual genes.

      We agree that mediation analysis would be a valuable addition. We revised the Results section to acknowledge that while the Bayesian network provides a comprehensive causal hypothesis, future studies employing formal mediation analysis could further strengthen these individual gene-to-behavior links.

      (c) The GWAS data analysis for Oprm1 and Fgf12 is incomplete in not mentioning actual significance levels for Oprm1 and perhaps overstating the nominal significance findings for Fgf12.

      We have updated the manuscript to include the specific significance levels for the human GWAS findings related to Oprm1 and Fgf12. We have clarified that the OPRM1 variant rs1799971 reached genome-wide significance (OR = 1.046, p = 4.92 × 10<sup>-9</sup>). Furthermore, we have ensured that the findings for FGF12 are described as nominally significant to avoid any overstatement of the results. For example, we now specify that the top FGF12 SNP rs1553460 achieved nominal significance (OR = 1.015, p = 0.021). The Results and Discussion sections have been revised to reflect these precise statistical values.

      Appraisal:

      The authors largely succeeded in reaching goals with novel findings and methodology.

      Significance of Findings:

      This study will likely spur future direct experimental studies to test hypotheses generated by this complex analysis. Additionally, the broad methodological approach incorporating time course genetic analyses may encourage other studies to identify epistatic interactions in mouse genetic studies.

      Reviewer #3 (Public review):

      Summary:

      This is a clearly written paper that describes the reanalysis of data from a BXD study of the locomotor response to morphine and naloxone. The authors detect significant loci and an epistatic interaction between two of those loci. Single-cell data from outbred rats is used to investigate the interaction. The authors also use network methods and incorporate human data into their analysis.

      Strengths:

      One major strength of this work is the use of granular time-series data, enabling the identification of time-point-specific QTL. This allowed for the identification of an additional, distinct QTL (the Fgf12 locus) in this work compared to previously published analysis of these data, as well as the identification of an epistatic effect between Oprm1 (driving early stages of locomotor activation) and Fgf12 (driving later stages).

      Weaknesses:

      (1) What criteria were used to determine whether the epistatic interaction was significant? How many possible interactions were explored?

      By design we only tested for epistasis between the Oprm1 and the Fgf12 loci—a single test of a non-linear interaction. As such there is no correction for multiple tests and no need for permutation. In other words the “nominal” P value in this case is the only relevant P value. We have added this clarification in the Results and Methods.
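      As an illustration of what a single pre-specified interaction test looks like, the sketch below computes the interaction contrast for a two-locus, two-allele (B/D) design from the four genotype-class means. It is a minimal example under stated assumptions, not the authors' analysis: the function name `interaction_t` and the toy phenotype values are hypothetical, and the sketch ignores the kinship correction that a linear mixed model would apply. Converting the statistic into a P value requires a Student's t distribution with the returned degrees of freedom.

```python
from statistics import mean


def interaction_t(cells):
    """Return (t statistic, degrees of freedom) for a 2x2 interaction contrast.

    `cells` maps (allele at locus 1, allele at locus 2) to a list of phenotype
    values, with alleles coded 'B' or 'D'. All four cells must be non-empty.
    """
    keys = [("B", "B"), ("B", "D"), ("D", "B"), ("D", "D")]
    signs = [1, -1, -1, 1]  # contrast that is zero when effects are purely additive
    means = [mean(cells[k]) for k in keys]
    contrast = sum(s * m for s, m in zip(signs, means))

    # Pooled within-cell variance (residual variance of the saturated model).
    ss_within = sum(
        sum((v - m) ** 2 for v in cells[k]) for k, m in zip(keys, means)
    )
    df = sum(len(cells[k]) for k in keys) - 4
    pooled_var = ss_within / df
    se = (pooled_var * sum(1 / len(cells[k]) for k in keys)) ** 0.5
    return contrast / se, df


# Hypothetical toy data: the D/D class deviates from the additive expectation.
toy = {
    ("B", "B"): [10.0, 12.0],
    ("B", "D"): [7.0, 9.0],
    ("D", "B"): [5.0, 7.0],
    ("D", "D"): [6.0, 8.0],
}
t_stat, dof = interaction_t(toy)
print(t_stat, dof)
```

      Because only one such contrast is tested, the resulting P value needs no adjustment for multiple comparisons, which is the point made in the response above.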

      (2) Results are presented for males and females separately, but the decision to examine the two sexes separately was never explained or justified. Since it is not standard to perform GWAS broken down by sex, some initial explanation of this decision is needed. Perhaps the discussion could also discuss what (if anything) was learned as a result of the sex-specific analysis. In the end, was it useful?

      We chose to analyze the sexes both separately and jointly because of significant sex differences and sex-by-strain interactions in the locomotion data. This rationale has been added to the Results section. We also discuss the sex-specific results in the revision.

      (3) The confidence intervals for the results were not well described, although I do see them in one of the tables. The authors used a 1.5 support interval, but didn't offer any justification for this decision. Is that a 95% confidence interval? If not, should more consideration have been given to genes outside that interval? For some of the QTLs that are not the focus of this paper, the confidence intervals were very large (>10 Mb). Is that typical for BXDs?

      The 1.5 LOD support interval is a standard metric for most QTL mapping studies, and does correspond approximately to a 95% confidence or support interval. Large intervals are common in BXD studies when effect sizes are moderate or recombination density is lower in specific regions. We have clarified the use of the 1.5 LOD interval in the Results section.
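      For readers unfamiliar with the metric, the following sketch shows one simple way to derive a 1.5 LOD support interval from a scan along one chromosome. It assumes markers sorted by position and takes the outermost contiguous markers whose LOD stays within 1.5 units of the peak; some QTL packages instead extend the interval to the first flanking marker below that cutoff. The function name and the example LOD values are hypothetical.

```python
def lod_support_interval(positions, lods, drop=1.5):
    """Return (start, end) positions of the contiguous region around the peak
    marker where LOD >= peak LOD - drop. Markers must be sorted by position."""
    if len(positions) != len(lods) or not lods:
        raise ValueError("positions and lods must be non-empty and equal length")
    peak_idx = max(range(len(lods)), key=lods.__getitem__)
    cutoff = lods[peak_idx] - drop

    left = peak_idx
    while left > 0 and lods[left - 1] >= cutoff:
        left -= 1
    right = peak_idx
    while right < len(lods) - 1 and lods[right + 1] >= cutoff:
        right += 1
    return positions[left], positions[right]


# Hypothetical scan: positions in Mb, LOD scores at each marker.
pos_mb = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0]
lod = [1.0, 4.0, 7.0, 8.0, 6.8, 3.0, 1.5]
print(lod_support_interval(pos_mb, lod))
```

      With sparse recombination the LOD curve is flat around the peak, so many markers stay above the cutoff and the interval widens, which is consistent with the large intervals noted for some loci.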

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      In the vast majority of the figures, the text is too small to read.

      We have adjusted the font size in most of the figures.

      Reviewer #2 (Recommendations for the authors):

      (1) There is a need for careful editing of the text and figures to eliminate multiple typographical and other compositional errors. Examples of these include:

      (a) Figure 2E&F lacks identification of Oprm1 as the gene for cis-eQTL studies.

      (b) Figure 2H is fairly uninterpretable given the small font sizes. It should be excluded, put as a supplemental figure, or reconfigured to highlight the most important findings in a more legible manner.

      (c) Figure 4b: columns in the table need to be identified by a header row.

      We thank the reviewer for these comments and have addressed them in the revised version.

      Oprm1 is now labeled in Figures 2E and 2F, Figures 2G and 2H have been moved to the Supplementary material, and a header row has been added to the table in Figure 4b.

      Reviewer #3 (Recommendations for the authors):

      Abstract

      (1) For the abstract, it might be simpler to name the alleles as "the C57BL/6J allele", etc., since B allele will confuse people unfamiliar with mouse nomenclature.

      It is critical not to confound the organism known as C57BL/6J with the genotype, allele, or haplotype that a mouse happens to inherit. Diverse types of mice inherit reference alleles, but they may be only very distantly related to the C57BL/6J strain. Even the C57BL/6J strain is a moving target that accumulates mutations that are not even considered reference. For example, the mutation in Gabra2 of C57BL/6J is a de novo mutation that is not carried by many of the BXD strains, since this mutation arose in JAX foundation stock after the BXDs were first established by Dr. Ben Taylor in the 1970s.

      The convention is to refer to mouse strains by one string and RRID, the abbreviation of that strain by a common code (often B6), and the abbreviation of the allele, genotype, or haplotype by the italic letter B. This has been the recommendation of the Mouse Nomenclature Committee (on which one of the authors has been a member) for well over 50 years.

      (2) I wondered if "also associated with a high B allele" could be reworded somehow; I had to re-read that sentence several times.

      This sentence has been reworded for clarity.

      (3) Parts of the abstract are written in the present tense, but then it switches to past ("we generated" but then "a Bayesian network analysis supports...").

      We have thoroughly revised the abstract. Following standard scientific writing conventions, we now utilize the past tense to describe the specific experimental actions and results of this study. We have maintained the present tense for established biological facts and the broader significance of the findings.

      (4) While the -log(p) values are all impressive, the abstract should indicate what threshold is used for genome-wide significance and how that threshold was obtained.

      We have added the significance threshold to the Abstract.

      (5) Do the details of the MAP kinase cascade need to be explained in the abstract? It feels like a lot of detail for an abstract and represents one of the most speculative aspects of the paper. Maybe just say you identified a possible network, but save the details for the main paper.

      This is a valid suggestion. We have removed the specific MAP kinase details from the abstract.

      Introduction

      (1) You could add a sentence explaining why using an LMM (GEMMA) was an improvement over the prior analysis.

      We have added a sentence explaining that GEMMA improves mapping power and better controls for population structure compared to previous methods.

      (2) When mentioning Philips 2010, you could indicate that it identified Oprm1. This might be easier than "In addition to Oprm1" which confused me at first because it had not been mentioned before, so 'in addition' was jarring.

      We have revised the text to state that Philip et al. (2010) originally identified the Oprm1 locus.

      Results

      (1) There are additional instances of the tense switching between past and present in the results section.

      We have standardized the tenses in the Results section.

      (2) "Ostn, Uts2d, Ccdc50, Gm10823, Fgf12, and Mb21d2" - before giving arguments for fgf12, can you clarify if there are coding variants or eQTLs for any of these genes?

      We have added a statement clarifying the coding variants for other genes in this interval and highlighting their eQTL status.

      (3) "a total number of 4,495 high-quality nuclei transcriptomes". Consider removing the word "number".

      Removed.

      (4) "approximately 6 males and 6 females" - could you point the reader to a supplementary table that has the exact number of individuals at the end of this sentence?

      The exact number of mice used in each of the BXD strains was not recorded in the original publication by Philip et al.; only the mean and maximum were given. We have clarified that 6 is the average.

      (5) "computed using a subset" - please explain how you selected this subset (I assumed LD pruning, but why not be explicit?). How many SNPs/markers were there originally, and how many are retained?

      We have specified that the subset of markers was selected via LD pruning to represent the genetic diversity of the BXDs.

      (6) A few words about how the significant threshold was obtained (permutation?) are needed.

      We have clarified that the significance threshold was obtained through 1,000 permutations.
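      The general idea of a permutation-derived genome-wide threshold can be sketched as follows: shuffle the phenotype across strains, record the strongest association seen anywhere in the genome, repeat, and take an upper percentile of those maxima. This is a simplified illustration only. It uses the squared Pearson correlation as the association statistic rather than a mixed-model LOD score, and all names and the simulated data are hypothetical.

```python
import random


def pearson_r(x, y):
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def max_stat(genotypes, phenotype):
    """Strongest marker-phenotype association (r squared) across all markers."""
    return max(pearson_r(g, phenotype) ** 2 for g in genotypes)


def permutation_threshold(genotypes, phenotype, n_perm=1000, alpha=0.05, seed=0):
    """Empirical genome-wide threshold: upper-alpha quantile of the permuted maxima."""
    rng = random.Random(seed)
    shuffled = list(phenotype)
    null_max = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any genotype-phenotype link
        null_max.append(max_stat(genotypes, shuffled))
    null_max.sort()
    idx = max(0, n_perm - 1 - int(alpha * n_perm))
    return null_max[idx]


# Simulated example: 20 markers coded 0/1 across 30 strains, random phenotype.
_rng = random.Random(1)
genos = [[_rng.choice([0, 1]) for _ in range(30)] for _ in range(20)]
pheno = [_rng.gauss(0, 1) for _ in range(30)]
print(permutation_threshold(genos, pheno, n_perm=200, seed=42))
```

      Taking the maximum over markers within each permutation is what makes the threshold genome-wide rather than per-marker.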

      (7) Some of the GWAS results are presented for males and females separately (as well as combined). This is not typical, and so maybe a sentence explaining why the authors thought there might be sex specific GWAS results would be warranted.

      The rationale for the sex-specific analysis is provided in the Results section (significant sex difference and sex-by-strain interaction).

      (8) The correlation between the sexes of 0.68 could be evidence that there are sex-specific genetic effects, but could it also just be due to increased noise as you reduce sample size? What is the confidence interval for that number? Does it include 1? Or 0? If you randomly split the dataset, rather than splitting on the basis of sex, would you obtain higher correlations? The idea of sex differences is interesting, but a bit more work is needed to clarify these concerns.

      The 95% confidence interval for the correlation of 0.68 (0.52–0.79) excludes both 0 and 1. The drop from r ≈ 0.86 at earlier intervals suggests a biological shift rather than noise due to sample size, because n remains constant (approximately 6 per sex per strain) across all time points. This divergence is driven by sex-specific genetic modifiers, such as the Fgf12 locus, which is more than twice as strong in females (LOD 10.6) as in males (LOD 4.3). We have addressed this in the revision.
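      A confidence interval for a Pearson correlation is commonly obtained with the Fisher z-transformation, sketched below. The response does not state which method was used or how many strain means entered the correlation, so the sample size in the example is a placeholder and the function name is hypothetical.

```python
import math


def fisher_ci(r, n, z_crit=1.959964):
    """Approximate two-sided CI for a Pearson correlation via Fisher's z.

    r: sample correlation (strictly between -1 and 1)
    n: number of paired observations (must exceed 3)
    z_crit: standard normal quantile; 1.959964 gives a 95% interval
    """
    if not -1 < r < 1 or n <= 3:
        raise ValueError("need -1 < r < 1 and n > 3")
    z = math.atanh(r)              # variance-stabilizing transform
    se = 1.0 / math.sqrt(n - 3)    # approximate standard error on the z scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Placeholder sample size; the true number of strains is not given here.
low, high = fisher_ci(0.68, n=60)
print(low, high)
```

      The interval narrows as n grows, so whether it excludes 1 depends directly on the number of strains contributing to the correlation.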

      (9) Maybe I missed it, but how did you determine the threshold for significance for the epistatic interaction? Could you also clearly indicate how many possible cases of epistasis were examined/considered, since that dictates the correction for multiple testing.

      We only tested the interaction between the Fgf12 and the Oprm1 loci.

      (10) "To further examine whether Oprm1 and Fgf12 were co-expressed in the same cells of the NAc," can you first give an indication as to why you looked in NAc versus other brain areas you might have considered?

      We have added a sentence explaining that the NAc was chosen due to its central role in opioid reward and the observed strain differences in dopamine release in this region.

      (11) "...from every cell type conveyed a weak but significant positive correlation (r = 0.08, p = 1.8e-8) between the expression of Oprm1 and Fgf12 (Figure 7e). When we performed Pearson's correlation analysis within each individual cell cluster, only D1-MSN-3 had a significant positive correlation (r = 0.35, p = 6.1e-8, Figure 7f). In contrast, D1-MSN-2 had a significantly weak negative correlation (r = -0.12, p = 0.02, Figure 7g)." Can you explain why these correlations are relevant? What hypothesis are you testing?

      We have clarified that these correlations were used to test the hypothesis that Oprm1 and Fgf12 are co-expressed and potentially co-regulated within the same neuronal subtype to support their epistatic interaction.

      (12) "After the morphine locomotion tests were complete," can you give a specific timepoint? Like, was it exactly 180 minutes after the morphine injection?

      We have specified that naloxone was injected exactly 180 minutes after the morphine injection.

      (13) I appreciate the desire to relate the results of this paper to human GWAS results; however, I don't feel there is much worth discussing beyond the Oprm1 finding. Therefore, I would suggest removing this from the results section and instead just making it a discussion topic. The results presented are clearly the weakest part of this paper, and I personally think it is a shame to end the results section with something that is not very informative. But I suspect the authors may wish to retain this section, and I leave that decision to them and the editor.

      We have retained this section but moved some of the more speculative human data discussion to the Discussion section as suggested.

      Discussion

      (1) Typo "deactivates".

      Corrected to "activates".

      (2) The last sentence in the first paragraph again discusses the comparison to humans; I would remove this.

      We have condensed that sentence.

      (3) "These data indicate that Oprm1 is a strong candidate gene for the Chr 10 locus associated with morphine-induced locomotion response." I would remind them of the eQTL for Oprm1 since this is a key piece of evidence supporting this gene as a candidate.

      We have added a reminder of the overlapping mRNA cis-eQTL for Oprm1.

      (4) "It is likely that differences in morphine-induced dopamine release are involved in the highly variable locomotor responses to morphine across the BXD family." I agree this might be true, but since you have no evidence to support this claim, is it worth mentioning at all?

      We have rephrased this as a hypothesis or cited relevant literature supporting this link in parental strains.

      (5) Could you include a sentence or two about why Philip 2010 didn't find Fgf12? Lack of markers? The difference between an LM and an LMM?

      We have added an explanation that the use of a high-density WGS-based marker set and the LMM (GEMMA) allowed for the detection of this novel locus that was previously missed.

      (6) Section titled "Cell-type specific gene expression in NAc". While this is interesting, you might also want to remind the reader that epistatic interactions do not necessarily require the genes to be expressed in the same cell or for their gene products to physically interact.

      We have added this caveat to the Discussion.

      (7) I think the Bayesian network section is not very strong. For example, they did not compare the results for their two chosen genes to the results they might have obtained if they had chosen other genes from their QTL intervals. My guess is that those other genes might have also produced results that were equally convincing. I'm not asking them to do that, but it reflects the risk of false positive results when taking an approach like this. Nevertheless, I am guessing the authors would prefer to include this section.

      We appreciate the reviewer pointing out this possibility and agree with this concern. We have added a statement acknowledging the risk of false positives in Bayesian modeling in this context and noting that these findings are intended as testable hypotheses.

      Methods

      (1) How were the 2 HS rats selected? I had the impression that Dr. Telese's lab had access to snRNA-seq data from more than 2 HS rats.

      We have clarified that these rats were selected based on their addiction-like behavior phenotypes from a larger cohort.

      (2) I didn't look back, but did the main paper point out that the rats are treated with oxycodone rather than morphine?

      We have clarified this distinction in the Methods section.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) I think this is an important paper, but I’m puzzled about a tension in the results. On the one hand, it looks like the behavioural gains post-TT happen rather smoothly over time (Figure 5). On the other hand, muscle synergy activations change abruptly at specific days (around day ~65 for Monkey A and around day ~45 for Monkey B; e.g., Figure 6). How do the authors reconcile this tension? In other words, how do they think that this drastic behavioural transition can arise from what appears to be step-by-step, continuous changes in muscle coordination? Is it “just” subtle changes in movements/posture exploiting the mechanical coupling between wrist and finger movements, combined with subtle changes in synergies, and they just happen to all kick in at the same time? This feels to me to be the core of the paper and should be addressed more directly.

      We thank the reviewer for this insightful comment, as it touches upon the central finding of our study. The apparent tension between the smooth behavioral recovery and the abrupt shift in neural strategy is indeed a key feature of the adaptation process. We propose that this reflects the interaction of two distinct, parallel processes operating on different timescales:

      A slow, gradual skill-learning process, where the monkeys incrementally developed and refined a compensatory motor strategy (i.e., the tenodesis effect). This slow refinement is responsible for the smooth improvement seen in the behavioral metrics over many weeks.

      A fast, switch-like adaptive process, which governs the activation of the primary muscle synergies. The initial ‘swap’ strategy, while simple, was biomechanically conflicting and inefficient. The CNS only abandoned this flawed strategy abruptly once the slow learning process had rendered the new compensatory strategy “good enough” to be a viable alternative.

      Therefore, the abrupt neural shift does not cause the behavioral improvement but is rather enabled by the gradual, underlying development of a better motor solution. To address this important point more directly within the manuscript, we added a new subheading to the Discussion section. This section is dedicated to explicitly framing our findings within this multi-timescale learning model, ensuring the link between the gradual behavioral recovery and the abrupt neural shift is clearly articulated.

      (2) The muscle synergy analyses, which are an important part of the paper, could be improved. In particular:

      (a) When measuring the cross-correlation between the activation of synergies, the authors should include error bars and should also look at the lag between the signals.

      We thank the reviewer for these excellent suggestions to improve our analysis.

      Error Bars: We agree that showing trial-to-trial variability is important. In our revision, we have added a shaded envelope (representing the SD across trials) to the cross-correlation plots in Figures 6, 9 and 10.

      Time Lag: We have performed the cross-correlation analysis allowing for variable time lags and extracted the lag yielding the maximum correlation coefficient (max CC) for each session, in addition to the zero-lag correlation presented in the main figures. As hypothesized, allowing variable lags often resulted in high max CC values throughout the adaptation period, potentially obscuring the clear swap-and-revert pattern visible in the zero-lag analysis. This is likely because the primary adaptation involved changes in synergy timing rather than fundamental shape. However, the analysis of the lag itself proved informative. We observed significant fluctuations in the optimal lag during the early and mid-adaptation phases, particularly around the time of the ‘switch-back’, before the lag stabilized closer to zero in the late phase.

      We have added a description of this analysis to the Methods section. The results of the lag analysis are now presented in new Supplementary Figures S6 and S7, and a sentence summarizing this finding has been added to the Results section.
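
      As an illustration only (this is not the analysis code used in the study), the zero-lag and variable-lag comparison described above can be sketched in Python. The function names, the lag window, and the synthetic Gaussian profiles are assumptions made for this example; the response does not state how lags were bounded or which correlation estimator was used.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)

def lagged_correlation(x, y, max_lag):
    """Correlate x[t] with y[t + lag] for every integer lag in [-max_lag, max_lag]."""
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[: len(x) - lag], y[lag:]
        else:
            a, b = x[-lag:], y[: len(y) + lag]
        result[lag] = pearson(a, b)
    return result

def summarize(x, y, max_lag):
    """Return (zero-lag CC, lag giving the maximum CC, maximum CC)."""
    cc = lagged_correlation(x, y, max_lag)
    best_lag = max(cc, key=cc.get)
    return cc[0], best_lag, cc[best_lag]

# Synthetic profiles (assumed): y has the same shape as x but peaks 5 samples later.
x = [math.exp(-0.5 * ((t - 80) / 10) ** 2) for t in range(200)]
y = [math.exp(-0.5 * ((t - 85) / 10) ** 2) for t in range(200)]
zero_cc, best_lag, max_cc = summarize(x, y, max_lag=20)
```

      In this toy case the maximum CC stays near 1 even though the zero-lag CC is lower, which is the behavior described above: a pure timing change is hidden by the max CC and shows up in the optimal lag instead.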

      (b) Figure 7C and related figures, the authors state that the activation of muscle synergies reverts to pre-TT patterns toward the end of the experiments. However, there are noticeable differences for both monkeys (at the end of the “task range” for synergy B for monkey A, and around 50% task range for synergy B for monkey B). The authors should measure this, e.g., by quantifying the per-sample correlation between pre-TT and post-TT activation amplitudes. Same for Figures 8I, J, etc.

      We thank the reviewer for this detailed and insightful suggestion. We agree that our use of the term ‘reversion’ should be nuanced, as the recovery of the synergy activation patterns is substantial but not perfect.

      To formally quantify these remaining differences, we performed a rigorous quantitative comparison between the pre-surgery and final-day post-surgery activation profiles. We calculated the Cosine Similarity to assess the recovery of the temporal shape, and used a Permutation Test (n=10,000) to test for statistical distinctness between the pre- and post-surgery trajectories.

      Results: We found that while the temporal shapes were highly similar (Cosine Similarity > 0.90 for all synergies), the Permutation Test confirmed that the profiles remained statistically distinct (p < 0.0001) in both animals.

      We have added this quantification to the text (Results). This confirms our nuanced interpretation: while the primary temporal features of the synergies reverted, the recovered motor program represents a novel, ‘good enough’ solution that is robust and functional, rather than a mathematically perfect restoration of the original baseline.
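
      To make the two measures concrete, a minimal sketch follows. It assumes that each condition is a list of trials sampled on a common task-range grid, and that the permutation statistic is the Euclidean distance between the mean profiles; the response does not specify the statistic, so that choice, the helper names, and the synthetic data are all assumptions of this example.

```python
import math
import random

def mean_profile(trials):
    """Average a list of equal-length trials sample by sample."""
    n = len(trials)
    return [sum(col) / n for col in zip(*trials)]

def cosine_similarity(a, b):
    """Cosine of the angle between two profiles (1.0 means identical shape)."""
    dot = sum(p * q for p, q in zip(a, b))
    return dot / (math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b)))

def profile_distance(trials_a, trials_b):
    """Assumed test statistic: Euclidean distance between the two mean profiles."""
    return math.dist(mean_profile(trials_a), mean_profile(trials_b))

def permutation_test(pre, post, n_perm=10_000, seed=0):
    """Shuffle trial labels and count how often the shuffled distance reaches the observed one."""
    rng = random.Random(seed)
    observed = profile_distance(pre, post)
    pooled = list(pre) + list(post)
    n_pre = len(pre)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if profile_distance(pooled[:n_pre], pooled[n_pre:]) >= observed:
            extreme += 1
    # The +1 terms keep the estimated p-value strictly positive.
    return (extreme + 1) / (n_perm + 1)

# Synthetic data (not from the study): same shape, 20% larger amplitude after surgery.
noise = random.Random(1)
base = [math.sin(math.pi * i / 49) for i in range(50)]
pre = [[v + noise.gauss(0, 0.05) for v in base] for _ in range(20)]
post = [[1.2 * v + noise.gauss(0, 0.05) for v in base] for _ in range(20)]
similarity = cosine_similarity(mean_profile(pre), mean_profile(post))
p_value = permutation_test(pre, post, n_perm=500)
```

      The toy data reproduce the pattern reported above: the two mean profiles have almost identical shape (high cosine similarity), yet the label-shuffling test still separates them because the amplitude differs consistently across trials.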

      (c) In Figures 9 and 10, the authors show the cross-correlation of the activation coefficients of different synergies; the authors should also look at the correlation between activation profiles because it provides additional information.

      We thank the reviewer for this comment and the opportunity to clarify our terminology. We agree that analyzing the correlation between the full activation profiles is the most informative approach. In our manuscript, the terms ‘activation coefficients’ and ‘activation profiles’ both refer to the complete, time-varying activation patterns of the muscle synergies. Therefore, the cross-correlation analysis presented in Figures 9 and 10 is indeed the correlation between these full activation profiles. To prevent any potential ambiguity for future readers, we have revised the manuscript to use the term ‘activation profiles’ exclusively and consistently when referring to these time-varying synergy activations.

      (d) The muscle synergy analysis for Monkey B is hindered by the fact that the authors lost the ability to record from the (very) functionally relevant FDS muscle. I’d repeat the synergy analyses without this muscle to understand to what extent the observed changes with respect to baseline are driven by the lack of this data.

      We thank the reviewer for raising this important methodological point. We agree that controlling for changes in the recorded muscle set is crucial for a valid comparison between pre- and post-surgical synergy structures. The reviewer’s concern is based on the premise that the FDS muscle was included in the pre-surgical analysis for Monkey B but absent from the post-surgical analysis.

      We would like to clarify that this is not the case. Due to the loss of the FDS signal post-surgery, we made the deliberate decision to exclude the FDS muscle from ALL synergy analyses for Monkey B, including the pre-surgical baseline period. This was done for the precise reason the reviewer identifies: to ensure a direct and unbiased “apples-to-apples” comparison and to avoid introducing the lack of this muscle as a confound. Therefore, the changes in synergy structure that we report for Monkey B can be confidently attributed to genuine physiological adaptation rather than an artifact of a changing input dataset.

      (e) Figure 11: The authors talk about a key difference in how Synergy B (the extensor finger) evolved between monkeys post-TT. However, to me this figure feels more like a difference in quantity (the time course) than in quality, since for both monkeys the aaEMG levels pretty much go back to close to baseline levels, even if there’s a statistically significant difference only for Monkey B. What am I missing?

      We thank the reviewer for this insightful question, as it has prompted us to refine our interpretation of this key finding. The reviewer correctly notes that the recovery trajectories of Synergy B appear different, and we agree that our original explanation can be improved.

      A more parsimonious interpretation, and one that we believe aligns better with the data, is that both monkeys likely underwent a similar ‘arms race’, but we captured different phases of this process. In Monkey A, our recordings (starting Day 29) captured the escalating phase of this neuromuscular conflict. In contrast, for Monkey B, recordings began on Day 20, by which time this rapid escalation had likely already occurred and peaked. This difference in the timing of the ‘arms race’ is consistent with our behavioral observations; Monkey A struggled for a longer period before performing the task proficiently, suggesting a more protracted overall adaptation process. Thus, the apparent difference in the figures is likely a reflection of the observational window and the individual adaptation rate of each animal, rather than a fundamental qualitative difference in their adaptive strategy. We have revised the text to present this more unified and coherent interpretation.

      (f) Lines 408-09 and above: The authors claim that “The development of a compensatory strategy, primarily involving the wrist flexor synergy (Synergy C), appears crucial for enabling the final phase of adaptation”, which feels true intuitively and also based on the analysis in Figure 8, but Figure 11 suggests this is only true for Monkey B. How can these statements be reconciled?

      We believe the reviewer may be referring to Monkey A in their comment, as the strong compensatory effect is indeed seen in this animal. The core of this issue, which we have clarified in our revision, is that both monkeys developed a compensatory tenodesis grasp but used different neural strategies to achieve it.

      For Monkey A, strong evidence for this strategy is provided by a clear temporal shift in the activation of its dedicated wrist flexor synergy (Synergy C). As we have now clarified in the manuscript, the peak of this synergy’s activation moved from occurring just after object contact to just before it, a re-timing well-suited to enable a tenodesis grasp.

      For Monkey B, the strategy was one of subtle re-timing rather than scaling. While the total aggregated activation of its primary flexor synergy (Synergy A) did not significantly increase, its temporal profile shifted. Specifically, activation prior to object contact increased, providing the necessary wrist flexion for its assistive tenodesis grasp, which was kinematically confirmed in Figure 12. This was achieved by reallocating activation from the post-contact phase, resulting in an earlier activation peak for the synergy overall. Crucially, a finer-grained analysis reveals a precise temporal sequence within this synergy’s activation: the wrist flexor component (PL) consistently peaked just before object contact to enable hand opening, while the finger flexor component (FDP) peaked just after contact to secure the grasp.

      This timing resolves the apparent biomechanical conflict. It also reveals that while both monkeys converged on the same biomechanical solution (a tenodesis grasp), the observable neural implementation appeared different. However, we must be cautious in directly comparing the computed synergy structures themselves, as the analysis for Monkey B was performed without the FDS muscle. The apparent “multi-functional synergy” in Monkey B is most likely a consequence of this missing data. What is clear and robust, however, is that both monkeys converged on a remarkably similar temporal solution: they both learned to re-time the activation of their key wrist flexor muscles to the pre-grasp phase.

      In Monkey A, this was observed in the temporal shift of its dedicated wrist flexor synergy (Synergy C). In Monkey B, this was observed in the temporal shift of the Palmaris Longus (PL) muscle itself (which, in our computed synergies, was grouped into Synergy A). This convergence on an identical temporal adaptation, regardless of the computed modular organization, is the key finding. We have revised the manuscript to articulate this more precise and defensible interpretation.

      (3) Experimental design: at least for the monkey who was trained on the “artificial task” (Monkey A), it would have been good if the authors had also tested him on naturalistic grasping, like the second monkey, to see to what extent the neural changes generalise across behaviours or are task-specific. Do the authors have some data that could be used to assess this even if less systematically?

      We thank the reviewer for raising this important point regarding the generalizability of our findings across different behaviors. We fully agree that a direct comparison of both tasks in the same animal would have been a valuable experiment. Unfortunately, we do not have systematic data on naturalistic grasping for Monkey A that would allow for such a direct comparison. We therefore view the two tasks as providing complementary evidence. Monkey A’s data shows the adaptation process during a highly stereotyped behavior, while Monkey B’s data demonstrates that a similar two-phase adaptive process occurs during a more naturalistic, unconstrained task. The convergence of these findings strengthens our overall conclusion that this multi-timescale adaptation is a robust principle of motor learning. Nonetheless, the reviewer raises a fascinating question about the task-specific tuning of motor synergies, which remains an excellent direction for future studies.

      (4) Monkey B’s behaviour pre-tendon transfer seems more variable than that of Monkey A (e.g., the larger error bars in Figure 5 compared to monkey A, the fluctuating cross-correlation between FDS pre and EDC post in Figure 6Q). This should be quantified to better ground the results since it also shows more variability post-TT.

      We thank the reviewer for this excellent suggestion to formally quantify the pre-surgery behavioral variability. We have performed the suggested analysis on the "Grip Formation Time" metric (Fig. 5A), which was the comparable metric between the two tasks. Our calculation of the Coefficient of Variation (CV) confirms the reviewer’s observation. Monkey B’s pre-surgery performance was substantially more variable (CV = 81.93%) than Monkey A’s (CV = 46.62%). Furthermore, a non-parametric test for equal variances (Ansari-Bradley test) confirmed that this difference is highly statistically significant (p < 0.0001). We have added a description of this analysis to the Methods and reported this finding in the Results section to provide a clearer context for the baseline differences between the subjects.
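
      For readers unfamiliar with the metric, the Coefficient of Variation is the standard deviation expressed as a percentage of the mean. The short sketch below assumes the sample (n − 1) standard deviation, which the response does not specify, and uses made-up grip formation times rather than data from the study.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)

# Hypothetical grip formation times in seconds (same mean, different spread).
steady = [0.50, 0.55, 0.45, 0.52, 0.48]
variable = [0.20, 0.90, 0.35, 0.75, 0.30]
cv_steady = coefficient_of_variation(steady)
cv_variable = coefficient_of_variation(variable)
```

      Because both toy samples share the same mean, the difference in CV reflects spread alone. For the accompanying scale comparison, SciPy provides an Ansari-Bradley implementation as `scipy.stats.ansari`; whether that implementation was used here is not stated.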

      (5) Minor: Figure 12 is interesting and supports the idea that monkeys may exploit the biomechanical coupling between wrist and fingers as part of their functional recovery. It would be interesting to measure whether there is a change in such coupling (tenodesis) over time, e.g., by plotting the change in wrist angle vs change in MCP angle as a scatter plot (one dot per trial), and in the same plot show all the days, colour coded by day. Would the relationship remain largely constant or fluctuate slightly early on? I feel this analysis could also help address my point (1) above.

      We thank the reviewer for this excellent and insightful suggestion. We have performed the suggested analysis for Monkey B, plotting the trial-by-trial relationship between wrist and MCP angles for all recording days (New Figure 13).

      The results clearly show the gradual refinement of the tenodesis coupling. Pre-surgery, there was no correlation (R²=0.00). Immediately post-surgery (Day 22), the relationship was weak and variable (R²=0.16), reflecting an exploratory phase. Over the following weeks, the coupling became progressively stronger and more consistent, with the R² value peaking at 0.58 around Day 56, indicating a robust exploitation of the new strategy. The relationship then stabilized at a moderate level (R² ~0.2-0.3) in the final days. This analysis provides direct kinematic evidence for the slow, gradual skill-learning component of our two-state model. It beautifully complements our response to the reviewer’s first point by visualizing the underlying refinement process that occurred concurrently with the more abrupt neural shifts. We have added this new figure and a description of these results to the manuscript.
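
      The per-day R² values quoted above summarize how tightly MCP angle change follows wrist angle change across trials. A minimal sketch of that summary is given below, assuming a simple least-squares line per recording day (for which R² equals the squared Pearson correlation); the tuple layout, function names, and numbers are hypothetical and are not taken from the study.

```python
from collections import defaultdict

def r_squared(x, y):
    """R² of a least-squares line y ~ x, computed as the squared Pearson correlation."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # no variation in one variable, so no measurable coupling
    return sxy * sxy / (sxx * syy)

def r_squared_by_day(trials):
    """trials: iterable of (day, wrist_angle_change, mcp_angle_change) tuples."""
    by_day = defaultdict(lambda: ([], []))
    for day, wrist, mcp in trials:
        by_day[day][0].append(wrist)
        by_day[day][1].append(mcp)
    return {day: r_squared(w, m) for day, (w, m) in sorted(by_day.items())}

# Hypothetical trials: no coupling on day 22, perfect linear coupling on day 56.
trials = [
    (22, 1.0, 2.0), (22, 2.0, 1.0), (22, 3.0, 2.5), (22, 4.0, 1.5),
    (56, 1.0, 2.0), (56, 2.0, 4.0), (56, 3.0, 6.0), (56, 4.0, 8.0),
]
coupling = r_squared_by_day(trials)
```

      Plotting `coupling` against day would give a time course of the kind described above, with one value per recording session.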

      Reviewer #2 (Public review):

      Weaknesses:

      The most notable weakness of the study is the incompleteness of the data. [...] As a result, it is difficult to make general conclusions from the study, and it awaits further analysis or the addition of another subject.

      We thank the reviewer for this critical and accurate assessment of the study’s limitations. The reviewer is correct that the datasets for the two monkeys are incomplete in different ways and that the tasks were not identical. We fully acknowledge these limitations throughout the manuscript. Rather than viewing these differences as a weakness that prevents generalization, we propose that they offer a unique strength in the form of complementary evidence. We consider the two animals not as a direct replication, but as two distinct case studies that test the same underlying hypothesis under different conditions.

      Monkey A, with its high-quality EMG and highly stereotyped task, provides a detailed, quantitative view of the neural adaptation process, allowing us to precisely characterize phenomena like the ‘neuromuscular arms race’.

      Monkey B, with its kinematic data and more naturalistic task, provides crucial evidence that the same fundamental principles, a two-phase adaptation and the eventual development of a compensatory strategy, generalize to a less constrained, more behaviorally relevant context. We believe the key finding is the convergence of the results. Despite the differences in individual strategy, task demands, and available data, both animals demonstrated the same core "swap-and-revert" adaptive process. We propose that this convergence from heterogeneous sources lends support to the generalizability of our conclusions, suggesting that the multi-timescale adaptation we describe may be a general feature of motor learning following such perturbations. We agree that future studies with more subjects are needed to fully establish this principle. Nonetheless, we feel that the convergent evidence from these two complementary cases provides a valuable foundation for the model we present.

      A second weakness is the insufficient analysis of the movements themselves, particularly for Monkey A. [...] Since the authors have video data for both monkeys, it is surprising that it was not used to extract landmarks for kinematic analysis, or at least hand/endpoint trajectory, and how it is adjusted over time. Adding more behavior data and aligning it with the EMG data would be very helpful for characterizing motor recovery and is needed to support conclusions about underlying neural control strategies for functional improvement.

      We thank the reviewer for this important suggestion. The reviewer’s comment prompted us to re-examine our behavioral data, and we have now performed additional analyses that we agree provide a much clearer link between the neural changes and functional recovery.

      For Monkey A, we have quantified the ‘pull times’ on a day-by-day basis. This analysis reveals a clear, gradual learning curve: pull times were initially long and variable post-surgery but steadily decreased and stabilized over the recovery period. This provides a direct, quantitative measure of motor performance recovery for this animal.

      For Monkey B, we have performed a detailed analysis of the ‘grasp aperture’ prior to object contact. This kinematic analysis is particularly revealing, as it shows the development of the compensatory strategy in real-time. The grasp aperture was initially very small post-surgery, reflecting the monkey’s inability to open its hand. It then steadily increased over the next ~40 days as the monkey learned and refined the compensatory tenodesis grasp, before stabilizing at a new, functional baseline.

      We believe these new analyses directly address the reviewer’s concern by providing a more detailed picture of motor recovery. The grasp aperture data, in particular, offers a clear kinematic correlate for the slow, skill-learning process that we propose runs in parallel to the more abrupt neural reorganization. We have added these results as a new figure in the main text of our revised manuscript.

      Considering specific conclusions, the statement that the monkeys learned to use “tenodesis” over time by increasing activation of a wrist flexor muscle synergy does not seem to be fully supported by the data. [...] Given these issues, it is not clear how to align the EMG and kinematic data and interpret these findings.

      We thank the reviewer for this detailed and critical analysis. They raise an excellent point and have correctly observed that the adaptation is not a simple, uniform increase in wrist flexor synergy amplitude. Our interpretation, which we have clarified in the manuscript, is that the monkeys learned a more sophisticated strategy: a precise re-timing of the wrist flexor activation to occur earlier in the movement, specifically to pre-shape the hand for the grasp.

      For Monkey A: The reviewer correctly notes that the peak amplitude of Synergy C (the wrist flexor synergy) around the moment of grasp (0% task range) is lower in the final phase compared to baseline. However, the crucial change is temporal: the peak of this synergy’s activation shifts from occurring just after the grasp (~+1%) to occurring just before it (~-2%). This re-timing is perfectly suited to enable finger extension via the tenodesis effect immediately prior to object contact. The subsequent lower amplitude may reflect a more efficient, less forceful movement once this new skill was refined.

      For Monkey B: The reviewer is right that this monkey does not have a dedicated wrist flexor synergy and that the overall amplitude of the PL muscle does not increase dramatically. However, a closer look at its activity profile (Fig. S2-AN) reveals a clear and consistent increase in activation specifically in the pre-contact phase (~7% task range). This is the precise neural signature of the assistive tenodesis grasp that is kinematically confirmed in Figure 12. The monkey is not simply scaling up the synergy; it is strategically activating it earlier to prepare for the grasp.

      In summary, the key evidence linking the EMG to the tenodesis strategy is in the temporal domain. The learned re-timing of the wrist flexor activation to the pre-grasp phase is the crucial link that aligns the neural and kinematic data. We have revised the manuscript to make this distinction between amplitude scaling and temporal shifting clearer.

      A more minor point regarding conclusions: statements about poor task performance and high energy expenditure being the costs that drive exploration for a new strategy are speculative and should be presented as such. Although the monkeys did take longer to complete the tasks after the surgery, they were still able to perform it successfully and in less than a second and no measurements of energy expenditure were taken.

      We thank the reviewer for this important point regarding the precision of our language. We agree that statements regarding ‘high energy expenditure’ and the specific drivers for exploring a new strategy are interpretations of the data, not direct measurements, and should be framed as such.

      Our speculation about energetic cost is based on the significant increase in muscle co-activation we observed (e.g., Fig. 11), a phenomenon widely understood to be metabolically expensive. Similarly, while the monkeys were still successful, their prolonged movement times and inefficient motor patterns represent a clear performance deficit compared to their highly optimized pre-surgical baseline, which we propose acted as a driver for further adaptation. In our full revision, we have carefully revised the manuscript to soften these claims. We have used more speculative language, such as “we hypothesize that...”, “the likely cost of...”, or “may have provided the impetus for...” to ensure that our interpretations are clearly distinguished from our direct empirical findings.

      A small concern is whether the tendon transfer effect may fail over time, either due to scar tissue formation or tendon tearing, and it would be ideal if the integrity of the intervention were re-assessed at the end of the study.

      We thank the reviewer for raising this important point regarding the long-term integrity of the tendon transfer. We agree that a terminal anatomical re-assessment would be an ideal control. While a terminal assessment was not performed as part of this study’s protocol, we were able to monitor the transfer’s integrity throughout the study. We are confident the transfer remained functionally intact for two key reasons:

      (1) Physical Monitoring: We periodically used ultrasound imaging to non-invasively visualize the tendon repair, which allowed us to confirm its continued physical integrity.

      (2) Functional Evidence: This physical confirmation was corroborated by the functional data. Both animals achieved stable, proficient task performance that was maintained for months. Furthermore, the late-phase neuromuscular control strategies became highly consistent. A significant failure, such as a tendon tear or prohibitive mechanical scarring, would be incompatible with this sustained behavioral and neural stability.

      Nevertheless, we agree that a terminal assessment is an excellent methodological suggestion that should be incorporated into the design of future long-term studies of this nature.

      Reviewer #3 (Public review):

      (1) First, I find myself wondering about the physical healing process from the tendon transfer surgery and how it might contribute to the learning. Specifically, how long does it take for the tendons to heal and bear forces? If this itself takes a few months, it would be nice to see some discussion of this.

      We thank the reviewer for this insightful question about the potential contribution of the physical healing process to the adaptation timeline. Our surgical protocol was specifically designed to ensure the tendon transfer was biomechanically robust from the outset, minimizing the role of healing as a rate-limiting factor.

      We used a Pulvertaft weave technique, which is known to achieve mechanical strength equivalent to that of a native tendon shortly after the procedure (Graham et al., 2023). The repair involved more than two weaves and utilized high-strength suture material to maximize its initial force-bearing capacity. While full fibrous integration around the suture site typically occurs within approximately six weeks, the repair itself was strong enough to bear physiological forces immediately post-surgery. Therefore, the prolonged, complex, two-phase multi-month behavioral recovery and the neural reorganization we observed cannot be attributed to a slow physical healing process. Instead, this supports our conclusion that the observed timeline reflects the challenges and constraints of a purely neural adaptation and skill-learning process. To make this crucial point clear to all readers, we have added these details about the surgical method to the Methods section and included a brief discussion of its implications in the Discussion.

      (2) Second, I see that there are some changes in the muscle loadings for each synergy over the days, though they are relatively small. The authors mention that the cosine distances are very small for the conserved synergies compared to distances across synergies, but it would be good to get a sense for how variable this measure is within synergy. For example, what is the cosine similarity for a conserved synergy across different pre-surgery days? This might help inform whether the changes post-surgery are within a normal variation or whether they reflect important changes in how the muscles are being used over time.

      We thank the reviewer for this excellent and insightful suggestion. Establishing a baseline for normal day-to-day variability is an important control for our synergy analysis.

      We have performed this analysis in full. Specifically, to quantify baseline stability, we calculated the cosine similarity between the spatial synergy weights (W) of each individual recording day and the pre-surgery average. This provides a rigorous measure of day-to-day variability relative to the stable baseline structure. We have added these data to Figure 7 (Panel I), which plots the pre-surgery similarity (blue traces) alongside the post-surgery adaptation (red traces).

      We found that baseline stability was remarkably high, with cosine similarity consistently exceeding 0.99 (e.g., Monkey A: 0.99 ± 0.001). This quantification allows the reader to formally assess that the changes observed post-surgery (e.g., drops to ~0.80 or ~0.60 in Monkey B) are well outside the range of normal physiological fluctuation, representing subtle but genuine structural adaptation.
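
      The baseline-stability measure described above can be sketched as follows. The example assumes each recording day yields one weight vector per synergy (one entry per muscle) and that the reference is the element-wise mean of the pre-surgery days; the three-muscle vectors are invented for illustration and are not values from the study.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two weight vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    return dot / (math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b)))

def similarity_to_baseline(baseline_days, test_days):
    """Cosine similarity of each day's weight vector to the mean baseline vector."""
    n = len(baseline_days)
    reference = [sum(col) / n for col in zip(*baseline_days)]
    return [cosine_similarity(day, reference) for day in test_days]

# Hypothetical synergy weights over three muscles.
pre_days = [[0.80, 0.50, 0.10], [0.82, 0.48, 0.12], [0.78, 0.52, 0.09]]
post_days = [[0.30, 0.20, 0.90]]
pre_similarity = similarity_to_baseline(pre_days, pre_days)
post_similarity = similarity_to_baseline(pre_days, post_days)
```

      Comparing the spread of `pre_similarity` with the values in `post_similarity` gives the kind of within-synergy reference range the reviewer asked for: a post-surgery value counts as a structural change only if it falls below the range seen across baseline days.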

      (3) Last, and maybe most difficult (and possibly out of scope for this work): I would have ideally liked to see some theoretical modeling of the biomechanics so I could more easily understand what the tendon transfer did or how specific synergies affect hand kinematics before and after the surgery. Especially given that the synergies remained consistent, such an analysis could be highly instructive for a reader or to suggest future perturbations to further probe the effects of tendon transfer on long-term learning.

      We thank the reviewer for this excellent and forward-thinking suggestion. We completely agree that a detailed biomechanical model of the tendon transfer would be a powerful tool for understanding the mechanical consequences of the surgery and for interpreting the function of the recorded muscle synergies. However, creating a subject-specific musculoskeletal model with the fidelity required to accurately simulate synergy-to-kinematic transformations is a highly complex project that we feel is well beyond the scope of the current manuscript. Such an endeavor would constitute a major research project in its own right.

      Our study’s primary focus was to provide a detailed, longitudinal characterization of the in-vivo neural adaptation following this perturbation, a dataset that is itself rare and valuable. We aimed to document the physiological learning process as it unfolded over many months. Nonetheless, the reviewer’s point is exceptionally well-taken. Currently, we are constructing a monkey musculoskeletal model and performing tendon transfer on this model to investigate what kind of characteristics in the learning process reproduce the synergy changes observed in the experiments. Although this project is still in progress, to date, we have demonstrated that the robustness of synergies themselves is necessary for changes in muscle activity at the synergy level (Nakajima N, Wang S, Ogihara N, Oya T, Seki K, Funato T, Upper Limb Musculoskeletal Model of Macaque Monkey for Approaching Adaptation Mechanism to Tendon Transfer, Society for Neuroscience 2023, Washington DC, USA, 2023).

      The rich dataset we have collected in the present research could serve as an excellent foundation for developing and validating such a model in the future. We believe that combining these two approaches is a critical and exciting next step for the field, and we have highlighted this as a key future direction in our discussion.

      Recommendations for the authors:

      Reviewing Editor Comments:

      When revising the manuscript for resubmission, please try to improve the visual presentation of the data, which is a point highlighted by all three reviewers during the discussion, including making the presentation of monkey-specific results more consistent across subjects.

      We have comprehensively revised the figures to ensure a consistent and clear visual presentation, as requested. Specifically, we standardized the layout across all main and supplementary figures (placing Monkey A consistently in the top rows or left columns and Monkey B in the bottom rows or right columns) and applied unified color schemes throughout the manuscript. Furthermore, we harmonized the presentation of the analytical results, such as the specific cross-correlation pairings in Figures 9 and 10, to ensure that the data for both subjects are presented with identical logic, facilitating direct comparison.

      Reviewer #1 (Recommendations for the authors):

      (1) Please revise the writing; some words are missing (line 90), and some sentences could be clarified slightly, even if the paper is well written (lines 317-320). The paragraph including the idea of tenodesis could also be further clarified, I think.

      Thank you for pointing these out. We have corrected the missing word (osteoarthritis) on line 90. We have also revised lines 317-320 to remove ambiguity. Furthermore, the section describing the tenodesis effect (now section "Distinct neural implementations...") has been substantially rewritten for improved clarity, incorporating a more detailed explanation of the biomechanics.

      (2) In the Introduction, the authors cite Hunter and Eckstein 2009 and Mercuri and Muntoni 2013 without describing the pathological conditions; this will not be clear for nonspecialists.

      Thank you. We have added brief descriptions ("osteoarthritis, a degenerative joint disease," and "muscular dystrophy, which involves progressive muscle weakness,") directly into the Introduction sentence where these references appear.

      (3) Data presentation: I often thought that the data could be presented more clearly:

      (a) For example, Figure 3D and 4D should show error bars around the mean to have a sense of the consistency of pre-lesion behaviour. Same for other figures like Figure 6.

      We appreciate the reviewer's suggestion to visualize data consistency. (a) Figures 3D, 4D, and 6 (EMG Profiles): For these figures, we opted to display mean traces and peak markers to clearly illustrate the temporal shifts and relationships between muscles. Overlaying multiple standard deviation envelopes in these comparative plots would significantly reduce legibility. However, to fully address the reviewer's request to see the consistency of pre-lesion behavior, we direct attention to Supplementary Figure S1, which presents the complete EMG profiles with full error tubes (Mean ± SD) for every recorded muscle. (b) Quantitative Analysis Figures: We ensured that variability is explicitly visualized in all statistical analyses. The cross-correlation time-courses in Figures 6 (G-Q), 9, and 10 are plotted with shaded error tubes to show variance. Similarly, the aggregated EMG analysis in Figure 11 utilizes bar plots with explicit error bars to quantify the statistical consistency of the changes.

      (b) The autocorrelation analysis in Figure 6 should also include measures of lag if it’s not at zero lag. If it’s the latter, please specify it in the Methods.

      We thank the reviewer for this question regarding the cross-correlation analysis presented in Figure 6 (Panels G-J, P-Q). We confirm that this analysis was performed at zero time lag. To clarify this, we have added a sentence to the Methods section (Subsection "Cross-correlation analysis") explicitly stating that the EMG cross-correlations shown in Figure 6 were calculated at zero lag. We have also added a clarifying note ("at zero time lag") to the description of these panels within the Figure 6 caption.
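
      For readers unfamiliar with the measure, a zero-lag cross-correlation can be sketched as follows. This is an illustration only, not the analysis code used in the manuscript: it assumes the two EMG profiles are mean-subtracted and normalized by their energies (which makes the value equivalent to a Pearson correlation), a choice the response itself does not specify. The function name `zero_lag_xcorr` is hypothetical.

```python
import numpy as np

def zero_lag_xcorr(x, y):
    """Normalized cross-correlation of two equal-length signals at zero time lag.

    Assumption (not stated in the source): signals are mean-subtracted and
    the result is scaled to lie in [-1, 1].
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("signals must have the same length")
    x = x - x.mean()
    y = y - y.mean()
    denom = np.sqrt(np.sum(x ** 2) * np.sum(y ** 2))
    if denom == 0:
        return float("nan")  # a flat signal has no defined correlation
    return float(np.sum(x * y) / denom)
```

      Under this convention, a value near 1 means the two profiles rise and fall together with no temporal shift, and a value near -1 means they are in opposition.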

      (c) Seeing EMG patterns similar to those presented in Figures 3D and 4D at different times post-lesion (e.g., as a Supplementary figure) would also give readers a better intuition of the neural changes.

      We thank the reviewer for this suggestion to provide more intuitive examples of the neural changes. We realize we did not sufficiently highlight this in the main text, but this complete data is already available in the manuscript. Supplementary Figures S1 and S2 provide a comprehensive overview of the EMG patterns for all recorded muscles in Monkey A and Monkey B, respectively. These figures show the pre-surgery and post-surgery average profiles for all recording sessions as well as the average profiles from five different post-surgery landmark days, covering the entire adaptation period. We have added explicit cross-references to these figures in the main text.

      (d) I couldn’t fully understand the analysis in Figure 4E; clarify.

      We thank the reviewer for noticing this oversight. The reviewer is correct that Figure 4E was not referenced in the main text. This panel was intended to show the baseline kinematic profiles (MCP and wrist angles) for Monkey B's control session, corresponding to the average EMGs shown in panel 4D. Given that our more comprehensive kinematic analyses are now presented in Figure 12 and the new Figure 13, we believe panel 4E is largely redundant. To improve the clarity and focus of Figure 4, we have removed panel 4E and its description from the revised manuscript.

      (e) Some figures showing neural changes (e.g., Figures 6G-J, 6P,Q, Figures 9 and 10, and even Figure 11 for different reasons) would become more understandable if they were accompanied by the behavioural changes (e.g., something like Figure 5A on top of them).

      We agree that visualizing the temporal link between neural reorganization and behavioral recovery is essential for interpreting the data. We have implemented this suggestion by overlaying behavioral metrics onto the right y-axes of Figures 6 (G-Q), 9, 10, and 11. However, regarding the specific behavioral metric, we opted to overlay the maladaptive behavior/aberrant reaching metric (from Figure 5B) rather than the grip formation time (Figure 5A). We found that the maladaptive behavior profile provided a clearer and more direct correlate to the neural data, as its peak coincides precisely with the ‘swapped’ synergy phase, thereby effectively illustrating the functional cost of that specific neural state.

      (f) Some figure captions could be improved by adding more detail (e.g., for Figure 6).

      We agree. We have substantially expanded and improved the captions for Figure 6 and Figure 7 to make them more self-contained and guide the reader more effectively through the key findings presented in the panels. We have also reviewed other captions for clarity.

      (g) I’d show the cosine distance between synergies across days as a main figure, e.g., as part of Figure 7, because this is an important result.

      We agree that the longitudinal stability of the synergy structures is a crucial result that deserves prominence. We have implemented this suggestion by adding new panels, Figure 7 (I, K) for primary synergies and Figure 8 (K, L) for secondary synergies, which plot the cosine similarity of the spatial synergy weights across the entire experimental timeline. These panels explicitly visualize the high stability of the pre-surgery baseline (blue traces, similarity > 0.99) and contrast it with the dynamic structural tuning observed during the post-surgery adaptation (red traces), providing a clear, day-by-day account of synergy evolution as requested.
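
      As a minimal sketch of the measure plotted in these panels, the cosine similarity between a reference synergy weight vector and each day's weight vector could be computed as below. The function names, and the choice of a single reference vector, are assumptions for illustration; the response does not describe how the reference was defined.

```python
import numpy as np

def cosine_similarity(w_ref, w_day):
    """Cosine of the angle between two synergy weight vectors (one weight per muscle)."""
    w_ref = np.asarray(w_ref, dtype=float)
    w_day = np.asarray(w_day, dtype=float)
    norm = np.linalg.norm(w_ref) * np.linalg.norm(w_day)
    if norm == 0:
        raise ValueError("weight vectors must be non-zero")
    return float(np.dot(w_ref, w_day) / norm)

def similarity_timeline(w_ref, daily_weights):
    """Similarity of each day's weights to the reference, in session order."""
    return [cosine_similarity(w_ref, w) for w in daily_weights]
```

      Because the measure ignores overall scaling, a synergy whose weights are uniformly amplified still scores 1; only changes in the relative contribution of muscles lower the value.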

      (h) In Figure 7C, D and G, H, it’d be interesting to also see in the background the EMG for the transferred muscle that belongs to each synergy, to appreciate their relationship.

      We thank the reviewer for this suggestion. To illustrate the close relationship between the primary synergies and their key constituent muscles, while avoiding visual clutter in the complex post-surgery plots, we have modified the pre-surgery panels of Figure 7 (C, D, G, H). In these panels, we have now overlaid the average pre-surgery EMG profile of the primary transferred muscle belonging to that synergy (e.g., FDS for Synergy A, EDC for Synergy B) as a thin, gray, dashed line. This visually confirms the tight correlation between the synergy profile and the muscle’s activity at baseline.

      (i) In page 10, the authors report as maladaptive behaviour the duration of the aberrant reaching component from day 29 (monkey A) and day 20 (monkey B). What was happening before those recording dates? Were the monkeys recovering?

      Thank you for this question. We have added two sentences to the start of the Results section (“Functional Recovery Follows...”) clarifying that the period between surgery and formal recordings included approximately one week of home cage recovery followed by several weeks of assisted task practice. Formal recordings began once the monkeys could perform the task consistently without assistance.

      (j) In the Methods (EMG Analysis), the authors state that they resumed their recordings post-TT “once they (the monkeys) were able to perform the task on their own”. It would be good if the authors made this more precise (e.g., based on success rate or another metric).

      We thank the reviewer for this suggestion to increase precision. We have revised the Methods section to include the specific criteria used for resuming post-surgical recordings. Recordings were restarted once the monkeys were able to perform the task independently (i.e., without assistance from the experimenter) and consistently achieved a successful trial count of at least 100 trials within a single experimental session.

      (k) Line 266- reads “Alternation of EMG activity in non-transferred muscle suggests one possibility: TT might alter the control strategy of coordinated muscle activity for hand movement by modifying the transferred muscles and their agonists as a cohesive unit”, however, some “muscles showed patterns that were incompatible with a simple swap” (Lines 255-256). Doesn’t this observation suggest that what happens is not a simple change in muscle synergies?

      We thank the reviewer for this insightful question regarding the interpretation of muscles with adaptive patterns incompatible with the primary ‘swap-and-revert’. We agree that these observations require careful consideration within the modular framework. Our interpretation is that these muscles do not represent evidence against modular control, but rather reflect the involvement of multiple modules adapting concurrently. Specifically, muscles like FCR and PL, which showed distinct patterns, are primary members of Synergy C (the wrist flexor synergy) in Monkey A. Their adaptive profile is therefore consistent with the task-specific recruitment and retiming of Synergy C as part of the compensatory tenodesis strategy, rather than being a deviation from the swap observed in Synergies A and B. Synergies represent the dominant, shared variance in muscle activity. While they capture the overall strategy, some degree of individual muscle variation or the influence of secondary synergies is expected. We have added a sentence to the Results section to clarify that these diverse patterns likely reflect the differential involvement of muscles in multiple adapting synergies. We believe the overall evidence still strongly supports the modulation of stable synergies as the primary mechanism of adaptation in this paradigm.

      (l) You may want to call synergy A and synergy B, synergy F and synergy E to make recall easier? (Same for synergy C and D, which could be F2 and E2).

      We thank the reviewer for this helpful suggestion aimed at improving clarity. We considered renaming the synergies based on function (e.g., F/E). However, given the number of figures and the complexity of a global change, and the fact that the functional roles of Synergies C and D differed between animals, we decided to retain the original A/B/C/D labels for consistency. To ensure clarity for the reader, we have carefully checked the manuscript to ensure that we consistently define the primary functional role of each synergy (e.g., "Synergy A, the primary finger flexor synergy") when it is discussed.

      (m) Lines 315-317 - “These pattens of changes in synergy 3 and 4, both contributed minimally to the EMG of transferred muscles” -> This statement puts the causality as synergies cause muscles to activate according to certain patterns, which is supported by work by several groups -including the authors- however, they could also reflect biomechanical and task constraints as other have argued; perhaps this tone would be better for the discussion?

      We thank the reviewer for this nuanced point regarding the interpretation of synergy contributions. We agree that the causal relationship between computed synergies and muscle activity is complex and can reflect both neural commands and task constraints. To address this, we have revised the sentence in question in the Results section. Instead of stating that the synergies "contributed minimally," we now state that the changes in these synergies "were associated with minimal EMG activity in the transferred muscles." This phrasing is more descriptive of the observation and less implicitly causal, while retaining the key point within the flow of the results. The subsequent sentences, which offer interpretation, are already framed speculatively ("This suggests...", "may have served...").

      (n) Line 403 How do the authors conclude from the synergy patterns in Figure 11 that the early post-TT is characterised by “an unstable and inefficient neural control strategy”? To me, this is shown clearly in the behaviour, not in these plots, unless I’m missing something?

      We thank the reviewer for this comment, which highlights the need to clearly connect our neural findings to the behavioral outcome. The reviewer is absolutely correct that the behavioral data (Fig. 5) provides the most direct evidence of instability and inefficiency during the early adaptation phase. Our intention was to argue that the neural patterns observed in Figure 11 provide a physiological correlate for this behavioral inefficiency. Specifically, the escalating aggregated EMG activity observed in the conflicted extensor synergy (Synergy B), which we term the ‘arms race’, represents significant muscle co-activation. Such co-activation is widely understood to be energetically costly and reflects a suboptimal control strategy where the CNS is essentially "fighting itself" against the altered mechanics. To make this link clearer, we have revised the concluding sentence of the relevant paragraph in the Discussion ("The early adaptation phase...") to explicitly state that this escalating co-activation is a known marker of inefficient recruitment and that it occurred concurrently with the period of poor behavioral performance shown in Figure 5.

      (o) Lines 469-471. The authors suggest that muscle synergies may be preserved post-TT because a modular approach (to motor control) may be computationally easy and metabolically cheap. To me, recent data suggest that the most parsimonious explanation is what they later say: that the nervous system may not be plastic enough to change this (e.g., see Makin and Krakauer, “Against reorganisation” also in eLife).

      We thank the reviewer for raising this important theoretical point and for referencing the relevant literature on constraints on cortical reorganization. We agree that the preservation of muscle synergies in the face of such a profound perturbation is a key finding that warrants careful interpretation. In our revised Discussion (section "The CNS Defaults to a Modular Strategy..."), we have now explicitly incorporated the perspective that synergy stability may reflect inherent constraints on neural plasticity, citing Makin and Krakauer (2023), alongside our original hypothesis regarding computational and metabolic efficiency. We present these ideas not as mutually exclusive, but as potentially complementary factors that both contribute to the CNS’s apparent preference for modulating existing modules rather than fundamentally restructuring them.

      (p) Lines 501-503. Also on interpretation. Would the metabolic cost indeed be much higher? Couldn’t the observed change in strategy be explained purely based on performance metrics?

      This is an important point. We agree that statements regarding high energy expenditure are interpretations, not direct measurements. We have carefully revised the manuscript (Abstract, Results, and Discussion) to soften these claims, using more speculative language (e.g., "likely costly," "what we propose was...") to clearly distinguish our interpretations from direct empirical findings.

      (q) Lines 538-. The authors link the initial adaptation phase to the fast process reported in adaptation studies and say that this leads to poor retention. However, it seems from their data that the behaviour is stable across (early) days, so doesn’t this rule out such an interpretation?

      We thank the reviewer for this insightful question regarding the interpretation of the early adaptive phase within the two-state model framework. The reviewer correctly notes that the early post-surgical behavior, while maladaptive, appeared relatively stable across days and did not show the rapid decay sometimes associated with the "poor retention" characteristic of the fast system. We agree that this apparent stability requires careful interpretation. In our revised Discussion (section "A Multi-Timescale Model..."), we now propose that the fast system is primarily responsible for the initial, rapid adoption of the ‘swap’ strategy in response to the large error signal. The subsequent persistence of this flawed but stable state for several weeks is likely not due to strong retention by the fast system itself, but rather reflects the time required for the parallel slow system to gradually develop a more effective compensatory strategy (i.e., the tenodesis grasp). Once this alternative strategy became viable, it enabled the abrupt "switchback," which we also attribute to the fast system recalibrating away from the highly costly swap strategy. Therefore, we believe our data is consistent with the involvement of a fast system driving rapid strategic shifts, even if the typical "poor retention" phenotype is masked by the lack of a viable alternative strategy during the early phase.

      Reviewer #2 (Recommendations for the authors):

      (1) The discussion would benefit greatly from a more careful comparison with prior work characterizing the response to experimental or clinical tendon or nerve transfer in different models.

      We thank the reviewer for suggesting these important references and for the recommendation to compare our findings more carefully with prior work. This is an excellent point, and we agree it will significantly strengthen the discussion. In our full revision, we have added a new paragraph to the Discussion section dedicated to this comparison. We discuss how our findings relate to classic work showing primate adaptive capacity beyond simple maladaptive responses (Sperry, 1947), EMG evidence for the persistence of original neural patterns alongside new ones in human patients (Illert et al., 1986), the critical role of altered peripheral biomechanics and myofascial force transmission in complicating adaptation (Maas & Huijing, 2012), and how our observation of synergy stability aligns with evidence for modular adaptation strategies (Berger et al., 2013). This comparison helps situate our unique findings of a multi-timescale process and synergy timing modulation within the broader context of motor relearning after musculoskeletal rearrangement.

      (2) Line 90 - Which disease or condition is studied in Hunter and Eckstein (2009)?

      Thank you. We have clarified this in the Introduction; the reference pertains to osteoarthritis.

      (3) Line 280 for clarity in text and as a reminder to the readers, please state which muscles are involved in each synergy grouping.

      We have updated the text (Results, 'Adaptation occurs through modulating...') to explicitly list the main contributing muscles for each synergy grouping (e.g., Synergy A: FDS and FCU for Monkey A). This provides the requested clarity regarding the functional identity of each synergy while maintaining readability. For the complete, quantitative muscle weight composition including minor contributors, we refer the reader to Figure 7 and Supplementary Table 1.

      (4) Line 180 There are differences in the time course for measurements between the behavioral metrics and EMGs. If not recorded at fixed time intervals, the differences in the time courses for the two monkeys should be explained.

      We thank the reviewer for this question regarding the time courses of our measurements. We interpret this comment in two ways, both of which we have addressed in the revised manuscript.

      First, if the reviewer is asking about the overall recording schedule, they are correct that sessions were not performed at fixed daily intervals, and the specific days sampled differed between monkeys. This non-uniform sampling was due to the practical constraints of long-term behavioral experiments (e.g., animal cooperation, scheduling, weekends) and the aim to capture data during key phases of adaptation. However, within any given session, behavioral (video) and EMG data were always collected concurrently.

      Second, if the reviewer is asking whether the set of days included differs between the behavioral plots (e.g., Fig 5) and the EMG/synergy plots (e.g., Figs 6, 9-11), this is a possibility depending on data quality criteria. Our criterion for including a session in the behavioral analysis was a minimum of 20 successful trials. However, for the more demanding synergy analysis, we required a higher minimum of 100 successful trials to ensure robust factorization. It is possible that a few sessions met the behavioral criterion but not the synergy criterion and were thus excluded from the latter analysis, leading to slight differences in the days presented across figures. To ensure full clarity, we have added text to the Methods section explicitly stating: (A) the rationale for the non-uniform daily sampling schedule, and (B) the specific minimum trial count criteria used for including data in the behavioral versus the synergy analyses, noting if this resulted in different sets of days being analyzed for different figures.
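
      The two inclusion thresholds described above can be illustrated with a short sketch. The thresholds (20 and 100 successful trials) are taken from the response; the function name, the data layout, and the example day numbers and trial counts are hypothetical.

```python
BEHAVIOR_MIN_TRIALS = 20   # minimum successful trials for the behavioral analysis
SYNERGY_MIN_TRIALS = 100   # stricter minimum for the synergy factorization

def select_sessions(trial_counts):
    """Split sessions by inclusion criterion.

    trial_counts maps a post-surgery day to its number of successful trials.
    Returns (days used for behavior, days used for synergy analysis).
    """
    behavior = sorted(d for d, n in trial_counts.items() if n >= BEHAVIOR_MIN_TRIALS)
    synergy = sorted(d for d, n in trial_counts.items() if n >= SYNERGY_MIN_TRIALS)
    return behavior, synergy

# Hypothetical counts, for illustration only
behavior_days, synergy_days = select_sessions({29: 150, 31: 45, 35: 12})
```

      In this made-up example, day 31 would appear in the behavioral plots but not in the synergy plots, which is the kind of discrepancy between figures that the response describes.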

      (5) General figure comments - The figures are informative, but they could be better presented, designed, and formatted to explain the important results in the paper. The figures should be able to explain most of the key results without entirely referring to the text to find some of the details. I had a bit of trouble understanding Figure 9 & 10. I would also like to suggest that bringing raw data into some figures (e.g., EMG of different muscle groups), such as showing stability between the synergies, could improve the results and allow the story to flow with more clarity. Likewise, clearly showing the differences between baseline EMG measurements and post-surgery measurements could improve some of the result figures.

      We thank the reviewer for these important general comments on data presentation. We agree that the figures are the key to our story and have implemented several revisions based on this and other reviewer feedback to improve their clarity.

      General Presentation: We have conducted a thorough review of all figures to improve layout, consistency, and font legibility (addressing R3, 1 and the Reviewing Editor's comments). This includes adjusting the layouts of Figures 3, 4, and 6 for better alignment and clarity.

      Figures 9 & 10 (Cross-correlation): The reviewer mentioned having trouble understanding these figures. In our revision, we have substantially rewritten the captions for Figures 9 and 10 to be much more descriptive. We explicitly walk the reader through how to interpret the plots (e.g., "The ‘swap’ is evidenced by the drop in self-correlation... and a concurrent rise in antagonist-correlation...").

      Including "Raw Data" (EMG): We thank the reviewer for this suggestion to provide more intuitive examples of the neural changes. We realize we did not sufficiently highlight this in the main text, but this complete data is already available in the manuscript. Supplementary Figures S1 and S2 provide a comprehensive overview of the EMG patterns for all recorded muscles in Monkey A and Monkey B, respectively. These figures show the pre-surgery and post-surgery average profiles for all recording sessions as well as the average profiles from five different post-surgery landmark days, covering the entire adaptation period. These figures directly visualize the swap-and-revert pattern in the transferred muscles and their agonists (e.g., EDC, ED23), as well as the diverse and complex adaptations in other non-transferred muscles (e.g., FCR, PL), as requested. To make this clearer, we have added explicit cross-references to Supplementary Figures S1 and S2 within the main Results section to ensure readers are directed to this detailed data.

      Showing Differences (Pre vs. Post): To "clearly show the differences between baseline... and post-surgery measurements," we implemented the point-by-point statistical comparison of pre- vs. final-day synergy profiles (as suggested in R1, 2b). This has resulted in a new Supplementary Figure visually highlighting the precise periods in the task where the final profiles still differ significantly from baseline (Fig. S9).

      We believe these additions (new figures and improved captions) will make the results much clearer and more self-explanatory, as the reviewer suggested.

      (6) Figure 1 A table with all the acronyms would help with identifying all the muscles and their respective synergies (supplemental), especially when describing the muscles in the result of the discussion section.

      This is an excellent suggestion. We have created a comprehensive table (Supplementary Table 1) listing all muscle abbreviations, full names, primary functional groups, and assigned synergies for both monkeys. We have added a reference to this table in the Figure 1 caption and the Methods section.

      (7) Figure 2 - is this mainly from Monkey A? If so, it should be stated.

      We thank the reviewer for pointing out this omission. We have updated the caption for Figure 2 to clarify that the example data shown (ultrasound, trajectories, and quantitative plots) are from Monkey A.

      (8) Figure 3 & Figure 4 seems unbalanced because of the descriptive need to explain Monkey B’s tasks? The figure alignments could be better.

      We thank the reviewer for this comment on the visual presentation of Figures 3 and 4. The reviewer’s observation that the figures appeared ‘unbalanced’ was correct. This was a direct consequence of two issues: (1) the different tasks required slightly different schematics (the "descriptive need" the reviewer mentioned), and (2) the original Figure 4 contained an additional kinematic panel (formerly 4E) that was unique to Monkey B, which broke the parallel structure with Figure 3.

      To address this and significantly improve the alignment, we have now moved the unique kinematic panel (formerly 4E) to a new Supplementary Figure (Supplementary Figure S8). This change has allowed us to re-arrange the panels in Figures 3 and 4 so that they now follow the exact same order. We have also adjusted the layout to ensure that corresponding panels are of a consistent size. We agree that this creates a much better visual balance and makes the comparison between the two monkeys far more direct and clear, as the reviewer suggested.

      (9) Figure 5. It seems like the animals can still perform the task post-surgery, but with high variability. Maybe emphasize the differences in variability between baseline and post-surgery?

      We thank the reviewer for this suggestion to emphasize the changes in variability. We have now quantified this using the Coefficient of Variation (CV) for key behavioral metrics across different phases (Pre-surgery, Early, Mid, Late post-surgery). The results confirm the reviewer’s observation of high variability post-surgery, particularly in the early phase. For instance, Monkey A’s grip formation time CV spiked dramatically (Pre: 47% vs Early: 133%), while Monkey B’s remained high (Pre: 82% vs Early: 76%). Interestingly, while Monkey A’s variability returned close to baseline levels in the late phase (Late: 55%), Monkey B’s variability increased further (Late: 97%), suggesting persistent inconsistency despite functional recovery.

      We also observed metric-specific changes. Monkey A’s pull time became less variable than baseline later on (Pre: 65% vs Late: 43%), suggesting refinement of that action. Conversely, the CV of Monkey B’s grasp aperture remained consistently low throughout (Pre: 26% vs Late: 19%), indicating relatively precise kinematic control was maintained or quickly regained. We have added a summary of these findings to the Results section to provide a more complete picture of how behavioral variability evolved relative to baseline during the adaptation process.
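
      For reference, the Coefficient of Variation is the standard deviation expressed as a percentage of the mean. A minimal sketch follows; the use of the sample standard deviation (`ddof=1`) is an assumption, since the response does not state which estimator was used, and the function name is hypothetical.

```python
import numpy as np

def coefficient_of_variation(values, ddof=1):
    """Return the CV in percent: 100 * standard deviation / |mean|.

    ddof=1 (sample standard deviation) is an assumed default.
    """
    values = np.asarray(values, dtype=float)
    mean = values.mean()
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return float(values.std(ddof=ddof) / abs(mean) * 100.0)
```

      A CV above 100%, as reported for Monkey A's early grip formation time, means the trial-to-trial spread exceeds the mean itself.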

      (10) Figure 6 quite a confusing figure. This figure needs to be better presented. The figure legends are hard to see for Monkey A vs Monkey B. At first, I thought Monkey B’s figure legend also represented Monkey A. I would suggest reorganizing the figures for clarity and coherence.

      We agree that the original presentation of Figure 6 was dense and potentially confusing. We have completely reorganized the figure to improve clarity and coherence.

      (1) Clear Separation: The figure is now structured with a strict separation between Monkey A (Left Panels, A-J) and Monkey B (Right Panels, K-Q), with prominent headers for each subject to prevent ambiguity.

      (2) Improved Legends: We have redesigned the legends to be larger and placed them explicitly within their respective subject’s section to ensure it is immediately clear which data they describe.

      (3) Visual Consistency: We have standardized the color schemes and axis layouts across this and all other figures to reduce cognitive load and facilitate easier comparison between subjects.

      (11) Figure 12 - This figure is incomplete without Monkey A’s results. The videos in the supplemental sections seem clear enough for some kinematic analysis. The story could be more supported with more thorough measurements of the kinematics from both animals to show how they differ over time and by highlighting the two phases. As a minor note, it would be helpful to present the kinematic data together with a schematic of when during the task the data are drawn from, using the % task range scale, since that is the standard throughout the paper.

      We thank the reviewer for their suggestions regarding the kinematic analysis. We agree that a parallel kinematic analysis for Monkey A, similar to that in Figure 12, would be ideal. We did attempt this. Unfortunately, while the supplemental videos for Monkey A are sufficient for observing the overall movement trajectory, they are not suitable for the detailed joint angle analysis the reviewer suggests. The videos for Monkey A were recorded at a frame rate too low to reliably extract the rapid changes in wrist and finger joint angles during the grasping movement. For this reason, the detailed kinematic analysis was limited to Monkey B, for which we had high-speed video recorded at 240 fps, allowing for a robust analysis of these fast movements.

      We have, however, expanded our kinematic analysis for Monkey B to show the refinement of the tenodesis strategy over the full time course (New Figure 13), which does help to highlight the different adaptive phases for that animal. We have also clarified in the manuscript (e.g., in the caption for Figure 12) that the lack of Monkey A data for this specific analysis was due to the low-resolution and low-frame-rate video available.

      We agree that defining the precise timing of the kinematic snapshot relative to our normalized task range is critical for accurate interpretation. In response, we have added a new panel (Figure 12C) that explicitly maps the kinematic snapshot to our standardized task timeline. This schematic clarifies that the joint angle analysis captures the hand configuration during the pre-shaping phase, specifically at 83 ms prior to object contact (which corresponds to -0.02% of the normalized task range). This ensures the kinematic data can be directly interpreted within the same temporal context as the EMG and synergy results presented throughout the paper.

      Reviewer #3 (Recommendations for the authors):

      First and most major: I found many of the figures much too small and incredibly difficult to read. Possibly the most difficult was Figure 7, where I had to zoom in a great deal to read what muscles corresponded to which bars. I don’t have specific suggestions here other than to make sure that figures are legible.

      We thank the reviewer for highlighting this important issue. We have comprehensively revised the figures to ensure they are legible at standard publication sizes. Specific improvements include:

      (1) Figure 7: We have significantly increased the font size of the x-axis muscle labels and optimized the bar chart spacing to ensure the muscle identities are readable without excessive zooming.

      (2) Global Updates: Across all figures, we have increased font sizes for axis labels and titles, removed unnecessary whitespace to maximize the data-to-ink ratio, and exported all final figures in high-resolution vector formats to ensure clarity.

      Second and more minor: I liked the setup of the manuscript, where the authors explained the unique benefits of their experimental methods and the question they were going after (“When confronted with structural changes to the musculoskeletal system, does the CNS adapt by modulating existing synergies, or by shifting toward more fractionated control strategies?”). However, the evolution of the paper made the answer to this question seem very confusing to me as I read it. The results show that monkeys initially modulated existing synergies in phase 1, but then reverted to the original modulation. This, in addition to the way the question was set up initially, made me think the conclusion was going to be that the synergies themselves changed in the second phase, but this paradoxically was not the case--synergies were stable throughout. I was left confused for the back half of the results section, until the discussion on tenodesis and developing compensatory movement strategies. So the answer is that the monkey learns by modulating existing synergies, but using different strategies in different learning phases. I’m not entirely sure how to avoid this confusion, but I wonder if there’s a way to foreshadow this finding earlier on.

      We thank the reviewer for this valuable feedback on the manuscript’s narrative structure. We understand how the initial framing (modulation vs. fractionation) followed by the reversion of the initial modulation could lead to confusion before the compensatory strategy is fully introduced. To address this, we have made two key adjustments in the revised manuscript:

      (1) In the Introduction, after posing the central question, we have added a sentence to subtly foreshadow that the adaptive process might be complex and multi-phasic, requiring analysis over extended timescales.

      (2) In the Results section, at the transition point between describing the reversion of the primary synergy timings and introducing the compensatory tenodesis strategy, we have added a short paragraph to explicitly signal that the reversion was not the complete solution and that a distinct compensatory strategy emerged concurrently.

      We believe these changes improve the narrative flow, provide better signposting for the reader, and mitigate the potential for confusion identified by the reviewer, making it clearer that the ultimate solution involved modulating existing synergies but via different strategies across distinct learning phases. We appreciate the reviewer’s help in identifying this area for improvement.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Mosquito-transmitted diseases cause nearly a million deaths every year and significant worldwide morbidity. Moreover, the geographical range of mosquito vectors is rapidly expanding due to climate change and mosquito-borne disease risks are emerging in new parts of the world.

      Innovation in finding new repellents has been slow due to limitations in current research approaches and high costs for EPA registration (especially for synthetic compounds). Since DEET was discovered in the 1940s, only a handful of additional actives have been approved by the EPA for repellent products. In the 20+ years since the discovery of insect odorant receptors from genomes, not a single novel repellent compound has been identified that was registered by the EPA. Thus, there is both a strong need for new approaches to find insect repellents and a need for new active ingredients that are safe and strategically effective.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors set up a pipeline to predict insect repellents that are pleasant and safe for humans. This is done by daisy-chaining a new classification model based on predicting repellents with a published model on predicting human perception. Models use a feature-engineered selection of chemical features to make their predictions. The predicted molecules are then validated against a proxy humanoid (heated brick) and its safety is tested by molecular assays of human cells. The humanistic approach to modeling these authors have taken (which considers cosmetic/aesthetic appeal and safety) is novel and a necessary step for consumer usage. However, the importance of pleasantness over effectiveness is still up for debate (DEET is unpleasant but still used often) and the generalization of safety tests is unknown and assumed. The effectiveness of the prediction models is also still warranted. They pass the authors' own behavioral tests, but their contribution to the field is unknown as both models (new and published) have not been rigorously benchmarked to previous models. Moreover, the author's breadth of literature in this field is sparse, ignoring directly related studies.

      Strengths:

      Humanistic approach to modeling considers pleasantness and safety. Chaining models can help limit the candidate odorants from the vastness of odor space.

      Weaknesses:

      The current models need to be benchmarked against leading models predicting similar outcomes. Similarly, many of these papers need to be addressed and discussed in the introduction. The authors might even consider their data sources for model training to increase performance and lexical categorization for interoperability. For instance, the Dravnieks data lexicon, currently used in the human perception lexicon, has been highly criticized for its overlapping and hard-to-interpret descriptive terms ("FRAGRANT", "AROMATIC").

      Human Perception:

      Khan, R. M., Luk, C. H., Flinker, A., Aggarwal, A., Lapid, H., Haddad, R., & Sobel, N. (2007). Predicting odor pleasantness from odorant structure: pleasantness as a reflection of the physical world. Journal of Neuroscience, 27(37), 10015-10023.

      Keller, A., Gerkin, R. C., Guan, Y., Dhurandhar, A., Turu, G., Szalai, B., ... & Meyer, P. (2017). Predicting human olfactory perception from chemical features of odor molecules. Science, 355(6327), 820-826.

      Gutiérrez, E. D., Dhurandhar, A., Keller, A., Meyer, P., & Cecchi, G. A. (2018). Predicting natural language descriptions of mono-molecular odorants. Nature communications, 9(1), 4979.

      Lee, B. K., Mayhew, E. J., Sanchez-Lengeling, B., Wei, J. N., Qian, W. W., Little, K. A., ... & Wiltschko, A. B. (2023). A principal odor map unifies diverse tasks in olfactory perception. Science, 381(6661), 999-1006.

      The human perception predictions were performed using models that we had reported in two earlier publications, which we have now indicated clearly in the results and methods sections of the VOR: Kowalewski & Ray, iScience (2020b) and Kowalewski, Huynh & Ray, Chem. Senses (2021). Three of the four references pointed out by the referee were cited in these prior studies, which involved computational validation by predicting on a test set of the data which was left out of training (as typically done), and also predicting across different human studies with a high degree of success. A rigorous benchmarking of the odor perception models was done in Kowalewski, Huynh & Ray, Chem. Senses (2021) and a mini-review published in the same issue of the journal by Gerkin, Chem. Senses (2021). This included a favorable comparison with the two references indicated by the referee: Keller et al., Science (2017) and Gutiérrez et al., Nat. Commun. (2018).

      The fourth reference, Lee et al., Science (2023), describes a neural network approach and was published well after our mosquito behavior studies were completed. Although they used an advanced neural network model, Lee et al. worked with 2-D structures of compounds, in contrast to our 3-D approach. They also did not report cross-study validations, comparisons with Keller et al. (2017), or benchmarks against past studies, so it is difficult to compare advances, if any. We have added this reference in the VOR.

      The intent of the current study was to move beyond testing approaches, of which there are many, and instead work on a practical use case. As we see it, it is not necessarily the prediction of fragrance character or quality alone that matters but overlap with other predicted bioactivities. From the perspective of human use, a molecule with a pleasing scent that also repels insects is likely to be far more useful than one with an unappealing scent. Accordingly, our task in this study was to select molecules that fit into specific use categories: display strong insect repellency, have pleasing scent profiles, are natural in origin and are potentially repurposed from flavors and fragrances.

      Insect Repellents:

      Wright, R. H. (1956). Physical basis of insect repellency. Nature, 178(4534), 638-638.

      Katritzky, A. R., Wang, Z., Slavov, S., Tsikolia, M., Dobchev, D., Akhmedov, N. G., ... & Linthicum, K. J. (2008). Synthesis and bioassay of improved mosquito repellents predicted from chemical structure. Proceedings of the National Academy of Sciences, 105(21), 7359-7364.

      Bernier, U. R., & Tsikolia, M. (2011). Development of Novel Repellents Using Structure− Activity Modeling of Compounds in the USDA Archival Database. In Recent Developments in Invertebrate Repellents (pp. 21-46). American Chemical Society.

      The Katritzky et al. PNAS (2008) paper is cited in our study, and we have indicated that the chemical analogs reported therein are part of the training data set in our study. We thank the reviewer for pointing us to the book chapter by Bernier & Tsikolia (2011), which reviews the QSAR approaches taken for repellent discovery and in large measure focuses on the Katritzky et al. PNAS (2008) paper. We did cite two relevant studies by Uli Bernier.

      The current study assumes that insect repellents repel via their odor valence to the insect, but this is not accurate. Insect repellents also mask the body odor of humans making them hard to locate. The authors need to consult the literature to understand the localization and landing mechanisms of insects to their hosts. Here, they will understand that heat alone is not the attractant as their behavioral assay would have you believe. I suggest the authors test other behaviour assays to show more convincing evidence of effectiveness. See the following studies:

      De Obaldia, M. E., Morita, T., Dedmon, L. C., Boehmler, D. J., Jiang, C. S., Zeledon, E. V., ... & Vosshall, L. B. (2022). Differential mosquito attraction to humans is associated with skin-derived carboxylic acid levels. Cell, 185(22), 4099-4116.

      McBride, C. S., Baier, F., Omondi, A. B., Spitzer, S. A., Lutomiah, J., Sang, R., ... & Vosshall, L. B. (2014). Evolution of mosquito preference for humans linked to an odorant receptor. Nature, 515(7526), 222-227.

      Wei, J. N., Vlot, M., Sanchez-Lengeling, B., Lee, B. K., Berning, L., Vos, M. W., ... & Dechering, K. J. (2022). A deep learning and digital archaeology approach for mosquito repellent discovery. bioRxiv, 2022-09.

      In this study we took an unbiased approach to compile the training data set, including several known insect repellents of varying chemical structures and volatility, for most of which there is no information on how they are sensed by insects. Not surprisingly, the repellents we identified are varied in structure and in functional groups, and are likely detected in more than one way by the mosquitoes, using olfactory and/or gustatory systems. We did not consider “masking” of skin attraction as a factor in the training data set in this study, which precluded the need to discuss the papers pointed out by the referee. In fact, there is a vast and rich body of literature regarding human skin odor, CO<sub>2</sub> and breath emanations, which includes our own research contributions and review articles, that is not discussed in the current paper.

      We did in fact conduct human arm-in-cage experiments with a few of the compounds reported in this study using female Aedes aegypti mosquitoes; a preprint describes the smaller-scale analysis, the results of which show very strong repellency, in Boyle et al. bioRxiv (2016) https://doi.org/10.1101/060178 (Figure 4). That line of experimentation falls outside the scope of the current study and is being pursued separately. We have added the citation for this preprint in the results section of the VOR.

      However, heat with CO<sub>2</sub> as used in this study offers a practical proxy for evaluating prospective repellents in a high-throughput manner. It would certainly be desirable to further evaluate additional candidates from the heat attraction assay with human subjects in the future.

      We thank the reviewer for pointing out the preprint by Wei et al., bioRxiv (2022). Our approaches differ in that Wei et al. do not consider properties such as fragrance and toxicity. We also cannot assume that their newer neural network model is superior because, although the model uses a large training dataset, it does not use 3D chemical structures, which are extremely relevant for biological activity. While very little information is available for the actives reported in Wei et al., we independently evaluated their top compounds reported as similar to or better than DEET (CAS#3731-16-6, 4282-32-0, 2040-04-2, 32940-15-1 and 3446-90-0) and could not find information about toxicity, smell, or natural source. In contrast, the top repellents that we identify here as similar to or better than DEET (N=8) are all classified as GRAS (Generally Recognized as Safe) compounds by the Flavor and Extract Manufacturers Association (FEMA), are all naturally occurring (plum, jasmine, mushroom, grapes, etc.), and have pleasant smells. The dermal toxicity values in rabbits are known for six of our compounds and are at the best possible levels (≥5000 mg/kg).

      Reviewer #2 (Public Review):

      Summary:

      This is an interesting study that seeks to identify novel mosquito repellents that smell attractive to humans.

      Strengths:

      The combination of standard machine learning methods with mosquito behavioral tests is a strength.

      Weaknesses:

      The study would be strengthened by describing how other modern ML approaches (RF, decision trees) would classify and identify other potential repellents.

      The current approach already shows a success rate >85% for repellency coefficient >0.5 and identifies eight naturally occurring GRAS compounds with repellency as strong as or greater than DEET. This substantially expands the repertoire of strong natural repellents. Since the 1950s, only six active ingredients have been registered by the US EPA for use in topical repellents, of which only two are natural in origin (oil of lemon eucalyptus and catmint oil), and they typically do not protect as well as DEET does. That being said, we have since explored other predictive algorithms, for instance neural networks. The experimental evaluation of these newer pipelines will take significant resources and time and will be the focus of future grants.

      A comparison in the repellent activity between DEET and the top ten hits identified in this new study indicates little change in repellent activity (~3%), suggesting that DEET remains the gold standard. Without additional toxicity tests, the study is arguably incremental. The study's novelty should be better clarified.

      There is an urgent need to find new insect repellents that have better chances of being adopted by people who avoid DEET, such as in Africa and Asia. Having more natural actives that are effective expands the tools against disease-transmitting mosquitoes. As mentioned above, the top repellents that we identified as similar to or better than DEET (N=8) are all classified as GRAS (Generally Recognized as Safe) compounds by the Flavor and Extract Manufacturers Association (FEMA), are all naturally occurring (plum, jasmine, mushroom, grapes), and have pleasant smells. The dermal toxicity values in rabbits are known for six of them and are at the best possible levels (≥5000 mg/kg).

      The Methods in the repellency tests are sparse, and more information would be useful. Testing the top repellents at low doses (<<1%) and for long periods (2-12 h) would strengthen the manuscript. Without this information, the manuscript is lacking in depth.

      The US Environmental Protection Agency (EPA) regulates mosquito repellents, and DEET-based commercial products are typically assigned protection times that vary with concentration (10%: ~2 h; 30%: ~5 h; 100%: ~8 h). These would be the relevant concentrations for testing protection times on human volunteers, not lower as suggested. Such studies fall within the realm of EPA registration efforts, involving extensive GLP-testing for safety, physical chemistry, and Human Subjects Board approvals. This is outside the scope of the current study and is typically accomplished during development efforts.

      Testing human subjects on their olfactory perceptions of the repellents would also increase the depth and utility of the manuscript. Without additional experiments, the authors' conclusions lack support and have limited impact on the state-of-the-art.

      This manuscript is a mix of different approaches, which makes it lack cohesion. There is the ML method for classifying new repellents that smell good, but no testing of the repellents on human volunteers. The repellents are not tested at realistic concentrations and durations. And the calcium mobilization test is strange and makes little sense in the context of the other experiments and framing of the manuscript.

      The human olfaction validation that we present in this paper is consistent with most current publications in the field (for example, Keller et al., Gutiérrez et al.). More systematic validation of the human odor character prediction pipelines used was presented in two previous papers, Kowalewski & Ray, iScience (2020b) and Kowalewski, Huynh & Ray, Chem. Senses (2021), and in a mini-review published in the same issue of the journal by Gerkin, Chem. Senses (2021).

      Reviewer #3 (Public Review):

      While I am not a specialist in this field, I do have some knowledge of the subject matter and the computational aspects involved. The authors employ simple machine learning techniques (such as SVM) for the following purposes:

      (a) Prediction of aversive valence.

      (b) Predicting anti-repellent chemicals.

      (c) Predicting calcium mobilization.

      The approach is commonplace in chemoinformatics literature.

      Weaknesses:

      All the above models are presented discretely, making it difficult to discern experiment design principles and connectedness.

      The ML work is rudimentary, lacking adequate details. Chemoinformatics has reached great heights, and SVM does not seem contemporary.

      There is significant existing research on finding repellents.

      In the current study, we aimed to showcase how computational research may be combined with basic science to create scalable pipelines that address real-world problems, rather than to demonstrate methodological novelty of chemoinformatics approaches. Specifically, we wanted to use different predictive models to identify compounds that display strong insect repellency, have pleasing scent profiles, are natural in origin and are potentially repurposed from flavors and fragrances. Unfortunately, there is very little existing research on insect repellents that have these types of properties, which would make them better candidates for EPA registration. Most tested compounds are synthetic, are often analogs of known repellents like DEET, and necessitate substantial time and resources to register. Moreover, the identities of chemosensory receptors that are responsible for repellency to DEET and other compounds, and that are conserved across Anopheles, Aedes and Culex mosquitoes, are not known.

      It is true that the field of cheminformatics has experimented with a variety of newer approaches, based in part on neural networks (e.g., Graph Neural Networks and graph embeddings to encode chemical structure rather than a more conventional Extended Connectivity Fingerprint (ECFP)). Importantly, however, novelty does not imply usefulness. The mosquito behavior experiments that we present show a very high success rate (>85%), validating our approach and identifying several excellent candidates already.

      Strengths:

      Authors attempt to make a case for calcium mobilization in the context of repellency. This aspect sounds interesting but is not surprising.

      Behavioral profiling of repellents could be useful.

      We thank the referee for this comment. We have indeed done behavioral profiling for several repellents that evoke calcium mobilization, but we do not see any clear correlation thus far.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      The paper describes a biologically plausible version of JEPA using recurrent neural networks called RPL for recurrent predictive learning. Given an embedding z<sub>t</sub>, a recurrent neural network processes these inputs with the form: c<sub>t+1</sub> = RNN(c<sub>t</sub>, z<sub>t</sub>). Then the predictive network f is predicting the future inputs with the format: min ||f(c<sub>t</sub>) − stop-grad(z<sub>t+∆t</sub>)||<sup>2</sup>. I understand that a prediction error is defined as: e = z<sub>t+∆t</sub> − f(c<sub>t</sub>) to model cortical measurements in the oddball task.
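
      The formalism summarized above can be sketched in a few lines of Python. This is only an illustrative toy under stated assumptions, not the implementation used in the paper: the scalar weights `w_c`, `w_z` and `w_f`, the tanh nonlinearity, the zero initial state and all function names are hypothetical, the horizon is written as `delta`, and the stop-grad is indicated only by a comment because plain Python has no automatic differentiation.

```python
import math


def rnn_step(c, z, w_c=0.5, w_z=0.5):
    """Toy integrator update c_{t+1} = RNN(c_t, z_t) (elementwise tanh; assumed form)."""
    return [math.tanh(w_c * ci + w_z * zi) for ci, zi in zip(c, z)]


def predictor(c, w_f=1.0):
    """Toy linear predictor f(c_t) (assumed form)."""
    return [w_f * ci for ci in c]


def rpl_errors(zs, delta=1):
    """Prediction errors e = z_{t+delta} - f(c_t) along a sequence of embeddings."""
    c = [0.0] * len(zs[0])  # c_0, assumed to start at zero
    errors = []
    for t in range(len(zs) - delta):
        pred = predictor(c)      # f(c_t)
        target = zs[t + delta]   # stop-grad target: treated as a constant
        errors.append([zi - pi for zi, pi in zip(target, pred)])
        c = rnn_step(c, zs[t])   # c_{t+1} = RNN(c_t, z_t)
    return errors


def rpl_loss(zs, delta=1):
    """Mean over time of the squared error norm ||f(c_t) - z_{t+delta}||^2."""
    errors = rpl_errors(zs, delta)
    if not errors:
        return 0.0
    return sum(sum(e * e for e in err) for err in errors) / len(errors)
```

      In an autodiff framework, the only structural change would be to wrap the target in the framework's stop-gradient operation so that the objective updates the predictor and integrator pathway but not the target pathway.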

      The RPL model is also shown to build an internal world model, with “real-world” data like the movement of moving animals or speech signals. The representation is then compared to V1 data and expected prediction error signals in an oddball setting. In a stacked hierarchy of RNN learning with RPL, the higher layers appear to learn high-level latent variables, although gradients are not propagated downward to the lower layers.

      The paper tackles an open question: Self-supervised learning is thought to be a fundamental principle to explain how computation is structured in the brain. Cortical data suggest qualitatively that prediction error is a core principle of representation learning in the brain, but the field is still looking for a simple yet expressive model that would explain how the cortex learns its representations. RPL contributes in that direction by making a useful link between cortical representation learning in RNN models and the JEPA learning algorithm that was demonstrated to scale to large world model learning from video data by LeCun’s group. It is very useful to connect this popular deep learning algorithm to cortical data.

      The model formalism is relatively elegant and simple: Simple next-input prediction objectives are conceptually simple but not necessarily trivial to build at scale. They have a clear benefit over contrastive or IL methods because they are free from dataset-specific data augmentation and negative samples, thereby moving the comp neuro field towards conceptually simpler models of representation in the cortex. Yet predictive-only models (and in particular predictive models in latent space instead of pixel space) are not easy to build in a stable fashion. The JEPA family is basically intended to solve this question; it is very nice and timely to bring this to comp neuro.

      The methodology combining comp neuro and deep learning makes sense: The conceptual and qualitative analogy with cortical prediction errors is relevant and consistent with what is expected as a model of self-supervised learning in cortical models. The methodology to compare RPL with IL and CL is methodologically meaningful and grounded: showing, for instance, how some of the models fail to represent some latent structure in some toy datasets is interesting.

      (1.1) h-RPL: The h-RPL is perhaps the most creative departure from the JEPA model family. It would be interesting to say more about what was particularly difficult to see in the latent variables emerging in the hierarchical model. I often find it magical that layer-wise learning rules of this type are not learning redundant representations. Any insights why this is not the case here would be potentially insightful.

      We thank the reviewer for this comment. Regarding representational collapse in h-RPL: each local circuit independently applies the same collapse-preventing strategy as the single-level RPL model: namely, the asymmetric prediction architecture combined with the stop-grad operator. Since this mechanism operates locally within each circuit, it is sufficient to prevent collapse at every level of the hierarchy independently (see also our response to Point P1.3).

      The more subtle question is why the circuits learn non-redundant rather than identical representations across the hierarchy. We believe two mechanisms are at play here. First, the hierarchical encoder is a stacked convolutional network, meaning that receptive field sizes grow with depth. This architectural inductive bias naturally encourages successive circuits to operate on increasingly spatially integrated features, creating a structural pressure toward learning complementary rather than redundant representations. Second, the growing expressivity of the network with depth means that higher circuits have access to richer, more abstract inputs from which they can extract higher-level latent structure that is not already captured by lower circuits. Together, these factors (the local collapse-preventing mechanism and the depth-dependent growth in receptive field size and network expressivity) presumably explain why h-RPL builds an increasingly refined and non-redundant representational hierarchy.

      What we will do: We will expand our discussion on this point in the revised manuscript. We plan to expand our quantification on how abstractions emerge in h-RPL in future work in which we will also study variations with top-down connections.

      (1.2) In general, I fully support the type of question and ideas that the paper is putting forward. It is, however, very hard in this research field to gain insight into specific conceptual contributions or specific bits of experimental data that the model puts forward. In pointing to the following weaknesses, I am encouraging the authors to lay out more clearly what the unique hypothesis is or the contribution of the RPL model that we should remember it for.

      Thanks for the positive feedback along with the constructive criticism, and we agree that articulating the core contributions more crisply would strengthen the paper.

      At its heart, we believe the paper makes two contributions we hope it will be remembered for. First, while prior work has established that invariant representations can be learned via local Hebbian-like learning rules, we show that learning equivariant representations alongside a latent dynamics model requires something qualitatively different: a local circuit with recurrent dynamics and an asymmetric predictive architecture. RPL provides a minimal concrete instantiation of this principle.

      Second, and perhaps more broadly, the model makes a structural prediction about (cortical) neuronal circuit organization: since the encoder, integrator, and predictor each perform functionally distinct computations, the framework implies the existence of corresponding cell types and connectivity patterns one should look for in experimental data.

      What we will do: We will sharpen the above messages in the revised manuscript to ensure these contributions are prominently highlighted throughout the paper.

      (1.3) Comparison with JEPA variants: JEPA variants are integrating different details into the learning algorithm. Integrating, for instance, “masking” of the latent encoder targets, or EMA in the style of BYOL or Siamese networks, for the predicted representations. It is great that RPL does not seem to need any of those (next input prediction is a natural implementation of masking, and EMA does not seem to be used). It is notoriously hard for the JEPA model to work without these features. Since some of these details are sometimes surprisingly crucial for a simulation to work, it would be good to report which of the other important details were key to live without EMA and masking. Is it the difference in learning rate, for instance? Or maybe the tasks considered are simply easy enough for any model to work; if so, it could be useful to acknowledge to what extent this is true.

      We thank the reviewer for raising this important point. There are two key mechanisms that ensure stable, non-trivial training in RPL. First, using a higher learning rate for the predictor relative to the encoder is crucial for stable training. This prevents the predictor from collapsing the encoder representations and was already noted empirically by Chen et al. (2021).

      Second, and more fundamentally, predicting at the level of the memoryless encoder output, rather than at the level of the recurrent integrator, is essential to prevent a degenerate solution in which the RNN simply learns to generate an internally predictable time series unrelated to the input. By anchoring the prediction target to the encoder, the model is forced to ground its representations in the sensory input. Intuitively, otherwise the RNN can simply “make up” a predictable time series, which satisfies the learning objective, but would not yield useful internal representations.
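
      The degenerate solution described above can be made concrete with a toy calculation. The sketch below is purely illustrative and rests on assumptions introduced here: scalar representations, an identity predictor f(c) = c, a hypothetical alternating encoder sequence `zs`, and an integrator that ignores its input and stays constant. It is not taken from the paper's code.

```python
def mse(preds, targets):
    """Mean squared error between two equally long lists of scalars."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)


# Hypothetical scalar encoder outputs z_0 ... z_4 that change over time.
zs = [0.0, 1.0, 0.0, 1.0, 0.0]

# A degenerate integrator that ignores the input: c_t = 0 for all t.
cs = [0.0 for _ in zs]

# Target at the integrator level: predict c_{t+1} from c_t.
# The made-up constant series is perfectly predictable.
loss_on_integrator = mse(cs[:-1], cs[1:])

# Target at the encoder level: predict z_{t+1} from c_t.
# The same constant state cannot follow the input-driven targets.
loss_on_encoder = mse(cs[:-1], zs[1:])

print(loss_on_integrator, loss_on_encoder)  # 0.0 0.5
```

      With the integrator-level target, the input-independent state already attains zero loss, whereas the encoder-level target penalizes it. Note that this toy only illustrates why the target is anchored to the encoder; it does not by itself address collapse of the encoder representations.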

      Beyond these architectural points, previous work from our group (Srinath Halvagal et al., 2023) has shown mathematically that JEPAs without EMA avoid collapse via an implicit variance regularization mechanism, and we believe RPL benefits from the same principle. Indeed, we now have a more complete theoretical understanding of this, including identifiability proofs for the latent dynamical model under relatively mild assumptions (Mikulasch et al., 2026). This work has recently been accepted at ICML. Other than that, one has to ensure that representations are not already nearly collapsed at the beginning of training. In this paper, we used normalization layers (batchnorm) in the encoder to ensure this.

      Finally, as in all SSL paradigms, the augmentation strength is an important hyperparameter that impacts the quality of learned representations. In the temporal predictive setting, the augmentation strength is fixed by the world itself. The only knob we have to play with is the prediction horizon ∆. While we typically focused on next-time-step (∆ = 1) prediction, we saw a clear effect in the case of the speech dataset where ∆ = 8, but not ∆ = 1, yielded useful representations for the tasks (Fig. 5b).
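
      One possible intuition for why the horizon matters, offered here only as an illustration and not as an analysis from the paper, is that for slowly varying signals a short horizon makes the target nearly identical to the current input, so the prediction task carries little pressure to learn structure. The sketch below measures how well a trivial "copy the current value" baseline predicts a hypothetical slow sinusoid at two horizons; the signal, its period and the function name are assumptions made for this example.

```python
import math


def copy_baseline_error(signal, delta):
    """MSE of predicting signal[t + delta] by simply copying signal[t]."""
    pairs = [(signal[t], signal[t + delta]) for t in range(len(signal) - delta)]
    return sum((a - b) ** 2 for a, b in pairs) / len(pairs)


# Hypothetical slowly varying input: a sinusoid with a period of 64 time steps.
signal = [math.sin(2 * math.pi * t / 64) for t in range(256)]

err_1 = copy_baseline_error(signal, delta=1)  # next-step horizon
err_8 = copy_baseline_error(signal, delta=8)  # longer horizon

print(f"copy-baseline MSE: delta=1 -> {err_1:.4f}, delta=8 -> {err_8:.4f}")
```

      On this toy signal the copy baseline is nearly perfect at ∆ = 1 and clearly worse at ∆ = 8, which is one way to read the horizon as the temporal analogue of augmentation strength. Whether this is the mechanism behind the speech result would have to be checked against the actual data.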

      What we will do: We will discuss the above points more prominently in the discussion to avoid them being overlooked in the methods. Additionally, we will include a plot on the empirical prediction horizon for the speech dataset in the supplementary material for reference.

      (1.4) Comparison with IL and CL: At a high level, the comparison with IL and CL algorithms is written as conclusive. I suspect that the failure modes of IL and CL that are described are not due to the algorithms themselves, but rather to the construction of invariance statistics or the choice of negative sample sets (the sets of samples among which variance 1 is requested by VICReg). For instance, if the variance (or negative sample set) is taken only across time, the variance across object identity is expected to collapse. Similarly, if the variance is taken across object identity, the variance across time can collapse. So I wonder if the failure of IL and CL is induced by the construction of the variance definition.

      We thank the reviewer for this thoughtful point. Both RPL and CL implement an implicit variance regularizer by virtue of being JEPAs (Srinath Halvagal et al., 2023), whereas IL uses an explicit regularizer computed along both the batch and time dimensions to avoid representational and dimensional collapse. The failure modes of IL and CL therefore cannot be entirely attributed to the statistics of the input samples chosen for variance regularization, but are instead primarily determined by the choice of prediction and target representations.
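
      To make the distinction between the two variance axes concrete, the following sketch computes a VICReg-style hinge penalty separately along the batch and time dimensions. It is a hedged illustration only: the hinge form, the target standard deviation of 1, the `eps` value, the use of population variance, and the reduction to a single scalar feature are assumptions made here, and the regularizer actually used for IL in the paper may differ.

```python
import math


def std(values, eps=1e-4):
    """Standard deviation with a small eps inside the square root (population variance)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return math.sqrt(var + eps)


def hinge_variance_penalty(values, target_std=1.0):
    """Penalty that is zero once the standard deviation reaches target_std."""
    return max(0.0, target_std - std(values))


def variance_penalties(reps):
    """reps[b][t] is one scalar feature for batch element b at time t.

    Returns (mean penalty along the batch axis, mean penalty along the time axis).
    """
    n_batch, n_time = len(reps), len(reps[0])
    along_batch = [
        hinge_variance_penalty([reps[b][t] for b in range(n_batch)])
        for t in range(n_time)
    ]
    along_time = [hinge_variance_penalty(reps[b]) for b in range(n_batch)]
    return sum(along_batch) / n_time, sum(along_time) / n_batch


# Hypothetical representation that separates two sequences but is constant in time.
reps = [[-2.0] * 4, [2.0] * 4]
batch_pen, time_pen = variance_penalties(reps)
```

      In this example the batch-axis penalty vanishes while the time-axis penalty stays close to its maximum, which is the scenario the reviewer describes: regularizing along one axis alone would not detect a collapse along the other, whereas computing the penalty along both axes does.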

      What we will do: We will clarify this in the Methods section of the revised manuscript.

      (1.5) Prediction error: When compared to the recording of cortical activity in Figure 7, it is not obvious from the figure which latent space we are talking about mathematically. Is it the vector z, c, or the prediction error e? This is rather important from a neuroscientific point of view, because the prediction error e is expected to explain the neuronal data. On the other hand, the prediction error e is only used in the learning algorithm to define the loss function, but it is not the communication medium between the RNN units c (or with the encoder z).

      In the brain, since the measurements are recorded as neural activity, they are communication channels between specific units (z or c). It is probably c or z that would already explain the oddball prediction error. I believe that other models, like Forward-forward of Nejad et al., have tried quite hard to address this apparent tension. Whether or not this is resolved by RPL, I think it would be beneficial to state the problem and clarify how the algorithm addresses or ignores the issue.

      Thanks for pointing out the issue with regard to clarity and for raising the important but subtle point about prediction error representation. To answer the immediate question of which vector we use in Figure 7, it is the vector c corresponding to the integrator representations. We agree this should be stated explicitly and will update the manuscript accordingly.

      On the more general point, we agree that the tension between recordable neural activity and the computational role of prediction errors is an important issue. We do already briefly engage with it in the Discussion (subsection “Relation to previous modeling work”), where we note that under RPL “inter-areal communication is dominated by representations rather than error signals”. However, we agree that this point should be surfaced more directly.

      To elaborate, under classical predictive coding, prediction errors are the inter-areal communication channel and are therefore expected to be directly observable in neural recordings, e.g., as oddball responses. Under RPL, this is not the case: e is computed locally within a circuit and serves only as a learning signal for synaptic plasticity, not as a signal propagated between circuits or areas. What cortex primarily encodes and communicates in our framework are predictive representations, not reconstruction errors. Accordingly, what should map onto recorded population activity are the representations c (and z), while locally computed prediction errors could in principle remain observable as more circumscribed or transient mismatch-like signals within a circuit.

      We would like to push this point further. The reviewer frames this as a tension that RPL needs to resolve, but growing neurophysiological evidence suggests that classical residual-difference prediction errors may not be a dominant mode of cortical encoding in the first place. Furutachi, Franklin, et al. (2024) showed that V1 responses to unexpected visual stimuli do not encode how input deviates from predictions, but instead selectively amplify the representation of the unexpected stimulus itself. Very recently, Furutachi and Hofer (2026) generalized this into a revised framework in which feedforward pathways transmit sensory representations modulated by prediction-error magnitude, rather than residual differences. Vasilevskaya et al. (2026) constrain the space of plausible cortical algorithms via functional-influence experiments, also concluding that no variant of standard predictive processing is consistent with the full pattern of layer 2/3 ↔ layer 5 interactions; they propose a JEPA-based model, citing RPL as a promising candidate. The model by Nejad et al. (2025) similarly shares with RPL the property that representations, rather than residual errors, propagate between circuit elements.

      Taken together, the apparent tension may be less a problem RPL needs to resolve than one it is well positioned to explain, remaining consistent with the emerging picture of cortex as encoding amplified sensory features rather than transmitting residual errors across areas.

      What we will do: We will add missing information to the main text and sharpen the Discussion with these arguments.

      (1.6) Successor representation without value? I believe the term successor representation is historically relevant in a reinforcement learning (RL) setting and has a precise mathematical definition. Without RL, I feel that learning successor representation is conceptually identical to learning a transition matrix (aka, a primitive world model). I therefore wonder if the pitch for high-level framing of the successor representation is appropriately described or trivial.

      The reviewer makes a valid point on the concept of successor representations. To answer the immediate question, it is not entirely trivial, as we not only observe the emergence of the transition structure (Fig. 6c), but also the encoding of decaying future (but not past) state occupancy (Fig. 6d,e). We largely adopted the terminology “successor-like representations” from the study by Ekman et al. (2023), but we will elaborate a bit further on why we stuck to it. As nicely pointed out by the reviewer, the term “successor representations” was introduced in the RL literature (Dayan, 1993), but was later adopted in neuroscience to describe the idea that a neuronal population encodes a predictive representation that reflects the expected future occupancy of states under a given policy. Ekman et al. (2023) use the term “successor-like representations” to explain the phenomenon where neural activity in V1 (and hippocampus) represents both current and (discounted) future, but not past, state occupancies in a sequence learning task with no explicitly defined policy or value training. In other words, successor-like representations are simply predictive representations.
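
      For reference, the textbook successor representation for a fixed transition matrix T and discount γ is M = Σ_k γ^k T^k = (I − γT)^(−1). The sketch below evaluates it on a toy chain to show the asymmetry between future and past states. The four-state chain, the value γ = 0.5, and the function name `successor_matrix` are assumptions made for illustration and are not taken from the task in Ekman et al. (2023) or from our model.

```python
import numpy as np


def successor_matrix(T, gamma):
    """Discounted expected future state occupancy: M = (I - gamma * T)^-1."""
    n = T.shape[0]
    return np.linalg.inv(np.eye(n) - gamma * T)


# Hypothetical 4-state chain A -> B -> C -> D, with D absorbing.
T = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0],
])
M = successor_matrix(T, gamma=0.5)

# Row for state B: weight on B itself and on its successors C and D,
# but none on the predecessor A.
row_B = M[1]
```

      The row for B carries discounted weight on the states that follow it and zero weight on the state that precedes it, which goes beyond what the one-step transition matrix alone encodes.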

      What we will do: To deal with this dichotomy, we will replace “successor-like representations” with the term “predictive representations” in the abstract and clarify this distinction in the Results section of the revised manuscript.

      (1.7) Learning in RNN: Learning with recurrent networks appears to be key to the model presented here (it is in the algorithm name). Yet, this aspect of the model and the literature on biologically plausible learning rules for RNNs are not really discussed.

      We thank the reviewer for raising this concern. While h-RPL is one step toward more biologically plausible and spatially local learning rules, exploring it further in terms of temporal credit assignment is beyond the scope of the present study and would require a more systematic and in-depth analysis. However, moving toward more biologically plausible learning rules is an interesting research direction that we plan to explore, as we also mentioned in the Discussion (“Limitations and future research directions”).

      We think a viable strategy could be to combine a slim spatial credit assignment strategy such as feedback alignment (Nøkland, 2016; Lillicrap et al., 2016) with an online learning rule using eligibility traces for temporal credit assignment such as SuperSpike (Zenke et al., 2018) or e-prop (Bellec et al., 2020). Similar strategies have given promising results for CLAPP (Illing et al., 2021; Zihan et al., 2026).

      What we will do: Following the suggestion, we will discuss biologically plausible learning rules for RNNs in the Discussion.

      Reviewer #2 (Public review):

      This is a very interesting manuscript, which proposes a novel idea on how cortical networks may learn useful representations of sensory stimuli. The model implementing this idea is thoroughly tested in multiple experimental paradigms. The manuscript is very clearly written. I feel it may have a significant impact on our understanding of cortical circuitry.

      Reviewer #3 (Public review):

      This paper presents Recurrent Predictive Learning (RPL), a self-supervised model conceptually similar to Joint-Embedding Predictive Architecture (JEPA) models. RPL sequentially observes dynamic scenes to predict subsequent observations. A central claim of the work is that the model’s trained representations are simultaneously invariant and equivariant to transformations such as movement, properties that emerge without explicit supervision. These representational qualities are demonstrated through three experiments utilizing two simulated datasets and one naturalistic dataset. Furthermore, the latent embeddings are qualitatively compared with neural data, showing that the model reproduces the successor representation observed in human V1 and the local/global oddball effect in the monkey Prefrontal Cortex.

      The paper addresses a fundamental question relevant to both computational neuroscience and machine vision: how the brain learns representations that are simultaneously invariant and equivariant to transformations. The manuscript is well-written, easy to follow, and supported by clear visualizations.

      While JEPA-style models have recently gained significant traction in the artificial intelligence community, this paper nicely bridges the gap to neuroscience. By framing these architectures as a theory for visual learning in the brain, the authors provide valuable insights into how predictive frameworks can explain cortical processing.

      The qualitative alignment with V1 and PFC data is a particularly strong contribution, as it offers a potential mechanistic explanation for observed neural phenomena through the lens of self-supervised learning.

      (3.1) The central claim, that both invariance and equivariance emerge spontaneously, requires further scrutiny (see Ghaemi et al., NeurIPS, 2025; Garrido et al., arXiv, 2024). In particular, the synthetic “moving animal” dataset used in this paper may be too simple to fully support this claim. In latent space prediction, a model must predict both the scene content and the dynamics of movement. Because movement (whether ego-motion or external) is often highly uncertain (or multi-modal), predictive models in naturalistic settings often “collapse” toward learning purely invariant representations, ignoring the hard-to-predict dynamics. In the provided simulations, the movements are extremely predictable. In more complex scenarios, the model would likely prioritize content (invariance) over dynamics (equivariance) unless aided by action-conditioning or explicit factor estimation (Zhang et al., ICLR, 2026). The authors’ results in Figure 5 using naturalistic video seem to reflect this limitation, given the lower performance on the naturalistic videos compared to the synthetic datasets.

      We thank the reviewer for the feedback. We agree that further validation on more complex datasets would strengthen the claims, and we take this point seriously. We would welcome any suggestions from the reviewer for a specific alternative dataset.

      Regarding the mouse video data specifically, we realized that this is a suboptimal benchmark rather than a shortcoming of our method. The culprit presumably is that the mice remain largely stationary, leading to a heavily imbalanced velocity distribution peaked near zero (Supplementary Fig. S9). This imbalance makes equivariance evaluation unreliable regardless of the learning algorithm. For example, end-to-end supervised training results in an R<sup>2</sup> of 0.19 compared to 0.08 ± 0.02 for RPL.

      Regarding the moving animal dataset, we note that the dynamics are not trivial from an SSL perspective: unlike moving MNIST (Srivastava et al., 2015), the dataset includes changes in scale and orientation, both features that invariance-focused SSL models can easily ignore, yet RPL recovers reliably. For example, this discrepancy can be seen in Supplementary Table S1 where we compare to InfoNCE and CPC. That said, we acknowledge the reviewer’s broader concern and will seek to validate RPL on more complex datasets.

      While it would be nice to compare to related work by Ghaemi et al. (2024), this study used 3DIEBench (Garrido et al., 2023). Unfortunately, 3DIEBench’s reliance on pair-based representations with annotated but random augmentations (such as rotations or color changes) precludes the possibility of smooth latent traversals that would be required for RPL to learn from the same dataset. We will look into whether it is computationally feasible to adapt or regenerate a similar dataset that meets the requirements for temporal prediction.

      Regarding stochasticity, we agree that predictive learning in latent space is most natural in approximately deterministic settings, whereas real-world sensory information often comprises non-deterministic elements. While a deeper treatment of such stochastic environments is beyond the scope of the present manuscript, it will be the focus of ongoing and future work. Regarding ongoing work, it is worth mentioning that in recent work from our group (Hauri et al., 2026), we have demonstrated that RPL’s core objective can replace the reconstruction loss in Dreamer, achieving competitive performance in complex, stochastic environments. While we did not systematically evaluate equivariance in this study, the results suggest that representation-space predictive learning is viable beyond the deterministic regime.

      What we will do: We will make the point about the real-world mouse video dataset being a poor benchmark and include the additional R<sup>2</sup> values to show that. Further, we will try to identify or generate alternative datasets to back the equivariance claims and discuss our findings in the light of previous work, e.g., Ghaemi et al. (2024). Moreover, we will sharpen our discussion of our model’s limitations in stochastic settings and highlight notable connections to related work.

      (3.2) The framing of the RPL model as an entirely new theory of representation learning is slightly overstated. The focus on prediction in representation space rather than input space is the defining characteristic of JEPA and various other Self-Supervised Learning (SSL) models, even sequential prediction. While this paper clarifies the connection between these AI frameworks and cortical circuits, the work would be strengthened by more explicitly positioning RPL within the context of existing JEPA-style models and prior SSL theories of the visual system.

      Thanks for raising this point. We are unsure what the reviewer refers to. We did not frame our work as “an entirely new theory of representation learning,” as the reviewer suggests. In fact, we highlight quite the opposite already in the title of our article, which reads: “Understanding neural circuit principles for representation learning through joint-embedding predictive architectures.” We do not claim novelty over JEPA as an ML paradigm; we adopt it precisely because it provides a principled, non-generative framework for predictive representation learning, and our goal is to develop a circuit-level instantiation that accounts for neural circuit computation. We already discuss a body of previous work on self-supervised learning and JEPAs at length. Since the reviewer did not specify what they are missing, we will briefly reiterate what is already there.

      Our contribution is a theory of representation learning in the brain, built on JEPAs as the underlying ML framework. The Title and Introduction already position our work quite explicitly this way. Specifically, we mention prior work on JEPAs (CPC, BYOL, SimSiam, I-JEPA, seq-JEPA, V-JEPA, V-JEPA 2), while noting that “most JEPAs developed in machine learning are poor models of cortical computation” because of their reliance on negative sampling, transformers, masking, static images, and/or known parametrized transformations, and motivate RPL as the minimal candidate that “must instead rely on recurrent neural dynamics, learn from streaming sensory input without masking, support both invariant and equivariant representations, and reproduce key neurophysiological observations.”

      The Discussion (“Relation to previous modeling work”) further details the specific novelties of RPL relative to existing sequential JEPA-style and SSL models like CPC (Oord et al., 2018), V-JEPA (Bardes et al., 2024), V-JEPA 2 (Assran et al., 2025), seq-JEPA (Ghaemi et al., 2024). In brief:

      RPL is a recurrent JEPA based on RNN dynamics, not transformers, and learns from streaming sensory input without masking or random negative sampling;

      It explicitly compares three prediction-error topologies (RPL vs. invariance learning vs. context prediction; Fig. 2, Suppl. Fig. S2, S6) and shows that asymmetric recurrent prediction is essential for jointly learning invariant and equivariant representations;

      Importantly, it does so via pure temporal prediction without access to underlying transformations, a property shared by very few JEPAs. The closest exception is VJ-VCR (Drozdov et al., 2024) which uses an explicit variance-covariance regularization (VCReg) in a JEPA, which we will cite in the revised manuscript;

      It provides the first hierarchical JEPA optimizing local prediction errors at multiple levels (h-RPL, Fig. 8), as envisioned by LeCun (2022) but not previously implemented;

      It connects directly to neurophysiological data: successor-like representations in human V1 and abstract sequence representations in macaque PFC, which provides qualitative correspondence between JEPA components and cortical activity that the existing JEPA literature, focused on ML benchmarks, does not address.

      Finally, our article already includes a discussion paragraph on recent self-supervised learning models in the context of the brain where we discuss work by Nejad et al. (2025) and Asabuki et al. (2025). Most other SSL theories of the visual system rely on static images and recognition tasks (Yerxa et al., 2024; Margalit et al., 2024). However, there are two studies that include temporal prediction objectives and are worth mentioning in more detail. First, Bakhtiari et al. (2021) show that representations similar to ventral and dorsal pathways in the visual system can emerge in a two-pathway encoder architecture within the CPC model. Second, Niu et al. (2024) use a “straightening” objective together with VCReg as a practical model of the perceptual straightening hypothesis (Hénaff et al., 2019). Though not a JEPA (i.e., it has no predictor network), it can decode equivariant factors in a sequential MNIST dataset where only single factors change throughout a video.

      What we will do: We will carefully review our discussion of previous work and further discuss Drozdov et al. (2024), Bakhtiari et al. (2021), and Niu et al. (2024) in the revised manuscript.

      (3.3) A significant challenge in latent-space SSL is avoiding “representational collapse” (where the model provides a trivial constant output). While the paper alludes to JEPA-like solutions, it lacks a detailed explanation (in both the text and the architectural schematics) of the specific technique used to prevent collapse. Consequently, it is difficult to evaluate the authors’ claim of “biological plausibility,” as the biological equivalents of common machine learning techniques (such as stop gradient) are not discussed.

      Thanks for pointing this out. Our model avoids collapse through the asymmetric stop-grad / predictor architecture. It does not require an EMA when the predictor learns with a faster learning rate than the rest of the network (see also our response to Point P1.3).
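
      The asymmetry can be sketched in a deliberately simplified toy setting. The example below assumes a linear encoder W and a linear predictor P, omits the recurrent integrator entirely, and uses hand-derived gradients; the function name `stopgrad_step` and the learning-rate values are hypothetical and do not reflect the settings used in the manuscript.

```python
import numpy as np


def stopgrad_step(W, P, x_t, x_next, lr_enc=0.01, lr_pred=0.1):
    """One gradient step on ||P W x_t - stopgrad(W x_next)||^2 (toy linear model)."""
    z_t = W @ x_t           # representation of the current input
    target = W @ x_next     # target representation, treated as a constant (stop-grad)
    err = P @ z_t - target  # prediction error in representation space
    loss = float(err @ err)
    grad_P = 2.0 * np.outer(err, z_t)        # gradient with respect to the predictor
    grad_W = 2.0 * np.outer(P.T @ err, x_t)  # encoder gradient via the prediction branch only
    # The predictor uses a larger learning rate than the encoder.
    return W - lr_enc * grad_W, P - lr_pred * grad_P, loss


rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))  # hypothetical encoder weights
P = np.eye(4)                # hypothetical predictor weights
W_new, P_new, loss = stopgrad_step(W, P, rng.normal(size=3), rng.normal(size=3))
```

      Because the target branch contributes no gradient, the encoder is updated only through its role in producing the prediction, and the separate learning rates make the predictor adapt faster than the encoder.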

      The use of stop-grad suggests that a circuit learning with RPL needs to compute a vector-based instructive learning signal. While we do not explicitly model the circuit-level mechanisms of how this could be implemented in the brain, excitation-inhibition balance is one possibility (Rossbroich et al., 2025). Finally, differences in learning rate can be implemented either structurally or functionally in the brain (see, for instance, Liu et al., 2025), and activity normalization has been suggested as a canonical computation in biological neural circuits (Carandini et al., 2012).

      What we will do: We will make sure to discuss these putative biological mechanisms in the revised manuscript.

      (3.4) Recent work has shown that the capacity (size) of the predictor significantly influences the learned representations in a JEPA-type world model (Garrido et al., 2024). In simpler scenarios, a large enough predictor can allow a model to “memorize” dynamics rather than learning generalized equivariant features. It would be beneficial to see how the ratio of predictor size to encoder size affects the emergence of these features.

      Thanks for raising this concern. We do not observe a noticeable difference in position and velocity decoding when changing the width or depth of the MLP predictor on the moving animals data. However, performance on rotation speed and orientation decoding scales with the width, but not the depth, of the predictor. This analysis excludes the effect of the integrator’s capacity, as it directly affects the dimensionality of the representations, even though it also effectively contributes to the prediction computation in RPL.

      What we will do: We will include a figure showing how task performance varies with the predictor’s width and depth.

      Methodological Clarifications

      (3.5) The authors mention a contrastive learning comparison but provide few details. Since contrastive learning is primarily a technique to avoid collapse, it would be a more rigorous baseline if implemented within the same architecture as RPL to isolate the effect of the predictive objective.

      Thanks for the question. We already use the same network model as in RPL for the contrastive predictive learning (InfoNCE) baseline in Supplementary Table S1, as mentioned in the main text (l.164).

      What we will do: We will describe the architecture of the non-linear predictor used for the InfoNCE baseline more explicitly in Methods.

      (3.6) In the PFC data comparison (Figure 7f), there appears to be a discrepancy where the local and global conditions show nearly identical results in PFC, but different dynamics in the model. It is unclear if this is a visualization error or a genuine model deviation.

      Thanks for picking up on this subtlety in the experimental results. To clarify, it is a model deviation but an interesting one. The local and global responses do look quite similar in the original PFC data. They differ in that the global oddball (xY|xx and xx|xY) response has a secondary peak that encodes the presence of the global oddball, whereas the initial response is actually dominated by local oddball encoding (xY vs xx). Concretely, this results in the response to the xx|xY condition only showing up weakly in the data and at a time lag with respect to the initial local oddball response. Our model, however, does not show the transient initial response to local oddballs in the decoding direction for global oddballs. In a sense, the network model encodes the global oddball concept more robustly than is seen in the PFC data. That said, whether this indicates a genuine difference in representational strategies that needs to be further accounted for, or whether it is an issue stemming from limited sub-sampling of PFC neurons, remains unclear.

      (3.7) The criteria for selecting specific model variables for comparison with V1 versus PFC are not explicitly defined. Clarification is needed on whether the same latent variables were used for both brain regions or if different layers were selected.

      To clarify, the successor-like representations in human V1 and abstract representations in macaque PFC come from two different experiments, so each involves different latent variables and requires a different RPL model. The architecture used for each experiment is detailed in Methods; in each case, we selected the simplest architecture expected to work given the task complexity. Throughout the paper, all representation analysis is done on the output of the integrator (c) unless stated otherwise. We hope this resolves the confusion.

      References

      Chen, Xinlei et al. (2021). “Exploring simple siamese representation learning”. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 15750–15758.

      Srinath Halvagal, Manu et al. (2023). “Implicit variance regularization in non-contrastive SSL”. In: Advances in Neural Information Processing Systems 36, pp. 63409–63436.

      Mikulasch, Fabian A et al. (2026). Understanding Self-Supervised Learning via Latent Distribution Matching. arXiv: 2605.03517[cs.LG].

      Furutachi, Shohei, Alexis D. Franklin, et al. (Sept. 2024). “Cooperative thalamocortical circuit mechanism for sensory prediction errors”. In: Nature 633.8029, pp. 398–406. issn: 1476-4687. doi: 10.1038/s41586-024-07851-w.

      Furutachi, Shohei and Sonja B Hofer (2026). “Rethinking Predictive Processing”. In: Annual Review of Neuroscience 49.

      Vasilevskaya, Anna et al. (2026). “A functional influence based circuit motif that constrains the set of plausible algorithms of cortical function”. In: bioRxiv. doi: 10.64898/2026.01.29.702557. eprint: https://www.biorxiv.org/content/early/2026/01/29/2026.01.29.702557.full.pdf.

      Nejad, Kevin Kermani et al. (July 2025). “Self-supervised predictive learning accounts for cortical layer-specificity”. In: Nat Commun 16.1, p. 6178. issn: 2041-1723. doi: 10.1038/s41467-025-61399-5.

      Ekman, Matthias et al. (Feb. 2023). “Successor-like representation guides the prediction of future events in human visual cortex and hippocampus”. In: eLife 12. Ed. by Morgan Barense et al., e78904. issn: 2050-084X. doi: 10.7554/eLife.78904.

      Dayan, Peter (1993). “Improving generalization for temporal difference learning: The successor representation”. In: Neural computation 5.4, pp. 613–624.

      Nøkland, Arild (2016). “Direct feedback alignment provides learning in deep neural networks”. In: Advances in neural information processing systems 29.

      Lillicrap, Timothy P et al. (2016). “Random synaptic feedback weights support error backpropagation for deep learning”. In: Nature communications 7.1, p. 13276.

      Zenke, Friedemann et al. (2018). “Superspike: Supervised learning in multilayer spiking neural networks”. In: Neural computation 30.6, pp. 1514–1541.

      Bellec, Guillaume et al. (2020). “A solution to the learning dilemma for recurrent networks of spiking neurons”. In: Nature communications 11.1, p. 3625.

      Illing, Bernd et al. (2021). “Local plasticity rules can learn deep representations using self-supervised contrastive predictions”. In: Advances in Neural Information Processing Systems 34.

      Zihan, Wu S et al. (2026). “Can Local Learning Match Self-Supervised Backpropagation?” In: arXiv preprint arXiv:2601.21683.

      Srivastava, Nitish et al. (2015). “Unsupervised learning of video representations using lstms”. In: International conference on machine learning. PMLR, pp. 843–852.

      Ghaemi, Hafez et al. (2024). “Seq-JEPA: Autoregressive Predictive Learning of Invariant-Equivariant World Models”. In: NeurIPS 2024 Workshop: Self-Supervised Learning - Theory and Practice.

      Garrido, Quentin et al. (2023). “Self-supervised learning of split invariant equivariant representations”. In: arXiv preprint arXiv:2302.10283.

      Hauri, Michael et al. (2026). “Dreamer-CDP: Improving Reconstruction-free World Models Via Continuous Deterministic Representation Prediction”. In: arXiv preprint arXiv:2603.07083.

      Oord, Aaron van den et al. (July 2018). “Representation Learning with Contrastive Predictive Coding”. In: arXiv:1807.03748 [cs, stat]. arXiv: 1807.03748.

      Bardes, Adrien et al. (2024). V-JEPA: Latent Video Prediction for Visual Representation Learning.

      Assran, Mido et al. (2025). “V-jepa 2: Self-supervised video models enable understanding, prediction and planning”. In: arXiv preprint arXiv:2506.09985.

      Drozdov, Katrina et al. (2024). “Video representation learning with joint-embedding predictive architectures”. In: arXiv preprint arXiv:2412.10925.

      LeCun, Yann (2022). “A Path Towards Autonomous Machine Intelligence Version 0.9.2, 2022-06-27”.

      Asabuki, Toshitake et al. (2025). “Learning predictive signals within a local recurrent circuit”. In: Proceedings of the National Academy of Sciences 122.27, e2414674122. doi: 10.1073/pnas.2414674122. eprint: https://www.pnas.org/doi/pdf/10.1073/pnas.2414674122.

      Yerxa, Thomas et al. (2024). “Contrastive-equivariant self-supervised learning improves alignment with primate visual area it”. In: Advances in neural information processing systems 37, pp. 96045–96070.

      Margalit, Eshed et al. (2024). “A unifying framework for functional organization in early and higher ventral visual cortex”. In: Neuron 112.14, pp. 2435–2451.

      Bakhtiari, Shahab et al. (2021). “The functional specialization of visual cortex emerges from training parallel pathways with self-supervised predictive learning”. In: Advances in Neural Information Processing Systems. Ed. by M. Ranzato et al. Vol. 34. Curran Associates, Inc., pp. 25164–25178.

      Niu, Julie Xueyan et al. (2024). “Learning predictable and robust neural representations by straightening image sequences”. In: Advances in Neural Information Processing Systems 37, pp. 40316–40335.

      Hénaff, Olivier J et al. (2019). “Perceptual straightening of natural videos”. In: Nature neuroscience 22.6, pp. 984–991.

      Rossbroich, Julian et al. (2025). “Breaking Balance: Encoding local error signals in perturbations of excitation-inhibition balance”. In: bioRxiv, pp. 2025–05.

      Liu, Peng et al. (2025). “Layer-specific changes in sensory cortex across the lifespan in mice and humans”. In: Nature neuroscience 28.9, pp. 1978–1989.

      Carandini, Matteo et al. (2012). “Normalization as a canonical neural computation”. In: Nature reviews neuroscience 13.1, pp. 51–62.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors combine discriminative auditory fear conditioning with longitudinal in vivo calcium imaging to ask how prelimbic (PL) representations of learned and generalized threat evolve across recent and remote memory time points. Using two different CS+ frequencies and a no-shock control group, they report that PL population activity tracks graded behavioral generalization, that population similarity is highest for tones eliciting strong threat responding, and that distinct subnetworks can be identified that appear to encode tone-specific sensory features versus learned threat-related response structure.

      To my knowledge, this may be the first study to comprehensively examine neural encoding of fear generalization in prelimbic cortex (PL). The manuscript is ambitious and technically interesting, and several aspects are potentially important. In particular, the suggestion that neurons showing graded, learning-related response patterns become selectively stabilized over time is intriguing. The inclusion of two CS+ training conditions and a no-shock control also strengthens the case that at least some of the reported effects are related to associative learning rather than simple sensory differences. However, in its current form, the manuscript does not yet fully support the strength of the conceptual claims. Several issues limit confidence in the interpretation, including the possibility that repeated testing itself contributes to changes across days, uncertainty about the relationship between neural activity and freezing behavior, limited quantitative documentation of longitudinal cell registration, and a number of problems in figure clarity and statistical framing. Overall, the study contains promising observations, but the claims should be narrowed, and several analyses or controls would be needed to fully support the proposed framework.

      Detailed Comments

      (1) A general concern is that the repeated test procedure itself may contribute to extinction. Because the animals are exposed to multiple CS frequencies across multiple test days, and each tone is presented three times per session, some of the reported changes in behavior and neural activity across days could reflect extinction or repeated nonreinforced retrieval rather than the passage of time per se. This is especially relevant given that the manuscript makes claims about recent versus remote representations and representational drift over 30 days. At a minimum, the authors should discuss this limitation explicitly and temper claims about time-dependent changes. Ideally, they would include a control group in which animals are tested only once or twice (e.g., at an early and later time point with fewer CS frequencies), or a reduced-frequency testing design that minimizes extinction while still allowing evaluation of recent versus remote memory.

      We agree with the reviewer that repeated testing is an inherent limitation of longitudinal memory studies and may itself contribute to some neural changes across sessions. However, several aspects of our behavioral design and results argue against extinction or repeated nonreinforced retrieval as the primary drivers of the observed effects. Importantly, discrimination ratios remained stable or increased across time rather than progressively diminishing as would be expected under extinction (this new analysis will be added to the resubmission). Nevertheless, we will address this important point in the Discussion and explicitly acknowledge that repeated retrieval may contribute to some component of the observed representational changes.
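
      As an illustration of the discrimination-ratio argument, the sketch below tracks a ratio across test days. The formula (CS+ freezing minus comparison-tone freezing, divided by their sum), the function name `discrimination_ratio`, and all freezing values are assumptions made for illustration; the response does not specify how the ratio is computed.

```python
import numpy as np

def discrimination_ratio(freeze_cs_plus, freeze_other):
    """Hypothetical index: (CS+ - other) / (CS+ + other).

    0 means equal freezing to both tones; 1 means freezing to the CS+ only.
    """
    cs = np.asarray(freeze_cs_plus, dtype=float)
    other = np.asarray(freeze_other, dtype=float)
    total = cs + other
    # The ratio is undefined when there is no freezing to either tone: return NaN
    return np.divide(cs - other, total,
                     out=np.full_like(total, np.nan), where=total > 0)

# Invented percent-freezing values for one animal on three test days
cs_plus = [60.0, 60.0, 60.0]
cs_minus = [20.0, 15.0, 12.0]
ratios = discrimination_ratio(cs_plus, cs_minus)
print(ratios)
```

      With this index, a ratio that shrinks toward 0 across days would indicate loss of discrimination, whereas a stable or rising ratio corresponds to the pattern described in the response.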

      (2) More generally, some of the reported learning-related neural differences may be driven by behavioral differences, particularly freezing, rather than by learning or generalization per se. For example, animals that freeze more to certain frequencies may show corresponding neural response differences simply because freezing alters PL activity. The authors should examine this possibility more directly. Analyses testing whether recorded cells encode freezing behavior, or whether tone frequency-related neural differences remain robust when comparing high- and low-freezing epochs, would help determine whether the reported effects reflect learned stimulus value rather than behavioral state differences.

      We thank the reviewer for raising this important point, which was also noted by the other reviewers. To address this issue, we will implement Reviewer 3’s suggested Generalized Linear Model (GLM) analysis using inferred spiking activity derived from the Ca2+ signals, with both tone identity and freezing behavior included as predictors. Because freezing behavior varies across trials whereas stimulus identity is fixed, this approach will allow us to dissociate their respective contributions to neuronal activity. If, after accounting for freezing behavior, responsive neurons continue to exhibit graded coding consistent with inferred threat value, this would strengthen the interpretation that the identified ensembles reflect generalization gradients related to aversive value rather than freezing behavior alone. Otherwise, we will adjust the conclusions according to the interpretation that freezing itself drives the generalization gradients.
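
      The logic of the proposed GLM can be sketched on simulated data. This is a minimal illustration, not the authors' analysis: the Poisson log-link model, the Newton solver `fit_poisson_glm`, the coding of tone as a single ordinal "threat rank" regressor, and every simulated value are assumptions. The simulated neuron is driven by tone only, and freezing is correlated with tone but varies from trial to trial.

```python
import numpy as np

def fit_poisson_glm(X, y, n_iter=50, ridge=1e-8):
    """Fit a log-link Poisson regression by Newton's method.

    Returns coefficients with the intercept first.
    """
    X = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean() + 1e-9)  # start from the mean rate
    for _ in range(n_iter):
        mu = np.exp(np.clip(X @ beta, -20, 20))  # predicted counts
        grad = X.T @ (y - mu)                     # gradient of the log-likelihood
        hess = X.T @ (X * mu[:, None]) + ridge * np.eye(X.shape[1])
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta

# Simulated trials (all values invented for illustration)
rng = np.random.default_rng(0)
n_trials = 4000
threat = rng.choice([0.0, 1 / 3, 2 / 3, 1.0], size=n_trials)  # assumed threat rank per tone
# Freezing fraction: correlated with threat, but variable across trials
freezing = np.clip(0.6 * threat + rng.normal(0, 0.25, n_trials), 0, 1)
true_beta = np.array([0.5, 1.0, 0.0])  # intercept, tone, freezing
rate = np.exp(true_beta[0] + true_beta[1] * threat + true_beta[2] * freezing)
spikes = rng.poisson(rate)

beta_hat = fit_poisson_glm(np.column_stack([threat, freezing]), spikes)
print(dict(zip(["intercept", "tone", "freezing"], beta_hat.round(3))))
```

      Because freezing varies across trials while tone identity is fixed, the two regressors are not collinear and the fit can attribute the graded response to the tone term. If freezing were the true driver, the weight would shift to the freezing coefficient instead.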

      (3) A central feature of the manuscript is the analysis of neural response properties over an extended period of time, up to 30 days after learning. However, aside from a brief mention in the Methods that spatial registration was used, the manuscript provides very little quantitative information about this critical aspect of the study. The paper would be strengthened by including explicit metrics describing longitudinal cell tracking, such as the number and proportion of ROIs retained across all sessions, distributions of spatial-footprint correlations or centroid distances across days, and representative examples of matched imaging fields over time. Without this information, it is difficult to assess how strongly the longitudinal claims are supported.

      We thank the reviewer for this suggestion. We will include measures of registration quality in the resubmission.
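
      Two of the registration-quality measures requested by the reviewer, centroid distance and spatial-footprint correlation, can be sketched as follows. The function name `registration_metrics`, the array layout, and the toy footprints are assumptions for illustration; the sketch presumes that ROI pairs have already been matched across sessions by whatever registration tool was used.

```python
import numpy as np

def registration_metrics(footprints_a, footprints_b):
    """Per-cell centroid distance (pixels) and footprint Pearson correlation.

    Both inputs have shape (n_cells, height, width); row i of each array is
    assumed to be the same cell imaged in two different sessions.
    """
    a = np.asarray(footprints_a, dtype=float)
    b = np.asarray(footprints_b, dtype=float)
    n, h, w = a.shape
    yy, xx = np.mgrid[0:h, 0:w]

    def centroids(f):
        mass = f.sum(axis=(1, 2))
        cy = (f * yy).sum(axis=(1, 2)) / mass
        cx = (f * xx).sum(axis=(1, 2)) / mass
        return np.column_stack([cy, cx])

    dist = np.linalg.norm(centroids(a) - centroids(b), axis=1)

    # Pearson correlation between flattened, mean-centred footprints
    fa = a.reshape(n, -1)
    fb = b.reshape(n, -1)
    fa = fa - fa.mean(axis=1, keepdims=True)
    fb = fb - fb.mean(axis=1, keepdims=True)
    corr = (fa * fb).sum(axis=1) / (
        np.linalg.norm(fa, axis=1) * np.linalg.norm(fb, axis=1))
    return dist, corr

# Toy example: one 3x3 footprint shifted by one pixel between sessions
day_a = np.zeros((1, 10, 10))
day_a[0, 3:6, 3:6] = 1.0
day_b = np.zeros((1, 10, 10))
day_b[0, 3:6, 4:7] = 1.0
dist, corr = registration_metrics(day_a, day_b)
print(dist, corr)
```

      Reporting the distributions of these two quantities for matched pairs, alongside the number of ROIs retained across all sessions, is one way to document tracking quality.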

      (4) The text states that "Figs. 1c and 1d show GCaMP6f expression in PL, representative calcium footprints, and activity traces". However, the figure as presented does not clearly show all of these elements, at least not in a way that matches the description in the Results. The correspondence between text and figure should be corrected.

      We will correct the correspondence between the text and the figure.

      (5) The labeling of Figure 2a is insufficient for interpretation. The legend states that the panel shows raster plots of sound responsiveness, but the axes and scaling are not clearly defined. It is not clear from the figure what the x-axis represents, whether the y-axis corresponds to individual neurons, where the CS period occurs, or what the activity scale at the right denotes. Also, the term 'rasters' implies that spikes were analyzed. It seems that the spike inference approach (CASCADE) was only used for later analyses. Perhaps 'heat-plot' would be more accurate here? Generally, this figure should be annotated more clearly so that the reader can understand it without referring back to the Methods.

      Thank you for this suggestion. We will clarify the labelling of Figure 2a and call the graphs “activity plots”.

      (6) In relation to Figure 3, the analysis of population-averaged responses across tone frequencies is useful, but the manuscript would be stronger with additional statistical analyses across time and across groups. For example, if the authors want to argue that learning induces graded changes in neural responses and that these evolve across time, they should directly compare within-group responses across days and also compare matched frequencies between the conditioned groups and the no-shock controls. These analyses would help establish whether the observed differences are genuinely learning dependent and whether they change significantly over time.

      We will redo the statistics for Figure 3 to take into account the following variables: group (CS15, CS3, no-shock), frequency (3, 7, 11, 15 kHz), and day of testing (2, 15, 30).

      (7) The inclusion of two different CS+ frequencies and a no-shock control is a strength of the study and substantially improves the interpretation that graded neural responses are related to learning and generalization rather than to simple sensory processing or passage of time. That said, I am not entirely comfortable with the use of the term "inference" throughout the manuscript. What is being measured here appears closer to sensory generalization than inference in a stronger cognitive sense. The current task does not clearly require that animals infer hidden structure or stimulus value through abstract reasoning; rather, the generalized stimulus may simply be treated as similar to the conditioned cue. The terminology should therefore be reconsidered or softened.

      We thank the reviewer for appreciating the strengths of the experimental design and for this thoughtful suggestion regarding terminology. We agree that the term “inference” may overstate the cognitive processes engaged by the current task. Accordingly, we will revise the terminology throughout the manuscript to describe these effects as graded generalization of threat value across stimuli.

      (8) I also found the use of the term "valence" somewhat problematic. The manuscript appears to use valence to refer to graded responding across tones with different aversive significance, but valence typically refers more broadly to distinctions between appetitive and aversive value. Here, terms such as "threat value" or "aversive value" may be more precise. The authors should consider revising this language throughout.

      We will correct the language and use “threat value”.

      Reviewer #2 (Public review):

      Summary:

      The following points are those that occurred to me across readings of the paper. They are listed in what I take to be the order of their significance. Many of the points relate to the loose use of language and invocation of concepts that are not warranted, given the study design and results obtained.

      Major Comments:

      (1) The concept of ensemble turnover is interesting - the way it is introduced and discussed implies some type of spontaneous change in the neural underpinnings of fear discrimination and generalization in the PL. But, of course, every trial involves an opportunity to learn about the threat CS or the generalization test stimuli, and I am troubled by the thought that stability in the neural underpinnings of fear discrimination and generalization will actually reflect the level of defensive behaviours evoked on different trial types and/or the discrepancy between those behaviours and the outcome of a given trial in the generalization test. That is, stability in the neural underpinnings may be related to an animal's certainty or uncertainty in the contingency between a stimulus and danger; or, put another way, an animal's confidence that danger will or won't occur given the presence of some stimulus. This is not uninteresting. It is, however, not considered anywhere in the paper, which is overloaded with references to inferred threat values and integration of information across different types of stimuli. The protocol is not one that requires inference about anything or integration across anything.

      We thank the reviewer for these important points, which we address in further detail below.

      Ongoing learning during test sessions: The reviewer correctly notes that unreinforced test presentations may constitute extinction-learning trials and that some neural changes across days could therefore reflect ongoing learning rather than spontaneous ensemble reorganization. However, new analyses indicate that extinction is unlikely to be the primary driver of our findings. Discrimination ratios do not decay over time; instead, they either sharpen or remain stable across sessions (new analyses to be included in the resubmission). These results argue against robust extinction as the primary source of the neural changes observed across sessions. This interpretation is also consistent with the strength of our conditioning protocol, which used 10 CS+ shock pairings and 10 CS− no-shock pairings specifically to minimize extinction across repeated testing sessions. Nevertheless, we acknowledge that the current design cannot fully dissociate time-dependent consolidation from retrieval-induced plasticity, and we will explicitly discuss this limitation in the revised Discussion.

      Stability reflecting behavioral consistency: We agree this alternative cannot be fully excluded. However, the cluster stability analyses assess identity at the level of response profile across all four frequencies, not response magnitude alone. Tone-selective clusters, which also show consistent behavioral correlates (firing rate correlates with threat-value, Fig. S8), do not show equivalent profile stability, suggesting that the stability of graded clusters is not simply a consequence of behavioral consistency. This point will be added to the Discussion in the resubmission.

      Language of "inference" and "integration": The reviewer is correct that responses to novel tones are consistent with graded stimulus generalization. We will substantially revise the manuscript to replace "inference" and "integration" with more precise language describing graded frequency generalization gradients.

      (2) I appreciate the link to Gu and Johansen in paragraph 3 of the Introduction, but the type of generalization under investigation here is not the same as the type of 'generalization' studied by Gu and Johansen [who used a sensory preconditioning protocol]. Nonetheless, the authors have forced the language used by Gu and Johansen into their paper, and this has created tension [at least for this reader] as the concepts introduced by Gu and Johansen [inference, integration] are simply not relevant given the generalization protocol used here. Here are a few examples of points where the tension might interfere with a reader's understanding:

      We thank the reviewer for these specific and constructive criticisms. We will revise the manuscript throughout to remove or redefine terms like "inferred valence" and "integration," replacing them with clearer, more accurate descriptions of graded generalization of threat value. Below we address each point raised by the reviewer regarding terminology clarifications.

      (a) 'We hypothesized that generalization to novel stimuli depends on stable subnetwork organization that enables comparisons between learned and inferred valence, as well as population-level features that reduce variability across related representations.'

      I understand the words in the hypothesis, but can't form a representation of what is being said because of the reference to terms that stand in need of clarification [inferred valence, variability across related representations], but, ultimately, won't be clarified. This needs to be re-expressed so that the reader can appreciate what is being said.

      The hypothesis will be rewritten as: "We hypothesized that generalization to tones acoustically similar to the CS+ and CS− depends on the emergence of stable ensembles encoding threat value, and that population-level response similarity across stimuli would correlate with the degree of behavioral fear generalization, consistent with prior work in auditory cortex [1]."

      (b) 'Our results show that stable cortical subnetworks integrate the emotional "gist" of memory and inferred valence for novel cues over time, despite ongoing ensemble reorganization, and that population-level firing rate similarity across stimulus presentations determines threat generalization.'

      Again, what does this mean? How is the gist of a memory integrated with inferred valence for novel cues over time? The statement simply doesn't make sense. This needs to be rewritten for clarity.

      The summary statement will be rewritten: "Our results show that stable cortical sub-ensembles preserve the emotional content of the fear memory over time, despite ongoing ensemble reorganization, and that population-level firing rate similarity in response to tones associated with threat correlates with the degree of behavioral threat generalization."

      (c) 'In CS⁺15 mice, positively modulated sound-responsive neurons exhibited graded tone activity reflecting the contingency learned valence as well as the inferred valence of novel tones across testing days...'.

      Can this be rewritten as 'In CS⁺15 mice, positively modulated sound-responsive neurons exhibited graded activity to the tone CS and its variants that were used to assess generalization.'? The overloading of the text with references to 'contingency learned valence' and 'inferred valence' is unnecessary and makes it much harder to understand what has been shown in the results.

      We will adopt the reviewer's suggested rewording: "In CS+15 mice, positively modulated sound-responsive neurons exhibited graded activity to the tone CS and its variants that were used to assess generalization."

      We will systematically review the entire manuscript to ensure consistency with this revised framing.

      (3) Re the same passage of text as in 2c:

      Is it the case that these neurons are simply tracking the expression of freezing to the various tones? The same question applies to the results obtained for the CS+3 mice. If this is the case, then why should the results be taken to support the banner statement that 'Sound-modulated PL population responses encode learned and inferred valence' - these analyses do not support that statement. And, as indicated, I don't believe that the language of learned and inferred valence is appropriate to such statements, given the nature of the protocol used and results obtained. It is a study looking at how populations of neurons in the PL respond during presentations of auditory stimuli that were subject to discriminative conditioning, and during tests of generalized freezing to other [intermediate] auditory stimuli.

      The reviewer is correct that the graded population responses observed in PL could reflect freezing behavior across tone frequencies rather than encoding an abstract threat-value representation. This important concern was also raised by other reviewers. To address it directly, we will follow Reviewer 3’s suggestion and implement a Generalized Linear Model (GLM) using inferred spiking activity derived from the Ca2+ signals, with both tone identity and freezing behavior included as predictors. This analysis will allow us to dissociate the respective contributions of tone frequency and freezing to the graded neural responses. Based on the outcome of this analysis, we will revise and appropriately adjust our conclusions.

      In addition, we will revise the section heading and surrounding text to remove the terminology of “learned and inferred valence.” Instead, the findings will be described more conservatively as: “PL population responses reflect behavioral generalization to auditory stimuli following discriminative fear conditioning.”

      (4) It is stated that:

      'In no-shock controls, although both positive and negative responses were present, population activity was not modulated by tone frequency or valence'.

      What does this mean? I can understand that population activity was not modulated by tone frequency. But what does it mean to say that it was not modulated by valence? Why should it have been when none of the tones were conditioned in this group and, hence, mice were responding to all the tones equally? And given that this is true, I don't understand the use of 'valence' here, or the subsequent statements in this paragraph that 'graded responses require associative learning' and that 'PL population responses encode graded sound-valence associations that reflect both learning and inference, closely matching behavioral generalization.' The latter statement is particularly unwarranted and, again, highlights a major issue with the paper. It could and should be rewritten as 'PL population responses reflect behavioral generalization.' There is nothing in the additional language that adds to the reader's understanding of what has been shown. The reference to 'graded sound-valence associations that reflect both learning and inference' is completely unwarranted, given the nature of this study. It is anathema to the vast literature on stimulus generalization. If the authors wished to make statements of this sort, they should have taken a different approach, perhaps using protocols like those featured in Gu and Johansen.

      The reviewer is correct that controls do not form threat associations; however, these animals could still have responded differentially to distinct frequencies, and the data show that they did not. We will revise the section to state that distinct neutral frequencies do not produce graded responses. The point that "graded responses require associative learning" will be retained but reframed simply as: "graded frequency-dependent population responses were absent in animals that did not receive fear conditioning." The concluding statement of the paragraph will be rewritten as: "PL population responses reflect behavioral generalization to acoustically similar stimuli following discriminative conditioning," in line with the reviewer's suggestion.

      (5) The section titled, 'Consistently active neurons preserve valence representations as newly recruited neurons sharpen remote memory traces' ends with the following summary:

      'Together, these results indicate that consistently active neurons maintain stable representations of learned and inferred sound associations across time, whereas neurons recruited after conditioning progressively acquire graded tuning at later retrieval stages. This dynamic refinement suggests that cortical memory representations become increasingly selective during systems consolidation, while a stable neuronal subpopulation preserves the core emotional content of the memory.'

      Once again, the summary is not in keeping with the results obtained. The 'dynamic refinement' of representations is far more likely to reflect the repeated testing across days 1, 15, and 30 rather than anything to do with systems consolidation - at the very least, it is the simplest interpretation of the results. The impact of repeated testing is evident in the sharpening of generalization gradients over time, which is contrary to what is otherwise observed in the literature - the incredibly well-documented broadening of generalization gradients with time. Given this impact of repeated testing, surely the changes in the neuronal population that underlie performance are more likely to reflect the learning that occurs on days 1, 15, and 30, which is reflected in reduced freezing to the non-conditioned tones. If this is a reasonable take on the results, then I don't see the basis for invoking systems consolidation at all, and I don't see the basis for inferring a stable neuronal subpopulation that preserves the emotional content of the memory. Rather, non-reinforced presentations of 'never-reinforced' tones result in recruitment of additional neurons that result in suppression of freezing responses to those stimuli.

      We respectfully disagree with the reviewer’s interpretation. While repeated testing cannot be entirely excluded as a contributing factor, several lines of evidence suggest that it cannot fully account for our observations.

      Regarding extinction: discrimination ratios between CS+ and all other frequencies either remained stable or increased over time (new analysis included in resubmission), indicating that animals continued to discriminate threat value across the testing period. Extinction would instead produce progressive suppression, the opposite of what we observe.

      Regarding the recruitment of new neurons: repeated non-reinforced tone exposure would be expected to produce stimulus-specific adaptation, characterized by reduced, less discriminative neural responsiveness and flatter tuning profiles [2], not the progressive sharpening we observe. The same would be expected if these neurons represent or are associated with new extinction learning.

      Finally, sharpening of generalization gradients during repeated within-subjects testing has been reported previously [3], suggesting that successive exposures may promote more precise discrimination in some cases. Consistent with this, discrimination learning has also been shown to narrow or sharpen fear generalization gradients rather than broaden them [4], supporting the idea that discriminative conditioning enhances stimulus specificity during testing. Although we cannot exclude the possibility that more extended training could eventually broaden the generalization gradient, under the training parameters and temporal window used in our study, the data support a progressive sharpening of the gradient over time. In the revised Discussion, we will present systems consolidation as the primary interpretive framework and further elaborate on why repeated testing is unlikely to account for the full pattern of behavioral and neural findings reported here.

      (6) In the section titled, 'Population vector similarity at stimulus onset determines degree of generalization', it is stated that:

      'Because population similarity peaked shortly after stimulus onset, we quantified similarity during the first 5 s after tone onset relative to the CS⁺. In CS⁺15 mice, population similarity was highest for 15/15 and 15/11 tone pairs with no differences between them.'

      Isn't this consistent with the view that the population response in the PL simply reflects the level of freezing? Freezing to the 15-15 and 15-11 tones is most likely to be similar on their first presentation prior to the effects of extinction on the 11 Hz tone; hence the results obtained. That is, these results appear to clearly indicate that neuronal responses in the PL reflect the degree of stimulus generalization, as evidenced in freezing behavior. Given all that we know about the involvement of the PL in expressing fear responses, it is not appropriate to claim that 'population vector similarity at stimulus onset *determines* the degree of generalization'. The PL responses simply reflect the varying levels of performance displayed to the different types of tones. What have I missed that could be taken to support additional statements?

      The GLM analysis described in our responses to Reviewers 1 and 3 will directly address the contribution of freezing. We will report these results in the resubmission and revise the interpretive language in the manuscript accordingly.

      However, regarding the analysis of population vector similarity, we need to clarify a point of confusion. The reviewer states “Freezing to the 15-15 and 15-11 tones is most likely to be similar on their first presentation prior to the effects of extinction on the 11 Hz tone; hence the results obtained”. The similarity vectors were calculated by correlating activity across all tone presentations within each testing day, not only the first two presentations. In Fig. 4, “Early” and “Late” refer to the order of a tone within a trial, which we will clarify more explicitly in the resubmission. Notably, repeated-measures analyses did not reveal any effect of the time variable (Fig. 4e,f), indicating that similarity across tone presentations remained high for tones associated with high threat value. Importantly, our data showed no evidence that responses to 11 kHz or 15 kHz in the CS15 group, or to 3 kHz in the CS3 group, exhibited extinction-like patterns at either the behavioral or neural level. Therefore, the persistence of high population similarity across time provides additional evidence against extinction as the primary explanation for our findings.
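
      The computation described here, correlating population activity across all presentations of two tones, can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `population_similarity`, the use of Pearson correlation averaged over presentation pairs, the three presentations per tone, and the simulated responses are all hypothetical, and each row is taken to hold per-neuron mean activity in the analysis window after tone onset.

```python
import numpy as np

def population_similarity(resp_a, resp_b, same_tone=False):
    """Mean Pearson correlation between population vectors of two tones.

    resp_a, resp_b: arrays of shape (n_presentations, n_neurons).
    Set same_tone=True when both arguments are the same tone, so that the
    trivial correlation of each presentation with itself is excluded.
    """
    a = np.asarray(resp_a, dtype=float)
    b = np.asarray(resp_b, dtype=float)
    n_a = a.shape[0]
    # np.corrcoef treats rows as variables; the top-right block is a versus b
    cross = np.corrcoef(a, b)[:n_a, n_a:]
    if same_tone:
        off_diagonal = ~np.eye(n_a, dtype=bool)
        return cross[off_diagonal].mean()
    return cross.mean()

# Simulated responses (invented): the 15 and 11 kHz tones are assumed to
# share one population pattern, the 3 kHz tone an unrelated one
rng = np.random.default_rng(1)
n_neurons = 200
pattern_high = rng.normal(size=n_neurons)
pattern_low = rng.normal(size=n_neurons)

def presentations(pattern, n=3, noise=0.5):
    return pattern + rng.normal(scale=noise, size=(n, n_neurons))

tone_15 = presentations(pattern_high)
tone_11 = presentations(pattern_high)
tone_3 = presentations(pattern_low)

sim_15_15 = population_similarity(tone_15, tone_15, same_tone=True)
sim_15_11 = population_similarity(tone_15, tone_11)
sim_15_3 = population_similarity(tone_15, tone_3)
print(sim_15_15, sim_15_11, sim_15_3)
```

      Averaging over all presentation pairs, rather than using only the first presentation of each tone, means the measure is not restricted to responses obtained before any within-session change could occur.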

      We will remove the word "determines" from the manuscript, as our data cannot conclusively establish a causal relationship.

      Later in the same section, it is stated that 'population-level similarity at stimulus onset scales with behavioral threat generalization and is maximal for tones associated with robust threat responses.' For simplicity and, therefore, clarity, this should be rewritten as 'population-level similarity at stimulus onset reflects behavioral threat generalization.'

      We will make this correction.

      (7) In the section titled, 'Different subnetworks encode acoustic versus learned properties of sound association', it is stated that:

      'Our previous analyses show that learned and inferred associations are represented at the population level. However, these results do not resolve whether graded responses arise from pooled activity of frequency-selective neurons or from subnetworks encoding integrated learned valence across tones.'

      What does it mean to say 'integrated learned valence across tones'? As it presently stands, the meaning of the phrase is unclear. It only makes sense if one supposes that generalized freezing responses to the 11 and 7 kHz tones reflect separate associations between those tones and the aversive foot shock US. This supposition is inconsistent with the rich literature on generalization of Pavlovian conditioned fear responses. Specifically, it is inconsistent with the many theories of fear generalization, which attribute the reduction in fear as one moves away from the specific conditioned stimulus to a decrement in the ability of the test stimulus to activate the trained CS-US association. My strong impression is that the authors would do well to ground their findings in theories of stimulus/fear generalization, of which there are many. This would better serve the results obtained [and the reader's appreciation of them] - at present, the unnecessary invocation of concepts does very little to enhance the reader's appreciation or understanding of what has been found in the study.

      We thank the reviewer for raising this point. The phrase "integrated learned valence across tones" refers specifically to a subpopulation of neurons that respond to all four frequencies in a graded manner, with response magnitude scaling according to threat value. This is distinct from tone-selective neurons, which respond preferentially to a single frequency. The neurons responding to all tones in a graded manner are present only in conditioned animals and not in no-shock controls, demonstrating that their graded response profile is shaped by associative learning.

      We agree, however, that the phrase "integrated learned valence" is unnecessarily opaque and we will replace it with more precise language: these neurons will be described as showing graded frequency-dependent responses whose magnitude scales with threat value. We believe this subpopulation represents a genuinely novel finding that complements the behavioral generalization literature by identifying a specific neural substrate for the generalization gradient within PL.

      (8) Another example of what has been a common theme in this review:

      '...we hypothesized that the PL active ensemble segregates into functionally distinct subnetworks: one encoding tone-specific sensory features with dynamic characteristics, and another responding to all frequencies encoding stable core memory content and inferred emotional valence.'

      What does it mean to say 'all frequencies encoding stable core memory content and inferred emotional valence'? Do the authors mean to say '...and another that tracks freezing/defensive responses regardless of whether they were elicited by the trained CS or one of the generalization test stimuli'?

      As stated in our previous responses, in the resubmission we will determine the contribution of freezing. If we find that freezing predicts graded neural responses, we will adjust the language of the manuscript.

      (9) It is stated that - 'Graded clusters encode emotional valence but constitute only a fraction of the active population; yet valence coding at the population level remains accurate and precise. This indicates that neurons newly recruited into the population - likely frequency-selective and organized within learning-independent clusters - can be shaped by associative processes through modulation of firing activity.'

      What does this mean? Are the authors trying to say that - 'Some clusters of PL neurons track freezing responses. In spite of the fact that these are only a fraction of the total active neuronal population, the population-level response of PL neurons also tracks the levels of fear to the trained tone and its variants used in the test for generalization.' If this is what one wants to say, then the final statement in the reproduced section does not follow. That is, there is no indication that 'neurons newly recruited into the population - likely frequency-selective and organized within learning-independent clusters - can be shaped by associative processes through modulation of firing activity.' As noted, the characteristics of other ensembles that become active across the repeated tests on days 1, 15, and 30 are more likely to reflect learning from non-reinforcement that occurs within and across those sessions. Perhaps this is what is meant by the phrase, 'shaped by associative processes'? If so, it should be stated explicitly instead of left to the reader to work out.

      We thank the reviewer for highlighting the lack of clarity in this passage and agree that the original phrasing was insufficiently precise. What we intended to convey is that only a subset of PL neurons displays graded tuning that tracks behavioral generalization across tones. Nevertheless, despite constituting only a fraction of the total active population, this graded coding is also reflected at the population level. Therefore, we suggest that neurons recruited into the active population after conditioning — likely frequency-selective neurons — contribute to the graded population responses through changes in their firing-rate activity, which is modulated by threat value (Fig. S8). We will rewrite this passage in the resubmission to make this interpretation explicit rather than leaving it to the reader to infer.

      Regarding the reviewer's suggestion that the characteristics of newly recruited neurons more likely reflect learning from non-reinforced exposures during repeated test sessions, we respectfully maintain that this interpretation is difficult to reconcile with two aspects of our data. First, graded-response neurons are absent in no-shock controls that are exposed to nonreinforced repeated testing. Second, as detailed in our responses to previous points, the progressive sharpening of population responses over time is inconsistent with what would be expected from repeated non-reinforced exposure, which would more plausibly produce broader or flatter tuning profiles.

      We agree that the phrase "shaped by associative processes" was ambiguous and will replace it with explicit language clarifying that we refer to fear conditioning as the associative process driving the emergence of graded responses, rather than any learning occurring during the test sessions themselves.

      (10) The following points all relate to the Discussion and reiterate many of the points above. 

      (a) 'A subset of neurons remains consistently active across sessions, preserving core components of the memory trace and supporting inference of emotional valence for novel sounds, while neurons recruited after conditioning progressively acquire valence selectivity at remote time points.'

      'Inference of emotional valence' is unclear and unwarranted for all of the reasons provided above regarding the use of language.

      We will modify the language as stated in the prior points.

      (b) '...Our data reconcile these views by demonstrating that cortical representations of emotional valence emerge rapidly after learning and persist within stable subnetworks, even as the broader population undergoes substantial turnover. This architecture preserves core mnemonic content while allowing flexibility in the surrounding ensemble.'

      These statements assume that the PL neuronal responses reflect something more than the levels of freezing behavior to the different stimuli; what are the grounds for this assumption?

      We will incorporate new analysis (GLM) to better address this point and conclusions.

      (c) 'Importantly, these subnetworks encode both learned contingencies and the inferred valence of novel stimuli along a graded representational axis, suggesting that strong recurrent connectivity provides a stable scaffold for emotional memory representations.'

      What is a graded representational axis, and what part of the first statement suggests that 'strong recurrent connectivity provides a stable scaffold for emotional memory representations'? If the authors' goal was to make statements about emotional memory representations vis-à-vis emotional memory content, they should have used protocols that allowed them to probe such content. The auditory fear conditioning protocol used here [followed by tests for generalization to other auditory stimuli that differ in frequency from the conditioned tone] is not one that lends itself to analysis of emotional memory representations or content.

      We thank the reviewer for this comment and agree that both phrases require clarification or revision.

      By "graded representational axis" we intended to convey that PL population activity varies systematically as a function of stimulus similarity to the conditioned tone — that is, population responses are not categorical but scale continuously with spectral proximity to the CS+. We agree this was not clearly stated and will revise the manuscript accordingly.

      Regarding recurrent connectivity, we agree with the reviewer that nothing in our data directly measures or manipulates connectivity between neurons. This statement was intended as a speculative interpretive hypothesis in the Discussion, motivated by the established literature linking strong recurrent connectivity in prefrontal circuits to stable population-level representations [5]. However, we acknowledge that invoking it in this context, without direct evidence, risks overstating our conclusions. We will revise this sentence to make its speculative nature explicit and ground it more carefully in the cited literature rather than presenting it as an inference from our own data.

      In summary, we will ensure that our conclusions are restricted to population-level coding of learned threat value and its generalization across auditory frequencies. We will revise the relevant passages in the Discussion so that speculative interpretations regarding emotional memory content are either removed or clearly flagged as speculative hypotheses.

      (d) 'Dynamic tone-selective responsive neurons emerge independently of learning, as they are present in both control and experimental mice, reflecting pre-existing PL sensory-driven properties (Hockley & Malmierca, 2024; Zikopoulos & Barbas, 2006).'

      Maybe. They are also likely to have developed as a consequence of the repeated testing on days 1, 15, and 30, which involved intermixed exposures to the tones of different frequencies. That is, rather than 'pre-existing PL sensory-driven properties', the responses of these neurons might reflect the emergence of discrimination between the various tones across testing, and greater suppression of freezing to the non-trained tones compared to the trained tone across the various test intervals.

      We thank the reviewer for this point. Our interpretation that these neurons reflect pre-existing PL sensory-driven properties was based on the observation that tone-selective responses were present in control animals that never received conditioning, consistent with prior reports of sensory responsiveness in PL cortex [6, 7]. Because these responses are present from the first exposure of mice to the intermediate frequencies, they cannot be explained by repeated exposure. Moreover, we did not observe progressive refinement, emergence of discrimination-like changes, or suppression of responding to non-reinforced tones in control mice. This difference between conditioned and control animals indicates that repeated tone exposure alone is not sufficient to produce the observed dynamics; associative learning is necessary. We therefore maintain that the tone-selective responses of these neurons reflect pre-existing sensory-driven properties of PL cortex that are present independently of conditioning history.

      In summary, we thank the reviewer for suggesting clarifications to our interpretation, for raising the possibility that freezing behavior may contribute to graded neural responses, and for raising the question of whether repeated tone exposure may contribute to the properties of neurons recruited after conditioning. In the revised manuscript, we will include additional analyses to better dissociate the contributions of freezing behavior and tone identity, clarify passages that were insufficiently precise, and include a paragraph in the Discussion addressing potential alternative explanations alongside our own interpretation of the data.

      Reviewer #3 (Public review):

      Summary:

      Normandin et al. explore the coding of stimuli predicting an aversive event in the prelimbic cortex. Stimuli could either be explicitly paired, explicitly unpaired, or novel but with an inferred association with the aversive event (generalization). Long-term tracking of GCaMP-positive neurons allowed them to examine how coding evolves out to a month following training. In general, they found two types of ensemble codes. One was ensembles coding for each stimulus independently, but with enhanced responding to the one eliciting a freezing response. The other was ensembles that responded to all stimuli in proportion to their similarity to the stimulus paired with the aversive event, either increasing or decreasing their activation with the degree of freezing elicited by a stimulus. Importantly, this second set of ensembles was more stable across days, potentially providing a memory trace.

      Strengths:

      (1) The authors track ensembles in prelimbic cortex over long time scales, providing valuable information on the consolidation of neural codes.

      (2) Neural coding of generalization is examined, which is under-examined in the field.

      We thank the reviewer for appreciating our design to track ensembles over time and the relevance of studying the neural substrates of generalization.

      Weaknesses:

      (1) Difficult to determine if responses treated as encoding stimulus valence are driven instead by the behavior that the stimulus elicits, freezing.

      We thank the reviewer for this thoughtful and constructive comment. We agree that an alternative interpretation is that the graded-response ensembles may partially reflect freezing-related activity rather than mnemonic or salience-related representations of the conditioned stimuli themselves. In the revision, we will acknowledge that prior work has identified PL neurons that encode freezing independently of stimulus identity or associative content. Furthermore, we will implement the reviewer’s suggested generalized linear model (GLM) approach using inferred spiking activity derived from the Ca2+ signals. Specifically, we will include both stimulus identity and freezing behavior as predictors. Because freezing varies across trials whereas stimulus presentation is fixed, this analysis will allow us to dissociate the relative contributions of stimulus-related versus freezing-related activity to the graded neuronal responses. We thank the reviewer for this excellent suggestion.

      If graded stimulus coding remains significant after accounting for freezing behavior, this would strengthen the interpretation that these ensembles encode learned salience or associative properties of the stimuli rather than behavioral output alone. Conversely, if freezing explains a substantial proportion of the variance, we will revise our interpretation accordingly.
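      The dissociation described above can be sketched as a nested-model comparison. This is a minimal illustration, assuming a log-link Poisson GLM fitted to simulated per-trial spike counts; the similarity regressor, the effect sizes, and all function and variable names are hypothetical and are not taken from the manuscript.

```python
import numpy as np

def fit_poisson_glm(X, y, n_iter=50, ridge=1e-8):
    """Fit a log-link Poisson GLM by Newton's method; returns [intercept, *coefs]."""
    X = np.column_stack([np.ones(len(y)), X])      # prepend intercept column
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean() + 1e-9)              # start at the mean rate
    for _ in range(n_iter):
        mu = np.exp(X @ beta)                      # predicted counts
        grad = X.T @ (y - mu)                      # score vector
        hess = X.T @ (X * mu[:, None]) + ridge * np.eye(X.shape[1])
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta

def poisson_deviance(X, y, beta):
    """Poisson deviance of the fitted model (lower means a better fit)."""
    mu = np.exp(np.column_stack([np.ones(len(y)), X]) @ beta)
    safe_y = np.where(y > 0, y, 1)                 # avoids log(0); term is 0 when y == 0
    term = np.where(y > 0, y * np.log(safe_y / mu), 0.0)
    return 2.0 * np.sum(term - (y - mu))

# Simulated trials (hypothetical): tone similarity to the CS+ is fixed per tone,
# freezing varies from trial to trial and is only partly explained by the tone.
rng = np.random.default_rng(0)
n_trials = 600
similarity = rng.choice([0.0, 0.25, 0.5, 0.75, 1.0], size=n_trials)
freezing = np.clip(0.6 * similarity + rng.normal(0.0, 0.25, n_trials), 0.0, 1.0)
counts = rng.poisson(np.exp(0.2 + 0.8 * similarity + 0.3 * freezing))

# Full model: stimulus + freezing. Reduced model: freezing only.
X_full = np.column_stack([similarity, freezing])
X_reduced = freezing[:, None]
beta_full = fit_poisson_glm(X_full, counts)
beta_reduced = fit_poisson_glm(X_reduced, counts)
dev_full = poisson_deviance(X_full, counts, beta_full)
dev_reduced = poisson_deviance(X_reduced, counts, beta_reduced)

# Likelihood-ratio statistic for the stimulus term (1 degree of freedom;
# the 5% critical value of the chi-square distribution is about 3.84).
lr_stat = dev_reduced - dev_full
print(f"stimulus coefficient: {beta_full[1]:.2f}, LR statistic: {lr_stat:.1f}")
```

      In this sketch, a stimulus term that still improves the fit once freezing is in the model corresponds to the first outcome described above; a negligible improvement would correspond to the second.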

      (2) The study implies that the identified ensembles are causally related to valence memory, but no experimental interventions are performed to justify this.

      We appreciate the reviewer's point. We agree that our data are correlational in nature and that establishing a causal relationship between the identified ensembles and valence memory would require experimental interventions such as holographic two-photon manipulations, which are beyond the scope of the present study but represent an important direction for future work.

      To provide an indirect link between ensemble organization and behavior within the constraints of the current dataset, we will examine inter-individual variability in the revised manuscript. Specifically, we will test whether the proportion of neurons participating in stable graded-response ensembles versus dynamic stimulus-specific ensembles predicts individual differences in freezing behavior and fear generalization across retrieval sessions. If animals with a higher proportion of stable graded-response neurons show stronger discrimination and less generalization to non-conditioned tones, this would strengthen the association between ensemble organization and behavioral outcome, while remaining correlational in interpretation.

      We will modify the manuscript terminology accordingly, replacing causal language with phrasing that accurately reflects the associative nature of our conclusions.

      References

      (1) Aschauer, D.F., et al., Learning-induced biases in the ongoing dynamics of sensory representations predict stimulus generalization. Cell Rep, 2022. 38(6): p. 110340.

      (2) Kato, H.K., S.N. Gillet, and J.S. Isaacson, Flexible Sensory Representations in Auditory Cortex Driven by Behavioral Relevance. Neuron, 2015. 88(5): p. 1027–1039.

      (3) Vervliet, B., et al., Generalization gradients in human predictive learning: Effects of discrimination training and within-subjects testing. Learning and Motivation, 2011. 42(3): p. 210–220.

      (4) Dunsmoor, J.E. and K.S. LaBar, Effects of discrimination training on fear generalization gradients and perceptual classification in humans. Behav Neurosci, 2013. 127(3): p. 350–6.

      (5) Mante, V., et al., Context-dependent computation by recurrent dynamics in prefrontal cortex. Nature, 2013. 503(7474): p. 78–84.

      (6) Hockley, A. and M.S. Malmierca, Auditory processing control by the medial prefrontal cortex: A review of the rodent functional organisation. Hear Res, 2024. 443: p. 108954.

      (7) Zikopoulos, B. and H. Barbas, Prefrontal projections to the thalamic reticular nucleus form a unique circuit for attentional mechanisms. J Neurosci, 2006. 26(28): p. 7348–61.

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This important study fills a major geographic and temporal gap in understanding Paleocene mammal evolution in Asia and proposes an intriguing "brawn before bite" hypothesis grounded in diverse analytical approaches. However, the findings are incomplete because limitations in sampling design - such as the use of worn or damaged teeth, the pooling of different tooth positions, and the lack of independence among teeth from the same individuals - introduce uncertainties that weaken support for the reported disparity patterns. The taxonomic focus on predominantly herbivorous clades also narrows the ecological scope of the results. Clarifying methodological choices, expanding the ecological context, and tempering evolutionary interpretations would substantially strengthen the study.

      We have now thoroughly revised our manuscript in response to the editor’s and reviewers’ comments, in particular with regard to:

      (1) Sampling design: we clarified our methods section to indicate that we did not use worn or broken teeth in our initial analyses. We added the following sentence around line 690:

      “These tooth positions were selected from a broader examination of ~300 individual teeth from 72 specimens. We vetted the specimens and excluded 99 tooth positions (~33% of teeth initially chosen for possible inclusion) from our analyses because they either (1) were partially or completely broken at the crown, (2) were in an advanced stage of attritional wear where no cusps could be identified, or (3) possessed a combination of the two aforementioned conditions.”

      (2) Pooled versus by-tooth position analyses: we repeated the three major analyses (DTA & FEA variability through time, tooth size and variability through time, and DTA-FEA correlation through time) for individual molars (upper M1-3, lower m1-3) and select premolars (upper P3-P4 and lower p4; lower and upper p2 samples contained fewer than 5 specimens across the three time intervals, lower p3 contained only 2 specimens for the middle Paleocene, so they were excluded from the sub-partition analyses).

      For DTA & FEA variability through time (summarized as a new figure, Fig. S5, also pasted below), OPCR, DNE, and FEA trait data are supported in 78-100% of the per-tooth analyses for both the early-middle Paleocene and middle-late comparisons. By contrast, RFI and Slope data are replicated in only 22-56% of the per-tooth analyses. We qualified the main text reporting and discussion to include these sensitivity analyses so readers can assess nuances in the data when comparing pooled sample versus per-tooth analyses.
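      As a reading aid for the percentages above, the following sketch converts a count of supporting per-tooth analyses into a percentage. It assumes nine analyzed tooth positions (upper M1-3, lower m1-3, upper P3-P4, and lower p4, as listed above); the counts are hypothetical values chosen only because they reproduce the reported range endpoints, and the function name is illustrative.

```python
def replication_percent(n_supported, n_positions=9):
    """Percentage of per-tooth analyses that reproduce the pooled-sample result."""
    return round(100 * n_supported / n_positions)

# Hypothetical counts of supporting tooth positions out of nine.
for n_supported in (2, 5, 7, 9):
    print(n_supported, "of 9 ->", replication_percent(n_supported), "%")
```

      Under this assumption, 78-100% corresponds to seven to nine of nine tooth positions, and 22-56% to two to five of nine.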

      For tooth size and variability through time (summarized in a new table, Table S3, also pasted below), we observed broad concordance in the pooled analyses and the per-tooth partitioned analyses. Different tooth positions provide strong support for different aspects of the observed trends, with the lower fourth premolar being the strongest driver of the overall trend. All of the significant trends in per-tooth analyses are in the same direction (i.e., decreasing size disparity and size mean through time) as the pooled sample. We added qualifying clarification in the text to bring attention to these refined results.

      For DTA-FEA correlation through time, we generated per-tooth correlation plots in three new figures (Figs. S9-11, only Fig. S10 shown here as an example). We observed that upper M1 patterns generally reflect the trend recovered from analysis of the overall dataset, but M2 and M3 results display inconsistent DTA-FEA correlations, possibly due to small sample sizes. Lower molar patterns generally replicate those recovered in the overall analyses, but lower M1 and M2 signals appear to be stronger than those for lower M3. Finally, low sample sizes make premolar correlations unstable, with the general pattern showing EP-MP strengthening followed by MP-LP stasis or weakening. Given these findings, it appears that the results in the pooled sample correlation plots are mainly driven by lower molar signals. Because of the limited sample sizes, it is not possible to conclude that the other tooth positions display different patterns.

      (3) Ecological scope of the study: although carnivorans and mesonychids are recorded from some of the time intervals examined in this study, our sampling choice of pantodonts and anagalids reflects the high abundance of available dental specimens in those clades, permitting us to make the strongest statistical inference given the incomplete fossil record. Additionally, all sampled taxa come from archaic clades that have not been determined to be specifically herbivorous; we included an additional paragraph in the introduction to explain this:

      “A major challenge with expanding analyses of post K-Pg recovery to Paleocene mammal assemblages elsewhere in the world is the generally stratigraphically limited nature of early Cenozoic sequences. In Asia, Paleocene localities in China represent the best studied to date[11]. From the earliest Paleocene, highly regional and endemic faunas are known from a handful of sedimentary basins (Fig. S1A). Among the faunal elements, only the archaic clades Anagalida and Pantodonta are consistently sampled across the major subdivisions of the Paleocene[11]. An additional complication with ecomorphological analysis of these early mammals is the uncertainty in their dietary ecology, as they are beyond the reach of conventional phylogenetic bracketing approaches to dietary reconstruction. Phenomic analysis of the placental radiation supports insectivory as the ancestral diet of the hypothetical placental ancestor, but uncertainty in the post K-Pg availability of insects and plants in some regions leave some doubt as to the accuracy of this ancestral state reconstruction[1]. Herein we treat the archaic Paleocene taxa in our analyses as having generalized diets rather than categorizing them as insectivores, herbivores, or carnivores.”

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This work provides valuable new insights into the Paleocene Asian mammal recovery and diversification dynamics during the first ten million years post-dinosaur extinction. Studies that have examined the mammalian recovery and diversification post-dinosaur extinction have primarily focused on the North American mammal fossil record, and it's unclear if patterns documented in North America are characteristic of global patterns. This study examines dietary metrics of Paleocene Asian mammals and found that there is a body size disparity increase before dietary niche expansion and that dietary metrics track climatic and paleobotanical trends of Asia during the first 10 million years after the dinosaur extinction.

      Strengths:

      The Asian Paleocene mammal fossil record is greatly understudied, and this work begins to fill important gaps. In particular, the use of interdisciplinary data (i.e., climatic and paleobotanical) is really interesting in conjunction with observed dietary metric trends.

      Weaknesses:

      While this work has the potential to be exciting and contribute greatly to our understanding of mammalian evolution during the first 10 million years post-dinosaur extinction, the major weakness is in the dental topographic analysis (DTA) dataset.

      There are several specimens in Figure 1 that have broken cusps, deep wear facets, and general abrasion. Thus, any values generated from DTA are not accurate and cannot be used to support their claims. Furthermore, the authors analyze all tooth positions at once, which makes this study seem comprehensive (200 individual teeth), but it's unclear what sort of noise this introduces to the study. Typically, DTA studies will analyze a singular tooth position (e.g., Pampush et al. 2018 Biol. J. Linn. Soc.), allowing for more meaningful comparisons and an understanding of what value differences mean. Even so, the dataset consists of only 48 specimens. This means that even if all the specimens were pristinely preserved and generated DTA values could be trusted, it's still only 48 specimens (representing 4 different clades) to capture patterns across 10 million years. For example, the authors note that their results show an increase in OPCR and DNE values from the middle to the late Paleocene in pantodonts. However, if a singular tooth position is analyzed, such as the lower second molar, the middle and late Paleocene partitions are only represented by a singular specimen each. With a sample size this small, it's unlikely that the authors are capturing real trends, which makes the claims of this study highly questionable.

      With regard to sampling design: we clarified our methods section to indicate that we did not use worn or broken teeth in our initial analyses. We added the following sentence around line 690:

      “These tooth positions were selected from a broader examination of ~300 individual teeth from 72 specimens. We vetted the specimens and excluded 99 tooth positions (~33% of teeth initially chosen for possible inclusion) from our analyses because they either (1) were partially or completely broken at the crown, (2) were in an advanced stage of attritional wear where no cusps could be identified, or (3) possessed a combination of the two aforementioned conditions.”

      With regard to pooled versus by-tooth position analyses: we repeated the three major analyses (DTA & FEA variability through time, tooth size and variability through time, and DTA-FEA correlation through time) for individual molars (upper M1-3, lower m1-3) and select premolars (upper P3-P4 and lower p4; lower and upper p2 samples contained fewer than 5 specimens across the three time intervals, lower p3 contained only 2 specimens for the middle Paleocene, so they were excluded from the sub-partition analyses).

      For DTA & FEA variability through time (summarized as a new figure, Fig. S5, also pasted below), OPCR, DNE, and FEA trait data are supported in 78-100% of the per-tooth analyses for both the early-middle Paleocene and middle-late comparisons. By contrast, RFI and Slope data are replicated in only 22-56% of the per-tooth analyses. We qualified the main text reporting and discussion to include these sensitivity analyses so readers can assess nuances in the data when comparing pooled sample versus per-tooth analyses.

      For the tooth size and variability through time (summarized in a new table, Table S3, also pasted below), we observed broad concordance in the pooled analyses and the per-tooth partitioned analyses. Different tooth positions provide strong support for different aspects of the observed trends, with the lower fourth premolar being the strongest driver of the overall trend. All of the significant trends in per-tooth analyses are in the same direction (i.e., decreasing size disparity and size mean through time) as the pooled sample. We added qualifying clarification in the text to bring attention to these refined results.

      For DTA-FEA correlation through time, we generated per-tooth correlation plots in three new figures (Figs. S8-10, only Fig. S9 shown here as an example). We observed that upper M1 patterns generally reflect the trend recovered from analysis of the overall dataset, but M2 and M3 results display inconsistent DTA-FEA correlations, possibly due to small sample sizes. Lower molar patterns generally replicate those recovered in the overall analyses, but lower M1 and M2 signals appear to be stronger than those for lower M3. Finally, low sample sizes make premolar correlations unstable, with the general pattern showing EP-MP strengthening followed by MP-LP stasis or weakening. Given these findings, it appears that the results in the pooled sample correlation plots are mainly driven by lower molar signals. Because of the limited sample sizes, it is not possible to conclude that the other tooth positions display different patterns.

      Reviewer #2 (Public review):

      Summary:

      This study uses dental traits of a large sample of Chinese mammals to track evolutionary patterns through the Paleocene. It presents and argues for a 'brawn before bite' hypothesis - mammals increased in body size disparity before evolving more specialized or adapted dentitions. The study makes use of an impressive array of analyses, including dental topographic, finite element, and integration analyses, which help to provide a unique insight into mammalian evolutionary patterns.

      Strengths:

      This paper helps to fill in a major gap in our knowledge of Paleocene mammal patterns in Asia, which is especially important because of the diversification of placentals at that time. The total sample of teeth is impressive and required considerable effort for scanning and analyzing. And there is a wealth of results for DTA, FEA, and integration analyses. Further, some of the results are especially interesting, such as the novel 'brawn before bite' hypothesis and the possible link between shifts in dental traits and arid environments in the Late Paleocene. Overall, I enjoyed reading the paper, and I think the results will be of interest to a broad audience.

      Weaknesses:

      I have four major concerns with the study, especially related to the sampling of teeth and taxa, that I discuss in more detail below. Due to these issues, I believe that the study is incomplete in its support of the 'brawn before bite' hypothesis. Although my concerns are significant, many of them can be addressed with some simple updates/revisions to analyses or text, and I try to provide constructive advice throughout my review.

      (1) If I understand correctly, teeth of different tooth positions (e.g., premolars and molars), and those from the same specimen, are lumped into the same analyses. And unless I missed it, no justification is given for these methodological choices (besides testing for differences in proportions of tooth positions per time bin; L902). I think this creates some major statistical concerns. For example, DTA values for premolars and molars aren't directly comparable (I don't think?) because they have different functions (e.g., greater grinding function for molars). My recommendation is to perform different disparity-through-time analyses for each tooth position, assuming the sample sizes are big enough per time bin. Or, if the authors maintain their current methods/results, they should provide justification in the main text for that choice.

      With regard to pooled versus by-tooth position analyses: we repeated the three major analyses (DTA & FEA variability through time, tooth size and variability through time, and DTA-FEA correlation through time) for individual molars (upper M1-3, lower m1-3) and select premolars (upper P3-P4 and lower p4; lower and upper p2 samples contained fewer than 5 specimens across the three time intervals, lower p3 contained only 2 specimens for the middle Paleocene, so they were excluded from the sub-partition analyses).

      For DTA & FEA variability through time (summarized as a new figure, Fig. S5, also pasted below), OPCR, DNE, and FEA trait data are supported in 78-100% of the per-tooth analyses for both the early-middle Paleocene and middle-late comparisons. By contrast, RFI and Slope data are replicated in only 22-56% of the per-tooth analyses. We qualified the main text reporting and discussion to include these sensitivity analyses so readers can assess nuances in the data when comparing pooled sample versus per-tooth analyses.

      For the tooth size and variability through time (summarized in a new table, Table S3, also pasted below), we observed broad concordance in the pooled analyses and the per-tooth partitioned analyses. Different tooth positions provide strong support for different aspects of the observed trends, with the lower fourth premolar being the strongest driver of the overall trend. All of the significant trends in per-tooth analyses are in the same direction (i.e., decreasing size disparity and size mean through time) as the pooled sample. We added qualifying clarification in the text to bring attention to these refined results.

      For DTA-FEA correlation through time, we generated per-tooth correlation plots in three new figures (Figs. S8-10, only Fig. S9 shown here as an example). We observed that upper M1 patterns generally reflect the trend recovered from analysis of the overall dataset, but M2 and M3 results display inconsistent DTA-FEA correlations, possibly due to small sample sizes. Lower molar patterns generally replicate those recovered in the overall analyses, but lower M1 and M2 signals appear to be stronger than those for lower M3. Finally, low sample sizes make premolar correlations unstable, with the general pattern showing EP-MP strengthening followed by MP-LP stasis or weakening. Given these findings, it appears that the results in the pooled sample correlation plots are mainly driven by lower molar signals. Because of the limited sample sizes, it is not possible to conclude that the other tooth positions display different patterns.

      Also, I think lumping teeth from the same specimen into your analyses creates a major statistical concern because the observations aren't independent. In other words, the teeth of the same individual should have relatively similar DTA values, which can greatly bias your results. This is essentially the same issue as phylogenetic non-independence, but taken to a much greater extreme.

      It seems like it'd be much more appropriate to perform specimen-level analyses (e.g., Wilson 2013) or species-level analyses (e.g., Grossnickle & Newham 2016) and report those results in the main text. If the authors believe that their methods are justified, then they should explain this in the text.

      Based on the per-tooth partition analyses we performed and reported above, the results now show that the overall trends described in the previous draft of the study are a composite of signals from different regions of the dentition. For example, the OPCR, DNE, and FEA trends persist across most tooth positions, whereas the Slope and RFI trends are mainly driven by lower fourth premolar patterns. The tooth size results are also mainly driven by lower fourth premolar patterns, but tooth disparity trends are broadly supported across tooth positions. These observations indicate that the overall trends remain valid, but there are nuances as to which tooth positions are driving which components of the trends. As such, we deem the overall results to be valid, and we focused our revision on providing these nuances so that readers can assess through-time patterns in more detail than in the previous version of the study.

      (2) Maybe I misunderstood, but it sounds like the sampling is almost exclusively clades that are primarily herbivorous/omnivorous (Pantodonta, Arctostylopida, Anagalida, and maybe Tillodonta), which means that the full ecomorphological diversity of the time bins is not being sampled (e.g., insectivores aren't fully sampled). Similarly, the authors say that they "focused sampling" on those major clades and "Additional data were collected on other clades ... opportunistically" (L628). If they favored sampling of specific clades, then doesn't that also bias their results?

      If the study is primarily focused on a few herbivorous clades, then the Introduction should be reframed to reflect this. You could explain that you're specifically tracking herbivore patterns after the K-Pg.

      We appreciate the reviewer’s suggestion that our sampling may have focused on putative herbivorous clades more than others. However, at the early stage of placental evolution during the Paleocene, and in particular among the endemic forms we studied from south China, it is unclear to us that such clear-cut ecomorphological categories were present amongst the fossil mammals. Thus, we take a more agnostic approach and do not define the dietary categories of the sample taxa (and by extension, those of the unsampled taxa). Although we recognize that representatives of certain clades, such as Carnivora, may be more reasonably interpreted as carnivores/insectivores/omnivores and, in the current context, remain unsampled, we note that including tooth samples from rare taxa such as carnivores likely would have biased the analyses temporally. Chinese Paleocene carnivores are known only from one of the three time intervals analyzed (representing only a handful of specimens), and so would potentially inflate the disparity in that time interval relative to the others (if dentitions specialized for carnivory are assumed to be present in the Paleocene). To clarify this point, we added a paragraph in the introduction:

      “A major challenge with expanding analyses of post K-Pg recovery to Paleocene mammal assemblages elsewhere in the world is the generally stratigraphically limited nature of early Cenozoic sequences. In Asia, Paleocene localities in China represent the best studied to date[11]. From the earliest Paleocene, highly regional and endemic faunas are known from a handful of sedimentary basins (Fig. S1A). Among the faunal elements, only the archaic clades Anagalida and Pantodonta are consistently sampled across the major subdivisions of the Paleocene[11]. An additional complication with ecomorphological analysis of these early mammals is the uncertainty in their dietary ecology, as they are beyond the reach of conventional phylogenetic bracketing approaches to dietary reconstruction. Phenomic analysis of the placental radiation supports insectivory as the ancestral diet of the hypothetical placental ancestor, but uncertainty in the post K-Pg availability of insects and plants in some regions leaves some doubt as to the accuracy of this ancestral state reconstruction[1]. Herein we treat the archaic Paleocene taxa in our analyses as having generalized diets rather than categorizing them as insectivores, herbivores, or carnivores.”

      (3) There are a lot of topics lacking background information, which makes the paper challenging to read for non-experts. Maybe the authors are hindered by a short word limit. But if they can expand their main text, then I strongly recommend the following:

      a) The authors should discuss diets. Much of the data are diet correlates (DTA values), but diets are almost never mentioned, except in the Methods. For example, the authors say: "An overall shift towards increased dental topographic trait magnitudes ..." (L137). Does that mean there was a shift toward increased herbivory? If so, why not mention the dietary shift? And if most of the sampled taxa are herbivores (see above comment), then shouldn't herbivory be a focal point of the paper?

      We edited the introduction to say that “We used dental topographical traits as indicators of ecomorphological diversity[28] and examined temporal shifts in tooth crown complexity, curvature, and height and their association with tooth performance in terms of deformation resistance using topographic and simulation analyses.” We also added the following to the methods section to clarify that we are using DTA as a general ecomorphological proxy and not as a direct dietary proxy:

      “Overall, we use these DTA traits as indicators of ecomorphological capacity, but do not link them explicitly to dietary categories. The craniodental morphology of archaic placental clades in general has not been demonstrated to share the same structure-function linkages as crown mammals, so the aforementioned linkages between DTA and dietary ecology in extant species only serve as evidence that DTA is a potentially useful ecomorphological proxy, without the application of those DTA-diet relationships to the Paleocene fossil mammal dataset.”

      b) The authors should expand on "we used dentitions as ecological indicators" (L75). For non-experts, how/why are dentitions linked to ecology? And, again, why not mention diet? A strong link between tooth shape and diet is a critical assumption here (and one I'm sure that all mammalogists agree with), but the authors don't provide justification (at least in the Introduction) for that assumption. Many relevant papers cited later in the Methods could be cited in the Introduction (e.g., Evans et al. 2007).

      We added the following sentence to clarify our usage of tooth crowns as ecomorphological proxies: “Teeth are among the most well-preserved parts of fossil mammals, and the fact that they interface directly with the environment through mastication makes them suitable elements for studying potential ecology-morphology linkages.”

      c) Include a better introduction of the sample, such as explicitly stating that your sample only includes placentals (assuming that's the case) and is focused on three major clades. Are non-placentals like multituberculates or stem placentals/eutherians found at Chinese Paleocene fossil localities and not sampled in the study, or are they absent in the sampled area?

      We modified the following sentence to indicate our sampling focus on placentals: “Our analyses focused on placental mammals from three of the most fossiliferous and biogeographically isolated Paleocene sedimentary sequences in paleotropical Asia: The Nanxiong, Qianshan, and Chijiang Basins in present-day south China 23–27 (Fig. S1)”

      d) The way in which "integration" is being used should be defined. That is a loaded term which has been defined in different ways. I also recommend providing more explanation on the integration analyses and what the results mean.

      If the authors don't have space to expand the main text, then they should at least expand on the topics in the supplement, with appropriate citations to the supplement in the main text.

      We replaced all mentions of “integration” with “covariation” to avoid using the loaded terminology. Covariation more accurately reflects the correlation between two sets of traits (DTA vs FEA) without invoking developmental mechanisms implied by modularity/integration.

      (4) Finally, I'm not convinced that the results fully support the 'brawn before bite' hypothesis. I like the hypothesis. However, the 'brawn before ...' part of the hypothesis assumes that body size disparity (L63) increased first, and I don't think that pattern is ever shown. First, body size disparity is never reported or plotted (at least that I could find) - the authors just show the violin plots of the body sizes (Figures 1B, S6A). Second, the authors don't show evidence of an actual increase in body size disparity. Instead, they seem to assume that there was a rapid diversification in the earliest Paleocene, and thus the early Paleocene bin has already "reached maximum saturation" (L148). But what if the body size disparity in the latest Cretaceous was the same as that in the Paleocene? (Although that's unlikely, note that papers like Clauset & Redner 2009 and Grossnickle & Newham 2016 found evidence of greater body size disparity in the latest Cretaceous than is commonly recognized.) Similarly, what if body size disparity increased rapidly in the Eocene? Wouldn't that suggest a 'BITE before brawn' hypothesis? So, without showing when an increase in body size diversity occurred, I don't think that the authors can make a strong argument for 'brawn before [insert any trait]".

      Although it's probably well beyond the scope of the study to add Cretaceous or Eocene data, the authors could at least review literature on body size patterns during those times to provide greater evidence for an earliest Paleocene increase in size disparity.

      We added a sentence in the discussion of body size during the Paleocene to note that the largest late Cretaceous fossil mammals in China are shrew- to gopher-sized, whereas the largest early Paleocene Chinese Endemic Pantodonts are dog-sized:

      “Dog-sized CEPs such as Bemalambda reached sizes not seen in late Cretaceous mammals from China such as Zhangolestes and Kryptobaatar, which are shrew- to gopher-sized [Meng 2014]”

      Reference: Meng, J. (2014). Mesozoic mammals of China: implications for phylogeny and early evolution of mammals. Natl. Sci. Rev. 1, 521–542. 10.1093/nsr/nwu070.

      Furthermore, we tempered our discussion to restrict the “brawn before bite” hypothesis to post K-Pg recovery in the Paleocene. Body size patterns shifted in the Eocene as crown clades replaced the archaic endemic clades analyzed in our study, and much larger taxa began to appear after the PETM. Such body size shifts are based on different clades and likely different dynamics compared to the 10-million-year interval examined in our study, so we refrain from commenting on post-Paleocene times.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) In regard to the DTA dataset: Was there a method used to 'fix' these teeth before dental topographic analyses were implemented? If so, this should be explicitly stated. If not, the authors should explain why broken, worn, or abraded teeth were used.

      We excluded the incomplete teeth from our analyses. We added the following sentence for clarification: “These tooth positions were selected from a broader examination of ~300 individual teeth from 72 specimens. We vetted the specimens and excluded 99 tooth positions (~33% of teeth initially chosen for possible inclusion) from our analyses because they either (1) were partially or completely broken at the crown, (2) were in an advanced stage of attritional wear where no cusps could be identified, or (3) possessed a combination of the two aforementioned conditions.”

      (2) The authors should explicitly explain why all tooth positions were analyzed together. Again, this is not something that is typically done, and some explanation would be helpful for readers.

      We added a paragraph in the methods section to explain both our pooled sampling approach, as well as the per-tooth analyses added in this revised manuscript:

      “Given the rarity of Paleocene fossil material from China, we combined data from different tooth positions into three pooled samples, one for each of the time intervals examined (early, middle, late Paleocene). We treated the pooled samples as representative of the range of dental topographic features and bite performance traits available to the mammal taxa under study. In this way, the variance estimates are interpreted as measures of the morphological and performance heterogeneity present in each time interval dataset. To further tease out the possibility of specific tooth positions driving the overall trends observed in the pooled samples, we also performed the DTA, FEA, DTA-FEA correlation, and tooth size through-time analyses using per-tooth data partitions.”
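
      The contrast between the pooled samples and the per-tooth partitions described above can be illustrated with a minimal sketch. The records, trait values, tooth labels, and the helper `variance_by` are hypothetical placeholders, not the study's data or code; the sketch only shows the same variance estimate computed once per time interval and once per time interval and tooth position.

```python
from collections import defaultdict
import statistics

# Hypothetical records: (time_interval, tooth_position, trait_value).
# The values are placeholders chosen for illustration only.
records = [
    ("early", "p4", 1.0), ("early", "p4", 3.0),
    ("early", "m1", 2.0), ("early", "m1", 2.0),
    ("middle", "p4", 1.0), ("middle", "p4", 5.0),
    ("middle", "m1", 2.0), ("middle", "m1", 2.5),
    ("late", "p4", 2.0), ("late", "p4", 3.0),
    ("late", "m1", 2.0), ("late", "m1", 2.2),
]

def variance_by(records, key):
    """Group trait values by key(record) and return the sample variance per group."""
    groups = defaultdict(list)
    for rec in records:
        groups[key(rec)].append(rec[2])
    # Sample variance needs at least two observations per group.
    return {k: statistics.variance(v) for k, v in groups.items() if len(v) >= 2}

# Pooled: one variance per time interval, all tooth positions combined.
pooled = variance_by(records, key=lambda r: r[0])

# Partitioned: one variance per (time interval, tooth position).
per_tooth = variance_by(records, key=lambda r: (r[0], r[1]))

for interval in ("early", "middle", "late"):
    print(interval, "pooled:", round(pooled[interval], 3))
    for tooth in ("p4", "m1"):
        print("   ", tooth, round(per_tooth[(interval, tooth)], 3))
```

      In this toy example the pooled variance in each interval is driven mostly by one tooth position, which is the kind of pattern the per-tooth partitions are meant to expose.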

      (3) I think the authors should hedge their claims a bit more and recognize the limitations of their study (e.g., sample size and tooth preservation).

      We thank the reviewer for raising this important point. We carefully read through the main text and further tempered our interpretations based on the limitations of our data. Additionally, we added a paragraph in the supplemental text to summarize the major sources of uncertainty in the sample:

      “Sample and methodological limitations

      The highly fragmentary nature of early Cenozoic mammal fossils in Asia means that even the best preserved faunas studied herein contain much missing information. First, the absence of a high-resolution chronological framework prevents the fossil data from being analyzed on a continuous time axis; the binning of the samples into three main intervals within a 10-million-year period hinders additional hypotheses about the environmental and climatic correlations of the dental structure-performance results presented. Second, the uneven sampling of the available mammalian assemblage throughout the Paleocene sites in China limits the breadth of ecomorphological categories included in the analyses; rarer taxa representing more specialized carnivore, insectivore, or herbivore forms were not included in our sampling. Third, the spatial discontinuity of stratigraphically younger (Eocene) and older (Cretaceous) mammal assemblages means that body size and ecomorphological shifts bracketing the Paleocene cannot currently be analyzed alongside the dataset presented. These limitations should be taken into account when considering the interpretations made in the main text.”

      Reviewer #2 (Recommendations for the authors):

      I'm including my Line Comments here as recommendations for the authors. But note that many of my recommendations are also in my Public Review.

      L22: "3% of sites"? Do you mean 3% of global sites?

      Yes, we revised the sentence to indicate 3% of global sites. Thank you for this suggestion.

      L35: This is nitpicky because it's not crucial to your study, but I can't help but point out that the Long Fuse, etc, hypotheses are specifically about the DIVERGENCE TIMES for Placentalia and major subclades, NOT the 'adaptive radiation' of placentals like you imply in your text. Adaptive radiations include ecomorphological diversification and are driven by ecological opportunity (e.g., Schluter 2000). (Emphasis on 'ecological.') The long fuse, short fuse, and explosive models do not include an ecological component - i.e., the diversifications could have occurred without ecological diversification. Instead, for hypotheses that are specifically on the adaptive/ecological radiation of mammals, see the Early Rise, Suppression (or Dinosaur Incumbency; Benevento et al. 2023 Palaeontology), and Late Rise hypotheses (Grossnickle et al. 2019 TREE). These hypotheses apply broadly to all mammals, not just placentals (see Box 1's figure in Grossnickle et al. 2019), but they can still be applied to mammalian subclades like eutherians/placentals (e.g., see Thomas Halliday papers).

      Thank you for helping to clarify the adaptive radiation vs. divergence time concepts. We edited this sentence to mention the adaptive radiation hypotheses instead, adding in the references provided by the reviewer.

      L39-40: I think your comment is probably accurate. But keep in mind that advocates of the Early Rise and Delayed Rise hypotheses (see citations within Grossnickle et al. 2019) might argue that other time periods, other than the Paleocene, are equally or more important.

      We added a reference to Grossnickle et al. 2019 to bring attention to potential arguments otherwise. Thank you for the suggestion.

      L48: I think the inclusion of "at higher latitudes" is a little distracting or misleading and should be erased. It implies that the taxonomic diversification was ONLY rapid at higher latitudes. But many of the references that you cite include analyses at the global or continental scale (e.g., Alroy 1999, Grossnickle & Newham 2016) and don't distinguish patterns at different latitudes. If you want to keep the point about latitudes, then I recommend inserting a separate sentence on that point.

      We removed “at higher latitudes”.

      L50: Isn't "stem lineages and those with no living relatives" somewhat redundant? Or do you mean something like "stem placental/eutherian lineages and extinct placental subgroups"?

      Yes, we adopted the suggested phrasing. Thank you.

      L53: I recommend starting a new paragraph around here (maybe starting with "Distinct from ...") that focuses specifically on introducing the 'brawn before [ecomorphological trait]' hypothesis.

      Done.

      L56: "large herbivores and their predators"? Are you just referring to mammals? Wilson (2013), which you cite, and Grossnickle & Newham (2016) argued that dietary specialists were targeted at the K-Pg, but none of the herbivores were "large" (at least relative to Cenozoic herbivores). And most faunivorous mammals at the time were probably insectivorous and not preying on herbivorous mammals, besides maybe a few outlying taxa (e.g., Altacreodus, Nanocuris). I'd revise your sentence for clarity.

      We removed “disproportionately impacting large herbivores and their predators” for clarity.

      L63: I'd replace "ecometric" with "ecomorphological". Ecometrics commonly refers to using fossil traits to infer paleo environments/climate (e.g., see papers by David Polly, Michelle Lawing, etc), which I don't think is what you're referring to here. (E.g., I don't think that brain size or jaw shape patterns were/are used to infer paleo environments.)

      Revised. Thank you.

      L85: I strongly advise against making conclusions like this: "Dental height and sharpness variability ... [spiked] in the middle Paleocene corresponding to a short-lived negative excursion in global temperature." That implies that the change in dentitions is linked to global temperature changes, which I don't think your results support. Later in the text you highlight the temporal uncertainty of your time bin ages (L650) and say that the middle Paleocene bin could be as old as ~62 Ma (L646), which is well before the negative excursion (and looks to be more in line with a positive excursion!), at least according to the Figure 1 time scale (see comment below). So, I don't think that your results even support your statement.

      We reworded this sentence to say “Dental height and sharpness variability were low in the beginning and end of the time interval, with a peak in the middle Paleocene. This pattern is observed both when dentitions are considered holistically and by tooth position in the lower dentition (Fig. S5; upper teeth display the opposite pattern).”

      L144: Using variance for disparity seems fine. But keep in mind that other disparity metrics, such as range (or sum-of-ranges for multivariate data), might produce different results. For instance, variance of RFI and Slope spike in the middle Paleocene, like you point out, but based on the values in Figure 1A, it looks like the ranges stay relatively constant through the Paleocene (although I realize that the ranges might change with bootstrapping). So, your choice of disparity metric might have a big influence on your conclusions. Alternatively, you could calculate disparity using multiple metrics (e.g., Brusatte et al. 2012 Nature Communications; Grossnickle & Newham 2016 supplemental analyses), even if it's just for supplemental analyses.

      Thank you for bringing the choice of disparity measures to our attention. We conducted a parallel set of bootstrapped disparity calculations and comparisons using range lengths (maximum trait value – minimum trait value for a given trait) and summarized the through-time trends as for the variance-based results (Fig. S5). Overall, very similar trends are observed, providing support for the variance-based data interpretation presented in the main text. We added an explanation of this additional sensitivity testing both in the main text and in the supplemental text.
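
      A bootstrapped comparison of variance and range length of this kind can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' pipeline: the trait values, the number of replicates, the percentile interval, and the function names `bootstrap_disparity`, `range_length`, and `percentile_interval` are all hypothetical.

```python
import random
import statistics

def range_length(values):
    """Range length as defined in the response: maximum minus minimum trait value."""
    return max(values) - min(values)

def bootstrap_disparity(values, metric, n_boot=1000, seed=0):
    """Resample values with replacement and return the metric for each replicate."""
    rng = random.Random(seed)
    n = len(values)
    return [
        metric([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    ]

def percentile_interval(reps, lower=0.025, upper=0.975):
    """Simple percentile interval taken from the sorted bootstrap replicates."""
    ordered = sorted(reps)
    last = len(ordered) - 1
    return ordered[int(lower * last)], ordered[int(upper * last)]

# Hypothetical trait values per time bin (placeholders, not measured data).
bins = {
    "early": [0.42, 0.45, 0.47, 0.50, 0.52, 0.55],
    "middle": [0.30, 0.41, 0.48, 0.56, 0.63, 0.71],
    "late": [0.44, 0.46, 0.49, 0.51, 0.53, 0.54],
}

for name, values in bins.items():
    for label, metric in (("variance", statistics.variance), ("range", range_length)):
        reps = bootstrap_disparity(values, metric)
        low, high = percentile_interval(reps)
        print(f"{name:6s} {label:8s} median={statistics.median(reps):.4f} "
              f"interval=({low:.4f}, {high:.4f})")
```

      Because a resample can never span more than the observed extremes, bootstrapped range lengths are bounded above by the observed range, which is one reason the two metrics can behave differently in small samples.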

      L147: "body size disparity ... (Fig. 1B, S6A, Table 1, Data S5)." But I don't see disparity calculated or plotted in any of the figures/tables that you cite. You test for differences in disparity between time bins (Table 1), but that doesn't provide the actual disparity patterns.

      We generated a new figure (Fig. S8) to show the tooth size variance and range levels across time and data partitions, and modified this sentence to say that “Over the same time interval examined, body size disparity and mean were higher in the early Paleocene than in subsequent time intervals (Fig. S8, Table S3; also supported by premolar 4 and upper molar partition analyses), indicating that substantial increases in the disparity of dental complexity, curvature, and height lagged behind maximum tooth size disparity during the Paleocene.”

      L151-153: Maybe. But you're basing this on a much narrower temporal range (Paleocene) than the brain and jaw studies, and I think those studies observed big increases in brain/jaw disparity in the Eocene, which you don't sample. And as I explained elsewhere, I'm not convinced that your results strongly support the same pattern. At a minimum, I recommend tempering your conclusions to better reflect the uncertainty of your results.

      We tempered our statements here to say that “This suggests a ‘brawn before bite’ pattern in endemic Asian mammals, partially mirroring the endocranial and jaw functional morphology patterns identified in their North American and European counterparts [21,22]. These findings raise the possibility that an initial size-driven post-K-Pg recovery followed by ecomorphological radiation was a global phenomenon, even as regional tectonic events such as the initial collision of the Indian subcontinent with Asia and Deccan Traps volcanism influenced local mammal evolution.”

      L170: I'm not well-versed in integration (and modularity) studies, so maybe this reflects my ignorance, but I had trouble understanding sentences like this: "These findings indicate that form-function malleability, the coexistence of distinct topography-performance relationships in each time and taxon partition while overall integration between the two trait groups increases between time bins, was present throughout the Paleocene." If there is space, I recommend revising and/or breaking apart long, jargon-y sentences like that (throughout the paper) so that they're more digestible for readers.

      We simplified complex sentences such as the one the reviewer noted, in order to communicate our findings and interpretations more clearly. Thank you for the suggestion.

      L183: It's probably fine to assume most placental orders arose in the Paleocene based on fossil evidence. But keep in mind that molecular studies often argue that many orders arose in the Late Cretaceous.

      We revised the statement to indicate a “Cretaceous/Paleocene” origin of many modern mammal orders.

      L200-207: Again, this might just reflect my ignorance concerning integration analyses, but I recommend expanding on this text to better explain how your integration results support this conclusion. It seems really interesting, and I like the Garden of Eden hypothesis. It's just not immediately clear to me how your results support that hypothesis. A little more background on how to interpret the integration results would be helpful.

      We expanded the discussion here to say that “Such flexibility in dental form-function linkage permits ‘mix and match’ trait combinations rather than evolutionary change as a single unit, potentially enhancing the evolvability of feeding ecological traits as new environmental conditions arose [Goswami et al. 2015]”

      Reference: Goswami, A., Binder, W.J., Meachen, J., and O’Keefe, F.R. (2015). The fossil record of phenotypic integration and modularity: A deep-time perspective on developmental and evolutionary dynamics. Proc. Natl. Acad. Sci. 112, 4891–4896. 10.1073/pnas.1403667112.

      L218: "reached maximum tooth size disparity early". Again, I don't see size disparity plotted or reported. And without baseline comparisons (Late K or Eocene), it's hard to interpret your results and evaluate what 'maximum' means (Figure 1B).

      We revised the sentence to now say “In response, Paleocene mammal clades in south China between dental topography and bite performance later, all the while maintaining high levels of variability in dental complexity and convexity (Fig. 1).”

      Figure 1A: The time scale in the top left of the figure looks off. Shouldn't the K-Pg be at 66 Ma (not 65 Ma) and the P-E boundary at 56 Ma (not ~54 or 55)?

      We revised Fig. 1 to fix the time scale so that K-Pg is at 65.5 Ma and the P-E boundary at 56 Ma. Thank you for catching this.

      Figure 1A: Is there a different y-axis scale for the variance (red line) results?

      Yes, the y axes for the variance curves were missing. We added them back in. Thank you.

      L628-629: As I explained above, it feels like you focused your sampling just on herbivorous/omnivorous groups, and, if true, this is an important point that should be discussed at the forefront of the paper. Does your sample truly represent the total ecological diversity of the mammalian faunas at the time?

      We agree with the reviewer about the potential partial sampling of the range of ecomorphological diversity when only the most abundant clades are included in the analyses. However, we refrain from interpreting the dietary groupings represented in the dataset using an assumption of functional morphology from crown/extant clades. We added a paragraph in the introduction to bring attention to the inherent uncertainty in the ecological diversity of the dataset:

      “A major challenge with expanding analyses of post K-Pg recovery to Paleocene mammal assemblages elsewhere in the world is the stratigraphically limited nature of early Cenozoic sequences that produce fossil mammals. In Asia, Paleocene localities in China represent the best studied to date[11]. From the earliest Paleocene, highly regional and endemic faunas are known from a handful of sedimentary basins (Fig. S1A). Among the faunal elements, only the archaic placental clades Anagalida and Pantodonta are consistently sampled across the major subdivisions of the Paleocene[11]. An additional complication with ecomorphological analysis of these early mammals is the uncertainty in their dietary ecology, as they are beyond the reach of conventional phylogenetic bracketing approaches to dietary reconstruction. Phenomic analysis of the placental radiation supports insectivory as the ancestral diet of the hypothetical placental ancestor, but uncertainty in the post K-Pg availability of insects and plants in some regions leaves some doubt as to the accuracy of this ancestral state reconstruction[1]. Herein we treat the archaic Paleocene taxa in our analyses as having uncharacterized diets rather than categorizing them as insectivores, herbivores, or carnivores.”

      L653: Sorry if this is mentioned elsewhere, but did you avoid using teeth with especially worn or broken cusps? You might expand on how you chose teeth for your sample.

      We left out this detail in the original submission. Thank you for pointing this out. We had to exclude a third of the teeth because they were too worn or broken. We added the following explanation to the methods section:

      “These tooth positions were selected from a broader examination of ~300 individual teeth from 72 specimens. We vetted the specimens and excluded 99 tooth positions (~33% of teeth initially chosen for possible inclusion) from our analyses because they either (1) were partially or completely broken at the crown, (2) were in an advanced stage of attritional wear where no cusps could be identified, or (3) possessed a combination of the two aforementioned conditions.”

      L654: "specimens" should be "teeth", correct? In the preceding sentence, you say that there are 200 teeth from only 48 specimens.

      Corrected.

    1. Author response:

      The following is the authors’ response to the original reviews

      General Statements

      We thank the reviewers for their thoughtful and constructive comments, which substantially improved our manuscript. In response, we have revised the text and figures throughout to address the points raised. Specifically, we have:

      i. Refined our definition of Inactivation/Stability Centers (I/SCs): We limit this designation to loci where both Allelic Expression Imbalance (AEI) and Variable Epigenetic Replication Timing (VERT) were detected, either in the present study or in previously published work.

      ii. Expanded methodological clarity: We provide detailed descriptions of how VERT regions were identified, annotated, and quantified, including thresholds for allelic imbalance, replication timing variability, and sampling depth. We also justify the ≥80% AEI cutoff, which is based on recently published studies showing that modest allelic biases can have biological and clinical significance.

      iii. Enhanced benchmarking and validation: In addition to the analysis of X inactivation in female ACP cells, we now include comparisons between imprinted and non-imprinted regions to benchmark the magnitude of allelic replication timing imbalance, demonstrating that the magnitude of imbalance observed at non-imprinted VERT regions is comparable to known imprinted regions.

      iv. Addressed tissue specificity and sampling limitations: We now discuss how the data derived from a limited number of clones, tissues, and individuals support the identification of robust AEI and VERT patterns. In the future, additional tissues and individuals will be required to capture the full diversity of I/SC regulation.

      v. Clarified biological relevance: We have expanded our discussion to highlight the consistency of AEI findings across cell types, including examples of genes implicated in neurodevelopmental and neurodegenerative disorders, and we clarify our model of how I/SC regulation contributes to haploinsufficiency, variable expressivity, and incomplete penetrance in human disease.

      vi. Improved figures and supplemental data: We have updated figure legends for clarity, added a new supplementary figure benchmarking imprinted regions, added supplementary tables containing: the full description of our GO analysis, the list of I/SCs where we have detected both VERT and AEI, the ratios of the number of transcripts derived from early and late replicating alleles for the I/SCs illustrated in all figures, and we have cross-referenced all supplementary tables.

      Point-by-point description of the revisions

      Reviewer 1:

      The existence of VERT regions is well supported, but the number of regions called as ISCs may be inflated by permissive thresholds (e.g., AEI ≥ 0.8 or ≤ 0.2 in a single clone). This risks conflating transient stochastic differences with stable ISCs.

      We selected the >80% (or <20%) allelic imbalance threshold, along with the requirement of at least one biallelic clone, as our criterion for significant AEI. This choice was guided by a recent study demonstrating that allelic imbalance as low as 65%/35% is enough to affect disease penetrance in humans (Nature 2025; 637:1186–1197). For completeness, results obtained using more stringent thresholds (>90% and >95% imbalance) are presented in Supplementary Table 2.

      Furthermore, it is unlikely that transient stochastic differences in allelic expression, such as those detected by single-cell RNA sequencing assays (Nat. Rev. Genet. 2015; 16:653–664), would be captured by our approach. Each clone in our study was expanded from a single cell to over one million cells before both RNA-seq and Repli-seq analyses, effectively averaging out transient transcriptional and/or replication fluctuations, and thus reflecting stable, mitotically heritable epigenetic states.
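
      The AEI criterion described in this response (an 80%/20% imbalance in at least one clone plus at least one biallelic clone, re-evaluated at 90% and 95%) can be sketched as below. This is a hedged illustration rather than the authors' code: the read counts and function names are hypothetical, the cutoff is treated as inclusive, and counting any clone below the cutoff as "biallelic" is an assumption that the response does not spell out.

```python
def major_allele_fraction(ref_reads, alt_reads):
    """Fraction of allele-informative reads that come from the more expressed allele."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no allele-informative reads for this clone")
    return max(ref_reads, alt_reads) / total

def meets_aei_criterion(clone_counts, threshold=0.80):
    """Return True if at least one clone is imbalanced and at least one is biallelic.

    clone_counts: list of (ref_reads, alt_reads) tuples, one per clone.
    Assumption: a clone whose major-allele fraction is below the threshold
    is treated as biallelic.
    """
    fractions = [major_allele_fraction(ref, alt) for ref, alt in clone_counts]
    has_imbalanced = any(f >= threshold for f in fractions)
    has_biallelic = any(f < threshold for f in fractions)
    return has_imbalanced and has_biallelic

# Hypothetical per-clone read counts for two loci (placeholders only).
loci = {
    "locus_A": [(97, 3), (52, 48), (12, 88)],
    "locus_B": [(85, 15), (50, 50)],
}

for name, clones in loci.items():
    calls = {t: meets_aei_criterion(clones, threshold=t) for t in (0.80, 0.90, 0.95)}
    print(name, calls)
```

      Using the major-allele fraction makes the >80% and <20% conditions a single comparison, so the stricter 90% and 95% thresholds reported in the supplementary table only require changing one argument.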

      Reviewer 1:

      More robust approaches would include using magnitude of imbalance, annotating VERTs by genomic location, applying stricter thresholds for replication timing, and benchmarking AEI distributions against the X chromosome.

      All VERT regions identified in this study were annotated according to both the magnitude of allelic imbalance and their genomic coordinates, using 250 kb windows for the human samples and 50 kb windows for the mouse samples (see Supplementary Tables 1 and 6). Figure 1c directly compares the magnitude of imbalance, defined as outliers in the standard deviation, for both allelic replication timing and allelic expression across autosomal and X-linked loci in female ACP cells.

      In addition, we detected allelic replication asynchrony at 12 known imprinted loci, and the standard deviation of replication timing at these loci, measured in 250 kb windows, is comparable to that observed across the >350 VERT regions detected at non-imprinted sites. For comparisons, we have highlighted the imprinted regions with + symbols in Figures 1e, 2d, 3c, 6g, 7e, 7g, and we have highlighted the imprinted regions in Supplemental Table 1, and in the Data Source files. For additional comparisons, we have included Supplemental Figure 1 to illustrate the magnitude of replication timing imbalance and allele-specific gene expression at two autosomal imprinted regions.

      Reviewer 1:

      Figures and text would benefit from improved clarity: axis labels are missing in places (e.g., Fig. 1c, Fig. 2g), legends should explain chromosome arm colors, and cluttered figures such as Fig. 1j could be re-visualized for interpretability.

      Figure labels have been added to Figs. 1c and 2g, and legends modified for clarity.

      Reviewer 1:

      “…the claim of cell-type specificity is not convincingly demonstrated given the small sample size (n=4) and strong batch confounding between lymphoblastoid and cartilage progenitors.” And “Hierarchical clustering is confounded by batch and based on presence/absence calls that lack quantitative resolution.”

      We agree that the limited number of individuals and clones, as well as the comparison between only two distinct tissue types (LCLs and ACPs), have quantitative limitations. Our primary intent was to evaluate whether any I/SCs were shared between independently derived clonal cell lines from different tissues to determine whether there is evidence of tissue-specific I/SC usage, rather than to make quantitative claims about global cell-type specificity.

      To address this concern, we have replaced the hierarchical clustering analysis, in Figure 1i, with a Venn diagram that more directly illustrates the overlap and tissue-specific distribution of VERT regions detected in the different clonal sets. This revised representation avoids assumptions about clustering relationships and removes batch-driven bias, while still conveying the key observation that many VERT regions are shared across tissues and others appear tissue-restricted.

      Reviewer 1:

      While syntenic VERT regions across mouse and human are intriguing, they complicate interpretation of strong clustering by cell type. Sampling depth may also have exaggerated allelic imbalance calls.

      We note that the human LCLs used in our study are B cells, and immunoglobulin gene rearrangements were used to confirm the clonal uniqueness of each line. Similarly, the mouse replication timing data analyzed here was generated from pre-B cells, which also undergo immunoglobulin gene rearrangements. Thus, both the human LCL and mouse pre-B cell datasets were derived from B-cell lineages, providing a consistent cellular context for comparative analysis.

      Sequencing depth is an important consideration for all variant base calls. Without fully haplotype-resolved genomes, previous studies relied on calculating per-SNP calls of allelic imbalance based on reads covering a single nucleotide locus. To improve the sequencing depth supporting the identification of VERT and AEI regions, we utilized haplotype-resolved genomes that allowed all informative allele-specific reads to be pooled across all heterozygous SNPs within genomic windows or expressed genes. For AEI, we required a minimum of 20 informative allele-specific reads per gene, an FDR-corrected p-value ≤ 0.05, and an allelic imbalance of at least 80% versus 20%. Importantly, a recent study showed that allelic imbalance as low as 65%/35% is clinically relevant in humans (Nature 2025; 637:1186–1197). We reiterate that results for more stringent thresholds (>90% and >95% imbalance) are presented in Supplementary Table 2.
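The three AEI criteria stated in this response (at least 20 informative reads per gene, FDR-corrected p ≤ 0.05, and at least 80%/20% imbalance) can be sketched as a simple per-gene filter. This is only an illustration under stated assumptions, not the authors' pipeline: the response does not name the statistical test or the FDR procedure, so an exact two-sided binomial test against a 50/50 null and Benjamini-Hochberg correction are assumed here, and the function names and read counts are hypothetical.

```python
import math

def binom_two_sided_p(k, n):
    """Exact two-sided binomial p-value against a 50/50 null (assumed test)."""
    # With p = 0.5 the distribution is symmetric, so double the smaller tail.
    lo = min(k, n - k)
    log_half_n = n * math.log(0.5)
    tail = sum(
        math.exp(math.lgamma(n + 1) - math.lgamma(i + 1)
                 - math.lgamma(n - i + 1) + log_half_n)
        for i in range(lo + 1)
    )
    return min(1.0, 2.0 * tail)

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (assumed FDR procedure)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def call_aei(counts, min_reads=20, fdr=0.05, min_major_fraction=0.8):
    """counts maps gene -> (haplotype1_reads, haplotype2_reads), pooled over
    all heterozygous SNPs in the gene. Genes below min_reads are not tested."""
    tested = {g: c for g, c in counts.items() if sum(c) >= min_reads}
    genes = list(tested)
    pvals = [binom_two_sided_p(tested[g][0], sum(tested[g])) for g in genes]
    qvals = benjamini_hochberg(pvals)
    calls = {}
    for g, q in zip(genes, qvals):
        a, b = tested[g]
        major_fraction = max(a, b) / (a + b)
        calls[g] = q <= fdr and major_fraction >= min_major_fraction
    return calls

# Hypothetical read counts, for illustration only.
example = {"geneA": (45, 5), "geneB": (30, 28), "geneC": (9, 1)}
print(call_aei(example))
```

In this sketch, a gene with too few pooled reads is dropped before testing rather than reported as negative, which keeps low-coverage genes from being read as evidence of biallelic expression.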

      Reviewer 1:

      Gene set enrichment analysis should be restricted to avoid inflated significance from overly broad categories.

      Reviewer 2:

      Some of the GO terms presented are too broad to suggest any biological significance to the result, even if there is statistical significance (for example, the top term for LCL clones 'Cytoplasm' is associated with 12,000 genes, and the second term for mouse clones 'Membrane' is associated with 10,000). It would be helpful to focus on GO terms lower in the GO hierarchy.

      We now include our complete Gene Ontology analysis, with more specific biological categories, in Supplemental Table 5.

      Reviewer 2:

      Allelic imbalance has been referred to as AI, MAE (monoallelic expression), RMAE (random monoallelic expression) etc. The paper whose mouse data the authors make use of uses Asynchronous Stochastic Replication Timing (ASRT) instead of VERT to refer to the same phenomenon. Creating unnecessary jargon makes the paper more difficult to read and adds needless complexity to an already complex field.

      While we agree that allelic expression imbalance has been described by different investigators using many different phrases, we believe that MAE, RMAE and AI do not represent an accurate description of the phenomenon. In our study [and our previous study; Nat Commun. 2022; 13(1):6301] we used clonal analysis of allele-specific expression and found that while some clones display equivalent levels of expression between alleles of a given gene (i.e. bi-allelic expression), other clones express only one allele (i.e. mono-allelic expression), and yet other clones have undetectable expression (i.e. silent on both alleles). This pattern of allele-restricted expression indicates that each allele independently adopts either an expressed or silent state. Importantly, because these expression states are mitotically stable, allele-autonomous, and independent of parental origin, we refer to the choice of the expressed allele as stochastic. Given this variability, we believe that the phrase “Allelic Expression Imbalance” (AEI) represents a more accurate descriptor for this phenomenon. We also point out that “Allelic Expression Imbalance” has been used by other investigators >120 times in the PubMed database.

      In addition, the replication asynchrony that exists at these loci is not consistent with purely ASynchronous Replication Timing (ASRT) between alleles. We found that each allele can independently adopt either earlier or later replication timing in different clones. This variability results in some clones exhibiting pronounced asynchrony between alleles, while in others, the two alleles replicate synchronously, with both adopting either the earlier or later timing state. As reported in our previous study (Nat. Commun. 2022; 13:6301), this behavior reflects a stochastic and allele-autonomous process, leading us to describe these loci as exhibiting Variable Epigenetic Replication Timing (VERT), which we believe is a more accurate descriptor of this phenomenon.

      Reviewer 2:

      The point that allelic imbalance is enriched in VERTs would be enhanced if the authors could present the allelic ratio for all genes found in all VERTs, demonstrating how replication timing on either chromosome affects the allelic ratio.

      The stochastic nature of allelic expression and replication timing observed at VERT loci indicates that each allele independently acquires its epigenetic state. In addition, there is typically more than one transcription unit, both protein coding and non-coding, within each VERT region, and each transcription unit also acquires its expressed or silent state independently. Therefore, the expressed or silent status of one allele of a transcription unit does not predict the replication timing or expression status of the same or opposite allele of any other transcription unit within the VERT region. Accordingly, the Early/Late pattern of replication timing that we detect, both in this study and in our previous work (Nat. Commun. 2022; 13:6301), is not correlated with which allele is transcriptionally active. This supports our conclusion that asynchronous replication timing is not a downstream consequence of monoallelic transcription, but rather an independent epigenetic feature of I/SCs. Regardless, because each transcription unit is independent, we provide the expression ratios for all transcripts that are generated from the VERT regions for the coding and non-coding transcription units in Figures 1, 2, and 6 (Supplemental Table 9). This analysis indicated that 4,017 informative reads were derived from the earlier replicating allele and 3,161 informative reads were derived from the later replicating allele, generating an allelic ratio of 1.3 (early/late) and a binomial P value of 1.0.
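The early/late ratio quoted in this response follows directly from the two read counts. The short sketch below only redoes that arithmetic; the function name is hypothetical, and the binomial test behind the reported P value is not specified in the response, so it is not reproduced here.

```python
def allelic_ratio(early_reads, late_reads):
    """Return (early/late ratio, fraction of informative reads from the early allele)."""
    total = early_reads + late_reads
    return early_reads / late_reads, early_reads / total

# Read counts as quoted in the response.
ratio, early_fraction = allelic_ratio(4017, 3161)
print(f"early/late ratio: {ratio:.2f}")
print(f"early-allele fraction: {early_fraction:.3f}")
```

Reporting the early-allele fraction alongside the ratio makes the size of the bias easier to compare with the 80%/20% AEI threshold used elsewhere in the response.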

      In addition, a similar analysis of imprinted loci reveals that even at genomic regions with parent-of-origin–specific expression, the replication timing of each allele does not align with transcriptional activity, i.e. both early- and late-replicating alleles can be transcriptionally active, depending on the gene. This observation is consistent with the complex organization of many imprinted domains, where genes on opposite alleles exhibit reciprocal expression patterns. To illustrate this point, we now include Supplemental Figure 1 demonstrating that imprinted loci harbor genes expressed from both the earlier- and later-replicating alleles. In addition, quantification of the total number of transcripts at the DLK1/MEG8 imprinted locus (Supplementary Figure 1a-1c) indicates that the ratio of transcripts derived from the early versus late replicating alleles is equivalent (i.e. a ratio of 1.0; See Supplemental Table 9).

      Reviewer 2:

      Figure 3 highlights the association of related gene clusters with VERTs but the VERTs are assigned based on variable replication timing in just 1 or 2 clones. This is an interesting observation, but to make the point that "VERT regions frequently coincide with gene clusters in the human genome" there needs to be a systematic assessment of replication timing at all gene clusters across all clones, and a statistical test for significance.

      Our intent in Figure 3 was not to suggest that all gene clusters are subject to VERT and AEI, but rather to highlight that several well-characterized multigene families that are known to exhibit AEI, such as olfactory receptor, protocadherin, and HLA gene clusters, coincide with VERT regions at their genomic locations. These examples serve as representative illustrations demonstrating that I/SC-associated regulation occurs at established AEI loci organized in gene clusters.

      To clarify this point, we have revised the text to explicitly state that Figure 3 presents illustrative examples of known AEI-associated gene clusters overlapping with VERT regions, rather than a comprehensive or statistically exhaustive analysis of all gene clusters across the genome.

      Reviewer 2:

      It is an interesting hypothesis that VERTs are conserved between species at syntenic loci. If such regions are really conserved, one would expect that replication timing at these sites would be consistently asynchronous. However, the data presented show that in human clones these VERTs can be specific to an individual donor (as in 5A) or an individual clone (as in 5H).

      As discussed in our Limitations Section, our analysis was restricted to a limited number of cell types, clones, and individuals, which may not capture the full diversity of I/SC usage across tissues and populations. While our dataset was sufficient to identify robust patterns of AEI and VERT, it likely represents only a subset of the broader landscape of I/SC regulation in both humans and mice. We anticipate that future studies incorporating a wider range of tissues, individuals, and clonal analyses will uncover an even greater degree of conservation and diversity in I/SC usage across genomes.

      Reviewer 2:

      In order to support the claim that neurodevelopmental disease associated genes reside in asynchronously replicating regions, and are thus more prone to allelic imbalance, the authors would need to demonstrate this phenomenon in neuronal cells.

      We make two points that address this critique: First, many of the neurodevelopmental disease genes located within or adjacent to VERT regions are not exclusively expressed in neuronal cells and have previously been shown to exhibit AEI in non-neuronal contexts. For example, Gimelbrant and Chess (Science, 2007; 318:1136–1140) demonstrated AEI of the Parkinson disease genes SNCA and LRRK2 in lymphoblastoid cell lines (LCLs), and in our previous study, we detected AEI of DNAJC6, another Parkinson disease gene, also in LCL cells (Nat. Commun. 2022; 13:6301). In the present study, using cartilage progenitor cells, we identified VERT and AEI of several epilepsy-associated genes, including SCN1A, SCN2A (Fig. 6b), GABRA1 (Fig. 6e), and SAMD12 (Fig. 6j), as well as a gene implicated in autism and neurodevelopmental disorders, SEMA5A (Fig. 5c), indicating that these genes are not exclusive to neuronal cell types.

      Second, independent studies from the laboratory of Dr. E. Heard have provided further evidence that AEI occurs in neuronal lineages. Using mouse neural progenitor cells (NPCs), they identified genes subject to AEI (Dev. Cell, 2014; 28:366–380) and they later evaluated AEI of syntenic human neurodevelopmental disease genes, including Snca, App, Eya4, and Grik2 (Nat. Commun. 2021; 12:5330). In addition, and consistent with our use of AEI, they used the phrase “Allelic Expression Imbalance” to describe the epigenetic expression biases at these genes.

      Together, these findings reinforce that AEI, and by extension I/SC regulation, is not restricted to specific cell types, but rather represents a generalizable mechanism of stochastic epigenetic regulation that includes genes relevant to neurodevelopment and disease.

      Reviewer 2:

      However, the authors consistently lean on thin evidence (i.e. a single clone) within a modestly sized dataset (4 clones from 2 donors each) to propose a new model for haploinsufficiency in human disease. The consistent focus on limited elements in the data and perhaps an overreach in the interpretation makes it difficult to appreciate what is in fact a very good experiment.

      We agree that our analysis was conducted on a modest number of clones and individuals, which we explicitly acknowledge as a limitation of the present study. However, several key points support the robustness and broader relevance of our conclusions:

      i. Clonal Design and Replication: The strength of our approach lies in its clonal resolution. Each clone represents a single-cell–derived population expanded to over a million cells, enabling direct detection of stable, mitotically heritable allele-specific epigenetic states that would not be apparent in population-averaged data. Importantly, many of the VERT regions we identified are shared between independent clones from different donors and across distinct cell types (ACP and LCL), demonstrating reproducibility and biological consistency.

      ii. Cross-Species Validation: We further identified syntenic VERT regions in mouse pre-B cell clones, including at loci known to exhibit AEI in prior studies, providing independent validation and evolutionary conservation of the phenomenon.

      iii. Integration with Published Evidence: Our findings extend prior observations of AEI and VERT (e.g. Gimelbrant et al. Science 2007; Heskett et al. Nat. Commun. 2022) and are fully consistent with known stochastic allelic expression imbalance of autosomal genes. We also draw parallels with the absence of cellular selection mechanisms that dictate dominant inheritance patterns for loss of function alleles for X linked disease genes (reviewed in: J Clin Invest, 2008, 20-23; and Nat Rev Genet. 2025, 26, 571–580). Our proposed model linking I/SC regulation to haploinsufficiency is therefore a synthesis of our results with an extensive body of published data, not an inference drawn from isolated observations.

      iv. Scope and Framing: We have revised the manuscript to clarify that our proposed model represents a mechanistic framework, not a definitive or exclusive explanation, for how stochastic allelic regulation could contribute to dosage-sensitive disease phenotypes. We also explicitly discuss the need for larger datasets and additional tissues to refine and test this model.

      In summary, while we recognize the limited sampling depth inherent to clonal analyses, the consistency of our observations across donors, cell types, and species, together with prior corroborating studies, supports the validity of the conclusions and justifies the broader conceptual implications.

    1. Author response:

      General Statements

      We were pleased to see that all three reviewers support publication after revision. No one questions the premise that cell size influences ferroptosis susceptibility. The main concerns fall into two categories: (A) disentangling “Cell size vs cell cycle”, which is the biggest issue for Reviewer #1 and partially for #3. (B) Additional mechanistic tests including SLC7A11 and ferritin functional tests (Reviewer #2) and lysosomal iron (via LysoRhoNox) and some further ACSL4 experiments (Reviewer #3). Other reviewer concerns are more minor.

      In our revision, we have addressed the reviewers’ specific criticisms with additional experiments as described below. We believe the constructive feedback from peer review helped us to significantly extend our mechanistic findings and strengthen the manuscript.

      Point-by-point description of the revisions

      Reviewer #1:

      Summary:

      The study by Zatulovskiy et al. examined how cell size influences cell susceptibility to ferroptosis. The authors found a size dependence specifically for ferroptosis-inducing drug Era2, but not for other drugs. Using various human cell lines (HMEC, HT 1080, RPE 1), the authors generated populations of small and large G1 cells by FACS, CDK4/6 inhibition (palbociclib), or inducible cyclin D1 knockdown, and measured cell susceptibility to ferroptosis. Larger cells were more resistant than smaller cells. Mechanistically, larger cells showed reduced plasma membrane lipid peroxidation, higher glutathione concentrations, and changes in relevant cellular proteins levels, as analyzed using previously published data. Deleting ACSL4, which is involved in ferroptosis, partly eliminated the size dependence of ferroptosis. The work concludes that cell size is a key determinant of ferroptosis susceptibility.

      My major concerns about this work focus on whether many of the results reflect cell size or cell cycle effects, and whether the FACS-based size-scaling analyses have some misleading features to their design & presentation. If these concerns can be addressed with new experiments, then the conclusions of this paper are justified. If these concerns cannot be addressed, then the authors should more directly acknowledge the alternative hypothesis that cell cycle effects may explain many of their results.

      The experiments seem to be replicated sufficiently, and most conclusions rely on data from multiple cell lines. My minor comments focus on needs to provide statistics and method details, and on suggestions on how to improve text clarity, but these edits are easily done and don't require new experiments. Overall, this is an interesting study, and it should be published once the concerns below are addressed.

      Major comments:

      In experiments reported in Fig 1 and 2A, the authors sort small and large cells in G1, plate them, and later start the drug treatments & cell monitoring. Are these cells actively cycling (progressing in the cell cycle), and how fast? The large cells are likely to enter S phase earlier than the small cells, so by the time that the authors start their drug treatments, they may be comparing cells in different cell cycle stages, which could influence drug sensitivity more than cell size (as the authors also suggest later in Fig 2). This needs to be controlled for.

      Furthermore, even if the cells remain in G1 after sorting until the drug treatments are started, the authors should address the fact that the drugs are present for a long time, thus targeting the cells in various cell cycle stages.

      We agree with the reviewer that the cell cycle stage could affect ferroptosis susceptibility and could be a confounding effect in asynchronous cells. One of us (Dixon) reported the cell cycle effects on ferroptosis previously, and we observe them in this manuscript too (Fig. 2B,C,E). We now state this more clearly both in the Results and in the Discussion sections, where we write:

      Line 159: “We note that non-arrested cells had a lower susceptibility to Era2-induced ferroptosis compared to cells that were arrested in G1 for 2-3 days, despite being smaller in size. This is likely due to the difference in the fraction of cells in different cell cycle phases between arrested and non-arrested conditions since cells in S/G2/M phases are known to be more resistant to ferroptosis than cells in G0/G1 phases (Rodencal et al, 2024; Kuganesan et al, 2023)”

      Line 533: “Cells in G1 phase of the cell cycle were reported to be more susceptible to ferroptosis (Rodencal et al, 2024; Kuganesan et al, 2023), which suggested that ferroptosis inducers could be used in combination with cancer drugs, like the CDK4/6 inhibitor palbociclib, that arrest cells in G1 phase of the cell cycle (Herrera-Abreu et al, 2024). However, while CDK4/6 inhibitors arrest cells in G1, they do not inhibit cell growth, such that the longer they are arrested, the larger the cells grow (Lanz et al, 2022; Crozier et al, 2023; Manohar et al, 2023). This results in a complex, nonmonotonic ferroptotic response dynamics in cells treated with CDK4/6 inhibitors (Fig. 2B,E). Just following CDK4/6 inhibitor treatment, as more and more cells are arrested in G1 phase, cells become more sensitive to both RSL3- and erastin-induced ferroptosis (Kuganesan et al, 2023; Rodencal et al, 2024). However, the longer the cells are arrested, the larger they become, which further promotes their susceptibility to RSL3 (Fig. S1B) but reduces their susceptibility to Era2-induced ferroptosis (Fig. 2B). The fact that the cell cycle arrest and cell size increase have opposing effects on Era2-induced ferroptosis susceptibility could explain why different studies reported seemingly contradictory results, where sometimes an increased and sometimes a decreased or unchanged sensitivity to system x<sub>c</sub><sup>-</sup> inhibitors was observed depending on the cell type, duration and type of cell cycle arrest (Lee et al, 2024; Kuganesan et al, 2023; Rodencal et al, 2024). Such complex interplay between the cell cycle and cell size effects on ferroptosis suggests that combination therapies utilizing CDK4/6 inhibitors and ferroptosis inducers would have to carefully choose a dosage schedule.”

      Given the potentially confounding effects of the cell cycle in cycling cells sorted by size, we performed an additional experiment, in which RPE-1 cells were pre-treated with the CDK4/6 inhibitor palbociclib to synchronize them in G1 phase prior to treatment. These cells were then continuously exposed to palbociclib during the Era2 treatment (Fig. 2C-E). RPE-1 cells pretreated with palbociclib for 2 and 4 days had the same cell cycle distribution with 94% of cells being arrested in G1, but with different sizes. Cells treated with palbociclib for 4 days were significantly larger and more resistant to Era2.

      Additionally, in the experiment shown in Fig. 5E,F, where we FACS-sorted WT and ACSL4 KO HMEC cells by cell size, and then measured Era2 susceptibility, we pre-treated the cells with palbociclib for 24 h to synchronize them in G1 prior to the sorting. We then cultured the cells in the presence of palbociclib during the Era2 treatment to avoid the cell cycle effects observed in Fig. 2. In this case, we still observe that larger cells are more resistant to Era2, consistent with our conclusion that cell size protects against Era2-induced ferroptosis.

      Can the G1 arrest-driven changes in drug susceptibility (Fig 2 C-D) be attributed to cell size? Can the authors rescue the palbociclib treatment with rapamycin or other growth inhibitors that allow size to remain small during G1 arrest?

      We attempted these experiments, but co-treating the cells with palbociclib and mTORC inhibitors gave variable results, likely because prolonged mTORC inhibition itself rewires cellular metabolism and reduces cell susceptibility to ferroptosis, as one of us (Dixon) found previously (Armenta et al. (2022), Ferroptosis inhibition by lysosome-dependent catabolism of extracellular protein. Cell Chemical Biology 29: 1588-1600.e7). Our results were consistent with this previous report and are now included in a new supporting figure panel (Fig. S3C).

      Thus, upon palbociclib+rapamycin co-treatment there seems to be a competition between cell size-mediated and metabolism-mediated effects of mTORC inhibition on ferroptosis, which leads to variable outcomes.

      In Fig 2E-F, is the cell cycle distribution of the samples influenced by CCND1 shRNA induction? Are the drug sensitivity effects due to cell size or cell cycle changes?

      The CCND1 manipulation model is extensively characterized in our recent work cited in this manuscript (You et al. (2025), Cell size-dependent mRNA transcription drives proteome remodeling. 2025.10.30.685141 doi:10.1101/2025.10.30.685141). Indeed, CCND1 shRNA cells have a slightly elongated G1 phase due to a ~30% reduction in Cyclin D1 concentration: the G1 fraction changes from ~70% in wild-type to ~80% in CCND1 shRNA cells, which could potentially affect the ferroptosis susceptibility, but the additional results obtained on synchronized RPE-1 cells, described above (Fig. 2C-E), support the conclusion that the primary effect on Era2 sensitivity is due to cell size.

      Can the authors address the meaningfulness of the FACS-based size-scaling results in cases where cell-to-cell variability is very large? For example, in Fig 4D&G, the results are so variable even in identically sized cells that the importance of the size-scaling pattern seems questionable.

      We do observe variability in fluorescent probe-based measurements of GSH and lipid oxidation, which could be due to biological (natural cell heterogeneity) and/or technical (low sensitivity of the probes) reasons. However, when we look at binned data and compare the mean values ± s.e.m. for each bin, we observe a robust and reproducible trend (black line with dark-grey shaded area), even though the SD is quite broad (lighter shaded area). We believe such trends are meaningful when describing cell death in probabilistic terms as we do. I.e., the GSH measurement might not be precise enough to predict cell death for a given individual cell, but the statistical trend is clear and these measurements help predict cell death probabilities for cells of different sizes.

      In Figs 4B-D, the cell size axis seems to have over 4-fold size variability, but when the authors show the analysis of this data (Figs 4E-G) the variability is only 2-fold. What was excluded and on what basis?

      To address this point, we have now clarified in the Methods section how the data were processed and what data points we excluded from this analysis:

      Line 671: “For all binned flow cytometry data plots, the cells below the 2nd and above the 98th cell size percentiles were excluded to remove the extreme outliers. Then, the remaining data were binned by size and plotted as background-corrected average fluorescence intensity for each bin against the bin’s average cell size. Bins with fewer than 200 cells were excluded from the analysis to reduce noise.”

      Typically, such pre-processing reduces the size range, mostly from the large-cell end, because of the long right tail of the size distribution containing a few very large cells.
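The pre-processing described in the quoted Methods text (trim cells outside the 2nd to 98th size percentiles, bin by size, average the background-corrected fluorescence per bin, drop bins with fewer than 200 cells) can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the number of bins, the use of equal-width bins, the linear-interpolation percentile, and a single scalar background value are all assumptions, and the function names are hypothetical.

```python
import math

def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in 0..100) of a pre-sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)

def bin_by_size(sizes, signals, background=0.0, n_bins=10, min_cells=200):
    """Return a list of (mean_size, mean_corrected_signal, n_cells) per kept bin."""
    s_sorted = sorted(sizes)
    lo_cut, hi_cut = percentile(s_sorted, 2), percentile(s_sorted, 98)
    if hi_cut <= lo_cut:
        raise ValueError("size range collapses after percentile trimming")

    # Step 1: drop extreme outliers and subtract the (assumed scalar) background.
    kept = [(s, f - background) for s, f in zip(sizes, signals)
            if lo_cut <= s <= hi_cut]

    # Step 2: assign each remaining cell to an equal-width size bin (assumption).
    width = (hi_cut - lo_cut) / n_bins
    bins = [[] for _ in range(n_bins)]
    for s, f in kept:
        idx = min(int((s - lo_cut) / width), n_bins - 1)
        bins[idx].append((s, f))

    # Step 3: average per bin, skipping sparsely populated bins.
    rows = []
    for cells in bins:
        if len(cells) < min_cells:
            continue
        mean_size = sum(s for s, _ in cells) / len(cells)
        mean_signal = sum(f for _, f in cells) / len(cells)
        rows.append((mean_size, mean_signal, len(cells)))
    return rows
```

With a right-skewed size distribution, the upper bins are the ones most likely to fall below `min_cells`, which is consistent with the authors' remark that the plotted size range shrinks mostly at the large-cell end.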

      Based on the methods section & figure legends of Fig 4B-I, the RPE cells were not pre-sorted to include only G1 cells, nor did the assay account for cell cycle differences. How can these data be used to explain results from earlier figures, where analyses were exclusively focused on size differences in G1?

      This is a valid point: Cells in the GSH measurement experiment were not gated by Hoechst signal for G1 phase because the channel normally used for Hoechst staining was in this case occupied by the MCB probe. However, given the data in Fig. 4A,B showing that the GSH production machinery is superscaling when measured specifically in G1-phase cells, we believe the flow cytometry data in Fig. 4C-J showing GSH concentration increasing with cell size across the whole cell cycle is very likely true for G1 cells as well.

      Minor comments:

      I recommend clarifying in the early introduction that all size changes discussed are in the absence of DNA content increase.

      We have now clarified this in the introduction (Line 41 and Line 81).

      The introduction seems to cite primary research and review paper in the same sentences, which is a bit misleading as the reviews don't seem to add new evidence.

      We have removed review citations where they did not provide additional context.

      OPTIONAL

      In the second introduction paragraph, consider the classification/description of the three different mechanisms. Currently, it seems that these mechanisms are not independent of each other, and the details provided about each mechanism are inconsistent.

      We have now modified this paragraph to make the description more consistent.

      Please provide statistics for the IC50 values reported based on Fig 1C. Were small and large cells statistically different? Are the IC50 values reported as +/- standard deviation or some other metric?

      This has now been clarified in the text as follows:

      “For example, at the 72 h time point, the Era2 IC50 was 28 ± 11 µM (mean ± SD) for large cells versus 2.0 ± 1.4 µM for small cells (Student’s t-test: p = 0.039) (Fig. 1C).”

      Providing more insight into why Era2 and RSL3 treatments yield more opposite responses would be of great interest to the field.

      We agree this is an important point that should be discussed in more detail. In the field of ferroptosis, context-dependent (i.e., cell type-specific) effects are common, and multiple groups including our own (Dixon) have published extensively on genes and mechanisms that can lead to differences between erastin2 and RSL3 sensitivity. For example, there are studies showing that the mTOR pathway or the p53 pathway can either prevent or promote ferroptosis, depending on the cell type and/or other currently unknown variables. To address more specifically the differences between Era2 and RSL3 in the context of our observed cell-size-dependent response, we have now added more data and discussion. In the Results section we added panel 4B and the following text:

      Line 359: “While the upregulation of GSH biosynthesis may promote the resistance of larger cells to ferroptosis, such an upregulation alone cannot explain why larger cells become more resistant to ferroptosis induced by the cystine import inhibitor Era2, but not, for example, by the GPX4 inhibitor RSL3 (Chan et al, 2025) (Figs. 2B, S1B). We found previously that upon mTORC1 inhibition cells can evade cystine deprivation-induced ferroptosis by uptake and catabolism of cysteine-rich extracellular proteins, mostly albumin (Armenta et al, 2022) (Fig. S3C). This process involves albumin degradation in lysosomes, predominantly by cathepsin B (CatB), and subsequent export of cystine from lysosomes to fuel the synthesis of glutathione. Large cells undergo proteome rearrangements similar to those occurring upon mTORC1 inhibition (Zatulovskiy et al, 2022). This suggests that large cells may upregulate CatB expression to bypass the Era2-induced cystine import inhibition via system xc-. To test this hypothesis, we used flow cytometry to measure how the expression of cathepsin B and the system xc- cystine/glutamate transporter SLC7A11 (xCT) scales with cell size (Fig. 4B). We found that SLC7A11 concentration modestly decreases, while CatB concentration significantly increases with cell size (Fig. 4B). This shift in the ratio between SLC7A11 and CatB supports the hypothesis that larger cells may rely less on cystine import via system xc- and thus become more resistant to system xc- inhibition by Era2.”

      Additionally, in the Discussion we added the following:

      Line 578: “We show that large cells may become resistant specifically to Era2 but not RSL3 through the upregulation of lysosomal function, particularly cathepsin B expression, which enables the uptake and catabolism of cysteine-rich extracellular proteins. A size-dependent shift in the ratio between SLC7A11 and cathepsin B makes large cells less dependent on cystine import via system xc-, and thus, more resistant to Era2. In addition to this, it was reported that RSL3 can induce ferroptosis independently of GPX4 and may target other selenoproteins (DeAngelo et al, 2025; Cheff et al, 2023), which could also contribute to the difference in size-dependent responses to RSL3 and Era2.”

      Is the BODIPY-C11 labeling specific to plasma membrane, as suggested by the writing of the authors, or do the results shown integrate signals over all cell membranes?

      We thank the reviewer for pointing this out. BODIPY-C11 581/591 stains many membranes in the cell, not just the plasma membrane. We have changed the wording in the manuscript to reflect this.

      How exactly is gating done for the flow cytometry samples? Especially when analyzing size-scaling, the results are likely to be sensitive to outliers, such as those seen in Fig 4C (a subpopulation of very low CFSE stained cells). Can the authors clarify their methods and/or display supplementary figures with gating examples?

      We have now specified our gating strategy in the Methods section (Line 663) and added a corresponding Supplementary Figure S5.

      In Fig 4, total protein staining was used as a control, whereas in Fig 5B beta-actin was used as a control. Why did the authors rely on different control approaches for essentially the same measurements? Are these controls comparable?

      In our flow cytometry experiments, we consistently use live-cell total protein stain (CFSE) for live cells, and anti-Tubulin immunofluorescent staining for fixed cells, both of which scale in proportion to cell volume and act as a read-out for total cellular protein content (Lanz and Zatulovskiy et al., Mol Cell 2022; Berenson et al. MBoC 2019), which we use to calculate concentrations of other cellular components (analogous to loading controls). In Fig. 5B, beta-Actin is used as a reference - a protein whose concentration does not change with cell size, as opposed to ACSL4 whose concentration decreases with cell size. In this plot, both ACSL4 and beta-Actin amounts were normalized to alpha-Tubulin, which is analogous to a concentration calculation using a loading control. This is now explained in more detail in the Figure legend.

      Reviewer #1 (Significance):

      I work in the cell size research field, and I am familiar with other related works in this field. My evaluation reflects a specialist's view of this study. Overall, this study will be of a large interest to a small group of specialists, and specific aspects of the work will also gain some interest from broader basic research audiences studying mechanisms of drug responses and ferroptosis in general. However, I do not see this work gaining very broad interest across larger audiences, simply because the field of cell size research is not of broad interest, and this is not a landmark study for the field.

      The field of cell size research has long searched for size-dependent functions, as these could help explain why cell size matters. This study is a nice addition to our field, helping establish ferroptosis as a size-dependent function. However, the significance of this work relies on how clearly the authors can establish that their results are cell size rather than cell cycle effects (see major comments above). Should the authors address these concerns, then this study will provide some conceptual and mechanistic insight.

      Regarding mechanistic insights, this work is in stark contrast to a recent study about size-dependency of ferroptosis (https://doi.org/10.1016/j.isci.2025.112363), where increased cell size heightened sensitivity to the GPX4 inhibitor RSL3, thus suggesting an opposite conclusion than what the authors observed with the drug Era2. The authors examined this contradiction, and while their results with the drug RSL3 agreed with the recent study, they did not explain why different drug mechanisms yield opposite results. Providing more insights into this discrepancy would increase the impact of this work.

      Regardless of the impact of this work, I want to emphasize that I am fully supportive of seeing this work published once the technical concerns have been addressed. Our field will benefit from this work, and this work could catalyze important future research. The general topic studied here has the potential to become very important.

      We thank the reviewer for their thoughtful assessment and for supporting publication pending resolution of the technical concerns. We respectfully disagree that our audience is likely narrow: Reviewer #2 noted broad relevance to specialists in cell death/ferroptosis, redox biology, cancer biology, aging, and translational efforts in ferroptosis-based therapies, and Reviewer #3 similarly emphasized both cell size and ferroptosis/cell death communities. We therefore believe the work will be of interest across multiple active fields, particularly because it highlights how cell size heterogeneity can shape drug response.

      We agree that the significance hinges on clearly distinguishing cell size from cell-cycle effects, and we have strengthened the corresponding controls/analyses and adjusted language accordingly (see responses to major comments above). We also addressed the reported discrepancy between Era2 and RSL3 size-dependencies by adding new data (Fig. 4B) and expanded discussion. We very much hope that the reviewer appreciates the efforts we have made to strengthen this manuscript and resolve the technical concerns. For these reasons, we believe this work will have an impact on several fields and gain a broad readership.

      Reviewer #2:

      Zatulovskiy et al. demonstrate that cell size modulates susceptibility to ferroptosis, a form of iron-dependent cell death driven by lipid peroxidation. Using human cell lines (HMEC, HT-1080, RPE-1), the authors examined cell size through FACS sorting, CDK4/6 inhibition and inducible cyclin D1 knockdown. They found that larger cells are more resistant to ferroptosis induced by system xc<sup>-</sup> inhibition (erastin2), but more sensitive to GPX4 inhibition (RSL3), highlighting pathway-specific size dependencies.

      Mechanistically, larger cells exhibited:

      - Higher glutathione levels, supporting lipid peroxide detoxification

      - Increased ferritin expression, promoting iron sequestration

      - Lower ACSL4 levels, reducing incorporation of peroxidation-prone lipids

      These findings were supported by high-throughput microscopy, flow cytometry (BODIPY-C11 lipid peroxidation assays), and proteomic analyses. The study concludes that cell size influences proteome composition and metabolic capacity, thereby shaping cell death decisions, an insight with implications for aging, cancer, and ferroptosis-based therapies.

      Major Comments

      (1) Direct evaluation of SLC7A11 abundance and function is needed

      The opposite size-dependent effects of erastin2 and RSL3 strongly suggest a role for SLC7A11/system xc<sup>-</sup> activity in size-dependent ferroptosis resistance. However, SLC7A11 levels were not quantified due to insufficient peptide detection in the proteomic data.

      - Direct measurement of SLC7A11 protein levels (immunoblotting or flow cytometry) in small vs large cells would test whether its expression scales with size.

      - Functional perturbation (siRNA/CRISPR knockdown) followed by erastin2 treatment would provide mechanistic validation.

      - Use of additional SLC7A11 inhibitors (e.g., sulfasalazine, sorafenib) could further test whether the size resistance phenotype is xc<sup>-</sup>-specific.

      We agree that the difference in size-dependent responses to RSL3 and Era2 is an important point that needs further investigation and discussion, as other reviewers also pointed out. To address more specifically the differences between Era2 and RSL3 in the context of the cell-size-dependent response, we have now added more data and discussion. In the Results section we added panel 4B measuring SLC7A11 and Cathepsin B scaling with cell size and the following text:

      Line 359: “While the upregulation of GSH biosynthesis may promote the resistance of larger cells to ferroptosis, such an upregulation alone cannot explain why larger cells become more resistant to ferroptosis induced by the cystine import inhibitor Era2, but not, for example, by the GPX4 inhibitor RSL3 (Chan et al, 2025) (Figs. 2B, S1B). We found previously that upon mTORC1 inhibition cells can evade cystine deprivation-induced ferroptosis by uptake and catabolism of cysteine-rich extracellular proteins, mostly albumin (Armenta et al, 2022) (Fig. S3C). This process involves albumin degradation in lysosomes, predominantly by cathepsin B (CatB), and subsequent export of cystine from lysosomes to fuel the synthesis of glutathione. Large cells undergo proteome rearrangements similar to those occurring upon mTORC1 inhibition (Zatulovskiy et al, 2022). This suggests that large cells may upregulate CatB expression to bypass the Era2-induced cystine import inhibition via system xc-. To test this hypothesis, we used flow cytometry to measure how the expression of cathepsin B and the system xc- cystine/glutamate transporter SLC7A11 (xCT) scales with cell size (Fig. 4B). We found that SLC7A11 concentration modestly decreases, while CatB concentration significantly increases with cell size (Fig. 4B). This shift in the ratio between SLC7A11 and CatB supports the hypothesis that larger cells may rely less on cystine import via system xc- and thus become more resistant to system xc- inhibition by Era2.”

      Additionally, in the Discussion we added the following:

      Line 578: “We show that large cells may become resistant specifically to Era2 but not RSL3 through the upregulation of lysosomal function, particularly cathepsin B expression, which enables the uptake and catabolism of cysteine-rich extracellular proteins. A size-dependent shift in the ratio between SLC7A11 and cathepsin B makes large cells less dependent on cystine import via system xc<sup>-</sup>, and thus, more resistant to Era2. In addition to this, it was reported that RSL3 can induce ferroptosis independently of GPX4 and may target other selenoproteins (DeAngelo et al, 2025; Cheff et al, 2023), which could also contribute to the difference in size-dependent responses to RSL3 and Era2.”

      (2) Functional tests of ferritin contribution to resistance are needed

      Although elevated ferritin (FTH1/FTL) levels in larger cells represent a strong correlational signal, definitive experimental evidence establishing causality is currently lacking.

      - Measuring the labile iron pool directly in size-stratified populations would strengthen the link.

      - Knockdown of FTH1 or FTL could reveal whether ferritin upregulation is necessary for the resistance of large cells to ferroptosis.

      We thank the reviewer for raising this point. We have now completed additional experiments, as suggested by the reviewer, and found that iron chelation is unlikely to mediate the size-dependent response to Era2. We have modified the manuscript accordingly and added the following data and discussion to address this point:

      Line 296: “The observed increase in ferritin concentration with cell size could therefore lead to additional Fe2+ ion chelation, which in turn would protect large cells from iron-dependent lipid peroxidation and ferroptosis. However, when we measured the concentration of labile intracellular Fe2+ using a fluorescent probe FerroOrange (Hirayama et al, 2020), we did not observe any size-dependent decrease in labile iron concentration (Fig. S2A). Previous work suggests a link between increased sequestration of ferrous iron in lysosomes and resistance to ferroptosis. It was reported that senescent cells, which are also large (Fig. S3A,B), gain resistance to ferroptosis through lysosomal alkalinization and sequestration of ferrous iron in lysosomes (Loo et al, 2025). We therefore tested whether the superscaling of lysosomes observed in large cells (Lanz et al, 2022; You et al, 2025) promotes Era2 resistance through lysosomal iron sequestration. To do this, we stained the cells with the lysosomal iron detection probe Lyso-FerroRed (Saimoto et al, 2025) and measured its scaling using flow cytometry (Fig. S2B). We observed that the amount of Lyso-FerroRed, and therefore, the amount of lysosomal iron, scaled in direct proportion to cell size, just like the total cellular protein content (Fig. S2B). These results indicate that iron chelation by ferritin and its sequestration in lysosomes are unlikely to play a crucial role in size-dependent decrease in Era2 sensitivity.”

      (3) Relevance to senescence should be addressed experimentally or explicitly discussed

      Given that senescent cells are enlarged and accumulate in aged and tumour tissues, testing senescent models for erastin2 resistance would greatly strengthen the physiological significance.

      We agree that an increase in cell size contributing to the resistance of senescent cells to ferroptosis is intriguing. We have now added a Supplementary Figure S3 and discussion of this point in the manuscript as follows:

      Discussion line 552: “…our data suggest that previously reported resistance of senescent cells to ferroptosis can at least partially be due to the increased cell size, a well-established hallmark of senescence.”

      Minor Comments

      (1) Mechanistic nuance regarding RSL3 should be included

      RSL3 has been reported to induce ferroptosis independently of GPX4 (PMID: 37087975, PMID: 40392234) and may target other selenoproteins such as TXNRD1. This nuance would help explain the observed divergence between RSL3 and erastin2 sensitivity across sizes.

      We have now added this in the Discussion as suggested by the reviewer (line 583):

      “In addition to this, it was reported that RSL3 can induce ferroptosis independently of GPX4 and may target other selenoproteins (DeAngelo et al, 2025; Cheff et al, 2023), which could also contribute to the difference in size-dependent responses to RSL3 and Era2.”

      (2) Dynamic range of BODIPY-C11 assays needs commentary

      Despite high erastin2 doses, the oxidized BODIPY signal remains close to DMSO levels. The authors should comment on whether this reflects high GSH buffering capacity, probe limitations, or other factors.

      We believe there are both technical (narrow dynamic range of the probe) and biological reasons for the relatively small (2-3 fold) difference in Oxidized-to-Non-oxidized BODIPY-C11 ratios between DMSO and Era2-treated cells. The biological reason is that the cells continue producing GSH until they fully deplete the cystine pool, which happens ~20-24 h after Era2 addition. Once the cystine pool is depleted, the cells very rapidly deplete GSH and initiate cell death. Therefore, there is only a short time window where cells are strongly depleted of GSH before dying. We see this small fraction of cells with a high Oxidized BODIPY-C11 signal in our flow cytometry experiments and in previous microscopy analysis of BODIPY-C11 (Murray et al., Protocol for detection of ferroptosis in cultured cells. STAR Protoc. 2023), but at our chosen time point (20h Era2) most cells are not as bright because we aimed to analyze the population before the onset of widespread cell death.

      (3) Western blot for shCycD1 depletion should be included

      CycD1 depletion usually causes cells to stop proliferating, which is not the case here. Therefore, depletion must be partial. The level of depletion should be shown by immunoblotting.

      The CCND1 manipulation model is extensively characterized in our recent work cited in this manuscript (You et al. (2025), Cell size-dependent mRNA transcription drives proteome remodeling. 2025.10.30.685141 doi:10.1101/2025.10.30.685141). CCND1 shRNA cells do not fully arrest in G0/G1 because the concentration of Cyclin D1 protein in this system is only partially decreased, as the reviewer noted. As a result, the cells have a slightly elongated G1 phase due to a ~30% reduction in Cyclin D1 concentration, but continue to proliferate. The G1 fraction changes from ~70% in wild-type to ~80% in CCND1 shRNA cells.

      Reviewer #2 (Significance):

      General Assessment: This study presents a mechanistic link between cell size and ferroptosis susceptibility. Using high-throughput microscopy, proteomics, and genetic perturbations across multiple human cell lines, the authors demonstrate that larger cells are more resistant to ferroptosis induced by system xc<sup>-</sup> inhibition (erastin2). This resistance is attributed to elevated glutathione production, increased ferritin-mediated iron sequestration, and reduced ACSL4-dependent lipid peroxidation. The experimental design is rigorous and multifaceted, with consistent results across cell types and size manipulation methods. While the study is limited to in vitro systems, its conceptual and mechanistic insights lay the groundwork for future in vivo and translational investigations.

      Advance: This work is the first to systematically show that cell size directly influences ferroptosis susceptibility via proteome scaling. It reconciles previous findings that large cells are sensitized to GPX4 inhibition (RSL3) by demonstrating that the ferroptosis pathway targeted (system xc<sup>-</sup> vs GPX4) determines the direction of size-dependent vulnerability. The study provides a conceptual advance by positioning cell size as a regulatory axis in cell death decisions, and a mechanistic advance by identifying size-dependent changes in glutathione metabolism, ferritin levels, and ACSL4 expression.

      Audience: This research will be of interest to specialists in cell death, ferroptosis, redox biology, and cancer biology. It also holds relevance for aging researchers and translational scientists exploring ferroptosis-based therapies. The findings may influence how cell size heterogeneity is considered in therapeutic design, particularly in oncology and senescence-targeting strategies.

      Field of Expertise: Translational cancer biology, cell cycle regulation, proteomics, therapy resistance, molecular mechanisms of cell death.

      We thank Reviewer #2 for their careful and constructive assessment of our manuscript. We were happy that they appreciated the rigor of our multifaceted approach. We are also grateful for their thoughtful perspective on the conceptual and mechanistic advances, and for highlighting the broader relevance of this work to ferroptosis biology, redox regulation, cancer and aging research.

      Reviewer #3 (Evidence, reproducibility and clarity):

      In this manuscript, Zatulovskiy and colleagues elaborate on their previous work describing cell size-dependent changes in the proteome by investigating whether these changes can be correlated with differences in cell physiology. Using a cleverly-designed high throughput screen, they searched for compounds that differently-sized cells display differential sensitivity towards. Their primary hit, Era2, is involved in the ferroptosis pathway and serves as the starting point for a detailed study of how excess cell size protects cells from ferroptosis-induced cell death via: 1) lower concentrations of ACSL4 (which produces peroxidation-prone PUFAs), 2) increased ferritin concentrations, and 3) increased GSH concentrations.

      Overall, the experiments in this manuscript are well-designed and interpreted. It is an extremely well-written manuscript with a clear trajectory of logic. I have only a few major concerns that should be addressed before publication:

      We thank Reviewer #3 for their careful reading of the manuscript and for the clear summary of our study and its central findings. We appreciate their positive assessment of the experimental design, interpretation, and overall clarity of the writing and logical flow. We are also grateful for their constructive feedback and take their major concerns seriously; we have addressed each point in detail below.

      Major concerns:

      (1) In Figure 3E, the authors gate their flow cytometry data using SYTOX so that they are only analyzing live cells. Based on their gating scheme, it seems like there are really a lot of dead cells. Presumably the cells that died were the most sensitive to Era2, so it seems an oversight to discard these cells. Of course, it is not appropriate to analyze dead cells, but this could potentially be solved by using a shorter treatment duration than 24 hours wherein fewer cells die.

      This is a good point. To address it, we have now replaced this panel with a time point where most cells are still alive (20 h, 0.2 µM Era2), as suggested by the reviewer (Fig. 3E,F). This did not change the conclusion that BODIPY-C11 oxidation decreases with cell size.

      (2) In Figure 5, are the small, medium, and large bins for ACSL4 KO cells the same as for WT cells? If the ACSL4 KO cells are just bigger to begin with, this could explain why the "small" bin has greater cell survival than the WT small bin. Moreover, is the overlap between the three bins the same in the WT and KO cells?

      This is an important point. We have now added Supplementary Figure S4B to show the relative size of small, medium, and large WT and ACSL4 KO HMEC cells. As seen from this graph, the ACSL4 KO cells are not bigger than WT cells. Importantly, the fold-range between the small and large FACS-sorted cells is similar (~1.9 to 2-fold).

      (3) Loo, et al. Nat Comms 2025 similarly found that senescent cells (which are enlarged) are resistant to ferroptosis using the same inhibitor as the authors. In contrast to the authors, they show that this is due to lysosomal alkalinization and sequestration of ferrous iron in lysosomes. Given that Lanz et al. 2022 found that lysosomal components super-scale with cell size, it seems like this would be an important hypothesis to address. Free lysosomal iron can be easily measured with the LysoRhoNox stain. Loo et al. was able to restore ferroptosis sensitivity in senescent cells using the V-ATPase activator EN6, so it would be important for the authors to address whether this (or similar) treatment would have the same effect in enlarged cells.

      This is an excellent point. We have now performed this experiment and added it to the manuscript, as suggested by the reviewer. Based on the Lyso-FerroRed staining (another brand name for the LysoRhoNox probe), we do not see an increase in lysosomal iron sequestration in large cells (Fig. S2B):

      Line 301: “Previous work suggests a link between increased sequestration of ferrous iron in lysosomes and resistance to ferroptosis. It was reported that senescent cells, which are also large (Fig. S3A,B), gain resistance to ferroptosis through lysosomal alkalinization and sequestration of ferrous iron in lysosomes (Loo et al, 2025). We therefore tested whether the superscaling of lysosomes observed in large cells (Lanz et al, 2022; You et al, 2025) promotes Era2 resistance through lysosomal iron sequestration. To do this, we stained the cells with the lysosomal iron detection probe Lyso-FerroRed (Saimoto et al, 2025) and measured its scaling using flow cytometry (Fig. S2B). We observed that the amount of Lyso-FerroRed, and therefore, the amount of lysosomal iron, scaled in direct proportion to cell size, just like the total cellular protein content (Fig. S2B). These results indicate that iron chelation by ferritin and its sequestration in lysosomes are unlikely to play a crucial role in size-dependent decrease in Era2 sensitivity.”

      Minor concerns:

      (1) It would be helpful if this manuscript were re-submitted with line numbers to more easily reference the text.

      We have added line numbers for convenience.

      (2) In Figure 5A and other figures that reproduce data from Lanz et al. 2022, it would be helpful to have a summary curve for the overall abundance of each protein rather than only the individual peptide curves. These plots (particularly Figure 5A) are difficult to interpret since some peptides were presumably more abundant / measured with higher confidence than others.

      We have added the average ACSL4 protein slope line to Fig. 5A.

      (3) In Figure 5, the authors show the validation of the ACSL4 KO HT-1080 cell line but not HMEC, even though both are used in this figure. It would be useful to show both. Additionally, the authors switch back and forth between the two cell lines for this figure, and it is not clear why.

      We have added the HMEC ACSL4 KO validation Western blot in Fig. S4A.

      For the BODIPY oxidation experiment (Fig. 5D), we used HT-1080 instead of HMEC because HT-1080 cells are sensitive to lower concentrations of Era2, and therefore, we could better optimize the Era2 concentrations and treatment durations to measure BODIPY oxidation at the time point when most cells are still alive but demonstrate a pronounced oxidized BODIPY signal.

      (4) In Figure 5B, the authors use antibody-based staining of ACSL4 and flow cytometry to correlate a loss of ACSL4 expression with increased cell size, validating the proteomics data in Figure 5A. This does not seem like a good way to do this. Firstly, fixing cells with formaldehyde alters their size (is this proportional across differently sized cells? It's impossible to know), which makes it inappropriate to use SSC as a proxy for size in this particular situation. Secondly, the normalization scheme here doesn't make sense. If actin was used as a reference protein, why was tubulin used to normalize ACSL4 abundance? Overall, this seems like a very round-about experiment that could have just been addressed by doing a simple western blot with the four size bins sorted from live cells (as it was in the proteomics). If the issue is that ACSL4 is not detectable by western in the HMEC cells, another solution would be plating the live, sorted bins on coverslips and measuring by IF (or using the HT-1080 cells).

      We prefer IF flow cytometry to Western blotting for protein scaling analysis because it is more quantitative and provides cell size and protein content information for each individual cell. While in principle, different-sized cells might change their size differently during fixation, the cells that were larger or smaller prior to the fixation remain larger or smaller after fixation as well.

      Therefore, the SSC measurement after fixation still provides reliable information on size ranking, even if SSC does not perfectly linearly scale with cell volume. We do not use the SSC information to calculate protein concentrations here. Instead, we divide the amount of our protein of interest in the cell by the amount of constitutively-expressed Tubulin, which acts as an analogue of a loading control in this experiment. In Fig. 5B, both ACSL4 and Actin were normalized to Tubulin to estimate their concentrations. Actin is used just as a reference protein to show how the concentration of a perfectly scaling protein remains constant across cell size, as opposed to the sub-scaling ACSL4. Tubulin in this case was used as a proxy for total cellular protein content, which scales linearly in proportion to cell volume. This approach for determining the scaling behaviors of different proteins was previously validated in Lanz et al., Mol Cell 2022.
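
      The normalization described above can be sketched numerically. The following is a minimal illustration on simulated intensities, not the authors' analysis pipeline: all values, the scaling exponent of 0.6, the noise levels, and the helper name `concentration_slope` are assumptions chosen for the example. It treats the tubulin signal as the proxy for total protein content, divides each protein's signal by it, and fits the slope of concentration against size on a log-log scale, so that a proportionally scaling protein gives a slope near zero and a sub-scaling protein gives a negative slope.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-cell intensities in arbitrary units; NOT data from the study.
n_cells = 5000
tubulin = rng.lognormal(mean=0.0, sigma=0.4, size=n_cells)   # proxy for total protein content
actin = tubulin * rng.lognormal(0.0, 0.1, n_cells)            # assumed to scale in proportion to size
acsl4 = tubulin ** 0.6 * rng.lognormal(0.0, 0.1, n_cells)     # assumed to sub-scale (exponent < 1)


def concentration_slope(amount, reference):
    """Slope of log2(amount / reference) regressed on log2(reference).

    A slope near 0 means the concentration is constant across cell size;
    a negative slope means the protein is diluted in larger cells.
    """
    concentration = amount / reference
    slope, _intercept = np.polyfit(np.log2(reference), np.log2(concentration), 1)
    return slope


actin_slope = concentration_slope(actin, tubulin)
acsl4_slope = concentration_slope(acsl4, tubulin)
print(f"actin slope: {actin_slope:.3f}, ACSL4 slope: {acsl4_slope:.3f}")
```

      In this sketch the reference signal is simulated without measurement noise. With real data, noise in the reference channel would bias the fitted slopes downward, which is one reason to compare against a reference protein such as actin measured in the same way.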

      (5) In Figure 5E/5F, the authors pre-arrest the cells in G1 with palbociclib before size-sorting them. The pre-arrest is not done in other experiments using this cell line for size-sorting, so it would be important for the authors to comment on why this was done for this experiment but not others.

      As we found in Fig. 2B-E, the cell cycle has confounding effects on size-dependent ferroptosis susceptibility measurements (as discussed in detail in our response to the first major point of Reviewer #1 above). Briefly, to avoid these confounding effects and isolate the effects of cell size from the effects of the cell cycle, we pre-synchronized the cells with 24 h treatment with palbociclib in Fig. 5E,F. This is now better clarified in the text, as follows:

      Line 456: “In this experiment, we synchronized cells in G1 phase using palbociclib prior to cell sorting and also incubated the sorted cells in the presence of palbociclib during Era2 treatment to isolate cell size effects from the previously observed confounding effects of the cell cycle on ferroptosis (Fig. 2B,E).”

      (6) Conceptually, it is difficult for me to understand why large cell size sensitizes cells to GPX4 inhibition but confers resistance to Era2 treatment. Particularly given the pathway described in Figure 3A, I am having trouble understanding why these would convey such opposing phenotypes. Shouldn't the extra ferritin in the bigger cells also help them cope with GPX4 inhibition if, as the authors state in the discussion, the increased sensitivity to the GPX4 inhibitor is reported to be mediated by (among other things) iron accumulation? A deeper discussion of this seeming-incongruity would be helpful for contextualizing the broader role of cell size in determining ferroptosis sensitivity.

      We agree this is an important point, which was also raised by the other reviewers. As such, we note that context-dependent (i.e., cell type-specific) effects are common in the ferroptosis field, and multiple groups including our own (Dixon) have published extensively on genes and mechanisms that can lead to differences between erastin2 and RSL3. For example, there are studies showing that the mTOR pathway or the p53 pathway can both prevent and promote ferroptosis, depending on the cell type or some other hidden variable.

      To better address the differences between Era2 and RSL3 in the context of the cell-size-dependent response, we have now added more data and discussion. In the Results section we added panel 4B and the following text:

      Line 359: “While the upregulation of GSH biosynthesis may promote the resistance of larger cells to ferroptosis, such an upregulation alone cannot explain why larger cells become more resistant to ferroptosis induced by the cystine import inhibitor Era2, but not, for example, by the GPX4 inhibitor RSL3 (Chan et al, 2025) (Figs. 2B, S1B). We found previously that upon mTORC1 inhibition cells can evade cystine deprivation-induced ferroptosis by uptake and catabolism of cysteine-rich extracellular proteins, mostly albumin (Armenta et al, 2022) (Fig. S3C). This process involves albumin degradation in lysosomes, predominantly by cathepsin B (CatB), and subsequent export of cystine from lysosomes to fuel the synthesis of glutathione. Large cells undergo proteome rearrangements similar to those occurring upon mTORC1 inhibition (Zatulovskiy et al, 2022). This suggests that large cells may upregulate CatB expression to bypass the Era2-induced cystine import inhibition via system xc-. To test this hypothesis, we used flow cytometry to measure how the expression of cathepsin B and the system xc- cystine/glutamate transporter SLC7A11 (xCT) scales with cell size (Fig. 4B). We found that SLC7A11 concentration modestly decreases, while CatB concentration significantly increases with cell size (Fig. 4B). This shift in the ratio between SLC7A11 and CatB supports the hypothesis that larger cells may rely less on cystine import via system xc- and thus become more resistant to system xc- inhibition by Era2.”

      Additionally, in the Discussion we added the following:

      Line 578: “We show that large cells may become resistant specifically to Era2 but not RSL3 through the upregulation of lysosomal function, particularly cathepsin B expression, which enables the uptake and catabolism of cysteine-rich extracellular proteins. A size-dependent shift in the ratio between SLC7A11 and cathepsin B makes large cells less dependent on cystine import via system xc-, and thus, more resistant to Era2. In addition to this, it was reported that RSL3 can induce ferroptosis independently of GPX4 and may target other selenoproteins (DeAngelo et al, 2025; Cheff et al, 2023), which could also contribute to the difference in size-dependent responses to RSL3 and Era2.”

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      The manuscript incorrectly describes the result as poor spatial acuity. Acuity measures the average absolute error, and acuity is good when response biases are absent. Precision relates to the error variance. It is common to see high precision with low acuity or vice versa. Just noticeable differences assess precision or spatial resolution, while points of subjective equality evaluate acuity or bias. Similar confusions between these terms appear throughout the manuscript.

      While I do not agree with the reviewer's usage of the word “acuity” and a cursory Google search does not agree with the provided definition, I have replaced acuity with precision as appropriate to improve clarity.

      A paragraph within the next section seems to follow up on this insight by examining the across-participant consistency of the differences in tactile spatial resolution between body parts. To this aim, pairwise rank correlations between body sites are conducted. This analysis raises red flags from a statistical point of view. 1) An ANOVA and its follow-up tests assume no variation in the size of the tested effect but varying base values across participants. Thus, if significant differences between conditions are confirmed by the original statistical analysis, most participants will have better spatial resolution in one condition than the other condition, and the difference between body sites will be similar across participants. 2) Correlations are power-hungry, and non-parametric tests are power-hungry. Thus, the number of participants needed for a reliable rank correlation analysis far exceeds that of the study. In sum, a correlation should emerge between body sites associated with significantly different tactile JNDs; however, these correlations might only be significant for body sites with pronounced differences due to the sample size.

      We have entirely removed this result from both the text and supplement.

      The data do not support this conclusion. The conclusion that the nipple is perceived as a unit is based on poor tactile localization performance for touches on the nipple compared to the areola. The problem is that the localization task is a quadrant identification task with the center being at the nipple. Quadrants for the areola could be significantly larger due to the relative size of the areola and the nipple; the results section seems to suggest this was accounted for when placing the tactile stimuli within the quadrants, but the methods section suggests otherwise. Additionally, the areola has an advantage because of its distance from the nipple, which leads to larger Euclidean distances between the centers of the quadrants than for the nipple. Thus, participants should do better for the areola than for the nipple even if both sites have the same tactile resolution.

      We agree with this interpretation and have updated the language throughout.

      Categorization accuracy in each area was tested against chance using a Monte Carlo test, which is fine, though the calculation of the test statistic, Z, should be reported in the Methods section, as there are several options. Localization accuracies are then compared between areas using a paired t-test. It is a bit confusing that once a distribution-approximating test is used, and once a test that assumes Gaussian distributions when the data is Bernoulli/Binomial distributed. Sampling-based and t-tests are very robust, so these surprising choices should have hardly any effect on the results.

      Excellent point. We have replaced the paired t-test with a signed rank test and added text to the methods to expand upon this.
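
The reviewer notes that there are several ways to test categorization accuracy against chance by Monte Carlo sampling. One common variant can be sketched as below. This is only an illustration and not the authors' test statistic, which is not given in this response. The chance level of 0.25 assumes four equally likely quadrants, and the function name and trial counts are hypothetical.

```python
import random


def monte_carlo_p_value(n_correct, n_trials, chance=0.25, n_sims=10_000, seed=0):
    """One-sided Monte Carlo p-value for accuracy above chance.

    Simulates n_sims experiments in which every trial is a pure guess
    (success probability = chance) and counts how often the simulated
    number of correct responses is at least as large as the observed one.
    """
    rng = random.Random(seed)
    at_least_as_extreme = 0
    for _ in range(n_sims):
        simulated_correct = sum(rng.random() < chance for _ in range(n_trials))
        if simulated_correct >= n_correct:
            at_least_as_extreme += 1
    # The +1 correction keeps the estimate strictly above zero.
    return (at_least_as_extreme + 1) / (n_sims + 1)


# Hypothetical counts: 30 of 40 correct versus 10 of 40 correct.
p_well_above_chance = monte_carlo_p_value(30, 40)
p_at_chance = monte_carlo_p_value(10, 40)
```

Because the null distribution is built by sampling from the binomial guessing model itself, this variant makes no Gaussian assumption about the Bernoulli/binomial data, which is the concern the reviewer raises about the paired t-test.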

      A correlation based on N=4 participants is dangerously underpowered. A quick simulation shows that correlation coefficients of randomly sampled numbers are uniformly distributed at such a low sample size. This likely spurious correlation is not analyzed, but quite prominently featured in a figure and discussed in the text, which is worrisome.

      We have removed this panel to reduce this concern.
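
The reviewer's "quick simulation" can be reproduced along the following lines. This is a minimal sketch that assumes Pearson correlations between independent Gaussian samples (the review does not say which coefficient was simulated). Under that assumption the null density of r is proportional to (1 - r^2)^((n-4)/2), which is flat for n = 4.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_null_correlations(n_participants=4, n_sims=20_000, seed=0):
    """Correlations between pairs of independent (uncorrelated) samples."""
    rng = random.Random(seed)
    correlations = []
    for _ in range(n_sims):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_participants)]
        y = [rng.gauss(0.0, 1.0) for _ in range(n_participants)]
        correlations.append(pearson_r(x, y))
    return correlations


def bin_fractions(correlations, n_bins=4):
    """Fraction of correlations falling in equal-width bins over [-1, 1]."""
    counts = [0] * n_bins
    for r in correlations:
        index = min(int((r + 1.0) / 2.0 * n_bins), n_bins - 1)
        counts[index] += 1
    return [c / len(correlations) for c in counts]


null_rs = simulate_null_correlations()
fractions = bin_fractions(null_rs)
frac_strong = sum(abs(r) > 0.8 for r in null_rs) / len(null_rs)
```

With four participants each quarter of the range from -1 to 1 receives about a quarter of the simulated coefficients, and roughly one in five exceeds 0.8 in absolute value even though the underlying variables are unrelated.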

      The conclusion that tactile percepts are drawn toward the nipple is based on localization biases for tactile stimuli on the breast compared to the back. Unfortunately, the way participants reported the tactile locations introduces a major confound. Participants indicated the perceived locations of the tactile stimulus on 3D models of these body parts. The nipple is a highly distinctive and cognitively represented landmark, far more so than the scapula, making it very likely that responses were biased toward the nipple regardless of the actual percepts. One imperfect but better alternative would have been to ask participants to identify locations on a neutral grey patch and help them relate this patch to their skin by repeatedly tracing its outline on the skin.

      While I wholeheartedly agree with the sentiments of the reviewer, in our experience performing these tests across many women we have found that the variability of the morphology of the breast makes it incredibly hard for women to perform this task in the way the reviewer is describing. Consequently, there is likely no perfect version of the task. That said, we have endeavored to acknowledge the limitations of the approach in the discussion.

      Participants also saw their localization responses for the previously touched locations. This is unlikely to induce bias towards the nipple, but it renders any estimate of the size and variance of the errors unreliable. Participants will always make sure that the marked locations are sufficiently distant from each other.

      I again respectfully disagree with this interpretation. If the participants were to always make sure marked locations were sufficiently distant from each other then the degree of error and bias would be similar between regions given that the visual pattern would be almost identical. As this is not true in the data, I disagree with the premise, though we hope the changes to the discussion acknowledge limitations with the data collection method.

      Null-hypothesis significance testing only lets scientists either reject the null hypothesis or not. The latter does NOT mean the Null hypothesis is true, i.e., it can never be concluded that there is no effect. This rule applies to every NHST test. However, it raises particular concerns with distribution tests. The only conclusion possible is that the data are unlikely from a population with the tested distribution; these tests do not provide insight into the actual distribution of the data, regardless of whether the result is significant or not.

      Thank you for this comment. We have updated the language to make it explicit that failing to deviate from the null distribution does not mean that the data are in fact null in nature.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      I am wondering whether the interpretation of "the nipple as a sensory unit" is also supported by localization performance as reported in the analysis around Fig. 3 and supplementary Fig. 2. I cannot really see the error lines in that figure, and cannot tell whether any of the touches were on the nipple proper. Specifically I am wondering whether touch to the nipple is reliably attributed to the nipple, and touch to the areola to the areola, or whether confusion exists between the two. The description of the nipple as a sensory unit implies reliable attribution of touch to the respective area. Also the discussion (lines 309ff) is ambiguous about this.

      Thank you for this comment. We have removed language about the nipple being a unit and reframed the text in the discussion. We have also clarified that touches were indeed on the nipple.

      typos etc.

      lines 68-71 - implied causality is not backed up by evidence and could be the other way around than stated here

      line 82 grammar is inconsistent

      lines 199-200, "on the nipple" occurs twice

      Thank you for catching these. We have addressed the typos and grammar. We have also added a citation to the sentence where this exact hypothesis is stated. We have also relaxed the language to imply it is indeed a hypothesis.

    1. Author response:

      The following is the authors’ response to the previous reviews

      eLife Assessment

      This important work demonstrates the role of physically linking the core and CTD kinase modules of TFIIH via separate domains of subunit Tfb3 in confining RNA Polymerase II Serine 5 CTD phosphorylation to promoter regions of transcribed genes in budding yeast. The main findings, resulting from analyses of viable Tfb3 mutants in which the linkage between TFIIH core and kinase modules has been severed, are supported by solid evidence from in vitro and in vivo experiments. The new findings raise the intriguing possibility that the Tfb3-mediated connection between core and kinase modules of TFIIH is an evolutionary addition to an ancestral state of physically unconnected enzymes.

      After consultation with the referees, we would like to suggest that you insert text into the RESULTS section acknowledging two limitations of your findings remaining in the revised manuscript, as follows:

      (i) It remains possible that Kin28 abundance was reduced by splitting Tfb3, which could be a factor in reducing its occupancies at gene promoters.

      In response, the paper now contains the following sentence:

      “Kin28 levels in extracts were below the limit of detection for our antibody, so we cannot rule out that the drop in ChIP signal is partly due to reduced Kin28 levels in the split Tfb3 strains. However, the viability of the cells (Figure 2) and the Tfb3-TAP purifications (Figure 3) argue against a complete loss of Kin28.”

      (ii) Lower than wild-type expression of the Tfb3 truncations might contribute to their mutant phenotypes shown in Figs. 2 & 5.

      In response, the paper now contains the following sentence:

      “There was some variation in protein expression levels (Figure 3A, left panel, lanes 1-4), and reduced levels of the split Tfb3 may contribute to the slow growth phenotypes.”

      Public Reviews:

      Reviewer #1 (Public review):

      Giordano et al. demonstrate that yeast cells expressing separated N- and C-terminal regions of Tfb3 are viable and grow well. Using this creative and powerful tool, the authors effectively uncouple CTD Ser5 phosphorylation at promoters and assess its impact on transcription. This strategy is complementary to previous approaches, such as Kin28 depletion or the use of CDK7 inhibitors. The results are largely consistent with earlier studies, reinforcing the importance of the Tfb3 linkage in mediating CTD Ser5 phosphorylation at promoters and subsequent transcription.

      Notably, the authors also observe effects attributable to the Tfb3 linker itself, beyond its role as a simple physical connection between the N- and C-terminal domains. These findings provide functional insight into the Tfb3 linker, which had previously been observed in structural studies but lacked clear functional relevance. Overall, I am very positive about the publication of this manuscript and offer a few minor comments below that may help to further strengthen the study.

      We appreciate the reviewer’s positive assessment of our work and suggestions for improvement.

      Page 4 PIC structures show the linker emerging from the N-terminal domain as a long alpha-helix running along the interface between the two ATPase subunits, followed by a turn and a short stretch of helix just N-terminal to a disordered region that connects to the C-terminal region (see schematic in Fig. 1A).

      The linker helix was only observed in the poised PIC (Abril-Garrido et al., 2023), not other fully-engaged PIC structures.

      Thanks for clarifying. We note that some structures of TFIIH alone also see the long helix. Accordingly, we modified this section to read:

      “In many TFIIH and PIC structures the linker is not visible, presumably due to flexibility. However, when it is seen (Abril-Garrido et al., 2023; Greber et al., 2019), the linker emerges from the N-terminal domain as a long alpha-helix running along the interface between the two ATPase subunits…”

      Page 8 Recent structures (reviewed in (Yu et al., 2023)) show that the Kinase Module would block interactions between the Core Module and other NER factors. Therefore, TFIIH either enters into the NER complex as free Core Module, or the Kinase Module must dissociate soon after.

      To my knowledge, this is still controversial in the NER field. I note the potential function on the kinase module is likely attributed to the N-terminal region of Tfb3 through its binding to Rad3.

      We are not experts on NER, but in reviews of the field this appears to be a widely held assumption. A 2008 paper from the Egly lab (Coin et al., DOI 10.1016/j.molcel.2008.04.024) is usually cited, which shows that the interaction between XPD (metazoan Rad3) and XPA is likely incompatible with XPD-MAT1 interaction. In addition to the Yu 2023 review, we now also cite a more recent publication that more extensively reviews the models for core TFIIH interactions (van Sluis et al, 2025). We looked at the multiple recently published structures of various TCR-NER and GG-NER intermediate complexes, and none of them show the CAK module or even the Tfb3/Mat1 N-term, even though those proteins were typically included during assembly. We also consulted with our colleagues Johannes Walter and Lucas Farnung, who are studying various TC-NER intermediates biochemically and structurally. Although the CAK module is included in their assembly reactions, it is not visible in their cryoEM structures. They tell me that the presence of CAK would be compatible with early TC-NER intermediates, but is predicted to overlap with later interactions of XPD with the TC-NER factor STK19 (see Mevissen et al., Cell 2024). To be conservative, we modified the sentence to say “Recent structures … suggest” rather than “show”.

      Because the yeast strains used in Fig. 6 retain the N-terminal region of Tfb3, the UV sensitivity assay presented here is unlikely to directly address the contribution of the kinase module to NER.

      We agree that our experiment only shows that the connection between Tfb3 N- and C-term domains is not necessary for NER. The individual domains might still be able to function independently. Accordingly, we changed the heading of that section from “Disconnected core TFIIH does not cause an NER defect” to “Split Tfb3 does not cause an NER defect.” This more closely matches the figure legend title.

      Page 11. Notably, release of the Tfb3 Linker contact also results in the long alpha-helix becoming disordered (Abril-Garrido et al., 2023), which could allow the kinase access to a far larger radius of area. This flexibility could help the kinase reach both proximal and distal repeats within the CTD, which can theoretically extend quite far from the RNApII body.

      Although the kinase module was resolved at low resolution in all PIC-Mediator structures, these structural studies consistently reveal the same overall positioning of the kinase module on Mediator, indicating that its localization is constrained rather than variable. This observation suggests that the linker region may help position the kinase module at this specific site, likely through direct interactions with the PIC or Mediator. This idea is further supported by numerous cross-links between the linker region and Mediator (Robinson et al., 2016).

      That is true. But please note that this sentence was meant to describe movement of the kinase module AFTER release from Mediator (see previous sentence). Re-reading the passage, we realized the confusion arises because we propose multiple possible pathways in that paragraph. In the first half, we suggest the capture of the kinase module by Mediator might trigger the conformational changes in the linker. In the second half (where it says “Alternatively….”) we suggest the Mediator-CAK interaction could instead come first, and the release of this contact could free the CAK module to move around. We have modified the paragraph to make it clear these are two distinct models.

      Comments on revisions:

      Revised ms clarified all my points, including those I previously misunderstood.

      Thanks again for helping us improve the manuscript.

      Reviewer #2 (Public review):

      Summary:

      This work advances our understanding of how TFIIH coordinates DNA melting and CTD phosphorylation during transcription initiation. The finding that untethered kinase activity becomes "unfocused," phosphorylating the CTD at ser5 throughout the coding sequence rather than being promoter-restricted, suggests that the TFIIH Core-Kinase linkage not only targets the kinase to promoters but also constrains its activity in a spatial and temporal manner.

      Strengths:

      The experiments presented are straightforward and the model for coupling initiation and CTD phosphorylation and for evolution of these linked processes are interesting and novel. The results have important implications for the regulation of initiation and CTD phosphorylation.

      Comments on revisions:

      The revised version with revisions to figures, text and new data has addressed all of our prior comments.

      We thank the reviewer for helping us improve the paper.

      Reviewer #3 (Public review):

      Summary:

      Eukaryotic gene transcription requires a large assemblage of protein complexes that govern the molecular events required for RNA Polymerase II to produce mRNAs. One of these complexes, TFIIH, comprises two modules, one of which promotes DNA unwinding at promoters, while the other contains a kinase (Kin28 in yeast) that phosphorylates the repeated motif at the C-terminal domain (CTD) of the largest subunit of Pol II. Kin28 phosphorylation of Ser5 in the YSPTSPS motif of the CTD is normally highly localized at promoter regions, and marks the beginning of a cycle of phosphorylation events and accompanying protein association with the CTD during the transition from initiation to elongation.

      The two modules of TFIIH are linked by Tfb3. Tfb3 consists of two globular regions, an N-terminal domain that contacts the Core module of TFIIH and a C-terminal domain that contacts the kinase module, connected by a linker. In this paper, Giordano et al. test the role of Tfb3 as a connector between the two modules of TFIIH in yeast. They show that while no or very slow growth occurs if only the C-terminal or N-terminal region of Tfb3 is present, near normal growth is observed when the two unlinked regions are expressed. Consistent with this result, the separate domains are shown to interact with the two distinct TFIIH modules. ChIP experiments show that the Core module of TFIIH maintains its localization at gene promoters when the Tfb3 domains are separated, while localization of the kinase module, and of Ser5 phosphorylation on the CTD of Pol II, is disrupted. Finally, the authors examine the effect of separating the Tfb3 domains on another function of TFIIH, namely nucleotide excision repair, and find little or no effect when only the N-terminal region of Tfb3 or the two unlinked domains are present.

      Strengths:

      Experiments involving expression of Tfb3 domains in yeast are well-controlled and the data regarding viability, interaction of the separate Tfb3 domains with TFIIH modules, genome-wide localization of the TFIIH modules and of phosphorylated Ser5 CTDs, and of effects on NER, are convincing. The experiments are consistent with current models of TFIIH structure and function and support a model in which Tfb3 tethers the kinase module of TFIIH close to initiation sites to prevent its promiscuous action on elongating Pol II.

      We appreciate that the reviewer finds that our main conclusions are convincing.

      Weaknesses:

      The work is limited in scope and does not provide major insights into the mechanism of transcription. The main addition to current models of transcription is that tethering of Kin28 to Tfb3 may limit kinase action from occurring downstream from the initiation site.

      The first described experiment, which purports to show that three kinases cannot function in place of Kin28 when tethered (by fusion) to Tfb3 is missing the crucial control of showing that Kin28 can support viability in the same context. This result also does not connect with the rest of the manuscript, although the experiment apparently motivated the subsequent studies reported here.

      We elected not to do this control experiment for several reasons. As reviewer 3 points out, this kinase fusion experiment turned out to be somewhat disconnected from the rest of the paper. Even though it didn’t work, we included it in the paper because the results led us to the realization that the Tfb3 C-term was actually not fully essential for viability as reported, which in turn led us to the idea of splitting Tfb3. Structural studies (https://doi.org/10.1126/sciadv.abd4420, https://doi.org/10.1073/pnas.2009627117, https://doi.org/10.7554/eLife.44771) show that, in addition to providing linkage to the core module, the C-term of Tfb3 induces a conformation change in Kin28/Cdk7 necessary for full kinase activity (which is likely why the strains without C-term are just barely viable). If we were to pursue why the fusions didn’t work, we could tether Kin28 directly to the Tfb3 linker (and may try this in the future), but then would need to also express the C-term separately for its activating function. Even then, this would be an imperfect control for the fusion experiments in Figure 1. Because we were trying to best mimic Kin28 being tethered via the accessory subunit Tfb3/Mat1, in the Figure 1 experiment we did not directly attach the kinases to Tfb3. For Ctk1/Cdk12, we fused the Tfb3 linker to the Ctk3 accessory subunit (analogous to Tfb3), and for Bur1/Cdk9, we fused to the cyclin subunit Bur2 (there is no known third subunit in this complex). The one exception was Mpk1, which has no partner subunits and is not a CDK. There are many reasons why this high-risk protein fusion experiment may not have worked, but we chose not to pursue it further at this time.

      Finally, the authors present the interesting and reasonable speculation that the TFIIH complex and connecting Tfb3 found in mammals and yeast may have evolved from an earlier state in which the two TFIIH subdomains were present as unconnected, distinct enzymes. It will be interesting to have this idea tested more thoroughly as more molecular evolutionary data becomes available.

      Comments on revisions:

      For the most part, the authors have satisfactorily addressed my previous critique. In particular, they have added to their discussion of evolutionary implications, and performed an experiment casting doubt on the assertion of a dominant negative effect, and as a consequence removed this claim from the manuscript. I also pointed out that the fusion experiments that lead off the Results section are missing the crucial control of including a Tfb3-Kin28 fusion. The authors have elected not to perform this control experiment, pointing out that even this control would be imperfect in some respects, and agreeing that this experiment is somewhat disconnected from the rest of the paper. The reason for including it, in spite of its somewhat tangential nature, is that it provides something of a rationale for the experiments that follow. I don't so much mind their retaining the experiment, as the absence of this control (and indeed, the results) does not so much impact the later results. However, I think if it is to be included, this shortcoming should be explicitly recognized, especially as a service to younger scientists who could benefit from an exposition that includes a thorough consideration of potential control experiments.

      We thank the reviewer for helping us improve the paper.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer 1 (Public review):

      Summary:

      This manuscript presents a high-quality, chromosome-level genome assembly of the European cuttlefish (Sepia officinalis), a representative species of the cephalopod lineage. Using state-of-the-art sequencing and scaffolding technologies -including PacBio HiFi long reads and Hi-C chromatin conformation capture - the authors deliver a genome assembly with exceptional contiguity and completeness, as evidenced by high BUSCO scores. This genome resource fills a significant gap in cephalopod genomics and offers a valuable foundation for studies in neurobiology, behavior, and evolutionary biology. However, there are several major aspects that need to be strengthened.

      Major Revisions Recommended:

      (1) Single-individual genome limitation

      The genome assembly is based on a single individual, which appears to be male. While this approach is common in genome projects, it does not capture the full genetic diversity of the species. As S. officinalis exhibits a wide geographical range and possible population structure, future efforts (or discussion in this manuscript) should consider re-sequencing multiple individuals - of both sexes and from diverse geographic origins - to characterize population-level variation, sex-linked features, and structural polymorphisms.

      We thank the reviewer for this summary and the important point raised. While sequencing additional individuals, unfortunately, lies outside the scope of our study, we used the published data from the DToL assembly (from a male individual from a different geographical origin) to begin to investigate their differences.

      First, we attempted to create a mixed assembly from both datasets, as also suggested by Reviewer 2, to increase data coverage and genetic information. Even though the heterozygosity estimate is quite low (ca. 1%), the mixed assembly produced severely inflated and fragmented results, yielding an assembly ca. 3× larger than expected, with the top 46 contigs covering only ~5% of the total length - a sign of over-duplication and failed haplotype collapse.

      This result is not surprising when considering the assembly algorithms: most programs, including hifiasm used in this study, assume a single diploid individual (or a trio assembly including data from both parents), so using multiple individuals breaks this assumption. Assembly pipelines infer homozygous/heterozygous coverage cutoffs from the k-mer histogram. Mixing individuals raises apparent heterozygosity far above true diploid levels, turning the expected bimodal k-mer profile into a complex multimodal distribution. This misleads the phasing and purging steps in the assembly pipeline, causing over-expansion and fragmentation of the assembly.
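
The effect on the k-mer profile can be illustrated with a toy model. This is a sketch under strong simplifying assumptions and is not how hifiasm or any other assembler computes its cutoffs. Each genomic site contributes a single k-mer per allele, variant sites are biallelic, and the variant rate, allele frequency, and per-haplotype depth are hypothetical values chosen for illustration.

```python
import random
from collections import Counter


def kmer_copy_number_spectrum(n_individuals, n_sites=100_000,
                              variant_rate=0.01, allele_freq=0.5, seed=1):
    """Count k-mers by the number of pooled haplotypes that carry them."""
    rng = random.Random(seed)
    n_haplotypes = 2 * n_individuals  # diploid individuals
    spectrum = Counter()
    for _ in range(n_sites):
        if rng.random() >= variant_rate:
            # Invariant site: one k-mer shared by every haplotype.
            spectrum[n_haplotypes] += 1
            continue
        # Variant site: each haplotype carries the alternative allele
        # with probability allele_freq, otherwise the reference allele.
        alt = sum(rng.random() < allele_freq for _ in range(n_haplotypes))
        ref = n_haplotypes - alt
        for copies in (alt, ref):
            if copies > 0:
                spectrum[copies] += 1
    return spectrum


def expected_peak_depths(spectrum, per_haplotype_depth):
    """Sequencing depths at which k-mer histogram peaks are expected."""
    return sorted(copies * per_haplotype_depth for copies in spectrum)


single = kmer_copy_number_spectrum(n_individuals=1)
mixed = kmer_copy_number_spectrum(n_individuals=2)
single_peaks = expected_peak_depths(single, per_haplotype_depth=15)
mixed_peaks = expected_peak_depths(mixed, per_haplotype_depth=15)
```

In this model a single diploid individual yields two copy-number classes (heterozygous and homozygous k-mers), whereas pooling two individuals yields four, so a pipeline that expects a bimodal histogram faces additional peaks it cannot assign to either class.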

      Second, we created separate assemblies from the raw data sets of MPIBR and DToL using the exact same pipeline and parameters to avoid the technical problem described above. These assemblies are directly comparable, and after aligning them, it is possible to build a pangenome graph that we believe would help to address the points raised by the reviewer. Pangenome graphs can represent cross-individual variation more accurately and improve read alignment in regions of high genomic variation, which can aid population-level analyses [1]. We agree on the importance of this work, yet collecting data from more individuals and the construction and analysis of a pangenome graph lies beyond the scope of this manuscript and should be part of future efforts by the cephalopod genomics field.

      (2) Limited experimental validation of chromosomal inferences

      The study reports chromosome-scale scaffolding using Hi-C data and proposes a revised karyotype for S. officinalis. However, these inferences would be significantly strengthened by orthogonal validation methods. In particular, fluorescence in situ hybridization (FISH) or karyotyping from cytogenetic preparations would provide direct confirmation of chromosome number and structural arrangements. The reliance solely on Hi-C contact maps for inferring chromosomal organization should be acknowledged as a limitation or supplemented with such validations.

      We appreciate the reviewer’s point regarding the value of orthogonal validation methods to support the chromosome-scale scaffolding and proposed karyotype. We acknowledge that relying solely on Hi-C contact maps to infer chromosome number and structure presents limitations, as also becomes apparent in our detailed analysis of both S. officinalis genome assemblies (in Figure 2 and Supplementary Figure 3 of the revised manuscript). We attempted to complement these analyses with cytogenetic approaches. Unfortunately, the availability of suitable mitotic tissue was limited. Moreover, our karyotyping trials proved challenging: resolving the ≥92 (2n) chromosomes in situ was not feasible due to their high number and the small size of the nuclei (approximately 5 µm in diameter on average).

      We now highlight this point as an important direction for future work in our discussion (line 456-466):

      “Additional methods such as cytogenetic karyotyping or optical mapping such as BioNano [141] (imaging of fluorescently tagged, linearized DNA) could be used to validate chromosome numbers. However, whereas karyotypes of octopuses have been consistent throughout the literature (1n=30) [142,143], those measured in decapods vary greatly. For example, 1n=46 chromosomes have been reported for two species of cuttlefish (A. esculentum and A. lycidas) and three loliginid squids [85]; 1n=36 has been reported for A. arabica [86] and 1n=24 in A. pharaonis [87]. In S. officinalis, a karyotype of 1n=52 is reported for testis samples [88]. Combining cytogenetic preparations with fluorescent labeling of centromeric or telomeric sequences, as demonstrated in the octopus A. aerolatus [143] could help resolve these issues. Establishing a routine staining protocol would enable comprehensive tests at the species- and population-level.”

      (3) Shallow discussion of chromosomal evolution

      The manuscript briefly mentions chromosomal number differences among cephalopods but does not explore their evolutionary or functional implications. A more thorough comparative analysis - linking chromosomal rearrangements (e.g., fusions, fissions) with ecological adaptation, life history, or neural complexity - would greatly enhance the impact of the findings. Referencing chromosomal dynamics in related taxa and possible links to behavioral innovations would contextualize these results more effectively.

      We agree with the reviewer that this is a fascinating topic of research that demands further attention and have extended our discussion, which now reads (line 476-501):

      “In addition to studying chromosomal topology in phylogenetic reconstructions, some of the most interesting aspects of these rearrangements relate to changes of and innovation in regulatory elements that underlie phenotypic diversity. In coleoid cephalopods, it is thought that an ancient large-scale genome rearrangement was combined with lineage-specific changes and repeat expansions [48–50]. This restructuring gave rise to hundreds of tightly linked, evolutionarily unique microsyntenies, corresponding to distinct topological compartments with specialized regulatory architectures that contribute to complex, tissue-specific expression patterns in the nervous system and elsewhere [43]. Extending this, chromosomal conformation analyses in E. scolopes revealed that co-regulated eye and light-organ genes cluster at topologically associating domain (TAD) boundaries, and that an evolutionarily recent rearrangement at the dachshund (DAC) locus may have been instrumental in the emergence of the symbiotic light organ in Euprymna - directly linking specific chromosomal topology to morphological innovation [44].

      To understand the broader functional impact of these changes across coleoids, a recent study investigating Micro-C, RNA-seq, and ATAC-seq data from multiple species revealed broadly conserved chromatin domains, but also many lineage-specific chromatin loops that form novel regulatory signatures and impact expression profiles across species and tissues [149].

      Despite the observed small-scale regulatory changes, the chromosomes of decapods are considered to be more closely related to the ancestral coleoid karyotype than those of octopods. The derived octopod karyotype becomes apparent when comparing it to the genome of the vampire squid, an early-branching octopodiform (sister to all octopods) which retained features of the decapod, ancestral karyotype [150]. Taken together, the conserved karyotype of decapods accommodates fine-scale regulatory diversity that might underlie morphological diversity among species, which suggests that many regulatory innovations are still being evolutionarily explored through rearrangements within the existing chromosomes.”

      (4) Underdeveloped gene family and pathway analysis

      While the authors identify expansions in gene families such as protocadherins and C2H2 zinc finger transcription factors, the functional significance of these expansions remains speculative. The manuscript would benefit from:

      (a) Functional enrichment analyses (e.g., GO, KEGG) targeting these gene families.

      (b) Expression profiling across tissues or developmental stages to infer regulatory roles.

      (c) Comparison with expression or expansion patterns in other cephalopods with known behavioral complexity (e.g., Octopus bimaculoides, Euprymna scolopes).

      (d) Potential integration of transcriptomic or epigenomic data to support regulatory hypotheses.

      We thank the reviewer for these constructive suggestions and have substantially expanded the functional characterization of expanded gene families in the revised manuscript.

      To address points (a) and (b), we performed GO enrichment analyses for all expanded gene families (orthogroups), both for the largest gene families and the most significantly expanded families identified from our CAFE5 analysis. Further, we cross-referenced all S. officinalis members of each expanded orthogroup against differentially expressed genes in our bulk RNA-seq data from multiple tissues (initially collected to improve the gene modeling), allowing us to infer tissue-specific expression patterns for the expanded families.

      To address point (c), the species-resolved copy-number profiles from our orthogroup analysis directly situate the S. officinalis expansions within the broader coleoid context, including O. bimaculoides, O. vulgaris, E. scolopes, and D. pealeii, enabling direct comparison of expansion scale and lineage specificity across species with varying degrees of behavioural complexity. We note that the C2H2 zinc finger and protocadherin expansions show distinct phylogenetic profiles consistent with independent radiations in octopods and decapodiforms, in agreement with recent studies.

      Regarding point (d), no epigenomic data for S. officinalis was publicly available at the time of writing, thus we focused on the transcriptomic data from this study, as described above.

      We describe this analysis in two additional results sections of the manuscript, supported by one modified figure (Figure 4) and two new figures (Figure 5 and Supplementary Figure 7). The new text is reproduced below (lines 294-400):

      “Analysis of expanded gene families

      We sought to investigate the S. officinalis gene annotation and place it in the context of gene repertoires from other cephalopod or molluscan species. First, we collected available genome annotations from 12 other molluscan species (Table 2) and clustered them using OrthoFinder v.3.1.0 [122], resulting in 23,658 orthogroups, hereafter named gene families.

      First, we investigated 36 of the gene families that contain more than 100 genes in any of the species, with 17 of these families containing at least one gene of S. officinalis, that reflect large-scale gene family expansions (Figure 4E). We used the InterProScan and eggNOG-mapper annotations to infer functional roles of these genes, selecting the most common gene annotation as the name of the gene family.

      The zinc finger C2H2-type transcription factors (TFs) were grouped into three of the large gene families, with the largest family (OG0000000) only present in decapod cephalopods. This likely reflects the largely independent expansions in the octopod and decapod lineages that date back to a burst of transposon activity ca. 25 million years ago [46,48,49]. The largest expansion across mollusks occurs in the cadherin-like family (OG0000001): 310 in S. officinalis, 283 in D. pealeii, 209 in A. lycidas, 102 in O. vulgaris, 55 in O. bimaculoides, with low but non-zero counts in bivalves (C. virginica, M. gigas). This profile is consistent with the protocadherin expansion first described in O. bimaculoides [46] and subsequently shown to be present across cephalopods [48,49,123].

      HPGDS (OG0000005, hematopoietic prostaglandin D synthase) is a glutathione-S-transferase family member that catalyzes the conversion of prostaglandins, which have well-described roles in immune responses in vertebrates and insects [124,125]. This family shows a broad expansion in decapods, with a lesser expansion in octopods. Additionally, members of the glutathione-S-transferase families have been co-opted as S-crystallins, structural proteins found in the lens of cephalopods that may, or may not, retain enzymatic functions [126,127].

      Two large families are mostly lineage-restricted. The RING-type zinc finger family (OG0000058) has 103 copies in S. officinalis and 26 in A. lycidas but is absent in all other species except for E. scolopes. Conversely, OG0000002 (unknown function) has 479 copies in E. scolopes and only a few copies in the other species. This interesting Sepiolid-specific expansion warrants further characterization.

      We estimated gene family evolution rates using CAFE5 [128] for all families with fewer than 100 copies in any species (this excludes the families described above, as very large copy-number differences between species preclude likelihood calculations under the applied birth-death model). After comparing different model parameters, we chose a gamma model with three rate categories, allowing for evolutionary rate variation among gene families. Out of the 12,895 gene families analyzed, 1,813 showed a significant (p < 0.05) expansion or contraction in at least one of the species. We focused our analysis on the 30 most significantly expanded families; among them were several retrotransposon-associated domains that have expanded specifically in S. officinalis: five families carrying Retrovirus-related Pol polyprotein domains, two Reverse transcriptase domain families, and four Ribonuclease H-like families (Supplementary Figure 7A). There was no coordinate-based overlap of the coding sequences with annotated TEs from the RepeatMasker output (Methods).

      In addition to the three large gene families of C2H2 zinc finger expansions, 45 gene families containing this TF type showed a significant change in the CAFE5 analysis. Notably, eight of the significant gene families, as well as four of the largest gene families, were annotated as CCHC-type zinc fingers, which contain a “zinc knuckle” motif that is characteristic of retroviral nucleocapsid proteins [129] and is functionally integrated in the genomes of several species, including humans [130].

      Some gene families without any relationship to retrotransposons were also expanded. For example, the UGT2A1-related family is a UDP-glucuronosyltransferase, a class of enzymes central to phase II detoxification and conjugation of metabolites, reported in other mollusks in the context of environmental chemical tolerance [131], and in insects in the context of pigmentation [132]. We also detected a family of homeodomain-like proteins, representing an expansion of this important TF family.

      Tissue-specific expression of expanded gene families

      To place the identified gene families in a functional context, we profiled their expression in the bulk RNA-seq data (taken from multiple tissues of S. officinalis) used originally for gene modeling (Figure 5A). Principal component analysis (PCA) revealed the largest axis of variation in gene expression to separate brain tissues from peripheral tissues, with skin being the most transcriptomically distinct (Figure 5A), consistent with the high number of tissue-specific differentially expressed (DE) genes identified in non-neural tissues (Figure 5B). We identified the genes belonging to expanded families that were differentially expressed across tissues and enriched gene ontology [133,134] (GO) terms for them to gain additional insight. The large families excluded from CAFE5 modelling and the significantly expanded families identified by CAFE5 were analyzed separately.

      Eleven of the largest gene families were expressed in our data (Figure 5C) and five had enriched GO terms (Figure 5D,E). Among them, the cadherin family showed brain-restricted expression and GO terms related to cell–cell adhesion and calcium binding, consistent with their role in neuronal connectivity and circuit formation [46,135]. Two C2H2 zinc finger gene families were expressed in the optic and vertical/subvertical lobes of the brain and in the skin, with GO terms related to DNA-binding, transcriptional regulation or development. The RING-type zinc finger family was expressed specifically in the skin, with GO terms including zinc binding and ubiquitin protein ligase activity, the canonical function of RING-domain E3 ligases [136]. Genes of the HPGDS/S-crystallin family were expressed in the brain (basal and optic lobes and posterior subesophageal mass) and skin, with GO terms related to glutathione metabolism, matching their described enzymatic function. We did not find expression in the retina, which is expected given that S-crystallins are expressed in lentigenic cells of the eye [42,137] and these cells were not included during sampling.

      Among the 30 most significantly expanded families examined (out of 1,813 total), expression was widespread (20/30) and tissue-specific differential expression was common (17/30), suggesting that a substantial proportion of expanded paralogs represent functional coding sequences with specialized spatial deployment (Supplementary Figure 7B). Ten of the retrotransposon-associated families were differentially expressed in the brain (optic and vertical/subvertical lobes) and skin, arguing against these loci being inactive repeat fragments and supporting their inclusion as transcribed gene models. Two significantly expanded families showed both differential expression and enriched GO terms (Supplementary Figure 7C). The first was the UGT2A1-related family, which had the largest number of differentially expressed genes overall, with expression concentrated in the skin, retina and posterior subesophageal mass of the brain. Enriched GO terms matched the described enzymatic function for this family, namely UDP-glycosyltransferase activity. The second gene family was the homeodomain-like family with enrichment for DNA binding terms consistent with their role as transcription factors, and was preferentially expressed in the vertical and subvertical brain lobes with weaker expression in other areas.

      Collectively, many differentially expressed genes from expanded families were restricted to specific tissues or brain subregions (Figure 5F and Supplementary Figure 7D), indicating that paralogs within an expanded family have adopted distinct spatial expression domains and, possibly, specialized functions.”

      Reviewer 2 (Public review):

      Summary:

      This paper concerns an interesting organism, Sepia officinalis. However, in the opinion of this reviewer, the paper reads somewhat like a genome report. The authors have used 23x PacBio HiFi in conjunction with relatively low coverage (11x) Hi-C to scaffold the genome into a karyotype of 47 chromosomes. They have used a combination of short and long read RNA seq to annotate the genome in what looks like a very good annotation. The paper offers basic analyses of the Busco evaluation, some descriptive analyses of gene family and repeat content, and a bit more focused analysis on synteny among sequenced squids. Generally, the data will be useful.

      Strengths:

      This is a high-quality annotation, and the data ultimately will be useful to other researchers. I appreciate trying to understand what's happening between assemblies of S. officinalis.

      Weaknesses:

      I don't believe the data at hand makes a strong case for the argument of 47 chromosomes. This is my biggest sticking point with the paper, and it is for a few reasons:

      (1) The authors point to assembly differences between the DToL assembly and the one presented in the manuscript and seem to claim that DToL is incorrect. However, the DToL assembly (xcSepOffi3.1) is based on much deeper HiFi and HiC coverage than the one at hand (51x and 80+x respectively). There are many things to try here, including:

      (a) Downloading the DToL data and reassembling using a common pipeline.

      (b) Downsampling the DToL data to similar coverage as what the authors have achieved.

      (c) Combining your data and that of DToL for even deeper coverage (heterozygosity is low enough that I don't imagine this impeding things too badly).

      We thank the reviewer for these helpful suggestions and want to clarify that we did not seek to point out errors in the DToL assembly, but rather to investigate the unexpected discrepancies between the two assemblies. It is correct that the DToL data has a much higher coverage than our data. We followed the individual suggestions and incorporated them into the revised manuscript. We reproduce the relevant sections below, and provide additional information:

      (a) Downloading the DToL data and reassembling using a common pipeline.

      We downloaded the DToL data and reassembled it using a common pipeline, yielding the results listed in Author response table 1. The DToL assembly is more contiguous, which is mainly due to its higher HiFi coverage. It also receives slightly better BUSCO scores (computed using odb12 as recommended by Reviewer 3).

      Author response table 1.

      Full statistics of S. officinalis assemblies from two independent datasets, assembled using a common pipeline.

      The updated manuscript now reads (lines 146-159):

      “A chromosome-scale assembly for Sepia officinalis was released recently by the Wellcome Sanger Institute’s Darwin Tree of Life project [75] (DToL, GCA_964300435.1). That genome was assembled from a male individual using high coverage PacBio Sequel II (~51x) and Arima2 Hi-C (~80x) data, with a final assembly size of 5.8 Gb. The haploid chromosome number was estimated to be 49. To compare both S. officinalis datasets directly, we downloaded the DToL data and created two new assemblies using the pipeline described above (hifiasm using PacBio HiFi and Hi-C data). The resulting assemblies were overall very similar, with the DToL assembly having a slightly higher contiguity (N50 length, see Table 1) and BUSCO completeness (Supplementary Figure 2A,B) due to their higher sequencing coverage.”

      To further compare the two datasets, we added a new Figure 2 to the revised manuscript and the following paragraph to the results (lines 160-169):

      “After scaffolding with YAHS, both datasets reached the previously identified chromosome numbers (1n=47 for MPIBR and 1n=49 for DToL, Figure 2A,B). To further investigate this surprising discrepancy, we aligned both assemblies using Winnowmap [89] to locate the differences between them (Figure 2C). We observed four “breakpoints” (BP) of chromosome scaffolds: one in the MPIBR assembly compared to DToL (BP1: DToL_5 = MPIBR_40+44) and three in the DToL assembly compared to MPIBR (BP2: DToL_31+40 = MPIBR_2, BP3: DToL_41+46 = MPIBR_6, BP4: DToL_44+45 = MPIBR_7). We also aligned the assemblies to the chromosome-scale genome of another cuttlefish Acanthosepion esculentum (1n=46, GCA_964036315.1). In this alignment, all four breakpoints were collinear with single A. esculentum chromosomes (Figure 2D).”

      (b) Downsampling the DToL data to similar coverage as what the authors have achieved.

      Instead of downsampling the DToL data, we decided to analyze the Hi-C and HiFi data for both assemblies, focusing on the four “breakpoints” between the assemblies and the A. esculentum genome that we described above. First, we performed a QC analysis of the Hi-C reads using pairtools [2]; the result is visualized in Author response image 1. We computed the percentage of valid Hi-C read pairs, i.e., cis pairs with insert distances of more than 1 kb and trans pairs, following the Dovetail Genomics QC manual (https://dovetail-analysis.readthedocs.io/en/latest/whole_genome/qc.html). When Hi-C pairs were aligned to the primary contigs from hifiasm (as used for scaffolding with YAHS), the DToL Hi-C data contained a lower percentage of valid read pairs (11.4%) than the MPIBR data (43.1%), possibly due to the use of a different tissue (eye vs. optic lobe) and Hi-C kit (Arima 2 vs. Dovetail OmniC) for library preparation. Nonetheless, due to the much higher overall coverage, the number of valid read pairs is still 2.35x higher for DToL (144,014,368 pairs) than for MPIBR (61,318,955 pairs). The trans fraction (i.e., Hi-C pairs across contigs) depends on the length of the primary contigs, so the higher trans fraction for the MPIBR data can be explained by the lower contiguity of its primary contigs. It is conceivable that for both assemblies, the low numbers of valid read pairs introduce a technical fragmentation of certain chromosomes, as indicated by the identified breakpoints (Figure 2).
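
      To make the definition of a “valid” pair concrete, the following is a minimal sketch rather than the pipeline that was actually run (the QC itself was done with pairtools). It assumes a tab-separated .pairs-style input with the columns readID, chrom1, pos1, chrom2, pos2, and it assumes that the denominator is every pair passed in; the function name `valid_pair_fraction` and the toy records are hypothetical.

```python
from typing import Iterable

MIN_CIS_DISTANCE = 1_000  # bp; "more than 1 kb" as stated in the response


def valid_pair_fraction(lines: Iterable[str], min_cis: int = MIN_CIS_DISTANCE) -> float:
    """Fraction of pairs that are trans, or cis with a distance above min_cis."""
    total = 0
    valid = 0
    for line in lines:
        # Skip blank lines and header lines (header lines start with '#').
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom1, pos1 = fields[1], int(fields[2])
        chrom2, pos2 = fields[3], int(fields[4])
        total += 1
        if chrom1 != chrom2:
            valid += 1  # trans pair: the two ends map to different contigs
        elif abs(pos2 - pos1) > min_cis:
            valid += 1  # long-range cis pair
    return valid / total if total else 0.0


# Toy input (hypothetical records, not real data).
example = [
    "## pairs format v1.0\n",
    "#columns: readID chrom1 pos1 chrom2 pos2 strand1 strand2\n",
    "r1\tctg1\t100\tctg1\t350\t+\t-\n",    # cis, 250 bp apart: not valid
    "r2\tctg1\t100\tctg1\t5100\t+\t-\n",   # cis, 5 kb apart: valid
    "r3\tctg1\t100\tctg2\t900\t+\t-\n",    # trans: valid
    "r4\tctg2\t2000\tctg2\t2900\t-\t+\n",  # cis, 900 bp apart: not valid
]
fraction = valid_pair_fraction(example)

# Ratio of the absolute valid-pair counts reported in the response.
fold = 144_014_368 / 61_318_955
print(f"toy valid fraction: {fraction:.1%}; DToL/MPIBR valid pairs: {fold:.2f}x")
```

      The second calculation only re-derives the 2.35x ratio from the two pair counts given in the text; it shows why a lower valid percentage can still yield more valid pairs in absolute terms.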

      Author response image 1.

      Analysis of Hi-C read pairs from both S. officinalis assemblies. Hi-C reads were aligned to the primary contigs from hifiasm (as is used for scaffolding with YAHS) and analyzed using pairtools. Note the higher fraction of long-range contacts (at least 1 kb cis pairs or trans pairs) in the MPIBR data (top) compared to DToL (bottom). Due to overall higher coverage, the absolute number of read pairs is higher for DToL than for MPIBR data.

      Second, we performed a detailed analysis of read coverage along the breakpoint junctions of the discrepant chromosomes/scaffolds between both assemblies. We included a description of the results and a new Supplementary Figure 3 in the manuscript (lines 171-207):

      “To better understand the potential cause of these divergent chromosome numbers, we analyzed the Hi-C and HiFi coverage in the breakpoint regions (Supplementary Figure 3A). First, we aligned the HiFi reads to the scaffolds and extracted all alignments along the 200 kb terminal scaffold windows to find any notable drops in coverage, or reads spanning any of the scaffold junctions. We detected no spanning reads. This is not surprising given that no contigs were assembled at these sites, resulting in the observed scaffold junctions. More interestingly, we noted a ~5-fold decrease in HiFi coverage along the DToL scaffold_40 (part of BP2) relative to its flanking regions, indicating a highly repetitive, low-mappability region at this boundary.

      Next, we realigned the Hi-C data to the scaffolded assemblies using bwa-mem2 [91] and extracted all trans HiC pairs (between-scaffold contacts) using pairtools [92]. We normalized trans HiC contacts to the scaffold length and compared contact rates between breakpoint scaffolds to the baseline contact rate (computed from pairs of scaffolds with a clear 1-to-1 match between assemblies), and the contact rate within scaffolds (intra-scaffold pairs) (Supplementary Figure 3B,C). The contact rates within breakpoints were consistently lower than within scaffolds, likely falling below the threshold to be merged during assembly. However, the contact rates at three of four breakpoints (BP1, BP3, BP4) were significantly elevated above the genome-wide background distribution (empirical p = 0.010, 0.005, 0.005 respectively), suggesting that they may represent intra-chromosomal contacts disrupted by a misassembly. Notably, BP2 was not significant (empirical p = 0.170), likely due to the low coverage and mappability around the DToL scaffold_40 boundary. Considered jointly, the three DToL breakpoint scaffold pairs showed significantly higher trans contact rates than the background (Wilcoxon rank-sum, one-tailed, U = 1771, p = 0.004).

      Lastly, we analyzed the repeat landscape around the 200 kb scaffold ends using RepeatMasker [93] and the custom repeat library that we had generated for Sepia officinalis (described further below). Compared to control scaffolds of the same assembly, we observed consistently elevated repeat content at the breakpoint junctions (mean 71.5% vs 67.6% masked bases), with an enrichment of unclassified repeats (32.1% vs 30.0%), which could explain a repeat-driven assembly fragmentation or scaffolding failure. The BP2 DToL scaffold_40 junction window was 99.99% masked (99.2% unclassified repeats), providing a likely mechanistic explanation for both the HiFi coverage drop and the absence of a significant trans Hi-C signal at this breakpoint. Taken together, these analyses suggest that the different chromosome numbers across the two S. officinalis assemblies are due to technical reasons, caused by repeat-rich scaffold boundaries that impair HiFi and Hi-C read alignment and in turn, correct assembly in these regions.”
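
      The comparison of breakpoint contact rates against a genome-wide background, as described in the quoted passage, can be illustrated with a short sketch. This is not the authors' code: the normalization (contacts per Mb of each scaffold), the one-sided test direction, and the +1 pseudocount are assumptions, and `contact_rate`, `empirical_p`, and all numbers are hypothetical.

```python
from typing import Sequence


def contact_rate(n_contacts: int, len_a: int, len_b: int) -> float:
    """Trans contacts per Mb x Mb between two scaffolds (assumed normalization)."""
    return n_contacts / ((len_a / 1e6) * (len_b / 1e6))


def empirical_p(observed: float, background: Sequence[float]) -> float:
    """One-sided empirical p-value: share of background rates >= observed.

    The +1 in numerator and denominator is a common pseudocount (an assumption
    here) that keeps the p-value from being exactly zero.
    """
    n_ge = sum(1 for rate in background if rate >= observed)
    return (n_ge + 1) / (len(background) + 1)


# Toy background: 99 scaffold pairs of equal size with 100..1080 trans contacts.
background = [contact_rate(c, 50_000_000, 40_000_000) for c in range(100, 1090, 10)]

# A hypothetical breakpoint pair with far more contacts than any background pair.
breakpoint_rate = contact_rate(5_000, 50_000_000, 40_000_000)
p_breakpoint = empirical_p(breakpoint_rate, background)

# A hypothetical pair that sits in the middle of the background distribution.
typical_rate = contact_rate(590, 50_000_000, 40_000_000)
p_typical = empirical_p(typical_rate, background)

print(f"breakpoint p = {p_breakpoint:.3f}, typical pair p = {p_typical:.3f}")
```

      With this kind of test, the smallest attainable p-value is set by the number of background scaffold pairs, which is worth keeping in mind when interpreting values close to that floor.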

      (c) Combining your data and that of DToL for even deeper coverage (heterozygosity is low enough that I don't imagine this impeding things too badly).

      When combining the data to achieve a higher coverage, we ran into the assembly fragmentation issues detailed above in response 1) to Reviewer 1.

      (2) Looking at Figure 1, there appears to be a misjoin at chromosome 42. Looking carefully at Figure S1, that misjoin does not appear on any of the panels - this is confusing. Given the size of that chromosome and the authors' chromosome numbering, I'm guessing this is a manual merge, as it's larger than most of the numerically close chromosomes (40, 41, 43, etc.). Further, staring closely at Figure 1, there appear to be cross-scaffold contacts between 42 and 43 and 42 and 44. Secondarily there are contacts between 43 and 44. This bit of the assembly seems potentially problematic.

      This is a great observation; indeed, the Hi-C maps differ between Figure 1 and Figure S1. Figure 1 is the result of scaffolding with YAHS and manual curation, whereas Figure S1 was scaffolded using HapHiC. We updated the figure legend to clarify this important difference. HapHiC produces very clean contact maps without the need for manual curation, but when analyzed at a higher resolution, the tool broke many contigs and ultimately compromised the assembly quality, possibly due to our comparatively low Hi-C coverage. Thus, we preferred to use YAHS and manual curation, which is perhaps inherently error-prone, as becomes apparent in the regions of the assembly that are pointed out by the reviewer.

      Reviewer 3 (Public review):

      Summary:

      In this study, authors Simone Rencken and co-authors present and investigate the genome of the common cuttlefish Sepia officinalis.

      Strengths:

      The authors explain in a detailed yet concise manner the main steps for a genome assembly, with very robust methods for validation, and according to current best practices. In addition to the chromosomal assembly, the authors confirmed the presence of 47 chromosomes using Hi-C data and multiple species synteny. They also generated a comprehensive gene annotation, with assessments of gene completeness, providing a useful resource for the community of researchers interested in cuttlefish biology and comparative genomics.

      Weaknesses:

      While the study touches upon the subjects of gene content, TE activity, or species-level comparisons, the study does not provide in-depth investigations of these.

      We thank the reviewer for their positive assessment of our manuscript. We acknowledge the descriptive nature and limitations of our previous analyses of gene content, TE distribution, and species comparisons. Our focus for the initial submission was to provide a high-quality assembly that could serve as a resource for anyone interested in Sepia officinalis or related species. However, we agree that greater insight into genome content is valuable as well. In the revised manuscript, we included a more detailed analysis of expanded gene families and a GO enrichment analysis of our bulk RNA-seq data, which we summarize in our response to point (4) of Reviewer 1.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Minor Revisions Recommended:

      (1) Figure and legend clarity

      Several figures lack sufficient annotation. All figures, including supplementary ones, should include:

      (a) Clear axis labels.

      (b) Descriptions of statistical measures (n values, error bars, statistical tests).

      (c) Legends that allow the figure to be understood independently of the main text.

      We updated the figures accordingly.

      (2) Terminology and formatting

      (a) Consistency in gene and species nomenclature should be maintained throughout (e.g., italicizing gene names and Latin binomials).

      (b) Ensure that abbreviations (e.g., Hi-C, BUSCO, FISH) are defined upon first use.

      We updated the nomenclature throughout the text and checked the definition of abbreviations used in the text. Further, we updated the names of several cuttlefish species according to the recent revision of genera, e.g. Sepia esculenta was changed to Acanthosepion esculentum [3].

      (3) Literature coverage

      The references primarily focus on earlier studies from 2010-2020. It would strengthen the context to include recent high-impact studies on cephalopod genomics and chromosomal biology published in the last 3 years (e.g., 2022-2024).

      We apologize for this oversight and have extended the manuscript to discuss more of these recent studies.

      (4) Clarify methods

      While the methods section is generally detailed, some critical aspects are underspecified:

      (a) Parameters used in genome annotation tools (e.g., BRAKER, RepeatMasker).

      We thank the reviewer for bringing our attention to this shortcoming, and have added the missing parameters to the methods section. Additionally, the full code is available at https://gitlab.mpcdf.mpg.de/mpibr/laur/cuttlefishomics/soffgenome

      (b) Criteria for ortholog clustering and gene family expansion analysis.

      The details have been added to the methods section, which now reads (lines 828-853):

      “Orthogroups were inferred across 13 molluscan species (Table 2), including S. officinalis, using OrthoFinder v3.1.0 [122] with default parameters. The input proteomes included the longest protein isoform per gene for each species. The rooted species tree from OrthoFinder [182,184] was converted to an ultrametric tree using the R package ape [183] v5.8.1.

      Gene families were filtered by removing orthogroups present in only a single species, and by separating orthogroups containing 100 or more gene copies in any species, as extreme copy-number differences in gene families prevent likelihood calculation under the applied birth-death model.

      Gene family evolution rates were estimated using CAFE5 [128] v5.1.1 on the filtered orthogroups, using the ultrametric species tree as input. Four models were evaluated: the base model (single global lambda), and Gamma models with k = 2, 3, and 4 rate categories, which allow evolutionary rate variation among gene families. The Gamma k = 3 model was selected based on the best (lowest) final negative log-likelihood (-lnL) score. All subsequent statistical inferences were performed under this model.

      For families showing statistically significant expansion or contraction (p < 0.05 after Bonferroni correction), branch-specific copy-number changes were extracted from the CAFE5 output. Families were categorized as S. officinalis-specific, coleoid-specific, or broad expansions based on the distribution of significant changes across the phylogeny.

      To assess whether expanded gene families in S. officinalis contained genes derived from or embedded within repetitive elements, a coordinate-based overlap analysis was performed. For each gene in an expanded orthogroup, the overlap between its coding sequence (CDS) coordinates and RepeatMasker annotations was computed using bedtools intersect v2.30 [185]. To avoid double-counting when multiple repeat annotations overlapped the same coding bases, overlapping repeat intervals were merged per gene prior to summing covered bases, and the overlap fraction was computed as merged covered bases divided by total CDS length.”
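
      The overlap fraction described in the last paragraph of the quoted methods can be sketched as follows. The analysis itself used bedtools intersect; this pure-Python version is only an illustration of the merge-then-sum logic, assuming half-open BED-style intervals and non-overlapping CDS segments within a gene. The names `merge_intervals` and `repeat_overlap_fraction` and the coordinates are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), BED-style (assumption)


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or directly adjacent intervals."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extend the previous interval instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def repeat_overlap_fraction(cds: List[Interval], repeats: List[Interval]) -> float:
    """Merged repeat-covered CDS bases divided by total CDS length."""
    total_cds = sum(end - start for start, end in cds)
    if total_cds == 0:
        return 0.0
    covered = 0
    # Merging first ensures that a base hit by several repeats is counted once.
    for r_start, r_end in merge_intervals(repeats):
        for c_start, c_end in cds:
            covered += max(0, min(c_end, r_end) - max(c_start, r_start))
    return covered / total_cds


# Toy gene with two CDS segments (200 bp in total) and three repeat annotations,
# two of which overlap each other inside the first segment.
cds_segments = [(100, 200), (300, 400)]
repeat_hits = [(150, 180), (170, 220), (390, 450)]

overlap = repeat_overlap_fraction(cds_segments, repeat_hits)
print(f"repeat-covered CDS fraction: {overlap:.2f}")
```

      Summing the three repeat overlaps without merging would count the bases between positions 170 and 180 twice, which is the double-counting the methods text guards against.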

      (c) Thresholds or cutoffs for synteny or duplication detection.

      We included the details in the updated methods (lines 755-781):

      “Synteny analyses between all chromosomes of the compared species were performed using the R package GENESPACE v.1.2.3 [175] with default parameters, described briefly below. Protein sequence similarity was first estimated using DIAMOND2 [109] in fast mode, and orthogroups and pairwise orthologues were inferred using OrthoFinder v2.5 [176] with hierarchical orthogroups (HOGs) enabled. Prior to synteny inference, tandem arrays were condensed to their most central representative gene, and gene rank order was recalculated on these array-representative genes to reduce confounding effects of tandem duplication on collinearity detection.

      Syntenic blocks were identified pairwise between all genome combinations using MCScanX [177], constrained to DIAMOND hits where both query and target genes belonged to the same orthogroup (onlyOgAnchors = TRUE). Initial anchor hits were clustered into large syntenic regions using a density-based spatial clustering approach (dbscan [178]), with a minimum block size of five anchor genes (blkSize = 5) and a maximum of five intervening non-anchor genes permitted within a block (nGaps = 5). Anchor clustering used a search radius of 25 gene-rank positions (blkRadius = 25). All hits falling within a syntenic buffer of 100 gene-rank positions around confirmed block anchors (synBuff = 100) were retained as syntenic. No secondary syntenic hits were included (nSecondaryHits = 0). Syntenic orthogroups were integrated across all pairwise comparisons and collapsed into a pan-genome annotation anchored to S. officinalis, which was used as the reference genome.

      Syntenic relationships were visualized as riparian plots and pairwise dotplots using the built-in plotting functions of GENESPACE v1.2.3. Riparian plots were constructed using physical chromosomal coordinates (useOrder = FALSE) with S. officinalis as the reference, displaying all three genomes. A second riparian plot was generated highlighting a region of interest. Pairwise dotplots were produced for the S. officinalis vs. D. pealeii and S. officinalis vs. E. scolopes genome comparisons, displaying only synteny-validated hits (type = "syntenic") with a minimum synteny score of 10 (minScore = 10) and a minimum of 10 genes per chromosome pair required for display (minGenes2plot = 10).”

      Reviewer #2 (Recommendations for the authors):

      Line 153 should be supplemental Figure 3B.

      The text was referring to the correct Figure 2B (three species synteny comparison). It is now updated to Figure 3B in the revised manuscript.

      Reviewer #3 (Recommendations for the authors):

      (1) L37: Perhaps add a comparison with other species (mammals, Drosophila, etc.) to put this number in context.

      We agree with this recommendation and added numbers for Drosophila and mouse to the text (lines 40-45):

      “Coleoid cephalopods (octopus, squid, cuttlefish) are a highly derived group of mollusks, characterized by the largest nervous systems among all invertebrates (ca. 500 million neurons in an adult octopus of which 200 million are in the central brain [1,2], compared to ca. 140,000 in the fruit fly [3] or 70 million in the mouse [4]) and specializations with a great historical importance for neuroscience (e.g., “giant axons” [5] and “giant synapses” [6–8]).”

      (2) L51, 279: "Octopodiformes" is a superorder, not a genus or a species name. It should not go in italics.

      We updated this throughout the text.

      (3) L53: "even smaller" seems odd here, because the argument of the sentence is to stress the large genome size of Octopodiformes. Perhaps start the sentence by stating that it is sometimes smaller, but often larger.

      We rephrased the sentence for clarity; it now reads (lines 55-58):

      “While the genomes of Octopodiformes (Octopus, Eledone, Argonauta) are either smaller than (1.1 Gigabases or Gb [45]) or comparable in size to that of humans (around 3 Gb [46,47]), the typical genomes of Decapodiformes (squids and cuttlefish) often reach 6 Gb [48,49].”

      (4) L90: What tool was used to estimate the k-mer distribution of the long reads? Jellyfish? FastK? It's not mentioned anywhere in the text.

      (5) L95: What k-mer size did the authors use to estimate k-mer distribution?

      We thank the reviewer for pointing out this missing information, and have included the details in the methods (lines 692-694):

      “The k-mer distribution was estimated using Meryl [165] within the Merfin [166] package with a k-mer size of 21, and genome size was estimated using GenomeScope [77] from Illumina short reads and PacBio HiFi data.”
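
      For readers unfamiliar with k-mer profiling, the following is a minimal Python sketch of what a k-mer count histogram is. It does not reproduce Meryl or GenomeScope; the function names are hypothetical, counting canonical k-mers (the lexicographically smaller of a k-mer and its reverse complement) is an assumption, and the toy read uses k = 4 only to keep the example small, whereas the analysis above used k = 21.

```python
from collections import Counter
from typing import Iterable

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_kmers(reads: Iterable[str], k: int = 21) -> Counter:
    """Count canonical k-mers across reads, skipping k-mers with non-ACGT bases."""
    counts: Counter = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    return counts


def kmer_histogram(counts: Counter) -> Counter:
    """Map each multiplicity to the number of distinct k-mers seen that many times."""
    return Counter(counts.values())


# Toy input (hypothetical read, k = 4 for readability).
toy_counts = count_kmers(["ACGTACGTAC"], k=4)
toy_hist = kmer_histogram(toy_counts)
```

      A histogram of this kind (multiplicity versus number of distinct k-mers) is the input that genome profiling tools fit to estimate genome size and heterozygosity.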

      (6) L99: What about using the most recent BUSCO databases? odb12?

      We thank the reviewer for this question, which prompted us to compute BUSCO scores using the more recent odb12 database. The results are shown in Supplementary Figure 2C. Both gene sets have been refined by including more species and using a more stringent filtering approach, so the more recent database contains fewer and more conserved genes [4]. For the mollusca gene sets, a great improvement in completeness was observed between odb10 and odb12 (Supplementary Figure 2C); the metazoan completeness was marginally increased. Therefore, we evaluated all new assemblies produced since the first submission with the odb12 database.

      (7) L107: How many scaffolds were obtained in total? After manual curation, how many of the scaffolds were placed in the "correct" chromosomes? How many scaffolds were in the shrapnel? Were these scaffolds mostly repetitive regions? Or did they contain important genetic information?

      These are important questions. To evaluate the content of the “shrapnel”, we split the manually curated assembly into the 47 chromosomes and the 1840 residual scaffolds, and computed BUSCO scores for both. While the 47 chromosome scaffolds contain the majority of conserved genes: C:92.9%[S:92.7%,D:0.1%],F:4.0%,M:3.1% with metazoa_odb12 and C:88.7%[S:88.0%,D:0.7%],F:4.4%,M:6.9% with mollusca_odb12, the unplaced scaffolds still contain a few BUSCOs: C:2.5%[S:2.4%,D:0.1%],F:2.4%,M:95.1% from metazoa_odb12 and C:1.9%[S:1.7%,D:0.2%],F:1.2%,M:96.9% from mollusca_odb12. Even if only a few BUSCOs are present on these scaffolds, it means they contain important genetic information. Additionally, we observed low, but non-zero alignment of RNA reads to these scaffolds. We observed a slightly elevated repeat content in the unplaced scaffolds (Author response image 2), and a variable base composition (Figure 1C) compared to the chromosome scaffolds.
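
      The BUSCO summary strings quoted above follow a fixed pattern, so they can be compared programmatically. The sketch below is a hypothetical helper, not part of BUSCO itself; it only assumes the short-summary notation shown in this response (C, S, D, F, M percentages).

```python
import re
from typing import Dict

# Pattern for the one-line notation used above, e.g.
# "C:92.9%[S:92.7%,D:0.1%],F:4.0%,M:3.1%"
_BUSCO_PATTERN = re.compile(
    r"C:([\d.]+)%\[S:([\d.]+)%,D:([\d.]+)%\],F:([\d.]+)%,M:([\d.]+)%"
)
_KEYS = ("complete", "single", "duplicated", "fragmented", "missing")


def parse_busco(summary: str) -> Dict[str, float]:
    """Parse a BUSCO one-line summary into a dict of percentages."""
    match = _BUSCO_PATTERN.search(summary)
    if match is None:
        raise ValueError(f"not a BUSCO summary: {summary!r}")
    return dict(zip(_KEYS, map(float, match.groups())))


# Values copied from the metazoa_odb12 results reported above.
chromosomes = parse_busco("C:92.9%[S:92.7%,D:0.1%],F:4.0%,M:3.1%")
unplaced = parse_busco("C:2.5%[S:2.4%,D:0.1%],F:2.4%,M:95.1%")
```

      Parsed this way, the complete, fragmented, and missing categories of each set sum to 100% within rounding, which is a quick sanity check when transcribing such summaries.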

      Author response image 2.

      Quantification of repeat content in chromosome scaffolds and unplaced residual scaffolds. Density plot showing fraction of repeat masked bases in total sequence length for chromosome scaffolds (i.e. scaffolds 1-47) in teal and all remaining small scaffolds (1840 scaffolds) in purple. Median repeat fraction is shown as vertical lines.

      The slightly elevated repeat content in the unplaced scaffolds provides a likely explanation for their fragmented state: repeat-rich regions are inherently difficult to assemble and scaffold, as repetitive sequences cause ambiguous read alignments that prevent contigs from being confidently joined or anchored to chromosomal scaffolds during Hi-C-based scaffolding. This is consistent with the near-complete absence of BUSCO genes from the unplaced scaffolds - not because these fragments lack biologically relevant sequence entirely, as evidenced by the residual BUSCO hits and RNA read alignments, but because the gene-rich portions of the genome are largely captured in the 47 chromosome scaffolds. The unplaced scaffolds instead likely represent fragmented contigs from repetitive or low-complexity genomic regions, such as centromeres, telomeres, and transposable element clusters, where assembly graph complexity and collapsed repeats prevent confident placement. The variable base composition further supports this interpretation, as GC-extreme or low-complexity sequences are disproportionately represented in assembly shrapnel. Together, these observations suggest that the unplaced scaffolds contain limited unique coding content but reflect genuine repeat-rich genomic sequence that cannot currently be placed without additional long-range information, such as optical mapping or ultra-long reads.

      (8) L33, 53, 240, 255, 279: Decapodiformes, not in italics.

      We changed this throughout the text.

      (9) L228: Can you put this expansion in perspective with other taxa?

      We added a more detailed comparison of our gene family expansion with different species to the revised manuscript, as detailed in response 4 to reviewer 1.

      (10) L251: "However, our results show how difficult it still is to assemble large genomes with high karyotype numbers." Can you clarify how your results show this, because it is equally spectacular to assemble the karyotype with only PacBio and Hi-C data (and no linkage mapping).

      Indeed, it is correct that the recent improvements in data quality and scaffolding algorithms enable these “spectacular” chromosome-scale assemblies without the need for linkage mapping. This sentence reflected our expectation of resolving a clear karyotype, as has been demonstrated for multiple cephalopod genomes in recent years (Octopus bimaculoides, Octopus vulgaris, Euprymna scolopes, Euprymna berryi, and the two cuttlefish species Acanthosepion lycidas and Acanthosepion esculenta). To our knowledge, none of these publications used linkage mapping or cytogenetic methods to confirm the karyotype. In this light, our resulting chromosome number and the discrepancy with a second assembly of the same species led us to this conclusion. We updated the section in the revised discussion as follows (lines 466-473):

      “Taken together, our results illustrate the difficulty of assembling large genomes with high repeat content and large karyotypes, at least from sequencing data alone. Internal validation methods and genome comparisons across species are therefore important. Convergence of reliable estimates will, in turn, help identify chromosomal fusion-with-mixing events (FWM; fusion of two ancestral chromosomes followed by extensive shuffling of their gene content) that are clade specific. Early branching order in Decapodiformes has been notoriously unstable [53,84,94,144–147]; thus, such rare and irreversible FWM characters could be useful in further phylogenetic analysis of this clade [51,148].”

      (11) L419: Why use the phased haplotype 1 instead of the primary assembly generated by hifiasm?

      We thank the reviewer for this important question. We used the phased haplotype assembly because it provides a biologically coherent representation with the least amount of duplication by avoiding allele-collapsing and haplotype-switching that can be present in the primary assembly. We reasoned that this would result in clearer gene models and a more accurate representation of structural variation. However, we acknowledge that this comes at the cost of reduced contiguity and completeness, as becomes apparent in our BUSCO comparison shown in Supplementary Figure 2, where the phased haplotypes have fewer duplicated genes than the primary assembly, but more missing genes in turn. When reassembling both datasets for our comparison, we used the primary assembly to use the longest contigs as input for scaffolding.

      (12) L444: It is unclear from what tissues and life stages RNA-seq data were used or were available from other species.

      This is an important detail. RNA-seq data was collected from two adult Sepia officinalis, from various tissues (whole brain, retina, skin, mantle, arm, tentacle). For the long-read PacBio Isoseq data, tissue was taken from the animal used for genome sequencing (6 months old), and tissue for short-read Illumina RNA-seq was taken from another adult (8 months old). The data have been released on SRA (study accession SRP570862), where all sample details are listed as well. We added the SRA accession to the data availability section of the revised manuscript. We clarified the relevant sections in the methods:

      lines 628-629:

      “RNA was isolated from various flash-frozen tissues (different brain areas, mantle/epidermis, arm/tentacle; 5-10 mg each).”

      lines 678-680:

      “For short-read RNA sequencing, tissue from another animal (8-month-old adult, F0 from eggs collected in Normandie, France) was used. RNA was isolated from various flash-frozen tissues (different brain areas, skin and retina; 5 mg each).”

      (13) L454, 469: Why is minimap2 in italics? It wasn't formatted like this before. Same for StringTie.

      We thank the reviewer for their detailed methods review. In the updated methods section, the formatting of all software names has been harmonized.

      (14) L461: Lophotrochozoa is a clade, not a genus or species. Not in italics.

      This is now changed throughout the revised manuscript.

      (15) Figure 1D: Axes labels are hard to read.

      We have now increased the axis label size.

      (16) Figure 2: Consider increasing font sizes. Many chromosome orientations seem to be flipped across species, which makes it harder to see smaller-scale rearrangements or notice less conserved chromosomes. Would it make sense to standardize these?

      We increased the font sizes and plotted only fully collinear syntenic blocks (instead of aggregated syntenic regions, the default of GENESPACE) for improved readability.

      References:

      Below are references cited in our responses. References from the reproduced manuscript sections are included in the revised manuscript.

      (1) Secomandi, S., Gallo, G.R., Rossi, R., Rodríguez Fernandes, C., Jarvis, E.D., Bonisoli-Alquati, A., Gianfranceschi, L., and Formenti, G. (2025). Pangenome graphs and their applications in biodiversity genomics. Nat. Genet. 57, 13–26. https://doi.org/10.1038/s41588-024-02029-6.

      (2) Open2C, Abdennur, N., Fudenberg, G., Flyamer, I.M., Galitsyna, A.A., Goloborodko, A., Imakaev, M., and Venev, S.V. (2023). Pairtools: from sequencing data to chromosome contacts. Preprint at bioRxiv, https://doi.org/10.1101/2023.02.13.528389.

      (3) Lupše, N., Reid, A., Taite, M., Kubodera, T., and Allcock, A.L. (2023). Cuttlefishes (Cephalopoda, Sepiidae): the bare bones—an hypothesis of relationships. Mar. Biol. 170, 93. https://doi.org/10.1007/s00227-023-04195-3.

      (4) Tegenfeldt, F., Kuznetsov, D., Manni, M., Berkeley, M., Zdobnov, E.M., and Kriventseva, E.V. (2025). OrthoDB and BUSCO update: annotation of orthologs with wider sampling of genomes. Nucleic Acids Res. 53, D516–D522. https://doi.org/10.1093/nar/gkae987.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) Although not undermining these data, there are a few potential weaknesses that reduce the impact of the work. For example, the inability to directly assess whether cue-induced drug-seeking is in fact augmented compared to daily intake during self-administration in the maintenance phase only permits the authors to denote that re-exposure to cues and the context is sufficient to promote active lever pressing without demonstrating whether seeking behavior is in fact elevated further during a cue test. This is notably understandable as drug-available sessions were 6 hours versus a 1-hour relapse test. Importantly, it is clearly demonstrated that drug seeking is higher on average in female mice after 14 days versus 1 day.

      We agree that the current design does not allow us to directly assess whether cue-induced drug-seeking is augmented relative to the average self-administration intake. However, this comparison was not a question examined in the manuscript and was not an intended interpretation of the data. Our analyses and interpretations focused on comparisons between saline and oxycodone groups tested under identical cue-induced relapse conditions. While it does not change or contradict the reviewer’s point, we would also like to clarify that the relapse test was 2 hours long.

      (2) With regard to the interpretation of electrophysiology findings, the lack of inclusion of an abstinence-only group does not permit interpretations to parse out whether observed increases in synaptic strength (or the lack of) reflect abstinence or an interaction between abstinence period and re-exposure to the operant chamber, as slices were taken 30-45 min post relapse test.

      The inclusion of an abstinence-only control group would have been required to definitively dissociate synaptic changes driven by abstinence alone from those arising from an interaction between abstinence and re-exposure to the operant context during the relapse test. In the present study, electrophysiological recordings were intentionally performed 30 to 45 minutes following the relapse test to capture synaptic modifications associated with cue-induced drug-seeking after abstinence. Accordingly, we interpret these findings as reflecting the neural state following relapse rather than abstinence alone, and we have revised the text accordingly to clarify this point.

      (3) With regard to the interpretation of electrophysiology findings, the lack of inclusion of an abstinence-only group does not permit interpretations to parse out whether observed increases in synaptic strength (or the lack of) reflect abstinence or an interaction between abstinence period and re-exposure to the operant chamber, as slices were taken 30-45 min post relapse test. While much literature has shown that drug-induced adaptations in the NAc require a post-drug period for plasticity to measurably emerge, studies have also shown that re-exposure to heroin-associated cues following abstinence seemingly "reverses" increases in cell excitability in prelimbic-NAc pyramidal neurons (Kokane et al., 2023) and that morphine-induced increases in synaptic strength in the NAc shell can be depotentiated by drug re-exposure - an effect also observed with cocaine re-exposure (Madayag et al., 2019). Notably, the lack of effect at 14 but not 1 day supports the likelihood that the relapse test does not in fact influence the plasticity within the PVT-NAcSh circuit.

      We thank the reviewer for highlighting relevant literature showing that drug or cue re-exposure can modify or reverse drug-induced plasticity in NAc-related circuits. We want to clarify that, in our dataset, synaptic changes in the PVT-NAcSh pathway are seen after 14 days of abstinence, but not after 1 day. Therefore, the lack of effect at the earlier time point and its appearance after extended abstinence support the idea of time-dependent plasticity. Although electrophysiological recordings were taken soon after the relapse test, this temporal pattern argues against relapse testing alone as the primary driver of the observed synaptic changes. We have updated the text to clarify this point.

      (4) While the lack of effect on AMPAR:NMDAR ratio and rectification indices do support the notion that enhanced EPSC amplitudes in input-output curves do not reflect a change in AMPAR subunit expression (i.e., increased GluA2-lacking receptors that exhibit inward rectification at depolarized potential) nor a change in postsynaptic sensitivity to glutamate, without direct assessment of AMPAR-specific and NMDAR-specific input output curves, it doesn't definitively exclude the possibility that both AMPA and NMDA receptor currents are being upregulated, thus negating an observable change in postsynaptic strength.

      We agree that unchanged AMPAR/NMDAR ratios and rectification indices argue against altered AMPAR subunit composition or simple postsynaptic sensitivity changes. Although receptor-specific input-output analyses would be necessary to definitively rule out proportional increases in both AMPA and NMDA receptor currents, we have updated the manuscript to clarify that our conclusions are limited to the synaptic measures we obtained. The revised text now states that acute or prolonged abstinence “might have no detectable postsynaptic effects as assessed by these synaptic measures” at PVT-NAcSh synapses.

      Reviewer #2 (Public review):

      (5) While this paper is certainly interesting, and well-written, and the experiments seem to be well performed, the behavioral and physiological effects observed are somewhat divorced. Specifically, what accounts for the heightened relapse in females? Since no opioid-related sex differences were observed in PVT-NAcSh neurophysiology, it is unclear how the behavioral and neurophysiological data fit together. Furthermore, the lack of functional manipulation of PVT-NAcSh circuitry leaves one to wonder if this circuit is even important for the behavior that the authors are measuring. I would be more positive about this study if the authors were able to resolve either of the two issues noted above.

      A key challenge in circuit-based studies of motivated behavior is connecting circuit-level plasticity to complex, sex-dependent behavioral phenotypes. In this study, we do not mean to imply that synaptic plasticity within the PVT-NAcSh projection alone explains the increased relapse seen in females. Instead, our electrophysiological data indicate that this projection experiences time-dependent, abstinence-dependent changes in synaptic strength, offering important insights into when and where circuit-level adaptations may occur. We also believe that the lack of obvious sex differences in PVT-NAcSh synaptic strength does not rule out this circuit's role in sex-specific behavior. Growing evidence suggests that sex differences in relapse and motivated behaviors may stem from different modulation of shared circuits (for example, via ovarian hormones, neuromodulatory tone, or upstream inputs), rather than from significant differences in baseline synaptic properties within a given projection. Regarding circuit relevance, extensive previous research has identified the PVT-NAcSh pathway as a critical regulator of cue-induced reward seeking and relapse. Our findings expand on this by showing that this projection displays abstinence-dependent synaptic strengthening after oxycodone self-administration. Although functional manipulation of this circuit is needed to confirm its causal role, such experiments were beyond the scope of this study.

      (6) There are insufficient animals in some cases. For example, in Figure 4, the Male Saline 14-day abstinence group (n = 3 rats) has less than half of the excitability as compared to the Male Saline 1-day abstinence group (n = 7 rats). This is likely due to variance between animals and, possibly, oversampling. Thus, more rats need to be added to the 14-day abstinence group. Additionally, the range of n neurons/rat should be reported for each experiment to ensure readers that oversampling from single animals is not occurring.

      We appreciate the reviewer's concern regarding the number of animals and the potential for oversampling. We take this concern seriously and have substantially revised our statistical approach in response.

      All spike count data were reanalyzed using nested hierarchical Poisson generalized linear mixed-effects models (GLMMs), fitted separately for each sex and abstinence duration. Each model included injected current (mean-centered), drug condition, and their interaction as fixed effects, with random intercepts and slopes for injected current at the animal level, and random intercepts for cells nested within animals. Importantly, this reanalysis changed several of our original conclusions. Effects that appeared significant under the conventional cell-level analysis were no longer statistically significant once the hierarchical structure of the data was properly modeled. We report these corrected results transparently throughout the revised manuscript.
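
      For clarity, the model structure described above can be written out as follows. The notation is ours and is only a sketch: the log link is assumed (the usual choice for a Poisson GLMM), and the symbols are illustrative rather than taken from the manuscript. Here $y_{ijk}$ is the spike count at current step $i$ in cell $k$ of animal $j$, $\tilde{I}_i$ is the mean-centered injected current, and $D_j$ indicates drug condition (0 = saline, 1 = oxycodone).

```latex
\begin{align}
  y_{ijk} &\sim \mathrm{Poisson}(\lambda_{ijk}), \\
  \log \lambda_{ijk} &= \beta_0 + \beta_1 \tilde{I}_i + \beta_2 D_j
      + \beta_3 \tilde{I}_i D_j
      + u_{0j} + u_{1j} \tilde{I}_i + v_{k(j)}, \\
  (u_{0j}, u_{1j})^{\top} &\sim \mathcal{N}(\mathbf{0}, \Sigma_u), \qquad
  v_{k(j)} \sim \mathcal{N}(0, \sigma_v^2).
\end{align}
```

      In this formulation, $u_{0j}$ and $u_{1j}$ are the animal-level random intercept and slope, and $v_{k(j)}$ is the random intercept for cell $k$ nested within animal $j$, so that inference on $\beta_2$ and $\beta_3$ rests on the number of animals rather than the number of cells.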

      However, in males after prolonged abstinence, oxycodone-treated animals showed a higher spike output than controls, with a large effect size. Post-hoc analysis showed only 20% power with the current sample (3 saline, 4 oxycodone rats). To reach 80% power, 13 rats per group are needed. We report this as a trend that warrants further study and have revised related sections to reflect this. The data suggest a possible neuroadaptation in males that the study is underpowered to confirm, not a null effect.
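
      To illustrate how a required group size of this kind is typically derived, the sketch below uses the standard normal approximation for a two-group comparison of means. This is a simplification and not the calculation we performed for the mixed-effects model; the function name is hypothetical and the effect sizes in the usage lines are textbook values (Cohen's d of 0.5 and 0.8), not estimates from our data.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate animals per group for a two-sided, two-sample comparison.

    Uses the normal approximation n = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2,
    which slightly underestimates the exact t-test requirement for small n.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)
    z_power = standard_normal.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Illustrative effect sizes only (hypothetical, not from the study).
n_medium = n_per_group(0.5)
n_large = n_per_group(0.8)
```

      Because the approximation ignores the nested structure of cells within animals, a simulation-based calculation from the fitted mixed model would be the more faithful approach for data of this type.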

      In response to this comment, we have updated Figure 5, the Results and Discussion sections, and the Statistics/Methods section to clearly describe the nested hierarchical modeling approach, report corrected statistical values, and acknowledge the power limitation for the male prolonged abstinence group. The figure legend now reports the number of neurons recorded per rat, showing the distribution across animals rather than individual subjects.

      (7) The IPSC data, for example in Figure 4, is one of the more novel experiments in the manuscript. However, it is quite challenging to see the difference between males and females, saline and oxycodone, at low stimulation intensities within the graph. Authors should expand this so that reviewers/readers can see those data, especially considering other work suggesting that PVT synaptic input onto select NAc interneurons is disrupted following opioid self-administration. Additional comment: It's also interesting that the IPSC amplitude seems to be maximal at ~2mW of light, whereas ~11 mW is required to evoke maximal EPSC amplitude. It would be interesting to know the authors' thoughts on why this may be.

      While visual separation between conditions at low light levels is subtle, we addressed this directly using linear mixed-effects modeling, which evaluates IPSC amplitudes across the full range of stimulation intensities while accounting for repeated measurements from cells nested within animals. This approach provides greater sensitivity than visual inspection alone and avoids overinterpretation of noise at individual stimulation levels.

      Using this framework, we observed robust main effects of light intensity in both males and females, indicating preserved recruitment of inhibitory synaptic responses as stimulation increased. Importantly, no significant Light × Condition interactions were detected in either sex, indicating that the scaling of IPSC amplitudes with light intensity was not altered by oxycodone exposure.

      With respect to the observation that IPSC amplitudes appear to reach near-maximal levels at lower light intensities (~2 mW) compared to EPSCs (~11 mW), we agree that this distinction is intriguing. One possible explanation is that the IPSCs depend on the recruitment of local interneurons. Because the number of interneurons activated by PVT inputs is limited, inhibitory responses may reach a plateau at relatively low light intensities once these interneurons are fully recruited.

      In contrast, monosynaptic EPSC amplitude would continue to increase over a wider range of light intensities, because stronger photostimulation recruits more ChR2-expressing PVT fibers and therefore produces larger EPSCs.

      (8) There is an inadequate description of what has been done to date on the PVT-NAc projection regarding opioid withdrawal, seeking, disinhibition, and the effects on synaptic physiology therein. For example, a critical paper, Keyes et al., 2020 Neuron, is not cited. Additionally, Paniccia et al., 2024 Neuron is inaccurately cited and insufficiently described. Both manuscripts should be described in some detail within the introduction, and the findings should be accurately contextualized within the broader circuit within the discussion.

      In the revised manuscript, we expanded the Discussion to give a more thorough overview of previous research on the PVT-NAc pathway in relation to opioid-related behaviors and synaptic changes. Specifically, we added more detail about Keyes et al., 2020 and Paniccia et al., 2024, clarifying their findings and placing them within the context of the circuit mechanisms studied in our work. We also revised the text to ensure the descriptions of these studies are accurate and that their conclusions are properly related to our findings.

      (9) Related to the above, the authors should provide a more comprehensive description of how PVT synapses onto cell-type specific neurons in the NAc which expand beyond MSNs, especially considering that PVT has been shown to influence drug/opioid seeking through the innervation of NAc neurons that are not MSNs. For example, see PMIDs 33947849, 36369508, 28973852, 38141605.

      In the revised manuscript, we expanded the Discussion to describe the diversity of PVT projections within the NAc and the potential role of non-MSN neuronal populations in drug-related behaviors. We added discussion on the broader circuit context and other cell types where relevant to the focus on synaptic transmission onto MSNs. Since our experiments specifically examined synaptic physiology in MSNs, we focused the literature discussion on studies most directly related to MSN-targeted PVT inputs and opioid-related behaviors.

      Reviewer #3 (Public review):

      (10) Additional experiments could strengthen the results and help clarify synaptic mechanisms underpinning behavioral sex differences.

      We agree that additional experiments focused on identifying cell-type-specific mechanisms within the PVT-NAcSh circuit would further enhance understanding of the neural substrates behind the observed behavioral sex differences. In the revised manuscript, we have expanded the Discussion to explicitly acknowledge these limitations and clarify the scope of our current study. Specifically, we discuss the possibility that sex-specific adaptations might occur in particular neuronal subpopulations or circuit components that were not resolved in the present experiments. We also mention that future research using cell-type–specific approaches will be necessary to determine if such mechanisms contribute to the increased oxycodone seeking seen in females after prolonged abstinence. We appreciate the reviewer’s suggestions and have incorporated this perspective into the revised manuscript to better contextualize our findings and outline future directions.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      The manuscript by Ho and Schock investigates the role of the Z-disc protein Zasp52 during Drosophila flight muscle development. It was known before, mainly by findings from this group, that Zasp52 is required for normal sarcomere morphogenesis, specifically Z-disc morphogenesis in indirect flight muscles. But the exact molecular mechanism by which Zasp52 contributes, apart from the fact that it is localised there and is somehow involved in multimerization/cross-linking, was not clear. This paper proposes that an intrinsically disordered region (IDR) in Zasp52 is needed for some of its functions, by stabilising Zasp52 localisation at the Z-disc. Specifically, the IDR in Zasp52 is proposed to be required for Z-disc maintenance during the mechanical challenges of flight, while being dispensable for the initial morphogenesis during development. This hypothesis is supported by strong genetic evidence and behavioural tests: deleting Zasp's IDR impairs flight from mid-age onwards, while a block in flight activity lifts the phenotype.

      However, some of the phenotypic analysis, in particular the bending of the sarcomere, likely upon mechanical challenge by muscle contractions, needs more detailed investigations to be fully convincing.

      Strengths:

      (1) The linker in the alternatively spliced exon 15 of Zasp52 was deleted with a state-of-the-art genetic editing strategy. Surprisingly, flies are homozygous viable, showing that this long part of the Zasp52 protein is not essential for animal survival or sarcomere morphogenesis.

      (2) The observed sarcomere phenotypes with age, especially the bending Z-discs, are new and exciting.

      (3) The displayed EM images document interesting phenotypes.

      (4) Most of the observed phenotypes can be rescued by re-expression of the long Zasp52 isoform, which does contain the IDR region, but not by a shorter one without it, suggesting that IDR is important.

      (5) FRAP data measure the local turnover of a short-ZaspGFP and show that this is increased in the Zasp mutant lacking the IDR domain, suggesting that Zasp-IDR might stabilise Zasp at the Z-disc.

      (6) Interestingly, flight and sarcomere morphology phenotypes can be rescued by preventing the flies from flying, suggesting that they are mechanically induced.

      Weaknesses:

      (1) The western blot quantifications of Zasp isoform expression are weak. No error bars are indicated in the quantifications; the quantifications appear to be more qualitative than quantitative. According to band intensities, the long Zasp isoforms seem to be less present compared to the shorter ones, even in the flight muscles.

      We will work on including quantifications with error bars for the Western blots in our resubmission. It is important to keep in mind that the main point in figure 1B is that there are plenty of exon15e-containing isoforms in IFM, in contrast to other tissues with very limited exon15e-containing isoforms. This is confirmed by the analysis of RNAseq data in figure 1C, and of course, by the flightless phenotype of the exon15e mutant.

      (2) The phenotypic analysis of the sarcomere appears somewhat superficial throughout the paper. Only Zasp52 and phalloidin are shown; no other Z-disc or thick filament proteins. At least myosin stainings and overview images are important to better judge the phenotypic variations. Are the variants between individuals or regional in the same muscle?

      Our images are representative of the observed phenotypes. We aim to provide overview images and other stainings to better illustrate the phenotypic variations in the revised version. Phenotypes are consistently present across all individuals, as reflected in our replicates. Interestingly, they appear to not be randomly interspersed among the sarcomeres but concentrated in certain regions of muscle more than others.

      (3) EM images would benefit from better quantification.

      We do not believe that EM images can be meaningfully quantified, because of the many selection steps preceding image acquisition.

      (4) Other proteins were not analysed with the FRAP-based turnover assay for comparison in wild type and mutant. All Z-proteins might turn over faster in the mutant with the defective Z-disc.

      This is the point we are trying to make. The Zasp52 IDR acts like a glue stabilizing all Z-disc proteins. We performed this experiment as a first step to explore whether an exon15e-lacking system exhibited modified dynamics, and we aim to provide more data in the revised version.

      Reviewer #2 (Public review):

      Summary and Strengths:

      This in-depth genetic analysis of Zasp52 function in Drosophila indirect flight muscle (IFM) provides an interesting perspective regarding the role of a partially disordered region (IDR) in exon 15e. This exon seems to be exclusively present in IFM and contributes to the prevention of myofibril disintegration during aging, likely due to interactions of this region with Z-disc insertion and/or stability. The addition of an isoform (PR) that lacks exon 15e serves as a nice control to illustrate the necessity of exon 15e in muscle structure and function. Overall, the manuscript is exceptionally well-written, logical, with nicely controlled experiments and detailed statistical analysis that largely support the conclusions drawn by the authors. While exon 15e is clearly involved in preventing muscle degeneration, a solid role for thin filament stability is not clearly shown (as mentioned in the abstract). In addition, which regions/how the proteins of the IDR may contribute are unclear.

      Weaknesses:

      (1) It is not clear in Figure S1A where exon 15e fits within the Zasp52 locus schematic. This is important as a premise of this paper describes this region to be key, and proof from multiple prediction programs would lend more weight to the prediction of the exon being largely disordered. Inclusion of the discussed short linear motifs, comparison with Canoe or LBD3 for similarities and/or an Alphafold structure would help make the authors' point (colorized with known domains).

      We will add a bar below figure S2A to show the region corresponding to exon 15e. We used three disorder prediction programs and one structure (order) prediction program. The majority of exon15e is predicted to be completely disordered, with a very low confidence score, and is thus uninformative to display as an AlphaFold structure. Likewise, IDRs are very difficult to classify; we therefore cannot say much more than that LDB3, Zasp52, and Canoe contain IDRs, with Zasp52 and Canoe both having an actin-binding domain within the IDR. We will provide more data on the function of the ABD in the revised version.

      (2) Interesting that immobilization rescues the deterioration phenotypes. The authors should explain in more detail how this was done to avoid dehydration/starvation of the flies.

      We will provide more details in the revised version.

      (3) There is a lot of discussion about the potential function of the IDR region, specifically a putative actin binding motif or other 'ordered' regions that may contain short linear motifs. It would strengthen the findings to show which of these may be essential for Zasp52 function in the IFM. The ability to bind actin could be tested biochemically, and/or smaller deletions could be made to unequivocally test the role of the ABD vs other predicted motifs using genetics. If some of these regions are more ordered, where do they lie within, and do they form a predicted fold or structure that gives insight into function?

      We will provide data on the function of the ABD in the revised version.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors report the results of a tDCS brain stimulation study (verum vs sham stimulation of left DLPFC; between-subjects) in 46 participants, using an intense stimulation protocol over 2 weeks, combined with an experience-sampling approach, plus follow-up measures after 6 months.

      Strengths:

      The authors are studying a relevant and interesting research question using an intriguing design, following participants quite intensely over time and even at a follow-up time point. The use of an experience-sampling approach is another strength of the work.

      Weaknesses:

      There are quite a few weaknesses, some related to the actual study and some more strongly related to the reporting about the study in the manuscript. The concerns are listed roughly in the order in which they appear in the manuscript.

      We truly appreciate the time and effort you dedicated to reviewing our manuscript. The weaknesses you raised all make sense to us, and we agree with almost all of the suggestions detailed below, particularly those on clarifying the statistics and the sample size determination. Please see our specific responses below.

      Major Comments

      (1) In the introduction, the authors present procrastination nearly as if it were the most relevant and problematic issue there is in psychology. Surely, procrastination is a relevant and study-worthy topic, but that is also true if it is presented in more modest (and appropriate) terms. The manuscript mentions that procrastination is a main cause of psychopathology and bodily disease. These claims could possibly be described as 'sensationalized'. Also, the studies to support these claims seem to report associations, not causal mechanisms, as is implied in the manuscript.

      Thank you for this very practical suggestion. We agree that the current statements to underline the importance of procrastination are somewhat overreaching. Upon revision, we have overall toned down such claims by explicitly stating them as “associative evidence”, and rewritten a portion of terms in a more modest and balanced style. Please see specific revisions in the main text below:

      Introduction Section (Page 5, Line 64-81)

      “Procrastination is increasingly becoming a prevalent behavioral problem around the world, which reflects the irrational voluntary postponement of scheduled tasks albeit being worse off for such delays (Blake, 2019; Steel, 2007). In the epidemiological investigations, more than 15% of adults were identified as having chronic procrastination problems, and the situation for students was worse as 70-80% of undergraduates engaged in procrastination (American College Health Association, 2022; Ferrari et al., 2005). Moreover, the behavioral genetic evidence indicates a certain heritability of procrastination in human beings as well (Gustavson et al., 2017; Gustavson et al., 2014, 2015). In addition to its prevalence, the undesirable associations between procrastination behavior and health also warrant cautions. There is cumulative evidence to show the close associations between procrastination behavior and working performance, financial status, interpersonal relationships, and subjective well-being (Ferrari, 1994; Pychyl & Sirois, 2016; Steel et al., 2021). Further, as the prospective cohort studies indicated, many mental health problems emerge alongside procrastination, particularly in sleep problems, depression, and anxiety (Hairston & Shpitalni, 2016; Johansson et al., 2023). Even worse, chronic procrastination behavior has been observed to impair general health, as manifested by the intimate associations with close system disruption, gastrointestinal disturbance, as well as a high risk of hypertension and cardiovascular disease (Sirois, 2015; Sirois, 2016). ... ”

      (2) It is laudable that the study was pre-registered; however, the cited OSF repository cannot be accessed and therefore, the OSF materials cannot be used to (a) check the preregistration or to (b) fill in the gaps and uncertainties about the exact analyses the authors conducted (this is important because the description of the analyses is insufficiently detailed and it is often unclear how they analyzed the data).

      We regret that a serious technical barrier has made our preregistration inaccessible. The OSF has disabled my OSF account, as it claimed to detect “suspicious user’s activities” in the account (please see the screenshot below). As a result, none of the materials deposited in this OSF account, including the preregistration, can be accessed. We have contacted the OSF team, but received no valid technical solution to recover the preregistered report. We reckon that this may have been triggered by my affiliation change to the Third Military Medical University of the People’s Liberation Army (PLA).

      To address this unexpected circumstance and to ensure transparency, we have explicitly reported this case in the main text, and added a “Reconstructed Preregistration Statement” to the Supplemental Materials (SM). Because an inaccessible preregistration falls short of best practice, we have, apart from transparently reporting this case, removed the statements regarding preregistration elsewhere in the revised manuscript. Furthermore, we fully understand that inadequate methodological detail in the reporting made the statistics of this study difficult to follow. We have therefore added extensive detail to the Methods section to clarify how these analyses were conducted, so that our conclusions can be evaluated. Please see what we have added in the responses below (Comments #4-9).

      Methods Section (Page 5, Line 186-191)

      “This study fully adhered to CONSORT reporting guidelines, and was originally preregistered in the OSF repository (10.17605/OSF.IO/Y3EDT). However, due to a technical constraint related to the OSF account service (see SM), this OSF page is no longer accessible. For transparency and best practices of open science, a preregistration statement has been reconstructed from the original protocol documentation to clarify the a priori hypotheses, sample size determination, and analysis plans for this study (Table S1).”

      (3) Related to the previous point: I find it impossible to check the analyses with respect to their appropriateness because too little detail and/or explanation is given. Therefore, I find it impossible to evaluate whether the conclusions are valid and warranted.

      Again, we apologize for the confusion caused by inadequate statistical and methodological details. As you may know, this manuscript was previously reviewed by Nature Human Behaviour, which editorially constrained the paper length. Thus, a substantial number of details had to be omitted or removed. As you kindly suggested, we have added extensive descriptions to clarify how we carried out the statistical analyses in the present study. Please see specific instances below.

      (4) Why is a medium effect size chosen for the a priori power analysis? Is it reasonable to assume a medium effect size? This should be discussed/motivated. Related: 18 participants for a medium effect size in a between-subjects design strikes me as implausibly low; even for a within-subjects design, it would appear low (but perhaps I am just not fully understanding the details of the power analysis).

      Thank you for raising this crucial question. We have determined this a priori effect size based on the existing work we published previously (Xu et al., 2023, J Exp Psychol Gen;152(4):1122-1133). In our pilot study (Xu et al., 2023), we identified a significant interaction effect between the single-session tDCS stimulation (active vs sham) and time (pre-test vs post-test) (t = 2.38, p = .02, n = 27; 95% CI [0.14, 1.49]) for changing procrastination willingness in the laboratory settings, indicating a medium effect size. Therefore, this pilot study provides supportive evidence to determine this effect size a priori. To clarify, we have explicitly justified the selection of this effect size in the Methods section.

      Methods Section (Page 5, Line 206-215)

      “A full randomized block design was used to assign participants to both groups (active neuromodulation group, NM; sham-control group, SC) (see Fig. 2C). As the pilot study probing into the effect of single-session tDCS stimulation to change procrastination willingness indicated (t = 2.38, p = .02, 95% CI [0.14, 1.49]; Xu et al., 2023), statistical power was predetermined by G*Power at a relatively medium effect size (1-β err prob = 0.80, f = 0.25), yielding the total sample size at 18 to reach acceptable power (see SM Methods and Fig. S1)....”

      We fully understand that this sample size seems low for a medium effect size, and that the 18 participants for each group are limited in any case. Upon double-checking these power analyses, we confirmed that this sample size requirement is indeed correct. Please see the G*Power outputs in Author response image 1.

      Author response image 1.

      Although the power analysis contains no computational errors, we are aware that this limited sample size may hamper statistical robustness. To acknowledge this weakness, we now explicitly urge caution in the Limitations section:

      Limitations Section (Page 12, Line 637-640)

      “... In addition to technical limitations, given the apparently limited size of the sample (total N = 46), it warrants caution in generalizing these findings elsewhere, and necessitates further validations in a large-scale cohort.”

      (5) It remains somewhat ambiguous whether the sham group had the same number of stimulation sessions as the verum stimulation group; please clarify: Did both groups come in the same number of times into the lab? I.e., were all procedures identical except whether the stimulation was verum or sham?

      Yes, we fully followed the CONSORT pipeline to carry out this double-blind trial, and thus confirmed that all the participants in both groups had the same number of stimulation sessions in our lab. That is to say, except for the stimulation type (verum vs sham), all the procedures, equipment and even the room were identical for all the participants. For clarification, we have clearly stated this in the main text:

      Results Section (Page 9, Line 419-423)

      “In both groups, almost all participants (93.2%, 41/44) reported perceiving acceptable pain stemming from current stimulation, and believed they were receiving treatment (91.30% (21/23) for active neuromodulation group (NM), 86.95% (20/23) for sham control group (SC), χ<sup>2</sup> = 0.224, p = .636). All the participants underwent identical experimental procedures except for the stimulation type (active vs sham). ...”

      (6) The TDM analysis and hyperbolic discounting approach were unclear to me; this needs to be described in more detail, otherwise it cannot be evaluated.

      We apologize for the inadequate details, which hindered a precise understanding of the TDM and the hyperbolic discounting model. The Temporal Decision Model (TDM) was originally proposed by our team (Xu et al., 2023; Zhang et al., 2019, 2020, 2021). It conceptualizes procrastination as a failure of the trade-off between task outcome value (i.e., the motivation to act now in pursuit of the task reward) and task aversiveness (i.e., the motivation to refrain from acting now in order to avoid negative experiences). Once task aversiveness overrides the pursuit of task outcome value, procrastination emerges. One overarching hypothesis of this theoretical model is that task aversiveness is hyperbolically discounted when approaching the deadline: it would be discounted sharply when far from the deadline but discounted slowly when nearing the deadline (Zhang et al., 2019). Considering the nonlinear dynamics inherent in this hyperbolic discounting, we therefore employed a log-spaced temporal sampling scheme (Myerson et al., 2001) to strengthen curve-fitting performance (please see the schematic diagram (https://uen.pressbooks.pub/behavioraleconomics/chapter/the-reality-of-homo-sapiens, where each point indicates a sampling time)):

      Specifically, based on the log-spaced temporal sampling rule, five time points were first selected to fulfill the statistical prerequisites for hyperbolic model fitting, with increasing sampling density toward the deadline (e.g., for a task due at 20:00: sampling occurred at 10:00, 16:00, 18:00, 19:30, 20:00). At each time point, participants reported task aversiveness (A) on a 0–100 Visual Analog Scale (VAS). Then, task aversiveness discounting was calculated as 1 - (A<sub>t</sub> / A<sub>earliest</sub>), where t<sub>earliest</sub> was the earliest sampling point (e.g., 10:00), serving as the reference for immediate execution. Subsequently, using the GraphPad Prism software (v9, 525), we estimated the AUC from these five data points based on the Myerson algorithm (Myerson et al., 2001), computed as the trapezoidal integration of task aversiveness discounting over time. Under this modelling method, a higher AUC reflects stronger temporal discounting of task aversiveness, which means that participants experience a faster decline in subjective aversiveness as execution is delayed, yielding lower effective aversiveness and reduced avoidance behavior. That is to say, if a participant shows greater discounting of task aversiveness, as reflected by a higher AUC, she/he experiences a more pronounced reduction in subjective aversiveness upon postponement, plausibly yielding less procrastination. As you kindly suggested, we have added these details to clarify how the hyperbolic discounting approach was used to determine the sampling time points and to calculate the AUC of task aversiveness discounting.
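
      To make the calculation concrete, the following is a minimal Python sketch of the discounting and trapezoidal AUC computation described above. It is not the authors' GraphPad workflow: the function name `discounting_auc`, the VAS ratings, and the choice to express time in hours since the earliest sampling point are assumptions for illustration only, since the response does not state the time units used.

```python
from datetime import datetime


def discounting_auc(times, ratings):
    """Trapezoidal AUC of task-aversiveness discounting, 1 - A_t / A_earliest.

    times   -- clock times "HH:MM", earliest first (assumed to fall on one day)
    ratings -- task aversiveness on a 0-100 VAS at each time (hypothetical here)
    """
    if len(times) != len(ratings) or len(times) < 2:
        raise ValueError("need matching times and ratings, at least two points")
    if ratings[0] <= 0:
        raise ValueError("the earliest rating must be positive to serve as reference")

    # Assumption: time is measured in hours elapsed since the earliest sample.
    t0 = datetime.strptime(times[0], "%H:%M")
    hours = [(datetime.strptime(t, "%H:%M") - t0).total_seconds() / 3600
             for t in times]

    # Discounting relative to the earliest sampling point.
    discounting = [1 - a / ratings[0] for a in ratings]

    # Trapezoidal rule: sum of (width * mean height) over adjacent samples.
    return sum((hours[i + 1] - hours[i]) * (discounting[i] + discounting[i + 1]) / 2
               for i in range(len(hours) - 1))


# Sampling times from the example in the text; the ratings are invented.
times = ["10:00", "16:00", "18:00", "19:30", "20:00"]
ratings = [80, 60, 40, 20, 10]
print(discounting_auc(times, ratings))  # 2.84375
```

      With these invented ratings, aversiveness falls as the deadline approaches, so the discounting values rise from 0 to 0.875 and the AUC is positive; a participant whose ratings fell faster would obtain a larger AUC.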

      Methods Section (Page 6, Line 268-283)

      “On the Task day, we developed a mobile app to implement the experience sampling method (ESM) for tracking one’s real-time evaluation of task aversiveness and task outcome value (see Fig. 1). Task aversiveness describes how disagreeable one perceives performing a given real-life task to be, whereas outcome value refers to the subjective benefits of the task outcome brought about by completing the task before the deadline (Zhang & Feng, 2020). As theoretically conceptualized by the temporal decision model (TDM) of procrastination, perceived task aversiveness is hyperbolically discounted when approaching the deadline, being discounted sharply when far from the deadline but slowly once nearing the deadline (Zhang & Feng, 2020; Zhang et al., 2021). Thus, considering the nonlinear dynamics inherent in this hyperbolic discounting, the five recording moments of ESM were selected per task a priori by using a log-spaced temporal sampling scheme (Myerson et al., 2001), with increasing sampling density toward the deadline, such as moments of 10:00 (earliest), 16:00, 18:00, 19:30, 20:00 (deadline). The five sampling points meet the statistical prerequisite for hyperbolic model fitting, which requires ≥ 4 points (Green & Myerson, 2004). To do so, recording moments were individually tailored for each task per participant in this ESM procedure.”

      Methods Section (Page 7, Line 318-334)

      “... As articulated in the temporal decision model above, the task aversiveness evoked by executing a task was temporally dynamic in a hyperbolic discounting pattern, being discounted sharply when far from the deadline but slowly when nearing the deadline (Zhang & Feng, 2020). To quantitatively characterize task aversiveness with consideration for its dynamics, the model-free area under the curve (AUC) was calculated. Specifically, based on the log-spaced temporal sampling rule, task aversiveness was measured by a 100-point visual analog scale at the five sampling moments. Then, the task aversiveness discounting (A) was calculated as 1 - (A(t) / A(earliest)), where t(earliest) was the earliest sampling point, serving as the reference for immediate execution. Subsequently, using the GraphPad Prism software (v9, 525), the AUC was computed as the trapezoidal integration of task aversiveness discounting over time across the five data points, based on the Myerson algorithm (Myerson et al., 2001). By doing so, a higher AUC reflects stronger temporal discounting of task aversiveness as the deadline nears, which means that participants experience a faster decline in subjective aversiveness as execution is delayed, yielding lower effective aversiveness and reduced avoidance behavior. As for the task outcome value, it was theoretically posited as a relatively stable evaluation of the task (Zhang & Feng, 2020; Zhang et al., 2021).”

      References

      Myerson, J., Green, L., & Warusawitharana, M. (2001). Area under the curve as a measure of discounting. Journal of the experimental analysis of behavior, 76(2), 235–243. https://doi.org/10.1901/jeab.2001.76-235

      Xu, T., Zhang, S., Zhou, F., & Feng, T. (2023). Stimulation of left dorsolateral prefrontal cortex enhances willingness for task completion by amplifying task outcome value. Journal of experimental psychology. General, 152(4), 1122–1133. https://doi.org/10.1037/xge0001312

      Zhang, S., Verguts, T., Zhang, C., Feng, P., Chen, Q., & Feng, T. (2021). Outcome Value and Task Aversiveness Impact Task Procrastination through Separate Neural Pathways. Cerebral cortex (New York, N.Y. : 1991), 31(8), 3846–3855. https://doi.org/10.1093/cercor/bhab053

      Zhang, S., Liu, P., & Feng, T. (2019). To do it now or later: The cognitive mechanisms and neural substrates underlying procrastination. Wiley interdisciplinary reviews. Cognitive science, 10(4), e1492. https://doi.org/10.1002/wcs.1492

      Zhang, S., & Feng, T. (2020). Modeling procrastination: Asymmetric decisions to act between the present and the future. Journal of experimental psychology. General, 149(2), 311–322. https://doi.org/10.1037/xge0000643

      (7) Coming back to the point about the statistical analyses not being described in enough detail: One important example of this is the inclusion of random slopes in their mixed-effects model, which is unclear. This is highly relevant, as omission of random slopes has repeatedly been shown to lead to extremely inflated Type 1 errors (e.g., inflating Type 1 errors by a factor of ten; e.g., a significant p value of .05 might be obtained when the true p value is .5). Thus, if indeed random slopes have been omitted, then it is possible that significant effects are significant only due to inflated Type 1 error. Without more information about the models, this cannot be ruled out.

      Thank you for sharing this very timely and crucial comment. After careful scrutiny, we identified the statistical flaw you pointed out: participants were modeled with random intercepts only, without random slopes. As you kindly suggested, we have reanalyzed all the statistics by adding random slopes (i.e., (1 + day|SubjectID)). Results showed a statistically significant interaction effect for both procrastination willingness (β = -7.8, SE = 1.8, DF = 45.6, p < .001) and actual procrastination rates (β = -7.4, SE = 2.4, DF = 46.6, p = .004), indicating the effectiveness of multi-session neuromodulation in mitigating procrastination. In the post-hoc simple effect analyses, participants who engaged in active neuromodulation (NM) showed a significant increase in task-execution willingness (i.e., decreased procrastination willingness; NM-before: 35.65 ± 30.20, NM-after: 80.43 ± 19.92, t.ratio = 5.4, p < .0001, Tukey correction) and a decrease in actual procrastination rates (NM-before: 43.26 ± 39.09, NM-after: 0.00 ± 0.00, t.ratio = 5.1, p < .0001, Tukey correction), while no such effects were identified for participants in the sham control group (for willingness, SC-before: 37.57 ± 26.46, SC-after: 47.35 ± 30.49, t.ratio = 0.3, p = .77, Tukey correction; for actual procrastination, SC-before: 46.47 ± 40.75, SC-after: 33.34 ± 37.82, t.ratio = 0.7, p = .48, Tukey correction). Taken together, we appreciate your pointing out this crucial statistical weakness, and have confirmed that our findings remain reliable once random slopes are added to guard against inflated Type 1 error. Moreover, as you kindly suggested, we have incorporated these statistical details, particularly those concerning the GLMM, into the main text to facilitate your evaluation. Please see specific revisions below:

      Methods Section (Page 8, Line 381-401)

      “To clarify whether multiple-session HD-tDCS neuromodulation can reduce procrastination, a generalized linear mixed-effects model (GLMM) was constructed with a full factorial design for subjective procrastination willingness (i.e., self-reported visual analog scores) and actual procrastination behavior (i.e., real-world task-completion rate before deadline). Here, sex, age and socioeconomic status (SES) were modeled as covariates of no interest. Following the standard issued by the National Bureau of Statistics (China) (https://www.stats.gov.cn/sj/tjbz/gjtjbz/), SES was divided, on the basis of per capita annual household income, into seven hierarchical tiers from 1 (poor) to 7 (rich). To obviate subjective rating bias stemming from individual daily mood, we separately measured participants’ daily emotional fluctuation at 10:00 and 16:00 using a self-rating visual analog item (i.e., “How do you feel about your mood today?”, 0 for “completely uncomfortable” and 100 for “definitely happy”). The averaged score of those self-rated emotions at the two time points was then entered into the GLMM as a covariate of no interest, yielding the final expression of “outcome ~ Group*Treatment_Day + Age + Gender + SES + Emotions + (1 + Treatment_Day | SubjectID)” in the statistical model. This analysis was implemented using the “lme4” and “lmerTest” packages. Employing the “emmeans” package, simple effects were also tested at baseline and post-last-intervention using Tukey-adjusted pairwise comparisons of estimated marginal means from the full GLMM, controlling for covariates and random-effects structure. To validate statistical robustness, instead of continuous outcomes for parametric tests, we also conducted a between-group comparison of the number of tasks on which procrastination emerged by using the nonparametric χ<sup>2</sup> test with φ correction or Fisher exact test....”

      Results Section (Page 9, Line 428-449)

      “To identify whether ms-tDCS targeting the left DLPFC can alleviate subjective procrastination willingness and actual procrastination behavior, a generalized linear mixed-effects model with the Satterthwaite algorithm was built, with task-execution willingness and actual procrastination rates (PR) as primary outcomes, respectively. For procrastination willingness, results showed a statistically significant interaction effect between multi-session neuromodulations and groups (β = -7.8, SE = 1.8, DF = 45.6, p < .001; Fig. 3A). The post-hoc simple effect analysis demonstrated a significantly increased task-execution willingness (i.e., decreased procrastination willingness) after neuromodulation in the active neuromodulation group (NM-before: 35.65 ± 30.20, NM-after: 80.43 ± 19.92, t.ratio = 5.4, p < .0001, Tukey correction), but no such effects were identified in the sham control group (SC-before: 37.57 ± 26.46, SC-after: 47.35 ± 30.49, t.ratio = 0.3, p = .77, Tukey correction) (Fig. 3B-C). A linear uptrend for task-execution willingness was further observed across multiple sessions in the active NM group, indicating gradually increasing neuromodulation effects (Fig. 3D; p < .01, Mann-Kendall test). For actual procrastination behavior, changes to actual procrastination rates across all the sessions are detailed in Fig. 3E. Similarly, a statistically significant interaction effect was identified here (β = -7.4, SE = 2.4, DF = 46.6, p = .004), and the simple effect analysis further revealed decreased actual procrastination rates after ms-tDCS in the active neuromodulation group (NM-before: 43.26 ± 39.09, NM-after: 0.00 ± 0.00, t.ratio = 5.1, p < .0001, Tukey correction), but no such prominent changes were found in the sham control group (SC-before: 46.47 ± 40.75, SC-after: 33.34 ± 37.82, t.ratio = 0.7, p = .48, Tukey correction) (Fig. 3F-G). Also, a significant downtrend for procrastination rates across all the sessions was identified in the active NM group (Fig. 3H; p < .01, Mann-Kendall test).”
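
      The session-wise trends quoted above are tested with the Mann-Kendall test. As an illustration only, a minimal Python sketch of that test follows, using the usual normal approximation with continuity correction and no tie correction. The function name `mann_kendall`, the seven session scores, and the number of sessions are hypothetical, since per-session values are not reported in this response.

```python
import math


def mann_kendall(series):
    """Mann-Kendall trend test (normal approximation, no tie correction).

    Returns (S, z, two-sided p). S counts concordant minus discordant pairs.
    """
    n = len(series)
    if n < 3:
        raise ValueError("need at least three observations")

    # S = sum over all pairs i < j of sign(x_j - x_i)
    s = sum((series[j] > series[i]) - (series[j] < series[i])
            for i in range(n - 1) for j in range(i + 1, n))

    # Variance of S under the null hypothesis when there are no ties.
    var_s = n * (n - 1) * (2 * n + 5) / 18

    # Continuity-corrected z statistic.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0

    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return s, z, p


# Hypothetical mean willingness scores over seven sessions (not from the study).
willingness = [36, 41, 50, 58, 63, 72, 80]
s, z, p = mann_kendall(willingness)
print(s, round(z, 2), p < 0.01)  # 21 3.0 True
```

      Because the test uses only the ordering of the values, it detects any monotonic trend, not specifically a linear one.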

      (8) Related to the previous point: The authors report, for example, on the first results page, line 420, an F-test as F(1, 269). This means the test has 269 residual degrees of freedom despite a sample size of about 50 participants. This likely suggests that relevant random slopes for this test were omitted, meaning that this statistical test likely suffers from inflated Type 1 error, and the reported p-value < .001 might be severely inflated. If that is the case, each observation was treated as independent instead of accounting for the nestedness of data within participants. The authors should check this carefully for this and all other statistical tests using mixed-effects models.

      Thank you for this very timely and helpful comment. As you correctly pointed out above, we did not include random slopes in the original GLMM, which risks inflating the false-positive rate (i.e., Type-I error). After adding random slopes, we reanalyzed all the statistics from the GLMM, and confirmed that all the findings remain reliable in the new GLMMs with random slopes. Again, thank you for this crucial statistical advice, and please see the above response for full details regarding what we have revised to address this comment.
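
      The reviewer's concern in comments (7) and (8) can be illustrated with a small simulation. The sketch below is not the authors' lme4 analysis and uses no study data: it generates null datasets (no true group difference in slopes) in which each simulated participant has an individual slope, and compares a naive pooled regression that treats all observations as independent against a simple two-stage test on per-participant slopes, which here stands in for a model that respects the nesting. The group size of 23 follows the reported design; the seven sessions, both standard deviations, and all names are assumptions.

```python
import math
import random
import statistics

DAYS = [d - 3 for d in range(7)]       # seven centered session indices (assumed)
SXX = sum(d * d for d in DAYS)         # sum of squared day values per participant


def simulate_once(rng, n_per_group=23, slope_sd=5.0, noise_sd=5.0):
    """Return |t| for the group-by-day interaction under two analyses (null data)."""
    slopes = {0: [], 1: []}
    resid_ss = 0.0
    for group in (0, 1):
        group_y = []
        for _ in range(n_per_group):
            true_slope = rng.gauss(0.0, slope_sd)   # participant-specific slope
            y = [true_slope * d + rng.gauss(0.0, noise_sd) for d in DAYS]
            slopes[group].append(sum(d * v for d, v in zip(DAYS, y)) / SXX)
            group_y.append(y)
        # Pooled line for this group; with balanced, centered days its slope
        # equals the mean participant slope and its intercept the grand mean.
        b = statistics.fmean(slopes[group])
        a = statistics.fmean([v for y in group_y for v in y])
        resid_ss += sum((v - a - b * d) ** 2
                        for y in group_y for v, d in zip(y, DAYS))

    diff = statistics.fmean(slopes[1]) - statistics.fmean(slopes[0])

    # Naive analysis: every row is an independent observation (df = rows - 4).
    n_obs = 2 * n_per_group * len(DAYS)
    se_naive = math.sqrt(resid_ss / (n_obs - 4) * 2 / (n_per_group * SXX))

    # Two-stage analysis: t-test on participant slopes (df = participants - 2).
    pooled_var = (statistics.variance(slopes[0]) + statistics.variance(slopes[1])) / 2
    se_subject = math.sqrt(pooled_var * 2 / n_per_group)

    return abs(diff / se_naive), abs(diff / se_subject)


def false_positive_rates(n_sims=400, seed=1):
    """Share of null datasets declared significant at the 5% level."""
    rng = random.Random(seed)
    naive = subject = 0
    for _ in range(n_sims):
        t_naive, t_subject = simulate_once(rng)
        naive += t_naive > 1.967       # approx. two-sided 5% cutoff, df = 318
        subject += t_subject > 2.015   # approx. two-sided 5% cutoff, df = 44
    return naive / n_sims, subject / n_sims


naive_rate, subject_rate = false_positive_rates()
print(f"naive: {naive_rate:.2f}, participant-level: {subject_rate:.2f}")
```

      Under these assumed variances the naive rate should come out far above the nominal 5%, while the participant-level rate stays close to it. This mirrors the reviewer's point that residual degrees of freedom in the hundreds, for a sample of about 46 participants, signal that observations were treated as independent.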

      (9) Many of the statistical procedures seem quite complex and hard to follow. If the results are indeed so robust as they are presented to be, would it make sense to use simpler analysis approaches (perhaps in addition to the complex ones) that are easier for the average reader to understand and comprehend?

      Thank you for this practical and helpful comment. In the original manuscript, we incorporated a joint model of longitudinal and survival data (JM-LSD), in conjunction with machine learning algorithms, to strengthen the robustness of our statistical findings. Nevertheless, we agree with you on this point: there is no need to complicate the analyses by repeatedly probing the same research question to increase methodological robustness, at the expense of readability and intelligibility for a broader audience. As you suggested, we have removed these complicated statistical methods and retained only the primary ones (the GLMM and the χ<sup>2</sup> cross-tab test) and a complementary one (the Mann-Kendall linear trend test). As a result, we have rewritten almost the whole Results section. Please see the specific instances below:

      Results Section (Page 9, Line 468-485)

      “Ms-tDCS changes task aversiveness and task-outcome value

      Both task aversiveness and task outcome value serve as key pathways determining whether one would procrastinate. To this end, we further utilized a generalized linear mixed-effects model to examine the effects of ms-tDCS on changes in task aversiveness and task outcome value. Task aversiveness changes across all the sessions are shown in Fig. 4A and 4C. We demonstrated a statistically significant decrease in task aversiveness and an increase in task outcome value via ms-tDCS in the neuromodulation group (Task aversiveness: interaction effect, β = -0.12, SE = 0.04, DF = 46.7, p = .002; simple effect, NM-before <sub>(AUC)</sub>: 1.13 ± 0.53, NM-after <sub>(AUC)</sub>: 1.95 ± 0.85, t.ratio = 4.5, p < .001, Tukey correction; Outcome value: β = -6.8, SE = 1.74, DF = 46.2, p < .001; simple effect, NM-before: 35.86 ± 27.82, NM-after: 73.08 ± 23.33, t.ratio = 5.0, p < .001, Tukey correction; see Fig. 4B), but not in the sham control group (Task aversiveness: SC-before <sub>(AUC)</sub>: 1.07 ± 0.51, SC-after <sub>(AUC)</sub>: 1.28 ± 0.46, t.ratio = 1.3, p = .20, Tukey correction; Outcome value: SC-before: 34.00 ± 25.17, SC-after: 40.13 ± 28.94, t.ratio = 0.8, p = .41, Tukey correction; see Fig. 4D). In the neuromodulation (NM) group, task aversiveness steadily decreased with the cumulative number of stimulation sessions, while perceived task outcome value increased significantly (see Fig. 4E-F, p < .05, Mann-Kendall test). Thus, these results provide causal evidence that neuromodulation of the left DLPFC simultaneously reduces task aversiveness and enhances task-outcome value.”

      Results Section (Page 10, Line 525-542)

      “Long-term effects of ms-tDCS

      We also conducted a follow-up investigation to test the long-term retention of the ms-tDCS effect in reducing actual procrastination. All participants except one in the neuromodulation group completed the follow-up 6 months after the last neuromodulation session (N<sub>NM</sub> = 22, N<sub>SC</sub> = 23). A GLMM was then constructed, with the PR before the first neuromodulation vs. the PR 6 months after the last neuromodulation as covariates of interest. Results showed a statistically significant group*time interaction effect (β = 16.5, SE = 9.9, p = .049). The simple-effect model demonstrated a decrease in actual procrastination rates in the active neuromodulation group 6 months after the last stimulation compared to baseline (β = -22.05, SE = 10.0, p = .038, Tukey correction; NM-before: 40.68 ± 37.96, NM-after<sub>6-months</sub>: 18.63 ± 29.80), and revealed null effects in the SC group (β = 1.26, SE = 9.78, p = .99, Tukey correction; SC-before: 46.47 ± 40.75, SC-after<sub>6-months</sub>: 47.73 ± 39.18) (see Fig. 6). Furthermore, using a nonparametric χ<sup>2</sup> test to compare differences in the number of procrastinated tasks, we still found a statistically significant reduction in procrastination frequency in the NM group 6 months after neuromodulation compared to baseline (χ<sup>2</sup> = 3.30, p = .035, NM-before: 68.18% (15/22), NM-after<sub>6-months</sub>: 40.91% (9/22)), while no significant changes were observed in the SC group (χ<sup>2</sup> = 0.11, p = .74, SC-before: 69.57% (16/23), SC-after<sub>6-months</sub>: 73.91% (17/23)). Therefore, beyond its short-term effects, the benefits of ms-tDCS neuromodulation in reducing procrastination show long-term retention.”
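
      As a reading aid for the follow-up frequency comparison quoted above, the following is a minimal sketch of a Pearson χ² computation on a 2×2 table of procrastinated vs. non-procrastinated counts. It assumes an uncorrected Pearson statistic (no continuity correction); the response does not state which variant was used, and the function name is hypothetical. Under that assumption, the counts quoted above (15/22 vs. 9/22 and 16/23 vs. 17/23) give statistics that round to the quoted 3.30 and 0.11.

```python
def pearson_chi2_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square for the 2x2 table [[a, b], [c, d]].

    Rows are time points (before, after); columns are
    (procrastinated, did not procrastinate).
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    return n * (a * d - b * c) ** 2 / denom


# Counts taken from the quoted passage: 15/22 -> 9/22 (NM), 16/23 -> 17/23 (SC).
nm_chi2 = pearson_chi2_2x2(15, 22 - 15, 9, 22 - 9)
sc_chi2 = pearson_chi2_2x2(16, 23 - 16, 17, 23 - 17)
print(f"NM: {nm_chi2:.2f}, SC: {sc_chi2:.2f}")
```

      The sketch only reproduces the test statistic; it does not compute a p-value or account for the same participants contributing to both time points.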

      (10) As was noted by an earlier reviewer, the paper reports nearly exclusively about the role of the left DLPFC, while there is also work that demonstrates the role of the right DLPFC in self-control. A more balanced presentation of the relevant scientific literature would be desirable.

      We are grateful to you for noticing the unbalanced presentation of the literature on left DLPFC. As you kindly suggested, we have added literature to support the association between self-control and the right lateralization of the DLPFC. Please see below for what we have revised:

      Introduction Section (Page 4, Line 137-143)

      “...In addition to the left lateralization, there is solid evidence indicating significant associations between self-control and the right DLPFC, particularly given that this region functions in top-down regulation, future self-continuity representation and social decisions (Huang et al., 2025; Lin and Feng, 2024; Knoch & Fehr, 2007). Nevertheless, Xu and colleagues reported null effects of anodal stimulation of the right DLPFC on either value evaluation or emotional regulation for changing procrastination willingness (Xu et al., 2023).”

      (11) Active stimulation reduced procrastination, reduced task aversiveness, and increased the outcome value. If I am not mistaken, the authors claim based on these results that the brain stimulation effect operates via self-control, but - unless I missed it - the authors do not have any direct evidence (such as measures or specific task measures) that actually capture self-control. Thus, that self-control is involved seems speculation, but there is no empirical evidence for this; or am I mistaken about this? If that is indeed correct, I think it needs to be made explicit that it is an untested assumption (which might be very plausible, but it is still in the current study not empirically tested) that self-control plays any role in the reported results.

      We truly appreciate your pointing out this conceptual weakness. Yes, you are correct: we conceptually speculate that HD-tDCS stimulation over the left DLPFC acts through self-control to change procrastination, but we did not empirically validate this component of the chain: brain stimulation→increased self-control→increased task outcome value→decreased procrastination. We did not collect data to directly measure self-control at either baseline or post-neuromodulation. Therefore, we agree with your suggestion to state this explicitly in the main text. Following this advice, we have revised a portion of the Conclusion to make clear that the role of self-control in mitigating procrastination is hypothesis-generating, and have also acknowledged this in the Limitation section:

      Abstract Section (Page 2, Line 55-57)

      “... This establishes a precise, value-driven neurocognitive pathway to account for the conceptualized role of self-control in procrastination, and offers a validated, theory-driven strategy for interventions.”

      Results Section (Page 10, Line 489-492 and 520-522)

      “Given the dual neurocognitive pathways identified above—reduced task aversiveness and increased task-outcome value—we proposed that these changes, conceptually driven by enhanced self-control via ms-tDCS over left DLPFC, account for how neuromodulation reduces procrastination. ...”

      “In summary, these findings demonstrated a mechanistic pathway underlying procrastination: self-control, conceptualized as being governed by the left DLPFC, mitigates procrastination by plausibly increasing task-outcome value.”

      Discussion Section (Page 13, Line 642-645)

      “Moreover, this study did not collect data for assessing participants’ self-control at either baseline or post-neuromodulation, thereby limiting our ability to determine whether the effects on procrastination were uniquely attributable to neuromodulation-induced changes in self-control. ...”

      (12) Figures 3F and 3H show that procrastination rates in the active modulation group go to 0 in all participants by sessions 6 and 7. This seems surprising and, to be honest, rather unlikely that there is absolutely no individual variation in this group anymore. In any case, this is quite extraordinary and should be explicitly discussed, if this is indeed correct: What might be the reasons that this is such an extreme pattern? Just a random fluctuation? Are the results robust if these extreme cells are ignored? The authors remove other cells in their design due to unusual patterns, so perhaps the same should be done here, at least as a robustness check.

      Thank you for raising this highly important and helpful comment. Indeed, we fully understand that this result is somewhat extraordinary, a fact that was equally striking to us when unblinding the data. After carefully scrutinizing the data and statistics, we are thrilled to confirm that this pattern is true. In support of this observation, we were gratified to receive numerous thank-you letters from participants who engaged in active neuromodulation. They expressed gratitude to us, and reported that they have substantially ameliorated procrastination behavior in real-life activities after completing the trial. While this does not constitute formal scientific evidence, we are also glad to see the benefits of this neuromodulation for those procrastinators.

      Two reasons could account for this pattern herein. One interpretation is to attribute this pattern to “scalar inflation”. In the present study, the procrastination rate was calculated as 1 minus the task-completion rate (e.g., 80%, 60%, 40%) by the deadline. At sessions # 6 and #7, all the participants completed their real-life tasks before the deadline, yielding a 0% (1 minus 100% completion rate) procrastination rate, without any between-individual variation. Thus, rather than there being no individual variation in procrastination, this scalar – the procrastination rate - is too insensitive to capture subtle differences per se. For instance, although participants #1 and #2 both showed a 0% procrastination rate - meaning that both completed their tasks before the deadline - Participant #1 might have completed it 3 hours before the deadline, whereas Participant #2 might have completed it only 10 minutes before. In this case, the “scalar inflation” emerges to let us perceive that both participants have equivalent procrastination rates, although participant #2 may have a higher procrastination level than #1. As conceptually defined in the field, procrastination is contextualized as “not completing a task before the deadline”. Thus, if this task is completed before the deadline, regardless of whether it was finished close to or far in advance of the deadline, this case is defined as “no procrastination”. In the present study, the primary outcome is whether a participant procrastinated on a real-life task before the deadline in real-world settings, irrespective of when she/he completed this task. Thus, this scalar - procrastination rate - fits our conceptualization of procrastination.

      Another reason is the potential accumulative effects from sequential multi-session tDCS stimulation. As shown in Mann-Kendall trend tests, the procrastination rates show a significant linear downtrend in the active neuromodulation group across sessions, even after removing sessions #6 and #7. This indicates that the improvements of going against procrastination may be sequentially accumulative along with the increase in sessions, implying a potential “dose-dependent effect”. Despite a speculative interpretation, this “dose-dependent effect” in neuromodulation has been well-documented in previous studies, showing the robustly linear association between the number of sessions and effectiveness (c.f., Cole et al., 2020; Hutton et al., 2023; Sabé et al., 2024; Schulze et al., 2018). Therefore, although this extreme pattern is somewhat extraordinary compared to previous observations, it makes sense.
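
      For readers unfamiliar with the trend test mentioned above, the following is a minimal sketch of the Mann-Kendall statistic applied to a short series of per-session values. It is an illustration under stated assumptions, not the analysis code used in the study: the five session values are hypothetical, the function name is ours, the normal approximation is used with no tie correction, and the p-value is two-sided.

```python
import math


def mann_kendall(values):
    """Return (S, z, two-sided p) for the Mann-Kendall trend test.

    Assumes no tied values; uses the normal approximation
    with the usual continuity correction of one unit on S.
    """
    n = len(values)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)  # sign of each pairwise difference
    var_s = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2))
    return s, z, p


# Hypothetical mean procrastination rates (%) for sessions 1-5,
# i.e. with sessions #6 and #7 left out as in the robustness check.
session_means = [40.0, 31.0, 22.0, 15.0, 9.0]
s_stat, z_stat, p_value = mann_kendall(session_means)
print(s_stat, round(z_stat, 2), round(p_value, 3))
```

      A strictly decreasing five-point series gives the most negative S possible for n = 5, which is why even a short series can reach p < .05 with this test.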

      Yes, it is definitely a great idea to carry out a robustness check by removing session #6, session #7, or both. We believe that this analysis guards against potential biases from extreme cells. In doing so, we found that all the group*treatment_day interaction effects remained significant when removing either session #6 or session #7 (or both; all p-values < .05), indicating high statistical robustness. Please see Supplementary Tables S3 and S4.

      Taken together, in spite of their being extraordinary, we confirm that those findings are statistically robust to extreme outliers. As you kindly suggested, we have added those findings of the robustness check into the revised Supplemental Materials section.

      References

      Cole, E. J., Stimpson, K. H., Bentzley, B. S., Gulser, M., Cherian, K., Tischler, C., Nejad, R., Pankow, H., Choi, E., Aaron, H., Espil, F. M., Pannu, J., Xiao, X., Duvio, D., Solvason, H. B., Hawkins, J., Guerra, A., Jo, B., Raj, K. S., Phillips, A. L., … Williams, N. R. (2020). Stanford Accelerated Intelligent Neuromodulation Therapy for Treatment-Resistant Depression. The American journal of psychiatry, 177(8), 716–726. https://doi.org/10.1176/appi.ajp.2019.19070720

      Hutton, T. M., Aaronson, S. T., Carpenter, L. L., Pages, K., Krantz, D., Lucas, L., Chen, B., & Sackeim, H. A. (2023). Dosing transcranial magnetic stimulation in major depressive disorder: Relations between number of treatment sessions and effectiveness in a large patient registry. Brain stimulation, 16(5), 1510–1521. https://doi.org/10.1016/j.brs.2023.10.001

      Sabé, M., Hyde, J., Cramer, C., Eberhard, A., Crippa, A., Brunoni, A. R., Aleman, A., Kaiser, S., Baldwin, D. S., Garner, M., Sentissi, O., Fiedorowicz, J. G., Brandt, V., Cortese, S., & Solmi, M. (2024). Transcranial Magnetic Stimulation and Transcranial Direct Current Stimulation Across Mental Disorders: A Systematic Review and Dose-Response Meta-Analysis. JAMA network open, 7(5), e2412616. https://doi.org/10.1001/jamanetworkopen.2024.12616

      Schulze, L., Feffer, K., Lozano, C., Giacobbe, P., Daskalakis, Z. J., Blumberger, D. M., & Downar, J. (2018). Number of pulses or number of sessions? An open-label study of trajectories of improvement for once-vs. twice-daily dorsomedial prefrontal rTMS in major depression. Brain stimulation, 11(2), 327–336. https://doi.org/10.1016/j.brs.2017.11.002

      (13) The supplemental materials, unfortunately, do not give more information, which would be needed to understand the analyses the authors actually conducted. I had hoped I would find the missing information there, but it's not there.

      Sorry to offer uninformative supplemental materials (SM) in the original submission. As you suggested, we have added a substantial number of details to clarify how we conducted data analyses in the main text, and also tightened the whole SM section to improve readability and comprehensibility. We do hope that this revised manuscript could offer clear and adequate information in understanding methods and statistics for broader readers.

      In sum, the reported/cited/discussed literature gives the impression of being incomplete/selectively reported; the analyses are not reported sufficiently transparently/fully to evaluate whether they are appropriate and thus whether the results are trustworthy or not. At least some of the patterns in the results seem highly unlikely (0 procrastination in the verum group in the last 2 observation periods), and the sample size seems very small for a between-subjects design.

      Thank you for this very helpful summary. As you suggested above, we have overhauled the manuscript to address the points listed here: we added relevant literature to balance our claims, added substantial detail to report the statistics fully and transparently, conducted a robustness check to confirm that our findings are robust to the plausibly extreme patterns (sessions #6 and #7), and justified how we determined, a priori, a sample size that fulfills medium statistical power. Please see above for full details of how we addressed these comments, point by point.

      Reviewer #2 (Public Review):

      Chen and colleagues conducted a cross-sectional longitudinal study, administering high-definition transcranial direct current stimulation (HD-tDCS) targeting the left DLPFC to examine the effect of HD-tDCS on real-world procrastination behavior. They find that seven sessions of active neuromodulation to the left DLPFC elicited greater modulation of procrastination measures (e.g., task-execution willingness, procrastination rates, task aversiveness, outcome value) relative to sham. They report that tDCS effects on task-execution willingness and procrastination are mediated by task outcome value and claim that this neuromodulatory intervention reduces procrastination rates quantified by their task. Although the study addresses an interesting question regarding the role of DLPFC on procrastination, concerns about the validity of the procrastination measures moderate enthusiasm for the study and limit the interpretability of the mechanism underlying the reported findings.

      Strengths:

      (1) This is a well-designed protocol with rigorous administration of high-definition transcranial direct current stimulation across multiple sessions. The approach is solid and aims to address an important question regarding the putative role of DLPFC in modulating chronic procrastination behavior.

      (2) The quantification of task aversiveness through AUC metrics is a clever approach to account for the temporal dynamics of task aversiveness, which is notoriously difficult to quantify.

      Thank you for taking your invaluable time to review our manuscript, warmly applauding the strength in research design and the conceptualization of scaling task aversiveness, as well as kindly sharing such helpful and insightful evaluations. As you correctly pointed out, we are aware of the absence of detailed, clear and understandable reporting of measures (e.g., real-world procrastination), statistics and methods, in the original manuscript. Following all your suggestions, we have thoroughly revised this manuscript to address those comments that you kindly made, point-by-point. Please see the full response underneath.

      Weaknesses:

      (1) The lack of specificity surrounding the "real-world measures" of procrastination is problematic and undermines the strength of the evidence surrounding the DLPFC effects on procrastination behavior. It would be helpful to detail what "real-world tasks" individuals reported, which would inform the efficacy of the intervention on procrastination performance across the diversity of tasks. It is also unclear when and how tasks were reported using the ESM procedure. Providing greater detail of these measures overall would enhance the paper's impact.

      We genuinely appreciate your raising this very crucial comment. We are sorry for omitting a tremendous number of methodological details to comply with the editorial requirement on the manuscript’s length, which hampered the comprehension of how we measure “real-life tasks” and “real-world procrastination”.

      As shown in the schematic diagram for experimental procedure (Fig. 1), the experimental protocol alternated between Neuromodulation Days (Days 2, 4, 6, 8, 10, 12, 14) and Task Days (Days 1, 3, 5, 7, 9, 11, 13, 15). On each Neuromodulation Day, participants received either active or sham HD-tDCS, and—critically—before stimulation—were instructed to specify a real-life task they were required to complete the following day, with a deadline between 18:00 and 24:00. This ensured ≥24 hours between neuromodulation and task execution, isolating offline after-effects. For instance, on Day #2 (Neuromodulation Day), before carrying out stimulation, participants were asked to report a real-life task that has a deadline within 18:00 - 24:00 for tomorrow’s “task day” (Day #3) (please see the schematic diagram in Author response image 2).

      Author response image 2.

      There are some real-life tasks that they reported in our experiment as examples: “Complete and submit a homework assignment”, “Complete a standardized English proficiency test”, “Complete an online course module required for applying a Class C driver’s license”, “Prepare slides for a seminar presentation”, “Practice guitar”, “Practice Chinese calligraphy”, and “Do the laundry”. Reported tasks spanned academic (e.g., submitting an assignment), occupational (e.g., preparing a presentation), administrative (e.g., applying for a license), self-improvement (e.g., practicing guitar for ≥30 min), domestic (e.g., laundry), and health-related domains (e.g., running ≥ 2,000m for exercise), indicating a plausible task diversity.

      On each “task day”, participants engaged in an intensive Experience Sampling Method (iESM) protocol via a custom-built mobile app. Using this app, participants were required to report a subjective task-execution willingness score (i.e., a one-item 100-point visual analog scale, “How willing are you to do this task?”, 0 for “I will definitely procrastinate this task” and 100 for “I will take action to complete this task immediately”; procrastination willingness = 100 – the task-execution willingness score), the subjective task aversiveness (i.e., a one-item 100-point visual analog scale), the subjective task outcome value (i.e., a one-item 100-point visual analog scale), and the objective procrastination rate, respectively.

      Rather than self-reported scores from those one-item visual analog scales, we asked participants to report real “task completion rate” for the objective quantification of the “real-world procrastination behavior”. Specifically, at the deadline, each participant was asked to report whether she/he had completed this task. If she/he reported not having yet completed the task (i.e. procrastination behavior emerged), she/he was further required to report the percentage of the task completed (1% - 99%), which was defined as the task completion rate. By doing so, we could calculate the real-world procrastination rate for the real-life task as the “1 – the task completion rate”. For instance, if a participant did not complete her/his real-life task before the deadline (i.e. she/he procrastinated this task) and reported completing 75% of this task at the deadline, her/his real-world procrastination rate was computed as the 25% (1 - 75%) (Please see the schematic diagram in Author response image 3).
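
      The scoring rule described above can be sketched as follows. This is a minimal illustration of the "1 minus task completion rate" definition, not the app's actual code; the function and parameter names are hypothetical.

```python
def procrastination_rate(completed, percent_done=None):
    """Procrastination rate (0.0-1.0) for one task at its deadline.

    completed: True if the participant reports the task as finished.
    percent_done: self-reported completion (1-99), required only
    when the task is not finished.
    """
    if completed:
        return 0.0  # 100% completion -> 0% procrastination
    if percent_done is None or not 1 <= percent_done <= 99:
        raise ValueError("percent_done must be between 1 and 99 for unfinished tasks")
    return (100 - percent_done) / 100


# Worked example from the text: 75% completed at the deadline -> 25%.
example_rate = procrastination_rate(False, 75)
print(example_rate)
```

      Because every task finished before the deadline maps to 0.0, the measure cannot distinguish a task finished hours early from one finished minutes early.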

      Moreover, rather than merely a self-reported task completion rate, each participant was also asked to upload proof (e.g., screenshots of submitted assignments, photos of printed documents, system timestamps) to the ESM digital system for validation.

      Author response image 3.

      To determine the sampling time points for this mobile app in the ESM, we capitalized on both the conceptual temporal decision model and the statistical Myerson algorithm. Specifically, the Temporal Decision Model (TDM) was originally proposed by our team (Xu et al., 2023; Zhang et al., 2019, 2020, 2021), which theoretically conceptualizes procrastination as the failure of the trade-off between task outcome value (i.e., motivation to take actions now for pursuing task reward) and task aversiveness (i.e., motivations for avoiding taking action now for avoiding negative experiences). Once task aversiveness overrides the pursuits of task outcome values, procrastination emerges. One overarching hypothesis in this theoretical model is that the task aversiveness is hyperbolically discounted when approaching the deadline: it would be discounted sharply when far from the deadline but discounted slowly when nearing the deadline (Zhang et al., 2019). To maximize statistical power to fit dynamic motivational curves, we employed a log-spaced temporal sampling scheme (Myerson et al., 2001) (please see the schematic diagram in https://uen.pressbooks.pub/behavioraleconomics/chapter/the-reality-of-homo-sapiens, where each point indicates a sampling time):

      By this fitting algorithm (Myerson et al., 2001), five time points were selected to fulfill the statistical prerequisites for hyperbolic model fitting, with increasing sampling density toward the deadline (e.g., for a task due at 20:00: sampled at 10:00, 16:00, 18:00, 19:30, 20:00). Once the task-specific five sampling time points were determined per participant, this mobile app sent a digital message to ask her/him to immediately report the task aversiveness and the task outcome value then. As the primary outcomes, the procrastination rate (i.e., 1 – the task completion rate) and the procrastination willingness were sampled at the deadline point.
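
      One way to obtain prompts that become denser toward the deadline is to space the remaining time geometrically. The sketch below illustrates that idea only; the response does not give the exact rule used, so the function name, the default lead times (10 h and 0.5 h) and the geometric spacing are all assumptions. With these defaults, the first, fourth and last prompts match the 20:00 example above (10:00, 19:30, 20:00), while the two intermediate prompts fall near, but not exactly on, 16:00 and 18:00.

```python
from datetime import datetime, timedelta


def log_spaced_prompts(deadline, first_lead_h=10.0, last_lead_h=0.5, n_points=5):
    """Return n_points prompt times ending exactly at the deadline.

    The time remaining before the deadline shrinks geometrically from
    first_lead_h to last_lead_h; the final prompt is the deadline itself.
    """
    if n_points < 3 or not first_lead_h > last_lead_h > 0:
        raise ValueError("need n_points >= 3 and first_lead_h > last_lead_h > 0")
    k = n_points - 1  # number of prompts issued before the deadline
    ratio = (last_lead_h / first_lead_h) ** (1 / (k - 1))
    leads = [first_lead_h * ratio ** i for i in range(k)] + [0.0]
    return [deadline - timedelta(hours=h) for h in leads]


# Hypothetical task due at 20:00 (the date itself is arbitrary).
deadline = datetime(2024, 1, 1, 20, 0)
prompts = log_spaced_prompts(deadline)
for t in prompts:
    print(t.strftime("%H:%M"))
```

      In practice the intermediate prompts could be rounded to convenient clock times, which may be how the 16:00 and 18:00 values in the example arose.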

      Furthermore, yes, we fully concur with you on this great idea, that is, transparency about task diversity strengthens the generalizability of our findings. In response, we have tabulated these real-life tasks that were reported in this experiment in the independent Appendix 1, with automatic translations from Chinese to English via Qwen GPT. Please see below for what we have added to the main text:

      Methods Section (Page 6-7, Line 238-308)

      “Nested cross-sectional longitudinal design

      This study used a nested cross-sectional longitudinal design to investigate whether multiple-session anodal HD-tDCS targeting the left DLPFC could reduce actual procrastination behavior and to probe how this effect manifests. To assess procrastination in daily life, we implemented a 15-day protocol alternating between Neuromodulation Days (Days 2, 4, 6, 8, 10, 12, 14) and Task Days (Days 1, 3, 5, 7, 9, 11, 13, 15). On the Neuromodulation Days, 20-min anodal HD-tDCS neuromodulation targeting the left DLPFC was performed for the HD-tDCS active group at intervals of 2 days, while the sham-control group received sham HD-tDCS training. This HD-tDCS training was repeated for a total of seven sessions and lasted 15 days (see Fig. 1a). Crucially, to capture procrastination in ecologically valid contexts, prior to receiving either active or sham HD-tDCS (administered between 09:00–18:00), participants were instructed to specify a real-life task they were personally obligated to complete the following day, with a self-defined deadline strictly constrained to 18:00–24:00 to ensure ≥24 hours between stimulation offset and task deadline, thereby isolating offline after-effects. This task had to meet the following three criteria: (a) it was already assigned in a real-world setting; (b) its deadline was constrained to 18:00-24:00 (see above); (c) it was likely to induce procrastination. In this way, more than 300 real-life tasks were collected, spanning academic (e.g., “submit a statistics homework assignment”), occupational (e.g., “draft and email a project proposal”), administrative (e.g., “complete online application for Class C driver’s license”), self-improvement (e.g., “practice guitar for ≥30 minutes”), domestic (e.g., “do laundry”), and health-related (e.g., “running 2,000m for exercise”) domains. The full task list is tabulated in Appendix 1.

      As primary outcomes, all participants were required to report task-execution willingness (TEW) (Zhang & Feng, 2020; Zhang, Liu, et al., 2019) for a real-life task 24 hours post-neuromodulation. Procrastination willingness was quantified as 100 - TEW score (see below for details). Furthermore, we asked participants to report the actual task completion rate (CR) at the deadline (e.g., participant A finished 90% of the homework by the deadline and reported this to us at the deadline). Accordingly, the actual procrastination rate (PR) was quantified as 1 - CR.

      On the Task Days, we used a mobile app that we developed to implement the experience sampling method (ESM) for tracking one’s real-time evaluation of task aversiveness and task outcome value (see Fig. 1). Task aversiveness describes how disagreeable one perceives performing a given real-life task to be, whereas outcome value refers to the subjective benefits of the task outcome brought about by completing the task before the deadline (Zhang & Feng, 2020). As theoretically conceptualized by the temporal decision model (TDM) of procrastination, perceived task aversiveness is hyperbolically discounted when approaching the deadline, discounting sharply when far from the deadline but slowly once nearing the deadline (Zhang & Feng, 2020; Zhang et al., 2021). Thus, considering the nonlinear dynamics inherent in this hyperbolic discounting, the five recording moments of ESM were selected per task a priori by using a log-spaced temporal sampling scheme (Myerson et al., 2001), with increasing sampling density toward the deadline, such as moments of 10:00 (earliest), 16:00, 18:00, 19:30, 20:00 (deadline). The five sampling points meet the statistical prerequisite for hyperbolic model fitting (requiring ≥ 4 points; Green & Myerson, 2004). Recording moments were therefore individually tailored for each task per participant in this ESM procedure. To control for the confounding effect of daily emotions on task aversiveness evaluation, we used the averaged PANAS scores at 10:00 (morning) and 16:00 (afternoon), collected via this ESM app, as anchoring points to quantify one’s daily emotions. Before each session of HD-tDCS training, each participant was required to report a real-life task whose deadline was the following day. To capture the long-term effect of HD-tDCS (i.e., an interval of at least 24 hours between HD-tDCS and task completion), the task deadline that participants reported was required to be between 18:00 and 24:00.

      Once a sampling time was reached, the app sent a digital message requiring participants to fill in an online form for data collection.

      Quantification of covariates of interests

      Outcome variables of this study were twofold: one is task-execution willingness and another is procrastination rate (PR). Task-execution willingness is used to evaluate one’s subjective inclination to avoid procrastination (Zhang & Feng, 2020). In this vein, we used a 100-point scale to require participants to report their task-execution willingness (0 for “I will definitely procrastinate this task” and 100 for “I will take action to complete this task immediately”). This metric was recorded 24 hours after neuromodulation to examine its long-term effects. PR is used to quantify the extent to which one task has been procrastinated, and was calculated as 1 - CR (task completion rate). Critically, at the precise deadline, the app prompted participants to (a) indicate task completion status (yes/no), and if incomplete, (b) report the percentage completed (1–99%), defined as the Task CR, while simultaneously uploading objective evidence (e.g., screenshots of submitted files, photos of physical outputs, system-generated logs, or app-exported records). If the task was actually completed before the deadline, the CR would be 100% and the PR would be calculated as 0% (1-CR). PR was recorded at the actual task deadline for each participant. We were also interested in re-investigating their actual procrastination by using PR 6 months after the last neuromodulation to test the long-term retention of this neuromodulation effect.”

      References

      Myerson, J., Green, L., & Warusawitharana, M. (2001). Area under the curve as a measure of discounting. Journal of the Experimental Analysis of Behavior, 76(2), 235–243. https://doi.org/10.1901/jeab.2001.76-235

      Xu, T., Zhang, S., Zhou, F., & Feng, T. (2023). Stimulation of left dorsolateral prefrontal cortex enhances willingness for task completion by amplifying task outcome value. Journal of Experimental Psychology: General, 152(4), 1122–1133. https://doi.org/10.1037/xge0001312

      Zhang, S., Verguts, T., Zhang, C., Feng, P., Chen, Q., & Feng, T. (2021). Outcome value and task aversiveness impact task procrastination through separate neural pathways. Cerebral Cortex, 31(8), 3846–3855. https://doi.org/10.1093/cercor/bhab053

      Zhang, S., Liu, P., & Feng, T. (2019). To do it now or later: The cognitive mechanisms and neural substrates underlying procrastination. Wiley Interdisciplinary Reviews: Cognitive Science, 10(4), e1492. https://doi.org/10.1002/wcs.1492

      Zhang, S., & Feng, T. (2020). Modeling procrastination: Asymmetric decisions to act between the present and the future. Journal of Experimental Psychology: General, 149(2), 311–322. https://doi.org/10.1037/xge0000643

      (2) Additionally, it is unclear whether the reported effects could be due to differential reporting of tasks (e.g., it could be that participants learned across sessions to report more achievable or less aversive task goals, rather than stimulation of DLPFC reducing procrastination per se). It would be helpful to demonstrate whether these self-reported tasks are consistent across sessions and similar in difficulty within each participant, which would strengthen the claims regarding the intervention.

      Thank you for raising this crucial comment. We agree that the reported effects may vary with task difficulty and task-execution proficiency, which could confound the effect of stimulation on procrastination. As you correctly note, because we collected no data on task difficulty or other relevant task characteristics, we cannot completely rule out this confound when interpreting our findings. We have therefore explicitly stated this limitation in the Discussion section.

      That said, although we lack quantitative evidence, the risk of confounding the main effects with differences in task characteristics was controlled experimentally. As reported above, all reported tasks had to meet three criteria: (a) they had already been assigned in real-world settings; (b) the deadline fell between 18:00 and 24:00; and (c) they were likely to be procrastinated. To this end, each participant was clearly instructed to report a real-life task that was likely to be procrastinated in real-world settings, and was not allowed to report easy, achievable, or cost-free tasks. Consistent with this, the reported tasks spanned academic (e.g., submitting an assignment), occupational (e.g., preparing a presentation), administrative (e.g., applying for a license), self-improvement (e.g., practicing guitar for ≥30 min), domestic (e.g., laundry), and health-related domains (e.g., running ≥ 2,000m for exercise), indicating plausible task diversity and difficulty. We also observed high within-subject task homogeneity. For instance, Participant #5 reported tasks that were almost all academic activities across all sessions. Therefore, as the task list shows (please see Appendix 1), these self-reported tasks were plausibly consistent across sessions and similar in difficulty within each participant.

      In addition, almost all participants reported that they were receiving treatment: 91.30% (21/23) in the active neuromodulation group (NM) and 86.96% (20/23) in the sham control group (SC) (x<sup>2</sup> = 0.224, p = .636), indicating that the double-blinding was effective. If participants had learned across sessions to report more achievable or less aversive task goals, their procrastination willingness and procrastination rates would have declined steadily across sessions, irrespective of whether they were in the active neuromodulation group or the sham group. However, no such decline was observed in the sham control group (Mann-Kendall test: procrastination willingness, tau = 0.60, p = .13; procrastination rate, tau = 0.61, p = .13), indicating no statistically significant learning or strategic effect on task performance. Again, thank you for this crucial comment; we hope these clarifications address it.
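
      As a transparency check, the blinding comparison above can be recomputed from the reported counts. The minimal Python sketch below assumes an uncorrected Pearson chi-square test on the 2×2 table (21 vs. 2 in the NM group, 20 vs. 3 in the SC group). The function name is illustrative, and the choice of an uncorrected test is an assumption, not a statement of the software or settings actually used.

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the upper-tail p-value is erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Rows: NM group, SC group. Columns: reported receiving treatment, did not.
chi2, p = pearson_chi2_2x2(21, 2, 20, 3)
print(f"chi2 = {chi2:.3f}, p = {p:.3f}")
```

      Under these assumptions, the recomputed values agree with the reported x<sup>2</sup> = 0.224 and p = .636.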

      Limitations Section (Page 12, Line 637-640)

      “In addition, despite instructing to report valid real-life tasks with high probabilities to procrastinate, we had not yet measured the task difficulty and consistency across sessions for each participant. Consequently, interpreting the effects of neuromodulation to mitigate procrastination as “unique contributions” should warrant cautions. ...”

      (3) It would be helpful to show evidence that the procrastination measures are valid and consistent, and detail how each of these measures was quantified and differed across sessions and by intervention. For instance, while the AUC metric is an innovative way to quantify the temporal dynamics of task-aversiveness, it was unclear how the timepoints were collected relative to the task deadline. It would be helpful to include greater detail on how these self-reported tasks and deadlines were determined and collected, which would clarify how these procrastination measures were quantified and varied across time.

      We appreciate your highlighting the importance of clarifying how procrastination was measured, which substantially helps readers interpret these findings. As reported above, the primary outcomes of this experiment were subjective procrastination willingness and the objective procrastination rate. For subjective procrastination willingness, participants used the purpose-built mobile app to report a task-execution willingness score (i.e., one-item 100-point visual analog scale, “How willing are you to do this task?”, 0 for “I will definitely procrastinate this task” and 100 for “I will take action to complete this task immediately”). Procrastination willingness was then computed as “100 – the task-execution willingness score”. For the objective procrastination rate, rather than relying on scores from one-item visual analog scales, we asked participants to report the real “task completion rate from 1% to 99%” to objectively quantify “real-world procrastination behavior”. Full details can be found in Response #1.
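
      The arithmetic behind the two primary outcomes can be sketched in a few lines of Python. The function names and example values below are illustrative assumptions and are not taken from the study's app or data; only the two formulas (100 minus the willingness score, and PR = 1 - CR) come from the description above.

```python
def procrastination_willingness(execution_willingness):
    """Return 100 minus the 0-100 task-execution willingness score."""
    if not 0 <= execution_willingness <= 100:
        raise ValueError("willingness score must lie between 0 and 100")
    return 100 - execution_willingness


def procrastination_rate(completed, percent_completed=0.0):
    """Return PR = 100% - CR, where CR is the task completion rate in percent.

    A task finished before the deadline has CR = 100 and therefore PR = 0.
    Otherwise CR is the self-reported percentage completed at the deadline.
    """
    if completed:
        return 0.0
    if not 0 <= percent_completed < 100:
        raise ValueError("an incomplete task must have CR below 100")
    return 100.0 - percent_completed


# Hypothetical participant: willingness score of 30, task 40% done at deadline.
print(procrastination_willingness(30))   # 70
print(procrastination_rate(False, 40))   # 60.0
```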

      To determine the sampling time points for quantifying the AUC, we drew on both the conceptual Temporal Decision Model and the statistical Myerson algorithm. Specifically, the Temporal Decision Model (TDM), originally proposed by our team (Xu et al., 2023; Zhang et al., 2019, 2020, 2021), conceptualizes procrastination as a failure of the trade-off between task outcome value (i.e., motivation to act now in pursuit of the task reward) and task aversiveness (i.e., motivation to avoid acting now in order to avoid negative experiences). Once task aversiveness overrides the pursuit of task outcome value, procrastination emerges. One overarching hypothesis of this model is that task aversiveness is hyperbolically discounted as the deadline approaches: it is discounted sharply when the deadline is far away but slowly when the deadline is near (Zhang et al., 2019). To maximize statistical power for fitting dynamic motivational curves, we employed a log-spaced temporal sampling scheme (Myerson et al., 2001). Following this algorithm (Myerson et al., 2001), five time points were selected to fulfill the statistical prerequisites for hyperbolic model fitting, with increasing sampling density toward the deadline (e.g., for a task due at 20:00: sampled at 10:00, 16:00, 18:00, 19:30, 20:00).

      Once the five task-specific sampling time points were determined for each participant, the mobile app sent a message at each time point asking her/him to immediately report the current task aversiveness and task outcome value. After task aversiveness had been captured at those five time points, task aversiveness discounting was calculated as 1 - A(t) / A(t_earliest), where A(t) is the task aversiveness reported at time t and t_earliest is the earliest sampling point (e.g., 10:00), serving as the reference for immediate execution. Subsequently, using GraphPad Prism (v9, 525), we estimated the AUC from those five data points based on the Myerson algorithm (Myerson et al., 2001), computed via trapezoidal integration of task aversiveness discounting over time. Under this approach, a higher AUC reflects stronger temporal discounting of task aversiveness, meaning that participants experience a faster decline in subjective aversiveness as execution is delayed, yielding lower effective aversiveness and reduced avoidance behavior. In other words, a participant who shows greater discounting of task aversiveness, as reflected by a higher AUC, experiences a more pronounced reduction in subjective aversiveness upon postponement, plausibly yielding less procrastination.
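
      The discounting and AUC computation described above can be illustrated with a minimal Python sketch. The aversiveness ratings are hypothetical, and rescaling the time axis to the 0-1 range follows the usual Myerson-style normalization; whether the original analysis normalized time in this way is an assumption. The sampling times reuse the 20:00-deadline example.

```python
def aversiveness_discounting(aversiveness):
    """Return 1 - A(t) / A(t_earliest) for each sampled rating."""
    reference = aversiveness[0]
    if reference <= 0:
        raise ValueError("the earliest aversiveness rating must be positive")
    return [1 - a / reference for a in aversiveness]


def trapezoid_auc(times, values):
    """Trapezoidal area under values, with time rescaled to the 0-1 range."""
    span = times[-1] - times[0]
    xs = [(t - times[0]) / span for t in times]
    return sum(
        (x2 - x1) * (y1 + y2) / 2
        for x1, x2, y1, y2 in zip(xs, xs[1:], values, values[1:])
    )


# Sampling times in hours for a task due at 20:00 (10:00 ... 20:00).
hours = [10.0, 16.0, 18.0, 19.5, 20.0]
# Hypothetical 0-100 aversiveness ratings at those times.
ratings = [80, 60, 40, 20, 10]

discounting = aversiveness_discounting(ratings)
auc = trapezoid_auc(hours, discounting)
print(discounting)
print(auc)
```

      In this sketch, a participant whose ratings never fall below the first one has zero discounting throughout and therefore an AUC of 0, while steeper declines in reported aversiveness produce a larger AUC.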

      Taken together, following your suggestion, we have added substantial detail to the revised manuscript on how procrastination was measured, when the data were sampled, and how the AUC was estimated. Please see Response #1.

      (4) There are strong claims about the multi-session neuromodulation alleviating chronic procrastination, which should be moderated, given the concerns regarding how procrastination was quantified. It would also be helpful to clarify whether DLPFC stimulation modulates subjective measures of procrastination, or alternatively, whether these effects could be driven by improved working memory or attention to the reported tasks. In general, more work is needed to clarify whether the targeted mechanisms are specific to procrastination and/or to rule out alternative explanations.

      Yes, we fully agree with you on this consideration: we should tone down the conclusions currently claimed in the main text, given the inherent shortcomings mentioned above. As you helpfully suggested, we have moderated our overall claims regarding the effects of multi-session neuromodulation in alleviating chronic procrastination. Please see specific instances below:

      Abstract Section (Page 2, Line 55-57)

      “... This establishes a precise, value-driven neurocognitive pathway to account the conceptualized roles of self-control on procrastination, and potentially offers a validated, theory-driven strategy for interventions.”

      Conclusion Section (Page 13, Line 657-664)

      “In conclusion, this study potentially provides an effective way to reduce both procrastination willingness and actual procrastination behavior by using neuromodulation on the left DLPFC. Furthermore, such effects have been observed for 2-day-interval long-term after-effects, and were also found for 6-month long-term retention in part. More importantly, this study identified that the ms-tDCS neuromodulation could decrease task aversiveness and increase task outcome value while, and further demonstrated that the increased task outcome value could predict decreased procrastination, a relationship conceptually driven by enhancing self-control. In this vein, the current study enriches our understanding of neurocognitive mechanism of procrastination by showing the prominent role of increased task outcome value in reducing procrastination. Also, it may provide an effective method for intervening in human procrastination.”

      Moreover, as clarified above, in addition to the objective measure of procrastination behavior, we used a one-item 100-point visual analog scale (“How willing are you to do this task?”, 0 for “I will definitely procrastinate this task” and 100 for “I will take action to complete this task immediately”) to measure subjective procrastination willingness. Results demonstrated that subjective procrastination willingness decreased significantly across neuromodulation sessions in the active group, but not in the sham control group, consistent with the observed reduction in the objective procrastination measure. We also consider it important to note that we cannot conclude that the effects of neuromodulation on procrastination arise uniquely from increased task outcome value. Because we have no measures of other factors, such as working memory and attention, we cannot rule out other neurocognitive pathways. To address this point, we have removed or rephrased such statements throughout the revised manuscript, and explicitly restricted our interpretation of this neurocognitive mechanism (i.e., increased task outcome value) to the theory-driven framework of the temporal decision model.

      Reviewer #3 (Public review):

      This manuscript explores whether high-definition transcranial direct current stimulation (HD-tDCS) of the left DLPFC can reduce real-world procrastination, as predicted by the Temporal Decision Model (TDM). The research question is interesting, and the topic - neuromodulation of self-regulatory behavior - is timely.

      Many thanks for kindly dedicating time to review our manuscript, and for the helpful comments detailed below. Thank you for appreciating the novelty of this study.

      However, the study also suffers from a limited sample size, and sometimes it was difficult to follow the statistics.

      Thank you for pointing out these crucial concerns. As you correctly note, the sample size is relatively small, but we confirm that it provides adequate statistical power to detect a medium effect size.

      For estimating the sample size, we determined the a priori effect size based on the existing work we published (Xu et al., 2023, J Exp Psychol Gen;152(4):1122-1133). In this pilot study, we identified a significant interaction effect between single-session tDCS stimulation (active vs sham) and time (pre-test vs post-test) (t = 2.38, p = .02, n = 27; 95% CI [0.14, 1.49]) for changing procrastination willingness in laboratory settings, indicating a medium effect size. Therefore, this pilot study provides supportive evidence to determine this effect size a priori.

      Using G*Power with an assumed medium effect size, we determined that a total sample size of N<sub>total</sub> = 34 could reach adequate statistical power. Please see the G*Power output in Author response image 1.

      As for the statistics, we genuinely acknowledge that the vague methodological descriptions and complex algorithms indeed complicated the understanding of the methods and statistics. To address this, echoing the comment raised by Reviewer #1, we have removed the complicated statistics and methods, and further clarified how we used the generalized linear mixed-effect model (GLMM) for statistical analysis. Please see the specific revisions below:

      Methods Section (Page 8, Line 378-403)

      “Statistics

      All the statistics were implemented by R (https://www.rstudio.com/) and R-dependent packages.

      To clarify whether multiple-session HD-tDCS neuromodulation can reduce procrastination, the generalized mixed-effects linear model (GLMM) was constructed with full factorial design for subjective procrastination willingness (i.e., self-reported visual analog scores) and actual procrastination behavior (i.e., real-world task-completion rate before deadline). Here, sex, age and socioeconomic status (SES) were modeled as covariates of no interest. As the National Bureau of Statistics (China) issued (https://www.stats.gov.cn/sj/tjbz/gjtjbz/), on the basis of per capita annual household income, the SES was divided into seven hierarchical tiers from 1 (poor) to 7 (rich). To obviate subjective rating bias stemming from individual daily mood, we separately measured participants’ daily emotional fluctuation at 10:00 and 16:00 using a self-rating visual analog item (i.e., “How do feel for your mood today?”, 0 for “completely uncomfortable” and 100 for “definitely happy”). By doing so, the averaged score of those self-rating emotions at the two time points was modeled into the GLMM as covariate of no interests, yielding the final expression of “outcome ~ Group*Treatment_Day + Age + Gender + SES + Emotions + (1 + Treatment_Day | SubjectID)” in the statistical model. This analysis was implemented using the “lme4” and “lmerTest” packages. Employing “emmeans” package, simple effects were also tested at baseline and post-last-intervention using Tukey-adjusted pairwise comparisons of estimated marginal means from the full GLMM, controlling for covariates and random-effects structure. To validate statistical robustness, instead of continuous outcomes for parametric tests, we also conducted a between-group comparison for the number of tasks that procrastination emerges by using the nonparametric x<sup>2</sup> test with φ correction or Fisher exact test.

      Regarding the 6-month follow-up investigation, this GLMM was also built to examine the long-term retention of neuromodulation on reducing actual procrastination.”

      The preregistration and ecological design (ESM) are commendable, but I was not able to find the preregistration, as reported in the paper.

      We regret that a serious technical barrier has rendered our preregistration invisible and inaccessible. OSF has disabled my account, claiming to have detected “suspicious user’s activities” in it. This has prevented access to all materials deposited in this OSF account, including the preregistration. We have contacted the OSF team, but have received no workable technical solution for recovering the preregistered report (please see the screenshot below). We suspect that this may be due to my change of affiliation to the Third Military Medical University of People’s Liberation Army (PLA).

      To address this unexpected circumstance and to ensure transparency, we have explicitly reported this case in the main text, and added a “Reconstructed Preregistration Statement” to the Supplemental Materials (SM). In addition, because a reconstructed preregistration no longer meets best practices, apart from transparently reporting this case, we have removed the statements regarding preregistration elsewhere in the revised manuscript.

      Overall, the paper requires substantial clarification and tightening.

      We are grateful for your evaluation, and we fully agree with you. In response, we have added substantial detail on how procrastination was measured, how the statistical analyses were conducted, and how real-life tasks were collected, along with other experimental materials. Please see the revisions in the Methods section of the revised manuscript. Again, thank you for those helpful suggestions.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) In the Supplemental Materials, page 4, lines 163 to 167 seem to be from a different manuscript (as the section talks about neural markers, significant clusters, and brain networks).

      We are sorry for erroneously embedding this irrelevant section here. We have removed it, and have double-checked the document to avoid such mistakes.

      (2) I'm no expert here, but some of the trace and density plots in the SOM look problematic (e.g., Figure S5 top panel). But it's not made clear to which model/analysis these plots belong, so they are not very helpful without that information.

      Thank you for bringing these potentially problematic plots to our attention. Following your suggestion, these results have been removed from the SM to improve readability and comprehensibility.

      (3) Table S1 reports side effects "from the neurostimulation" (this is also the language used in the main manuscript), but having the flu is rather unlikely to be a side effect from the stimulation, isn't it? Thus, this language is highly confusing, and when reading the main text, it's not clear that these are just life events that are most likely unrelated to the stimulation, but have the potential to affect the measured variables (i.e., ultimately, they seem a source of noise).

      We apologize for this confusing wording. Here, the “side effects” are defined as confounding effects deriving from unexpected life events that uncontrollably disrupt task execution and task performance, such as “having the flu”, or “an unexpected mandatory CCP (Communist Party of China) meeting assignment”. To avoid misunderstanding, we have rephrased “side effects” as “unexpected life events disrupting task execution” in both the main text and the SM.

      (4) The use of the English language could be improved.

      Thank you for your very practical suggestion. As you kindly suggested, we have invited a proofreading editor to edit and polish the English of the revised manuscript.

      Reviewer #2 (Recommendations for the authors):

      (1) It would be helpful to include greater detail about the ESM procedure and details of the self-reported tasks. This would help rule out potential confounds of difficulty or learning (e.g., participants may have learned to identify more achievable and less difficult tasks across the sessions, which would mean they are learning to perform the task better rather than to procrastinate less). Further elaboration on the quantification of procrastination measures would help clarify the mechanism underlying this behavior, which is important for clarifying how these effects arise and what aspect of procrastination behavior is being targeted by the tDCS intervention (and rule out alternative explanations).

      We wholeheartedly appreciate this crucial recommendation. As mentioned above, we have followed your suggestions by adding to the revised manuscript extensive detail on how real-life tasks were collected (with consistent and plausible difficulty across sessions), how sampling time points were determined, and how the metrics were quantified (e.g., subjective procrastination willingness score, objective procrastination rate, AUC of task aversiveness, and task outcome value). We believe that these revisions and clarifications were necessary and that they have substantially improved the readability and clarity of the manuscript. Please see the specific revisions and clarifications above.

      (2) It would be helpful to proofread for grammatical and spelling typos (e.g., DLPFC is spelled incorrectly in line 140, Satterwaite is spelled incorrectly in Line 415).

      Thank you for your kind suggestion. Both spelling typos have been corrected, and we have double-checked the revised manuscript to ensure no such typos remain. As you kindly suggested, we have invited a proofreading editor to edit and polish the English of the revised manuscript.

      (3) Please clarify in Figure 4 that a higher AUC is associated with lower task aversiveness (which is stated in the methods but not clearly in the figure).

      Many thanks to you for your helpful suggestion. As you kindly suggested, we have clarified this case in the figure legend.

      Reviewer #3 (Recommendations for the authors):

      I want to see the preregistration.

      Thank you for your helpful recommendation. As we replied above, a serious technical issue on OSF occurred, making our preregistration invisible and inaccessible. OSF has disabled my account, claiming to have detected “suspicious user’s activities” in it. As a result, none of the materials deposited in this OSF account, including the preregistration, can be accessed. We have reconstructed the preregistration based on archived documents, and reported it in the SM. As noted above, although this partially addresses the problem, it no longer fulfills the best practices of preregistration. Consequently, in addition to transparently reporting this case, we have removed all the preregistration statements throughout the revised manuscript.

    1. Author response:

      Both reviewers noted that some published studies question the association of HPV types with cervical cancer survival {PMIDs 36207323 and 33117670}, while others did not observe that {REFS 69-74 in Chakravarty}. We appreciate both reviewers pushing us to discuss and hypothesize (even speculate) on our finding that HPV types outside phylogenetic clade α9 (including HPV18) had more recurrences than α9 types (including HPV16). The most likely explanation is that we analyzed 225 HPV types, not just the most prevalent types. Specifically, each of the 5 recurrences in our pilot study had different HPV types (α7’s: 18, 39, 45, 70 & α5: 69). Similarly, on re-examination of the TCGA data set, we found that 80% of the 181 α9 samples had HPV16, while 52.5% of the non-α9 samples had HPV18, consistent with a broader variety of types in the latter. We note that PMID: 36207323 did assess a broad number of HPV types, but these were classified into three non-cladistic categories, HPV16, HPV18 and Other for comparison. More in line with the main point of that study, HPV18 was enriched, though not significantly, in the more pathogenic C2 group (which was defined by a deep analysis of specific genomic alterations). It can be speculated that perhaps α9 types are less proficient at effecting or interacting with some C2 characteristic(s). Overall, we suggest that these observations emphasize the importance of examining the full spectrum of HPV types including phylogenetic relationships in cervical cancers induced by these viruses.

      Reviewer #1:

      The detection of “non-tumor HPVs” was noted as a potential limitation. The highly multiplexed, HC+SEQ methodology that we use obviously detects many HPV types and thus can identify lesions with multiple HPV types as occurred in Patient 16 and in other HPV cancers. It is unclear what role multiple HPV types might play in tumorigenesis if any. Regardless of whether broad detection of HPV types proves to be a limitation or an advantage, it will be interesting. Our approach in this study focused on integration of HPV DNAs into human DNA, as this is a key event in cervical tumorigenesis. We believe that detection of clonally expanded cells with an integrated URR-E6-E7 DNA segment of any HPV type (whether high-risk, low-risk, or intermediate, or even perhaps non α-clade {PMID:40742260}) should be viewed with suspicion. For the small fraction of cervical cancers that contain only unintegrated HPV DNA, it will be interesting to see if these viral DNAs share any particular properties.

      The reviewer asked for details of the HPV DNA capture probes used. All were from the proprietary Roche Nimblegen SeqCap EZ System. They encompassed all HPV types from HPV1 through HPV225.

      The reviewer questioned why the data verifying the viral-human DNA junctions in primary tumor tissue by the orthogonal approach of PCR assays were not shown. The data summary and the approach used for PCR are in Figure 1, Table 1 and Supplementary Table 1. Only the dozens of agarose gel photographs were not shown. PCR assays that addressed key issues comparing primary and metastatic sites and confirming HPV16 + HPV18 coinfection are shown in Figure 2 and Figures 4A & 4B, respectively.

      Reviewer #2:

      The reviewer raised general issues about data quantification and statistical adequacy. Regarding data quantification, we used a strict, conservative guideline of a 10 read minimum per junction in the DNA from tumor samples. This was based on the sequence analysis pipeline design and on our requirement that some clonal expansion of cells containing specific junctions must have occurred. Extensive complications to comparing quantified read counts in different studies are detailed below in the responses to specific comments. The statistical methods used were based on the dichotomous variable of detection versus no detection of integrated HPV DNA. For this study, we also used the orthogonal method of verifying every junction by PCR with one primer in viral DNA and the other in flanking human DNA followed by Sanger sequencing. The statistical methods used were entirely appropriate for this dichotomous variable and time to event analyses. Nonetheless, we concur that quantification of HPV DNA integration would be an interesting variable to consider once carefully controlled methodologies are applied considering the issues detailed below.
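
      The dichotomous detection variable described above can be illustrated with a short Python sketch. The junction labels and read counts are hypothetical, and the function names are illustrative; only the 10-read minimum per junction comes from this response, and the sketch does not represent the actual sequence analysis pipeline.

```python
MIN_READS = 10  # minimum supporting reads per junction, as stated above


def supported_junctions(read_counts, min_reads=MIN_READS):
    """Keep only junctions supported by at least min_reads reads."""
    return {name: n for name, n in read_counts.items() if n >= min_reads}


def integration_detected(read_counts, min_reads=MIN_READS):
    """Dichotomous outcome: True if any junction passes the read threshold."""
    return bool(supported_junctions(read_counts, min_reads))


# Hypothetical junction read counts for two samples.
tumor_sample = {"junction_A": 412, "junction_B": 3}
cfdna_sample = {"junction_A": 6}

print(supported_junctions(tumor_sample))
print(integration_detected(tumor_sample), integration_detected(cfdna_sample))
```

      In this sketch, a junction with fewer than 10 supporting reads is treated as not detected, which mirrors the conservative choice to exclude low-level, subclonal variants.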

      Regarding the first point about variability in HPV-human junction number in different studies: The number of HPV DNA genome and junction read counts obtained from a sample are subject to numerous technical and biological variables. Extensive caution should be applied when comparing quantitative results among different studies, and this particularly includes the number of HPV-human DNA junctions detected. Among the factors that can be involved among different studies are the following: 1) inadequate deduplication of sequence reads; 2) “barcode-hopping” or “bleed-through” from one sample to another and thus cross-contamination of one sample with another during multiplexed short-read sequencing; 3) variation in the fraction of cells that are tumor cells in the post-clinical analysis sample of tissue obtained; 4) artifactual ligation of HPV and human DNA segments occurring at the adaptor ligation step of short-read sequencing; 5) variability in the mismatch settings of computational sequence aligners used; 6) perhaps most importantly, the level of genomic instability of each particular integration locus; and 7) subclonal variation in proliferation or survival of cells containing specific junctions within a lesion. The reviewer correctly implied that our requirement for a minimum of 10 sequence reads at each junction excludes low level, subclonal variants. Nonetheless, one tumor did have two integrations (Table 1). More importantly, we emphasize that all five tumor-recurrences at distant metastatic sites in our study had the exact same integration event as the primary tumor (determined at single nucleotide resolution at both ends). We judge this to be compelling evidence that the approach we use correctly identifies the key integration event underlying each cancer.

      Regarding the second point about ratios between genomic DNA copy numbers and junction read counts: Both human genome and HPV genome copy numbers deserve mention in regard to this issue. HPV HC+SEQ highly enriches for viral DNA, with the advantage gained of high read depth for viral sequences, but with human DNA largely excluded (except for the junction reads). Thus, ratios of junctions to the rest of the human genome cannot be assessed as they can be with whole genome sequencing methodologies. While HPV genome read depth can be ascertained with HC+SEQ reads (as in Figure 1C, 1D, 1E), and the reviewer’s suggestion raises the possibility of using junction to viral read ratios to normalize data to compare different integration loci and even perhaps different studies, there are nonetheless additional, biomedically relevant complications. HPV DNA segments are sometimes present as tandem units with or without human DNA segments in tumors (Figure 1E shows the former), and this affects the ratio of junctions to viral genomes. Thus, using the suggested ratios would require additional normalization for tandem copy numbers, and thus, it would be difficult to use them in a manner analogous to gene-specific read counts per million total read ratios in RNA-seq.

      Regarding the third point about comparing read counts from primary tumor tissue with those from cfDNA: Ours was a retrospective study using archived samples that were available, and the HPV genome coverage obtained by HC+SEQ using cfDNA varied (Table 1). Assessment of viral DNA genome and human junction reads in a quantitatively reliable manner by HC+SEQ will require application of precise collection, storage, and processing of cfDNA samples. Nonetheless, the results presented in this study, while variable among the different samples, were entirely sufficient to test the dichotomous variable analyzed. We note that this included orthogonal, PCR verification of junctions, based on the straightforward, abundant identification of the junctions by HC+SEQ in the primary tumor samples. We emphasize that examination of HPV DNA integration directly interrogates a key, likely causal event in HPV cervical tumorigenesis.

      Regarding the fourth point about many of the initial cancer samples harboring no junction breakpoints: 100% of the 16 initial, cervical, primary tumor tissue samples harbored an integration (one sample had two). The reviewer is correct that many of the initial cfDNA samples lacked HPV DNA integration as assessed by HC+SEQ and by PCR based on the junctions detected in the primary tumor tissue. We interpret this to mean that these cancers were not spilling genomic DNA containing the integrated HPV DNA into serum at sufficient levels to be detected, and judge this to be due to underlying, unidentified, biomedically-relevant effects.

      Regarding the fifth point about HPV-human DNA junctions being used as a measure of tumor heterogeneity and subclonal variation: We concur with the reviewer that this is an interesting, important issue. We noted it in the response to the “first” point (numbers 6 and 7) above. Again, one of the samples had two integrations, and this patient did not suffer a recurrence (Table 1, Figure 1). Based on our ongoing experience, to take findings of junction subclonality beyond just detection of multiple integration junctions, we believe that development of in situ, single-cell approaches is necessary to reveal the full, meaningful picture of subclonality.

      Beyond these quantitative issues that we raise in response to Reviewer #2’s comments, the Reviewers’ comments point at important, incompletely understood aspects of HPV tumorigenesis. Our finding of the identical viral DNA insertions in primary tumors and metastases points to a central, constant role for these structures in viral tumorigenesis. Nonetheless, the issues raised point to key questions concerning subclonality, detailed structures and quantification of HPV and human tandem DNA units, intrachromosomal DNA vs. ecDNA, genomic instability of integrated HPV DNA loci, and cell-to-cell variation, and what roles these might play in tumorigenesis.

      Regarding the point about cell-free DNA breakpoints, we note the field of circulating tumor DNA fragmentomics that examines the sequences and a host of structural properties of circulating DNAs derived from tumors including specific, short sequences at the ends (breakpoints) of DNA fragments circulating in blood. These are of emerging significance as biomarkers for cancer {PMIDs:40038442 and 41043439}. We note that cell-free DNA breakpoints are not synonymous with DNA junctions. We stress again that the main point of our manuscript was to investigate HPV-human DNA junctions in cfDNA, as this directly addresses a likely causal mechanism underlying HPV cervical tumorigenesis. Additional, future studies would be required to assess the effectiveness of our targeted, individualized approach relative to other aspects of fragmentomics in cervical cancer.

      In summary, we restate one of the reviewers’ points. “This study provides important foundational evidence for further evaluating the clinical utility of HPV DNA detection from cfDNA and specifically assessing for integration junctions.” Both reviewers raised thoughtful points about DNA integration and HPV tumorigenesis, and prospective studies are required to refine and evaluate clinical utility of the new findings presented here.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This interesting paper probes the problematic relationships between the classical "spiralian" taxa, i.e., annelids, molluscs, brachiopods, platyhelminths and nemerteans, and shows that the branches leading to them are so short as to be unreliable guides to their relationships. This, in turn, has important implications for how we view the origin of the animal phyla.

      Strengths:

      A very careful analysis of a famous old problem with quite significant results. The results seem to be robust and support their conclusions.

      It often passes uncommented that many different trees are published about animal relationships, yet some parts of the tree seem extremely difficult to resolve; the spiralians are perhaps the most difficult case. More recently, problems about sponges or ctenophores as sister groups to the rest of the animals have alerted us to major areas of uncertainty in large-scale phylogenetic reconstruction; this paper is a welcome reminder that other, perhaps even harder, problems exist which may be difficult to ever resolve with the (molecular) data we have.

      Weaknesses:

      The paper could have perhaps drawn out some of the implications of its results in a clearer manner.

      Reviewer #2 (Public review):

      Summary:

      The relationships among the phyla making up Spiralia - a major clade of animals including molluscs, annelids, flatworms, nemerteans and brachiopods - have been challenging from a phylogenomic perspective despite decades of molecular phylogenetic effort. Every topology uniting subsets of these phyla has been recovered with apparent support in at least one study, yet no consensus has emerged even from large-scale genomic datasets. Serra Silva and Telford set out to determine whether this instability reflects a genuine biological signal being obscured by analytical limitations, or whether it reflects a rapid, near-simultaneous origin of these phyla that has left behind in modern genomes far too little phylogenetic information to resolve. They focused deliberately on five phyla, reducing the problem to a tractable set of 15 unrooted and 105 rooted topologies, and applied a suite of complementary approaches across two independent datasets and multiple substitution models to test whether any topology is significantly preferred over alternatives.

      Strengths:

      (1) The conceptual framing of the problem is excellent, and the study makes a convincing case across several lines of evidence. By enumerating all possible topologies and demonstrating empirically that every one of the 15 unrooted arrangements has been recovered as the preferred solution in at least one published study, the authors make a strong argument about the state of the field. The use of two entirely independent datasets as a consistency check is great, and convergence between them, where it occurs, substantially strengthens confidence in the conclusions.

      (2) It is my view that the simulation framework is a particular strength. Generating data on a fully unresolved star tree and scoring those data under both correctly-specified and misspecified substitution models provides convincing evidence that the strong preference for rooting Spiralia on the flatworm branch is, at least partly, an analytical artefact driven by the exceptionally long branch in combination with compositional heterogeneity across sites. This is an important methodological demonstration with implications beyond spiralian phylogenetics, as the same issue is likely to affect other deep, long-branched lineages in the animal tree of life.

      (3) The randomised taxon-jackknifing approach is a very nice addition here. The demonstration that preferred topologies shift depending on which species happen to be sampled (even within the same phylum) is a convincing indicator of weak signal, and provides a practical caution for future studies that may report strong support for a particular spiralian arrangement based on a fixed taxon sample.

      (4) The branch-length analyses, benchmarking internal interphylum branches against the already disputed and extremely short branch uniting deuterostomes (work also by this group), are well-conceived and solid.

      (5) I think it is worth highlighting the notable intellectual honesty throughout the paper: the authors do not overstate their results, correctly acknowledging that while the unrooted topology grouping molluscs with brachiopods and flatworms with nemerteans emerges most consistently, this preference is not statistically significant under more adequate substitution models and may itself carry some artefactual component.

      Weaknesses:

      (1) The restriction to five phyla is the most significant limitation. The authors acknowledge this and give a clear computational justification, but readers should be aware that the paper's convincing conclusions apply specifically to the five focal phyla and the evidence remains incomplete with respect to spiralian phylogeny as a whole.

      (2) The treatment of substitution model adequacy, while commendably thorough for site-heterogeneous models, is necessarily bounded. The authors note that models accounting for non-stationarity, across-lineage compositional heterogeneity, or mixtures of tree histories might yield different results, and that even the most sophisticated currently available approaches have not produced consistent spiralian topologies across studies. This is not a criticism of what has been done here - the analytical scope is reasonable and well-implemented - but it means the paper cannot be read as a definitive demonstration that no model will ever resolve these relationships. The distinction between a true hard polytomy and a radiation that is effectively unresolvable given current data and methods could be drawn more sharply in the discussion.

      (3) The reticulation-aware coalescent analyses are presented somewhat briefly relative to the likelihood-based topology scoring. The finding that flatworms are recovered within a paraphyletic jaw-bearing animal clade in both summary trees - interpreted as long-branch attraction - is striking, and its implications for gene-tree-based approaches to spiralian rooting deserve more discussion than they currently receive.

      (4) The central conclusions - that interphylum branches in Spiralia are extraordinarily short, that topological preferences are strongly model-dependent and taxon-sampling-sensitive, and that an ancient rapid radiation is the most parsimonious explanation - are convincingly supported by the evidence presented. The identification of flatworm long-branch attraction as an important confounding factor in rooting analyses is itself an important and well-demonstrated result.

      Conclusion:

      This paper clearly makes an important contribution to the ongoing debate about spiralian relationships and, more broadly, to methodological discussions about how to handle anciently diversified clades where phylogenetic signal is genuinely limited. The exhaustive topology-scoring framework combined with taxon-jackknifing and simulation under unresolved trees is a valuable methodological template that could usefully be applied to other notoriously difficult nodes in the animal tree. I thoroughly enjoyed the discussion of the implications of these findings for interpreting Cambrian fossils and the evolutionary history of shells, segmentation, larval types and other characters - it is both thoughtful and thought-provoking and will be of broad interest well beyond the phylogenomics and zoology communities. From a very practical perspective, the data and scripts provided make the work useful to researchers wishing to apply similar approaches to other groups.

      Reviewer #3 (Public review):

      Summary:

      This paper addresses the controversial internal relationships within the Spiralia, a major clade of invertebrate animals including molluscs, annelids, brachiopods and flatworms.

      Strengths:

      Performs a range of empirical analyses and simulations that address the core question. Although a favoured unrooted topology finds some support, this is not strongly endorsed in the paper.

      Weaknesses:

      (1) Only considers a subset of relevant phyla (e.g. gastrotrichs are relevant to the phylogenetic position of Platyhelminthes), although how this would change the scale of the analyses (i.e. number of topologies) is addressed in the paper.

      (2) Discussion of Spiralia evolution and broader context, particularly the relevance for the fossil record. Line 448: our current understanding of the early spiralian fossil record is quite consistent with the main results of this paper. For example, there are very few claims for fossils that sit on the short branch leading to Spiralia (or Lophotrochozoa as defined here) that this paper discusses. Many of the key fossils that inform on the characters discussed in the introduction, which have unusual character combinations, have an apomorphy of one of the phyla discussed, and so are resolved as members of the stem lineages of particular phyla.

      (3) This is what you would expect with long phylum stem lineages (line 148) and a short Spiralia stem lineage. For example, the mollusc Wiwaxia has chaetae but a mollusc-like radula (Smith 2012), and the conchiferan mollusc Pelagiella has chaetae and a coiled shell (Thomas et al. 2020). The only fossil groups that are routinely discussed as belonging to the stem lineage of more than one phylum are the tommotiids, which have chaetae, segmentation and a complex mineralised skeleton (but not shells in the brachiopod/mollusc sense, see Guo et al. 2023), but they sit on the lophophorate stem lineage, a synapomorphy-rich group the monophyly of which the present paper endorses (e.g. line 435). The fossil record is consistent with the scenario presented in line 442, e.g. convergent loss or reduction of chaetae and segmentation and convergent evolution of shells in molluscs and brachiopods.

      We thank the reviewers for their kind comments. Please see below for detailed responses to all identified weaknesses.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Some minor comments that might help improve the paper:

      (1) Abstract L17. "Most analyses on the 15 unrooted trees showed a preference for the same topology but the support over other solutions was non significant" - I don't really understand this sentence in the context of the paper; it makes it sound as if the tree is, after all, well resolved! Non-significant, or not significantly better than non-significant?

      Having read the rest of the paper I see what this refers to (uT4), but still I don't understand the second clause.

      Re-written to clarify.

      (2) Introduction L31. This makes it sound as if phoronids are actually part of brachiopods, and while that was recovered by Cohen and Weydmann 2005, I'm not sure if it's really a general result. In addition, rather than using "brachiopods plus phoronids" everywhere, you could use "Brachiozoa" (Cavalier-Smith 1998, Biol. Rev).

      We have updated our text and figures to use Brachiozoa.

      (3) L36-37. Yes, but the presence of Chaetognatha in this clade is suggestive that their primitive body size is not small.

      Have made clear that chaetognaths are not all tiny.

      (4) L85. Kumar et al. may have claimed that Spiralia are as old as 670, but many other analyses would suggest a range of different results. Why choose just this one? In addition, this age seems rather incompatible with your results.

      We agree this maximum age is highly improbable (the principal point remains the deep age of the protostomes). We have used a different reference and refer to a generally acceptable minimum age only.

      (5) L88. The key part of this sentence, "proving a hard polytomy", comes at the end of a long set of references that makes it hard to connect to the lead-in "given the age of", so I would suggest rephrasing.

      Rephrased for clarity.

      (6) L109. It is unclear what this means in the context: "and even support multiple topologies".

      Re-worded for clarity.

      (7) Figure 1. Why did you choose to indicate brachiopods plus phoronids as a larval form, unlike the other clades? Perhaps it's because we don't know what the last common ancestor of the two looked like (unless P is an ingroup of B), but that's arguably true for some of the other clades as well!

      Apologies, this was laziness as we already had a line drawing of an actinotroch larva. Have improved the images in figures 1 and 5 where required.

      (8) L164. Reticulation-aware analyses. As I understand it, this would include introgression, hybridization, etc. However, incomplete lineage sorting has also been invoked, not just for Cambrian-explosion age events but also for other major radiations, such as for angiosperms and birds. How significant might ILS be for generating the results you get?

      Section title amended. Results section updated to reflect this. We now explicitly mention the potential impact of ILS and introgression on spiralian relationships in our discussion.

      Unrooted trees analysis:

      (9) L405 on. Maybe it would be worth including a figure showing the relative branch lengths of uT4. All the images of trees show similar-length branches, which gives off the wrong impression within the context of the paper!

      We understand the motivation, but we worry that showing uT4 as the sole phylogram may end up with this being interpreted by a casual reader as being the main result of the paper. Hopefully the figures with branch lengths encompass this information well enough and with no danger of misinterpretation.

      (10) L430 on. Why is this a "conservative" interpretation?

      Yes, agreed, this was not clear. Have changed to “We interpret our results as showing that…”

      (11) You mention synapomorphy accumulation time and implicitly equate shortness of branches with shortness of time. However, other options are available under varying diversification rate models (e.g. ClaDS, Barido-Sottani et al. 2023, Syst. Biol.; CET, Budd and Mann 2025, Syst. Biol.). In particular, the latter paper shows that when unusually large clades are selected for study (as is arguably the case here), then those clades are likely to have started with very high "evolutionary tempo", which speeds up all aspects of evolution, including diversification rates.

      In the Budd and Mann scenario large clades begin with high tempo of cladogenesis, high substitution rate and high diversification rate (rapid origin of new characters). This would suggest that the period of the radiation was extra rapid (even less time than in a ‘normal’ period during which smaller clades emerge) so we feel the point stands.

      (12) L449. Maybe refer to the Song et al. paper again here on scaphopods plus bivalves, as it makes the same sort of points, albeit in a slightly different context.

      We thank the reviewer for the suggestion and have added the citation where relevant.

      (13) Finally, to return to L20. You mention implications for the Cambrian fossil record, but then fail to deliver any!

      We have hopefully addressed this remark in the discussion better (at least to the extent we are qualified to).

      Yet if you are correct, then synapomorphy accumulation would unite groups of phyla, and would surely lead to a scenario highly incompatible with clock models suggesting deep origins of clades (as they would all be more fossilisable).

      Apologies but we don’t completely understand this point as ‘synapomorphy accumulation would unite groups of phyla’ is a little ambiguous. Of course, this is generally true, but our results suggest there was little opportunity to accumulate identifiable synapomorphies linking pairs, triplets or quartets of our 5 spiralian phyla.

      In addition, clock results suggest rather long periods of time leading to the phyla, which would imply that there would have to be extremely slow rates of molecular evolution to yield the short early branches here. Also, it might be worth referring to papers compatible with this view, such as Wernström, J.V. et al., EvoDevo 13, 17 (2022). https://doi.org/10.1186/s13227-022-00202-8 or some of the palaeo literature, such as Budd and Jackson 2016, Phil Trans.

      The referee refers to clock results suggesting a (deep) Ediacaran origin of Lophotrochozoa/Spiralia. We interpret the spiralian radiation itself as rapid but, in the absence of a clock analysis, we cannot comment on when it took place.

      Reviewer #2 (Recommendations for the authors):

      (My not very) Major points - as I feel this is an excellent paper.

      (1) The coalescent-based summary tree analyses warrant expansion. The recovery of flatworms within a paraphyletic jaw-bearing animal clade in both summary trees is a striking result attributed to long-branch attraction, but this interpretation would be strengthened by examining whether pruning or downweighting the longest-branching taxa within those groups affects the outcome, or by reporting per-node quartet scores more fully. This would make the reticulation-aware results more directly informative and would bring this section into better balance with the detailed likelihood-based analyses.

      We thank the reviewer for the suggestion of the expanded analyses. We have now done these, and they yielded essentially the same results as the unpruned analyses. Additionally, while not discussed, we ran the Astral analyses on the subset of gene-trees where all groups of interest (spiralian phyla and superphyletic Ecdysozoa, Deuterostomia, etc.) were monophyletic and found no changes to interphylum quartet scores beyond those due to enforced (super)phylum monophyly, with Platyhelminths still recovered within Gnathifera.

      We have expanded our description of the results slightly as well as our discussion. Location of the tables with detailed quartet scores and local posterior probabilities has been added to Fig. S1’s legend.

      (2) It would strengthen the paper to include at least a brief analysis or explicit discussion of whether any currently available models accounting for non-stationary or across-lineage compositional heterogeneity show any change in the pattern of support, even if only tested on a subset of topologies. A null result here would itself be informative and would make the conclusions more robust to the concern that unexamined model classes might behave differently.

      We thank the reviewer for the suggestion, but this represents a considerable amount of new work and we think it falls outside the scope of the present work. We have, as suggested, included this as a discussion point.

      (3) The authors note that topologies grouping flatworms with ribbon worms appear among the higher-scoring arrangements even under model misspecification in simulations. It would be helpful to comment explicitly on whether the apparent signal for this grouping should therefore be regarded with particular scepticism, or whether it survives artefact correction in any of the analyses, as this is a grouping that has appeared repeatedly in the literature and readers will want guidance on how to interpret it.

      We do state that the nemertean+platyhelminth grouping seems likely to be at the least emphasised by an artefact (as the referee points out it is common to the higher scoring trees in the star tree simulations). We state that this suggests “…that this grouping derives some support from systematic errors.” We now return briefly to this in the discussion.

      Writing and presentation

      (1) The abstract states that rooting Spiralia on the flatworm branch "is a long-branch artefact" - this is slightly stronger than the language used in the body of the paper, where the authors correctly write that this preference is "at least enhanced by" the artefact. The abstract phrasing should be softened to reflect the more nuanced conclusion in the text.

      Good point. Done.

      (2) A brief signposting sentence near the start of the Results, setting out the overall analytical logic before the individual sections begin, would help orient readers. The strategy - score all topologies, test robustness to model choice and taxon sampling, then use simulation to identify artefactual signals - is clear in retrospect but would benefit from being made explicit upfront.

      We have taken this suggestion on board. The summary seemed in the end better placed as the final part of the introduction.

      (3) Figure 3 is complex and would be easier to interpret with a brief explanatory note in the legend clarifying what a wide versus narrow range of log-likelihood scores across topologies means in practical terms for statistical resolution between trees.

      Added sentence to legend.

      Minor Corrections:

      (1) The Figure 2 legend contains a typographical error: "shorter than the short, disputed deuterostome branch" should read "shorter than."

      Done

      (2) At least one reference appears to carry a future publication year (Ishii et al., 2026) and should be verified for accuracy before final submission.

      This reference is correct per the journal’s website. We did find Google Scholar to list it as being from 2025.

      Reviewer #3 (Recommendations for the authors):

      (1) Abstract/SI definitions of Spiralia/Lophotrochozoa

      While I don't have strong feelings about this, if Spiralia is being used as an apomorphy-based name, then it still might be equivalent to Lophotrochozoa, as spiral cleavage in Gnathostomula jenneri was illustrated by Riedl (1969). Although no other studies have replicated this observation, this should at least be mentioned.

      Sorry this reference to gnathostomulid spiral cleavage was included in a longer version of the discussion of nomenclature. This was first reduced in length (which was when the mention of gnathostomulid spiral cleavage was dropped) then finally moved to the supplementary material. We have now re-included mention of this in the discussion in supplementary info.

      The SI text suggests that the name Lophotrochozoa, as used in its original form by Halanych et al. (1995), was a node-based definition, and that this name is for the sister group of Ecdysozoa. However, in that paper, the name is actually defined "as the last common ancestor of the three traditional lophophorate taxa, the molluscs, and the annelids, and all of the descendants of that common ancestor". This definition would exclude Gnathifera, and depending on the internal relationships of the non-gnathiferan phyla, may be equivalent (or not) to the usage of the name Spiralia adopted in the present paper. The perils of mixing node and apomorphy-based definitions of clades are clear, and the situation is less straightforward than the paper suggests, and (somewhat unhelpfully given the subject of the paper) may only become clearer if the relationships of non-ecdysozoan protostomes are resolved.

      We believe that the community universally understood the definition of Lophotrochozoa following the 1997 paper (by the authors who also provided the original 1995 definition). This 1997 definition included both chaetognaths and rotifers as examples of the Gnathifera. The Spiralia, in contrast, began life not even as a name for a clade but a description of a character shared by some apparently unrelated taxa – similar to a grouping of ‘carnivores’. The introduction of a new name was, we suggest, unhelpful. We hope that by defining our terms up front the meaning in the current paper is clear.

      (2) Introduction

      Line 76. Some references needed regarding claims that there was a polymeric brachiopod ancestor, e.g. Gutman (1978), Temereva and Malakhov (2011), Guo et al. (2023). Likewise for the chaetae of brachiopods, annelids and molluscs, e.g. Schiemann (2017), as it's key to trace where these ideas originated.

      Added

      Figure 1. This is a nice illustration of the uncertainty in the relationships of these groups. However, I kept checking which thumbnail image was which for nemerteans and annelids. A minor suggestion, but perhaps a polychaete instead for the annelid?

      We have replaced the rather poor image of an earthworm with a polychaete and also now include labels. We hope the improved images are more helpful. Good point.

      (3) Results

      Branch length comparison. I understand why the deuterostome stem was chosen as the branch for comparison from the point of view of phylogenetic uncertainty. However, what about the branch leading to Ecdysozoa or the branch subtending Lophotrochozoa and/or Gnathifera? Given that the short internodes are used as an argument underpinning uncertain relationships, can we be sure that Gnathifera is not nested within the group of interest, especially given that Gnathifera contains many long-branched taxa and the root may be misplaced within the group?

      We have added the Lophotrochozoa and Ecdysozoa median lengths to our plots and now discuss the lophotrochozoan branch in our results.

      Line 249. Given that Spiralia is the group of interest, why were the Gnathiferans also chosen at random?

      The point of the experiment was to see the effect of taxon sampling on the consistency of the resulting topology. Random sampling across the tree seems helpful in this context. We chose Gnathifera as one group to sample from as this ensured they would be present in all trees. This seems appropriate as they are the sister group of the clade of interest and as such their inclusion reflects a choice a typical investigator might make when choosing which species to include. Additionally, as noted in the reviewer’s earlier comment, Gnathifera includes many long-branched taxa and we wanted to ensure our root-placement results were robust to this aspect of taxon sampling.

      (4) Discussion

      Line 448. Our current understanding of the early spiralian fossil record is quite consistent with the main results of this paper. For example, there are very few claims for fossils that sit on the short branch leading to Spiralia (or Lophotrochozoa as defined here) that this paper discusses. Many of the key fossils that inform on the characters discussed in the introduction that have unusual character combinations have an apomorphy of one of the phyla discussed, and so are resolved as members of the stem lineages of particular phyla.

      This is what you would expect with long phylum stem lineages (line 148) and a short Spiralia stem lineage. For example, the mollusc Wiwaxia has chaetae but a mollusc-like radula (Smith 2012), and the conchiferan mollusc Pelagiella has chaetae and a coiled shell (Thomas et al. 2020). The only fossil groups that are routinely discussed as belonging to the stem lineage of more than one phylum are the tommotiids, which have chaetae, segmentation and a complex mineralised skeleton (but not shells in the brachiopod/mollusc sense, see Guo et al. 2023), but they sit on the lophophorate stem lineage, a synapomorphy-rich group the monophyly of which the present paper endorses (e.g. line 435). The fossil record is consistent with the scenario presented in line 442, e.g. convergent loss or reduction of chaetae and segmentation and convergent evolution of shells in molluscs and brachiopods.

      We accept these points (though are clearly not experts on these fossils). We have (slightly tentatively given our lack of expertise) expanded our discussion to include these fossil taxa with their combinations of characters.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Kim and Parsons present a timely overview of the NTR/prodrug system and its applications in regenerative biology research, with particular emphasis on tissue-specific cell ablation. The system has substantially advanced the field by enabling non-invasive, conditional cell elimination, and has proven especially powerful in zebrafish, though applications in other classical model organisms are also noted. The review covers the historical origins of the NTR system, its use in regeneration studies, small molecule screening, and genetic and CRISPR-based screening, as well as future directions, including the development of the highly efficient NTR2 enzyme variant.

      Strengths:

      This is a useful and well-structured contribution. The manuscript is a valuable resource for the regeneration biology community.

      Weaknesses:

      The impact and scientific value of this paper could be meaningfully enhanced by addressing several points outlined below. The concerns centre on completeness, conceptual precision, and the depth of mechanistic discussion.

      (1) Title: Species specificity.

      Given that the review's primary focus is the zebrafish model, it would be appropriate to include the species name in the title. This would improve discoverability and accurately set the scope of the article for prospective readers.

      Thank you for this suggestion. In revising the review, we have substantially expanded the content to address the reviewers' comments, including adding more detail on the use of NTR in other species. We agree that the majority of published work, and the research we cover, has been conducted in zebrafish, and we have clarified this in the abstract and introduction. However, our aim in writing the review was also to highlight that there is no intrinsic barrier to adopting this technique more broadly in other systems. Notably, NTR was first developed in mice, but with a prodrug that proved difficult to use, and it was not widely pursued. In mouse models, the development of DTR offered an alternative, though that approach carries risks of kidney toxicity and is incompatible with chronic ablation due to immunogenicity. Given this context, we would prefer to retain a title that does not limit the scope exclusively to zebrafish, so as not to discourage readers working in other model systems who might benefit from considering the NTR system.

      (2) Subchapter: Physical injury.

      The subchapter enumerates different types of physical injury models but would benefit from a more substantive comparative discussion. In particular, the authors are encouraged to address the following:

      (2.1) Outcome comparison: Surgical and other invasive approaches cause damage to entire tissue structures comprising multiple cell types, whereas tissue-specific genetic ablation eliminates a defined cell population while leaving the surrounding architecture largely intact. This fundamental distinction has direct implications for the interpretation of regenerative outcomes and should be clearly articulated.

      We appreciate the reviewer raising these important points, as well as those noted in Section 2.2. We addressed the concerns from Sections 2.1 and 2.2 throughout multiple parts of our review, specifically in the following sections:

      • Physical injury – where we highlight the importance of precisely characterizing the nature and extent of tissue damage in order to appropriately interpret subsequent biological responses.

      • Chemogenetic cell-specific ablation – where we expand on this theme by discussing the advantages of selectively eliminating discrete cell populations and how this improves mechanistic interpretation of regeneration.

      • Development of NTR as a suicide gene – where we examine apoptotic pathways and their relevance to nitroreductase-mediated cell ablation.

      • NTR/prodrug systems in regenerative studies – where we compare what is currently known about immune activation and inflammatory responses across different NTR-based ablation paradigms.

      (2.2) Inflammatory response: Invasive injuries typically trigger a robust inflammatory response, which itself can be a potent driver of regeneration. By contrast, genetic cell ablation may elicit a qualitatively different inflammatory reaction. A comparative discussion of this distinction would help readers appreciate a critical limitation of genetic ablation systems relative to models of natural, accidental tissue damage.

      Please see our response to point 2.1 above.

      (3) Subchapter: Cell-specific toxins.

      This subchapter would benefit from several targeted expansions:

      (3.1) Off-target effects: The authors should include evidence that the exemplified drugs have known off-target activities, with a discussion of how these confounded the interpretation of experimental data. At least a few concrete published examples should be cited.

      Thank you very much for the comments. We have strengthened the discussion of off-target effects by adding concrete published examples. We now note that MPTP/MPP⁺ can affect noradrenergic and serotonergic systems in addition to dopaminergic neurons, that aminoglycoside antibiotics can damage support cells and afferent neurons at higher concentrations with compound-specific differences in ototoxicity, and that streptozotocin exhibits hepatotoxicity beyond pancreatic β-cells.

      (3.2) Completeness of the toxin list: The current list appears illustrative rather than comprehensive. A more complete enumeration would be valuable, particularly for neurotoxins and drugs targeting sensory cells, as these are highly relevant to the zebrafish regeneration field.

      We have now consolidated the toxins discussed throughout the review into Table 1, which includes additional entries alongside the previously listed agents. We have explicitly noted that this list is representative rather than exhaustive, as the full range of cell-specific toxins used across species is extensive.

      (3.3) Interspecies differences: It would be informative to specify whether drug specificity differs across species, as this is a practical consideration for researchers working in organisms other than zebrafish.

      We appreciate the reviewer’s question regarding potential interspecies differences in prodrug performance. Early work using NTR in mammals was conducted in mice, and all five published mouse studies relied exclusively on CB1954. No other NTR-activating prodrugs have been reported in mouse models, so direct comparisons are not available. Likewise, all published Xenopus studies used MTZ and thus do not provide internal comparisons across prodrugs. The Nematostella study employed NFP (citing rationale from a zebrafish study) and the approach yielded effective ablation.

      The only non-zebrafish study that directly compared prodrugs is the Drosophila work, which evaluated MTZ, RNZ, and NFP and reported lower activity for MTZ relative to the other compounds. Because it is not clear whether the authors were aware of the batch variability of MTZ or the need for freshly prepared solutions, interpreting this specific comparison is difficult.

      To address the reviewer’s comment, we have expanded the section on non-zebrafish organisms to clearly state which prodrug was successfully used in each species. However, given the limited number of studies, the absence of titration experiments, and the lack of standardized conditions across laboratories, we do not feel that the available evidence supports drawing conclusions about interspecies differences in prodrug performance.

      Consistent with our original discussion and based on the broader biochemical and empirical data available, we continue to recommend RNZ as the starting point for new experiments.

      (4) Subchapter: Optogenetic cell ablation.

      The authors note that optogenetic cell ablation has not yet been applied in conventional regeneration studies. It would strengthen this section to include a discussion of the underlying reasons for this gap, whether technical or biological, so that readers can appreciate the barriers and potential for future adoption.

      We thank the reviewer for this helpful suggestion. As recommended, we have added a concise, explicitly speculative statement discussing potential technical factors that may explain why optogenetic cell ablation has not yet been widely applied in regeneration studies. Specifically, we note that KillerRed-based ablation requires localized light delivery and ROS generation, making it best suited for discrete, optically accessible cells and less practical for targeting large or deep tissues. We also highlight that the dependence on microscopy-based illumination inherently limits throughput. This new text clarifies possible barriers to broader adoption while acknowledging that these points remain speculative.

      (5) Terminology: "Suicide gene".

      The application of the term "suicide gene" to nitroreductase is conceptually imprecise and merits reconsideration. Strictly speaking, a suicide gene is one whose expression alone is sufficient to kill the cell, as in the case of genes encoding direct triggers of apoptosis or the catalytic A subunit of diphtheria toxin (DTA). NTR does not meet this criterion: it requires the exogenous administration of a prodrug (e.g., metronidazole) to produce a cytotoxic metabolite and is therefore only conditionally lethal.

      It is worth noting that nitroreductases evolved in bacteria and fungi as enzymes involved in chemoprotection and detoxification, converting potentially toxic and mutagenic nitroaromatic compounds into less harmful metabolites (PMID: 18355273). This biological context further underscores that NTR is not inherently a lethal protein. The authors are encouraged to replace or qualify the term "suicide gene" and instead adopt terminology that more accurately reflects the conditional, prodrug-dependent nature of the system.

      We appreciate the reviewer’s thoughtful attention to terminology. We agree that, in its most classical and stringent sense, a suicide gene is one whose expression alone is sufficient to induce cell death. We also recognize that NTR does not meet this strict criterion.

      At the same time, we note that the term has broadened in contemporary usage, particularly within applied and translational contexts, to encompass prodrug-dependent systems. For example, the National Cancer Institute Thesaurus defines a suicide gene as “a gene which will cause a cell to kill itself, typically through interaction with a prodrug,” and Taber’s Medical Dictionary likewise states that it is “a gene that causes a cell to kill itself, usually by encoding an enzyme that converts a nontoxic prodrug into a toxic metabolite.” Under these widely used definitions, NTR is included within the scope of suicide gene systems.

      Nevertheless, we appreciate that terminology in this area is not universally standardized. To ensure clarity for all readers, we have added a brief definition in the revised manuscript explicitly noting the conditional, prodrug-dependent nature of NTR-mediated ablation. We are grateful to the reviewer for prompting this clarification.

      (6) NTR/MTZ in regenerative studies: Mechanistic depth.

      While the review catalogues several studies employing the NTR/MTZ system, it lacks mechanistic depth regarding the cellular basis of ablation. The following questions should be addressed, where evidence exists in the literature:

      (6.1) Temporal dynamics of cell death: What is known about the kinetics of NTR/MTZ induced lethality across different tissue types in larval and adult zebrafish, as well as other organisms? Are there age- and tissue-specific differences in the speed or completeness of ablation?

      Thank you for this important question. We have added text noting that the kinetics and completeness of NTR/prodrug-mediated ablation vary across experimental contexts, including with differences in NTR expression, enzyme/prodrug pairing, dose, cell type, and developmental stage. Published studies illustrate that the time course of ablation can differ substantially between models. Because most studies were designed to optimize ablation within individual tissues rather than for direct side-by-side comparison, the literature does not yet support broad quantitative conclusions about age- or tissue-specific differences across systems.

      (6.2) Mechanism of cell death: What is the cellular basis of NTR/MTZ-induced cytotoxicity in zebrafish? In particular, do the toxic metabolites preferentially cause mitochondrial damage or nuclear DNA damage, and what downstream death pathways are engaged?

      Thank you for the comments. We have added text discussing the mechanism of NTR/MTZ-induced cell death. We now note that NTR-mediated reduction of MTZ generates reactive intermediates that cause DNA damage and oxidative stress, with cell death occurring predominantly through apoptosis. We have also more strongly emphasized that in dopaminergic neurons, mitochondrial damage was identified as the primary cytotoxic mechanism. We acknowledge that the relative contribution of these pathways is likely to vary by cell type and remains an important area for future study.

      (6.3) Proliferative versus post-mitotic cells: Are proliferating and non-proliferating cells equally sensitive to the NTR/MTZ system, or does the proliferative status of a cell influence susceptibility? This is a practically important question for researchers designing ablation experiments in tissues with mixed cell populations.

      We appreciate the reviewer’s insightful question. We have now added a brief clarification to this section explaining that the NTR/MTZ system has been shown to act in a cell-cycle–independent manner, and both proliferating and post-mitotic cells can be ablated effectively.

      (6.4) Ablation of progenitor cells: Are there published examples demonstrating that co-ablation of differentiated functional cells and organ-specific progenitor cells abolishes regenerative capacity? Such examples would be highly informative in illustrating the system's power to dissect the cellular requirements for regeneration.

      To our knowledge, the zebrafish lateral line currently provides the clearest example in which NTR-mediated ablation of progenitor populations results in a loss of regenerative capacity. In this system, targeted ablation of support-cell progenitors severely reduces hair-cell regeneration, illustrating how NTR enables direct testing of cellular requirements for tissue repair.

      Addressing the points above, particularly the comparative discussion of injury models and inflammatory responses, the clarification of terminology, and the mechanistic discussion of NTR/MTZ-induced cell death would substantially strengthen the review's scientific contribution and utility.

      Reviewer #2 (Public review):

      Summary:

      Kim and Parsons reviewed the nitroreductase (NTR)/prodrug system: when engineered cells expressing the enzyme NTR are treated with prodrug (e.g. metronidazole), NTR converts the prodrug into a cytotoxic compound that kills these cells. The review covers how the system has been developed, spatiotemporal control of targeted cell ablation, and its broad utility to study regenerative mechanisms, model human diseases, and screen chemicals to discover pro-regenerative and protective compounds. They further discussed the newer version of NTR, a more potent prodrug, and experimental design, which not only expands the possible utility of the NTR/prodrug system, but also allows the research community to develop a precise, reproducible and versatile platform.

      Strengths:

      The review summarizes landmark applications of the NTR/prodrug system as well as recent studies, with a focus on the model organism zebrafish. The review provides a good gateway to understanding the system and considering regenerative studies.

      Weaknesses:

      No weaknesses were identified by this reviewer.

      Reviewer #3 (Public review):

      Summary:

      This manuscript by Kim and Parsons presents an overview of the nitroreductase/metronidazole (NTR/MTZ) cell ablation system.

      Strengths:

      This manuscript nicely places the NTR/MTZ system in the context of other cell ablation methods, with a discussion of their respective advantages and disadvantages. This review is particularly useful for highlighting the many ways the NTR/MTZ system has been applied to study the regeneration of multiple cell types and to model different degenerative human diseases. The review concludes with a discussion on recent improvements made to the system and practical considerations and "best practices" for NTR-based experiments. This review could be a helpful resource, especially for researchers new to regeneration or cell ablation studies.

      Weaknesses:

      Although the NTR/MTZ system has been used in other model organisms, this review is primarily focused on its uses in zebrafish. While this is understandable given the wide adoption of NTR/MTZ in the zebrafish field, discussion of the unique considerations and/or challenges for non-zebrafish systems would be an interesting addition and could broaden the potential audience for this review. Additional minor revisions, as suggested below, could also improve readability.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Since the lab mouse is an important mammalian model system, with certain tissues harbouring some regenerative capabilities, including the peripheral nervous system (e.g., sciatic nerve regeneration after crush), and myelin, etc., it would be great if a section could be included to discuss the potential adoption of the NTR/prodrug system in future mouse studies.

      We appreciate the reviewer’s suggestion to discuss the potential future use of the NTR/prodrug system in mouse models. In surveying the literature, we identified only five mouse studies employing NTR, all of which used CB1954. These early studies were conducted primarily as proof-of-principle work in the context of gene-directed enzyme prodrug therapy (GDEPT) for cancer, rather than for regenerative or lineage-specific ablation applications. We added this point to the text.

      Since those reports, we have not found additional examples of NTR use in mice. We do not know the precise reasons for this limited adoption, but it may reflect the availability of alternative ablation systems that are widely established in mouse research, such as the diphtheria toxin receptor (DTR) system.

      We agree that certain mouse tissues exhibit regenerative capacity and that targeted ablation tools can be valuable in such contexts. To address the reviewer’s point, we have added text noting the very limited historical use of NTR/CB1954 in the mouse. We have no explanation as to why no one moved on to using NTR/MTZ in the mouse, but we note in two places in the text that DTR is the preferred method for mouse ablation experiments (even though DT does cause kidney damage and is incompatible with chronic studies!).

      Minor:

      (1) Line 174-176, the sentence was repeated.

      (2) Figure 1, for the transgenic line, please be consistent with the line name in italics.

      Reviewer #3 (Recommendations for the authors):

      (1) In the abstract as well as in the main text, the authors note that the NTR/MTZ system has been used in multiple model systems. Yet, most of the review, and especially the practical advice given at the end, is very zebrafish-focused. Although this is understandable given the wide adoption of NTR/MTZ in the zebrafish field, the authors might consider revising the abstract to make it clearer that this review is primarily concerned with the use of the NTR/MTZ system in zebrafish.

      Thanks for the suggestion. We have changed the last half of the first paragraph of the abstract.

      That said, a brief discussion of any unique considerations and/or challenges for non-zebrafish systems would be an interesting addition and could broaden the potential audience for this review.

      Agreed. We have expanded the text in several places to discuss the NTR system in non-zebrafish organisms in more detail, and in particular we have expanded our discussion of NTR in the mouse.

      (2) Line 176: There is a repetition of the sentence, "NTR/MTZ-mediated ablation has also been adapted for other model organisms."

      Found and deleted. Thank you!

      (3) Line 177: To improve clarity, the authors should include species names to prevent confusion. For example, both Xenopus laevis and Xenopus tropicalis are commonly used model organisms. Similarly, multiple Drosophila species are used by researchers.

      Added melanogaster and laevis to text.

      (4) Can the authors address whether alternatives to MTZ (RNZ, etc.) have the same issues with batch-to-batch variability? That might be an important consideration for potential users. It would also be useful to include practical guidance for accounting for batch variability, for example, how to determine optimal prodrug concentrations, whether effective concentrations need to be determined for every batch/replicate/experiment, etc.

      We have added text noting that it is not yet known whether RNZ exhibits batch-to-batch variability similar to MTZ, as this has not been systematically reported. Given the potential for variability, it would be prudent for researchers to titrate each new batch of RNZ or, alternatively, adopt a dosing strategy that exceeds the minimum effective concentration to ensure consistent ablation results.

      (5) For the last section ("Experimental design: Practical and technical considerations"), readability would be improved by applying a consistent bullet point format.

      Made the changes as requested.

      (6) Figure 1: Asterisks are not defined.

      The asterisks were meant to link two boxes depicting the same transgene without rewriting the name of the transgene. Clearly, this wasn’t clear, so we have also added an explanation to the legend.

      (7) Figure 3: Given that the schematics specify expression of NTR1 and NTR1.1, I assume this figure is adapted or based on previous published report(s). If so, the reference(s) should be noted in the figure legend or on the figure itself (as done for Figure 1). If the schematic is meant to depict only in general terms how binary expression vectors can be used, a more inclusive "NTR" label might be less confusing.

      Changed figure legend and figure

      (8) Figure 4: To improve readability and accessibility, the authors should consider modifying panels C-N to use a more colorblind-friendly palette (e.g., green/magenta) or to present each channel as separate grayscale images.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Weaknesses of the methods and results:

      - Line 162: need to establish and verify the PKH26-labeled TSL cells were unaffected by the dhh-/- environment. No data to support the claim that they were unaffected.

      We thank the reviewer for this important comment. In dhh<sup>-/-</sup> recipient testes, PKH26-labeled TSL cells were observed within the interstitial compartment (Fig. 3C3). Importantly, these PKH26-positive cells could be induced by SAG treatment to differentiate into Cyp11c1-positive steroidogenic cells (Fig. 3E3), indicating that they remained viable in the dhh<sup>-/-</sup> environment.

      We have revised the Results section (line 171–173) to “These results suggest that SLC differentiation is inhibited, whereas the survival and engraftment of PKH26-labeled TSL cells were not affected in dhh<sup>-/-</sup> XY tilapia testes.”

      - The rescued phenotype caused by the addition of ptch2-/- to the dhh-/- model is compelling. To further define potential ptch1 contributions, it would be helpful to examine the expression level of ptch1 in the context of the ptch2-/- and ptch2-/-;dhh-/- mutant animals. Any compensatory increase in ptch1 in either case, without obvious phenotype changes, would support the dominant role for ptch2.

      We thank the reviewer for this valuable suggestion. We have now performed RT-qPCR analysis of ptch1 expression in XY testes from WT, ptch2<sup>-/-</sup> and dhh<sup>-/-</sup>;ptch2<sup>-/-</sup> fish at 90 dah. As shown in Fig. S8, no significant differences in ptch1 mRNA levels were detected among these genotypes, indicating that loss of ptch2 does not induce compensatory upregulation of ptch1 at the transcriptional level under the conditions examined. We have revised the Discussion section (line 277–290) to “The specificity for Ptch2 in this context might stem from unique co-receptor interactions or expression patterns within the testicular niche. To preliminarily assess potential compensatory regulation, we examined ptch1 expression in XY testes from WT, ptch2<sup>-/-</sup> and dhh<sup>-/-</sup>;ptch2<sup>-/-</sup> fish at 90 dah. No significant differences in ptch1 mRNA levels were detected among these genotypes (Fig. S8), suggesting that loss of ptch2 does not trigger compensatory upregulation of ptch1 at the transcriptional level under the conditions examined. Nonetheless, global ptch2 mutation affects multiple tissues, whereas our mechanistic focus is on SLC differentiation within the testicular niche. Moreover, the early embryonic lethality of global ptch1 mutation in tilapia (Liu et al., 2024) precludes direct assessment of its role in postnatal testis development. Therefore, although our findings strongly support a predominant role for Ptch2 in mediating Dhh signaling in SLCs, definitive resolution of receptor specificity will require future Leydig cell-specific conditional knockout models.”

      - Activity of individual gli factors needs additional reconciliation. The expression profiles for both alternative gli factors should be quantified in each knockout cell line to establish redundancy and/or compensation.

      We agree that quantifying the expression of alternative gli genes might be informative. In the present study, TSL-gli1<sup>-/-</sup> cells completely lose responsiveness to Dhh stimulation in the 8×GLI luciferase assay, whereas TSL-gli2<sup>-/-</sup> and TSL-gli3<sup>-/-</sup> cells retain normal pathway activation (Fig. 5B), which indicates that Gli1 is the principal transcriptional effector in tilapia SLCs under our experimental conditions. Redundancy and/or compensation among the alternative gli factors will require further genetic dissection in future studies.

      - Figure 5E: An important control is missing that includes evaluation of HEK293 cells transfected with pcDNA3.1-OnGli1 without the addition of pGL3-sf1.

      We don’t think HEK293 cells transfected with pcDNA3.1-OnGli1 without the addition of pGL3-sf1 is an important control in our study. In the dual-luciferase assays, we think pcDNA3.1 + pGL3 (empty reporter) and pcDNA3.1 + pGL3-sf1 controls were sufficient.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Recommendations for improving the writing and presentation; minor corrections:

      - Include Park paper (Endocrinology 2007) somewhere near line 73. Need to acknowledge this paper as it is one of the first to connect Dhh to Sf1.

      We have now included the citation of Park et al. (Endocrinology 2007) in the Introduction (now line 81).

      - Include Kothandapani paper (PLoS Genetics 2020) somewhere near line 86. Need to acknowledge this paper as it is the only one to reconcile the data showing no difference in Gli1 or Gli2 knockouts, but loss of Leydig cell function due to Gli3 activity.

      We have now included the citation of Kothandapani et al. (PLoS Genetics 2020) in the Introduction (now line 97).

      - Please include sequences of B1 and B2 in sf1 promoter, how conserved are they to the canonical Gli binding sequence?

      We have revised the Results section (line 216–218) to “Functional annotation of its promoter region identified two conserved Gli1-binding motifs, B1 (AACCACCCA) and B2 (GAGCCACCCA)”.

      - Figure 1 or results text: please clarify that the dhh-/- model used is the delta13bp mutation.

      We have clarified in the Results section (line 133) that the dhh<sup>-/-</sup> model corresponds to the 13-bp (CAGGGATGCGGAC) frameshift deletion.

      - Figure 5E legend: please clarify that HEK293 cells are used

      We have revised the Figure 5E legend to explicitly state that the dual-luciferase reporter assays were performed in HEK293 cells. Revised legend sentence (line 743–746): HEK293 cells were co-transfected with pRL-TK, pGL3, pcDNA3.1, pGL3-sf1, pcDNA3.1-OnGli1, and the indicated cold probe constructs, and luciferase activity was measured 48 hours post-transfection.

      - Figure S5E: * indicates the heteroduplex; it seems that there is a heteroduplex highlighted with the asterisk at ~600bp size; based on homozygous and mutant bands, it seems the asterisk should be highlighting the duplex near those sized bands. What are the bands up at ~600bp?

      We thank the reviewer for the careful observation. In Figure S5E, the bands observed at approximately ~600 bp represent heteroduplex products formed during the re-annealing of PCR amplicons derived from heterozygous individuals. During denaturation and re-annealing, WT and mutant strands can pair in different configurations, generating distinct heteroduplex conformations that migrate more slowly than homoduplex products in PAGE. As a result, two heteroduplex bands are visible at ~600 bp, reflecting alternative mismatched duplex structures. The homoduplex WT and mutant bands are indicated separately by arrows.

      - Figure S7F: dhh-/- data are missing

      We thank the reviewer for pointing out this omission. The missing dhh<sup>-/-</sup> dataset has now been added to Figure S7F, and the figure has been updated accordingly.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Comments on revised version:

      The authors have appropriately addressed my comments and questions from the initial review process. My remaining concern relates to the lack of evidence to confirm proteasomal inhibition by lactacystin in both promastigotes and amastigotes. The immunoblotting experiment newly presented does not reveal a clear increase in the levels of poly-ubiquitylated proteins in treated parasites. In fact, poly-Ub levels were lower at both the 4h and 18h timepoints of treatment. If alternative antibodies or additional immunoblots are not available, the manuscript would benefit from an expanded discussion of this observation and potential explanations. In particular, the interpretation that lactacystin stabilizes ama- and pro-specific degradation would be greatly strengthened by such validation.

      Reviewer #2 (Public review):

      General comments on the revisions:

      My view is that the authors have made significant, satisfactory changes that address the comments and queries I made on the original manuscript (Review Commons).

      There are two areas where the authors had to make major changes/justifications and where further comment is merited:

      RNA-seq.

      The most significant issue was the originally underpowered RNA-seq which had only two replicates. This has been repeated with four replicates now. This has not led to changes in the interpretation of the data between the original study and this one. One comment that the authors make in the response to this was: "Given the robustness of the stage-specific transcriptome, and the legal constrains associated with the use of animals, we chose to limit the number of replicates to the necessary". Ensuring that animal experiments are properly powered and that maximum robustness of the data from the minimum sample size is an important part of experimental design for ethical use of animal models. Essentially the replication here could have been avoided if the original study had used 1 more animal. However, the new version of RNA-seq brings appropriate confidence to the interpretation of the data.

      Phosphoproteomics.

      The authors provide a robust justification of their strategy for the phosphoproteomics and highlight the inclusion criteria for phosphosites: "Phosphosites were only considered if detected with high confidence (identification FDR<1%) and high localisation confidence (localisation probability >0.75) in at least one replicate". The way missing values were dealt with is explained: "For statistical analyses, missing values within a given condition were imputed with a well-established algorithm (MLE) only when at least one observed value was present in that condition." This fills in some of the gaps I was missing from the original manuscript, and I am satisfied that the data analysis is entirely appropriate for a discovery/system-based approach such as this one. The authors also edit the manuscript to reflect that "occupancy" or "stoichiometry" might not be the best description of what they were presenting and switched to the terminology of "normalised phosphorylation level" - I think this is an appropriate response.
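
      The inclusion and imputation rules quoted above can be read as two simple predicates. The sketch below is a hypothetical illustration only, not the authors' pipeline: the record layout, the field names (`fdr`, `loc_prob`) and the function names are assumptions, and it assumes that both thresholds must be met within the same replicate. The MLE imputation itself is not reproduced; only the gating condition is shown.

```python
from typing import Optional

# Thresholds taken from the quoted inclusion criteria.
FDR_MAX = 0.01        # identification FDR < 1%
LOC_PROB_MIN = 0.75   # localisation probability > 0.75


def passes_inclusion(replicates: list[dict]) -> bool:
    """True if at least one replicate meets both confidence thresholds.

    Each replicate is a hypothetical record with keys 'fdr' and 'loc_prob'.
    """
    return any(
        r["fdr"] < FDR_MAX and r["loc_prob"] > LOC_PROB_MIN
        for r in replicates
    )


def can_impute(intensities: list[Optional[float]]) -> bool:
    """True if the condition has at least one observed (non-missing) value.

    Only then would missing values in that condition be eligible for imputation.
    """
    return any(v is not None for v in intensities)
```

      Under these assumptions, a site seen in two replicates where only one passes both thresholds would be retained, while a condition with no observed value at all would be left without imputed values.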

      Overall, in the absence of follow up experiments on specific individual examples, some of the claims in the original submission were toned down and reflect a more neutral description of the data now. Significantly, the data still underpin a key role for regulation of the ribosome between the amastigote and promastigote stages (and during the differentiation process). The recursive and reciprocal links between the phosphorylation and ubiquitination systems are interesting and present many opportunities for future investigation.

      Reviewer #3 (Public review):

      Summary:

      The authors proposed to use a 5-layer systems-level analysis (genomics, transcriptomics, proteomics / protein degradation, metabolomics, phosphoproteomics) to uncover how post-transcriptional mechanisms regulate stage differentiation in Leishmania donovani. This enabled the identification of several potential regulatory networks, including the regulation of stage-specific gene clusters by RNA stabilisation or decay, proteasomal degradation and protein phosphorylation.

      In the new version of this manuscript, the authors have addressed all questions raised by the reviewers.

      Strengths:

      Although some observations in this study have already been described in the literature, the integrated analysis applied here provides a novel view on how different levels of post-transcriptional networks regulate Leishmania differentiation. This "5-layer system" represents the first analysis of this depth in kinetoplastid parasites.

      With the increased number of RNA-seq replicates in the revised version, the authors' assumptions are now adequately supported by their data.

      The use of a proteasomal inhibitor adds an interesting insight into how protein degradation is involved in parasite differentiation, confirming previous observations in the literature, and helps to explain the discrepancies between mRNA and protein expression in the different stages.

      Weaknesses:

      While this work provides an impressive and foundational dataset, it opens the door for future research to rigorously validate these initial findings and conclusions.

      Significance and Impact in the field.

      The different datasets generated in this study will be of great interest to the parasitology community, either to be used for hypothesis generation, to validate data from other sources, etc.

      The multi-layered analysis performed here identified a series of potential feedback loops and regulatory networks to be further explored in organisms that lack transcriptional control.

      According to the reviewers’ comments, we made the following minor changes:

      As suggested by reviewer 1, we have extended the discussion of the results related to the analysis of the ubiquitination pattern by Western blot analysis as follows: “Proteasome inhibition blocked amastigote-to-promastigote differentiation, without inducing rapid global accumulation of ubiquitinated proteins (Figure S7C, upper panel) consistent with a quiescent-like state and low basal ubiquitin–proteasome system activity in amastigotes. After 18 h, ubiquitination levels remained similar to untreated cells, indicating that protein turnover and ubiquitin accumulation are primarily driven by developmental remodeling rather than acute proteasome inhibition. In promastigotes, the lack of detectable change (Fig. S7C, lower panel) may also reflect high basal ubiquitination, engagement of compensatory pathways such as autophagy, and/or only partial proteasome inhibition.”

      Recommendations for the authors:

      Reviewer #3 (Recommendations for the authors):

      Minor comments:

      - Supplementary figure 3 is not referenced in the main text.

      - The authors removed the "infinite" sign from figures 3 and 4 to better present the data according to their chosen approach to missing values when LFQ=0. However, the sign is still present in the respective figure legends, please adjust.

      Supplementary Figure 3 (Figure S3) is now referenced in the main text as requested.

      The "infinite" sign has been removed from the legends of Figures 3 and 4 as requested.

    1. Author response:

      Reviewer #1 (Evidence, reproducibility and clarity):

      Summary:

      This manuscript reports the identification of putative orthologues of mitochondrial contact site and cristae organizing system (MICOS) proteins in Plasmodium falciparum, an organism that unusually has an acristate mitochondrion during the asexual part of its life cycle and then develops cristae as it enters the sexual stage of its life cycle and beyond into the mosquito. The authors identify PfMIC60 and PfMIC19 as putative members and study these in detail. The authors add HA tags to both proteins, examine the timing of their expression during the parasite life cycle, and attempt (unsuccessfully) to localise them within the parasite. They also genetically deleted both genes singly and in parallel and phenotyped the effect on parasite development. They show that both proteins are expressed in gametocytes and not asexuals, suggesting they are present at the same time as cristae development. They also show that the proteins are dispensable for the entire parasite life cycle investigated (asexuals through to sporozoites), although there is some reduction in mosquito transmission. Using EM techniques, they show that the morphology of gametocyte mitochondria is abnormal in the knockout lines, although there is great variation.

      Major comments:

      The manuscript is interesting and is an intriguing use of a well studied organism of medical importance to answer fundamental biological questions. My main comments are that there should be greater detail in areas around methodology and statistical tests used. Also, the mosquito transmission assays (which are notoriously difficult to perform) show substantial variation between replicates and the statistical tests and data presentation are not clear enough to conclude the reduction in transmission that is claimed. Perhaps this could be improved with clearer text?

      We would like to thank the reviewer for taking the time to review our manuscript. We are happy to hear the reviewer thinks the manuscript is interesting and thank the reviewer for their constructive feedback.

      To clarify the statistical analyses used, we included a new supplementary dataset with all statistical analyses and p-values indicated per graph. Furthermore, figure legends now include the information on the exact statistical test used in each case.

      Regarding the mosquito experiments, while we indeed reported a reduction in transmission and oocyst numbers, we are aware that this effect might be due to the high variability in mosquito feeding assays. To highlight this point, we deleted the sentence “with the transmission reduction of [numbers]….” and we included the sentence “The high variability encountered in the standard membrane feeding assays, though, partially obstructs a clear conclusion on the biological relevance of the observed reduction in oocyst numbers”.

      More specific comments to address:

      Line 101/Fig1E (and figure legend) - What is this heatmap showing? It would be helpful to have a sentence or two linking it to a specific methodology. I could not find details in the M+M section, and "specialized, high molecular mass gels" does not adequately explain what experiments were performed. The reference to Supplementary Information 1 also did not provide this information.

      We added the information “high molecular mass gels with lower acrylamide percentage” to clarify methodology in the text. Furthermore, we extended the figure legend to include all relevant information. Further experimental details can be found in the study cited in this context, where the dataset originates from (Evers et al., 2021).

      Line 115 and Supplementary Figure 2C + D - The main text says that the transgenic parasites contained a mitochondrially localized mScarlet for visualization and localization, but in the supplementary figure 2 it shows mitotracker labelling rather than mScarlet. This is very confusing. The figure legend also mentions both mScarlet and MitoTracker. I assume that mScarlet was used to view in regular IFAs (Fig S2C) and the MitoTracker was used for the expansion microscopy (Fig S2D)?

      Please clarify.

      We thank the reviewer for pointing this out – this was indeed incorrectly annotated. We used the endogenous mito-mScarlet signal in IFA and mitoTracker in U-ExM. The figure annotation has now been corrected.

      Figure 2C - what is the statistical test being used (the methods say "Mean oocysts per midgut and statistical significance were calculated using a generalized linear mixed effect model with a random experiment effect under a negative binomial distribution." but what test is this?)?

      The statistical test is now included in the materials and methods section with the sentence “The fitted model was used to obtain estimated means and contrasts and were evaluated using Wald Statistics”. The test is now also mentioned in the figure legend.
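
      For readers unfamiliar with the term, a Wald test on a single model contrast divides the estimate by its standard error and compares the result to a standard normal distribution. The sketch below shows only that final step with placeholder numbers; it does not fit the negative binomial mixed model described above, and the function name is hypothetical.

```python
import math


def wald_test(estimate: float, standard_error: float):
    """Return the Wald z statistic and its two-sided normal p-value."""
    z = estimate / standard_error
    # Two-sided tail probability of a standard normal: P(|Z| > |z|)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Placeholder contrast (for example, a log ratio of mean oocyst counts) and its SE
z, p = wald_test(estimate=-0.9, standard_error=0.4)
```

      In practice the estimate and standard error would come from the fitted model's contrast table rather than being entered by hand.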

      Also the choice of a log10 scale for oocyst intensity is an unusual choice - how are the mosquitoes with 0 oocysts being represented on this graph? It looks like they are being plotted at 10^-1 (which would be 0.1 oocysts in a mosquito which would be impossible).

      As the data spans three orders of magnitude with low values being biologically meaningful, we decided that a log scale would best facilitate readability of the graph. As the 0 values are also important to show, we went with a standard approach to handle 0s in log transformed data and substituted the 0s with a small value (0.001). We apologize for not mentioning this transformation in the manuscript. To make this transformation transparent, we added a break at the lower end of the log-scaled y-axis and relabelled the lowest tick as ‘0’. This ensures that mosquitoes with zero oocysts are shown along the x-axis without being assigned an artificial value on the log scale. We would furthermore like to highlight that for statistics we used the true value 0 and not 0.001.
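
      The distinction drawn here, a placeholder for display only and true zeros for statistics, can be sketched as follows. The counts are hypothetical and the helper name is an assumption; only the placeholder value 0.001 is taken from the response above.

```python
import math
from statistics import mean

ZERO_PLACEHOLDER = 0.001  # display-only substitute for zero counts


def log_axis_positions(counts):
    """Map oocyst counts to log10 plotting positions; zeros use the placeholder."""
    return [math.log10(c if c > 0 else ZERO_PLACEHOLDER) for c in counts]


counts = [0, 0, 3, 40, 250]              # hypothetical oocysts per midgut
positions = log_axis_positions(counts)   # used for plotting only
true_mean = mean(counts)                 # statistics use the untransformed counts
```

      Keeping the two lists separate makes it harder for the placeholder to leak into any summary statistic.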

      Figure 2D - it is great that the data from all feeding replicates have been shared; however, it is difficult to conclude any meaningful impact on transmission with the knockout lines when there is so much variation and so few mosquitoes dissected for some datapoints (10 mosquitoes is a very small sample size). For example, Exp1 shows a clear decrease in mic19- transmission, but Exp2 does not really show as great an effect. Similarly, why does the double knockout have better transmission than the single knockouts? Surely there would be a greater effect?

      We agree with the reviewer and with the new sentence added, as per major point, we hope we clarified the concept. Note that original Figure 2D has been moved to the supplementary information, as per minor comment of another reviewer.

      Figure 3 legend - Please add which statistical test was used and the number of replicates.

      Done

      Figure 4 legend - Please add which statistical test was used and the number of replicates.

      Done. Regarding replicates, note that while we measured over 100 cristae from over 30 mitochondria, these all stem from the same parasite culture.

      Figure 5C - the 3D reconstructions are very nice, but what does the red and yellow coloring show?

      Indeed, the information was missing. We added it to the figure legend.

      Line 352 - "Still, it is striking that, despite the pronounced morphological phenotype, and the possibly high mitochondrial stress levels, the parasites appeared mostly unaffected in life cycle propagation, raising questions about the functional relevance of mitochondria at these stages."

      How do the authors reconcile this statement with the proven fact that mitochondria-targeted antimalarials (such as atovaquone) are very potent inhibitors of parasite mosquito transmission?

      Our original sentence was reductive. What we wanted to state was related to the functional relevance of crista architecture and overall mitochondrial morphology rather than the general functional relevance of the mitochondria. We changed the sentence accordingly.

      Furthermore, even though we do not discuss this in the article, we are aware of mitochondria targeting drugs that are known to block mosquito transmission. We want to point out that it is difficult to discern the disruption of ETC and therefore an impact on energy conversion with the impact on the essential pathway of pyrimidine synthesis, highly relevant in microgamete formation. Still, a recent paper from Sparkes et al. 2024 showed the essentiality of mitochondrial ATP synthesis during gametogenesis so it is very likely that the mitochondrial energy conversion is highly relevant for transmission to the mosquito.

      Reviewer #1 (Significance):

      This manuscript is a novel approach to studying mitochondrial biology and does open a lot of unanswered questions for further research directions. Currently there are limitations in the use of statistical tests and detail of methodology, but these could easily be addressed with a bit more analysis/better explanation in the text.

      This manuscript could be of interest to readers with a general interest in mitochondrial cell biology and those within the specific field of Plasmodium research.

      My expertise is in Plasmodium cell biology.

      We thank the reviewer for the praise.

      Reviewer #2 (Evidence, reproducibility and clarity):

      Major comments:

      (1) In my opinion, the authors tend to sensationalize or overinterpret their results. The title of the manuscript is very misleading. While MICOS is certainly important for crista formation, it is not the only factor, as ATP synthase dimer rows make a highly significant contribution to crista morphology. Thus, one can argue with equal validity that ATP synthase should be considered the 'architect', as it is the conformation of the dimers and rows that modulates positive curvature. Secondly, while cristae are still formed upon mic60/mic19 gene knockout (KO), they are severely deformed, and likely dysfunctional (see below). Thus, I do not agree with the title that MICOS is dispensable for crista formation, because the authors' results show that it clearly is essential. So, the title should be changed.

      We thank the reviewer for taking the time to review our manuscript.

      Based on the reviewers’ interpretation we conclude the title does not come across as intended. We have changed the title to: “The role of MICOS in organizing mitochondrial cristae in malaria parasites”

      The Discussion section starting from line 373 also suffers from overinterpretation as well as being repetitive and hard to understand. The authors infer that MICOS stability is compromised less in the single KOs (sKO) compared to the mic60/mic19 double KO (dKO). MICOS stability was never directly addressed here and the composition of the MICOS complex is unaddressed, so it does not make sense to speculate on the basis of such tenuous connections. The data suggest to me that mic60 and mic19 are equally important for crista formation and crista junction (CJ) stabilization, and the dKO has a more severe phenotype than either KO, further demonstrating neither is epistatic.

      We do agree with the reviewer’s notion that we did not address complex stability, and our wording did not make this sufficiently clear. We shortened and rephrased the paragraph in question.

      The following paragraphs (line 387 to 422) continue with such unnecessary overinterpretation to the point that it is confusing and contradictory. Line 387 mentions an 'almost complete loss of CJs' and then line 411 mentions an increase in CJ diameter, both upon Mic60 ablation. I do not think this discussion brings any added value to the manuscript and should be shortened. Yes, maybe there are other putative MICOS subunits that may linger in the KOs that are further destabilized in the dKO, or maybe Mic60 remains in the mic19 KO (and vice versa) to somehow salvage more CJs, which is not possible in the dKO. It is impossible to say with confidence how ATP synthase behaves in the KOs with the current data.

      We shortened this paragraph.

      (2) While the authors went to impressive lengths to detect any effect on life cycle progression, none was found except for a reduction in oocyst count. However, the authors did not address any direct effect on mitochondria, such as OXPHOS complex assembly, respiration, or membrane potential. This seems like a missed opportunity, given the team's previous and very nice work mapping these complexes by complexome profiling. However, I think there are some experiments the authors can still do to address any mitochondrial defects using what they have and without resorting to complexome profiling (although this would be definitive if it is feasible):

      i) Quantification of MitoTracker Red staining in WT and KOs. The authors used this dye to visualize mitochondria to assay their gross morphology, but unfortunately not to assay membrane potential in the mutants. The authors can compare relative intensities of the different mitochondria types they categorized in Fig. 3A in 20-30 cells to determine if membrane potential is affected when the cristae are deformed in the mutants. One would predict they are affected.

      Interesting suggestion. As our staining and imaging conditions are suitable for such analysis (as demonstrated by Sarazin et al., 2025, https://www.biorxiv.org/content/10.1101/2025.11.27.690934v1), we performed the measurements on the same dataset which we collected for Figure 3. We did, however, not detect any difference in mitotracker intensity between the different lines. The result of this analysis is included in the new version of Supplementary figure S6.

      ii) Sporozoites are shown in Fig S5. The authors can use the same set up to track their motion, with the hypothesis that they will be slower in the mutants compared to WT due to less ATP. This assumes that sporozoite mitochondria are active as in gametocytes.

      While theoretically plausible and informative, we currently do not know the relevance of mitochondrial energy conversion for general sporozoite biology or specifically features of sporozoite movement. Given the required resources and time to set this experiment up and the uncertainty whether it is a relevant proxy for mitochondrial functioning, we argue it is out of scope for this manuscript.

      iii) Shotgun proteomics to compare protein levels in mutants compared to WT, with the hypothesis that OXPHOS complex subunits will be destabilized in the mutants with deformed cristae. This could be indirect evidence that OXPHOS assembly is affected, resulting in destabilized subunits that fail to incorporate into their respective complexes.

      While this experiment could potentially further our understanding of the interaction between MICOS and levels of OXPHOS complex subunits we argue that the indirect nature of the evidence does not justify the required investments.

      To expedite resubmission, the authors can restrict the cell lines to WT and the dKO, as the latter has a stronger phenotype than the individual KOs and conclusions from this cell line are valid for overall conclusions about Plasmodium MICOS.

      I will also conclude that complexome/shotgun proteomics may be a useful tool also for identifying other putative MICOS subunits by determining if proteins sharing the same complexome profile as PfMic60 and Mic19 are affected. This would address the overinterpretation problem of point 1.

      (3) I am aware of the authors' previous work in which they were not able to detect cristae in ABS, and thus have concluded that these are truly acristate. This can very well be true, or there can be immature cristae forms that evaded detection at the resolution they used in their volumetric EM acquisitions. The mitochondria and gametocyte cristae are pretty small anyway, so it is not unreasonable to assume that putative rudimentary cristae in ABS may be even smaller still. Minute levels of sampled complex III and IV plus complex V dimers in ABS that were detected previously by the authors by complexome profiling would argue for the presence of minuscule and/or very few cristae.

      I think that the authors should hedge their claim that ABS is acristate by briefly stating that there still is a possibility that minuscule cristae may have been overlooked previously.

      We acknowledge that we cannot demonstrate the absolute absence of any membrane irregularities along the inner mitochondrial membrane. At the same time, if such structures were present, they would be extremely small and unlikely to contain the full set of proteins characteristic of mature cristae. For this reason, we consider it appropriate to classify ABS mitochondria as acristate. To reflect the reviewer’s point while maintaining clarity for readers, we have slightly adjusted our wording in the manuscript, changing ‘fully acristate’ to ‘acristate’.

      This brings me to the claim that Mic19 and Mic60 proteins are not expressed in ABS. This is based on the lack of signal from the epitope tag; a weak signal is detected in gametocytes. Thus, one can counter that Mic19 and Mic60 are also expressed, but below the expression limits of the assay, as the protein exhibits low expression levels when mitochondrial activity is upregulated.

      We agree with the reviewer that the absence of a detectable epitope-tag signal does not definitively exclude low-level expression, and we have therefore replaced the term ‘absent’ with ‘undetectable’ throughout the manuscript. In context with previous findings of low-level transcripts of the proteins in a study by Lopez-Berragan et al. and Otto et al., we also added the sentence “The apparent absence could indicate that transcripts are not translated in ABS or that the proteins’ expression was below detection limits of western blot analysis.” to the discussion. At the same time, we would like to clarify that transcript levels for both genes fall within the <25th percentile, suggesting that these low values likely represent background signal rather than biologically meaningful expression. This interpretation is further supported by proteomic datasets in PlasmoDB, which report PfMIC19 and PfMIC60 expression in gametocyte and mosquito stages, but not in asexual blood stages.

      To address this point, the authors should determine if mature mic60 and mic19 mRNAs are detected in ABS in comparison to the dKO, which will lack either transcript. RT-qPCR using polyT primers can be employed to detect these transcripts. If the levels of these mRNAs in WT ABS are equivalent to those in the dKO, the authors can make a pretty strong case for the absence of cristae in ABS.

      We appreciate the reviewer’s suggestion. As noted in the Discussion, existing transcriptomic datasets already show detectable MIC19 and MIC60 mRNAs in ABS. For this reason, we expect RT-qPCR to reveal low (but not absent) levels of both transcripts, unlike the true loss expected to be observed in the dKO. Because such residual signals have been reported previously and their biological relevance remains uncertain, we do not believe transcript levels alone can serve as a definitive indicator of cristae absence in ABS.

      They should highlight the twin CX9C motifs that are a hallmark of Mic19 and other proteins that undergo oxidative folding via the MIA pathway. Interestingly, the Mia40 oxidoreductase that is central to MIA in yeast and animals, is absent in apicomplexans (DOI: 10.1080/19420889.2015.1094593).

      Searching for the CX9C motifs is a valuable suggestion. In response to the reviewer's suggestion we analysed the conservation of the motif in PfMIC19 and included this in a new figure panel (Figure 1 F).

      Did the authors try to align Plasmodium Mic19 orthologs with conventional Mic19s? This may reveal some conserved residues within and outside of the CHCH domain.

      In response to this comment we made Figure 1 F, where we show conserved residues within the CHCH domains of a broad range of MIC19 annotated sequences across the opisthokonts, and show that the Cx9C motifs are conserved also in PfMIC19. Outside the CHCH domain, we did not find any meaningful conservation, as PfMIC19 heavily diverges from opisthokont MIC19.

      (5) Statistical significance. Sometimes my eyes see population differences that are considered insignificant by the statistical methods employed by the authors, e.g. Fig. 4E, mutants compared to WT, especially the dKO. Have the authors considered using other methods such as Student's t-test for pairwise comparisons?

      The graphs in figures 3, 4 and 5 have been redrawn as violin plots on a linear scale (also following a suggestion from further down in the reviewer’s comments). We believe that this improves interpretability. ANOVA was kept as the statistical test to ensure correction for multiple comparisons, which cannot be performed with a standard t-test. A full overview of statistics and exact p-values can also be found in the newly added supplementary information 2.
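
      The concern about uncorrected pairwise t-tests can be made concrete with a small calculation. Assuming, purely for illustration, that the tests are independent and each is run at the same significance level, the chance of at least one false positive grows with the number of comparisons. The choice of three comparisons below (three knockout lines against WT) is an example, not a description of the authors' analysis.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive across n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


# Three uncorrected comparisons at alpha = 0.05
fwer_three = familywise_error_rate(0.05, 3)
```

      With three comparisons the rate is roughly 14 % rather than the nominal 5 %, which is the inflation that a corrected post-hoc procedure is meant to control.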

      Minor comments:

      Line 33. Anaerobes (e.g. Giardia) have mitochondria that do not produce ATP, unlike aerobic mitochondria.

      We acknowledge that producing ATP via OXPHOS is not a characteristic of all mitochondria-like organelles (e.g. mitosomes), which is why these are typically classified separately from canonical mitochondria. When not considering mitochondria-like organelles, energy conversion is the function that the mitochondrion is most well-known for and the one associated with cristae.

      Line 56: Unclear what authors mean by "canonical model of mitochondria"

      To clarify we changed this to “yeast or human” model of mitochondria.

      Lines 75-76: This applies to Mic10 only

      We removed the “high degree of conservation in other cristate eukaryotes” statement.

      Line 80: Cite DOI: 10.1016/j.cub.2020.02.053

      Done

      Fig 2D: I find this table difficult to read. If the authors keep the table format, they should at least remove the 'mean' column, as these data are better depicted in 2C. I suggest depicting these data as in 3B, showing the proportion of infected vs uninfected flies in all experiments, and then moving the modified table to the supplement. It is important to point out that experiment 5 appears to be an outlier with reduced infectivity across all cell lines, including WT.

      To clarify: the mean reported in the table indicates the mean per replicate while the mean reported in figure 2C is the overall mean for a given genotype that corrects for variability within experiments. We agree that moving the table to the supplementary data is a good idea. We decided to not include a graph for infected and non-infected mosquitoes as this information would be partially misleading, highlighting a phenotype we argue to be influenced by the strong variability.

      Fig. 3C-G: I feel like these data repeatedly lead to same conclusions. These are all different ways of showing what is depicted in Fig 2B: mitochondria gross morphology is affected upon ablation of MICOS. I suggest that these graphs be moved to supplement and replaced by the beautiful images.

      Thank you for the nice comment on our images. We have now moved part of the graphs to supplementary figure 6 and only kept the Relative Frequency, Sphericity and total mitochondria volume per cell in the main figure.

      Line 180: Be more specific with which tubulin isoform is used as a male marker and state why this marker was used in supplemental Fig S6.

      We have now specified the exact tubulin isoform used as the male gametocyte marker, both in the main text and in Supplementary Fig. S6. This is a commercial antibody previously known to work as an effective male marker, which is why we selected it for this experiment. This is now clearly stated in the manuscript.

      Line 196 and Fig 3C: the word 'intensities' in this context is very ambiguous. Please choose a different term (puncta, elements, parts?). This is related to major point 2i above.

      To clarify the biological effect that we can conclude from the measurement, we added an explanation about it in the respective section of the results, and we decided to replace the raw results of the plug-in readout with the deduced relative dispersion.

      Line 222: Report male/female crista measurements

      We added Supplementary information 2, which contains exact statistical test and outcomes on all presented quantifications as well as a per-sex statistical analysis of the data from figure 4. Correspondingly, we extended supplementary information 2 by a per-sex colour code for the thin section TEM data.

      Fig. 4B-E: depict data as violin plots or scatter plots like Fig. 2C to get a better grasp of how the crista coverage is distributed. It seems like the data spread is wider in the double KO. This would also solve the problem with the standard deviation extending beyond 0%.

      We changed this accordingly.

      Lines 331-333: Please clarify that this applies for some, but not all MICOS subunits. Please also see major point 1 above. Also, the authors should point out that despite their structural divergence, trypanosomal cryptic mitofilins Mic34 and Mic40 are essential for parasite growth, in contrast to their findings with PfMic60 (DOI: https://doi.org/10.1101/2025.01.31.635831).

      This has been changed accordingly.

      Line 320: incorrect citation. Related to point 1 above.

      Correct citation is now included in the text.

      Lines 333-335. This is related to the above. Again, some subunits appear to affect cell growth under lab conditions, and some do not. This and the previous sentence should be rewritten to reflect this.

      This has been changed accordingly.

      Line 343-345: The sentence and citation 45 are strange. The former is about CHCHD10, whose status as a bona fide MICOS subunit is very tenuous, so I would omit this. As for the phenomenon observed, I think it makes more sense to write that Mic60 ablation results in partially fragmented mitochondria in yeast (Rabl et al., 2009 J Cell Biol. 185: 1047-63). Mitochondrial fragmentation is often a physiological response to stress. I would rewrite this so as not to imply that mitochondrial fission (or fusion) is impaired in these KOs, or at least present this as one of several possibilities.

      The sentence has been substituted following the reviewer's suggestion, though we still include the data from human cells, as this has also been shown in Stephens et al. 2020.

      Line 373: 'This indicates' is too strong. I would say 'may suggest' as you have no proof that any of the KOs disrupts MICOS. This hypothesis can be tested by other means, but not by penetrance of a phenotype.

      Done

      Line 376-377; 'deplete functionality' does not make sense, especially in the context of talking about MICOS subunit stability. In my opinion, this paragraph overinterprets the KO effects on MICOS stability. None of the experiments address this phenomenon, and thus the authors should not try to interpret their results in this context. See major point 1.

      We removed the sentence. Also, the entire paragraph has been shortened, restructured and wording was changed to address major point 1.

      Other suggestions for added value

      (1) Does Plasmodium Sam50 co-fractionate with Mic60 and Mic19 in BN PAGE (Fig. 1E)

      While we did identify SAMM50 in our BN PAGE, the protein does not co-migrate with the MICOS components but instead comigrates with other components of a putative sorting and assembly machinery (SAM) complex. As SAMM50, the SAM complex and the overarching putative mitochondrial membrane space bridging (MIB) complex are not mentioned in the manuscript, we decided to not include the information in Author response image 1.

      Author response image 1.

      Reviewer #2 (Significance):

      The manuscript by Tassan-Lugrezin is predicated on the idea that Plasmodium represents the only system in which de novo crista formation can be studied. They leverage this system to ask the question whether MICOS is essential for this process. They conclude based on their data that the answer is no, which the authors consider unprecedented. But even if their claim is true that ABS is acristate, this supposed advantage does not really bring any meaningful insight into how MICOS works in Plasmodium.

      First the positives of this manuscript. As has been the case with this research team, the manuscript is very sophisticated in the experimental approaches that are made. The highlights are the beautiful and often conclusive microscopy performed by the authors. Only the localization of Mic60 and Mic19 was inconclusive due to their very low expression unfortunately.

      The examination of the MICOS mutants during the in vitro life cycle of Plasmodium falciparum is extremely impressive and yields convincing results. Mitochondrial deformation is tolerated by life cycle stage differentiation, with a modest but significant reduction of oocyst production being observed.

      However, despite the herculean efforts of the authors, the manuscript as it currently stands represents only a minor advance in our understanding of the evolution of MICOS, which from the title and focus of the manuscript, is the main goal of the authors.

      In its current form, the manuscript reports some potentially important findings:

      (1) Mic60 is verified to play a role in crista formation, as is predicted by its orthology to other characterized Mic60 orthologs.

      (2) The discovery of a novel Mic19 analog (since the authors maintain there is no significant sequence homology), which exhibits a similar (or the same?) complexome profile with Mic60. This protein was upregulated in gametocytes like Mic60 and phenocopies Mic60 KO.

      (3) Both of these MICOS subunits are essential (not dispensable) for proper crista formation

      (4) Surprisingly, neither MICOS subunit is essential for in vitro growth or differentiation from ABS to sexual stages, and from the latter to sporozoites. This says more about the biology of Plasmodium itself than anything about the essentiality of Mic60, i.e. Plasmodium life cycle progression tolerates defects in mitochondrial morphology. But yes, I agree with the authors that Mic60's apparent insignificance for cell growth in the examined conditions differs from its essentiality in other eukaryotes. But fitness costs were not assayed (e.g. by competition between mutants and WT in infection of mosquitoes).

      (5) Decreased fitness of the mutants is implied by a reduction of oocyst formation.

      While interesting in their own way, collectively they do not represent a major advance in our understanding of MICOS evolution. Furthermore, the findings bifurcate into categories informing MICOS or Plasmodium biology. Both aspects are somewhat underdeveloped in their current form.

      This is unfortunate because there seem to be many missed opportunities in the manuscript that could, with additional experiments, lead to a manuscript with much wider impact. For me, what is remarkable about Plasmodium MICOS that sets it apart from other iterations is the apparent absence of the Mic10 subunit. Purification of Plasmodium MICOS via the epitope-tagged Mic60 and Mic19 could have verified that MICOS is assembled without this core subunit. Perhaps Mic60 and Mic19 are the vestiges of the complex, and thus operate alone in shaping cristae. Such a reduction may also suggest the declining importance of mitochondria in Plasmodium.

      Another missed opportunity was to assay the impact of MICOS depletion on OXPHOS in Plasmodium.

      This is a salient issue as maybe crista morphology is decoupled from OXPHOS capacity in Plasmodium, which links to the apparent tolerance of mitochondrial morphology in cell growth and differentiation. I suggested in section A experiments to address this deficit.

      Finally, the authors could assay fitness costs of MICOS ablation and associated phenotypes by assaying whether mosquito infectivity is reduced in the mutants when they are directly competing with WT Plasmodium. Like the authors, I am also surprised that MICOS mutants can pass population bottlenecks represented by differentiation events. Perhaps the apparent robustness of differentiation may contribute to Plasmodium's remarkable ability to adapt.

      I realize that the authors put a lot of effort into their study and again, I am very impressed by the sophistication of the methods employed. Nevertheless, I think there are still better ways to increase the impact of the study aside from overinterpreting the conclusions from the data. But this would require more experiments along the lines I suggest in Section A and here.

      We thank the reviewer for their extensive analysis of the significance of our findings, including the compliments on our microscopy images and the sophisticated experimental approaches. We hope we have convincingly argued why we could or could not include some of the additional analyses suggested by the reviewer in section 1 above.

      With regard to the significance statement, we want to point out that our finding that PfMICOS is not needed for the initial formation of cristae (as opposed to their organization) confirms something that has been assumed by the field without being the actual focus of studies. We argue that the distinction between formation and organization of cristae is important and deserves some attention within the manuscript. We also argue that the finding that MICOS is not involved in the initial formation of cristae is relevant to Plasmodium biology and beyond. As for insights into how MICOS works in Plasmodium, we have confirmed that the previously annotated PfMIC60 is indeed involved in the organization of cristae. Furthermore, we have identified and characterized PfMIC19. These findings, we argue, are indeed meaningful insights into PfMICOS.

      Reviewer #3 (Evidence, reproducibility and clarity):

      Summary:

      MICOS is a conserved mitochondrial protein complex responsible for organising the mitochondrial inner membrane and the maintenance of cristae junctions. This study sheds first light on the role of two MICOS subunits (Mic60 and the newly annotated Mic19) in the malaria parasite Plasmodium falciparum, which forms cristae de novo during sexual development, as demonstrated by EM of thin sections and electron tomography. By generating knockout lines (including a double knockout), the authors demonstrate that knockout of both MICOS subunits leads to defects in cristae morphology and a partial loss of cristae junctions. With a formidable set of parasitological assays, the authors show that despite the metabolically important role of mitochondria for gametocytes, the knockout lines can progress through the life stages and form sporozoites, albeit with diminished infection efficiency.

      We thank the reviewer for their time and compliment.

      Major comments:

      (1) The authors should improve how they present their findings in the right context, in particular by:

      i) giving a clearer description in the introduction of what is already known about the role of MICOS. This starts in the introduction, where one main finding is missing: loss of MICOS leads to loss of cristae junctions and the detachment of cristae membranes, which are nevertheless formed, but become membrane vesicles. This needs to be clearly stated in the introduction to allow the reader to understand the consistency of the authors' findings in P. falciparum with previous reports in the literature.

      We extended the introduction to include this information.

      iii) at the end of the introduction, the motivating hypothesis is formulated ad hoc "conclusive evidence about its involvement in the initial formation of cristae is still lacking" (line 83). If there is evidence in the literature that MICOS is strictly required for cristae formation in any organism, then this should be explained, because the bona fide role of MICOS is maintenance of cristae junctions (the hypothesis is still plausible and its testing important).

      To clarify, we rephrased the sentence to: “Although MICOS has been described as an organizer of crista junctions, its role during the initial formation of nascent cristae has not been investigated.”

      (2) Line 96-97: "Interestingly, PfMIC60 is much larger than the human MICOS counterpart, with a large, poorly predicted N-terminal extension." This statement is lacking a reference and presumably refers to annotated ORFs. The authors should clarify if the true N-terminus is definitely known - a 120kDa size is shown for the P. falciparum protein but this is not compared to the expected length or the size in S. cerevisiae.

      To resolve the reference issue, we added the UniProt IDs that we compared to establish that the annotated ORF is larger in Plasmodium. We also changed the comparison to yeast instead of human, because we realized it is confusing to compare to yeast throughout the figure but then refer to human in this specific sentence.

      Regarding whether the true N-terminus is known. Short answer: No, not exactly.

      However, we do know that the Pf version is about double the size of the yeast protein.

      As the reviewer correctly states, we show the size of 120kDa for the tagged protein in Figure 1G. Considering that we tagged the protein C-terminally, and observed a 120kDa product on western blot, it is safe to conclude that the true N-terminus does not deviate massively from the annotated ORF, and hence, that there is a considerable extension of the protein beyond a 60kDa protein. We do not directly compare to yeast MIC60 on our western blots, however, that comparison can be drawn from literature: Tarasenko et al., 2017 showed that purified MIC60 running at ~60kDa on SDS-PAGE actively bends membranes, suggesting that in its active form, the monomer of yeast MIC60 is indeed 60kDa in size.

      To clarify, we now emphasize that we ran the AlphaFold prediction on the annotated open reading frame (annotated and sequenced by Bohme et al. and Chapell et al., now cited in the manuscript), and revised the wording to make clear what we are comparing in each sentence.

      (3) lines 244-245: "Furthermore, our data indicates the effect size increases with simultaneous ablation of both proteins?". The authors should explain which data they are referring to, as some of the data in Fig 3 and 4 look similar and all significance tests relate to the wild type, not between the different mutants, so it is not clear if any observed differences are significant. The authors repeat this claim in the discussion in lines 368-369 without referring to a specific significance test. This needs to be clarified.

      In response to this and other comments from the reviewers, we added multiple-comparison testing across all samples. In addition, to clarify the statistics used, we included a supplementary dataset with all p-values and statistical tests used.

      (4) lines 304-306: "Though well established as the cristae organizing system, the role of MICOS in initial formation of cristae remains hidden in model organisms that constitutively display cristae.". This sentence is misleading since even in organisms that display numerous cristae throughout their life cycle, new cristae are being formed as the cells proliferate. Thus, failure to produce cristae in MICOS knockout lines would have been observable but has apparently not been reported in the literature. Thus, the concerted process in P. falciparum makes it a great model organism, but not fundamentally different to what has been studied before in other organisms.

      We deleted this statement.

      (5) lines 373-378. "where ablation of just MIC60 is sufficient to deplete functionality of the entire MICOS (11, 15),". The authors' claim appears to be contrary to what is actually stated in ref 15, which they cite:

      "MICOS subunits have non-redundant functions as the absence of both MICOS subcomplexes results in more severe morphological and respiratory growth defects than deletion of single MICOS subunits or subcomplexes."

      This seems in line with what the authors show, rather than "different".

      This sentence has been removed.

      (6) lines 380-385: "... thus suggesting that membrane invaginations still arise, but are not properly arranged in these knockout lines. This suggests that MICOS either isn't fully depleted,...". These conclusions are incompatible with findings from ref. 15, which the authors cite. In that study, the authors generated a ∆MICOS line which still forms membrane invaginations, showing that MICOS is not required at all for this process in yeast. Hence the authors' implication that MICOS needs to be fully depleted before membrane invaginations cease to occur is not supported by the literature.

      This sentence has been deleted in the revised version of the manuscript.

      Minor comments:

      (1) The authors should consider if the first part of their title could be seen as misleading: It suggests that MICOS is "the architect" in cristae formation, but this is not consistent with the literature nor their own findings.

      The title has been changed accordingly.

      - Line 43, of the three seminal papers describing the discovery of MICOS in 2011, the authors only cite two (refs 6 and 7), but miss the third paper, Hoppins et al, PMID: 21987634, which should probably be corrected.

      Done, the paper is now cited

      - Page 2, line 58: for a more complete picture the authors should also cite the work of others here which shows that although at very low levels, e.g. complex III (a drug target) and ATP synthase do assemble (Nina et al, 2011, JBC).

      Done

      - Page 3, line 80: "Irrespective of the shape of an organism's cristae, the crista junctions have been described as tubular channels that connect the cristae membrane to the inner boundary membrane (22, 24)." This omits the slit-shaped cristae junctions found in yeast (Davies et al, 2011, PNAS), which the authors should include.

      The paper and concept have been added to the manuscript, though the sentence has been moved up in the introduction, when crista junctions are first introduced.

      - Line 97: "poorly predicted N-terminal extension", as there is no experimental structure, we don't know if the prediction is poor. Presumably the authors mean either poorly ordered or the absence of secondary structure elements, or the poor confidence score for that region in the prediction? This should be clarified or corrected.

      We were referring to the poor confidence score. To address this comment as well as major point 2, we rewrote the respective paragraph. It now clearly states that confidence of the prediction is low, and we mention the tool that was used to identify conserved domains (Topology-based Evolutionary Domains).

      - Line 98: "an antiparallel array of ten β-sheets". They are actually two parallel beta-sheets stacked together. The authors could find out the name of this fold, but the confidence of the prediction is marked as low/very low. So, its existence is unknown, not just its "function".

      We adapted the domain description to “a stack of two parallel beta-sheets" and replaced the statement on unknown function by the statement “Because this domain is predicted solely from computational analysis, both its actual existence in the native protein and its biological function remain unknown.”

      - Fig 1B: The authors show two alphafold predictions of S. cerevisiae and P. falciparum Mic60 structures. There is however an experimental Mic60/19 (fragment) structure from the former organism (PMID: 36044574), which should be included if possible.

      We appreciate the reviewer’s suggestion and note that the available structural data indeed provides valuable insight into how MIC60 and MIC19 interact. However, these structures represent fusion constructs of limited protein fragments and therefore capture only a small portion of each protein, specifically the interaction interface. Because our aim in Fig. 1B is to compare the overall domain architecture of the full-length proteins, we believe that including fragment-based structures would be less informative in this context.

      - Lines 318-321: "The same trend was observed for PfMIC19 and PfMIC60. Although transcriptomic data suggested that low-level transcripts of PfMIC19 and PfMIC60 are present in ABS (38), we did not detect either of the proteins in ABS by western blot analysis." While this statement is true, the authors should comment on the sensitivity of the respective methods - how well was the antibody working in their hands and how do they interpret the absence of a WB band compared to transcriptomics data?

      The HA antibody used in our experiments is a standard commercial reagent that performs reliably in both WB and IFA, although it shows a low background signal in gametocytes. We agree that the sensitivity of the method and the interpretation of weak or absent bands should be addressed explicitly. Transcript levels for both PfMIC19 and PfMIC60 in asexual blood stages fall below the 25th percentile, suggesting that these signals likely represent background. Nevertheless, we acknowledge that low-level protein expression below the detection limit of western blot analysis cannot be excluded. To reflect these considerations, we added the sentence: ‘The apparent absence could indicate that transcripts are not translated in ABS or that the proteins’ expression was below detection limits of western blot analysis.’

      - Lines 322-323: would the authors not typically have expected an IFA signal given the strength of the band in Western blot? If possible, the authors should comment if the negative fluorescence outcome can indeed be explained with the low abundance or if technical challenges are an equally good explanation.

      Considering the nature of the investigated proteins (embedded in the IMM and spread throughout the mitochondria), difficulties in achieving a clear signal in IFA or U-ExM are not very surprising. While epitopes may remain buried in IFA, U-ExM usually increases accessibility for the antibodies. However, U-ExM comes at the cost of being prone to dotty background signals, which can hide low-abundance, naturally dotty signals such as those of MICOS proteins that localize to distinct foci (at the CJ) along the mitochondrion. Current literature suggests that, in both human and yeast, STED is the preferred method for accurate spatial resolution of MICOS proteins (https://www.ncbi.nlm.nih.gov/pubmed/32567732, https://www.ncbi.nlm.nih.gov/pubmed/32067344). Unfortunately, we have neither experience with nor access to this particular technique.

      - Lines 357-365: the authors describe limitations of the applied methods adequately. Perhaps it would be helpful to make a similar statement about the analysis of 3D objects like mitochondria and cristae from 2D sections. E.g. the apparent cristae length depends on whether cristae are straight (e.g. coiled structures do not display long cross sections despite their true length in 3D).

      The limitations of other methods are described in the respective results section.

      We added a clarifying sentence in the results section of Figure 4:

      “Note that such measurements do not indicate the true total length or width of cristae, as the data is two-dimensional. The recorded values are to be considered indicative of possible trends, rather than absolute dimensions of cristae.”

      This statement refers to the length/width measurements of cristae.

      In the context of Figure 4D we mention the following (see preprint lines 229 – 230): “We expect this effect to translate into the third dimension and thus conclude that the mean crista volume increases with the loss of either PfMIC19, PfMIC60, or both.”

      For Figure 5, we included a clarifying statement in the results section of the preprint (lines 269 – 273): “Note that these mitochondrial volumes are not full mitochondria, but large segments thereof. As a result of the incompleteness of the mitochondria within the section, and the tomography specific artefact of the missing wedge, we were unable to confirm whether cristae were in fact fully detached from the boundary membrane, or just too long to fit within the observable z-range.”

      - Line 404: perhaps undetected or similar would be a better description than "hidden"?

      The sentence does not exist in the revised manuscript.

      Reviewer #3 (Significance):

      The main strength of the study is that it provides the first characterisation of the MICOS complex in P. falciparum, a human parasite in which the mitochondrion has been shown to be a drug target. Mic60 and the newly annotated Mic19 are confirmed to be essential for proper cristae formation and morphology, as well as overall mitochondrial morphology. Furthermore, the mutant lines are characterised for their ability to complete the parasite life cycle and defects in infection effectivity are observed. This work is an important first step for deciphering the role of MICOS in the malaria parasite and the composition and function of this complex in this organism. The limitation of the study stems from what is already known about MICOS and its subunits in great detail in yeast and humans with similar findings regarding loss of cristae and cristae defects. The findings of this study do not provide dramatic new insight on MICOS function or go substantially beyond the vast existing literature in terms of the extent of the study, which focuses on parasitological assays and morphological analysis. Exploring the role of MICOS in an early-divergent organism and human parasite is however important given the divergence found in mitochondrial biology and P. falciparum is a uniquely suited model system. One aspect that would increase the impact of the paper would be if the authors could mechanistically link the observed morphological defects to the decreased infection efficiency, e.g. by probing effects on mitochondrial function. This will likely be challenging as the morphological defects are diverse and the fitness defects appear moderate/mild.

      As suggested by Reviewer 2, we examined mitochondrial membrane potential in gametocytes using MitoTracker staining and did not observe any obvious differences associated with the morphological defects. At present, additional assays to probe mitochondrial function in P. falciparum gametocytes are not sufficiently established, and developing and validating such methods would require substantial work before they could be applied to our mutant lines. For these reasons, a more detailed mechanistic link between the observed morphological changes and the reduced infection efficiency is currently beyond reach.

      The advance presented in this study is to pioneer the study of MICOS in P. falciparum, thus widening our understanding of the role of this complex to different model organisms. This study will likely be mainly of interest for specialised audiences such as basic research parasitologists and mitochondrial biologists. My own field of expertise is mitochondrial biology and structural biology.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, the role of the insulin receptor and the insulin growth factor receptor was investigated in podocytes. Mice in which both receptors were deleted developed glomerular dysfunction, proteinuria and glomerulosclerosis over several months. Because of concerns about incomplete KO, the authors generated and studied podocyte cell lines where both receptors were deleted. Loss of both receptors was highly deleterious with greater than 50% cell death. To elucidate the mechanism of cell death, the authors performed global proteomics and found that spliceosome proteins were downregulated. They confirmed this directly by using long-read sequencing. These results suggest a novel role for insulin and IGF1R signaling in RNA splicing in podocytes.

      This is primarily a descriptive study and no technical concerns are raised. The mechanism of how insulin and IGF1 signaling regulates splicing is not directly addressed but potentially implicates the phosphorylation downstream of these receptors. In the revised manuscript, it is shown that the mouse KO is incomplete, potentially explaining the slow onset of renal insufficiency. Direct measurement of GFR and serial serum creatinines might also enhance our understanding of progression of disease; proteinuria is a strong sign of renal injury. An attempt to rescue the phenotype by overexpression of SF3B4 would also be useful but may be masked by defects in other spliceosome genes. As insulin and IGF are regulators of metabolism, some assessment of metabolic parameters would be an optional add-on.

      Significance:

      With the GLP1 agonists providing renal protection, there is great interest in understanding the role of insulin and other incretins in kidney cell biology. It is already known that Insulin and IGFR signaling play important roles in other cells of the kidney. So, there is great interest in understanding these pathways in podocytes. The major advance is that these two pathways appear to have a role in RNA metabolism.

      Comments on revised version:

      I'm satisfied with the revised manuscript and the responses to my previous concerns.

      Thank you.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, submitted to Review Commons (journal agnostic), Coward and colleagues report on the role of the insulin/IGF axis in podocyte gene transcription. They knocked out both the insulin receptor and IGF1R in mice. Dual KO mice manifested a severe phenotype, with albuminuria, glomerulosclerosis, renal failure and death at 4-24 weeks.

      Long read RNA sequencing was used to assess splicing events. Podocyte transcripts manifesting intron retention were identified. Dual knock-out podocytes manifested more transcripts with intron retention (18%) compared to wild-type controls (14%), with an overlap between experiments of ~30%.

      Transcript productivity was also assessed using FLAIR-mark-intron-retention software. Intron retention was seen in 18% of ciDKO podocyte transcripts compared to 14% of wild-type podocyte transcripts (P=0.004), with an overlap between experiments of ~30% (indicating the variability of results with this method). Interestingly, ciDKO podocytes showed downregulation of proteins involved in spliceosome function and RNA processing, as suggested by LC/MS and confirmed by Western blot.

      Pladienolide (a spliceosome inhibitor) was cytotoxic to HeLa cells and to mouse podocytes but no toxicity was seen in murine glomerular endothelial cells.

      The manuscript is generally clear and well-written. Mouse work was approved in advance. The four figures are generally well-designed, bars/superimposed dot-plots.

      Methods are generally well described.

      Comments on revised version:

      Coward and colleagues have done an excellent job of responding to all the reviewer comments.

      Thank you.

      Reviewer #4 (Public review):

      Summary and background:

      This report entitled "The insulin/IGF axis is critically important (for) controlling gene transcription in the podocyte" from Hurcombe et al is based on a mouse double knockdown of the IR and IGF1R and a parallel cultured mouse podocyte model. The insulin/IGF signaling system in mammals evolved as three gene reduplicated peptides (insulin, IGF-1, and IGF-2) and their two receptors IR and IGF1R that cross-react to variable extents with the peptides, are ubiquitously expressed, and signal through parallel pathways. The major downstream effect of insulin is to regulate glucose uptake and metabolism, while that of the IGF pathways is to regulate growth and cell cycling in part through mTORC1. The GH-IGF-1-IGF1R pathway regulates post-natal growth. IGF-2 signaling is thought to play a major role in regulating intrauterine growth and development, although IGF-2 is also present at high levels in post-natal life. Thus, one would anticipate that reducing IR/IGF1R signaling in any cell would slow growth and cell cycling by reducing growth factor and metabolic mTORC1-mediated and other processes including the splicing of RNA for protein synthesis.

      Thank you for this additional review and for assessing our paper with new suggestions (we addressed the previous suggestions to the satisfaction of the other reviewers). Of note, regarding this introduction, the podocyte is a terminally differentiated cell and may have unique responses to insulin / IGF, as it is accepted that it does not generally proliferate (hence we consider understanding the actions of insulin / IGF and their receptors to be of interest). Indeed, we have recently shown a contrasting effect of IGF signalling in the podocyte: partial suppression of the IGF1 receptor is beneficial, in contrast to near-complete suppression, which results in mitochondrial dysfunction (PMID:38706850).

      Mouse IR/IGF1R double knockdown model:

      A double knockdown mouse model was generated by interbreeding mice with different genetic backgrounds carrying floxed sites for IR and IGF-1R to produce mixed background offspring with both floxed IR and IGF-1R genes. These mice were crossed so that the podocin promoter driven-Cre (that comes on at about embryonic day 12 as podocytes are developing) would delete IR and IGF-1R genes. Since podocin is believed to be an absolutely podocyte-specific protein, this podocin promoter is predicted to specifically knock down the IR and IGF1R genes only in podocytes. The weight and growth of double KO offspring were not different from controls, but some proportion of the double knockdown mice subsequently developed proteinuria by 6 months and 20% died, although no specific data is provided to identify the cause of the deaths since eGFR was not decreased. Surviving mice were evaluated at 6 months of age. The efficacy of knockdown was not demonstrated in the mouse model itself, although a temperature-sensitive cell line developed from these double knockdown mice showed that expression of IR and IGF-1R proteins in the Cre-treated cell line were both reduced by about 50% (no statistical analysis of this result provided).

      In the knockout mice, proteinuria was significantly increased by 6 months, but not at earlier time points. Histologic analysis showed proteinaceous casts, glomerulosclerosis and interstitial fibrosis. Podocyte number was stated to be reduced by about 30% in double knockdown mice, although the method by which this was evaluated seems to have been by counting WT1 positive nuclei in glomerular cross-sections, an approach that is well-known not to be a reliable way of assessing true podocyte number. No information is provided about podocyte size, density or glomerular volume.

      Comment: If IR/IGF1R deletion plays a significant role in normal podocyte function sufficient to cause proteinuria and glomerulosclerosis then the effect of reduced IR and IGF1R protein expression on podocyte function would have been expected to produce a phenotype before 6 months. A more likely scenario to explain the overall result is that deleting the IR and IGF1R genes at about embryonic day 12 impacted podocyte development to a variable extent such that some mice developed fewer podocytes per glomerulus than other mice. As mice grow and their glomeruli and glomerular capillary area increase, those mice with fewer podocytes would not be able to completely cover the filtration surface with foot processes and would develop proteinuria and glomerulosclerosis. If reduced podocyte number per glomerulus is the proximate cause of the observed proteinuria, then modulation of the body and kidney growth rate by calorie restriction to slow growth (lower circulating IGF-1 levels) would be expected to be protective, while a high protein high calorie diet (higher circulating IGF-1 levels) or uni-nephrectomy to increase kidney growth rate would be expected to enhance proteinuria and glomerulosclerosis.

      Thank you for these comments. In response to them:

      (1) WT1 as a marker of podocyte number. We agree this may not be the most accurate way of precisely measuring podocyte number, but it is widely accepted in the field (PMID:33655004 / PMID:38542564), and we think it convincingly shows fewer podocytes at 6 months.

      (2) Podocyte size and density were not measured. This was not the focus of the paper, and the histology obviously showed a significant phenotype in several mice (Figs 1D-F). Of note, we did objectively assess a glomerulosclerosis index (Fig 1D). We took the approach of understanding mechanism through non-biased proteomics and phospho-proteomics of conditionally immortalised podocytes in which we had convincingly knocked down the insulin and IGF1 receptors (Figure 2).

      (3) You did not study the mice earlier to ascertain the developmental phenotype. We concede we did not do this, but there was no significant proteinuria detected early in the mice, so we elected not to increase mouse numbers by studying them then (which we consider good practice for reduction, replacement and refinement). We suspect there would have been subtle changes in those mice that had significantly reduced simultaneous IR and IGF1R knockdown. It was precisely because of this that we generated a conditionally immortalised podocyte cell line with robust simultaneous knock-down of both receptors.

      (4) You did not show significant insulin and IGF1 receptor knockdown in the conditionally immortalised cell line (reviewer states it was 50%). We clearly knocked both receptors down (insulin and IGF1R) in the podocyte line by >80%, which was highly statistically significant (p<0.00001; Figure 2A). We agree this was crucial (and we made the cell line because of the variability in the mouse model).

      The model as used may be more representative of a variable degree of podocyte depletion than an effect of impaired IR/IGF1R signaling. Therefore, although the phenotype may be ultimately attributable to the IR/IGF1R gene deletions the proteinuria and glomerulosclerotic phenotype itself was probably a consequence of defective podocyte development. Examining podocyte number, size, density and glomerular volume at earlier time points (4 weeks) would help to answer this question. Therefore, a more appropriate title would be "The insulin/IGF axis is critically important (for) normal podocyte development and deployment". In this context the effect of the knockdowns on splicing would make more sense.

      Please see our response (above). We think our final conclusion that in the podocyte the insulin/IGF axis is important for spliceosome activity and control is valid. This is due to our findings (both total and phospho-proteomics results) and to other recent papers showing that this axis can rapidly phosphorylate a variety of spliceosome proteins in different cell types (PMID:39939313 / PMID:32888406). These are all discussed in detail in the manuscript.

      Cell culture studies. A cell line was generated using a temperature sensitive SV40 system that has been previously reported from this laboratory. A detailed analysis is provided to show that double knockout cells exhibited abnormal spliceosome activity. This forms the basis for the conclusion that "The insulin/IGF axis is critically important (for) controlling gene transcription in the podocyte". There are several concerns that weaken this conclusion.

      (1) In the double knockdown cell culture system about 30% of cells were "lost" by 3 days and about 70% of cells were "lost" by 5 days. The studies were done at the 3 day time point. It is not clear whether "lost" cells were in the process of dying, stress-induced detachment, or just growing more slowly than control due to reduced IR and IGF-1R signaling. These processes could have impacted splicing in a non-specific way independent of IR/IGF1R signaling itself.

      (2) Can a single cell line derived from the double floxed mice be relied on to provide an unbiased picture of the effect of deleting IR and IGF-1R? Presumably, the transfection and selection process will select for cells that survive thereby including unknown biases, possibly related to spliceosome function. Is a single cell line adequate? These investigators have extensive experience with this type of analysis, but this question is not addressed in the discussion.

      (3) To determine whether the effect is specific to reduced IR/IGFR signaling the deletion of IR and IGF-1R could be corrected by transfecting full length IR and IGF-1R cDNAs into the cells to restore normal IR/IGF1R signaling. If restoring intact IR and IGF-1R expression and activity in transfected cells returns spliceosome activity to normal, this would be evidence that the receptors themselves play some role in spliceosome activity, as opposed to the downstream effect of growth limitation/stress on the cells.

      (4) Other ways of testing whether the splicing effect is specifically due to reduced IR/IGF-1R signaling would be to (a) block IR and IGF1R receptors using available inhibitors, (b) remove or reduce insulin, IGF-1 and IGF-2 levels in the culture medium, (c) use low glucose and amino acid culture medium to slow growth rate independent of receptor function, (d) or block intra-cellular signaling via the IR and IGF-1R receptors through mTORC1 inhibition using rapamycin or other signaling targets.

      (5) It would be useful to determine whether stressing the cultured cells in other ways (e.g. ischemia, toxins, etc.) also results in the same splicing abnormalities.

      Point 1. 70% cell loss was observed at day 7 (not day 5). We found approximately 20% loss at day 3. We opted for this early time point, hypothesising that the key detrimental processes would be clear then. This 3 day time point also ensures there has been enough time to allow for the expression of Cre recombinase, receptor gene excision and degradation of existing endogenous IR/IGF1R following lentiviral transduction. Interestingly, we did not find a major “death or apoptosis” signal in our data then but agree it should be considered. We think this is a specific pathway, as we have previously used proteomics to examine several other conditionally immortalised detrimental podocyte cell lines with a much more severe phenotype of cell death (e.g. podocyte GSK3 alpha/beta knockdown) and we detected NO spliceosome signal (PMID:30679422). Furthermore, other podocyte proteomics “stress” studies have now been published in which there is proteinuria and significant cell loss / death, and these also do not show spliceosome dysfunction. These include studying the detailed proteomic signature of podocytes stressed with Doxorubicin and Lipopolysaccharide endotoxin LPS in mice (PMID:32047005) and bradykinin stimulation of rat podocytes (PMID:32518694).

      Point 2. Yes, we think it is valuable and reproducible. We generated a podocyte cell line from insulin receptor and IGF1 receptor homozygous floxed cells. Hence there is no selection bias in the cells when generating the line as both receptors are effectively intact. We then temporally “knocked down” the receptors with extrinsic lentiviral Cre.

      Importantly we validated our cell line findings both back in the cells (with Western blotting) and in our transgenic receptor knockdown mice and found evidence of spliceosomal dysregulation (Figure 3E and 3F). Also as discussed above the spliceosome has been identified in other models in the insulin/IGF pathway.

      Point 3. We don’t think the experiment of knocking down the receptors and then reconstituting them would prove this hypothesis. This is because if the splicing abnormality was due to generalised cell dysfunction (which we do not think is the case in this situation) then putting the receptors back may simply restore cell health and the spliceosomal function (i.e. it does not prove it is via the receptors). Secondly, the process of transduction with multiple lentiviruses may be inherently stressful to the cell and there may be a high level of extrinsic receptor inserted which may also be confounding/detrimental. Finally, as discussed, there are now several lines of evidence describing insulin / IGF signalling to spliceosomal proteins which we consider important (discussed in the paper in detail).

      Point 4. We think modulating the receptors using the Cre-lox approach is the cleanest approach (with fewer off-target effects) to interrogate the insulin / IGF axis. It allows us to differentiate the cells by thermo-switching (which is crucial for this terminally differentiated cell) and then robustly knock down both receptors simultaneously to investigate mechanism. We agree these supplementary approaches may give some extra information if their limitations (e.g. off-target effects of inhibitors) are also taken into consideration.

      Point 5. They do not. Please see response to point 1 above regarding GSK3, Doxorubicin, LPS and bradykinin challenge.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Johnston and Smith used linear electrode arrays to record from small populations of neurons in the superior colliculus (SC) of monkeys performing a memory-guided saccade (MGS) task. Dimensionality reduction (PCA) was used to reveal low-dimensional subspaces of population activity reflecting the slow drift of neuronal signals during the delay period across a recording session (similar to what they reported for parts of cortex: Cowley et al., 2020). This SC drift was correlated with a similar slow-drift subspace recorded from the prefrontal cortex, and both slow-drift subspaces tended to be associated with changes in arousal (pupil size). These relationships were driven primarily by neurons in superficial layers of the SC, where saccade sensitivity/selectivity is typically reduced. Accordingly, delay-period modulations of both spiking activity and pupil size were independent of saccade-related activity, which was most prevalent in deeper layers of the SC. The authors suggest that these findings provide evidence of a separation of arousal- and motor-related signals. The analysis techniques expand upon the group's previous work and provide useful insight into the power of large-scale neural recordings paired with dimensionality reduction. This is particularly important with the advent of recording technologies which allow for the measurement of spiking activity across hundreds of neurons simultaneously. Together, these results provide a useful framework for comparing how different populations encode signals related to cognition, arousal, and motor output in potentially different subspaces.

      Comments on revised manuscript:

      The authors have done a very good job of responding to all of the reviewers' concerns.

      No weaknesses to address.

      Reviewer #2 (Public review):

      Weaknesses:

      (1) The greatest weakness in the present research is the fact that arousal is a functionally less important non-motoric variable. The authors themselves introduce the problem with a discussion of attention, which is without any doubt the most important cognitive process that needs to be functionally isolated from oculomotor processes. Given this introduction, one cannot help but wonder why the authors did not design an experiment in which spatial attention and oculomotor control are differentiated. Absent such an experiment, the authors should spend more time on explaining the importance of arousal and how it could interfere with oculomotor behavior.

      (2) In this context, it is particularly puzzling that one actually would expect effects of arousal on oculomotor behavior. Specifically, saccade reaction time, accuracy, and speed could be influenced by arousal. The authors should include an analysis of such effects. They should also discuss the absence or presence of such effects and how they affect their other results.

      (3) The authors use the analysis shown in Figure 6D to argue that across recording sessions the activity components capturing variance in pupil size and saccade tuning are uncorrelated. However, the distribution (green) seems to be non-uniform, with peaks at very low and very high correlations specifically. The authors should test if such an interpretation is correct. If yes, where are the low and high correlations respectively? Are there potentially two functional areas in SC?

      Comments on revised manuscript:

      I remain somewhat concerned that the authors jump immediately into an analysis of the 'arousal-related' effects on SC activity. Before that, I would like to see a more detailed discussion justifying the use of pupil size alone (i.e., w/o other indicators such as RT) as indicative of fluctuations in general arousal that are causal to concomitant changes in SC activity. Instead, in its current form, the authors find changes in SC activity and describe them immediately as 'arousal-related'.

      Other than this conceptual issue, I do not have major problems with the analysis per se.

      We agree with the reviewer that we may have advanced into discussing arousal-related effects in the previous version of the manuscript without providing a thorough explanation for why we think the slow drift axis is associated with changes in the monkey’s arousal levels. Arousal has been linked to the size of the pupil as well as movements of the eyes in numerous previous studies. We have made the following changes in the revised manuscript to address the reviewer’s concern:

      (1) When first describing how the spiking responses of SC neurons fluctuate over the course of a recording session (Lines 130-132), we have used the phrase "slow fluctuations in the spiking responses" rather than "arousal-related fluctuations in the spiking responses". Then, when describing these effects in more detail (Lines 136-147), we have explained why we think these fluctuations may be related to arousal. The following text has been added in the revised manuscript for clarification:

      “We found that this low-dimensional pattern of activity in the SC was also correlated with pupil size in the present study and with simultaneously recorded data in the prefrontal cortex (PFC), pointing to a link between this brain-wide fluctuation and changes in the monkeys’ arousal levels while performing the task.” (Lines 136-147)

      (2) We have changed the subheading in Line 183 of the revised manuscript from "Arousal-related fluctuations are present in the SC and correlated with pupil size and fluctuations in PFC activity" to "Slow fluctuations in SC spiking activity are correlated with pupil size and PFC activity". Given that we have not yet explained the results linking these fluctuations to arousal at this stage of the manuscript, we believe that this revised title is more accurate and avoids jumping too quickly to arousal-related fluctuations without first explaining the link between SC slow drift, pupil size and PFC activity.

      (3) We have provided additional justification for using pupil size and PFC activity to assess whether SC slow drift is associated with changes in the monkeys’ arousal levels. In a previous study, we computed an identical slow drift axis for spiking responses in visual cortex (V4) and PFC, and investigated how these low-dimensional neural activity patterns, which were themselves strongly correlated, were associated with various eye-related metrics (e.g., pupil size, microsaccade rate, reaction time, saccade velocity). Results showed that pupil size was the strongest predictor of slow drift in V4 and PFC. Given that the eye metrics were also strongly correlated with each other, we believe that the observed relationship between SC slow drift, pupil size and PFC activity provides sufficient evidence to suggest that the fluctuations observed in the SC are arousal-related. The following text has been added to the Results section of the revised manuscript:

      “Moreover, previous work in our laboratory computed a similar slow-drift axis using spiking activity in visual cortex (V4) and PFC, and investigated the relationship between these low-dimensional neural activity patterns and different eye-related metrics (e.g., pupil size, microsaccade rate, reaction time, saccade velocity). In addition to observing a strong correlation between V4 and PFC slow drift, we found that, relative to the other eye-related metrics, pupil size was the strongest predictor of these fluctuations (Johnston et al., 2022a). Thus, to further confirm the link between the SC slow drift axis and changes in the monkeys’ arousal levels while they performed the MGS task, we next sought to explore if projections onto the SC slow drift axis were associated with pupil size.” (Lines 236-344)

      Reviewer #3 (Public review):

      Summary:

      This study looked at slow changes in neuronal activity (on the order of minutes to hours) in the superior colliculus (SC) and prefrontal cortex (PFC) of two monkeys. They found that SC activity shows slow drift in neuronal activity like in the cortex. They then computed a motor index in SC neurons. By definition, this index is low if the neuron has stronger visual responses than motor responses, and it is high if the neuron has weaker visual responses and stronger motor responses. The authors found that the slow drift in neuronal activity was more prevalent in the low motor index SC neurons and less prevalent in the high motor index neurons. In addition, the authors measured pupil diameter and found it to correlate with slow drifts in neuronal activity, but only in the neurons with lower motor index of the SC. They concluded that arousal signals affecting slow drifts in neuronal modulations are brain-wide. They also concluded that these signals are not present in the deepest SC layers, and they interpreted this to mean that this minimizes the impact of arousal on unwanted eye movements.

      Strengths:

      The paper is clear and well-written.

      Showing slow drifts in the SC activity is important to demonstrate that cortical slow drifts could be brain-wide.

      Weaknesses:

      The authors find that the SC cells with the low motor index are modulated by pupil diameter. However, this could be completely independent of an "arousal signal". These cells have substantial visual sensitivity. If the pupil diameter changes, then their activity should be influenced since the monkey is watching a luminous display. So, in this regard, the fact that they do not see "an arousal signal" in the most motor neurons (through the pupil diameter analyses) is not evidence that the arousal signal is filtered out from the motor neurons. It could simply be that these neurons do not get affected by the pupil diameter because they do not have visual sensitivity. So, even with the pupil data, it is still a bit tricky for me to interpret that arousal signals are excluded from the "output layers" of the SC.

      Of course, the general conclusion is that the motor neurons will not have the arousal signal. It's just the interpretation that is different in the sense that the lack of the arousal signal is due to a lack of visual sensitivity in the motor neurons.

      I think that it is important to consider the alternative caveat of different amounts of light entering the system. Changes in light level caused by pupil diameter variations can be quite large. Please also note that I do not mean the luminance transient associated with the target onset. I mean the luminance of the gray display. It is a source of light. If the pupil diameter changes, then the amount of light reaching the visually sensitive neurons also changes.

      Comments on revised manuscript:

      The authors have addressed my first primary comment. For the light comment, I'm still not sure they addressed it. At the very least, they should explicitly state the possibility that the amount of light entering from the gray background can matter greatly, and it is not resolved by simply changing the analysis interval to the baseline pre-stimulus epoch. I provide more clear details below:

      In line 194 of the redlined version of the article (in the Introduction), the citation to Baumann et al., PNAS, 2023 is missing near the citation of Jagadisan and Gandhi, 2022. Besides replicating Jagadisan and Gandhi, 2022, this other study actually showed that the subspaces for the visual and motor epochs are orthogonal to each other.

      We thank the reviewer for this comment and apologize that the citation to Baumann et al., PNAS, 2023 was missing in the previous version of the manuscript. In addition to including this citation in the revised version, we have provided a much more comprehensive description of all three cited studies and clarified that, in addition to replicating the results of Jagadisan and Gandhi, Baumann et al., PNAS, 2023 showed that the subspaces for the visual and motor epochs are orthogonal to each other. The following lines have been added to the Introduction of the revised manuscript:

      “A similar separation has been observed for visual and motor responses in the SC (Jagadisan and Gandhi, 2022; Ayar et al., 2023; Baumann et al., 2023). For example, Jagadisan and Gandhi (2022) used linear microelectrode arrays to investigate why early eye movements are not triggered when neuronal responses to a visual target, presented before a delayed saccade to that target, cross a threshold. They found that population activity in the SC was less stable during the visual epoch of a delayed saccade task, relative to the saccade epoch. Moreover, saccades could be evoked more easily by patterned microstimulation when the temporal structure of the microstimulation was stable across electrodes, providing a potential explanation for how downstream regions differentiate between visual and motor responses. Similar results were reported by Baumann et al. (2023) who found that the strength of SC motor responses during a saccade to a visual image depends on the features of that image (e.g., contrast, orientation). When dimensionality reduction was applied to the spiking responses of neuronal populations in the SC, the population trajectory during the initial visual response to the image was orthogonal to that during the motor response. These findings replicate the separation in temporal population structure reported by Jagadisan and Gandhi (2022) and support the results of Ayar et al. (2023). They found that, although not completely orthogonal, population activity in the SC is distinct for visual and motor responses during the same oculomotor task and across different tasks, which could further facilitate the decoding of signals related to sensation, action and context by downstream regions.” (Lines 110-127)

      Line 683 (and around) of the redlined version of the article (in the Results): I'm very confused here. When I mentioned visual modulation by changed pupil diameter, I did not mean the transient changes associated with the brief onset of the cue in the memory-guided saccade task. I meant the gray background of the display itself. This is a strong source of light. If the pupil diameter changes across trials, then the amount of light entering the eye from the gray background also changes. Thus, visually-responsive neurons will have different amounts of light driving them. This will also happen in the baseline interval containing only a fixation spot. The arguments made by the authors here do not address this point at all. So, please modify the text to explicitly state the possibility that the global luminance of the display (as filtered by the pupil diameter) alters the amount of light driving the visually-responsive neurons and could contribute to the higher effects seen in the more visual neurons.

      We apologize that our analysis did not fully address the reviewer’s concern that the presence of fluctuations in visual neurons and their absence in motor neurons may have arisen indirectly due to changes in the amount of light entering the eye caused by changes in pupil size. As per the reviewer’s suggestion, we have now raised the possibility that visual neurons in the SC may have firing rates that are monotonically related to slow trends in overall luminance induced by pupil size changes, whereas motor neurons do not. Although we believe this to be an unlikely explanation, the paragraph from lines 374-398 has been modified to better describe this possibility, including the following text:

      “Given that slow drift is found in traditionally defined visual areas (e.g., area V4) and in regions that show mixed selectivity for multiple task variables (e.g., PFC) (Cowley et al., 2020), it seems unlikely that slow drift is caused by luminance fluctuations alone and more likely that it reflects global changes in arousal. At the same time, these arousal-related fluctuations covary with changes in pupil size (Johnston et al., 2022a), which could modulate the amount of light entering the eye from the display. This might affect visual neurons but not motor neurons due to their lack of visual sensitivity. Because SC neurons exist on a continuum, with visual responses decreasing and motor responses increasing from the intermediate to deep layers (Massot et al., 2019; Heusser et al., 2022) and no clear categorical boundary for motor-only neurons, any readout strategy would still need to avoid corruption of the motor output by slow drift, even if it were caused by changes in the amount of light entering the eye.” (Lines 387-398)
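
      To give a sense of scale for the luminance concern discussed above, the following is a minimal sketch, not part of the authors' analysis. It relies only on the standard definition of conventional retinal illuminance in trolands (display luminance in cd/m² multiplied by pupil area in mm²). The function names and the example pupil diameters and luminance are hypothetical illustrations, not values reported in the manuscript.

```python
import math


def pupil_area_mm2(diameter_mm):
    """Area of a circular pupil (mm^2) from its diameter (mm)."""
    return math.pi * (diameter_mm / 2.0) ** 2


def retinal_illuminance_td(luminance_cd_m2, diameter_mm):
    """Conventional retinal illuminance in trolands:
    luminance (cd/m^2) times pupil area (mm^2)."""
    return luminance_cd_m2 * pupil_area_mm2(diameter_mm)


def relative_light_change(d_before_mm, d_after_mm):
    """Ratio of light entering the eye after vs. before a pupil size change,
    for a fixed display luminance. Scales with the square of the diameter."""
    return (d_after_mm / d_before_mm) ** 2


if __name__ == "__main__":
    # Hypothetical example values: gray background of 10 cd/m^2,
    # pupil drifting from 3 mm to 4 mm over a session.
    before = retinal_illuminance_td(10.0, 3.0)
    after = retinal_illuminance_td(10.0, 4.0)
    print(f"before: {before:.1f} td, after: {after:.1f} td")
    print(f"relative change: {relative_light_change(3.0, 4.0):.2f}x")
```

      Because the light entering the eye scales with the square of pupil diameter, even modest slow changes in diameter can alter the drive to visually responsive neurons, which is why the alternative explanation is stated explicitly in the revised text.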

      The figures (everywhere, including the responses to reviewers) are very low resolution and all equations in methods are missing.

      We thank the reviewer for bringing this to our attention. We believe this issue may have arisen during conversion of the manuscript file for review, as the figures were of sufficient quality and the equations visible in the version that appeared online (https://doi.org/10.7554/eLife.99278.2). In any case, we will ensure that high-resolution figures are submitted with the revised manuscript and apologize that they were low resolution in the previous version.

      I'm very confused by Fig. 2 - supplement 2. Panel B shows a firing rate burst aligned to *microsaccade* onset. Does that mean you were in the foveal SC? i.e. how can neurons have a motor burst to the target of the memory-guided saccade and also for microsaccades? And which microsaccade directions caused such a burst? And what does it mean to compute the motor index and spike count for microsaccades in panel C? If you were in the proper SC location for the saccade target, then shouldn't you *not* get any microsaccade-related burst at all? This is very confusing to me and needs to be clarified.

      We agree that clarification is needed here and thank the reviewer for their comment. The eccentricity of the targets was set to match the endpoints of the evoked saccades, which for some sessions were relatively close to the fovea. The mean eccentricity of the targets across sessions was 4.52° (SD = 2.89°). These values are now reported in the Methods section of the revised manuscript (Line 637). For the neuron shown in Figure 2–figure supplement 2, the eccentricity of the targets was 3°. Previous research has shown that some SC neurons respond during microsaccades as well as slightly larger saccades (see Hafed & Krauzlis, 2012, J. Neurophysiol., Fig. 4B). This likely explains why the neuron shown in Figure 2–figure supplement 2, which had a receptive field at ~3° based on saccades evoked by microstimulation, also responded during microsaccades. We apologize that this was not explained in the previous version and agree that it could have been confusing for the reader. To address this, the legend for this supplementary figure has been edited in the revised version and now reads:

      “(B) PSTH for an SC neuron that responded around the time of a microsaccade. Firing rates were computed in 1ms bins, averaged across trials and smoothed using a Gaussian function (σ = 5ms). Note that the targets were set to 3º in this session based on saccades evoked by microstimulation (see Methods). Previous research has shown that some SC neurons respond during microsaccades as well as to slightly larger saccades (Hafed and Krauzlis, 2012). This likely explains why this SC neuron, which had a RF at ~3º based on saccades evoked by microstimulation, also responded around the time of a microsaccade.” (Lines 1026-1031)
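
      The PSTH procedure described in this legend (1 ms bins, trial averaging, Gaussian smoothing with σ = 5 ms) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the function names, the ±4σ kernel truncation, and the zero-padded edge handling are assumptions made here, and spike times are assumed to be given in milliseconds relative to microsaccade onset.

```python
import numpy as np


def gaussian_kernel(sigma_ms, bin_ms=1.0, width=4.0):
    """Normalized Gaussian kernel truncated at +/- width * sigma (assumption)."""
    half = int(np.ceil(width * sigma_ms / bin_ms))
    t = np.arange(-half, half + 1) * bin_ms
    k = np.exp(-0.5 * (t / sigma_ms) ** 2)
    return k / k.sum()


def psth(spike_times_per_trial, t_start_ms, t_stop_ms, bin_ms=1.0, sigma_ms=5.0):
    """Trial-averaged firing rate (spikes/s) in fixed bins, Gaussian-smoothed.

    spike_times_per_trial: list of 1-D sequences of spike times (ms),
    each aligned to the event of interest (e.g. microsaccade onset).
    """
    n_bins = int(round((t_stop_ms - t_start_ms) / bin_ms))
    edges = t_start_ms + bin_ms * np.arange(n_bins + 1)
    # Spike counts per trial and bin.
    counts = np.array(
        [np.histogram(trial, bins=edges)[0] for trial in spike_times_per_trial]
    )
    # Average across trials and convert counts per bin to spikes per second.
    rate = counts.mean(axis=0) / (bin_ms / 1000.0)
    # mode="same" zero-pads, so values within ~4 sigma of the window edges
    # are biased downward; analyse a window padded beyond the epoch of interest.
    smoothed = np.convolve(rate, gaussian_kernel(sigma_ms, bin_ms), mode="same")
    return edges[:-1], smoothed


if __name__ == "__main__":
    # Hypothetical toy data: three trials, one spike each shortly after onset.
    trials = [[0.5], [0.5], [0.5]]
    t, r = psth(trials, -100.0, 100.0)
    print(t[np.argmax(r)], r.max())
```

      Smoothing with a normalized kernel preserves the total spike count away from the window edges, so the choice of σ trades temporal resolution against noise without changing the overall rate.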

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study explores how exogenous attention operates at the finest spatial scale of vision, within the foveola - a topic that has not been previously explored. The question is important for understanding how attention shapes perception, and how it differs between the periphery and the central regions of highest visual acuity. The evidence is compelling, as shown by carefully designed experiments with state-of-the-art eye tracking to monitor attended locations just a few tens of minutes of arc away from the fixation target, but additional clarification regarding analyses and implications for vision and oculomotor control would broaden the impact of the study.

      We thank the editors and reviewers for their thorough evaluation of our work. We have carefully revised the manuscript and substantially reworked the Discussion to address all of the points raised, eliminate redundancies, streamline the text, and clarify the implications of our findings for vision and oculomotor control. We have also expanded the documentation of our power analyses and conducted the additional analyses requested by the reviewers. Our point-by-point responses are provided.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript investigates how exogenous attention modulates spatial frequency sensitivity within the foveola. Using high-precision eye-tracking and gaze-contingent stimulus control, the authors show that exogenous attention selectively improves contrast sensitivity for low- to midrange spatial frequencies (4-8 cycles/degree), but not for higher frequencies (12-20 CPD). In contrast, improvements in asymptotic performance at the highest contrast levels occur across all spatial frequencies. These results suggest that, even within the foveola, exogenous attention operates through a mechanism similar to that observed in peripheral vision, preferentially enhancing lower spatial frequencies.

      Strengths:

      The study shows strong methodological rigor. Eye position was carefully controlled, and the stimulus generation and calibration were highly precise. The authors also situate their work well within the existing literature, providing a clear rationale for examining the fine-grained effects of exogenous attention within the foveola. The combination of high spatial precision, gaze-contingent presentation, and detailed modeling makes this a valuable technical contribution.

      Weaknesses:

      The manipulation of attention raises some interpretive concerns. Clarifying this issue, together with additional detail about statistics, participant profiles, other methodological elements, and further discussion in relation to oculomotor control in general, could broaden the impact of the findings.

      We thank the reviewer for the helpful comments. In the Discussion, we have now considered additional factors that could have contributed to the observed attentional effects. First, the exogenous cue might have functioned as a temporal warning signal. However, the interval between cue and stimulus onset was fixed across trials, meaning that the cue did not provide temporal information beyond what participants could already anticipate. Furthermore, participants completed a large number of trials (≥ 4000), making it highly likely that the temporal relationship between trial onset and target onset was overlearned. These considerations indicate that the observed benefit in the valid condition was predominantly attributable to spatial reorienting induced by the cue, rather than to differences in the temporal predictability of the target across conditions.

      Another possibility is that the 100% validity of the exogenous cue could potentially have promoted endogenous attentional engagement. Yet, several characteristics of our task strongly limited the extent to which such endogenous engagement could meaningfully influence performance. Endogenous attentional benefits typically emerge only after ~150-200 ms (Posner & Petersen, 1990; Carrasco, 2011), whereas our cue-target SOA was 100 ms, and the target remained visible for only 50 ms. Under these temporal constraints, any voluntary, slow endogenous enhancement would primarily occur after the stimulus offset. Thus, although endogenous maintenance is theoretically possible given the cue’s validity, it is unlikely to have substantially contributed to the observed attentional benefits in our task.

      Regarding the points on statistical reporting and participant details, we followed the reviewer’s suggestions by adding post hoc power analyses and providing more comprehensive reporting of the linear model outputs (see Appendices 1 and 2). We also expanded the description of the training procedures conducted with participants prior to formal data collection in the Methods section.

      We appreciate the reviewer for raising the important question of how our findings may relate to oculomotor control. To address this, we analyzed trials excluded from the manuscript due to saccades. This analysis revealed that saccade latencies were shorter in the valid condition than in the neutral condition (see Figure 2 — Supplementary Figure 2). This earlier saccade onset may reflect exogenously triggered preparatory activity in the oculomotor system in response to the salient cue. Future studies are needed to examine whether this preparatory mechanism serves to efficiently guide microsaccades or saccades toward behaviorally relevant stimuli in everyday vision. We have incorporated this point into the Discussion, highlighting a potential mechanistic link between exogenous attention and oculomotor behavior.

      Reviewer #2 (Public review):

      Summary:

      This study aims to test whether foveal and non-foveal vision share the same mechanisms for endogenous attention. Specifically, they aim to test whether they can replicate at the foveola previous results regarding the effects of exogenous attention for different spatial frequencies.

      Strengths:

      Monitoring the exact place where the gaze is located at this scale requires very precise eye-tracking methods and accurate and stable calibration. This study uses state-of-the-art methods to achieve this goal. The study builds on many other studies that show similarities between foveal vision and non-foveal vision, adding more data supporting this parallel.

      Weaknesses:

      The study lacks a discussion of the strength of the effect and how it relates to previous studies done away from the fovea. It would be valuable to know if not just the range of frequencies, but the size of the effect is also comparable.

      We thank the reviewer for raising these important issues. In response, we have expanded the Discussion to link our findings to prior work. First, we included a direct comparison of our effect sizes with those reported in previous studies. This analysis revealed that our effect sizes are highly comparable to those of earlier studies (see Figure 3 — Supplementary Figure 4). Second, we contextualized our findings within the framework of the normalization model of attention in the Discussion. We detected a mixture of contrast and response gain effects, consistent with predictions from the normalization framework given our experimental design. Finally, we extended the Discussion to consider potential underlying neural mechanisms. Specifically, we suggested that differences in attentional modulation, particularly the manifestation in response gain vs. contrast gain between the fovea and extrafovea, may reflect distinct characteristics of foveal neurons relative to those in extrafoveal regions.

      Reviewer #3 (Public review):

      Summary:

      This paper explores how spatial attention affects foveal information processing across different spatial frequencies. The results indicate that exogenously directed attention enhances contrast sensitivity for low- to mid-range spatial frequencies (4-8 CPD), with no significant benefits for higher spatial frequencies (12-20 CPD). However, asymptotic performance increased as a result of spatial attention independently of spatial frequency.

      Strengths:

      The strengths of this article lie in its methodological approach, which combines a psychophysical experiment with precise control over the information presented in the foveola.

      Weaknesses:

      The authors acknowledge that they used the standard approach of analyzing observer-averaged data, but recognize that this method has limitations: it ignores the uncertainty associated with parameter estimates and the relationships between different parameters of the psychometric model. This may affect the interpretation of attentional effects. In the future, mixed-effects models at the trial level could overcome these limitations.

      We thank the reviewer for this comment. Our Methods section continues to transparently discuss these limitations, as well as the fact that these limitations are shared with most published studies in psychophysics. Additionally, we now include measures of uncertainty for all key effects (see Appendices 1 and 2), and we have reported effect sizes throughout the Results section. Finally, we have added post hoc power analyses to the Methods. Following previous approaches to power calculation for related experiments, we found that our study was sufficiently powered to detect the main effect of attention and had moderate power to detect the interaction between attention and spatial frequency.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The manipulation of attention raises some interpretive concerns. Since only valid and neutral cue conditions were included, the results might reflect differences in temporal predictability rather than true spatial reorienting of attention. In other words, the valid cue could act mainly as a temporal warning signal that reduces uncertainty about stimulus onset. Without invalid trials or a non-predictive control cue, it remains difficult to separate spatial and temporal contributions to exogenous attention.

      We thank the reviewer for raising this point. In this regard, we would like to clarify that there was no temporal uncertainty in stimulus onset: across all conditions and trial types, the stimulus was presented at the same time relative to the start of the trial, i.e., 600 ms after the start. Yet, we acknowledge that the closer temporal proximity between the cue and stimulus in valid trials could serve as an additional temporal warning signal, potentially conferring an advantage relative to the neutral condition. While we cannot completely rule out a contribution of such temporal cueing within the constraints of the current experimental design, we believe its impact was limited. Specifically, the fixed cue-stimulus interval reduced the cue’s ability to convey additional temporal information. Furthermore, observers completed a large number of trials (≥4000), and the temporal contingency between trial onset and target onset was likely overlearned. Taken together, these considerations indicate that the observed benefit in the valid condition was predominantly attributable to spatial reorienting induced by the cue, rather than to differences in the temporal predictability of the target across conditions. We now mention this in the revised Discussion (lines 309-318).

      We recognized that the original Figure 2 illustrating the experimental paradigm may have caused confusion regarding the timing structure of the task. We have therefore updated the figure to more explicitly illustrate the trial timeline in both conditions.

      (2) The reported effects seem small, and no power analysis is provided. With only seven participants, the study may not have enough statistical power to confirm that the observed differences are reliable or generalizable. Although the technical precision in gaze and stimulus control is impressive, it cannot offset the limitations of a small sample. The authors should include effect size estimates, confidence intervals, and ideally a post-hoc power analysis.

      The statistical results are reported only as χ² values from model comparisons, which do not show the direction or size of the effects. For clarity and transparency, these tests should be accompanied by fixed-effect estimates with their standard errors and confidence intervals, so readers can better assess both the reliability and perceptual relevance of the findings.

      The reviewer raised several important points regarding the study's statistical rigor.

      In the revised manuscript, we now report effect size estimates (Cohen’s d) in the Results section and Appendices. Effect sizes were in the medium-to-large range, including the effect of attention on contrast sensitivity at 4 and 8 CPD, and the difference in attentional benefit on contrast sensitivity between 4 and 12 CPD and between 8 and 12 CPD. We have also included the full model outputs, including standard errors and confidence intervals, in the Appendices.

      The sample size for the current study was determined based on the magnitude of the attentional effects observed in our previous work (Guzhang et al., 2021). The experimental design and dependent measures were highly similar across the two studies, and the prior study revealed a robust effect, which accounted for a substantial proportion of within-observer variance in a tightly controlled repeated-measures design.

      We have revised the manuscript, adding bootstrap-based power estimates, following the procedure described by Jigo and Carrasco (2020), using data from Guzhang et al. (2021). Assuming the effect size in our current study would be comparable to the prior one, 2 to 12 observers were randomly sampled with replacement, and a one-way repeated-measures ANOVA with attention as the main factor was used. This procedure was repeated 10,000 times, and power was estimated as the proportion of iterations yielding a significant main effect for each sample size. The results of this analysis indicate that a sample size of five observers would have been sufficient to achieve approximately 80% power to detect the main effect of attention in the prior study. Based on these estimates, the sample size used in the current study (seven observers) is adequately powered.
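
      The resampling procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: the per-observer effects are hypothetical placeholders, the function names are invented for the example, and because attention has only two levels (valid vs. neutral), the one-way repeated-measures ANOVA is replaced by the equivalent paired t-test (F = t²), evaluated against tabulated two-tailed critical values at α = 0.05.

```python
import random
import statistics

# Two-tailed critical t values at alpha = 0.05, keyed by degrees of freedom.
T_CRIT_05 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571, 6: 2.447,
             7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228, 11: 2.201}


def attention_effect_significant(diffs):
    """Paired t-test on valid-minus-neutral differences.

    With two attention levels this matches a one-way repeated-measures
    ANOVA on attention, because F(1, n - 1) equals t squared.
    """
    n = len(diffs)
    sd = statistics.stdev(diffs)
    if sd == 0:
        # Degenerate resample (identical values drawn every time):
        # the test is undefined, so count it as non-significant.
        return False
    t = statistics.mean(diffs) / (sd / n ** 0.5)
    return abs(t) > T_CRIT_05[n - 1]


def bootstrap_power(diffs, n_observers, n_iter=10_000, seed=0):
    """Proportion of resampled datasets that yield a significant effect."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_iter):
        # Draw observers with replacement from the prior dataset.
        sample = [rng.choice(diffs) for _ in range(n_observers)]
        hits += attention_effect_significant(sample)
    return hits / n_iter


# Hypothetical per-observer attention effects (valid minus neutral).
# These are placeholders, NOT data from Guzhang et al. (2021).
prior_effects = [0.12, 0.20, 0.08, 0.15, 0.25, 0.10, 0.18]

# Sample sizes 2 to 12, as in the response; fewer iterations keep the demo fast.
power_by_n = {n: bootstrap_power(prior_effects, n, n_iter=1000)
              for n in range(2, 13)}
```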

      We also conducted a post hoc power analysis to evaluate the power of our design to detect the main effects and their interaction. It was performed using the R package simr, which estimates statistical power for mixed-effects models through model-based simulation. Specifically, simr generated datasets based on the fixed- and random-effect structure of the fitted model, preserving the observed effect sizes and variance components. For each simulated dataset, the model was refit, and the effect of interest was tested. By repeating this procedure 501 times across different sample sizes, power was estimated as the proportion of simulations in which the effect was statistically significant. Based on these post hoc simulations, we estimated that our study had high power (>95%) to detect the main effects and moderate power (>65%) to detect the interaction. Although the estimated power for the interaction was lower than for the main effects, the observed effect size was substantial (as indexed by Cohen’s d), indicating that the interaction was not trivially small.

      We now describe these analyses in lines 501-532 in the Methods section.

      (3) The task seems quite demanding, requiring fine spatial discrimination, very small stimuli, and head stabilization with a bite bar. It is not clear whether participants were naïve or experienced observers. If they had prior psychophysical training, practice effects could have influenced the results, particularly given the lack of invalid trials. The manuscript would benefit from clarifying participants' experience level and describing any training or familiarization procedures.

      We appreciate the reviewer’s concern regarding potential training effects. All observers had prior experience with similar tasks, but were naïve to the scope of this study. Each participant underwent an initial familiarization phase of approximately 50 trials with the experimental setup of this study. They then completed an additional ~50 trials to estimate their individual contrast thresholds per spatial frequency level before we proceeded with data collection at the five predefined contrast levels.

      Based on our experience, we have found that, for experiments similar to the one described here, observers quickly adapt to the setup and are generally able to maintain reliable fixation and stable performance, even during the initial training phase. In addition, each participant completed approximately 400 trials before the data collection started. Even observers who began the session with no prior experience would have become practiced with the setup by the time the actual data-collection phase started, during which ~4000 trials were collected per observer. Therefore, whether an observer participated in previous experiments is unlikely to meaningfully affect the results, as the large number of trials ensures comparable levels of task familiarity across individuals.

      Crucially, valid and neutral trials were interleaved throughout the session. Any general learning or practice would therefore influence both conditions equally. Despite this, we still observed clear performance improvements in the valid condition relative to the neutral condition, indicating that the observed benefits cannot be attributed solely to practice and reflect an attentional enhancement. We have added elaboration on the training procedures in Methods (lines 411-429).

      Finally, we recognize that the lack of invalid trials may raise concerns given our 100% spatially predictive cue, as noted in Reviewer 3’s first comment. We refer the reader to our response to that point for a more detailed discussion of cue validity and the distinction between exogenous and endogenous influences in our paradigm.

      (4) The study would benefit from a clearer connection between the behavioral results and possible underlying neural mechanisms. How might the observed changes in contrast sensitivity relate to known physiological processes at the retinal, thalamic, or cortical level? The discussion could be strengthened by framing the findings within established models of attentional modulation or by referring to known effects of attention in the early visual cortex.

      This is an important point, and we agree that framing the findings within established models of attentional modulation can strengthen the discussion. We believe that the normalization model of attention (Reynolds and Heeger, 2009; Herrmann et al., 2010) offers a useful framework for interpreting our behavioral findings, especially the attention-related changes in contrast sensitivity and asymptotic performance observed at the foveal scale. We have now added a more detailed discussion linking our results to this model and considering, explicitly as speculation, how known physiological processes at different stages may contribute to the observed effects in Discussion (lines 264-307).

      (5) The ecological relevance of the results is not fully developed. The authors propose that the observed effects may resemble natural attentional shifts triggered by salient events, yet the brief, highly localized flashes used here are somewhat artificial. A more likely interpretation is that these mechanisms relate to oculomotor control within the fovea, perhaps reflecting preparatory activity for microsaccades or fine fixation adjustments. Considering this view could broaden the impact of the findings and link them to current discussions on the relationship between attention and oculomotor control.

      We thank the reviewer for raising this important point regarding the ecological relevance of our findings, which we did not sufficiently address in the original manuscript. Although we briefly motivated scenarios that engage exogenous attention at high spatial resolution, such as detecting road signs or traffic lights at a distance while driving, we did not fully elaborate on how such attentional processes may link to downstream visual and oculomotor functions.

      In our experiment, observers maintained fixation and avoided saccades throughout the trial. Nevertheless, in a subset of trials (on average 17% ± 3%), observers made saccades after stimuli disappeared and prior to providing a response. Typically, these movements were microsaccades with amplitudes smaller than 0.5°, directed toward the target location, in both valid and neutral trials. These saccades were discarded prior to the analyses performed in the manuscript. Inspired by the reviewer’s feedback, we decided to examine the saccade latency in these trials relative to the onset of the response cue to assess whether exogenous cueing influenced oculomotor timing. Notably, we observed an earlier onset of microsaccades in valid compared to neutral trials (71 ms ± 50 ms faster, P < 0.01). We have now added this observation as Figure 2 — Supplementary Figure 2 in the manuscript. Because the presence of an exogenous pre-cue was the only difference between the two trial types, the earlier microsaccade onset likely reflects exogenously triggered preparatory activity in the oculomotor system in response to the salient pre-cue. Such fine-grained attention may prime potential eye movements toward behaviorally relevant stimuli for further examination. This interpretation is consistent with the reviewer’s suggestion and supports a mechanistic link between exogenous attention and oculomotor behavior, extending the ecological relevance of our findings. This point has been added to the Discussion on lines 329 to 340.

      We also conducted an analysis to examine ocular drift behavior following the response cue. Although trials included in the manuscript analyses were constrained such that fixation during target presentation remained within a small window (10’ radius) around the fixation marker, we did not assess whether gaze subsequently drifted closer to the target location after the response cue. One possibility is that exogenous attention might bias ocular drift, shifting the preferred locus of fixation closer to the target. To address this, we computed the average Euclidean distance between gaze position and the target location following response cue onset for valid and neutral trials. However, we found no significant difference in gaze-target distance between valid and neutral trials (p = 0.57).
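
      The gaze-target distance measure described above could be computed along the following lines. This is only a sketch: the coordinates, units, and function name are hypothetical, and the response does not specify how samples were selected or averaged within a trial.

```python
import math


def mean_gaze_target_distance(gaze_samples, target):
    """Average Euclidean distance between gaze samples and the target.

    Inputs are (x, y) pairs in a common unit (for example arcminutes);
    the result is in the same unit.
    """
    return sum(math.dist(g, target) for g in gaze_samples) / len(gaze_samples)


# Hypothetical values for illustration only (not taken from the study).
target_xy = (3.0, 4.0)
gaze_after_response_cue = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]

# Distances are 5, 4 and 3, so the mean is 4.
mean_distance = mean_gaze_target_distance(gaze_after_response_cue, target_xy)
```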

      Although the spatial cueing approach has long been used to probe exogenous attention in a controlled manner in psychophysical experiments, we fully recognize the importance of understanding attention under more naturalistic viewing conditions that allow observers to freely move their eyes. Developing paradigms that incorporate more naturalistic, salient stimuli would be an important direction for future work, enabling investigation of exogenous attention in ecologically valid settings and its influence on sequential actions and processes, including oculomotor behavior.

      (6) There is no statement about the availability of the data and code used for the experiment.

      We have now added the data and code for the analysis pipeline to the Open Science Framework (OSF).

      Reviewer #2 (Recommendations for the authors):

      (1) The study could discuss the strength of the effect and how it relates to previous studies.

      We thank the reviewer for raising this point. To facilitate direct comparison with the study by Jigo and Carrasco (2020), we computed attentional benefit as the ratio of contrast sensitivity between the valid and neutral conditions (now shown in Figure 3 — Supplementary Figure 4). In their data, the attentional benefit at 0° eccentricity peaked just below 4 CPD, with a ratio of approximately 1.2, corresponding to a ~20% increase in contrast sensitivity. This magnitude closely matches the benefit we observed for fine-grained attentional shifts within the foveola at spatial frequencies between 4 and 8 CPD (17% ± 12% and 16% ± 14% for 4 and 8 CPD, respectively). We have added this comparison to the Discussion (lines 246-262).
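
      As a worked illustration of the ratio measure, the sketch below converts contrast thresholds to sensitivities and reports the benefit as a percentage. It assumes contrast sensitivity is the reciprocal of the contrast threshold (a common convention that the response does not spell out), and the threshold values and function name are hypothetical.

```python
def attentional_benefit(threshold_valid, threshold_neutral):
    """Ratio of contrast sensitivity in the valid vs. neutral condition.

    Sensitivity is taken as 1 / threshold (an assumption of this sketch).
    """
    sensitivity_valid = 1.0 / threshold_valid
    sensitivity_neutral = 1.0 / threshold_neutral
    return sensitivity_valid / sensitivity_neutral


# Hypothetical thresholds (proportion contrast), not values from the study.
ratio = attentional_benefit(threshold_valid=0.10, threshold_neutral=0.12)

# A ratio of about 1.2 corresponds to a roughly 20% increase in sensitivity.
percent_increase = (ratio - 1.0) * 100.0
```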

      In addition, we acknowledge that prior studies have reported heterogeneous attentional effects, including pure contrast gain, pure response gain, or a mixture of the two. We now explicitly reference these findings in the Discussion and use the normalization model of attention (Reynolds and Heeger, 2009; Herrmann et al., 2010) to account for how differences in stimulus configuration, attention field size, and eccentricity may account for discrepancies between our findings and prior studies examining attention in the extrafovea or when broadly distributed across the fovea (lines 264-307).

      (2) Minor details:

      (a) The abstract mentions gaze-contingent-display, but if I understand correctly, the stimulus was not presented in a gaze-contingent manner.

      That’s correct. Although stimuli were not presented gaze-contingently, we used a gaze-contingent calibration procedure (see Methods, lines 386-389) to achieve higher precision in localizing the line of sight. This increased accuracy was essential for selecting trials in which stimuli remained at the intended eccentricity relative to the preferred locus of fixation. To avoid potential confusion, however, we have removed this detail from the abstract.

      (b) Line 361: What is the manual calibration the authors are referring to? It does not appear to be described.

      The text has been updated to explain more explicitly what auto and manual calibrations are.

      (c) Line 402: There may be a typo towards the end of the line "t0" should be "to"?

      Text has been updated. Thank you.

      (d) Line 405. What are the units of 30?

      It’s in arcminutes. Text has been updated.

      Reviewer #3 (Recommendations for the authors):

      I found this paper very interesting, with a solid methodological approach and excellent data analyses. The authors present a well-designed psychophysical study that contributes valuable insights into the mechanisms of attention in the foveola. The methodology is rigorous, and the analyses are thoughtfully conducted and clearly presented.

      That said, I would like to offer a few comments and suggestions for clarification and further consideration:

      (1) Exogenous attention:

      If a 100% spatially predictive cue is compared to a neutral cue, the observed attentional effect should not be described as (purely) exogenous, since the cue fully predicts where the post-cue will request a response. This situation represents a case in which attention is exogenously driven but endogenously maintained (see e.g., Chica et al., 2013, Behavioural Brain Research). I recommend clarifying this distinction in the manuscript (and title) to avoid conceptual ambiguity.

      We thank the reviewer for raising this important conceptual point. We agree that because the pre-cue was 100% spatially predictive, the resulting attentional allocation cannot be considered purely exogenous. Although the abrupt, salient onset of the cue obligatorily triggers an exogenous shift of attention, its validity could also promote endogenous maintenance of attention at the cued location. Yet, several characteristics of our task strongly limit the extent to which such endogenous engagement could meaningfully influence performance. Endogenous attentional benefits typically emerge only after ~150-200 ms (Posner & Petersen, 1990; Carrasco, 2011), whereas our cue-target SOA was 100 ms, and the target remained visible for only 50 ms. Under these temporal constraints, any voluntary, slow endogenous enhancement would primarily occur after the stimulus offset. Thus, although endogenous maintenance is theoretically possible given the cue’s validity, it is unlikely to have substantially contributed to perceptual encoding in our task.

      We also considered the possibility that our response cue (a retro-cue indicating the target location) might recruit endogenous attention to the internal perceptual representation. Importantly, however, this retro-cue was equally informative in valid and neutral conditions. Any enhancement driven by the retro-cue should therefore benefit both trial types to the same extent. The fact that we still observe a robust advantage in valid trials supports the conclusion that the performance improvements predominantly reflect fast, spatially specific exogenous facilitation rather than slower endogenous processes.

      We have revised the manuscript to clarify that although the cue obligatorily triggers an exogenous attentional shift, its 100% validity could allow for endogenous attention maintenance as shown by Chica et al. (2013). We also added an explanation to the Discussion (lines 319-327) detailing why such endogenous contributions are unlikely to drive our main results, given the rapid cue-target timing in our task. Finally, to further prevent ambiguity, we updated the manuscript title to refer to “exogenously triggered attention,” rather than simply “exogenous attention.”

      (2) Interpretation of statistical effects:

      The statement "Therefore, asymptotic performance showed only independent, additive effects of frequency and attention, without a systematic influence of spatial frequency on the attentional benefit" seems not to be supported by the data, as the main effect of frequency was not significant.

      We thank the reviewer for this helpful observation. We agree that the original phrasing did not accurately reflect the results, as the main effect of spatial frequency was not significant (p = .0545). We have revised the sentence to “Therefore, asymptotic performance reflected an effect of attention alone, with no detectable contribution of spatial frequency or of the interaction between spatial frequency and attention” to avoid implying such an effect (lines 210-211).

      If data from two participants were missing in one condition, the authors should consider replacing this data with new participants.

      We agree with the reviewer that having two observers with missing data in one condition is not ideal. However, the 20 cpd condition was deliberately positioned near the resolution limit at the tested eccentricity and was therefore extremely demanding. Observers also had to monitor two stimulus locations simultaneously, further increasing task difficulty. This condition was challenging for all observers and, despite testing up to the highest contrast, two of seven observers were unable to perform above chance, indicating that for a non-trivial fraction of observers, this condition was effectively unmeasurable with our paradigm. As noted in the manuscript, the 20 cpd condition also has a statistical limitation: thresholds clustered near the upper bound (approaching 100% contrast), compressing the dynamic range and markedly reducing variance relative to lower spatial frequencies, which violates the homoscedasticity assumption of linear models. For these reasons, we did not pursue additional data collection in this condition. Nevertheless, we report the data that were successfully obtained, as they remain informative about performance near the resolution limit.

      We finally note that even when setting aside the 20 CPD condition, our data support this conclusion: comparisons between 4 and 12 CPD, as well as between 8 and 12 CPD, revealed large differences in the magnitude of the attentional benefit (d = 0.65, 95% CI [0.11, 1.18] and d = 0.62, 95% CI [0.08, 1.14], respectively). To further quantify these effects, we now report Cohen’s d for these spatial-frequency comparisons throughout the Results text and in the Appendix tables.

      (3) Sample size:

      As this is a psychophysical experiment with many trials and few participants, I am curious about how the authors determined the appropriate sample size and the number of trials required to detect the expected effects. Given that many effects were found to be significant, it seems that statistical power was adequate; however, it would be helpful if the authors could explain how this issue was addressed a priori during experimental planning.

      We appreciate that the reviewer raised this point. Please see the reply to the second point from Reviewer 1, who raised a related question about statistical power.

      (4) Figure 2 clarification:

      In Figure 2B, I do not fully understand the "Valid" and "Neutral" representation. Both conditions include a post-cue indicating the right position; however, in the neutral condition, there is a central fixation square, whereas in the valid condition, there is not. Please clarify this aspect of the figure. I think I understood the paradigm, but this part of the figure is misleading.

      The pre-cue exists only in the valid condition. However, the original panel B contained an error: the fixation marker was missing in the valid condition.

      We thank the reviewer for pointing this out. We have updated Figure 2 to explicitly show the sequence of valid vs. neutral trials. The fixation mark remained on the screen throughout the trial in both the valid and neutral conditions. After a 500 ms fixation period, an exogenous cue was presented for 30 ms in valid trials, followed by a 70 ms interval before stimulus onset. In neutral trials, no cue was presented, and the screen remained blank for 100 ms before the stimuli appeared. In both conditions, a response cue appeared 50 ms after stimulus offset.
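
      The trial timing described above can be laid out as a small event table, which makes it easy to check that stimulus onset is identical across conditions. This is an illustrative sketch rather than the experiment code: the function and event names are invented, the 50 ms stimulus duration is taken from the response to Reviewer #3's first comment, and the response-cue duration is left unspecified because it is not stated here.

```python
def trial_timeline(condition):
    """Return (event, onset_ms, duration_ms) tuples for one trial.

    The fixation marker stays on screen throughout; the "fixation" entry
    marks only the initial 500 ms fixation period.
    """
    fixation_ms, cue_ms, gap_ms = 500, 30, 70
    stimulus_ms, post_stimulus_ms = 50, 50

    events = [("fixation", 0, fixation_ms)]
    if condition == "valid":
        events.append(("exogenous_cue", fixation_ms, cue_ms))
    elif condition != "neutral":
        raise ValueError(f"unknown condition: {condition!r}")

    # Neutral trials keep a blank 100 ms interval, so onset is the same.
    stimulus_onset = fixation_ms + cue_ms + gap_ms
    events.append(("stimulus", stimulus_onset, stimulus_ms))
    # Response-cue duration is not given in the text, hence None.
    response_cue_onset = stimulus_onset + stimulus_ms + post_stimulus_ms
    events.append(("response_cue", response_cue_onset, None))
    return events


valid_events = trial_timeline("valid")
neutral_events = trial_timeline("neutral")
```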

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In their manuscript entitled "Terminal tracheal cells of Drosophila are immune privileged to maintain their Foxo-dependent structural plasticity", Bossen and colleagues determine that the terminal cells of the tracheal system differ from other larval tracheal cells in that they do not typically show an Imd-dependent immune response to fungal and viral infections. The authors reach this conclusion based on the expression of a reporter line, Drs-GFP. The authors speculate that this difference may reflect differential expression of an immune pathway component, as tracheal terminal cells (TTCs) do not respond to forced expression of PRGP-LS. The authors then go on to show that, unlike the other cells of the tracheal system, terminal cells do not express PGRP-LC as reported by a GAL4 enhancer trap. Forced expression of PGRP-LC in terminal cells resulted in reduced branching, cell damage, and features of the cell death program. These effects could be suppressed by the depletion of AP-1 or Foxo transcription factors. The authors show that Foxo plays a negative role in the branching of TTCs, with ectopic branching occurring upon RNAi (or under hypoxic conditions). The authors speculate that the immune privilege of the TTCs may have evolved to permit Foxo regulation of TTC branching.

      Strengths:

      The authors provide compelling genetic data.

      Weaknesses:

      (1) The authors state that after infection 34% of larvae were not GFP+ as defined by the detection of Drs-GFP in dorsal branches. The authors should clarify if these larvae are completely without response to infection, with no Drs-GFP in dorsal trunks and/or other tracheal branches. If these larvae are entirely unresponsive, could the authors indicate why this might be? Also, at this point in the manuscript, the authors are somewhat misleading regarding TTC expression of Drs-GFP: they should state at this point that there are some TTCs that do express Drs-GFP, and should also address their prior study of Drs-GFP induction, which does not claim exclusion of TTC Drs-GFP expression.

      GFP– indicates the absence of detectable fluorescence in regions proximal to the TTCs (dorsal branch and fusion cells). Our analysis specifically focused on these regions and did not assess fluorescence in other parts of the tracheal system. Therefore, the reported 34% of larvae classified as GFP– does not imply a complete absence of response in these animals; rather, no fluorescence was detected within our defined region of interest. To clarify how fluorescence in TTCs was quantified, we have added a schematic (new Fig. 1F). In addition, new Fig. S1 illustrates that AMP reporter activation frequently occurs in other tissues.

      Our observations are consistent with earlier reports. In the original description of the AMP reporter lines, Tzou et al. (2000; https://doi.org/10.1016/S1074-7613(00)00072-8) reported that “only a fraction of the flies or larvae exhibited fluorescence in surface epithelia, and the proportion of GFP-expressing animals was variable from one culture vial to the next. In addition, fluorescence was rarely distributed throughout the whole tissue and was limited to restricted areas of the epithelium,” suggesting that AMP reporter activation can occur locally rather than uniformly across tissues.

      In a previous study (https://doi.org/10.1186/1471-2164-9-446), we reported that airway epithelial cells, including the finest tracheal endings on target organs, can activate drosomycin transcription following infection. However, that study focused specifically on infected larvae. Importantly, it did not quantify the frequency of reporter activation or analyze TTC-specific phenotypes. As such, those statements should not be interpreted as implying uniform or ubiquitous reporter activation across all tracheal cells.

      (2) The authors describe the terminal cell phenotype as "shrunken" but this implies loss of size or pruning, however, it is not clear whether the defects could equally be due to lack of growth or slower growth.

      We omitted the term “shrunken” in the present manuscript to avoid potential misinterpretation.

      (3) Figure 1 suggests that GFP+ dorsal branches are not uniform in their expression of Drs-GFP, it seems more patchy. The authors should define the fraction of dorsal branch cells that are Drs-GFP positive. Also, are fusion cells Drs-GFP positive?

      We included a schematic illustrating our quantification approach (new Fig. 1F). We also revised the wording to clarify that GFP<sup>+</sup> animals include fluorescence not only in the dorsal branch (DB) but also in fusion cells (FCs), i.e., structures located between the dorsal trunks and the terminal tracheal cells (TTCs). Any structure in proximity to the TTCs that shows GFP expression was scored as GFP<sup>+</sup>. In most cases, GFP expression was observed in the dorsal fusion cells.

      (4) Drs-GFP expression is largely absent from terminal cells; however, a still significant number of terminal cells show expression (8%). The authors argue that PGRP-LC expression is absent based on a GAL4 transgenic line. If this line reflects endogenous PGRP-LC expression, should there not be 8% positive TTCs? Or is the 8% Drs-GFP expression independent of the IMD receptor?

      We detected PGRP-LE expression in approximately 3% of epithelial tracheal cells that expressed Drs after infection (Fig. 3F,G). This observation suggests that Drs activation can occur through a mechanism independent of PGRP-LCx. We have incorporated this finding into both the Results and Discussion sections.

      (5) Figure 2: the authors state that TTCs are negative even with induced PGRP-LE expression - should there not be at least 8% that are positive?

      We additionally infected larvae overexpressing PGRP-LE and observed Drs-GFP expression in 3% of cases, which we did not observe without infection.

      (6) The authors compare PGRP-LC expression to induction of cell death by expression of reaper and hid. Reaper and Hid had stronger effects and eliminated TTCs. See cleavage of caspase Dcp-1 in PGRP-LC expressing cells. Is caspase cleavage always diagnostic of apoptosis or could the weaker than rpr/hid phenotype imply a different function?

      We have included the potential non-apoptotic functions of Dcp-1 in the Discussion. The weaker phenotype observed could therefore be explained by a non-apoptotic role of Dcp-1.

      (7) Drs-GFP expression is said to be "completely" absent from tracheal terminal cells when the entire tracheal system is expressing PGRP-LE.

      We have revised the wording accordingly.

      (8) Figure 5, TRE_RFP expression, is not convincing that it is higher or in terminal cells. https://doi.org/10.7554/eLife.102369.1.sa2

      We have revised the wording in line 230.

      Reviewer #2 (Public review):

      Summary:

      In this study, Bossen et al. looked at the immune status of the tracheal terminal cells (TTCs) in Drosophila larvae. The authors propose that these cells do not show PGRP-LCx expression and, hence, lack immune function. Artificial overexpression of PGRP-LCx in the TTCs causes these cells to undergo apoptosis.

      Strengths:

      Only a few groups have tried to look at the immune status of the trachea, though we know that AMPs are expressed there after infection. This exciting study attempts to understand the differences in the tracheal cells that do not produce AMPs upon infection.

      Weaknesses:

      The reason why the TTCs have some immune privilege still needs to be completely clear. Whether the phenotype is cell autonomous or contributes to the cellular immune system is not evaluated. As we know, crystal cells also maintain oxygen levels in larvae; whether in the absence of terminal trachea, the crystal cells have any role is not explored. https://doi.org/10.7554/eLife.102369.1.sa1

      In addition to the Drs-GFP reporter line, we performed new infection experiments using additional antimicrobial peptide reporters to further support our observations. While these experiments confirm the humoral immune response, they do not address the mechanisms underlying the apparent immune privilege. Our analysis therefore focuses specifically on the humoral immune response and does not allow conclusions regarding potential contributions of the cellular immune system, including crystal cells, to maintaining oxygen levels in animals with impaired TTCs. Notably, complete loss of TTCs is lethal, as demonstrated by TTC ablation using hid;rpr expression (Fig. 4F).

      Reviewer #3 (Public review):

      Summary:

      The authors report that tracheal terminal cells (TTCs) in Drosophila do not activate innate immunity following bacterial infection. They attribute this to the lack of expression of PGRP-LCx in these cells. Forced activation of the Imd pathway in TTCs leads to cell death and a reduction in tracheal branching. The authors propose a mechanism for cell death induction via pathways involving JNK, AP-1, and foxo. They suggest that the suppression of innate immunity in TTCs may serve to maintain their plasticity, preparing them for responses to hypoxic conditions.

      Strengths:

      (1) The study addresses the understudied area of immune privilege in innate immunity, providing a potentially important example in Drosophila TTCs.

      (2) The molecular characterization of the cell death pathway induced by forced Imd activation is well-executed and provides solid mechanistic insights.

      (3) The authors draw interesting parallels between Drosophila TTCs and mammalian endothelial cells, suggesting broader implications for their findings.

      Weaknesses:

      (1) The core premise of the study - that TTCs do not activate innate immunity following bacterial infection - relies heavily on a single readout (Drs reporter). Additional markers of immune activation would strengthen this crucial claim.

      We included new experiments using additional antimicrobial peptide reporter genes that show results similar to those obtained with the Drs-GFP reporter (new Fig. 1).

      (2) The evidence for the lack of PGRP-LCx expression in TTCs is based on a single GAL4 reporter line. Given the importance of this observation to the authors' model, validation using alternative methods would be beneficial.

      Although we were not able to include alternative methods to further confirm our hypothesis, we performed additional infection experiments. Upon bacterial infection, we observed a strong increase in GFP fluorescence throughout the animal and in many other tissues, while still detecting no response in the TTCs. These results further support our hypothesis.

      (3) The phenotypes observed upon forced activation of the Imd pathway in TTCs, while intriguing, may be influenced by non-physiological levels of pathway activation. The authors should address this potential caveat and consider examining the effects of more moderate pathway activation. https://doi.org/10.7554/eLife.102369.1.sa0

      We used two independent UAS-PGRP-LCx lines located on different chromosomes. One line (III) produced a stronger phenotype than the other (II). We clarified this point in the Results section (Fig. 4C,D) and added supplementary data (new Fig. S2) showing that both lines produce comparable phenotypes when expressed using an alternative tracheal driver. The epithelial thickening observed follows the same pattern as the phenotype detected in TTCs, indicating that even moderate pathway activation leads to similar effects. However, we acknowledge that this represents ectopic pathway activation and therefore likely reflects a non-physiological level of signaling.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      My particular comments on the figures are as follows:

      (1) In Figure 2, the PGRP-LCx signal should be quantified as done for Drosomycin GFP, as shown in Figure 1.

      We agree and have added a quantification.

      (2) In Figure 2F and G are the larvae infected? If not, what happens to PGRP-LCx expression post Ecc15 infection?

      We also included infected larvae to test whether infection induces GFP expression in TTCs. However, GFP expression was never observed in TTCs, although overall fluorescence increased in other tissues.

      (3) Is the effect of overexpression of LCx exaggerated post-infection? In particular when it comes to the escape phenotype.

      We induced mild Imd pathway activation by expressing PGRP-LE using a tracheal driver active in all tracheal cells, including TTCs, for 24 hours. In addition, these larvae were infected and their sensitivity to hypoxia was assessed. Animals expressing PGRP-LE in the trachea showed increased sensitivity to hypoxia, which was further enhanced following infection.

      (4) Does overexpression of anti-apoptotic genes in TTC and PGRP-LCx rescue the TTC branching?

      This point was not addressed.

      (5) Have the authors tried to rescue the larvae with shallow food?

      This point was not addressed.

      (6) Is there any effect on the circulating hemocytes or lymph glands in the PGRP-LCx overexpressing animals?

      This point was not addressed.

      Reviewer #3 (Recommendations for the authors):

      The authors present an intriguing model of immune privilege in Drosophila tracheal terminal cells (TTCs). This model is built upon three key pillars: (1) the absence of innate immune activation in TTCs, (2) the lack of PGRP-LCx expression in TTCs, and (3) the induction of cell death when innate immunity is activated in TTCs. However, the experimental evidence supporting each of these critical points requires substantial strengthening. The reviewer recommends the following improvements and additional experiments to address these core issues:

      (1) Innate immune activation in TTCs:

      Evaluate the expression of additional antimicrobial peptide reporters to provide a more comprehensive assessment of innate immune activation in TTCs.

      In addition to the Drs-GFP reporter line, we performed new infection experiments using other antimicrobial peptide reporters to confirm our results.

      (2) PGRP-LCx expression in TTCs:

      Validate the PGRP-LCx-GAL4 line used in the study to ensure it accurately reflects endogenous PGRP-LCx expression.

      Employ complementary techniques such as in situ hybridization and antibody staining to corroborate the absence of PGRP-LCx in TTCs.

      We also included infection experiments using PGRP-LCx-Gal4 larvae. Infection did not trigger GFP expression in TTCs. However, the overall PGRP-LCx expression pattern observed in other larval tissues supports that the results reflect endogenous PGRP-LCx expression.

      (3) Cell death induction upon immune activation in TTCs:

      Address the possibility that the observed cell death is an artifact of strong, forced Imd pathway activation. To do that,

      perform control experiments activating the Imd pathway in non-TTC tracheal cells to determine if cell death is specific to TTCs.

      Use broader tracheal drivers (e.g., ppk4-GAL4 or btl-GAL4) to activate the Imd pathway and verify if cell death is indeed restricted to TTCs.

      We included results from PGRP-LCx overexpression using the tracheal driver ppk4-Gal4 and stained for the apoptosis marker Dcp-1 (new Fig. S3). We observed increased Dcp-1 signal in dorsal trunk cells, indicating that PGRP-LCx-mediated Dcp-1 cleavage is not restricted to TTCs.

      Ideally, generate a transgenic line expressing physiological levels of PGRP-LCx in TTCs and demonstrate that bacterial infection induces cell death specifically in TTCs through the proposed pathway. The reviewer acknowledges the complexity of this experiment but believes it would significantly strengthen the authors' conclusions.

      We did not generate a new transgenic line but instead used an alternative UAS-PGRP-LCx line (II), which exhibits a milder phenotype. This has now been clarified more prominently in the Results section (Fig. 4C,D). Additionally, we performed further experiments showing an epithelial thickening phenotype whose severity depends on the UAS-PGRP-LCx line used (new Fig. S2).

      In addition to the above major points

      (4) Quantitative data presentation:

      Provide quantitative analyses for the results presented in Figures 2 and 3J-K to allow for a more rigorous evaluation of the data.

      We included a quantitative analysis of the results shown in Fig. 2 (now presented in new Fig. 3). In addition, we added quantification of fluorescence in the TTCs of infected larvae.

      (5) Alternative hypothesis:

      Consider and address an alternative explanation for the lack of innate immune activation in TTCs: the potential gradient of bacterial ligands from proximal trachea to distal TTCs. If this hypothesis is correct, one might expect to see a gradient of Drs expression correlating with the distance from the proximal trachea. Addressing this possibility would strengthen the authors' proposed model.

      We have now included the following paragraph in the Discussion section.

      “An alternative explanation for the observed lack of an immune response in TTCs could be their maximal distance from the spiracles. In this scenario, a gradient of bacterial inducers along the tracheal system might be expected, resulting in a gradual decrease in immune activation from the spiracles toward the TTCs. However, this is not what we observed. In tracheae that displayed an immune response, the response was largely homogeneous along the entire length of the tracheal system, from the spiracles to the TTCs. Only at the transition to the TTCs did the immune response drop abruptly. This observation argues against the gradient hypothesis and suggests that TTCs are specifically excluded from the immune response.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review): 

      Summary:

      In this manuscript, the authors used a leucine/pantothenate auxotrophic strain of Mtb to screen a library of FDA-approved compounds for their antimycobacterial activity and found significant antibacterial activity of the inhibitor semapimod. In addition to alterations in pathways, including amino acid and lipid metabolism and transcriptional machinery, the authors demonstrate that semapimod treatment targets leucine uptake in Mtb. The work presents an interesting connection between nutrient uptake and cell wall composition in mycobacteria.

      Strengths:

      (1a) The link between the leucine uptake pathway and PDIM is interesting but has not been characterized mechanistically. The authors discuss that PDIM presents a barrier to the uptake of nutrients and shows binding of the drug with PpsB. However it is unclear why only the leucine uptake pathway was affected.

      We observe that semapimod treatment interferes with the uptake of L-leucine, but not of pantothenate, in the mc<sup>2</sup> 6206 strain. At present, we do not know whether PDIM presents a barrier exclusively to the uptake of L-leucine. Further studies may shed light on the underlying mechanism(s) by which L-leucine uptake is modulated by this small molecule.

      (1b) We still do not know what PpsB actually does for amino acid uptake - is it a transporter?

      Using BLI-Octet, we did not detect any interaction between L-leucine and PpsB. Therefore, we doubt that PpsB is a transporter of L-leucine.

      (1c) Does semapimod binding affect its activity?

      Our study suggests that semapimod treatment alters PDIM architecture, which then becomes restrictive to L-leucine. However, at present the exact mechanism is not clear. Further studies are required to thoroughly examine the effect of semapimod on Mtb PpsB activity and alterations in PDIM by mass spectrometry.

      (1d) Does the auxotrophic Mtb have lower PDIM levels compared to wild-type Mtb?

      Based on the published report by Mulholland et al. and on the vancomycin susceptibility phenotype in our study, both strains appear to have comparable PDIM levels.

      (2) The authors show an interesting result where they observed antibacterial activity of semapimod against H37Rv only in vivo and not in vitro. Why do the authors think this is the basis of this observation? It is possible semapimod has an immunomodulatory effect on the host since leucine is an essential amino acid in mice. The authors could check pro-inflammatory cytokine levels in infected mouse lungs with and without drug treatment.

      Semapimod inhibits production of proinflammatory cytokines such as TNF-α, IL-1β, and IL-6, which would indeed help the pathogen establish chronic infection. However, a significant reduction in bacterial loads in lungs and spleen upon semapimod treatment, despite inhibition of proinflammatory cytokines, clearly indicates bacterial dependence on host-derived exogenous leucine during intracellular growth.

      (3) The authors show that the semapimod-resistant auxotroph lacks PDIM. The conclusions would be further strengthened by including validations using PDIM mutants, including del-ppsB Mtb and other genes of the PDIM locus, whether in vivo this mutant would be more susceptible (or resistant) to semapimod treatment.

      PDIM is a virulence factor, and plays an important role in the intracellular survival of the TB pathogen. Mtb strains lacking PDIM are expected to show attenuated growth during infection, even without semapimod treatment. In such a case, it might be difficult to draw any conclusions about the effect of semapimod against PDIM(-) strains in vivo.

      (4) Prolonged subculturing can introduce mutations in PDIM, which can be overcome by supplementing with propionate (Mulholland et al, Nat Microbiol, 2024). Did the authors also supplement their cultures with propionate? It would be interesting to see what mutations would result in Semr strains with propionate supplementation along with prolonged semapimod treatment.

      Because extensive subculturing may result in loss of PDIM, we avoided prolonged subculturing of bacteria. As presented in Fig. 6b, the WT bacteria retain PDIM. While performing the initial screening of drugs, we did not anticipate such a phenotype, and hence bacteria were cultured in regular 7H9-OADS medium without propionate supplementation.

      A comprehensive future study would help examine the effect of propionate on the generation of semapimod-resistant mutants in Mtb mc<sup>2</sup> 6206.

      Weaknesses:

      I have summarized the limitations above in my comments. Overall, it would be helpful to provide more mechanistic details to study the connection between leucine uptake and PDIM.

      Reviewer #2 (Public review): 

      Summary

      This important study uncovers a novel mechanism for L-leucine uptake by M. tuberculosis and shows that targeting this pathway with 'Semapimod' interferes with bacterial metabolism and virulence. These results identify the leucine uptake pathway as a potential target to design new anti-tubercular therapy. 

      Strengths

      The authors took numerous approaches to prove that L-leucine uptake of M. tuberculosis is an important physiological phenomenon and may be effectively targeted by 'Semapimod'. This study utilizes a series of experiments using a broad set of tools to justify how the leucine uptake pathway of M. tuberculosis may be targeted to design new anti-tubercular therapy.

      Weaknesses

      (1) The study does not explain how L-leucine is taken up by M. tuberculosis, leaving the mechanism unclear. Even though 'Semapimod' binds to the PpsB protein, the relevant connection between changes in PDIM and amino acid transport remains incomplete.

      While leucine uptake involves specific transporters in other bacteria, no such transport system is known in Mtb. By screening small molecule inhibitors, we came across a molecule, semapimod, which selectively kills the leucine auxotroph (mc<sup>2</sup> 6206), but not the WT Mtb. To understand the underlying mechanism of differential susceptibility of the WT and auxotrophic strains to this molecule, we evaluated the effect of restoration of leuCD and panCD expression on susceptibility of the auxotrophic strain to semapimod. Interestingly, our results demonstrated that upon endogenous expression of leuCD genes, the mc<sup>2</sup> 6206 strain becomes resistant to killing by semapimod. In contrast, panCD expression had no effect on semapimod susceptibility of mc<sup>2</sup> 6206. These findings were further substantiated by gene expression analysis of semapimod-treated mc<sup>2</sup> 6206, which exhibits differential regulation of a set of genes that are altered upon leucine depletion in Mtb as well as in other bacteria. Overall, these results provide the first evidence of perturbation of L-leucine uptake by semapimod treatment of the leucine auxotroph.

      To gain further mechanistic insights into the effect of semapimod on leucine uptake in Mtb, we generated a semapimod-resistant strain, which exhibits point mutations in four genes including ppsB. Interestingly, overexpression of wild-type ppsB, but not of the other genes, restored susceptibility of the resistant bacteria to semapimod. Our observations that semapimod interacts with PpsB, and that the semapimod-resistant strain accumulates a mutation in PpsB resulting in loss of PDIM, together support the involvement of cell-wall PDIM in regulation of L-leucine transport in Mtb.

      As mentioned above, we anticipate that semapimod treatment brings about certain modifications in PDIM that make it more restrictive to L-leucine. A comprehensive future study will be helpful to examine the effect of semapimod on Mtb physiology.

      (2) Also, the fact that the drug does not function on WT bacteria makes it a weak candidate to consider its usefulness for a therapeutic option.

      We agree that semapimod is not an appropriate drug candidate against TB owing to its inhibitory effect on production of proinflammatory cytokines such as TNF-α, IL-1β, and IL-6, which would help the pathogen establish chronic infection. However, a significant reduction in bacterial loads in lungs and spleen upon semapimod treatment, despite inhibition of proinflammatory cytokines, clearly indicates bacterial dependence on host-derived exogenous leucine during intracellular growth. Therefore, targeting L-leucine uptake can be a novel therapeutic strategy against TB.

      Reviewer #3 (Public review): 

      (1) Agarwal et al identified the small molecule semapimod from a chemical screen of repurposed drugs with specific antimycobacterial activity against a leucine-dependent strain of M. tuberculosis. To better understand the mechanism of action of this repurposed anti-inflammatory drug, the authors used RNA-seq to reveal a leucine-deficient transcriptomic signature from semapimod challenge. The authors then measured a decreased intracellular concentration of leucine after semapimod challenge, suggesting that semapimod disrupts leucine uptake as the primary mechanism of action. Unexpectedly, however, resistant mutants raised against semapimod had a mutation in the polyketide synthase gene ppsB that resulted in loss of PDIM synthesis. The authors believe growth inhibition is a consequence of decreased accumulation of leucine as a result of an impaired cell wall and a disrupted, unknown leucine transporter. This study highlights the importance of branched-chain amino acids for M. tuberculosis survival, and the chemical genetic interactions between semapimod and ppsB indicate that ppsB is a conditionally essential gene in a medium depleted of leucine. 

      The conclusions regarding the leucine and PDIM phenotypes are moderately supported by experimental data. The authors do not provide experimental evidence to support a specific link between leucine uptake and impaired PDIM production. Additional work is needed to support these claims and strengthen this mechanism of action.

      As mentioned above, the overall results from this study provide the first evidence of perturbation of L-leucine uptake by semapimod treatment of the leucine auxotroph. Our observations that semapimod interacts with PpsB, and that the semapimod-resistant strain accumulates a mutation in PpsB resulting in loss of PDIM, together support the involvement of cell-wall PDIM in regulation of L-leucine transport in Mtb.

      As mentioned above, it appears that semapimod treatment brings about certain modifications in PDIM that make it restrictive to L-leucine. Future studies are required to gain detailed mechanistic insights into the effect of semapimod on Mtb physiology.

      (2) Since leucine uptake and PDIM synthesis are important concepts of the manuscript, experiments would benefit from exploring other BCAAs to know if the phenotypes observed are specific to leucine, and adding additional strains to the 2D TLC experiments to provide confidence in the absence of the PDIM band.

      We thank the peer reviewer for this suggestion. We would be happy to analyse the effect of semapimod on the level of other amino acids including BCAA by mass spectrometry.

      (3) The intriguing observation that wild-type H37Rv is resistant to semapimod but the leucine-auxotroph is sensitive should be further explored. If the authors are correct and semapimod does inhibit leucine uptake through a specific transporter or disrupted cell wall (PDIM synthesis), testing semapimod activity against the leucine-auxotroph in various concentrations of BCAAs could highlight the importance of intracellular leucine. H37Rv is still able to synthesize endogenous leucine and is able to circumvent the effect of semapimod.

      We thank the peer reviewer for this suggestion. We would explore the possibility of analysing the effect of increasing concentrations of BCAAs on mc2 6206 susceptibility to semapimod.

      Recommendations for the authors:

      (1A) Intracellular leucine can decrease from:

      inhibition of transport/uptake via semapimod as the authors claim or

      decreased uptake/requirement of many metabolites due to cells entering static growth arrest from challenge by semapimod

      To rule out a growth-inhibitory effect of semapimod on L-leucine uptake, we estimated intracellular L-leucine in Mtb after a brief exposure of 24 hours to 50 ng/ml semapimod (please refer to Materials and Methods). We confirmed that 24 hours of treatment with 50 ng/ml semapimod does not cause cells to enter static growth arrest.

      (1B) increased consumption/utilization of leucine for some programmed response to semapimod challenge

      Our results show reduced expression of genes involved in leucine catabolism such as accD1, bkdA and bkdB in semapimod-treated cells, and thus the above hypothesis seems unlikely.

      (1C) Additional metabolites should be measured to determine the specificity of the semapimod challenge.

      As mentioned below, we measured intracellular valine in the semapimod-treated Mtb 6206 by LC-MS/MS, which shows no change in its level. These observations thus corroborate a specific effect of semapimod on L-leucine level in the cell.

      (2) The effect of Semapimod on L-leucine uptake is largely based on indirect evidence, without showing reduced transport of the amino acid. Gene expression data is not enough to prove that the amino acid transport is blocked. More compelling evidence is required to confirm this mechanism.

      The authors could perform leucine uptake assays to directly confirm the functioning of Semapimod, inhibiting L-leucine transport. Another possibility would be to try out measuring intra-bacterial leucine levels for drug-treated versus untreated M. tuberculosis strains.

      The data presented in Fig. 3b show lower intracellular L-leucine upon semapimod treatment; in contrast, the Sem<sup>R</sup> strain exhibits ~3-fold more intracellular L-leucine, as estimated by mass spectrometry (please refer to our response to comment #6 below). Together, these observations indicate an inhibitory effect of semapimod on L-leucine uptake by the auxotroph.

      (3) The authors show that the overexpression of leuC-leuD restores Semapimod resistance in the auxotroph (Figs. 3C-3E). Is it possible to examine Semapimod resistance of WT-H37Rv or the complemented mutant grown in leucine-limiting conditions? This sort of evidence will be more direct on the specific drug-target beyond the auxotroph (mc<sup>2</sup> 6206).

      Because the endogenous L-leucine synthesis pathway is functional in WT-H37Rv, as well as in the complemented auxotrophic strain, leucine-limiting conditions are not expected to have any effect on susceptibility to semapimod.

      Author response image 1.

      (4) Biolayer Interferometry (BLI) shows Semapimod binds to PpsB (Fig. 6); however, there is no clear evidence that it disrupts PDIM synthesis. More direct evidence would be to study the effect of Semapimod on a ppsB mutant (may be a knock-down). This would prove the specificity of Semapimod for PpsB. Likewise, it would be worth looking into the effect of Semapimod using mutant M. tuberculosis defective for PDIM synthesis.

      As recommended by the peer reviewer, we created a ppsB knockdown strain in Mtb mc<sup>2</sup> 6206 by CRISPRi and examined its vulnerability to semapimod treatment. As can be seen in Author response image 1, the ppsB KD strain shows lower susceptibility to semapimod than the pDcas9-control strain, which exhibits significant growth inhibition on the 7H11-OADS-PL agar plate containing 200 nM semapimod.

      (5) Metabolomics experiments would benefit from including other control BCAAs like isoleucine and valine to determine if decreased intracellular levels of leucine are specific to semapimod or a general consequence of growth arrest from an antimicrobial agent.

      As suggested by the reviewer, we measured intracellular valine as well as proline levels in the semapimod-treated Mtb 6206 by LC-MS/MS; the data presented in Supplementary Figure 5 clearly show no change in their levels upon semapimod treatment.

      (5) Figure 3c, pyrazinamide susceptibility assay could be included on the panCD strain to ensure complementation leads to functional panCD. Parent strain would be resistant to PZA, complement strain would be susceptible. (doi: 10.1038/s41467-019-14238-3).

      The wild-type Mtb 6206 is unable to grow in the absence of pantothenate. We verified resumption of growth of Mtb 6206 in 7H9-OADS-L-leucine medium lacking pantothenate upon PanCD overexpression, which provides more direct evidence of the expression of functional copies of panCD genes.

      (6) Does the Sem-R mutant have increased levels of leucine?

      As can be seen in supplementary figure 7, the Sem<sup>R</sup> strain shows a ~3.0-fold increase in the intracellular L-leucine level compared with the WT strain. In contrast, a comparable level of another BCAA, valine, is observed in both strains.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript by Rayan et al. aims to elucidate the role of RNA as a context-dependent modulator of liquid-liquid phase separation (LLPS), aggregation, and bioactivity of the amyloidogenic peptides PSMα3 and LL-37, motivated by their structural and functional similarities.

      Strengths:

      The authors combine extensive biophysical characterization with cell-based assays to investigate how RNA differentially regulates peptide aggregation states and associated cytotoxic and antimicrobial functions.

      Weaknesses:

      While the study addresses an interesting and timely question with potentially broad implications for host-pathogen interactions and amyloid biology, several aspects of the experimental design and data analysis require further clarification and strengthening.

      Major Comments:

      (1) In Figure 1A, the author showed "stronger binding affinity" based on shifts at lower peptide concentrations, but no quantitative binding parameters (e.g., apparent Kd, fraction bound, or densitometric analysis) are presented. This claim would be better supported by including: (i) A binding curve with quantification of free vs bound RNA band intensities (ii) Replicates and error estimates (mean ± SD).

      We thank the reviewer for this suggestion. To quantitatively support the binding differences observed in Figure 1A, we have now performed densitometric analysis of the EMSA data and included the results in Figure S1. The analysis showed that the Kd values for PSMα3 binding to polyAU and polyA RNA are of the same order of magnitude, but the Kd is lower for polyAU, indicating stronger binding. A description was added to the results in lines 137-145 of the revised version.
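
      For illustration, band intensities from a densitometric EMSA analysis can be converted into an apparent Kd and Hill coefficient along the following lines. This is a minimal sketch, not the analysis used for Figure S1: it assumes the bound fraction follows a Hill model, fits it by linear regression on the Hill plot, and uses synthetic values. All function names are hypothetical.

```python
import math

def fraction_bound(free_intensity, bound_intensity):
    """Fraction of RNA in the shifted (bound) band from two band intensities."""
    return bound_intensity / (free_intensity + bound_intensity)

def hill_fraction_bound(conc, kd, n):
    """Hill model: fraction of RNA bound at peptide concentration `conc`."""
    return conc ** n / (kd ** n + conc ** n)

def fit_hill(concs, fractions):
    """Estimate (kd, n) by linear regression on the Hill plot.

    log(theta / (1 - theta)) = n * log(conc) - n * log(kd)
    Points with theta <= 0 or theta >= 1 are skipped (logit undefined).
    """
    xs, ys = [], []
    for c, theta in zip(concs, fractions):
        if c > 0 and 0.0 < theta < 1.0:
            xs.append(math.log(c))
            ys.append(math.log(theta / (1.0 - theta)))
    if len(xs) < 2:
        raise ValueError("need at least two usable points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    n = sxy / sxx                      # slope = Hill coefficient
    intercept = mean_y - n * mean_x    # intercept = -n * log(kd)
    kd = math.exp(-intercept / n)
    return kd, n

# Synthetic titration (hypothetical values, not measured data).
concs = [1.0, 2.0, 5.0, 10.0, 20.0, 50.0]
fractions = [hill_fraction_bound(c, kd=8.0, n=1.8) for c in concs]
kd_est, n_est = fit_hill(concs, fractions)
print(f"apparent Kd = {kd_est:.2f}, Hill coefficient = {n_est:.2f}")
```

      With replicate gels, the same fit can be run per replicate and the resulting Kd values summarized as mean ± SD. A nonlinear least-squares fit of the untransformed curve is generally preferable for noisy data, because the logit transform amplifies errors near 0 and 1.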

      (2) The authors report droplet formation at low RNA (50 ng/µL) but protein aggregation at high RNA (400 ng/µL) through fluorescence microscopy. However, no intermediate RNA concentrations (e.g., 100-300 ng/µL) are tested or discussed, leaving a critical gap in understanding the full phase diagram and transition mechanisms.

      Our initial choice of 50 ng/µL (low RNA) and 400 ng/µL (high RNA) was guided by a broader RNA titration performed by turbidity measurements across 0, 10, 20, 50, 100, 200, and 400 ng/µL (Figure S2 in the revised version). In this screen, turbidity increased up to 50 ng/µL and then decreased dose-dependently from 100–400 ng/µL. We interpret this non-monotonic behavior as consistent with a transition from a droplet-rich regime (maximal light scattering at intermediate dense-phase volume) toward conditions where assemblies become larger and/or more compact and sediment out of the optical path. This is described in lines 158-161 of the revised version.

      Of note, additional intermediate RNA conditions (100 and 200 ng/µL) are included in Figure S14 (of the revised version). While these experiments were performed under the heat-shock perturbation, they nevertheless support the central point that RNA tunes assembly state across intermediate concentrations rather than producing a binary low/high outcome.

      Importantly, we agree with the reviewer that a full phase diagram would be the most rigorous way to define the transition mechanism. However, establishing csat and constructing a complete phase diagram would require systematic measurements of dilute-phase concentrations (e.g., centrifugation/quantification or fluorescence calibration), controlled ionic strength titrations, and time-resolved mapping, which is beyond the scope of the present study. We have therefore revised the text to avoid implying that we provide a complete phase diagram. Instead, we frame our results as a qualitative, multi-assay characterization showing that RNA concentration drives a shift from liquid-like condensates (at low RNA) toward solid-like assemblies (at high RNA), with an intermediate regime suggested by the turbidity transition and supported by additional imaging under stress. Finally, to address the “critical gap” concern directly, we added a sentence (lines 239-241) stating that: “Future work will be required to quantitatively define the phase boundaries and delineate the dominant mechanisms, such as sedimentation, dissolution, or coarsening/aging, across intermediate RNA concentrations”.

      (3) Additionally, the behaviour of PSMα3 in the absence of RNA under LLPS conditions is not shown. Without protein-only data, it is difficult to assess if droplets are RNA-induced or if protein has a weak baseline LLPS that RNA tunes. The saturation concentration (csat) for PSMα3 phase separation, either in the absence or presence of RNA, should be reported.

      In response to the reviewer’s request, we have added Figure 2F, which shows PSMα3 alone in the absence of RNA under the same conditions. PSMα3 does not form droplets in this condition, indicating that condensate formation is RNA-dependent in the tested conditions. This is referred to in the text in lines 190-193 of the revised version. Please see our response about determining the csat in the response to the previous comment.

      (4) For a convincing LLPS claim, it is important to show: Quantitative FRAP curves (mobile fraction and half-time of recovery) rather than only microscopy images and qualitative statements.

      We have included quantitative FRAP analysis in Figure S4 of the revised version, showing normalized recovery curves along with extracted mobile fractions and half-times of recovery (t₁/₂). These quantitative measurements support the dynamic nature of the PSMα3–RNA assemblies. This is referred to in the text in lines 179-184 of the revised version.
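
      As an illustration of how a mobile fraction and t₁/₂ can be extracted from a FRAP trace, a minimal sketch is given below. It is not the pipeline used for Figure S4: it assumes background-subtracted intensities, ignores acquisition photobleaching, takes the plateau as the mean of the last few frames, and finds t₁/₂ by linear interpolation rather than by curve fitting. Function names and the example trace are hypothetical.

```python
def normalize_frap(intensities, n_prebleach):
    """Rescale a trace so the pre-bleach mean is 1 and the first post-bleach frame is 0.

    Returns only the post-bleach part of the curve.
    """
    pre = sum(intensities[:n_prebleach]) / n_prebleach
    f0 = intensities[n_prebleach]
    return [(f - f0) / (pre - f0) for f in intensities[n_prebleach:]]

def mobile_fraction(norm_curve, n_plateau=3):
    """Plateau of the normalized curve, estimated from its last `n_plateau` points."""
    tail = norm_curve[-n_plateau:]
    return sum(tail) / len(tail)

def half_time(times, norm_curve, plateau):
    """First time at which the curve reaches half of the plateau (linear interpolation)."""
    target = plateau / 2.0
    for i in range(1, len(norm_curve)):
        lo, hi = norm_curve[i - 1], norm_curve[i]
        if lo < target <= hi:
            frac = (target - lo) / (hi - lo)
            return times[i - 1] + frac * (times[i] - times[i - 1])
    return None  # half-recovery not reached within the acquisition window

# Synthetic trace: three pre-bleach frames, then recovery sampled once per second.
trace = [100.0, 100.0, 100.0, 20.0, 40.0, 52.0, 60.0, 60.0, 60.0]
post_times = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]  # seconds after the bleach

curve = normalize_frap(trace, n_prebleach=3)
mf = mobile_fraction(curve)
t_half = half_time(post_times, curve, mf)
print(f"mobile fraction = {mf:.2f}, t1/2 = {t_half:.2f} s")
```

      A low mobile fraction or an unreached half-recovery (a `None` return here) would point toward solid-like assemblies, whereas fast and near-complete recovery is the behavior expected of liquid-like condensates.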

      (5) The manuscript highly relies on fluorescence microscopy to show colocalization. However, the colocalization is presented in a qualitative manner only. The manuscript would benefit from the inclusion of quantitative metrics (e.g., Pearson's correlation coefficient, Manders' overlap coefficients, or intensity correlation analysis).

      In response, we have added quantitative colocalization analysis to the revised manuscript. Specifically, we now report Pearson’s correlation coefficients and Manders’ overlap coefficients for the dual-channel fluorescence microscopy datasets in Figure S5 of the revised version. These metrics provide an objective measure of co-distribution and complement the qualitative imaging.

      The analysis supports that at low RNA concentrations (droplet/condensate conditions), PSMα3 and RNA show strong colocalization, consistent with RNA being incorporated within, or closely associated with, the peptide-rich phase. In contrast, at high RNA concentrations, where the assemblies are more solid-like/amyloid-positive, the quantitative coefficients decrease, consistent with reduced overlap and an apparent spatial demixing in which RNA becomes partially excluded from the peptide-rich structures. This is referred to in the text in lines 194-203 of the revised version.
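
      For reference, the two colocalization metrics mentioned above can be computed from paired pixel intensities as sketched below. This is a minimal illustration under stated assumptions, not the analysis behind Figure S5: channels are given as flat lists of background-subtracted intensities, thresholds default to zero, and the function names and pixel values are hypothetical.

```python
import math

def pearson(ch1, ch2):
    """Pearson's correlation coefficient between two equally sized pixel lists."""
    if len(ch1) != len(ch2) or len(ch1) < 2:
        raise ValueError("channels must have the same length (>= 2 pixels)")
    m1 = sum(ch1) / len(ch1)
    m2 = sum(ch2) / len(ch2)
    cov = sum((a - m1) * (b - m2) for a, b in zip(ch1, ch2))
    var1 = sum((a - m1) ** 2 for a in ch1)
    var2 = sum((b - m2) ** 2 for b in ch2)
    if var1 == 0 or var2 == 0:
        raise ValueError("a channel with constant intensity has no defined correlation")
    return cov / math.sqrt(var1 * var2)

def manders(ch1, ch2, thr1=0.0, thr2=0.0):
    """Manders' coefficients (M1, M2).

    M1: fraction of channel-1 intensity found in pixels where channel 2 > thr2.
    M2: fraction of channel-2 intensity found in pixels where channel 1 > thr1.
    """
    total1, total2 = sum(ch1), sum(ch2)
    m1 = sum(a for a, b in zip(ch1, ch2) if b > thr2) / total1
    m2 = sum(b for a, b in zip(ch1, ch2) if a > thr1) / total2
    return m1, m2

# Hypothetical four-pixel example: peptide channel and RNA channel.
peptide = [0.0, 10.0, 20.0, 30.0]
rna = [0.0, 5.0, 10.0, 0.0]

r = pearson(peptide, rna)
m1, m2 = manders(peptide, rna)
print(f"Pearson r = {r:.2f}, M1 = {m1:.2f}, M2 = {m2:.2f}")
```

      In this toy example half of the peptide signal overlaps RNA-positive pixels (M1 = 0.5) while all of the RNA signal lies in peptide-positive pixels (M2 = 1.0), which shows how the two Manders' coefficients can diverge when one component is partially excluded from the other.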

      (6) In Figures 3 B and 3C, the contrast between "no AT630 at 30 min, strong at 2 h" (50 ng/μL) and "strong at 30 min" (400 ng/μL) is compelling, but a simple quantification (e.g., mean fluorescence intensity per area) would greatly increase rigor.

      We have included quantitative analysis of AmyTracker630 fluorescence intensity in Figure S6 of the revised version, reporting the mean fluorescence intensity per area for the indicated conditions and time points. This quantification supports the qualitative differences observed in Figures 3B and 3C. This is now referred to in the text in lines 233-236 of the revised version.

      (7) In Figure S3 ssCD data, if possible, indicate whether the α-helical signal increases with RNA concentration or shows a non-linear dependence, which might link to the LLPS vs solid aggregate regimes.

      The ssCD spectra displayed in Figure S7 in the revised version (corresponding to Figure S3 in the original submission) show that the α-helical signature of PSMα3 is markedly enhanced in the presence of RNA compared to peptide alone, as evidenced by increased signal intensity, deeper minima, and more pronounced spectral features characteristic of α-helical structure. Importantly, this enhancement is more pronounced at 400 ng/µL Poly(AU) RNA than at 50 ng/µL, particularly after 2 hours of coincubation, indicating that RNA concentration influences the stabilization of α-helical assemblies. This is now more specifically detailed in the text in lines 258-263 of the revised version.

      We note that solid-state CD does not allow direct quantitative deconvolution of secondary structure content (e.g., % helix) in the same manner as solution CD, due to sample anisotropy, scattering, and orientation effects inherent to dried or aggregated films. Consequently, our interpretation is qualitative rather than strictly quantitative. The ssCD data therefore suggest a non-linear dependence on RNA concentration, rather than a simple linear dose–response. This is also expected considering that phase transition, suggested by the other findings, is intrinsically non-linear.

      (8) In Figure 5B, FRAP recovery in dying cells may reflect artifactual mobility rather than biological relevance. Additionally, the absence of quantification data limits interpretation; providing recovery curves would clarify relevance.

      We added quantitative FRAP analysis of the effect on PSMα3 within HeLa cells, shown in Figure S8 of the revised version. Compared to PSMα3 assemblies in vitro, nucleolar PSMα3 exhibits slower fluorescence recovery and a reduced mobile fraction. The nucleolus represents a highly crowded, RNA-rich cellular environment, which is expected to impose additional constraints on molecular mobility and likely contributes to the slower recovery kinetics observed in cells. This is now more specifically detailed in the text in lines 324-333 and discussed in lines 597-607 of the revised version.

      (9) The narrative conflates cytotoxicity endpoints (membrane damage, PI staining, aggregates) with localization data (nucleolar foci), creating ambiguity about whether nucleolar targeting drives toxicity or is a consequence of cell death. Separating toxicity assessment from localization analysis, or clearly demonstrating that nucleolar accumulation precedes cytotoxicity, would resolve this ambiguity.

      We thank the reviewer for raising this important point. We agree that, in the current dataset, cytotoxicity readouts (membrane damage, PI staining, aggregate formation) and subcellular localization (nucleolar accumulation) are observed in close temporal proximity, which limits our ability to unambiguously assign causality. In the experiments presented here, PSMα3 was applied at concentrations known to induce rapid membrane disruption and cytotoxicity in HeLa cells. Under these conditions, PSMα3 accumulates on cellular membranes and penetrates into the cell and nucleus on very short timescales (seconds to minutes), likely preceding the temporal resolution accessible by standard live-cell fluorescence microscopy. As a result, nucleolar accumulation and cytotoxic endpoints are detected essentially concurrently, precluding a definitive determination of whether nucleolar association actively drives toxicity or occurs as a downstream consequence of membrane permeabilization and cell damage.

      We therefore emphasize that, in this study, nucleolar localization is presented as a phenomenological observation consistent with RNA-rich compartment association, rather than as a demonstrated causal mechanism of cytotoxicity. We have revised the Discussion (lines 597-607) to clarify this distinction and to avoid implying that nucleolar targeting is the primary driver of cell death.

      We agree that resolving this ambiguity would require systematic time-resolved and concentration-dependent experiments, including analysis at sub-toxic PSMα3 concentrations below the membrane-disruptive threshold, combined with orthogonal imaging approaches. Such experiments are planned for future work but are beyond the scope of the present study.

      (10) In Figure 8, to strengthen the LLPS assignment for LL-37, additional evidence, such as FRAP analysis or observation of droplet fusion events, would be valuable. This is particularly relevant given that the heat shock conditions (65 °C for 15 minutes) could potentially induce partial denaturation or nonspecific coacervation.

      In response to this comment, we have added FRAP analysis of LL-37 assemblies in the revised manuscript (Figure S12), including representative images and corresponding fluorescence recovery curves. The FRAP measurements show minimal fluorescence recovery over the acquisition window, indicating that the LL-37–RNA assemblies formed under these conditions are largely immobile and solid-like, rather than liquid-like droplets. This is now referred to in the text in lines 458-462 of the revised version.

      Reviewer #2 (Public review):

      In this paper, Rayan et al. report that RNA influences cytotoxic activity of the staphylococcal secreted peptide cytolysin PSMalpha3 versus human cells and E. coli by impacting its aggregation. The authors used sophisticated methods of structural analysis and described the associated liquid-liquid phase separation. They also compare the influence of RNA on the aggregation and activity of LL-37, which shows differences from that on PSMalpha3. 

      Strengths:

      That RNA impacts PSM cytotoxicity when co-incubated in vitro becomes clear. 

      Weaknesses:

      I have two major and fundamental problems with this study:

      (1) The premise, as stated in the introduction and elsewhere, that PSMalpha3 amyloids are biologically functional, is highly debatable and has never been conclusively substantiated. The property that matters most for the present study, cytotoxicity, is generally attributed to PSM monomers, not amyloids. The likely erroneous notion that PSM amyloids are the predominant cytotoxic form is derived from an earlier study by the authors that has described a specific amyloid structure of aggregated PSMalpha3. Other authors have later produced evidence that, quite unsurprisingly, indicated that aggregation into amyloids decreases, rather than increases, PSM cytotoxicity. Unfortunately, yet other groups have, in the meantime, published in-vitro studies on "functional amyloids" by PSMs without critically challenging the concept of PSM amyloid "functionality". Of note, the authors' own data in the present study, which show strongly decreased cytotoxicity of PSMalpha3 after prolonged incubation, are in agreement with monomer-associated cytotoxicity as they can be easily explained by the removal of biologically active monomers from the solution.

      We thank the reviewer for this important critique and agree that direct cytotoxicity is most plausibly mediated by soluble PSM species, while extensive fibrillation generally reduces toxicity by depleting these forms, a conclusion supported by our data and by other studies (e.g., Zheng et al 2018 and Yao et al 2019). We do not propose mature amyloid fibrils as the primary toxic entities. Rather, we use the term functional amyloid in a regulatory sense, consistent with other biological amyloids whose fibrillar states modulate activity (e.g., hormone storage amyloids or RNA-binding proteins).

      In line with emerging findings, we interpret PSMα3 toxicity as arising from a dynamic assembly process rather than from a single static molecular species. We previously showed that PSMα3 forms cross-α fibrils that are thermodynamically and mechanically less stable than cross-β amyloids and readily disassemble upon heat stress, fully restoring cytotoxic activity (Rayan et al., 2023). This behavior contrasts with PSMα1, which forms highly stable cross-β fibrils that do not recover activity after heat shock, suggesting that the limited thermostability of PSMα3 is an evolved feature enabling reversible switching between inactive (stored) and active states.

      Consistent with this view, both PSMα1 and PSMα3 are cytotoxic in their soluble states, yet mutants unable to fibrillate lose activity, indicating that fibrillation is required but not itself the toxic end state (Tayeb-Fligelman et al., 2017, 2020; Malishev et al., 2018). Our other studies further show that cytotoxicity toward human cells correlates with inherent or lipid-induced α-helical assemblies, rather than with inert β-sheet amyloids (Ragonis-Bachar et al., 2022, 2026; Salinas 2020, Bücker 2022). Together, these findings support a model in which membrane-associated, dynamic α-helical assembly, which requires continuous exchange between soluble species and growing fibrils, drives membrane disruption, potentially through lipid recruitment or extraction, analogous to mechanisms proposed for human amyloids such as islet amyloid polypeptide (Sparr et al., 2004).

      In the present study, we further show that RNA reshapes this dynamic landscape: while PSMα3 alone progressively loses activity upon incubation, co-incubation with RNA preserves cytotoxicity by stabilizing bioactive polymorphs and condensate-like states, whereas high RNA concentrations promote solid aggregation but nevertheless preserve activity. Thus, aggregation is neither inherently functional nor toxic, but context dependent and environmentally regulated. Taken together, our data support a model in which PSMα3 amyloids act as a dynamic reservoir, enabling S. aureus to tune virulence by reversibly shifting between dormant and active states in response to environmental cues such as heat or RNA.

      This is now discussed in lines 56-76 and 523-553 of the revised version.

      (2) That RNA may interfere with PSM aggregation and influence activity is not very surprising, given that PSM attachment to nucleic acids - while not studied in as much detail as here - has been described. Importantly, it does not become clear whether this effect has biologically significant consequences beyond influencing, again not surprisingly, cytotoxicity in vitro. The authors do show in nice microscopic analyses that labeled PSMalpha3 attaches to nuclei when incubated with HeLa cells. However, given that the cells are killed rapidly by membrane perturbation by the applied PSM concentrations, it remains unclear and untested whether the attachment to nucleic acids in dying cells makes any contribution to PSM-induced cell death or has any other biological significance.

      We thank the reviewer for this important point and agree that PSM–nucleic acid interactions are not unexpected and that our data do not support a direct intracellular role for RNA binding in mediating cytotoxicity. Accordingly, we do not propose nucleolar or nuclear association of PSMα3 as a causal mechanism of cell death. At the concentrations used, PSMα3 induces rapid membrane disruption, and nucleic acid association is observed along with membrane attachment, precluding conclusions about intracellular function. This limitation is now explicitly clarified in the revised manuscript. The biological significance of our findings lies instead in extracellular and environmental contexts, where PSMα3 encounters abundant nucleic acids, such as RNA or DNA released from damaged host cells or present in biofilms, as now addressed in lines 622-631. Our data show that RNA modulates PSMα3 aggregation trajectories, shifting the balance between liquid-like condensates and solid aggregates, and thereby regulates the persistence and timing of cytotoxic activity. In this framework, RNA acts as a context-dependent regulator of virulence, rather than as an intracellular cytotoxic cofactor, an aspect that will be studied in depth in future work. This is now addressed in the text in lines 597-607 of the revised version.

      Reviewer #3 (Public review):

      Summary:

      The manuscript by Rayan et al. aims to investigate the role of RNA in modulating both virulent amyloid and host-defense peptides, with the objective of understanding their self-assembly mechanisms, morphological features, and aggregation pathways. 

      Strengths:

      The overall content is well-structured with a logical flow of ideas that effectively conveys the research objectives.

      Weaknesses:

      (1) Figure 2 displays representative FRAP images demonstrating fluorescence recovery within seconds. To gain a more comprehensive understanding of how recovery after photobleaching varies under different conditions, it is recommended to supplement these images with corresponding quantitative fluorescence recovery curves for analysis.

      In response to this comment, we have supplemented the representative FRAP images with quantitative fluorescence recovery curves, reporting normalized recovery kinetics for the indicated conditions. These data are now provided in Figure S4 of the revised manuscript, allowing direct comparison of recovery behavior across conditions (shown by microscopy in Figure 2). In addition, we have included quantitative FRAP analyses for the cellular imaging shown in Figure 5 (presented in Figure S8) and for LL-37 assemblies formed under heat-shock conditions (Figure S12). Together, these additions provide a quantitative framework for interpreting the FRAP results and strengthen the distinction between liquid-like and solid-like assembly states.

      (2) Ostwald ripening typically leads to the shrinkage or even disappearance of smaller droplets, accompanied by the further growth of large droplets. However, the droplet size in Figure 2D decreases significantly after 2 h of incubation. This observation prompts the question, what is the driving force underlying RNA-regulated phase separation and phase transition?

      We thank the reviewer for this observation. Across multiple samples, we consistently observe a coexistence of small droplets and larger aggregates, rather than systematic growth of larger droplets at the expense of smaller ones or a uniform decrease in droplet size. In addition, the timescales examined do not allow us to reliably assess whether diffusion-driven droplet coalescence is fast enough to draw firm conclusions about droplet size evolution. This is now addressed in the text in lines 181-184 of the revised version.

      A decrease in droplet size over time is nevertheless observed in some instances and is more consistent with a time-dependent conversion of initially liquid-like condensates into more solid-like assemblies, which would reduce molecular mobility and suppress droplet coalescence. In parallel, progressive fibril formation may act as a sink for soluble peptide, leading to partial dissolution or shrinkage of less mature condensates. Together, these observations are consistent with a non-equilibrium aging process, in which RNA-regulated assemblies evolve from dynamic condensates toward more solid structures rather than following equilibrium Ostwald ripening.

      (3) The manuscript aims to study the role of RNA in modulating PSMα3 aggregation by using solution-state NMR to obtain residue-specific structural information. The current NMR data, as described in the method and figure captions, were recorded in the absence of RNA. Whether RNA binding induces conformational changes of PSMα3, and how these changes alter the NMR spectra? Also, the sequential NOE walk between neighboring residues can be annotated on the spectrum for clarity.

      The solution-state NMR experiments were performed specifically to characterize the potential binding of EGCG to PSMα3. Due to the strong tendency of PSMα3 to undergo rapid aggregation and line broadening upon RNA addition, solution state NMR spectra in the presence of RNA could not be obtained at sufficient quality for residue-specific analysis. As suggested, we have updated and annotated the sequential NOE walk between neighboring residues on the relevant NOESY spectra to improve clarity.

      (4) The authors claim that LL-37 shares functional, sequence, and structural similarities with PSMα3. However, no droplet formation was observed of LL-37 in the presence of RNA only. The authors then applied thermal stress to induce phase separation of LL-37. What are the main factors contributing to the different phase behaviors exhibited by LL-37 and PSMα3? What are the differences in the conformation of amyloid aggregates and the kinetics of aggregation between the condensation-induced aggregation in the presence of RNA and the conventional nucleation-elongation process in the absence of RNA for these two proteins?

      We appreciate this important question and have clarified both the basis of the comparison and the origin of the divergent phase behaviors of LL-37 and PSMα3. While PSMα3 and LL-37 share key properties as short, cationic, amphipathic α-helical peptides that self-assemble and interact with nucleic acids, they differ fundamentally in their assembly architectures. PSMα3 is an amyloidogenic peptide that forms cross-α amyloid fibrils, in which α-helices stack perpendicular to the fibril axis. In contrast, LL-37 can form fibrillar or sheet-like assemblies (observed in cryo grids), but the crystal structures determined so far show that these lack canonical amyloid features, with no clear cross-α or cross-β order. This is now clarified in different parts of the text of the revised version. Thus, the comparison between the two peptides is functional and physicochemical rather than implying identical amyloid mechanisms. These structural differences likely underlie their distinct phase behaviors.

      Because LL-37 does not follow a classical amyloid nucleation–elongation pathway, and high-resolution structural information (e.g., cryo-EM) is currently lacking, partly due to its sheet-like, non-twisted morphology (unpublished results), it is not possible to directly compare aggregation kinetics or nucleation mechanisms between LL-37 and PSMα3. It is possible that amyloidogenic systems such as PSMα3 exhibit greater flexibility in prefibrillar and fibrillar polymorphism, enabling RNA-regulated phase behavior, whereas non-amyloid assemblies such as LL-37 are more prone to stress-induced solid aggregation. We note that this interpretation is necessarily tentative and does not imply a general rule, but rather reflects differences evident in the present system.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Minor Comments:

      (1) In the abstract, replacing the word "overriding" with "counteracting" may provide a scientifically neutral tone.

      In the course of revision, the abstract was substantially rewritten to more precisely convey the mechanistic framework and key conclusions of the study. As part of this rewrite, the term "overriding" was removed and the language throughout was revised to adopt a more scientifically neutral tone, consistent with the reviewer's suggestion.

      (2) In abstract, the final sentence is ambitious but heavy. It may benefit from being split into two shorter sentences, for example:

      "These findings establish RNA as a potent, context-dependent modulator of both virulent amyloids and host-defense peptides. They further reveal phase transitions as tunable regulators of peptide activity and potential therapeutic targets across infectious and neurodegenerative diseases."

      As part of the broader abstract revision, the final sentence was restructured and the abstract as a whole was rewritten to improve clarity and readability, in the spirit of the reviewer's recommendation.

      (3) In the Introduction section,

      "The phenol-soluble modulins (PSMs) produced by Staphylococci contain amyloid-forming short peptides which play multiple functional roles...", consider "Staphylococcal phenol-soluble modulins (PSMs) are short, amyloidogenic peptides that perform multiple roles central to pathogenesis...."

      In accordance with the suggestion, the sentence has been revised.

      (4) To improve narrative flow in the final paragraph of the Introduction, a short bridging sentence could be added, such as:

      "Given these nucleic acid interactions, we next examined whether RNA can drive phase separation or structural reorganization of these amyloidogenic peptides."

      We thank the reviewer for this helpful suggestion. It provided an opportunity to clarify an important distinction between the two peptides studied. While LL-37 can self-assemble into higher-order α-helical structures, it is not amyloidogenic, in contrast to PSMα3. We therefore revised the bridging sentence in the final paragraph of the Introduction to read: “Given their shared cationic, amphipathic α-helical character, but distinct amyloidogenic properties, we sought to examine whether RNA differentially influences the assembly landscapes and bioactivity of PSMα3 and LL-37.”

      (5) The rationale for selecting Poly(A) and Poly(AU) would benefit from further clarification. It would be helpful to specify whether these RNAs are intended to model particular host or bacterial RNA species, such as AU-rich elements, rRNA-like sequences, or mRNA-like contexts.

      Poly(A) and Poly(AU) RNAs were selected as simplified, well-defined model RNAs to probe general peptide–RNA interactions in an unbiased manner, as no prior information was available regarding whether such interactions occur or which specific RNA species might be involved. This rationale is now clarified in the revised text (lines 128–131).

      These RNAs are not intended to represent a single biological transcript, but rather generic RNA features relevant to both host and bacterial contexts, including single-stranded homopolymeric regions and AU-rich elements commonly found in mRNAs and stress-related RNAs. The use of such reductionist RNA models to study RNA–protein interactions, phase behavior, and RNA-modulated aggregation is well established. We nevertheless agree that RNA sequence and structure may influence peptide assembly and activity, and future studies will address sequence-specific and biologically derived RNAs.

      (6) In Figure 1A, essential EMSA controls- RNA alone, peptide alone, and a nonspecific peptide or PSMα3 should be included to distinguish specific complexes from artifacts, even if presented in the supplementary information. In addition, a competition assay using unlabeled RNA would help confirm binding specificity and rule out predominantly nonspecific electrostatic interactions; these data could also be reported in the supplementary figures.

      An RNA-alone control is already included in Figure 1A of the revised version. The first lane (“0 µM”) shows free Poly(A) or Poly(AU) RNA in the absence of peptide and serves as the negative control against which PSMα3-induced mobility shifts are evaluated. A peptide-alone EMSA cannot be performed, as PSMα3 is highly cationic and does not migrate into the gel in the absence of RNA; moreover, EMSA in this format reports on RNA mobility rather than peptide migration.

      With respect to binding specificity, we compared Poly(A) and Poly(AU) RNAs and observed distinct binding behaviors, which would not be expected for purely nonspecific electrostatic interactions. In addition, the extracted Hill coefficients (>1) are consistent with cooperative binding, further arguing against simple charge-driven association. Finally, the RNA-dependent association of PSMα3 is independently supported by fluorescence microscopy and quantitative colocalization analyses, which corroborate the EMSA results. Together, these orthogonal approaches support the relevance of the observed peptide–RNA interactions.
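      The cooperativity argument above rests on the Hill equation. The sketch below is only an illustration of why a Hill coefficient above 1 indicates cooperative binding; the function name, the parameter names, and the example concentrations are hypothetical, and the response does not state how the coefficients were actually fitted.

```python
def hill_fraction_bound(peptide_um, k_half_um, hill_n):
    """Fraction of RNA bound according to the Hill equation.

    peptide_um : peptide concentration (assumed units: µM)
    k_half_um  : concentration giving half-maximal binding (same units)
    hill_n     : Hill coefficient; n > 1 suggests cooperative binding
    """
    if peptide_um < 0 or k_half_um <= 0 or hill_n <= 0:
        raise ValueError("concentrations and Hill coefficient must be positive")
    ratio = (peptide_um / k_half_um) ** hill_n
    return ratio / (1.0 + ratio)


# Hypothetical comparison: the same half-saturation point, with and
# without cooperativity, evaluated below and above that point.
for conc in (5.0, 10.0, 20.0):
    non_coop = hill_fraction_bound(conc, 10.0, 1.0)
    coop = hill_fraction_bound(conc, 10.0, 2.0)
    print(f"{conc:5.1f} uM  n=1: {non_coop:.3f}  n=2: {coop:.3f}")
```

      With n > 1 the bound fraction is lower below the half-saturation point and higher above it, which gives the steeper, sigmoidal transition that a quantified gel-shift series would show for cooperative binding.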

      (7) In Figure 1B, there is a time mismatch between EMSA (30 minutes) and TEM (2 hours). If aggregation progresses over time, the EMSA pattern at 2 hours may differ. This point could be acknowledged or experimentally addressed, as RNA-peptide assemblies may evolve from liquid-like condensates to more solid aggregates.

      The EMSA and TEM experiments were intentionally performed at different time points to capture distinct stages of the PSMα3–RNA assembly process. The EMSA assay (30 minutes) was designed to probe early RNA–peptide complex formation and binding interactions, before extensive higher-order aggregation occurs. At this stage, we aim to detect mobility shifts reflecting complex formation rather than mature assemblies. In contrast, TEM was performed after 2 hours to visualize later-stage structural outcomes, including fibrillation and morphological reorganization. As aggregation progresses over time, the assemblies evolve from early RNA–peptide complexes into more ordered fibrillar structures, which are best assessed by electron microscopy at later time points. To improve clarity and avoid potential confusion, we have streamlined Figure 1 to focus on the EMSA data, which specifically addresses early binding events. The TEM data were removed from Figure 1 and are now presented in Figure 3, where later-stage structural transitions and fibrillation are shown more comprehensively and in the appropriate mechanistic context.

      (8) In Figure 1B, if feasible, complementing TEM with a confirmatory fibril assay (e.g., ThT kinetics) under the same conditions would strengthen the conclusion that the morphology difference is robust, but it is not mandatory.

      We attempted to perform ThT fibrillation kinetics under the same RNA-containing conditions; however, these assays were not informative for this system. PSMα3 aggregates extremely rapidly, producing an immediate and steep increase in ThT fluorescence (Fig. S9 in the revised version), which prevents reliable resolution of RNA-dependent differences in aggregation kinetics or lag phases. In addition, Poly(AU) RNA interferes with ThT readout through electrostatic interactions between the negatively charged RNA and the cationic dye, as well as through RNA-induced changes in fibril morphology, both of which complicate quantitative interpretation of fluorescence kinetics. Based on these technical constraints and prior experience with RNA–amyloid systems, ThT kinetics under identical RNA conditions would not provide a robust or interpretable confirmation of the morphological differences observed by TEM.

      (9) In Figure 1B, PSMα3 alone control is missing in TEM images.

      A TEM image of PSMα3 alone is included in Figure 3, where we systematically present fibrillation outcomes across different RNA concentrations alongside the peptide-only control. Figure 1 was streamlined to focus on early RNA–peptide interactions assessed by EMSA, whereas Figure 3 provides a comprehensive TEM analysis of later-stage structural outcomes. This organization was chosen to clearly separate early binding events from subsequent assembly transitions and to avoid redundant presentation of TEM images under similar conditions.

      (10) Although it is experimentally practical to focus on Poly(AU), the justification is very one-sided. The Poly(A) condition, which yields amorphous aggregates, may be equally informative for understanding toxicity, LLPS, or nonfibrillar states and could be discussed more explicitly.

      We agree that Poly(A)-induced amorphous aggregation is informative for understanding non-fibrillar assembly states. However, the primary aim of this study was to dissect RNA-dependent regulation of fibrillar assembly and phase behavior, which is most clearly captured using Poly(AU). Poly(A) was therefore included as a comparative condition rather than as a focus for detailed mechanistic analysis. A more systematic comparison of different RNA classes and their effects on non-fibrillar states and toxicity is an important direction for future work but is beyond the scope of the present study.

      (11) To improve readability of the manuscript, the main text should follow the order of the figure panels (e.g., A, B, C, D, and E) and numbers (Figure 1, 2...) sequentially, so that readers can easily align with the corresponding images.

      We have revised the manuscript to improve alignment between the main text and the figures, adjusting panel ordering and numbering where appropriate so that the text now follows the figure panels and figure numbers more sequentially. These changes were made to enhance readability while maintaining a logical visual flow within the figures.

      (12) In the result section of Figure 2, the analogy to Ddx4-like systems is a helpful concept, but should be clearly framed as an analogy, not evidence. It would be more accurate to say that the behavior is "conceptually similar to" those systems, while noting that the molecular context is significantly different.

      We have revised the text to explicitly frame the comparison to Ddx4-like systems as a conceptual analogy rather than evidence: lines 158-161 in the revised version.

      (13) In Figure 4, inclusion of positive and negative controls to validate assay performance (e.g., untreated bacteria or HeLa cells, lysis buffer, media alone) would strengthen confidence in the bioactivity measurements.

      We wish to clarify that appropriate positive and negative controls were included in all bioactivity assays and were used to normalize the data presented in Figure 4. For the HeLa cytotoxicity assay (LDH), untreated cells were used to determine spontaneous LDH release (negative control), and cells treated with the manufacturer-supplied lysis buffer were used to determine maximum LDH release (positive control). The percent cytotoxicity shown in Figure 4B was calculated relative to these internal controls, as described in the Methods. For the antibacterial assay (PrestoBlue), wells containing E. coli without peptide served as the positive control for 100% viability, while wells containing sterile LB medium alone were used as blanks. Viability values in Figure 4A were normalized to these controls. We have ensured that the Methods section explicitly describes these controls to reinforce confidence in the bioactivity measurements.
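      The normalization described above can be sketched as follows. This is a minimal illustration that assumes the conventional control-based formulas for LDH release and blank-subtracted viability; the exact calculation is given in the Methods and may differ, and the function names and readings below are hypothetical.

```python
def percent_cytotoxicity(sample, spontaneous, maximum):
    """LDH cytotoxicity relative to controls (assumed conventional formula).

    sample      : LDH signal from peptide-treated cells
    spontaneous : LDH signal from untreated cells (negative control)
    maximum     : LDH signal from lysis-buffer-treated cells (positive control)
    """
    span = maximum - spontaneous
    if span <= 0:
        raise ValueError("maximum release must exceed spontaneous release")
    return 100.0 * (sample - spontaneous) / span


def percent_viability(sample, blank, untreated):
    """Viability relative to controls (assumed conventional formula).

    sample    : signal from bacteria incubated with peptide
    blank     : signal from sterile medium alone
    untreated : signal from bacteria without peptide (100% viability)
    """
    span = untreated - blank
    if span <= 0:
        raise ValueError("untreated signal must exceed the blank")
    return 100.0 * (sample - blank) / span


# Hypothetical plate-reader values, for illustration only.
print(percent_cytotoxicity(sample=6.0, spontaneous=2.0, maximum=10.0))
print(percent_viability(sample=55.0, blank=5.0, untreated=105.0))
```

      Expressing both readouts on a 0 to 100% scale anchored by the controls is what allows conditions measured on different plates or days to be compared.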

      (14) To enhance clarity, consider presenting the RNA concentration and time-dependent effects on PSMα3 bioactivity in a comparison table within the main text or as a supplementary figure.

      We appreciate this suggestion and carefully considered presenting the data in tabular form. However, we found that graphical representation more effectively conveys the trends, transitions, and comparative patterns between conditions. A table would not adequately capture these relationships.

      Reviewer #2 (Recommendations for the authors):

      Further remarks:

      (1) Circumstantial evidence based on the "amyloid inhibitor", EGCG: The results with EGCG, which has been shown to have a moderate amyloid-reducing effect on PSMalpha 1 and PSMalpha4, should not be taken as evidence for amyloid-based cytotoxicity. While increased concentrations of EGCG reduced the cytotoxic effect of PSMalpha3, it is not convincingly shown that this is due to a lower concentration of amyloid vs. monomeric PSM.

      We agree that the effects of EGCG should not be interpreted as evidence for amyloid fibrils being the cytotoxic species. Our data instead support a mechanism in which EGCG primarily targets soluble PSMα3, thereby redirecting its assembly pathway and depleting bioactive species. Specifically, solution-state NMR (Fig. 7) shows that EGCG binds defined residues of monomeric PSMα3, consistent with sequestration of soluble peptide rather than selective inhibition of fibrils. Complementary light and electron microscopy, together with kinetic measurements, indicate that EGCG does not simply stabilize monomers but instead diverts PSMα3 into amorphous, non-functional aggregates, as visualized by TEM (Fig. 6B) and reflected in altered ThT responses (Fig. S9). Importantly, these EGCG-induced aggregates are non-cytotoxic (Fig. 6A/C) and fail to associate with membranes or cells, in contrast to untreated PSMα3, which forms membrane-associated assemblies and induces disruption (newly added Movies S1-S2). Thus, EGCG potentially reduces cytotoxicity by remodeling the aggregation landscape and depleting active soluble species, rather than by selectively inhibiting specific fibril formation. This clarification is now added to the Discussion in lines 554-564 of the revised version.

      (2) It is appreciated that the authors refrain from presenting the unsubstantiated concept of "functional" PSM amyloids in the discussion. However, wording in that direction must also be removed from other parts of the manuscript (e.g. "bioactive fibrillar polymorphs", "The formation of cross-alpha amyloids has been correlated with toxic activity", etc.), generally refraining from uncritically implying that amyloid formation underlies PSM biological activity, and rather discussing that the much more likely explanation of the findings is a lowering of cytolytically active, monomeric PSM concentration.

      As detailed in our response to Major Comment #1, we agree that uncritical language implying that amyloid fibrils themselves are the cytotoxic species should be avoided. Accordingly, we have revised the manuscript to consistently frame amyloid formation in regulatory terms. Aggregation, depending on context, modulates activity by altering the availability, persistence, and assembly pathways of these species. Distinct aggregation states are therefore presented as correlated with, but not equivalent to, cytotoxic activity, and as components of a dynamic assembly landscape rather than as direct toxic entities.

      (3) Discussion: "PSM alpha3 interaction with nucleic acids within human cells ...supports a comparable mechanism...". Please delete this as it is unsubstantiated.

      We agree that the original phrasing overstated the evidence. The sentence was removed and the Discussion was revised to clearly frame nucleolar accumulation as a phenomenological observation reflecting PSMα3's intrinsic nucleic acid–binding capacity, rather than as evidence for a comparable intracellular mechanism. Specifically, the revised Discussion (lines 597–607) states that nucleolar localization is "unlikely to represent a distinct intracellular toxic mechanism" and instead "reflects binding competence within RNA-rich compartments following cellular entry." The biological relevance of this interaction, particularly at sub-cytotoxic concentrations, is noted as an open question requiring further investigation.

      (4) The authors should also cite papers that have argued against their central hypothesis of "functional" PSM amyloids.

      We thank the reviewer for this suggestion. Accordingly, we have revised the manuscript to explicitly cite and discuss studies that argue against amyloid fibrils as the primary cytotoxic species, and that instead attribute PSM cytotoxicity to soluble or membrane-associated forms. These perspectives are now incorporated in the Discussion to provide a balanced view of the field and to clarify how our findings align with, and differ from, existing models of PSM activity.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study presents a comprehensive single-cell atlas of mouse anterior segment development, focusing on the trabecular meshwork and Schlemm's canal. The authors profiled ~130,000 cells across seven postnatal stages, providing detailed and solid characterization of cell types, developmental trajectories, and molecular programs.

      Strengths:

      The manuscript is well-written, with a clear structure and thorough introduction of previous literature, providing a strong context for the study. The characterization of cell types is detailed and robust, supported by both established and novel marker genes as well as experimental validation. The developmental model proposed is intriguing and well supported by the evidence. The study will serve as a valuable reference for researchers investigating anterior segment developmental mechanisms. Additionally, the discussion effectively situates the findings within the broader field, emphasizing their significance and potential impact for developmental biologists studying the visual system.

      Weaknesses:

      The weaknesses of the study are minor and addressable. As the study focuses on the mouse anterior segment, a brief discussion of potential human relevance would strengthen the work by relating the findings to human anterior segment cell types, developmental mechanisms, and possible implications for human eye disease. Data availability is currently limited, which restricts immediate use by the community. Similarly, the analysis code is not yet accessible, limiting the ability to reproduce and validate the computational analyses presented in the study.

      In the revised version, we have added a paragraph to the Discussion highlighting the human relevance of our work. Additionally, the data are now public on the Single Cell Portal and GEO, and the accession numbers have been updated. The code is available on GitHub (https://github.com/revathi-balasubramanian/Anterior-segment-development-single-cell-data-analysis).

      Reviewer #2 (Public review):

      Summary:

      This study presents a detailed single-cell transcriptomic analysis of the postnatal development of mouse anterior chamber tissues. Analysis focused on the development of cells that comprise Schlemm's Canal (SC) and trabecular meshwork (TM).

      Strengths:

      This developmental atlas represents a valuable resource for the research community. The dataset is robust, consisting of ~130,000 cells collected across seven time points from early post-natal development to adulthood. Analyses reveal developmental dynamics of SC and TM populations and describe the developmental expression patterns of genes associated with glaucoma.

      Weaknesses:

      (1) Throughout the paper, the authors place significant weight on the spatial relationships of UMAP clusters, which can be misleading (see Chari and Pachter, PLoS Comput Biol 2023). This is perhaps most evident in the assessment of vascular progenitors (VP) into BEC and SEC types (Figures 4 and 5). In the text, VPs are described as a common progenitor for these types; however, the trajectory analysis in Figure 5 denotes a path of PEC -> BEC -> VP -> SEC. These two findings are incongruous and should be reconciled. The limitations of inferring relationships based on UMAP spatial positions should be noted.

      (2) Figure 2d does not include P60. It is also noted that technical variation resulted in fewer TM3 cells at P21; was this due to challenges in isolation? What is the expected proportion of TM3 cells at this stage?

      (3) In Figures 3a and b it is difficult to discern the morphological changes described in the text. Could features of the image be quantified or annotated to highlight morphological features?

      (4) Given the limited number of markers available to identify SC and TM populations during development, it would be useful to provide a table describing potential new markers identified in this study.

      (5) The paper introduces developmental glaucoma (DG), namely Axenfeld-Rieger syndrome and Peters Anomaly, but the expression analysis (Figure S20) does not annotate which genes are associated with DG.

      (1) We agree that inferring biological relationships from the spatial arrangement of UMAP clusters has limitations, and we have qualified our interpretation accordingly in the text. We have also added clarifying language to the trajectory analysis in Figure 5. The intended developmental trajectory is PEC → VP → BEC and SEC; however, the cluster labels in Figure 5 were applied incorrectly. Specifically, the "VP, BECs" cluster was mislabeled as "BECs", which led to the confusion. This cluster contains VPs that transition into BECs as well as VPs that are precursors to SECs.

      (2) We recently published the P60 dataset separately (Tolman, Li, Balasubramanian et al., eLife 2025); these data consist of integrated single-nucleus multiome profiles that were subjected to in-depth analysis. Additionally, we found that integrating the P60 dataset with the developmental datasets obscured sub-clustering of mature cell types. In future manuscripts, we will pursue a more detailed analysis of TM development and perform time point–specific clustering, similar to the approach we used for endothelial cells (Figure 4e).

      Comparisons of cell proportions across ages, as the eyes grow, need to be made cautiously. Notwithstanding these limitations, the proportions of the TM1, TM2, and TM3 clusters are expected to be similar between P14 and P21, because the proportions at P14 are similar to those in the separately analyzed P60 data. Importantly, our dissection strategy changed with age: from P2 to P14, we removed approximately one-third of the cornea, whereas at P21 and P60 we removed most of the cornea to help maximize representation of limbal cells as the eyes grew. This change in dissection likely contributed to the reduced number of TM3 cells observed at P21. TM3 cells are enriched anteriorly (at least in adults) and so are located closer to the corneal cut during dissection of the P21 eyes (which, despite being larger than at younger ages, are still small and harder to dissect accurately than at P60) and are therefore more likely to be lost. Additional details are provided in the Methods section, and the caveats surrounding our dissection method have now been included.

      (3) For Figure 3a and b, we have now pseudo-colored the spaces and provided a quantification of how both TM volume and intratrabecular spaces change with developing age (Figure 3c).

      (4) We have now included a supplemental table of markers for developing and mature TM and SC cell types (Table S3).

      (5) We have highlighted DG genes in rectangular boxes in Figure S20.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Sun et al. generated germline-specific cKO mice for the Znhit1 gene and examined its effect on male meiosis. The authors found that the loss of Znhit1 affects the transcriptional activation of pachytene. Znhit1 is a subunit of the SRCAP chromatin remodeling complex and a depositor of H2AZ, and in cKO spermatocytes, H2AZ is not deposited into the gene region. The authors claim that this is why the PGA was not activated. These findings provide important insights into the mechanisms of transcriptional regulation during the meiotic prophase.

      Strengths:

      The authors used samples from their original mouse model, analyzing both the epigenome and the transcriptome in detail using diverse NGS analyses to gain new insights into PGA. The quality of the results appeared excellent.

      Weaknesses:

      Overall, the data is inconsistent with the authors' claims and does not support their final conclusions. In addition, the sample used may not be the most suitable for the analysis, but a more suitable sample would dramatically improve the overall quality of the paper.

      Thank you for your comprehensive summary of our study and your thoughtful insights into its strengths and weaknesses. We greatly appreciate this valuable feedback, which helps us further improve our work. Below, we provide a detailed response addressing each of the points you raised.

      Reviewer #1 (Recommendations For The Authors):

      Major revisions:

      Surprisingly, many genes were upregulated in the scRNA-seq results. How many XY genes are included? Discuss why many genes are up-regulated in Fig. 5E whereas bulk RNA-seq showed only 70 genes were down-regulated. Since apoptosis-related factors are up-regulated in Fig. 5E, could these up-regulated genes be due to the high content of the transcriptome of dead cells? As you know, cell death starts, but randomly and violently disrupts the transcriptome, so we think it is not desirable to analyze the transcriptome with dead cells in the mix. Describe this point appropriately in the text or generate new data without dead cells.

      We sincerely appreciate the reviewer’s critical points. Below, we address each point sequentially:

      (1) To address the question about XY-linked genes, we utilized scRNA-seq data to identify differentially expressed sex chromosome genes in spermatocytes at different stages. Our analysis revealed an aberrant activation of XY-linked genes relative to controls. Specifically, 120 XY-linked genes were aberrantly activated in zygotene-stage spermatocytes, and 119 XY-linked genes showed aberrant activation in pachytene-stage spermatocytes (revised Fig. 4F). This observation directly indicates that Znhit1 knockout impairs Meiotic Sex Chromosome Inactivation (MSCI), a finding that aligns with our prior characterization of XY chromosome synapsis defects in Znhit1-deficient spermatocytes.

      (2) Two key reasons explain the discrepancy between scRNA-seq and bulk RNA-seq results:

      First, scRNA-seq employs a more permissive threshold for identifying DEGs (log2 fold change [log2FC] = 0.25), thereby enhancing sensitivity to subtle expression changes and enabling the detection of more upregulated genes. In contrast, bulk RNA-seq uses a stricter threshold (log2FC = 1), which filters out these subtly upregulated transcripts, resulting in fewer DEGs overall.

      Second, scRNA-seq can capture cell subset-specific differential expression. In contrast, bulk RNA-seq averages signals across mixed cells, masking such subset-specific expression changes.

      These clarifications have been included in the Data Analysis section of the revised manuscript.

      (3) We fully agree with the reviewer’s concern that dead cells could confound transcriptomic analyses. Before downstream analysis, we excluded non-viable cells via stringent QC: cells with mitochondrial RNA (mtRNA) content exceeding 15% were removed, as high mtRNA content is a well-established marker of cell death or compromised viability. To further validate that upregulated genes were not driven by dead cell contamination, we analyzed the correlation between the expression of apoptosis-related genes and mtRNA fractions in our data. This analysis revealed no significant correlation (Pearson correlation coefficient, r = -0.02; please see Author response image 1). These results collectively rule out dead cell transcriptome contamination as the primary cause of the observed gene upregulation.
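      Points (2) and (3) above can be illustrated with a short sketch. It assumes each cell is summarized by a mitochondrial RNA fraction and an apoptosis-gene score, and that DEGs are called on the absolute log2FC alone; these representations, the function names, and the toy values are hypothetical, and the authors' actual pipeline (including any p-value cutoffs) is not described in this response.

```python
from math import sqrt


def count_degs(log2fc_values, threshold):
    """Count genes whose absolute log2 fold change meets the threshold."""
    return sum(1 for value in log2fc_values if abs(value) >= threshold)


def filter_viable(cells, max_mito_fraction=0.15):
    """Keep cells whose mitochondrial RNA fraction does not exceed the cutoff."""
    return [cell for cell in cells if cell["mito_fraction"] <= max_mito_fraction]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Toy values, for illustration only.
log2fc = [0.3, -0.5, 1.2, 0.1, -1.5]
print(count_degs(log2fc, 0.25), count_degs(log2fc, 1.0))

cells = [
    {"mito_fraction": 0.04, "apoptosis_score": 0.8},
    {"mito_fraction": 0.10, "apoptosis_score": 0.2},
    {"mito_fraction": 0.12, "apoptosis_score": 0.6},
    {"mito_fraction": 0.22, "apoptosis_score": 0.9},  # removed by QC
]
kept = filter_viable(cells)
r = pearson_r([c["mito_fraction"] for c in kept],
              [c["apoptosis_score"] for c in kept])
print(len(kept), round(r, 3))
```

      The same list of fold changes yields more DEGs at the permissive cutoff than at the strict one, and a correlation near zero between the two per-cell measures is the pattern that argues against dead-cell contamination driving the upregulated genes.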

      Author response image 1.

      Scatter chart showing the Pearson correlation between apoptosis-related genes and mitochondrial RNA fractions in scRNA-seq data.

      Line 280-286: The data in Figures 7I and J are confusing: as shown by KAS-seq, it is natural that ssDNA is not formed in the promoter region in Znhit1-cKO sample because transcription does not proceed, but why is ssDNA formed in the enhancer region in the first place in control and then lost in Znhit1-cKO sample? Generally, it is said that in the enhancer region, including the super-enhancer region, double-stranded DNA is not dissociated, thus not forming ssDNA. Discuss why the loss of ssDNA in the enhancer region affects transcription with appropriate citations. Also, show whether genes downstream of the missing ssDNA in the promoter region have abnormal transcriptional activity, along with the RNA-seq data. Furthermore, in the region shown in Figure 7I, why is the chromatin even more open, as shown by ATAC-seq in Znhit1-cKO? Discuss whether this is related to transcriptional progression or aberrant substitution with H2A. If the function of ZNHIT1 is to replace H2A with H2AZ for PGA, it is not necessary to show the H2A level in Znhit1-cKO.

      We appreciate the reviewer’s constructive comments.

      (1) ssDNA dynamics in enhancer regions: Emerging evidence demonstrates that active enhancers undergo transient DNA unwinding to form ssDNA, a process critical for transcriptional regulation through the transcription of enhancer RNAs (eRNAs). KAS-seq is sufficiently sensitive to detect ssDNA in enhancer regions (Kim et al., 2010; Wu et al., 2020). It has been shown that H2A.Z (deposited by the ZNHIT1-SRCAP complex) is required for maintaining enhancer accessibility and dynamic unwinding (Sporrij et al., 2023). In this study, we found that Znhit1 deletion and defective H2A.Z incorporation impaired enhancer ssDNA formation, indicating that ZNHIT1-H2A.Z plays an important role in the activity of both promoters and enhancers.

      (2) Impact of ssDNA loss on transcription: To address how missing ssDNA affects transcriptional activity, we further analyzed changes in KAS-seq signals following Znhit1 knockout. Overall, KAS-seq signals were significantly reduced upon Znhit1 depletion, confirming that Znhit1 is essential for ssDNA formation. Further examination of KAS-seq signals at promoters of downregulated genes also revealed reduced signals (revised manuscript, Fig. S8). In contrast, KAS-seq signals at upregulated genes remained relatively low and did not change in either the control or the knockout group, so their upregulation probably results from indirect regulation. These results underscore the importance of ZNHIT1-mediated chromatin states in regulating ssDNA formation and gene expression.

      (3) Aberrant chromatin openness in Znhit1-cKO (ATAC-seq): The increased chromatin accessibility detected by ATAC-seq likely represents a disorganized, nonfunctional state rather than productive transcriptional openness. H2A.Z normally constrains chromatin dynamics to facilitate ordered transcriptional regulation (Cole et al., 2021); its absence in Znhit1-cKO leads to higher ATAC-seq signals, suggesting that this aberrant openness fails to support proper assembly of the transcriptional machinery.

      Minor revisions:

      Line 106. The text says that they looked for chromatin factors, but the legend says that they looked for epigenetic factors. The text must be consistent.

      We have corrected it in the revised manuscript (line 801).

      Line 107. Although it is stated that the transcriptional data published here were used, it appears from the cited references that they are scRNA-seq data. A clear explanation is required in the text or legend.

      We have revised this data as scRNA-seq data (line 107).

      Line 141-143: Using TUNEL analysis in Figure 4F, the authors show that Znhit1-cKO testis cells contain many dead cells. Describe the type or stage of the apoptotic cells.

      We appreciate the reviewer’s suggestion. We performed TUNEL staining on testes isolated from P14 mice, a critical time point for pachytene development (revised Fig. 2D). Consistent with apoptosis at this stage, apoptosis-related genes were significantly upregulated in pachytene-stage spermatocytes in the scRNA-seq data (revised Fig. 4D). To further validate this observation, we performed scRNA-seq on P35 testis samples. The results revealed a significant reduction in late pachytene-stage spermatocytes in Znhit1-cKO samples (revised Fig. 2F), consistent with apoptotic loss of pachytene cells. Collectively, these data confirm that Znhit1 knockout impairs pachytene-stage spermatocyte development.

      The authors claimed that the loss of Znhit1 lowers the transcription of a group of genes involved in homologous recombination, including Rnf212, causing a delay in homologous recombination; however, if the process of homologous recombination is delayed, homologous chromosome pairing and synapsis are affected unless DSB repair is completed. Provide a satisfactory explanation for the fact that DNA damage remains on autosomes despite complete synapsis, as shown in Figure 3C, which is likely not solely due to delayed homologous recombination.

      Thank you for this insightful comment. We fully agree that persistent autosomal DNA damage cannot be explained solely by delayed homologous recombination. To resolve this question, we further analyzed autosomal synapsis through SYCP1 and SYCP3 staining. While autosomal synapsis appeared morphologically complete, we identified subtle but significant synapsis defects in autosomal terminal regions (revised Fig. 3A). This suggests that Znhit1 knockout also results in autosomal synapsis defects. We speculate that these synapsis defects are associated with the unresolved autosomal DNA damage we observed.

      Lines 150-163. With regard to XY unpairing in Znhit1-cKO pachytene spermatocytes, there is insufficient discussion as to whether this is due to transcriptional aberrations.

      Thank you for highlighting the need to link transcriptional aberrations to XY unpairing in Znhit1-cKO pachytene spermatocytes. To address this, we analyzed sex chromosome transcription using scRNA-seq data. Relative to controls, 120 XY-linked genes were aberrantly activated at zygotene, and 119 were upregulated at pachytene in Znhit1-cKO spermatocytes (revised Fig. 4F), directly demonstrating that Znhit1 knockout disrupts Meiotic Sex Chromosome Inactivation (MSCI). Given that intact MSCI is required to stabilize XY synapsis in pachytene spermatocytes, we conclude that the observed XY unpairing is likely a direct consequence of these sex chromosome transcriptional abnormalities. We have added this information to the revised manuscript (lines 221-226).

      Line 187-194. Analysis of the scRNA-seq data is shown in Figure 4, but it lists several genes as stage-specific markers, some of which do not have well-understood meiotic functions. Please cite a reference paper that provides sufficient evidence to qualify this stage.

      In response to this comment, we have refined the presentation of marker genes used for cell annotation (revised Fig. S4B). We have incorporated relevant references supporting their utility as stage-specific markers for the meiotic stages (line 187).

      Line 225-233: If Znhit1 is important for H2AZ deposition and regulates PGA through it, how does it regulate HR-related genes that are expressed earlier through H2AZ deposition during the pachytene stage? For example, Rnf212 is not specifically expressed during the pachytene stage but is one of the targets of MEIOSIN, so it is expressed at an earlier stage.

      Thank you for this insightful comment. We fully acknowledge the reviewer’s key observation that HR-related genes such as Rnf212 are MEIOSIN targets that initiate transcription at earlier meiotic stages, before the pachytene stage. Our stage-resolved scRNA-seq data further showed that the expression of Ccnb1ip1 and Rnf212 was significantly upregulated from zygotene to pachytene, following their initial transcriptional onset. We next showed that the loss of H2A.Z deposition induced by Znhit1 deletion specifically impaired this pachytene-specific secondary transcriptional activation, rather than the early MEIOSIN-driven expression onset (please see Author response image 2).

      Author response image 2.

      Plots showing the expression level of indicated genes in scRNA-seq data.

      Line 245-251: As shown in Figure 6E, more than 14,000 genes have H2AZ peaks. In contrast, only approximately 60% of the genes downregulated by Znhit1-cKO appeared to be directly affected by H2AZ. Are the remaining 40% of genes regulated in a different way that is not mediated by H2AZ? Also, only a few percent of the genes with H2AZ peaks are affected, but why are only genes with A-MYB involvement affected, as shown in Figure 7?

      Thank you for these insightful and constructive comments. The ~40% of downregulated genes not directly linked to H2A.Z are likely regulated through indirect mechanisms. H2A.Z deposition mediated by ZNHIT1 may influence upstream transcriptional regulators (e.g., transcription factors or coactivators), whose dysregulation in turn affects these genes.

      The selective effect of H2A.Z loss on A-MYB target genes is explained by the strict context-dependent function of H2A.Z, which requires stage-specific partner transcription factors to exert its regulatory activity. During the zygotene-to-pachytene transition, A-MYB acts as the master regulator of pachytene gene activation and forms a functional collaborative complex with H2A.Z to drive target gene transcription. Disrupted H2A.Z deposition upon Znhit1 deletion specifically impairs the activity of this A-MYB-H2A.Z complex, leading to selective downregulation of A-MYB targets. Other H2A.Z peak-associated genes may rely on alternative cofactors and compensatory mechanisms.

      Line 245-256: Figures 6 and F show that the localization of H2AZ is reduced in Znhit1-cKO mice, which means that no substitution with H2A occurs. If so, show it in the data because the localization of H2A should be increased compared to that in the control.

      To clarify the status of H2A, we have now performed immunofluorescence staining for H2A. While H2A.Z deposition was clearly impaired following Znhit1 deletion, the global level of H2A did not change significantly (Author response image 3). We speculate that the absence of a compensatory increase in H2A is likely due to the intrinsically low abundance of the histone variant H2A.Z relative to canonical histone H2A under physiological conditions.

      Author response image 3.

      Immunostaining of SYCP3 and H2A in spermatocyte testis sections of control and Znhit1-sKO mice. Scale bar, 40 μm.

      Reviewer #2 (Public Review):

      Summary:

      The study demonstrates that Znhit1 regulates male meiosis, with deletion causing pachytene failure associated with defective expression of pachytene genes and subtle effects on X-Y pairing and DSB repair. The authors attribute this phenotype to the defective incorporation of the Znhit1 target H2A.Z into chromatin.

      Strengths:

      The paper and the figures are well presented and the narrative is clear. Evidence that the conditional deletion strategy removes Znhit1 is strong, with multiple orthogonal approaches used. Most of the meiotic phenotyping is well performed, and the omics analysis clearly identifies a dramatic effect on the meiotic gene expression program. The link to H2A.Z and A-MYB adds a mechanistic angle to the study.

      Weaknesses:

      (1) Current literature demonstrates that meiotic mutants arrest at one of two stages: mid-pachytene (stage IV of the seminiferous cycle) or metaphase I (stage XII of the seminiferous cycle). This study documents that in the Znhit1 KO the mid-pachytene marker H1t appears normally, but that cells arrest before diplotene. If this is true, then arrest must occur during late pachytene, which based on my knowledge has never been documented for a meiotic KO. To resolve this, the authors should present stronger histological substaging evidence to support their claim.

      Thank you for this insightful and constructive comment. To achieve high-resolution tracking of cell lineage progression, we performed scRNA-seq analysis using P35 testes in this revised manuscript. scRNA-seq data showed that germ cells normally progressed through all meiotic stages and successfully gave rise to spermatids in control groups. By contrast, in the Znhit1 knockout group, late pachytene spermatocytes decreased significantly, and only very few subsequent germ cell types were observable (revised Fig. 2F, G). In scRNA-seq data, although very few diplotene spermatocytes and meiotic metaphase I cells were detectable, these cells still appeared abnormal, as evidenced by their extremely low Pou5f2 expression. We have revised our description of the meiotic arrest stage in the manuscript.

      (2) The authors overlooked the possible effects of Znhit1 deletion on MSCI. Defective MSCI is a well-established cause of pachytene arrest. Actually, the fact that they see X-Y pairing failure should alert them even more strongly to this possibility because MSCI failure is often associated with defective X-Y pairing. This could be easily addressed by examination of their RNAseq data.

      To address the concern that Znhit1 deletion may impact Meiotic Sex Chromosome Inactivation (MSCI), we analyzed XY-linked gene expression using scRNA-seq data from spermatocytes at distinct stages. Our analysis revealed aberrant activation of XY-linked genes in Znhit1-cKO spermatocytes relative to controls. Specifically, 120 XY-linked genes were activated at zygotene, and 119 XY-linked genes were upregulated at pachytene (revised Fig. 4F). This observation directly demonstrates that Znhit1-cKO impairs MSCI, which aligns with our prior characterization of defective X-Y chromosome synapsis in Znhit1-deficient spermatocytes. To explicitly resolve this concern, we have integrated these MSCI-focused RNA-seq analyses into the revised Results section (lines 221-226).

      (3) The recombination assays need attention.

      In the text the authors state that they studied RPA2 and DMC1, but the figures show RPA2 and RAD51.

      The RPA counts are not quantitated.

      The conclusion that crossover formation fails (based on MLH1 staining) is not justified. This marker does not appear in wt males until late pachytene, so if cells in this mutant are dying before that stage, MLH1 cannot be assessed.

      The authors state that gH2AX persists in the KO, but I'm not convinced that they are comparing equivalent stages in the wt and KO. In Figure 3C, the pachytene cell is late, whereas in the mutant the pachytene cell is early or mid (when residual gH2AX is expected, even in wt males).

      Previous work (PMID: 23824539) has shown that antibodies reportedly detecting pATM in the sex body are non-specific. I therefore advise caution with the data shown in Figure 3D.

      We appreciate the reviewer’s detailed feedback on our recombination assays and have addressed each concern as follows:

      (1) Discrepancy between text and figures (RPA2/DMC1 vs. RPA2/RAD51): We have corrected this in the revised manuscript.

      (2) Quantitation of RPA2 foci: We have supplemented quantitative analysis of RPA2 foci (revised Fig. S3).

      (3) Conclusion on crossover failure: Single-cell RNA sequencing data from P35 testes definitively confirmed that Znhit1 knockout spermatocytes successfully progressed to the late pachytene stage, ruling out the possibility that our MLH1 staining results are confounded by cell death or arrest before this critical stage. In addition, analysis of transcriptome datasets revealed significant downregulation of important genes required for homologous recombination and crossover formation, including Ccnb1ip1 and Rnf212. Reduced expression of these essential factors may impair the assembly of MLH1 crossover foci. These data demonstrate that ZNHIT1 is essential for proper homologous recombination and crossover formation during male meiosis. We have revised the text to emphasize this context.

      (4) γH2AX persistence and stage matching: We have replaced the images with more representative, stage‑matched pachytene spermatocytes from wild‑type and Znhit1‑KO mice (revised Fig. 2C). Furthermore, prompted by the insightful comment from Reviewer 1, we carefully re‑examined autosomal synapsis and identified abnormal synapsis specifically at the terminal regions of autosomes in Znhit1‑deficient spermatocytes (revised Fig. 3A). These data together confirm that ZNHIT1 is essential for DSB repair during male meiotic prophase I.

      (5) pATM staining issue: Following the reviewer’s advice, we carefully reviewed the relevant literature (PMID: 23824539) and confirmed that the anti‑pATM antibody may exhibit non‑specific staining on the XY chromosomes. Accordingly, we have removed the pATM staining data presented in Figure 3D from the revised manuscript to ensure the accuracy and rigor of our results.

      (4) RNAseq data. The authors show convincingly that Znhit1 activates genes that are normally upregulated at the zyg-pachytene transition. They should repeat the analysis for genes normally upregulated at the prelep-lep and lep-zyg transition to show that this effect is really pachytene-gene specific.

      We appreciate this suggestion. To clarify the stage specificity of ZNHIT1’s regulatory role, we analyzed genes upregulated at the prelep-lep and lep-zyg transitions. Our results showed that Znhit1 knockout had little impact on the overall expression levels of these genes (as shown in revised Fig. 4B). In contrast, as we previously reported, genes upregulated at the zygotene-pachytene transition were remarkably downregulated in Znhit1-cKO. These findings further confirm the specificity of ZNHIT1 in regulating pachytene gene expression.

      (5) I am puzzled that the title and overall gist of the study focuses on H2A.Z, when it is Znhit1 that has been deleted.

      We appreciate the reviewer’s observation and have revised the study title as suggested. Specifically, the title is now updated to “ZNHIT1-dependent H2A.Z deposition at meiotic prophase I underlies pachytene gene expression and meiotic progression during male meiosis.”

      Reviewer #3 (Public Review):

      Summary:

      Sun et al. present a manuscript detailing the phenotypic characterization of loss of Znhit1 in male germ cells. Znhit1 is a subunit of the chromatin regulating complex SRCAP that functions to deposit the histone variant H2A.Z. Given that meiosis, and specifically meiotic recombination, occurs in the context of the dynamic condensing of chromosomes, the role of chromatin regulators in general, and histone variants specifically, in mammalian meiosis is an active area of research. Previous work has shown that H2A.Z is found at the locations of recombination in plants, although H2A.Z was previously not found at recombination sites in mammalian meiosis. Here the authors use a conditional approach to ablate Znhit1 in spermatocytes and characterize a block in meiosis in prophase I in the transition from pachytene to diplotene stage.

      Strengths:

      The authors combine current methods in immunohistochemistry and functional genomics to provide strong evidence of meiotic block upon the loss of Znhit1. They find that loss of Znhit1 leads to reduced incorporation of the histone variant H2A.Z, specifically at promoters and enhancers. Further, RNA sequencing found more genes are down-regulated upon loss of Znhit1 compared to upregulated, suggesting that incorporation of H2A.Z is critical for the expression of genes necessary for successful meiotic progression.

      A strength of the manuscript is tying the locations of changes in H2A.Z deposition with binding of the transcription factor A-MYB, providing a mechanism that can potentially combine the changes in chromatin regulation with variable binding of a transcription factor in gene expression in pachytene stage spermatocytes.

      Weaknesses:

      A weakness is the single-cell RNA experiment, which used cells from 16-day-old male mice. The authors suggest that the rationale for the experiment was to determine where the Znhit1-sKO mutant showed an arrest in meiosis, and claim that this is the pachytene stage. However, in the 'first wave' of meiosis 16-day-old mice are just beginning to enter pachytene, so cells from later meiotic stages will be largely absent in these tubules. This is clear from the UMAP showing a similar pattern of cell distributions between wild-type and mutant mice. Using older mice would have better demonstrated where the mutant and wild-type mice differ in cell-type composition.

      We appreciate the reviewer’s constructive comment. To resolve this issue, we have added new scRNA‑seq data from testes of P35 mice, which harbor a full spectrum of meiotic stages, including late pachytene, diplotene, metaphase I spermatocytes, and post-meiotic spermatids. Compared with wild-type controls, Znhit1-sKO testes exhibited a marked reduction in late pachytene spermatocytes and a near-complete loss of post-pachytene cell types, directly validating the pachytene-stage meiotic arrest (revised Fig. 2F, G). All updated analyses have been integrated into the manuscript to strengthen our conclusions.

      The authors use the term pachytene genome activation (PGA) in the manuscript to suggest a novel process by which genes are specifically increased in expression in the pachytene stage of meiotic prophase I, without reference to literature that establishes the term. If the authors are putting forward a new concept defined by this term, it would strengthen the manuscript to describe it further and delineate what the genes are that are activated and discuss potential mechanisms.

      We appreciate the reviewer’s valuable feedback on our use of the term "pachytene genome activation (PGA)".

      To address this, we have revised the text to explicitly frame PGA as a stage-specific transcriptional program observed in our data, defined by the coordinated upregulation of a distinct set of genes during the pachytene stage of meiotic prophase I.

      (1) Definition and Gene Set: Using the scRNA-seq dataset, we formally defined PGA as the transcriptional wave characterized by genes with increased expression in pachytene vs. zygotene spermatocytes (n = 1,560 genes). Functional enrichment analysis shows these genes are primarily involved in DNA repair, cilium organization, and spermatid development (Table S3), consistent with the biological process of germ cell development.

      (2) Relationship to existing literature: While PGA as a term is not widely established, our data align with prior observations of pachytene-specific transcriptional upregulation (Alexander et al., 2023; Ernst et al., 2019; Turner, 2015). Importantly, Alexander et al. reveal that in late meiotic stages, starting from pachynema, chromatin has a ~3-fold increase in transcription. We have added these citations to clearly illustrate the relevant advances in the field (lines 68-71).

      (3) Regulation of pachytene-stage gene expression: We further delineate that PGA is regulated by ZNHIT1-dependent H2A.Z deposition. Znhit1 deletion resulted in significant downregulation of 70.1% (1,094 out of 1,560) of these genes. This links PGA to chromatin-based regulation, where ZNHIT1-dependent H2A.Z deposition enables pachytene-specific transcription.

      Generally speaking, the authors present solid evidence for a pachytene block in male germ cell development in mice lacking Znhit1 in spermatocytes. The evidence supporting a change in gene expression during pachytene, that more genes are downregulated in the mutant compared to increased expression, and changes in histone modification dynamics and placement of H2A.Z all support a role in alterations in meiotic gene regulation. However, the support that changes in H2A.Z impacting meiotic recombination (as suggested in the manuscript title) is less supported, rather than a general cell arrest in the pachytene stage leading to cell death. The conclusions around the role of Znhit1 influencing meiotic recombination directly could use further justification or mechanistic hypothesis.

      We acknowledge the reviewer’s comments. Indeed, existing data support the presence of a pachytene block in spermatocytes of Znhit1-deficient mice, along with aberrant pachytene gene expression and impaired H2A.Z deposition.

      In response, we made the following revisions: (1) we adjusted the manuscript title and conclusion to reduce emphasis on a direct H2A.Z-recombination link, and focus instead on ZNHIT1/H2A.Z in pachytene gene regulation and meiotic progression; (2) we now state that recombination defects may be indirect consequences of failed pachytene gene regulation, rather than a direct regulatory effect of ZNHIT1 on recombination machinery (lines 314-319).

      Reviewer #3 (Recommendations For The Authors):

      Quality of the images for meiotic spreads - images have low contrast and are tiny. It is difficult to see the SYCP3 results even when the images are magnified on the computer screen.

      We have provided new images with high resolution to ensure a clear visualization of SYCP3 signals.

      Line 165 - indicates the results for DMC1, although the figure suggests the results are for RAD51 foci.

      We have corrected this mistake.

      Line 306 - this manuscript 'confirms' that H2AZ is not found at mammalian recombination sites, a result already in the literature.

      We have corrected this mistake (lines 309-312).

      Reviewing Editor Comments:

      Major points and revisions highlighted by the reviewers:

      (1) Meiotic prophase in Znhit1KO: The main questions to clarify are the stage and status of progression, the analysis of apoptosis, and the consequences of gene expression on the X and Y. Additional analysis for DSB repair foci, gH2AX is also required. These analyses are needed to answer reviewer 2. Even if H2AZ was not detected at recombination hotspots, it may be possible that it plays a role in DSB repair but the level is too low for detection. This should be discussed as H2AZ was shown to be involved in DNA repair.

      We sincerely appreciate the reviewing editor’s constructive comments.

      (1) Stage and progression of meiotic prophase: We added scRNA-seq data from P35 testes. The results confirmed that Znhit1-KO spermatocytes arrest at late pachytene, and post-pachytene stages (diplotene, metaphase I) were nearly absent (revised Fig. 2F, G).

      (2) Apoptosis analysis: We addressed this by showing that apoptosis-related genes were upregulated in pachytene spermatocytes at the single-cell level (revised Fig. 4D). To further validate this finding, we performed scRNA-seq analysis on P35 testis samples. Our results revealed a marked reduction in late pachytene spermatocytes in Znhit1-cKO testes (revised Fig. 2F, G), consistent with apoptotic depletion of pachytene-stage cells. Together, these data confirm that Znhit1 ablation impairs pachytene-stage spermatocyte development.

      (3) X/Y gene expression consequences: To address this key point, we performed stage-resolved analysis of XY-linked gene expression using scRNA-seq data from different-stage spermatocytes. Compared with controls, we detected aberrant ectopic activation of XY-linked genes in Znhit1-KO spermatocytes: 120 XY-linked genes were inappropriately activated at zygotene, and 119 remained abnormally upregulated at pachytene (revised Fig. 4F). These results provide direct evidence that Znhit1 deletion impairs Meiotic Sex Chromosome Inactivation (MSCI).

      (4) DSB repair issue: We have replaced the images with more representative, stage‑matched pachytene spermatocytes (revised Fig. 3C). The revised images show consistently increased γH2AX signals in Znhit1-KO spermatocytes. Prompted by Reviewer 1’s comment, we identified abnormal synapsis at autosomal terminal regions in mutant cells. Together, these results confirm that ZNHIT1 is essential for DSB repair during male meiotic prophase I.

      (5) Potential role of H2A.Z in DSB repair: Though H2A.Z was nearly undetectable at recombination hotspots, we discuss two possibilities: (1) ZNHIT1-H2A.Z depletion dysregulated DSB repair-related genes; (2) Current ChIP-seq sensitivity may miss low-abundance H2A.Z at hotspots, which could support repair via chromatin remodeling. Future high-resolution assays (super-resolution imaging, DSB-targeted ChIP-seq) are proposed to validate this. We agree that recombination defects may be indirect consequences of failed pachytene gene regulation, rather than a direct regulatory effect of ZNHIT1 on recombination machinery.

      (2) Gene expression analysis. The first consequence of H2AZ depletion is gene expression downregulation. However, it may not be surprising that some genes are down and others upregulated. There are likely secondary and indirect effects including the upregulation of some genes. The authors should explain and discuss this point so as to answer the questions raised by reviewers 1 and 2.

      The primary consequence of H2A.Z depletion in pachytene spermatocytes is indeed widespread downregulation of genes. For the coexistence of upregulated genes, we explain this via three key points.

      (1) Technical differences between scRNA-seq and bulk RNA-seq (addressing Reviewer 1): scRNA-seq captures cell-type-specific differentially expressed genes that bulk RNA-seq masks (bulk averages signals across mixed cells, hiding changes in rare subsets). Additionally, scRNA-seq uses a lower log2(fold change) threshold (0.25 vs. 1 in bulk RNA-seq), detecting subtle upregulations missed by bulk analysis.

      (2) No dead cell contamination (addressing Reviewer 1): Stringent quality control excluded cells with >15% mitochondrial RNA. Apoptosis-related genes showed no significant correlation with mitochondrial RNA fractions (Pearson correlation coefficient, r = -0.02; please see Author response image 1), ruling out dead cell transcriptome interference.

      (3) Secondary/indirect effects (addressing Reviewers 1 & 2): Upregulated genes likely result from indirect regulatory cascades. H2AZ depletion may disrupt upstream transcription factors, leading to compensatory upregulation of their downstream genes or cell stress responses to meiotic arrest. Notably, Znhit1 knockout specifically impacts genes upregulated at the zygotene-pachytene transition, while genes upregulated at preleptotene-leptotene or leptotene-zygotene transitions remain largely unaffected (revised Fig. 4B), confirming the specificity of H2A.Z’s direct regulatory role and framing upregulation as non-targeted indirect effects.
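      The quality-control cutoff and the two fold-change thresholds mentioned in points (1) and (2) can be illustrated with a minimal sketch. Only the numbers (15% mitochondrial RNA, log2 fold change of 0.25 for scRNA-seq and 1 for bulk RNA-seq) come from the response; the cell names, gene names, values, helper functions, and the use of the absolute fold change are hypothetical and are not the authors' pipeline.

```python
# Thresholds quoted in the response; everything else below is illustrative.
MITO_MAX = 0.15          # cells with >15% mitochondrial RNA are excluded
SC_LOG2FC_MIN = 0.25     # scRNA-seq log2(fold change) threshold
BULK_LOG2FC_MIN = 1.0    # bulk RNA-seq log2(fold change) threshold


def passes_qc(mito_fraction, mito_max=MITO_MAX):
    """Keep a cell unless its mitochondrial RNA fraction exceeds the cutoff."""
    return mito_fraction <= mito_max


def is_called(log2fc, threshold):
    """Assumption: a gene is called when |log2FC| reaches the threshold."""
    return abs(log2fc) >= threshold


# Hypothetical cells: name -> mitochondrial RNA fraction.
cells = {"cell_a": 0.03, "cell_b": 0.15, "cell_c": 0.22}
kept = [name for name, frac in cells.items() if passes_qc(frac)]

# Hypothetical genes: name -> log2 fold change (mutant vs. control).
log2fc = {"gene_x": 0.4, "gene_y": -1.3, "gene_z": 0.1}
sc_calls = sorted(g for g, v in log2fc.items() if is_called(v, SC_LOG2FC_MIN))
bulk_calls = sorted(g for g, v in log2fc.items() if is_called(v, BULK_LOG2FC_MIN))

print(kept)        # cells retained after QC
print(sc_calls)    # genes called at the scRNA-seq threshold
print(bulk_calls)  # genes called at the stricter bulk threshold
```

      In this toy input, a gene with a modest change (gene_x) is called only at the lower threshold, which is the kind of subtle change the response says bulk analysis would miss.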

      (3) The authors should also test the effect of Znhit1KO on the 1196 genes (up PreL/L) and 1325 (up L/Z) as shown in Figure 5D for the PGA. Also in Figure 5B, there is no evaluation of the statistical significance of the variation, this should be revised. X and Y genes should be analysed. KAS-Seq should be correlated with gene expression analysis, and several points as mentioned in the reviews below should be better explained and discussed.

      (1) Effect of Znhit1-KO on PreL/L- and L/Z-upregulated genes: we analyzed the 1196 genes upregulated at the PreL/L transition and 1325 genes upregulated at the L/Z transition. Znhit1 knockout had minimal effect on the expression of these early meiotic gene sets (revised Fig. 4B), whereas genes activated at the zygotene‑pachytene transition were strongly downregulated in Znhit1-KO spermatocytes. These results confirm the specific role of ZNHIT1 in regulating pachytene‑stage gene expression. We have also added a statistical evaluation for the variation shown in Fig. 4B.

      (2) X/Y-linked gene analysis: Analysis of stage‑resolved scRNA‑seq revealed aberrant ectopic activation of 120 XY‑linked genes at zygotene and 119 at pachytene in Znhit1-KO spermatocytes (revised Fig. 4F), demonstrating impaired Meiotic Sex Chromosome Inactivation (MSCI).

      (3) KAS-seq correlation with gene expression: We analyzed the link between KAS‑seq signals and gene expression, and we found that Znhit1 depletion caused a global reduction in KAS‑seq signals, especially at promoters of downregulated genes (revised Fig. S8). Genes with increased expression showed low KAS‑seq signals in both control and mutant groups, likely reflecting indirect regulation. These results highlight the essential role of ZNHIT1 in transcriptional regulation.

      (4) The title should refer to Znhit1, and the effect on meiotic recombination activities may be an indirect consequence of prophase progression arrest, even if some recombination genes are downregulated. This point is important as noted by reviewer 3.

      We fully acknowledge Reviewer 3’s key point and have revised the manuscript title to “ZNHIT1-dependent H2A.Z deposition at meiotic prophase I underlies pachytene gene expression and meiotic progression during male meiosis” to reduce emphasis on a direct H2A.Z-recombination link.

      Regarding meiotic recombination activities: The downregulation of recombination-related genes (e.g., Ccnb1ip1, Rnf212) stems from impaired pachytene-stage transcriptional programs caused by ZNHIT1-dependent H2A.Z deposition defects, which in turn leads to prophase progression arrest. Thus, the observed recombination abnormalities may be a secondary consequence of the meiotic prophase arrest, rather than a direct regulatory effect of ZNHIT1 on recombination machinery. This clarification has been integrated into the Discussion section (lines 314-318).

      (5) The recent structural analysis of SRCAP should be cited: Yu et al. Cell Discovery (2024) 10:15 https://doi.org/10.1038/s41421-023-00640-1.

      We have cited this reference in this revised manuscript (lines 234-236).

      (6) The authors should read and answer the specific revisions asked for by the reviewers.

      We have thoroughly read and systematically addressed all specific revisions requested by Reviewers 1, 2, and 3, as detailed in the revised manuscript and supplementary data.

      References

      Alexander, A.K., Rice, E.J., Lujic, J., Simon, L.E., Tanis, S., Barshad, G., Zhu, L., Lama, J., Cohen, P.E., and Danko, C.G. (2023). A-MYB and BRDT-dependent RNA Polymerase II pause release orchestrates transcriptional regulation in mammalian meiosis. Nature communications 14.

      Cole, L., Kurscheid, S., Nekrasov, M., Domaschenz, R., Vera, D.L., Dennis, J.H., and Tremethick, D.J. (2021). Multiple roles of H2A.Z in regulating promoter chromatin architecture in human cells. Nature communications 12, 2524.

      Ernst, C., Eling, N., Martinez-Jimenez, C.P., Marioni, J.C., and Odom, D.T. (2019). Staged developmental mapping and X chromosome transcriptional dynamics during mouse spermatogenesis. Nature communications 10, 1251.

      Kim, T.K., Hemberg, M., Gray, J.M., Costa, A.M., Bear, D.M., Wu, J., Harmin, D.A., Laptewicz, M., Barbara-Haley, K., Kuersten, S., et al. (2010). Widespread transcription at neuronal activity-regulated enhancers. Nature 465, 182-187.

      Sporrij, A., Choudhuri, A., Prasad, M., Muhire, B., Fast, E.M., Manning, M.E., Weiss, J.D., Koh, M., Yang, S., Kingston, R.E., et al. (2023). PGE(2) alters chromatin through H2A.Z-variant enhancer nucleosome modification to promote hematopoietic stem cell fate. Proceedings of the National Academy of Sciences of the United States of America 120, e2220613120.

      Turner, J.M. (2015). Meiotic Silencing in Mammals. Annu Rev Genet 49, 395-412.

      Wu, T., Lyu, R., You, Q., and He, C. (2020). Kethoxal-assisted single-stranded DNA sequencing captures global transcription dynamics and enhancer activity in situ. Nature methods 17, 515-523.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Al Asafen and colleagues apply a set of scanning fluorescence correlation spectroscopic approaches (Raster Image Correlation Spectroscopy (RICS), cross-correlation RICS, and pair-correlation function spectroscopy) to address the nuclear-cytoplasmic kinetics of the Dorsal (Dl) transcription factor in early Drosophila embryos. The Toll/Dl system has long been appreciated to establish dorsal-ventral polarity of the embryo through Toll-dependent control of Dl nuclear localization, and provides an example of a morphogen gradient produced with high enough precision to yield robust biophysical measurements of general transcription factor activity and function. By measuring GFP-tagged Dl protein, either in wild-type embryos or in mutant embryos with low/medium/high levels of Toll signaling, the authors report diffusivity of Dl in nuclear and cytoplasmic compartments of the embryo, as well as the fraction of mobile and immobile Dl, which can be correlated with DNA binding through cross-correlation RICS. A model is presented where Cactus/IkB is implicated in preventing Dl from binding to DNA.

      Strengths:

      The experiments on wild-type GFP-tagged Dorsal are performed well, are mostly reported well, and are interpreted fairly.

      Weaknesses:

      The discrepancy between experiment and theory as pertains to Michaelis-Menten kinetics is not fully motivated in the text, and could benefit from a more clear presentation. The experiments performed to distinguish between the contribution of Toll-dependent phosphorylation and Cactus interaction models for limiting Dorsal DNA binding are possibly confounded by the presence of wild-type, GFP-tagged Dorsal protein.

      Thank you for your thoughtful feedback. Regarding the discrepancy between experiment and theory in relation to Michaelis-Menten kinetics, we recognize that our initial explanation may not have been explicit enough. Our intent was to illustrate that if DNA binding is a saturable process, then while the absolute concentration of Dl bound to DNA will increase with total Dl levels, the fraction of Dl bound to DNA will decrease. We used Michaelis-Menten kinetics only as a familiar example to convey this concept but did not intend to suggest that the system strictly follows Michaelis-Menten behavior. To clarify this point, we removed mention of Michaelis-Menten as an illustrative analogy and stuck specifically with discussing the system as “saturating.” This primarily affected text in the paragraph starting on Line 204, but also Lines 323-325.
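      The saturation argument above can be made concrete with a small numerical sketch. It assumes a generic hyperbolic saturation curve with hypothetical parameters (b_max, k_half) and arbitrary concentration units; it is an illustration of the concept only, not a model fitted to the Dl data or a claim about the actual binding kinetics.

```python
def bound_concentration(total, b_max=50.0, k_half=100.0):
    """Bound amount under a generic saturating curve (hypothetical parameters).

    b_max is the plateau of bound protein; k_half is the total concentration
    at which half of that plateau is reached.
    """
    return b_max * total / (k_half + total)


def bound_fraction(total, b_max=50.0, k_half=100.0):
    """Fraction of the total pool that is bound at a given total concentration."""
    return bound_concentration(total, b_max, k_half) / total


# Hypothetical total concentrations, from low to high.
totals = [10.0, 50.0, 100.0, 500.0]
bound = [bound_concentration(t) for t in totals]
fractions = [bound_fraction(t) for t in totals]

for t, b, f in zip(totals, bound, fractions):
    print(f"total={t:6.1f}  bound={b:6.2f}  fraction bound={f:.3f}")
```

      Across these inputs the bound amount rises with total concentration while the bound fraction falls, which is the behavior described for a saturable binding process.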

      Regarding the concern about potential confounding effects due to the presence of wild-type GFP-tagged Dorsal (Dl[wt]-GFP): we understand the importance of addressing this point more directly. Therefore, we have imaged the Dorsal-GFP gradient in embryos expressing the UAS-dl[S280P]-GFP or the UAS-dl[S317A]-GFP constructs in the absence of the BAC-recombineered Dl-GFP construct. In both cases, the dl mutants by themselves were not able to recapitulate enough of the Dl gradient to test our hypotheses. We have added this analysis to Supplemental Figure 4 and mentioned this figure on Lines 333-336 and 354-358. Furthermore, we explicitly mention that it is possible the reason why we failed to reject the null hypothesis in the Toll phosphorylation mutant case may be due to the additional copy of Dl[wt]-GFP (the BAC recombineered construct), with text added to Lines 343-345, 365-369 (Results) and 408-418 (Discussion).

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Al Asafen, Clark et al., use fluorescence correlation spectroscopy (FCS) to quantitatively analyze the mobility of Dl along the DV axis of the early Drosophila embryo. Dl is essential for dorsal-ventral (DV) patterning and its gradient initiates the activation of several genes and thereby orchestrates the formation of the Drosophila body plan. While the mechanisms underlying the formation of the Dl gradient have been extensively studied by this group and others, there are some observations for which there is not yet a mechanistic explanation. For example, the peak of the Dl gradient grows continuously during nuclear cycles 10-14. This is likely due to Cact-dependent Dl diffusion and Dl binding to DNA. However, the biophysical parameters governing Dl nuclear dynamics that would support these claims have not been previously measured. In this work, the authors provide evidence that GFP-tagged Dl may be separated into a mobile pool and an immobile pool. Interestingly, the fraction of immobile Dl is position-dependent along the DV axis, revealing more binding to DNA in the ventral than in the dorsal nuclei. This is either due to higher binding affinity in ventral locations (due to Toll-dependent Dl phosphorylation) or to higher Dl-Cact binding in dorsal nuclei that would prevent Dl from binding to DNA. Using dl-mutant alleles, the authors support the latter hypothesis.

      Strengths:

      The manuscript is well written and their conclusions are convincingly supported by their methodology and analysis. As a quantitative study, the biophysical analysis seems rigorous, in general.

      Although this is not the first study that employs FCS to investigate the dynamics of a morphogen, it further exemplifies how these quantitative tools can be used to uncover mechanistic aspects of morphogen dynamics during development. In particular, the manuscript reports novel biophysical parameters of Dl dynamics that will be helpful in future hypothesis-driven modeling studies.

      Weaknesses:

      In my opinion, the main weakness of the manuscript is that the main biological implication of the study, namely that the asymmetry in the fraction of immobile Dl is a result of nuclear Dl-Cact binding which prevents Dl from binding DNA (Figure 5), occurs in a region of the embryo where there is very little Dl anyways (Figure 1A, 5A). While it is interesting that the fraction of immobile Dl increases (just a little, but significantly) in dorsal nuclei in mutants expressing a form of Dl with reduced Cact binding it is unclear what is the biological impact of this effect in a location where Dl is nearly absent. As can be seen in Figure 3F, the fraction of immobile is unaffected in Dl-mutant forms with reduced DNA binding, because it is already very low. It is unlikely that Dl binding to Cact in dorsal nuclei would affect shuttling as well since the fraction is very low anyway.

      We thank the reviewer for pointing out the places where we could strengthen our explanations. Here we first address the criticism, also raised by the other reviewer, that the fraction of immobile Dl increases only a small amount (Fig. 5A). [In our reply to the next comment, we address the question of biological implications.] We attempted to explain this small effect size in the manuscript; however, we understand that we could clarify further and, given the fact that eLife has no restraints on space, we added more explanation in the main text.

      In essence, even though the effect was statistically significant, the effect size was small because the mutation was “diluted” by the presence of a wildtype Dl protein tagged with GFP. We were willing to deal with this dilution because the alternative was that, according to previous literature, without any wildtype Dl, no Dl gradient would be present in the reduced Toll phosphorylation mutants, and only a very weak Dl gradient (weakened on both ends) would be present in mutants that reduced Cact binding. We were confident that, with our quantitative approaches, we would be able to detect the diluted effect.

      However, because both reviewers have criticized this diluted effect, in this resubmission, we have included analysis of GFP-tagged mutants without the presence of wildtype Dl protein. Unfortunately, these embryos lack a discernible Dl gradient and cannot be analyzed in such a way as to test the hypotheses that the mutants were generated for.

      Even so, the effect of the Cact-binding mutant was strong enough that we were able to statistically distinguish it from embryos expressing only wildtype Dl-GFP, even with the dilution effect. On the other hand, we have also included a caveat that our failure to statistically distinguish Toll phosphorylation mutants from wildtype may be due to the dilution effect. We now also explicitly state the concerns about the lack of a discernible Dl gradient and have included figures of full mutants in the supplement. See also our discussion of Reviewer 1’s similar comment.

      While the authors have a very clear understanding of the biology of the Dl gradient, I feel that the manuscript is written more as a 'tools' paper (i.e., to exemplify how FCS methods and analysis can be used for biological discovery). This is ok, but I think that the authors should discuss further what the biological implications of these findings are, other than the contribution to uncovering the biophysical parameters.

      Here we underscore the biological implications of our discovery that Cact is present in the nucleus on the dorsal side. The reviewer mentioned that Cact in the nucleus on the dorsal side appears to have little overall effect, because this is the location of the embryo where there is very little Dl in the first place, which raises the question of whether this discovery is impactful.

      While we previously used the final paragraph of the discussion to touch on the implications of this discovery, we acknowledge that we could have spent more time on the explanation. As such, we have expanded this final paragraph into two paragraphs. In the first of the two, we discuss in more detail the implications specifically of the Dl/Cact interactions in the dorsal-most nuclei, as understood by the results of this paper. In brief, knowing that Dl in the dorsal-most nuclei is bound by Cact results in an updated understanding of the Dl gradient, with increased dynamic range, robustness, and precision (but unknown shape).

      In the second of the two paragraphs, we discuss this result in light of our recent work on imaging Cact in live embryos, in which we have shown that Cact is present in all nuclei at roughly uniform levels. Taken together, we suggest that it is possible that Cact is bound to Dl in all nuclei (not just the dorsal-most), which would allow us to estimate the shape of the overall Dl gradient by subtracting off the fluorescence that stems from Dl/Cact complex.

      For example, I think that the implications of the rejected hypothesis (i.e., that Toll-dependent Dl phosphorylation does not seem to have an impact on Dl binding affinities to DNA) are important and should be further discussed (even if no additional experiments are performed). What is then the role of Dl phosphorylation? Perhaps it could have an impact on patterning robustness in lateral regions. The authors should report in Figure 5 also what happens to the fraction of Dl bound to DNA in lateral regions in the reduced Cact binding and reduced Toll phosphorylation mutants.

      We appreciate the reviewer’s suggestion that the rejection of the hypothesis that phosphorylation of Dl by Toll impacts Dl/DNA binding could be expanded upon further. For the role of Dl phosphorylation by Toll: we previously mentioned that this phosphorylation is known to enhance the nuclear import or retention of Dl, and that mutation of serine 317 to an alanine abolishes Toll-mediated phosphorylation of Dl, which results in embryos with no Dl gradient. We had also mentioned that phosphorylation of Dl is not known to affect its DNA binding, which is the hypothesis we sought to test by creating the dl[S317A]-GFP mutants. We did not image any mutants, or the UAS-dl[wt]-GFP control, in the lateral regions, for two reasons. First, this region is easily the smallest of the three regions, in terms of the percentage of the DV axis (see Fig. 1A). Second, because of the dilution effect, we knew the effect size would be small, and as such, we imaged only on the extreme ends of the gradient so that the most clear conclusion could be drawn about the effect that Toll phosphorylation might have on DNA binding of Dl.

      The way that position along the DV axis is reported using the nuclear-cytoplasmic-ratio (NCR) in Figures 1-3 is not incorrect, but I wonder if it is the best way of doing it. The reason is that it spreads out a relatively small region of the embryo (the ventral-most locations) and shrinks a relatively large region of the embryo (lateral and dorsal regions), see Figure 1A. Perhaps reporting the NCR in log_2 units would be more appropriate.

      We agree that there is some distortion of the relative spatial extents of the Dorsal gradient when NCR is used as an independent variable on a plot. However, we prefer the NCR on the horizontal axis because it is closer to the functional variable (Dl concentration, rather than spatial location) for the properties we studied.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I really enjoyed the first part of this paper and have only minor suggestions for improvement of the presentation. I am confused about the experimental approach for the final figure, distinguishing phosphorylation and cactus-dependent effects. I'll divide my comments between "First Part/General Suggestions", "Last Part", and finish with some minor typo observations.

      The gist of the issues with the last part of the paper could boil down to insufficient detail/explanation of the section. The discrepancy with expectation with Michaelis-Menten kinetics is presented in a total of three sentences and is not necessarily obvious to the general readership of eLife. The mutants chosen to distinguish the phosphorylation and cactus mechanisms could be described more (why these? aren't other residues phosphorylated?) and possibly why also having wild-type GFP-Dl in the measurements isn't confounding. Since there is unlimited space in this journal, it may be advisable to use this space to fill out these rationales and ideas.

      First part/General Suggestions:

      (1) For the RICS data, (Figures 1 and 2) there is a nice correlation between WT NC ratio and the selected low/med/hi Dl activity mutants. More-or-less the median values in, say, Figure 1E-G are reflected in Figure 1H. However, with the ccRICS data (Figure 3), it looks like there is less correspondence between the range of fraction bound estimates in, for instance, "ventral" in Figure 3D and '10b' in Figure 3E. Can the authors comment on this? Should the reader be able to make this kind of comparison, or does something about data collection for the wt/NCR measurements preclude direct comparison of magnitudes with the panel of mutants? (imaging setup, laser power, etc)?

      The reviewer is correct that there seems to be a discrepancy in the values of ψ between the wt embryos (ventral side) and the Toll10B embryos. It should be noted that the Toll10B embryos are not “ventral-like” in every way, in part because they have unknown activated Toll levels that might be above or below what is seen at the ventral midline in wildtype embryos, and in part because there is no DV gradient, and thus no shuttling in these embryos that would accumulate total Dorsal on the ventral midline. As such, comparisons between Toll10B embryos and the ventral side of wildtype embryos are not exactly one-to-one, and we are more confident in comparing among the mutants in an allelic series. To address this question, we have added a sentence to the end of the second paragraph of the “Dorsal/DNA binding exhibits a spatial gradient” subsection of the Results (Lines 233-235).

      (2) Materials and methods: Mounting and imaging of Drosophila embryos: the authors cite the "488 nm laser intensity ranged from 0.5% to 3.0%..." The values presented here are not useful for the general reader or an individual looking to replicate these conditions, as emission power produced from such values will vary from instrument to instrument. It is standard in these cases to report an estimated laser power (measured in watts) for each laser line, and a clear description of how such measurements were made (stationary beam, under scanning conditions, with what detector, etc). These measurements are valuable and the authors are strongly encouraged to report such measurements for their setup.

      We appreciate the reviewer’s suggestion and understand the importance of providing absolute laser power values for reproducibility. We have now included the laser power (in watts) for the laser lines on both microscopes used in this study. The revised text can be found in the Materials and Methods section, in the Lines 535-536 and 540.

      (3) The presentation of the data in Figure 4 is difficult to understand. Are the kymographs (A lower) representing the entire length of the big white arrow in A upper? Or do the dashed lines indicate the x-axis limits of the kymograph? It is difficult to tell from the figure legend, where the dashed lines are described as "areas where Dl-GFP movement is measured out of the nucleus." I believe that the authors can make these measurements and that Figure 4B reflects properties of "movement" of Dl out of the nucleus, but how they get there from these data is not clear to this reader. Perhaps a cartoon explaining the green lines and the orange lines in the kymograph or tightening the legend would help.

      We thank the reviewer for their feedback and understand the need for greater clarity in the text of the pCF section and in Figure 4. The widths of the kymographs in the lower panels correspond to the full widths of the images in the upper panels. The pCF measurements were taken at the y-coordinates at the level of the white arrows. The dashed vertical lines connecting the upper and lower panels illustrate two cases of locations along the x-axis of the image where Dl is crossing from inside a nucleus to outside. In the two illustrated cases, these crossings are accompanied by either zero Dl molecules being observed to cross the nuclear barrier (ventral image/kymograph on left) or delayed crossing of Dl molecules (dorsal image/kymograph on right). To address this concern, we have added more detail to the Fig. 4 legend and greatly expanded on a discussion of what pCF does in the text (the second and third paragraph of the section). We have also updated Fig. 4 to align with new explanations from the text: namely, describing the y-axis of the kymographs as Δt (instead of log(time)) and explicitly showing that the pair correlation is for pairs of pixels that are Δx = 6 pixels apart. Further details were also added to the relevant Methods section.
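      As a rough illustration of what a pair correlation between two pixels separated by Δx computes, the sketch below uses the commonly cited form G(Δt, Δx) = ⟨F(t, x)·F(t + Δt, x + Δx)⟩ / (⟨F(t, x)⟩⟨F(t, x + Δx)⟩) − 1 on a synthetic line-scan "carpet". It is not the analysis pipeline used in the manuscript: the function name, the synthetic data, and the 3-frame delay are assumptions chosen only to show that a delayed crossing appears as a peak at the corresponding Δt.

```python
import random


def pair_correlation(carpet, x, dx, max_lag):
    """Return G(tau) for tau = 0..max_lag between columns x and x + dx.

    carpet[t][x] is the intensity at frame t and pixel x (hypothetical layout).
    """
    n_t = len(carpet)
    a = [row[x] for row in carpet]        # intensity trace at pixel x
    b = [row[x + dx] for row in carpet]   # intensity trace at pixel x + dx
    mean_a = sum(a) / n_t
    mean_b = sum(b) / n_t
    g = []
    for tau in range(max_lag + 1):
        n = n_t - tau
        cross = sum(a[t] * b[t + tau] for t in range(n)) / n
        g.append(cross / (mean_a * mean_b) - 1.0)
    return g


# Synthetic carpet: whatever is seen at pixel 0 reappears at pixel 6
# three frames later; all other pixels carry uncorrelated noise.
random.seed(0)
n_frames, n_pixels, delay = 2000, 8, 3
source = [1.0 + random.random() for _ in range(n_frames + delay)]
carpet = []
for t in range(n_frames):
    row = [1.0 + random.random() for _ in range(n_pixels)]
    row[0] = source[t + delay]
    row[6] = source[t]
    carpet.append(row)

g = pair_correlation(carpet, x=0, dx=6, max_lag=10)
peak_lag = g.index(max(g))  # lag (in frames) at which the correlation peaks
```

      In this toy example the peak lag recovers the imposed delay; in real data, the absence of a peak, or a peak at long Δt, would correspond to the "no crossing" and "delayed crossing" cases described above.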

      (4) DV position in the wild-type imaging experiments is operationally determined through measurement of the Dorsal NC ratio. This makes sense, but the strategy is buried in the first paragraph of the results, and not discussed in the M & M. For readers unfamiliar with imaging the fly embryo or the nuances of the Dl gradient, perhaps a sentence or two explaining that embryos were oriented randomly along the DV axis, and DV positions of the imaging region were estimated by measuring the Dl NC ratio.

      We thank the reviewer for this helpful suggestion. To improve clarity, we have added a description of how DV position was determined to the Materials & Methods section (paragraph starting on Line 520). Specifically, we now state that embryos were randomly oriented along the DV axis and that we used the Dorsal NC ratio of intensity as a proxy for measuring the DV position in imaging experiments. Additionally, we have added a statement to the Results section to ensure that this strategy is more clearly introduced (Lines 143-144). We appreciate this recommendation, as it will help readers unfamiliar with fly embryo imaging better understand our approach.

      (5) It would be nice to report the corresponding NC-ratio values for Dl in each of the mutant conditions, perhaps as a supplement to Figure 1. Currently, Figure 1H relies on the (admittedly well-established) properties of the three mutants, but it feels that an additional nice quantitative link in the data can be drawn out here. Do the authors see the strict correlation between the wt and mutant diffusivity measurements at specific NC-ratios?

      We are hesitant to try to draw direct comparisons between the mutants and the behavior of the wildtype embryo at the corresponding NCR. This is because, in the context of these uniform mutants, the NCR is determined by a combination of at least three factors that we cannot measure or control for: the unknown strength of Toll signaling, the unknown capacity of Toll signaling (i.e., the potential saturation of the cytoplasmic enzymes controlled by Toll signaling), and, most importantly, the lack of a shuttling mechanism that concentrates Dl on the ventral side of the embryo. As such, the NCR does not represent a continuous variable that transforms the behavior of one mutant into another (or from mutants into wt DV coordinates), as it does along the DV axis in wildtype embryos. This is why the mutant studies are presented as boxplots. At best, we were comfortable only in using the uniform mutants as an allelic series to produce gross trends. We have added a brief statement describing the shuttling caveat to the Results section (Lines 173-177).

      (6) In the section related to Dl nuclear export, the language used to describe Dl kinetics is ambiguous. The term "movement" is used seemingly as a catch-all for nuclear import-export as distinguished from diffusion. However, diffusion is also a form of movement. Could this section be reworked to explicitly distinguish nuclear import-export and diffusive movements?

      We appreciate the reviewer’s suggestion and agree that the language used to describe Dl kinetics could be more precise. By way of explanation, the pCF analysis calculates the time scale on which Dl can exit the nucleus. pCF only gives a signal if it sees the same Dl molecule twice, at two different locations after some Δt amount of time has passed. Because of this, if a given Dl molecule in a ventral nucleus is being tracked, then that molecule has some probability that it is bound to DNA initially, which means it will take, on average, longer to exit the nucleus than a Dl molecule not initially bound to DNA. Therefore, on the ventral side, the time scale on which Dl exits the nucleus is longer than on the dorsal side (where DNA binding is not happening). This can be true even if the nuclear export rate constants are the same on the ventral side vs the dorsal side. As such, we were careful to choose language that did not imply that we were talking about a nuclear export rate constant. We have added this discussion to the end of the relevant Results section (Lines 308-315).

      We have also revised this section to explicitly distinguish between the mobility associated with exiting the nucleus and diffusive movement, while still trying to distinguish between the time scale of exiting the nucleus vs the nuclear export rate. Specifically, we now refer to ‘time scale of nuclear export’ when discussing transport across the nuclear envelope and reserve the term ‘diffusion’ for passive intracellular movement. Furthermore, we have edited a sentence in this section (Lines 291-293) to describe the distinction we are making between the time scale measured by pCF and the time scale commonly associated with nuclear export (that is, the reciprocal of the rate constant). We hope this clarification improves readability and conceptual clarity.

      Last Part:

      (1) There is an undersold argument centered on Michaelis-Menten kinetics that needs to be explicitly presented, especially since it motivates the final experiments of the paper, which are challenging. In the two sections describing how the data do not adhere to expectations based on Michaelis-Menten kinetics, the assertion that "the fraction of immobile Dl is expected to decrease with increasing nuclear total Dl concentration" is only intuitively true if the system is saturated. Is the system demonstrably saturated? Another interpretation of this would be that these results demonstrate that the system is likely not saturated. In any case, the authors need to devote some space in the introduction and/or results and/or discussion to fully motivate this point.

      We agree that the reviewer has raised an important point: if the system is very far from saturation, then the fraction of immobile Dl is not expected to decrease with increasing nuclear total Dl concentration. But neither would it increase; it would instead stay flat. To correct this mistake, we have edited the sentences in question to acknowledge the far-from-saturation scenario, saying “at best, [the fraction bound] remain[s] constant” (Line 209). As such, our original point, which is that in no case would the fraction immobile increase [unless something else is going on besides affinity-based binding to DNA], is still valid.
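      To make the distinction between bound concentration and bound fraction concrete, the following is a minimal sketch that assumes a simple saturating binding curve of the form bound = B_max · T / (K_d + T), where T is total concentration. The function name and the parameter values (`b_max`, `k_d`) are illustrative assumptions and are not taken from the manuscript.

```python
def bound_fraction(total, b_max=1.0, k_d=10.0):
    """Fraction of total protein that is bound, for a saturating binding curve.

    b_max and k_d are hypothetical parameters in the same arbitrary
    concentration units as `total`.
    """
    bound = b_max * total / (k_d + total)  # bound concentration saturates at b_max
    return bound / total                   # equals b_max / (k_d + total)


# Total concentrations spanning far-from-saturation to near-saturation.
totals = [0.01, 0.1, 1.0, 10.0, 100.0]
fractions = [bound_fraction(t) for t in totals]
```

      Under this assumed curve, the bound fraction is nearly flat while the total is far below `k_d` and falls as the total approaches and exceeds `k_d`; it never rises with increasing total concentration, which is the point made in the response above.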

      (2) Wouldn't any argument on the basis of Michaelis-Menten need to rely on the assumption that the system is at steady-state? Reeves 2012 concludes that during the times measured here, Dl does not reach a steady state. It would be good, in the context of the point above, for the authors to clarify how this impacts the expectations of saturation and the application of M/M kinetics.

      We thank the reviewer for raising this important point. We apologize for not being clear on our points about M/M kinetics and would like to stress again that we are not claiming the system has M/M kinetics. We appealed to M/M kinetics only as a simple, intuitive example of a saturating system to point out the difference between bound concentration vs bound fraction as functions of total concentration. We did this because previous feedback on our manuscript suggested that the difference between these two variables needed to be made clearer. Because this point seemed controversial with both reviewers, we removed all mention of M/M kinetics and simply refer to the system as “saturating.” For further explanation, see the first paragraph of our response to Reviewer 1’s “weaknesses” in the public review.

      (3) It is not clear to me how the inclusion of wild-type, GFP-tagged dorsal in the experimental setup for Figure 5 is not confounding. For the S317 (phospho-) mutant, GFP-tagged alleles of both phospho- and wild-type Dl are expressed. The reasoning is that not enough phospho-mutant Dl gets into the nucleus, and this makes it difficult to distinguish the dorsal from the ventral side of the embryo, so in a dl mutant background, there is expression of wt GFP-dl from a BAC, and nos>Gal4 driven expression of a GFP-tagged S317A mutant dl. The measurements show that on the ventral side of the embryo, there is no difference in the fraction of bound Dl. Couldn't this be predominantly binding of wildtype GFP-Dl? How is this interpretable? Wouldn't it be easier to perform these measurements in a Tl 10b background (or to cross in UAS>Tl[10b]) and for the only GFP-tagged dl to be S317A? The same goes for the S234 mutant (could be done in the pelle mutant background).

      We thank the reviewer for raising the point that the confounding effect of wildtype Dl makes it difficult to interpret the results from the 317A mutant. Under the circumstances of the experimental design, we can best conclude that, if the null hypothesis is incorrect, the effect size was too small to detect with our sample size. As such, we have modified our discussion of the results of this experiment to carefully explain this caveat (rather than confidently saying that Toll phosphorylation has no effect). For further explanation, see the second paragraph of our response to Reviewer 1’s “weaknesses” in the public review, as well as our response to the related question raised by Reviewer 2 in the public review.

      Minor issues/typo stuff:

      (1) This reviewer notes that the submitted materials contain neither line numbers nor page numbers.

      We appreciate the reviewer’s feedback. We have now included line numbers and page numbers in the revised manuscript for easier reference.

      (2) First paragraph of results: "We imaged small regions of the embryo..." The parenthetical statement only cites pixel size and directs the reader to the methods. Without the total number of pixels, the pixel size value does not clarify how "small" the imaged region is. Consider including the xy area, pixel dimensions, and pixel size here to assert the smallness of the imaged area.

      We have added the requested information.

      (3) Second paragraph, Introduction: "Dorsal, one of three (Drosophila) homologs to mammalian NF-kB" (Add Drosophila). Also, aren't these orthologs?

      We have made these changes.

      (4) Last sentence of last paragraph in the introduction: Kind of a throw-away sentence. Consider revising.

      We thank the reviewer for making this point; the sentence was originally constructed to state that our quantitative measurements resulted in a biologically significant discovery. However, because Reviewer 2 also mentioned the question of biological significance, we have changed this final sentence to explicitly state what the biological significance is: namely, an understanding of the Dl gradient that has superior dynamic range, spatial range, robustness, and precision.

      (5) Where is the median line in the S317A boxplot in Fig 5C?

      The median line is at ψ = 0. We have added an explanation of this to the Figure legend.

      (6) Materials & Methods: Fly transformation, typo: "Drosophila embryos were injected with 0.5 µl of each pUAST construct..." The volume of an entire Drosophila embryo is less than 0.5 µl; please revise the units to reflect the value injected. Most likely an absolute volume unit was stated when a concentration of an injection solution, delivered at significantly smaller volumes, was intended.

      We thank the reviewer for catching this typo. It was intended to indicate a concentration of 0.5 ng/μL, and we have made the appropriate changes.

      Reviewer #2 (Recommendations for the authors):

      (1) Perhaps this has been described in a prior publication (if this is the case, please simply state this somewhere in the Methods section where Dl-GFP embryos are described), but since Dl-GFP embryos have one copy of endogenous dl and one copy of Dl-GFP, how do potential differences in tagged vs. non-tagged Dl interactions with DNA or Cact affect their findings?

      The reviewer brings up a good point, and we acknowledge that any time a protein is tagged with GFP, the behavior of the protein may be affected. We have now explicitly added this caveat to our discussion in a new paragraph on Lines 420-429.

      (2) In the Discussion section, the authors argue that a major implication of their findings is that, if Cact binds Dl in the nuclei, the true (active) Dl gradient may be unknown unless the unbound Dl is separated from the Dl/Cact complex (the inactive form). While this is an interesting point, this idea is not supported by the findings of Figure 5B, where there is no effect on the fraction of Dl bound to DNA in the reduced Cactus binding mutants. The authors should report what happens in lateral regions in Figure 5 because perhaps there is an effect there (see comment on this in the Public Review).

      We thank the reviewer for the insight, as we did not directly discuss the implications of the middle column of Fig. 5B on our hypothesis. Indeed, our hypothesis is not supported by Fig. 5B; it is instead inconclusive (failure to reject H0). This is why we designed the second experiment (Fig. 5C) to test the Cactus hypothesis, because the effect size would be greater on the dorsal side.

      Furthermore, as pointed out by both reviewers, the presence of wildtype Dl-GFP in these experiments is confounding. We have discussed this elsewhere in our rebuttal, but briefly, this problem resulted in needing larger effect sizes to detect a statistically significant difference between wt and the mutant populations. This was a necessary evil that we were willing to deal with in order to ensure the Dl gradient could be established so that the dorsal vs ventral sides would be distinguishable. We have added a fuller discussion of these issues to the relevant Results section (Lines 333-336, 343-345, 354-359, 365-369) and also the Discussion section (Lines 412-418), including underscoring the fact that, from a falsification standpoint, the results in Fig. 5B do not allow us to reject either null hypothesis, possibly due to the confounding effect of wildtype Dl. We appreciate the reviewer’s point about this, and believe the changes suggested by the reviewer have improved the manuscript.

      On the other hand, we respectfully disagree with the reviewer that investigating either mutant in the lateral regions of the embryo would bear fruit. To a first approximation, the behavior there would be the average of the behaviors on the ventral and dorsal sides. For the S317A mutant, neither the ventral nor the dorsal side was conclusive with regard to our hypotheses. (Although we admit here that further investigation into why the S317A column in Fig. 5C was statistically different from wildtype, in the opposite direction from the S234P mutant, may be interesting in future work.) For the S234P mutant, the data were more conclusive on the side of the embryo where the effect size was expected to be large enough to detect a difference. In the lateral regions, the expectation would be that the effect size would be intermediate, which would make the interpretation of the results more difficult (i.e., more likely to be inconclusive). In contrast, as Fig. 5C is already conclusive, we are not confident there would be more information gained by imaging the lateral regions.

      (3) Is Figure 5A a wild-type embryo? If so, I think that the labels are misleading or unclear. Also, is it the same image as in Figure 1A? If so, I suggest replacing this with a schematic since it does not add any new data.

      We have eliminated the labels for the mutants and have added the following comment to the figure 5 legend “Same embryo as in Fig. 1A”.

      (4) Also in Figure 5, I suggest using labels to indicate the schematics instead of simply using their location. You could use 5A', 5A'', and 5A''', for example.

      We have made the suggested changes.

      (5) The use of some technical labels makes some figures difficult to read. I suggest using more simple labels for mutants in Figure 3F (replace R063C) or Figure 5B, C (replace S234P and S317A).

      We have made changes to Fig. 3F, Fig. 5B,C, and the corresponding places in the figure legends. We have labeled R063C as ↓DNA, S317A as ↓Toll, and S234P as ↓Cact.

      (6) I suggest reporting p-values consistently. For example, in Figure 4B, they use one or two asterisks to denote p-values less than 0.07 and 0.05, respectively, which is somewhat arbitrary and unconventional. Why not report the actual values as in Figure 5C, for example? (By the way, I would report in Figure 5B the actual p-values as well, since a nonsignificant value is also reported in Figure 5C. Also in Figure 5C, report values in the same notation (decimal or scientific), i.e., either put 0.005 as 5x10^-3 or 10^-3 as 0.001).

      We have made the suggested changes.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Chen, Tu, and Lu focused on how brain-wide dopamine release dynamically changes during sleep/wake state transitions. Using multi-site fiber photometry to monitor DA release, alongside simultaneous EEG and EMG recordings, the authors show distinct DA dynamics during transitions from NREM to WAKE, REM to WAKE, WAKE to NREM, and NREM to REM. Next, they analyze temporal coordination between regions using cross-correlation analysis. Finally, chemogenetic activation of VTA or DRN but not SNc dopamine neurons is shown to promote wakefulness.

      Strengths:

      The manuscript addresses an interesting question: how brainwide dopamine activity evolves across sleep/wake transitions. The combination of multi-site DA recordings with simultaneous EEG/EMG monitoring is technically sophisticated. The experimental logic is generally clear, and the dataset is rich. The result has several interesting observations.

      Weaknesses:

      The authors used the GRAB-DA2m sensor to monitor dopamine release. Although DA2m exhibits higher affinity for dopamine compared to NE (around 15-fold difference in EC50 in HEK cell assays), it is still possible that NE contributes to the recorded signals, particularly during sleep/wake transitions when locus coeruleus activity is strongly modulated. Given the widespread and state-dependent dynamics of NE, this potentially needs to be addressed.

      We thank the reviewer for raising this important methodological consideration. While we acknowledge that a minor contribution from norepinephrine (NE) to the DA2m signal cannot be categorically excluded, several convergent lines of evidence give us confidence that the signals we recorded primarily reflect dopamine release.

      First, DA2m has substantially lower affinity for NE compared to dopamine. The reported EC<sub>50</sub> for NE is ~1200 nM [1], which is ~15-fold higher than for dopamine. In contrast, extracellular NE levels in the prefrontal cortex are typically in the low nanomolar range (generally <5 nM under basal conditions) [2,3]. Because physiological NE concentrations are orders of magnitude below the sensor’s EC<sub>50</sub> threshold, NE is highly unlikely to drive significant DA2m activation in vivo.
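
      The size of this margin can be illustrated with a back-of-the-envelope calculation. The sketch below assumes a simple one-site binding model with a Hill coefficient of 1, which is an assumption made here for illustration and is not stated in the response; the function name `fractional_activation` is likewise hypothetical. The EC<sub>50</sub> and basal NE values are the ones quoted above.

```python
# Illustrative sketch only: fractional sensor activation under a one-site model.
# Assumption (not from the source): Hill coefficient of 1, no ligand competition.

def fractional_activation(conc_nM: float, ec50_nM: float, hill: float = 1.0) -> float:
    """Return the fraction of the maximal response at a given ligand concentration."""
    return conc_nM**hill / (conc_nM**hill + ec50_nM**hill)

NE_EC50_NM = 1200.0             # EC50 for NE quoted in the text [1]
DA_EC50_NM = NE_EC50_NM / 15.0  # ~15-fold lower for dopamine, per the text
NE_BASAL_NM = 5.0               # upper end of the quoted basal NE range [2,3]

ne_fraction = fractional_activation(NE_BASAL_NM, NE_EC50_NM)
print(f"NE at {NE_BASAL_NM} nM drives {ne_fraction:.2%} of the maximal response")
```

      Under these assumptions, basal NE would account for less than half a percent of the maximal sensor response. State-dependent NE surges above the basal range would raise this figure, so the calculation bounds only the basal contribution.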

      Second, our optogenetic experiments provide direct functional validation. The targeted stimulation of midbrain dopaminergic neurons elicited robust DA2m signal responses across both cortical and subcortical brain areas. This confirms that the sensor reliably captures evoked dopamine release within our specific experimental paradigm.

      Finally, the spontaneous DA2m signal dynamics we observed across sleep-wake states functionally diverge from previously reported patterns of cortical NE release [4]. For example, in Figure 1C, our DA2m recordings in the mPFC revealed high activity during wakefulness, alongside pronounced, sharp changes during NREM-to-WAKE transitions. In contrast, a prior study [4] shows that NE exhibits comparatively mild fluctuations during wakefulness and during transitions involving NREM. This temporal and kinetic divergence further supports the interpretation that our recorded signals reflect region-specific dopaminergic dynamics rather than generalized NE arousal activity.

      Taken together, these physiological, functional, and kinetic distinctions indicate that while a negligible contribution from NE cannot be entirely ruled out, it is highly unlikely to account for a substantial portion of the DA2m signals observed during sleep-wake transitions in our study.

      Similarly, the chemogenetic experiments rely on CNO to activate hM3Dq-expressing dopamine neurons. However, it is well established that CNO can be converted to clozapine in rodents, and clozapine itself is known to influence sleep/wake. Although the authors included non-hM3Dq-expressing mice as controls, the potential confounding effects of clozapine on sleep regulation remain a concern.

      We appreciate the reviewer raising this important point regarding the metabolism of CNO. We are aware of the evidence suggesting that CNO can undergo back-metabolism to clozapine in rodents, which could potentially exert independent effects on sleep-wake architecture. To mitigate this concern, we employed two experimental safeguards:

      (A) Non-hM3Dq Control Group: As noted by the reviewer, we included a cohort of mice that did not express the hM3Dq receptor but received the same dosage of CNO (1 mg/kg). In these animals, we observed no significant alterations in sleep-wake states compared to saline baseline (Figure S3), suggesting that at this dosage, any clozapine produced was below the threshold for behavioral modulation of sleep.

      (B) Dosage Selection: We utilized a relatively low dose of CNO (1 mg/kg), which is widely reported in the literature to minimize the accumulation of clozapine to levels that would interfere with EEG-defined sleep states in rodents [5]. Furthermore, studies have demonstrated that while higher doses of CNO (e.g., 5–10 mg/kg) can produce clozapine-like effects on sleep architecture, lower doses around 1 mg/kg do not yield significant alterations in cortical EEG power distribution or sleep-wake amounts in control animals [6,7].

      Midbrain dopamine neurons exhibit both tonic and phasic firing patterns. In Figure 1, most reported dopamine transitions appear relatively slow. However, some faster, phasic-like components are observable. For example, in NAc-L during REM-to-WAKE transitions, there are 2 phasic-like decreases between −20 and 0 s. The authors used laser-evoked stimulation experiments in the VTA and DRN and showed that 2 s versus 10 s stimulation produces distinct dopamine kinetics, suggesting that different firing patterns generate distinct DA dynamics. Moreover, the temporal profiles vary not only across regions but also across transitions within the same region. For example, in CeA, the NREM-to-WAKE transition shows a relatively rapid decrease, whereas REM-to-WAKE displays a much slower decline. Similarly, some regions (e.g., NAc-L NREM-to-WAKE, DRN REM-to-WAKE) show faster changes, while others (e.g., mPFC WAKE-to-NREM, VTA NREM-to-WAKE) show slower kinetics. These observations argue against a simple region-specific explanation and instead suggest that distinct firing modes may differentially contribute depending on transition type.

      We thank the reviewer for this insightful comment. We agree that midbrain dopamine neurons exhibit both tonic and phasic action-potential firing patterns. As summarized by Grace et al., dopamine neurons recorded using in vivo electrophysiology can display a slow, irregular, single-spike “tonic” firing pattern, typically around 2–10 Hz, as well as burst-like “phasic” firing patterns [8].

      However, our recordings were performed using GRAB-DA2m fiber photometry. Therefore, our measurements reflect extracellular dopamine dynamics in the recorded target regions rather than the action-potential firing patterns of midbrain dopamine neurons. GRAB-DA2m has subsecond sensor kinetics and is suitable for detecting extracellular dopamine transients occurring over hundreds of milliseconds to seconds, as well as slower dynamics occurring over seconds to tens of seconds [1], which matches the timescale of the sleep–wake transition-related dynamics observed in previous studies [9,10]. Nevertheless, GRAB-DA2m fiber photometry in our study does not directly resolve dopamine neuron spike timing or distinguish tonic from phasic firing modes. Accordingly, we interpret our signals as extracellular dopamine concentration dynamics rather than as direct measurements of tonic or phasic neuronal firing.

      Therefore, the transition-aligned dopamine signals shown in Figure 1 should be interpreted as dopamine dynamics occurring over seconds-to-tens-of-seconds around sleep–wake transitions, rather than as dopamine neuron firing patterns. In addition, these traces represent GRAB-DA2m signals averaged across sessions and mice within a ±30 s window centered on each sleep/wake transition. Thus, they do not necessarily represent individual dopamine transient patterns on single transitions. We also acknowledge the reviewer’s observation that faster phasic-like components are visible in some traces, including the decreases in the NAc-L preceding REM-to-WAKE transitions. Direct electrophysiological recordings of dopamine neuron firing during sleep–wake transitions would be useful in future studies to determine how tonic and phasic firing modes contribute to the observed dopamine dynamics.

      We also thank the reviewer for the thoughtful interpretation of the laser-evoked stimulation experiments shown in Figure 3. The results indicate that different stimulation durations can produce distinct dopamine release dynamics in downstream projection regions. Moreover, prolonged optogenetic stimulation was associated with more sustained dopamine responses, suggesting that the temporal profile of extracellular dopamine dynamics depends, at least in part, on the duration and region of dopaminergic input [1]. We also agree with the reviewer that the temporal profiles of the GRAB-DA2m signals vary not only across regions, but also across sleep/wake transitions within the same region. For example, in CeA, the NREM-to-WAKE transition shows a relatively rapid dopamine decrease, whereas the REM-to-WAKE transition displays a slower decline.

      Similarly, faster dopamine changes are observed in some region/transition combinations, such as NAc-L during NREM-to-WAKE and DRN during REM-to-WAKE, whereas slower kinetics are observed in others, such as mPFC during WAKE-to-NREM and VTA during NREM-to-WAKE. Together, these effects reflect both region-specific mechanisms and transition-dependent differences in dopaminergic activity.

      While cross-correlation analysis provides insight into the temporal coordination of DA signals across regions, several limitations should be considered. Sleep/wake transitions are inherently non-stationary events, whereas cross-correlation assumes relatively stable signal properties within the analysis window. This mismatch may bias lag estimates and obscure transient lead-lag relationships. Moreover, the temporal resolution of fiber photometry and the kinetics of genetically encoded DA sensors limit the precision with which timing relationships can be interpreted, particularly for sub-second lags.

      We thank the reviewer for raising these important considerations. The temporal relationships between regional dopamine signals were assessed using cross-covariance analysis. We agree that cross-covariance analysis has limitations when applied to sleep/wake transitions, because these transitions are inherently non-stationary events. Although cross-covariance centers the signals by subtracting their means and is therefore less sensitive to baseline offsets than raw cross-correlation, it still summarizes the lag-dependent covariance between two signals over the selected analysis window. Therefore, the inferred lag should be interpreted as a transition-level measure of temporal coordination rather than a precise estimate of instantaneous lead–lag timing.

      To minimize the influence of brief or unstable state fluctuations, we included only transitions in which both the preceding and following sleep/wake epochs lasted at least 30 s [4]. This criterion helped ensure that the analyzed events represented well-defined transitions between sustained behavioral states rather than transient or fragmented episodes. Dopamine signals may still change dynamically within the transition window, and the temporal resolution of fiber photometry and the kinetics of genetically encoded GRAB-DA2m sensors limit the precision with which fine-scale timing relationships can be interpreted. Nevertheless, dopamine signals were relatively stable within each behavioral state, as shown in Fig. 1B and reported previously [1,9,10]. Thus, we believe that cross-covariance analysis provides useful information about the temporal coordination of dopamine dynamics across regions.
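
      The cross-covariance computation described above can be sketched as follows. This is a minimal NumPy illustration, not the analysis code used in the study: the function name `cross_covariance_lag`, the normalization, and the sign convention (a positive lag means the second signal follows the first) are all assumptions made for the example.

```python
import numpy as np

def cross_covariance_lag(x, y, fs):
    """Return (lags_s, xcov, peak_lag_s) for two equal-length signals.

    Hypothetical helper. A positive peak lag means y follows x.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("signals must have the same length")
    # Mean-centering is what distinguishes cross-covariance from raw cross-correlation.
    xc = x - x.mean()
    yc = y - y.mean()
    # np.correlate(yc, xc, "full")[k] = sum_n yc[n + lag] * xc[n]
    xcov = np.correlate(yc, xc, mode="full")
    # Normalize so that identical signals give 1 at zero lag (assumes non-constant input).
    xcov = xcov / np.sqrt(np.sum(xc**2) * np.sum(yc**2))
    lags = np.arange(-(len(x) - 1), len(x))
    peak_lag = lags[np.argmax(xcov)]
    return lags / fs, xcov, peak_lag / fs
```

      Because the means are computed over the whole window, a slow drift within a non-stationary transition window still contributes to the estimate, which is why the peak lag is best read as a window-level summary.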

      In the Introduction, the authors state that they aim to address 'which dopaminergic populations causally drive these patterns.' However, the chemogenetic approach used operates on a relatively slow timescale: CNO-induced activation takes 15-30 minutes to produce effects, and the induced changes are long-lasting. In contrast, the dopamine transitions described in Figure 1 occur on a much faster timescale compared to CNO manipulation. Thus, while chemogenetic activation demonstrates that stimulating VTA or DRN dopamine neurons promotes wakefulness, it does not directly establish that these populations causally drive the rapid transition-related DA dynamics observed in the photometry recordings.

      We thank the reviewer for this thoughtful comment. We agree that chemogenetic manipulation operates on a much slower timescale than the rapid dopamine transients observed during sleep–wake transitions, and therefore does not directly recapitulate these fast dynamics. In particular, CNO-induced activation unfolds over minutes and produces sustained changes in neuronal activity, whereas the DA signals we report fluctuate on a sub-second to second timescale. Our intention with the chemogenetic experiments was not to mimic the precise temporal profile of endogenous DA signals, but rather to test whether increasing the activity of specific dopaminergic populations is sufficient to influence behavioral state.

      In this context, our results show that activation of VTA or DRN dopaminergic neurons robustly promotes wakefulness, supporting a causal role for these populations in sleep–wake regulation at the circuit level. However, we agree that these data do not by themselves establish that these neurons directly generate the rapid transition-related DA dynamics observed in the photometry recordings.

      Reviewer #2 (Public review):

      In "Brainwide dopamine dynamics across sleep-wake transitions", Chen et al. provide a thorough description of how dopamine dynamics fluctuate across sleep-wake transitions and in transitions between sleep states. To achieve this, the authors used multi-channel fiber photometry and a genetically encoded fluorescent dopamine reporter to simultaneously measure dopamine dynamics in 8 brain regions. They also used EEG measurements to precisely quantify and time transitions between sleep states and wakefulness. Finally, the authors used channelrhodopsin to examine dopamine dynamics following subregion stimulation and chemogenetics to test the causal relationship between activation of distinct dopamine neuron populations and their effects on sleep state.

      The conclusions made by the authors in this study are modest and appropriate given the largely observational nature of the principal findings. The use of optogenetics to probe regional dopamine signaling following activation of distinct nuclei is interesting, but not entirely novel and constrained in interpretability. Similarly, the chemogenetics experiment largely confirms previous studies, which the authors correctly cited in the text.

      The principal findings of this study are based on strong methodological and analytical methods. Implanting 8 optical fibers in a single mouse, along with EEG/EMG electrodes, is technically challenging, providing valuable, simultaneous measurements of dopamine fluctuations across the brain. This enables the strong correlational and time-locked analyses performed by the authors in Figure 2. What's more, the use of EEG/EMG electrodes provides time-locked descriptions of sleep states, enabling precise comparisons between the dopamine signal and sleep state transitions.

      The paper has some weaknesses that the authors could address. The analyses in Figure 1 could be strengthened to show how dopamine changes during transitions between specific sleep states. The injection sites for channelrhodopsin and chemogenetic viruses could be validated to strengthen the interpretation of those results. Also, a stronger justification for the experiments conducted in Figure 3 could be provided, as they seem unrelated to the present study.

      Overall, this study has strong descriptive power, convincingly showing how dopamine fluctuates across sleep states. Some of the other aspects of the paper, however, are somewhat limited in novelty and interpretation.

      The analyses in Figure 1 could be strengthened to show how dopamine changes during transitions between specific sleep states.

      We appreciate the reviewer’s thoughtful suggestion. We agree that the directionality and kinetics of dopamine changes during sleep/wake transitions may provide important information beyond state-level dopamine quantification.

      In this study, mice were recorded for 4–5 h during each sleep session. Across the recording period, mice frequently transitioned from NREM to WAKE, WAKE to NREM, NREM to REM, and REM to WAKE. Transitions from WAKE to REM were rarely observed and therefore were not included in the transition analysis. Accordingly, we focused our analysis on the four major transition types: NREM-to-WAKE, WAKE-to-NREM, NREM-to-REM, and REM-to-WAKE [4,9,11].

      For each transition type, dopamine dynamics were analyzed separately by aligning the z-scored GRAB-DA2m signal to the transition onset and averaging across all epochs of the same transition type. To minimize the influence of brief or unstable state fluctuations, we excluded transitions in which either the preceding or following sleep/wake epoch lasted less than 30 s. The resulting transition-triggered dopamine traces were then averaged across sessions and mice for each transition type independently.
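
      The alignment and averaging steps above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' code: it assumes a per-sample state label array, an already z-scored signal, and a ±30 s window, and the function name `transition_triggered_average` and its parameters are hypothetical.

```python
import numpy as np

def transition_triggered_average(signal, states, fs, pre_state, post_state,
                                 min_dur_s=30.0, window_s=30.0):
    """Average `signal` around pre_state -> post_state transitions.

    Returns (mean_trace, n_transitions). Hypothetical helper for illustration.
    """
    signal = np.asarray(signal, dtype=float)
    states = np.asarray(states)
    # First sample of every new bout (positions where the label changes).
    change_idx = np.flatnonzero(states[1:] != states[:-1]) + 1
    bout_starts = np.concatenate(([0], change_idx))
    bout_ends = np.concatenate((change_idx, [len(states)]))  # exclusive ends
    min_len = int(round(min_dur_s * fs))
    half = int(round(window_s * fs))
    segments = []
    for i in range(1, len(bout_starts)):
        onset = bout_starts[i]
        if states[onset - 1] != pre_state or states[onset] != post_state:
            continue  # a different transition type
        prev_len = bout_ends[i - 1] - bout_starts[i - 1]
        next_len = bout_ends[i] - bout_starts[i]
        if prev_len < min_len or next_len < min_len:
            continue  # drop brief or fragmented episodes
        if onset - half < 0 or onset + half > len(signal):
            continue  # window would run past the recording
        segments.append(signal[onset - half:onset + half])
    if not segments:
        return np.empty(0), 0
    return np.mean(segments, axis=0), len(segments)
```

      Calling the function once per transition type (for example with `pre_state="NREM"` and `post_state="WAKE"`) keeps the directionality of each transition separate; per-session outputs would then be averaged across sessions and mice.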

      Thus, the transition analysis preserves the directionality of state changes rather than pooling all sleep/wake transitions together. Because dopamine signals differ across behavioral states, transitions between neighboring states produce distinct temporal profiles when aligned to the transition point [4,9-11]. For example, REM-to-WAKE transitions may show a rapid increase in dopamine in the mPFC, whereas WAKE-to-NREM or NREM-to-REM transitions may show slower and more modest decreases. These transition-specific kinetics may reflect distinct underlying mechanisms, including changes in dopamine neuron firing or local terminal modulation.

      The injection sites for channelrhodopsin and chemogenetic viruses could be validated to strengthen the interpretation of those results.

      We agree with the reviewer that precise histological validation is essential for the correct interpretation of our optogenetic and chemogenetic findings.

      Regarding the chemogenetic experiments, as noted, we provide examples of virus expression in the VTA, DRN, and SNc in Figure 4. By demonstrating the consistency and restriction of our targeting across the entire cohort (VTA, SNc, and DRN), we confirmed that our observed sleep effects were regionally specific. Our data only included mice with accurate targeting and no substantial virus "leakage" into adjacent nuclei.

      We thank the reviewer for this insightful observation regarding the regional dopamine (DA) responses following SNc stimulation. While the SNc is traditionally associated with the dorsal striatum (DLS), several studies have demonstrated that SNc dopaminergic neurons also project to the nucleus accumbens, particularly the lateral shell [12,13]. Furthermore, recent work characterizing the functional heterogeneity of midbrain DA neurons suggests that SNc subpopulations can drive significant DA release in ventral striatal subregions [14]. We appreciate the reviewer’s caution regarding potential off-target effects. While our post-recording histological validation criteria were stringent, we acknowledge that in any midbrain manipulation, the close anatomical proximity of the VTA and SNc makes it technically challenging to guarantee zero involvement of neighboring VTA neurons. However, by using mice with the most restricted virus expression and fiber targeting, we have minimized this potential confound as much as is technically feasible with current viral and optogenetic methods.

      Also, a stronger justification for the experiments conducted in Figure 3 could be provided, as they seem unrelated to the present study.

      We thank the reviewer for this comment. The experiments in Figure 3 were designed to systematically map the sources of dopaminergic inputs to key brain regions examined in this study [15], including the mPFC, DLS, NAc, and CeA. Establishing these input–output relationships is important for interpreting the photometry signals observed during sleep–wake transitions.

      Specifically, we found that optogenetic activation of VTA dopaminergic neurons elicits DA responses in all four regions, whereas activation of DRN dopaminergic neurons induces responses in the mPFC, DLS, and CeA, and activation of SNc dopaminergic neurons induces responses in the mPFC, NAc, and DLS. These results reveal partially overlapping but distinct projection patterns across dopaminergic populations.

      Taken together, these data provide a circuit-level framework suggesting that VTA, SNc, and DRN dopaminergic neurons may contribute differentially and with distinct weights to the DA signals observed in these regions during sleep–wake transitions.

      Overall, this study has strong descriptive power, convincingly showing how dopamine fluctuates across sleep states. Some of the other aspects of the paper, however, are somewhat limited in novelty and interpretation.

      We appreciate the reviewer’s assessment that our study convincingly demonstrates how dopamine fluctuates across sleep states. We agree that the primary contribution of this work is descriptive and foundational. At the same time, we respectfully emphasize that rigorous, comprehensive descriptive studies are essential, particularly when addressing phenomena that have not been systematically characterized. Prior to this work, dopamine dynamics during natural sleep–wake transitions had not been measured simultaneously across multiple brain regions.

      Our multi-site photometry approach advances the field in several important ways. Technically, the combination of simultaneous eight-region fiber photometry with EEG/EMG recordings represents a substantial methodological advance, enabling brainwide, network-level analysis of dopamine dynamics during natural state transitions. This approach reveals emergent features, such as temporal coordination and inter-regional lead–lag relationships, that cannot be captured using single-site recordings. Moreover, integrating brain-wide measurements with region-specific manipulations allows circuit-level insights that would not be accessible from either approach alone.

      Conceptually, our findings reveal dopamine dynamics that are region-specific, transition-type-specific, and bidirectional, in contrast to the prevailing view of dopamine as a uniform arousal signal: dopamine decreases in certain limbic regions, such as the central amygdala and nucleus accumbens lateral shell, during arousal transitions, while increasing in cortical and other striatal regions. These results refine simplified models of dopaminergic regulation of arousal. In addition, our data reveal differential circuit contributions, with the VTA and DRN, but not the SNc, promoting wakefulness, highlighting functional specialization within the dopamine system.

      We acknowledge that some aspects of our study, including the optogenetic mapping and chemogenetic experiments, build on established methodologies and in part confirm prior findings. However, these experiments also provide several new insights. First, whereas individual dopamine sources have often been studied in isolation, our systematic comparison across VTA, SNc, and DRN using consistent methods reveals distinct brainwide functional contributions that were not previously established. Second, our optogenetic mapping does not simply recapitulate known projection patterns, but instead uncovers quantitative differences in dopamine release kinetics and magnitude across source–target pairs, which inform the heterogeneity of the transition dynamics. Finally, our findings provide a crucial anatomical and temporal framework for future research on the specific mechanisms driving these dynamics and their precise functional consequences.

      References:

      (1) Sun, F. et al. Next-generation GRAB sensors for monitoring dopaminergic activity in vivo. Nat Methods 17, 1156-1166, doi:10.1038/s41592-020-00981-9 (2020).

      (2) Ihalainen, J. A., Riekkinen, P., Jr. & Feenstra, M. G. Comparison of dopamine and noradrenaline release in mouse prefrontal cortex, striatum and hippocampus using microdialysis. Neurosci Lett 277, 71-74, doi:10.1016/s0304-3940(99)00840-x (1999).

      (3) Berridge, C. W. & Abercrombie, E. D. Relationship between locus coeruleus discharge rates and rates of norepinephrine release within neocortex as assessed by in vivo microdialysis. Neuroscience 93, 1263-1270, doi:10.1016/s0306-4522(99)00276-6 (1999).

      (4) Silverman, D. et al. Activation of locus coeruleus noradrenergic neurons rapidly drives homeostatic sleep pressure. Sci Adv 11, eadq0651, doi:10.1126/sciadv.adq0651 (2025).

      (5) Anaclet, C. et al. The GABAergic parafacial zone is a medullary slow wave sleep-promoting center (vol 17, pg 1217, 2014). Nat Neurosci 17, 1841-1841, doi:10.1038/nn1214-1841d (2014).

      (6) Ma, C. Y. et al. Microglia regulate sleep through calcium-dependent modulation of norepinephrine transmission. Nat Neurosci 27, 249-258, doi:10.1038/s41593-023-01548-5 (2024).

      (7) Traut, J. et al. Effects of clozapine-N-oxide and compound 21 on sleep in laboratory mice. Elife 12, doi:10.7554/eLife.84740 (2023).

      (8) Grace, A. A., Floresco, S. B., Goto, Y. & Lodge, D. J. Regulation of firing of dopaminergic neurons and control of goal-directed behaviors. Trends Neurosci 30, 220-227, doi:10.1016/j.tins.2007.03.003 (2007).

      (9) Darmohray, D. et al. Brainstem circuit for sickness-induced sleep. Sci Adv 11, eady0245, doi:10.1126/sciadv.ady0245 (2025).

      (10) Hasegawa, E. et al. Rapid eye movement sleep is initiated by basolateral amygdala dopamine signaling in mice. Science 375, 994, doi:10.1126/science.abl6618 (2022).

      (11) Ding, X. et al. Neuroendocrine circuit for sleep-dependent growth hormone release. Cell 188, 4968-4979 e4912, doi:10.1016/j.cell.2025.05.039 (2025).

      (12) Poulin, J. F. et al. Mapping projections of molecularly defined dopamine neuron subtypes using intersectional genetic approaches. Nat Neurosci 21, 1260-1271, doi:10.1038/s41593-018-0203-4 (2018).

      (13) Lerner, T. N. et al. Intact-Brain Analyses Reveal Distinct Information Carried by SNc Dopamine Subcircuits. Cell 162, 635-647, doi:10.1016/j.cell.2015.07.014 (2015).

      (14) Azcorra, M. et al. Unique functional responses differentially map onto genetic subtypes of dopamine neurons. Nat Neurosci 26, 1762-1774, doi:10.1038/s41593-023-01401-9 (2023).

      (15) Eban-Rothschild, A., Rothschild, G., Giardino, W. J., Jones, J. R. & de Lecea, L. VTA dopaminergic neurons regulate ethologically relevant sleep-wake behaviors. Nat Neurosci 19, 1356-1366, doi:10.1038/nn.4377 (2016).

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Pecak et al have deciphered the conformational dynamics of a heterodimeric model ABC transporter, TmrAB, a functional homolog of the human antigen transporter TAP, using single-molecule Förster resonance energy transfer and fluorophores attached to residues at either the nucleotide-binding domains or the periplasmic gate. The analysis not only differentiated ATP-free and bound states but also enabled the real-time monitoring of protein conformational changes, precisely dissecting transport cycles and resolving transient intermediates. This study is absolutely significant in providing and establishing a general pipeline delineating the conformational dynamics in heterodimeric ABC transporters.

      We thank the reviewer for this accurate and thoughtful summary of our work and its broader significance. We agree that the combination of single-molecule FRET with orthogonal validation approaches enables mechanistic resolution of conformational states and transitions that are not accessible by ensemble measurements. In particular, this framework allows direct discrimination of ATP-free and ATP-bound conformations, real-time tracking of transport cycle progression, and identification of transient intermediates in the heterodimeric ABC transporter TmrAB. We further agree that these capabilities support a generalizable strategy for dissecting conformational dynamics in related ABC transporters.

      Strengths:

      The scientific study is very well documented for experimental design, results, and conclusions supported by the experimental data. The authors have determined the conformational dynamics of TmrAB across different ATP concentrations, including physiological ones, and resolved an outward open state and other conformational states consistent with previous cryoEM and DEER studies.

      Weaknesses:

      The scientific study needs a bit of in-depth analysis with respect to consistency in K<sub>d</sub> and its implications on the mechanism.

      The apparent K<sub>d,ATP</sub> values were determined using two complementary approaches that report on different aspects of the system. Ensemble FRET measurements yielded values of 51 ± 38 µM (TmrAB<sup>NBD</sup>), 68 ± 25 µM (TmrAB<sup>PG</sup>), and 95 ± 26 µM (TmrAB<sup>PG_EQ</sup>), which are in good agreement with previously reported biochemical estimates (~100 µM for TmrAB<sup>EQ</sup>) (Stefan et al, 2020). The slightly elevated value observed for the E→Q variant may reflect modest perturbation of nucleotide handling in this slow-turnover background. Notably, the close agreement between labeled and unlabeled variants indicates that fluorophore attachment does not measurably affect ATP binding.

      In contrast, smFRET-derived K<sub>d,ATP</sub> values (13 ± 1 µM for TmrAB<sup>NBD</sup> and 2 ± 1 µM for TmrAB<sup>PG</sup>) are systematically lower. This difference likely arises from the difficulty of deconvoluting overlapping FRET populations at sub-K<sub>d,ATP</sub> concentrations, particularly for TmrAB<sup>PG</sup>, where state assignment is less well separated. Despite this quantitative offset, both approaches consistently indicate ATP saturation well below physiological concentrations and therefore support the same mechanistic conclusion that ATP binding drives conformational switching in TmrAB.

      Reviewer #2 (Public review):

      In their manuscript entitled 'ATP-driven conformational dynamics reveal hidden intermediates in a heterodimeric ABC transporter', Pečak et al. use elegant single-molecule FRET experiments in detergent to investigate the heterodimeric ABC transporter TmrAB. By combining simulations of the transporter's accessible volume with elegant trapping strategies, the authors identify an unresolved outward-facing open state and conclude that it is usually obscured by a rapidly interconverting ATP-bound ensemble. Overall, the study demonstrates that smFRET can resolve the short-lived intermediate states of TmrAB and potentially other ABC transporters that are obscured in ensemble measurements.

      It is a very interesting study that highlights the power of combining high-resolution structural information with spectroscopic approaches. I have three major points and a few minor criticisms.

      We thank the reviewer for the thoughtful and constructive evaluation of our manuscript and for highlighting the strength of combining structural and single-molecule approaches. We have addressed all major and minor points in detail below and revised the manuscript where appropriate to clarify limitations, justify analysis choices, and improve transparency.

      Major points:

      (1) The main weakness is that the authors base their conclusions on a very limited set of FRET pairs. While TmrAB has been extensively studied in terms of its structure, the authors should at least acknowledge this limitation more clearly.

      We agree that our conclusions are based on a limited number of FRET reporter pairs, and we now explicitly state this limitation in the revised manuscript. The chosen labeling positions were selected to probe two functionally critical regions—the nucleotide-binding domains and the periplasmic gate—based on prior structural and spectroscopic evidence. While this represents sparse sampling of the full conformational space, it is consistent with typical smFRET studies of membrane transporters, where experimental constraints generally limit the number of simultaneously accessible labeling positions (Asher et al, 2021; Asher et al, 2022; Levring et al, 2023; Wang et al, 2020).

      Importantly, both independent reporter variants yield consistent ATP-dependent population shifts, supporting the robustness of the observed trends. We further clarify that additional labeling sites could, in principle, resolve finer structural sub-states; however, given the already limited population separation in the current variants, such extensions would likely provide diminishing returns in state resolvability under the present experimental conditions. This trade-off is now explicitly discussed.

      (2) Most smFRET distributions were fitted with one, two, or three Gaussians. However, in several cases, additional populations with noticeable amplitudes appear to be present (e.g., Figure 3c at 0.1 mM and 3 mM ATP; Figure 4a, apo; Figure 4c, 0.3 mM R9L). Could the authors clarify why these populations were not included in the analysis?

      We thank the reviewer for this careful observation. Low-amplitude subpopulations are occasionally detected in individual histograms; however, they were not included in the quantitative model because they do not meet criteria for reproducibility, amplitude robustness, or structural assignability. Specifically, these features vary between replicates, contribute minimally to the total population, and cannot be mapped to structurally or biochemically defined states based on available cryo-EM (Hofmann et al, 2019), DEER/PELDOR (Barth et al, 2018; Barth et al, 2020), or accessible-volume simulations.

      Similar minor subpopulations have been reported in smFRET studies and often attributed to photophysical or labeling heterogeneity effects (Asher et al, 2022; Husada et al, 2018). To avoid over-parameterization, we therefore restricted analysis to reproducible, structurally supported states. This rationale is now clarified in the revised manuscript.

      (3) Figure 3c (3 mM ATP): Is it truly possible to distinguish the two states in this distribution?

      We agree that state separation in the TmrAB<sup>PG</sup> variant is limited (ΔE = 0.11), and we now explicitly acknowledge this constraint in the manuscript. To improve robustness under these conditions, we used a constrained fitting strategy in which the apo-state distribution was fixed from the nucleotide-free measurement, reducing parameter degeneracy during fitting of ATP-bound datasets.

      While single-molecule trajectory-based approaches such as Hidden Markov Modeling would be ideal for resolving dynamic interconversion, this was not feasible due to the low fraction of dynamic traces at the available temporal resolution. We therefore rely on population-level analysis, which remains consistent across replicates and reporter variants.

      Notably, independent measurements from two reporter positions (TmrAB<sup>NBD</sup> and TmrAB<sup>PG</sup>) yield similar ATP-bound population fractions at saturating ATP concentrations (~77% vs. ~80%), supporting the robustness of the inferred state distribution despite partial overlap.

      References

      Asher WB, Geggier P, Holsey MD, Gilmore GT, Pati AK, Meszaros J, Terry DS, Mathiasen S, Kaliszewski MJ, McCauley MD, Govindaraju A, Zhou Z, Harikumar KG, Jaqaman K, Miller LJ, Smith AW, Blanchard SC, Javitch JA (2021) Single-molecule FRET imaging of GPCR dimers in living cells. Nat Methods 18: 397–405. doi:10.1038/s41592-021-01081-y

      Asher WB, Terry DS, Gregorio GGA, Kahsai AW, Borgia A, Xie B, Modak A, Zhu Y, Jang W, Govindaraju A, Huang LY, Inoue A, Lambert NA, Gurevich VV, Shi L, Lefkowitz RJ, Blanchard SC, Javitch JA (2022) GPCR-mediated beta-arrestin activation deconvoluted with single-molecule precision. Cell 185: 1661–1675 e1616. doi:10.1016/j.cell.2022.03.042

      Barth K, Hank S, Spindler PE, Prisner TF, Tampé R, Joseph B (2018) Conformational coupling and trans-inhibition in the human antigen transporter ortholog TmrAB resolved with dipolar EPR spectroscopy. J Am Chem Soc 140: 4527–4533. doi:10.1021/jacs.7b12409

      Barth K, Rudolph M, Diederichs T, Prisner TF, Tampé R, Joseph B (2020) Thermodynamic basis for conformational coupling in an ATP-binding cassette exporter. J Phys Chem Lett 11: 7946–7953. doi:10.1021/acs.jpclett.0c01876

      Hofmann S, Januliene D, Mehdipour AR, Thomas C, Stefan E, Brüchert S, Kuhn BT, Geertsma ER, Hummer G, Tampé R, Moeller A (2019) Conformation space of a heterodimeric ABC exporter under turnover conditions. Nature 571: 580–583. doi:10.1038/s41586-019-1391-0

      Husada F, Bountra K, Tassis K, de Boer M, Romano M, Rebuffat S, Beis K, Cordes T (2018) Conformational dynamics of the ABC transporter McjD seen by single-molecule FRET. EMBO J 37: e100056. doi:10.15252/embj.2018100056

      Levring J, Terry DS, Kilic Z, Fitzgerald G, Blanchard SC, Chen J (2023) CFTR function, pathology and pharmacology at single-molecule resolution. Nature 616: 606–614. doi:10.1038/s41586-023-05854-7

      Nocker C, Pečak M, Nocker T, Fahim A, Sušac L, Tampé R (2026) Single-molecule dynamics reveal ATP binding alone powers substrate translocation by an ABC transporter. Nat Commun 17 doi:10.1038/s41467-026-70021-1

      Nöll A, Thomas C, Herbring V, Zollmann T, Barth K, Mehdipour AR, Tomasiak TM, Bruchert S, Joseph B, Abele R, Olieric V, Wang M, Diederichs K, Hummer G, Stroud RM, Pos KM, Tampé R (2017) Crystal structure and mechanistic basis of a functional homolog of the antigen transporter TAP. Proc Natl Acad Sci U S A 114: E438–E447. doi:10.1073/pnas.1620009114

      Stefan E, Hofmann S, Tampé R (2020) A single power stroke by ATP binding drives substrate translocation in a heterodimeric ABC transporter. eLife 9: e55943. doi:10.7554/eLife.55943

      Wang L, Johnson ZL, Wasserman MR, Levring J, Chen J, Liu S (2020) Characterization of the kinetic cycle of an ABC transporter by single-molecule and cryo-EM analyses. eLife 9: e56451. doi:10.7554/eLife.56451

    1. Author response:

      We appreciate the extremely helpful feedback from the reviewers and editors for our manuscript. We are happy that the reviewers have appreciated what we are doing here, performing the initial work that should set the stage with Drosophila larva as a model for hyperactive stimulant response. Every comment is certainly addressable within a reasonably short time period and we look forward to improving our paper in an upcoming revision.

      We have some confusion about the “fundamental issue” of using nicotine, as we see the excitation as the fundamental effect we are studying, but we can continue to discuss and clarify this.

      We plan to make significant edits to our introduction and background sections to better frame the goals of the work, and will clarify and expand on our methods, and more carefully make any claims about neural mechanisms.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      We thank the reviewer for their constructive and insightful comments and agree with the importance of the points raised. We recognize that aspects of our original presentation may have been unclear or overly strong in their interpretation. We have therefore revised the manuscript to clarify our intended scope, moderate our claims, and strengthen the analysis. In the second paragraph of the Discussion, we have explicitly acknowledged the concerns raised by the reviewer and outlined how they have been addressed in the revised manuscript. Our detailed responses are provided below.

      (1) Tone of model-data correspondence

      Numerous statements describe the RNN as "closely mimicking," "recapitulating," or being "nearly identical" to claustral neural dynamics, sometimes extending to claims about causal relationships between neural activity and behavior. Given that neural data were not used to train the model, and that only a small subset of trained networks showed the reported dynamics, these statements should be substantially softened throughout the manuscript. The RNN should be framed as providing one possible computational realization consistent with existing data, not as a close instantiation of the biological circuit.

      We agree with the reviewer’s comment. The expressions noted by the reviewer (e.g., closely mimicked, nearly identical, recapitulate) will be replaced with alternative wording that conveys a more moderate meaning (Line 16-17, 65-66, 83, 96, 120, 212).

      (2) Non-uniqueness of RNN solutions

      The fact that only a small fraction of trained networks exhibited "claustrum-like" clusters deserves deeper discussion. This observation raises the possibility that the identified solution is fragile or highly specific rather than canonical. The authors should explicitly discuss the non-uniqueness of internal solutions in behavior-trained RNNs, including the range of alternative network dynamics that can reproduce the same behavior. In particular, it should be clarified why the specific network exhibiting "claustrum-like" clusters is informative about claustral computation, rather than representing one arbitrary solution among many.

      As the reviewer pointed out, behaviorally trained RNNs can admit multiple internal solutions that produce the same behavioral output, and we acknowledge the non-uniqueness of such internal solutions. However, we do not interpret the fact that only a subset of trained RNNs exhibit dynamics similar to those observed in the claustrum as evidence that this solution is fragile. Notably, the claustrum-like dynamics emerged spontaneously during training and were not explicitly enforced. Furthermore, our finding suggests that the emergence of this particular dynamical regime depends on relatively specific structural constraints.

      Our criterion for selecting RNNs that could inform the computational principles of the claustrum was their ability to reproduce the behavioral and physiological observations obtained in the delayed escape experiments. RNNs that were excluded may reflect information-processing strategies used by other brain regions or may rely on artificial logical structures. The computational demand of the task, which integrates temporally separated signals, naturally drives convergence toward networks with recurrent excitatory connectivity capable of maintaining persistent activity. Indeed, all networks that exhibited a claustrum-like cluster shared a common structural feature: strong recurrent excitatory connectivity within Cluster 1. This property is consistent with biological characteristics observed in the slice experiments shown in Fig 2.

      Importantly, the computational principles derived from this RNN were found to be quantitatively consistent with in vivo single-neuron activity patterns. Specifically, analysis using an eigenvalue-based metric (λ<sub>3</sub>/Σλ) revealed the same directional effect in both the RNN and the claustrum neuron data. In addition, a leave-one-neuron-out analysis showed that this pattern was broadly distributed across in vivo claustral neurons rather than being driven by a small subset (see Fig. 4).

      Taken together, these convergent lines of evidence suggest that the computational model is not simply one arbitrary solution among many possible alternatives, but rather implements a computational principle that may underlie claustral functions.

      (3) GPFA trajectory comparisons

      The qualitative similarity between RNN trajectories and GPFA-derived trajectories from sparse in vivo data is interesting but insufficient to support claims of robustness or population-level structure. Statements suggesting that these patterns are unlikely to arise from noise or random fluctuations are not justified, given the single-trial, pseudo-population nature of the data. Either additional quantitative controls should be added, or the interpretation should be substantially tempered.

      As the reviewer pointed out, the GPFA trajectory comparison presented in the original manuscript remained largely qualitative, and we agree that this alone was insufficient to establish robustness or provide convincing evidence for population-level structure. In the revised manuscript, we have therefore added the requested quantitative analysis (see Fig. 4).

      Before describing the analysis, we would like to clarify several methodological limitations associated with pseudopopulation and single-trial data. GPFA estimates latent trajectories based on assumptions about covariance structure among neurons and temporal smoothness. In pseudopopulation datasets, the true simultaneously recorded covariance structure cannot be fully reconstructed, which is an inherent limitation. Because our dataset is based on single trials, the analysis does not directly exploit trial-to-trial variability. Nevertheless, the estimation of the latent space still depends on the covariance structure among real claustral neurons, suggesting that the inferred trajectories remain tied to biologically meaningful population dynamics.

      Accordingly, the quantitative metric we introduce is not entirely independent of the GPFA estimation step. Rather, it is intended to evaluate the geometric structure of the single-trial latent trajectories estimated by GPFA. We acknowledged this limitation in the revised manuscript.

      Specifically, for the biological data, we reanalyzed the GPFA-derived latent trajectories in PCA space and computed an eigenvalue-based metric (λ<sub>3</sub>/Σλ). For each of the 20 time bins, we applied a sliding window of 10 bins and calculated the covariance matrix within that window. The eigenvalues of PC1, PC2, and PC3 were then obtained, and the third eigenvalue (λ<sub>3</sub>) was normalized by the total variance (Σλ = λ<sub>1</sub> + λ<sub>2</sub> + λ<sub>3</sub>). This metric quantifies the degree to which the trajectory locally deviates from a planar structure that can be explained by two dominant axes. An increase in λ<sub>3</sub>/Σλ indicates that the population-state trajectory forms a higher-dimensional geometric structure beyond a simple two-dimensional combination.
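
      The sliding-window metric described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function name `lambda3_fraction`, the use of NumPy, and the choice to evaluate only windows that fit entirely inside the trajectory are assumptions, since the handling of edge bins is not specified in the text.

```python
import numpy as np


def lambda3_fraction(traj, window=10):
    """Return lambda_3 / (lambda_1 + lambda_2 + lambda_3) for each sliding window.

    traj   : array of shape (n_bins, 3), a trajectory in the first three PCs.
    window : number of consecutive time bins per window (10 in the text).

    Assumption: only full windows are evaluated, so a 20-bin trajectory
    yields 20 - 10 + 1 = 11 values.
    """
    traj = np.asarray(traj, dtype=float)
    if traj.ndim != 2 or traj.shape[1] != 3:
        raise ValueError("traj must have shape (n_bins, 3)")
    n_bins = traj.shape[0]
    if window < 2 or window > n_bins:
        raise ValueError("window must be between 2 and n_bins")

    values = []
    for start in range(n_bins - window + 1):
        segment = traj[start:start + window]
        # 3x3 covariance across the three PC coordinates within the window.
        cov = np.cov(segment, rowvar=False)
        # eigvalsh returns ascending eigenvalues of a symmetric matrix;
        # clip tiny negative values caused by floating-point error.
        eig = np.clip(np.linalg.eigvalsh(cov), 0.0, None)
        total = eig.sum()
        # eig[0] is the smallest eigenvalue, i.e. lambda_3.
        values.append(eig[0] / total if total > 0 else np.nan)
    return np.array(values)


if __name__ == "__main__":
    t = np.linspace(0.0, 2.0 * np.pi, 20)
    planar = np.column_stack([np.cos(t), np.sin(t), np.zeros_like(t)])
    helix = np.column_stack([np.cos(t), np.sin(t), t])
    print("planar:", lambda3_fraction(planar).round(4))
    print("helix: ", lambda3_fraction(helix).round(4))
```

      In this sketch a trajectory confined to a plane gives values near zero, whereas a trajectory that leaves the plane gives positive values bounded above by 1/3, which matches the interpretation of the metric as a local departure from planarity.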

      For the RNN data, in contrast, the activity of all units can be observed simultaneously and sufficient trial repetitions are available. Therefore, GPFA was not applied; instead, PCA was performed directly on the population activity for each trial. We then computed an average trajectory across trials and applied the same λ<sub>3</sub>/Σλ metric. Thus, although the initial dimensionality reduction steps differ between the two systems, the definition and calculation of the final quantitative metric are identical. The focus of the comparison is therefore not the dimensionality reduction technique itself, but the geometric dimensional structure of the population trajectories evolving over time.

      Importantly, within the biological dataset, the GPFA estimation procedure, preprocessing steps, pseudopopulation construction, subsampling strategy, temporal alignment criteria, and smoothing parameters were applied identically across conditions. Likewise, the same analysis pipeline was used for all conditions in the RNN. If structural biases had been introduced during covariance estimation or dimensionality reduction, they would be expected to affect all conditions within each system similarly. Nevertheless, the λ<sub>3</sub>/Σλ value was consistently and significantly higher in the CS condition than in the Neutral condition, and this directional pattern was observed in both the RNN and the claustral neuron data. This suggests that the effect reflects condition-specific differences in population dynamical structure rather than artifacts arising from a particular dimensionality reduction method.

      To further test whether the observed effect might be driven by a small subset of neurons or specific neuron combinations, we performed a leave-one-neuron-out analysis on the claustrum dataset. Recomputing λ<sub>3</sub>/Σλ while removing one neuron at a time showed that, in the CS group, most neurons contributed relatively evenly to this metric, whereas the Neutral group did not show such a distributed contribution pattern. This indicates that the observed three-dimensional structure is not driven by a few outlier neurons or incidental covariance patterns, but rather reflects an organized population-level phenomenon.

      If the result were primarily due to structural artifacts introduced by the pseudopopulation construction or dimensionality reduction procedures, it would be unlikely for consistent selective differences to repeatedly emerge between conditions under identical analysis pipelines. The consistently higher λ<sub>3</sub>/Σλ values observed in the CS condition therefore provide indirect support that this pattern reflects condition-specific population dynamics rather than estimation bias.

      Taken together, these results suggest that the observed three-dimensional structure reflects condition-specific population dynamics rather than analysis artifacts. The fact that the same quantitative metric yields consistent effects in both the RNN and claustral data further strengthens the correspondence between the two systems.

      (4) Scope of functional claims

      The discussion connecting the findings to broad theories of claustral function, global workspace, or consciousness extends well beyond the data presented. These speculative links should be clearly labeled as such and significantly reduced in strength and prominence.

      We agree with the reviewer. We now state that references to these theories are speculative, and we have substantially reduced both their emphasis and prominence in the manuscript (Line 444-446, 451).

      (5) Comment on Conceptual Interpretation of the Behavioral Paradigm:

      The manuscript repeatedly describes the delayed escape task as an "inference-based behavioral paradigm" and states that animals "infer that a value-neutral alternative space is likely to be safer" when the CS is presented in a novel environment. While I appreciate that the US-CS association was established in a different context and that the CS is then presented in a new environment, I am not convinced that the current behavioral evidence uniquely supports an inference interpretation.

      First, it is not clear that this task is widely recognized in the literature as a canonical inference task, in the sense of, for example, sensory preconditioning, transitive inference, or model-based inference paradigms. Rather, the observed effect (that CS animals escape faster to a neutral compartment than neutral-CS controls) can be parsimoniously interpreted in terms of generalized threat value, heightened fear/anxiety, or a bias toward avoidance/escape under elevated threat, without requiring an explicit inferential step about the specific safety of the alternative compartment. The fact that no prior training is needed is compatible with flexible generalization, but does not by itself demonstrate inference in a more formal computational sense.

      Second, the inference claim becomes central to the manuscript's conceptual framing (e.g., the idea that rsCla supports "inference-based escape"), yet the behavioral analyses presented here and in the cited prior work do not clearly rule out simpler accounts. Clarifying this distinction would help avoid overstating both the inferential nature of the behavior and the specific role of rsCla and the RNN's "claustrum-like" cluster in supporting inference per se, as opposed to more general integration of threat-related signals with an opportunity for escape.

      We agree with the reviewer’s concern. We now refer to the delayed escape behavioral task as “a behavioral paradigm that requires integration of temporally separated task-relevant signals” (Line 7-8). We also removed references to the term inference throughout the manuscript (Line 46, 51, 67, 397).

      Reviewer #2 (Public review):

      We sincerely thank the reviewer for their constructive and insightful comments. Through the revision process, the manuscript has been substantially improved, with increased reproducibility, more appropriate acknowledgment of prior work, and a clearer and more logical presentation of the study.

      (1) This paper is based on behavioral results and neural recordings from their prior paper (Han et al.), but data, e.g., in Figure 1, are not clearly identified as new or as coming from that source. Figure 1A, for example, appears to be taken directly from Han et al. No methods are given in this manuscript for the behavioral testing or the in vivo electrophysiology.

      We agree with the reviewer that this distinction should be made clearer. In the original manuscript, we indicated in the Figure 1 legend that panels A, D, E, F, and L (left) were reproduced from Han et al. (2024). To further clarify this point, we explicitly noted this distinction again in the main text (Line 74, 85). In addition, we described the behavioral experiments and in vivo electrophysiological recordings performed in Han et al. (2024) in the Methods section and included the appropriate citation (Line 463-530).

      (2) Many other details are unclear. Examples include model training, the weight matrices and how these changed with training (p. 13), equations 2 and 3 (p. 13), the sources for the constants in the equations (p. 14), the methods (anesthesia, stereotaxic coordinates, injection specifics and details for "sparse expression") for the ChrimsonR injections.

      We agree with the reviewer’s comment and have revised the manuscript to provide a more detailed description of the model training procedure, weight initialization, and parameter selection.

      We expanded the explanation of the model training procedure and weight initialization. Specifically, the recurrent (W<sub>rec</sub>) and output (W<sub>out</sub>) weight matrices were initialized using a Glorot normal distribution with a standard deviation of to ensure stable signal propagation during early training. In addition, we now explicitly describe the training algorithm and optimization procedure. The network was trained using the Adam optimizer implemented in TensorFlow (v2.1.0) with a batch size of 256 for 1.2 million training iterations, minimizing the per-trial loss function defined in the manuscript. We also explicitly stated how Dale’s principle was maintained throughout training: rows in W<sub>out</sub> corresponding to inhibitory units were zeroed out, and recurrent weights were continuously constrained so that excitatory and inhibitory neurons preserved their respective positive and negative synaptic projections. To illustrate how the weight structure evolved during training, we explicitly reference Figure 2A, which visualizes the final mean inter-cluster synaptic weights and highlights the strong recurrent connectivity that emerged within Cluster 1. Regarding Equations 2 and 3 and their constants, we clarified that the target escape times used to anchor the network were based on experimentally measured behavioral latencies (48.7 s for the CS-present condition and 111.3 s for the CS-absent condition). Furthermore, the regularization coefficients (λ = 0.01 and λ<sub>FR</sub> = 0.95) were selected through a grid search procedure to maintain biologically plausible firing rates while preventing overfitting.
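
      The initialization and sign constraints described above can be illustrated with a short sketch. It is not the authors' TensorFlow implementation. The standard deviation used for initialization is not given in the text above, so the sketch assumes the textbook Glorot value sqrt(2 / (fan_in + fan_out)). The helper names `glorot_normal` and `apply_dale`, the 80/20 excitatory/inhibitory split, and the convention that rows index the presynaptic unit are likewise hypothetical.

```python
import numpy as np


def glorot_normal(shape, rng):
    """Draw weights from N(0, std^2) with the textbook Glorot std.

    Assumption: std = sqrt(2 / (fan_in + fan_out)); the value used in the
    manuscript is not stated in the response text.
    """
    fan_in, fan_out = shape
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(loc=0.0, scale=std, size=shape)


def apply_dale(w_rec, w_out, is_inhibitory):
    """Impose Dale's principle on recurrent and output weights.

    Hypothetical convention: row j holds the outgoing weights of unit j.
    Excitatory rows are made non-negative, inhibitory rows non-positive,
    and inhibitory rows of w_out are zeroed so that only excitatory units
    drive the output.
    """
    sign = np.where(is_inhibitory, -1.0, 1.0)
    w_rec_constrained = np.abs(w_rec) * sign[:, None]
    w_out_constrained = w_out.copy()
    w_out_constrained[is_inhibitory, :] = 0.0
    return w_rec_constrained, w_out_constrained


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_units, n_out = 10, 1
    # Hypothetical split: the last 20% of units are inhibitory.
    is_inhibitory = np.arange(n_units) >= int(0.8 * n_units)
    w_rec = glorot_normal((n_units, n_units), rng)
    w_out = glorot_normal((n_units, n_out), rng)
    w_rec_c, w_out_c = apply_dale(w_rec, w_out, is_inhibitory)
    print("excitatory rows >= 0:", bool((w_rec_c[~is_inhibitory] >= 0).all()))
    print("inhibitory rows <= 0:", bool((w_rec_c[is_inhibitory] <= 0).all()))
```

      In a training loop, a projection of this kind would typically be reapplied after every optimizer step so that the sign pattern is preserved throughout training.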

      We detailed the surgical procedures that were previously omitted. This includes the specific anesthesia protocol (sodium pentobarbital, 50 mg/kg, i.p.), stereotaxic mounting, and the exact coordinates for the rsCla (AP +2.95, ML ±1.95, DV -3.85 mm). To define "sparse expression," we specified that the AAV was diluted 1:4 in sterile saline. Finally, we included the precise injection parameters: delivery at 20 nL/min via a pressure injection system, with the pipette left in place for 10 minutes post-infusion to ensure adequate diffusion. We have added these details to the Methods section (Line 635, 636-639, 641-643).

      (3) The explorations of model behavior are a catalog of everything tried rather than an organized demonstration of what the model can and cannot do. The figures could be reduced in number to emphasize the key comparisons of the different clusters and the model's behavior under different conditions, intended to "test" the model.

      We agree with the reviewer’s comment and have reorganized the figures to focus on the key results. Specifically, we separated the original figures so that they correspond to (1) presentation of an RNN model consistent with the results of actual claustral recordings, (2) identification of dimensionality-reduced population activity patterns in the model, (3) comparison of these patterns with population activity patterns derived from recorded claustral neurons, (4) proposal of a nonlinear integration mechanism, and (5) the suggestion that such integration may be implemented through dynamic coding. Using this figure organization, we first identify RNN models trained on behavioral metrics whose dynamics are consistent with experimental claustral recordings. We then compare the dimensionality-reduced population activity patterns of these models with those derived from recorded claustral neurons to evaluate their biological plausibility. After selecting the models that satisfy this criterion, we perform further analyses that would be difficult to achieve using real neural recordings alone. These analyses ultimately allow us to propose dynamic coding exhibiting nonlinear integration as a plausible computational mechanism.

      (4) On page 6, the E-E connectivity is argued from Shelton et al. (2025) and against Kim et al. (2016), but ignores Orman (2015), which, to this reviewer's knowledge, was the first to demonstrate such connectivity, including the long-duration events and impact of planes of section.

      We agree with the reviewer’s suggestion and will include a reference to Orman (2015). We have clarified that neuronal activity can persist for extended periods and that such persistent activity has been observed in claustral slices prepared at a specific slicing angle (Line 144).

      (5) Whereas the authors are entitled to their own opinion of prior work (references 3-8), it is inappropriate to misrepresent prior work as only demonstrating a "limited function" of claustrum. Additional papers by Mathur's group and Citri's group are ignored.

      We agree with the reviewer’s comment and have revised the relevant sentences in the Introduction section. We also included and acknowledged the contributions of previous studies by the Mathur group and the Citri group by adding additional references to their works (Line 36, 429).

      In summary, the authors have made a computational model that recapitulates the firing of a subset of potentially claustral neurons during a particular behavioral task (delayed escape is certainly not the only behavior that involves claustrum - see e.g., attention, salience, sleep). If the conclusion is that excitatory claustral cells must be connected to other excitatory claustral cells, such a conclusion is not new, and the electrophysiological E-E metrics are not well quantified (e.g., connectivity frequency, strength of connection). If the model is intended to predict how the claustrum might accomplish any other task, there is insufficient detail to evaluate the model beyond the evidence that the model creates a subset of cells that can sustain firing during the delay period in the delayed escape task.

      All relevant work must be appropriately cited throughout the manuscript.

      Regarding the E–E metric, we obtained the following result. When including recordings in which the whole-cell recording could not be completed, optogenetically evoked responses were observed in 38 out of 43 patched cells. This suggests that approximately 90% of the cells receive intra-claustral excitatory input. However, the current dataset does not allow us to quantify the connection probability or the strength of these connections.
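
      As a worked check of the proportion quoted above, 38 of 43 cells corresponds to about 0.884. The sketch below also computes a Wilson score interval to convey the uncertainty of a proportion estimated from 43 cells. The interval is an illustrative addition rather than an analysis reported by the authors, and the function name `wilson_interval` and the 95% level (z = 1.96) are assumptions.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion.

    successes : number of responsive cells
    n         : number of cells tested
    z         : normal quantile (1.96 for an approximate 95% interval)
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("require n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2.0 * n)) / denom
    half_width = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return center - half_width, center + half_width


if __name__ == "__main__":
    responsive, patched = 38, 43
    low, high = wilson_interval(responsive, patched)
    print(f"proportion = {responsive / patched:.3f}")
    print(f"approximate 95% Wilson interval = ({low:.3f}, {high:.3f})")
```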

      As the reviewer pointed out, the RNN developed in this study is specifically designed for the delayed escape task, and we do not intend to claim direct generalization to other proposed functions of the claustrum, such as attention, salience, or sleep. The goal of this study is to computationally characterize the temporal integration mechanism of the claustrum observed in this specific task. We have included this in the Discussion section. In the second paragraph of the Discussion, we have explicitly acknowledged the concerns raised by the reviewer and outlined how they have been addressed in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study presents a novel toolkit for visualizing and manipulating neurotransmitter-specific vesicles in C. elegans neurons, addressing the challenge of tracking neurotransmitter dynamics at the level of individual synapses. The authors engineered endogenously tagged vesicular transporters for glutamate, GABA, acetylcholine, and monoamines, enabling cell-specific labeling while maintaining physiological function. Additionally, they developed conditional knockout strains to disrupt neurotransmitter synthesis in single neurons. The study reveals that over 10% of neurons in C. elegans exhibit co-transmission, with a detailed case study on the ADF sensory neuron, where serotonin and acetylcholine are trafficked in distinct vesicle pools. The approach provides a powerful platform for studying neurotransmitter identity, synaptic architecture, and co-transmission.

      Strengths:

      (1) This toolkit offers a generalizable framework that can be applied to other model organisms, advancing the ability to investigate synaptic plasticity and neural circuit logic with molecular precision.

      (2) Through the use of this toolkit, the authors uncover molecular heterogeneity at individual synapses, revealing co-transmission in over 10% of neurons, and offer new insights into neurotransmitter trafficking and synaptic plasticity, advancing our understanding of synaptic organization.

      Weaknesses:

      (1) While the article introduces valuable tools for visualizing neurotransmitter vesicles in vivo, the core techniques are based on previously established methods. The study does not present significant technological breakthroughs, limiting the novelty of the methodological advancements.

      The reviewer is correct that this study does not introduce fundamentally new molecular or imaging techniques. Rather, the goal of this work is to establish a generalizable and experimentally validated framework for investigating neurotransmission in vivo at single-cell resolution. To achieve this, we deliberately integrate robust and well-established approaches, including CRISPR-based genome engineering, endogenous tagging, intersectional labeling strategies, and behavioral genetics, into a unified toolkit that enables questions that were previously difficult to address in intact animals.

      The novelty of the work therefore lies not in the invention of individual technologies, but in their systematic integration, functional validation, and deployment to reveal new biological insights, such as the prevalence and spatial organization of co-transmission in vivo.

      (2) The article does not fully explore the potential implications or the underlying mechanisms governing this process, while the discovery of co-transmission in over 10% of neurons is an intriguing finding. A deeper investigation into the functional uniqueness and interactions of neurotransmitters released from individual co-transmitting neurons - perhaps through case study examples - would strengthen the study's impact.

      We agree with the reviewer that this study does not exhaustively explore the functional implications or mechanisms of co-transmission. The primary goal of this work is to introduce and share a validated set of strains that enable monitoring and cell-specific disruption of the major neurotransmitter systems in C. elegans, using molecular components that are broadly conserved across species. By establishing this toolkit, we aim to enable the mechanistic, single-cell analyses of co-transmitting neurons that extend beyond the scope of the present study but represent important next steps for the field.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors developed fluorescent reporters to visualize the subcellular localization of vesicular transporters for glutamate, GABA, acetylcholine, and monoamines in vivo. They also developed cell-specific knockout methods for these vesicular transporters. To my knowledge, this is the first comprehensive toolkit to label and ablate vesicular transporters in C. elegans. They carefully and strategically designed the reporters and clearly explained the rationale behind their construct designs. Meanwhile, they used previously established functional assays to confirm that the reporters are functional. They also tested and confirmed the effect of cell-specific and pan-neuronal knockout of several of these transporters.

      Strengths:

      The tools developed are versatile: they generated both green and red fluorescent reporters for easy combination with other reporters; they established the method for cell-type-specific KO to analyze the function of the neurotransmitter in different cell types. The reagents allow visualization of specific synapses among other processes and cell bodies. In addition, they also developed a binary expression method to detect co-transmission: "We reasoned that if two neurotransmitters were co-expressed in the same neuron, driving Flippase under the promoter of one transmitter would activate the conditional reporter - resulting in fluorescence - only in cells also expressing a second neurotransmitter identity". Overall, this is a versatile and valuable toolkit with well-designed and carefully validated reagents. This toolkit will likely be widely used by the C. elegans community.

      Weaknesses:

      The authors evaluated the positions of fluorescent puncta by visually comparing their positions with the positions of synapses indicated by EM reconstruction. It would provide stronger supportive evidence if the authors also examined co-localization of these reporters with well-established synaptic reporters previously published by their lab, such as reporters that label presynaptic sites of AIY interneurons.

      We have now included images of the synaptic vesicle marker RAB-3 in neurons such as ASE (new Figure S2) and RIB (new Figure S4D). We mention in the text that the patterns observed with VGLUT/EAT-4 (Figure 2E) and VGAT/UNC-47 (Figure 3D) resemble those observed in the RAB-3 images (Figures S2 and S4D, now discussed in lines 180-182 and line 244, respectively), supporting labeling of presynaptic vesicles.

      Additionally, we now show that in the ADF neuron, a mutation in the conserved presynaptic kinesin KIF1A/UNC-104 results in the accumulation of VAChT/UNC-17 and VMAT/CAT-1 in the cell soma and the elimination of the signal from the ADF axon (new Figure 7D-D’). These results are also consistent with the idea that the labeled transporters localize to synaptic vesicles, which fail to be transported into the axon in the absence of functional KIF1A/UNC-104 protein (lines 408-411).

      This toolkit will likely be widely used by the C. elegans community. To facilitate the adoption of the approach and method by worm labs, the authors should include their plan for the dissemination of all of the reagents included in the kit, along with all of the associated information, including construct sequences and the protocols for their use.

      We thank the reviewer for this suggestion. In response, we have now: (1) deposited all strains developed in this study at the Caenorhabditis Genetics Center; (2) created a public website with sequences and genotyping information for each allele developed (https://www.intralab.app/research-papers/cuentas-condori_etal-2026); and (3) named the toolkit SynaptoTagMe and included the name in the title and in the text. We also added the address of the public website to the main text (lines 140-142) and the Methods section (lines 540-542).

      Reviewer #3 (Public review):

      Summary:

      Cuentas-Condori et al. generate cell-specific tools for visualizing the endogenous expression of, as well as knocking out, four different classes of neurotransmitter vesicular transporters (glutamatergic, cholinergic, GABAergic, and monoaminergic) in C. elegans. They then use these tools in an intersectional strategy to provide evidence for the coexpression of these transporters in individual neurons, suggesting co-transmission of the associated neurotransmitters.

      Strengths:

      A major strength of the work is the generation of several endogenous tools that will be of use to the community. Additionally, this adds to accumulating evidence of co-transmission of different classes of neurotransmitters in the nervous system.

      Weaknesses:

      A weakness of the study is a lack of comparison to previously published single-cell sequencing data. These tools are alternatively described in the manuscript as superior to the sequencing data and as validation of the sequencing data, but neither claim can be assessed without knowing how they compare and contrast to that data. It is thus not clear to what extent the conclusions of this paper are an advance over what could be determined from the sequencing data on its own. Finally, some technical considerations should be discussed as potential caveats to the robustness of their intersectional strategy for concluding that certain genes are indeed co-expressed. Overall, claims about cotransmission should be tempered by the caveats presented in the discussion, suggesting that co-expression of these transporters is not in and of itself sufficient for neurotransmitter release.

      To clarify, we do not claim that our tools are superior to single-cell sequencing data. Rather, we view the characterization of neurotransmitter identity as an iterative process of discovery and validation across complementary approaches. Moreover, while this study provides an additional lens through which to examine neurotransmitter identity, its primary advance is not in redefining transmitter identity per se, but in establishing a toolkit that enables direct, in vivo monitoring and manipulation of neurotransmitter use at single-cell resolution.

      We do agree on the importance of explicitly comparing our findings with prior studies. In the revised manuscript we have therefore strengthened this integration by:

      (1) Revising Figure S9 and its legend to indicate the source of information for each neuron;

      (2) Adding a new Table 3 summarizing neurons consistently reported to have co-transmission potential;

      (3) Adding a new Table 4 listing neurons previously suggested to be co-transmitter neurons but not consistently supported across datasets;

      (4) Revising the Results to clarify these comparisons (lines 372-374 and 381-383); and

      (5) Incorporating this discussion into the main text (lines 482–488).

      In the Discussion we also now acknowledge technical caveats of the intersectional strategy, emphasizing that co-expression of vesicular transporters indicates co-transmission potential but is not, on its own, sufficient evidence of functional co-release (lines 482–488).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The design of different recombination sites for the transporters is a key strength of this paper. While the authors have provided justification and validation for the chosen sites, it would be valuable to know whether alternative insertion sites were tested as controls. A comparative analysis of multiple sites would provide important insights, especially for the design of similar sites in other proteins or in mammalian systems.

      Our paper lists all the sites tested for labeling each synaptic vesicle transporter. To summarize this information, we have added Table 5 in the Methods section (line 591).

      (2) Given the endogenous nature of the transporter design, it would be interesting to know if the authors have observed dynamic vesicle trafficking to explain the partial overlap shown in Figure 7. A dynamic approach could better capture the potential synergism and heterogeneity of co-transmission. I recommend that the authors try time-lapse imaging to explore this dynamic process further.

      We agree that dynamic imaging approaches, including time-lapse analysis of vesicle trafficking, represent an exciting avenue to further investigate the spatial and temporal organization of co-transmission. Such experiments are part of ongoing work in our laboratory and will be the focus of future studies aimed at dissecting the dynamic regulation of transmitter-specific vesicle populations in vivo.

      (3) The paper identifies co-transmission across a significant proportion of neurons, but the functional implications and interactions of neurotransmitters released from individual co-transmitting neurons are not fully explored. A case study focusing on the uniqueness and interactions of neurotransmitter release in these neurons would provide further clarity on the biological relevance of co-transmission.

      We agree with the reviewer on the importance of dissecting the functional implications of co-transmission and understanding how different neurotransmitters interact within individual co-transmitting neurons in vivo. The primary goal of this study is to establish and share tools that enable such investigations, and we anticipate that future work using these reagents will examine the functional roles of co-transmission on a neuron-by-neuron basis.

      (4) Minor Comments:

      (a) Figure S1D: The label "eat-4" in the eat-4::GFP image appears in italics.

      We have corrected this.

      (b) Figure 2C: The figure legend is missing the statistical significance notation (*** p).

      We have corrected this.

      (c) Figure 2D: The scale bar should be labeled as 10 μm.

      We have added the label.

      (d) Figure S4B: The image quality could be improved for better clarity.

      We have replaced the image.

      (e) Figure S8: The figure legend formatting needs attention, and the scale bar is missing in Figure S8C.

      We have added panel labels and the scale bar.

      Reviewer #3 (Recommendations for the authors):

      (1) A comparison of the results generated in this paper to the Cengen data (or other previously published data) would greatly strengthen the paper. Figure S7 seems to be a compilation of several different data sets, but this is very unclear if so, and there is no indication of which neurons are from which data, and whether there is any conflicting evidence (or what cutoffs were used to determine co-expression from Cengen). If there are indeed conflicting results, the ramifications should be discussed. Finally, given the caveat introduced in the discussion regarding the I2 neuron not expressing GABA synthesis or reuptake machinery, a more thorough analysis of which neurons identified here do or don't express other relevant genes may be warranted.

      In the revised version, we have added Tables 3 and 4 to explicitly compare our findings with CeNGEN and prior studies. Table 3 lists neurons consistently reported across independent datasets to have co-transmission potential, while Table 4 highlights neurons that have been suggested, but not consistently supported, across studies. We now also provide explicit references for each neuron in these tables and have clarified data sources and annotations in the legend to Figure S7 (now Figure S9). These additions are intended to make points of agreement and discrepancy across datasets transparent and to better contextualize our findings within existing resources.

      (2) The intersectional strategy used to identify co-expression of different transporters has some caveats that should be discussed. Specifically, removing the entire open reading frame of the eat-4 gene (as opposed to employing a T2A strategy) could potentially also remove some negative regulatory elements (for example, located within introns), leading to the inappropriate expression of the fluorescent reporter. This should at least be mentioned as a potential caveat.

      We have added this caveat into the discussion section (lines 511-513).

      (3) The colocalization experiments performed in Figure 7 seem to rely on the use of a transgenic allele (syb7882) that was not previously validated for functionality. This is only a problem because: a) another allele with a constitutive mRuby in the same position (ot907) did not seem to be fully functional in the thrashing assays (Figure S4F), and thus it is at least conceivable that the differences in localization are due to the non-functional transporters being relegated to compartments destined for degradation. Validating this strain (after pan-neuronal Flippase expression) in the thrashing assay would dispel this concern.

      We have performed thrashing assays with allele syb7882 (UNC-17::mRuby3 FLP-on) (new Figure S6), in which we find that labeling UNC-17 with C. elegans-optimized mRuby3 (driven by pan-cellular Flippase) results in animals whose thrashing behavior is indistinguishable from that of wild-type animals. This result is consistent with the idea that the distinct subsynaptic localizations observed between VMAT/CAT-1 and VAChT/UNC-17 in ADF neurons arise from endogenous cellular subsynaptic organization programs.

      We additionally note that allele ot907 labels UNC-17 with mKate2, not mRuby3, and that animals carrying this allele differ from wild-type animals in a thrashing assay (Figure S5F). The syb7882 allele that we generated labels UNC-17 with mRuby3, and animals carrying it do not differ from wild type in a thrashing assay. We are unsure what accounts for the distinct phenotypes of ot907 and syb7882, but note that, in addition to using different fluorescent proteins, each allele also employs a distinct linker sequence between UNC-17 and the fluorescent protein (new Figure S6). We now explain this difference in the legend of Figure S5 (lines 1184-1189).

      Minor comments:

      (1) Is there a difference between the strains imaged in Figures 3D and S3D? If so, this is not clear. If not, why are they shown twice, and why do they look so different from each other?

      We have replaced panel S3D with an endogenous RAB-3::mScarlet marker in RIB neurons to show that the localization of this synaptic vesicle marker parallels the punctate pattern of UNC-47::gfp11x3 reconstituted specifically in RIB neurons. See new panel S4D and line 244.

      To explain the difference between the original panels: GFP1-10 is expressed from an extrachromosomal array, which yields variable expression and can account for the difference.

      (2) Strains are alternatively denoted by their effect in the main figures, and by their allele names in the supplementary figures. This can be confusing when trying to compare data between the two figures (e.g., Figures 4C and S4F). Perhaps adding the allele names as parentheticals in the main figure might help.

      We have modified the paper to include the name of the alleles used in the panels of the main figures. Additionally, we now mention the specific alleles used for the functional assays in the figure legends.

      (3) To better understand the ramifications and efficiency of the cat-1 FLP-mediated removal (Figure 5E), it would be interesting to compare it directly to the ADF-specific removal of tph-1 referenced in the text.

      We agree that a direct comparison between the FLP-mediated removal of cat-1 and ADF-specific removal of tph-1 would be informative for assessing the efficiency and functional consequences of these manipulations. These experiments represent an interesting direction for future work, and we plan to pursue such comparisons in subsequent studies.

      (4) ADF seems to express very low levels of cho-1 (reuptake transporter), based on the images in Figure S8. Does it express higher levels of cha-1 (synthesis)?

      We have not directly compared the relative expression levels of cho-1 and cha-1 in ADF neurons in this study. Such quantitative comparisons of synthesis and reuptake machinery represent an interesting direction for future work but fall beyond the scope of the present manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Ma et al. show that melanoma cells induce an EMT-like state in nearby keratinocytes and that when this state is induced experimentally by Twist-overexpression the resulting alteration in keratinocytes is inhibitory for melanoma invasion. These conclusions are based on experiments in vivo with zebrafish and, in vitro, with human cells. The work is carefully done and provides new insights into the interactions between melanoma cells and their environment.

      We appreciate your support for our overall conclusions.

      Strengths:

      The use of both zebrafish and human cells adds confidence that findings are relevant to human melanomas while also further demonstrating the utility of the zebrafish system for discovering important new features of melanoma biology that could ultimately have clinical impacts. The work also combines a nice suite of approaches including different models for induced melanomagenesis in zebrafish, single-cell RNA-sequencing, and more. Some of the final observations are intriguing as well, especially the possibility of EMT-induced melanocyte-keratinocyte interactions via Jam3 expression; it will be interesting to see if this is indeed a mechanism for restraining melanoma invasion. The paper is clearly written and the inferences are appropriate for the results obtained. Overall the work makes a solid contribution to our understanding of important, but too often neglected, roles of the tumor microenvironment in promoting or inhibiting tumor progression and outcome.

      Weaknesses:

      No critical weaknesses were noted.

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Ma et al. utilizes a zebrafish melanoma model, single-cell RNA sequencing (scRNA-seq), a mammalian in vitro co-culture system, and quantitative PCR (Q-PCR) gene expression analysis to investigate the role keratinocytes might play within the melanoma microenvironment. Convincing evidence is presented from scRNA-seq analysis showing that a small cluster of melanoma-associated keratinocytes upregulates the master EMT regulator, transcription factor, Twist1a. To investigate how Twist-expressing keratinocytes might influence melanoma development, the authors use an in vivo zebrafish model to induce melanoma initiation while overexpressing Twist in keratinocytes through somatic transgene expression. This approach reveals that Twist overexpression in keratinocytes suppresses invasive melanoma growth. Using a complementary in vitro human cell line co-culture model, the authors demonstrate reduced migration of melanoma cells into the keratinocyte monolayer when keratinocytes overexpress Twist. Further scRNA-seq analysis of zebrafish melanoma tissues reveals that in the presence of Twist-expressing keratinocytes, subpopulations of melanoma cells show altered gene expression, with one unique melanoma cell cluster appearing more terminally differentiated. Finally, the authors use computational methods to predict putative receptor-ligand pairs that might mediate the interaction between Twist-expressing keratinocytes and melanoma cells.

      Strengths:

      The scRNA-seq approach reveals a small proportion of keratinocytes undergoing EMT within melanoma tissue. The use of a zebrafish somatic transgenic model to study melanoma initiation and progression provides an opportunity to manipulate host cells within the melanoma microenvironment and evaluate their impact on tumour progression. Solid data demonstrate that Twist-expressing keratinocytes can constrain melanoma invasive development in vivo and reduce melanoma cell migration in vitro, establishing that Twist-overexpressing keratinocytes can suppress at least one aspect of tumour progression.

      Weaknesses:

      While the scRNA-seq analysis of melanoma tissue and RT-PCR analysis of EMT gene expression in isolated keratinocytes provide evidence that a subpopulation of host keratinocytes upregulates Twist and other EMT marker genes and potentially undergoes EMT, the in vivo evidence for keratinocyte EMT within the melanoma microenvironment is based on cell morphology in a single image without detailed characterization and quantification. No EMT marker gene expression was examined in melanoma tissue sections to determine the proportion and localization of Twist+ve keratinocytes within the melanoma microenvironment.

      We agree this needed better support. To address this, we have collaborated with the Sorger lab, which has performed spatial transcriptomics on early human melanoma samples (n=8 samples). The advantage of this method is that microregions of interest (MRs) can be dissected for RNA-seq to discern keratinocytes vs. melanocytes. We queried regions that had higher or lower numbers of atypical melanocytes in these biopsies with our TAK or TWIST signature. While the normal sample had no enrichment, we found that a subset of the human samples had evidence of these signatures in the keratinocytes, particularly the ones which had a higher proportion of atypical melanocytes. These data support our model that early melanomas enact an EMT-like program in a subset of nearby keratinocytes.

      The scRNA-seq UMAP suggests the proportion of EMT keratinocytes within the melanoma microenvironment is very small, raising questions about their precise location and significance within the tumour microenvironment. Although both in vivo and in vitro evidence demonstrates that Twist-expressing keratinocytes can suppress melanoma progression, the conditions modelled by the authors involve over-expression of Twist in all keratinocytes, which does not naturally occur within the melanoma microenvironment and, therefore, might not be relevant to naturally occurring melanoma progression. The authors did not test whether blocking EMT through down-regulation of Twist in keratinocytes may influence melanoma development, which would establish the role of Twist-expressing keratinocytes in the melanoma microenvironment.

      We entirely agree, and ideally would do the exact experiment you suggested, which is to knock out TWIST in the keratinocytes using CRISPR and see how this affects the tumor phenotype. However, despite our best efforts, we do not yet have an efficient method for performing knockouts in the tumor microenvironment. If we used standard 1-cell embryo transgenic approaches with a krt4-Cas9, this would severely disrupt skin development in the whole animal, and the animals would not be viable. Theoretically, we could do this with TEAZ, but we have found that the expression of Cas9 in the microenvironment (i.e. under a krt4 promoter) is relatively inefficient. For example, we tried a krt4-Cas9 coupled with an sgRNA against GFP (as a test of the system) and this did not work well. Thus, a major goal for future studies is to develop a technology that would allow us to do this exact experiment. Finally, we do not have enough cells present in the sections to answer the question of whether the EMT keratinocytes are associated with certain melanoma cell states (i.e. proliferative, invasive), although we agree this would be an important question for future studies.

      To address the potential mechanism by which Twist-expressing keratinocytes suppress melanoma progression, a second scRNA-seq analysis was conducted. However, this analysis is not adequately presented to provide strong evidence for proposed mechanisms for how Twist-expressing keratinocytes suppress melanoma cell invasion. CellChat analysis was used to attempt to identify receptor-ligand pairs that might mediate keratinocyte-melanoma cell interaction, but the interactions between tumour-associated keratinocytes (TAK) and melanoma cells were not included in the analysis. Furthermore, although genetic reporters were used to label both keratinocytes and melanoma cells, no images showing the detailed distribution and positional information of these cells within melanoma tissue are presented in the report. None of the gene expression changes detected through Q-PCR or scRNA-seq were validated using immunostaining or in situ hybridization.

      As noted above, we have now added human biopsy samples from the Sorger lab to our analysis, showing that the TAK/TWIST keratinocytes occur directly adjacent to the atypical melanocytes in these samples. While these early melanomas are quite difficult to obtain (most samples are used for diagnostic purposes), this provides further support to our zebrafish models.

      Overall, the data presented in this report draw attention to a less-studied host cell type within the tumour microenvironment, the keratinocytes, which, similar to well-studied immune cells and fibroblasts, could play important roles in either promoting or constraining melanoma development.

      Counterintuitively, the authors show that Twist-expressing EMT keratinocytes can constrain melanoma progression. While the detailed mechanisms remain to be uncovered, this is an interesting observation.

      Reviewer #3 (Public review):

      Summary:

      In this study the authors use the zebrafish model and in vitro co-cultures with human cell lines, to study how keratinocytes modulate the early stages of melanoma development/migration. The authors demonstrate that keratinocytes undergo an EMT-like transformation in the presence of melanoma cells which leads to a reduction in melanoma cell migration. This EMT transformation occurs via Twist; and resulted in an improvement in OS in zebrafish melanoma models. Authors suggest that the limitation of melanoma cell migration by Twist-overexpressing keratinocytes was through altered cell-cell interactions (Jam3b) that caused a physical blockage of melanoma cell migration.

      Strengths:

      The authors describe a new cross-talk between melanoma and its major initial microenvironment, the keratinocytes, and how keratinocytes, instructed by melanoma cells, undergo an EMT transformation, which then controls melanoma migration. Overall, the paper is very well written, and the results are clearly organized and presented.

      Weaknesses:

      (1) To really show their last point it would be important to CRISPR KO Jam3b in melanoma with twist OE keratinocytes, in vivo or in vitro.

      The CellChat data suggest that Jam3b is likely important in melanoma development, as it has been shown to be important in melanocyte development (Eom, Dev Biol 2021). Studying this specifically in melanoma progression is an area of ongoing study in our lab, and we have begun to generate the Jam3b knockouts as you suggested. Since this set of experiments is quite extensive, we feel this set of data deserves a separate manuscript, which we hope to complete in the near future.

      (2) The use of patient biopsies from early-stage melanomas vs healthy tissue to assess if there is a similar alteration of morphology of adjacent keratinocytes and an increase in vimentin in human samples would strengthen the author's findings.

      As noted above, we have now added human biopsy samples from the Sorger lab to our analysis, showing that the TAK/TWIST keratinocytes occur directly adjacent to the atypical melanocytes in these samples. While these early melanomas are quite difficult to obtain (most samples are used for diagnostic purposes), this provides further support to our zebrafish models.

      (3) The cell-cell junctions and borders between cells (melanoma/ keratinocytes) should be characterized better, with cellular and sub-cellular resolution. Since melanocytes can "touch" with their dendrites ~40 keratinocytes - can authors expand and explain better their model? Can this explain that in some images we cannot observe a direct interface between the cells?

      We have now added higher-resolution images of these junctions. Our overall hypothesis, related to point (2) above, is that Jam3b mediates these junctions between melanoma cells and keratinocytes, which is why we are now pursuing this in a follow-up study.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Please say a little more about any phenotypes that might have been evident in Twist-overexpression fish in the absence of melanomas, and clarify in the text that these were mosaic animals, as a first (incorrect) reading left the impression that stable lines had been made.

      In these experiments, we co-injected the melanoma plasmids along with the krt4-TWIST plasmids, creating mosaic animals. Because of this, we did not have a way of specifically looking at the effect of TWIST in the absence of melanoma. We agree this needs better clarification and have added this to the Results.

      (2) Violin plot colors in main and Supplementary Figures tend to obscure data points. Colors for keratinocyte clusters are not discernible in Figure 4C.

      We have remade the plots in a different color scheme to try and make these stand out more easily.

      (3) Clarify that N-cadherin = cdh2 in Figure 1

      We have fixed this in the legend for Figure 1.

      (4) Clarify the relationship between keratinocytes highlighted in Figure 2B and used for Hallmark expression in Figure 2B, and those analyzed for expression of candidate genes in Figure 2E. The last shows many NKC whereas even the larger group circled in Figure 2B as keratinocytes seems to have far fewer cells, unless massively overplotted. Is the rest of that cluster in Fig. 2B keratinocytes as well?

      In the analysis in Figure 2E, we first calculated genes differentially expressed in the TAKs vs. NKCs (found in Figure 2B). We used those genes as input into GSEA, which showed enrichment for EMT programs specifically in the TAKs. We recognize that the number of TAKs is relatively small (compared to all of the other cells in the single-cell UMAP), but that is the most we were able to get from this particular scRNA-seq run, because the melanoma cells naturally make up the vast majority of the cells in the 10X run. This is why we performed downstream mechanistic analysis (in the rest of the paper) to ensure this result was not an artifact of a small number of TAKs.

      (5) Define "NES" in the Figure 2 legend.

      NES indicates “Normalized Enrichment Score”, a standard output of GSEA. This has been added to the legend.
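For readers unfamiliar with the statistic, the following is a minimal sketch of how a normalized enrichment score is conventionally obtained in GSEA: the observed enrichment score is divided by the mean magnitude of same-signed enrichment scores from a permutation null. The function name and the example values are illustrative assumptions, not the authors' analysis code or data.

```python
from statistics import mean


def normalized_enrichment_score(es, null_es):
    """Scale an observed enrichment score (ES) by the permutation null.

    Illustrative helper (hypothetical name): following the usual GSEA
    convention, only null scores with the same sign as the observed ES
    are used, and the ES is divided by the mean of their magnitudes.
    """
    same_sign = [x for x in null_es if (x >= 0) == (es >= 0)]
    if not same_sign:
        raise ValueError("no null scores share the sign of the observed ES")
    scale = mean(abs(x) for x in same_sign)
    if scale == 0:
        raise ValueError("null scores of matching sign are all zero")
    return es / scale


# Made-up numbers purely to show the arithmetic.
example_null = [0.5, 1.0, -0.25]
print(normalized_enrichment_score(1.5, example_null))   # 1.5 / 0.75
print(normalized_enrichment_score(-0.5, example_null))  # -0.5 / 0.25
```

Because the scaling uses the null distribution for each gene set, NES values can be compared across gene sets of different sizes, which a raw enrichment score does not allow.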

      (6) Indicate how many control vs. Twist+ fish were found to have invasive vs non-invasive tumors upon histological examination. Were tumors in the latter fish always contained within the epidermis proper, or did some extend deeper if given enough time?

      In the histology analysis, we used n=3 control fish and n=3 TWIST-overexpressing fish. Main Figure 3 shows n=1 of these fish from each group, and the other n=2 from each group are shown in Supplemental Figure 1. In this cohort (taken at 26 weeks), all of the TWIST tumors were contained within the epidermis, but we did not let them grow longer to see whether, given enough time, they could have invaded below this. Because survival decreased around 26 weeks, this experiment was not feasible at later time points. We have added a statement about this to the Results section.

      Reviewer #2 (Recommendations for the authors):

      Going through the data presented in the figures, here are my comments:

      (1) Figure 1: To strengthen the evidence that keratinocytes in the melanoma microenvironment undergo EMT, it would be beneficial to provide immunostaining or in situ data for EMT marker genes within melanoma tissue sections co-stained with a keratinocyte marker (such as an anti-GFP antibody).

      We agree this type of analysis is an important validation of our findings. Doing this in zebrafish tumors is difficult, as human/mouse antibodies for EMT marker genes typically do not work in fish. In addition, we felt that validating our results in human melanomas would make our findings more generalizable. Therefore, we established a collaboration with Peter Sorger’s lab, who have been performing high-resolution spatial transcriptomics on early melanoma samples from humans. While these are difficult to obtain (since most early lesions are processed for clinical diagnosis), they have a collection of n=8 samples that they subjected to GeoMX spatial analysis. In this method, the samples are first stained with antibodies to definitively mark keratinocytes (PANCK) vs. melanoma cells (SOX10), and all samples are reviewed by expert pathologists. From this, microregions (MRs) of interest are selected to then undergo RNA-seq. After control analysis to ensure both keratinocytes and melanocytes were present in the samples, they then used our TAK or TWIST signatures as a query. Both signatures were enriched in the keratinocytes adjacent to early melanomas, but not in normal skin samples or in samples with few atypical melanocytes. This provides further evidence that the altered keratinocytes we see in our fish are present and enriched in human biopsy specimens.

      (2) Figure 2: In panel B, the UMAP shows the separation of single cells, and keratinocytes are circled. However, there are two clusters of keratinocytes, and the graph does not indicate which cluster represents tumour-associated keratinocytes (TAKs) versus normal keratinocytes (NKCs). The two clusters also appear to differ in abundance, so it would be helpful to report the proportion of keratinocytes that are TAKs undergoing EMT, according to the individual dots in Figure 2E. In Figure 2E, TAKs seem to have very few cells compared to the other clusters. Given the relatively small number of EMT-TAKs detected in the single-cell RNA-seq data, I wonder how much direct influence these cells could exert on the bulk of melanoma cells in vivo. The evidence would be strengthened if an IHC analysis could show the location of Twist-expressing keratinocytes within the melanoma microenvironment and whether they are associated with certain melanoma cell markers but not others (i.e., markers indicating different differentiation states of melanoma cells). To further support the role of Twist-expressing keratinocytes in the melanoma microenvironment, it would be beneficial to perform a knockout (KO) of Twist in keratinocytes within the melanoma microenvironment.

      In Figure 2B, we agree that the color scheme made it difficult to discern TAKs vs. NKCs.

      We have changed the color scheme to make this more clear.

      The number of TAKs undergoing EMT is relatively small, and this is why we performed the overexpression studies of TWIST in order to expand the field of keratinocytes undergoing EMT. To get at the question of whether these are really important in tumor initiation and progression, we ideally would do the exact experiment you suggested, which is to knock out TWIST in the keratinocytes using CRISPR and see how this affects the tumor phenotype. However, despite our best efforts, we do not yet have an efficient method for performing knockouts in the tumor microenvironment. If we used standard 1-cell embryo transgenic approaches with a krt4-Cas9, this would severely disrupt skin development in the whole animal, and the animals would not be expected to be viable. Theoretically, we could do this with TEAZ, but we have found that the expression of Cas9 in the microenvironment (i.e. under a krt4 promoter) is relatively inefficient. For example, we tried a krt4-Cas9 coupled with an sgRNA against GFP (as a test of the system) and this did not work well. Thus, a major goal for future studies is to develop a technology that would allow us to do this exact experiment. Finally, we do not have enough cells present in the sections to answer the question of whether the EMT keratinocytes are associated with certain melanoma cell states (i.e. proliferative, invasive), although we agree this would be an important question for future studies.

      (3) Figure 4: Co-culture results show that melanoma cells migrate further on a control HaCaT cell monolayer compared to a TWIST-overexpressing HaCaT cell monolayer. While this phenotype might support the conclusion that TWIST-expressing keratinocytes reduce melanoma cell invasion, it should be interpreted with caution. The data can be interpreted as TWIST-HaCaT cells inhibiting melanoma cell migration; however, an alternative explanation cannot be ruled out. For example, wild-type HaCaT cells might provide a suitable substrate for melanoma cells to migrate, whereas TWIST-HaCaT cells lack this property. To address this, the baseline melanoma cell migration should be established in this assay by coating the plate with cells from the same melanoma cell line and allowing melanoma cells from the flipped cover slip to migrate out.

      We have performed the experiment you suggested using Hs.294T and SKMEL2 cells and provided this as a new Supplemental Figure 2. This demonstrated that the melanoma cells in this context could indeed migrate out of the coverslip at baseline. Thus, it is possible, as you indicated, that the phenotype we have observed might be due to something lacking in the TWIST keratinocytes that promotes migration. Since we cannot differentiate between these two possibilities (i.e. that TWIST KCs actively inhibit migration vs. lacking something that promotes migration), we have modified the text to indicate both of these possible mechanisms could be at play.

      (4) In the representative images shown in the figure, it appears that both HaCaT cells and melanoma cells in the upper and lower panels are at very different densities. "Contact inhibition" and "cell sorting" are well-known phenomena in tissue-cultured cells, so when cells are seeded at different densities, their ability to move away from the initial location could vary. From the Materials and Methods section, it is unclear why cell densities are drastically different in the images presented. Images in the upper panel show both melanoma cells and keratinocytes at lower densities, and in the TWIST group, melanoma cells under the cover slip appear to aggregate into clusters with TWIST-expressing keratinocytes surrounding each aggregated cluster. This suggests that cell sorting might be occurring, potentially mediated by cadherins or Eph-ephrins.

      We recognized this discrepancy as well. In the setup of the experiment, we seeded the exact same number of cells for both the Hs.294T (Figure 4E) and SKMEL2 (Figure 4G) experiments. But when we took the images after 20 hours of co-culture, it was clear that the HaCaT densities were different, as seen in the figures. We suspect this might be because these two melanoma cell lines secrete different factors (i.e. growth factors) that affect HaCaT proliferation, adhesion or cell sorting. Despite this, in terms of the ability of the melanoma cells to migrate into the HaCaTs, we saw similar results across both experiments, suggesting that it is not HaCaT density alone that explains the results. But we agree that this point about cell density needs to be stated more clearly in the manuscript, and we have amended the Discussion to indicate the above points.

      (5) Figure 5: Single-cell RNA-seq analysis comparing cells from control melanomas with cells from melanomas developed in a Twist-expressing keratinocyte background could provide valuable information on how melanoma cells alter their phenotype and how Twist-expressing keratinocytes respond to melanoma development. However, the information presented in the manuscript is not persuasive in this regard (appears to be minimal).

      (a) In Figure 5C, the differences between melanoma cells in a control background versus those in a Twist-expressing keratinocyte background include cells from more than one unique cluster, but most of the different clusters are not discussed, except for one prominent cluster indicated by an arrow.

      We pointed out that one cluster because it was the major difference between the control melanomas and the TWIST melanomas. To better clarify this point, we have made a new Supplemental Figure 3 comparing the clusters in each situation: 7 in the control melanomas vs. 8 in the TWIST melanomas (Supp. Figure 3d). To then better understand the nature of the TWIST melanomas, we performed Gene Set Enrichment Analysis (GSEA) compared to the control melanomas. Interestingly, this revealed a striking enrichment for pathways related to oxidative phosphorylation using both GO and Hallmark terms. Because we had previously shown that melanoma cells with high ox-phos are typically in the more melanocytic and less invasive state (Lumaquin-Yin, Nature Communications 2023), we therefore analyzed our TWIST melanomas by comparing this unique cluster to the well-annotated melanoma cell state signatures from Tsoi et al (Cancer Cell, 2018). This showed that most of the TAKs and TWIST-KCs were in the melanocytic/transitory cluster, which are thought to be the least invasive of all the melanoma cell states. Thus, it seems likely that high levels of TWIST in the keratinocytes induce a low invasion state in the melanoma cells. We have added this data and interpretation to the Results and Discussion sections of the manuscript.

      (b) In Figure 5D, it is unclear whether TAKs include both wild-type keratinocytes and Twist-expressing keratinocytes. 

      We oversimplified this plot for the sake of visualization, but realize that in doing so we obscured some important details. In the plot, we separate normal keratinocytes (NKCs) vs. tumor associated keratinocytes (TAKs). TAKs are, by definition, TWIST<sup>hi</sup>/EMT<sup>hi</sup> and represent upregulation of endogenous TWIST. In contrast, when we force overexpression of TWIST in the keratinocytes, then we see an entirely new cluster appear, as expected. 

      (c) In Figure 5F, TAKs are interacting with melanoma cells so it is unclear why the CellChat analysis did not include TAKs. 

      This was an oversight on our part, and the Figure has now been corrected to include this. TAKs in both the control and TWIST melanomas have numerous interaction partners, whereas the TWIST-KCs have relatively fewer and more specific interactions.

      (d) Finally, Figure 5G needs clearer labelling; it is currently unclear which gene is expressed by the sender and which by the receiver.

      This has been clarified in Figure 5F with specific indicators of “sender” vs. “receiver”.

      Reviewer #3 (Recommendations for the authors):

      (1) Figure 1E - in this figure, it is possible to observe the altered morphology of keratinocytes, but these cells are not in the vicinity of the melanoma cells. Can the authors please provide a zoom-in of the region of the interface, and quantify the distance between cells? At least in the image they show, the cells that are most deformed appear to be far away from the melanoma, but perhaps this is just this example. Please clarify. Or are there patches of keratinocytes that go through EMT and others that maintain their epithelial structure?

      We have now added zoom-in images of the interface (Figure 1E). In nearly all sections examined, some keratinocytes maintain their normal hexagonal epithelial structure, but the majority of the cells appear altered. We have attempted to quantify this effect, along with the distance between cells with this EMT-like morphology, but have not found a reliable method given the heterogeneity across samples. That is why we instead chose to quantify the EMT-like keratinocytes (what we refer to as TAKs) using single-cell RNA-seq, which showed that 32% of the population had the TAK signature, whereas 68% resembled normal keratinocytes. We feel this is more quantitative than imaging alone.

      This data has been added to the Results section.

      (2) Figure 3B - could not find the number of fish analyzed.

      This was an oversight on our part. We studied n=135 control melanomas vs. n=118 TWIST melanomas. This data has now been added to Figure 3B.

      (3) Figure 3D - missing a graph with quantification and zoom images in the tail keratinocytes/ melanoma interface.

      In this particular cohort of animals, we unfortunately did not specifically track body vs. fin melanomas, so we are not able to quantify this.

      (4) Figure 4 - it would be nice again to have a zoom-in to observe the interface of cells; maybe use phalloidin staining to better visualize how cells are touching each other.

      We have added a zoom-in image of the interface to the figure (Figure 4E). We have very much wanted to do immunohistochemistry (not just for phalloidin, but for other markers as well) on these coverslip co-cultures and have tried, but we have not been successful. This is likely because the assay requires plastic plates, which are incompatible with this staining, but we agree that getting this to work would be an important area for future development.

      (5) I believe the paper deserves a last figure - with the model.

      We agree and this has now been added as Figure 7.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      This manuscript provides several important findings that advance our current knowledge about the function of the gustatory cortex (GC). The authors used high-density electrophysiology to record neural activity during a sucrose/NaCl mixture discrimination task. They observed population-based activity capable of representing different mixtures in a linear fashion during the initial stimulus sampling period, as well as representing the behavioral decision (i.e., lick left or right) at a later time point. Analyzing this data at the single neuron level, they observed functional subpopulations capable of encoding the specific mixture (e.g., 45/55), tastant (e.g., sucrose), and behavioral choice (e.g., lick left). To test the functional consequences of these subpopulations, they built a recurrent neural network model in order to "silence" specific functional subpopulations of GC neurons. The virtual ablation of these functional subpopulations altered virtual behavioral performance in a manner predicted by the subpopulation's presumed contribution.

      Strengths:

      Building a recurrent neural network model of the gustatory cortex allows the impact of the temporal sequence of functionally identifiable populations of neurons to be tested in a manner not otherwise possible. Specifically, the authors' model links neural activity at the single neuron and population level with perceptual ability. The electrophysiology methods and analyses used to shape the network model are appropriate. Overall, the conclusions of the manuscript are well supported.

      Weaknesses:

      One potential concern is the apparent mismatch between the neural and behavioral data. Neural analyses indicate a clear separation of the activity associated with each mixture that is independent of the animal's ultimate choice. This would seemingly indicate that the animals are making errors despite correctly encoding the stimulus. Based solely on the neural data, one would expect the psychometric curve to be more "step-like" with a significantly steeper slope. One potential explanation for this observation is the concentration of the stimuli utilized in the mixture discrimination task. The authors utilize equivalent concentrations, rather than intensity-matched concentrations. In this case, a single stimulus can (theoretically) dominate the perception of a mixture, resulting in a biased behavioral response despite accurate concentration coding at the single neuron level. Given the difficulty of isointensity matching concentrations, this concern is not paramount. However, the apparent mismatch between the neural and behavioral data should be acknowledged/addressed in the text.

      We thank the Reviewer for the insightful comments and thoughtful suggestions. Our electrophysiological recordings show that GC dynamically encodes stimulus concentration of mixture elements, dominant perceptual quality, and decisions of directional lick. With regard to the encoding of mixtures, the clear separation of activity associated with each mixture (Figure 3) is present at a trial-averaged pseudo-population level, and average activities associated with more similar, intermediate mixtures are closer to each other in this space. At the single-trial level, activities evoked by similar, intermediate mixtures are much harder to separate. This increased similarity can lead to behavioral errors resulting from either incorrect encoding of the stimulus or from the inability to interpret the stimulus to guide the correct decision. The psychometric function, which shows that more distinct stimuli (100/0 vs 0/100) lead to fewer mistakes than more ambiguous, intermediate mixtures (55/45 vs 45/55), is consistent with the increased ambiguity of responses to intermediate mixtures.

      The Reviewer is correct that there could be a slight mismatch in the perceived intensity of the mixture components. This mismatch could be the reason for the slight asymmetry in our psychometric function (Figure 1B). However, it is not uncommon for mice in these 2AC tasks to also have a motor laterality bias in their responses that manifests itself for the more ambiguous stimuli. We chose not to model this bias given its subtlety and its unknown origin. Rather, we chose to model an ideal scenario in which stimuli have matched intensity and no motor bias exists. In the revised manuscript we discuss this issue.

      Reviewer #1 (Recommendations for the authors):

      (1) The apparent mismatch between neural and behavioral data. I am providing more details in this section to hopefully better illustrate my concern.

      (a) Based on the authors' psychometric curve, sucrose appears to be a more salient signal causing the behavior to be shifted (e.g., a 50/50 mixture results in a >60% predicted behavioral performance). If both sucrose and salt were intensity-matched, a 50/50 mixture should result in a behavioral performance near 50%. The increased salience of sucrose could cause the animals to have lower overall performance despite accurate neural encoding. Alternatively, certain animals could display a strong side bias, skewing the data slightly. These issues have seemingly been fixed in the model data, which displays a more balanced psychometric curve. Accordingly, the model data seemingly displays a larger shift in error trials as compared to correct trials (Figure 6A).

      The reviewer is correct in observing that the average experimental psychometric curve in Figure 1B shows a slight shift in favor of the sucrose side with a 50/50 mixture. We fit psychometric curves to each session and the mean value of P(Sucrose choice | Stimulus = 50/50) across sessions was significantly different from 0.5 (one-sample t-test, p = 0.003), with 5 probabilities below 0.5 and 18 above it.
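      As a complementary check that uses only the counts quoted above (5 sessions below 0.5 and 18 above), the following is a minimal sketch of an exact two-sided sign test. This is not the one-sample t-test reported in the response; it is a nonparametric alternative swapped in because the per-session probabilities are not given here. The function name is hypothetical, and the sketch assumes no session sat exactly at 0.5.

```python
from math import comb

def sign_test_two_sided(n_below, n_above):
    """Exact two-sided binomial sign test against a 50/50 null."""
    n = n_below + n_above
    k = min(n_below, n_above)
    # Probability of a split at least this lopsided in one direction.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # Double the tail for a two-sided test, capped at 1.
    return min(1.0, 2 * tail)

# Counts taken from the response: 5 sessions below 0.5, 18 above.
p_value = sign_test_two_sided(5, 18)
print(f"sign test p = {p_value:.4f}")
```

      Under these assumptions the sign test gives p ≈ 0.011, which points in the same direction as the reported t-test: a small but reliable shift toward the sucrose side.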

      This slight bias could be attributed to a slight mismatch in the perceived intensity of the mixture components and/or lateral motor biases. In any case, it is subtle and its origins were not a focus of this study.

      Models were not trained to match the animals’ psychometric curves, but rather to choose correctly in an ideal scenario where stimuli have matched intensities. This explains why the model simulations lack the bias observed in animal behavior data.

      We do not believe that there is a mismatch between the experimental behavioral and neural data, as trial-averaged pseudo-population trajectories are farther in neural space for more discriminable stimuli and closer in neural space for more similar stimuli, consistent with behavioral performance that is high for more discriminable stimuli and low for more similar stimuli. Moreover, as the model also shows, a clear separation of trial-averaged trajectories still results in a sigmoidal performance function for trial-to-trial behavior.

      Finally, subtle behavioral biases would not necessarily be expected to appear in our dPCA analyses since we used this technique to find a single axis that best separates all stimuli conditions regardless of choice when the pseudo-population data are projected upon it. Additional modes of activity that explain less overall variance might better reflect biases.

      (b) Although I am not an expert at these analyses, I wonder whether the elevated bump (i.e., >0) in Figure 3C of the 55/45 mixture that occurs early in the stimulus presentation further supports the hypothesis mentioned above and could indicate an early signal of salience/increased intensity?

      The reviewer is correct that the 55/45 trajectory features a brief positive wave right after stimulus delivery before going negative. While this may be related to stimuli not being explicitly balanced for intensity, it could also reflect a signal related to ambiguity or balanced mixtures. We are hesitant to interpret this positive deflection as conclusive evidence of a bias in neural activity, given its short duration and the natural variability of neural signals.

      (2) The increase in step-perception neurons after the decision period is confusing (Figure 4C). The text states (line 246) "the analysis reveals a small and time-invariant proportion of step-perception neurons". However, the proportion doubles after the decision-making process, which is seemingly a significant change. Why does this occur? This observation is noticeably missing from the network data. Could it be attributed to a mislabeling of "step-choice" neurons, given the correlation between the left/right decision and sweet/salty? Either way, it is very noticeable and should be addressed.

      We cannot be sure of the reason for the increase in step-perception neurons after decisions. One possibility is that they are acting as feedback for learning, encoding the percept to compare with choice and outcome to improve performance. The model, which presumably learns the task differently from the animals, does not seem to leverage this signal for its own learning. We have modified the text, now referring to a “small but consistently present proportion” of step-perception neurons, and included this proposed explanation in the Discussion.

      (3) Optional: I think the authors are missing an opportunity to analyze the temporal aspect of this multiplex code using their network-based modeling approach. A significant proportion of neurons fall into different categories (i.e., step-perception/linear, etc.) at different time points. However, the virtual ablation experiments remove any neuron that falls into one of these categories at any time. By limiting the cell-specific virtual ablation to specific time windows, you could (I think) provide stronger evidence for the temporal sequence of the encoding of these perceptual aspects.

      This was an excellent suggestion for an additional modeling experiment, so we performed it. A new supplemental figure (Figure S8) and additional text in the revised manuscript showcase the results. In summary:

      In terms of behavioral results, ablating the linear coding units in the beginning (that is, silencing all units that are labeled linear in any bin within the first 1.2 s after stimulus onset for the entirety of the 1.2 s) significantly reduces performance, as does ablating the step-perception or step-choice coding units at the end (1.2 s prior to choice). The remaining combinations of coding type and timing of the ablation do not affect performance.

      Regarding the dynamics of coding types (compare Figure 7A), stimulus coding activity was significantly blunted only by ablating the linear coding units in the beginning, whereas choice coding activity was diminished by ablating the choice coding units at the end or by ablating the linear coding units at either the beginning or the end.
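      The selection rule behind the time-restricted ablation can be sketched as follows: a unit is targeted if it carries the chosen label in any bin inside the window, and it is then silenced for the whole window. This is a toy illustration on stored traces under assumed data structures (`labels`, `activity`, `bin_starts` and both function names are hypothetical); in the actual RNN, silencing would be applied while the network runs so that downstream dynamics are affected.

```python
def units_to_ablate(labels, bin_starts, category, window):
    """Return indices of units carrying `category` in any bin inside `window`."""
    start, end = window
    in_window = [i for i, t in enumerate(bin_starts) if start <= t < end]
    return [u for u, row in enumerate(labels)
            if any(row[i] == category for i in in_window)]

def silence(activity, units, bin_starts, window):
    """Return a copy of `activity` with `units` zeroed for every bin in `window`."""
    start, end = window
    silenced = [row[:] for row in activity]
    for u in units:
        for i, t in enumerate(bin_starts):
            if start <= t < end:
                silenced[u][i] = 0.0
    return silenced

# Hypothetical toy data: 3 units, 4 bins of 0.6 s starting at stimulus onset.
bin_starts = [0.0, 0.6, 1.2, 1.8]
labels = [
    ["linear", "other", "other", "step-choice"],
    ["other", "other", "linear", "other"],
    ["other", "linear", "step-choice", "step-choice"],
]
activity = [[1.0, 2.0, 3.0, 4.0] for _ in labels]

early = (0.0, 1.2)  # first 1.2 s after stimulus onset
targets = units_to_ablate(labels, bin_starts, "linear", early)
ablated = silence(activity, targets, bin_starts, early)
print(targets, ablated)
```

      In this toy case the unit that is linear only after 1.2 s is left untouched, which is the distinction the time-restricted ablation is meant to draw.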

      Reviewer #2 (Public review):

      Lang et al. investigate the contribution of individual neuronal encoding of specific task features to population dynamics and behavior. Using a taste-based decision-making behavioral task with electrophysiology from the mouse gustatory cortex and computational modeling, the authors reveal that neurons encoding sensory, perceptual, and decision-related information with linear and categorical patterns are essential for driving neural population dynamics and behavioral performance. Their findings suggest that individual linear and categorical coding units have a significant role in cortical dynamics and perceptual decision-making behavior.

      Overall, the experimental and analytical work is of very high quality, and the findings are of great interest to the taste coding field, as well as to the broader systems neuroscience field.

      I have a couple of suggestions to further enhance the authors' important conclusions:

      My main comment is the distinction between constrained and unconstrained units. The authors train a small percentage of units to match the real neural data (constrained units), and then find some unconstrained units that are similar to the real neural data and some that are not. As far as I could tell, the relative fraction of constrained and unconstrained units in the trained RNN is not reported; I assume the constrained ones are a much smaller population, but this is unclear. The selection of different groups of neurons for the RNN ablation experiments appears to be based on their response profiles only. Therefore, if I understood correctly, both constrained and unconstrained units are ablated together for a given response category (e.g., linear or step-perception). It would be useful, therefore, to separately compare the effects of constrained vs. unconstrained RNN units.

      We thank the Reviewer for the constructive feedback. The Reviewer is correct that ablations were carried out with respect to response categories only and included both constrained and unconstrained units.

      The ratio of total units to constrained units was fixed at 5.88, thus constrained units were ~17% of the network and unconstrained units were ~83%. This value is specified in the Methods (RNN: Components and dynamics), but we have reported it in the Results of the revised manuscript for clarity.

      We have also edited the Methods because they wrongly stated that the ratio of unconstrained (rather than total) units to constrained units was 5.88.

      Specifically:

      (1) For the analyses in the initial version of the manuscript, the authors should specify how many units in each ablation category are constrained and unconstrained.

      In the revised manuscript, we have specified the fractions of constrained and unconstrained units within each response category. For convenience, they are reported here: linear = 194 constrained and 691 unconstrained units; step-perception = 147 constrained and 840 unconstrained units; step-choice = 129 constrained and 814 unconstrained units; “other” = 353 constrained and 1739 unconstrained units.
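      To make these counts easier to compare with the network-wide share of constrained units (1/5.88, about 17%), the following short sketch computes the constrained fraction within each response category. The counts are copied from the paragraph above; the variable and function names are illustrative only.

```python
# (constrained, unconstrained) unit counts per response category, as reported.
counts = {
    "linear": (194, 691),
    "step-perception": (147, 840),
    "step-choice": (129, 814),
    "other": (353, 1739),
}

def constrained_fraction(constrained, unconstrained):
    """Share of a category's units that are constrained."""
    return constrained / (constrained + unconstrained)

fractions = {name: constrained_fraction(c, u) for name, (c, u) in counts.items()}

# Network-wide share implied by a total-to-constrained ratio of 5.88.
network_fraction = 1 / 5.88

for name, frac in fractions.items():
    print(f"{name}: {frac:.1%} constrained (network-wide: {network_fraction:.1%})")
```

      On these counts, the constrained share ranges from about 14% (step-choice) to about 22% (linear), bracketing the network-wide value of roughly 17%.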

      (2) The authors should repeat Figure 6, but only for unconstrained units to test how much of the effects in the initial version of Figure 6 are driven by constrained vs. unconstrained RNN units.

      In the revised version we have included two additional supplemental figures (Figures S5-6) where the analyses of Figure 6 are carried out separately for constrained and unconstrained units. In short, the results for the constrained units strongly resemble those for the experimental data, while the results for the unconstrained units strongly resemble those for all model units.

      (3) The authors should repeat Figure 7, but performing ablations separately on the constrained and unconstrained units to examine how the network behaves in each case and the resulting "behavioral" effect.

      The revised version includes a supplemental figure (Figure S7) with the results of these additional ablation simulations.

      In summary:

      In terms of behavioral performance, the prior results showing that ablating linear, step-perception, or step-choice units significantly impairs performance, while ablating “other” has no significant effect, hold even if ablation is restricted to only constrained or only unconstrained units. There is a significant main effect of constrained vs unconstrained; on average, ablating the unconstrained population impairs performance more, most likely due to their larger population size.

      In terms of dynamics, to impair stimulus coding by ablating step-choice units, you must ablate them all; to impair stimulus coding by ablating linear or step-perception units, however, ablating just the unconstrained ones suffices. As before, ablating linear, step-perception, or step-choice units significantly impairs choice coding activity, while ablating “other” units does not; these results hold even if ablation is restricted to only constrained or only unconstrained units. Finally, there is again a significant main effect of constrained vs unconstrained; on average, ablating the unconstrained population impairs dynamics more, most likely due to the larger population size.

      Reviewer #2 (Recommendations for the authors):

      (1) In addition to panel 5B, it would be informative to show data from individual mice and the corresponding RNNs trained on each mouse, to assess how closely they match. If available, including one representative example of a good match and one of a less accurate match would help the reader get a better sense of the data.

      Figure 5B shows the average behavioral performance of the model. Individual models were not trained directly on the psychometric curves of experimental sessions; they were trained to perform the task correctly. After successful training, model simulations were run with input noise to be able to produce a sigmoidal psychometric curve. However, although the input noise was tuned to capture the overall correct rate of the corresponding experimental session, we did not attempt to match the details of the psychometric curve. See also the next reply.

      (2) In addition to panel 5C, it would be useful to add examples of experimentally observed PSTHs and the corresponding activity trajectory for the units in the RNN trained to match them, for all the other coding patterns (step-perception and step-choice).

      We note that the PSTH in 5C is not an example of a linear coding unit as the Reviewer implies, but simply one with a good fit, and here the model’s output was produced in the absence of input noise. In order to classify step-perception and step-choice responses one needs error trials, but the model was trained without this input noise that induces errors (and produces a sigmoidal psychometric function) to match experimental PSTHs from correct trials only. Post-training simulations were then run with input noise to induce error trials, and model unit response profiles were classified based on this. However, there is no guarantee that error trials in the model match the error trials in the experiment; therefore, step-perception and step-choice units in the model may or may not be step-perception and step-choice units in the data. Despite this limitation, the revised manuscript includes additional examples, in Figure S2, of experimentally observed PSTHs and their corresponding model activity, to supplement Figure 5C and provide a better sense of the goodness-of-fit.

      (3) Electrophysiological data in Figure 2 - It would be helpful to provide statistics on how many neurons change their activity in each session.

      In the revised manuscript we have included across-session statistics for proportions of neurons that are taste-responsive and that show decision preparatory activity. We have also included tables (Tables S1 and S3) with the numbers of neurons that are taste-responsive and that show preparatory activity for each session in the experimental and model data.

      (4) Peak auROC selection - How was the peak auROC selected? Selecting only one bin for the peak could be potentially problematic and may result in the incorrect identification of an outlier that does not faithfully represent the neuron's overall activity. The peak selection could instead be based on several consecutive bins showing a consistent trend. If this approach was already implemented, the authors should explicitly describe it in the Methods section.

      Peak auROC was selected from a single bin (with average duration about 50ms). While it is true that this may result in outlier neurons that transiently prefer one stimulus strongly but more consistently prefer the other, we opted for a simple criterion to sort the neurons into two categories for visualization. Adopting more stringent criteria that consider multiple bins may result in neurons that cannot be placed in either category, and we wanted a way to examine the entire pseudo-population. Also, the entire auROC trace is visualized in the heatmap, so potential outliers are not hidden and can be assessed by eye.
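
      To make the trade-off concrete, here is a minimal Python sketch contrasting the single-bin criterion with a consecutive-bin criterion of the kind the Reviewer suggests. It is an illustration only, not the analysis code used in the study: the function names, the window length `k = 3`, and the stimulus labels "A" and "B" are hypothetical. The consecutive-bin rule returns `None` when no window of `k` bins has a consistent sign, which is the "cannot be placed in either category" case mentioned above.

```python
import numpy as np

def auroc(a, b):
    """Area under the ROC curve: probability that a random value from `a`
    exceeds a random value from `b` (ties count as 0.5)."""
    a = np.asarray(a, dtype=float)[:, None]
    b = np.asarray(b, dtype=float)[None, :]
    return float(np.mean((a > b) + 0.5 * (a == b)))

def auroc_trace(rates_a, rates_b):
    """Per-bin auROC for two (trials x bins) arrays of firing rates."""
    rates_a = np.asarray(rates_a, dtype=float)
    rates_b = np.asarray(rates_b, dtype=float)
    return np.array([auroc(rates_a[:, i], rates_b[:, i])
                     for i in range(rates_a.shape[1])])

def preference_single_bin(trace):
    """Label from the single bin farthest from chance (0.5)."""
    trace = np.asarray(trace, dtype=float)
    peak = int(np.argmax(np.abs(trace - 0.5)))
    return ("A" if trace[peak] > 0.5 else "B"), peak

def preference_consecutive(trace, k=3):
    """Label from the window of k consecutive bins with the largest mean
    deviation from 0.5, requiring the same sign in every bin of the window.
    Returns None if no such window exists (unit left unclassified)."""
    dev = np.asarray(trace, dtype=float) - 0.5
    best_label, best_val = None, 0.0
    for start in range(len(dev) - k + 1):
        window = dev[start:start + k]
        if np.all(window > 0) or np.all(window < 0):
            val = abs(window.mean())
            if val > best_val:
                best_label = "A" if window[0] > 0 else "B"
                best_val = val
    return best_label
```

      For a made-up trace such as `[0.5, 0.95, 0.45, 0.4, 0.35, 0.5]`, the single-bin rule labels the unit "A" on the strength of one transient bin, whereas the consecutive-bin rule labels it "B"; an alternating trace such as `[0.6, 0.4, 0.6, 0.4]` is left unclassified by the consecutive-bin rule.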

      Reviewer #3 (Public review):

      Primary taste cortex neurons show a variety of dynamic response profiles during taste decision-making tasks, reflecting both sensory and decision variables. In the present study, Lang et al. set out to determine how neurons with distinct response profiles contribute to perceptual decisions about taste stimuli.

      The methods, with reference to the behavioral task and electrophysiological recordings/data analysis, are straightforward, solid, and appropriate. The computational model is presented in a clear and conceptually intuitive manner, although the details are outside of my area of expertise.

      The experimental design features a simple 2-alternative forced-choice design that yielded clear psychometric curves across a range of stimuli. In vivo recordings were performed using Neuropixels and yielded an appropriate sample of single neuron responses. The strength of the model lies in the fact that it consists of single neurons whose response profiles mimic those recorded in vivo, and allows neuron-selective manipulation.

      By virtually lesioning specific subsets of neurons in the network, the authors demonstrate that a relatively small population of neurons with specific tuning profiles was sufficient to produce the observed neural dynamics and behavioral responses. This effect was selective as lesioning other responsive neurons did not affect overall response dynamics or performance.

      These findings provide new insight into the relation between the response profiles of single neurons in sensory cortex, their population-level activity dynamics, and the perceptual decisions they inform.

      The approach is particularly innovative as it uses computational modeling to target functionally-defined "cell types", which cannot necessarily be targeted by more conventional genetic approaches.

      We thank the Reviewer for the positive assessment of our study.

      Reviewer #3 (Recommendations for the authors):

      (1) Introduction: I'm missing a clearly stated specific hypothesis and what is predicted on the basis of that hypothesis. What is the alternative?

      The null hypothesis is that single neuron activity patterns, even when clearly structured, do not matter for population activity or behavior. Alternatively, they do matter for these phenomena, and our model supports the alternative hypothesis. We have made this hypothesis clearer in the Introduction.

      (2) Discussion: Much of the text is a recap of the Introduction and Results sections. Please elaborate on the specific insights gained from the findings. The idea that tuned neurons in the sensory cortex are the basis for perception and perceptual decisions concerning the features being represented by those neurons is generally accepted. What the present study adds to this insight could be described more explicitly. On the other hand, the idea that small populations of tuned neurons are responsible for perception of taste/perceptual decisions about taste appears in contrast with previous accounts where stimulus features/decisions are reflected in correlated changes in activity across distributed populations of taste cortical neurons, including ones that are not necessarily tuned or even overtly responsive. How do the present findings relate to this idea?

      This is a very good point about reconciling these findings with past ones that have focused on coordinated changes across ensembles of neurons, i.e., metastable dynamics of internal (hidden) states. There is a brief mention of metastability toward the end of the Discussion, but we agree it deserves elaboration.

      This work does emphasize single unit activity, but in the context of, and as relevant to, population activity. We believe that the findings and frameworks of previous studies and those presented here are compatible rather than mutually exclusive. There is no reason why neurons with the coding patterns we studied here cannot coordinate with others to participate in the formation of different metastable states. The question of which—neurons with specific response profiles, or ensemble activity patterns that may involve these neurons?—is necessary and sufficient for producing perception and behavior during the mixture-based decision-making task is interesting but rather difficult to answer because of the single units’ contribution to both alternatives. One would need to utilize a manipulation that disrupts ensemble coordination without disrupting single unit activity to differentiate between them. We have made these points clearer in the Discussion.

      (3) Results: RNNs were based on data from single sessions -- how many neurons of each tuning type were observed in each session? In particular, there were 23 sessions but only 25 neurons total tuned to choice, suggesting that modelled choice neurons were based on ~1 neuron.

      The revised manuscript includes the session-by-session breakdown of response types for both experiment and model in two supplementary tables (Tables S2 and S4). We note that there are 25 neurons tuned to choice during the last 500 ms of the trial prior to decision, but 114 out of 626 neurons in total are tuned to choice in some time bin in the experimental data.

      (4) Minor: Indicate the time windows used for analysis of stimulus sampling, delay, and choice on the figures.

      The revised manuscript now includes the illustration of sampling and delay windows in Figure 2C-D, since we averaged the values over these windows for use in a 2-way ANOVA. All other figures either are associated with bin-by-bin analyses and have the first central and lateral licks (T and D) indicated, or have the time windows specified (e.g., Figure 4B, which uses [T, T + 0.5 s] and [D - 0.5 s, D]).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper aims to characterise the physiological and computational underpinnings of the accumulation of intermittent glimpses of sensory evidence.

      Strengths:

      (1) Elegant combination of electroencephalography and computational modelling.

      (2) The authors describe results of two separate experiments, with very similar results, in effect providing an internal replication.

      (3) Innovative task design, including different gap durations.

      Weaknesses:

      (1) The authors introduce the CPP as tracking an intermediary (motor-independent) evidence integration process, and the MBL as motor preparation that maintains a sustained representation of the decision variable. It would help if the authors could more directly and quantitatively assess whether their current data are in line with this. That is, do these signals exhibit key features of evidence accumulation (slope proportional to evidence strength, terminating at a common amplitude that reflects the bound)? Additionally, plotting these signals report locked (to the button press) would help here. What do the results mean for the narrative of this paper?

      The reviewer is correct that properties such as temporal slope scaling with evidence strength and stereotyped threshold-like amplitude were key in establishing that the CPP reflects evidence accumulation in conventional continuous-stimulus tasks, and its motor independence was demonstrated in how it exhibited the same evidence-dependent dynamics in the absence of motor requirements (e.g. O'Connell et al 2012). We agree that it is of interest to check any such properties that can be feasibly tested in the current, distinct task context of intermittent evidence with delayed responses. Given the way in which participants performed our delayed-response task, sometimes terminating decisions early, it is in the CPP-P1 that conventional patterns of coherence-dependence in slope and amplitude would be expected. Indeed, we found that the CPP-P1 reached higher amplitudes (Fig. 3A, Author response image 1) and exhibited a steeper build up in high- compared to low-coherence trials (Author response image 1). The slope and amplitude profile of the CPP-P2 is complex due to the variability in baseline activity across our various delay conditions and the bounded process that participants engaged in, but it is still consistent with an accumulation process. Our simulations provide a full account of how an accumulating signal could produce the observed results.

      Author response image 1.

      Grand-averaged (± sem) CPP-P1 traces in both experiments (top). Bottom boxplot graphs indicate the average slope computed as the slope between 0.2 s post P1 onset (when CPP begins its buildup) and the time when peak amplitude was reached within the [0.4-0.6 s] interval, computed for each subject individually. Red crosses indicate outliers, computed as values exceeding 1.5 times the interquartile range away from the bottom or top of the box. Grey lines indicate single subject estimates, and asterisks reflect the significance of paired t-tests for the estimated slope and amplitude effects; **p<0.01, *p<0.05. H = high coherence, L = low coherence.
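
      The slope and outlier definitions in this caption can be sketched in Python as follows. This is a hedged illustration rather than the authors' code: the function names are hypothetical, the input is assumed to be one subject's averaged CPP-P1 trace sampled at times `t` (in seconds relative to P1 onset), and the quartiles use NumPy's default linear interpolation, which may differ slightly from the plotting software used for the figure.

```python
import numpy as np

def buildup_slope(t, trace, t_start=0.2, peak_window=(0.4, 0.6)):
    """Slope between the sample nearest t_start and the peak amplitude
    found inside peak_window. Returns (slope, peak_time, peak_amplitude)."""
    t = np.asarray(t, dtype=float)
    trace = np.asarray(trace, dtype=float)
    i_start = int(np.argmin(np.abs(t - t_start)))
    in_window = np.flatnonzero((t >= peak_window[0]) & (t <= peak_window[1]))
    i_peak = int(in_window[np.argmax(trace[in_window])])
    slope = (trace[i_peak] - trace[i_start]) / (t[i_peak] - t[i_start])
    return float(slope), float(t[i_peak]), float(trace[i_peak])

def boxplot_outliers(values, k=1.5):
    """Boolean mask of values lying more than k * IQR below the first
    quartile or above the third quartile."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)
```

      Per-subject slopes for the high- and low-coherence conditions obtained this way would then be compared with a paired test across subjects.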

      Like in other delayed-response tasks (Twomey et al 2016; McCone et al 2026), we observe here that the CPP peaks and falls well before the response is cued or indeed executed (here, in fact peaking and falling for each individual pulse). Thus, its pre-response dynamics will not relate to stimulus-driven evidence accumulation in the way they do in immediate response contexts (e.g. O’Connell et al. 2012; Steinemann et al. 2018). We therefore do not analyse response-aligned CPPs in the experiment.

      As to the intermediary role we have interpreted for the CPP, in addition to the local pulse driven peak-and-fall dynamics compared to the sustained profiles of motor preparation signals, we can point to the obvious temporal delay between the signals, where evidence-dependent buildup in the CPP substantially precedes that of motor preparation, as observed in all previous studies comparing the two (e.g. Kelly & O'Connell 2013).

      (2) The novelty of this work lies partly in the aim to characterize how the CPP and MBL interact (page 5, line 3-5). However, this analysis seems to be missing. E.g., at the single-trial level, do relatively strong CPP pulses predict faster/larger MBL? The simulations in Figure 5 are interesting, but more could be done with the measured physiology.

      As exemplified in the extant EEG-decision literature, the low signal-to-noise ratio of EEG is such that attempts are seldom made to link two EEG signals on a single-trial basis, and studies instead favour testing single-trial relationships between each individual EEG signal and behaviour, or, most commonly, comparing patterns of variation in the EEG signals across experimental conditions (e.g. difficulty). Accordingly, here we show that trials with high coherence P1 evoked 1) higher CPP amplitudes (Fig. 3A,C), and 2) stronger MBL (Fig. S2 & S3). Further, we showed that particularly high CPP amplitudes following the first pulse led to stronger weights on choice for the first pulse (Fig. S11), which could only be mediated by the motor system.

      (3) The focus on CPP and MBL is hypothesis-driven but also narrow. Since we know only a little about the physiology during this "gaps" task, have the authors considered computing TFRs from different sensor groupings (perhaps in a supplementary figure?).

      While we agree that it might be interesting to explore frequency bands and sensors more broadly, we feel that such an exploration would detract from the hypothesis-driven focus on how prominent, well-characterised decision signals in the brain behave in a context where evidence is presented in an atypical, seldom-studied manner, namely in the form of temporally separate pulses. Our aim was not to explore whole-brain dynamics that might be engaged during the task, but rather to get a better understanding of the functional roles of the neural processes underlying the CPP and MBL during decision making. Providing a detailed description of whole-scalp responses is thus beyond the scope of this paper, but given that all data will be made publicly available this can be pursued in future work and by other researchers.

      (4) The idea of a potential bound crossing during P1 is elegant, albeit a little simplistic. I wonder if the authors could more directly show a physiological signature of this. For example, by focusing on the MBL or occipital alpha split by the LL, LH, HL and HH conditions, and showing this pulse- as well as report-locked. Related, a primacy effect can also be achieved by modelling (i) self-excitation of the current one-dimensional accumulator, or (ii) two competing accumulators that produce winner-take-all dynamics. Is it possible to distinguish between these models, either with formal model comparison or with diagnostic physiological signatures?

      In addition to the CPP amplitude effects we report in the main paper, the reviewer is correct that pulse-locked MBL can also provide a physiological signature of the greater number of pulse-1 bound crossings when that pulse is high-coherence. This is shown in Figure S3, where we see this coherence-dependent effect consistently across all gap durations and both experiments. Figure S2 also shows that the MBL step-change after P2 is greater in P1-low coherence trials in Experiment 1, as predicted by the bound-crossing account, and consistent with the CPP findings. We note that this effect appears absent in Experiment 2, but this is likely because the greater proportion of shorter gap durations (0, 0.12, 0.36 s) means that updates following P2 are likely to still capture P1-driven changes, due to signal-transmission delays. Please also note that Fig. S2 and S3 have been updated from the previous version, because while revising the paper we noticed a mistake whereby we were plotting alpha band power (8-13Hz) rather than the intended beta (13-30Hz). The results remain qualitatively unchanged. Although there isn’t sufficient single-trial signal-to-noise ratio to be able to categorise individual trials as having crossed a threshold or not, this is strong evidence in support of the coherence-dependent amplitudes of the CPP and motor updates. Analyzing beta locked to the report would not be informative in this case because of the delayed reporting structure of the task and the threshold-crossing relationship beta exhibits with response execution (O’Connell et al. 2012). That is, beta will reach the same amplitude immediately prior to the response regardless of whether or not decisions were terminated during P1. Instead, we believe that the empirical CPP-P2 traces we show provide direct evidence that the second pulse was not fully integrated in all trials, and as our modelling confirms, this is consistent with bound crossings occurring sometimes before P2. First, the fact that CPP-P2 amplitudes were overall lower than CPP-P1 amplitudes mirrors the behavioural observation that the first pulse had a stronger weight on choice than the second one. Second, we show that trials where the CPP was particularly high after the first pulse were also trials where P1 exerted a particularly strong influence on choice (see Fig. S11), further validating the idea that higher CPP amplitudes are directly related to behaviour.

      Regarding self-excitation (SE) and winner-take-all competition (WTAC), these could indeed contribute to the behavioural primacy effects, but they would not detract from our central finding that the CPP does not encode a sustained representation of a decision variable, but rather reflects two rounds of evidence accumulation feeding into a single decision process. Further, it is not immediately clear whether/how these alternative models might also account for the CPP-P1/CPP-P2 results as simply as our bounded model does. While it might be theoretically possible for SE/WTAC models to explain 1) why the CPP-P2 is generally lower than the CPP-P1 across conditions, and 2) why the maximum CPP-P2 amplitudes in P1-high trials are smaller than in P1-low trials, these patterns of results are not an immediate consequence of standard implementations. Further, while the question of whether the accumulation process is perfect integration or involves SE or WTAC is certainly of additional interest, given that this is a delayed response task and does not provide information on termination timing through RT distributions, arbitrating between these modes of integration would not be straightforward with the current data.

      (5) The way the authors specify the random effects of the structure of their mixed linear models should be specified in more detail. Now, they write: "Where possible, we included all main effects of interest as random effects to control for interindividual variability." This sounds as if they started with a model with a full random effect structure and dropped random components when the model would not converge. This might not be sufficiently principled, as random components could be dropped in many different orders and would affect the results. Do all main results hold when using classical random effects statistics on subject-wise regression coefficients?

      The equations in the paper include the full details of the random effects structure we used for each model. We note that only two of our four equations did not include a full random effect structure, indeed due to convergence issues. We have now fit these models with a maximal random effects structure (i.e. including all fixed effects as random effects as well) with the ‘bobyqa’ optimiser. This resulted in singular fits for both Eq. 2 (Exp. 1 and Exp. 2) and Eq. 3 (Exp. 2 only). Following previous suggestions, we used a weakly informative Wishart prior (Chung et al. 2015) to regularise the random effects covariance matrix using the blme package (Chung et al. 2013), which resolved the singular fit problem. However, some models still produced convergence warnings. To assess these models’ robustness, we compared the fixed effect parameter estimates across multiple optimisers, as suggested by the lme4 developers (see lme4 documentation). Parameter estimates across optimisers rarely deviated by more than one decimal point across 6 optimisers (see Bates et al. 2011), and we thus concluded the model estimates were robust and the convergence warnings were false positives, a known issue in lme4. For all models in the paper, we report the parameters estimated using the “bobyqa” optimiser. All main inferential results remain unchanged (except for one interaction that was not of interest in Exp. 1), and the estimated slopes and statistical results for all models have been updated in the manuscript. We also included all these details in the manuscript.

      Reviewer #2 (Public review):

      Summary:

      This manuscript examines decision-making in a context where the information for the decision is not continuous, but separated by a short temporal gap. The authors use a standard motion direction discrimination task over two discrete dot motion pulses (but unlike previous experiments, fill the gaps in evidence with 0-coherence random dot motion of differently coloured dots). Previous studies using this task (Kiani et al., 2013; Tohidi-Moghaddam et al., 2019; Azizi et al., 2021; 2023) or other discrete sample stimuli (Cheadle et al., 2014; Wyart et al., 2015; Golmohamadian et al., 2025) have shown decision-makers to integrate evidence from multiple samples (although with some flexible weighting on each sample). In this experiment, decision-makers tended not to use the second motion pulse for their decision. This allows the separation of neural signatures of momentary decision-evidence samples from the accumulated decision-evidence. In this context, classic electroencephalography signatures of accumulated decision-evidence (central-parietal positivity) are shown to reflect the momentary decision-evidence samples.

      Strengths:

      The authors present an excellent analysis of the data in support of their findings. In terms of proportion correct, participants show poorer performance than predicted if assuming both evidence samples were integrated perfectly. A regression analysis suggested a weaker weight on the second pulse, and in line with this, the authors show an effect of the order of pulse strength that is reversed compared to previous studies: A stronger second pulse resulted in worse performance than a stronger first pulse (this is in line with the visual condition reported in Golmohamadian et al., 2025). The authors also show smaller changes in electrophysiological signatures of decision-making (central parietal positivity and lateralised motor beta power) in response to the second pulse. The authors describe these findings with a computational model which allows for early decision-commitment, meaning the second pulse is ignored on the majority of trials. The model-predicted electrophysiological components describe the data well. In particular, this analysis of model-predicted electrophysiology is impressive in providing simple and clear predictions for understanding the data.

      Weaknesses:

      Some readers may be left questioning why behaviour in this experiment is so different from previous experiments, which use almost exactly the same design (Kiani et al., 2013; Tohidi-Moghaddam et al., 2019; Azizi et al., 2021; 2023). The authors suggest this may be due to the staircase procedure used to calibrate the coherence of (single-pulse) dot motion stimuli for individuals at the start of the experiment. But it remains unclear why overall performance in this experiment is so bad. Participants achieved ~85% correct following 400 ms of 33 - 45% coherent motion. In previous work, performance was ~90% correct following 240ms of 12.8% coherent motion. It seems odd that adding the 0% coherent motion in the temporal gaps would impair performance so greatly, given it was clearly colour-coded. There is a lack of detail about the stimulus presentation parameters to understand whether visual processing explains the declined performance, or if there is a more cognitive/motivational explanation.

      We thank the reviewer for highlighting this. We apologise for not providing full details about the visual display, which we have included now.

      The moving dots were presented centrally on the monitor, within a 5-degree aperture, and moved at a speed of 5 degrees/second. The monitor refresh rate was 60Hz for 19 participants and 85Hz for 3 participants in Experiment 1, while it was 85Hz for 19 participants and 60Hz for 2 participants in Experiment 2. Dot density in our task was similar to previous studies (16.7 dots/degree<sup>2</sup>/s, as in Kiani & Shadlen 2013; Tohidi-Moghaddam et al. 2019; Azizi et al. 2021, 2023). However, in contrast to previous studies, we did not include any feedback on a trial-by-trial basis, instead only providing feedback at the end of each block indicating the average accuracy. This would have made it harder for participants to continually assess how well they were performing and to adjust their strategies (e.g. increase their bound for better accuracy) accordingly. We agree that the inclusion of 0% coherence dots during the gap between pulses is unlikely to have caused the participants’ relatively low overall performance, especially since we did not find accuracy to be overall lower for longer 0%-coherence gaps.

      Further, as the reviewer notes, we used a staircasing procedure at the beginning of the experiment which used only single pulses of evidence. This may have encouraged participants to set a bound that can usually be reached by one pulse, and the resultant early terminations meant that they seldom used the full 400ms of evidence that were available to them. In fact, we would like to thank the reviewer for pointing out Golmohamadian et al., 2025, which used a similar variable delays task structure but with different visual stimuli. They, like us, trained on a single-pulse task version and omitted trial-by-trial feedback in the main task, and, also like us, reported a stronger choice reliance on pulse-1. This suggests that these two factors may suffice to induce a primacy rather than a recency effect.

      There are other reasons why performance may have been different in our task compared to previous studies. For example, our task included a lead-in period that was longer than in previous studies and contained 0%-coherence dots, in order to minimise interfering VEP components (the lead-in period was between 700 and 1050 ms in our study, compared to 200–500 ms in Kiani & Shadlen 2013; Tohidi-Moghaddam et al. 2019 & Azizi et al. 2023, and 400–1000 ms in Azizi & Ebrahimpour 2021). This longer and visually explicit preparation period may have acted as a warning cue, allowing participants to fully prepare before the first pulse, and again making it easier for them to hit a bound with only that information.

      We have added a more detailed discussion about how our stimuli and the task characteristics may have resulted in a substantially different performance in our task compared to previous studies in the discussion section.

      Recommendations for the authors:

      Reviewing Editor:

      Please consider the following reviewer suggestions for how to strengthen the evidence for your central claims, which could translate into an improved assessment of the "strength of evidence".

      Apart from these useful suggestions, I had some concerns about scholarship, because the list of studies currently cited in your introduction is exclusively from your group, while one of the phenomena of interest - motor beta power lateralization (MBL) in decision-making - has been widely studied by several groups, using also other techniques.

      I was wondering why you chose not to cite the ample MEG evidence for the role of MBL in decision-making. This has been shown both in classical random dot motion tasks (Donner et al, Curr Biol, 2009; de Lange et al, J Neurosci, 2013; Pape et al, Nat Commun, 2016; Urai et al, Nat Commun, 2022) as well as in tasks involving discrete evidence samples (Wilming et al, Nat Commun, 2020; Murphy et al, Nat Neurosci, 2021). Another relevant EEG study is by Ian Gould et al, J Neurosci, 2010. There is also quite a bit of monkey LFP work (mainly by Saskia Haegens) on choice-selective beta power in the motor system of the macaque, although the link to the lateralized beta power suppression in your work and the above human E/MEG studies remains a bit elusive. I feel it would be important to provide a more balanced reflection of the existing literature on this phenomenon.

      We thank the editor for this fair comment, and we apologise for having provided a too narrow, EEG-centric view of the literature, arising from our interest in the CPP component which hasn’t yet been characterised in MEG or LFPs. We have now substantially expanded the introduction to provide a more balanced and comprehensive overview of the literature.

      Reviewer #1 (Recommendations for the authors):

      (1) The diffusion model needs to be explained in more detail. For example, it should be explicitly stated that the model was fit to only choices, as most readers would expect reaction times. Further, it needs to be stated whether the model was fit separately for each subject or in one go to the group-level data. If the former, it is important to add error bars of the between-subjects variability (in simulated and empirical data) to Figure 4A. If the latter, it would be important to determine uncertainty using bootstrapping.

      The original model was fit to grand-average data, as stated in the methods section. To assess between-subjects variability, we have re-fitted the model to each individual subject, for each experiment. The average of the individually-estimated model parameters closely recapitulated the values obtained from the fit to grand-averaged data (Fig. S12). We then simulated N = 10000 trials for each individual, and we report the grand-averaged results with error bars indicating the standard error of the mean as a supplementary figure (Fig. S13). The results replicate the ones reported in the main manuscript. We have also made it explicit that the models are fit to accuracy data but not RT.

      (2) The authors write numerous times that the MBL exhibits an "evidence-dependent" buildup. However, should this not be "choice-dependent"? In Figure 2A, one can clearly see that the sign of MBL follows choice and not objective evidence.

      We thank the reviewer for this comment. By evidence-dependent, we mean that lateralisation towards the correct response is strongest in high-coherence trials (see Fig. S2, S3). This is indeed because the sign of MBL is choice-dependent, and participants are less likely to make mistakes in high-coherence trials. We have added a clarification sentence in the text.

      (3) It would aid readability to add sub-conclusions at the end of each Results section.

      We have added clarifications where needed.

      (4) In Figure 1B, I cannot see a dashed line for the HL condition. I understand that it must lie under the LH condition, but it would be good to show it separately.

      We thank the reviewer for this comment. Since we cannot show both lines separately without additional panels, given the HL and LH lines perfectly overlap, we indicate at the end of the caption that this is the case as follows: “Note that a perfect accumulator predicts identical accuracies for the HL and LH conditions, and therefore the two lines overlap.”
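
      The reason the two lines coincide can be seen in a short worked derivation. It is a sketch under simplifying assumptions that are ours for illustration and not necessarily those of the fitted model: each pulse $i$ of duration $d$ and coherence $c_i$ contributes independent Gaussian evidence with drift scaling $k$ and noise $\sigma$, and the choice follows the sign of the summed evidence with no bound.

```latex
\begin{align}
x_i &\sim \mathcal{N}\!\left(k\,c_i\,d,\ \sigma^2 d\right), \qquad i \in \{1, 2\} \\
\mathrm{DV} &= x_1 + x_2 \sim \mathcal{N}\!\left(k\,d\,(c_1 + c_2),\ 2\sigma^2 d\right) \\
P(\text{correct}) &= P(\mathrm{DV} > 0)
  = \Phi\!\left(\frac{k\,(c_1 + c_2)}{\sigma}\sqrt{\frac{d}{2}}\right)
\end{align}
```

      Because the expression depends on the coherences only through the sum $c_1 + c_2$, swapping the order of the pulses (HL versus LH) leaves the predicted accuracy unchanged.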

      (5) In Figure 4B, is the horizontal dashed line important? It is confusing because the legend incorrectly states that this is "data".

      Thanks for this observation - it was only there to indicate the 50% level as a benchmark for assessing how frequent early terminations are, but we agree that it was unnecessary and potentially confusing, so we have removed it from the plot.

      Reviewer #2 (Recommendations for the authors):

      (1) The authors should more directly address how behaviour in their task differs quite substantially from previous experiments with very similar designs (including why such high coherence levels are required, over a longer duration, to reach overall worse performance). Some readers may also be interested in a broader discussion of how decision-makers may use flexible weights when integrating evidence across samples over time. While the explanation of bounded accumulation is convincing in this context, Tsetsos et al., (2012) suggest recency effects (as in Cheadle et al., 2014; Wyart et al., 2015) cannot be explained by bounded accumulation, but rather integration leak. Other factors may include stimulus consistency (Glickman et al., 2022) or even choice consistency across decisions (Bronfman et al., 2015). Golmohamadian et al., 2025 demonstrated flexibility in decision strategies across sensory modalities.

      As we described above, we have added some more detailed explanation about why it might be the case that behaviour in our study differs from previous reports using similar tasks. We agree that the reversed pulse-reliance in our study compared to others presents an opportunity to discuss flexibility in decision strategy and so we have now added a broader discussion on different patterns of integration in various task contexts. We thank the reviewer for pointing out Golmohamadian et al., 2025, as they, like us, trained on a single-pulse task version and omitted trial-by-trial feedback in the main task, and, like us, reported a stronger choice reliance on pulse-1.

      (2) Another open question is how central parietal positivity reflects an accumulation signal in the case of continuous evidence, but reflects momentary evidence in the case of discrete evidence samples. If, in both cases, the parietal evidence is passed along to motor processes for bounded decision commitment, how do motor processes deal with the changes in what is represented? Can the relationship between MBL and CPP in the model-simulated data shed some light on this? Specifically, how is the 0-gap condition treated in this simulation (which shows only 1 CPP peak but with a longer time to decay) compared to non-zero gap conditions (which show 2 peaks)?

      This is a very interesting and important point, and we thank the reviewer for raising it. We believe that the CPP in our intermittent-dots task reflects dot-motion evidence integration in the same way as in conventional continuous evidence tasks, building at an evidence dependent rate (see Author response image 1), with the only difference being that integration processes can be turned “on” or “off” depending on whether evidence is present, and can thus be temporally split into multiple “rounds” of accumulation when there is a gap.

      Our model simulations assume that evidence integration is triggered by the dots turning yellow, indicating the presence of evidence, and feeds continuously to the motor system in these periods. However, it is switched off when either 1) a bound has been hit, or 2) the dots turn blue again, at which point the CPP falls (see various rates of signal decay in Fig. S7). The reason the CPP continues longer before it peaks and falls in the zero-gap condition, by this account, is that there is no dot-colour change at the end of pulse-1 to switch it off, and thus the accumulation process continues until either a bound is hit, or the yellow dots turn blue after pulse-2. When there is a non-zero gap, despite the CPP being switched off, the decision variable itself remains encoded at the motor level so that no information is lost. This requires that the same instruction that turns off the CPP must also break or pause the flow from the CPP to the motor level and allow it to hold its current level until either a second pulse resumes a feed from a newly-triggered CPP, or response execution is cued. Thus, in our account, the accumulation process underlying the CPP in our intermittent-evidence task is identical to that in conventional continuous-evidence tasks, but since it can be turned “on” and “off” as a function of whether or not evidence is clearly present or absent, it produces two “rounds” of integration in non-zero gap conditions. The motor process also receives a feed from the CPP as in conventional continuous-evidence tasks, but with this feed similarly gated by the presence of evidence.

      A slightly different and perhaps more challenging question (which the reviewer was perhaps alluding to) relates to tasks where evidence comes not in short noisy snippets, but rather as static tokens (e.g. Wyart et al. 2012, 2015; Murphy et al. 2021; Parés-Pujolràs et al. 2025). In these instances, the CPP exhibits transient evoked responses to each token, which scale with the belief updates resulting from it (Parés-Pujolràs et al. 2025). However, it remains unclear whether these transient potentials reflect a temporally-evolving integration process to compute the appropriate belief update afforded by that token in the context of a particular task, or rather reflect the output of such a process. The former account would be similar to our interpretation of the transient deflections observed in this gaps task, which we believe capture the same temporal integration processes as those commonly observed in conventional continuous noisy stimuli paradigms, only short-lived. The latter account would instead be specific to low-noise stimuli like tokens, where the computations required for belief updating may not require a temporally-extended integration process, but rely on different mechanisms to compute belief updates (e.g. prior-based modulations of sensory encoding, attention or neural gain). These questions remain open for future investigation.

      (3) From what I understand, the model suggests all-or-none integration of the second pulse: either the bound has not been reached and the pulse is perfectly integrated, or the bound has been reached and so the pulse is not integrated. The CPP amplitude at pulse 2 is therefore determined not only by the strength of the evidence at pulse 2 but also by the proportion of trials where the evidence is not ignored: CPP at pulse 2 is of lower amplitude because it is calculated as an average across trials where it is either similar to CPP at pulse 1 or otherwise completely absent. Another explanation for the lower average amplitude is that all trials have a smaller amplitude (somewhat different from the main conclusions of the paper). It would be nice to show the dichotomy predicted by the model in the empirical data. I'm thinking of something similar to this 'bifurcation' analysis from Sergent et al., 2021. Or more simply, estimates of CPP amplitude from single trials (perhaps an average over a short window around the peak) should be more variable at pulse 2, with some reaching similar amplitudes to pulse 1, and many close to baseline, whereas at pulse 1, there should be a more uniform cluster of amplitudes. If all CPP peak amplitudes were lower, would this motivate a model comparison where, for example, additional evidence from the second pulse was down-weighted according to certainty following the first pulse (leading to all trials down-weighting the second pulse)? This could link in nicely with some of the more nuanced analyses related to attention in the supplementary figures.

      We thank the reviewer for this insightful comment, which will help us clarify how our model works. The integration of the second pulse does not work in an all-or-none manner. In our model, the accumulation stops whenever a bound is reached at the downstream motor level. The bound can be reached 1) at some point during the 1st pulse (no integration of pulse 2 at all), 2) during the 2nd pulse (partial integration of pulse 2, until the bound is hit), or 3) not at all (full integration of pulse 2). Our model thus allows for partial integration of the second pulse rather than all-or-none. Author response image 2 shows 3 example trials that illustrate how the model works. The CPP amplitudes at pulse 2 are thus determined by two main factors: 1) whether or not accumulation of P2 is precluded by an earlier bound crossing in P1 (if it is, the CPP amplitude is assumed to equal 0), and 2) whether and when accumulation ended if it did take place. Our interpretation is that, given that trials where pulse 1 was low coherence were 1) less likely to terminate early (Fig. 4B) and 2) had achieved lower levels of accumulated evidence (Fig. 4C), the LL and LH conditions are linked to a higher proportion of trials where accumulation at pulse 2 does occur, and it lasts for a longer amount of time because the distance required to reach a bound is longer than in their pulse 1 high-coherence counterparts. We have clarified this point in the results section describing the model.
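
      The three trial types described above can be illustrated with a minimal sketch of a bounded accumulator driven by two pulses. This is not the authors' fitted model: the function names, the parameter values (drift, bound, noise, step size) and the decision to hold the decision variable unchanged across the gap are all assumptions chosen for illustration only.

```python
import random


def simulate_trial(drift_p1, drift_p2, bound=1.0, noise_sd=1.0,
                   pulse_steps=20, dt=0.01, rng=random):
    """Accumulate noisy evidence over two pulses; stop once |DV| reaches the bound.

    Returns (outcome, dv), where outcome is 'bound_in_p1', 'bound_in_p2'
    or 'no_bound'. The gap is not simulated: the decision variable (DV)
    is simply held at its current value between the two pulses.
    All parameter values are hypothetical.
    """
    dv = 0.0
    for label, drift in (("bound_in_p1", drift_p1), ("bound_in_p2", drift_p2)):
        for _ in range(pulse_steps):
            dv += drift * dt + noise_sd * (dt ** 0.5) * rng.gauss(0.0, 1.0)
            if abs(dv) >= bound:
                # Accumulation terminates; any remaining evidence is ignored.
                return label, dv
    return "no_bound", dv


def outcome_proportions(drift_p1, drift_p2, n_trials=2000, seed=0, **kwargs):
    """Proportion of simulated trials falling into each of the three trial types."""
    rng = random.Random(seed)
    counts = {"bound_in_p1": 0, "bound_in_p2": 0, "no_bound": 0}
    for _ in range(n_trials):
        outcome, _dv = simulate_trial(drift_p1, drift_p2, rng=rng, **kwargs)
        counts[outcome] += 1
    return {name: count / n_trials for name, count in counts.items()}


if __name__ == "__main__":
    print("high then low:", outcome_proportions(5.0, 1.0))
    print("low then high:", outcome_proportions(1.0, 5.0))
```

      Under these assumed settings, a strong first pulse terminates accumulation early more often than a weak one, which is the qualitative pattern the response attributes to the HL and HH conditions.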

      The reviewer notes: “Another explanation for the lower average amplitude is that all trials have a smaller amplitude (somewhat different from the main conclusions of the paper)”. However, our interpretation in fact predicts that the vast majority of trials should indeed exhibit smaller amplitudes. That can again be explained by the three trial types mentioned above. Unlike in CPP-P1, there would be a majority of trials where integration does not occur at all. Only trials where evidence was at least partially integrated during P2 would be predicted to have CPP-P2 amplitudes that are overall positive, and even in those instances, average amplitudes would be overall lower than CPP-P1 in trials that terminated early, because of the lower distance remaining to be covered before hitting a bound. Author response image 2 illustrates this point. Thus, the prediction regarding how CPP amplitude variance or distribution shape would compare between P1 and P2 is less straightforward than if it were all-or-none on P2, not to mention the fact that EEG noise would likely drown out distributional features like this. We therefore focus on a comparison of the means, for which our model has the clear prediction that most trials should exhibit lower CPP-P2 amplitudes. To assess whether empirical observations meet this prediction, and following the reviewer’s suggestion, we extracted the mean amplitudes around 0.45-0.55s after P1 and P2, for each single trial. CPP-P2 data were baselined using the amplitude 100 ms before P2 onset, as in Fig. S5 - note that this is likely to introduce spurious drifts due to overlapping potentials from P1, but given that grand averaged traces still qualitatively captured the key effects we assume it is a valid approach. We then pooled CPP-P1 and CPP-P2 amplitudes across pulses, and z-scored them for each participant separately. In both experiments, in a majority of participants (Exp. 1: 16/22, Exp. 2: 17/21) the median z-CPP-P1 amplitude was higher than that of z-CPP-P2. Author response image 3 illustrates the pooled distributions.
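
      The pooling, within-participant z-scoring and median comparison described above can be sketched as follows. The function names and the toy amplitudes are hypothetical, and the use of the population standard deviation is an assumption, since the response does not say which estimator was used.

```python
from statistics import mean, median, pstdev


def zscore_pooled(p1_amps, p2_amps):
    """Z-score single-trial amplitudes using the mean and SD of both pulses pooled."""
    pooled = list(p1_amps) + list(p2_amps)
    mu = mean(pooled)
    sd = pstdev(pooled)  # population SD; an assumption, not taken from the source
    z_p1 = [(amp - mu) / sd for amp in p1_amps]
    z_p2 = [(amp - mu) / sd for amp in p2_amps]
    return z_p1, z_p2


def count_p1_higher(participants):
    """Count participants whose median z-scored P1 amplitude exceeds that of P2.

    `participants` is a list of (p1_amps, p2_amps) pairs, one pair per participant.
    """
    n_higher = 0
    for p1_amps, p2_amps in participants:
        z_p1, z_p2 = zscore_pooled(p1_amps, p2_amps)
        if median(z_p1) > median(z_p2):
            n_higher += 1
    return n_higher


if __name__ == "__main__":
    # Made-up amplitudes for two participants (arbitrary units).
    toy = [([3.0, 4.0, 5.0], [1.0, 2.0, 3.0]),
           ([1.0, 1.0, 2.0], [2.0, 3.0, 4.0])]
    print(count_p1_higher(toy), "of", len(toy), "participants")
```

      Because z-scoring is a monotonic transform within each participant, the direction of the median comparison is the same as for raw amplitudes; the normalisation matters only when distributions are pooled across participants, as in Author response image 3.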

      Author response image 2.

      Decision variable simulations illustrating sample single trials (top) and CPP traces averaging data across conditions and N = 1000 trials (bottom), using model fits from Exp 2, in the long gap condition. Overlaid text indicates the percentage of trials in each subset, for each condition. The horizontal line indicates the bound; shaded areas indicate pulse presentation times. A. The bound was hit during P1, and therefore no further accumulation occurred during P2. B. The bound was hit during P2, and therefore P2 was only partially accumulated, C. No bound was hit, and therefore all evidence from P2 was accumulated.

      Author response image 3.

      Pooled CPP–P1 and CPP-P2 amplitudes [450-550ms post-pulse] distributions, normalised within-participant, and baselined 100ms before pulse onset. In both experiments, CPP-P2 amplitudes had a lower median (vertical line) normalised amplitude than CPP-P1.

      (4) A minor note: Full details of stimulus presentation (size, number of dots, dot size, speed, lifetime) would be appreciated.

      Thank you - we have now provided these details in the methods section (see also reply to public reviews above).

      (5) Are the authors sure they want to use this 'Gaps task' name? It seems a bit strange to introduce this name in this context, where there isn't really a 'Gap' (random dot motion fills the gap). A reader could get the impression the name was given in the Kiani et al., 2013 study (page 3, paragraph 1: "This scenario has begun to be studied using an intermittent-evidence or "gaps" task (Kiani et al., 2013) ...") but this is not true: Kiani et al. never use the term "Gaps task", nor has any other study since (as far as I know).

      We thank the reviewer for noting this oversight on our part - we have now made it clear in the introduction that “gaps task” is our own name for the task originally developed by Kiani et al. (2013). We have decided to keep this name because it is a convenient shorthand, on the understanding that “gap” refers to a “gap” in coherent motion as in Kiani et al. (2013), albeit not a proper blank as in the original implementation.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study provides valuable insights with solid evidence into altered tactile perception in a mouse model of ASD (Fmr1 mice), paralleling sensory abnormalities in Fragile X and autism. Its main strength lies in the use of a novel tactile categorization task and the careful dissection of behavioral performance across training and difficulty levels, suggesting that deficits may stem from an interaction between sensory and cognitive processes. However, while the experiments are well executed, the reported effects are subtle and sometimes non-significant. The interpretation of results may be overextended given the nature of the data (solely behavioral), the reliance on repeated d′ measures may obfuscate some of the results without clearer psychometric or regression-based analyses, and the absence of mechanistic, causal, or computational approaches limits the strength of the broader conclusions. The work will be relevant to those interested in autism, cognition, and/or sensory processing.

      We thank the editors for their positive assessment of the data quality and the novelty of our behavioral task, and for pointing out the limitations inherent in behavioral studies.

      We would like to clarify one important point regarding the use of d′ measures. While d′ was included to quantify sensitivity, our conclusions are not based solely on repeated d′ measures. In addition to d′, we analyzed raw behavioral data (correct and incorrect choice rates), and categorization performance was assessed using psychometric curves fitted with logistic regression models. These complementary analyses provide converging evidence and ensure that our interpretations are supported by multiple robust measures.
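
      For readers unfamiliar with the sensitivity index, a minimal sketch of the standard signal-detection definition of d′ is given below. The mapping of the two-choice responses onto hits and false alarms, and the log-linear correction for extreme rates, are assumptions made for illustration; the exact calculation used in the manuscript is described in its Methods and is not reproduced here.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    A log-linear correction (add 0.5 to each count, 1 to each total) keeps
    the rates away from 0 and 1, where the z-transform would be infinite.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


if __name__ == "__main__":
    # Made-up trial counts for a single session.
    print(round(d_prime(hits=80, misses=20, false_alarms=30, correct_rejections=70), 3))
```

      Because d′ collapses two response rates into one number, reporting the raw correct and incorrect choice rates alongside it, as the response describes, shows which of the two rates drives a given change.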

      In the revised manuscript, we have further strengthened the analyses by including additional regression-based assessments, reporting effect sizes for subtle effects, and refining the statistical methods for clarity and transparency.

      We fully acknowledge that this work is behavioral and does not directly reveal the underlying neural mechanisms. Nonetheless, the translational framework we have developed establishes a robust foundation for future studies. This platform can be directly applied in clinical research on autism and other neuropsychiatric conditions involving sensory-cognitive interactions, and provides a solid basis for subsequent mechanistic, causal, or computational investigations to uncover the neural circuits mediating these effects.

      We greatly appreciate the editors’ and reviewers’ guidance and believe the revisions have clarified and strengthened the manuscript.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study addresses the important question of how top-down cognitive processes affect tactile perception in autism - specifically, in the Fmr1-/y genetic mouse model of autism. Using a 2AFC tactile task in behaving mice, the study investigated multiple aspects of perceptual processing, including perceptual learning, stimulus categorization and discrimination, as well as the influence of prior experience and attention.

      We appreciate the reviewer’s statement highlighting the importance of our study.

      Strengths:

      The experiments seem well performed, with interesting results. Thus, this study can/will advance our understanding of atypical tactile perception and its relation to cognitive factors in autism.

      We thank the reviewer for recognizing the quality of our experiments and the relevance of our findings for understanding tactile perception and cognition in autism.

      Weaknesses:

      Certain aspects of the analyses (and therefore the results) are unclear, which makes the manuscript difficult to understand. Clearer presentation, with the addition of more standard psychometric analyses, and/or other useful models (like logistic regression) would improve this aspect. The use of d' needs better explanation, both in terms of how and why these analyses are appropriate (and perhaps it should be applied for more specific needs rather than as a ubiquitous measure).

      We thank the reviewer for these constructive comments. We acknowledge that aspects of the analyses were previously difficult to follow, and we have reworked the Results section to improve clarity and transparency.

      We would like to emphasize that all d′ measures are complemented by analyses of raw response rates (correct and incorrect choices), ensuring that our interpretations are not solely dependent on this metric. In addition, we applied standard psychometric analyses wherever possible. For the training phase, only two stimulus amplitudes were presented, which precluded the construction of full psychometric curves; however, for the categorization phase, psychometric analyses were feasible and are reported in Figure 3. Specifically, psychometric functions were fitted to the data using logistic regression, allowing us to estimate both categorization bias (threshold) and precision (slope) across stimulus intensities. These analyses revealed no evidence of genotype differences in categorization bias or precision in Fmr1<sup>-/y</sup> mice across stimulus strengths.
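
      To make the roles of threshold (bias) and slope (precision) concrete, the sketch below defines a two-parameter logistic psychometric function and recovers its parameters from synthetic choice counts by a coarse grid search over the binomial likelihood. The parameterisation, the stimulus amplitudes, the grid values and the grid-search procedure are all assumptions for illustration; the manuscript's own fitting routine may differ.

```python
from math import exp, log


def psychometric(amplitude, threshold, slope):
    """Probability of a 'high' choice; equals 0.5 when amplitude == threshold."""
    return 1.0 / (1.0 + exp(-slope * (amplitude - threshold)))


def neg_log_likelihood(data, threshold, slope):
    """Binomial negative log-likelihood.

    `data` is a list of (amplitude, n_high_choices, n_trials) tuples.
    """
    nll = 0.0
    for amplitude, n_high, n_trials in data:
        p = psychometric(amplitude, threshold, slope)
        p = min(max(p, 1e-9), 1.0 - 1e-9)  # guard against log(0)
        nll -= n_high * log(p) + (n_trials - n_high) * log(1.0 - p)
    return nll


def fit_grid(data, thresholds, slopes):
    """Return the (threshold, slope) pair on the grid with the lowest NLL."""
    candidates = ((t, s) for t in thresholds for s in slopes)
    return min(candidates, key=lambda ts: neg_log_likelihood(data, ts[0], ts[1]))


def synthetic_data(threshold, slope, amplitudes, n_trials=1000):
    """Noise-free expected counts for a hypothetical observer."""
    return [(a, round(n_trials * psychometric(a, threshold, slope)), n_trials)
            for a in amplitudes]


if __name__ == "__main__":
    amplitudes = [12, 14, 16, 18, 20, 22, 24, 26]  # hypothetical values, in µm
    data = synthetic_data(19.0, 0.5, amplitudes)
    print(fit_grid(data, [17, 18, 19, 20, 21], [0.25, 0.5, 0.75, 1.0]))
```

      In this parameterisation a shift in threshold moves the curve left or right (a categorization bias), whereas a change in slope makes it steeper or shallower (categorization precision).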

      Following the reviewer’s suggestion, we have also added general linear model analyses that account for trial history, providing a complementary perspective on decision-making dynamics. Finally, while the calculation of d′ is detailed in the Methods, we have revised the Results to clearly explain its use and appropriateness in each relevant analysis.

      These revisions aim to provide a clearer, more comprehensive picture of the data while ensuring that all conclusions are supported by multiple complementary measures.

      Reviewer #2 (Public review):

      Summary:

      This manuscript presents a tactile categorization task in head-fixed mice to test whether Fmr1 knockout mice display differences in vibrotactile discrimination using the forepaw. Tactile discrimination differences have been previously observed in humans with Fragile X Syndrome, autistic individuals, as well as mice with loss of Fmr1 across multiple studies. The authors show that during training, Fmr1 mutant mice display subtle deficits in perceptual learning of "low salience" stimuli, but not "high salience" stimuli, during the task. Following training, Fmr1 mutant mice displayed an enhanced tactile sensitivity under low-salience conditions but not high-salience stimulus conditions. The authors suggest that, under 'high cognitive load' conditions, Fmr1 mutant mouse performance during the lowest indentation stimuli presentations was affected, proposing an interplay of sensory and cognitive system disruptions that dynamically affect behavioral performance during the task.

      Strengths:

      The study employs a well-controlled vibrotactile discrimination task for head-fixed mice, which could serve as a platform for future mechanistic investigations. By examining performance across both training stages and stimulus "salience/difficulty" levels, the study provides a more nuanced view of how tactile processing deficits may emerge under different cognitive and sensory demands.

      We thank the reviewer for emphasizing the strengths of our task design and analysis approach, and we appreciate that the potential of this platform for future mechanistic investigations is recognized.

      Weaknesses:

      The study is primarily descriptive. The authors collect behavioral data and fit simple psychometric functions, but provide no neural recordings, causal manipulations, or computational modeling. Without mechanistic evidence, the conclusions remain speculative.

      We thank the reviewer for the careful reading of our manuscript and for these constructive comments. We agree that our study is purely behavioral, and we appreciate the opportunity to clarify the scope and interpretation of our findings. The primary goal of this work was to characterize behavioral patterns during tactile discrimination and categorization in a translationally relevant mouse model of autism.

      Although we did not include direct neural recordings, causal manipulations, or computational modeling, our analyses combining choice behavior, sensitivity measures from signal detection theory, psychometric curves, and regression-based models of trial history provide a detailed and robust characterization of perceptual learning, stimulus discrimination, categorization, and the interplay of cognitive processes with tactile perception. The manuscript has been revised to explicitly state that our conclusions are behavioral, emphasizing that this work establishes a foundation for future studies aimed at elucidating the neural and circuit mechanisms underlying these sensory–cognitive interactions.

      Second, the authors repeatedly make strong claims about "categorical priors," "attention deficits," and "choice biases," but these constructs are inferred indirectly from secondary behavioral measures. Many of the effects are based on non-significant trends, and alternative explanations (such as differences in motivation, fatigue, satiety, stereotyped licking, and/or reward valuation) are not considered.

      Alternative explanations for our findings including differences in motivation, fatigue, satiety, stereotyped licking, or reward valuation were carefully considered. As described in the Methods, only testing sessions with >70% correct performance on the training stimuli (12 µm and 26 µm) were included, excluding sessions with reduced motivation, fatigue, satiety, or stereotyped licking that could confound performance on low- or high-salience stimuli.

      Although differences in reward valuation could affect learning speed, we observed no genotype differences in training duration (Fig. 1B-D, Fig. S1C-D). Sessions with disengagement were analyzed only during epochs of active task performance (information added to the revised Methods section, lines 619-620). Reward-driven choice biases were unlikely, as no genotype differences were observed in categorization bias (Fig. 3F) and GLM analyses confirmed that previous reward outcome did not affect current choices (Fig. 4D).

      Finally, altered reward valuation could increase miss rates. Elevated miss rates in Fmr1<sup>-/y</sup> mice were restricted to the lowest-intensity stimulus (12 µm) under high cognitive load, demonstrating a salience- and context-specific effect inconsistent with generalized motivational or reward deficits. The Discussion has been updated to clarify these points and delimit the scope of our interpretations (lines 483-499).

      Third, the mapping of the behavioral results onto high-level cognitive constructs is tenuous and overstated. The authors' interpretations suggest that they directly tested cognitive theories such as Load Theory, Adaptive Resonance Theory, or Weak Central Coherence. However, the experiments do not manipulate or measure variables that would allow such theories to be tested. More specific comments are included below.

      This was not done intentionally. References to Load Theory were meant to provide conceptual inspiration for assessing attention in high cognitive load conditions during categorization, rather than to indicate a formal test. Moreover, we do not claim to have tested the Weak Central Coherence theory, although our results suggest reduced facilitation of across-category discrimination. Finally, we agree that citing Adaptive Resonance Theory, which is grounded in artificial neural network models, could be misleading, and we have revised the text accordingly.

      (1) The authors employ a two-choice behavioral task to assess forepaw tactile sensitivity in Fmr1 knockout mice. The data provide an interesting behavioral observation, but it is a descriptive study. Without mechanistic experiments, it is difficult to draw any conclusions, especially regarding top-down or bottom-up pathway dysfunctions. While the task design is elegant, the data remain correlational and do not advance our mechanistic understanding of Fmr1-related sensory and/or cognitive alterations.

      We thank the reviewer for this comment and agree that our study is purely behavioral and does not provide direct mechanistic evidence for top-down pathway dysfunction. In the first version of the manuscript, the term “top-down” was used at the behavioral level, referring to the influence of higher-order cognitive processes (e.g., categorization, attention, sensory and choice history integration) on tactile perception, rather than to imply specific neural circuits.

      We acknowledge that identifying the neural pathways underlying these effects would require extensive mechanistic experiments, including identifying the specific top-down pathway that modulates the influence of categorization on discrimination without directly altering categorization itself and performing pathway-specific recordings and manipulations. Such work represents a substantial mechanistic research program beyond the scope of the present study.

      To clarify that our study does not provide insights into the neural underpinnings of the studied behavioral processes, we have revised the manuscript, removing the term “top-down” or replacing it with “higher-order processes” where appropriate. We also explicitly noted that future work using neural recordings or causal manipulations will be needed to uncover the neural underpinnings of these behavioral phenomena (lines 508-510).

      (2) The conclusions hinge on speculative inferences about "reduced top-down categorization influence" or "choice consistency bias," but no neural, circuit-level, or causal manipulations (e.g., optogenetics, pharmacology, targeted lesions, modeling) are used to support these claims. Without mechanistic data, the translational impact is limited.

      We recognize that terms such as “reduced top-down categorization influence” and “choice consistency bias” are derived from behavioral observations. However, we respectfully note that these behavioral inferences are widely used in clinical studies to characterize cognitive tendencies (Soulières et al., 2007; Feigin et al., 2021) and are not inherently speculative.

      The translational impact of our work lies in the development of a robust behavioral platform that allows precise dissection of tactile perception and cognitive influences in a manner directly comparable to clinical studies. While we agree that neural, circuit-level, or causal manipulations would provide valuable mechanistic insight, the current study establishes a foundational behavioral framework that can guide and inform future investigations into the underlying neurobiological substrates.

      To ensure clarity, we have revised the manuscript throughout to explicitly indicate that all conclusions are based on behavioral measures and do not imply mechanistic evidence.

      (3) Statistical analysis:

      (a) Several central claims are based on "trends" rather than statistically significant effects (e.g., reduced task sensitivity, reduced across-category facilitation). Building major interpretive arguments on non-significant findings undermines confidence in the conclusions.

      We chose to present both statistically significant effects and trends to ensure transparency and to highlight that commonly used aggregate measures, such as d′, can sometimes obscure meaningful underlying patterns. In the text, p-values between 0.05 and 0.1 are described as trends without over-interpreting their significance. To further support interpretation, we have now computed effect sizes (Hedges’ g) for all subtle effects. In the revised manuscript, all interpretations of non-significant effects have been reworded to avoid overstatement.

      (b) The n number for both genotypes should be increased. In several experiments (e.g., Figure 1D, 2E), one animal appears to be an outlier. Considering the subtle differences between genotypes, such an outlier could affect the statistical results and subsequent interpretations.

      The number of mice used per genotype is consistent with standard practices in behavioral studies of sensory processing. To complement statistical analyses and account for small sample sizes, we have calculated effect sizes (Hedges’ g) for all subtle or trend-level effects (p ≈ 0.05–0.1), providing a measure of effect magnitude independent of sample size.
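
      As a reference for the effect-size measure mentioned above, the following minimal sketch computes Hedges' g for two independent groups, using the pooled standard deviation and the common approximate small-sample correction. The function name and the example values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def hedges_g(group_a, group_b):
    """Hedges' g: Cohen's d with an approximate small-sample bias correction."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled SD from the two unbiased sample variances.
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    cohens_d = (mean(group_a) - mean(group_b)) / sqrt(pooled_var)
    # Approximate correction factor J = 1 - 3 / (4 * (n_a + n_b) - 9).
    correction = 1.0 - 3.0 / (4.0 * (n_a + n_b) - 9.0)
    return cohens_d * correction


if __name__ == "__main__":
    # Made-up values for two small groups.
    print(hedges_g([2.0, 4.0, 6.0], [1.0, 3.0, 5.0]))
```

      The correction factor is below 1 and shrinks the estimate most for small groups, which is why g rather than Cohen's d is often reported for sample sizes typical of behavioral cohorts.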

      As the reviewer correctly noted, no animals were excluded as outliers, since observed variability reflects true biological differences rather than experimental or technical errors. In the revised manuscript, we re-examined all datasets for potential outliers, and when identified, analyses were performed both with and without the data point. Any results sensitive to single animals are explicitly reported. This procedure is now detailed in the Methods section (lines 675-679).

      (c) The large number of comparisons across salience levels, categories, and trial histories raises concern for false positives. The manuscript does not clearly state how multiple comparisons were controlled.

      We thank the reviewer for highlighting this important point. To control for false positives arising from multiple comparisons, we applied the Bonferroni correction. This information has been added to the Methods section (line 682) to ensure transparency and reproducibility of all statistical tests.
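
      The Bonferroni procedure mentioned above amounts to multiplying each p-value by the number of comparisons in the family (capped at 1) before comparing it with the significance level. A minimal sketch, with a hypothetical function name and made-up p-values:

```python
def bonferroni(p_values, alpha=0.05):
    """Return Bonferroni-adjusted p-values and the corresponding reject decisions."""
    n_tests = len(p_values)
    adjusted = [min(1.0, p * n_tests) for p in p_values]
    reject = [p_adj < alpha for p_adj in adjusted]
    return adjusted, reject


if __name__ == "__main__":
    adjusted, reject = bonferroni([0.01, 0.02, 0.5])
    print(adjusted, reject)
```

      How the family of comparisons is defined (for example, per figure panel or across all salience levels and trial histories) determines the multiplier, so that choice needs to be stated alongside the corrected values.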

      (d) The data in Figure 5, shown as separate panels per indentation value, are analyzed separately as t-tests or Mann-Whitney tests. However, individual comparisons are inappropriate for this type of data, as these are repeated stimulus applications across a given session. The data should be analyzed together and post-hoc comparisons reported. Given the very subtle difference in miss rates across control and mutant mice for 'low-salience' stimulus trials, this is unlikely to be a statistically meaningful difference when analyzed using a more appropriate test.

      We thank the reviewer for raising this point, as this was not done intentionally. In the revised manuscript, miss rates for high- and low-salience stimuli were reanalyzed using a mixed-effects linear model, which appropriately accounts for repeated measurements within sessions (Fig. 5; Results section: lines 320-340). This analysis confirmed that Fmr1<sup>-/y</sup> mice exhibit increased miss rates specifically at the 12 µm amplitude, with the effect disappearing at higher low-salience amplitudes (18 µm). Post-hoc comparisons with Bonferroni correction revealed a strong trend for increased misses at 12 µm (T-test: t = -2.8437, p = 0.058, Hedges’ g = 1.23), while no significant differences were found at other amplitudes. The Methods section has been updated to detail this statistical approach for analyzing miss rates (lines 686-687).

      (4) Emphasis on theoretical models:

      The paper leans heavily on theories such as Adaptive Resonance Theory, Load Theory of Attention, and Weak Central Coherence, but the data do not actually test these frameworks in a rigorous way. The discussion should be reframed to highlight the potential relevance of these frameworks while acknowledging that the current data do not allow them to be assessed.

      As mentioned above, our goal was not to directly test theoretical frameworks such as Adaptive Resonance Theory, Load Theory of Attention, or Weak Central Coherence, but rather to provide a context for interpreting our behavioral findings. In the revised manuscript, we have removed references to the Load Theory from the Results section and reframed the Discussion to emphasize that our results are consistent with certain predictions from these cognitive theories, without implying that the experiments directly assessed them. This clarifies that the interpretations are based on observed behavioral patterns, while still acknowledging the potential relevance of these frameworks to better understand tactile perception and cognition in autism.

      Reviewer #3 (Public review):

      Summary:

      Developing consistent and reliable biomarkers is critically important for developing new pharmacological therapies in autism spectrum disorders (ASDs). Altered sensory perception is one of the hallmarks of autism and has been recently added to DSM-5 as one of the core symptoms of autism. Touch is one of the fundamental sensory modalities, yet it is currently understudied. Furthermore, there seems to be a discrepancy between different studies from different groups focusing on tactile discrimination. It is not clear if this discrepancy can be explained by different experimental setups, inconsistent terminology, or the heterogeneity of sensory processing alterations in ASDs. The authors aim to investigate the interplay between tactile discrimination and cognitive processes during perceptual decisions. They have developed a forepaw-based 2-alternative choice task for mice and investigated tactile perception and learning in Fmr1-/y mice.

      Strengths:

      There are several strengths of this task: translational relevance to human psychophysical protocols, including controlled vibrotactile stimulation. In addition to the experimental setup, there are also several interesting findings: Fmr1-/y mice demonstrated choice consistency bias, which may result in impaired perceptual learning, and enhanced tactile discrimination in low-salience conditions, as well as attentional deficits with increased cognitive load. The increase in the error rates for low salience stimuli is interesting. These observations, together with the behavioral design, may have a promising translational potential and, if confirmed in humans, may be potentially used as biomarkers in ASD.

      We appreciate the reviewer’s positive assessment regarding our study’s translational value and the importance of our behavioral findings.

      Weaknesses:

      Some weaknesses are related to the lack of the original raster plots and density plots of licks under different conditions, learning rate vs time, and evaluation of the learning rate at different stages of learning. Overall, these data would help to answer the question of whether there are differences in learning strategies or neural circuit compensation in Fmr1-/y mice. It is also not clear if reversal learning is impaired in Fmr1-/y mice.

      We thank the reviewer for these helpful suggestions. We agree that visualizing behavioral patterns, such as raster and density plots of licks, as well as learning rate over time, provides additional insights into learning dynamics. In response, we have added these analyses to the revised manuscript (Fig. S1, Fig. S2), which illustrate both individual and group-level learning trajectories and trial-by-trial licking patterns.

      There was no assessment of reversal learning in Fmr1<sup>-/y</sup> mice in this study. While this is an interesting and important question, and is motivated by previous preclinical and clinical findings, it falls outside the scope of the current manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Main Comments

      (1) This study addresses the important question of how top-down cognitive processes affect tactile perception in autism - specifically, in the Fmr1-/y genetic mouse model of autism vs. WT controls. Using a 2AFC tactile task in behaving mice, the study investigated multiple aspects of perceptual processing, including perceptual learning, stimulus categorization and discrimination, as well as the influence of prior experience and attention. The experiments seem well performed, with interesting results. I found certain aspects of the analysis not clearly explained, which made it difficult at times to understand.

      Please see specific details in the comments below.

      (2) To measure sensitivity, the authors present many comparisons of d' - sometimes between pairs of stimuli (or sometimes even for a single stimulus level).

      (a) Firstly, the calculation of d' for a single stimulus value is unclear (because the same proportion of high/low choices for a given stimulus can result from shifts in bias/criterion).

      We agree with the reviewer that calculating d′ for a single stimulus conflates sensitivity with response bias/criterion differences. For this reason, the panels showing d′ for individual stimulus amplitudes during training (Fig. 1F and 1G in the original manuscript) have been removed from the manuscript.

      In addition, we revised our d’ (Fig. 1E) and criterion calculations (Fig. 2A), treating the high-amplitude stimuli as “signal” and the low-amplitude stimuli as “noise”, based on Signal Detection Theory. The revised formulas use correct responses to high-amplitude stimuli and incorrect responses to low-amplitude stimuli to calculate the sensitivity and bias of the mice during discrimination in the training period.

      Sensitivity (d′) is now computed as:

      d' = z(lick right|high amplitude stimulus) - z(lick right|low amplitude stimulus)

      and the criterion (c) as:

      c = −1/2 × [z(lick right|high amplitude stimulus) + z(lick right|low amplitude stimulus)]
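
      As a minimal sketch of the two formulas above, the following Python snippet computes d′ and c from the right-lick rates on high- and low-amplitude trials. The function names and the 1/(2N) clipping of extreme rates are assumptions made here for illustration; the response does not state how rates of 0 or 1 were handled.

```python
from statistics import NormalDist


def z(p: float) -> float:
    """Inverse of the standard normal CDF (the z-transform used in SDT)."""
    return NormalDist().inv_cdf(p)


def clip_rate(p: float, n_trials: int) -> float:
    """Keep a rate away from 0 and 1 so that z() stays finite.

    Assumption: a 1/(2N) correction; the source does not specify one.
    """
    lo = 1.0 / (2 * n_trials)
    return min(max(p, lo), 1.0 - lo)


def dprime_and_criterion(p_right_high, p_right_low, n_high, n_low):
    """Return (d', c) from right-lick rates on high- and low-amplitude trials."""
    hit = clip_rate(p_right_high, n_high)  # "signal" trials answered "high"
    fa = clip_rate(p_right_low, n_low)     # "noise" trials answered "high"
    d_prime = z(hit) - z(fa)
    criterion = -0.5 * (z(hit) + z(fa))
    return d_prime, criterion


# Hypothetical rates: 80% right licks on high, 20% on low amplitude trials.
d, c = dprime_and_criterion(0.80, 0.20, n_high=100, n_low=100)
```

      With these symmetric hypothetical rates the criterion is close to zero, so the whole difference between the two rates is attributed to sensitivity.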

      (b) Secondly, while calculating d' makes sense for comparing two stimulus levels (like in the training condition), in the test condition (with a spread of stimuli), this becomes a little tedious - at times difficult to follow and unclear.

      I would have thought that sensitivity (at least for overall performance) would be better compared using data from all the stimuli - e.g. either using:

      (i) the sigma of the psychometric curve (although the downside of that approach is that it ignores history effects), or

      (ii) a logistic regression for the choices, given the stimuli, where the weights assigned to the stimulus magnitude indicate sensitivity (the advantage of that approach is that history effects, like the previous trials/choices can be used as regressors in the model). Accordingly, it can simultaneously also quantify the history effects. This could even be expanded to a GLMM (mixed effects for different mice).

      We thank the reviewer for this very valuable feedback. Indeed, during the testing phase, we calculated sensitivity d’ to probe the overall categorization sensitivity (Fig. 3H).

      (i) This analysis was only complementary to the psychometric curves (fitted on the rightward lick rate for each stimulus amplitude using a general linear model – Fig. 3A). As the reviewer proposes, we had calculated the sigma of the psychometric curve (Fig. 3G, slope) to assess categorization precision. Sensitivity calculations have also now been revised using the aforementioned formula (d' = z(lick right|high amplitude stimulus) - z(lick right|low amplitude stimulus)).

      (ii) To incorporate history effects, we implemented generalized linear models (GLMs) with a binomial link function to predict high-salience licks (right-lick choices) based on the current stimulus, trial history, genotype, and their interactions. A main-effects model included current stimulus, previous stimulus, previous outcome, previous choice, and genotype, followed by interaction terms to assess genotype-specific modulation of history effects. These analyses are now presented in the new Figure 6.

      The resulting coefficients are shown in Fig. 6A. As expected, decisions were primarily driven by current stimulus amplitude (Fig. 6A, B). Both genotypes displayed a tendency to repeat previous choices (Fig. 6A, C), while previous reward outcomes did not influence current choice (Fig. 6A, D). Notably, stimulus amplitude history showed genotype-specific effects: WT mice were negatively influenced by the previous stimulus, whereas Fmr1<sup>-/y</sup> mice remained unaffected (Fig. 6A, E).

      To clearly visualize these findings, we plotted psychometric curves and marginal effects accounting for current stimulus, previous choice, previous outcome, and previous stimulus (Fig. 6B-E). These analyses are now fully integrated into the Methods (lines 688-702), Results (Fig. 6, lines 341-369), and Discussion (lines 469-479) sections of the revised manuscript.
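
      The history regressors described above can be illustrated with a short Python sketch that turns a trial sequence into one regression row per trial. The field names, the ±1 coding, and the decision to skip trials adjacent to a miss are assumptions made here for illustration, not the authors' implementation; genotype and interaction terms would be added as further columns before fitting a binomial GLM.

```python
from typing import Dict, List


def history_design_rows(trials: List[Dict]) -> List[Dict]:
    """Build one regression row per trial from the trial and its predecessor.

    Each trial dict is assumed to hold:
      "amplitude": stimulus amplitude in micrometres,
      "choice":    1 for a right lick, 0 for a left lick, None for a miss,
      "rewarded":  True if the trial was rewarded.
    Assumption: a trial is skipped if it, or the trial before it, has no
    choice; the actual analysis may treat misses differently.
    """
    rows = []
    for prev, cur in zip(trials, trials[1:]):
        if prev["choice"] is None or cur["choice"] is None:
            continue
        rows.append({
            "y_right": cur["choice"],                         # outcome to predict
            "stimulus": cur["amplitude"],                     # current stimulus
            "prev_stimulus": prev["amplitude"],               # stimulus history
            "prev_choice": 1 if prev["choice"] == 1 else -1,  # +1 right, -1 left
            "prev_outcome": 1 if prev["rewarded"] else -1,    # +1 rewarded
        })
    return rows


# Hypothetical five-trial session, including one miss.
session = [
    {"amplitude": 26, "choice": 1, "rewarded": True},
    {"amplitude": 12, "choice": 1, "rewarded": False},
    {"amplitude": 12, "choice": None, "rewarded": False},
    {"amplitude": 22, "choice": 1, "rewarded": True},
    {"amplitude": 14, "choice": 0, "rewarded": True},
]
rows = history_design_rows(session)
```

      Coding the previous choice and outcome as ±1 makes a zero weight mean "no history effect", which matches how the coefficients are read in the response.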

      (3) I find some of the terminology used confusing/misleading:

      (a) The term "Categorization thresholds" can be misleading - in psychometric curves, "thresholds" often refers to the sigma (SD) of the fitted curve used to measure sensitivity (inversely related). Here, I think that the meaning is in terms of the PSE/criterion. Perhaps the terminology can be improved to prevent confusion on this matter. E.g., I think that here the authors mean a measure of bias/criterion/PSE or similar. Correct? Not really a perceptual "threshold".

      We thank the reviewer for pointing this out. In our analysis, the term “threshold” referred to the inflection point (i.e., the midpoint parameter μ) of the fitted logistic psychometric function used to categorize high- versus low-amplitude stimuli. We termed it “threshold” in the categorization of high and low amplitude stimuli. We agree with the reviewer that we could also use the term “Categorization bias”. We originally opted to avoid this term, not to confuse the readers when referring to the criterion (signal detection theory) as “response bias”. However, seeing as the term “threshold” may be confusing as well, we adopted the term “Categorization bias” in the updated version of the manuscript (lines 282, 284, 637-638, 785, Fig. 3F).

      (b) Similarly, I think that "Categorization accuracy" can be misleading when describing the slope of the psychometric curve. Performance could have a steep slope but still be quite inaccurate (e.g., if there is a big bias). Perhaps "precision" is a better description of the slope?

      We thank the reviewer for this suggestion. The slope of the psychometric curve is often referred to as “sensitivity” in the literature (Carandini and Churchland, 2014), but in our original manuscript we used the term “accuracy” to avoid confusion with the d′ measure from signal detection theory. We have revised the manuscript and Figures with the term “precision” as the reviewer suggested (lines 282, 284, 637-638, 786, Fig. 3G).
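
      To make the distinction between the two renamed parameters concrete, here is a minimal Python sketch of a logistic psychometric function. The exact parameterization used in the manuscript is not given in this response, so the form below and all parameter values are assumptions for illustration only.

```python
import math


def psychometric(amplitude: float, mu: float, slope: float) -> float:
    """Logistic probability of a right (high-amplitude) lick.

    mu    : midpoint, the amplitude at which P(right) = 0.5
            (the "categorization bias")
    slope : steepness around the midpoint (the "categorization precision")
    """
    return 1.0 / (1.0 + math.exp(-slope * (amplitude - mu)))


# Hypothetical parameter values, chosen only for illustration.
unbiased = psychometric(19.0, mu=19.0, slope=0.5)  # evaluated at the midpoint
shifted = psychometric(19.0, mu=21.0, slope=0.5)   # midpoint moved to 21 um
steep = psychometric(20.0, mu=19.0, slope=2.0)     # precise categorization
shallow = psychometric(20.0, mu=19.0, slope=0.2)   # imprecise categorization
```

      Shifting `mu` changes which amplitude is treated as the category boundary without changing steepness, which is why a curve can be steep (precise) and still inaccurate when the midpoint is displaced.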

      Minor Comments

      (1) Abstract: "determines how autistic individuals engage" - there are other factors too. So, I think that "determines" is a little strong. Perhaps "influences" is more appropriate.

      We have incorporated the reviewer’s suggestion (line 7).

      (2) Figure 1 F, G. On the one hand, d' is defined as "sensitivity (d') in discriminating between high- and low-salience stimuli" - that seems to make sense. But then d' is also calculated and presented for each salience level on its own. How was this done? Namely, percent correct (or proportion of choices high/low salience) could be affected by criterion shifts as well as sensitivity. This makes calculating the d' for a single (low or high) salience stimulus ambiguous. So, how do these authors make this conclusion?

      We agree that calculating d′ for a single stimulus amplitude is ambiguous, because the resulting value conflates true stimulus sensitivity with shifts in response bias or criterion. Consequently, all analyses and figures reporting d′ for individual high- or low-salience stimuli (e.g., Figures 1F and 1G) have been removed from the revised manuscript.

      In the updated analyses, d′ is calculated only across high- versus low-salience stimuli, following standard Signal Detection Theory procedures, ensuring that it reflects true discriminability between the two categories (Methods, line 631; Figure 1E).

      (3) "Our results showed comparable correct choice rates in Fmr1-/y and WT mice (Fig. 1H), for both high- and low-salience stimuli (Fig. S1C-D). In contrast, Fmr1-/y mice presented a significantly higher rate of incorrect choices (Fig. 1I)." - aren't correct choices and incorrect choices complementary (i.e., 1-x) in a 2AFC? How is this possible?

      We thank the reviewer for pointing this out. Correct and incorrect choices are complementary at the single-trial level if miss trials are excluded. However, in our analyses, correct and incorrect choice rates were calculated by normalizing the number of correct or incorrect responses to the total number of trials (including misses), which breaks this complementarity and contributes to the differences observed in Fig. 1H–I. This was clarified in the Methods section (lines 616-617). Moreover, incorrect responses were less frequent than correct ones and are thought to reflect lapses, response bias, and impulsive responding rather than sensory performance, making them more sensitive to genotype-dependent differences in behavioral control. Based on this concept, we further examined whether incorrect choices were preferentially associated with specific stimulus amplitudes and assessed response bias and prior effects.
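
      The normalization described above can be sketched in a few lines of Python. The outcome labels and the two example sessions are hypothetical; the point is only that, with misses in the denominator, two sessions can share the same correct rate while differing in incorrect rate.

```python
from collections import Counter


def outcome_rates(outcomes):
    """Return correct, incorrect and miss rates over ALL trials (misses included)."""
    counts = Counter(outcomes)
    total = len(outcomes)
    return {name: counts[name] / total for name in ("correct", "incorrect", "miss")}


# Hypothetical sessions with the same number of correct trials.
session_a = ["correct"] * 70 + ["incorrect"] * 10 + ["miss"] * 20
session_b = ["correct"] * 70 + ["incorrect"] * 25 + ["miss"] * 5
rates_a = outcome_rates(session_a)
rates_b = outcome_rates(session_b)
```

      Here both sessions have a correct rate of 0.7, yet the incorrect rate is 0.10 in one and 0.25 in the other, because the remaining trials are split differently between errors and misses.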

      (4) The conclusion that "they showed a strong trend toward reduced sensitivity for low-salience stimuli (Fig. 1G)" has a confound - it could be that there was a criterion shift (rather than differences in sensitivity)?

      We agree with the reviewer that the previously reported trend in sensitivity for low-salience stimuli could reflect a criterion shift rather than true differences in sensory sensitivity. Because sensitivity estimates for individual stimulus amplitudes are not well-defined in a 2AFC framework, we have removed the sensitivity calculations for high- and low-salience stimuli considered independently. Instead, we now present salience-specific differences using correct and incorrect response rates for each stimulus amplitude, which more directly capture performance differences without assuming changes in sensory sensitivity (Fig. 1G-I, S1E-F).

      (5) Figure 3D, E - I stumbled over this in comparison to Figure 3B, C. That is because (a) In D and E, the authors compare right-lick responses (reporting high salience) to stimuli of 12 μm and 14 μm amplitude (Figure 3D) and low-salience lick rates for the same (Figure 3E). I would have thought that these approaches are simply complementary (1-x) - see related minor question above/below. So, what is the advantage of presenting them both?

      We presented both panels to clarify the source of the observed differences in performance. Specifically, showing right-lick responses (reporting high-salience choices) alongside low salience lick rates allows us to distinguish whether reduced high-salience reporting arises from an actual shift in choice (e.g., increased leftward licking) versus an increase in miss trials at the lowest amplitude (12 µm). By presenting both, we can demonstrate that the effect is primarily driven by an increase in leftward choices rather than by missed responses, providing a more precise interpretation of behavioral changes. The complementary analysis for leftward choices has now been moved to the supplemental material (Fig. S5A) and the reason for this analysis has been clarified in the Results (lines 275-276).

      (b) In B and C, the authors compare two differences in stimulus magnitude (2 and 4 μm), but in Figure 3D and E, only one difference (2 μm) from two perspectives. I was expecting a comparison with stimuli differing by 4 μm in amplitude (comparable to the high stimulus comparison of 26 μm vs. 22 μm stimuli).

      We have indeed analyzed the 12 μm versus 16 μm stimulus pair, which corresponds to a 4 μm difference and is reliably discriminated by both genotypes. In the original manuscript, we did not include this comparison because of the differences already seen at a 2 μm amplitude difference. Based on the reviewer’s suggestion, we have now included the 12 μm vs. 16 μm comparison in the revised manuscript (Results, lines 270-272; Fig. 3E) to provide a complementary perspective consistent with the high-salience comparisons (26 μm vs. 22 μm).

      (c) "Sensitivity d' for high- and low-salience stimuli was calculated based on the Correct and Incorrect choice rate for high- and low-salience stimuli respectively." How were trials for which the animal did not respond taken into account? Were these part of the denominator? Or were these excluded when calculating proportions? (related to the Q regarding Figure 3 D,E above).

      Indeed, the Miss trials were part of the denominator. This is now clarified in the Methods section (line 631).

      (d) "c = d'(high)- d'(low)." - I did not understand this fully. There were several high and several low stimuli - so how were these calculated? Pooled for high and pooled for low? Per stimulus difference?

      This was indeed calculated for pooled high and low amplitudes during testing. In the revised manuscript, criterion c has been recalculated based on the average correct high rate (for stimuli of 20-26 µm amplitude) and average incorrect low rate (for stimuli of 12-18 µm amplitude), using the same formula as in the analysis of the training dataset:

      c = −1/2 × [z(lick right|high amplitude stimulus) + z(lick right|low amplitude stimulus)]

      Pooling across amplitudes allows us to obtain a single summary measure of response bias toward the right lickport, independent of stimulus discriminability. This approach is consistent with standard signal detection theory practices when multiple stimulus levels are present.

      If the inter-trial interval is 5-10s, how is a 5s timeout a punishment?

      The 5 s timeout serves as a punishment by temporarily delaying access to the next trial and potential reward, thereby reducing the overall reward rate. Even though the inter-trial interval (ITI) varies between 5 and 10 s, the timeout increases the effective delay before the next opportunity to earn a reward, discouraging incorrect responses. This is consistent with standard operant conditioning procedures, where brief timeouts act as negative consequences without being overly severe. Across most trials, the timeout effectively reduces expected reward rate, though its impact is minimal when the ITI is already long.

      Reviewer #2 (Recommendations for the authors):

      Task-related questions:

      (1) What evidence is there that the 40 Hz, 12 μm stimulus is "low salience" while the 40 Hz, 26 μm stimulus is "high salience"? This seems like an arbitrary distinction without showing sensitivity curves across a group of animals. Better definitions of the stimuli and the actual forces applied are necessary.

      We thank the reviewer for this comment. Based on our previous work (Semelidou et al., bioRxiv; Accepted in Advanced Science), both the 40 Hz, 12 µm and 40 Hz, 26 µm stimuli are clearly suprathreshold. In the present study, however, stimulus salience is defined in a relative and operational manner within this suprathreshold range.

      Specifically, analysis of miss trials (Fig. S3E) shows that the 40 Hz, 12 μm stimulus consistently elicited a higher proportion of missed responses compared to the 40 Hz, 26 μm stimulus across animals, indicating lower behavioral performance for the lower-amplitude stimulus. We therefore refer to the 12 μm stimulus as “low salience” and the 26 μm stimulus as “high salience” to denote relative differences in perceptual strength and attentional engagement within the suprathreshold range, rather than differences in detectability or absolute sensory sensitivity. This definition has been clarified in the Methods (lines 583-587) and Results sections (lines 115-119; lines 225-227).

      (2) Sensitivity curves/detection thresholds for each mouse should be included in the study.

      We thank the reviewer for this suggestion. Sensitivity curves and detection thresholds for low-amplitude and low-frequency vibrotactile forepaw stimulation have been systematically characterized in our previous study (Semelidou et al., bioRxiv; accepted in Advanced Science). In that work, we demonstrated that stimuli with similar amplitudes and even lower frequency (10 Hz) than those used in the present study are reliably detectable by mice, confirming that both the 40 Hz, 12 µm and 40 Hz, 26 µm stimuli fall within the suprathreshold range.

      Because the goal of the present study was not to determine absolute detection thresholds but rather to examine discrimination and categorization performance within a suprathreshold range, we did not re-establish full psychometric detection curves for each mouse.

      We have clarified this rationale in the revised manuscript (Results, lines 108-113; Methods, lines 577-579).

      (3) What force is being applied during stimulus presentations? 12 or 26 μm does not provide enough information about the stimuli applied. What are the physical parameters of the indenter? What material, what tip size?

      Vibrotactile stimuli were delivered to the forepaw via a piezoelectric actuator. A 12.7 mm stainless steel post (ThorLabs) was mounted on the actuator vertically and a 0.6 mm stainless steel rod (ThorLabs) was clamped horizontally onto this post. The horizontal rod served as the contact bar on which the animal rested its right forepaw.

      Stimuli were sinusoidal vibrations at 40 Hz with peak-to-peak displacements of 12 μm (low salience) or 26 μm (high salience). The actuator displacement was calibrated prior to experiments to ensure accurate vibration amplitudes.

      Animals were positioned in the setup to ensure stable and consistent forepaw contact with the rod delivering the vibration. Pilot experiments with an extra sensor to monitor forepaw placement confirmed that the mice did not remove their forepaws from the bar before stimulus delivery. All this information is now added in the Methods section (lines 552-555, 580-582).
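
      The stimulus described above (a 40 Hz sinusoid specified by its peak-to-peak displacement) can be written out as a short Python sketch. The function name and the 16 kHz sampling rate are assumptions made here for illustration; the response does not describe how the drive waveform was generated.

```python
import math


def vibration_displacement(t_s: float, peak_to_peak_um: float,
                           freq_hz: float = 40.0) -> float:
    """Displacement (um) at time t_s for a sinusoid of given peak-to-peak size."""
    amplitude_um = peak_to_peak_um / 2.0  # peak-to-peak = 2 x sine amplitude
    return amplitude_um * math.sin(2.0 * math.pi * freq_hz * t_s)


# Assumption: 16 kHz sampling, so one 40 Hz cycle is exactly 400 samples.
SAMPLE_RATE_HZ = 16_000
one_cycle = [vibration_displacement(i / SAMPLE_RATE_HZ, 26.0) for i in range(400)]
span_um = max(one_cycle) - min(one_cycle)  # recovers the peak-to-peak value
```

      Note that a 26 µm peak-to-peak stimulus corresponds to a sine amplitude of 13 µm, which is worth keeping in mind when comparing values across studies that report amplitude differently.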

      (4) Only one vibration stimulus was used (40 Hz) - this preferentially activates specific subsets of low-threshold mechanoreceptors and not others. A range of vibrotactile stimuli (with varying frequencies) would be more useful. From this limited range of stimuli, it is difficult to assess whether the findings would extrapolate to other types of stimuli.

      We agree that using a single vibration frequency limits the generalization of our findings across the full range of mechanoreceptor subtypes and vibrotactile stimulus conditions. In the present study, we deliberately focused on amplitude discrimination within the flutter range (<50 Hz), as this frequency preferentially activates subsets of low-threshold mechanoreceptors relevant for flutter perception and is commonly used in clinical studies of tactile amplitude discrimination (Puts et al., 2014, 2017; Asaridou et al., 2022). By holding frequency constant and varying only amplitude, we were able to isolate amplitude-dependent perceptual and decision-making processes while minimizing frequency-dependent variability and to facilitate direct translational comparisons with human studies using similar flutter stimuli.

      We acknowledge, however, that extending the paradigm to additional, higher frequencies would help determine whether the observed effects generalize across mechanoreceptor channels. We have now added this point as a future direction in the Discussion section (lines 510-514).

      (5) The methods indicate that during the implementation of the water-restriction protocol, mice had access to a solid water supplement in their home cage. How did they control for how much water supplement was consumed by each mouse before the testing sessions?

      We thank the reviewer for raising this point. The solid water supplement was divided into premeasured individual portions, and each mouse received its allotted amount only after the daily training/testing session. Daily body weight measurements were used to monitor hydration and ensure that all animals maintained stable body weight. If necessary, supplemental water was adjusted to maintain animals within the approved weight range. This procedure is now described in the Methods section (lines 567-571).

      (6) A control version of the test, perhaps using a different sensory modality, would be useful for making conclusions.

      We agree that testing other sensory modalities would provide a useful control for assessing the generalizability of the observed effects. However, in the present study, we intentionally focused on the tactile modality, as touch has been shown to play a critical role in autism across sexes and predict other core behavioral symptoms. This makes touch particularly relevant for investigating translational mechanisms in this model.

      By specifically targeting tactile perception, we aimed to investigate the link between sensory discrimination, decision-making, and cognitive modulation within a modality that is strongly implicated in autism. Previous studies in autistic individuals have demonstrated similar interactions between cognitive processes and perceptual decision-making in the visual domain, suggesting that such effects may not be modality-specific. Nevertheless, extending this paradigm to additional sensory systems would be valuable to directly test whether comparable cognitive influences on perception generalize across modalities. We have now incorporated this perspective as a future direction in the Discussion section (lines 514-518).

      Reviewer #3 (Recommendations for the authors):

      There are several questions:

      (1) It is important to show stimulus intensity-response curves representing tactile responses for both WT and Fmr1-/y mice.

      We thank the reviewer for this important comment. Detection sensitivity curves for low-amplitude and low-frequency vibrotactile stimulation of the forepaw have been characterized in detail in our previous study (Semelidou et al., bioRxiv; now accepted in Advanced Science). In that work, we showed that stimuli at or above 8 µm amplitude and 10 Hz frequency are reliably detected by both WT and Fmr1<sup>-/y</sup> mice.

      Based on these findings, the current study employed vibrotactile stimuli at a higher frequency (40 Hz) and amplitudes of 12 µm and above, ensuring that all stimuli were well within the suprathreshold range for both genotypes. This experimental choice was made to specifically probe discrimination, categorization, and decision-making processes, rather than basic sensory detection. As a result, the behavioral effects reported here cannot be attributed to differences in stimulus detectability.

      We have clarified this rationale in the revised manuscript to make explicit that the absence of full intensity-response curves in the current study reflects a deliberate focus on suprathreshold perceptual and cognitive processes rather than sensory threshold differences (Results, lines 108-113; Methods, lines 577-579).

      (2) There is no difference in the time it takes to learn the task between WT and Fmr1-/y mice. But how does the learning rate curve look? Is there a difference in the slope between WT and Fmr1-/y early vs late into learning?

      We thank the reviewer for this suggestion. To directly address whether learning dynamics differed between genotypes, we analyzed learning curves across training.

      We first computed the correct choice rate per day for each animal (Fig. S2A) and fit a mixed-effects model including training day, genotype, and their interaction. This analysis revealed no genotype differences in baseline performance or learning rate, with a minimal Genotype × Day interaction (Fig. S2A-top, Fig. S2C).

      We additionally computed the slope of the learning curve for each individual, which showed no difference across genotypes (Fig. S2B). Within-animal day-to-day performance variability was also comparable across groups (Fig. S2A-bottom, S2D).

      These analyses indicate that WT and Fmr1<sup>-/y</sup> mice exhibit similar learning trajectories during training. The learning curves are now included in Figure S2, described in the Results (lines 140–151) and detailed in the Methods (lines 644-658).
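
      A per-animal learning slope of the kind mentioned above could be estimated as follows. This is a minimal sketch that assumes an ordinary least-squares fit of daily correct rate against training day; the response does not state which estimator was used, and the example values are hypothetical.

```python
def learning_slope(correct_rate_by_day):
    """Ordinary least-squares slope of daily correct rate against training day.

    Days are numbered 1, 2, ... in the order given.
    """
    n = len(correct_rate_by_day)
    if n < 2:
        raise ValueError("need at least two training days to estimate a slope")
    days = range(1, n + 1)
    mean_day = sum(days) / n
    mean_rate = sum(correct_rate_by_day) / n
    sxy = sum((d - mean_day) * (r - mean_rate)
              for d, r in zip(days, correct_rate_by_day))
    sxx = sum((d - mean_day) ** 2 for d in days)
    return sxy / sxx


# Hypothetical animals, values chosen only for illustration.
slopes = {
    "mouse_1": learning_slope([0.50, 0.55, 0.60, 0.65, 0.70]),
    "mouse_2": learning_slope([0.50, 0.50, 0.60, 0.75, 0.80]),
}
```

      The resulting slopes (change in correct rate per training day) can then be compared between genotypes with whichever group-level test suits the design.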

      (3) It would be useful to see raster plots of licks for different trials and the corresponding lick density plots for early vs late trials.

      We thank the reviewer for this suggestion. To visualize trial-by-trial behavior, we included example lick traces from an early 100-trial session and a late 100-trial session, alongside the corresponding raster plots of licks (Fig. S1A–B).

      (4) Consistent with the first question, examples of intermediate learning stages would help gain more insight into how both WT and Fmr1-/y mice learn.

      In line with the reviewer’s suggestion, we examined whether WT and Fmr1<sup>-/y</sup> mice showed different performance during intermediate stages of learning. To this end, we defined the middle three days of the training period of each animal as the intermediate learning phase. We compared both the mean correct-choice rate and individual learning slopes across this interval. Statistical analyses revealed no significant genotype differences in either measure, indicating comparable performance and learning dynamics during the intermediate phase of training (lines 152-156).

      (5) How does the learning rate change with increased cognitive load for both WT and Fmr1-/y mice?

      We thank the reviewer for this question. While our experimental design did not include a manipulation of cognitive load during the learning phase itself, we assessed whether increased cognitive load affected performance by analyzing behavior on the first day of testing, when animals were required to categorize and discriminate among a larger set of stimuli compared to training.

      Using performance on the training stimuli during this first testing session as a proxy, we found no significant difference between WT and Fmr1<sup>-/y</sup> mice in correct choice rate (Author response image 1). This indicates that increased cognitive load did not differentially affect performance on familiar stimuli across genotypes at this stage.

      Because this analysis does not reflect learning rate per se, but rather performance under increased task demands after learning had already occurred, we did not incorporate it into the main Results section. Instead, it is presented here to directly address the reviewer’s question.

      Author response image 1.

      Correct choice rate for the 12 µm and 26 µm stimuli during the first day of testing when the cognitive load is high.

      (6) How does the learning rate change if the sensory stimuli are more challenging for both WT and Fmr1-/y to detect?

      We thank the reviewer for this question. In the present study, animals were deliberately trained using well-separated, suprathreshold low- and high-salience stimuli to ensure reliable stimulus detection and to avoid confounding learning rate with perceptual difficulty or discrimination limits.

      A recent study (Heimburg et al., 2025) has shown that learning is slower when the difference between the two training stimuli is reduced. Based on these results, we would expect that decreasing the separation between low- and high-salience stimuli would similarly increase training duration for both WT and Fmr1<sup>-/y</sup> mice, since our results do not indicate any discrimination or categorization deficits in the mouse model of autism. However, directly testing how stimulus difficulty modulates learning rate would require a dedicated manipulation of stimulus spacing during training and was beyond the scope of the current study.

      Editor's note:

      Should you choose to revise your manuscript, if you have not already done so, please include full statistical reporting including exact p-values wherever possible alongside the summary statistics (test statistic and df) and, where appropriate, 95% confidence intervals.

      These should be reported for all key questions and not only when the p-value is less than 0.05 in the main manuscript.

    1. Author response:

      The following is the authors’ response to the previous reviews

      We sincerely thank the editors and reviewers for their careful evaluation and constructive feedback, which has helped us substantially improve the clarity and rigor of the manuscript. In the revised version, we have clarified the interpretation of the electrophysiological experiments, corrected the labeling of recorded signals as light evoked EPSCs, and removed statements implying differences in absolute synaptic strength. To address concerns about the interpretation of Fig. 7, we have added quantitative analyses of EPSC kinetics and revised the text to focus on synaptic response dynamics rather than amplitude differences. We have also removed analyses that could cause confusion and expanded the Methods section to provide additional experimental details, including the optogenetic stimulation configuration in slice recordings. Together, these revisions strengthen the interpretation of the electrophysiological results and improve the overall clarity and transparency of the study.

      Public Reviews:

      Reviewer #1 (Public review):

      Weakness:

      The authors focused primarily on female mice limiting generalizability and leaving the readers with questions about the impact of sex differences on their results. The tube test is used as a manipulation of the "emotional state" in several of the experiments. While the authors show the changes to corticosterone levels as a consequence of win/loss in the tube test, stronger claims might be made with comparisons to other gold standard stressors such as forced social defeat or social isolation.

      We thank the reviewer for these thoughtful comments.

      First, we acknowledge that the present study was conducted primarily in female mice, which may limit the generalizability of the findings. Female mice were selected to reduce variability associated with male aggression and housing-related stress, which can complicate behavioral assays such as social interaction and dominance testing. While focusing on a single sex allowed us to maintain experimental consistency across multiple behavioral paradigms, we agree that sex differences could influence the neural circuits underlying emotional and social behaviors. We have now added a statement in the Discussion acknowledging this limitation and noting that future studies will be necessary to determine whether similar circuit mechanisms operate in male mice.

      Second, we appreciate the reviewer’s suggestion regarding the use of other stress paradigms. In this study, the tube test was used primarily to establish social dominance relationships between paired mice rather than as a classical stress-induction paradigm. Nevertheless, repeated win/loss outcomes were associated with measurable physiological changes: corticosterone levels in brain lysates of loser mice increased significantly after repeated tube-test competitions, consistent with stress-related processes. These findings suggest that repeated social competition in this context can induce transient physiological and behavioral changes associated with social hierarchy. We agree that paradigms such as chronic social defeat stress or social isolation represent well-established models for inducing sustained stress responses. We have therefore revised the manuscript to clarify that the tube test in our study serves as a model of social competition and rank establishment rather than a canonical stress paradigm, and we highlight the comparison with other stress models as an important direction for future work.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      In relation to figure 7. Their response does not really clarify the issue:

      (a) They argue that they are not making claims about synapse strength. However they still state "In the mPFC→NAc pathway, blue light stimulation evoked larger excitatory postsynaptic currents (EPSCs) in winner mice compared to losers (Fig. 7E). This suggests stronger synaptic transmission in winners' mPFC→NAc circuits. " They don't show this, they just show that normalized to some arbitrary value the responses of the earlier durations is higher or lower, which is very hard to interpret.

      They argue in the rebuttal that the aim of this is to highlight response kinetics, but these are not quantified or discussed in any way.

      We thank the reviewer for this helpful comment. We agree that the normalized input output curves shown in the original submission did not allow conclusions about absolute synaptic strength, and we also acknowledge that response kinetics were not previously quantified despite being mentioned in the rebuttal.

      To address both concerns, we have revised Fig. 7 and added quantitative analyses of EPSC kinetics. Specifically, we measured the rise and decay slopes of light-evoked EPSCs recorded in postsynaptic neurons within the NAc and BLA of winner and loser mice. In the mPFC→BLA pathway, both the EPSC rise and decay slopes were significantly increased in loser mice compared with winners (rise slope: p = 0.0138; decay slope: p = 0.0392), suggesting enhanced synaptic responsiveness and faster charge transfer kinetics in BLA neurons of losers. In contrast, in the mPFC→NAc pathway, neither the EPSC rise slope nor the decay slope differed significantly between groups.

      These results provide a quantitative characterization of synaptic response dynamics and reveal pathway-specific differences in synaptic properties associated with social hierarchy. Importantly, this analysis does not rely on amplitude normalization and therefore allows a more interpretable comparison of synaptic response profiles between groups. We have updated Fig. 7 and the corresponding Results section to include these analyses. 
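
      As an illustration of what a rise/decay slope measurement can look like, the following is a minimal sketch only. The response does not state how the slopes were computed, so the 10-90% window, the function name `epsc_slopes`, and the assumption of an inward (negative-going) current sampled in pA at a fixed interval are all hypothetical choices made for this example.

```python
import math

def epsc_slopes(trace, dt_ms, lo=0.1, hi=0.9):
    """Return (rise_slope, decay_slope) in pA/ms for one inward EPSC.

    Assumptions (not from the manuscript): the first sample is baseline,
    the current is negative-going, and slopes are taken between the
    lo and hi fractions of peak amplitude (10-90% by default).
    """
    baseline = trace[0]
    amp = [baseline - v for v in trace]          # flip so the EPSC is positive
    peak_i = max(range(len(amp)), key=amp.__getitem__)
    peak = amp[peak_i]
    if peak <= 0:
        raise ValueError("no inward current detected")

    # Rising phase: first samples reaching lo*peak and hi*peak before the peak.
    r_lo = next(i for i in range(peak_i + 1) if amp[i] >= lo * peak)
    r_hi = next(i for i in range(r_lo, peak_i + 1) if amp[i] >= hi * peak)
    # Decay phase: first samples falling to hi*peak and lo*peak after the peak.
    d_hi = next(i for i in range(peak_i, len(amp)) if amp[i] <= hi * peak)
    d_lo = next(i for i in range(d_hi, len(amp)) if amp[i] <= lo * peak)
    if r_hi == r_lo or d_lo == d_hi:
        raise ValueError("sampling too coarse to estimate slopes")

    rise = (amp[r_hi] - amp[r_lo]) / ((r_hi - r_lo) * dt_ms)
    decay = (amp[d_hi] - amp[d_lo]) / ((d_lo - d_hi) * dt_ms)
    return rise, decay

# Synthetic bi-exponential EPSC (illustrative parameters, not recorded data).
tau_rise, tau_decay, dt = 0.5, 5.0, 0.05        # ms
trace = [-100.0 * (math.exp(-k * dt / tau_decay) - math.exp(-k * dt / tau_rise))
         for k in range(800)]
rise_slope, decay_slope = epsc_slopes(trace, dt)
```

      Both slopes are returned as positive magnitudes. Whether slopes are reported in absolute units or relative to peak amplitude is left open here, since the response does not specify it.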

      (b) They still haven't labeled the responses correctly. The responses in figure 7 are not "voltage spikes" but light-evoked EPSCs.

      We apologize for the incorrect terminology. All instances of “voltage spikes” have been corrected to “light-evoked EPSCs” in the figure legends and text.

      (c) They argue that responses do not vary across experiments/slices because they use a constant viral injection volume targeted to the same co-ordinates and identical placement of the fiber and recording location. While I am sure they aim to do that, it is almost impossible to ensure that this was identical across experiments and that the degree of opsin labelling in their slices was the same (see, for example, Mao et al., 2011, PMID: 21982373, who pioneered the approach of using within-slice comparisons to account for this). If I understand their explanation of their strategy correctly, the authors' own rebuttal highlights this point: they seem to have needed to vary the LED duration by an order of magnitude (1-10 ms) to ensure reliable responses across experiments, even for the same projection.

      We thank the reviewer for raising this important point. We agree that it is not possible to ensure identical opsin expression or light delivery across experiments. We have revised the manuscript to explicitly acknowledge this limitation and clarify that normalization was used to mitigate, but not eliminate, inter-slice variability. We now avoid any interpretation that relies on absolute response amplitude across animals.

      Regarding “LED duration variability (1-10 ms)”, we agree that the need to adjust stimulation duration reflects variability in effective opsin activation across slices. We now clarify this point in the Methods and Results and emphasize that stimulation parameters were optimized to reliably evoke responses rather than to equate absolute light input across experiments.

      Importantly, our main conclusions do not rely on absolute EPSC amplitude comparisons. Instead, they are supported by analyses that are less sensitive to variability in opsin expression or light delivery, including EPSC kinetics (rise and decay slopes), paired-pulse ratio measurements, and AMPA/NMDA ratios. These complementary measures provide a more robust characterization of synaptic properties across conditions.

      (d) Similarly in Fig S6 it is unclear what they are showing. The Y axis is still labeled in pA, yet they claim this is an action potential? Also this analysis is rather irrelevant to the data shown in figure 7 as the pathway between PFC and BLA/NAc is not preserved.

      We thank the reviewer for pointing out the lack of clarity in Fig. S6. We agree that it does not directly inform the interpretation of Fig. 7 and may cause confusion. To improve the clarity and focus of the manuscript, we have therefore removed Fig. S6 from the revised manuscript. The removal of this supplementary figure does not affect the main conclusions of the study.

      (e) It now also seems that these experiments were performed by placing a fiber optic into the slice to elicit responses. This should be detailed in the methods.

      We thank the reviewer for noting this omission. We have added a detailed description of fiber-optic placement within the slice for optogenetic stimulation to the Methods section. Specifically, we clarify that blue light was delivered through a fiber optic positioned above the recorded slice to activate ChR2-expressing mPFC axon terminals within the BLA or NAc. The placement of the fiber relative to the recorded neurons and the stimulation parameters are now explicitly described in the revised Methods section.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript examines the evolution of molluscan shells using single-cell analyses of the adult mantle of Crassostrea gigas and compares these data with previous datasets from embryonic and larval stages of this species and other spiralians. The authors provide support for a scenario in which secretory cells are broadly conserved across spiralians, and the incorporation of lineage-restricted genes contributes to the evolution of molluscan shells.

      Strengths:

      High-quality datasets for mantle tissue in Crassostrea gigas and thorough comparisons with existing datasets for this species and other spiralians. Balanced discussion.

      Weaknesses:

      No major weaknesses. The analyses follow fairly standard approaches in the field that have been previously applied and developed in similar systems.

      We thank the reviewer for the positive evaluation of our work. We are encouraged that the reviewer finds our conclusions balanced and the analyses appropriate. Although no major concerns were raised, we will incorporate clarifications and improvements prompted by the other reviewers to further strengthen the manuscript.

      Reviewer #2 (Public review):

      Weaknesses:

      (1) Validation of cell types

      Cell type identities are not convincingly validated. Although the authors cite previous studies (l. 92), the referenced marker genes are largely not used, and the cited works do not provide sufficient spatial validation. Without in situ data, the inferred locations of cell types (e.g. Figure 2A) are not supported. Spatial validation of marker genes (e.g. via HCR) is essential, particularly for a study addressing shell field evolution. In addition, the gastrula dataset is not meaningfully analyzed, and its inclusion remains unclear.

      We thank the reviewer for this important comment regarding cell type validation. In the previous version of the manuscript, we provided a detailed compilation of referenced marker genes from previous studies in Supplementary File 2. It is possible that, due to an incorrect or unclear reference in the main text, this information was not readily accessible. We will correct and clarify these citations in the revised manuscript to ensure that these resources are clearly presented.

      We agree that spatial validation would provide important support for cell type identities. In the revised version, we will strengthen this aspect by selecting more specific marker genes for each SEC cluster and performing fluorescence in situ hybridisation (FISH) to validate their spatial localization.

      Regarding the gastrula dataset, our original intention was to investigate the developmental transition of shell gland-related cell populations from gastrula to trochophore stages. However, following the reviewer’s suggestion and considering the limited interpretability of the gastrula dataset in its current form, we agree that its inclusion does not substantially strengthen the study. We therefore plan to remove the gastrula dataset from the revised manuscript, and instead focus on the trochophore stage as a representative developmental stage for larval shell formation, enabling a clearer comparison between larval and adult shell-forming cell populations. We note that this change does not affect the main conclusions of the study. In addition, we will curate a refined set of experimentally supported marker genes, and provide an updated supplementary table summarizing detailed information, including cell type annotations, literature sources, and experimental validation methods.

      (2) Robustness of cell type classification 

      Several proposed cell types may not represent distinct entities (not individuated) but rather reflect over-clustering. Marker genes are often not specific and are shared across clusters (e.g. Sec1/Sec2), making it difficult to distinguish cell types reliably.

      In the revised manuscript, we will refine marker gene selection by prioritizing genes with higher specificity and stronger discriminatory power to improve the robustness of cell type identification. To further support cell identity assignment, we will select representative marker genes for SEC clusters and perform FISH to validate their spatial localization. These revisions will lead to a more robust and conservative interpretation of cell populations.

      (3) Comparative analysis of secretory cells

      The comparative framework is not sufficiently supported. Secretory cells are highly diverse, and without proper validation, their comparison across taxa is not meaningful. The transcription factor analysis is limited, as only a few genes are shared and many are inconsistently expressed (Figure 3E). The conclusion of a conserved regulatory program across spiralians is therefore overstated.

      We agree that secretory cell types are highly diverse across spiralians and that cross-species comparisons require careful interpretation. In the revised manuscript, we will adopt a more cautious framework, highlighting partial conservation of the regulatory program alongside functional convergence in secretory processes. We will also strengthen the comparative framework by integrating functional annotations, which may provide complementary support beyond individual gene overlaps. Importantly, we will improve the reliability of oyster SEC annotations through FISH-based spatial validation, thereby increasing confidence in cross-species comparisons. These revisions will provide a more balanced and biologically grounded interpretation of secretory cell evolution across spiralians.

      (4) Clarity and interpretation of results

      Results are at times difficult to follow and remain superficial. Marker genes are insufficiently annotated (especially for Crassostrea), and comparisons across taxa lack functional interpretation. Unvalidated and heterogeneous cell types are grouped together, and transcriptional similarities are overinterpreted. Overall, key conclusions are not adequately supported by the presented data.

      In the revised manuscript, we will re-evaluate marker gene annotations to ensure support from existing experimental evidence. For SEC populations, we will validate representative markers using FISH. We will also expand the functional annotation of marker genes and strengthen cross-species comparisons. In addition, we will substantially revise the Results and Discussion sections to improve clarity and depth, reduce overinterpretation of transcriptional similarities, and ensure that all conclusions are more tightly aligned with the strength of the supporting evidence.

      Reviewer #3 (Public review):

      Weaknesses:

      (1) My main concern is that the authors rely primarily on previous studies for the experimental and functional characterisation of the identified cell types. The cited papers (Piovani, 2023 and de la Forest Divonne et al., 2025) deal with distinct stages or tissues (larvae and hemocytes, respectively), which limits their direct relevance. The authors also cite other papers for in situ expression data; it would be helpful to summarise somewhere (e.g. in a table) which genes have been experimentally characterised and what their expression domains are, or alternatively to provide HCR or in situ staining on the mantle. For instance, what is the rationale for the claim that proliferative cells give rise to the mantle? The trajectory inference approach used (Monocle) would likely yield a similar result regardless of the reference cell type, so additional justification is needed.

      We agree that our reliance on previous studies for functional and experimental characterization requires clearer justification and integration. In the revised manuscript, we will compile a new supplementary table summarizing marker genes with available experimental validation, including their associated cell types, literature sources, and experimental methods. For SEC populations, we will select representative marker genes and perform FISH to validate their spatial localization, thereby providing independent support for cell identity.

      Regarding trajectory inference, we agree that methods such as Monocle are sensitive to assumptions. We will clarify the rationale for root cell selection, test alternative root assignments to assess robustness, and revise our interpretation to avoid strong lineage claims. Rather than stating that proliferative cells give rise to mantle cells, we will describe the observed trajectory as being consistent with a potential developmental relationship, while acknowledging that this does not constitute direct evidence of lineage progression.

      (2) More broadly, I find that the functional properties of the identified cell types and their relationship to the expressed genes deserve more detailed discussion. For example, at L100, several genes are mentioned, but their functional roles are not discussed. Similarly, the basis for annotating the proliferative cells is not explained. How was gene orthology assessed? Throughout the manuscript, vertebrate-style gene names are used without explicitly establishing orthology status in oyster, which should be addressed.

      We thank the reviewer for this important comment. In the revised manuscript, we will expand the functional interpretation of key genes by incorporating available literature and, where possible, functional annotations. We will also clarify the basis for cell type annotation and explicitly describe the criteria used, including for proliferative cell populations (e.g. cell proliferation-associated markers).

      Regarding gene annotation, gene names in oyster were assigned based on sequence similarity searches against the eggNOG database. In the revised manuscript, we will provide a comprehensive supplementary table linking gene IDs to their annotations, along with the corresponding database sources. In addition, we will clearly describe how orthology relationships were assessed, including the methods and criteria used (e.g. sequence similarity searches and orthology databases). Throughout the revised manuscript, we will ensure that the use of vertebrate-style gene names is accompanied by appropriate annotation information and does not imply unsupported one-to-one orthology relationships.

      (3) More detail is needed on the methods and quality control for the single-cell data. The authors should clarify that the platform used (BMKMANU) is a droplet-based technology comparable in principle to Drop-seq. BMKMANU is not widely used in the field. How does it compare to 10x Genomics in terms of sensitivity and cell recovery? The authors appear to use the 10x Chromium cellranger pipeline for data analysis, which suggests compatibility, but this should be stated explicitly. Additionally, no information is provided on the number of sequencing runs or biological replicates, nor on how reproducible the results are across samples.

      In the revised manuscript, we will expand the Methods section to provide a clearer and more detailed description of the experimental and analytical procedures. BMKMANU is a droplet-based single-cell RNA-seq platform, conceptually comparable to Drop-seq and similar in principle to 10x Chromium. We will also explicitly state that the data generated are compatible with the Cell Ranger pipeline, which was used for downstream processing and analysis. Although BMKMANU is less widely used than 10x Genomics platforms, it has been successfully applied in several recent studies (e.g. Li et al., 2024: https://doi.org/10.1007/s11427-023-2548-3; Li et al., 2025: https://doi.org/10.1038/s41559-025-02642-6; Wei et al., 2024: https://doi.org/10.1038/s41467-024-46780-0), demonstrating its applicability for single-cell transcriptomic analyses across different biological systems. Regarding platform performance, based on technical information provided by the manufacturer, BMKMANU shows comparable sensitivity and cell capture efficiency to 10x Genomics platforms (http://www.biomarker.com.cn/zhizao/dg1000danxibao). In this study, the mantle sample was obtained from a single individual oyster and processed in a single sequencing run, without batch effects introduced by multiple runs. We will clearly state this in the revised manuscript. In addition, we will provide detailed quality control metrics, including the number of cells retained, gene detection rates, and filtering criteria.

      (4) A limitation of the phylostratigraphic analysis is that it is restricted to mantle tissue, making it difficult to place the results in a whole-organism context. How do the age profiles of mantle-expressed genes compare to those of more evolutionarily conserved tissues, such as the nervous system? I appreciate the methodological and experimental constraints, but this is a genuine limitation of the study. The authors could at least discuss it explicitly, and ideally consider generating a broader single-cell atlas of the oyster to provide this comparative baseline.

      We agree that restricting the phylostratigraphic analysis to mantle tissue represents a limitation when attempting to place our findings in a whole-organism evolutionary context. In the revised manuscript, we will explicitly acknowledge this limitation and expand the Discussion to address how gene age profiles in mantle tissue may differ from those in more evolutionarily conserved tissues. In particular, we will clarify that the enrichment of younger, lineage-specific genes observed in shell-forming cells may reflect tissue-specific functional specialization, and therefore should not be directly generalized to other cell types.

      We acknowledge that a broader single-cell atlas spanning multiple tissues would provide an important comparative baseline for interpreting gene age patterns across the organism. While generating such a dataset is beyond the scope of the present study, we will highlight this as an important direction for future research.

      (5) Have the authors considered the potential importance of lineage-specific gene duplication? It is well established that spiralians, including oysters, have undergone extensive lineage-specific duplication of transcription factors such as homeobox genes, and many structural shell-associated proteins may similarly have been duplicated. This could be relevant to interpreting both the phylostratigraphic results and the expansion of secretory gene families.

      We thank the reviewer for this insightful suggestion. Lineage-specific gene duplication is likely to play an important role in shaping both transcription factor repertoires and shell-associated gene families in spiralians, including oysters. In the revised manuscript, we will incorporate a discussion of lineage-specific duplication, particularly in relation to transcription factors and biomineralization-related proteins. We will also, where feasible, explore its potential contribution to our observations and highlight how such duplications may drive the expansion and diversification of secretory gene families.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      This paper reports a previously unrecognized mechanism by which platelets compact fibrin fibers during clot retraction. Rather than simply pulling on fibers, the authors propose that platelets generate swirling motions that wind and loop fibrin into dense structures.

      While the results are intriguing, the underlying physical mechanism remains unexplained. In particular, it is unclear how platelets generate swirling motion capable of inducing fibrin coiling, especially when suspended in 3d fibrin mesh. This raises concerns about the conclusions.

      We outlined our hypothesis for the physical mechanism by which platelets may generate the swirling motion in lines 200-215 and in the Discussion under "ideas and speculations". We will nevertheless provide a more detailed explanation of this process in the revised version.

      The reviewer is right: it is difficult to imagine how platelets in a 3D fibrin mesh can accumulate fibers at the base of their extensions to form a cage-like fiber organisation around the platelet center. We therefore developed the 2D fiber-retraction assay, which we believe provides important insights into the coiled fiber accumulations above spread platelets in 2D and also offers a framework for interpreting similar processes that may occur within a 3D clot. In the revised version, we will place greater emphasis on clarifying and strengthening the comparison between the potential mechanistic aspects of the 2D and 3D assays, in order to better support our proposed model.

      Also, does fibrin have inherent chirality or structural asymmetry that could promote coiling independently of platelet activity?

      Yes, double-stranded fibrin protofibrils have a helical twist [1]. Furthermore, a clot formed in the absence of platelets and other cellular components shows intrinsic tensile forces [2]. However, we show that inhibition of actomyosin activity prevents fibrin fiber accumulation in the 2D fiber-retraction assay, providing evidence that platelet activity is necessary to observe the coiled fibers above spread platelets.

      Furthermore, platelet retraction typically involves platelet aggregation rather than isolated cells, and it is unclear how fibrin coiling would proceed in clustered platelets.

      Under the in vitro fiber retraction conditions used in our study (constrained or unconstrained clots, or the 2D assay), individual platelets are homogeneously distributed within the forming clot or on the coverslip. There are therefore no large platelet aggregates or clusters under our experimental conditions, and the results can only demonstrate how individual platelets act on the fibrin fibers. We will emphasize this point in the revised version.

      Reviewer #2 (Public review):

      Summary:

      Grichine et al. investigate platelet-mediated fibrin compaction using human donor platelets and propose a novel mechanistic model in which platelets generate contractile forces and wind fibrin fibers into compact coiled structures. Using a combination of 2D spread assays, 3D clot imaging via expansion microscopy, live-cell imaging, and computational modelling, the authors present evidence of cage-like fibrin architectures, coiled-fibre morphologies, and platelet-centred "rosette" structures present during fibre compaction. They further suggest that actomyosin-driven cytoskeletal dynamics, potentially involving rotational or swirling motion, underlie this proposed winding mechanism, analogous to DNA looping and compaction. The study addresses an important and longstanding question in thrombosis and hemostasis and offers a conceptually novel perspective on clot compaction.

      Strengths:

      The integration of multiple imaging modalities is a notable strength of this paper. In particular, the 2D fiber-retraction assay provides a useful model for understanding the spatio-temporal dynamics of platelet-mediated fibrin compaction, which can be applied to other systems and may yield detailed mechanistic insights into biological processes. The live-imaging approaches are particularly well executed and offer valuable dynamic insight.

      Weaknesses:

      The primary weakness of this paper lies in its descriptive nature and its reliance on correlative rather than causal evidence. Several interpretations are not uniquely supported by the data presented. For example, the categorisation of fibrin accumulation in 2D assays as "fiber winding" and "fibre compaction" remains descriptive without establishing winding as a mechanism.

      In the revised version, we will avoid the terms fiber winding/compaction when introducing the 2D fiber-retraction assay (figure 3) to better align with the level of evidence, since coiled fibers cannot be distinguished in this figure. However, coiled fibers above spread platelets are clearly visible in figures 4 and 8, and dynamic fiber rotations or winding are observed in figure 12 and video 9. These observations will be presented more cautiously, as indicative rather than definitive evidence of a winding mechanism.

      Alternative mechanisms, such as circular bundling, stacked fibers under tension, or fibrin crosslinking-induced aggregation, are neither excluded nor investigated.

      Fibrin fiber bundling and the staggering or crosslinking of protofilaments do not require platelet activity, as described previously [2, 3]. Since we observed a clear difference between +/- blebbistatin conditions in the 2D fiber-retraction assay, the fiber compaction we observe depends on platelet activity. Consequently, we consider these alternative mechanisms unlikely based on our data. This will be stated explicitly in the results section.

      Although the authors present compelling live imaging, establishing winding as a dynamic phenotype would require quantitative analyses, such as measuring angular velocities and coiling rates.

      We will incorporate quantitative measurements to complement the observations obtained from live imaging. It is important to note, however, that angular velocities and coiling rates are likely influenced by the number of fiber–fiber contacts present at the time coiling occurs. Specifically, an increased number of contacts is expected to elevate tension within the network, thereby modulating the forces generated by platelets and, consequently, affecting both velocity and coiling dynamics.
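
      One possible way to quantify an angular velocity from live imaging is sketched below. This is an assumption-laden illustration, not the authors' planned analysis: it supposes that a point on a fiber has been tracked frame by frame, that the platelet center is known, and that frames are evenly spaced. The function name `angular_velocity` and all parameters are hypothetical.

```python
import math

def angular_velocity(xs, ys, cx, cy, dt_s):
    """Mean angular velocity (rad/s) of a tracked point about (cx, cy).

    xs, ys: tracked coordinates per frame; dt_s: frame interval in seconds.
    Positive values mean counter-clockwise rotation in a y-up coordinate
    system. Assumes less than half a turn occurs between consecutive frames.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two matched (x, y) samples")
    angles = [math.atan2(y - cy, x - cx) for x, y in zip(xs, ys)]
    total = 0.0
    for prev, cur in zip(angles, angles[1:]):
        step = cur - prev
        # Unwrap: map each frame-to-frame step into [-pi, pi].
        step -= 2 * math.pi * round(step / (2 * math.pi))
        total += step
    return total / (dt_s * (len(angles) - 1))

# Synthetic track: a point circling (3, -1) at 0.2 rad/s, one frame per second.
omega = 0.2
xs = [3 + 2 * math.cos(omega * k) for k in range(50)]
ys = [-1 + 2 * math.sin(omega * k) for k in range(50)]
mean_omega = angular_velocity(xs, ys, 3, -1, 1.0)
```

      Dividing the accumulated angle by 2π would give the number of turns over the acquisition, which could serve as a simple coiling count under the same assumptions.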

      The use of a second fluorophore-labelled fibrin population could further strengthen evidence for rotational dynamics.

      These live videos are quite difficult to acquire for the following reasons:

      Small platelet size

      Heterogeneity of platelets within the population (10 d half-life, old platelets may not be able to compact fibers efficiently).

      The speed of the process and the time needed to adjust parameters for image acquisition necessitate an arbitrary choice of the acquisition window, and only one acquisition (90 min) per sample preparation is possible.

      Furthermore, the laser induced illumination can perturb the observed processes. We therefore use high-spatial-resolution 3D confocal time-lapse imaging, performed in photon-counting mode with very low laser excitation.

      For these reasons, the use of additional markers would be technically challenging and could perturb the delicate equilibrium and dynamics of the process under investigation.

      Similarly, the inference of rotational contractility or actomyosin "swirling", based on chiral actin organisation and blebbistatin treatment, is not sufficiently supported to conclude that platelets actively wind or loop fibrin fibers.

      Importantly, in the 2D fiber-retraction assay, we do not propose that the rotational actomyosin activity leads to a contractility of the platelets which would allow fiber retraction. Rather, we suggest that cytoskeletal actomyosin swirling (as demonstrated for nucleated cells by Bershadsky's team) can induce rotational dragging of extracellular bound fibrin fibers around the pseudonucleus of spread platelets thereby promoting accumulation of fibrin fibers. Consistent with this interpretation, inhibition of myosin by blebbistatin prevents the accumulation of fibrin fibers above spread platelets in the 2D fiber-retraction assay (Fig. 3).

      The mathematical model, while complementary and well-constructed, relies on multiple assumptions and lacks predictive validation.

      We thank the reviewer for this insightful comment and acknowledge that the proposed model relies on several important assumptions. In our view, the most significant assumption is that integrin molecules undergo rotational downstream motion as a consequence of their coupling to the swirling cytoskeleton. To assess the necessity and impact of these assumptions, we will perform additional calculations and include the results in the Supplementary Information. These analyses will also provide further validation of the proposed model and underlying mechanism. At the same time, it is important to emphasize that the primary purpose of the model was to examine whether the hypothetical swirling dynamics of the cytoskeleton, together with the associated receptors, could in principle reproduce the experimentally observed fibrin organization.

      Appraisal:

      While the authors successfully document intriguing fibrin architectures and provide a compelling descriptive framework, they do not fully demonstrate a mechanistic model of active fibrin winding by platelets. The conclusions regarding platelet-driven winding and rotational dynamics are not sufficiently supported by direct or quantitative evidence. To substantiate these claims, the study would benefit from experiments that directly link platelet dynamics to fibrin organisation, including coordinated measurements of platelet motion and fibre rearrangement. As it stands, the results are suggestive but do not definitively support the proposed mechanism.

      Discussion and Impact:

      Despite these limitations, the study addresses an important question in thrombosis and hemostasis and introduces a potentially impactful conceptual framework for understanding clot compaction. The imaging approaches and datasets presented will be valuable to the community, particularly for researchers interested in platelet mechanics and fibrin organisation. However, the overall impact will depend on whether the proposed mechanism can be more rigorously validated. In its current form, the study presents an interesting and thought-provoking model, but would benefit from either stronger experimental support for the proposed mechanisms or a more cautious interpretation of the findings.

      We agree that the proposed mechanism requires further validation. In the revised manuscript, we will therefore present a more cautious and explicitly hypothesis-driven interpretation of the mechanism. We hope that the publication of our observations will be of interest to researchers in the field of thrombosis and clot mechanics who possess the specialized tools and expertise necessary to rigorously evaluate and either substantiate or refute the proposed mechanistic model.

      Reviewer #3 (Public review):

      Summary:

      This work aims to understand the mechanisms that platelets use to interact with and compact fibrin fibers during clot formation. This is an important process during wound healing, and recent work has demonstrated that platelets play a critical role in generating the force required to drive the accumulation of fibrin. The authors argue that current models are insufficient to account for the observed reduction in clot volume and propose that platelets actively 'wind up' these fibers by undergoing myosin-dependent rotation. While interesting, the experiments performed by the authors do not directly test this mechanism, and further evidence is required to support their claims.

      Weaknesses:

      (1) The motivation to switch from the system used in Figures 1 and 2 to the '2D fiber-retraction assay' is not clear. While the authors state that this system has 'reduced complexity', the differences between these assays appear to disrupt the 'cage-like' organization of fibrin around platelets shown in Figures 1 and 2 (compare images in Figure 2 with those in Figure 4). An in-depth comparison of two methods is needed to support the conclusions from the 2D system.

      We agree that the cage-like fibrin organization around platelets is disrupted in the 2D fiber-retraction assay when platelets are completely spread on the coverslip before they have encountered fibrin fibers (Fig. 4). However, some platelets form the same number of extensions as platelets in a 3D clot (Fig. 9 A, B) and are not completely spread on the glass surface. For these platelets a cage-like fibrin organisation is retained under the 2D conditions (Fig. 5 and 6). However, the fiber density at the base of the bulbs is higher in the 2D assay than under the constrained 3D clot retraction conditions (Fig. 1C and Fig. 2), probably because in the 2D condition the fibers are less constrained and readily available for compaction.

      Furthermore, the change in plasma volume (Figure 2 vs Figure 7) should also be tested - the authors state that this increases fibrin fiber formation, but this is not quantified or demonstrated in the figures. Notably, this appears to change the morphology of the fibrin fibers shown (comparing Figure 2 and Figure 7).

      We thank the reviewer for raising this point. We would like to clarify that Figure 2 and Figure 7 correspond to two distinct experimental setups: the constrained clot retraction assay (Figure 2) and the 2D fiber-retraction assay (Figure 7). As such, they are not directly comparable. We understand, however, that the reviewer is likely referring to the apparent differences between Figures 3–6 (lower plasma volume, higher fiber density) and Figures 7–8 (higher plasma volume, lower apparent fiber density).

      The reduced number of visible fibers in the latter condition is not solely a consequence of plasma volume per se, but rather results from the formation of a labile fibrin gel at higher plasma concentrations, which is lost during the fixation and aspiration steps. This effect was initially observed across samples from two donors with differing plasma fibrinogen levels. In one case, an unusually low fibrinogen concentration allowed the addition of higher plasma volumes without inducing gel formation. In contrast, in the other sample, a more typical fibrinogen level resulted in gel formation under the same conditions.

      Importantly, we performed all experiments using matched donor plasma and platelets. As a result, the precise fibrinogen concentration could not be determined prior to experimentation. Nonetheless, post hoc measurements confirmed that fibrinogen levels in most donor samples fell within the normal physiological range, which allowed us to always use the same plasma volumes for low and high plasma concentrations (4 µl/ml PBS and 7 µl/ml PBS, respectively) except for one donor as mentioned above.

      (2) It is unclear how the classification of platelets as 'fiber-winding' versus 'fiber compaction' differs in Figure 2. The criteria used for these classifications should be stated. Further, it seems premature to characterize fibers as wound without having established this earlier in the manuscript.

      The reviewer is probably referring to Figure 3, and we agree: it is premature to mention fiber winding at this stage of the results section (see our response to reviewer #2). In the revised version, we will therefore present the criteria used to classify the different degrees of fiber accumulation without referring to fiber winding.

      (3) Is the 'gearwheel' different from the 'cage' of fibrin fibers? They appear similar, but it is difficult to distinguish between them with only qualitative descriptions of these phenotypes.

      The "gearwheel" is observed for completely spread platelets in the 2D fiber-retraction assay and a figure illustrating our hypothetical speculations to compare the 2D gearwheel with the 3D clot situation is presented in the discussion under the "Ideas and Speculations" paragraph (Fig. 13). We will give a more comprehensive explanation in the revised version.

      (4) The quantification of platelet extensions in Figure 9 is confusing. While those in 9A are clear, those in 9B are not. For instance, what is the difference between #7 and #8 in the middle panel of 9B? It does not seem like #8 is labeling an extension.

      For the platelet shown in the middle panel of Figure 9B, the extensions cannot be clearly distinguished in the MIP (Maximum Intensity Projection) image because extension #8 is positioned above extension #7 and is therefore superimposed in the projection. However, the two extensions can be differentiated when examining the 3D image stack (Video 4). As indicated in the figure legend, the number of extensions was determined manually by scrolling through the z-stack image sequence. In the revised version, we will also define the abbreviation “MIP” as Maximum Intensity Projection.

      (5) It is unclear what the modeling accomplishes, as there is no comparison between the results of these simulations and their experiments.

      We thank the reviewer for this valuable concern. We chose not to combine the experimental fibrin organization and the modeling results within the same figure panel, as the resulting image would be too complex and difficult to interpret. However, we will provide a more detailed comparison between the experimental observations and the modeling results in the Results section. It is also important to emphasize that the comparison between the model and the experimental data was intended to be primarily qualitative rather than quantitative.

      (6) The data presented in Figure 12 provides the most direct support for their mechanism, but falls short of directly testing their claims. These experiments should be repeated to include blebbistatin to test the contribution of myosin and include quantitative rather than qualitative comparisons of these experiments.

      As mentioned above, these live videos are quite tricky to acquire for the following reasons:

      Small platelet size

      Heterogeneity of platelets within the population (10 d half-life, old platelets may not be able to compact fibers efficiently).

      The speed of the process and the time required to optimize imaging parameters necessitate the selection of an arbitrary acquisition window. Consequently, only a single acquisition of approximately 90 min can be performed per sample preparation, with no guarantee that relevant platelet-fibrin interactions will be captured in the acquisition window.

      Furthermore, after blood donation, the first sample is usually ready for acquisition around 3 pm, and each acquisition takes 90 min. At least 10 successful acquisitions per condition would be required to ensure statistical robustness, but a maximum of 4 can be acquired per donor because platelet samples start to deteriorate within twelve hours of blood donation.

      Taken together, the intrinsic heterogeneity of the platelet population, the low likelihood of capturing informative events, and the limited availability of suitable imaging resources at our institute render a robust and quantitative comparison between conditions with and without blebbistatin extremely challenging, if not impractical, within a reasonable timeframe.

    1. Author response:

      eLife Assessment

      This valuable study reports that the ALDH-abundant cells display stem cell properties and may play a key role in the endometrial epithelial development in the mouse. The data supporting the main conclusion are solid, although further improvements are needed to strengthen the conclusions. This work will be of great interest to reproductive biologists and biomedical researchers working on women's reproductive health.

      We thank the reviewers and editor for their critical reading and assessment of our manuscript. We carefully considered each of the points raised by the reviewers. In this document and in the edited manuscript and figures, we have carefully addressed each of the comments and requested modifications. In light of these changes, we expect that you will find that the manuscript has improved.

      We indicate our responses to the reviewers below in blue font and highlight the changes in the manuscript using the line numbers corresponding to the tracked version of the revised document.

      Public Reviews:

      Reviewer #1 (Public review):

      The manuscript by Tang et al. characterizes the expression dynamics and functional roles of aldehyde dehydrogenase 1 activity in uterine physiology. Using a combination of in vivo lineage tracing and cell ablation coupled with organoid culture, the authors propose that Aldh1a1 lineage-marked cells contribute to uterine gland development and epithelial regeneration. The descriptive data will be of interest to reproductive biologists and clinicians and will build on established hypotheses in the field. The manuscript is well written and scientifically sound; however, several experimental limitations and interpretation caveats should be addressed.

      We thank the reviewer for their comments and expert assessment of our paper.

      (1) The methods surrounding the passage number and duration of culture following sorting prior to transcriptomic profiling should be clarified in the figure legends. Related to this, the representative images in Figures 1D and 1E do not appear consistent with the quantification presented in Figures 1F-H and should be reconciled.

      Thanks for this comment. We have now clarified this in the Figure 1 legend as follows:

      Lines 1026-1029: “Organoid formation assay performed immediately after luminal epithelial cell isolation and by plating equal numbers of viable ALDH<sup>LO</sup> (D) and ALDH<sup>HI</sup> (E) epithelial cells. ALDH<sup>LO</sup> and ALDH<sup>HI</sup> organoids were cultured for two weeks and passaged once prior to the organoid formation assays and transcriptomic analyses.”

      Regarding the second comment, we recognize that the images we showed may not have been the most representative of our quantification. As such, we replaced them with the organoid images below so that they better reflect the quantification outlined in Figure 1F-H.

      (2) The conclusion that ALDH1A1+ cells are enriched in populations with stem cell characteristics relies primarily on transcriptomic analysis. Protein-level co-localization should be performed to strengthen this claim.

      We thank the reviewer for this comment. Unfortunately, the antibodies for many of these stem cell markers (such as LGR5, AXIN2, and SUSD2) are not well-suited for immunostaining. Others that have been proposed in human and are amenable to immunostaining are not suitable markers for mouse endometrial stem cells (such as CDH2). We hope that by showing that ALDH1A1 is expressed in patterns that are similar to the previously published stem cell markers LGR5 and AXIN2 (i.e., throughout the epithelium in the developing uterus and subsequently enriched in the tips of the endometrial glands of adult mice), along with transcriptomic studies, we can demonstrate its utility as a marker for mouse endometrial stem cells.

      (3) The overlap of 19 genes between the data set here and AXIN2 HI data is presented as evidence of shared stemness identity, but no statistical assessment of this overlap is provided. A hypergeometric test should be performed to determine whether this overlap is greater than expected by chance.

      Thank you for this suggestion. We have performed a hypergeometric test and determined that the reported shared genes between the two datasets are greater than is expected by chance. We have updated the results section to state the following:

      Lines 133-141: "We determined that the overlap between ALDH<sup>HI</sup> and Axin2<sup>+</sup> stemness marker genes was significantly greater than expected by chance for both upregulated (21/346 genes, 1.81-fold enrichment, p = 0.0067) and downregulated (19/674 genes, 1.67-fold enrichment, p = 0.021) gene sets (hypergeometric test, universe = 23,182 genes)."
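      As an illustration of how an overlap statistic of this kind can be computed, the following is a minimal sketch of a one-sided hypergeometric test, not the authors' actual analysis code. The function name and its arguments are hypothetical. In the terms used above, `universe` would correspond to the 23,182 background genes, `set_a` to the size of one gene list (for example 346), and `overlap` to the shared genes (for example 21); the size of the second gene list (`set_b`) is not stated in this response and would have to be supplied.

```python
from math import comb


def hypergeom_overlap(universe, set_a, set_b, overlap):
    """Return (fold_enrichment, p_value) for the overlap of two gene sets.

    universe: number of genes in the background
    set_a, set_b: sizes of the two gene lists drawn from that background
    overlap: number of genes shared by the two lists
    """
    # Overlap expected by chance if both lists were random draws.
    expected = set_a * set_b / universe
    fold = overlap / expected

    # One-sided tail probability P(X >= overlap), computed with exact integers.
    upper = min(set_a, set_b)
    tail = sum(
        comb(set_a, k) * comb(universe - set_a, set_b - k)
        for k in range(overlap, upper + 1)
    )
    p_value = tail / comb(universe, set_b)
    return fold, p_value


# Toy example with made-up numbers (not data from the study):
fold, p = hypergeom_overlap(universe=20, set_a=5, set_b=5, overlap=3)
print(f"fold enrichment = {fold:.2f}, p = {p:.4f}")
```

      Exact integer binomials avoid any dependency beyond the standard library. For routine use, a statistics package that provides the hypergeometric survival function would be a reasonable alternative.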

      (4) The impact of tamoxifen injection on Aldh1a1 expression should be characterized in the neonatal uterus, as tamoxifen itself has known estrogenic activity that could confound interpretation of the lineage tracing results at early postnatal timepoints.

      Although we took measures to control for this possibility by using multiple time-points and models to trace the impact of Aldh1a1<sup>+</sup> cells in development and adulthood, we recognize the importance of this comment and acknowledge that this is a limitation in the design of our study. We have included the following text to the Discussion acknowledging this point:

      Lines 434-442: “Given the well-documented impacts of tamoxifen for lineage tracing studies, it is imperative to use doses of tamoxifen that will minimize estrogenic impacts and avoid off-target effects (Rios et al., 2016). This often requires administration at doses that will achieve maximal recombination of the desired gene, while ensuring that the potential deleterious impacts of tamoxifen are minimized (Chen et al., 2023; Pimeisl et al., 2013). The cre/ERT2 tamoxifen inducible model is widely used to study uterine biology where it serves as a useful tool to interrogate the spatiotemporal impact of key genes, either through inactivation or for lineage tracing. Despite its widely documented utility across many tissue types and developmental timepoints, the use of tamoxifen and its impacts on the endometrium remain a limitation of our study, which we tried to address by implementing multiple timepoints, doses, and orthogonal assays in our experimental design.”

      (4b) Related to this, while low-dose tamoxifen is shown to label individual cells within 24 hours of injection, the translation dynamics of the label following Cre-mediated recombination can require up to 72 hours. The presence of only a few labeled clones at PND8 but multiple separate clones per cross-section at later timepoints warrants discussion and may reflect labeling kinetics rather than clonal expansion.

      The reviewer raises an important point. We agree that the up-to-72-hour translation kinetics following Cre-mediated recombination are a legitimate consideration for interpreting our data, and we have added the following text to the Discussion acknowledging this point:

      Lines 418-423: We hypothesized that the singly labeled cells observed from one day tracing experiments expanded in a clonal fashion during the various timepoints we measured. We note that the translation kinetics of the labeled cells following cre-mediated recombination may contribute to the limited labeling observed at PND8/PND15 and there is a potential for delayed labeling of cells between 24 and 72 hours of tamoxifen administration. However, the continuous increase in labeled cells at the subsequent timepoints favors our interpretation of clonal expansion as the primary explanation.

      (5) It would strengthen the in vivo ablation data to validate the degree of cell death following diphtheria toxin treatment directly. It is possible that a general decrease in cell number rather than specific loss of a stem cell population is responsible for the observed reduction in gland number and FOXA2 expression (Tongtong et al 2017).

      We agree that this is an important control to incorporate into our experimental design. To rule out this possibility, we performed immunohistochemistry of cleaved caspase 3 in the uterine tissues of DTR<sup>flox/flox</sup> and DTR<sup>flox/flox</sup>;Aldh1a1<sup>cre/ERT2</sup> mice 4 days after administration of diphtheria toxin. The results indicate similar levels of cleaved caspase 3 detection in both genotypes, suggesting that the decrease in FOXA2+ cells is not due to non-specific cell death, but rather the result of the loss of ALDH1A1<sup>+</sup> cells. These data and the following text have been added to the manuscript:

      Lines 321-325: “We determined that the decrease in FOXA2<sup>+</sup> cells in the experimental mice was not the result of non-specific DT-mediated cell death, as similar levels of cleaved caspase 3-positive cells were detected in the DT-treated control ROSA26<sup>DTR/DTR</sup> and ROSA26<sup>DTR/DTR</sup>;Aldh1a1<sup>cre/ERT2/+</sup> mice 4 days post-diphtheria toxin administration (Figure S3G-H’).”

      (6) The lineage tracing data in the postpartum endometrium demonstrate that Aldh1a1-marked cells are present during regeneration, but it remains unclear whether these cells are preferentially activated or expanded in response to tissue injury. Coupling these studies with diphtheria toxin-mediated ablation during active regeneration would more directly test the proposed regenerative role of this population.

      This is a great point and one that we would be very interested in pursuing in follow-up studies. Regrettably, due to the long generation time and experimental procedures associated with these proposed studies, we are not able to include these experiments in the current manuscript. Thus, we have changed our wording and conclusions throughout the manuscript to be less definitive in terms of the role of Aldh1a1 in regeneration, since this will be the focus of future studies.

      The contribution of stromal Aldh1a1 lineage-positive cells is underexplored in the discussion, given the lineage tracing data showing stromal labeling across multiple timepoints and its potential relevance to mesenchymal-to-epithelial transition.

      Thank you for the suggestion. We have now expanded this section in the Discussion to include the following:

      Lines 497-505: We also found ALDH1A1<sup>+</sup> stromal cells were more prevalent when tracing began in adult mice. Other studies have shown that mesenchymal cells contribute to endometrial regeneration in the postpartum phase or after induced menses through a process of MET (Cousins et al., 2014; Kirkwood et al., 2022; Li et al., 2025). Similarly, lineage tracing studies have shown that MET is an active process and contributes to epithelial cell regeneration in the post-partum phase (Huang et al., 2012; Patterson et al., 2013). Although this is an area of active investigation in the field, with some contradicting reports, it is plausible to hypothesize that endometrial tissue has the capacity to undergo wound-healing and regeneration via several mechanisms (Ang et al., 2023; Ghosh et al., 2020). The process of MET in wound healing is widely documented in other organs, such as the kidney, liver and lung, where MET is associated with depletion of the resident epithelial cell pool (Bi et al., 2012; Niayesh-Mehr et al., 2024; Zeisberg et al., 2005).

      Finally, the word 'control' may overstate the functional evidence presented. 'Contribute' may be more accurate given the partial and context-dependent nature of the phenotypes observed.

      We agree with the reviewer’s point that control may overstate the evidence that we provide in the manuscript. To reflect this, we have edited the manuscript title and text to address this suggestion.

      Reviewer #2 (Public review):

      Tang et al. investigated the contribution of Aldh1a1+ cells, as putative stem/progenitor cells, to endometrial development, maintenance during the estrous cycle, and postpartum repair in mouse models. They employed in vitro organoid formation and in vivo lineage tracing models coupled with RNA-seq to test the stem-ness of Aldh1a1+ cells. They found that mouse endometrial cells with high ALDH activity (using the ALDEFLUOR assay) formed more and larger organoids and were enriched for stem/progenitor cell gene signatures. Similar results were shown using endometrial cells from a human patient sample. Epithelial ALDH1A1 expression was shown to be hormonally regulated, becoming more restricted to the glands, a putative epithelial stem cell niche, under estrogen stimulation. Using lineage-tracing initiated postnatally/prepubertally, Aldh1a1+ epithelial cells were shown to expand, contributing to both the luminal and glandular epithelium into adulthood, whereas adult initiation of labeling showed expansion of stromal Aldh1a1+ cells but not epithelial. Postnatal ablation of single-labeled Aldh1a1+ epithelial cells resulted in impaired gland development. Lastly, Aldh1a1-lineage traced cells (adult labeled) were present during postpartum endometrial repair as were epithelial/mesenchymal transitional cells.

      This study addresses an important area of research in the field of endometrial stem/progenitor cell biology. The authors are commended for their use of multiple complementary methods, including lineage tracing, DTR-mediated cell ablation, organoid assays, and RNA-seq in mouse and human models to assess the stem-like nature of Aldh1a1+ cells. The data support the stem/progenitor phenotype of Aldh1a1+ epithelial cells during endometrial development; however, there are noted discrepancies between organoid formation assays and lineage tracing experiments regarding the stemness of Aldh1a1+ epithelial cells in adults. Specifically, organoids were generated from adult cells and demonstrated in vitro stem cell activity; however, in vivo lineage-tracing of adult cells either during the estrous cycle or postpartum repair does not show expansion of Aldh1a1+ cells, suggesting they do not have stem/progenitor activity. Additionally, the stem-ness of epithelial vs stromal Aldh1a1+ cells is confounded in the study because epithelial cells were not purified for organoid experiments, epithelial cells were not exclusively lineage-traced as stromal cells were also labeled, and mesenchymal-epithelial transition was suggested to occur during postpartum repair. The following specific comments are presented to detail these concerns:

      We thank the reviewer for their critical reading of our manuscript and constructive comments.

      (1) The statement in the brief summary, "...critical for lifelong endometrial regeneration," is not supported by the data provided.

      We have edited the brief summary to exclude this statement; it now reads as follows:

      Lines 4-5: “We uncover ALDH1A1<sup>+</sup> cells as a group of hormone sensitive stem cells contributing to endometrial development and regeneration.”

      (2) ALDH1A1 is not restricted to the endometrial epithelium, and epithelial cells were not purified by flow cytometry for experiments in Figure 1. Figure 2 clearly shows the presence of mesenchymal cells, even using the described method for enriching for epithelial cells. Therefore, contaminating mesenchymal cells with high ALDH activity may confound the experimental results in Figure 1, either through promoting epithelial cell growth or through MET. The authors should provide clear evidence of epithelial purity in organoid experiments or that mesenchymal cells are not contained in the ALDHhi population. These comments also apply to the human organoid experiments in Figure 7.

      We thank the reviewer for raising this important point. Our group has been using the enzymatic method to routinely separate epithelial from stromal cell populations from the mouse uterus (see references dating back to 2015, PMID 26721398, 28324064, 34099644). In these experiments we typically obtain >98% purity in the epithelial and stromal cell compartments, respectively. We can directly observe this purity in the immunofluorescence images shown below, where mouse endometrial epithelial cells and stromal cells were enzymatically separated and immunostained with E-cadherin and vimentin antibodies to detect epithelial and mesenchymal cells in both cell preparations. The images show very few contaminating epithelial and stromal cells in either cell preparation. We have observed similar results when preparing epithelial and stromal cell preparation from the human endometrium, where the epithelial cell organoids display high purity with ~100% epithelial cell expression when we perform immunostaining.

      Author response image 1.

      Purity of mouse endometrial epithelial cells obtained via enzymatic and mechanical dissociation. A-B) Shows the epithelial (A) and stromal (B) cells plated on glass coverslips and immunostained with an epithelial cell marker (cytokeratin 8, red), a stromal cell marker (vimentin, green), and DAPI.

      Author response image 2.

      Human endometrial epithelial organoids were fixed and immunostained with cytokeratin 8 (green) and DAPI. The images are typical for our epithelial cell cultures and demonstrate that all epithelial cells are CK8-positive.

      (3) Lines 186-187: Susd2 was increased in EpSC clusters, yet this is a mesenchymal stem/progenitor marker in humans. The authors should discuss the implications of this.

      We thank the reviewer for highlighting this. We have now included the following in our Discussion to address this point:

      Lines 528-533: Clustering with this population of EpSCs were Susd2<sup>+</sup> cells, which are well-characterized mesenchymal progenitors that are enriched in the perivascular regions of the human endometrium (Darzi et al., 2016; Khanmohammadi et al., 2021). The presence of Susd2<sup>+</sup> cells, while unexpected in an epithelial stem cell niche, could indicate the presence of a transitional mesenchymal or perivascular cell that is differentiating into epithelium. Evidence for both mesenchymal and Nestin2<sup>+</sup> pericytes has recently been described in the mouse endometrial epithelium (Kirkwood et al., 2022; Li et al., 2025).

      (4) In Figure 5, RFP+ epithelial cells should be quantified as in previous figures to substantiate the statement in lines 279-280, "At PPD5, the proportion of RFP+ epithelial cells had expanded relative to PPD1 and PPD3 (Figure 5E-E')." Especially because in the low mag images (C-E), RFP+ epithelial cells appear to be most abundant at PPD1 and decrease at PPD3 and PPD5, suggesting that they may not be involved in endometrial regeneration/repair (contradicting the interpretation in line 285). Further, if there is in fact a decrease over postpartum repair, then regeneration should be removed from the title of the manuscript. RFP+ stromal cells should also be quantified.

      We appreciate this reviewer’s comment and agree that, as stated, the conclusion is not fully supported by the data. To address this comment, we have edited the Results to state the findings clearly and remove any ambiguity:

      As requested, we quantified the number of RFP+ stromal and epithelial cells during the postpartum phase and noted that RFP+ cells were prominent in the stromal compartment of the endometrium. While RFP+ epithelial cells were also observed during these timepoints, they were less abundant than RFP+ stromal cells. Because the number of RFP+ cells did not significantly change over the postpartum phases in either the stromal or the epithelial compartment, we have modified our conclusion to state that ALDH1A1+ cells are transiently detected in the regenerating endometrium.

      Results:

      Lines 286-295: “By analyzing the uterine tissues near the placental detachment site, we observed that RFP positive cells were prominent in the endometrial stromal cells that were adjacent to the luminal epithelium (Figure 5C-C’, green arrows). RFP<sup>+</sup> cells were also observed in the stromal cells near the placental detachment sites at PPD1 and PPD3 (Figure 5D’-E’, red & blue arrows) and in limited luminal epithelial cells (Figure 5D”,E”). Quantification of RFP<sup>+</sup> cells throughout these postpartum phases indicated that ALDH1A1<sup>+</sup> stromal cells were more frequent (360 ± 103, PPD1, n=3; 217 ± 107, PPD3, n=3; 254 ± 32, PPD5, n=4) than ALDH1A1<sup>+</sup> epithelial cells in the regenerating endometrium (65 ± 65, PPD1, n=3; 20 ± 10, PPD3, n=3; 114.25 ± 39, PPD5, n=4) (Figure S4).”

      Discussion:

      Lines 513-521: “We also noted that a majority of ALDH1A1<sup>+</sup> cells were localized to the active areas of endometrial regeneration near the placental detachment sites at PPD1 with a pronounced expression in the sub-epithelial stromal cells. As regeneration progressed, we continued to observe ALDH1A1<sup>+</sup> cells in the stromal compartment within the placental detachment sites at PPD3 and PPD5, with a progressive, but not statistically significant, increase in ALDH1A1<sup>+</sup> epithelial cells. Collectively, our data demonstrate that ALDH1A1<sup>+</sup> lineage cells participate in the restoration of endometrial architecture and functional compartments in the postpartum phase, even if their direct contribution is transient. Future detailed and mechanistic studies will be necessary to fully characterize their role in this process and their long-term consequence in postpartum regeneration.”

      (5) For Figure 7F, it should be clearly stated in the main text that the results are from one patient sample and the data presented are experimental replicates, so as not to be confused with biological replicates (the same for Supplementary Figure S4). Were B and G in Figure 7 also from one patient?

      Thanks for pointing this out. We have edited the figure legends in the main text and supplemental figures to indicate this.

      Lines 337-338: “…main figures show representative results from one patient sample performed in technical replicates, with additional patient samples included in the supplement…”

      (6) Lines 425-427: "Ovariectomized mice treated with 90-day E2 pellets, on the other hand, showed a complete restriction of ALDH1A1 to the glandular crypts." In Figure 2 S' ALDH1A1+ cells are visible in the LE (the staining is lighter than in the GE but looks real), contradicting this statement.

      This is an important distinction. We have now edited this part of the manuscript to state:

      Lines 459-462: “Ovariectomized mice treated with 90-day E2 pellets, on the other hand, showed enriched ALDH1A1 in the glandular crypts with weak luminal epithelial staining, while the ovariectomized controls had strong ALDH1A1 expression throughout the luminal and glandular epithelium.”

      (7) Lines 466-467: "In cycling mice, we found sporadic cells that expressed both stromal and epithelial markers in the ALDHA1+ cells." These data are not presented.

      We apologize for the confusion; this sentence has been removed from the Discussion.

      (8) These data support the role of Aldh1a1+ cells in endometrial epithelial development, but conclusions about their role in repair/regeneration should be tempered as the data are much weaker here.

      We thank the reviewer for their overall assessment. To address this point, we have thoroughly edited the appropriate areas to temper the conclusions and ensure that they are strongly supported by our data. We have also edited the manuscript’s title to reflect this.

      Reviewer #3 (Public review):

      Summary:

      Tan et al demonstrated the importance of ALDH-high cells in the epithelial development in the mouse endometrium, and these cells displayed properties of stem cells.

      We thank the reviewer for their assessment of our manuscript.

      Strengths:

      The findings are solid, supported and validated through a combination of technical methods. I appreciated this combined use of mouse and human endometrial cells to strengthen the findings. Genomic results from a single-cell sequencing dataset were informative as they depicted the different stages of the estrus cycle during the regeneration process. Verification with immunostainings with various markers made it convincing for readers to visualize the cell's location, progression, and status at different timepoints. Utilizing human endometrial cells further demonstrated that the phenomenon observed in mice can be translated to humans.

      This work will greatly advance the understanding of endometrial regeneration for reproductive biologists.

      We thank the reviewer for their expert assessment and positive comments regarding our manuscript.

      Weaknesses:

      No major weaknesses were identified by this reviewer.

      References

      Ang, C.J., Skokan, T.D., and McKinley, K.L. (2023). Mechanisms of Regeneration and Fibrosis in the Endometrium. Annu Rev Cell Dev Biol 39, 197-221.

      Bi, W.R., Jin, C.X., Xu, G.T., and Yang, C.Q. (2012). Bone morphogenetic protein-7 regulates Snail signaling in carbon tetrachloride-induced fibrosis in the rat liver. Exp Ther Med 4, 1022-1026.

      Chen, M.Y., Zhao, F.L., Chu, W.L., Bai, M.R., and Zhang, D.M. (2023). A review of tamoxifen administration regimen optimization for Cre/loxp system in mouse bone study. Biomed Pharmacother 165, 115045.

      Cousins, F.L., Murray, A., Esnal, A., Gibson, D.A., Critchley, H.O., and Saunders, P.T. (2014). Evidence from a mouse model that epithelial cell migration and mesenchymal-epithelial transition contribute to rapid restoration of uterine tissue integrity during menstruation. PLoS One 9, e86378.

      Cousins, F.L., Pandoy, R., Jin, S., and Gargett, C.E. (2021). The Elusive Endometrial Epithelial Stem/Progenitor Cells. Front Cell Dev Biol 9, 640319.

      Darzi, S., Werkmeister, J.A., Deane, J.A., and Gargett, C.E. (2016). Identification and Characterization of Human Endometrial Mesenchymal Stem/Stromal Cells and Their Potential for Cellular Therapy. Stem Cells Transl Med 5, 1127-1132.

      Ghosh, A., Syed, S.M., Kumar, M., Carpenter, T.J., Teixeira, J.M., Houairia, N., Negi, S., and Tanwar, P.S. (2020). In Vivo Cell Fate Tracing Provides No Evidence for Mesenchymal to Epithelial Transition in Adult Fallopian Tube and Uterus. Cell Rep 31, 107631.

      Huang, C.C., Orvis, G.D., Wang, Y., and Behringer, R.R. (2012). Stromal-to-epithelial transition during postpartum endometrial regeneration. PLoS One 7, e44285.

      Khanmohammadi, M., Mukherjee, S., Darzi, S., Paul, K., Werkmeister, J.A., Cousins, F.L., and Gargett, C.E. (2021). Identification and characterisation of maternal perivascular SUSD2(+) placental mesenchymal stem/stromal cells. Cell Tissue Res 385, 803-815.

      Kirkwood, P.M., Gibson, D.A., Shaw, I., Dobie, R., Kelepouri, O., Henderson, N.C., and Saunders, P.T.K. (2022). Single-cell RNA sequencing and lineage tracing confirm mesenchyme to epithelial transformation (MET) contributes to repair of the endometrium at menstruation. Elife 11.

      Li, S.Y., Whiteside, S., Li, B., Sun, X., and DeFalco, T. (2025). Mesenchymal-to-epithelial transition of perivascular cells contributes to endometrial re-epithelialization. Nat Commun 16, 10174.

      Niayesh-Mehr, R., Kalantar, M., Bontempi, G., Montaldo, C., Ebrahimi, S., Allameh, A., Babaei, G., Seif, F., and Strippoli, R. (2024). The role of epithelial-mesenchymal transition in pulmonary fibrosis: lessons from idiopathic pulmonary fibrosis and COVID-19. Cell Commun Signal 22, 542.

      Patterson, A.L., Zhang, L., Arango, N.A., Teixeira, J., and Pru, J.K. (2013). Mesenchymal-to-epithelial transition contributes to endometrial regeneration following natural and artificial decidualization. Stem Cells Dev 22, 964-974.

      Pimeisl, I.M., Tanriver, Y., Daza, R.A., Vauti, F., Hevner, R.F., Arnold, H.H., and Arnold, S.J. (2013). Generation and characterization of a tamoxifen-inducible Eomes(CreER) mouse line. Genesis 51, 725-733.

      Rios, A.C., Fu, N.Y., Cursons, J., Lindeman, G.J., and Visvader, J.E. (2016). The complexities and caveats of lineage tracing in the mammary gland. Breast Cancer Res 18, 116.

      Seishima, R., Leung, C., Yada, S., Murad, K.B.A., Tan, L.T., Hajamohideen, A., Tan, S.H., Itoh, H., Murakami, K., Ishida, Y., et al. (2019). Neonatal Wnt-dependent Lgr5 positive stem cells are essential for uterine gland development. Nat Commun 10, 5378.

      Zeisberg, M., Shah, A.A., and Kalluri, R. (2005). Bone morphogenic protein-7 induces mesenchymal to epithelial transition in adult renal fibroblasts and facilitates regeneration of injured kidney. J Biol Chem 280, 8094-8100.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Here, the authors have addressed the recruitment and firing patterns of motor units (MUs) from the long and lateral heads of the triceps in the mouse. They used their newly developed Myomatrix arrays to record from these muscles during treadmill locomotion at different speeds, and they used template-based spike sorting (Kilosort) to extract units. Between MUs from the two heads, the authors observed differences in their firing rates, recruitment probability, phase of activation within the locomotor cycle, and interspike interval patterning. Examining different walking speeds, the authors find increases in both recruitment probability and firing rates as speed increases. The authors also observed differences in the relation between recruitment and the angle of elbow extension between motor units from each head. These differences indicate meaningful variation between motor units within and across motor pools and may reflect the somewhat distinct joint actions of the two heads of triceps.

      Strengths:

      The extraction of MU spike timing for many individual units is an exciting new method that has great promise for exposing the fine detail in muscle activation and its control by the motor system. In particular, the methods developed by the authors for this purpose seem to be the only way to reliably resolve single MUs in the mouse, as the methods used previously in humans and in monkeys (e.g. Marshall et al. Nature Neuroscience, 2022) do not seem readily adaptable for use in rodents.

      The paper provides a number of interesting observations. There are signs of interesting differences in MU activation profiles for individual muscles here, consistent with those shown by Marshall et al. It is also nice to see fine-scale differences in the activation of different muscle heads, which could relate to their partially distinct functions. The mouse offers greater opportunities for understanding the control of these distinct functions, compared to the other organisms in which functional differences between heads have previously been described.

      The Discussion is very thorough, providing a very nice recounting of a great deal of relevant previous results.

      We thank the Reviewer for these comments.

      Weaknesses:

      The findings are limited to one pair of muscle heads. While an important initial finding, the lack of confirmation from analysis of other muscles acting at other joints leaves the general relevance of these findings unclear.

      The Reviewer raises a fair point. While outside the scope of this paper, future studies should certainly address a wider range of muscles to better characterize motor unit firing patterns across different sets of effectors with varying anatomical locations. Still, the importance of results from the triceps long and lateral heads should not be understated as this paper, to our knowledge, is the first to capture the difference in firing patterns of motor units across any set of muscles in the locomoting mouse.

      While differences between muscle heads with somewhat distinct functions are interesting and relevant to joint control, differences between MUs for individual muscles, like those in Marshall et al., are more striking because they cannot be attributed potentially to differences in each head's function. The present manuscript does show some signs of differences for MUs within individual heads: in Figure 2C, we see what looks like two clusters of motor units within the long head in terms of their recruitment probability. However, a statistical basis for the existence of two distinct subpopulations is not provided, and no subsequent analysis is done to explore the potential for differences among MUs for individual heads.

      We agree with the Reviewer and have revised the manuscript to better examine potential subpopulations of units within each muscle as presented in Figure 2C. We performed Hartigan’s dip test on motor units within each muscle to test for multimodal distributions. For both muscles, p > 0.05, so we cannot reject the null hypothesis that the units in each muscle come from a unimodal distribution. However, Hartigan’s test and similar statistical methods have poor statistical power for the small sample sizes (n=17 and 16 for long and lateral heads, respectively) considered here, so the failure to achieve statistical significance might reflect either the absence of a true difference or a lack of statistical resolution.

      Still, the limited sample size warrants further data collection and analysis since the varying properties across motor units may lead to different activation patterns. Given these results, we have edited the text as follows:

      “A subset of units, primarily in the long head, were recruited in under 50% of the total strides and with lower spike counts (Figure 2C). This distribution of recruitment probabilities might reflect a functionally different subpopulation of units. However, the distribution of recruitment probabilities was not found to be significantly multimodal (p>0.05 in both cases, Hartigan’s dip test; Hartigan, 1985). Hartigan’s test and similar statistical methods nevertheless have poor statistical power for the small sample sizes (n=17 and 16 for long and lateral heads, respectively) considered here, so the failure to achieve statistical significance might reflect either the absence of a true difference or a lack of statistical resolution.”

      The statistical foundation for some claims is lacking. In addition, the description of key statistical analysis in the Methods is too brief and very hard to understand. This leaves several claims hard to validate.

      We thank the Reviewer for these comments and have clarified the text related to key statistical analyses throughout the manuscript, as described in our other responses below.

      Reviewer #2 (Public review):

      The present study, led by Thomas and collaborators, aims to describe the firing activity of individual motor units in mice during locomotion. To achieve this, they implanted small arrays of eight electrodes in two heads of the triceps and performed spike sorting using a custom implementation of Kilosort. Simultaneously, they tracked the positions of the shoulder, elbow, and wrist using a single camera and a markerless motion capture algorithm (DeepLabCut). Repeated one-minute recordings were conducted in six mice at five different speeds, ranging from 10 to 27.5 cm·s⁻¹.

      From these data, the authors reported that:

      (1) a significant portion of the identified motor units was not consistently recruited across strides,

      (2) motor units identified from the lateral head of the triceps tended to be recruited later than those from the long head,

      (3) the number of spikes per stride and peak firing rates were correlated in both muscles, and

      (4) the probability of motor unit recruitment and firing rates increased with walking speed.

      The authors conclude that these differences can be attributed to the distinct functions of the muscles and the constraints of the task (i.e., speed).

      Strengths:

      The combination of novel electrode arrays to record intramuscular electromyographic signals from a larger muscle volume with an advanced spike sorting pipeline capable of identifying populations of motor units.

      We thank the Reviewer for this comment.

      Weaknesses:

      (1) There is a lack of information on the number of identified motor units per muscle and per animal.

      The Reviewer is correct that this information was not explicitly provided in the prior submission. We have therefore added Table 1 that quantifies the number of motor units per muscle and per animal.

      (2) All identified motor units are pooled in the analyses, whereas per-animal analyses would have been valuable, as motor units within an individual likely receive common synaptic inputs. Such analyses would fully leverage the potential of identifying populations of motor units.

      Please see our answer to the following point, where we address questions (2) and (3) together.

      (3) The current data do not allow for determining which motor units were sampled from each pool. It remains unclear whether the sample is biased toward high-threshold motor units or representative of the full pool.

      We thank the Reviewer for these comments. To clarify how motor unit responses were distributed across animals and muscle targets, we updated or added the following figures:  

      Figure 2C

      Figure 4–figure supplement 1

      Figure 5–figure supplement 2

      Figure 6–figure supplement 2

      These provide a more complete look at the range of activity within each motor pool, suggesting that we do measure from units with different activation thresholds within the same motor pool, rather than this variation being due to cross-animal differences. For example, Figure 2C illustrates that motor units from the same muscle and animal show a wide variety of recruitment probabilities. However, the limited number of motor units recorded from each individual animal does not allow a statistically rigorous test for examining cross-animal differences.

      (4) The behavioural analysis of the animals relies solely on kinematics (2D estimates of elbow angle and stride timing). Without ground reaction forces or shoulder angle data, drawing functional conclusions from the results is challenging.

      The Reviewer is correct that we did not measure muscular force generation or ground reaction forces in the present study. Although outside the scope of this study, future work might employ buckle force transducers as used in larger animals (Biewener et al., 1988; Karabulut et al., 2020) to examine the complex interplay between neural commands, passive biomechanics, and the complex force-generating properties of muscle tissue.

      Major comments:

      (1) Spike sorting

      The conclusions of the study rely on the accuracy and robustness of the spike sorting algorithm during a highly dynamic task. Although the pipeline was presented in a previous publication (Chung et al., 2023, eLife), a proper validation of the algorithm for identifying motor unit spikes is still lacking. This is particularly important in the present study, as the experimental conditions involve significant dynamic changes. Under such conditions, muscle geometry is altered due to variations in both fibre pennation angles and lengths.

      This issue differs from electrode drift, and it is unclear whether the original implementation of Kilosort includes functions to address it. Could the authors provide more details on the various steps of their pipeline, the strategies they employed to ensure consistent tracking of motor unit action potentials despite potential changes in action potential waveforms, and the methods used for manual inspection of the spike sorting algorithm's output?

      This is an excellent point and we agree that the dynamic behavior used in this investigation creates potential new challenges for spike sorting. In our analysis, Kilosort 2.5 provides key advantages in comparing unit waveforms across multiple channels and in detecting overlapping spikes. We modified this version of Kilosort to construct unit waveform templates using only the channels within the same muscle (Chung et al., 2023), as clarified in the revised Methods section (see “Electromyography (EMG)”):

      “A total of 33 units were identified across all animals. Each unit’s isolation was verified by confirming that no more than 2% of inter-spike intervals violated a 1 ms refractory limit. Additionally, we manually reviewed cross-correlograms to ensure that each waveform was only reported as a single motor unit.”
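
      The refractory-period criterion quoted above can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the example spike times, and the choice to count an interval as a violation only when it is strictly shorter than the limit are all assumptions made for illustration.

```python
def refractory_violation_fraction(spike_times_s, refractory_s=0.001):
    """Fraction of inter-spike intervals (ISIs) shorter than the refractory limit.

    spike_times_s: spike times of one sorted unit, in seconds (hypothetical input).
    """
    times = sorted(spike_times_s)
    isis = [later - earlier for earlier, later in zip(times, times[1:])]
    if not isis:
        return 0.0  # fewer than two spikes: no interval can violate the limit
    violations = sum(1 for isi in isis if isi < refractory_s)
    return violations / len(isis)


def passes_isolation_check(spike_times_s, refractory_s=0.001, max_fraction=0.02):
    """True if no more than max_fraction of ISIs violate the refractory limit."""
    return refractory_violation_fraction(spike_times_s, refractory_s) <= max_fraction


# Hypothetical spike train: one 0.5 ms interval among five intervals.
example_spikes = [0.000, 0.020, 0.0205, 0.050, 0.080, 0.110]
example_fraction = refractory_violation_fraction(example_spikes)
example_ok = passes_isolation_check(example_spikes)
```

      In this toy example one of five intervals is shorter than 1 ms, so the unit would fail a 2% criterion; a real recording would contain far more spikes per unit.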

      The Reviewer is correct that our ability to precisely measure a unit’s activity based on its waveform will depend on the relationship between the embedded electrode and the muscle geometry, which alters over the course of the stride. As a follow-up to the original text, we have included new analyses to characterize the waveform activity throughout the experiment and stride (also in Methods):

      “We further validated spike sorting by quantifying the stability of each unit’s waveform across time (Figure 1–figure supplement 1). First, we calculated the median waveform of each unit across every trial to capture long-term stability of motor unit waveforms. Additionally, we calculated the median waveform through the stride binned in 50 ms increments using spiking from a single trial. This second metric captures the stability of our spike sorting during the rapid changes in joint angles that occur during the burst of an individual motor unit. In doing so, we calculated each motor unit’s waveforms from the single channel in which that unit’s amplitude was largest and did not attempt to remove overlapping spikes from other units before measuring the median waveform from the data. We then calculated the correlation between a unit’s waveform over either trials or bins in which at least 30 spikes were present. The high correlation of a unit waveform over time, despite potential changes in the electrodes’ position relative to muscle geometry over the dynamic task, provides additional confidence in both the stability of our EMG recordings and the accuracy of our spike sorting.”
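
      The waveform-stability measure described above might be sketched as follows. The quoted text does not specify how correlations were aggregated, so the pairwise Pearson correlation between group medians is an assumption, as are the function names and the data layout (one list of equal-length spike snippets per trial or per 50 ms bin).

```python
from math import sqrt
from statistics import median


def median_waveform(snippets):
    """Sample-wise median across equally long spike snippets."""
    return [median(samples) for samples in zip(*snippets)]


def pearson_r(x, y):
    """Pearson correlation between two equally long, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def waveform_stability(groups, min_spikes=30):
    """Pairwise correlations between median waveforms of well-sampled groups.

    groups: list of snippet lists, one per trial or per stride-phase bin.
    Groups with fewer than min_spikes snippets are skipped.
    """
    medians = [median_waveform(g) for g in groups if len(g) >= min_spikes]
    return [
        pearson_r(medians[i], medians[j])
        for i in range(len(medians))
        for j in range(i + 1, len(medians))
    ]
```

      A correlation near 1 between groups indicates that the waveform shape is preserved even if its amplitude changes, since Pearson correlation is insensitive to scaling.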

      We have included a supplement to Figure 1 (Figure 1–figure supplement 1) to highlight the effectiveness of our spike sorting.

      (2) Yield of the spike sorting pipeline and analyses per animal/muscle

      A total of 33 motor units were identified from two heads of the triceps in six mice (17 from the long head and 16 from the lateral head). However, precise information on the yield per muscle per animal is not provided. This information is crucial to support the novelty of the study, as the authors claim in the introduction that their electrode arrays enable the identification of populations of motor units. Beyond reporting the number of identified motor units, another way to demonstrate the effectiveness of the spike sorting algorithm would be to compare the recorded EMG signals with the residual signal obtained after subtracting the action potentials of the identified motor units, using a signal-to-residual ratio.

      Furthermore, motor units identified from the same muscle and the same animal are likely not independent due to common synaptic inputs. This dependence should be accounted for in the statistical analyses when comparing changes in motor unit properties across speeds and between muscles.

      We thank the Reviewer for this comment. Regarding motor unit yield, as described above the newly-added Table 1 displays the yield from each animal and muscle.

      Regarding spike sorting, while signal-to-residual is often an excellent metric, it is not ideal for our high-resolution EMG signals since isolated single motor units are typically superimposed on a “bulk” background consisting of the low-amplitude waveforms of other motor units. Because these smaller units typically cannot be sorted, it is challenging to estimate the “true” residual after subtracting (only) the largest motor unit, since subtracting each sorted unit’s waveform typically has a very small effect on the RMS of the total EMG signal. To further address concerns regarding spike sorting quality, we added Figure 1–figure supplement 1 that demonstrates motor units’ consistency over the experiment, highlighting that the waveform maintains its shape within each stride despite muscle/limb dynamics and other possible sources of electrical noise or artifact.

      Finally, the Reviewer is correct that individual motor units in the same muscle are very likely to receive common synaptic inputs. These common inputs may be reflected in sparsely recruited motor units being active in overlapping rather than different strides. Indeed, as described in the following text added to the Results, we found that motor units are recruited with higher probability when additional units are recruited.

      “Probabilistic recruitment is correlated across motor units

      Our results show that the recruitment of individual motor units is probabilistic even within a single speed quartile (Figure 5A-C) and predicts body movements (Figure 6), raising the question of whether the recruitment of individual motor units is correlated or independent. Correlated recruitment might reflect shared input onto the population of motor units innervating the muscle (De Luca, 1985; De Luca & Erim, 1994; Farina et al., 2014). For example, two motor units, each with low recruitment probabilities, may still fire during the same set of strides. To assess the independence of motor unit recruitment across the recorded population, we compared each unit’s empirical recruitment probability across all strides to its conditional recruitment probability during strides in which another motor unit from the same muscle was recruited (Figure 7). Doing this for all motor unit pairs revealed that motor units in both muscles were biased towards greater recruitment when additional units were active (p<0.001, Wilcoxon signed-rank tests for both the lateral and long heads of triceps). This finding suggests that probabilistic recruitment reflects common synaptic inputs that covary together across locomotor strides.”
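
      The comparison between empirical and conditional recruitment probability can be made concrete with a short sketch. The function names and the per-stride 0/1 encoding are hypothetical, and the example values are invented; the Wilcoxon signed-rank test that the authors apply across unit pairs is not reproduced here.

```python
def recruitment_probability(recruited):
    """Fraction of strides in which a unit fired at least one spike.

    recruited: per-stride flags (1 = recruited, 0 = silent); hypothetical encoding.
    """
    return sum(recruited) / len(recruited)


def conditional_recruitment_probability(unit, other):
    """P(unit recruited | other unit recruited), over the same strides.

    Returns None when the other unit was never recruited.
    """
    co_strides = [u for u, o in zip(unit, other) if o]
    if not co_strides:
        return None
    return sum(co_strides) / len(co_strides)


# Invented example: two units from the same muscle across eight strides.
unit_a = [1, 0, 1, 0, 0, 1, 0, 0]
unit_b = [1, 0, 1, 1, 0, 1, 0, 0]

p_a = recruitment_probability(unit_a)                               # 3 of 8 strides
p_a_given_b = conditional_recruitment_probability(unit_a, unit_b)   # 3 of 4 strides
```

      If recruitment were independent across units, the conditional probability would on average equal the empirical one; a conditional probability that is consistently higher across pairs is the pattern the quoted passage describes.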

      (3) Representativeness of the sample of identified motor units

      However, to draw such conclusions, the authors should exclusively compare motor units from the same pool and systematically track violations of the recruitment order. Alternatively, they could demonstrate that the motor units that are intermittently active across strides correspond to the smallest motor units, based on the assumption that these units should always be recruited due to their low activation thresholds.

      One way to estimate the size of motor units identified within the same muscle would be to compare the amplitude of their action potentials, assuming that all motor units are relatively close to the electrodes (given the selectivity of the recordings) and that motoneurons innervating more muscle fibres generate larger motor unit action potentials.

      We thank the Reviewer for this comment. Below, we provide more detailed analyses of the relationships between motor unit spike amplitude and the recruitment probability as well as latency (relative to stride onset) of activation.

      We generated Author response image 1 to illustrate the relationship between the amplitude of motor units and their firing properties. As suspected, units with larger-amplitude waveforms fired with lower probability and produced their first spikes later in the stride. If we were comfortable assuming that larger spike amplitudes mean higher-force units, then this would be consistent with a key prediction of the size principle (i.e. that higher-force units are recruited later). However, we are hesitant to base any conclusions on this assumption or emphasize this point with a main-text figure, since EMG signal amplitude may also vary due to the physical properties of the electrode and distance from muscle fibers. Thus it is possible that a large motor unit may have a smaller waveform amplitude relative to the rest of the motor pool.

      Author response image 1.

      Relation between motor unit amplitude and (A) recruitment probability and (B) mean first spike time within the stride. Colored lines indicate the outcome of linear regression analyses.

      Currently, the data seem to support the idea that motor units that are alternately recruited across strides have recruitment thresholds close to the level of activation or force produced during slow walking. The fact that recruitment probability monotonically increases with speed suggests that the force required to propel the mouse forward exceeds the recruitment threshold of these "large" motor units. This pattern would primarily reflect spatial recruitment following the size principle rather than flexible motor unit control.

      We thank the Reviewer for this comment. We agree with this interpretation, particularly in relation to the references suggested in later comments, and have added the following text to the Discussion to better reflect this argument:

      “To investigate the neuromuscular control of locomotor speed, we quantified speed-dependent changes in both motor unit recruitment and firing rate. We found that the majority of units were recruited more often and with larger firing rates at faster speeds (Figure 5, Figure 5–figure supplement 1). This result may reflect speed-dependent differences in the common input received by populations of motor neurons with varying spiking thresholds (Henneman et al., 1965). In the case of mouse locomotion, faster speeds might reflect a larger common input, increasing the recruitment probability as more neurons, particularly those that are larger and generate more force, exceed threshold for action potentials (Farina et al., 2014).”

      (4) Analysis of recruitment and firing rates

      The authors currently report active duration and peak firing rates based on spike trains convolved with a Gaussian kernel. Why not report the peak of the instantaneous firing rates estimated from the inverse of the inter-spike interval? This approach appears to be more aligned with previous studies conducted to describe motor unit behaviour during fast movements (e.g., Desmedt & Godaux, 1977, J Physiol; Van Cutsem et al., 1998, J Physiol; Del Vecchio et al., 2019, J Physiol).

      We thank the Reviewer for this comment. In the revised Discussion (see ‘Firing rates in mouse locomotion compared to other species’) we reference several examples of previous studies that quantified spike patterns based on the instantaneous firing rate. We chose to report the peak of the smoothed firing rate because that quantification includes strides with zero spikes or only one spike, which occur regularly in our dataset (and for which ISI rate measures, which require two spikes to define an instantaneous firing rate, cannot be computed). Regardless, in the revised Figure 4B, we present an analysis that uses inter-spike intervals as suggested, which yielded similar ranges of firing rates as the primary analysis.
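
      The difference between the two firing-rate measures can be sketched as below. The kernel width, time grid, and function names are assumptions chosen for illustration and are not taken from the manuscript; the point is only that a smoothed rate is defined for strides with zero or one spike, whereas an ISI-based rate needs at least two spikes.

```python
from math import exp, pi, sqrt


def peak_smoothed_rate(spike_times_s, sigma_s=0.01, t_start=0.0, t_end=0.3, dt=0.001):
    """Peak of a spike train convolved with a unit-area Gaussian kernel (spikes/s).

    sigma_s, t_start, t_end and dt are illustrative values, not the authors' settings.
    Returns 0.0 for a stride with no spikes.
    """
    norm = 1.0 / (sigma_s * sqrt(2.0 * pi))
    n_steps = int(round((t_end - t_start) / dt))
    peak = 0.0
    for i in range(n_steps + 1):
        t = t_start + i * dt
        rate = sum(norm * exp(-0.5 * ((t - s) / sigma_s) ** 2) for s in spike_times_s)
        peak = max(peak, rate)
    return peak


def peak_isi_rate(spike_times_s):
    """Peak instantaneous rate, i.e. the inverse of the shortest ISI (spikes/s).

    Returns None when fewer than two spikes are present, because no ISI exists.
    """
    times = sorted(spike_times_s)
    isis = [later - earlier for earlier, later in zip(times, times[1:]) if later > earlier]
    if not isis:
        return None
    return 1.0 / min(isis)
```

      For a single-spike stride, `peak_smoothed_rate` returns the kernel height while `peak_isi_rate` returns `None`, which is the case the response highlights.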

      (5) Additional analyses of behaviour

      The authors currently analyse motor unit recruitment in relation to elbow angle. It would be valuable to include a similar analysis using the angular velocity observed during each stride. More broadly, comparing stride-by-stride changes in firing rates with changes in elbow angular velocity would further strengthen the final analyses presented in the results section.

      We thank the Reviewer for this comment. To address this, we have modified Figure 6 and the associated Supplemental Figures to show relationships between unit activation and both the range of elbow extension and the range of elbow velocity for each stride. These new Supplemental Figures show that the trends shown in main text Figure 6C and 6E (which show data from all speed quartiles on the same axes) are also apparent in both the slower and faster quartiles individually, although single-quartile statistical tests (with smaller sample size than the main analysis) do not reach statistical significance in all cases.

      Reviewer #3 (Public review):

      Summary:

      Using the approach of Myomatrix recording, the authors report that:

      (1) Motor units are recruited differently in the two types of muscles.

      (2) Individual units are probabilistically recruited during the locomotion strides, whereas the population bulk EMG has a more reliable representation of the muscle.

      (3) The recruitment of units was proportional to walking speed.

      Strengths:

      The new technique provides a unique data set, and the data analysis is convincing and well-performed.

      We thank the Reviewer for the comment.

      Weaknesses:

      The implications of "probabilistical recruitment" should be explored, addressed, and analyzed further.

      Comments:

      One of the study's main findings (perhaps the main finding) is that the motor units are "probabilistically" recruited. The authors do not define what they mean by probabilistically recruited, nor do they present an alternative scenario to such recruitment or discuss why this would be interesting or surprising. However, on page 4, they do indicate that the recruitment of units from both muscles was only active in a subset of strides, i.e., they are not reliably active in every step.

      If probabilistic means irregular spiking, this is not new. Variability in spiking has been seen numerous times, for instance in human biceps brachii motor units during isometric contractions (Pascoe, Enoka, Exp physiology 2014) and elsewhere. Perhaps the distinction the authors are seeking is between fluctuation-driven and mean-driven spiking of motor units as previously identified in spinal motor networks (see Petersen and Berg, eLife 2016, and Berg, Frontiers 2017). Here, it was shown that a prominent regime of irregular spiking is present during rhythmic motor activity, which also manifests as a positive skewness in the spike count distribution (i.e., log-normal).

      We thank the Reviewer for this comment and have clarified several passages in response. The Reviewer is of course correct that irregular motor unit spiking has been described previously and may reflect motor neurons operating in a high-sensitivity (fluctuation-driven) regime. We now cite these papers in the Discussion (see ‘Firing rates in mouse locomotion compared to other species’). Additionally, the revision clarifies that “probabilistically”, as defined in our paper, refers only to the empirical observation that a motor unit spikes during only a subset of strides, either when all locomotor speeds are considered together (Figure 2) or separately (Figure 5A-C):

      “Motor units in both muscles exhibited this pattern of probabilistic recruitment (defined as a unit’s firing on only a fraction of strides), but with differing distributions of firing properties across the long and lateral heads (Figure 2).”

      “Our findings (Figure 4) highlight that even with the relatively high firing rates observed in mice, there are still significant changes in firing rate and recruitment probability across the spikes within bursts (Figure 4B) and across locomotor speeds (Figure 5F). Future studies should more carefully examine how these rapidly changing spiking patterns derive from both the statistics of synaptic inputs and intrinsic properties of motor neurons (Manuel & Heckman, 2011; Petersen & Berg, 2016; Berg, 2017).”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      As mentioned above, there are several issues with the statistics that need to be corrected to properly support the claims made in the paper.

      The authors compare the fractions of MUs that show significant variation across locomotor speeds in their firing rate and recruitment probability. However, it is not statistically founded to compare the results of separate statistical tests based on different kinds of measurements and thus have unconstrained differences in statistical power. The comparison of the fractional changes in firing rates and recruitment across speeds that follow is helpful, though in truth, by contemporary standards, one would like to see error bars on these estimates. These could be generated using bootstrapping.

      The Reviewer is correct, and we have revised the manuscript to better clarify which quantities should or should not be compared, including the following passage (see “Motor unit mechanisms of speed control” in Results):

      “Speed-dependent increases in peak firing rate were therefore also present in our dataset, although in a smaller fraction of motor units (22/33) than changes in recruitment probability (31/33). Furthermore, the mean (± SE) magnitude of speed-dependent increases was smaller for spike rates (mean rate<sub>fast</sub>/rate<sub>slow</sub> of 111% ± 20% across all motor units) than for recruitment probabilities (mean p(recruitment)<sub>fast</sub>/p(recruitment)<sub>slow</sub> of 179% ± 3% across all motor units). While fractional changes in rate and recruitment probability are not readily comparable given their different upper limits, these findings could suggest that while both recruitment and peak rate change across speed quartiles, increased recruitment probability may play a larger role in driving changes in locomotor speed.”

      The description in the Methods of the tests for variation in firing rates and recruitment probability across speeds is extremely hard to understand; after reading it many times, it is still not clear what was done, or why the method used was chosen. In the main text, the authors quote p-values and then state "bootstrap confidence intervals," which is not a statistical test that yields a p-value. While there are mathematical relationships between confidence intervals and statistical tests such that a one-to-one correspondence between them can exist, the descriptions provided fall short of specifying how they are related in the present instance. For this reason, and those described in what follows, it is not clear what the p-values represent.

      Next, the authors refer to fitting a model ("a Poisson distribution") to the data to estimate firing rate and recruitment probability, that the model results agree with their actual data, and that they then bootstrapped from the model estimates to get confidence intervals and compute p-values. Why do this? Why not just do something much simpler, like use the actual spike counts, and resample from those? I understand that it is hard to distinguish between no recruitment and just no spikes given some low Poisson firing rate, but how does that challenge the ability to test if the firing rates or the number of spiking MUs changes significantly across speeds? I can come up with some reasons why I think the authors might have decided to do this, but reasoning like this really should be made explicit.

      In addition, the authors should provide an unambiguous description of the model, perhaps using an equation and a description of how it was fit. For the bootstrapping, a clear description of how the resampling was done should be included. The focus on peak firing rate instead of mean (or median) firing rate should also be justified. Since peaks are noisier, I would expect the statistical power to be lower compared to using the mean or median.

      We thank the Reviewer for the comments and have revised and expanded our discussion of the statistical tests employed. We expanded and clarified our description of these techniques in the updated Methods section:

      “Joint model of rate and recruitment

      We modeled the recruitment probability and firing rate based on empirical data to best characterize firing statistics within the stride. Particularly, this allowed for multiple solutions to explain why a motor unit would not spike within a stride. From the empirical data alone, strides with zero spikes would have been assumed to have no recruitment of a unit. However, to create a model of motor unit activity that includes both recruitment and rate, it must be possible that a recruited unit can have a firing rate of zero. To quantify the firing statistics that best represent all spiking and non-spiking patterns, we modeled recruitment probability and peak firing rate along the following piecewise function:

      Eq. 1:

      Eq. 2:

      where y denotes the observed peak firing rate on a given stride (determined by convolving motor unit spike times with a Gaussian kernel as described above), p denotes the probability of recruitment, and λ denotes the expected peak firing rate from a Poisson distribution of outcomes. Thus, an inactive unit on a given stride may be the result of either non-recruitment or recruitment with a stochastically zero firing rate. The above equations were fit by minimizing the negative log-likelihood of the parameters given the data.”
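
      Because Eq. 1 and Eq. 2 are not reproduced in this text, the following is only a sketch of one model that matches the verbal description above: a zero-inflated Poisson, in which a stride is silent either because the unit was not recruited (probability 1 − p) or because it was recruited but produced a zero outcome. Several things here are assumptions rather than the authors' implementation: the exact form of the likelihood, the treatment of the per-stride observations as non-negative integers, the function names, and the use of expectation-maximization (rather than a generic numerical minimizer) to minimize the negative log-likelihood.

```python
import math

def zip_neg_log_likelihood(counts, p, lam):
    """Negative log-likelihood under an assumed zero-inflated Poisson:
    P(y = 0) = (1 - p) + p * exp(-lam)
    P(y = k) = p * exp(-lam) * lam**k / k!   for k >= 1
    """
    nll = 0.0
    for y in counts:
        if y == 0:
            nll -= math.log((1.0 - p) + p * math.exp(-lam))
        else:
            nll -= (math.log(p) - lam + y * math.log(lam)
                    - math.lgamma(y + 1))
    return nll

def fit_zip(counts, n_iter=500, tol=1e-10):
    """Estimate (p, lam) by expectation-maximization.

    counts: one non-negative integer per stride (an assumption; the
    manuscript's y is a smoothed peak firing rate).
    """
    n = len(counts)
    total = sum(counts)
    n_zero = sum(1 for y in counts if y == 0)
    if total == 0:
        # No activity on any stride: lam is not identifiable.
        return 0.0, 0.0
    p = (n - n_zero) / n
    lam = total / (n - n_zero)
    for _ in range(n_iter):
        # E-step: probability that a silent stride was nevertheless recruited.
        w = p * math.exp(-lam) / ((1.0 - p) + p * math.exp(-lam))
        recruited = (n - n_zero) + n_zero * w
        # M-step: closed-form updates for both parameters.
        new_p, new_lam = recruited / n, total / recruited
        converged = abs(new_p - p) < tol and abs(new_lam - lam) < tol
        p, lam = new_p, new_lam
        if converged:
            break
    return p, lam
```

      Under this assumed form, the fitted recruitment probability is never lower than the observed fraction of active strides, since some silent strides are attributed to recruitment with a zero outcome.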

      “Permutation test for joint model of rate and recruitment and type 2 regression slopes

      To quantify differences in firing patterns across walking speeds, we subdivided each mouse’s total set of strides into speed quartiles and calculated rate (𝜆, Eq. 1 and 2, Fig. 5A-C) and recruitment probability terms (p, Eq. 1 and 2, Fig. 5D-F) for each unit in each speed quartile. Here we calculated the difference in both the rate and recruitment terms across the fastest and slowest speed quartiles (p<sub>fast</sub>-p<sub>slow</sub> and 𝜆<sub>fast</sub>-𝜆<sub>slow</sub>). To test whether these model parameters were significantly different depending on locomotor speed, we developed a null model combining strides from both the fastest and slowest speed quartiles. After pooling strides from both quartiles, we randomly distributed the pooled set of strides into two groups with sample sizes equal to the original slow and fast quartiles. We then calculated the null model parameters for each new group and found the difference between like terms. To estimate the distribution of possible differences, we bootstrapped this result using 1000 random redistributions of the pooled set of strides. Following the permutation test, the 95% confidence interval of this final distribution reflects the null hypothesis of no difference between groups. Thus, the null hypothesis can be rejected if the true difference in rate or recruitment terms exceeds this confidence interval.

      We followed a similar procedure to quantify cross-muscle differences in the relationship between firing parameters. For each muscle, we estimated the slope across firing parameters for each motor unit using type 2 regression. In this case, the true difference was the difference in slopes between muscles. To test the null hypothesis that there was no difference in slopes, the null model reflected the pooled set of units from both muscles. Again, slopes were calculated for 1000 random resamplings of this pooled data to estimate the 95% confidence interval.”
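
      The pooling-and-redistribution procedure described above can be sketched as follows. This is a simplified illustration, not the study's code: the function names are hypothetical, and the example statistic (the fraction of strides with at least one spike) is a stand-in for the fitted model parameters, which would be passed in through `statistic` instead.

```python
import random

def recruitment_fraction(stride_counts):
    """Fraction of strides with at least one spike (illustrative statistic)."""
    return sum(1 for c in stride_counts if c > 0) / len(stride_counts)

def permutation_test(slow, fast, statistic, n_perm=1000, seed=0):
    """Compare statistic(fast) - statistic(slow) against a shuffled null.

    Returns the observed difference, the central 95% interval of the
    null differences, and whether the observed value falls outside it.
    """
    rng = random.Random(seed)
    observed = statistic(fast) - statistic(slow)
    pooled = list(slow) + list(fast)
    null_diffs = []
    for _ in range(n_perm):
        # Redistribute pooled strides into groups of the original sizes.
        rng.shuffle(pooled)
        new_slow = pooled[:len(slow)]
        new_fast = pooled[len(slow):]
        null_diffs.append(statistic(new_fast) - statistic(new_slow))
    null_diffs.sort()
    lower = null_diffs[int(0.025 * n_perm)]
    upper = null_diffs[int(0.975 * n_perm) - 1]
    reject = observed < lower or observed > upper
    return observed, (lower, upper), reject
```

      The same function could in principle be reused for the cross-muscle comparison by pooling units instead of strides and supplying a slope-difference statistic, although the regression variant used for that analysis is not specified here.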

      The argument for delayed activation of the lateral head is interesting, but I am not comfortable saying the nervous system creates a delay just based on observations of the mean time of the first spike, given the potential for differential variability in spike timing across muscles and MUs. One way to make a strong case for a delay would be to show aggregate PSTHs for all the spikes from all the MUs for each of the two heads. That would distinguish between a true delay and more gradual or variable activation between the heads.

      This is a good point and we agree that the claim made about the nervous system is too strong given the results. Even with Author response image 2 that the Reviewer suggested, there is still not enough evidence to isolate the role of the nervous system in the muscles’ activation.

      Author response image 2.

      Aggregate peristimulus time histogram (PSTH) for all motor unit spike times in the long head (top) and lateral head (bottom) within the stride.

      In the ideal case, we would have more simultaneous recordings from both muscles to make a more direct claim on the delay. Still, within the current scope of the paper, to correct this and better describe the difference in timing of muscle activity, we edited the text to the following:

      “These findings demonstrate that despite the synergistic (extensor) function of the long and lateral heads of the triceps at the elbow, the motor pool for the long head becomes active roughly 100 ms before the motor pool supplying the lateral head during locomotion (Figure 3C).”

      The results from Marshall et al. 2022 suggest that the recruitment of some MUs is not just related to muscle force, but also the frequency of force variation - some of their MUs appear to be recruited only at certain frequencies. Figure 5C could have shown signs of this, but it does not appear to. We do not really know the force or its frequency of variation in the measurements here. I wonder whether there is additional analysis that could address whether frequency-dependent recruitment is present. It may not be addressable with the current data set, but this could be a fruitful direction to explore in the future with MU recordings from mice.

      We agree that this would be a fruitful direction to explore; however, the Reviewer is correct that this is not easily addressable with the current dataset. As the Reviewer points out, stride frequency increases with increased speed, potentially offering the opportunity to examine how motor unit activity varies with the frequency, phase, and amplitude of locomotor movements. However, given our lack of force data (either joint torques or ground reaction forces), we cannot readily dissociate the frequency/phase/amplitude of skeletal kinematics from the frequency/phase/amplitude of muscle force. Marshall et al. (2022) mitigated these issues by using an isometric force-production task. Therefore, while we agree that it would be a major contribution to extend such investigations to whole-body movements like locomotion, given the complexities described above we believe this is a project for the future, and beyond the scope of the present study.

      Minor:

      Page 5: "Units often displayed no recruitment in a greater proportion of strides than for any particular spike count when recruited (Figures 2A, B)," - I had to read this several times to understand it. I suggest rephrasing for clarity.

      We have changed the text to read:

      “Units demonstrated a variety of firing patterns, with some units producing 0 spikes more frequently than any non-zero spike count (Figure 2A, B),...”

      Figure 3 legend: "Mean phase (± SE) of motor unit burst duration across all strides.": It is unclear what this means - durations are not usually described as having a phase. Do we mean the onset phase?

      We have changed the text to read:

      “Mean phase ± SE of motor unit burst activity within each stride”

      Page 9: "suggesting that the recruitment of individual motor units in the lateral and long heads might have significant (and opposite) effects on elbow angle in strides of similar speed (see Discussion)." I wouldn't say "opposite" here - that makes it sound like the authors are calling the long head a flexor. The authors should rephrase or clarify the sense in which they are opposite.

      This is a fair point and we agree we should not describe the muscles as ‘opposite’ when both muscles are extensors. We have removed the phrase ‘and opposite’ from the text.

      Page 11: "in these two muscles across in other quadrupedal species" - typo.

      We have corrected this error.

      Page 16: This reviewer cannot decipher after repeated attempts what the first two sentences of the last paragraph mean. - “Future studies might also use perturbations of muscle activity to dissociate the causal properties of each motor unit’s activity from the complex correlation structure of locomotion. Despite the strong correlations observed between motor unit recruitment and limb kinematics (Fig. 6, Supplemental Fig. 3), these results might reflect covariations of both factors with locomotor speed rather than the causal properties of the recorded motor unit.”

      For better clarity, we have changed the text to read:

      “Although strong correlations were observed between motor unit recruitment and limb kinematics during locomotion (Figure 6, Figure 6–figure supplement 1), it remains unclear whether such correlations actually reflect the causal contributions that those units make to limb movement. To resolve this ambiguity, future studies could use electrical or optical perturbations of muscle contraction levels (Kim et al., 2024; Lu et al., 2024; Srivastava et al., 2015, 2017) to test directly how motor unit firing patterns shape locomotor movements. The short-latency effects of patterned motor unit stimulation (Srivastava et al., 2017) could then reveal the sensitivity of behavior to changes in muscle spiking and the extent to which the same behaviors can be performed with many different motor commands.”

      Reviewer #2 (Recommendations for the authors):

      Minor comments:

      Introduction:

      (1) "Although studies in primates, cats, and zebrafish have shown that both the number of active motor units and motor unit firing rates increase at faster locomotor speeds (Grimby, 1984; Hoffer et al., 1981, 1987; Marshall et al., 2022; Menelaou & McLean, 2012)." I would remove Marshall et al. (2022) as their monkeys performed pulling tasks with the upper limb. You can alternatively remove locomotor from the sentence and replace it with contraction speed.

      Thank you for the comment. While we intended to reference this specific paper to highlight the rhythmic activity in muscles, we agree that this deviates from ‘locomotion’ as it is referenced in the other cited papers which study body movement. We have followed the Reviewer’s suggestion to remove the citation to Marshall et al.

      (2) "The capability and need for faster force generation during dynamic behavior could implicate motor unit recruitment as a primary mechanism for modulating force output in mice."

      The authors could add citations to this sentence, of works that showed that recruitment speed is the main determinant of the rate of force development (see for example Dideriksen et al. (2020) J Neurophysiol; J. L. Dideriksen, A. Del Vecchio, D. Farina, Neural and muscular determinants of maximal rate of force development. J Neurophysiol 123, 149-157 (2020)).

      Thank you for pointing out this important reference. We have included this as a citation as recommended.

      Results:

      (3) "Electrode arrays (32-electrode Myomatrix array model RF-4x8-BHS-5) were implanted in the triceps brachii (note that Figure 1D shows the EMG signal from only one of the 16 bipolar recording channels), and the resulting data were used to identify the spike times of individual motor units (Figure 1E) as described previously (Chung et al., 2023)."

      This sentence can be misleading for the reader as the array used by the researchers has 4 threads of 8 electrodes. Would it be possible to specify the number of electrodes implanted per head of interest? I assume 8 per head in most mice (or 4 bipolar channels), even if that's not specifically written in the manuscript.

      Thank you for the suggestion. As described above, we have added Table 1, which includes all array locations, and we edited the statement referenced in the comment as follows:

      “Electrode arrays (32-electrode Myomatrix array model RF-4x8-BHS-5) were implanted in forelimb muscles (note that Figure 1D shows the EMG signal from only one of the 16 bipolar recording channels), and the resulting data were used to identify the spike times of individual motor units in the triceps brachii long and lateral heads (Table 1, Figure 1E) as described previously (Chung et al., 2023).”

      (4) "These findings demonstrate that despite the overlapping biomechanical functions of the long and lateral heads of the triceps, the nervous system creates a consistent, approximately 100 ms delay (Figure 3C) between the activation of the two muscles' motor neuron pools. This timing difference suggests distinct patterns of synaptic input onto motor neurons innervating the lateral and long heads."

      Both muscles don't have fully overlapping biomechanical functions, as one of them also acts on the shoulder joint. Please be more specific in this sentence, saying that both muscles are synergistic at the elbow level rather than "have overlapping biomechanical functions".

      We agree with the above reasoning and that our manuscript should be clearer on this point. We edited the above text in accordance with the Reviewer suggestion as follows:

      “These findings demonstrate that despite the synergistic (extensor) function of the long and lateral heads of the triceps at the elbow, …”

      (5) "Together with the differences in burst timing shown in Figure 3B, these results again suggest that the motor pools for the lateral and long heads of the triceps receive distinct patterns of synaptic input, although differences in the intrinsic physiological properties of motor neurons innervating the two muscles might also play an important role."

      It is difficult to draw such an affirmative conclusion on the synaptic inputs from the data presented by the authors. The differences in firing rates may solely arise from other factors than distinct synaptic inputs, such as the different intrinsic properties of the motoneurons or the reception of distinct neuromodulatory inputs.

      To better explain our findings, we adjusted the above text in the Results (see “Motor unit firing patterns in the long and lateral heads of the triceps”):

      “Together with the differences in burst timing shown in Figure 3B, these results again suggest that the motor pools for the lateral and long heads of the triceps receive distinct patterns of synaptic input, although differences in the intrinsic physiological properties of motor neurons innervating the two muscles might also play an important role.”

      We also included the following distinction in the Discussion (see “Differences in motor unit activity patterns across two elbow extensors”) to address the other plausible mechanisms mentioned.

      “The large differences in burst timing and spike patterning across the muscle heads suggest that the motor pools for each muscle receive distinct inputs. However, differences in the intrinsic physiological properties of motor units and neuromodulatory inputs across motor pools might also make substantial contributions to the structure of motor unit spike patterns (Martínez-Silva et al., 2018; Miles & Sillar, 2011).”

      (6) "We next examined whether the probabilistic recruitment of individual motor units in the triceps and elbow extensor muscle predicted stride-by-stride variations in elbow angle kinematics."

      I'm not sure that the wording is appropriate here. The analysis does not predict elbow angle variations from parameters extracted from the spiking activity. It rather compares the average elbow angle between two conditions (motor unit active or not active).

      We thank the Reviewer for this comment and agree that the wording could be improved here to better reflect our analysis. To lower the strength of our claim, we replaced usage of the word ‘predict’ with ‘correlates’ in the above text and throughout the paper when discussing this result.

      Methods:

      (7) "Using the four threads on the customizable Myomatrix array (RF-4x8-BHS-5), we implanted a combination of muscles in each mouse, sometimes using multiple threads within the same muscle. [...] Some mice also had threads simultaneously implanted in their ipsilateral or contralateral biceps brachii although no data from the biceps is presented in this study."

      A precise description of the localisation of the array (muscles and the number of arrays per muscle) for each animal would be appreciated.

      (8) "A total of 33 units were identified and manually verified across all animals." A precise description of the number of motor units concurrently identified per muscle and per animal would be appreciated. Moreover, please add details on the manual inspection. Does it involve the manual selection of missing spikes? What are the criteria for considering an identified motor unit as valid?

      As discussed earlier, we added Table 1 to the main text to provide the details mentioned in the above comments.

      Regarding spike sorting, given the very large number of spikes recorded, we did not rely on manually adjusting mislabeled spikes. Instead, as described in the revised Methods section, we verified unit isolation by ensuring units had >98% of spikes outside of 1 ms of each other. Moreover, as described above we have added new analyses (Figure 1–figure supplement 1) confirming the stability of motor unit waveforms across both the duration of individual recording sessions (roughly 30 minutes) and across the rapid changes in limb position within individual stride cycles (roughly 250 msec).
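
      One possible reading of the isolation criterion is sketched below. The precise definition is not spelled out in this response, so the interpretation (a spike counts as isolated if no other spike from the same unit falls within 1 ms of it) and the function name are assumptions made for illustration only.

```python
def fraction_isolated(spike_times, window=0.001):
    """Fraction of spikes with no same-unit neighbour within `window` seconds."""
    ordered = sorted(spike_times)
    n = len(ordered)
    if n == 0:
        return 1.0
    isolated = 0
    for i, t in enumerate(ordered):
        # Only the immediate neighbours in sorted order need checking.
        close_prev = i > 0 and t - ordered[i - 1] <= window
        close_next = i < n - 1 and ordered[i + 1] - t <= window
        if not (close_prev or close_next):
            isolated += 1
    return isolated / n
```

      Under this reading, a unit would pass the check when `fraction_isolated(spike_times) > 0.98`.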

      Reviewer #3 (Recommendations for the authors):

      Figure 2 (and supplement) show spike count distributions with strong positive skewness, which is in accordance with the prediction of a fluctuation-driven regime. I suggest plotting these on a logarithmic x-axis (in addition to the linear axis), which should reveal a bell-shaped distribution, maybe even Gaussian, in a majority of the units.

      We thank the Reviewer for the suggestion. We present the requested analysis (Author response image 3), which shows bell-shaped distributions for some (but not all) units. However, we believe that investigating why some replotted distributions are Gaussian and others are not falls beyond the scope of this paper, and likely requires a larger dataset than the one we were able to obtain.

      Author response image 3.

      Spike count distributions for each motor unit on a logarithmic x-axis.

      Why not more data? I tried to get an overview of how much data was collected.

      Supplemental Figure 1 has all the isolated units, which amounts to 38 (are the colors the two muscle types?). Given there are 16 leads in each myomatrix, in two muscles, of six mice, this seems like a low yield. Could the authors comment on the reasons for this low yield?

      Regarding motor unit yield, even with multiple electrodes per muscle and a robust sorting algorithm, we often isolated only a few units per muscle. This yield likely reflects two factors. First, because of the highly dynamic nature of locomotion and high levels of muscle contraction, isolating individual spikes reliably across different locomotor speeds is inherently challenging, regardless of the algorithm being employed. Second, because the results of spike-train analyses can be highly sensitive to sorting errors, we have only included the motor units that we can sort with the highest possible confidence across thousands of strides.

      Minor:

      Figure captions especially Figure 6: The text is excessively long. Can the text be shortened?

      We thank the Reviewer for this comment. Generally, we seek to include a description of the methods and results within the figure captions, but we concede that we can condense the information in some cases. In a number of cases, we have moved some of the descriptive text from the caption to the Methods section.

      References

      Berg, R. W. (2017). Neuronal Population Activity in Spinal Motor Circuits: Greater Than the Sum of Its Parts. Frontiers in Neural Circuits, 11. https://doi.org/10.3389/fncir.2017.00103

      Biewener, A. A., Blickhan, R., Perry, A. K., Heglund, N. C., & Taylor, C. R. (1988). Muscle Forces During Locomotion in Kangaroo Rats: Force Platform and Tendon Buckle Measurements Compared. Journal of Experimental Biology, 137(1), 191–205. https://doi.org/10.1242/jeb.137.1.191

      Chung, B., Zia, M., Thomas, K. A., Michaels, J. A., Jacob, A., Pack, A., Williams, M. J., Nagapudi, K., Teng, L. H., Arrambide, E., Ouellette, L., Oey, N., Gibbs, R., Anschutz, P., Lu, J., Wu, Y., Kashefi, M., Oya, T., Kersten, R., … Sober, S. J. (2023). Myomatrix arrays for high-definition muscle recording. eLife, 12, RP88551. https://doi.org/10.7554/eLife.88551

      De Luca, C. J. (1985). Control properties of motor units. Journal of Experimental Biology, 115(1), 125–136. https://doi.org/10.1242/jeb.115.1.125

      De Luca, C. J., & Erim, Z. (1994). Common drive of motor units in regulation of muscle force. Trends in Neurosciences, 17(7), 299–305. https://doi.org/10.1016/0166-2236(94)90064-7

      Farina, D., Negro, F., & Dideriksen, J. L. (2014). The effective neural drive to muscles is the common synaptic input to motor neurons. The Journal of Physiology, 592(16), 3427–3441. https://doi.org/10.1113/jphysiol.2014.273581

      Hartigan, P. M. (1985). Algorithm AS 217: Computation of the Dip Statistic to Test for Unimodality. Applied Statistics, 34(3), 320. https://doi.org/10.2307/2347485

      Henneman, E., Somjen, G., & Carpenter, D. O. (1965). FUNCTIONAL SIGNIFICANCE OF CELL SIZE IN SPINAL MOTONEURONS. Journal of Neurophysiology, 28(3), 560–580. https://doi.org/10.1152/jn.1965.28.3.560

      Karabulut, D., Dogru, S. C., Lin, Y.-C., Pandy, M. G., Herzog, W., & Arslan, Y. Z. (2020). Direct Validation of Model-Predicted Muscle Forces in the Cat Hindlimb During Locomotion. Journal of Biomechanical Engineering, 142(5), 051014. https://doi.org/10.1115/1.4045660

      Kim, J. J., Wyche, I. S., Olson, W., Lu, J., Bakir, M. S., Sober, S. J., & O’Connor, D. H. (2024). Myo-optogenetics: Optogenetic stimulation and electrical recording in skeletal muscles. https://doi.org/10.1101/2024.06.21.600113

      Lu, J., Zia, M., Baig, D. A., Yan, G., Kim, J. J., Nagapudi, K., Anschutz, P., Oh, S., O’Connor, D., Sober, S. J., & Bakir, M. S. (2024). Opto-Myomatrix: μLED integrated microelectrode arrays for optogenetic activation and electrical recording in muscle tissue. https://doi.org/10.1101/2024.07.01.601601

      Manuel, M., & Heckman, C. J. (2011). Adult mouse motor units develop almost all of their force in the subprimary range: A new all-or-none strategy for force recruitment? Journal of Neuroscience, 31(42), 15188–15194. https://doi.org/10.1523/JNEUROSCI.2893-11.2011

      Marshall, N. J., Glaser, J. I., Trautmann, E. M., Amematsro, E. A., Perkins, S. M., Shadlen, M. N., Abbott, L. F., Cunningham, J. P., & Churchland, M. M. (2022). Flexible neural control of motor units. Nature Neuroscience, 25(11), 1492–1504. https://doi.org/10.1038/s41593-022-01165-8

      Martínez-Silva, M. de L., Imhoff-Manuel, R. D., Sharma, A., Heckman, C. J., Shneider, N. A., Roselli, F., Zytnicki, D., & Manuel, M. (2018). Hypoexcitability precedes denervation in the large fast-contracting motor units in two unrelated mouse models of ALS. eLife, 7(2007), 1–26. https://doi.org/10.7554/eLife.30955

      Miles, G. B., & Sillar, K. T. (2011). Neuromodulation of Vertebrate Locomotor Control Networks. Physiology, 26(6), 393–411. https://doi.org/10.1152/physiol.00013.2011

      Petersen, P. C., & Berg, R. W. (2016). Lognormal firing rate distribution reveals prominent fluctuation–driven regime in spinal motor networks. eLife, 5. https://doi.org/10.7554/elife.18805

      Srivastava, K. H., Elemans, C. P. H., & Sober, S. J. (2015). Multifunctional and Context-Dependent Control of Vocal Acoustics by Individual Muscles. The Journal of Neuroscience, 35(42), 14183–14194. https://doi.org/10.1523/JNEUROSCI.3610-14.2015

      Srivastava, K. H., Holmes, C. M., Vellema, M., Pack, A. R., Elemans, C. P. H., Nemenman, I., & Sober, S. J. (2017). Motor control by precisely timed spike patterns. Proceedings of the National Academy of Sciences of the United States of America, 114(5), 1171–1176. https://doi.org/10.1073/pnas.1611734114

    1. Author response:

      The following is the authors’ response to the current reviews.

      We are pleased that Reviewer 3 appreciated our findings and found the temporal lag between the expression of TFF1 and TFF3 during signaling particularly interesting. The reviewer also advised us not to overemphasize that this lag arises from phase separation of ERα at the TFF1 locus, as the use of 1,6-hexanediol alone is not sufficient to conclusively establish whether ERα condensates undergo liquid–liquid phase separation. We agree with this assessment and have revised the manuscript accordingly. Specifically, we have modified the title to remove reference to phase separation and have updated the text throughout the manuscript to avoid claiming that the observed condensates are a result of phase separation. The revised title is: “Ligand-dependent Enhancer Activation Indirectly Modulates Non-target Promoters in a Chromatin Domain.”

      With these changes, we are proceeding with the Version of Record using the revised version of the manuscript.

      ———

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Summary:

      The manuscript by Bohra et al. describes the indirect effects of ligand-dependent gene activation on neighboring non-target genes. The authors utilized single-molecule RNA-FISH (targeting both mature and intronic regions), 4C-seq, and enhancer deletions to demonstrate that the non-enhancer-targeted gene TFF3, located in the same TAD as the target gene TFF1, alters its expression when TFF1 expression declines at the end of the estrogen signaling peak. Since the enhancer does not loop with TFF3, the authors conclude that mechanisms other than estrogen receptor or enhancer-driven induction are responsible for TFF3 expression. Moreover, ERα intensity correlations show that both high and low levels of ERα are unfavorable for TFF1 expression. The ERα level correlations are further supported by overexpression of GFP-ERα. The authors conclude that the transcriptional machinery used by TFF1 for its acute activation can negatively impact TFF3 at the peak of signaling, but once the condensate dissolves, TFF3 benefits from it for its low expression.

      Strengths:

      The findings are indeed intriguing. The authors have maintained appropriate experimental controls, and their conclusions are well-supported by the data.

      Weaknesses:

      There are some major and minor concerns related to the approach, data presentation, and discussion, but I think they can be fixed with more effort.

      We thank the reviewer for their positive comments on the paper. We have addressed all their specific recommendations below.  

      The deletion of enhancer reveals the absolute reliance of TFF1 on its enhancers for its expression. Authors should elaborate more on this as this is an important finding.

      We thank the reviewer for the comment. We have now added a more detailed discussion of the enhancer requirement for TFF1 expression in the revised manuscript (lines 368-385).

      In Fig. 1, TFF3 expression is shown to be induced upon E2 signaling through qRT-PCR, while smFISH does not display a similar pattern. The authors attribute this discrepancy to the overall low expression of TFF3. In my opinion, this argument could be further supported by relevant literature, if available. Additionally, does GRO-seq data reveal any changes in TFF3 expression following estrogen stimulation? The GRO-seq track shown in Fig.1 should be adjusted to TFF3 expression to appreciate its expression changes.

      We have now included a browser-shot image of the TFF3 region showing the GRO-Seq signal over the E2 time course (Fig. S1C). We observed increased transcription towards the 3’ end of the TFF3 gene body at 3h, which corroborates the smFISH data. The relative changes in TFF3 expression measured by qRT-PCR and by smFISH for intronic transcripts are somewhat different. We speculate that biases in measurements that depend on PCR amplification could be larger for genes expressed at low levels, and that smFISH using intronic probes may be a more sensitive assay for detecting such changes.

      Since the mutually exclusive relationship between TFF1 and TFF3 is based on snap shots in fixed cells, can authors comment on whether the same cell that expresses TFF1 at 1h, expresses TFF3 at 3h? Perhaps, the calculations taking total number of cells that express these genes at 1 and 3h would be useful.

      As pointed out by the reviewer, since these are fixed cells, we cannot comment on the fate of the same cell at two time points. To address this limitation, future work could employ cells with endogenous tags for TFF1 and TFF3 and utilize live-cell imaging. In a fixed-cell assay, as the reviewer suggests, one can ask whether the fraction of cells showing high TFF3 expression at 3h is similar to the fraction showing high TFF1 expression at 1h. To quantify this, we plotted the fraction of cells showing high TFF1 and TFF3 expression at 1h and 3h. We identified truly high-expressing cells using the mean plus one standard deviation of the single-cell data as the threshold: at E2-1hr for TFF1 (80 and above transcript counts) and at E2-3hr for TFF3 (36 and above transcript counts). The fraction with high TFF1 expression at 1h (12.06 ± 2.1) is indeed comparable to that with high TFF3 expression at 3h (12.50 ± 2.0) (Fig. 2C and Author response image 1). We should note that if the transcript counts were normally distributed, a predetermined fraction would be expected to be above these thresholds, and comparable fractions could arise just from the underlying statistics. In our experiments, however, this is unlikely to be the case given the many outliers that affect both the mean and the standard deviation, and the lack of normality and high dispersion in the single-cell distributions. Of course, despite the fractions being comparable, we cannot be certain that it is the same set of cells that goes from high expression of TFF1 to high expression of TFF3, but that is certainly a possibility. We thank the reviewer for suggesting this comparison.

      Author response image 1.

      The graph represents the percentage of cells that show high expression of TFF1 and TFF3 at 1h and 3h post E2 signaling. The threshold was determined by pooling absolute RNA counts from 650 analyzed cells (as in Fig. 2C) and calculating the mean and standard deviation over the single-cell data. The mean plus one standard deviation was used as the threshold for identifying high-expressing cells. For TFF1, which is maximally expressed at 1h, the threshold was 80. For TFF3, which is maximally expressed at 3h, the threshold was 36. The fractions of cells expressing above 80 and 36 for TFF1 and TFF3, respectively, were calculated from three different repeats. The mean of means and standard deviations from the three experiments are plotted here.
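      As a minimal sketch of the thresholding described above (not the analysis code used for the figure), the calculation can be written as follows. The per-cell counts are hypothetical illustrative numbers, the function names are ours, and the use of the sample standard deviation is an assumption, since the response does not state which estimator was used. The last lines compute the fraction that would exceed mean plus one standard deviation if counts were normally distributed, which is the "predetermined fraction" referred to in the response.

```python
import math
import statistics


def high_expression_threshold(counts):
    """Mean plus one sample standard deviation of pooled per-cell counts.

    Assumption: sample SD (statistics.stdev); the source does not say
    whether sample or population SD was used.
    """
    return statistics.mean(counts) + statistics.stdev(counts)


def percent_high(counts, threshold):
    """Percentage of cells with a transcript count at or above the threshold."""
    return 100.0 * sum(c >= threshold for c in counts) / len(counts)


def summarize_replicates(replicates):
    """Pool all cells to set the threshold, then score each repeat separately.

    Returns the threshold, the mean of the per-repeat percentages, and the
    standard deviation of those percentages across repeats.
    """
    pooled = [c for rep in replicates for c in rep]
    threshold = high_expression_threshold(pooled)
    percents = [percent_high(rep, threshold) for rep in replicates]
    return threshold, statistics.mean(percents), statistics.stdev(percents)


# Hypothetical transcript counts per cell for three repeats (illustration only).
replicates = [
    [2, 4, 4, 4, 5, 5, 7, 9],
    [1, 3, 4, 5, 5, 6, 8, 12],
    [0, 2, 3, 5, 6, 6, 7, 10],
]
threshold, mean_pct, sd_pct = summarize_replicates(replicates)
print(f"threshold={threshold:.2f}, high cells={mean_pct:.1f} ± {sd_pct:.1f} %")

# Fraction expected above mean + 1 SD for a normal distribution (about 15.9%).
normal_tail = 0.5 * math.erfc(1 / math.sqrt(2))
print(f"normal expectation: {100 * normal_tail:.1f} %")
```

      Under a normal distribution roughly 16% of cells would lie above such a threshold regardless of the gene, so similar fractions for two genes are informative only to the extent that the distributions are non-normal, as argued in the response.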

      Authors conclude that TFF3 is not directly regulated by enhancer or estrogen receptor. Does ERa bind on TFF3 promoter? 

      The ERα ChIP-seq performed at 1h and 3h of signaling suggests that the TFF3 promoter is not bound by ERα, as shown in supplementary Fig. 1B and S1B. However, one peak upstream of the TFF1 promoter is visible, and it is lost at 3h.

      Minor comments:

      The figures would benefit from resizing of panels. There is very little space between the panels.

      We have now resized the figures in the revised manuscript.

      The discussion section could include an extrapolation on the relationship between ERα concentration and transcriptional regulation. Given that ERα levels have been shown to play a critical role in breast cancer, exploring how varying concentrations of ERα affect gene expression, including the differential regulation of target and non-target genes, would provide valuable insights into the broader implications of this study.

      This is a very important point that was missing from the manuscript. We have included this in the discussion in the revised manuscript (lines 426-430).

      Reviewer #2:

      Summary:

      In this manuscript by Bohra et al., the authors use the well-established estrogen response in MCF7 cells to interrogate the role of genome architecture, enhancers, and estrogen receptor concentration in transcriptional regulation. They propose there is competition between the genes TFF1 and TFF3 which is mediated by transcriptional condensates. This reviewer does not find these claims persuasive as presented. Moreover, the results are not placed in the context of current knowledge.

      Strengths:

      A high level of ERalpha expression seems to diminish the transcriptional response. Thus, the results in Fig. 4 offer potential insight into ER-mediated transcription. Yet this observation is not pursued in great depth, for example with mutagenesis of ERalpha. However, this phenomenon - which falls under the general description of non-monotonic dose response - is treated at great depth in the literature (e.g., PMID: 22419778). For example, the result the authors describe in Fig. 4 has been reported and in fact mathematically modeled in PMID 23134774. One possible avenue for improving this paper would be to dig into this result at the single-cell level using deletion mutants of ERalpha or by perturbing co-activators.

      We thank the reviewer for pointing us to the relevant literature on our observation, which will enhance the manuscript. We have discussed these findings in relation to ours in the discussion section (lines 400-413). We also thank the reviewer for the insight on non-monotonic behavior.

      Weaknesses:

      There are concerns with the sm-RNA FISH experiments. It is highly unusual to see so much intronic signal away from the site of transcription (Fig. 2) (PMID: 27932455, 30554876), which suggests to me the authors are carrying out incorrect thresholding or have a substantial amount of labelling background. The Cote paper cited in the manuscript is likewise inconsistent with their findings and is cited in a misleading manner: they see splicing within a very small region away from the site of transcription. 

      We thank the reviewer for this comment, and apologize if they feel we misrepresented the argument from Coté et al. This has now been rectified in the manuscript. However, we do not agree that the intronic signals away from the site of transcription are an artefact. First, the images presented here are just representative 2D projections of 3D Z-stacks, whereas the full 3D stack is used for spot counting with a widely used algorithm that reports spot counts that are constant over a wide range of thresholds (Raj et al., 2008). The veracity of the automated counts was first verified by comparison to manual counts. Even in the 2D representations, the extragenic intronic signals show up at similar thresholds to the transcription sites.

      The signal is not non-specific background labeling, for the following reasons:

      • To further support the time-course smFISH data and its interpretation without depending on the dispersed intronic signal, we analyzed the number of alleles firing (sites of transcription) at a given time in a cell under the three conditions. We counted the sites of transcription in each cell and calculated the percentage of cells showing 1, 2, 3, 4 or >4 sites. The percentage of cells showing a single site of transcription for TFF1 is very high in uninduced cells and decreases at 1h. At 1h, the percentage of cells showing 2, 3 and 4 sites of transcription increases, and it goes down again at 3h (Author response image 2A). This agrees with the interpretation made from mean intronic counts away from the site of transcription. Similarly, for TFF3, the number of cells showing 2, 3 and 4 sites of transcription increases slightly at 3hr compared to uninduced and 1hr (Author response image 2B). We can also see that several cells have no alleles firing at a given time, as quantified in the graphs on the right showing the total fraction of cells with zero versus non-zero alleles firing (Author response image 2A-B). A non-specific signal would be present in all cells.

      • There is literature on post-transcriptional splicing of RNA beyond our work, which suggests that intronic signal can be found at relatively large distances from the site of transcription. Waks et al. showed that some fraction of unspliced RNA could be observed up to 6-10 microns away from the site of transcription, suggesting that there can be a delay between transcription and (alternative) splicing (Waks et al., 2011). Pan-nuclear dispersed intronic signals can arise because more than one allele can fire at a time in different nuclear locations. The spread of intronic transcripts in our images is also limited in cells in which only 1 allele is firing at E2-1 hour (Author response image 2C) or in uninduced cells (Author response image 2D). Furthermore, Coté et al. discuss that “Of note, we see that increased transcription level correlates with intron dispersal, suggesting that the percentage of splicing occurring away from the transcription site is regulated by transcription level for at least some introns. This may explain why we observe posttranscriptional splicing of all genes we measured, as all were highly expressed.” This is in line with our interpretation that intron signal dispersal can occur in the case of posttranscriptional splicing (Coté et al., 2023). Additionally, other studies have suggested that transcripts in cells do not necessarily undergo co-transcriptional splicing, which leads us to conclude that intronic signal can be found farther away from the site of transcription. Coulon et al. showed that splicing can occur after transcript release from the site and suggested that no strict checkpoint exists to ensure intron removal before release, which results in splicing and release being kinetically uncoupled from each other (Coulon et al., 2014). Similarly, using live-cell imaging, it was shown that splicing is not always coupled with transcription, and this could depend on the nature and structural features of the transcript (such as blockage of the polypyrimidine tract, which results in delayed recognition) (Vargas et al., 2011). Drexler et al. showed that, as opposed to Drosophila transcripts, which are shorter, in mammalian cells splicing of the terminal intron can occur post-transcriptionally (Drexler et al., 2020). Using RNA polymerase II ChIP-Seq time course data from ERα activation in MCF-7 cells, Honkela et al. showed that a large number of genes can show significant delays between the completion of transcription and mRNA production (Honkela et al., 2015). This was attributed to faster transcription of shorter genes, suggesting that rapid completion of transcription on shorter genes can lead to splicing-associated delays (Honkela et al., 2015). More recently, comparisons of nascent and mature RNA levels suggested a time lapse between transcription and splicing for the genes that are early responders during signaling (Zambrano et al., 2020). The presence of significant numbers of TFF1 nascent RNA in the nucleus in our data corroborates the above observations.

      • Uniform intensities across many transcripts suggest that these are true signals arising from RNA molecules, which would not be the case for non-specific background signal (Author response image 2E).

      • Splicing occurs in the nucleus, and intron-containing pre-transcripts should be nuclear localized. Thus, intronic signals should remain localized to the nucleus, unlike the mature mRNA, which translocates to the cytoplasm after processing, so that exonic signals can be found both in the nucleus and the cytoplasm. In keeping with this, we observe no signal in the cytoplasm for the intronic probes, which remain localized within the nucleus as expected (Author response image 2F), while exonic signals are observed in both compartments. This suggests to us that the signal is coming from true pre-transcripts. There is no reason for non-specific background labelling to remain restricted to the nucleus.

      • We observe that the mean intronic label counts for both TFF1 and TFF3 increase upon E2 induction compared to the uninduced condition (Fig. 2B). Similarly, the mean intronic counts for both genes decrease drastically in the TFF1-enhancer-deleted cells (Fig. 3C, D). This change in the number of intronic signals specifically upon induction and enhancer deletion suggests that the signal is not an artefact and arises from true nascent transcripts that are sensitive to stimulus or enhancer deletion.

      • We expect colocalization of intronic signal with exonic signals in the nucleus, while there can be exonic signals that do not colocalize with intronic, representing more mature mRNA. Indeed, we observe a clear colocalization between the intronic and exonic signals in the nucleus, while exonic signals can occur independent of intronic both in the nucleus and the cytoplasm. This clearly demonstrates that the intronic signals in our experiments are specific and not simply background labelling (Author response image 2G).

      These studies and the arguments above lead us to conclude that the presence of intronic transcripts in the nucleus, away from the site of transcription, is not an artefact. We hope the reviewer will agree with us. These analyses have now been included in the manuscript as Supplementary Figure 6 and have been added at lines 106-111, 201-204, 215-217 and 231-235. We thank the reviewer for raising this important point.

      Author response image 2.

      Dynamic induction and RNA localization of TFF1 and TFF3 transcription across cell populations using smRNA FISH. A. Bar graph depicting the percentage of cells with 1, 2, 3, 4, or greater than 4 sites of transcription for TFF1 (left). The graph shows the mean of means from different repeats of the experiment, and error bars denote SEM (n>200, N=3). Only cells with at least one allele firing were counted; cells with no alleles firing were not included. The graph on the right shows the number of cells with zero or non-zero number of alleles firing. B. Bar graph depicting the percentage of cells with 1, 2, 3, 4, or greater than 4 sites of transcription for TFF3 (left). The graph shows the mean of means from different repeats of the experiment, and error bars denote SEM (n>200, N=3). Only cells with at least one allele firing were counted; cells with no alleles firing were not included. The graph in the middle shows the number of cells with 2, 3, 4, or greater than 4 sites of transcription for TFF3. The graph on the right shows the number of cells with zero or non-zero number of alleles firing. C. Images from a single-molecule RNA FISH experiment showing transcripts for InTFF1 in cells induced for 1 hour with E2. The image shows that when a single allele of TFF1 is firing, the transcripts show a more spatially restricted localisation. The scale bar is 5 microns. D. Images from a single-molecule RNA FISH experiment showing transcripts for InTFF1 in uninduced cells. The image shows that when a single allele of TFF1 is firing and transcription is low, the transcripts show a more spatially restricted localisation. The scale bar is 5 microns. E. Line profile through several transcripts in the nucleus shows uniform and similar intensities, indicating that these are true signals. F. 60X representative images from a single-molecule RNA FISH experiment showing transcripts for InTFF1 and ExTFF1 (top) and InTFF3 and ExTFF3 (bottom). The image shows that there is no intronic signal in the cytoplasm, while exonic signals can be found both in the nucleus and the cytoplasm. The scale bar is 5 microns. G. 60X representative images from a single-molecule RNA FISH experiment showing transcripts for InTFF1 and ExTFF1. The image shows that all intronic signals are colocalized with exonic signals, but, as expected, not all exonic signals are colocalized with intronic signals, representing more mature mRNA. The scale bar is 5 microns.
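      The per-cell tally summarized in panels A and B can be sketched as below. This is an illustrative reconstruction under stated assumptions, not the code used for the figure: the input list of transcription-site counts per cell is hypothetical, the function name is ours, and the input is assumed to be non-empty. Percentages per bin are computed only over cells with at least one active site, while the zero versus non-zero split is computed over all cells, following the caption.

```python
from collections import Counter

# Bins used in the caption: 1, 2, 3, 4, or more than 4 sites of transcription.
BINS = ("1", "2", "3", "4", ">4")


def site_distribution(sites_per_cell):
    """Summarize transcription-site counts per cell.

    Returns a dict with the percentage of active cells (at least one site)
    in each bin, and the percentage of all cells with zero sites.
    """
    active = [n for n in sites_per_cell if n > 0]
    binned = Counter(str(n) if n <= 4 else ">4" for n in active)
    if active:
        percents = {b: 100.0 * binned[b] / len(active) for b in BINS}
    else:
        percents = {b: 0.0 for b in BINS}
    percent_zero = 100.0 * (len(sites_per_cell) - len(active)) / len(sites_per_cell)
    return percents, percent_zero


# Hypothetical number of transcription sites detected in each of ten cells.
example_cells = [0, 0, 1, 1, 1, 2, 3, 4, 5, 6]
percents, percent_zero = site_distribution(example_cells)
print(percents)
print(f"cells with no allele firing: {percent_zero:.1f} %")
```

      Keeping the zero-site cells out of the binned percentages, while reporting them separately, is what allows the argument that a non-specific signal would appear in every cell.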

      One substantial way to improve the manuscript is to take a careful look at previous single cell analysis of the estrogen response, which in some cases has been done on the exact same genes (PMID: 29476006, 35081348, 30554876, 31930333). In some of these cases, the authors reach different conclusions than those presented in the present manuscript. Likewise, there have been more than a few studies that have characterized these enhancers (the first one I know of is: PMID 18728018). Also, Oh et al. 2021 (cited in the manuscript) did show an interaction between TFF1e and TFF3, which seems to contradict the conclusion from Fig. 3. In summary, the results of this paper are not in dialogue with the field, which is a major shortcoming. 

      We thank the reviewer for pointing out these important studies. The studies from Prof. Larson’s group are particularly insightful (Rodriguez et al., 2019). We have now included them in the discussion (lines 106-111 and 420-424), where we describe the differences and similarities between our work and that of Larson’s and Mancini’s groups (Patange et al., 2022; Stossi et al., 2020).

      The 4C-Seq data from Oh et al. 2021 are consistent with our observation in Fig. 3: they also observed little to no interaction between TFF1e and TFF3p in WT cells, and TFF1e became engaged with TFF3p only upon TFF1p deletion. In agreement with this, we also observe little to no interaction between TFF1e and TFF3p in WT cells (Fig. 3A). This is also consistent with our model of competition for resources between these two genes. Oh et al. show an interaction between TFF1e and TFF3 when the TFF1 promoter is deleted, indicating that when the primary promoter is not available, the enhancer is retargeted to the next available gene (Oh et al., 2021). They do not show that TFF1e and TFF3 interact in WT cells or at any time point of E2 signalling.

      In the opinion of this reviewer, there are few - if any - experiments to interrogate the existence of LLPS for diffraction-limited spots such as those associated with transcription. This difficulty is a general problem with the field and not specific to the present manuscript. For example, transient binding will also appear as a dynamic 'spot' in the nucleus, independently of any higher-order interactions. As for Fig. 5, I don't think treating cells with 1,6 hexanediol is any longer considered a credible experiment. For example, there are profound effects on chromatin independent of changes in LLPS (PMID: 33536240).  

      We are cognizant of and appreciate the limitations pointed out by the reviewer. We and others have previously shown that ERα forms condensates on the TFF1 chromatin region using an ImmunoFISH assay (Saravanan et al., 2020). The data below show the relative mean ERα intensity on TFF1 FISH spots and random regions, clearly showing the appearance of a condensate at the TFF1 site. Further, the deletion of TFF1e reduces the size of this condensate. Thus, we expect that these ERα condensates are characterized by higher-order interactions and become disrupted on treatment with 1,6-hexanediol. These condensates are sub-micron in size, as mentioned by the reviewer, but most TF condensates are of similar size. We agree with the reviewer that 1,6-hexanediol treatment is a brute-force experiment with several irreversible changes to the chromatin, although we have tried to use it at a low concentration for a short period of time, and it has been used in several papers (Chen et al., 2023; Gamliel et al., 2022). The opposite pattern of TFF1 vs. TFF3 expression upon 1,6-hexanediol treatment suggests that there is specificity. Further, to perturb condensates, mutants of ERα can be used (N-terminus IDR truncations); however, the transcriptional response of these mutants is also altered due to perturbed recruitment of coactivators that recognize the N-terminus of ER, which makes it difficult to distinguish ERα function from condensate formation.

      References:

      Chen, L., Zhang, Z., Han, Q., Maity, B. K., Rodrigues, L., Zboril, E., Adhikari, R., Ko, S.-H., Li, X., Yoshida, S. R., Xue, P., Smith, E., Xu, K., Wang, Q., Huang, T. H.-M., Chong, S., & Liu, Z. (2023). Hormone-induced enhancer assembly requires an optimal level of hormone receptor multivalent interactions. Molecular Cell, 83(19), 3438-3456.e12. https://doi.org/10.1016/j.molcel.2023.08.027

      Coté, A., O’Farrell, A., Dardani, I., Dunagin, M., Coté, C., Wan, Y., Bayatpour, S., Drexler, H. L., Alexander, K. A., Chen, F., Wassie, A. T., Patel, R., Pham, K., Boyden, E. S., Berger, S., Phillips-Cremins, J., Churchman, L. S., & Raj, A. (2023). Post-transcriptional splicing can occur in a slow-moving zone around the gene. eLife, 12. https://doi.org/10.7554/eLife.91357.2

      Coulon, A., Ferguson, M. L., de Turris, V., Palangat, M., Chow, C. C., & Larson, D. R. (2014). Kinetic competition during the transcription cycle results in stochastic RNA processing. eLife, 3, e03939. https://doi.org/10.7554/eLife.03939

      Drexler, H. L., Choquet, K., & Churchman, L. S. (2020). Splicing Kinetics and Coordination Revealed by Direct Nascent RNA Sequencing through Nanopores. Molecular Cell, 77(5), 985-998.e8. https://doi.org/10.1016/j.molcel.2019.11.017

      Gamliel, A., Meluzzi, D., Oh, S., Jiang, N., Destici, E., Rosenfeld, M. G., & Nair, S. J. (2022). Long-distance association of topological boundaries through nuclear condensates. Proceedings of the National Academy of Sciences of the United States of America, 119(32), e2206216119. https://doi.org/10.1073/pnas.2206216119

      Honkela, A., Peltonen, J., Topa, H., Charapitsa, I., Matarese, F., Grote, K., Stunnenberg, H. G., Reid, G., Lawrence, N. D., & Rattray, M. (2015). Genome-wide modeling of transcription kinetics reveals patterns of RNA production delays. Proceedings of the National Academy of Sciences of the United States of America, 112(42), 13115. https://doi.org/10.1073/pnas.1420404112

      Oh, S., Shao, J., Mitra, J., Xiong, F., D’Antonio, M., Wang, R., Garcia-Bassets, I., Ma, Q., Zhu, X., Lee, J.-H., Nair, S. J., Yang, F., Ohgi, K., Frazer, K. A., Zhang, Z. D., Li, W., & Rosenfeld, M. G. (2021). Enhancer release and retargeting activates disease-susceptibility genes. Nature, 595(7869), Article 7869. https://doi.org/10.1038/s41586-021-03577-1

      Patange, S., Ball, D. A., Wan, Y., Karpova, T. S., Girvan, M., Levens, D., & Larson, D. R. (2022). MYC amplifies gene expression through global changes in transcription factor dynamics. Cell Reports, 38(4). https://doi.org/10.1016/j.celrep.2021.110292

      Raj, A., van den Bogaard, P., Rifkin, S. A., van Oudenaarden, A., & Tyagi, S. (2008). Imaging individual mRNA molecules using multiple singly labeled probes. Nature Methods, 5(10), Article 10. https://doi.org/10.1038/nmeth.1253

      Rodriguez, J., Ren, G., Day, C. R., Zhao, K., Chow, C. C., & Larson, D. R. (2019). Intrinsic Dynamics of a Human Gene Reveal the Basis of Expression Heterogeneity. Cell, 176(1–2), 213-226.e18. https://doi.org/10.1016/j.cell.2018.11.026

      Saravanan, B., Soota, D., Islam, Z., Majumdar, S., Mann, R., Meel, S., Farooq, U., Walavalkar, K., Gayen, S., Singh, A. K., Hannenhalli, S., & Notani, D. (2020). Ligand dependent gene regulation by transient ERα clustered enhancers. PLOS Genetics, 16(1), e1008516. https://doi.org/10.1371/journal.pgen.1008516

      Stossi, F., Dandekar, R. D., Mancini, M. G., Gu, G., Fuqua, S. A. W., Nardone, A., De Angelis, C., Fu, X., Schiff, R., Bedford, M. T., Xu, W., Johansson, H. E., Stephan, C. C., & Mancini, M. A. (2020). Estrogen-induced transcription at individual alleles is independent of receptor level and active conformation but can be modulated by coactivators activity. Nucleic Acids Research, 48(4), 1800. https://doi.org/10.1093/nar/gkz1172

      Vargas, D. Y., Shah, K., Batish, M., Levandoski, M., Sinha, S., Marras, S. A. E., Schedl, P., & Tyagi, S. (2011). Single-Molecule Imaging of Transcriptionally Coupled and Uncoupled Splicing. Cell, 147(5), 1054–1065. https://doi.org/10.1016/j.cell.2011.10.024

      Waks, Z., Klein, A. M., & Silver, P. A. (2011). Cell-to-cell variability of alternative RNA splicing. Molecular Systems Biology, 7(1), 506. https://doi.org/10.1038/msb.2011.32

      Zambrano, S., Loffreda, A., Carelli, E., Stefanelli, G., Colombo, F., Bertrand, E., Tacchetti, C., Agresti, A., Bianchi, M. E., Molina, N., & Mazza, D. (2020). First Responders Shape a Prompt and Sharp NF-κB-Mediated Transcriptional Response to TNF-α. iScience, 23(9), 101529. https://doi.org/10.1016/j.isci.2020.101529

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study provides a valuable characterization of individual sarcomere's contractility and synchrony in spontaneously beating cardiomyocytes as a function of substrate stiffness. The authors, however, provide an incomplete explanation for the observed heterogeneous and stochastic dynamics, so that the work remains mainly descriptive. The work will be of interest to scientists working on muscle biophysics, nonlinear dynamics, and synchronization phenomena in biological systems.

      We appreciate the reviewer’s insightful comments. A detailed explanation of the described phenomena in the form of a theoretical model and simulations was not included in our manuscript, because we believed it would be most impactful to present a detailed quantitative statistical description of the experiments in one manuscript and then introduce the model, which we already had in preparation, in a separate manuscript to avoid diluting the overall message.

      However, following the reviewers’ advice, we have now included a comprehensive model in the revised manuscript. This model qualitatively and quantitatively explains the experimentally observed phenomena and introduces a novel class of coupled relaxation oscillators based on a non-monotonic force-velocity relationship of individual sarcomeres. We believe that this addition significantly strengthens the manuscript.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors experimentally demonstrated the heterogeneous behavior of sarcomeres in cardiomyocytes and that a stochastic component exists in their contractile activity, which cancels out at the level of myofibrils.

      Strengths:

      The experiments and data analysis are robust and valid. With very good statistics and unbiased methods, they show cellular activity at the individual level and highlight the heterogeneity between biological networks. The similarity of the results to the study cited in [24] demonstrates the validity of the in vitro setup for answering these questions and the feasibility of such in-vitro systems to extend our knowledge of physiology.

      Weaknesses:

      Compared to the current literature ([24]), the study does not show a high degree of innovation. It mainly confirms what has been established in the past. The authors complemented the published experiments by developing an in vitro setup with stem cells and by changing the stiffness of the substrate to simulate pathological conditions. However, the experiments they performed do not allow them to explain more than the study in [24], and the conclusions of their study are based on interpretation and speculation about the possible mechanism underlying the observations.

      We thank the reviewer for contextualizing our work with the literature. We appreciate the comparison to the study by Kobirumaki-Shimozawa et al. which we cite prominently. They observed stochastically varying beating patterns of individual sarcomeres on a beat-to-beat basis. They propose that this arises from a "titin-based mechanism" operating stochastically, which they interpret as being fundamentally linked to sarcomere-length-dependent effects. This interpretation differs from our model. We feel that the inclusion of our comprehensive model in the revised manuscript will emphasize the significance and novelty of our findings. Our work proposes a distinct alternative mechanistic explanation for the observed stochasticity, grounded in the force-velocity relationship and intrinsic stochasticity, and presents additional novel dynamic phenomena (such as popping and high-frequency oscillations) not reported in the literature yet. We outline the key advancements of our study below:

      (1) Physiologically Relevant Human Model System: Our study utilizes human induced pluripotent stem cell-derived cardiomyocytes (hiPSC-CMs). Using a human cell model provides direct relevance for understanding human cardiac physiology and pathophysiology, overcoming limitations inherent in translating findings from rodent models. The hiPSC-CMs exhibit key physiological differences from the mouse ventricular myocytes observed in [24], most notably beating at a significantly lower frequency (~1 Hz or 60 bpm) compared to mice (~5-8 Hz or 300-500 bpm). This difference in timescale is critical as it allowed us to resolve complex intra-beat dynamics that may be different and also harder to observe in mouse cardiomyocytes.

      (2) Advanced Experimental Methodology and Resolution: We developed a novel assay incorporating our SarcAsM algorithm for high-throughput tracking and analysis of individual sarcomere dynamics. This approach gave us spatial resolution better than 20 nm at significantly higher sampling rates than previous studies, including Kobirumaki-Shimozawa et al. Furthermore, our high-throughput in vitro approach made it possible to analyze vastly larger datasets than, e.g., the study by Kobirumaki-Shimozawa et al. (which reports observations from fewer than 20 myofibrils, encompassing fewer than 200 sarcomeres in total). While we recognize that in-vivo tissue studies present unique experimental challenges, the substantially greater statistical power of our study is crucial for reliably characterizing the complex, stochastic dynamics we report. The enhanced resolution and statistical robustness are not merely incremental; they enable the detailed identification and analysis of heterogeneous behaviors that were previously inaccessible or could not be characterized with the same level of confidence.

      (3) Novel Observed Phenomena: Our high-resolution data reveals specific dynamic behaviors, such as sarcomere "popping" and high-frequency oscillations during contraction, which, to our knowledge, have not been previously reported or characterized in cardiomyocytes. The resolution limitations and the high beating frequency in mouse models may not have permitted the observation of these subtle, but potentially important phenomena.

      (4) Distinct Mechanistic Explanation and Model: Kobirumaki-Shimozawa et al. propose a qualitative model where sarcomere motion variability primarily arises from length-dependent activation. This view is essentially a static one, based on a long history of isometric skeletal muscle experiments, where time-dependent forces are not relevant. We argue that in highly dynamic cardiomyocytes this may not be the most useful approach. While we acknowledge length dependence can play a role, our integrated experimental-theoretical work proposes a different primary mechanism. Our model demonstrates that the observed stochastic heterogeneity and beat-to-beat variations, including the oscillatory motion and popping, can be quantitatively explained by dynamic instabilities arising from a non-monotonic force-velocity relationship of individual sarcomeres in conjunction with intrinsic sarcomere-level stochastic fluctuations. The model emphasizes the active, transient nature of force generation rather than solely assuming length dependence. Our model provides an alternative explanation for the observed dynamics, and a quantitative, mechanism-based understanding.

      Reviewer #2 (Public Review):

      Summary:

      Sarcomeres, the contractile units of skeletal and cardiac muscle, contract in a concerted fashion to power myofibril and thus muscle fiber contraction.

      Muscle fiber contraction depends on the stiffness of the elastic substrate of the cell, yet it is not known how this dependence emerges from the collective dynamics of sarcomeres. Here, the authors analyze the contraction time series of individual sarcomeres using live imaging of fluorescently labeled cardiomyocytes cultured on elastic substrates of different stiffness. They find that reduced collective contractility of muscle fibers on unphysiologically stiff substrates is partially explained by a lack of synchronization in the contraction of individual sarcomeres.

      This lack of synchronization is at least partially stochastic, consistent with the notion of a tug-of-war between sarcomeres on stiff substrates. A particular irregularity of sarcomere contraction cycles is 'popping', the extension of sarcomeres beyond their rest length. The statistics of 'popping' suggest that this is a purely random process.

      Strengths:

      This study thus marks an important shift of perspective from whole-cell analysis towards an understanding of the collective dynamics of coupled, stochastic sarcomeres.

      Weaknesses:

      Further insight into mechanisms could be provided by additional analyses and/or comparisons to mathematical models.

      We thank the reviewer for the feedback. We have enhanced the manuscript with a comprehensive dynamic model, which we also contrast with previously proposed models.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript of Haertter and coworkers studied the variation of length of a single sarcomere and the response of microfibrils made by sarcomeres of cardiomyocytes on soft gel substrates of varying stiffnesses.

      The measurements at the level of a single sarcomere are an important new result of this manuscript. They are done by combining the labeling of the sarcomeres z line using genetic manipulation and a sophisticated tracking program using machine learning. This single sarcomere analysis shows strong heterogeneities of the sarcomeres that can show fast oscillations not synchronized with the average behavior of the cell and what the authors call popping events which are large amplitude oscillations. Another important result is the fact that cardiomyocyte contractility decreases with the substrate stiffness although the properties of single sarcomeres do not seem to depend on substrate stiffness.

      The authors suggest that the cardiomyocyte cell behavior is dominated by sarcomere heterogeneity. They show that the heterogeneity between sarcomeres is stochastic and that the contribution of static heterogeneity (such as composition differences between sarcomeres) is small.

      Strengths:

      All the results are to my knowledge new and original and deserve attention.

      Weaknesses:

      However, I find the manuscript a bit frustrating because the authors only give very qualitative explanations of the phenomena that they observe. They mention that popping could be explained by a nonlinear force-velocity relation of the sarcomere leading to a rapid detachment of all motors. However, they do not explicitly provide a theoretical description. How would the popping depend on the parameters and in particular on the substrate stiffness? Would the popping statistics be affected by the stiffness? It is also not clear to me how the dependence on the soft gel stiffness of the cardiomyocyte cell can be explained by the stochasticity of the sarcomere properties. Can any of the results found by the authors be explained by existing theories of cardiomyocytes? The only one I know is that of Safran and coworkers.

      I also found the paper very difficult to read. The authors should perhaps reorganize the structure of the presentation in order to highlight what the new and important results are.

      We are grateful for this detailed and critical feedback. The observed phenomena (stochastic heterogeneity, popping, high-frequency oscillatory motion) can indeed be explained by a nonmonotonic force-velocity relation along with stochastic fluctuations of individual sarcomeres. At the time of initial submission of this manuscript, we already had a theoretical model in preparation, which both qualitatively and quantitatively explains the observed phenomena. As a result, we included certain interpretations preemptively, which caused some lack of clarity in the absence of the full model. We have now added the model to this manuscript, providing a mechanistic interpretation of our findings. The model is different from prior models in that it emphasizes time-dependent forces, typically disregarded in models built to understand isometric skeletal muscle experiments.

      We have shortened, streamlined and restructured our manuscript to improve the readability and accessibility of our study.

      Recommendations for the authors:

      There is a consensus among reviewers that the link between the stiffness dependence of the observed stochastic dynamics and the proposed tug-of-war mechanism is unclear. More quantitative support and discussion is required, possibly using theoretical modeling.

      We are grateful for the insightful and comprehensive feedback from both the editor and the reviewers. As suggested, we have now added a comprehensive model explaining the observed phenomena and presenting a new conceptual view on cardiac muscle dynamics.

      Reviewer #1 (Recommendations For The Authors):

      The authors addressed an interesting question related to the dynamics of cardiac cells and their multiscale dynamics. They did a good job in terms of experimental design and data analysis. However, I fear that they do not contribute enough new information to the topic.

      The authors should refer to the study in [24] and explain better the difference between these two studies. Although the different approaches are quite obvious, it is not clear to me what additional insights they add to the problem. They conducted their experiments with different stiffnesses. However, the conclusions they draw from the study are based on speculation (e.g. about the behavior of myosin heads in relation to shortening and relaxation), while their data mainly confirm previous studies. They need to address more explicitly the novelty of their study.

      Novelty and Comparison with Previous Studies: We understand the concern about distinguishing our contribution from prior work, specifically Kobirumaki-Shimozawa et al., 2021.

      As detailed in our public response, these are the key advances:

      Use of a medically relevant human iPSC-CM model vs. mouse cardiomyocytes.

      Superior spatial and temporal resolution via our SarcAsM algorithm, revealing novel phenomena like popping and high-frequency oscillations not previously reported.

      Significantly greater statistical power due to our high-throughput in vitro assay.

      We added a distinct mechanistic explanation based on the dynamic force-velocity relationship and sarcomere-level stochasticity, contrasting with the static, deterministic titin/length-dependence focus of previous studies.

      Interpretation and Speculation: We acknowledge that without the explicit model, some interpretations in the initial submission appeared speculative. As noted in our public response, we had already started to develop a theoretical model explaining our observations at the time of submission, targeting a second follow-up publication. Including interpretations based on this unpublished model prematurely clearly caused confusion. We now include the full model in the revised manuscript.

      Integration of the Theoretical Model: We have now fully integrated the model into the revised manuscript. The model explicitly demonstrates how the non-monotonic force-velocity relationship of individual sarcomeres leads to dynamic instabilities around a critical force threshold. This instability along with stochasticity drives a 'tug-of-war' between coupled sarcomeres, generating complex emergent behaviors.

      Mechanistic Explanation Beyond Length-Dependence: Our model quantitatively reproduces all key experimental findings (stochastic heterogeneity, popping, oscillations) without relying on length-dependent activation effects. This strongly supports our conclusion that the active, transient dynamics of individual sarcomeres governed by the force-velocity relationship are fundamental drivers of these complex contractile patterns. We believe this provides a significant conceptual advance, highlighting a potentially underappreciated aspect of sarcomere dynamics. Previous models focused mostly on length-dependence, historically based on skeletal muscle fiber experiments that were often done under static, isometric conditions. We feel that the new model represents a substantial paradigm shift in understanding highly dynamic muscles such as heart muscle.

      We are confident that the inclusion of the model addresses the majority of the reviewer's concerns.

      Additional comments:

      The authors write of a tug-of-war competition between the sarcomeres, and I'm not sure what they mean by that. I would spend more words explaining this point, especially because it seems to be an important point to describe their results. Similarly, they talked about an all-or-nothing phenomenon when they described the elongation of sarcomeres. What do they mean by this?

      We have revised the manuscript where clarification was needed and now define the terms mentioned more explicitly.

      (1) "Tug-of-War": We used this term metaphorically to describe the mechanical competition between linearly coupled sarcomeres within a myofibril, especially when contracting against rigid external boundary conditions. While it is not a perfect analogy, the metaphor intuitively captures the inherent instability of this interaction: similar to how a team in a real tug-of-war might suddenly yield when one person tires and the rest of the team gets overloaded, rather than steadily losing ground, the dynamic instability arising from the non-monotonic force-velocity relationship (detailed in our model, lines 300ff) can cause individual sarcomeres to abruptly change state (e.g., shorten or rapidly lengthen) while under tension from their neighbors. We have removed the term from the title and now use it more sparingly within the manuscript to better reflect its role as an illustrative analogy.

      (2) "All-or-Nothing" Elongation (Popping): The term "popping" describes our experimental observation of sudden, rapid, and extensive elongation of individual sarcomeres. This typically occurs late in the contraction cycle during early relaxation, when overall force may be declining, but individual sarcomeres can still experience significant tension from their neighbors. We described this specific type of rapid elongation in the original manuscript as an "all-or-nothing" phenomenon because, typically, sarcomeres in these events yield rapidly and strongly overshoot their resting length without recovering in a given activation cycle. The speed of popping events is substantially higher than the speed of coordinated gradual shortening observed during systoles that is driven by bound myosin heads. This observation strongly suggests an instability-driven, avalanche-like unbinding of myosin heads from the actin filaments during these events.

      We agree that the term "all-or-nothing" is not precise, and we have removed it, as it is not essential for describing the observed "popping" dynamics.

      The authors claim that the popping frequency increases as a function of stiffness. However, Figure 4E does not really seem to be a common practice in terms of statistical significance. A better description could help to remove this doubt.

      We clarified the presentation of popping frequency data and its statistical interpretation.

      (1) Popping Frequency vs. Substrate Stiffness (previously Figure 4D, now Figure 3G):

      We first note that the dependence of popping frequency on substrate stiffness was presented in Figure 4D of the original submission, not 4E. In the revised, shortened manuscript, it can now be found in Fig. 3G. Owing to the large number of observations (N) in our dataset, the slight upward trend in popping frequency with increasing substrate stiffness (previously Figure 4D) does reach statistical significance using standard tests. For details, see the figure captions.

      (2) Popping Frequency vs. Sarcomere Resting Length (previously Figure 4E, now Figure 3H):

      Figure 4E addresses the relationship between popping frequency and the individual sarcomere's resting length. To generate this plot, we binned sarcomeres based on their measured resting length (in intervals of 0.02 µm) and calculated the mean popping frequency within each bin across all conditions. We have now clarified this in the figure caption.
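A minimal sketch of this binning step, assuming each sarcomere is summarized by a resting length (in µm) and a per-sarcomere popping frequency. The function name and the input layout are hypothetical and are not taken from SarcAsM or from the manuscript's analysis code; only the 0.02 µm bin width comes from the text above.

```python
import math
from collections import defaultdict


def mean_popping_frequency_by_length(resting_lengths_um, popping_frequencies,
                                     bin_width_um=0.02):
    """Group sarcomeres into resting-length bins and average their popping frequency.

    Returns a dict mapping each bin's lower edge (in µm) to the mean popping
    frequency of the sarcomeres that fall into that bin.
    """
    binned = defaultdict(list)
    for length, frequency in zip(resting_lengths_um, popping_frequencies):
        # round() guards against floating-point noise right at a bin edge
        bin_index = math.floor(round(length / bin_width_um, 9))
        binned[bin_index].append(frequency)
    return {
        round(index * bin_width_um, 4): sum(values) / len(values)
        for index, values in sorted(binned.items())
    }
```

Bins without any sarcomeres are simply absent from the result, so sparsely populated length ranges do not produce artificial zeros.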

      (3) Interpretation of Length Dependence:

      While Figure 3H clearly shows that longer sarcomeres are more prone to popping, we argue this is likely a modulating factor rather than the sole underlying cause. Two key observations support this interpretation:

      Even very short sarcomeres (e.g., < 1.65 µm resting length) exhibit a non-zero popping frequency (around 5-10%), indicating that popping is not exclusive to long sarcomeres.

      The distribution of resting lengths, now added to the graph, is narrower than the wide range (1.6-2.0 µm) plotted in Figure 3H. Popping still occurs stochastically within a myofibril of sarcomeres with relatively similar resting lengths.

      Therefore, while length clearly influences the probability of popping, the phenomenon itself appears to be fundamentally stochastic, occurring across a range of lengths. This is consistent with our model in which dynamic instabilities (driven by the non-linear force-velocity relationship) and stochastic fluctuations are the primary triggers, while length affects probability of occurrence.

      Changes in Manuscript:

      We have revised the text associated with Figures 3G and 3H to clarify the distinction between stiffness and length dependence.

      We have added a statement in the Methods section and figure legends (e.g., Legend for Fig 3) explaining our approach to statistical analysis and interpretation for large datasets where standard p-values may be less informative.

      We believe these clarifications directly address the reviewer's concerns about the data presentation and interpretation in Figure 3.

      Reviewer #2 (Recommendations For The Authors):

      This is an interesting study, which however could and should be extended, see below. The current manuscript contains much less information than its length suggests; its figures contain partially redundant data.

      Taking into account this critical feedback, we have restructured, streamlined and shortened the manuscript to improve readability and accessibility.

      (1) How regular are the cellular contraction cycles?

      Have the authors computed a coefficient of variation of cycle durations?

      Does this regularity depend on substrate stiffness?

      We have substantially improved the detection accuracy of contraction intervals compared to our initial submission (for details, see the SarcAsM companion paper, https://www.biorxiv.org/content/10.1101/2025.04.29.650605v1). We calculated the beating rate variability (defined as the standard deviation of cycle durations) and found it to be low, on average less than 0.05 s across the tested conditions. The distribution of this variability is positively skewed, with the majority of values clustering near zero. We have added new panels showing these results to Fig. S2B.
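As a hedged illustration of the two quantities discussed here, the sketch below computes both the standard deviation of cycle durations and the coefficient of variation the reviewer asked about. It assumes a list of contraction onset times in seconds is already available; the function name is hypothetical, and the use of the sample (rather than population) standard deviation is an assumption, since the text does not specify which was used.

```python
import statistics


def cycle_duration_variability(contraction_start_times_s):
    """Return (standard deviation in s, coefficient of variation) of cycle durations."""
    durations = [
        later - earlier
        for earlier, later in zip(contraction_start_times_s,
                                  contraction_start_times_s[1:])
    ]
    if len(durations) < 2:
        raise ValueError("need at least three contraction onsets")
    sd = statistics.stdev(durations)        # sample standard deviation, in seconds
    cv = sd / statistics.fmean(durations)   # dimensionless
    return sd, cv
```

The coefficient of variation normalizes by the mean cycle duration, which makes conditions with different beating rates directly comparable.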

      (2) Which experiments could the authors perform to identify the origin of the apparent 3-Hz oscillations?

      Would these oscillations persist even if the cardiomyocytes would not beat?

      We now address these questions in the revised manuscript.

      (1) Active Nature: The ~3 Hz oscillations are clearly linked to active contraction. They are absent in quiescent, non-beating cardiomyocytes observed under identical conditions, confirming that they are not passive fluctuations or baseline cellular tremors.

      (2) Signal Fidelity: We are confident these are genuine physiological events, not artifacts. Our high temporal resolution (~15 ms frame time) and tracking accuracy (< 20 nm) allow reliable detection because events are well above system noise. This is now explained in the revised manuscript.

      (3) Can the authors augment their study by modeling?

      For example, could the experimental data be fitted by a Kuramoto-type model of the form d phi_i / dt = eps*sin( Omega - phi_i ) + lambda*sin( phi_i - phi_i+1 ) + xi_i, combining phase-locking of sarcomere oscillations with phase phi_i to intracellular calcium oscillations with phase Omega, and anti-phase synchronization between neighboring sarcomeres, as well as noise xi?

      If yes, how would the coupling strength depend on substrate stiffness?
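For readability, the phase model suggested in this comment can be typeset as follows. This is only a transcription of the reviewer's plain-text expression; reading `phi_i+1` as the phase of the neighboring sarcomere, $\phi_{i+1}$, is an assumption.

```latex
\frac{d\phi_i}{dt} = \varepsilon \,\sin(\Omega - \phi_i) + \lambda \,\sin(\phi_i - \phi_{i+1}) + \xi_i
```

Here $\phi_i$ is the phase of sarcomere $i$, $\Omega$ the phase of the intracellular calcium oscillation, $\varepsilon$ and $\lambda$ the two coupling strengths, and $\xi_i$ the noise term, as described in the comment.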

      We have now added a model. While a Kuramoto-type phase model is powerful for studying synchronization, we determined that a more mechanistic approach was required. Crucially, sarcomeres are mechanically coupled in series within a myofibril, and this direct physical linkage is not well-represented by the abstract, phase-based coupling of a Kuramoto model.

      Instead, our model comprises serially coupled sarcomeres, each governed by an underdamped Langevin equation. This framework allowed us to infer the force-velocity relation without any prior assumptions directly from our experimental data, revealing a critical non-monotonic characteristic. As we now emphasize in the revised manuscript, this behavior is mathematically equivalent to a Van-der-Pol relaxation oscillator, which reflects the instability-driven nature of the system.

      Furthermore, and in line with the reviewer's suggestion, our model incorporates a stochastic noise term which we found essential for reproducing the observed phenomena. Without this noise term, the characteristic sarcomere dynamics do not emerge (Fig. 5).
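The role of a noise term in an instability-driven oscillator can be illustrated with a toy example. The sketch below integrates a single, dimensionless Van der Pol oscillator with additive noise using an Euler-Maruyama step. It is not the fitted sarcomere model from the manuscript: the equation, the parameter values (`mu`, `noise_strength`, `dt`), and the function name are all assumptions chosen purely for illustration, and the coupling between sarcomeres is omitted entirely.

```python
import math
import random


def simulate_noisy_van_der_pol(mu=1.0, noise_strength=0.1, dt=0.01,
                               n_steps=10_000, seed=0):
    """Integrate x'' = mu*(1 - x^2)*x' - x + noise, starting at rest (x = v = 0)."""
    rng = random.Random(seed)
    x, v = 0.0, 0.0
    trajectory = []
    for _ in range(n_steps):
        drift = mu * (1.0 - x * x) * v - x
        # Euler-Maruyama step: the noise increment scales with sqrt(dt)
        v += drift * dt + noise_strength * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        x += v * dt
        trajectory.append(x)
    return trajectory
```

In this toy setting, a run with `noise_strength=0.0` never leaves the unstable rest state, whereas any non-zero noise kicks the system onto its limit cycle. This mirrors, only by analogy, the statement above that the characteristic dynamics do not emerge without the noise term.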

      (4) What is the maximally extended length of titin, and how does this length correspond to the maximal length of popping sarcomeres?

      The force-extension curves of titin have been measured in single-molecule experiments (and the packing density of titin is known) - can the authors use this information to infer the forces acting inside sarcomeres?

      We thank the reviewer for this thoughtful question. While sarcomere length during popping can be measured, inferring the corresponding intra-sarcomeric force is not straightforward in a living, contracting cardiomyocyte. The relationship between extension and force is complex and dynamic, involving multiple molecular components.

      Our data show elongations up to 0.5 μm during popping events. While this magnitude is plausibly within the extensibility range of titin and other mechanically relevant components (Caporizzo & Prosser, 2021; Loescher & Linke, 2023), directly inferring force from this observation is challenging. In such a multi-component system with both active and passive elements, total force comprises several factors that cannot be disentangled from a simple length measurement alone. First, the system is dominated by active, velocity-dependent force generation of cross-bridges, which our model shows is non-monotonic. Second, titin exhibits a restoring force that is strongly strain-rate dependent (Rief et al., 1997), critical during rapid elongation. Third, viscous drag forces within the sarcomere are also highly strain-rate dependent, contributing significantly during rapid length changes. Fourth, other structural elements such as microtubules and intermediate filaments contribute to viscoelastic properties, particularly at high strains (Caporizzo & Prosser, 2021). This complex interplay makes it impossible to map a given sarcomere length to a unique force value using single-molecule titin data alone.

      (5) I urge the authors to make their raw data openly available.

      We agree on the importance of data availability. While the complete raw imaging dataset is several hundred gigabytes and thus impractical to deposit, we have uploaded a comprehensive dataset to Zenodo to ensure full reproducibility. This repository includes a representative subset of raw imaging data (50 cells per condition), with corresponding sarcomere motion data provided in a readable JSON format. Crucially, the deposition also contains the complete aggregated data underlying all figures and statistical analyses presented in the manuscript. All provided data can be programmatically accessed and analyzed using our `SarcAsM` Python API. The data can be accessed at: https://doi.org/10.5281/zenodo.17564384.

      Minor

      (1) How did the authors determine the start and end of contraction cycles when analyzing their data?

      The start and end points of each contraction cycle were identified using ContractionNet, a custom convolutional neural network we developed for this purpose. This method, used for all analyses in the revised manuscript, detects contraction intervals with high accuracy directly from sarcomere dynamics time-series data and significantly outperforms the threshold-based approach used previously. The complete methodology, algorithm description, and validation of ContractionNet are detailed in our companion paper on the SarcAsM analysis software (www.biorxiv.org/content/10.1101/2025.04.29.650605v1, see Fig. S6).

      (2) What are the measurement errors in determining Delta_SL?

      The measurement error for the Z-band trajectories is approximately 17 nm. This high tracking accuracy is achieved with our deep-learning-based Z-band segmentation approach, which employs a 3D convolutional neural network (3D U-Net) to leverage both spatial and temporal context for robust Z-band segmentation in noisy, high-speed recordings. A full description of this validation is available in our SarcAsM companion paper (see Figure S3 therein).

      (3) Does popping occur while other sarcomeres are still contracting?

      This is an important point. Yes, popping frequently occurs while other sarcomeres within the same myofibril are still actively shortening. This simultaneity is clearly visualized in the newly added Movie M1, which displays a phase-space plot (velocity vs. length change relative to rest) for all tracked sarcomeres over time. In this visualization, popping events appear as trajectories moving into the top-right quadrant (rapid elongation), while concurrently, other sarcomeres are represented by points in the left quadrants (negative velocity), indicating ongoing shortening. We have included Movie M1 as supplementary material.

      (4) The authors argue that their data on popping sarcomeres is consistent with homogeneous popping probabilities.

      (5) Can the authors assess in simulations how dispersed the popping probabilities of individual sarcomeres could be before they would notice a statistically significant difference to the homogeneous case?

      This question touches on a key challenge in analyzing these complex dynamics. A direct statistical test of popping probability for each individual sarcomere is not feasible, as the number of events per sarcomere over our observation time is too low for robust single-unit analysis. Consequently, our approach relies on testing the cumulative distributions of inter-event spatial distances and temporal gaps across all sarcomeres within a given region (LOI).

      In nearly half of the analyzed LOIs, these cumulative distributions were statistically indistinguishable (p > 0.05) from the geometric distribution expected for a single, homogeneous stochastic process. This provides strong support for our primary conclusion that popping is fundamentally a random phenomenon.
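The logic of this comparison can be sketched as follows, under the assumption that gaps between popping events are counted in whole contraction cycles. For a homogeneous process with a fixed per-cycle probability p, these gaps follow a geometric distribution. The code fits p by maximum likelihood and reports the largest deviation between the empirical and the fitted cumulative distribution. It does not compute a p-value and is not the test used in the manuscript; all function names are hypothetical.

```python
import random


def geometric_cdf(k, p):
    """P(gap <= k) for a geometric distribution on k = 1, 2, ..."""
    return 1.0 - (1.0 - p) ** k


def max_cdf_deviation(gaps):
    """Largest absolute difference between the empirical and fitted geometric CDF."""
    p_hat = len(gaps) / sum(gaps)  # maximum-likelihood estimate of p
    n = len(gaps)
    deviation = 0.0
    for k in range(1, max(gaps) + 1):
        empirical = sum(1 for g in gaps if g <= k) / n
        deviation = max(deviation, abs(empirical - geometric_cdf(k, p_hat)))
    return deviation


def sample_gaps(p, n, seed=0):
    """Draw n waiting times (in cycles) between events of a homogeneous process."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(n):
        k = 1
        while rng.random() >= p:
            k += 1
        gaps.append(k)
    return gaps
```

Gaps drawn with `sample_gaps` from a single p should give a small deviation, while pooling gaps from sarcomeres with clearly different p values would be expected to increase it.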

      For the cases that deviate from the homogeneous model, we argue that this does not refute the underlying stochasticity of the events. Instead, we propose this is the expected statistical signature of pooling data from a population of sarcomeres that have slight, intrinsic variations in their individual popping probabilities due to factors like resting length or structural integrity. Even if each sarcomere's popping is a locally random event, a cumulative test performed on a population with varied baseline probabilities is expected to detect a deviation from a simple, homogeneous model.

      Regarding the requested simulation study: While we agree this would be methodologically informative, the sensitivity to detect probability dispersion depends on multiple interacting factors (number of sarcomeres per LOI, observation time, event rates, and the assumed form of heterogeneity). Any single simulation scenario would therefore be highly model-dependent and of limited generality. Rather than introducing additional assumptions, we base our conclusions on the observed agreement with the homogeneous model in approximately half of LOIs and the correlation of deviations with measurable properties (Fig. 4E). A comprehensive statistical analysis would constitute a substantial methodological study beyond the scope of this mechanistically focused manuscript.

      (6) Can the authors measure sarcomere rest length and check if this rest length is correlated with the popping probability of individual sarcomeres?

      Yes, we performed this analysis. As shown in Figure 3H (previously Fig. 4E), we found a positive correlation between sarcomere resting length and popping frequency, confirming that longer sarcomeres have a higher probability of popping.

      Importantly, however, the popping probability remains non-zero even for shorter sarcomeres. As detailed in our response to Reviewer #1 regarding this figure, we interpret resting length as a significant modulating factor that influences popping probability, rather than the sole determinant of the phenomenon.

      (7) Several mathematical models of sarcomere contraction exist (e.g., crossbridge models).

      (8) Could the authors perform computer simulations of several such stochastic sarcomere models coupled in series?

      Alternatively, could the authors discuss this?

      As I understand, references 16-18 model myofibril contraction assuming static variability of sarcomeres, but do not account for stochasticity in the contractility of individual sarcomeres.

      We thank the reviewer for this excellent suggestion. We have performed such simulations, and the theoretical model is a central component of our revised manuscript (new Figures 4 and 5; manuscript lines 316ff).

      As the reviewer points out, previous models (e.g., refs 12 and 14 in our manuscript) have often relied on predefined static variability between sarcomeres to explain heterogeneous behavior. Our work takes a fundamentally different approach. We model the myofibril as a chain of serially coupled sarcomeres, where the dynamics of each unit are governed by an underdamped Langevin equation. This formulation inherently incorporates stochasticity and describes the interplay between a non-monotonic, velocity-dependent active force, a length-dependent passive force, and the mechanical coupling to its neighbors.

      Crucially, the model parameters were not assumed, but were instead inferred by fitting the model directly to our experimental data using a gradient-free optimization algorithm. This data-driven stochastic model was sufficient to quantitatively reproduce key observed phenomena, including high-frequency oscillations and popping events. Our central finding is that these complex behaviors emerge naturally from the coupled system, driven by the non-monotonic force-velocity relationship and intrinsic stochastic fluctuations. This demonstrates that predefined static heterogeneity is not required to explain the observed dynamics.

      (9) The manuscript could be shortened (e.g., lines 52-56 in the introduction provide little extra value).

      We have significantly revised the entire manuscript to improve clarity and readability. We have removed sentences in the introduction as suggested and substantially restructured major sections. One of the main reasons for this was the integration of our theoretical model, which was originally prepared as a separate manuscript. This required us to completely reframe the introduction and reorganize the figures and results.

      We are confident that these extensive changes have resulted in a stronger, more concise and impactful paper that now integrates our experimental findings with a theoretical model.

      (10) Figure 2 is overloaded with data. Several panels could be moved to the SM without compromising the key message.

      Introducing the notation in panels Figures 2A-C does not seem ideal to me; maybe add a cartoon?

      We agree that Fig. 2 was dense. We have redesigned panels A-F to improve clarity and better guide the reader. We now use a consistent color-coding scheme to link the extrema in the phase portraits (A-C) to the corresponding distributions of individual sarcomeres (E-G). We have also revised the accompanying text to make the figure's logic more transparent.

      We have considered moving panels A-C to the supplementary materials. However, we believe their placement in the main text is crucial for two reasons:

      (1) Revealing Core Dynamics: The length-velocity phase portrait is the first visualization that reveals the underlying near-oscillatory dynamics of individual sarcomeres. This was not an assumed behavior but a critical experimental observation that directly motivated our entire theoretical modeling effort. We now also provide animated versions of these plots (Movies X-Y) to further illustrate these complex dynamics.

      (2) Enabling Model-Experiment Comparison: A phase portrait is a standard tool for comparing experimental data with theoretical models. Retaining it in the main text allows us to directly compare data and model in our new Figures 4 and 5, providing a clear validation of our model.

      (11) Similarly, Figures 4F, G, and H seem dispensable to me.

      (I also wonder how clear the analogy of a coin flip is if a biased coin with probabilities p and 1-p needs to be used.)

      We agree that the previous Figure 4F, which served a purely illustrative purpose, was dispensable and have removed it. The "coin flip" analogy was potentially confusing and we have removed it.

      As part of a broader restructuring of the manuscript, the quantitative analyses from the original Figures 4G and 4H are now presented as Figures 3I and 3J. They provide important supporting evidence for the stochastic nature of the resulting popping events. We believe retaining this quantitative analysis is valuable, and we hope that by streamlining the figure and removing the analogy, we have addressed the reviewer's concerns.

      (12) Equation (1) is unnecessarily complicated. The same holds for Equation (2).

      It might make sense to separate definitions for serial and mutual correlations.

      (This would also simplify the axes labels in Figure 3C.)

      (13) The notation used in Equation (1) is not fully clear.

      I assume t denotes a unit-less time index and T is the unit-less duration of a contraction cycle, measured in multiples of a fixed time interval?

      Regarding comments (12) and (13):

      We thank the reviewer for these helpful suggestions. In response to comment (12), we have separated the definitions for the mutual (r<sub>m</sub>) and serial (r<sub>s</sub>) correlation coefficients, presenting them as distinct calculations rather than as special cases of a single, more complex formula. This makes their definitions more direct and explicit. The calculation for the serial correlation coefficient has also been streamlined into a concise inline definition.

      In response to comment (13), we have clarified the notation in Equation (1). In the manuscript text (lines 208f), we now explicitly state that 𝑡 represents the discrete, unitless time index (i.e., the frame number) within a time-series, and 𝑇 is the total number of frames (i.e., the total duration in frames) of a given contraction cycle.

      While Equation (1) itself is the standard definition for the uncentered correlation coefficient and cannot be algebraically simplified, we have added text to specify this and justify its use. This metric (equivalent to cosine similarity) is appropriate for our analysis as it assesses the similarity in the shape of motion patterns, independent of their mean values.
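
For readers unfamiliar with the metric, the following is a minimal Python sketch of an uncentered correlation coefficient (cosine similarity) between two time series of equal length. Equation (1) is not reproduced in this response, so this is the textbook form rather than the manuscript's exact expression; the function name `uncentered_correlation` is hypothetical.

```python
import math

def uncentered_correlation(x, y):
    """Uncentered correlation (cosine similarity) of two equal-length series.

    x[t] and y[t] are the values at frame t, for t = 0 .. T-1.
    """
    if len(x) != len(y):
        raise ValueError("series must have the same number of frames")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("correlation is undefined for an all-zero series")
    return dot / (norm_x * norm_y)
```

The value lies between -1 and 1, and rescaling either series by a positive constant leaves it unchanged, so the metric compares the shape of two motion patterns rather than their amplitude.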

      Finally, to further streamline the paper, we have removed the velocity correlation analysis and the corresponding parts of Figure 3.

      (14) The authors should make clear in all figures what is experiment and what is simulation.

      We have now clarified the nature of each graph in the figure captions.

      (15) The caption of Figure 3C could be simplified.

      We have simplified all figure captions.

      (16) I found Figure 3A hard to understand.

      We concluded that Figure 3A was confusing and did not add essential information to the manuscript. We have removed it entirely.

      Reviewer #3 (Recommendations For The Authors):

      In conclusion, I think that the manuscript would gain a lot if some more precise and more quantitative interpretation of the results were given. This might require a collaboration with theorists.

      We have integrated a novel theoretical framework into the revised manuscript (new Figures 4 and 5; manuscript lines 300ff), as described above.

      This new section introduces a data-driven, stochastic dynamical model that simulates the myofibril as a chain of serially coupled sarcomeres. Each sarcomere's motion is governed by an underdamped Langevin equation, a formulation that inherently accounts for stochasticity. Crucially, our model incorporates a non-monotonic force-velocity relationship inferred directly from our experimental data, rather than relying on predefined static variability between sarcomeres, a key distinction from previous theoretical work.

      This integrated model successfully and quantitatively reproduces all major experimental phenomena described in the paper, including high-frequency oscillations and stochastic "popping" events. It demonstrates that these complex behaviors emerge naturally as dynamic instabilities from the coupled system. This addition elevates the manuscript from a descriptive study to one that provides a predictive, mechanism-driven framework for understanding sarcomere dynamics.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This is a theoretical analysis that gives compelling evidence that length control of bundles of actin filaments undergoing assembly and disassembly emerges even in the absence of a length control mechanism at the individual filament level. Furthermore, the length distribution should exhibit a variance that grows quadratically with the average bundle length. The experimental data are compatible with these fundamental theoretical findings, but further investigations are necessary to make the work conclusive concerning the validity of the inferences for filamentous actin structures in cells.

      We think this is an excellent assessment of the article. We suggest adding a sentence after the first one: “The distribution of bundle lengths is not Gaussian but Gumbel, since the bundle length is the length of the longest filament in the bundle.”

      Public Reviews:

      Reviewer #1 (Public Review):

      Actin filaments and their kinetics have been the subject of extensive research, with several models for filament length control already existing in the literature. The work by Rosario et al. focuses instead on bundle length dynamics and how their fluctuations can inform us of the underlying kinetics. Surprisingly, the authors show that irrespective of the details, typical "balance point" models for filament kinetics give the wrong scaling of bundle length variance with mean length compared to experiments. Instead, the authors show that if one considers a bundle made of several individual filaments, length control for the bundle naturally emerges even in the absence of such a mechanism at the individual filament level. Furthermore, the authors show that the fluctuations of the bundle length display the same scaling with respect to the average as experimental measurements from different systems. This work constitutes a simple yet nuanced and powerful theoretical result that challenges our current understanding of actin filament kinetics and helps relate accessible experimental measurements such as actin bundle length fluctuations to their underlying kinetics. Finally, I found the manuscript to be very well written, with a particularly clear structure and development which made it very accessible.

      We are grateful to Reviewer #1 for this very favorable assessment.

      Reviewer #2 (Public Review):

      Summary:

      The authors present a theoretical study of the length dynamics of bundles of actin filaments. They first show a "balance point model" in which the bundle is described as an effective polymer. The corresponding assembly and disassembly rates can depend on bundle length. This model generates a steady-state bundle-length distribution with a variance that is proportional to the average bundle length. Numerical simulations confirm this analytic result. The authors then present an analysis of previously published length distributions of actin bundles in various contexts and argue that these distributions have variances that depend quadratically with the average length. They then consider a bundle of N-independent filaments that each grow in an unregulated way. Defining the bundle length to be that of the longest filament, the resulting length distribution has a variance that scales quadratically with the average bundle length.

      Strengths:

      The manuscript is very well written, and the computations are nicely presented. The work gives fundamental insights into the length distribution of filamentous actin structures. The universal dependence of the variance on the mean length is of particular interest. It will be interesting to see in the future, how many universality classes there are, and which features of a growth process determine to which class it belongs.

      Weaknesses:

      (1) You present the data in Fig. 3 as arguments against the balance point model. Although I agree that the data is compatible with your description of a bundle of filaments, I think that the range of mean lengths you can explore is too limited to conclusively argue against the balance point model. In most cases, your data extend over half an order of magnitude only. Could you provide a measure to quantify how much your model of independent filaments fits better than the balance point model?

      Indeed, we agree that the experimental data we present, each on their own, provide inconclusive evidence of the scaling predicted by our model. However, in aggregate, as presented in Fig. 3E, the data make for compelling evidence of scaling of the variance with the average length squared, as quantified by the power-law fit. Also, we think that Fig. 3E argues strongly against the Balance Point Model, because the data do not conform with simple linear scaling (indicated by the dashed line in Fig. 3E). Regardless, we agree with the referee that better data is needed to make a more convincing case, and we see this paper as a call to arms to collect such data in the future. The published data we used (other than our own data from experiments on yeast actin cables) is from experiments that were not designed with this question in mind, i.e., how do length fluctuations scale with the mean?

      (2) Concerning your bundled-filament model, why do you consider the polymerizing ends to be all aligned? Similarly to the opposite end, fluctuations should be present. Furthermore, it is not clear to me, where the presence of crosslinking proteins enters your description. Finally, linked to my first remark on this model, why is the longest filament determining the length of the bundle in all the biological examples you cite? I am thinking in particular about the actin cables in yeast.

      In the case of the yeast actin cables (which grow from the bud neck into the mother cell), we know that the formins that polymerize the actin filaments are spatially aligned at the bud neck. In the cases of stereocilia and microvilli, again the polymerizing ends of the actin filaments are well-aligned at the growing tips of these bundled actin structures, as indicated by classic EM studies from Lew Tilney and others. The alignment of polymerizing actin filament ends is more difficult to assess at the leading edge of lamellipodia, because of the undulating shape of the polymerization (membrane) surface. In fact, this could be the reason why data from the lamellipodia experiments deviate from the line in Fig. 3E, in contrast to the data from the other three structures (this is discussed in some detail in the Supplement). Regarding the actin crosslinkers, the only role they play in our model is keeping the filaments connected in the bundle. As far as the question of why the longest filament in the actin cable is the one that specifies the length of the cable, this is addressed in more detail in our McInally et al., 2024 (PNAS) paper, where we measured cable length by segmenting the fluorescence signal of the cable. Therefore, the filaments in the bundle that extend the furthest define the reported length. Also, given the function of the cables for transporting vesicles, the furthest reach of the filaments in the bundle defines the area from which the vesicles are collected.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      An important result of the model proposed by the authors is that the relationship between bundle mean length and variance should also inform the number of filaments in the bundle (Equation 13). In the SI the authors thus predict from fitting experimental results that bundles should be made of around 173 filaments, which is larger than most values proposed in the literature (and quoted in this work), except for stereocilia. Can the authors comment on this?

      This is an interesting point that we have been thinking about. Indeed, the model does relate the number of filaments to the variance of the length, but this dependence is logarithmic and therefore insensitive to changes in the number of filaments. Consequently, the number 173 comes with very large error bars and should be thought of more like a few hundred filaments in terms of the precision with which we can extract this number from data. We make this point more clearly in the revised SI, where we now say that based on the data the best we can do is say that the number of filaments is between 80 and 400.

      Along the same lines, in their derivation of Equations 12 and 13 (a key result of the manuscript) the authors make some approximations that are only valid for large N (number of filaments in the bundle). Is this approximation valid for actin cables or filopodia, estimated to comprise only around 10 filaments?

      Indeed, even for N=10 filaments the approximate formulas have errors that are well below what can be measured. We consider the details of the approximation in deriving Equations 12 and 13 from the exact distribution (Equation 11) in the Supplemental section “Distribution of bundle lengths when individual filament lengths are exponentially distributed”. For example, the exact result involves the harmonic number, which for N=10 is 2.93, while the approximate formula ln(N) + gamma we use yields 2.88, a fractional error that is < 2%.
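
The comparison between the harmonic number and ln(N) + gamma can be checked numerically. The following is a minimal Python sketch; the function names are illustrative and do not come from the manuscript, and gamma is taken to be the Euler-Mascheroni constant.

```python
import math

# Euler-Mascheroni constant (the "gamma" in ln(N) + gamma).
EULER_GAMMA = 0.5772156649015329

def harmonic_number(n):
    """Exact harmonic number H_n = 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / k for k in range(1, n + 1))

def harmonic_approx(n):
    """Large-n approximation H_n ~ ln(n) + gamma."""
    return math.log(n) + EULER_GAMMA

def fractional_error(n):
    """Relative difference between the exact value and the approximation."""
    exact = harmonic_number(n)
    return (exact - harmonic_approx(n)) / exact
```

For N = 10, `fractional_error(10)` comes out below 2%, and the error shrinks further as N grows.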

      A key assumption of the model is that the bundle length corresponds to the maximum individual filament length inside the bundle. Couldn't bundles comprise several filaments one after another, head-to-tail? What do the authors expect then?

      Excellent point. Indeed, this is precisely the geometry of the yeast actin cable. In our previously published McInally et al., 2024 (PNAS) paper we worked out the math in that case and found that the main result about the variance holds. In this paper we presented a simpler model that retains the same features as the one presented in the PNAS paper to better accentuate the origins of the scaling of the variance with the mean length, which is simply the result of bundling and identifying the length of the bundle with the length of the longest filament (or, more precisely, furthest extending filament) in the bundle.

      The model also allows us to relate the bundle length fluctuations and average to the individual filament characteristic length (Equations 12 and 13 again). Can the authors comment on the values of 〈l〉 they would obtain for experimental data?

      It is hard to give a precise number, as we would also need to know the number of filaments in the bundle, and for that we would need better electron microscopy data (which has proven difficult for the field to obtain). Still, with typical numbers in the 10s to 100s, the expected average filament lengths are smaller than the average bundle length by a factor of roughly ln(10) to ln(100), or 2-5 times.

      I find the Methods section a bit underwhelming. In particular, can the authors give more details on their treatment of experimental data? Bootstrapping sampling is mentioned but there is no information on the size of the original data sets, which could affect the validity of such a method.

      Thanks for the criticism. We have added details regarding the sizes of the data sets used in the analysis in the Methods section.

      Along the same lines, is the graph in Figure 1E the result of a simulation like the ones the authors used to obtain their result or is it just a schematic? If the first, I would suggest replacing it with an actual simulated length trajectory. In general, I think this work would benefit from more detailed explanations and examples of how stochastic trajectories were computed and analysed.

      This is also a good point. We still prefer to keep the schematic in this figure since our goal here is to define the question before we commence with computations and data analysis. The stochastic trajectories were generated using the standard Gillespie algorithm and the statistics of length were gathered once the dynamics of length reached steady state. We explain this in the Methods section and give more details in the Supplement.
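
To make the procedure concrete, here is a minimal Python sketch of a Gillespie simulation of a single filament length, followed by a time-weighted average taken after an initial transient. It is not the authors' code: the constant rates `k_on` and `k_off`, the function names, and the choice of a simple birth-death process are all assumptions made for illustration.

```python
import random

def gillespie_length_trajectory(k_on, k_off, t_end, seed=0):
    """Simulate filament length (in subunits) as a birth-death process.

    Subunits are added at rate k_on (must be > 0) and removed at rate
    k_off whenever the length is positive. Returns event times and lengths.
    """
    rng = random.Random(seed)
    t, length = 0.0, 0
    times, lengths = [0.0], [0]
    while True:
        # Removal is only possible when at least one subunit is present.
        rate_off = k_off if length > 0 else 0.0
        total = k_on + rate_off
        # Waiting time to the next event is exponential with rate `total`.
        t += rng.expovariate(total)
        if t >= t_end:
            break
        # Pick which event fires, with probability proportional to its rate.
        if rng.random() * total < k_on:
            length += 1
        else:
            length -= 1
        times.append(t)
        lengths.append(length)
    return times, lengths

def time_averaged_mean(times, lengths, t_start, t_end):
    """Time-weighted mean length over [t_start, t_end], skipping the transient."""
    total = 0.0
    for i, length in enumerate(lengths):
        seg_start = max(times[i], t_start)
        next_time = times[i + 1] if i + 1 < len(times) else t_end
        seg_end = min(next_time, t_end)
        if seg_end > seg_start:
            total += length * (seg_end - seg_start)
    return total / (t_end - t_start)
```

Because the waiting times between events are unequal, steady-state statistics are weighted by the time spent at each length rather than averaged over events. With `k_on < k_off` this toy process settles into a steady state with a geometric (discrete exponential) length distribution.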

      Finally, while I find the writing in this manuscript to be excellent, I think the figures require some work. The schematics and drawings, which are very low resolution, the font size for the axes, and the choice of colours all make it more cumbersome than necessary to understand what is being shown.

      Thank you for pointing this out. We have made better versions of the figures.

      Reviewer #2 (Recommendations For The Authors):

      "In this case, the length distribution of the bundle derived from extreme value statistics, leads to a peaked non-Gaussian distribution, even when filaments within the bundle are unregulated and exponentially distributed."

      You mention "extreme value statistics" only once, in the introduction. I would suggest that you come back to this notion and explain how your results connect to extreme value statistics or delete it from the manuscript.

      Good point. We added a sentence to draw the reader’s attention to the fact that our result is an extreme value distribution (Equation 11 is the Gumbel distribution) used in statistics of extreme events.

      This is a follow-up of one of my major points of criticism: Fig. 3A: why do you fit (if I understand correctly) the blue and orange data points with the same power law? For (A-D) the data extend over less than an order of magnitude. Why is a power law fit appropriate? Can you quantify how much better your fits are compared to a linear dependence? Bundling the data of all structures yields a common master curve (with the exception of filopodia). This is quite remarkable, I think, and merits some more discussion than currently given in the manuscript.

      Good point. We should have been more clear. In Figures 3A-D we show individual data sets for the different bundle structures and compare the prediction of the Balance Point Model (dashed line) to the data. We also do a fit to a power law to show that the data is consistent with the Bundle model. This comparison is made much more clear in Figure 3E.

      Fig 1B, right does not show the addition and removal of subunits - Fig. 1C does. Panel C is not explained in the caption. The second appearance of (D) in the caption could be omitted.

      Good points. We fixed these issues in the new version of the Figure and caption.

      "For individual actin filaments (...)" I found this and the following paragraph slightly confusing at first reading: as long as you write about single filaments, do you have annealing in mind, where two filaments merge and form a longer filament? In case you consider a bundle, do you consider a filament that is cross-linked to other filaments and thereby added to the bundle? Similarly for removing filament segments (severing or unbundling)? Probably, my confusion is a consequence of you seemingly using filament to describe bundles as well as single actin filaments.

      Sorry for the confusion. We tried to be consistent throughout the text and use “filament” to denote a single actin filament and “bundle” a collection of parallel filaments crosslinked together. The assembly and disassembly dynamics of the filaments in the bundle are only relevant to the extent that they affect the length distribution of individual filaments. The main result is largely independent of that (as demonstrated in the Supplement by considering different single filament distributions) once we decide that the length of the bundle is given by the length of the longest filament in the bundle. This is the point of extreme value statistics where a universal, Gumbel distribution for the length of the longest filament in the bundle arises independent of the length distribution of a single filament (this result is akin to the Central Limit Theorem which predicts a Gaussian distribution of the mean of a large number of random numbers irrespective of the distribution they’re drawn from.)
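
The scaling argument can be illustrated with a short simulation. The sketch below, in Python, draws N independent exponentially distributed filament lengths, takes the longest as the bundle length, and reports the ratio of the variance to the squared mean. The function names and parameter values are hypothetical. For exponential filaments the standard result for the maximum of N independent draws is a mean of ⟨l⟩ H_N and a variance of ⟨l⟩² times the sum of 1/k² for k = 1..N, so the ratio depends on N but not on ⟨l⟩.

```python
import random
import statistics

def sample_bundle_lengths(n_filaments, mean_filament_length, n_bundles, seed=0):
    """Bundle length = longest of n_filaments exponential filament lengths."""
    rng = random.Random(seed)
    rate = 1.0 / mean_filament_length
    return [
        max(rng.expovariate(rate) for _ in range(n_filaments))
        for _ in range(n_bundles)
    ]

def variance_over_mean_squared(lengths):
    """Ratio Var(L) / <L>^2, constant when the variance scales as the mean squared."""
    mean = statistics.fmean(lengths)
    return statistics.pvariance(lengths, mu=mean) / mean ** 2
```

Calling `variance_over_mean_squared(sample_bundle_lengths(10, l, 20000))` for several values of `l` should return nearly the same number each time, which is the quadratic scaling of the variance with the mean bundle length.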

      "In Figure 4D, the variance of the filopodia lengths" Probably Figure 3D?

      Yes. Thank you. We fixed this.

      "The filopodia data seemingly has the same slope (...) but with variances higher than what is measured for other actin structures." This finding does not contradict the main statement of a nonlinear scaling of the variance with the mean length, right? I therefore find this discussion slightly peripheral and also confusing. Also, what is the reason to assume that EM might get the actual length of filopodia wrong by a factor of 2 to 3?

      The issue with filopodia is that the way the lengths are measured is by the extent to which the structure as a whole protrudes from the cell. This leaves unresolved the lengths of the actual filaments in the structure, and we suspect that they are longer as they extend into the cytoplasm. This would contribute to the shift off the common curve in the direction that is observed (larger variance associated with smaller average length). We have no way to justify that this would lead to a factor of 2-3 other than that it would be enough to collapse the data onto the common curve. Clearly more careful experiments are needed to resolve the issue. We added some clarifying remarks to this effect into the discussion.

      Eq.(14) What is Z?

      Thanks for pointing out this omission. Z = L/<L> and we have added that in the formula where Z appears.

      LIST OF CHANGES

      Here we summarize the changes we made to the manuscript and the Supplementary material in response to the reviewers.

      (1) Fixed typo: Figure 1 legend had two parts labelled D which has been changed into a D and a C. The explanation of panel C has been added.

      (2) Fixed typo: The incorrect call to Figure 4D is now corrected to Figure 3D.

      (3) In the Supplementary material we made more precise our estimate of the number of filaments. The wording “From this we can estimate the number of filaments. We find, with a confidence interval of…” we have changed to “From this we can estimate the number of filaments to be between 80 and 400 which compares favourably to the typical number of filaments in the different actin structures that were analyzed.”

      (4) In the Methods section we added the number of measured filament lengths in the different data sets used in the analysis.

      (5) We made better (higher resolution) versions of all the Figures.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The factors that create and maintain diversity in host-associated microbiomes remain poorly understood. A better understanding of these factors will help in the efforts to leverage the adaptive potential of the microbiome to help solve pressing problems in health and agriculture.

      Experimental evolution provides a promising path forward as we can track the causes and consequences in the emergence of novel variants, but experimental evolution remains underutilized in host-microbiome interactions. Here, Gracia-Alvira et al. utilize a long-term experimental evolution study in Drosophila simulans under hot and cold temperature regimes to identify strain-level variation in an important fly bacterium, Lactiplantibacillus plantarum. They identify three strains of L. plantarum, which are most prevalent in their respective three temperature regimes, suggesting that these are locally adapted bacteria. Then, using a combination of genomics, in vitro, and in vivo experiments, Gracia-Alvira et al. attempt to understand the factors that led to the differentiation of the hot and cold L. plantarum and their impacts on the fly host.

      Strengths:

      This is an excellent use of experimental evolution to track the emergence of novelty in the microbiome. The genomic analyses are all solid and appropriate for the data sets. It is especially striking that the comparisons with the other, independent experimental evolution studies in different labs (and across continents between Portugal and South Africa) show a consistent response to temperature. Many have disregarded the microbiome as it is something that is too sensitive to seemingly innocuous variables (particularly in the fly microbiome), such that we cannot find generalities. However, this finding highlights the potential for experimental evolution to uncover these dynamics. The question of how strains emerge and are maintained is timely and is one of the key open questions in host-microbiome evolution currently.

      Weaknesses:

      (1) The framing in the title and throughout the discussion about "subspecies competition" does not match the data that was collected. The subspecies competition requires actually tracking the competitive outcomes between the hot, cold, and unevolved L. plantarum. In the in vivo work, I can see that mixes of the strains were made, but they did not track whether the cold strain outcompeted the hot strain in vivo under cold conditions, for example.

      We thank the reviewer for the honest concern and take this opportunity to defend our claim of "subspecies competition" used across the manuscript. As the reviewer states, subspecies competition requires tracking the competitive outcomes between the three clades, and this is what we did by sampling and sequencing across ten years of experimental evolution (Figures 4 and S3). For this reason, we point out that the subspecies competition assessment comes from the direct observation of changes in relative abundance across the time series, and not from the follow-up experiments in vivo or in vitro.

      While Figure 4 is suggestive that there is ongoing competition in the hot temperature regime, this is not necessarily shown in the cold, which is dominated by the C clade. It could also be that the bacteria cannot survive in the flies at the different temperatures. The growth curve assays hint that the bacteria can grow, but the plate reader couldn't actually maintain the 18 °C temperature (line 455). So all of this evidence is very indirect and insufficient to say that strain competition is driving these patterns.

      We thank the reviewer for the alternative hypothesis that could explain the observed subspecies dynamic. We rule out that dominance of clade C in the cold occurs because the other two clades cannot grow in this regime based on three pieces of evidence:

      (1) In the time series, clades H and U decrease, but never disappear (Figures 4 and S3), even showing some peaks of abundance in specific replicate populations (Figure S3).

      (2) We isolated individuals belonging to clade H in the cold-evolved populations, as shown in Figure 2. This is direct evidence that clade H persists in the cold-evolved populations, although in low abundance.

      (3) We did grow the three taxa in fly food petri dishes incubated at both temperature regimes, observing growth in all cases.

      We will include the food growth experiment in the revised manuscript as further supporting evidence for growth in both regimes.

      (2) The in vivo results are interesting in that there appears to be a fitness cost of clade C, but the explanation is underdeveloped. I say under-developed because in Figure 4, the cold L. plantarum remains much higher throughout adaptation to the hot temperature regime than the hot L. plantarum in the cold regime. The hot L. plantarum is low abundance throughout the cold regime. I felt like this observation was not explained, but it seems relevant to understanding the strain dynamics.

      We acknowledge that a strong fitness cost of clade C is observed in axenic D. melanogaster. In the native host, D. simulans, with reduced microbiome, we observed delayed development that could even be an advantage depending on the situation, as pointed out by reviewer 3 in the recommendations.

      Even if we assume that flies colonized with clade C are less fit in the experimental evolution, another caveat is whether the flies can actively select for the L. plantarum clade. Under this assumption, a clade that imposes a fitness cost on the fly (clade C) should be selected against over time because the flies colonized by this clade will have fewer offspring, or develop later than the rest. Alternatively, as the microbiome is shared among all the individuals in the population, the host might not be able to “purge” the pernicious clade, and L. plantarum dynamics might be controlled solely by the relative fitness between clades in the given experimental treatment. We will discuss this hypothesis in the revision as a way to explain the relationship between the abundance of each clade and the effect on the host.

      I will also note that this is not the first time that L. plantarum or other Lactobacillus have been shown to exert fitness costs to Drosophila. Gould, PNAS, 2018, shows that both Lactobacillus plantarum and Lactobacillus brevis in mono-association have lower fitness (measured through Leslie matrix projections using lifespan and fecundity) than axenic flies. Many studies of wild Drosophila fail to find Lactobacillus, or it is low abundance (e.g., Chandler, PLoS Genetics, 2014; Wang, Environmental Microbiology Reports, 2018; Henry & Ayroles, Molecular Ecology, 2022; Gale, AEM, 2025). This might help provide useful context for the in vivo results.

      We thank the reviewer for the references. These observations will be compared to our phenotypic results and discussed in the revised version of the manuscript.

      (3) The data in Figure 4 are compelling to focus on the L. plantarum variants. However, I can see from the methods that the competitive mapping included only other strains of Wolbachia.

      We appreciate the thorough reading of the methods by the reviewer. The competitive mapping comprised two steps: first we discarded the reads that mapped to Drosophila, Wolbachia and additional potential contaminants from sequencing facilities (human, dog...). This step leaves the reads originating from the whole external microbiome of the flies, including L. plantarum. The second competitive mapping step recruits the reads that map to any clade of L. plantarum.

      It is not clear how other members of the microbiome changed in response to the temperature regimes. As I note in point #2, given that Lactobacillus is often rare, it is not clear what the rest of the microbiome looks like over the course of adaptation. Indeed, it seems like Mazzucco & Schlotterer, PRSB, 2021 did a broader analysis of the microbiome and found that Acetobacter is by far the most common bacterium (I think this data is also part of the data shown here?). Expanding on why or why not in this context is important and will improve this study, particularly if the focus is on connecting these evolutionary dynamics to ecological competition to explain the emergence of strain diversity.

      We acknowledge that the rest of the Drosophila microbiome is not addressed in this study, as we wanted to focus the storyline on the intraspecific dynamics found in L. plantarum. We consider that a complete characterization of the whole Drosophila microbiome would unnecessarily lengthen the paper and thus we treat it as a constant biotic factor.

      We must point out that our dataset is not the one reported by Mazzucco & Schlötterer, which was done in D. melanogaster, rather than D. simulans. Nevertheless, both experiments share the same infrastructure, temperature regimes and fly maintenance.

      We will include a list of taxa that were isolated from the populations and report L. plantarum prevalence and abundance across the experiment, in order to give readers context on the microbiome beyond L. plantarum.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Gracia-Alvira et al. investigated how environmental temperature affects competition among members of the microbiome, with a focus on intraspecific diversity, using the Drosophila model. Notably, the authors identified three clades of Lactiplantibacillus plantarum from a natural population of Drosophila simulans collected in Florida. They tracked the dynamics of these three bacterial clades under two temperature conditions over the course of more than ten years. Using comparative genomics and phylogeny, they showed that these three bacterial clades likely adapted to their host independently in a temperature-specific manner. Further, by combining in vitro culture and in vivo mono-association assays, they demonstrated the functional divergence of these three bacterial clades phenotypically, including their growth dynamics and effects on host fitness. Lastly, they performed pathway analysis and speculated on key genomic variance supporting such functional divergence.

      Strengths:

      The laboratory evolutionary experiment in response to cold or hot environmental temperature is impressive, given that it spans more than ten years. This collection of archived microbiome samples paired with the fly host data can be a valuable resource for the field.

      Weaknesses:

      The laboratory evolutionary experiment can be limited due to its artificial experimental setup. For example, wild flies rely on a more diverse set of food sources and are constantly exposed to new bacterial inoculations, whereas under laboratory conditions, flies live in a more restricted ecosystem. In addition, environmental temperatures differ among different locations, but they also involve seasonal changes within the same region. This manuscript can be strengthened with further discussions that elaborate on these limitations.

      As the reviewer has correctly noted, our experimental setting is not exempt from limitations. Lab-reared flies are fed a defined standard diet. Furthermore, although the system is not completely closed to bacterial migration, migration is limited because replicate populations are not allowed to mix during the maintenance of the flies. For this reason, we consider our laboratory setting a compromise between observing wild populations, which undergo all biotic and abiotic stresses but cannot be manipulated, and evolving the bacteria in the absence of the host, or in gnotobiotic hosts, in which biotic interactions are not fully considered. We will expand on this in the new version of the manuscript.

      Moreover, the extent of host effects involved in these experiments remains ambiguous, because it is unclear whether these Lactiplantibacillus plantarum mostly reside within fly guts or on Drosophila medium. The laboratory evolutionary experiment possibly favored better colonizers on Drosophila medium under either cold or hot temperatures, which subsequently can saturate fly guts. As fully dissociating these variables can be experimentally tedious, the authors may want to comment more on these aspects in the discussion. Or they may want to consider some measurements. For example, measuring the growth rate of these bacteria on Drosophila medium under different temperatures, in addition to the current MRS culture experiments, or measuring the portion of the Lactiplantibacillus on Drosophila medium versus these stably colonizing fly guts.

      The reviewer's point was briefly addressed in the Results chapter: "Phenotypic differences in liquid culture".

      Reviewer #3 (Public review):

      Summary:

      The study presents an analysis of 297 pangenomes derived from 20 populations of Drosophila simulans, at 19 time points for fast-reproducing individuals in a hot environment, or at 10 time points for slow-reproducing individuals in a cold environment, over a period of more than 10 years. The authors select a particular microbial component of the pangenomes and study the dynamics of Lactiplantibacillus plantarum strains in two environments. They discover that the revealed operational taxonomic units could be divided into three phylogenetic clades, which have their own genomic and genetic features, different adaptive capabilities that depend on the environment, and have a distinct impact on the fitness of the host.

      Strengths:

      The authors prove that bacterial microbiome components are sensitive to the environment and could rapidly (years) be fixed in eukaryotic populations. This study establishes a tractable model that potentially enables the study of variability of the physiological influence of distinct strains of an important commensal species, Lactiplantibacillus plantarum, on the Drosophila host. It is clearly shown that this single species consists of several phylogenetically and functionally diverse strains. The authors did not limit their interest to their own model, but rather they have integrated a comparative approach by analysing phylogenetic relationships among 92 described L. plantarum strains.

      Overall, the study is novel and delivers important discoveries of a longitudinal, well-replicated experiment, generating a substantial amount of genomic data. It highlights an important dimension of research that environmental selection operates at the subspecies level.

      Weaknesses:

      Even though the authors show only one particular example by conducting their longitudinal experiment, they honestly acknowledge failures important for interpretation of the biological significance of the results (gnotobiotic mono-association experiments were done with D. melanogaster, but not D. simulans) and therefore they state limitations of their conclusions (weaker effects in the non-axenic flies are due to the presence of other taxa or to higher-order interactions with other members of the microbiome). These interactions could significantly affect bacterial growth, metabolism, and physiological influence on the host.

      We agree with the reviewer that the use of gnotobiotic animals is a limitation, as by "tuning" the flies' microbiome we are modifying the interactions between members, which can potentially change the phenotypic outcome. Nevertheless, we use it as a complementary approach, rather than the only inference in our study.

      The authors exploit the results of their experiment to speculate about a wide range of evolutionary phenomena, like within-species competition, ecological adaptation and evolution of the host, fitness advantage of bacteria to the host, the benefits of parasitism or mutualism, the domestication of the microbiome, etc. At the end, they conclude that their study "highlights that even subspecies diversity plays a key role in adaptation to environmental temperature". However, the potential mechanisms of such adaptation are barely discussed, so that the focus of the study shifts from the temperature-induced changes in microbial population structures toward metabolism-related adaptations of clade representatives that enable them to diversify their carbon and nitrogen sources. The role of the temperature factor remains elusive.

      We acknowledge that our study does not fully resolve the mechanism by which a different clade ends up dominating each temperature regime. The MRS liquid experiment was an attempt to answer whether differences in optimal growth temperature could explain the temperature-specific abundance of the two clades. Our experiments showed, however, that this was not the case. Beyond this point, it is hard to disentangle the role of the temperature, as it could also act indirectly on the bacteria, for example, through the host or the food.

      A second observation in our time series was that a third clade, U, was unfit in both regimes despite starting the experiment in high abundance. For this reason we also studied what made this clade less fit. Based on our analyses, we propose that the decrease of clade U was driven by the shift to a laboratory diet, shared by all experimental populations.

      In addition to that, the paper has a clearly minimalistic experimental approach to address functional properties of the revealed L. plantarum strains, so that their own fitness, or their relationship with the Drosophila host, is characterised superficially. Therefore, the authors' discourse can be speculative rather than factual (especially when the authors use the expression "likely" to share their guesses in the "Results" section). Nevertheless, these minor drawbacks do not undermine the novelty of the discovered phenotypes and the importance of their further investigation.

      We consider the reviewer's concern and will tone down the phrasing when reporting our findings in the revised version of the manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Giordano et al. demonstrate that yeast cells expressing separated N- and C-terminal regions of Tfb3 are viable and grow well. Using this creative and powerful tool, the authors effectively uncouple CTD Ser5 phosphorylation at promoters and assess its impact on transcription. This strategy is complementary to previous approaches, such as Kin28 depletion or the use of CDK7 inhibitors. The results are largely consistent with earlier studies, reinforcing the importance of the Tfb3 linkage in mediating CTD Ser5 phosphorylation at promoters and subsequent transcription.

      Notably, the authors also observe effects attributable to the Tfb3 linker itself, beyond its role as a simple physical connection between the N- and C-terminal domains. These findings provide functional insight into the Tfb3 linker, which had previously been observed in structural studies but lacked clear functional relevance. Overall, I am very positive about this manuscript and offer a few minor comments below that may help to further strengthen the study.

      We appreciate the reviewer’s positive assessment of our work and suggestions for improvement.

      (1) Page 4

      PIC structures show the linker emerging from the N-terminal domain as a long alpha-helix running along the interface between the two ATPase subunits, followed by a turn and a short stretch of helix just N-terminal to a disordered region that connects to the C-terminal region (see schematic in Figure 1A).

      The linker helix was only observed in the poised PIC (Abril-Garrido et al., 2023), not in other fully-engaged PIC structures.

      Thanks for clarifying. We note that some structures of TFIIH alone also see the long helix. Accordingly, we modified this section to read:

      “In many TFIIH and PIC structures the linker is not visible, presumably due to flexibility. However, when it is seen (Abril-Garrido et al., 2023; Greber et al., 2019), the linker emerges from the N-terminal domain as a long alpha-helix running along the interface between the two ATPase subunits…”

      (2) Page 8

      Recent structures (reviewed in (Yu et al., 2023)) show that the Kinase Module would block interactions between the Core Module and other NER factors. Therefore, TFIIH either enters into the NER complex as the free Core Module, or the Kinase Module must dissociate soon after.

      To my knowledge, this is still controversial in the NER field. I note the potential function of the kinase module is likely attributed to the N-terminal region of Tfb3 through its binding to Rad3.

      We are not experts on NER, but in reviews of the field this appears to be a widely held assumption. A 2008 paper from the Egly lab (Coin et al., DOI 10.1016/j.molcel.2008.04.024) is usually cited, which shows that the interaction between XPD (metazoan Rad3) and XPA is likely incompatible with the XPD-MAT1 interaction. In addition to the Yu 2023 review, we now also cite a more recent publication that more extensively reviews the models for core TFIIH interactions (van Sluis et al., 2025). We looked at the multiple recently published structures of various TC-NER and GG-NER intermediate complexes, and none of them show the CAK module or even the Tfb3/Mat1 N-term, even though those proteins were typically included during assembly. We also consulted with our colleagues Johannes Walter and Lucas Farnung, who are studying various TC-NER intermediates biochemically and structurally. Although the CAK module is included in their assembly reactions, it is not visible in their cryoEM structures. They tell us that the presence of CAK would be compatible with early TC-NER intermediates, but is predicted to overlap with later interactions of XPD with the TC-NER factor STK19 (see Mevissen et al., Cell 2024). To be conservative, we modified the sentence to say “Recent structures … suggest” rather than “show”.

      Because the yeast strains used in Figure 6 retain the N-terminal region of Tfb3, the UV sensitivity assay presented here is unlikely to directly address the contribution of the kinase module to NER.

      We agree that our experiment only shows that the connection between Tfb3 N- and C-term domains is not necessary for NER. The individual domains might still be able to function independently. Accordingly, we changed the heading of that section from “Disconnected core TFIIH does not cause an NER defect” to “Split Tfb3 does not cause an NER defect.” This more closely matches the figure legend title.

      (3) Page 11

      Notably, release of the Tfb3 Linker contact also results in the long alpha-helix becoming disordered (Abril-Garrido et al., 2023), which could allow the kinase access to a far larger radius of area. This flexibility could help the kinase reach both proximal and distal repeats within the CTD, which can theoretically extend quite far from the RNApII body.

      Although the kinase module was resolved at low resolution in all PIC-Mediator structures, these structural studies consistently reveal the same overall positioning of the kinase module on Mediator, indicating that its localization is constrained rather than variable. This observation suggests that the linker region may help position the kinase module at this specific site, likely through direct interactions with the PIC or Mediator. This idea is further supported by numerous cross-links between the linker region and Mediator (Robinson et al., 2016).

      That is true. But please note that this sentence was meant to describe movement of the kinase module AFTER release from Mediator (see previous sentence). Re-reading the passage, we realized the confusion arises because we propose multiple possible pathways in that paragraph. In the first half, we suggest the capture of the kinase module by Mediator might trigger the conformational changes in the linker. In the second half (where it says “Alternatively….”) we suggest the Mediator-CAK interaction could instead come first, and the release of this contact could free the CAK module to move around. We have modified the paragraph to make it clear these are two distinct models.

      Reviewer #2 (Public review):

      Summary:

      This work advances our understanding of how TFIIH coordinates DNA melting and CTD phosphorylation during transcription initiation. The finding that untethered kinase activity becomes "unfocused," phosphorylating the CTD at Ser5 throughout the coding sequence rather than being promoter-restricted, suggests that the TFIIH Core-Kinase linkage not only targets the kinase to promoters but also constrains its activity in a spatial and temporal manner.

      Strengths:

      The experiments presented are straightforward, and the models for coupling initiation and CTD phosphorylation and for the evolution of these linked processes are interesting and novel. The results have important implications for the regulation of initiation and CTD phosphorylation.

      Weaknesses:

      Additional data that should be easily obtainable and analysis of existing data would enable an additional test of the models presented and extract additional mechanistic insights.

      We thank the reviewer for the positive assessment and address their specific suggestions below.

      Reviewer #3 (Public review):

      Summary:

      Eukaryotic gene transcription requires a large assemblage of protein complexes that govern the molecular events required for RNA Polymerase II to produce mRNAs. One of these complexes, TFIIH, comprises two modules, one of which promotes DNA unwinding at promoters, while the other contains a kinase (Kin28 in yeast) that phosphorylates the repeated motif at the C-terminal domain (CTD) of the largest subunit of Pol II. Kin28 phosphorylation of Ser5 in the YSPTSPS motif of the CTD is normally highly localized at promoter regions, and marks the beginning of a cycle of phosphorylation events and accompanying protein association with the CTD during the transition from initiation to elongation.

      The two modules of TFIIH are linked by Tfb3. Tfb3 consists of two globular regions, an N-terminal domain that contacts the Core module of TFIIH and a C-terminal domain that contacts the kinase module, connected by a linker. In this paper, Giordano et al. test the role of Tfb3 as a connector between the two modules of TFIIH in yeast. They show that while no or very slow growth occurs if only the C-terminal or N-terminal region of Tfb3 is present, near normal growth is observed when the two unlinked regions are expressed. Consistent with this result, the separate domains are shown to interact with the two distinct TFIIH modules. ChIP experiments show that the Core module of TFIIH maintains its localization at gene promoters when the Tfb3 domains are separated, while localization of the kinase module and of Ser5 phosphorylation on the CTD of Pol II is disrupted. Finally, the authors examine the effect of separating the Tfb3 domains on another function of TFIIH, namely nucleotide excision repair, and find little or no effect when only the N-terminal region of Tfb3 or the two unlinked domains are present.

      Strengths:

      Experiments involving expression of Tfb3 domains in yeast are well-controlled, and the data regarding viability, interaction of the separate Tfb3 domains with TFIIH modules, genome-wide localization of the TFIIH modules and of phosphorylated Ser5 CTDs, and of effects on NER, are convincing. The experiments are consistent with current models of TFIIH structure and function and support a model in which Tfb3 tethers the kinase module of TFIIH close to initiation sites to prevent its promiscuous action on elongating Pol II.

      We appreciate that the reviewer finds that our main conclusions are convincing.

      Weaknesses:

      (1) The work is limited in scope and does not provide any major insights into the mechanism of transcription. One indication of this limitation is that in the Discussion, published structural and functional results on transcription are used to support the interpretations of the results here more than current results inform previous models or findings.

      The story we present here is pretty simple, so in that sense we agree it is limited. However, we believe the findings do have mechanistic implications. That the Tfb3/Mat1 tether not only targets kinase activity to the 5’ end, but also somehow limits it from acting downstream seems significant. As for the Discussion, in our papers we always attempt to tie in our results and models with as much of the relevant published literature as possible. We believe this is more interesting, useful, and convincing than simply summarizing the Results section.

      (2) The first described experiment, which purports to show that three kinases cannot function in place of Kin28 when tethered (by fusion) to Tfb3, is missing the crucial control of showing that Kin28 can support viability in the same context. This result also does not connect with the rest of the manuscript.

      Our original motivation for the experiment in Figure 1 was to develop a system where we could plug different kinases into the CTD-proximal position. This didn’t work, so it is true that this negative result is somewhat unconnected to the rest of the paper. We chose to include it because it produced the unexpected observation that the Tfb3 C-term domain was not essential for viability, contradicting an earlier report. As for the suggested control of fusing Kin28, please see our reply to the editor’s comments below.

      (3) Finally, the authors present the interesting and reasonable speculation that the TFIIH complex and connecting Tfb3 found in mammals and yeast may have evolved from an earlier state in which the two TFIIH subdomains were present as unconnected, distinct enzymes. This idea is supported by a single example from the literature (T. brucei). A more thorough evolutionary analysis could have tested this idea more rigorously.

      Please see our full reply to Point 5 in the editor’s comments. In short, T. brucei was the only primitive eukaryote for which we found an actual biochemical analysis of TFIIH. However, we now cite some papers reporting protein sequence comparisons for organisms not having a consensus CTD, which lend further support to our idea that fusion of a CDK to TFIIH co-evolved with the CTD very early in eukaryotic evolution.

      Recommendations for the authors:

      Reviewing Editor Comments:

      Suggestions for Improvement:

      (1) Analyze existing Pol II ChIP-seq data to determine whether reduced TSS-proximal vs. gene-body occupancy observed with the split Tfb3 alleles reflects initiation defects, and whether different gene classes (high vs. low expression, stress-induced genes) show differential effects of splitting Tfb3.

      Thanks for the suggestion. The new analysis is included as Supplemental Figure S6. Several factors indicate an initiation defect rather than an elongation defect (either elongation processivity or elongation rate). First, the shape of the RNApII occupancy trace is flat in all mutants, arguing against a processivity defect, which would have led to a downward slope due to RNApII progressively dropping off from the gene. Because this effect is best seen on long genes (more than 2 kb), we generated metagene profiles on long, well-expressed genes only, which led to the same conclusion (see Sup Fig 6A). Second, the mutants lead to decreased RNApII occupancy, arguing against a strong decrease in elongation rate, which, if anything, would have led to an increase in RNApII during early transcription. While we cannot completely exclude the possibility of a mild decrease in elongation rate, such an effect doesn’t fit the patterns we observe. The overall decrease of RNApII occupancy is rather a strong indication of a decrease in early steps (PIC assembly or initiation).

      As requested, we looked at potential differences between gene classes in two ways. First, we generated RNApII metagenes on RNApII occupancy quintiles (Q1-Q5). As shown in Sup Fig 6B, RNApII occupancy is similarly decreased in all mutants for all quintiles, demonstrating that the effect of Tfb3 splitting on transcription is not linked to expression level. Second, we generated RNApII occupancy metagenes for TFIID-regulated genes and coactivator-redundant (CR) genes. This classification from the Hahn lab (doi:10.7554/eLife.50109) is very similar to the one developed by the Pugh lab (doi:10.1016/s1097-2765(04)00087-5). TFIID-regulated genes are enriched for housekeeping genes and are typically devoid of a TATA box, while the CR genes tend to be highly regulated and to contain a TATA box. As shown in Sup Fig 6C, the effect of the Tfb3 split mutants is similar on both gene classes.

      (2) Determine whether Kin28 abundance in whole cell extracts is reduced by splitting Tfb3, as a factor in reducing its occupancies at gene promoters.

      We actually did test for Kin28 and Ccl1 levels in the extracts when we did the IP experiment shown in Fig 3. We ran the extracts next to the precipitated factors. Unfortunately, as can be seen on the bottom blot, our antibodies were not strong enough to detect either Kin28 or Ccl1 in extracts, even with WT Tfb3. Although we don’t include this inconclusive result in the final paper, we show it in Author response image 1 (note that extracts are labeled as “IgG input”).

      Author response image 1.

      (3) Include the key positive control construct of replacing the C-term of Tfb3 with Kin28 in the experiments of Figure 1.

      We elected not to do this experiment for several reasons. As reviewer 3 points out, this kinase fusion experiment turned out to be somewhat disconnected from the rest of the paper. Even though it didn’t work, we included it in the paper because the results led us to the realization that the Tfb3 C-term was actually not fully essential for viability as reported, which in turn led us to the idea of splitting Tfb3. Structural studies (https://doi.org/10.1126/sciadv.abd4420, https://doi.org/10.1073/pnas.2009627117, https://doi.org/10.7554/eLife.44771) show that, in addition to providing linkage to the core module, the C-term of Tfb3 induces a conformational change in Kin28/Cdk7 necessary for full kinase activity (which is likely why the strains without C-term are just barely viable). If we were to pursue why the fusions didn’t work, we could tether Kin28 directly to the Tfb3 linker (and may try this in the future), but we would then also need to express the C-term separately for its activating function. Even then, this would be an imperfect control for the fusion experiments in Figure 1. Because we were trying to best mimic Kin28 being tethered via the accessory subunit Tfb3/Mat1, in the Figure 1 experiment we did not directly attach the kinases to Tfb3. For Ctk1/Cdk12, we fused the Tfb3 linker to the Ctk3 accessory subunit (analogous to Tfb3), and for Bur1/Cdk9, we fused to the cyclin subunit Bur2 (there is no known third subunit in this complex). The one exception was Mpk1, which has no partner subunits and is not a CDK. There are many reasons why this high-risk protein fusion experiment may not have worked, but we feel it’s not that useful to pursue it in this paper.

      (4) Provide direct evidence for the claimed dominant negative effect of the N-term-Linker construct by extending results in Figure 2C to compare growth of WT TFB3 cells expressing this construct vs. vector alone.

      We thank the reviewers for this suggestion. We tested this by transforming high copy plasmids expressing the different Tfb3 truncations into cells expressing the WT Tfb3. We did not see a clear dominant negative effect (some colonies were small, but many looked normal). Accordingly, in the absence of a reproducible effect, we removed this claim from the paper. In Fig 2C, the WT plasmid was transformed into cells already expressing the truncation on a high copy plasmid (the opposite order of our new experiment). It’s possible that phenotypes vary depending on which plasmid was there first (2 micron plasmids have variable copy number and can compete with each other for replication and passage during cell division). In any case, in the face of ambiguous results we no longer claim a dominant negative effect of the N-term-Linker protein. This was a minor side-point of the paper and does not affect any of our other conclusions.

      (5) Expand the evolutionary analysis to provide evidence beyond the case of T. brucei that the Tfb3-mediated connection between core and kinase modules is an evolutionary addition to the ancestral state.

      We note that the two papers we cited for the lack of a CAK module in T. brucei reached that conclusion based on purification of its TFIIH complex. We were unable to find similar biochemical studies in other primitive eukaryotes. Another way to expand the evolutionary comparison would be through sequence homology searches. We attempted to do this using various tools available at NCBI and EMBL. These show that Tfb3/Mat1 is found extensively throughout eukaryotes. Unfortunately, because the NTD of Tfb3 is a RING domain, homology searches in primitive eukaryotes yield a number of weak matches in the zinc binding motif, but no way of knowing if any of these are related to TFIIH. Similarly, searches with Cdk7/Kin28 or Cyclin H/Ccl1 pull up all CDKs and cyclins, with roughly equal statistical similarity to the yeast kinase/cyclin. Someone with more experience with evolutionary analysis would likely have better luck, but our efforts were inconclusive. However, we did find two papers from Guo and Stiller (2004 and 2005) that analyzed genome sequences available at the time and reached the conclusion that both the consensus CTD and the CAK module are absent in the evolutionary branch of primitive eukaryotes that contains T. brucei and Giardia lamblia. We also found papers identifying a putative Mat1/Tfb3 in Plasmodium falciparum, although this protein was not yet shown to be associated with TFIIH. We now cite these papers in the discussion of our evolutionary hypothesis.

      (6) Include Western blot analysis of the Tfb3 chimeras and truncations analyzed in Figures 1-2 to determine if poor expression contributes to any of the poor-growth phenotypes.

      The western blot of the Tfb3 fusions used in Figure 1 is shown in Sup Fig 1. The Tfb3 truncations are shown in the Input panel of Fig 3A (although some of these are TAP fusions, the growth phenotypes did not change with TAP-tagging). In general, all the fusions and truncations are detectable but possibly reduced relative to WT Tfb3. Note that the anti-Tfb3 antibody is a polyclonal made against recombinant Tfb3, and we don’t know that the reactive epitopes are distributed equally throughout the protein, so it’s difficult to be confident about relative quantitation with partial Tfb3 proteins.

      (7) Provide direct evidence that the N-terminal Tfb3 segment interacts exclusively with the core TFIIH module and not Kin28, analogous to the opposite results shown in Figure 3B and 4A-B for the C-terminal domain.

      This could be interesting, but we elected not to do this experiment due to time and manpower limitations. Since the N-term is unambiguously essential for viability, we can assume it retains at least some interactions with core TFIIH (unless the N-term has some other essential function that hasn’t been discovered).

      (8) Confirm that the Ser5P phosphorylation levels given by the different Tfb3-TAP immune complexes are all much higher than the background level observed with control complexes prepared with extracts expressing WT, untagged Tfb3.

      We should have done this control in Sup Fig 2B, especially since we did pull down the beads from the untagged strain as shown in panel A. We haven’t seen appreciable kinase activity when we’ve done this control in the past, so we feel confident the signals seen are not background. Therefore, we elected not to repeat this experiment.

      (9) Conduct an in vitro reconstitution comparing the activity of free kinase module and intact TFIIH on elongating RNA polymerase II in directing promoter-localized vs. downstream Ser5P accumulation.

      This would be a nice experiment, but would require a substantial amount of work that is beyond our resources at this time.

      (10) Revise the text to better emphasize any novel mechanistic insights afforded by the work and address all other minor comments/criticisms.

      Done, as addressed in all the other comment replies.

      Reviewer #2 (Recommendations for the authors):

      (1) The authors suggest that their results support model 3, in which intact TFIIH restrains kinase activity outside the PIC. Directly testing this model would be a significant addition and would strengthen the proposed mechanism. An in vitro reconstitution comparing the activity of the free kinase module and intact TFIIH on elongating RNA polymerase II (or, at a minimum, purified Pol II) would directly test the mechanism underlying downstream Ser5P accumulation.

      Sup Fig 2 addresses this point to some extent, since the TAP pull-down of full-length Tfb3 precipitates at least some intact TFIIH, whereas the split C-term TAP constructs do not (as shown in Fig 4). However, this is not a very quantitative assay and we agree with the reviewer that a careful reconstitution, especially in the context of real transcription, would be far better. Unfortunately, this is currently beyond our capabilities. However, in the Discussion we do cite some published data arguing that association of the core TFIIH does have some inhibitory effect on the kinase module. First, in our 2002 MCB paper (Keogh et al., see Fig 7) using a GST-CTD kinase assay, we found that free kinase module (called TFIIK there) was strongly active even with a non-phosphorylatable mutation in the activating T-loop. In contrast, the same mutation inactivated CTD kinase activity in the intact TFIIH. Similarly, the Taatjes lab (Rimel et al., Genes Dev. 2020) found that free CAK was active on multiple substrates that were not phosphorylated by the full TFIIH complex.

      (2) Experiments from Carl Wu's laboratory (Nguyen et al., 2021) showed that there is a significant amount of apparently free Kin28 as well as free TFIIH in cells. Please reference and comment on this when discussing the model, suggesting that TFIIH is mostly sequestered at promoters.

      Good point. We added this to the discussion where we discuss the arguments against a sequestering model.

      (3) The existing ChIP-seq data could be analyzed more thoroughly to extract additional mechanistic insights. Specifically: (i) quantify TSS-proximal vs. gene body Pol II to determine if reduced occupancy reflects initiation defects (ii) analyze whether gene classes (high vs. low expression, stress-induced genes) show differential effects.

      Thanks for the suggestion. We did this and show the results as a new Supplemental Figure 6. No differences were found. Please see our response to the Editor’s comment #1 for a fuller description.

      (4) The complete loss of Kin28 ChIP signal in mutant strains (Figure 5B) could reflect kinase mislocalization or reduced protein abundance. Figure 3B examines TAP-purified material but does not address total cellular protein levels. Examining whole-cell extracts for Kin28 and Ccl1 in all strains would strengthen the interpretation of the ChIP results.

      As described in our response to Point 2 in the Editor’s comments section, we did do this control. Unfortunately, the Kin28 and Ccl1 antibodies were not strong enough to detect these proteins in extracts before precipitation.

      Reviewer #3 (Recommendations for the authors):

      (1) The experiment of Figure 1 should be repeated with a Tfb3-Kin28 positive control or dropped from the manuscript.

      This could be an interesting experiment, but please see our response to Editor comment #3 for why we decided to keep the figure as is.

      (2) Figure 2C legend doesn't mention linker C-term low copy construct.

      Thanks for catching that error. It is now fixed.

      (3) The claim that the N-term linker has a dominant negative effect (Figure 2C) requires direct comparison (growth on the same plate) of TFB3+ cells with and without expression of the N-term linker.

      As detailed in the response to the Editor’s comment #4, we did this test. The results did not support a dominant negative phenotype, so we removed this claim. Thanks for helping us avoid a mistake.

      (4) Page 7, "Supplementary Fig. S4A, B, promoters in green boxes" should read "Supplementary Fig. S5A, B, promoters in green boxes".

      Thanks for catching that error. It is now fixed.

      (5) Readers might be concerned that the ChIP-seq signal observed in Figure 5 and S5 could reflect an artifactual signal over highly transcribed regions. The different distributions of Rpb1, Ser5p, and Ser2p argue against this. This might be worth mentioning in the text.

      Thanks for raising this issue. “Hyper-ChIPpable” genes can be a problem in metagene analysis. We now include the analysis suggested by Reviewer 2 where we separately look at genes with different transcription frequencies. Seeing the same relative patterns regardless of expression level makes us confident that the results are not artifactual.

      (6) p. 12, "the Tfb3 the linker"; "In contrast, The N-term linker"; "suggest" should be "suggests"

      We appreciate the reviewer’s careful reading of the manuscript and have corrected these typos.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This important study investigates the impact of BRCA1/2 mutations on immunotherapy in lung adenocarcinoma using multi-omics approaches. The detailed genetic analysis of two cancer genes (BRCA1 and BRCA2) demonstrated new roles for these genes in causing the tumor microenvironment in lung cancer. Further experimental explorations of the immune-related changes may still be required. The solid findings of this study provide a foundation for further developing drugs targeting BRCA1/2 in lung cancer therapy.

      We would like to express our sincere gratitude for your thoughtful and constructive comments on our manuscript. We carefully considered each comment from these two reviewers and revised the manuscript accordingly. Below, we provided a point-by-point response to each comment.

      Reviewer #1 (Public review):

      Summary:

      Liao et al. performed a large-scale integrative analysis to explore the function of two cancer genes (BRCA1 and BRCA2) in lung cancer, which is one of the cancers with an extremely high mortality rate. The detailed genetic analysis demonstrated new roles of BRCA1/2 in causing the tumor microenvironment in lung cancer. In particular, the discovery of different mechanisms of BRCA1 and BRCA2 provides an essential foundation for developing drugs that target BRCA1 or BRCA2 in lung cancer therapy.

      Strengths:

      (1) This study leveraged large-scale genomic and transcriptomic datasets to investigate the prognostic implications of BRCA1/2 mutations in LUAD patients (~2,000 samples). The datasets range from genomics to single-cell RNA-seq to scTCR-seq.

      (2) In particular, the scTCR-seq offers a powerful approach for understanding T cell diversity, clonal expansion, and antigen-specific immune responses. Leveraging these data, this study found that BRCA1 mutations were associated with CD8+ Trm expansion, whereas BRCA2 mutations were linked to tumor CD4+ Trm expansion and peripheral T/NK cell cytotoxicity.

      (3) This study also performed a comprehensive analysis of genomic variation, gene expression, and clinical data from the TCGA program, which provides an independent validation of the findings from LUAD patients newly collected in this study.

      (4) This study provides an exemplary integration analysis using both computational biology and wet bench experiments. The experimental testing in the A549 cell line further supports the robustness of the computational analysis.

      (5) The findings of this study offer a comprehensive view of the molecular mechanisms underlying BRCA1 and BRCA2 mutations in LUAD. BRCA1 and BRCA2 are two well-known cancer-related genes in multiple cancers. However, their role in shaping the tumor microenvironment, particularly in lung cancer, is largely unknown.

      (6) By focusing on PD-L1-negative LUAD patients, this study demonstrated the molecular mechanisms underlying resistance to immune therapy. These new insights highlight new opportunities for personalized therapeutic strategies to BRCA-driven tumors. For example, they found histone deacetylase (HDAC) inhibitors consistently downregulated 4-R genes in A549 cells.

      (7) The deposition of raw single-cell sequencing (including scRNA-seq and scTCR-seq) data will provide an essential data resource for further discovery in this field.

      Weaknesses:

      (1) The finding of histone deacetylase (HDAC) inhibitors suggests the potential roles of epigenetic regulation in lung cancer. It would be interesting to explore epigenetic changes in LUAD patients in the future.

      Thank you for your insightful comment. We fully agree that epigenetic dysregulation in LUAD needs to be explored. We believe that future investigations using clinical specimens and animal models to map histone acetylation patterns and DNA methylation profiles will be crucial for identifying novel biomarkers and therapeutic targets unique to LUAD.

      (2) For some methods, more detailed information is needed.

      This is a valid point. We agree that additional details regarding the methods are necessary for clarity and reproducibility. We have expanded these method details in the revised manuscript.

      (3) There are grammar issues in the text that need to be fixed.

      We apologize for the grammatical errors. In the revised manuscript, we carefully checked the grammar and made corrections.

      (4) Some text in the figures is not labeled well.

      We appreciate the reviewer's comment. We have added labels to the revised version of the figures.

      Reviewer #2 (Public review):

      Summary:

      This study investigates the impact of BRCA1/2 mutations on immunotherapy in lung adenocarcinoma using multi-omics approaches. The work highlights distinct roles of BRCA1 and BRCA2 mutations in shaping immune-related processes, and is logically structured with clearly presented analyses. However, the conclusions rely primarily on descriptive computational analyses and would benefit from additional immunological validation.

      Strengths:

      By integrating public datasets with in-house data, this study examines the impact of BRCA1/2 mutations on immunotherapy in lung adenocarcinoma from multiple perspectives using multi-omics approaches. The analyses are diverse in scope, with a clear overall logic and a well-organized structure.

      Weaknesses:

      The study is largely descriptive and would benefit from additional immunological experiments or validation using in vivo models. The fact that the BRCA1 and BRCA2 samples were each derived from a single patient also limits the robustness of the conclusions.

      Thank you for this excellent suggestion. In the revised manuscript, we added immunological experiments and validation based on pathological tissue sections from lung adenocarcinoma patients. In addition, we elaborated on the limitations of our study in the Discussion section and provided reasonable explanations.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The abstract includes a lot of abbreviations, which makes it difficult to follow. For example, "IFN" is not defined. And "HRR" is defined but used only once in the abstract. This issue also appears in other parts, such as "OAK" on page 5, line 114; "DFS" on page 15, line 398; and "DSBs" on page 20, line 558. Please try to avoid unnecessary abbreviations.

      Thank you for highlighting this. We have revised the manuscript to minimize the use of abbreviations. Specifically, we have now defined all necessary abbreviations upon first mention (including 'IFN') and have removed or spelled out those used infrequently to ensure the text flows more smoothly for the reader.

      (2) Page 5, line 129, what data type is used in this part analysis?

      We apologize for our negligence. The whole exome sequencing data used here has been added in the revised manuscript.

      Materials and methods, page 6, lines 131-132: “The raw reads (fastq) of whole exome sequencing were pre-processed and trimmed with fastp (Version: 0.23.4) based on default parameters.”

      (3) Page 6, line 138, Add citation for ANNOVAR.

      Thank you for your suggestion. We have added a citation for ANNOVAR in the revised manuscript.

      (4) Page 8, line 211, what cutoff is used to define the significant markers?

      Thank you for your insightful comment. We provided the cutoff used to define significant markers.

      Materials and methods, page 8, lines 213-215: “Differential expression genes for specific clusters were identified using the “FindMarkers” function, with a threshold of |avg_log2FC| ≥ 0.5 and adjusted P-value ≤ 0.01.”
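
      The cutoff quoted above can be illustrated with a minimal Python sketch. It is not the authors' code: the marker table is invented for illustration, the gene names are placeholders, and the column names `avg_log2FC` and `p_val_adj` are assumed to follow the usual "FindMarkers" output. Only the two thresholds (|avg_log2FC| ≥ 0.5 and adjusted P-value ≤ 0.01) come from the text.

```python
# Hypothetical marker table; gene names and values are illustrative only.
markers = [
    {"gene": "GENE_A", "avg_log2FC": 1.20, "p_val_adj": 0.001},
    {"gene": "GENE_B", "avg_log2FC": -0.75, "p_val_adj": 0.010},
    {"gene": "GENE_C", "avg_log2FC": 0.30, "p_val_adj": 0.0001},
    {"gene": "GENE_D", "avg_log2FC": 0.90, "p_val_adj": 0.050},
]


def filter_markers(rows, min_abs_log2fc=0.5, max_padj=0.01):
    """Keep rows with |avg_log2FC| >= min_abs_log2fc and p_val_adj <= max_padj."""
    return [
        row
        for row in rows
        if abs(row["avg_log2FC"]) >= min_abs_log2fc
        and row["p_val_adj"] <= max_padj
    ]


# Both up- and down-regulated genes pass, because the fold-change
# threshold is applied to the absolute value.
significant = filter_markers(markers)
significant_genes = [row["gene"] for row in significant]
print(significant_genes)
```

      Because both thresholds are inclusive, a gene sitting exactly on the adjusted P-value boundary (such as the placeholder `GENE_B`) is retained.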

      (5) Page 11, line 276, HEK293T is not a lung cancer cell line. It would be better to label the details of this cell line.

      Thank you for your correction. We have now clarified HEK293T in the text by stating: 'human embryonic kidney cell line HEK293T'.

      Materials and methods, page 11, lines 277-278: “The human lung cancer cell line A549 (#SCSP-503) and the human embryonic kidney cell line HEK293T (#SCSP-502) were purchased from the Type Culture Collection of the Chinese Academy of Sciences, China.”

      (6) Page 16, line 415, what samples and how many individuals were used for the exome sequencing?

      We agree that specifying the sample set is crucial. The exome sequencing was conducted on 2 individuals (four samples). The samples used were tumor tissues (2 samples) and matched blood (2 samples). This information has been clarified in the revised manuscript.

      Results section, page 16, lines 415-416: “Exome sequencing was performed on four samples from two individuals: two tumor tissues and two matched blood samples.”

      (7) Page 17, line 468, Replace "Differently" with "In contrast" (more appropriate for scientific writing).

      Thank you for pointing this out. We agree that "In contrast" is more appropriate for scientific writing. Accordingly, we have replaced "Differently" with "In contrast" in this sentence (Results section, page 18, line 483).

      (8) Page 18, line 489, what is HMG?

      Thank you for pointing this out. HMG stands for High Mobility Group. We have clarified this by writing out the full term upon first mention in the manuscript (Results section, page 19, line 503).

      (9) Page 19, line 527, check the grammar for this sentence.

      We appreciate your careful reading. We have carefully rephrased this sentence to ensure clarity and grammatical accuracy.

      Results section, page 20, line 540: “Based on pseudotime order, we divided trajectories into 10 bins and analyzed the activity changes of related features.”
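
      The binning step described in this sentence might look like the following Python sketch. It is an illustration under stated assumptions, not the authors' pipeline: the text does not say whether the 10 bins are equal-width or equal-count, so equal-width bins over the observed pseudotime range are assumed here, and the function names `bin_pseudotime` and `mean_activity_per_bin` are hypothetical.

```python
def bin_pseudotime(pseudotime, n_bins=10):
    """Assign each cell to one of n_bins equal-width pseudotime bins (0-based)."""
    lo, hi = min(pseudotime), max(pseudotime)
    width = (hi - lo) / n_bins
    if width == 0:
        # All cells share one pseudotime value: place them in the first bin.
        return [0] * len(pseudotime)
    # Cells at the maximum pseudotime are clamped into the last bin.
    return [min(int((t - lo) / width), n_bins - 1) for t in pseudotime]


def mean_activity_per_bin(pseudotime, activity, n_bins=10):
    """Average a per-cell activity score within each pseudotime bin.

    Empty bins are reported as None rather than 0 so that they are not
    mistaken for bins with zero activity.
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for b, a in zip(bin_pseudotime(pseudotime, n_bins), activity):
        sums[b] += a
        counts[b] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


# Toy input: four cells with made-up pseudotime and activity values.
profile = mean_activity_per_bin([0.0, 4.0, 9.9, 10.0], [1.0, 2.0, 3.0, 5.0])
print(profile)
```

      Equal-count (quantile) bins would be an equally plausible reading; they give each bin the same number of cells at the cost of uneven pseudotime intervals.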

      (10) Page 20, line 541-546. It would be better to split this long sentence into smaller ones.

      Thank you for your insightful comment. We have revised the text, splitting the long sentence into smaller ones for better clarity.

      Results section, page 20, lines 554-559: “MHC class I and II molecules showed increased activity in late pseudotime in BRCA1- and BRCA2-mutant cells, respectively (Fig. 4G-I). This pattern was also reflected in the cell density analysis (Fig. 4J). Furthermore, CD8<sup>+</sup> Tcm and Th1 signatures exhibited higher activity in late pseudotime in BRCA1- and BRCA2-mutant cells, respectively (Fig. S5F-G). These findings suggest a differential association with CD8<sup>+</sup> versus CD4<sup>+</sup> T cell engagement.”

      (11) Page 20, line 550, remove "." after "of".

      Thank you for catching this. We have removed it (Results section, Page 21, line 563).

      (12) Page 22, line 592, what is "LME"?

      Thank you for pointing this out. "LME" was indeed redundant in the original manuscript, so we have removed it in the revised version (Results section, Page 22, lines 607-609).

      (13) Page 24, line 674, Replace "suggest" with "suggested"?

      We apologize for our negligence. In the revised manuscript, we have replaced "suggest" with "suggested" (Results section, Page 25, lines 691-693).

      (14) Page 35, Figure 1I, Use "B cells" instead of "B".

      Thank you for your detailed review. We have changed to the appropriate label in Figure 1I.

      (15) Page 36, Figure 2H, the statistics and p-value are needed to show.

      Thank you for your suggestion. We have added the statistical analysis for Figure 2H, and the p-values were indicated in the revised Figure.

      Special thanks to you for your kind comments.

      Reviewer #2 (Recommendations for the authors):

      Major:

      (1) Line 44. In the Introduction section, a brief description of the prevalence of HRD or BRCA1/2 mutations in lung cancer patients should be included to highlight the significance of the study.

      This is an excellent suggestion. We revised the Introduction section (page 3, lines 61-64) to include a brief overview of the prevalence of BRCA1/2 mutations specifically in lung cancer patients. We believe this addition will strengthen the background for readers.

      Introduction section, page 3, lines 61-64: “Among the key genetic mutations that drive LUAD, BRCA1 and BRCA2 mutations (with prevalence rates of approximately 4% and 5%, respectively) have been increasingly implicated in the pathogenesis and progression of lung cancer [9, 13].”

      (2) Line 302-355. There are relatively serious grammatical issues, and substantial revision of the text is recommended.

      We acknowledge the grammatical issues in the original text. We have now carefully revised the Materials and methods section of the manuscript (pages 11-14, lines 277-358) to correct these issues and improve the overall readability. We believe the revised version is significantly improved.

      (3) Line 375. The Results section lacks detailed information on the specific BRCA1/BRCA2 mutations and data explaining how these mutations lead to functional alterations of BRCA1/2.

      Thank you for your insightful comment. In the revised manuscript, we added the amino acid changes caused by the specific BRCA1/BRCA2 mutation sites and expanded the text to discuss the predicted and known pathogenic mechanisms of these variants (Results section, page 16, lines 420-433).

      Results section, page 16, lines 420-433: “Exome sequencing data show that these two types of tumor tissues harbor somatic nonsynonymous single nucleotide variants (SNV) in BRCA2 (p.N372H) and BRCA1 (p.E991G, p.S1566G, p.K1136R, p.P824L, and p.Y809H), respectively (Table S1). The BRCA2 p.N372H variant lies within the BRC3 or BRC4 motifs critical for RAD51 binding. It may alter binding affinity, impair high-fidelity homologous recombination repair, and promote genomic instability [39-41]. In BRCA1, mutations are distributed across two key functional domains: the Coiled-Coil domain (e.g., p.E991G, p.Y809H, p.P824L) and the BRCT domain (e.g., p.K1136R, p.S1566G). Coiled-Coil mutations disrupt BRCA1-PALB2-BRCA2 complex assembly, impairing localization to DNA damage sites and subsequent RAD51 recruitment; BRCT domain mutations compromise phospho-protein recognition and G2/M checkpoint control, leading to defective DNA damage response and unchecked proliferation of damaged cells [42-44]. Together, these defects promote the accumulation of genomic scars and chromosomal instability.”

      (4) Line 492-498. Changes in genes associated with BRCA1 and BRCA2 mutations should be validated by immunofluorescence.

      Thank you for your insightful comment. Immunofluorescence would provide valuable orthogonal validation of the protein-level consequences of these mutations. To address this, we obtained pathological tissue sections from patients carrying BRCA1/2 mutations and performed immunofluorescence staining for S100A10, a risk gene associated with BRCA1 mutations. We found that S100A10 was upregulated in BRCA1-mutated tumor tissue compared to adjacent non-cancerous tissue.

      Results section, page 24, lines 673-675: “Immunofluorescence experiments on patient tissue sections revealed that S100A10 was upregulated in BRCA1-mutated tumor tissue relative to adjacent non-cancerous tissue (Fig. S11D-E).”

      (5) Line 538. Although both BRCA1 and BRCA2 deficiencies impair DNA damage repair, BRCA1, but not BRCA2, activates the cGAS-STING pathway. This is a particularly interesting observation and should be validated by immunofluorescence experiments.

      Thank you for highlighting this observation. To address this, we conducted immunofluorescence experiments to quantify STING, a key protein of the cGAS-STING pathway, in BRCA1- and BRCA2-deficient tissues to confirm this phenotype. We have included these results in the revised manuscript.

      Results section, page 21, lines 578-584: “Furthermore, our results revealed that BRCA1-mutant tumors showed higher activity of cGAS-STING signaling and STING-mediated induction of host immune responses compared to BRCA2-mutant tumors (Fig. 5G and Fig. S6F). Also, cGAS-STING signaling genes, including cGAS, STING1, and downstream factors STAT1 and CCL5, were upregulated in BRCA1-mutant tumor cells (Fig. 5H). This observation was validated through immunofluorescence staining experiments on patient tumor tissue sections (Fig. 5I-J).”

      (6) Line 599. "CD8+ Trm cells were more abundant in BRCA1-mutant sample, whereas CD4+ Trm cells were higher in BRCA2-mutant sample". This part is also recommended to be validated using immunofluorescence or more rigorous flow cytometry analyses.

      We sincerely appreciate this insightful suggestion. To address this, we performed immunofluorescence staining to quantify the abundance of CD8<sup>+</sup> and CD4<sup>+</sup> Trm cells in BRCA1- and BRCA2-mutant tissues. We have included these results in the revised manuscript.

      Results section, page 22, lines 614-617: We identified two tissue-resident memory T cell (Trm) subsets, CD8<sup>+</sup> Trm and CD4<sup>+</sup> Trm, both predominantly derived from tumor tissues (Fig. 6B). “Interestingly, our analysis revealed that CD8<sup>+</sup> Trm cells were more abundant in BRCA1-mutant tumor, whereas CD4<sup>+</sup> Trm cells were more abundant in BRCA2-mutant tumor (Fig. 6B-D, Fig. S7D, and Fig. S8A-B).”

      (7) Line 643-676. The authors identified four risk genes associated with BRCA1 mutations-S100A10, LDHA, MYL12A, and GAPDH; however, MYL12A was not validated in the subsequent in vitro experiments. The authors state that "S100A10 can promote cancer metastasis by recruiting MDSC cells, and increased LDHA activity contributes to tumor immune escape." However, because immune cells were not included in the in vitro assays, these results instead suggest that these genes may directly suppress tumor cell proliferation.

      We thank the reviewer for this insightful observation. Our intention was not to suggest that the reduction in proliferation observed in our in vitro assays was caused by the disruption of immune cell recruitment or immune escape pathways. As the reviewer correctly points out, those mechanisms are irrelevant in a system lacking immune cells. Our results showing that "Knockdown of S100A10, LDHA, and GAPDH reduced LUAD cell proliferation in vitro (Fig. 7D-E)" strongly suggest a direct, cell-autonomous role for these genes in regulating LUAD cell growth. For the MYL12A gene, an existing study has shown that BRCA1 transcriptionally regulates this gene, which is involved in breast tumorigenesis (PMID: 12032322). Given the characteristics of MYL12A in lung cancer, we will conduct in-depth in vitro and in vivo validation experiments in future studies.

      (8) Line 677. The authors should emphasize the limitations arising from the small sample size and the lack of in vivo validation models in the Discussion section.

      Thank you for highlighting these important limitations. We agree that the small sample size and the lack of in vivo validation are significant limitations of the current study. We have explicitly addressed these points in the Discussion section (page 27, lines 740-750) to ensure the interpretation of our data is appropriately qualified and to provide transparency regarding the scope of our conclusions.

      Discussion section, page 27, lines 740-750: “Although we included both tumor tissues and matched paracancerous and blood samples, the sample size remains modest, which may limit the statistical power and generalizability of our findings. Therefore, our results should be interpreted as preliminary, and further studies with larger, independent cohorts are required to validate these observations. Although the single-cell RNA-seq and TCR-seq analyses in this study provide high-resolution insights into the cellular and clonal dynamics of the TME, the functional validation of key mechanisms remains largely correlative. While our in vitro experiments provide valuable mechanistic insight, they cannot fully recapitulate the complex TME, and in vivo validation is lacking. Future studies utilizing murine models or patient-derived organoids are essential to establish causal relationships and elucidate the underlying molecular pathways.”

      Minor:

      (1) Line 163: cell/μl should be corrected to cells/μL.

      Thank you for catching this. We have corrected it in the revised manuscript (Methods section, page 7, line 165).

      (2) Line 388: Please clarify how the HRD score, tumor mutation burden, and neoantigen load were calculated.

      We thank the reviewer for this request for clarification. In the revised manuscript, we have expanded the Methods section (page 5, lines 117-121) to provide a detailed description of how these metrics were calculated. HRD score was calculated as the unweighted sum of loss of heterozygosity (LOH), telomeric allelic imbalance (TAI), and large-scale state transitions (LST). Tumor mutation burden (TMB) was defined as the total number of somatic nonsynonymous mutations per megabase of the exome captured by the sequencing panel. Neoantigen load was predicted by NetMHCpan using the patient's HLA typing and the identified somatic mutations. The data for these three indicators were all obtained from a previous study (PMID: 29628290). We believe these additions provide the necessary transparency and reproducibility for our study.

      Methods section, page 5, lines 117-121: The HRD score was determined by summing specific genomic alterations, including loss of heterozygosity (LOH), large-scale state transitions (LST), and telomeric allelic imbalances (TAI). “Tumor mutation burden (TMB) was defined as the total number of somatic nonsynonymous mutations per megabase of the exome captured by the sequencing panel. Neoantigen load was predicted by NetMHCpan using the patient's HLA typing and the identified somatic mutations.”
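
      As a worked illustration of the two arithmetic definitions above, the Python sketch below computes an HRD score as the unweighted sum of the LOH, TAI, and LST components and a TMB as nonsynonymous mutations per captured megabase. It is a sketch only: the function names are hypothetical, the input numbers are invented, and the captured exome size is an assumed parameter that the text does not specify. Neoantigen prediction with NetMHCpan is not reproduced here.

```python
def hrd_score(loh, tai, lst):
    """Unweighted sum of the three genomic scar components."""
    return loh + tai + lst


def tumor_mutation_burden(n_nonsynonymous, captured_mb):
    """Somatic nonsynonymous mutations per megabase of captured exome.

    captured_mb is the size of the captured region in megabases; its value
    depends on the capture kit and is an assumption in this sketch.
    """
    if captured_mb <= 0:
        raise ValueError("captured_mb must be positive")
    return n_nonsynonymous / captured_mb


# Invented example values, not data from the study.
example_hrd = hrd_score(loh=10, tai=12, lst=20)
example_tmb = tumor_mutation_burden(n_nonsynonymous=150, captured_mb=30.0)
print(example_hrd, example_tmb)
```

      Keeping the captured region size as an explicit argument makes it clear that TMB values are only comparable between samples sequenced with the same capture design.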

      (3) Line 421: BRCA12 should be corrected to BRCA2.

      Thank you for your detailed review. We have revised it.

      (4) The order of Figures 7D and 7E should be reversed.

      Thank you for your insightful comment. According to your suggestion, we reversed the order of Figures 7D and 7E in the revised manuscript.

      Special thanks to you for your kind comments.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study addresses the emerging role of fungal pathogens in colorectal cancer and provides mechanistic insights into how Candida albicans may influence tumor-promoting pathways. While the work is potentially impactful and the experiments are carefully executed, the strength of evidence is limited by reliance on in vitro models, small patient sample size, and the absence of in vivo validation, which reduces the translational significance of the findings.

      Strengths:

      (1) Comprehensive mechanistic dissection of intracellular signaling pathways.

      (2) Broad use of pharmacological inhibitors and cell line models.

      (3) Inclusion of patient-derived organoids, which increases relevance to human disease.

      (4) Focus on an emerging and underexplored aspect of the tumor microenvironment, namely fungal pathogens.

      Weaknesses:

      (1) Clinical association data are inconsistent and based on very small sample numbers.

      We thank the reviewer for this important comment. We investigated four colorectal tumors (two early-stage and two late-stage) and detected Candida albicans in the two late-stage samples but not in the early-stage ones. This result is consistent with the large-scale TCGA data, in which Candida albicans is mainly detectable in late-stage colorectal tumors (Fig. 1c), and suggests that Candida albicans contributes to colorectal cancer progression, which is the main research direction of this study.

      (2) No in vivo validation, which limits the translational significance.

      We appreciate the reviewer’s concern regarding the lack of in vivo validation. While we recognize the value of in vivo models, our current institutional biosafety protocols and animal facility designations do not support the handling of pathogenic microorganisms like Candida albicans in live infection models. Consequently, these experiments were beyond the immediate technical scope of this study. To validate the findings using cell lines, we have performed Candida albicans infection experiments using organoids collected from colorectal cancer patients instead (Fig. 7). We have revised the Discussion section to acknowledge this limitation and clarify that the current work serves as a mechanistic study based on in vitro and ex vivo systems. We have also incorporated references to recent studies demonstrating the in vivo effects of C. albicans in tumor models, which support the biological relevance of our findings.

      (3) Species- and cell type-specificity claims are not well supported by the presented controls.

      We thank the reviewer for this insightful comment. We agree that our current dataset does not warrant definitive conclusions regarding species- or cell type-specificity. Accordingly, we have tempered our claims throughout the manuscript, describing the observed effects as context-dependent across different epithelial models. Specifically, we observed differential responses among the cell lines and epithelial systems evaluated, suggesting variability rather than strict specificity. Furthermore, the Discussion has been expanded to address potential underlying factors, such as variations in EGFR expression levels and other signaling determinants. We have also added a dedicated section to acknowledge this limitation and emphasize the need for future systematic investigations using a more diverse array of fungal species and cell models.

      (4) Reliance on colorectal cancer cell lines alone makes it difficult to judge whether findings are specific or general epithelial responses.

      We appreciate the reviewer’s thoughtful concern. Although most of our mechanistic experiments were performed in colorectal cancer cell lines, we also evaluated our findings across a broader range of epithelial models, including normal human colon-derived organoids and the breast epithelial cancer line MCF7 (Fig. 8). Neither model exhibited HIF-1α activation upon C. albicans exposure, indicating that the hypoxia response we observed might not be universal. Interestingly, the response observed in non-colorectal epithelial cancer lines (e.g., HCC1937, NUGC-3) suggests that this mechanism is not strictly confined to CRC. Based on these observations, we propose that the specificity is likely related to EGFR levels but may involve additional epithelial determinants, which we aim to investigate in future work.

      Reviewer #2 (Public review):

      The authors in this manuscript studied the role of Candida albicans in Colorectal cancer progression. The authors have undertaken a thorough investigation and used several methods to investigate the role of Candida albicans in Colorectal cancer progression. The topic is highly relevant, given the increasing burden of colon cancer globally and the urgent need for innovative treatment options. However, there are some inconsistencies in the figures and some missing details in the figures, including:

      (1) The authors should clearly explain in the results section which patient samples are shown in Figure 1B.

      We thank the reviewer for pointing out this omission. We apologize for the lack of clarity in the initial submission. The patient samples shown in Figure 1B are from patients with stage III CRC. We have revised the manuscript to state this explicitly in the legend for Figure 1B.

      (2) What do a, ab, b, b written above the bars in Figure 1F represent? Maybe authors should consider removing them, because they create confusion. Also, there is no explanation for those letters in the figure legend.

      We thank the reviewer for this helpful comment. The letters above the bars represent statistical groupings from post-hoc multiple-comparison tests (a standard convention used after ANOVA or similar analyses): bars sharing the same letter are not significantly different, whereas bars with no letter in common are statistically distinct. We chose this letter-based system over asterisks to avoid the visual clutter and potential confusion that often arise from numerous pairwise comparisons; therefore, we will retain the letter-based grouping. In the revised manuscript, we have explicitly defined this notation in the figure legend to ease interpretation for the reader.
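
      As a minimal sketch of how this notation can be read, the snippet below checks whether two bars are significantly different by testing whether their letter labels share any letter. The labels are the ones the reviewer quotes (a, ab, b, b); the group names and the helper function are hypothetical and used only for illustration.

```python
from itertools import combinations

def significantly_different(letters_a, letters_b):
    """Two groups differ only if their labels have no letter in common."""
    return not (set(letters_a) & set(letters_b))

# Hypothetical group names; the labels follow the "a, ab, b, b" pattern.
labels = {"group1": "a", "group2": "ab", "group3": "b", "group4": "b"}

distinct_pairs = [
    (g1, g2)
    for g1, g2 in combinations(labels, 2)
    if significantly_different(labels[g1], labels[g2])
]
print(distinct_pairs)
```

      In this toy example only the bars labelled "a" and "b" are distinct; the bar labelled "ab" does not differ from either group.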

      (3) The authors should submit all the raw images of Western blot with appropriate labels to indicate the bands of protein of interest along with molecular weight markers.

      We appreciate the reviewer’s request for raw data. We have now included the raw images of the Western blots in the supplementary materials, with clear annotations of the bands corresponding to the proteins of interest as well as molecular weight markers.

      (4) The authors should do the quantification of data in Figure 2d and include it in the figure.

      We thank the reviewer for this valuable suggestion. In the revised manuscript, we have quantified the subcellular localization of HIF-1α in PBS-treated versus C. albicans–infected cells shown in Figure 2d. The quantification results are shown in the following figure and provided in Supplementary Figure 3c.

      (5) In Figure 2h, the authors should indicate if the quantification represents VEGF expression after 6h or 12h of C. albicans co-culture with cells.

      We thank the reviewer for pointing this out. We have updated Figure 2h to specify that the quantification represents VEGF expression after 12 hours of co-culture with Candida albicans.

      (6) In Figure 2i, quantification of VEGF should be done and data from three independent experiments should be submitted. The authors should also mention the time point.

      We thank the reviewer for this helpful comment. In the revised manuscript, we have quantified VEGFA fluorescence intensity based on three independent experiments (the other two replicates are shown in Author response image 1). The corresponding time point (12 hours of co-culture) is now clearly indicated in the figure legend.

      Author response image 1.

      Recommendations for the authors:

      Reviewing Editor Comments:

      (1) Some of the statements regarding Candida albicans and CRC progression in Figure 1 may be overstated (since the association with stage/survival may be cross-confounded). That is, analyses of survival ought to be stage-adjusted.

      We thank the editor for this important comment. We agree that the association between C. albicans and patient survival may be influenced by tumor stage as a confounding factor. In the revised manuscript, we have moderated our statements regarding the clinical associations and clarified the limitations of these analyses, now presenting these findings as correlative observations rather than causal relationships. We have also noted in the Discussion that stage-adjusted analyses would be required to more rigorously assess the independent contribution of C. albicans to patient outcomes.

      (2) Fan et al. (citation 26) is incorrectly referenced. The paper states that Bacteroides fragilis does not affect Candida albicans colonization. Instead, Bacteroides thetaiotaomicron was shown to reduce C. albicans colonization, but B. fragilis was used in the current study as a control.

      We thank the editor for pointing out this error, and we have corrected the citation accordingly. As noted, the referenced study indicates that Bacteroides thetaiotaomicron, rather than Bacteroides fragilis, reduces C. albicans colonization. We selected B. fragilis as a control in this study because it is a prevalent gut commensal and has been previously implicated in colorectal cancer progression. Although prior reports suggest that B. fragilis does not significantly affect C. albicans growth, we observed that co-culture with B. fragilis led to a noticeable inhibition of C. albicans growth under our experimental conditions. This discrepancy may reflect differences in experimental settings. We believe these findings provide additional context for the complex interactions between gut microbiota and fungal pathogens.

      (3) The link between hypoxia signaling is interesting, but for the most part, these experiments were largely done in normoxic conditions, while the colon is generally hypoxic. So I would have encouraged the authors to consider testing the effect of C. albicans presence/absence under low-oxygen conditions, which may be more physiologically relevant.

      We thank the editor for this insightful suggestion. We fully agree that evaluating the effects of C. albicans under hypoxic or anaerobic conditions would be highly relevant to the physiological tumor microenvironment. Although we have attempted to assess the impact of C. albicans on cell migration under hypoxic conditions, we observed that tumor cells exhibited markedly accelerated migration and proliferation, resulting in near-complete wound closure within 24 hours in control groups. This limited our ability to reliably detect differences between conditions using standard migration assays. We agree that in vivo models may provide a more physiologically relevant context to address this question, and we will pursue this direction in future studies when appropriate experimental conditions become available.

      Reviewer #1 (Recommendations for the authors):

      (1) Figure 1 inconsistencies: In Figure 1C, there is no significant difference in C. albicans detection between stage II and stage III CRC patients. In fact, more patients in stage II appear positive, which is inconsistent with Figures 1A and 1B. For Figures 1A and 1B, the sample size (n=2) is too low to support meaningful conclusions. Please also clarify which stage is represented in Figure 1B.

      We thank the reviewer for this important comment. In the revised manuscript, we have clarified the sample information and explicitly stated that the samples shown in Figure 1b are derived from stage III CRC patients. We have also moderated our conclusions and described these findings as exploratory observations. The apparent inconsistency between Figure 1c and Figures 1a-b may be partly due to the small number of clinical samples analyzed in our study. In addition, the TCGA-based analysis relies on transcriptomic data, whereas our analysis is based on immunohistochemistry (IHC). These methodological differences may also contribute to the observed variation.

      (2) Weak link between clinical and in vitro data: The transition from clinical samples to CRC cell line models feels tenuous. While C. albicans may induce hypoxia signaling, it is unclear whether this is specific to CRC cells or could occur in other epithelial cell types. Some broader testing would help strengthen this link.

      We thank the reviewer for this insightful comment. We agree that reinforcing the bridge between clinical observations and in vitro mechanistic findings, as well as clarifying cell type specificity, is important for a comprehensive study. In the revised manuscript, we have clarified that the clinical data provide correlative evidence, while the mechanistic insights are derived from controlled in vitro systems. To address the issue of cell type specificity, we have included additional analyses across multiple epithelial cell models (Figure 8). These results indicate that the response to C. albicans is not restricted to colorectal cancer cells but varies across different epithelial contexts.

      (3) Lack of in vivo validation: The mechanistic findings would be substantially strengthened by in vivo data, e.g., murine CRC models. Without this, the translational impact is limited.

      We appreciate the reviewer’s concern regarding the lack of in vivo validation. While we recognize the value of animal models, our current institutional biosafety protocols and facility designations do not support the handling of pathogenic microorganisms like Candida albicans in live infection models. Consequently, these experiments were beyond the immediate technical scope of this study and would be better performed in future studies to validate the mechanisms.

      (4) Figure 8B interpretation: The authors conclude that C. albicans shows the strongest effect on c-Myc and c-Jun activation. However, from the presented blots, the differences compared to other fungi are not obvious. The claim should be toned down or quantified more rigorously.

      We thank the reviewer for this important comment. We agree that the differences in c-Myc and c-Jun activation among fungal species are not sufficiently pronounced to support a strong comparative claim. In the revised manuscript, we have moderated the corresponding statements to avoid overinterpretation.

      (5) Cell type specificity: Since the title emphasizes CRC specificity, the cell line comparison shown in Figure 8 should be moved earlier in the results. This would clarify from the start whether the described mechanisms are CRC-specific or more generalizable.

      We thank the reviewer for this insightful suggestion. We agree that earlier presentation of cell type comparisons would help clarify the scope of the observed effects. We have revised the Results section accordingly: “To evaluate the cell type specificity of this response, we further analyzed additional epithelial cell models, as shown in Figure 8”.

      In this study, we initially identified the effects of C. albicans in colorectal cancer (CRC) cells and therefore focused on establishing the underlying mechanisms in this context. Subsequently, we extended our analysis to additional epithelial cell types to evaluate whether these effects are shared or context-dependent. We believe that this stepwise organization, from detailed mechanistic investigation in CRC cells to broader comparison across cell types, provides a logical and coherent flow for the reader. In the revised manuscript, we have further clarified this rationale in the text to improve readability and interpretation.

      (6) It would be good to use a negative fungi control instead of a PBS control for most of the experiment.

      We thank the reviewer for this valuable suggestion. We agree that a negative fungal control would further strengthen the conclusions. Unfortunately, we were unable to incorporate additional controls during the revision. However, we believe that our comparative analysis across multiple fungal species (Figure 8) partially addresses this issue by demonstrating differential signaling responses. Future studies will incorporate appropriate negative fungal controls to further validate the specificity of these effects.

      (7) It is surprising that the Dectin-1 inhibitor shows a smaller effect compared with the TLR2 inhibitor. This result warrants further explanation, as Dectin-1 is a well-known receptor for C. albicans β-glucans. The authors should clarify whether this difference reflects cell type-specific expression (e.g., low Dectin-1 in CRC cells), ligand accessibility, or pathway redundancy, and discuss how it aligns with existing literature.

      We thank the reviewer for this insightful comment. We agree that the relatively modest effect of Dectin-1 inhibition compared to TLR2 inhibition warrants further consideration. In the revised manuscript, we have expanded the Discussion to address this observation. We propose several possible explanations. First, the expression level of Dectin-1 may be relatively low in these epithelial cancer cells, thereby limiting its functional contribution. Second, differences in ligand accessibility, particularly in the context of fungal cell wall architecture, may influence receptor engagement. Finally, redundancy and cross-talk among pattern recognition receptor pathways may compensate for Dectin-1 inhibition. These observations highlight the context-dependent nature of host–fungal interactions.

      Reviewer #2 (Recommendations for the authors):

      All my comments that need to be addressed are given above and below:

      (1) What do a and b represent in Figure 2f? They should be removed or clearly explained in the figure legend, as they are creating confusion for the audience.

      We thank the reviewer for this comment. The letters indicate statistical groupings from post hoc multiple comparison tests. In the revised manuscript, we have added a clear explanation of this notation to the corresponding figure legends to ease interpretation for the reader.

      (2) In the figure legend of S3a, the authors mentioned only the Caco2 cell line, whereas in the figure, there are two more cell lines, HCT116 and SW48. The authors should revise the figure legend.

      We thank the reviewer for this comment. We have addressed this point and made the necessary corrections in the revised manuscript.

      (3) The scale bar information is missing for Figure S3b. It should be included.

      We thank the reviewer for this comment. The same scale bar was applied across all images in this panel. We have clarified this in the figure legend.

      (4) In Figure 2e, the HIF-1α level in the Caco2 cells at 24 hr time point is a lot higher compared to the level at the 12-hour time point after C. albicans infection. But in the WB quantification in Figure 2f, the level of HIF-1α is not higher when compared to 12hr. Although it is relative data based on control, authors should check this calculation again for any errors.

      We thank the reviewer for carefully examining the data. We have re-verified the quantification and confirmed that the values represent relative protein levels normalized to the corresponding controls at each time point.

      Because samples from different time points were processed and analyzed separately, direct comparison of absolute protein levels across time points is not appropriate. Therefore, relative quantification within each time point provides a more accurate and representative assessment of HIF-1α changes.
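
      To illustrate this point, the following minimal sketch normalizes each infected sample to the control processed at the same time point. The band intensities are hypothetical values chosen for illustration, not data from the study, and the function name is an assumption.

```python
def relative_levels(intensities):
    """Normalize each treated band to the control from the same time point."""
    ratios = {}
    for time_point, bands in intensities.items():
        control = bands["control"]
        if control <= 0:
            raise ValueError(f"control intensity must be positive at {time_point}")
        ratios[time_point] = bands["treated"] / control
    return ratios

# Hypothetical densitometry values (arbitrary units), one blot per time point.
blots = {
    "12h": {"control": 100.0, "treated": 250.0},
    "24h": {"control": 400.0, "treated": 800.0},
}

ratios = relative_levels(blots)
print(ratios)
```

      In this toy example the absolute signal of the treated band is higher at 24 h than at 12 h, yet the fold change relative to the matched control is lower, because the control signal on the 24 h blot is also higher.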

      (5) Line 125-127: This sentence should be rephrased.

      We thank the reviewer for this comment. We have revised the corresponding section to improve clarity.

      (6) PHD-mediated ubiquitination is the primary mechanism regulating HIF-1α protein stabilization. The authors should add an appropriate reference here.

      We thank the reviewer for this suggestion. An appropriate reference has been added in the revised manuscript to support this statement.

      (7) The authors claim that they observed that although the total level of HIF-1α increased, the ratio of its ubiquitinated form to total HIF-1α decreased. The authors should clearly indicate in the figure which protein band from the WB image was used for quantification from Figure S3c, which resulted in the graph presented in Figure S3d.

      We thank the reviewer for this suggestion. We have revised the figure legend to improve clarity.

      (8) In Figure 3a, there are some faint grey color lines. These graphs should be reformatted.

      We thank the reviewer for this comment. We did not observe obvious faint grey lines in the original figure; however, these artifacts may have arisen during image conversion or file transfer. To ensure optimal image quality, we have provided high-resolution vector files to improve clarity.

      (9) What do a and b in the bar graphs shown in Figure 3d,e; S4d,e,f represent?

      We thank the reviewer for this comment. The letters indicate statistical groupings from multiple comparison tests. In the revised manuscript, we have added a clear explanation of this notation to the corresponding figure legends.

      (10) What do a,b,c in the bar graphs shown in Figure 4c,d,h represent?

      We thank the reviewer for this comment. The letters indicate statistical groupings from multiple comparison tests. In the revised manuscript, we have added a clear explanation of this notation to the corresponding figure legends.

      (11) There are some faint grey lines in the bar graphs shown in Figure 4g. These lines should be removed.

      We thank the reviewer for this comment. We did not observe obvious faint grey lines in the original figure; however, these artifacts may have arisen during image conversion or file transfer. To ensure optimal image quality, we have provided high-resolution vector files to improve clarity.

      (12) Grey line below HIF-1α in the graph shown in Figure h should be removed.

      We thank the reviewer for this comment. We did not observe obvious faint grey lines in the original figure; however, these artifacts may have arisen during image conversion or file transfer. To ensure optimal image quality, we have provided high-resolution vector files to improve clarity.

      (13) The authors wrote - notably, despite treatment with AG1478, the levels of HIF-1α and c-MYC in C.albicans-infected cells remained significantly elevated compared to the uninfected control group (Figure 4b). There is no quantification for c-MYC. Statistics for HIF-1α quantification are missing. These should be added.

      We thank the reviewer for this comment. We have quantified HIF-1α levels, and the results are presented in Figure 4d, including statistical analysis.

      (14) There is no data for knockdown of MYD88, Dectin-1, and SYK as mentioned in the text lines 222-224. The authors should explain this discrepancy.

      We thank the reviewer for this important comment. MYD88, Dectin-1, and SYK are expressed at relatively low levels in HCT116 cells, and our preliminary qPCR analyses indicated that it would be technically challenging to achieve reliable and quantifiable knockdown of these targets. Nevertheless, previous studies have reported that Dectin-1 can be present on the surface of epithelial cells, suggesting that it may still contribute to fungal recognition even at low expression levels. Therefore, given the technical constraints of gene knockdown in this specific context, we reasoned that pharmacological inhibition would provide a more robust approach to suppress this pathway.

      (15) In line 227 in the results section it should be Figure S5c-e instead of Figure S5b-e. Figure S5b results do not match the results that are being explained here.

      We thank the reviewer for this comment. We have corrected the typos in the revised manuscript.

      (16) What do a,b,c in the bar graphs shown in Figure 5 a,b,i represent?

      We thank the reviewer for this comment. The letters indicate statistical groupings from multiple comparison tests. In the revised manuscript, we have added a clear explanation of this notation to the corresponding figure legends.

      (17) Was the experiment in Figure 5e done in triplicate? If not, it should be done in triplicate and quantified. The scale bar information is missing for IF images shown in Figure 5e. It should be added.

      We thank the reviewer for this comment. The experiments were independently repeated three times, and the quantification shown in Figure 5g represents the combined results from these biological replicates. The same scale bar was applied across all images in this panel. We have clarified this in the figure legend.

      (18) Lines 273-274 in the results section: Als3 and Hwp1 are known to be involved in the adhesion of C. albicans to epithelial cells, while Ece1 encodes the virulence factor candidalysin. References should be added.

      We thank the reviewer for this suggestion. We have added a reference in the revised manuscript to support this statement.

      (19) What do a and b in the bar graphs shown in Figures 6 f,h,r represent? Since these letters are confusing and are present in several figures, they should be either deleted or clearly explained in the figure legends or text.

      We thank the reviewer for this comment. The letters indicate statistical groupings from multiple comparison tests. In the revised manuscript, we have added a clear explanation of this notation to the corresponding figure legends to ease interpretation for the reader.

      (20) What do a,b, and c in the bar graphs shown in Figure S8 b represent?

      We thank the reviewer for this comment. The letters indicate statistical groupings from multiple comparison tests. In the revised manuscript, we have added a clear explanation of this notation to the corresponding figure legends to ease interpretation for the reader.

      (21) Scale bar should be added in Figure S9.

      We thank the reviewer for these helpful comments. We have addressed this point and made the necessary corrections in the revised manuscript.

      (22) What do a and b, in the bar graphs shown in Figure S11 represent?

      We thank the reviewer for this comment. The letters indicate statistical groupings from post hoc multiple comparison tests. In the revised manuscript, we have added a clear explanation of this notation to the corresponding figure legends to ease interpretation for the reader.

      (23) Were the organoids used in this paper characterized? If yes, how? Also, it should be mentioned in the appropriate section in the manuscript.

      The organoids were not characterized; they were cultured from patient samples according to our previously published protocol (He et al., Cell Stem Cell 2022).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this paper, the authors analyze connectome data from Drosophila and compare the physical wiring with functional connectivity estimated from calcium imaging data. They quantify structure-function relationships as a correlation of the two connectivity modalities. They report correlations roughly comparable to what has been described in the literature on sc/fc relationships in mammalian connectome data at the meso-scale. They then repeat their analysis, focusing on segregated versus unsegregated synapses. They derive separate connectomes using one or the other class of synapse. They show differential contributions to the sc/fc relationships by segregated versus unsegregated synapses.

      Strengths:

      There is nice synthesis of multimodal imaging data (Ca and EM data from flies and meso-scale data from human and marmoset).

      Thank you very much for your comments.

      Weaknesses:

      (1) The paper is written in an unusual way. The introduction intermingles results with background, making it hard to figure out what precisely is being tested.

      Thank you for pointing this out. We have revised the introduction to make it more concise.

      (2) There are also major methodological gaps. Though the mammalian connectomes are used as a point of reference, no descriptions of their origins or processing are included.

      The reanalysis of marmoset data is presented in Ext. Data Figure. However, as pointed out by other reviewers, the data was obtained in [10], and the processing is also described in [10]. Therefore, we have revised the caption and removed the Ethics Declaration.

      (3) A major weakness stems from the actual calculation of the sc/fc correlation. In general, SC is sparse. In the case of the EM connectomes, it is *exceptionally* sparse (most neural elements are not connected to one another). The authors calculated sc/fc coupling by correlating the off-diagonal elements of sc (the logarithm of its edge weights) and fc matrices with one another. The logarithmic transformation yields a value of infinity for all zero entries. The authors simply impute these elements with 0. This makes no sense and, depending on whether these zero elements are distributed systematically versus uniformly random, could either inflate or deflate the sc/fc correlations. Care must be taken here.

      Thank you for pointing this out. As you mentioned, the SC matrix becomes increasingly sparse as the number of ROIs increases (Ext. Data Fig.2-2b). In contrast, the FC matrix may contain nonzero values even when there are no direct connections between ROIs (indirect connections). We investigated this issue. To handle the same problem, Honey et al. (2009) [6] resampled the elements of the SC matrix in rank order using a Gaussian distribution and calculated the FC-SC correlation between this resampled SC and FC.

      Ext. Data Fig.2-2a shows a comparison between resampled SC (Honey et al.’s method) and log-scaled SC (our method). Up to 200 ROIs, the proportion of SC matrix elements that are zero is less than 10% (Ext. Data Fig.2-2b), so few log-scaled elements require zero replacement. In this situation, Gaussian resampling tends to increase the correlation coefficient (Ext. Data Fig.2-2a). On the other hand, with 10,000 ROIs, where sparsity is extremely high, the proportion of SC matrix elements that are zero exceeds 70%. In this situation, the zeros, which make up 70-80% of the elements, are randomly assigned values from the lower end of the Gaussian distribution, which lowers the correlation coefficient (Ext. Data Fig.2-2a, c, d). For these reasons, we believe that log-scaled SC is less biased than resampling with a Gaussian distribution, and we conclude that using log-scaled SC as is in this paper is reasonable. Log-scaled SC has also been used in previous studies [9, 68] and is considered a simple method for showing the relationship (correlation) between FC and SC. To show that we have considered this issue, Ext. Data Fig.2-2 has been added to the manuscript.
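
      The log-scaled approach discussed above can be sketched as follows. This is an illustration under stated assumptions rather than the pipeline used in the paper: the logarithm base (10), the use of a Pearson coefficient, the function names, and the toy 3-ROI matrices are all assumptions. Zero-weight SC entries, whose logarithm is undefined, are set to 0, and only off-diagonal elements are compared.

```python
import math

def log_scaled(weight):
    """Log-scale an SC edge weight; zero-weight edges are imputed with 0."""
    return math.log10(weight) if weight > 0 else 0.0

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fc_sc_correlation(sc, fc):
    """Correlate off-diagonal log-scaled SC weights with FC values."""
    n = len(sc)
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    sc_vals = [log_scaled(sc[i][j]) for i, j in pairs]
    fc_vals = [fc[i][j] for i, j in pairs]
    return pearson(sc_vals, fc_vals)

def zero_fraction(sc):
    """Fraction of off-diagonal SC entries that are zero."""
    n = len(sc)
    zeros = sum(1 for i in range(n) for j in range(n) if i != j and sc[i][j] == 0)
    return zeros / (n * (n - 1))

# Toy matrices (hypothetical synapse counts and FC values for 3 ROIs).
sc = [[0, 10, 0], [10, 0, 100], [0, 100, 0]]
fc = [[1.0, 0.2, 0.0], [0.2, 1.0, 0.6], [0.0, 0.6, 1.0]]

r = fc_sc_correlation(sc, fc)
print(round(r, 3), round(zero_fraction(sc), 3))
```

      Note that under this imputation an edge with weight 1 also maps to 0, so zero-weight and weight-1 edges become indistinguishable after log-scaling. Reporting the zero fraction alongside the correlation makes it easier to judge how much the imputed entries could influence the result.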

      (4) Further, in constructing the segregated versus unsegregated connectomes, they use absolute thresholds for collecting synapses. It is unclear, however, whether similar numbers of synapses were included in both matrices. If the number is different, that might explain the differential relationship with fc; one matrix has more non-zero entries (and as noted earlier, those zero entries are problematic).

      Author response image 1.

      a, Sparsity rate histogram of the SC matrix with cPPSSI (0-0.1) and the subsampled null SC matrices corresponding to Fig.4e. The red line indicates the sparsity rate of the SC matrix with cPPSSI (0-0.1). b, Sparsity rate histogram of the SC matrix with cPPSSI (0.9-1) and the subsampled null SC matrices corresponding to Fig.4f. c, Sparsity rate histogram of the SC matrix with reciprocal synapses (≤2 μm) and the subsampled null SC matrices corresponding to Fig.4i.

      Thank you for pointing this out. The number of synaptic connections in the SC matrix differs greatly between those extracted from cPPSSI (0-0.1) and cPPSSI (0.9-1) (Fig. 4e, f). However, when 99 null SC matrices were generated for each and compared with the cPPSSI-extracted matrices, the FC-SC correlation was significantly higher or lower. Because the sparsity rates of the original null SC matrices differed substantially from those of the SC matrices extracted by cPPSSI, we regenerated the null SC matrices in Fig. 4e and 4i. As shown in Author response image 1, we ensured that the sparsity rates of the extracted SCs (red lines) fall within the distribution of the null-generated matrices. This figure was added to Ext. Data Fig.4-5, and the main text was also revised. The sparsity rates are 0.52 for cPPSSI (0-0.1) and 0.123 for cPPSSI (0.9-1). Since both cases involve comparisons with null SC matrices that have closely matched sparsity rates, we believe comparison using log-scaled SC is appropriate.
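
      A sparsity check of this kind could be sketched as below. The sketch assumes that null SC matrices are built by randomly subsampling the same number of synapses as the extracted set, and that the sparsity rate is the fraction of zero off-diagonal entries; both are assumptions made for illustration, as are the function names and the randomly generated toy synapse list.

```python
import random

def build_sc(synapses, n_rois):
    """Count synapses between each (pre ROI, post ROI) pair."""
    sc = [[0] * n_rois for _ in range(n_rois)]
    for pre, post in synapses:
        sc[pre][post] += 1
    return sc

def sparsity_rate(sc):
    """Fraction of off-diagonal entries equal to zero (assumed definition)."""
    n = len(sc)
    zeros = sum(1 for i in range(n) for j in range(n) if i != j and sc[i][j] == 0)
    return zeros / (n * (n - 1))

def null_sc_matrices(all_synapses, n_selected, n_rois, n_null=99, seed=0):
    """Build null SC matrices from random subsamples of equal synapse count."""
    rng = random.Random(seed)
    return [
        build_sc(rng.sample(all_synapses, n_selected), n_rois)
        for _ in range(n_null)
    ]

# Toy data: 200 hypothetical synapses among 5 ROIs; the first 20 stand in
# for a set extracted by some criterion.
toy_rng = random.Random(1)
n_rois = 5
all_synapses = [(toy_rng.randrange(n_rois), toy_rng.randrange(n_rois)) for _ in range(200)]
selected = all_synapses[:20]

observed = sparsity_rate(build_sc(selected, n_rois))
nulls = null_sc_matrices(all_synapses, len(selected), n_rois)
null_rates = [sparsity_rate(m) for m in nulls]
in_range = min(null_rates) <= observed <= max(null_rates)
print(observed, min(null_rates), max(null_rates), in_range)
```

      Matching the synapse count does not by itself guarantee matched sparsity, which is why the observed rate is compared explicitly against the range of the null rates.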

      (5) There was also considerable text (in the results) describing the processing of the Ca data. In this section, the authors frequently refer to some pipelines as "better" or "worse" (more or less effective). But it is not clear what measures they adopted to assess the effectiveness of a pipeline.

      The detailed registration flow for the Ca data is described in “Preprocessing of D. melanogaster calcium imaging data” in the Materials and Methods section (Ext. Data Fig. 1-1a). We then investigated the optimal nuisance factor removal methods and smoothing size, using both correlation analysis (FC-SC correlation) and ROC curve analysis (FC-SC detection). Since signals are assumed to be transmitted between regions based on SC, we treated SC as the ground truth and considered a pipeline with higher FC-SC similarity and higher FC-SC detection performance to be better. We updated the Results section to include this point.
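
      The FC-SC detection measure can be illustrated with a small sketch that treats the presence of a structural connection as the ground-truth label and the FC value as the score, then computes the area under the ROC curve. The pairwise (Mann-Whitney) formulation of the AUC, the function name, and the toy values are assumptions for illustration and do not reproduce the analysis in the paper.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a connected pair outscores an unconnected one."""
    positives = [s for label, s in zip(labels, scores) if label]
    negatives = [s for label, s in zip(labels, scores) if not label]
    if not positives or not negatives:
        raise ValueError("need at least one connected and one unconnected pair")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half
    return wins / (len(positives) * len(negatives))

# Toy example: 1 = ROI pair structurally connected, 0 = not connected.
sc_connected = [1, 1, 1, 0, 0, 0]
fc_values = [0.8, 0.6, 0.3, 0.4, 0.2, 0.1]

auc = roc_auc(sc_connected, fc_values)
print(round(auc, 3))
```

      An AUC of 0.5 corresponds to chance-level detection, so comparing the AUC across preprocessing pipelines gives a threshold-free way to rank them alongside the FC-SC correlation.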

      Reviewer #2 (Public review):

      Summary:

      Okuno et al. investigate the structure-function relationship in the fruit fly Drosophila melanogaster. To do so, they combine published data from two recent synapse-level connectomes ("hemibrain" and "FlyWire") with a dataset comprising functional whole-brain calcium imaging and behavioural data. First, they investigate the applicability of fMRI pre-processing techniques on data from calcium imaging. They then cross-correlate this pre-processed functional data with structural data extracted from the connectomes, including a comparison to humans. The authors proceed to compare the two connectomes and find significant differences, which they attribute to differences in the accuracy of the synapse detections. Next, they present a novel algorithm to quantify whether neurons are segregated (pre- and postsynapses are spatially separate) or unsegregated (pre- and postsynapses are mixed). Using this approach, they find that unsegregated neurons may contribute more to function than segregated neurons. Applying a general linear model to the functional dataset suggests that activity in two brain areas (Wedge and AVLP) is suppressed during walking. The authors identify a GABAergic neuron in the connectome that could be responsible for this effect and suggest it may provide feedback to the fly's "compass" in the central complex.

      Strengths:

      The study tackles a relevant question in connectomics by exploring the relationship between structural and functional connectivity in the Drosophila brain. The authors apply a range of established and adapted analytical methods, including fMRI-style preprocessing and a novel synaptic segregation index. The effort to integrate multiple datasets and to compare across species reflects a broad and methodical approach.

      Thank you very much for your comments.

      Weaknesses:

      The manuscript would benefit from a clearer overarching narrative to unify the various analyses, which currently appear somewhat disjointed. While the technical methods are extensive, the writing is often convoluted and lacks crucial details, making it difficult to follow the logic and interpret key findings. Additionally, the conclusions are relatively incremental and lack a compelling conceptual advance, limiting the overall impact of the work.

      (1) The introduction currently contains a number of findings and conclusions that would be better placed in the results and discussion to clearly delineate past findings from new results and speculations.

      Thank you for pointing this out. We have revised the introduction to make it more concise.

      (2) The narrative would benefit greatly from some clear statements along the lines of "we wanted to find out X, therefore we did Y".

      Thank you for pointing this out. In many biology papers, the problem is clear, but as you say, this paper starts by comparing the very fine SC and FC of flies, which makes the problem unclear and the results sporadic. We have revised the structure of the introduction.

      (3) More concise terminology would be helpful. For example, the connectomes are currently referred to as either "hemibrain", "FlyEM", "whole-brain", or "FlyWire".

      Thank you for pointing this out. We revised the manuscript to separate "hemibrain" and "whole-brain" from "connectome." "hemibrain" and "whole-brain" retain their original meanings.

      (4) The abstract claims "a new, more robust method to quantify the degree of pre- and post-synaptic segregation". However, the study fails to provide evidence that this method is indeed more robust than existing methods.

      We apologize that this information was not included in the main figures or the Results section. It was presented in the Methods section and Ext. Data Fig. 4-1i, j. We have moved the related text from the Methods to the Results section.

      (5) The authors define unsegregated neurons as having mixed pre- and postsynapses in the same space. However, this ignores the neurons' topology: a neuron can exhibit a clearly defined dendrite with (mostly) postsynapses and a clearly defined axon with (mostly) presynapses, which then occupy the same space. This is different from genuinely unsegregated neurons with no distinct dendritic and axonal compartments, such as CT1.

      Thank you for pointing this out. Regarding this point, we think it is difficult to discuss the neuron’s topology in this paper. We defined PPSSI and demonstrated only that unsegregated neurons with mixed pre- and post-synapses are scattered throughout the brain (Ext. Data Fig. 4-2e). Further research is needed to determine the relationship with morphology in individual neurons.

      One possibility is that inhibitory, non-spiking unsegregated neurons, such as CT1 amacrine cell [24, 27, 28] or interneurons in Antennal Lobe [29], may be widely used throughout the brain (WAGN is also a candidate for this). Grimes et al. [34] mentioned “The retina is a beautiful example of a neural network that optimizes signal processing capacity while minimizing cellular cost.” To maintain the signal dynamic range, A17 amacrine cells must optimize the processing units and wiring costs. If one unit equaled one cell, an enormous number of cell bodies would be required, reducing the number of processing units per volume and increasing the energy cost during development. To optimize this, they proposed arranging units capable of parallel processing within a single cell, thereby maximizing the processing units and wiring costs per volume.

      Signal bursts might also occur in the central nervous system (CNS), in which case CNS neurons also require dynamic range adjustment. The concept of optimizing processing units per volume is highly compelling and is thought to apply not only to the retina but throughout the entire brain.

      (6) It is not entirely clear where the marmoset dataset originates from. Was it generated for this study? If not, why is there a note in the Ethics Declaration?

      The marmoset data were reported in [10] and were not generated for this study. We therefore removed the Ethics Declaration.

      (7) On the differences between hemibrain and FlyWire: What is the "18.8 million post-synapses" for FlyWire referring to? The (thresholded) FlyWire synapse table has 130M connections (=postsynapses). Subsetting that synapse cloud to the hemibrain volume still gives ~47M synapses. Further subsetting to only connections between proofread neurons inside the hemibrain volume gives 19.4M - perhaps the authors did something like that? Similarly, the hemibrain synapse table contains 64M postsynapses. Do the 21M "FlyEM" post-synapses refer to proofread neurons only? If the authors indeed used only (post-)synapses from proofread neurons, they need to make that explicit in results and methods, and account for differences in reconstruction status when making any comparisons. For example, the mushroom body in the hemibrain got a lot more attention than in FlyWire, which would explain the differences reported here. For that reason, connection weights are often expressed as, e.g., a fraction of the target's inputs instead of the total number of synapses when comparing connectivity across connectomic datasets. Furthermore, in Figure 3b, it looks like the FlyWire synapse cloud was not trimmed to the exact hemibrain boundaries: for example, the trimmed FlyWire synapse cloud seems to extend further into the optic lobes than the hemibrain volume does.

      Thank you for pointing this out. FlyEM connectome data version 1.2 was downloaded and used as described in Data Availability. This data is provided in the format defined by https://neuprint.janelia.org/public/neuprintuserguide.pdf, and we extracted neurons and synapses from it.

      The full segmentation contains 28M segmented bodies, of which 99,644 are Traced (proofread) neurons. In addition, there are 73M synapses (pre- and post-synapses counted individually), 87M records in synapseSets, and 128M records in synapseSet-to-synapse. When we extracted only the post-synapses between Traced neurons, the total was 21.4M (i.e., connections from Traced neurons to other body fragments, such as Orphans, were excluded).

      The FlyWire dataset (v783) was downloaded from the flywire codex and Zenodo. This dataset contained 139,255 proofread neurons and 54.5M (pair of pre- and post-) synapses, as described in Dorkenwald et al. [13], with 18.8M post-synapses in the regions corresponding to the hemibrain primary ROIs. We have updated the Results and Methods sections by taking into account your comment.

      In Fig. 3b, these images were created using a mask that extended the boundaries of the hemibrain primary ROIs, making the boundaries unclear. Therefore, we corrected the images in Fig. 3b by adjusting the mask so that the boundaries were properly aligned.
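
      The extraction step described above, together with the reviewer's suggestion to express connection weights as a fraction of the target's inputs, can be sketched as follows. This is a simplified illustration under stated assumptions: each synapse is a hypothetical `(pre_id, post_id)` pair and `proofread_ids` is a hypothetical set of proofread neuron IDs. The actual neuPrint and FlyWire table layouts differ.

```python
from collections import Counter


def count_connections(synapses, proofread_ids):
    """Count post-synapses per (pre, post) pair between proofread neurons only."""
    counts = Counter()
    for pre_id, post_id in synapses:
        # Drop synapses that touch unproofread fragments (e.g. orphans).
        if pre_id in proofread_ids and post_id in proofread_ids:
            counts[(pre_id, post_id)] += 1
    return counts


def input_fractions(counts):
    """Express each connection as a fraction of the target neuron's total input."""
    total_in = Counter()
    for (_, post_id), n in counts.items():
        total_in[post_id] += n
    return {pair: n / total_in[pair[1]] for pair, n in counts.items()}


# Toy example: neuron 9 is an unproofread fragment and is excluded.
synapses = [(1, 2), (1, 2), (3, 2), (1, 9), (9, 2)]
counts = count_connections(synapses, proofread_ids={1, 2, 3})
print(counts)
print(input_fractions(counts))
```

      Normalizing by the target's total input makes two datasets with different reconstruction completeness more directly comparable than raw synapse counts.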

      Reviewer #3 (Public review):

      Summary:

      In this manuscript, Okuno et al. re-analyze whole-brain imaging data collected in another paper (Brezovec et al., 2024) in the context of the two currently available Drosophila connectome datasets: the partial "FlyEM" (hemibrain) dataset (Scheffer et al., 2020) and the whole-brain "FlyWire" dataset (Dorkenwald et al., 2024). They apply existing fMRI signal processing algorithms to the fly imaging data and compute function-structure correlations across a variety of post-processing parameters (noise reduction methods, ROI size), demonstrating an inverse relationship between ROI size and FC-SC correlation. The authors go on to look at structural connectivity amongst more polarized or less polarized neurons, and suggest that stronger FC-SC correlations are driven by more polarized neurons.

      Strengths:

      (1) The result that larger mesoscale ROIs have a higher correlation with structural data is interesting. This has been previously discussed in Drosophila in Turner et al., 2021, but here it is quantified more extensively.

      (2) The quantification of neuron polarization (PPSSI) as applied to these structural data is a promising approach for quantifying differences in spatial synapse distribution.

      Thank you very much for your comments.

      Weaknesses:

      One should not score noise/nuisance removal methods solely by their impact on FC-SC correlation values, because we do not know a priori that direct structural connections correspond with strong functional correlations. In fact, work in C. elegans, where we have access to both a connectome and neuron-resolution functional data, suggests that this relationship is weak (Yemini et al., 2021; Randi et al., 2023). Similarly, I don't think it's appropriate to tune the confidence scores on the EM datasets using FC-SC correlations as an output metric.

      Thank you for pointing this out. We believe that the FC in C. elegans uses cell body dynamics, which is different from the synaptic population dynamics in a region of fly calcium imaging or fMRI data (BOLD [Blood Oxygenation Level Dependent] signal). The BOLD signal in a region is thought to correspond to the neurovascular coupling of synaptic population dynamics. Furthermore, compartmentalization of a neuron has been observed in C. elegans (Hendricks et al., 2012)*, showing different dynamics across neuron compartments. Thus, the dynamics of the cell body and the dynamics of the synaptic population in other regions are different in C. elegans. We speculate that there is some relationship between FC-SC between regions, because the FC-SC correlation in the fly brain reached r=0.87 with 20 ROIs (Fig. 2d). We believe that this result is different from the cell body dynamics in C. elegans.

      *Hendricks et al., “Compartmentalized calcium dynamics in a C. elegans interneuron encode head movement,” Nature 487, 99-103 (2012)

      Any discussion of FC-SC comparisons should include an analysis of excitatory/inhibitory neurotransmitters, which are available in the fly connectome dataset. However, here the authors do not perform any analyses with neurotransmitter information.

      A comparison between FC-SC and neurotransmitters is now described in the Results section. We investigated the ratios of neurotransmitter input (Ext. Data Fig. 3-2a) and output (Fig. 3f) in each region, and examined the relationship between these ratios and the FC-SC correlation for each neurotransmitter. This revealed significant correlations for acetylcholine (r=0.39, p=0.0013) and GABA (r=-0.25, p=0.046) (Fig. 3g). That is, the higher the percentage of excitatory connections, the higher the FC-SC correlation; conversely, the higher the percentage of inhibitory connections, the lower the FC-SC correlation.

      Comparisons between fly and human MRI data are also premature here. Firstly, the fly connectomes, which are derived from neuron-scale EM reconstructions, are a qualitatively different kind of data from human connectomes, which are derived from DSI imaging of large-scale tracts. Likewise, calcium data and fMRI data are very different functional data acquisition methods-the fact that similar processing steps can be used on time-series data does not make them surprisingly similar, and does not in my view, constitute evidence of "similar design concepts."

      Thank you for pointing this out. As you say, fiber bundles of DTI and EM connectome are completely different. Nevertheless, the fact remains that the FC-SC correlation is high in both the fly and human brains. As mentioned above, both regional signal from calcium imaging and BOLD signal from fMRI are based on synaptic population dynamics. It was estimated that 43% of the energy consumption in the gray matter is due to synaptic activity of neurons (Harris et al., 2012), and the BOLD signal fluctuates greatly due to this activity. Furthermore, synaptic activity is thought to be much faster than the activity of microglia and astrocytes, so the FC of fMRI is thought to mainly capture the regional correlation of synaptic activity. In other words, in both flies and humans, although the size is different, the pre-synaptic activity in one region and the pre-synaptic activity in another region via neural fibers are being compared in a common manner in the form of FC-SC.

      In addition, non-spiking unsegregated neurons also exist in mammals, such as the amacrine cells of the retina [34], and even pyramidal cells in the neocortex show local mixtures of pre- and post-synapses (Ext. Data Fig.1-2). If a functional unit is realized by a local compartment within a neuron, as proposed in [34], the fly will be a powerful model organism for investigating such units, and its functional “design concept” may also be relevant to mammals.

      Harris et al., “The Energetics of CNS White Matter,” J. Neurosci., 2012, 32 (1) 356-371

      The comparison of FlyEM/FlyWire connectomes concludes that differences are more likely a result of data processing than of inter-individual variability. If this is the case, the title should not claim that the manuscript covers individual variability.

      Thank you for pointing this out. Inter-individual variability is relevant to both SC and FC. Regarding SC, we think the difference in the number of synapses between the two individuals is due to the difference in detection power caused by differences in the resolution of the electron microscope. Regarding FC, as stated in the Results section, “Spatial smoothing is useful for absorbing inter-individual variability and conducting second-level group analysis.” Increasing the smoothing size improves the correlation and AUC between group-averaged FC and SC, indicating the presence of inter-individual variability in FC (Fig. 2b, Ext. Data Fig. 2-1b, especially when the number of ROIs is high). We added this text in the Introduction and Results sections to address your comment.

      The analysis of the wedge-AVLP neuron strikes me as highly speculative, given that the alignment precision between the connectome and the functional data is around 5 microns (Brezovec* et al, PNAS 2024).

      As you mentioned, functional analysis has limitations in spatial resolution. In particular, the resolution along the Z axis is 4 μm, which is 1,000 times lower than the resolution of the electron microscopy data. This makes it difficult to match synaptic activity precisely to a synapse in the structural data. Furthermore, spatial smoothing is applied to the functional images to absorb inter-individual variability, so group analyses can provide only blurred results. These are known limitations of the methods used in fMRI analysis. Despite these limitations, we applied GLM analysis to walking behavior and observed a clear region of inactivity. This region roughly corresponds to the synaptic cloud of a neuron named WAGN (Fig.5b and c). This neuron also connects to WPNb and ANs in the connectome data, suggesting that it may be related to walking behavior. This result is merely a screening reference; further biological experiments are needed to pursue this topic.

      Recommendations for the authors:

      Reviewing Editor Comments:

      We should emphasize that the reviewers encouraged revision and resubmission. If the reviewers' comments were to be addressed in full in a revision to strengthen the evidence, this would significantly increase the impact of the findings and the relevance of the work to the fly neuroscience community and to the connectomics field more broadly.

      Thank you very much for your comments.

      Major Issues:

      (1) Structural correlation and functional correlation measure very different aspects of network data, yet a simple correlation between the off-diagonal elements of the two is used. It would be expected that this would not be directly proportional, and it's not clear why this would be a sensible measure. The authors need a better solution for dealing with the zero entries in the SC matrix. Replacing the infinities with zeros and then running the linear regression to get an SC/FC relationship is not appropriate. Even with a better metric, given that both intuition and other studies have shown a weak correlation between FC and SC, using FC-SC correlation as a quality descriptor for other properties is not proper. Furthermore, the authors don't account for neurotransmitter identity in the structural data, which would have strong implications for the relationships between FC and SC.

      Thank you for pointing this out. To investigate this issue we compared the FC-SC correlation between the Gaussian resampled SC approach used in Honey et al. (2009) [6] and the log-scaled SC used in this study (Ext. Data Fig.2-2a). With a small number of ROIs, the sparsity rate is low (Ext. Data Fig.2-2b), resulting in less zero replacement. Therefore, log-scaled SC is likely to more accurately represent the FC-SC relationship. Furthermore, with a large number of ROIs, the sparsity rate exceeds 70%, and Gaussian resampled SC randomly assigns a large number of zero elements from the smaller end of the distribution. This tends to lower the correlation (Ext. Data Fig.2-2c, d), suggesting that log-scaled SC provides fairer results. Log-scaled SC has been used in previous studies [9, 68] and is considered a simple method for showing the relationship (correlation) between FC and SC. When zero replacement is undesirable, using connection weights (the proportion of connections originating from the target region among all connections) can yield results similar to log-scaled SC (data not shown). It may be possible to compare various methods, but this is outside the scope of this study and requires further research.
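
      To make the two SC transformations compared above concrete, here is a minimal sketch. It assumes that log-scaling replaces the infinite log of zero entries with 0, and that Gaussian resampling maps the ranked SC values onto sorted draws from a standard normal distribution with ties (such as the many zero entries) ordered at random. This is one plausible reading of the approach attributed to Honey et al. (2009) [6], not a reproduction of either study's code, and the function names are hypothetical.

```python
import math
import random


def log_scale(sc_values):
    """log10 of synapse counts; zero entries (log = -inf) are replaced with 0."""
    return [math.log10(v) if v > 0 else 0.0 for v in sc_values]


def gaussian_resample(sc_values, rng):
    """Map SC values onto sorted Gaussian draws while preserving rank order.

    Tied values (e.g. zero entries) receive a random order, so each zero is
    assigned an arbitrary value from the lower end of the distribution.
    """
    n = len(sc_values)
    draws = sorted(rng.gauss(0.0, 1.0) for _ in range(n))
    order = sorted(range(n), key=lambda i: (sc_values[i], rng.random()))
    resampled = [0.0] * n
    for rank, idx in enumerate(order):
        resampled[idx] = draws[rank]
    return resampled


def sparsity(sc_values):
    """Fraction of entries that are exactly zero."""
    return sum(v == 0 for v in sc_values) / len(sc_values)


# Toy vector of off-diagonal SC entries (values are made up).
sc = [0, 0, 0, 5, 50, 500]
rng = random.Random(0)
print(sparsity(sc))
print(log_scale(sc))
print(gaussian_resample(sc, rng))
```

      In this sketch, the zero entries all map to the same value under log-scaling but to different random values under Gaussian resampling, which is the source of the extra noise at high sparsity described above.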

      The C. elegans studies cited by Reviewer #3 showed a weak correlation between FC and SC. However, C. elegans neurons do not fire and exhibit different calcium fluctuations depending on the region (Hendricks et al., 2012). This suggests that the cell body and the various synaptic terminal regions have different FCs, which is consistent with the objective of our study (neuronal compartmentalization). If a functional unit is locally composed of multiple neurons and synapses, SC and FC from that region are expected to show a strong relationship. Larger regions would include multiple functional units, and a relationship between SC and FC would also be found, which is consistent with the results of our study. The C. elegans studies compared the FC of the cell body (one region) with the SC of the whole cell (not the same region), so the two measures are not directly comparable.

      (2) Synaptic segregation on neurons can be topologically present even if pre- and post-synaptic synapses are present in similar regions of space, as an axon branch and dendrite branch can overlap in space but remain distinct along the arbor. The authors emphasize a region-based definition that does not reflect cellular anatomy. Moreover, the authors do not make an argument for their claim of better robustness of their new synaptic segregation measures.

      Author response image 2.

      Distance calculation for DBSCAN. a, Example synapse pair (black dots) used for the distance calculation. The red line shows the straight-line distance, and the green line shows the morphology-based distance. DBSCAN places the two synapses in the same cluster based on the straight-line distance, but in different clusters based on the morphology-based distance.

      Thank you for pointing this out. We changed from using DBSCAN based on the straight-line distance between synapses to DBSCAN based on the morphology-based distance via the branch nearest to the synapse (Author response image 2a). This resulted in a synaptic segregation measure that incorporates cellular anatomy. We updated all related figures, such as Figure.4, Ext. Data Figure.4-1, 4-2, 4-3, 4-4, Figure.5h. Also, we updated related text in the Results and Methods sections.
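
      The effect of switching the distance measure can be illustrated with a small sketch. It is not the authors' implementation: it assumes that pairwise distances between synapses have already been computed (straight-line or along the arbor), and it runs a minimal DBSCAN on that precomputed matrix. The `eps` and `min_samples` values and the toy distances are hypothetical.

```python
def dbscan(dist, eps, min_samples):
    """Minimal DBSCAN on a precomputed distance matrix; label -1 means noise."""
    n = len(dist)
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbors = [j for j in range(n) if dist[i][j] <= eps]
        if len(neighbors) < min_samples:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = [k for k in range(n) if dist[j][k] <= eps]
            if len(reachable) >= min_samples:
                queue.extend(reachable)  # j is a core point: keep expanding
    return labels


# Four synapses: 0 and 1 on one branch, 2 and 3 on a parallel branch.
# The branches are close in space but far apart along the arbor (units: um).
straight_line = [
    [0.0, 1.0, 1.0, 1.4],
    [1.0, 0.0, 1.4, 1.0],
    [1.0, 1.4, 0.0, 1.0],
    [1.4, 1.0, 1.0, 0.0],
]
along_arbor = [
    [0.0, 1.0, 20.0, 21.0],
    [1.0, 0.0, 21.0, 22.0],
    [20.0, 21.0, 0.0, 1.0],
    [21.0, 22.0, 1.0, 0.0],
]
print(dbscan(straight_line, eps=2.0, min_samples=2))
print(dbscan(along_arbor, eps=2.0, min_samples=2))
```

      With a library implementation, the same idea applies: scikit-learn's `DBSCAN` accepts `metric="precomputed"`, so a morphology-based distance matrix can be passed in directly.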

      (3) Reviewers found the overall structure of the paper is difficult to follow, with sections appearing disjoint and the aims of different sections not well described. This extended to the paper organization as well, with the introduction not clearly setting up the questions and being distinct from the results. The manuscript would benefit from a clearer overarching narrative to unify the various analyses.

      Thank you for pointing this out. We have revised the introduction to make it more concise.

      (4) Similarly, there are several descriptions of data and analysis that are unclear or lacking, including the source of the marmoset data and how the FlyWire synapse was subsampled.

      As pointed out by other reviewers, the marmoset data was obtained in [10], and the processing is also described in [10]. Therefore, we have revised the caption and removed the Ethics Declaration.

      We have updated the Results and Methods sections regarding the extraction of "traced" neurons and synapses in FlyEM connectome data, and the extraction of post-synapses in hemibrain primary ROIs in FlyWire connectome data.

      (5) Comparisons between FlyWire and Hemibrain have shown many similarities and some clear examples of inter-individual variability. There was concern that technical decisions with handling FlyWire synapse sampling were responsible for some of the differences observed between the datasets.

      In response to Reviewer #2's question, we answered that both FlyEM and FlyWire use proofread neurons and their connecting synapses. We also updated Fig. 3b and the Results and Methods sections.

      Reviewer #1 (Recommendations for the authors):

      The paper is written in an unusual way. It would be helpful if the introduction read more like a standard introduction. Describe the relevant background that the reader needs to understand the results that come later. Frame the experiments in terms of a question or hypothesis. Results should be relegated to the results section (or, if you like, a final paragraph that summarizes the findings). They should not be intermingled throughout the introduction.

      Thank you for pointing this out. We have revised the introduction to make it more concise.

      The authors must be more attentive in terms of how they construct the segregated/unsegregated connectomes. I suggest exploring various thresholds/bins, but also considering proportionality thresholds that match the number of synapses.

      Thank you for pointing this out. As pointed out by other reviewers, we changed from using DBSCAN based on the straight-line distance between synapses to DBSCAN based on the morphology-based distance via the branch nearest to the synapse (Author response image 2a). This resulted in a synaptic segregation measure that incorporates cellular anatomy.

      We also considered the sparsity rates of the SC matrices. Since the sparsity rates of the null SC matrices differed substantially from those of the SC matrices extracted by cPPSSI, we regenerated the null SC matrices shown in Fig. 4e and 4i. As shown in Author response image 1, we ensured that the extracted SCs fit within the null-generated matrices. This figure was added to Ext. Data Fig.4-5, and the main text was also revised.

      The authors need a better solution for dealing with the zero entries in the sc matrix. Replacing the infinities with zeros and then running the linear regression to get an sc/fc relationship is not appropriate.

      Thank you for pointing this out. To investigate this issue, as pointed out by other reviewers, we compared the FC-SC correlation between the Gaussian resampled SC approach used in Honey et al. (2009) [6] and the log-scaled SC used in this study (Ext. Data Fig.2-2a). With a small number of ROIs, the sparsity rate was low (Ext. Data Fig.2-2b), resulting in less zero replacement. Therefore, log-scaled SC is likely to more accurately represent the relationship. Furthermore, with a large number of ROIs, the sparsity rate exceeds 70%, and resampled SC randomly assigns a large number of zero elements from the smaller end of the distribution. This tends to lower the correlation (Ext. Data Fig.2-2c, d), suggesting that log-scaled SC provides fairer results. Using connection weights (the proportion of connections originating from the target region among all connections) can yield results similar to log-scaled SC (data not shown), because this matrix can also be very sparse. It may be possible to compare various methods, but this is outside the scope of this study and requires further research.

      It would be useful to include a description of where the human/marmoset datasets came from. It would be useful to describe the processing of those datasets and whether they're comparable to how the fly data was processed.

      As pointed out by other reviewers, the marmoset data was obtained in [10], and the processing is also described in [10]. Therefore, we have revised the caption and removed the Ethics Declaration.

      The pre-processing of fly calcium imaging data is described in the Methods section. Unfortunately, this processing method is not comparable to that used in humans/marmosets as it was highly customized.

      The authors report sc/fc correlations for the human/marmoset datasets based on single papers. However, in the human case, especially, the strength of sc/fc correlations is highly variable. Not just based on number/size of parcels, but based on amount of data, processing pipeline, single-subject versus group averaged (incidentally, single-subject sc/fc is ‘much’ lower than group-averaged, which has big implications for this study, where the fly datasets are, in essence, N=1 studies).

      Yes, there are numerous FC-SC correlation studies. We consider Honey et al. (2009) [6] to be a highly representative study. It reported r = 0.39 to 0.48 for individual participants with 998 ROIs and r = 0.36 for the group average, which increased to r = 0.53 when absent or inconsistent structural connections were excluded. Thus, single-subject correlations may not be much lower than group-averaged ones. Since the SC for the fly is an N=1 dataset, the FC-SC correlation cannot be calculated within the same individual. We think further research will be necessary.

      Reviewer #2 (Recommendations for the authors):

      Abstract:

      Please introduce the term "ROI"

      Thank you for pointing this out. We have revised the Abstract.

      Introduction:

      (1) On a general note: the introduction reads like an extended abstract (i.e., a mix of results and discussion).

      Thank you for pointing this out. We have revised the introduction to make it more concise.

      (2) Line 43: Does this mean FC-SC correlation is higher in flies but not significantly so? Please clarify.

      We performed a Mann-Whitney U test, and the difference was not significant (p = 0.2667).
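
      For readers unfamiliar with the test, the following sketch shows how a two-sided exact Mann-Whitney U test can be computed for small samples by enumerating all relabelings of the pooled data. The sample values are made up and are not the FC-SC correlations from the manuscript. In practice, a library routine such as `scipy.stats.mannwhitneyu` would normally be used instead.

```python
from itertools import combinations


def u_statistic(x, y):
    """Mann-Whitney U for sample x: each pair with x > y counts 1, ties 0.5."""
    return sum((a > b) + 0.5 * (a == b) for a in x for b in y)


def exact_p_value(x, y):
    """Two-sided exact p-value by enumerating all group relabelings.

    Only practical for small samples, since the number of relabelings grows
    combinatorially with the sample sizes.
    """
    pooled = list(x) + list(y)
    n, n1 = len(pooled), len(x)
    mean_u = n1 * (n - n1) / 2
    observed = abs(u_statistic(x, y) - mean_u)
    hits = total = 0
    for idx in combinations(range(n), n1):
        chosen = set(idx)
        group_x = [pooled[i] for i in idx]
        group_y = [pooled[i] for i in range(n) if i not in chosen]
        total += 1
        # Count relabelings at least as extreme as the observed split.
        if abs(u_statistic(group_x, group_y) - mean_u) >= observed - 1e-12:
            hits += 1
    return hits / total


# Hypothetical correlation values for two groups of datasets.
group_a = [0.87, 0.80, 0.75]
group_b = [0.70, 0.78, 0.60, 0.65]
print(u_statistic(group_a, group_b))
print(exact_p_value(group_a, group_b))
```

      With samples this small, only a handful of distinct p-values are attainable, which is worth keeping in mind when interpreting a non-significant result.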

      (3) Line 51: The "confidence" score does not indicate the degree of synaptic detection.

      The NeuPrint user guide (https://neuprint.janelia.org/public/neuprintuserguide.pdf) states: “confidence - The certainty that an annotated synapse is correct and valid.” Since “degree of synaptic detection” may be difficult to understand, we changed it to “certainty of an annotated synapse.”

      (4) Line 59-61: This statement needs refining: post-synapses do not "receive" neurotransmitters, action potentials aren't conducted along nerve fibres.

      We changed “receive” to “sense.” About “action potentials,” we changed “conduct an action potential” to “graded potentials”, and removed “along nerve fibers.”

      (5) Line 61: calcium activity as detected via GCaMP correlates with (electric) neuronal activity - please cite relevant GCaMP literature here.

      We added F. Helmchen and J. Waters, "Ca2+ imaging in the mammalian brain in vivo," Eur J Pharmacol., vol. 447, pp. 119-129, 2002.

      (6) Line 76: "interconnected" is rather vague; just say "many Drosophila neurons are reciprocally connected".

      Thank you for pointing this out. Lin et al. (2024) performed a motif analysis and found many reciprocal, three-node, and rich-club connections. However, the Introduction has been updated and this sentence was removed.

      (7) Line 77: comparing unsegregated vs reciprocal synapses is overly simplistic; these are separate features of the same object - i.e., a synapse can be reciprocal and at the same time be segregated in the presynaptic neuron but unsegregated in the postsynaptic neuron.

      Thank you for pointing this out. As you say, the relationship is complicated. In this paper, we are concerned with the degree of segregation of pre- and post-synapses, and we are looking at the segregation within a neuron. In this case, nearby reciprocal synapses (<=2 μm) are included in unsegregated synapses. We have made a correction to the sentence.

      (8) Line 79: I don't understand how we get from unsegregated synapses to local activity.

      Retinal amacrine cells have extensive unsegregated synapses, which provide local feedback inhibition of burst inputs [34]. We changed the text around these descriptions.

      (9) Line 80: What does "more essential function" mean?

      We removed this sentence.

      (10) Line 85: "as shown earlier": Is this based on results in this study or prior work? See also the general above note on mixing results/discussion into the introduction.

      Thank you for pointing this out. We have revised the introduction to make it more concise.

      (11) Line 85-87: I don't understand how the applicability of certain fMRI analysis methods in turn means that functional activity is locally compartmentalized. Did you mean to say something along the lines of "we applied common fMRI methods which showed functional activity is locally compartmentalized"?

      These sentences discuss the commonality between the fMRI (BOLD) signal and the calcium signal, both of which represent presynaptic population dynamics within a local region (voxel). Furthermore, unsegregated synapses are widespread throughout the fly brain (Ext. Data Fig.4-2) and can also be observed in human pyramidal cells (Ext. Data Fig.1-2). Unsegregated synapses suggest local compartment activity [33, 34, 39, 40] and contribute more to functional activity (Fig.4b). Therefore, the similar trend in FC-SC correlation (Fig.2d) between humans and flies suggests that both species exhibit localized compartmental activity via unsegregated synapses throughout the entire brain.

      Because these sentences contain many conclusions, they have been moved from the Introduction to the Discussion section.

      (12) Line 87: Please provide a reference for "common among various species".

      Thank you for pointing this out. Because these sentences contain many conclusions, they have been moved from the Introduction to the Discussion section.

      Results:

      (1) Line 91-92:

      (a) Please explain where the calcium data came from, how it was generated, etc.

      We added the data source and a reference (Brezovec et al. [14]).

      (b) Please clarify: what registration method?

      This is not simple. Please see the Methods section and Ext. Data Fig.1-1. This is also indicated in the text.

      (c) "calcium image" → "calcium image data"?

      We changed “calcium image” to “calcium imaging data”.

      (d) What is the "FDA template"?

      This is a brain template created by Brezovec et al. [14]. JRC2018 is a well-known brain template, but it was created by immunostaining postmortem brains and did not fit well with calcium imaging data from living flies. Therefore, we used the FDA template.

      (2) Line 93: Please introduce the term "ROI".

      We added “(Region of Interest)” in Line 38.

      (3) Line 94: Ito et al., Neuron (2014) "A systematic nomenclature for the insect brain" is a better reference for Drosophila neuropils; for the hemibrain, the ROIs were generated to match that original atlas

      Thank you for pointing this out. We added a reference.

      (4) Line 95/96: It is unclear what was used as the basis for the k-means/distance-based clustering

      This was because we wanted to investigate whether nuisance factor removal methods are robust, even for such diverse types of ROI. We added this point to the text.

      (5) Line 120ff: I'm not sure how the total number of ROIs is relevant for comparing flies and humans, given (a) the huge difference in brain size and (b) the difference in resolution of the functional data.

      Indeed, the fly brain and the human neocortex are completely different. We are investigating whether there are commonalities between them using a metric called FC-SC correlation. As described in our answer for (11), both the fMRI (BOLD signal) and calcium signal represent presynaptic population dynamics within a local region (voxel). FC represents the synchronization of synaptic activity between regions, and SC represents the structural connectivity of neurons. Both flies and humans showed high SC-FC correlation and showed similar trends (Fig. 2d), so we believe it would be interesting to investigate this phenomenon.

      (6) Line 123: "by contrast" is misleading here since, as you say, there isn't really a difference.

      We changed “by contrast” to “and.”

      (7) Line 141: I'm somewhat worried that the differences between FlyWire and hemibrain synapse counts are due to the issues mentioned above.

      Thank you for the comment, but we are not sure what “the issues mentioned above” refers to.

      (8) Line 148: There is no evidence that any differences in synapse are due to the resolution or anisotropy (as suggested in the introduction).

      We apologize that we do not have direct evidence for this. We changed the sentence to: “This may be caused by differences in detection accuracy resulting from the resolution of EM scanning, but not to inter-individual variability.”

      (9) Line 155: References "39,45" have no brackets.

      These are not reference numbers, but Brodmann areas 39 and 45.

      (10) Line 155-157: I don't think we can infer the composition of brain areas in humans based on a tenuous correlation in flies; this is highly speculative and really should be in the discussion.

      In humans, there are areas with strong and weak FC-SC correlations [8], which may be due to the E-I (Excitatory-Inhibitory) balance of connections. We investigated this possibility by comparing the correlation between neurotransmitters and FC-SC correlations in the fly brain. We slightly changed this sentence.

      (11) Line 159: I find the first 2-3 sentences in this paragraph confusing. Are you saying that you did all these things in the prior results sections, or that you wanted to look at X and therefore you did Y? Maybe there is an issue with the tense here?

      We changed the sentences around this description.

      (12) Line 161: "whole-brain" = FlyWire?

      We changed “whole-brain” to “FlyWire”.

      (13) Line 163: Please explain the "PPSSI" acronym.

      This is now explained on Line 75.

      (14) Line 165: The description of how the cPPSSI was calculated is hard to follow. For example, what's the "fraction of synapse number".

      We changed our sentences around this description to be clearer. The cPPSSI is the degree of segregation within a cluster and is also assigned to each synapse. The PPSSI is then the average of the cPPSSI values of all synapses in a neuron.
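
The relationship described above, where each synapse carries a cPPSSI value and a neuron's PPSSI is the average over its synapses, can be sketched as follows. This is only an illustration of the averaging step: the function name, the neuron IDs, and the cPPSSI values are hypothetical, and the computation of cPPSSI itself (defined in the manuscript's Methods) is not reproduced here.

```python
from collections import defaultdict


def neuron_ppssi(synapse_cppssi):
    """Average per-synapse cPPSSI values into one PPSSI per neuron.

    synapse_cppssi: iterable of (neuron_id, cppssi) pairs, one pair per synapse.
    Returns a dict mapping neuron_id to the mean cPPSSI of its synapses.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for neuron_id, value in synapse_cppssi:
        totals[neuron_id] += value
        counts[neuron_id] += 1
    return {neuron_id: totals[neuron_id] / counts[neuron_id] for neuron_id in totals}


# Hypothetical per-synapse values, for illustration only.
synapses = [("n1", 1.0), ("n1", 0.5), ("n2", 0.0), ("n2", 0.2), ("n2", 0.1)]
ppssi = neuron_ppssi(synapses)
```

Because the average is taken per neuron, a neuron with a mix of segregated and unsegregated synapses receives an intermediate PPSSI, which is why the synapse-level cPPSSI and the neuron-level PPSSI can differ.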

      (15) Line 166: Is there a difference between "cPPSSI" and "PPSSI"?

      Yes, there is. Please see our answer for (14).

      (16) Line 167: "The result showed a histogram resembling a normal distribution" → I suggest running a normality test.

      Thank you for pointing this out. We ran a Lilliefors test, and the result was p=0.001 (significantly different from a normal distribution). Because there are numerous values with PPSSI=1, the distribution cannot be considered normal. We therefore changed this description.
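
For readers unfamiliar with the Lilliefors test, the idea can be sketched as a Kolmogorov-Smirnov distance against a normal distribution whose mean and standard deviation are estimated from the sample. The sketch below is an assumption-laden illustration, not the authors' procedure: it approximates the null distribution by Monte Carlo simulation rather than published tables, and the sample is synthetic (an evenly spread block plus a pile-up of values at 1), chosen only to mimic the situation described above.

```python
import math
import random


def ks_normal_statistic(sample):
    """KS distance between the empirical CDF and a normal CDF fitted to the sample."""
    n = len(sample)
    mu = sum(sample) / n
    sd = math.sqrt(sum((x - mu) ** 2 for x in sample) / (n - 1))
    d = 0.0
    for i, x in enumerate(sorted(sample)):
        cdf = 0.5 * (1.0 + math.erf((x - mu) / (sd * math.sqrt(2.0))))
        # Compare the fitted CDF with the ECDF just after and just before x.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d


def lilliefors_pvalue(sample, n_sim=2000, seed=0):
    """Monte Carlo p-value: fraction of simulated normal samples with a larger statistic."""
    rng = random.Random(seed)
    observed = ks_normal_statistic(sample)
    n = len(sample)
    exceed = 0
    for _ in range(n_sim):
        simulated = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if ks_normal_statistic(simulated) >= observed:
            exceed += 1
    return (exceed + 1) / (n_sim + 1)


# Synthetic sample: 60 evenly spread values plus 25 values exactly equal to 1.
values = [0.3 + 0.01 * i for i in range(60)] + [1.0] * 25
p_value = lilliefors_pvalue(values)
```

A large block of identical values at the upper bound produces a jump in the empirical CDF that a fitted normal curve cannot follow, so the test rejects normality for such a sample.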

      (17) Line 173: I am somewhat worried about a selection bias in your correlation of segregated vs unsegregated synapses. First, it seems like only a small fraction of neurons are in the 0-0.1 and 0.9-1 PPSSI range. I would suggest running a proper correlation between PPSSI and FC-SC correlation instead of looking at just the two extremes. Second, your examples for segregated neurons (APL + CT1) are large neurons that densely innervate spatially close and functionally very similar neuropils. If the sample of unsegregated neurons consists mainly of these large interneurons, I'm not at all surprised that they contributed strongly to FC-SC correlation.

      Thank you for pointing this out. For this work we investigated synapses (not neurons), extracting those with cPPSSI of 0-0.1 and 0.9-1, and performed a rank test against the FC-SC correlation of randomly sub-sampled synapses. We aimed to demonstrate that unsegregated synapses, in particular, strongly contribute to FC-SC correlation, and we hope to investigate overall trends in a future study.

      (18) Line 185: I don't think the function of reciprocal synapses is "considered to be clear". There are examples of feedback inhibition through reciprocal synapses, in particular in the visual system, but that does not mean that this is true across the board.

      We changed “considered to be clear” to “considered to be clearer than unsegregated synapses.” Of course, the function of reciprocal synapses is unknown for the whole brain, but we think it is better studied than that of unsegregated synapses.

      (19) Line 188 / Figure 4h: that figure panel does not appear to show transmitter pairs.

      Figure 4h (FlyWire) showed transmitter pairs. Ext. Data Fig.4-1g did not, because FlyEM does not have transmitter information.

      (20) Line 192: Please clarify "functionally common".

      We changed our sentences to clarify this.

      (21) Line 199: "ventral nerve code" → "ventral nerve cord".

      We fixed this typo.

      (22) Line 201: I don't think you can use "conversely" here.

      We changed “Conversely” to “Moreover.”

      (23) Line 201: How certain are you that the WAGN neuron is the only candidate? Also, it would be nice to provide the neuron IDs so that people can identify them in the connectome.

      Thank you for pointing this out. We added Root ID: 720575940644632087 in the text. Actually, we found several GABA neuron candidates, such as 720575940637611365, 720575940644632087, 720575940613552947, 720575940640333109 and 720575940612264817. We investigated whether ER1(L) was present in these downstream connections and found that 720575940644632087 had the strongest connection with the largest number of synapses, so we adopted this.

      (24) Line 207: When you say "the left WAGN was strongly connected", are those connections not also present for the right WAGN?

      There is a right WAGN (Root ID: 720575940624377224), but it does not have strong interconnections with WPNb tier 2/3 (left) neurons. For the right WAGN, there are few inputs from WPNb tier 2/3 (left). We added “(left)” in the text.

      (25) Line 212: I don't think you can use "however" here.

      We removed “however.”

      (26) Line 214: "well unsegregated" → "very unsegregated"?

      This sentence was removed, because we recalculated Fig. 5h.

      Ethics Declaration:

      It seems the marmoset data were reported on in [10], so why is there a reference to the generation of the dataset?

      Yes, marmoset data were reported in [10], so we removed the Ethics Declaration.

      Reviewer #3 (Recommendations for the authors):

      (1) In my opinion, the title and framing of this manuscript dramatically overstate the results presented here. Also, the results presented in the different figures in this manuscript seem disjointed and are not very related to each other.

      Thank you for pointing this out. We have rewritten our manuscript slightly to address this. Inter-individual variability is relevant to both SC and FC. Regarding SC, we think the difference in the number of synapses between the two individuals is due to the difference in detection power caused by differences in the resolution of the electron microscope. Regarding FC, as stated in the Results section, “Spatial smoothing is useful for absorbing inter-individual variability and conducting second-level group analysis.” Increasing the smoothing size improves the correlation and AUC between group-averaged FC and SC, indicating the presence of inter-individual variability in FC (Fig. 2b, Ext. Data Fig. 2-1b, especially when the number of ROIs is high). We added this text in the Introduction and Results sections.

      (2) There are multiple ways to compute structural correlation matrices; the methods the authors implemented should be discussed in greater detail in the manuscript.

      Thank you for pointing this out. To investigate this issue, as pointed out by other reviewers, we compared the FC-SC correlation between the Gaussian resampled SC approach, used in Honey et al. (2009) [6], and the log-scaled SC approach, used in this study (Ext. Data Fig.2-2a). With a small number of ROIs, the sparsity rate was low (Ext. Data Fig.2-2b), resulting in fewer zero replacements. Therefore, log-scaled SC is likely to more accurately represent the relationship in our study. Furthermore, with a large number of ROIs, the sparsity rate exceeds 70%, and resampled SC randomly assigns a large number of zero elements from the smaller end of the Gaussian distribution. This tends to lower the correlation (Ext. Data Fig.2-2c, d), suggesting that log-scaled SC provides fairer results. Using connection weights (the proportion of connections originating from the target region among all connections) can yield results similar to log-scaled SC (data not shown), because this matrix can also be very sparse. The log-scaled SC approach has been used in previous studies [9, 68] and is considered a simple method for showing the relationship (correlation) between FC and SC. It may be possible to compare various methods in depth, but this is outside the scope of this study and requires further research.
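
The effect of log-scaling SC before correlating it with FC can be illustrated with a minimal sketch. Everything here is assumed for illustration: the transform log10(1 + count) is one common choice and may differ from the exact scaling used in the study, and the synapse counts and FC values are made up.

```python
import math


def log_scale(counts):
    """Log-scale synapse counts; log10(1 + c) keeps zero-count entries at zero."""
    return [math.log10(1.0 + c) for c in counts]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical ROI-pair data: heavy-tailed synapse counts and matching FC values.
sc_counts = [0, 0, 3, 12, 150, 2400]
fc_values = [0.02, 0.05, 0.10, 0.22, 0.41, 0.63]

r_raw = pearson(sc_counts, fc_values)             # dominated by the largest count
r_log = pearson(log_scale(sc_counts), fc_values)  # compresses the heavy tail
```

In this toy example the raw counts are dominated by a single very strong connection, whereas the log-scaled values spread the pairs more evenly, which is one reason a log transform is often preferred when SC spans several orders of magnitude.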

      (3) The use of the FC-SC detection score defined by the authors should be discussed and justified more extensively in the text.

      Thank you for pointing this out. This has already been discussed in [10]. We defined our own “FC-SC detection score,” but we consider the overall approach to be well established in the literature. For example, Stafford et al. (2014) carried out FC-SC detection for 168 mouse cortical regions, and obtained 78.26% sensitivity and 81.69% specificity for the top 1% of SC. Hori et al. (2020) also investigated FC-SC detection for 55 cortical regions of the marmoset brain left hemisphere, achieving an AUC of 0.72. We think FC-SC detection is an index that evaluates the relationship between FC and SC from a different angle than FC-SC correlation and is worthwhile.
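
The detection framing mentioned above (sensitivity, specificity, and AUC when FC is used to detect structurally connected pairs) can be sketched generically as follows. This is not the authors' FC-SC detection score, whose definition is given in [10]; the function names, the threshold, and the data are hypothetical.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Sensitivity and specificity when pairs with score >= threshold are called connected."""
    tp = sum(1 for s, l in zip(scores, labels) if l and s >= threshold)
    fn = sum(1 for s, l in zip(scores, labels) if l and s < threshold)
    tn = sum(1 for s, l in zip(scores, labels) if not l and s < threshold)
    fp = sum(1 for s, l in zip(scores, labels) if not l and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(scores, labels):
    """AUC as the probability that a connected pair outscores an unconnected one (ties count half)."""
    pos = [s for s, l in zip(scores, labels) if l]
    neg = [s for s, l in zip(scores, labels) if not l]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical FC values and ground-truth structural connections for six ROI pairs.
fc = [0.9, 0.8, 0.35, 0.6, 0.3, 0.1]
connected = [True, True, True, False, False, False]

sens, spec = sensitivity_specificity(fc, connected, threshold=0.5)
area = auc(fc, connected)
```

Unlike a correlation coefficient, these metrics only ask whether FC ranks connected pairs above unconnected ones, which is why detection and correlation can give complementary views of the FC-SC relationship.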

      Hori et al., (2020). Comparison of resting-state functional connectivity in marmosets with tracer-based cellular connectivity. NeuroImage, 204, 116241.

      Stafford et al., (2014). Large-scale topology and the default mode network in the mouse connectome. Proc. Natl. Acad. Sci. U.S.A., 111(52), 18745-18750.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) They start by incubating LFA-1 with iRBCs and show by flow analysis that a substantial population of these iRBCs binds to the LFA-1 (Figure 1C). They do conduct the control with uninfected RBCs, but put this in the supplementary material. As this is a critical control, I think that it should be moved to Figure 1C as it is essential to allow interpretation of the iRBC data. The authors also do not state which strain of P. falciparum they used (line 144). This is critical information as different strains have different variant surface antigens and should be included. With these changes, this data seems convincing.

      We thank the reviewer for this important suggestion. We agree that the uninfected RBC (uRBC) control is critical for interpreting the specificity of LFA-1 αI-Fc binding. In the revised manuscript, we have ensured that these control data are clearly presented and appropriately referenced in the main text; however, we have retained them in the Supplementary Information (Supplementary Figure S1) to maintain clarity and avoid overcrowding Figure 1, while still ensuring their visibility and accessibility to the reader. Importantly, these data demonstrate negligible binding of LFA-1 αI-Fc to uRBCs compared to iRBCs, supporting specificity. We have explicitly stated the parasite strain used (Plasmodium falciparum 3D7) in the Methods section (line 475).

      (2) They next incubated LFA-1 with the iRBCs, cross-linked and conducted a pulldown, identifying GP130 as a binding partner. Using cross-linkers is a dangerous strategy as it risks non-specific cross-linking. Did they try without cross-linking and find an interaction?

      We agree that cross-linking can introduce potential artefacts. To mitigate this, we included hIgG control pulldown experiments performed under identical conditions. Proteins identified in the control eluate were excluded as background (summarized in Supplementary Table S1). Importantly, PfGBP-130 was the only protein specifically enriched in the LFA-1 αI-Fc pulldown across all three biological replicates (Fig. 2A, Venn Diagram). While cross-linking was used to stabilize transient interactions, consistent enrichment of PfGBP-130 across the three biological replicates precludes any concerns of non-specificity.

      (3) They raised antibodies to PfGBP and showed IFA, which reveals that these antibodies stain iRBCs (Figure 2Ciii). This experiment lacks a critical control of uninfected RBCs, which needs to be included to show that the staining is specific. Without this, it is not possible to conclude that there is iRBC-specific staining with PfGBP.

      The question pertains to Fig. 2Biii. The IFA images include both infected and neighboring uninfected erythrocytes within the same field. No PfGBP-130 staining is observed in uninfected cells. PfGARP staining, performed specifically to verify parasite-infected cells and surface localisation, shows complete concordance with PfGBP-130 staining. This shows that the antibodies raised specifically recognise only infected RBCs.

      (4) They then conduct a pulldown using LFA-Fc, which does show GP130 only in the presence of the LFA-Fc, but not when empty beads are used. This is convincing. BLI measurements are also used to study this interaction (Figure 2Ci). The BLI data is presented in such a way that any association phase is obscured by the y-axis, which makes it impossible to know whether there is binding here. I think that the data needs to be shown with some baseline before the addition of the ligand so that the association can be seen. The data is also a bit messy with a downward drift and the curves showing different shapes, for example, with the 1.0 µM curve seeming to have a different association rate. Also, is this n=1? I think that this data needs to be repeated and replicated. This is the only data which shows a direct interaction between LFA1 and GBP, as pulldowns are done with lysates, which might mean bridging components. I think that it is important to repeat the BLI or use additional biophysical methods to assess binding, to obtain more convincing data.

      We sincerely thank the reviewer for highlighting this important concern regarding the BLI data presentation and interpretation. We would like to clarify that the baseline signal prior to ligand addition was subtracted during data processing; therefore, the plotted curves represent the net response following ligand association. However, we agree that this may have obscured the visualization of the association phase. Accordingly, in the revised manuscript, we have re-plotted the data with adjusted y-axis scaling to better capture the association kinetics. In addition, to ensure robustness and reproducibility, the BLI experiments were performed in multiple independent replicates (n ≥ 3) using independently purified protein batches. The original figure showed a representative dataset; we have now included averaged sensorgrams along with standard deviation in the calculated KD values [K<sub>D</sub> = (1.7 ± 0.22) × 10<sup>-8</sup> M] (Figure 2C (i)). These revisions provide a clearer and more accurate representation of the binding interaction.

      (5) The authors next do some modelling of the putative complex. This is done by homology modelling and docking, which is not the most up-to-date method and is over-interpreted. Personally, I would remove this data as I did not find it convincing, and it is not important for the story. If the authors wish to include it, then I think that they should validate the modelling by mutagenesis to show that the residues which the models indicate might bind are involved in the interaction.

      We thank the reviewer for this thoughtful comment regarding the modelling analysis. We agree that computational docking and homology-based modelling have inherent limitations and should not be over-interpreted. In our study, these analyses were included strictly as supporting evidence to provide a structural framework for the PfGBP-LFA-1 interaction, while the primary conclusions are based on direct biochemical and functional validation, including pull-down, BLI measurements, receptor knockdown, and cellular inhibition assays. Importantly, the use of docking approaches such as ClusPro, followed by interface analysis and MD simulations, is a widely accepted and routinely used strategy to generate testable hypotheses for protein-protein interactions, particularly when experimental structures are unavailable (e.g., Comeau et al., 2004; Weng et al., 2019). We believe that the current modelling serves as a useful complementary analysis that is consistent with, and supportive of, the experimentally validated interactions.

      (6) They next made GP130 and tested the binding of this to THP-1 cells, which are often used as a model for macrophages. They observe greater binding of PfGBP-Fc to these cells when compared with hIgG and show that LFA-1 siRNA reduces this binding. I was a little confused about how the flow plots related to the graph in the bottom right corner of Figure 3Bii. In the flow plots, hIgG control shows 12.8% of cells in the gated region, while the unstained cells have 5.63%, but the MFI data shows a decrease in binding for hIgG vs unstained cells. How is this consistent? Also, the siRNA reduces the number of cells in the gated region from 66.6% to 25.9%, which is still substantially more than 5.63% in the unstained control. This also doesn't seem quite consistent with the MFI data. Could the authors explain this? Also, perhaps an additional experiment would be to add soluble LFA-1 into this assay as an additional control to determine whether this blocks PfGBP binding to the THP-1 cells? It could be that there are additional mechanisms of binding which indicate why the siRNA has a partial effect. The same is true for the NK cell experiments in Figure 3Ci, in which the siRNA has a partial effect. The authors also test binding to HEK, HepG2 and 'stem' cells and claim 'only background levels of binding', but in each case, there is more binding to these cells by PfGBP-Fc than by hIgG, albeit less than in THP-1 and NK cells. Why have the authors decided that these increases are not significant? All in all, these experiments do indicate a role for the GBP-LFA1 interaction in the binding of immune cells to iRBCs, but perhaps not as absolutely as is suggested.

      We thank the reviewer for this insightful comment. The apparent discrepancy arises because the flow plots depict the percentage of cells within a defined positive gate, whereas the graphs quantify mean fluorescence intensity (MFI) across the entire population. We have revised the figure legend accordingly. Regarding the partial reduction in binding upon LFA-1 (CD11a) knockdown, we agree that this indicates LFA-1 is a major but not exclusive contributor, which is biologically plausible given incomplete siRNA depletion and the known avidity-dependent nature of integrin interactions. Importantly, our conclusion is supported by multiple orthogonal approaches (αI-domain binding, LC-MS/MS identification, BLI, docking, receptor knockdown, and functional blockade). We also appreciate the suggestion of soluble LFA-1 competition, which we acknowledge as an important future experiment. Finally, we have revised the text regarding HEK293T, HepG2, and stem cells to reflect that PfGBP-Fc binding is minimal but not absent, consistent with low/non-expression of LFA-1 in non-immune cells. Overall, we have moderated our claims to state that PfGBP-LFA-1 interaction is a dominant and functionally relevant mechanism, while not excluding additional low-affinity or accessory interactions.

      Figure legend change: Representative flow plots depict the percentage of cells within a predefined positive gate, whereas the accompanying summary graph quantifies fluorescence intensity across the analyzed population. These two metrics report distinct properties of the distribution and are therefore not expected to be numerically identical.
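
The distinction drawn in the legend, that the percentage of cells in a gate and the mean fluorescence intensity are different summaries of the same distribution, can be shown with a small numerical sketch. The intensities and the gate value below are invented for illustration and are unrelated to the data in Figure 3; the function name is hypothetical.

```python
from statistics import mean


def percent_in_gate(intensities, gate):
    """Percentage of events whose fluorescence is at or above the gate threshold."""
    return 100.0 * sum(1 for x in intensities if x >= gate) / len(intensities)


# Hypothetical per-cell fluorescence values (arbitrary units).
sample_a = [10, 10, 10, 10, 10, 10, 10, 10, 5000, 5000]    # few, very bright cells
sample_b = [120, 120, 120, 120, 120, 120, 10, 10, 10, 10]  # many cells just above the gate
gate = 100

pct_a = percent_in_gate(sample_a, gate)
pct_b = percent_in_gate(sample_b, gate)
mfi_a = mean(sample_a)
mfi_b = mean(sample_b)
```

Here sample B has more cells inside the gate, yet sample A has the higher mean intensity because a few very bright cells dominate the average. The two readouts can therefore rank samples in opposite order without either being wrong.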

      (7) The authors next produce CHO cells with PfGBP on the surface. These cells bind to LFA-1 specifically. When these cells were incubated with primary NK cells, they did see increases in activation markers, which were reduced by the addition of anti-CD11a, suggesting these to be specific. They also conduct the same experiment with anti-GBP with iRBCs, but this is in a different figure. It would be easier for the reader if Figure 5B were in the same figure as Figure 4B, as it is related data using the same method. I found this data convincing, showing that the LFA1:GBP interaction does contribute to immune cell recognition and activation.

      We thank the reviewer for this positive assessment and helpful suggestion regarding figure organization. We agree that the CHO-PfGBP and iRBC-based NK cell activation assays represent conceptually related experiments that both address LFA-1-PfGBP dependent activation using similar readouts. We have retained separate panels to distinguish the reductionist CHO-based system from the physiologically relevant iRBC context. We believe that the combined evidence from both systems strengthens the conclusion that PfGBP-LFA-1 interaction is a key contributor to NK cell recognition and activation.

      (8) The authors next conduct an experiment in which they assess parasite growth in the presence of NK cells and in the presence of anti-GBP. They use Hoechst staining as a measure of parasite growth and claim that NK cells reduce the number of parasites, but that anti-GBP abolishes this effect (Figure 5A). I found this experiment very unconvincing as there are small effects and no demonstration of significance. More commonly used approaches to study parasite growth are lactate dehydrogenase GIA assays or calcein-AM labelling. I did not find this experiment convincing and would either remove or supplement with additional data using a more robust assay, with repeats and tests of statistical significance.

      We respectfully disagree that the assay should be removed, because flow-cytometric quantification of P. falciparum parasitemia using DNA dyes such as Hoechst is a widely used, accepted, and high-throughput approach for measuring infected erythrocytes and parasite growth, with clear separation of infected from uninfected RBCs and good reproducibility across malaria studies (Dent et al., 2009; Jang et al., 2014). Importantly, closely related immune-cell killing experiments in the malaria field have used the same general strategy, co-culture with effector cells followed by flow-cytometric enumeration of parasitemia to infer parasite control, including the seminal NK-cell study by Chen et al., 2014, which our assay design follows conceptually, and later work showing reduced parasitemia after co-incubation with cytotoxic lymphocytes measured by nucleic-acid dye flow cytometry. We therefore believe the experiment is methodologically valid and directly relevant to the biological question, namely whether disrupting PfGBP-LFA-1 engagement alters NK-cell-mediated restriction of parasite expansion.

      Reviewer #2 (Public review):

      (1) PfGBP-130 is proposed to be a membrane protein based on a single predicted transmembrane domain. Figures 2b and 3a show ribbon schematics with this TM domain at residues 51-68, in agreement with TM prediction algorithms such as TMHMM 2.0 and Phobius. However, this predicted TM is upstream of the PEXEL motif (residues 84-88, sequence RILAE), a conserved sequence for parasite protein export to host cytosol that is proteolytically processed at its 4th residue. Thus, residues 1-87 are removed from PfGBP-130 prior to export, yielding a mature protein without predicted TMs. Prior studies have determined that the mature PfGBP-130 lacks TMs and is retained as a soluble protein in host cell cytosol (PMID: 19055692, 35420481). Thus, the authors' model of PfGBP-130 as a surface-exposed membrane protein conflicts with both computational analysis of the mature protein and these prior reporter studies. An important simple experiment would be to evaluate PfGBP-130 membrane association in immunoblots using the authors' PfGBP-130 antibody after hypotonic lysis (PMID: 19055692) and after alkaline extraction (e.g. 100 mM Na2CO3, pH 11 as frequently used, PMID: 33393463). If the prior studies and computational analyses are correct, the protein will be predominantly in the soluble and/or alkaline supernatant fractions.

      We thank the reviewer for this important observation regarding PfGBP-130 topology and export. We agree that the presence of a PEXEL motif supports proteolytic processing and that the mature protein may lack a classical transmembrane domain. However, consistent with our model of surface accessibility, we would like to clarify that in an independent proteomic study performed in our laboratory on the membrane-enriched fraction of Plasmodium falciparum-infected erythrocytes, PfGBP-130 was reproducibly identified by LC-MS/MS among membrane-associated proteins (data not shown; can be provided upon request). These findings support the conclusion that, irrespective of the absence of a canonical transmembrane domain, PfGBP-130 is associated with the iRBC membrane compartment, likely via peripheral or protein-complex–mediated interactions, as described for several exported Plasmodium proteins.

      (2) Many findings rely on the specificity of antibodies generated against PfGBP-130 or NK cell receptors. Although the authors have included key controls (use of isotype control antibodies, lack of anti-PfGBP-130 binding to uninfected cells), cross-reactivity between P. falciparum antigens is well-recognized and could significantly undermine the interpretation of experiments (PMID: 2654292 and 1730474 provide key examples of antigens recognized by antibodies raised against other proteins). For example, the surface localization in IFA experiments (Figure 2B(iii)) could reflect anti-PfGBP-130 binding to an unrelated parasite surface antigen, a possibility not addressed by any of the authors’ controls. As another example, the iRBC lysate immunoblot using this antibody in Fig. 2B(iv) suggests a MW of 95 kDa, which corresponds to the unprocessed pre-protein before export; cleavage in the PEXEL motif yields a processed mature protein of 85 kDa, which should be readily resolved from the pre-protein in immunoblots (PMID: 19055692). A better immunoblot using immature infected cell stages might show both the pre-protein and the mature protein as a doublet band.

      We thank the reviewer for raising this important concern regarding antibody specificity. We agree that cross-reactivity among P. falciparum antigens is a known issue and have taken multiple steps to ensure specificity in our study. First, the anti-PfGBP-130 antibodies were generated against a defined recombinant fragment and show no detectable binding to uninfected RBCs and no signal in hIgG control immunoprecipitates, supporting specificity. Importantly, in our LC-MS/MS analysis of LFA-1 αI-domain pull-downs, PfGBP-130 was specifically enriched and consistently identified across replicates, independently validating the target recognized by the antibody. Furthermore, the same antibody detects a single dominant band in both iRBC lysates and αI pull-down fractions, arguing against widespread cross-reactivity. Regarding the apparent molecular weight (~95 kDa), we agree that this likely corresponds to the precursor form, and that a processed form (~85 kDa) may not be well resolved under our current conditions.

      (3) PfGBP-130 is not essential for in vitro cultivation (PMID: 18614010 and MIS of 1.0 in the piggyBac mutagenesis screen as tabulated on plasmodb.org, indicating a highly dispensable gene). The authors should use the knockout line as a control in their IFA localization experiments to address antibody specificity. More fundamentally, their model predicts that NK cells should not recognize or kill infected cells from the knockout line when compared to their untransfected parent. Such results with the knockout line would compellingly support the authors' model without reliance on antibodies that may cross-react with other parasite antigens. PMID: 18614010 reported that the PfGBP-130 knockout exhibited increased membrane rigidity, suggesting an intracellular scaffolding protein rather than a surface localization and use as a ligand for LFA-1 interaction and NK cell-mediated killing.

      We agree that a PfGBP-130 knockout line would provide a powerful genetic validation of both antibody specificity and the proposed functional role of PfGBP-130 in NK cell recognition. At present, such experiments were not included in this study, and we acknowledge this as an important limitation. However, we would like to emphasize that our conclusion does not rely on antibody-based localization alone; rather, it is supported by multiple orthogonal approaches, including LFA-1 αI-domain pull-down coupled to LC-MS/MS, biophysical interaction analysis, receptor knockdown, and functional blocking assays. In addition, in one of our previous proteomic analyses of the membrane-enriched fraction of infected erythrocytes, PfGBP-130 was identified among the proteins present in the membrane fraction, supporting its association with the iRBC membrane compartment despite lacking a classical mature transmembrane domain.

      (4) PfGBP-130 non-essentiality raises the question of why the gene would be retained if it triggers NK cell-mediated killing of infected cells in vivo. Presumably, this killing would pose strong selective pressure against retention of PfGBP-130. Some speculation is warranted to support the model.

      We thank the reviewer for this thoughtful evolutionary question. We agree that if PfGBP-130 enhances NK-cell recognition, its retention likely reflects a context-dependent fitness trade-off rather than a simple benefit or cost. This situation is not unusual in P. falciparum: several exported or surface-associated proteins are retained despite being immunogenic because they also provide advantages in other settings, such as erythrocyte remodeling, cytoadhesion, niche adaptation, immune modulation, or transmission. The clearest precedent is the PfEMP1/var system, in which highly immunogenic surface antigens are nevertheless strongly maintained because they mediate sequestration and in vivo fitness, while antigenic variation limits continuous immune exposure (Chew et al., 2022). Similarly, other variant surface antigens such as STEVOR and RIFIN are retained despite immune recognition because they contribute to erythrocyte binding, antigenic diversity, and immune evasion or modulation (Niang et al., 2009; Sakoguchi et al., 2025). More broadly, many P. falciparum genes that appear dispensable in standard in vitro culture are nevertheless preserved because culture does not recapitulate the selective pressures present in vivo, including splenic clearance, endothelial interactions, immune attack, and within-host competition.

      Reviewer #3 (Public review):

      (1) Anti-GBP130 antibodies are used in the cellular assays to block the interaction between GBP130 and LFA1. They should therefore also block interactions between GBP130 and LFA1 recombinant proteins in the biolayer interferometry experiment. Do the authors have data to show this? Similarly, the anti-CD11a antibodies used to block the interaction in the cellular assays should also block the in vitro interaction between recombinant LFA1 and GBP130.

      We thank the reviewer for this insightful suggestion. We agree that demonstrating antibody-mediated inhibition of the recombinant PfGBP-LFA-1 interaction would provide an additional orthogonal validation of the interface. While such blocking experiments were not included in the original BLI dataset, our current study already establishes the specificity of this interaction through multiple independent approaches, including αI-domain pull-down and LC-MS/MS identification, BLI-derived high-affinity binding (KD ~10<sup>-8</sup> M), structural docking, receptor knockdown, and antibody-mediated inhibition in cellular systems. We note that antibody-mediated blocking in a purified biophysical system is not always directly comparable to cellular assays, as epitope accessibility, orientation on biosensor surfaces, and conformational states of integrins (which are known to undergo activation-dependent structural changes) can influence inhibition efficiency. Nonetheless, we fully agree that this represents an important validation experiment.

      (2) The structural modelling analysis of the predicted complex between GBP130 and LFA1 (Figure 2cii) predicts that the majority of the important GBP130 interface residues are located in the region D509-N607. However, the authors present BLI data for the GBP130-LFA1 interaction, which used the N-terminal fragment of GBP (residues 69-270), which does not include the GBP130 residues predicted to be important for the formation of the complex between the two proteins. Could the authors provide an explanation for how an interaction was observed with the GBP130-N fragment, which does not contain the residues predicted to be important for interacting with LFA1?

      We thank the reviewer for this important observation. We agree that the structural model predicts a major interaction interface within the D509-N607 region of PfGBP-130; however, this does not preclude the existence of additional or auxiliary binding determinants within the N-terminal region used in our BLI assays (aa 69-270). PfGBP-130 is a multi-domain, repeat-containing protein, and such proteins frequently exhibit distributed or multivalent interaction interfaces, where individual regions can independently engage binding partners with lower affinity while the full-length protein achieves higher avidity through cooperative interactions. In our study, the BLI data using the N-terminal fragment demonstrate that this region is sufficient to mediate direct interaction with the LFA-1 αI domain, whereas the structural model based on full-length predictions likely captures a dominant or higher-affinity interface in the C-terminal region. Importantly, the interaction is supported by multiple orthogonal datasets, including pull-down/LC-MS/MS, cellular binding assays, and functional inhibition, indicating that the observed binding is not an artefact of fragment choice.

      Author response image 1.

      To further examine this, we performed docking and binding energy analyses comparing the full-length PfGBP-130-LFA-1 complex with the N-terminal domain-LFA-1 complex. Using the PRODIGY server, the predicted binding affinity for the full-length complex was -9.8 kcal/mol, whereas the N-terminal domain complex exhibited a still favorable binding energy of -5.6 kcal/mol. Similarly, HawkDock (v2) analysis yielded binding energies of -22.2 kcal/mol for the full-length complex and -14.1 kcal/mol for the domain-only complex. While reduced relative to the full-length protein, these values remain well within the range of stable protein-protein interactions, supporting the ability of the N-terminal region to independently contribute to binding. These energy calculations take into account all non-covalent interactions. For clarity, hydrogen bonds have been specifically highlighted in the figure to represent the key interaction interface.
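
      For orientation, a predicted binding free energy can be related to a dissociation constant through the standard thermodynamic relation ΔG = RT ln(KD). The sketch below is illustrative only: the function name and the temperature of 298.15 K are assumptions, not part of the authors' analysis, and the conversion is meaningful only for scores that approximate a true binding free energy.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL_PER_MOL_K = 1.987204e-3


def kd_from_delta_g(delta_g_kcal_per_mol, temperature_k=298.15):
    """Convert a binding free energy (kcal/mol) to a dissociation constant (M).

    Uses delta_G = R * T * ln(KD), so KD = exp(delta_G / (R * T)).
    The default temperature (25 degrees C) is an assumption of this sketch.
    """
    if temperature_k <= 0:
        raise ValueError("temperature must be positive (kelvin)")
    return math.exp(delta_g_kcal_per_mol / (R_KCAL_PER_MOL_K * temperature_k))


if __name__ == "__main__":
    # PRODIGY-style free energies quoted in the response above.
    for label, delta_g in [("full-length", -9.8), ("N-terminal", -5.6)]:
        print(f"{label}: KD ~ {kd_from_delta_g(delta_g):.2e} M")
```

      Docking scores of the MM/GBSA type, such as those from HawkDock, are generally treated as relative rankings rather than calibrated free energies, so this conversion is best applied only to the PRODIGY values.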

      (3) There is no section in the materials and methods describing how the BLI was performed; this should be added. The highest concentration of GBP130 used in the interaction measurements is 1.4 µM, almost 100x the measured Kd (0.015 µM) for the GBP130-LFA1 interaction. At these high concentrations of GBP130, I would expect to start seeing saturation of binding, but the interferometry curves show that saturation is not close to being reached. This strongly suggests that the binding of GBP130 to LFA1 is non-specific.

      We thank the reviewer for raising these important technical points. We have included a detailed description of the biolayer interferometry (BLI) methodology in the Materials and Methods section in the manuscript. Regarding the concern about lack of saturation at higher analyte concentrations, we respectfully disagree that this necessarily indicates non-specific binding. In BLI assays, incomplete saturation can arise from several well-recognized factors, including suboptimal orientation or partial inaccessibility of immobilized ligand on the biosensor, mass transport limitations, or heterogeneous binding populations, the last being particularly relevant for integrins such as LFA-1, whose αI domain exists in multiple conformational states with distinct affinities. Importantly, the interaction exhibits clear concentration-dependent association and dissociation kinetics that fit a 1:1 binding model with a KD in the nanomolar range, which is inconsistent with non-specific interactions that typically show poor fitting and minimal dissociation. Furthermore, the specificity of the PfGBP-LFA-1 interaction is supported by multiple independent lines of evidence in our study, including selective enrichment in αI-domain pull-downs, absence in IgG controls, reduction upon CD11a knockdown, and functional inhibition by blocking antibodies in cellular assays. We have now clarified these points in the revised manuscript and tempered the interpretation to acknowledge potential experimental constraints of BLI while maintaining that the cumulative data strongly support a specific interaction.
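
      The saturation argument in this exchange can be made concrete with the equilibrium 1:1 (Langmuir) binding isotherm, in which fractional occupancy equals C / (C + KD). The sketch below is a minimal illustration under that idealized assumption; the function name is hypothetical, and the model ignores the surface effects (ligand orientation, mass transport, conformational heterogeneity) discussed in the response.

```python
def fractional_occupancy(analyte_conc, kd):
    """Equilibrium occupancy for ideal 1:1 binding: C / (C + KD).

    Both arguments must be in the same concentration units.
    """
    if analyte_conc < 0 or kd <= 0:
        raise ValueError("concentration must be >= 0 and KD must be > 0")
    return analyte_conc / (analyte_conc + kd)


if __name__ == "__main__":
    # Values quoted in the reviewer comment: 1.4 uM analyte, KD of 0.015 uM.
    theta = fractional_occupancy(1.4, 0.015)
    print(f"Predicted occupancy at 1.4 uM: {theta:.3f}")
```

      Under ideal 1:1 binding, the top concentration would therefore sit close to full occupancy at equilibrium; whether the sensorgrams reach equilibrium within the association phase is a separate question that this sketch does not address.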

      Minor points:

      (1) For the pulldown experiments, can the authors confirm that cross-linking was also performed for the protein A beads + hIgG control?

      Yes, DTSSP cross-linking was performed identically in the protein A beads + hIgG control arm. This is consistent with the control design described in the manuscript.

      (2) If the recombinant CD11a I subdomain used as a probe is correctly folded and functional, it should bind ICAM1. Do the authors have this data?

      We agree that ICAM-1 binding is an important functional validation for the recombinant CD11a αI probe (Hogg et al., 1998). The isolated αI domain of LFA-1 is well established as the principal ICAM-1-binding module, and soluble αI-domain reagents have previously been shown to bind/block ICAM-1 interactions. We did not include this control in the current version.

      (3) Were the authors able to perform the reciprocal pull-down, using pfGBP130-N-Fc to pull down LFA1 from cell surfaces?

      We did not perform a reciprocal pull-down with PfGBP130-N-Fc and native cell-surface LFA-1 in the present study; we agree this would be a useful orthogonal experiment.

      (4) After identifying GBP130 as a co-purifying protein in the LFA-1 pull-down experiments, the authors select an N-terminal fragment of GBP130 to recombinantly express and use. How did the authors narrow down which region of GBP130 interacted with LFA-1?

      The N-terminal PfGBP130 fragment (aa 69-270) was selected empirically as a tractable, soluble recombinant segment containing a defined repeat-containing extracellular region, rather than because we had already mapped the full LFA-1-binding interface. We agree with the reviewer that our structural model suggests that additional residues, including a likely dominant interface outside this fragment, may contribute to the full interaction, and we have clarified that the N-terminal fragment should be interpreted as a minimal binding-competent region, not necessarily the sole binding site.

      (5) As erythrocytes age, their surface undergoes biochemical changes, most notably a drop in levels of sialylation, decreasing the net repulsive negative charge, and they generally become more adherent. Can the authors exclude the possibility that, rather than binding to a parasite-derived ligand, LFA alpha 1 is instead binding to a marker of older erythrocytes? In the data presented, increased binding of LFA alpha 1 is observed as parasites progress through the life cycle, but the host erythrocytes will be ageing during parasite replication, which could account for the increased levels of LFA alpha 1 binding. To rule out this explanation, data from LFA alpha 1 staining of age-matched uninfected erythrocytes could be provided.

      We agree that erythrocyte aging can alter surface sialylation and adhesiveness, and loss of sialic acid is known to reduce erythrocyte surface charge and increase adhesiveness. However, our data argue against aging alone explaining the signal, because LFA-1 αI-Fc binding was compared with uninfected RBC controls and the interaction led to enrichment of a parasite-derived ligand, PfGBP130, in pull-down/MS analyses.

      (6) Figure 3b(i) Surface staining of THP1 cells was performed using GBP-130 Fc as a probe, which should detect all LFA1-positive cells. But no accompanying staining data using an anti-LFA1 antibody are shown, so it is not possible to determine whether staining profiles with GBP-130 Fc match staining profiles with anti-LFA1 antibodies. This is important to show what proportion of LFA1-positive cells can recognise parasite-derived GBP-130 Fc.

      (7) Figure 3c(i) Surface staining of peripheral NK cells is performed using GBP-130 Fc as a probe, which should detect all LFA1-positive cells. Here, as well, there are no staining data using an anti-LFA1 antibody. This would allow a comparison between cell population LFA1 staining with an anti-LFA1 antibody and cell population LFA1 staining with GBP-130 Fc. The two staining profiles should be similar as both probes bind the same surface marker. However, it appears this might not be the case because the staining data using GBP-130 Fc show that only a minor proportion of NK cells (~20%) stain positive, but the majority of peripheral NK cells usually express CD11a, as it is a key adhesion molecule in the formation of immune synapses with target cells. This suggests that GBP-130 can only bind to a subset of NK cells, and if it is binding LFA1, then it can only play a role in mediating the formation of an immune synapse with this subpopulation of NK cells. Could the authors include a comment in the manuscript making clear that the GBP-130 only assists a small proportion of NK cells in adhering to parasite-infected erythrocytes? Are there any reasonable hypotheses as to why GBP-130 was only able to stain a small subpopulation of LFA1-expressing NK cells?

      For minor comments 6 and 7:

      We agree that parallel staining with anti-CD11a would help relate PfGBP130-Fc binding to total LFA-1-positive THP-1 and NK-cell populations. Importantly, LFA-1 expression and ligand binding competence are not equivalent, because integrin binding depends strongly on activation/conformation and avidity state; in NK cells, only a subset can display LFA-1 in a partially activated conformation at baseline despite broader CD11a expression. Thus, a smaller PfGBP130-Fc-positive subset than the total CD11a-positive population is biologically plausible and does not imply inconsistency.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      In this study, the authors investigate LFP responses to methionine in the olfactory system of the Xenopus tadpole. They show that this response is local to the glomerular layer, arises ipsilaterally, and is blocked by pharmacological blockade of AMPA and NMDA receptors, with little modulation during blockade of GABA-A receptors. They then show that this response is transiently enlarged following transection of the contralateral olfactory nerve, but not the optic lobe nerve. Measurement of ROS, a marker of inflammation, was not affected by contralateral nerve transection, and LFP expansion was not affected by pharmacological blockade of ROS production. Imaging biased towards presynaptic terminals suggests that the enlargement of the LFP has a presynaptic component. A D2 antagonist increases the LFP size and variability in intact tadpoles, while a GABA-B antagonist does not. On this basis, the authors conclude that the increase driven by contralateral nerve transection is due to DA signaling.

      Overall, I found the array of techniques and approaches applied in this study to be creatively and effectively employed. However, several of the conclusions made in the Discussion are too strong, given the evidence presented. For example, the authors state that "The observed potentiation was not related to inflammatory mediators associated to injury, because it was caused by a release of the inhibition made by D2 dopamine receptor present in OSN axon terminals." This statement is too strong - the authors have shown that D2 receptors are sufficient to cause an increase in LFP, but not that they are required for the potentiation evoked by nerve transection. The right experiment here would be to get rid of the D2 receptors prior to transection and show that the potentiation is now abolished. In addition, the authors have not shown any data localizing D2 receptors to OSN axon terminals.

      Similarly, the authors state, "the onset of LFP changes detected in glomeruli is determined by glutamate release from OSNs." Again, the authors have shown that blockade of AMPA/NMDA receptors decreases the LFP, and that uncaging of glutamate can evoke small negative deflections, but not that the intact signal arises from glutamate release from OSNs. The conclusions about the in vivo contribution of this contralateral pathway are also rather speculative. Acute silencing of one hemisphere would likely provide more insight into the moment-to-moment contributions of bilateral signals to those recorded in one hemisphere.

      We thank the reviewer for their positive evaluation of our manuscript. We agree with their opinion about the necessity of including new experimental evidence to back up the discussion and conclusions.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      This is a creative and careful study, but I felt that the conclusions in the Discussion were too strong. I think these could either be toned down or additional experiments could be done to support the idea that D2 receptors are required for the nerve transection-evoked potentiation, that the source of glutamatergic input is OSNs, and that contralateral interactions are mediated by DA. In particular, I think anatomical stains showing which neurons are carrying the DA signal and whether there is any potentiation of DA release after nerve transection would greatly strengthen the conclusions.

      This new version of the manuscript contains two new figures: 6 and 9.

      New figure 6 addresses the suggestion of this reviewer and provides anatomical evidence for the distribution of dopaminergic neurons in the olfactory bulb of X. tropicalis tadpoles using a tyrosine hydroxylase antibody (mouse monoclonal, Immunostar cat. no. 22941, 1:250; RRID:AB_57226). We identified a discrete neuronal population at the border between the mitral cell layer and the glomerular layer that resembles the type 1 TH+ population described in adult frogs (Boyd and Delaney 2002). TH+ neurons send their processes to innervate olfactory glomeruli and we provide evidence that they contact the GFP lateral glomerulus labelled in Dre.mxn1:GFP X. tropicalis tadpoles (Fig. 6C). These results reinforce a modulatory role for dopamine on glomerular neurotransmission. Materials & methods (lines 152-167), results (lines 393-399) and discussion (lines 550-563) have been modified accordingly.

      Figure 9 provides new evidence on the interhemispheric connections involved in the potentiation of glomerular responses. We first demonstrate that dorsolateral pallial neurons participate in the processing of olfactory information based on the general consideration that the lateral pallium is an olfactory cortex. We confirmed this possibility by stimulating the olfactory epithelium and recording ipsilateral calcium transients in pallial neurons of tubb2b:GCaMP6s tadpoles. We next injured the dorsolateral pallium and 24-48 h afterwards we recorded odor-evoked responses in the GFP labelled glomerulus located contralaterally. We observed a ~70% potentiation of responses, which was comparable to the ~75% potentiation obtained by olfactory nerve transection. These results illustrate the involvement of pallial neurons in the control of glomerular output, likely by modifying the activity of TH+ neurons. The results (lines 473-506) and discussion (lines 569-576) now include these new results.

      Does the contribution of DA signalling change across development? I think this would be helpful to interpret the results and relatively straightforward to do: apply raclopride at different developmental stages and measure how much potentiation occurs at each stage.

      This is indeed an interesting point, but conducting a comprehensive study of dopamine release throughout development would require a substantial amount of work and delay the publication of this paper. To perform these experiments, we should first implement new technical approaches, such as successfully injuring young tadpoles or recording from late premetamorphic stages. We believe that the proposed experiments could define a new line of arguments rather than complement the present work. Nonetheless, we acknowledge the suggestion of this reviewer.

      In this new version, we provide strong evidence for dopamine release in the glomerular layer, and a key question that arises is the nature of TH+ neurons. Recent findings obtained in mice show that there are five different types of dopaminergic interneurons present in the olfactory bulb (Kosaka, Pignatelli, and Kosaka 2020), and important functional differences exist between axon-bearing and anaxonic neurons (Dorrego-Rivas et al. 2025). This evidence suggests a key role for development. A completely new study based on transgenic X. tropicalis displaying labeled TH+ neurons could bring together development, anatomy, and physiology to gain an understanding of how dopaminergic signaling shapes glomerular function.

      In addition, there are several places where showing additional raw data in the figures and carefully quantifying variability would be helpful. For example, in Figure 3B, the authors should show equivalent raw traces from intact and transected tadpoles. In Figure 5D, it would be helpful to show raw traces for LFP equivalent to what is shown for presynaptic imaging in Figure 5E. In Figures 6E-F, it would be helpful to show raw traces.

      Thank you for this suggestion. The examples have been added to the figure panels.

      I found the last experiment with photobleaching somewhat inconclusive, and I am not sure what it adds to the study as presently written. Line 418: Please quantify how many OSNs remained. Line 423: What is the hypothesis for the source of variability?

      The goal of this experiment is to investigate the participation of chemotopy in the potentiation induced by contralateral injury. The elimination of 30-50% of topographically related OSNs did not alter contralateral glomerular responses. This evidence suggests that chemotopy was not relevant to the gain of function observed; however, we cannot completely rule out a certain topographical contribution, as it was not possible to completely silence all inputs of the studied glomerulus. We now link these findings to the likely innervation of several glomeruli by TH+ neurons, which suggests the absence of a one-to-one glomerulus relationship. LFP amplitudes and their variance are now illustrated in box plots to highlight the absence of significant differences (lines 457-471).

      An increase in the variance among the recordings obtained is a consistent empirical observation. Although it is a hallmark of the potentiation recorded, we cannot provide a mechanistic explanation. Considering that neurotransmitter release from OSN axon terminals is normally inhibited by dopamine, we hypothesize that disinhibition drives an increase in release probability, leading to larger variations in glutamate release. Such variations could be reflected in the amplitude of LFP negativities.
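
      One textbook way to see how a higher release probability could go together with larger response variance is the simple binomial model of quantal release, in which the mean response is N·p·q and the variance is N·p·(1 − p)·q². The sketch below is only an illustration of that model, not the authors' analysis; the function name and the values of N, p and q are hypothetical.

```python
def binomial_release_stats(n_sites, p_release, quantal_size):
    """Mean and variance of the summed response under binomial release.

    n_sites: number of independent release sites (N)
    p_release: release probability per site (p), between 0 and 1
    quantal_size: response contributed by one released quantum (q)
    """
    if not 0.0 <= p_release <= 1.0:
        raise ValueError("p_release must lie in [0, 1]")
    mean = n_sites * p_release * quantal_size
    variance = n_sites * p_release * (1.0 - p_release) * quantal_size ** 2
    return mean, variance


if __name__ == "__main__":
    # Hypothetical parameters: 100 sites, unit quantal size.
    for p in (0.1, 0.3):
        mean, variance = binomial_release_stats(100, p, 1.0)
        print(f"p = {p}: mean = {mean:.1f}, variance = {variance:.1f}")
```

      In this model the variance grows with p only while p stays below 0.5, so the argument assumes a low baseline release probability under dopaminergic inhibition.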

      It would be helpful to include a measurement of LFP over time so we have some idea of how stable the odor delivery is.

      The amplitude of LFP responses was stable for >30 min. Figure 3B shows recordings obtained during 30 min and new Figure 7F over 42 min. We believe that these examples illustrate that the amplitude, as well as kinetics of the responses obtained were consistent over the period studied.

      Line 227: Small upward deflection - could this be an electrical artifact? Can you run the stimulus delivery with no odor (say, with water) to see if you get the same signal?

      We do not know the precise source of this upward deflection. It is not an electrical artifact related to stimulation, which is sometimes evident (Fig 7A, methionine application). When present, it occurs after the activation of OSNs. One possibility is that the deflection originates in the layer of nerve fibers, reflecting some aspect related to the conduction of APs and the relative position of the electrode. Interestingly, some recordings of LFP responses at the level of glomeruli carried out in rats also show a positive deflection (see Figs. 1B, 2A, 3B in Lecoq, Tiret, and Charpak 2009), thus suggesting it is an intrinsic characteristic of this type of recording.

      Line 237-239: I wasn't clear from the text whether this was a variation due to development, to transection, or natural variability.

      We now indicate that the relationship reflects normal development (lines 261-264).

      Line 521: N-type VGCCs: can these be targeted with pharmacology to strengthen the argument?

      We acknowledge this suggestion but we have not carried out these experiments as we believe that the interpretation could be complex due to the high density of synapses present in glomeruli and the likely involvement of other types of VGCCs in neurotransmitter release.

      Small issues:

      (1) Line 190-196: Some of this could potentially be moved to the Discussion section.

      These are some arguments to defend the validity of our experimental approach to record the response of the lateral glomerulus labeled by GFP. If we move them to the discussion, the information related to the spatial extent of our recordings would be split between results and discussion. We believe that the current format of the paper allows the discussion to focus on the interpretation of the results obtained.

      (2) Line 268: exponential recover phase.

      Thanks. Corrected.

      (3) Line 278: affected to -> arises from

      Thanks. Corrected.

      (4) Line 282: affect to -> can affect.

      Thanks. Corrected.

      (5) Line 403: 2Phatal technique: Please state briefly what this is

      It is now indicated: two-photon chemical apoptotic targeted ablation (2Phatal).

      NOTE:

      During the revision of this manuscript we realized that Figures 3C and 4B indicated mean±SD. The panels have been amended to show mean±s.e.m.
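
      For readers comparing the two error measures, the standard error of the mean is the sample standard deviation divided by the square root of the number of observations, so s.e.m. bars are narrower than SD bars by a factor of √n. The sketch below is a minimal illustration; the function name and the sample values are hypothetical and are not data from the manuscript.

```python
import math
import statistics


def standard_error_of_mean(values):
    """Return the s.e.m.: sample SD (n - 1 denominator) divided by sqrt(n)."""
    if len(values) < 2:
        raise ValueError("at least two observations are required")
    return statistics.stdev(values) / math.sqrt(len(values))


if __name__ == "__main__":
    # Hypothetical amplitudes, for illustration only.
    sample = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(f"SD = {statistics.stdev(sample):.3f}")
    print(f"s.e.m. = {standard_error_of_mean(sample):.3f}")
```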

      References

      Boyd, J. D., and K. R. Delaney. 2002. "Tyrosine hydroxylase-immunoreactive interneurons in the olfactory bulb of the frogs Rana pipiens and Xenopus laevis." J Comp Neurol 454 (1):42-57. doi: 10.1002/cne.10428.

      Dorrego-Rivas, A., D. J. Byrne, Y. Liu, M. Cheah, C. Arslan, M. Lipovsek, M. C. Ford, and M. S. Grubb. 2025. "Strikingly different neurotransmitter release strategies in dopaminergic subclasses." Elife 14. doi: 10.7554/eLife.105271.

      Kosaka, T., A. Pignatelli, and K. Kosaka. 2020. "Heterogeneity of tyrosine hydroxylase expressing neurons in the main olfactory bulb of the mouse." Neurosci Res 157:15-33. doi: 10.1016/j.neures.2019.10.004.

      Lecoq, J., P. Tiret, and S. Charpak. 2009. "Peripheral adaptation codes for high odor concentration in glomeruli." J Neurosci 29 (10):3067-72. doi: 10.1523/JNEUROSCI.6187-08.2009.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      An ongoing controversy in the field of learning and memory is the specific neural mechanism that maintains long-term memory (LTM). A prominent hypothesis proposed by Sacktor and Fenton and their colleagues is that LTM is maintained by the ongoing activity of the atypical PKC isoform PKMζ. Early evidence in support of this hypothesis came from experiments showing that an inhibitory peptide, ZIP, whose activity was purported to be specific for PKMζ, blocked late-phase hippocampal LTP (L-LTP) and LTM. However, in 2013, two articles reported that LTM was normal in PKMζ knockout mice and that ZIP erased LTM in the knockout mice, indicating that ZIP lacked specificity for PKMζ. In response, Sacktor and Fenton and colleagues reported in 2016 that in PKMζ null mice, there is an increase in the expression of PKCι/λ, a related isoform of atypical PKC, and this increased expression can compensate for PKMζ; their data indicated that the upregulation of PKCι/λ mediates L-LTP and LTM in the PKMζ knockout mice. In the present article, the authors provide additional support for this idea. They replicate the finding of an upregulation of PKCι/λ expression in the hippocampus of PKMζ knockout mice; in addition, they show that the expression of several other PKC isoforms is upregulated in the knockouts. They find that down-regulation of PKCι/λ expression in the hippocampus using the Cre-LoxP technology (the 2016 paper merely used an inhibitor to block the activity of PKCι/λ) blocks L-LTP. Finally, the authors demonstrate that, although LTM is preserved in the single PKMζ knockout mouse, it is eliminated in the PKMζ/PKCι/λ double knockout mouse.

      Strengths:

      The experiments appear to have been carefully executed, the results reliable, and the paper well-written. Overall, the article provides significant additional support for the idea that the activity of PKMζ is critical for the maintenance of hippocampal L-LTP and LTM. The article uses genetic methods, rather than simply pharmacological ones, to demonstrate that when PKMζ is genetically deleted, PKCι/λ compensates for the missing PKMζ.

      Weaknesses:

      The paper sets up what I believe is probably a false dichotomy between a structural explanation - a change in the number of synaptic connections among neurons - and the persistent kinase activity explanation for memory maintenance. Why are these two explanations necessarily antithetical? It is possible that an increase in synaptic connections and the ongoing activity of PKMζ both contribute substantially to memory maintenance. The authors certainly don't provide any evidence that the number of synapses in the hippocampus remains unchanged after the induction of L-LTP or LTM. Indeed, I see no reason why persistent PKMζ activity could not be a mechanism for the maintenance of an enhanced number of synaptic connections following the induction of LTP/LTM. To the best of my knowledge, this possibility has not yet been explored. Consequently, I don't see why the present results would lead one to favor a biochemical explanation over a structural one for memory maintenance. Given the significant experimental evidence that LTM involves persistent structural changes in neurons, both explanations are equally plausible at present.

      As requested, we eliminated the discussion of a dichotomy between structural and biochemical mechanisms of long-term memory in the Abstract and Introduction. We now briefly address the relationship between the two hypotheses, which are not mutually exclusive, in the Discussion.

      Reviewer #2 (Public review):

      Summary:

      The authors are attempting to advance understanding of the role of unconventional PKCs, PKMζ and PKCι/λ, in the maintenance of late-phase LTP. Their results help to clarify the interplay between "structural" and "biochemical/enzymatic" mechanisms of LTP and learning in the hippocampus.

      Strengths:

      A strength is the use of conditional knock-outs of PKMζ and PKCι/λ to assess the role of these two enzymes in maintaining long-term potentiation and in compensating for each other when one of them is conditionally knocked out in the adult.

      Weaknesses:

      The paper is extremely difficult to read because the abstract does not clearly state the advances made over earlier studies by the use of conditional KO mutation. For example, in line nine of the abstract, the authors state, "Here, we found PKCι/λ persists in LTP and long-term memory when PKMζ is genetically deleted." This is confusing because it sounds as though the experiments have repeated earlier published experiments in which the gene encoding PKMζ is deleted in the embryo. The authors are not clear throughout the manuscript that they are using conditional KO of the two enzymes in the adult animal, rather than deletion of the gene. The term "genetically deleted" does not mean "conditionally deleted in the adult." The final sentences of the abstract are: "Whereas deleting PKMζ and PKCι/λ individually induces compensation, deleting both aPKCs abolishes hippocampal late-LTP. Hippocampal ι/λ-ζ-double-knockout eliminates spatial long-term memory but not short-term memory. Thus, in the absence of PKMζ, a second persistent biochemical process compensates to maintain late-LTP and long-term memory." These sentences do not convey a clear logical conclusion. The Discussion does a better job of stating the importance of the experiments.

      We have clarified the genotypes of the mice in the abstract and throughout the text.

      Reviewer #3 (Public review):

      Summary:

      The manuscript addresses an important, yet unresolved and long-debated, question: whether atypical protein kinase C is required for the maintenance of late-long-term synaptic potentiation (L-LTP) and long-term memory (LTM). The authors confirm previous findings that persistent activity of PKMζ is required for hippocampal L-LTP and spatial memory. They demonstrate that genetically deleting PKCι/λ and PKMζ individually induces compensatory upregulation, whereas deleting both atypical PKCs abolishes hippocampal L-LTP and spatial long-term memory. The study uses an elegant combination of immunoblots, electrophysiology, and behavioral assays. The use of Cre-recombinase to target specific hippocampal regions and neurons adds to the rigor of the findings.

      Strengths:

      The manuscript addresses an important, yet unresolved and long-debated, question: whether PKMζ is required for the maintenance of L-LTP and LTM. The study demonstrates that PKCι/λ, which was previously shown to be critical for the initial generation of the early phase of LTP and short-term memory, becomes persistently active in L-LTP and LTM in a PKMζ knock-out model, compensating for the loss of PKMζ. Furthermore, when the compensation mechanisms are eliminated by simultaneous deletion of both PKMζ and PKCι/λ, maintenance of LTP and long-term spatial memory, but not of short-term memory, is diminished. The strength of this study is that the authors used a double-knockout strategy to directly address the controversy concerning the roles of PKMζ in memory formation. By showing that PKCι/λ compensates when PKMζ is deleted, the authors provided a compelling explanation for previous contradictory findings.

      Weaknesses:

      (1) The authors should provide the numerical values for all data.

      (2) It appears that blind procedures were only used for the behavioral experiments. Some explanation is warranted.

      (3) The description of the immunoblotting procedures lacks sufficient detail. The authors state that immunoblots were stained with multiple antisera to visualize multiple PKCs on the same immunoblot. To conserve antisera, the immunoblots were cut to isolate the relevant proteins based on molecular weight. Isoforms with similar molecular weights were either stained with antisera of different species or on separate blots. Despite this explanation, it is unclear how immunoblotting was performed in practice. For example, in Figure 1B, the authors compared the changes of four conventional PKC isoforms. Because all four antibodies are mouse monoclonal antibodies recognizing proteins of similar molecular weights, each probing should presumably have its own actin loading controls. However, these controls are missing from the figure. Some clarification is warranted.

      (4) The statement in the legend to Figure 4B, that the increases of maximum avoidance time from pretraining to trial 1 are not different, indicates both groups of mice successfully established short-term memory, which is not correct. The analysis only reveals that there is no difference between the two groups. No differences could be due to both groups learning the same, as the authors suggest, or alternatively to no learning in either group.

      (5) The labeling on some of the illustrations (e.g., Figure 2B) is unreadable.

      (6) In Figure 4B, only the single statistical comparison between "pretraining" and "1 trial" is shown. The other comparisons described in the legend should also be illustrated.

      (7) There is no documentation to support the statement that "The prevailing textbook mechanism for how memory is retained asserts that stable structural changes at synapses, the result of initial protein synthesis and growth, sustain memory without the need for ongoing biochemical activity dedicated to storing information" or for the statement in the Discussion that the structural model of memory storage is the standard account.

      (1) Numerical data used in statistical analyses are now provided for LTP experiments in Figure 4 figure supplement 1. Numerical values for all other experiments are presented in the figures.

      (2) Blind procedures were performed for all experiments except for LTP experiments that involved the transfection of eGFP as control, as the eGFP could be detected visually in the hippocampal slice by the experimenter. This is now clarified in the Statistics section of the Methods.

      (3) The description of immunoblotting was clarified in the Methods, and actin loading controls presented for all immunoblots in Figure 1 and Figure 1 figure supplements 1 and 2.

      (4) Short-term memory (Figure 5B) is now determined by 2 methods. First, we show that for both groups the times to enter the shock zone increase in the first training trial, as compared to the pretraining session with the shock off. The increases are not different between the groups. Second, we show increases of the maximal avoidance time from pretraining to trial 1 for both groups are significant, and that the increases are not different. These data show that short-term memory was present in both groups and not measurably different between the groups.

      (5) The fonts of the figure labels were enlarged.

      (6) The comparisons between pretraining and training trial 1 and between training trials 1 and 3 for the two groups are now shown in Figure 5B.

      (7) We abbreviated our discussion of the structural model, which is now presented at the end of the Discussion (as per Reviewer 1), and removed the comment that it is the prevailing view, stating instead that the hypothesis is “widely held.”

      Additional points: As requested, the timing of tamoxifen injections and tissue collection for immunohistochemistry is clarified in the protocol schematic of a new Figure 2A and Figure 2A legend.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This important study examines the evolution of virulence and antibiotic resistance in Staphylococcus aureus under multiple selection pressures. The evidence presented is convincing, with rigorous data that characterizes the outcomes of the evolution experiments. However, the manuscript's primary weakness is in its presentation, as claims about the causal relationship between genotypes and phenotypes are based on correlational evidence. The manuscript needs to be revised to address these limitations, clarify the implications of the experimental design, and adjust the overall narrative to better reflect the nature of the findings.

      Thank you for your feedback. Here, we summarize the major changes made in the revised manuscript:

      (1) We did not test causality between mutations and phenotypes in our study. We were intentional about not using causal wording (“mutation X caused/led to/resulted in phenotype Y”), and only discussed these results using the terms “correlation” and “association”, and only when they were statistically significant. We understand that some readers may view these terms as being equivalent to “causation”, thus in the revision, we have modified our wording as suggested (please see below for specific lines).

      (2) We agree that experimental evolution in nematodes is not a direct simulation of evolution in humans. The goal of our study was, first and foremost, a test of how multiple selective pressures can shape pathogen evolution. This point was presented in the first paragraph, the second to last paragraph of the Introduction (which included our hypotheses), and the last paragraph of the manuscript. References to humans and other mammalian systems were intended to point out similarities between our findings and what had already been found in S. aureus outside the lab. Despite differences between mammals and nematodes, several parallels arose at both the phenotypic and genomic levels, which is interesting from an evolutionary standpoint. We understand that more experiments and tests would be needed before we can make claims about the selective pressures acting on S. aureus outside the lab. We presented some information in the context of humans because a large part of the literature on S. aureus is on its role as a major bacterial pathogen; we did not want to neglect this aspect of its natural life history.

      In the revised manuscript, we are more explicit in stating these points, as well as tempering some language regarding human infection, and removing some references to humans. Please see below for specific lines as well as justification for specific references to humans/mammalian systems.

      (3) We have included additional details on the experimental design below. We hope this is sufficiently clarifying.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors investigate how methicillin-resistant (MRSA) and sensitive (MSSA) Staphylococcus aureus adapt to a new host (C. elegans) in the presence or absence of a low dose of the antibiotic oxacillin. Using an "Evolve and Resequence" design with 48 independently evolving populations, they track changes in virulence, antibiotic resistance, and other fitness-related traits over 12 passages. Their key finding is that selection from both the host and the antibiotic together, rather than either pressure alone, results in the evolution of the most virulent pathogens. Genomically, they find that this adaptation repeatedly involves mutations in a small number of key regulatory genes, most notably codY, agr, and saeRS.

      Strengths:

      The main advantage of the research lies in its strong and thoroughly replicated experimental framework, enabling significant conclusions to be drawn based on the concept of parallel evolution. The study successfully integrates various phenotypic assays (virulence, growth, hemolysis, biofilm formation) with whole-genome sequencing, offering an extensive perspective on the adaptive landscape. The identification of certain regulatory genes as common targets of selection across distinct lineages is an important result that indicates a level of predictability in how pathogens adapt.

      Thank you very much.

      Weaknesses:

      (1) The main limitation of the paper is that its findings on the function of specific genes are based on correlation, not cause-and-effect evidence. While the parallel evolution evidence is strong, the authors have not yet performed the definitive tests (i.e., reconstruction of ancestral genes) to ensure that the mutations identified in isolation are enough to account for the virulence or resistance changes observed. This makes the conclusions more like firm hypotheses, not confirmed facts.

      We have replaced instances of “association” and “correlation” with wording similar to that suggested where applicable, including:

      L 342 – 344: “The loss of SCCmec and ACME was more often identified in populations exhibiting an increase in total growth from the ancestor outside the host…”

      L 371 – 375: “Mutations in three genes were regularly identified in populations exhibiting significant increases in virulence from the ancestor: codY, gdpP, and pbpA. Mutations in agr in general were not associated with changes in overall virulence, but MSSA populations harboring mutations in this gene were more likely to exhibit greater virulence compared to MRSA populations (Wilcoxon rank sum exact test P = 0.045).”

      L 377: “Mutations in specific genes were often found in populations able to hemolyze red blood cells…”

      L 379 – 381: “There were also significant differences between the mutations regularly identified in oxacillin-resistant populations evolved from the MSSA ancestor...”

      L 384 – 385: “By contrast, mutations in agr were often in populations exhibiting loss of hemolytic activity, consistent with previous findings...”

      L 409 – 410: “Mutations that arose during experimental evolution are regularly found in strains associated with human systemic infections.”

      We have also stated that ancestral reconstruction is needed:

      L 553 – 555: “Future experiments may include introducing these mutations into the ancestral background to directly link the mutations in these genes to evolved virulence.”

      (2) In some instances, the claims in the text are not fully supported by the visual data from the figures or are reported with vagueness. For example, the display of phenotypic clusters in the PCA (Figure 6A) and the sweeping generalization about the effect of antibiotics on the mutation rates (Figure S5) can be more precise and nuanced. Such small deviations dilute the overall argument somewhat and must be corrected.

      In reference to Fig. 6A, we have revised the statement as suggested: “…where populations exposed to host and sub-MIC oxacillin clustered together, largely separating from all other treatments…” Line 442

      In reference to Fig. S5, we conducted statistics to include both MRSA and MSSA populations and examined the effect of oxacillin on the number of mutations. While oxacillin had a significant effect on the number of mutations, we agree with the reviewer that this may be driven by the MRSA populations and have clarified: “Sub-MIC oxacillin selection also resulted in more mutations than in its absence ( = 5.92, P = 0.015), although this is likely driven by MRSA populations.” Lines 310 – 311

      Reviewer #2 (Public review):

      Summary:

      The manuscript describes the results of an evolution experiment where Staphylococcus aureus was experimentally evolved via sequential exposure to an antibiotic followed by passaging through C. elegans hosts. Because infecting C. elegans via ingestion results in lysis of gut cells and an immune response upon infection, the S. aureus were exposed separately across generations to antibiotic stress and host immune stress. Interestingly, the dual selection pressure of antibiotic exposure and adaptation to a nematode host resulted in increased virulence of S. aureus towards C. elegans.

      Strengths:

      The data presented provide strong evidence that in S. aureus, traits involved in adaptation to a novel host and those involved in antibiotic resistance evolution are not traded off. On the contrary, they seem to be correlated, with strains adapted to antibiotics having higher virulence towards the novel host. As increased virulence is also associated with higher rates of haemolysis, these virulence increases are likely to reflect virulence levels in vertebrate hosts.

      Weaknesses:

      Right now, the results are presented in the context of human infections being treated with antibiotics, which, in my opinion, is inappropriate. This is because

      (1) exposure to the host and antibiotics was sequential, not simultaneous, and thus does not reflect the treatment of infection, and

      (2) because the site of infection is different in C. elegans and human hosts.

      We have removed the two sentences referencing site of infection:

      Introduction: “In the host, antibiotic concentrations will gradually decline after administration due to metabolism and excretion.”

      Discussion: “…in addition to infection of antibiotic-treated hosts, where there is uneven distribution of drugs across tissues.”

      For our rationale for discussing humans in general, please see below.

      Nevertheless, the results are of interest; I just think the interpretation and framing should be adjusted.

      Thank you very much.

      Reviewer #3 (Public review):

      Summary:

      Su et al. sought to understand how the opportunistic pathogen Staphylococcus aureus responds to multiple selection pressures during infection. Specifically, the authors were interested in how the host environment and antibiotic exposure impact the evolution of both virulence and antibiotic resistance in S. aureus. To accomplish this, the authors performed an evolution experiment where S. aureus was fed to Caenorhabditis elegans as a model system to study the host environment and then either subjected to the antibiotic oxacillin or not. Additionally, the authors investigated the difference in evolution between an antibiotic-resistant strain, MRSA, and an isogenic susceptible strain, MSSA. They found that MRSA strains evolved in both antibiotic and host conditions became more virulent, and that strains evolved outside these conditions lost virulence. Looking at the strains evolved in just antibiotic conditions, the authors found that S. aureus maintained its ability to lyse blood cells. Mutations in codY, gdpP, and pbpA were found to be associated with increased virulence. Additionally, these mutations identified in these experiments were found in S. aureus strains isolated from human infections.

      Strengths:

      The data are well-presented, thorough, and are an important addition to the understanding of how certain pathogens might adapt to different selective pressures in complex environments.

      Thank you very much.

      Weaknesses:

      There are a few clarifications that could be made to better understand and contextualize the results. Primarily, when comparing the number of mutations and selection across conditions in an evolution experiment, information about population sizes is important to be able to calculate the mutation supply and number of generations throughout the experiment. These calculations can be difficult in vivo, but since several steps in the methodology require plating and regrowth, those population sizes could be determined. There was also no mention of how the authors controlled the inoculation density of bacteria introduced to each host. This would need to be known to calculate the generation time within the host. These caveats should be addressed in the manuscript.

      While the population sizes within hosts and generation time could be determined, we would need to conduct additional experiments (e.g., infecting nematodes with S. aureus, then crushing, plating, and counting colony forming units across time intervals) in order to obtain measurements for pathogen growth in hosts across time. For experimental evolution, we crushed a set number of dead nematodes (30) and all bacteria that were released were allowed to grow in liquid media before an aliquot (25%) was used to seed the next passage. Picking and crushing nematodes across 48 populations for one time point was an arduous task. The additional steps of picking, crushing, and plating nematodes across multiple time intervals at the same time experimental evolution was being performed would not be logistically sound.

      In terms of the inoculation density of bacteria, all nematodes were placed on abundant lawns of S. aureus. Nematodes were exposed to full lawns for the entire infection step; bacteria remained in abundance. While we do not know the exact inoculum each individual nematode was exposed to, we know that they ingested the bacteria because of the high mortality rate. Furthermore, we followed the same procedure for every replicate across every host-associated treatment. Host individuals within and across passages were also genetically identical to one another. Altogether, these factors allowed for more consistency across the experiment, such that relative inoculum size should be similar across individual hosts. Please refer to the evolution experiment diagram (Author response image 1) for more details.

      Ultimately, while knowing the absolute population size, inoculum size, and generation time within the host is interesting, the number of rounds of selection (the number of times each population was exposed to the selective pressures) is also important in addressing our major question. Every treatment, which started out from one ancestral clone (MRSA or MSSA), was exposed to the same number of bouts of selection (passages), yet we see significant divergence in terms of traits and mutations. Future directions would certainly involve determining the number of steps (e.g., number of generations within hosts) required to reach these end points, but not knowing exactly how many steps were required does not detract from addressing the larger question of determining how pathogens respond to multiple selective pressures.
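      To illustrate the kind of calculation the reviewer has in mind, the following is a minimal sketch and not an analysis performed in the study. It assumes, purely for illustration, that each population regrows to its pre-transfer size after the 25% aliquot is taken; under that assumption the dilution alone implies a floor on the number of doublings per passage. In-host growth and host-imposed bottlenecks are ignored, and the function names are hypothetical.

```python
import math


def min_doublings_per_regrowth(transfer_fraction: float) -> float:
    """Doublings needed for a transferred aliquot to regrow to the pre-transfer size.

    Assumption (not from the manuscript): the population returns to the same
    size before every transfer.
    """
    if not 0 < transfer_fraction <= 1:
        raise ValueError("transfer_fraction must be in (0, 1]")
    return math.log2(1 / transfer_fraction)


def total_min_doublings(transfer_fraction: float, passages: int) -> float:
    """Floor on cumulative doublings across all passages under the same assumption."""
    return passages * min_doublings_per_regrowth(transfer_fraction)


# 25% aliquot per passage and 12 passages, as described in the text.
print(min_doublings_per_regrowth(0.25))  # 2.0
print(total_min_doublings(0.25, 12))     # 24.0
```

      The true number of generations would be higher wherever growth occurs within hosts or on lawns, which is why direct colony counts would be needed for an actual estimate of mutation supply.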

      Another concern is the number of generations the populations of S. aureus spent either with relaxed selection in rich media or under antibiotic pressure in between the host exposure periods. It is probable then that the majority of mutations were selected for in these intervening periods between host infection. Again, a more detailed understanding of population sizes would contribute to the understanding of which phase of the experiment contributed to the mutation profile observed.

      We conducted every step of the evolution experiment on the same timeline. For example, all replicates across treatments were grown in liquid media at the same time (see Author response image 1). All populations were exposed to the same selective pressures at this step of the experiment. We can then compare populations that were subsequently exposed to hosts against those that were not. Populations passaged without a host served as the control. Mutations that were unique to host-exposed populations would more likely contribute to the traits of interest, compared to mutations that were in common between the host-exposed and no-host treatments. Similar comparisons could be made with the oxacillin-exposed and no-oxacillin populations.

      In general, the only differences between treatments would be driven by the treatments themselves. Given that we are interested in treatment-level effects, any differences in population size or generation time between treatments could contribute to the treatment effects we observe, and thus were not something we aimed to hold uniform across our experiment.
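      The control comparison described above amounts to a set difference: genes mutated in at least one treated population but in no control population. The sketch below is only an illustration of that logic; the population labels and gene names are hypothetical placeholders, not data from the study, and the function name is an assumption.

```python
def treatment_specific_genes(treated: dict, control: dict) -> set:
    """Genes mutated in at least one treated population and in no control population.

    Each argument maps a population label to the set of genes mutated in it.
    """
    treated_genes = set().union(*treated.values())
    control_genes = set().union(*control.values())
    return treated_genes - control_genes


# Hypothetical placeholder data, for illustration only.
host_exposed = {"pop1": {"geneA", "geneB"}, "pop2": {"geneA", "geneC"}}
no_host = {"pop3": {"geneB"}, "pop4": {"geneD"}}

print(sorted(treatment_specific_genes(host_exposed, no_host)))  # ['geneA', 'geneC']
```

      The same function applies to the oxacillin-exposed versus no-oxacillin comparison by swapping in the corresponding populations.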

      Author response image 1.

      Schematic of procedural steps involved in one passage of S. aureus through nematodes (+host -ox) compared to without nematodes (-host -ox).

      Recommendations for the authors:

      Reviewing Editor Comments:

      We encourage you to address all other comments raised by the reviewers; however, the review team has identified the following points as the most critical and fundamental to improve your manuscript:

      (i) Reframing the narrative: You will need to adjust the narrative so that the study is presented as a "proof of principle" rather than a direct simulation of a human infection.

      While we referenced human infection, we believe the study had been presented as a proof of principle. Examples include:

      (1) We discussed the gap of knowledge in the first paragraph: “It is unclear how virulence evolves in the face of more than one selective pressure and whether this trait is constrained or facilitated by antibiotic resistance.” Lines 86 – 88

      (2) In the second to last paragraph in the Introduction, we presented the main hypotheses: “Adaptation may require resources to be expended toward either virulence or antibiotic resistance, leading to a trade-off between these traits (Ferenci, 2016). Alternatively, weaker selection from sub-MIC antibiotics may interact synergistically with hosts and facilitate the evolution or maintenance of high virulence and antibiotic resistance.” Lines 176 – 179

      (3) The last paragraph concluded with “Our findings ultimately emphasize the importance of considering the host context in the evolution of antibiotic resistance. Integrating multiple traits, such as virulence, antibiotic resistance, and fitness may be critical in identifying the factors that facilitate host shifts and persistence of drug-resistant pathogens.” Lines 613 – 616

      These paragraphs, which set up the context for our work, did not primarily discuss human infections.

      In the revised manuscript, we have further tempered language regarding human infection:

      L 169 - 172: “Experimentally evolving S. aureus in C. elegans thus allows us to track the early stages of virulence and antibiotic resistance evolution in novel host populations with the potential to identify conserved genomic regions underlying evolved traits.”

      L 595 – 596: “Additional direct tests are needed to evaluate the role of these mutations in adaptation of S. aureus to different infection sites.”

      L 610 – 611: “Pathogen evolution in a tractable invertebrate animal model yielded phenotypes and genotypes similar to those identified in mammalian hosts, highlighting the utility of evolution experiments to identify potential ecological and genetic mechanisms that may give rise to pathogen traits conserved across systems.”

      And removed some references to humans:

      In the Introduction: “In the host, antibiotic concentrations will gradually decline after administration due to metabolism and excretion.”

      In the Discussion: “…in addition to infection of antibiotic-treated hosts, where there is uneven distribution of drugs across tissues.”

      Otherwise, our rationale for referencing humans/mammalian systems in our Introduction includes:

      Setting the context of our study system: we discussed humans and clinical significance when we first introduced S. aureus (lines 132 – 151) and experimental evolution (lines 153 – 172). Much of what is known about S. aureus outside the lab concerns its interactions with humans, thus we wove in relevant information that has been discovered in other organisms.

      Hemolysis: This ability is important for S. aureus virulence toward C. elegans (Sifri et al., 2003).

      S. aureus genomic database: we intended to leverage this large-scale database of genomes isolated from S. aureus outside the lab to compare patterns emerging from experimental evolution to those in existing isolates. Due to its relevance as a major bacterial pathogen, most of the isolates happen to be from clinical settings.

      (ii) Adjusting the causal language: You will need to soften the language so that correlational claims do not appear to be causal.

      We have adjusted language as noted above.

      (iii) Clarifying methodological aspects: You will need to provide more details on the methodology, such as population sizes, and clarify the implications of these in the conclusions of the work.

      We have provided additional explanation of methodology and the role of control (no host) treatments above.

      Reviewer #1 (Recommendations for the authors):

      The paper is robust, and the study is of great significance. Tackling the subsequent issues would greatly enhance the paper and elucidate its findings.

      Major Recommendations:

      (1) Revising Causal Language: The main flaw of the manuscript lies in its presentation of correlational data as if it were causal. We highly suggest a thorough review of the text to soften causal language when connecting genotypes to phenotypes. The absence of ancestral reconstruction should be recognized as a constraint. Assertions ought to be presented as robust, evidence-based hypotheses. For instance, rather than saying a mutation "associated with significant increases in virulence," you might say "was regularly identified in groups that developed increased virulence, strongly suggesting this gene's role in the adaptation." This will more precisely clarify the contribution of the work.

      We have softened language and stated that ancestral reconstruction is needed as noted above.

      (2) Expand on Parallel Mutations: The examination of parallel evolution in Figure 4A is intriguing but would be notably stronger with additional details. I suggest including an additional supplementary figure or table detailing the specific non-synonymous mutations identified in the highly parallel genes (e.g., codY, agr, gdpP). It is essential for the reader to understand whether parallel evolution is happening at the gene level (different mutations in a single gene) or at the nucleotide level (the precise same mutation appearing again). Kindly specify if any of these mutations were nonsense mutations, as this suggests that the loss-of-function is advantageous.

      The full table of mutations is in figshare (10.6084/m9.figshare.28745558). We have added a Supplemental Table (Table S2) containing mutations in genes occurring in more than two populations. Many of these mutations were not the same, indicating parallel evolution at the gene level (lines 315 – 317).

      Minor Recommendations for Clarity and Accuracy:

      (1) Introduction:

      Lines 176-177: Please add a citation for the statement describing the function of the SCCmec cassette, as this is established knowledge.

      Done.

      (2) Results:

      Section Title (Line 254): The title "Host and sub-MIC antibiotic promoted growth..." is imprecise. Figure 3B shows that it is the combination of these factors that promotes growth in MRSA, while oxacillin alone is detrimental. Please revise the title to reflect this synergistic effect.

      “Synergistically” has been added to the title: “Host and sub-MIC antibiotic synergistically promoted growth of MRSA…” Lines 269 – 270

      Lines 261-263: The description of Figure 3B is incomplete. The text should explicitly state that the -host+ox treatment resulted in the lowest growth for MRSA, which provides a critical contrast and suggests a fitness cost.

      We have added “By contrast, exposure to sub-MIC oxacillin alone yielded the lowest growth, suggesting a fitness cost.” Lines 277 – 278

      Line 294: The claim that "Sub-MIC oxacillin selection also resulted in more mutations" is a generalization not supported for the MSSA genotype, according to Figure S5. Please revise this sentence to specify that this effect was observed in the MRSA populations.

      We have clarified: “Sub-MIC oxacillin selection also resulted in more mutations than in its absence ( = 5.92, P = 0.015), although this is likely driven by MRSA populations.” Lines 310 – 311

      Lines 419-421: The claim that the +host+ox populations in Figure 6A "formed a distinct cluster" is an overstatement, as there is visible overlap with one other treatment (e.g., host-ox). Please revise this to more accurately describe the visual data (e.g., "clustered together, largely separating...").

      We have revised the statement as suggested: “…where populations exposed to host and sub-MIC oxacillin clustered together, largely separating from all other treatments…” Lines 442 – 443

      Lines 422-424: The interpretation of the MRSA PCA (Figure 6A) focuses on the correlation between virulence and sub-MIC growth. However, the correlation between "biofilm production" and "growth without oxacillin" appears visually stronger. Please address this correlation as well for a more complete interpretation.

      We have added “For MRSA populations, biofilm production and growth without oxacillin also appeared to be positively correlated.” Lines 447 – 448

      (3) Discussion:

      Lines 469-470: The statement that "exposure to oxacillin resulted in pathogens causing the greatest host mortality" is imprecise. The data in Figure 2A show that it is the combination of host and oxacillin. Please revise this for accuracy and add a direct citation to Figure 2A here.

      We have added clarification: “Nonetheless, we observed differing evolutionary trajectories, where exposure to oxacillin in host-associated treatments resulted in pathogens causing the greatest host mortality.” Lines 496 – 498

      Reviewer #2 (Recommendations for the authors):

      After reviewing the paper and reading the previous reviews from PLoS Biology, my biggest criticism of the paper is the way the story is told. In principle, the results are interesting and relevant, but the analogy to human infection and immune system/ antibiotic treatment strategies does not fit entirely with the experimental design or the results. I think the motivation needs to be reframed. In the study, antibiotic exposure is purely environmental, i.e., not in the host. How does environmental antibiotic use affect in vivo evolution, as this is not tested? As previous reviewers have pointed out, S. aureus is not an enteric pathogen in humans but most often causes skin infections. Furthermore, much of the results and discussion is focused on haemolysis of red blood cells, a cell type that C. elegans does not have. What the paper does present, on the other hand, and something that is interesting and novel, is a test in a model system of how a bacterial pathogen evolves to competing selection pressures. I might have hypothesised a priori that these competing pressures result in trade-offs, something which there is no evidence of, even though growth rate does not appear to be negatively impacted as a consequence of selection for drug resistance and virulence together. Instead, many traits are correlated and seemingly at the mechanistic level. This is cool and is a proof of principle, even if the system does not completely mirror reality, and I think the story should be told as such.

      We agree entirely with the reviewer that testing how pathogens respond to multiple selective pressures and the resulting lack of trade-offs are significant and interesting. We presented this question (lines 86 – 88) and our hypothesis about such trade-off in the Introduction (lines 176 – 179). As stated above, we had framed our paper to highlight these points and have removed references to antibiotic concentrations in treated humans.

      We measured and discussed hemolysis because it is important for virulence toward C. elegans (lines 195 – 197) (Sifri et al., 2003). We believe our manuscript contained a reasonable discussion of this trait. For example, three panels of the main figures presented the main hemolysis results (Figures 2B, 2C, and 2D), whereas 23 other panels did not involve hemolysis at all. In the Discussion, hemolysis took up half of the shortest paragraph (lines 509 – 519) and an additional sentence (lines 589 – 591), out of seven total paragraphs.

      Specific comments:

      (1) L137-138. Can S. aureus really survive for long periods of time outside of the host? Can you clarify this statement? Do you mean it is an opportunistic pathogen and can also replicate in the environment?

      S. aureus can form biofilms and persist for weeks on inert surfaces (Kramer et al., 2024; Tran et al., 2023), indicating that it may replicate in non-host environments. We have included the phrase “opportunistic pathogen” to clarify (line 145).

      (2) L187 - to ascertain

      Corrected.

      (3) Figure 2B - there seems to be a benefit of haemolysis activity to oxacillin resistance, perhaps a crossover in mechanism? In MSSA, without a host, it goes to complete fixation, whereas it is completely lost when antibiotics aren't present. I know this is discussed later, but I would appreciate a more detailed hypothesis of why this could be.

      Antibiotics have been found to induce expression of virulence traits, such as in the case of oxacillin and hemolysis. Thus, it is reasonable that exposure to oxacillin during evolution would maintain MSSA’s hemolytic ability. We hypothesize that the loss of hemolysis in the absence of oxacillin may be due to the cost of hemolysis expression; without a stimulant (oxacillin), hemolysis may not be expressed as often and may be subject to deleterious mutations. Alternatively, the stress that cells were under, rather than the direct action of the antibiotic, may have favored virulence in some way.

      (4) L225-228 - As C. elegans do not have red blood cells, why would we expect this? Do you see increased lysis of C. elegans gut cells? Or could it be due to iron accumulation as you are growing the staph on BHI?

      We measured and correlated nematode mortality with hemolytic ability because hemolysis had been found to be involved in virulence toward C. elegans (Sifri et al., 2003). The hemolysis phenotype is a surrogate for S. aureus virulence gene expression.

      (5) Figure 3A - There seems to be a growth cost of evolving oxacillin resistance in the absence of a host. Why might this be?

      MRSA populations exposed to oxacillin without a host during evolution visually exhibited the lowest growth rate. While this is an interesting question, the result was not statistically significant, so we cannot speculate in the manuscript.

      Reviewer #3 (Recommendations for the authors):

      (1) Some claims in the introduction are either not cited or not correctly stated. The second sentence has a claim about the interplay between antibiotic resistance and virulence with no citation listed. Additionally, there is a claim about S. aureus "evading detection" by attacking the host's immune cells. That is by definition not avoiding detection. Perhaps phrasing it as resisting host immune function would make it clearer.

      We have added a citation (lines 80 – 81) and clarified our wording: “Once inside the host, S. aureus resists host immune function by hindering or lysing immune cells.” Lines 140 – 141

      (2) Once in the introduction and in the discussion, the authors referred to S. aureus as a novel pathogen for C. elegans, I do not think enough is known to make this statement.

      This S. aureus strain is novel because it was isolated from humans, so at least in its recent evolutionary past, it has not interacted with C. elegans. Furthermore, we used a C. elegans isolate (N2) that had been frozen and maintained in the lab on E. coli, and had not been exposed to other microbes in its recent evolutionary past. Finally, S. aureus has not been found to be a native pathogen of C. elegans in nature (Ekroth et al., 2021).

      (3) Key suggestion: Change Figure 1C to reflect the design better. So you could have the +OXA before the host and then have an arrow looping back again to show the cycle of each step. So a figure that would have something like: MRSA > +OXA > +host>+OXA --> MRSA .

      We have updated the figure as suggested.

      (4) Suggest changing "greatest" on line 191, section header to greater.

      Done.

      (5) Line 258: Rich media can still provide selective pressures that are difficult to quantify - fast growth, cofactor and other nutrient limitations due to that fast growth

      We have adjusted our wording: “Importantly, rich media reduced the risk of introducing additional selective pressures than those being tested.” Lines 273 – 274

      (6) Why were intergenic mutations routinely ignored? These can often be very important phenotypically.

      We had focused on genes because there was a sufficient number of genes to discuss, but we have added a Supplemental Table (Table S2) containing all mutations (including intergenic and synonymous) appearing in more than 2 populations. We have also added information regarding mecA, an accessory gene, highlighting the role non-core genes may have in shaping bacterial evolution:

      “Despite evolving in similar environments, MRSA and MSSA populations differing only in the presence of an intact accessory gene (mecA)—proceeded on divergent evolutionary paths…” Lines 66 – 68

      “Carriage of Staphylococcal cassette chromosome mec (SCCmec), which encodes mecA, an accessory gene that provides resistance…” Lines 187 – 188

      “As MRSA and MSSA only differed in the presence of an intact mecA gene at the start of the experiment, accessory genes may play important roles in shaping bacterial evolution (Jackson et al., 2011).” Lines 472 – 474

      (7) Line 294: more mutations than what?

      We have clarified the sentence: “Sub-MIC oxacillin selection also resulted in more mutations than in its absence…” Lines 310 – 311

      (8) Lines 295-297: wording is pretty confusing. It seems that the discussion is about increased mutation rates, possibly due to hypermutators resulting from mutL or recA mutations, but this isn't well-thought out and much is implied here. Furthermore, see the above comment about comparing mutations across conditions - it's hard to make inferences of mutation rates without knowing the mutation supply as a result of varying population sizes across conditions and through the experiment.

      We have clarified the sentence: “…there were only two mutations in DNA and mismatch repair genes (mutL and recA), suggesting repair genes were not the sole mechanism involved.” Lines 313 – 314

      Because all populations evolved from one ancestral clone (either MRSA or MSSA), all mutations that are found at the end of the experiment would have arisen de novo from that ancestor. Since all populations experienced the same number of passages/rounds of selection, we determined mutation rate by counting the number of mutations that were found at the last passage for each replicate population. Populations that acquired significantly more mutations had a higher mutation rate in terms of # of mutations/# of selection rounds.
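
      A minimal sketch of the count-based measure described above (mutations found at the final passage divided by the number of selection rounds). The function name, population labels, counts, and number of rounds are hypothetical placeholders for illustration, not values from the study, and the sketch does not account for differences in population size between conditions.

```python
def mutations_per_round(mutation_count: int, selection_rounds: int) -> float:
    """Return the number of mutations accumulated per round of selection."""
    if selection_rounds <= 0:
        raise ValueError("selection_rounds must be positive")
    return mutation_count / selection_rounds


# Hypothetical example values (assumptions, not data from the manuscript).
example_counts = {"population_A": 12, "population_B": 3}
ROUNDS = 10  # hypothetical number of passages shared by all populations

# Because every population experienced the same number of rounds,
# ranking by rate is equivalent to ranking by raw mutation count.
rates = {name: mutations_per_round(n, ROUNDS) for name, n in example_counts.items()}
```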

      (9) Line 486: typo "Mutations genes".

      Corrected.

      (10) Line 487: "antibiotics may allow" is awkward; suggest changing to more precise language, possibly relating to pleiotropy if that is what was meant here.

      We had intended to mean “adaptation [to antibiotics] may allow”. We have clarified: “Mutations in genes involved in resistance to antibiotics were found more often in populations with increased virulence, suggesting that antibiotic adaptation may also favor evolution of virulence.” Lines 514 – 516

      REFERENCES

      Ekroth AKE, Gerth M, Stevens EJ, Ford SA, King KC. 2021. Host genotype and genetic diversity shape the evolution of a novel bacterial infection. ISME Journal 15:2146–2157. DOI: https://doi.org/10.1038/s41396-021-00911-3, PMID: 33603148

      Kramer A, Lexow F, Bludau A, Köster AM, Misailovski M, Seifert U, Eggers M, Rutala W, Dancer SJ, Scheithauer S. 2024. How long do bacteria, fungi, protozoa, and viruses retain their replication capacity on inanimate surfaces? A systematic review examining environmental resilience versus healthcare-associated infection risk by “fomite-borne risk assessment.” Clinical Microbiology Reviews. PMID: 39388143

      Sifri CD, Begun J, Ausubel FM, Calderwood SB. 2003. Caenorhabditis elegans as a model host for Staphylococcus aureus pathogenesis. Infection and Immunity 71:2208–2217. DOI: https://doi.org/10.1128/IAI.71.4.2208-2217.2003, PMID: 12654843

      Tran NN, Morrisette T, Jorgensen SCJ, Orench-Benvenutti JM, Kebriaei R. 2023. Current therapies and challenges for the treatment of Staphylococcus aureus biofilm-related infections. Pharmacotherapy 43:816–832. DOI: https://doi.org/10.1002/phar.2806, PMID: 37133439

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Gosselin et al. develop a method to target protein activity using synthetic single-domain nanobodies (sybodies). They screen a library of sybodies using ribosome/phage display generated against Bacillus Smc-ScpAB complex. Specifically, they use an ATP hydrolysis deficient mutant of SMC so as to identify sybodies that will potentially disrupt Smc-ScpAB activity. They next screen their library in vivo, using growth defects in rich media as a read-out for Smc activity perturbation. They identify 14 sybodies that mirror the smc deletion phenotype, including defective growth in fast-growth conditions as well as chromosome segregation defects. The authors use a clever approach by making chimeras between Bacillus and S. pneumoniae Smc to narrow down to specific regions within the Bacillus Smc coiled-coil that are likely targets of the sybodies. Using ATPase assays, they find that the sybodies either impede DNA-stimulated ATP hydrolysis or hyperactivate ATP hydrolysis (even in the absence of DNA). The authors propose that the sybodies may likely be locking Smc-ScpAB in the "closed" or "open" state via interaction with the specific coiled-coil region on Smc. I have a few comments that the authors should consider:

      Major comments:

      (1) Lack of direct in vitro binding measurements:

      The authors do not provide measurements of sybody affinities, binding/unbinding kinetics, or stoichiometries with respect to Smc-ScpAB. Additionally, do the sybodies preferentially interact with Smc in the ATP/DNA-bound state? And do the sybodies affect the interaction of ScpAB with SMC?

      It is understandable that such measurements for 14 sybodies are challenging, and not essential for this study. Nonetheless, it would be informative to have biochemical characterization of sybody interaction with the Smc-ScpAB complex for at least 1-2 candidate sybodies described here.

      We agree with the reviewer that adding such data would be reassuring and that obtaining solid data using purified components is not trivial, even for a smaller selection of sybodies. We have now incorporated ELISA data as new Table S1, which shows that most sybodies support clear binding to Smc-ScpAB. Curiously, while (only) some sybodies show a clear preference for ATP-bound or unbound Smc, this is not a strong predictor of the strength of phenotype observed in vivo. We have also attempted to characterize the binding of Smc to sybodies by other methods including pull-downs, cross-linking, and by biophysical methods (GCI). However, we prefer to not include these data as the outcomes are not clear due to inconsistencies in the behaviour of purified sybodies.

      (2) Many modes of sybody binding to Smc are plausible

      The authors provide an elaborate discussion of sybodies locking the Smc-ScpAB complex in open/closed states. However, in the absence of structural support, the mechanistic inferences may need to be tempered. For example, is it not also possible for the sybodies to bind the inner interface of the coiled-coil, resulting in steric hindrance to coiled-coil interactions? It is also possible that sybody interaction disrupts ScpAB interaction (as data ruling this possibility out have not been provided). Thus, other potential mechanisms would be worth considering/discussing. In this direction, did AlphaFold reveal any potential insights into putative binding locations?

      We have attempted to map the binding by structure prediction; however, so far, even the latest versions of AlphaFold are not able to clearly delineate the binding interface that we have confidently identified by the mapping using chimeric proteins. Indeed, many ways of binding are possible, including disruption of ScpAB interaction. However, since the mapped binding sites are located on the SMC coiled coils, the latter scenario seems unlikely and would be an indirect consequence of altered coiled coil configuration, consistent with our current interpretation.

      (3) Sybody expression in vivo

      Have the authors estimated sybody expression in vivo? Are they all expressed to similar levels?

      We have tagged selected sybodies with gfp and performed live cell imaging. This shows that sybodies without strong phenotypes are similarly expressed at least at low inducer concentration. Moreover, many sybodies localize as foci in the cell presumably by binding to Smc complexes loaded onto the chromosome at ParB/parS sites. We have included example data in the revised version of the manuscript as Figure S4 and Figure S5. Notably, a sybody (Sb007) with a weak growth phenotype shows focal localization at low inducer concentration and high expression levels when fully induced, comparable to sybodies with strong phenotypes. Altogether, this suggests that the lack of phenotype is not due to absence of sybody expression or localization.

      (4) Sybodies should phenocopy ATP hydrolysis mutant of Smc

      The sybodies were screened against an ATP hydrolysis deficient mutant of Smc, with the rationale that these sybodies would interfere with this step of the Smc duty cycle. Does the expression of the sybodies in vivo phenocopy the ATP hydrolysis deficient mutant of Smc? Could the authors consider any phenotypic read-outs that can indicate whether the sybody action results in an smc-null effect or specifically an ATP hydrolysis deficient effect?

      As alluded to above, we think that our selection gave rise to sybodies that bind various, possibly multiple Smc conformations. Consistent with this idea, the phenotypes of sybody expression are similar to those of the null mutant rather than the ATP-hydrolysis defective EQ mutant, which displays even more severe growth phenotypes in B. subtilis. To highlight this point, we have added the following notes to the text:

      “These conditions favour ATP-engaged particles alongside the typically predominant ATP-disengaged rod-shaped state.”

      “ELISA data revealed that nearly all clones bind purified Smc-ScpAB (Table 1). However, the ELISA signals of only few Sybodies showed clear dependence on the presence or absence of ATP and DNA (Table S1).”

      Significance:

      Overall, this is an impressive study that uses an elegant strategy to find inhibitors of protein activity in vivo. The manuscript is clearly written and the experiments are logical and well-designed. The findings from the study will be significant to the broad field of genome biology, synthetic biology and also SMC biology. Specifically, the coiled-coil domain of SMC proteins has been proposed to be of high functional value. The authors have elegantly identified key coiled-coil regions that may be important for function and, in parallel, demonstrated the potential of synthetic sybodies/designed binders for inhibition of protein activity.

      Reviewer #2 (Public review):

      Summary:

      Structural Maintenance of Chromosome proteins (SMCs), a family of proteins found in almost all organisms, are organizers of DNA. They accomplish this by a process known as loop extrusion, wherein double-stranded DNA is actively reeled in and extruded into loops. Although SMCs are known to have several DNA binding regions, the exact mechanism by which they facilitate loop extrusion is not understood but is believed to entail large conformational changes. There are currently several models for loop extrusion, including one wherein the coiled coil (CC) arms open, but there is a lack of insightful experimentation and analysis to confirm any of these models. The work presented aims to provide much-needed new tools to investigate these questions: conformation-selective sybodies (synthetic nanobodies) that are likely to alter the CC opening and closing reactions.

      The authors produced, isolated, and expressed sybodies that specifically bound to Bacillus subtilis Smc-ScpAB. Using chimeric Smc constructs, where the coiled coils were partly replaced with the corresponding sequences from Streptococcus pneumoniae, the authors revealed that the isolated sybodies all targeted the same 4N CC element of the Smc arms. This region is likely disrupted by the sybodies either by stopping the arms from opening (correctly) or forcing them to stay open (enough). Disrupting these functional elements is suggested to cause the Smc-dependent chromosome organization lethal phenotype, implying that arm opening and closing is a key regulatory feature of bacterial Smc-ScpAB.

      Significance:

      The authors present a new method for trapping bacterial Smc's in certain conformations using synthetic antibodies. Using these antibodies, they have pinpointed the (previously suggested) 4N region of the coiled coils as an essential site for the opening and closing of the Smc coiled coil arms and that hindering these reactions blocks Smc-driven chromosomal organization. The work has important implications for how we might elucidate the mechanism of DNA loop extrusion by SMC complexes.

      Reviewer #3 (Public review):

      Summary:

      Gosselin et al. use the sybody technology to study effects of in vivo inhibition of the Bacillus subtilis SMC complex. Smc proteins are central DNA binding elements of several complexes that are vital for chromosome dynamics in almost all organisms. Sybodies are selected from three different libraries of the single domain antibodies, using the "transition state" mutant Smc. They identify 14 such mutant sybodies that are lethal when expressed in vivo, because they prevent proper function of Smc. The authors present evidence suggesting that all obtained sybodies bind to a coiled-coil region close to the Smc "neck", and thereby interfere with the Smc activity cycle, as evidenced by defective ATPase activity when Smc is bound to DNA.

      The study is well done and presented and shows that the strategy is very potent in finding a means to quickly turn off a protein's function in vivo, much quicker than depleting the protein.

      The authors also draw conclusions on the molecular mode of action of the SMC complex. They provide a number of suggestive experiments, but in my view mostly indirect evidence for such a mechanism.

      My main criticism is that the authors have used a single - and catalytically trapped form of SMC. They speculate why they only obtain sybodies from one library, and then only identify sybodies that bind to a rather small part of the large Smc protein. While the approach is definitely valuable, it is biassed towards sybodies that bind to Smc in a quite special way, it seems. Using wild type Smc would be interesting, to make more robust statements about the action of sybodies potentially binding to different parts of Smc.

      The reviewer reports (Rev. #1 and Rev. #3) made us realize that the manuscript text was misleading on this point. Although we used the purified ATP hydrolysis–deficient Smc protein for sybody isolation, this is not expected to restrict the selection to a specific conformation. As described in detail in Vazquez-Nunez et al. (Figure 5), this mutant displays the ATP-engaged conformation only in a smaller fraction of complexes (~25% in the presence of ATP and DNA), consistent with prior in vivo observations reported by Diebold-Durand et al. (Figure 5). Rather than limiting the selection to a particular configuration, our aim was to reduce the prevalence of the predominant rod state in order to broaden the range of conformations represented during sybody selection. Consistent with this interpretation, only a small number of isolated sybodies show strong conformation-specific binding in the presence or absence of ATP/DNA, as observed by ELISA (now included in the manuscript). Notably, the effect size of ATP/DNA on ELISA signals was not a strong predictor of the strength of phenotypes observed in vivo. The text has been revised accordingly. See line 84 and line 92.

      We are thus quite confident, based on prior work (and on the now included ELISA data), that the Smc ATPase mutation did not strongly bias the selection in one way or another. The surprising bias towards coiled coil binding sites likely has other explanations, as these sites likely form a preferred epitope recognized by sybodies from the loop library.

      Line 105: Alternatively, the other libraries did not produce good binders or these sybodies were not stably expressed in B. subtilis. This could be tested using Western blotting - I am assuming sybody antibodies are commercially available. However, this test is not important for the overall study, it would just clarify a minor point.

      While there are antibody fragments available to augment the size of sybodies (PMID: 40108246), these recognize 3D-epitopes and are thus not suited for Western blotting. We did not follow up on the negative results of two of the three libraries but would like to point out again that there are several biases that likely emerge for the same reason (bias to library, bias to coiled coil binding site). If correct, then sybodies are likely ineffective in inactivating Smc in B. subtilis, with the notable exceptions of the sybodies that we have isolated and characterized in this manuscript. We have added this notion to the manuscript.

      Fig. 2B: it is odd to count Spo0J foci per cell, as it is clear from the images that several origins must be present within the fluorescent foci. I am fine with the "counting" method, as the images show there is a clear segregation defect when sybodies are expressed; I believe the authors should state, though, that this is not a replication block, but failure to segregate origins.

      We agree that this is an important point. We have added the following statement to clarify this point: “These elongated cells are known to harbour expanded nucleoids, consistent with delayed oriC separation rather than delayed DNA replication”

      Testing binding sites of sybodies to the SMC complex is done in an indirect manner, by using chimeric Smc constructs. I am surprised why the authors have not used in vitro crosslinking: the authors can purify Smc, and mass spectrometry analyses would identify sites where sybodies are crosslinked to Smc. Again, I am fine with the indirect method, but the authors make quite concrete statements on binding based on non-inhibition of chimeric Smc; I can see alternative explanations why a chimera may not be targeted.

      We have made several attempts at testing direct binding with mixed outcomes and decided not to include those results in light of the stronger and more relevant in vivo mapping. However, we have added ELISA results (new Table S1) that support a direct interaction.

      Smc-disrupting sybodies affect the ATPase activity in one of two ways. Again, rather indirect experiments. This leads to the point “Revealing Smc arm dynamics through synthetic binders” in the discussion. The authors are quite careful in stating that their experiments are suggestive of a certain mode of action of Smc, which is warranted.

      In line 245, they state: “More broadly, the study demonstrates how synthetic binders can trap, stabilize, or block transient conformations of active chromatin-associated machines, providing a powerful means to probe their mechanisms in living cells.” This is of course a possible scenario for the use of sybodies, but the study does not really trap Smc in a transient conformation; at least this is not clearly shown.

      We agree and have simplified the statement by removing “stabilize” and “transient”.

      Overall, it is an interesting study, with a well-presented novel technology, and a limited gain of knowledge on SMC proteins.

      We respectfully disagree with the last point, since our unique results highlight the importance of the Smc coiled coils, which are less well represented in the SMC literature (when compared to the heads and hinge domains, for example), likely (at least in part) due to the mild effect of single point mutations on coiled coil dynamics.

      Significance:

      The work describes the gaining and use of single-binder antibodies (sybodies) to interfere with the function of proteins in bacteria. Using this technology for the SMC complex, the authors demonstrate that they can obtain a significant number of binders that target a defined region in SMC and thereby interfere with the ATPase cycle.

      The study does not present a strong gain of knowledge of the mode of action of the SMC complex.

      As pointed out above, we respectfully disagree with this assertion.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Lumen formation is a fundamental morphogenetic event essential for the function of all tubular organs, notably the vertebrate vascular network, where continuous and patent conduits ensure blood flow and tissue perfusion. The mechanisms by which endothelial cells organize to create and maintain luminal space have historically been categorized into two broad strategies: cell shape changes, which involve alterations in apical-basal polarity and cytoskeletal architecture, and cell rearrangements, wherein intercellular junctions and positional relationships are remodeled to form uninterrupted conduits. The study presented here focuses on the latter process, highlighting a unique morphogenetic module, junction-based lamellipodia (JBL), as the driver for endothelial rearrangements.

      Strengths:

      The key mechanistic insight from this work is the requirement of the Arp2/3 complex, the classical nucleator of branched actin filament networks, for JBL protrusion. This implicates Arp2/3-mediated actin polymerization in pushing force generation, enabling plasma membrane advancement at junctional sites. The dependence on Arp2/3 positions JBL within the family of lamellipodia-like structures, but the junctional origin and function distinguish them from canonical, leading-edge lamellipodia seen in cell migration.

      Weaknesses:

      The study primarily presents descriptive observations and includes limited quantitative analyses or genetic modifications. Molecular mechanisms are typically interrogated through the use of pharmacological inhibitors rather than genetic approaches. Furthermore, the precise semantic distinction between JAIL and JBL requires additional clarification, as current evidence suggests their biological relevance may substantially overlap.

      We have previously analyzed the effects of different ve-cadherin (cdh5) mutant alleles on EC rearrangements (Paatero et al., 2018; Sauteur et al., 2014). These mutants show complex defects (e.g. hypersprouting, reduced contact inhibition during anastomosis) in EC behavior early in vascular tube formation. We find that analysis of JBL dynamics and function is very difficult in such situations. The use of small molecule inhibitors allows acute interventions within a defined time window and avoids the pleiotropic effects of genetic ablations. We have expanded our discussion on the distinction between JAIL and JBL and hope that this will clarify why – in our opinion – these terms should be used differentially in different cell biological contexts (see below and lines 348-374 in the manuscript).

      Reviewer #2 (Public review):

      Summary:

      In Maggi et al., the authors investigated the mechanisms that regulate the dynamics of a specialized junctional structure called junction-based lamellipodia (JBL), which they have previously identified during multicellular vascular tube formation in the zebrafish. They identified the Arp2/3 complex to dynamically localize at expanding JBLs and showed that the chemical inhibition of Arp2/3 activity slowed junctional elongation. The authors therefore concluded that actin polymerization at JBLs pushes the distal junction forward to expand the JBL. They further revealed the accumulation of Myl9a/Myl9b (marker for MLC) at the junctional pole, at interjunctional regions, suggesting that contractile activity drives the merging of proximal and distal junctions. Indeed, chemical inhibition of ROCK activity decreased junctional mergence. With these new findings, the authors added new molecular and cellular details into the previously proposed clutch mechanism by proposing that Arp2/3-dependent actin polymerization provides pushing forces while actomyosin contractility drives the merging of proximal and distal junctions, explaining the oscillatory protrusive nature of JBLs.

      Strengths:

      The authors provide detailed analyses of endothelial cell-cell dynamics through time-lapse imaging of junctional and cytoskeletal components at subcellular resolution. The use of zebrafish as an animal model system is invaluable in identifying novel mechanisms that explain the organizing principles of how blood vessels are formed. The data is well presented, and the manuscript is easy to read.

      Weaknesses:

      While the data generally support the conclusions reached, some aspects can be strengthened. For the untrained eye, it is unclear where the proximal and distal junctions are in some images, and so it is difficult to follow their dynamics (especially in experiments where Cdh5 is used as the junctional marker). Images would benefit from clear annotation of the two junctions. All perturbation experiments were done using chemical inhibitors; this can be further supported by genetic perturbations.

      We have added annotations to several figures and paid particular attention to the proximal and distal junctions.

      We have previously analyzed the effects of different ve-cadherin (cdh5) mutant alleles on EC rearrangements (Paatero et al., 2018; Sauteur et al., 2014). These mutants show complex defects (e.g. hypersprouting, reduced contact inhibition during anastomosis) in EC behavior early in vascular tube formation. We find that analysis of JBL dynamics and function is very difficult in such situations. The use of small molecule inhibitors allows acute interventions within a defined time window and avoids the pleiotropic effects of genetic ablations.

      Reviewer #3 (Public review):

      The paper by Maggi et al. builds on earlier work by the team (Paatero et al., 2018) on oriented junction-based lamellipodia (JBL). They validate the role of JBLs in guiding endothelial cell rearrangements and utilise high-resolution time-lapse imaging of novel transgenic strains to visualise the formation of distal junctions and their subsequent fusion with proximal junctions. Through functional analyses of Arp2/3 and actomyosin contractility, the study identifies JBLs as localized mechanical hubs, where protrusive forces drive distal junction formation, and actomyosin contractility brings together the distal and proximal junctions. This forward movement provides a unique directionality which would contribute to proper lumen formation, EC orientation, and vessel stability during these early stages of vessel development.

      Time-lapse live imaging of VEC, ZO-1, and actin reveals that VEC and ZO-1 are initially deposited at the distal junction, while actin primarily localizes to the region between the proximal and distal sites. Using a photoconvertible Cdh5-mClav2 transgenic line, the origin of the VEC aggregates was examined. This convincingly shows that VE-cadherin was derived from pools outside the proximal junctions. However, in addition to de novo VEC derived from within the photoconverted cell, could some VEC also be contributed by the neighbouring endothelial cell to which the JBL is connected?

      Yes, the green (non-converted) VE-cadherin can indeed originate from either of the two cells. The main point we want to make, based on our observations, is that the red (converted) VE-cadherin from the proximal junction (as defined by the ROI) does not contribute to the distal junction.

      As seen for JAILs in cultured ECs, live imaging of Arpc1b-Venus in conjunction with ZO-1 and actin reveals that Arp2/3 is enhanced when JBLs form. Therefore, Arp2/3 likely contributes to the initial formation of the distal junction in the lamellipodium.

      Inhibiting Arp2/3 with CK666 prevents JBL formation, and filopodia form instead of lamellipodia. This loss of JBLs leads to impaired EC rearrangements.

      Is the effect of CK666 treatment reversible? Since only a short (30 min) treatment is used, the overall effect on the embryo would be minimal, and thus washing out CK666 might lead to JBL formation and normalized rearrangements, which would further support the role of Arp2/3.

      We have performed washout experiments and find that the ectopic filopodia disappear when the inhibitor is removed. This experiment is shown in supplementary Figure 3 and supplementary Movies 12 and 13.

      From the images in Figure 4d it appears that ZO-1 levels are increased in the ring after CK666 treatment. Has this been investigated, and could this overall stabilization of adhesion proteins further prevent elongation of the ring?

      This is an interesting thought and we have taken a closer look. There is quite a bit of sample-to-sample variation in the ZO1 signal. The quantification (Author response image 1) indicates that, on average, there is no increase in the CK666-treated embryos.

      Author response image 1.

      To explore how the distal and proximal junctions merge, spatiotemporal imaging of Myl9 and VEC is conducted. It indicates that Myl9 is localized at the interjunctional fusion site prior to fusion. This suggests pulling forces are at play to merge the junctions, and indeed Y-27632 treatment reduces or blocks the merging of these junctions.

      For this experiment, a truncated version of VEC was used, which lacks the cytoplasmic domain. Why have the authors chosen to image this line, since lacking the cytoplasmic domain could also impair the efficiency of tension on VEC at both junction sites? This is as described in the discussion (lines 328-332).

      This line was used because it labels the entire JBL protrusion more clearly. We have also included an example using the VE-cad-Venus line (supplementary Figure 4b), which shows a Myl-Cherry pattern consistent with the other examples.

      Since the time-lapse movies involve high-speed imaging of rather small structures, it is understandable that these are difficult to interpret. Adding labels to indicate certain structures or proteins at essential timepoints in the movies would help the readers understand these.

      We have added annotations and labels to all movies. We have also improved annotations in several figures (i.e. Figs. 1, 2, 5, 6 and 7).

      Recommendations for the authors:

      Reviewing Editor Comments:

      Overall, the reviewers are supportive of the manuscript but identify a number of areas where the clarity of the presented data could be improved, and further quantification could be provided to strengthen your conclusions. We would encourage you to address these minor concerns as best you can and to consider the recommendations of all three reviewers when deciding how to revise your manuscript.

      Reviewer #1 (Recommendations for the authors):

      Lumen formation is a fundamental morphogenetic event essential for the function of all tubular organs, notably the vertebrate vascular network, where continuous and patent conduits ensure blood flow and tissue perfusion. The mechanisms by which endothelial cells organize to create and maintain luminal space have historically been categorized into two broad strategies: cell shape changes, which involve alterations in apical-basal polarity and cytoskeletal architecture, and cell rearrangements, wherein intercellular junctions and positional relationships are remodeled to form uninterrupted conduits. The study presented here focuses on the latter process, highlighting a unique morphogenetic module, junction-based lamellipodia (JBL), as the driver for endothelial rearrangements.

      JBL are described as oscillating membrane protrusions emerging at endothelial junctions, operating in a ratchet-like manner to mediate convergent cell movements. This ratchet mechanism allows endothelial cells to approach each other, thereby aligning and joining local luminal segments into a continuous vascular structure. The study employs in vivo high-resolution time-lapse imaging, a technically demanding method that captures spatiotemporal dynamics of cytoskeletal and adhesion complexes during JBL activity with unprecedented detail.

      The key mechanistic insight from this work is the requirement of the Arp2/3 complex, the classical nucleator of branched actin filament networks, for JBL protrusion. This implicates Arp2/3-mediated actin polymerization in pushing force generation, enabling plasma membrane advancement at junctional sites. The dependence on Arp2/3 positions JBL within the family of lamellipodia-like structures, but the junctional origin and function distinguish them from canonical, leading-edge lamellipodia seen in cell migration.

      An intriguing observation is that a novel junction arises at the distal pole of a JBL. This distal junction is formed from a pool of VE-cadherin that is spatially redistributed from regions outside the initial JBL domain. The distal junction then merges with the proximal junction through a process dependent on actomyosin contractility, as was judged by Myl9 recruitment.

      The alternation between pushing forces (Arp2/3-dependent JBL protrusion) and pulling forces (actomyosin-driven junction fusion) defines JBL as a bidirectional mechanical module. Inhibition of actomyosin prevents merging of proximal and distal junctions, thereby stalling lumen continuity. This two-phase system, actin-based extension followed by actomyosin-mediated constriction, ensures both elongation and maturation of endothelial arrangements, ultimately securing vascular patency.

      This manuscript represents a robust and thoughtfully executed study that advances our understanding of lumen formation during vascular development. The overarching conclusions are well substantiated, and the results section provides a clear and detailed exposition of the key findings. I appreciate the explanatory movie at the end. Nevertheless, I offer several remarks for further improvement:

      (1) The fluorescent images presented are visually compelling, yet lack quantitative analysis in the initial figure. Although quantification is included in Figure 3, it is advisable to incorporate this analysis into Figure 1 as well. Early presentation of quantification will help the reader to appreciate the impact and significance of the findings from the outset.

      We appreciate the reviewer’s suggestion and have now added line graphs to measure the spatiotemporal intensities of the Utrophin and ZO-1 reporters in Figure 1b. These measurements demonstrate the sequence of F-actin protrusion and subsequent junctional movement. In Figure 1a, we have added a double-headed arrow which shows the overall movement of the junction towards the dorsal side of the forming DLAV.

      (2) For the fluorescence images, further quantitative analysis of membrane overlap, either in terms of width or pixel overlap, would enhance the rigor of the study. Temporal quantification of overlap may provide valuable insights into the stability and reproducibility of the process across experimental replicates.

      JBL are quite heterogeneous with respect to size, shape and dynamics, which makes quantification of membrane overlap (JBL size) across experimental replicates difficult. We have published some quantifications of JBL orientation and oscillation in our previous paper (Paatero et al., 2018, Nat. Commun., Figures 1 and 2), which are in agreement with our current study.

      (3) When referencing the role of Arp2/3, the authors employ an ArpC1b transgenic fish. The results section should thus specifically address the involvement of ArpC1b, rather than generalizing to Arp2/3. In the discussion, it would be appropriate to speculate on the potential involvement of the complete Arp2/3 complex. Notably, the use of CK is acknowledged as a broadly accepted inhibitor of actin polymerization.

      As ArpC1b is a subunit of an active Arp2/3 complex (Padrick et al., 2011), we have used ArpC1b-Venus as a readout for Arp2/3 localization. The construct has been validated before in cell culture (Law et al., 2021) as well as in zebrafish (Malchow et al., 2024), and the spatiotemporal distribution of the reporter was shown to be consistent with that of the Arp2/3 complex. We state this in the results section (lines 173-178) and subsequently use the term Arp2/3 to facilitate reading of the text. In the corresponding figure legends, we maintain the term ArpC1b. CK666 interferes with the dimerization of Arp2 and Arp3 subunits and thus prevents activity of the Arp2/3 complex.

      (4) The discussion regarding JAIL versus JBL involvement remains challenging to interpret. If JAIL structures arise from the loss of cell-cell contacts, both JAIL and JBL resemble membrane protrusions and are likely governed by similar molecular mechanisms, predominantly actin polymerization and Arp2/3 activity, with probable contribution from Rac1 signaling. The precise semantic distinction between JAIL and JBL warrants further clarification, as their biological relevance may be overlapping.

      We agree with the reviewer. Below we outline the reasons why lamellipodial protrusions that emanate from cell-cell junctions should not be indiscriminately called JAIL, but that JAIL and JBL constitute different cellular activities acting in different tissue contexts. We have modified the text in the Discussion (lines 348-374).

      (1) JAIL have originally been described in cell culture experiments (Abu-Taha et al., 2014). According to this and subsequent papers by the same group, local dissolution of endothelial adherens junctions (i.e. downregulation of VE-cadherin) triggers the formation of lamellipodia-like structures. These protrusions eventually retract, followed by the reestablishment of EC junctions.

      (2) In our in vivo studies, we observed lamellipodial protrusions during endothelial cell rearrangements, and we call these structures JBL (Paatero et al., 2018). While JBL appear very similar to JAIL in general (i.e. regulation by Arp2/3 and its localization), we also observe two critical differences. For one, JBL form while maintaining the original (proximal) junction. Moreover, a distal junction is formed at the front edge of the JBL, leading to a “double junction” configuration. In our current manuscript, we have examined the role of actomyosin contractility and find that it correlates with and is required for the merging of proximal and distal junctions during JBL cycles. These observations indicate that the proximal and distal junctions are essential components of JBL function during endothelial cell elongation and rearrangements. These salient and distinct features prompted us to adopt the term junction-based lamellipodia (JBL), in order to differentiate them from JAIL.

      (3) We would like to argue that JAIL and JBL represent similar but distinct lamellipodia-like protrusions. JAILs are associated with the maintenance of endothelial integrity, and control permeability and trans-endothelial cell migration, as has been suggested by several publications (Cao et al., 2017; Kipcke et al., 2025; Seebach et al., 2021; Taha et al., 2014). In contrast, JBL drive cell rearrangements by step-wise elongation of cell junctions, leading to convergent cell movements.

      (4) Although JAIL have also been implicated in endothelial cell migration (Cao and Schnittler, 2019; Cao et al., 2017; Seebach et al., 2021), neither junctional patterns nor junctional dynamics have been analyzed in this context. We therefore propose that JAIL and JBL are actin-based protrusions forming at endothelial cell-cell junctions, but act in different contexts to provide cell motility (JBL) or endothelial integrity (JAIL).

      (5) Some of the quantification plots, specifically in figures 5d and 6c, do not display significant differences or distribution patterns. It would be beneficial to revise these graphs to clearly represent statistical significance and underlying data distributions.

      Because of the spatiotemporal heterogeneity, it is difficult to perform statistical quantifications across samples. In Figure 5c/d, we have imaged/analyzed myl9-EGFP in a mosaic situation, in which only one of the interacting cells expresses high levels of myl9-EGFP. This is a rare situation and we managed to image only this example. Nevertheless, it is consistent with our other expression data of myl9-reporters and also with our previous photoconversion experiments using photoconvertible UCHD (Paatero et al., 2018, Figure 4), which show that actin-rich JBL form at the front end of the endothelial cell in the direction of junction elongation. In Figure 5d, we have quantified the average intensity of the GFP signal within the region of interest. The newly added error bars indicate the standard deviation between pixel intensities within the ROI.

      In Figure 6c, we have analyzed the Myl9b-mCherry intensities and find that the signal is redistributed during a JBL cycle. The spatial distribution is evident from the heat-map and we have not included a standard deviation. Myl9b-mCherry levels are very heterogeneous and it is not possible to quantify intensities across samples. We have, however, included four more examples of Myl9b-mCherry distribution in Supplementary Figure 4. The patterns observed in these samples are consistent with those in Figure 6.
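      As an illustration of the ROI measurement described above (average signal, with error bars given by the standard deviation between pixel intensities), the following is a minimal sketch and not the authors' analysis code. The function name `roi_stats`, the rectangular ROI format and the choice of the population standard deviation are assumptions made for this example; the response does not specify them.

```python
import statistics


def roi_stats(image, roi):
    """Return (mean, standard deviation) of pixel intensities inside a ROI.

    image: 2D list of rows of pixel intensities (hypothetical input format).
    roi:   (row_start, row_stop, col_start, col_stop), half-open bounds.
           A rectangular ROI is an assumption made for this sketch.
    """
    r0, r1, c0, c1 = roi
    pixels = [image[r][c] for r in range(r0, r1) for c in range(c0, c1)]
    if not pixels:
        raise ValueError("ROI contains no pixels")
    # Population SD is assumed here; the sample SD (statistics.stdev)
    # would be an equally plausible reading of the description.
    return statistics.fmean(pixels), statistics.pstdev(pixels)


# Toy 3x3 image; the ROI covers the top-left 2x2 block.
toy_image = [
    [10, 20, 30],
    [40, 50, 60],
    [70, 80, 90],
]
mean_intensity, sd_intensity = roi_stats(toy_image, (0, 2, 0, 2))
print(mean_intensity, sd_intensity)
```

      Note that a standard deviation computed this way reflects pixel-to-pixel variation within a single ROI, not variation between embryos or cells.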

      (6) The observation of myosin recruitment does not inherently imply a concomitant increase in actomyosin contractile activity. The inclusion of phospho-MLC staining would considerably strengthen the evidence for enhanced actomyosin activity.

      This is a good suggestion and we have extensively tried different anti-P-Myl antibodies (and protocols), but did not get them to work reliably on zebrafish embryos. We therefore rely on published work that has established the correlation between the recruitment of myosin light chain and increased actomyosin tension (Fernandez-Gonzalez et al., 2009; Munjal et al., 2015).

      Reviewer #2 (Recommendations for the authors):

      (1) Figure 1a is not described/mentioned in the Results.

      We have corrected this (lines 102-108). We have also added measurements to better present the different dynamics of F-actin (UCHD) and ZO1 within the JBL and the relative endothelial cell movements (see Figure 1b), as suggested by reviewer #1.

      (2) In Figure 3a, the authors claim that Arp2/3 is deposited at the distal side of the junction ring. While it is clear where the proximal junction is (ZO1-rich), the distal junction is less so (hardly any ZO1). It is therefore difficult to agree based on this time-lapse imaging that Arpc1b-Venus is at the distal junction. Can the authors please include panels showing merged channels and annotate where the proximal and distal junctions are?

      The activation of the Arp2/3 complex and the formation of the distal junction are sequential events. We see that ArpC1b oscillates with an accumulation at the onset and during JBL protrusion. In contrast, the distal junction is formed when the protrusive activity has been stopped. One caveat of the analysis shown in Figure 3a is that our ZO1 reporters label the distal junction only very weakly – this is in particular the case for the ZO1-tdTomato knock-in. The distal junction is better visible in VE-cadherin and UCHD reporters, as shown in Figures 5 to 7.

      (3) In Figures 3b and c, it is also difficult to distinguish proximal and distal junctions in these images. Please mark the boundaries in the image panels (Figure 3b) and indicate on the x-axis where the proximal and distal junctions are (Figure 3c).

      In Figure 3b, we show ArpC1b-Venus and mRuby-UCHD side-by-side. This Figure demonstrates that the Arp2/3 complex maintains its position at the front of the JBL during the protrusive phase (always distal to the UCHD signal). The imaging is done at very short intervals (1/30sec), which makes it difficult to follow entire oscillations due to photo-bleaching of the ArpC1b reporter.

      (4) The treatment of CK666 resulted in perturbed localization of Arpc1b-Venus. Therefore, the inhibition of junctional elongation can also be explained by the mislocalization of Arp2/3, rather than the inhibition of Arp2/3 activity at the junctions. Can the authors discuss this or perform another experiment that is more specific to manipulating Arp2/3 activity?

      CK666 is a well-established inhibitor of Arp2/3. Structural and functional analyses have shown that CK666 interferes with the interaction between Arp2 and Arp3, thereby preventing the activation of the complex (Hetrick et al., 2013; Padrick et al., 2011). We therefore conclude that the phenotypes we observe in CK666 treatment are due to Arp2/3 inhibition.

      It is possible that CK666 prevents ArpC1b binding to the Arp2/3 complex. However, published work suggests that ArpC1b can bind to Arp2/3 also in its inactive state (Chou et al., 2022). Thus, we can only speculate why we lose ArpC1b localization under CK666. We prefer not to do so.

      (5) In Figures 5d and 6c, is the quantification of Myl9 intensity of one cell only? If so, can the authors show the dynamics of average Myl9 intensity i) between forwarding and non-forwarding JBL poles and ii) as the proximal and distal junctions merge in several endothelial cells?

      Figure 5c/d depicts two interacting cells, expressing different levels of Myl9a-EGFP. This is a rare experimental situation and we managed to image only this example. We quantified the average signal at both poles of the junctional ring within a region of interest. The newly added error bars indicate the standard deviation between pixel intensities within the ROI. The analysis has been done on immunofluorescent images, therefore a dynamic analysis over time is not possible.

      In Figure 6c, we have analyzed the Myl9b-mCherry intensities and find that the signal is redistributed during a JBL cycle. The spatial distribution is evident from the heat-map and we have not included a standard deviation. Myl9b-mCherry levels are very heterogeneous and it is not possible to quantify intensities across samples. We have, however, included four more examples of Myl9b-mCherry distribution in Supplementary Figure 4. The patterns observed in these samples are consistent with those in Figure 6.

      (6) Figure 5. The 'f' in the figure legend should be 'e' since there is no panel 'f'.

      We have corrected this.

      (7) Figure 7. As the boundaries for proximal and distal junctions are not always clear, especially when Cdh5 appears as clusters, how do you determine where the two junctions are in order to measure the interjunctional space? Please offer a clearer explanation in the Methods.

      We have added the following in the M&M. “Junctional merging tracking: Speed of junctional merging was evaluated by monitoring isolated junctional rings during DLAV formation. Inhibitor treatment: Y-27632 (75 μM) or DMSO (1%) was applied 30 min before mounting. The same concentrations of chemicals were added to the low-melting-point agarose mounting medium and to the E3 medium on top of it before imaging. The junctions were imaged for 10-15 min on an Olympus SpinSR spinning disc microscope. Distances were measured using Fiji software. In each frame, the interjunctional distance was defined as the maximum distance between the proximal and distal junctions. A line was manually drawn between the proximal and distal junctions in Fiji, and its length was recorded. The same proximal and distal junction landmarks were used consistently across all time points.”
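      To make the measurement concrete, the sketch below shows one way the recorded line lengths could be turned into a merging speed. It is an illustration under stated assumptions, not the authors' procedure: the landmark coordinates are invented, the helper names `interjunctional_distance` and `merging_speed` are hypothetical, and computing speed as the net decrease in distance divided by elapsed time is an assumption, since the methods text does not state the formula used.

```python
import math


def interjunctional_distance(proximal, distal):
    """Straight-line distance between two (x, y) landmarks, e.g. in µm.

    Mirrors the length of a line drawn manually between the proximal
    and distal junctions in a single frame.
    """
    return math.dist(proximal, distal)


def merging_speed(times_min, distances_um):
    """Average closing speed in µm/min between the first and last frame.

    Assumption for this sketch: speed = (initial distance - final
    distance) / elapsed time. Positive values mean the junctions
    approach each other.
    """
    if len(times_min) != len(distances_um) or len(times_min) < 2:
        raise ValueError("need at least two matching time/distance points")
    elapsed = times_min[-1] - times_min[0]
    if elapsed <= 0:
        raise ValueError("time points must span a positive interval")
    return (distances_um[0] - distances_um[-1]) / elapsed


# Invented landmark positions (µm) for three frames of one junctional ring.
frames = [
    (0.0, (0.0, 0.0), (3.0, 4.0)),
    (5.0, (0.0, 1.0), (3.0, 4.0)),
    (10.0, (0.0, 3.0), (0.0, 4.0)),
]
times = [t for t, _, _ in frames]
distances = [interjunctional_distance(p, d) for _, p, d in frames]
print(distances, merging_speed(times, distances))
```

      Using the same landmarks in every frame, as the methods text specifies, matters here because the speed depends only on the difference between the first and last measured distances.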

      (8) One would think that upon the inhibition of junctional mergence (by ROCK inhibition), actin polymerization would persist to push the distal junction forward to elongate the JBL. However, there is instead a decrease in junctional elongation (Figure 7b). Can the authors speculate why? Additionally, junction elongation can probably be achieved by continuous "pushing" of the distal junction alone (through actin polymerization). Can the authors speculate why there is a need/what is the benefit of merging proximal and distal junctions for junction elongation?

      These are all very interesting questions, but they are quite complex and would require extensive and speculative answers, which is outside the scope of this study. Nevertheless, here are a few quick thoughts on these issues.

      (1) When endothelial cells elongate, they have to overcome tensile forces at the junctions (generated by the subjunctional actomyosin belt). JBL are providing a tractive and deforming force, which overcomes the tensile force and thus promotes junctional elongation.

      (2) The distal junction then provides an anchor to which the actin cytoskeleton can attach. The space between proximal and distal junction becomes a compartment of local actomyosin contraction, which provides the force for the ratchet to move the proximal junction forward → junctional mergence.

      (3) Thus, it is not the protrusion (pushing) itself that elongates the cell but the elongation of the junction (driven by actomyosin contraction)!

      (4) The maintenance of the proximal junction is most likely needed to ensure endothelial integrity during the JBL cycles.

      (5) How the frequency and the size of JBLs are regulated is not known. One possible player is an internal clock mechanism (e.g. a feedback loop via small GTPases (such as Rac) → Arp2/3 regulation). Another possibility is that JBL size is limited by the JBL sweeping up basally localized VE-cadherin (in cis-configuration). Increasing cell-cell adhesion (by VE-cad trans-interactions between the JBL and the underlying cell) would eventually stop the protrusion. It is also possible that a cell-autonomously controlled mechanism of F-actin polymerization (actin pulses) is involved in the regulation of the JBL cycle length.

      (9) The animation showing the molecular mechanism of JBL function during endothelial junction elongation (Video 25) is very helpful in understanding the dynamic coupling between junctional proteins, actomyosin cytoskeleton, and junction remodelling. However, I wonder why there are no Myosin II proteins binding to the actin bundles during the merging of proximal and distal junctions (between 0:25 and 0:28), since this is one of the main findings by the authors in this study.

      Since we show two JBL cycles, we want to spread the information over both of them.

      References:

      Cao, J. and Schnittler, H. (2019). Putting VE-cadherin into JAIL for junction remodeling. J. Cell Sci. 132.

      Cao, J., Ehling, M., März, S., Seebach, J., Tarbashevich, K., Sixta, T., Pitulescu, M. E., Werner, A. C., Flach, B., Montanez, E., et al. (2017). Polarized actin and VE-cadherin dynamics regulate junctional remodelling and cell migration during sprouting angiogenesis. Nat. Commun. 8, 1–20.

      Chou, S. Z., Chatterjee, M. and Pollard, T. D. (2022). Mechanism of actin filament branch formation by Arp2/3 complex revealed by a high-resolution cryo-EM structure of the branch junction. Proc. Natl. Acad. Sci. U. S. A. 119, e2206722119.

      Fernandez-Gonzalez, R., Simoes, S. de M., Röper, J. C., Eaton, S. and Zallen, J. A. (2009). Myosin II Dynamics Are Regulated by Tension in Intercalating Cells. Dev. Cell 17, 736–743.

      Hetrick, B., Han, M. S., Helgeson, L. A. and Nolen, B. J. (2013). Small molecules CK-666 and CK-869 inhibit actin-related protein 2/3 complex by blocking an activating conformational change. Chem. Biol. 20, 701–712.

      Kipcke, J. P., Odenthal-Schnittler, M., Aldirawi, M., Franz, J., Bojovic, V., Seebach, J. and Schnittler, H. (2025). TNF-α induces VE-cadherin-dependent gap/JAIL cycling through an intermediate state essential for neutrophil transmigration. Front. Immunol. 16.

      Law, A. L., Jalal, S., Pallett, T., Mosis, F., Guni, A., Brayford, S., Yolland, L., Marcotti, S., Levitt, J. A., Poland, S. P., et al. (2021). Nance-Horan Syndrome-like 1 protein negatively regulates Scar/WAVE-Arp2/3 activity and inhibits lamellipodia stability and cell migration. Nat. Commun. 12, 5687.

      Malchow, J., Eberlein, J., Li, W., Hogan, B. M., Okuda, K. S. and Helker, C. S. M. (2024). Neural progenitor-derived Apelin controls tip cell behavior and vascular patterning. Sci. Adv. 10, 1174.

      Munjal, A., Philippe, J. M., Munro, E. and Lecuit, T. (2015). A self-organized biomechanical network drives shape changes during tissue morphogenesis. Nature 524, 351–355.

      Paatero, I., Sauteur, L., Lee, M., Lagendijk, A. K., Heutschi, D., Wiesner, C., Guzmán, C., Bieli, D., Hogan, B. M., Affolter, M., et al. (2018). Junction-based lamellipodia drive endothelial cell rearrangements in vivo via a VE-cadherin-F-actin based oscillatory cell-cell interaction. Nat. Commun. 9.

      Padrick, S. B., Doolittle, L. K., Brautigam, C. A., King, D. S. and Rosen, M. K. (2011). Arp2/3 complex is bound and activated by two WASP proteins. Proc. Natl. Acad. Sci. U. S. A. 108, E472–E479.

      Sauteur, L., Krudewig, A., Herwig, L., Ehrenfeuchter, N., Lenard, A., Affolter, M. and Belting, H. G. (2014). Cdh5/VE-cadherin promotes endothelial cell interface elongation via cortical actin polymerization during angiogenic sprouting. Cell Rep. 9, 504–513.

      Seebach, J., Klusmeier, N. and Schnittler, H. (2021). Autoregulatory “Multitasking” at Endothelial Cell Junctions by Junction-Associated Intermittent Lamellipodia Controls Barrier Properties. Front. Physiol. 11.

      Taha, A. A., Taha, M., Seebach, J. and Schnittler, H. J. (2014). ARP2/3-mediated junction-associated lamellipodia control VE-cadherin-based cell junction dynamics and maintain monolayer integrity. Mol. Biol. Cell 25, 245–256.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In their paper, Shimizu and Baron describe the signaling potential of cancer gain-of-function Notch alleles using Drosophila Notch transfected into S2 cells. These cells do not express Notch, the ligand Dl, or Dx, all of which are supplied by transfection. With this simple cellular system, the authors have previously shown that it is possible to measure Notch signaling levels by using a reporter for the 3 main types of signaling output: basal signaling, ligand-induced signaling, and ligand-independent signaling regulated by Deltex. The authors proceed to test 22 cancer mutations for the above-mentioned 3 outputs. The mutations considered cluster in the negative regulatory region (NRR), which is composed of 3 LNR repeats wrapping around the HD domain. This arrangement shields the S2 cleavage site at which the activation reaction starts.

      The main findings are:

      (1) Figure 1: the cell system can recapture ectopic activation of 3 existing Drosophila alleles validated in vivo.

      (2) Figure 2: Some of the HD mutants do show ectopic activation that is not induced by Dl or Dx, arguing that these mutations fully expose the S2 site. Some of the HD mutants do not show ectopic activation in this system, a fact that is suggested to be related to retention in the secretory pathway.

      (3) Figure 3: Some of the LNR mutants do show ectopic activation that is induced by Dl or Dx, arguing that these might partially expose the S2 site.

      (4) Figure 4-6: 3 sites of the LNR3 on the surface that are involved in receptor heterodimerization, if mutated to A, are found to cause ectopic activation that is induced by Dl or Dx. This is not due to changes in their dimerization ability, and these mutants are found to be expressed at a higher level than WT, possibly due to decreased levels of protein degradation.

      Strengths and Weaknesses:

      The paper is very clearly written, and the experiments are robust, complete, and controlled. It is somewhat limited in scope, considering that Figure 1 and 5 could be supplementary data (setup of the system and negative data). However, the comparative approach and the controlled and well-known system allow the extraction of meaningful information in a field that has struggled to find specific anticancer approaches. In this sense, the authors contribute limited but highly valuable information.

      Reviewer #2 (Public review):

      Summary:

      This ambitious study introduced 22 mutations corresponding to amino acid substitution mutations known to induce cancer in human Notch1, located within the Negative Regulatory Region, into the Drosophila Notch gene. It comprehensively examined their effects on activity, intracellular transport, protein levels, and stability. The results revealed that the impact of amino acid substitutions within the Negative Regulatory Region can be grouped based on their location, differing between the Heterodimerization Domain and the Lin12/Notch Repeat. These findings provide important insights into elucidating the mechanisms by which amino acid substitution mutations in human Notch1 cause leukemia and cancer.

      Strengths:

      In this study, the authors successfully measured the activity of amino acid-substituted Notch with high precision by effectively leveraging the advantages of their previously established experimental system. Furthermore, they clearly demonstrated ligand-dependent and Deltex-dependent properties.

      Weaknesses:

      Amino acid substitution mutations exhibit interesting effects depending on their position, so interest naturally turns to the mechanisms generating these differences. Unfortunately, however, elucidating these mechanisms will require considerable time in the future. Therefore, it is reasonable to conclude that questions regarding the mechanism fall outside the scope of this paper.

      We thank the editors and reviewers for their initial reviews and constructive suggestions. We have revised the manuscript with some additional data contained in two additional supplementary figures and by the inclusion of additional text.

      Reviewer #3 (Public review):

      While this is indeed an exciting set of observations, the work is entirely cell-line-based, which is the primary reason for the dampened enthusiasm for the study. The analysis is confined to Drosophila S2 cells, which may not fully recapitulate the tissue- or organism-level regulatory complexity observed in vivo. Some Drosophila HD domain mutants accumulate in the secretory pathway and do not phenocopy human T-ALL mutations, possibly due to limitations on physiological inputs that S2 cells cannot account for, or species-specific differences such as the absence of S1 cleavage.

      Thus, the findings may not translate directly to understanding Notch 1 function in mammalian cancer models. While the manuscript highlights mechanistic variety, the functional significance of these mutations for hematopoietic malignancies or developmental contexts in live animals remains untested. Overall, the work does not yet provide evidence for altered Notch signaling that is physiologically relevant.

      S2 cells are a standard cell culture model which has been used extensively for analysing Notch signalling mechanisms and, by and large, is found to recapitulate the mechanisms of Notch activation and its regulation in vivo. However, we agree that it will be desirable in future work to build on our current findings by generating Notch mutants in vivo in Drosophila, as the in vivo context may introduce additional nuances in the behaviour of the mutants. This can be done by overexpressing cDNA constructs in particular tissues, or more physiologically by generating endogenous gene mutations using CRISPR/Cas9-based gene editing. However, the likely outcome of the latter approach is embryonic lethality due to constitutive over-activation during development. Therefore, methods of genetic manipulation need to be applied which allow the final activating mutant form to be generated in somatic clones. We feel that this would be a considerable amount of additional work and is out of scope for the current study, but we look forward to developing this approach in future work.

      Recommendations for the authors:

      Reviewing Editor Comments:

      (a) Table 1: Explain the rationale for mapping non-conserved residues between human and fly Notch; consider adding an alignment or supplementary figure.

      We have added a new Supplementary figure S2 showing an alignment of Notch sequences from different species to indicate the degree of conservation at the sites chosen for our mutagenesis study. Some locations were highly conserved and some locations less so. Both conserved and non-conserved residues were included to examine how structural perturbations at equivalent positions affect signalling activity, independent of sequence conservation. In addition to the new supplementary figure, we have changed the text in the Table 1 legend to clarify.

      (b) Add or discuss data connecting LNR and HD mutant expression levels with stability and degradation mechanisms.

      We have added additional text in the Results section referring to Fig. 6A/B regarding the varying Notch protein levels between the different mutants. With regard to the slower degradation kinetics of certain LNR-C mutants in Fig. 6E/F, we have also added a new Supplementary figure S3, which shows that mutants from the LNR/HD interface do not behave similarly to the LNR-C mutants with respect to their degradation kinetics.

      (c) Some mutants, especially those retained in the secretory pathway, are insufficiently characterized. The mechanism underlying their differential trafficking and stability remains underexplored.

      We have added some extra text to the discussion section which explores the issue of secretory pathway retention of HD mutants in Drosophila cells further.

      (2) Figure Legends:

      (a) Figure 1A - Explain the ribbon vs. space-filling representation and color coding; include a definition of the Heterodimerization Domain.

      We have added extra text to the Figure 1A legend.

      (b) Figure 2E - Clarify mutant selection; if possible, include additional examples for consistency.

      We have added extra text to the legend of Figure 2 regarding the selection of mutants for study.

      (c) Figure 3-4 - Explain logic for alanine substitutions; discuss difference at residue 1570 (P vs. A).

      We added the following text to the Results section: “Y1532 and Y1535 are not mutated in human cancers and therefore could not be assessed through patient-derived variants. Alanine substitution provides a controlled way to probe their contribution to NRR integrity and activation sensitivity by selectively removing their side-chain interactions while preserving overall fold.” We also added text to the Discussion section regarding the differences in the outcomes of the 1570 to A and 1570 to P mutations.

      (d) Figure 4 - Improve resolution and legibility.

      We have replaced figure 4.

      (e) Figure 6C - Correct residue numbering (1563, 1566).

      Thank you for spotting this. This has been corrected.

      (f) Figure 6F - Include control where protein levels do not increase.

      A new Supplementary figure S3 has been added, which includes these control data.

      (3) Contextual and Conceptual Framing:

      (a) Incorporate the limitations of the S2 system, and delineate which mechanistic insights are likely conserved versus those that may be species- or context-specific.

      We have incorporated text to discuss S2 cell limitations.

      (b) The study does not test functional consequences in hematopoietic or developmental contexts. Expand the discussion to emphasize how these cell-based findings could inform future in vivo studies or mammalian cancer modeling.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This manuscript offers valuable structural and mechanistic insights into the structure and assembly of the Type II internal ribosome entry site (IRES) from encephalomyocarditis virus (EMCV) and the translation initiation complex, revealing a direct interaction between the IRES and the 40S ribosomal subunit. While a solid cryo-EM method was used, enhancing the overall resolution or adding complementary biochemical data would further improve the clarity and impact of this study. This manuscript will attract researchers in cap-independent translation, host-pathogen interactions, and virology.

      We thank the editorial team for a favourable assessment and for mentioning our work as ‘valuable’. In the following sections, we have addressed the weaknesses and recommendations pointed out by the Reviewers and hope for an improvement in the description of this work.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors have studied how a virus (EMCV) uses its RNA (Type 2 IRES) to hijack the host's protein-making machinery. They use cryo-EM to extract structural information about the recruitment of viral Type 2 IRES to ribosomal pre-IC. The authors propose a novel interaction mechanism in which the EMCV Type 2 IRES mimics 28S rRNA and interacts with ribosomal proteins and initiator tRNA (tRNAi).

      Strengths:

      (1) Getting structural insights about the Type 2 IRES-based initiation is novel.

      (2) The study allows a good comparison of other IRES-based initiation systems.

      (3) The manuscript is well-written and clearly explains the background, methods, and results.

      We thank Reviewer 1 for appreciating our efforts and finding structural insights about the Type 2 IRES-based initiation presented in this study as novel.

      Weaknesses:

      (1) The main weakness of the work is the low resolution of the structure. This limits the possibility of data interpretation at the molecular level.

      However, despite the moderate resolution of the cryo-EM reconstructions, the model fits well into the density. The analysis of the EMCV IRES-48S PIC structure is thorough and includes meaningful comparisons to previously published structures (e.g., PDB IDs - 7QP6 and 7QP7). These comparisons showed that Map B1 represents a closed conformation, in contrast to Map A in the open state (Figure 2). Additionally, the proposed 28S rRNA mimicry strategy supported by structural superposition with the 80S ribosome and sequence similarity between the I domain of the IRES and the h38 region of 28S rRNA (Fig. 4) is well-justified.

      We agree that the low resolution of the map has compromised the data interpretation at the molecular level, and we thank the reviewer for appreciating our findings at this resolution. Due to the low resolution, we have reported findings for stretches or regions such as the domain I loops and stems, rather than individual nucleotides.

      (2) The lack of experimental validation of the functional importance of regions like the GNRA and RAAA loops is another limitation of this study.

      We agree that this study lacked experiments other than cryo-EM to probe the importance of regions such as the GNRA and RAAA loops. Multiple previous studies have reported on the importance of the GNRA and RAAA loops, and we have cited them in the manuscript. The essentiality of the RAAA loop for type 2 IRES was demonstrated in an earlier report (López de Quinto and Martínez-Salas, 1997; cited in the manuscript). Further, the conservation of this loop across the type 2 IRES family adds to its importance (manuscript Figure 6B). This loop and its flanking G-C stem are similar to h38 of 28S rRNA, and it appears that the RAAA loop adopts a mimicry mechanism to interact with the 40S ribosomal protein uS19, thus highlighting its importance for interaction with the 40S. Experiments destabilising the G-C stem also compromise IRES activity, as shown for FMDV IRES (Fernández et al 2011). Previous studies in which the GNRA (GCGA) loop of EMCV IRES was mutated have shown a deficiency in IRES activity (Roberts and Belsham, 1997; Robertson et al 1999), suggesting the importance of these regions in viral IRES biology; these reports are cited in the manuscript. This is not limited to EMCV IRES: mutation of the GUAA loop (representative of GNRA) of FMDV IRES also caused a significant reduction in IRES activity (López de Quinto and Martínez-Salas, 1997). In this work, we observe that the GCGA loop interacts with tRNA<sub>i</sub> in the EMCV IRES-48S PIC, which points to the importance of this loop. Moreover, incubation of FMDV IRES with 40S ribosomes decreased SHAPE reactivity in the domain 3 apex (nucleotides 170-200) (Lozano et al 2018), which corresponds to the EMCV IRES domain I apex.

      However, to address this concern in the revised manuscript, we mutated these loops and performed a luciferase assay (Supplementary figure 4 A). The results showed decreased IRES activity (Pg 10), in agreement with previous reports demonstrating the importance of these regions for overall IRES activity.

      (3) Minor modifications related to data processing and biochemical studies will further validate and strengthen the findings.

      (a) In the cryo-EM data section, the authors should include an image showing rejected particles during 2D classification. This would help readers understand why, despite having over 22k micrographs with sufficient particle distribution and good contrast, only a smaller number of particles were used in the final reconstruction. Additionally, employing map-sharpening tools such as Ewald sphere correction, Bayesian polishing, or reference-based motion correction might further improve the quality of the maps. Targeting high-resolution structures would be particularly informative.

      We have included the image of the rejected 2D classes (Author response image 1). We understand the Reviewer’s query about the large number of micrographs relative to the small number of particles in the final reconstruction. The total number of micrographs (22000) is the sum of multiple datasets, prepared and collected at different times, and the distribution of particles per micrograph was not uniform across sessions, ranging from good to poor. Among these, around 8000 micrographs had poor particle number and distribution. As a result, the number of particles per micrograph is heterogeneous across the compiled dataset, and only 237054 ribosomal particles were obtained after multiple rounds of 2D and 3D classification. Further, the final reconstruction was performed using particles obtained after masked classification for IRES and ternary complex density. Only the particles that show the best density for both the IRES and the ternary complex were used for this map. A second set of particles, which show only a portion of the IRES and no density for the ternary complex, forms another map, and a third map corresponds to an empty 40S.

      We thank the reviewer for the suggestions to improve the quality of the maps further. As suggested, we started reprocessing the data. However, during this process the common computational cluster that we were using for data processing had to be physically relocated, and after the relocation we faced technical issues in accessing the cluster and continuing with the processing. Several attempts to resolve the issue with the help of the IT team failed, and we lost 3-4 months without any progress. Therefore, we used Relion on our in-house workstation to process the data files from the start, as our in-house computational resources are not equipped to run cryoSPARC jobs on large datasets owing to memory limitations.

      We reprocessed the datasets in Relion5 and performed Bayesian polishing for reference-based, per-particle correction of beam-induced motion. After polishing, we used cryoSPARC to merge the particles and tried classifying the good ribosome particles using focused masked classification, as shown in Supplementary Figure 1.1. However, this processing did not improve the resolution: Map B (containing 40S, tRNA, and IRES) had an overall resolution of 4.8 Å (Author response image 2). Therefore, we would like to report the same maps as in the initial submission.

      We estimated that redoing the entire processing using cryoSPARC on the common computational cluster would take another 3-4 months or more, and we do not anticipate a substantial improvement in the extra density.

      Author response image 1.

      The selected and rejected 2D classes from the initial round of classification, and the final selected 2D classes, which were subjected to ab initio reconstruction to obtain the good ribosome particles.

      Author response image 2.

      Reprocessing of the entire dataset using Relion5 for polishing of selected particles, followed by 3D classification and refinements in cryoSPARC.

      (b) The strategic modelling of different IRES domains into the density, particularly the domain into the region above the 40S head, is appreciable. However, providing the full RNA tertiary structure (RNAfold) of the EMCV IRES (nucleotides 280-905) would better explain the logic behind the model building and its molecular interpretation.

      We thank the reviewer for appreciating the modelling of the domain I apex in the cryo-EM density. We tried to predict the full tertiary structure of the IRES using AlphaFold3; however, inclusion of the full-length sequence from nucleotides 280-905 gave models of extremely low confidence (Author response image 3), and a few domains did not conform to the secondary structure of EMCV IRES as reported in Duke et al 1992.

      Author response image 3.

      Prediction of tertiary structure of EMCV IRES (280-905 nucleotides) and zoomed features for each domain present in the IRES. The predicted aligned error plot for the RNA structure is shown.

      We then predicted the tertiary structure of each EMCV IRES domain individually, independently of the other IRES domains, using AlphaFold3. As a result, the confidence scores improved, and the tertiary structures also agreed with the experimentally determined EMCV IRES secondary structure (Duke et al 1992; Maloney and Joseph, 2024). Although an overall tertiary structure of EMCV IRES is lacking, recent studies have solved the structures of EMCV IRES domains in complex with their respective binding proteins. We superimposed the independently predicted tertiary structures of domains D, E, and F on the NMR ensemble of IRES domains D to F with PTB1 (Dorn et al 2023), and the predicted domains fit the experimental model. Similarly, we used the cryo-EM structure of domain J-K-eIF4G-eIF4A (Imai et al 2023) and found a close fit with the predicted structures. The analysis showed that the domain I apex is the best fit to the extra density with respect to architecture and fitting. This analysis is now added to the revised manuscript in Supplementary figure 3.2.

      Furthermore, 3D structural models of FMDV IRES domains 2, 3, and 4 (corresponding to EMCV IRES domains- H, I, and J-K) were predicted from SHAPE reactivity values and RNAComposer server (Figure 3, Lozano et al 2018). The predicted architecture of domain 3 apex (FMDV IRES) coincides with our domain I apex model (EMCV IRES).

      (c) Although the authors compare their findings with other types of IRESs (Types 1, 3, and 4), there is no experimental validation of the functional importance of regions like the GNRA and RAAA loops. Including luciferase-based assays or mutational studies of these regions for validation of structural interpretations is strongly recommended.

      We have discussed how other IRESs, such as type 1 and type 5, might use strategies similar to that of EMCV IRES to assemble the 48S PIC, given the similarity in motif sequence and position across the viral IRESs. Like EMCV IRES, the type 1 IRES (Poliovirus, Coxsackie virus, etc.) also harbours the GNRA loop, preceded by a C-rich loop, in its longest domain, known for long-range RNA-RNA interactions. The segment harbouring the GNRA loop is highly conserved across the type 1 family of IRESs (Kim et al 2015). The Aichi viral IRES harbours a GNRA loop in its longest domain, that is, domain J. Deletion of the GNRA loop compromised the IRES activity; however, substitution mutations in this region either elevated the IRES activity or left it unaltered (Yu et al 2011). We have hypothesized that these IRESs might use the GNRA motif in their longest domain (domain IV in type 1, and domain J in Aichi virus, type 5) based on the similarity in location and architecture to EMCV IRES, where GNRA is present in the longest domain (I) and is preceded by a C-rich loop, and where it can potentially mediate long-range interactions with tRNA<sub>i</sub>, as all these IRESs require the eIF2-ternary complex for the formation of the 48S PIC. In parallel, as in EMCV IRES, the GNRA motif-containing domain of type 1 and type 5 IRESs is placed before the eIF4G-binding domain. Thus, we suggest that these IRESs may adopt a similar strategy to interact with tRNA<sub>i</sub> during the formation of the 48S PIC. During the revision of this work, a preprint reported the structure of the polioviral IRES-48S PIC, in which the domain IV apex (similar to the domain I apex in EMCV IRES) interacts with uS13 and uS19, and the GNRA loop directly interacts with tRNA<sub>i</sub> during start codon recognition (Velazquez et al 2025). In particular, we hypothesize that the Aichi viral IRES might use this motif to mediate long-range interactions with tRNA<sub>i</sub>, similar to type 1 and type 2 IRESs.

      Reviewer #2 (Public review):

      Summary:

      The field of protein translation has long sought the structure of a Type 2 Internal Ribosome Entry Site (IRES). In this work, Das and Hussain pair cryo-EM with algorithmic RNA structure prediction to present a structure of the Type 2 IRES found in Encephalomyocarditis virus (EMCV). Using medium to low resolution cryo-EM maps, they resolve the overall shape of a critical domain of this Type 2 IRES. They use algorithmic RNA prediction to model this domain onto their maps and attempt to explain previous results using this model.

      Strengths:

      (1) This study reveals a previously unknown/unseen binding modality used by IRESes: a direct interaction of the IRES with the initiator tRNA.

      (2) Use of an IRES-associated factor to assemble and pull down an IRES bound to the small subunit of the ribosome from cellular extracts is innovative.

      (3) Algorithmic modeling of RNA structure to complement medium to low resolution cryo-EM maps, as employed here, can be implemented for other RNA structures.

      We thank Reviewer 2 for positive and encouraging comments on our work, appreciating our ‘innovative’ approach of using IRES-associated factor to assemble and pull down the IRES-bound ribosomal complex.

      Weaknesses:

      (1) Maps at the resolution presented prevent unambiguous modelling of the EMCV-IRES. This, combined with the lack of any biochemical data, calls into question any inferences made at the level of individual nucleotides, such as the GNRA loop and CAAA loop (Figure 4).

      We understand the concerns raised by the reviewer related to the resolution of the EMCV IRES-48S PIC map. We refrained from commenting on individual nucleotides or molecular interactions in the manuscript. Instead, we discuss loops, RNA stretches, or motifs that could be inferred with more confidence in the IRES density, as shown in Figure 4. The EMCV IRES can directly interact with the 40S ribosome using its domains H and I (Chamond et al 2014); however, the details of this interaction were unknown. Based on the placement of a portion of domain I in the cryo-EM map, we observe that the CAAA loop of the domain I apex interacts with the 40S ribosome. This is also reflected in the SHAPE data (Chamond et al 2014, Supplementary figures 2 and 8), where a decrease in reactivity is evident in the presence of the 40S ribosome. In addition, incubation of EMCV IRES with rabbit reticulocyte lysate (RRL) offered protection to domain I apex regions, which included the CAAA loop (Maloney and Joseph, 2024, Figure 4b).

      Furthermore, this decrease in SHAPE reactivity pattern is evident for FMDV IRES domain 3 apex (similar to domain I in EMCV IRES) in the presence of 40S ribosome (Lozano et al 2018). Thus, these studies are consistent with the placement of IRES model in the cryo-EM map. Moreover, we performed structural analysis (mentioned above) which showed that the domain I apex serves as the best fit with the extra density with respect to architecture and fitting (Supplementary figure- 3.2).

      (2) The EMCV IRES contains an upstream AUG at position 826, where the PIC can assemble (Pestova et al 1996; PMID 8943341). It is unclear if this start codon was mutated in this study. If it were not mutated, placement of AUG-834 over AUG-826 in the P-site is unexplained.

      We thank the reviewer for bringing up this point, which we missed mentioning in the initial submission. The EMCV IRES does not require scanning and directly positions AUG-834 at the P site (Pestova et al 1996). In Pestova et al 1996, the toeprint at AUG-834 is more intense than that at AUG-826. Further, AUG-834 lies in a Kozak context, whereas AUG-826 has a poor Kozak context, and the AUG-826 codon is not in-frame with AUG-834. Therefore, synthesis of the polypeptide requires AUG-834 at the P site. In our cryo-EM map, we observed that the tRNA<sub>i</sub> is in a P<sub>IN</sub> state, which indicates recognition of the start codon, and we reasoned that AUG-834 is more likely than AUG-826 to be placed at the P site. We have mentioned in the revised manuscript that we did not mutate AUG-826 (Pg 8).
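
      The reading-frame statement above can be checked with simple arithmetic. The sketch below is only an illustration: it assumes that 826 and 834 both denote the position of the first nucleotide of the respective AUG under the same numbering convention, and the function name is hypothetical.

```python
# Positions of the two AUG codons as quoted in the response
# (assumption: both numbers refer to the first nucleotide of the codon).
AUG_UPSTREAM = 826
AUG_MAIN = 834


def same_reading_frame(pos_a: int, pos_b: int) -> bool:
    """Two codons share a reading frame when their start
    positions differ by a multiple of three nucleotides."""
    return (pos_b - pos_a) % 3 == 0


offset = AUG_MAIN - AUG_UPSTREAM
in_frame = same_reading_frame(AUG_UPSTREAM, AUG_MAIN)
print(offset, in_frame)  # 8 False
```

      An offset of 8 nucleotides is not a multiple of 3, which is consistent with the statement that AUG-826 is out of frame with AUG-834.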

      (3) The claims the authors make about (i) the general overall shape and binding site of the IRES, (ii) its gross interaction with the two ribosomal proteins, (iii) the P-in state of the 48S, (iv) the rearrangement of the ternary complex are all warranted. Their claims about individual nucleotides or smaller stretches of the IRES-without any supporting biochemical data-is not warranted by the data.

      We thank the reviewer for considering our major claims warranted. Owing to the low resolution, we have reported findings for stretches or regions, such as the domain I loops and stems, rather than individual nucleotides. The interaction of the domain I apical region with uS13, uS19, and tRNA<sub>i</sub> is also observed in the high-resolution structure of the reconstituted EMCV IRES-48S PIC that was reported in a preprint while our work was under peer review (Bhattacharjee et al 2025). Thus, the reconstituted EMCV IRES-48S PIC (Bhattacharjee et al 2025) also supports our assignment of domain I and its conserved loops as interacting with the ribosome and tRNA<sub>i</sub>.

      Reviewer #3 (Public review):

      Summary:

      Type II IRES, such as those from encephalomyocarditis virus (EMCV) and foot-and-mouth disease virus (FMDV), mediate cap-independent translation initiation by using the full complement of eukaryotic initiation factors (eIFs), except the cap-binding protein eIF4E. The molecular details of how IRES type II interacts with the ribosome and initiation factors to promote recruitment have remained unclear. Das and Hussain used cryo-electron microscopy to determine the structure of a translation initiation complex assembled on the EMCV IRES. The structure reveals a direct interaction between the IRES and the 40S ribosomal subunit, offering mechanistic insight into how type II IRES elements recruit the ribosome.

      Strengths:

      The structure reveals a direct interaction between the IRES and the 40S ribosomal subunit, offering mechanistic insight into how type II IRES elements recruit the ribosome.

      Weaknesses:

      While this reviewer acknowledges the technical challenges inherent in determining the structure of such a highly flexible complex, the overall resolution remains insufficient to fully support the authors' conclusions, particularly given that cryo-EM is the sole experimental approach presented in the manuscript.

      The study is biologically significant; however, the authors should improve the resolution or include complementary biochemical validation.

      We thank Reviewer 3 for acknowledging the technical challenges in this study and finding our study biologically significant. We understand the concerns related to low resolution and the need for complementary biochemical validation of the observations and interpretations reported in the manuscript. We tried to improve the resolution, but the improvement was not sufficient to resolve the IRES at the nucleotide level. Independently, another group reported the same findings at a higher resolution while our work was under peer review (Bhattacharjee et al 2025), which corroborates our structural data on EMCV IRES and its interaction with the ribosome and tRNA<sub>i</sub> at the 48S PIC stage. Further, in the revised manuscript we also present biochemical validation for the GNRA and RAAA loops in EMCV IRES. We mutated these loops and performed a luciferase assay (Supplementary figure 4 A). The results showed decreased IRES activity (Pg 10), in agreement with previous reports (Roberts and Belsham, 1997; López de Quinto and Martínez-Salas, 1997; Robertson et al 1999) demonstrating the importance of these regions for overall IRES activity.

      Reviewing Editor Comments:

      The reviewers' comments are appended. While the reviewers acknowledge the complexity associated with this system, they also raised concerns about the modeling of RNA and registering its sequence in low-resolution maps. We believe that the strength of evidence and overall impact of your study can be elevated by providing higher-resolution cryo-EM data or complementary biochemical studies and addressing reviewers' concerns.

      Reviewer #2 (Recommendations for the authors):

      (1) Science:

      Have the authors tried a focused refinement (local refinement in cryoSPARC) using a generous mask that encloses the head and the IRES but excludes the ternary complex and the body of the 40S? This can be done with all the particles in map B (~55K) and has the possibility of improving the resolution of domain I which can be subsequently used to build a better model of the IRES. See the middle right panel, light yellow colored mask in Figure 1A in PMID 37659578 for the type of mask being suggested.

      We did another round of 2D classification to eliminate any residual junk in the ~55k particle set corresponding to Map B. After classification, 49439 particles were selected and refined using non-uniform refinement to obtain Map B11. The overall resolution of Map B11 was 4.6 Å. Thereafter, we made a mask around the 40S head-IRES-tRNA on Map B11 and subjected the class to local refinement. The overall local resolution in the masked region improved to 4.5 Å (Author response image 4).

      Author response image 4.

      Data processing- Map B particles were 2D classified, and further junk was cleared as rejected particles. The selected particles were refined using non-uniform refinement to get Map B11, and later, a focused mask circling the head-tRNA-IRES region was used for local refinement in the region to yield map B111.

      We estimated the local resolution across the focused region in Map B111 and compared this with that of Map B (Author response image 5). The local refinement shows minor improvement in the local resolution in this region, and is not sufficient to resolve the IRES density at the level of nucleotides.

      Author response image 5.

      Comparison of local resolution across head-IRES-tRNA in map B1 (as reported in the manuscript) and Map B111.

      (2) Presentation:

      (a) Please use the previously established convention of naming the domains: "domain I", "domain H", etc, instead of "I domain" or "J-K domain" while describing parts of the IRES.

      We have made the changes as per the established convention.

      (b) Figure 2B reports a 6.9 A distance vs. 7 A in the text. Please use ~ or approximately to keep numbers consistent.

      We have used ~ symbol to suggest the approximate distance.

      (c) References missing on page 15 when referring to "previously determined HCV and CrPV structures".

      We have added the references (Pg 12).

      (d) Please edit the text for typos and sentence structure.

      The typos and sentence structure were corrected wherever necessary.

      (e) Some phrases and sentences (e.g. last few sentences of the first paragraph in the discussion) could be rewritten for clarity.

      Previous sentence- “The domain I of EMCV IRES is similar to domain IV of polioviral IRES (or other type 1 IRESs such as Coxsackie viral IRES) in terms of length, secondary structure, and conserved motifs (GNRA, C-rich) positioning (Fig. 6C), therefore, anticipating a similar interaction with tRNA<sub>i</sub>, highlighting a sequestering tendency by competing with cellular mRNAs.”

      Rephrased sentence- “Like EMCV IRES, the type 1 IRES (Poliovirus, Coxsackie virus, etc.) also harbours the GNRA loop, preceded by a C-rich loop at its longest domain, known for long-range RNA-RNA interactions. The segment harbouring GNRA loop is highly conserved across the type 1 family of IRESs (Kim et al 2015). The domain I of EMCV IRES is similar to domain IV of polioviral IRES or other type 1 IRESs in terms of length, secondary structure, and conserved motifs (GNRA, C-rich) positioning (Fig. 6C). Therefore, we anticipate a similar interaction of domain IV (in type 1 IRES class) with tRNA<sub>i</sub>. Also, this interaction of IRES with tRNA<sub>i</sub> could be a strategy by which these IRESs can sequester the tRNA<sub>i</sub> pool in the cell, rendering them unavailable for capped cellular mRNAs.”

      Reviewer #3 (Recommendations for the authors):

      (1) For the revision process, the authors provided three atomic models alongside their corresponding cryo-EM density maps, including a 48S complex in closed conformation. Given this conformation, it is reasonable to interpret the structure as representing a post-start codon recognition state (late-stage initiation). However, this reviewer finds that the local resolution within the mRNA channel is insufficient to support the atomic model building as presented. The density does not allow for an unambiguous assignment of nucleotides in this region; the authors should either improve the local resolution or remove the modeled mRNA from the structure.

      We understand the concern of the Reviewer. Although the mRNA density in the channel is poor, we modelled the mRNA with AUG-834 at the P site because of the known biology of EMCV IRES. The EMCV IRES does not require scanning and directly positions AUG-834 at the P site (Pestova et al 1996). In Pestova et al 1996, the toeprint at AUG-834 is more intense than that at AUG-826. Further, AUG-834 lies in a Kozak context, whereas AUG-826 has a poor Kozak context, and the AUG-826 codon is not in-frame with AUG-834. Therefore, synthesis of the polypeptide requires AUG-834 at the P site. In our cryo-EM map, we observed that the tRNA<sub>i</sub> is in a P<sub>IN</sub> state, which indicates recognition of the start codon, and we reasoned that it is very likely that AUG-834 is placed at the P site.

      (2) As noted by the authors, the start codon in the EMCV IRES is positioned within a strong Kozak sequence. The nucleotide at position -3 is known to interact with eIF2α, yet, in the current model, A831 is positioned such that physical contact with eIF2α would be structurally impossible. This discrepancy raises concerns about the accuracy of the modeled eIF2α, which, like other regions of the structure, is not clearly supported by the cryo-EM density. The authors should revise the atomic model of eIF2α to ensure it is consistent with the experimental map and established molecular interactions.

      In our analysis of the EMCV IRES-48S PIC, we could observe eIF2α and eIF2γ in Map B and B1. However, the local resolution was too low to model the entire protein with side chains (Supplementary figure 1.2 A), so we used rigid-body fitting of eIF2α and eIF2γ (Author response image 6). From the model, we could trace the backbone of Arg55 but could not resolve its side chain. Similarly, the mRNA in the channel was modelled based on the placement of AUG-834 at the P site for the EMCV IRES, which enabled us to model the flanking residues, rather than on nucleotide-level resolution. We anticipate that a higher-resolution structure will be able to capture the interaction of eIF2α with the mRNA nucleotide at position -3, and we therefore refrained from commenting on this interaction in the manuscript. In the revised manuscript, we have removed the side chains of eIF2α and eIF2γ and kept only the Cα backbone. The map-model statistics of map B1 are updated in Table 1.

      Author response image 6.

      (left) Fitting of eIF2α model in the map. (right) Fitting of Cα backbone of eIF2α and mRNA in the map.

      (3) The authors observed additional density interacting with ribosomal proteins uS19 and uS13, and tRNA, which they tentatively assign to domain I of the IRES. Although the local resolution in this region does not allow an unambiguous assignment, the interpretation is reasonable. However, further structural and functional validation is necessary to support this assignment. The authors should improve the local resolution, either by performing focused refinement or by increasing the number of particles used in the reconstruction.

      The assignment of the extra density to domain I of the IRES was based on the architecture of the density. This density allows no other IRES domain to fit in this region (Supplementary figure 3.2). We tried to improve the local resolution using focused refinement, but the resolution was insufficient to resolve the IRES at the nucleotide level. Please see the above-mentioned comments in this regard on Pg 12.

      (4) Figure 5 shows a slight shift in the position of the ternary complex. Is the observed tRNA conformation compatible with the structural rearrangements required for 60S subunit joining?

      During the transition of the 48S PIC to the 80S elongation-competent complex, there are major changes in the conformation of tRNA<sub>i</sub>, due to the joining of eIF5B and the release of eIF2 (Petrychenko et al 2024). This joining of eIF5B positions the tRNA<sub>i</sub> elbow and acceptor stem towards the 40S body to aid 60S ribosomal subunit joining (Petrychenko et al 2024). However, in the context of the EMCV IRES-48S PIC, we observed that the tRNA<sub>i</sub> elbow and acceptor stem are positioned towards the 40S head, and away from the body. On superimposing the human 48S PIC structure (before 60S joining), 48S-5 (PDB Id- 8PJ5- Petrychenko et al 2024), we note that tRNA<sub>i</sub> in the EMCV IRES-48S PIC is away from the canonical tRNA<sub>i</sub> position (in contact with eIF5B). Therefore, we anticipate a change in tRNA<sub>i</sub> conformation during eIF5B joining and eIF2 release. This hypothesis is consistent with the fact that the IRES interacting with the tRNA<sub>i</sub> elbow needs to be displaced from its position to facilitate the interaction of tRNA<sub>i</sub> with eIF5B. Moreover, this rearrangement would also aid 60S joining and prevent any clash with IRES domain I. We have added this in Results section 5 and Figure 5D.

      (5) In the discussion section, the authors state: "eIF3-eIF4G interaction is dispensable for EMCV IRES-48S PIC formation, so we do not rule out the possibility that EMCV IRES may dislodge eIF3 from its position on the solvent surface as observed in the case of HCV IRES (Hashem et al, 2013)." This statement is highly speculative. Is there any experimental or structural evidence to support this proposed mechanism in the context of EMCV IRES?

      Previous biochemical reports on the eIF3-eIF4G interaction suggested that eIF4G residues 1011-1104 interact with eIF3 (Villa et al 2013). In the context of the EMCV IRES, this region of eIF4G is not required to form the 48S PIC on the IRES, suggesting that the eIF3-eIF4G interaction is dispensable for EMCV IRES-48S PIC formation. However, the recent structure of the human canonical 48S PIC has shown that the eIF4G-HEAT1 domain can interact with eIF3 subunits c, h, and l, and that eIF4G-bound eIF4A can interact with 40S ribosomal protein eS7, thus mediating the interaction between eIF4-bound mRNA and the 43S PIC (Brito Querido et al 2024), but the known eIF3-binding region in eIF4G was not captured in the map. Although the canonical eIF3-eIF4G interaction is essential in cap-dependent initiation, this interaction could be dispensable for 48S PIC formation on the EMCV IRES. In HCV IRES-mediated initiation, eIF3 is displaced from its canonical position, which facilitates the binding of the HCV IRES to the 40S ribosomal subunit (Hashem et al 2013). We did not see any density corresponding to eIF3 in the obtained maps. Further, we used focused classification with a mask on the canonical eIF3 position; however, we still do not see any density corresponding to eIF3 in the EMCV IRES-48S PIC complex. Therefore, we hypothesized that eIF3 might be dislodged from its canonical binding site on the 40S ribosomal subunit. However, as per the recent independent report on the EMCV IRES-48S PIC, eIF3 is present in the complex (Bhattacharjee et al 2025).

      Hence, we have rephrased the existing sentence- “However, eIF3-eIF4G interaction is dispensable for EMCV IRES-48S PIC formation, so we do not rule out the possibility that EMCV IRES may dislodge eIF3 from its position on the solvent surface as observed in case of HCV IRES (Hashem et al 2013).”

      Rephrased sentence- “However, the canonical eIF3-eIF4G interaction (Villa et al 2013) is dispensable for EMCV IRES-48S PIC formation (Lomakin et al 2000; Sweeney et al 2014), and we do not see any density for eIF3 even after focused classification. However, as per the recent independent report on reconstituted EMCV IRES-48S PIC, eIF3 is present in the complex at the canonical position (Bhattacharjee et al 2025). This position of eIF3 further highlights the possibility that eIF4G-eIF4A proteins are also placed similarly to the canonical eIF3-eIF4G-eIF4A position (Brito Querido et al 2024) in context to EMCV IRES-48S PIC. Thus, placing eIF4G-domain J-K close to ES6 of 40S ribosome, which coincides with the previous hydroxyl radical cleavage assay (Yu et al 2011).”

      (6) eIF4A has been shown to directly interact with eIF3 and facilitate recruitment of the 43S PIC. Is the interaction of the J-K domain with eIF4G/eIF4A compatible with the known eIF4A-eIF3 interaction within the 43S PIC? In other words, during EMCV IRES-mediated initiation, could the eIF4A-eIF3 interaction functionally substitute for the eIF4G-eIF3 interaction?

      Reports on EMCV IRES-mediated translation initiation have shown eIF4G to be an essential component of 48S PIC formation (Pestova et al 1996; Lomakin et al 2000; Kolupaeva et al 2003; Sweeney et al 2014), where eIF4G directly interacts with domain J-K of the IRES and with eIF4A, thus enabling loading of eIF4A on the IRES. In our study, the cryo-EM map of the EMCV IRES-48S PIC lacks density for eIF3 and eIF4 proteins, and locating eIF4F is challenging due to the inherent flexibility associated with the complex. Previous studies on the EMCV IRES-48S PIC have mapped the location of eIF4G close to ES6 towards the platform side of the body and eIF3 using the hydroxyl radical cleavage assay (Yu et al 2011). The human 48S initiation complex structures have shown a similar location for eIF4G, which is at the mRNA exit site, contacting eIF3 (Brito Querido et al 2020; Brito Querido et al 2024). On overlapping the 18S rRNA of the EMCV IRES-48S PIC with that of the human 48S PIC in closed conformation (PDB Id- 8OZ0), and further superimposing the J-K-St- eIF4G- eIF4A (PDB Id- 8HUJ) on the human 48S PIC (PDB Id- 8OZ0) with respect to HEAT1 of eIF4G, domain J-K becomes positioned at the subunit face of the 40S body, close to ES6 (Author response image 7). This correlates with the previously reported position for eIF4G with respect to the EMCV IRES-48S PIC (Yu et al 2011). The predicted model shows no clashes with the canonical eIF4A-eIF3/ eIF4G-eIF4A-eIF3 interaction, or with the domain J-K-eIF4G-eIF4A model. This highlights a possibly compatible interaction axis among eIF3, eIF4G, eIF4A, and domain J-K of the IRES.

      Author response image 7.

      (upper left) Location of eIF4G-eIF4A in canonical human 48S PIC (PDB Id- 8OZ0). (upper right) Superimposition of 18S rRNA from human 48S and EMCV IRES 48S. (lower left) Superimposition of Human Closed 48S PIC structure (PDB Id- 8OZ0) on EMCV IRES-48S PIC model and placement of EMCV IRES- J-K domain-HEAT1-eIF4A structure (PDB Id- 8HUJ) with respect to eIF4G-HEAT1 domain. (lower right) Predicting location of eIF3 and eIF4 proteins in EMCV IRES-48S PIC.

      (7) Assuming that the additional density near the ternary complex corresponds to Domain I of the IRES and that the codon in the P site represents the EMCV AUG start codon, what is the authors' mechanistic model for EMCV IRES-mediated initiation? Specifically, how is the mRNA positioned or inserted into the 40S mRNA channel in the absence of canonical scanning? As it stands, the discussion does not sufficiently address this key aspect of the EMCV initiation mechanism.

      The EMCV IRES start codon (A-834) is directly placed in the P site (Pestova et al 1996), and the captured complex harboured the initiator tRNA in P<sub>IN</sub> state with AUG at the P site. This start codon is preceded by domains J-K-L, where the J-K domain interacts with eIF4 proteins via eIF4G1-HEAT1 domain, and L domain is 20 residues upstream of the AUG and known to interact with eIF4B (Pestova et al 1996; de Quinto et al 2001). Based on the position and binding partners for these domains, the domain L could be placed at the mRNA exit site, preceded by domain J-K, which could be placed close to eIF4G-eIF4A position on EMCV IRES 48S PIC, near expansion segment 6 (ES6). The domain J-K can interact with eIF4G, localized close to the left foot or ES6 as per previous biochemical experiments (Yu et al 2011). This suggests that position of eIF4G and eIF4A could be the same as that of cap-dependent initiation where it can interact with eIF3 core subunits as well as the IRES domain J-K and the predicted path of mRNA from the exit site can follow the path of mRNA in human closed 48S PIC (PDB Id- 8OZ0), where it interacts with eIF3 core.

      Examining the path of the RNA in the channel from G-825 (exit site) to C-785 (domain J-K), we found that the shortest distance is ~173 Å. This gap could be bridged by a single-stranded stretch of 40 nucleotides. However, the presence of domain L (stem loop, residues 782 to 810) might hinder the placement of A-834 in the P site (Author response image 8). We anticipate two ways to accommodate the start codon at the P site. Either the domain L stem loop is unwound, which is energetically expensive (the free energy of the thermodynamic ensemble is -11.12 kcal/mol, as predicted using RNAfold), or the orientation or conformation of domain J-K changes such that the start codon is placed directly at the P site without unwinding domain L.

      Author response image 8.

      (left) The shortest distance between the last fitted residue- 825th of EMCV IRES to 785th of J-K domain of IRES (keeping eIF4G position same as that of PDB Id- 8OZ0) is 173 Å. (right) Tracing the path of mRNA (red) upstream of AUG coming out of the exit site of 40S ribosome and the possible position of eIF4G on EMCV IRES-48S PIC. Addition of nucleotides between C-785 and G-825 would fill the gap. The route of predicted mRNA from the exit channel is based on the mRNA (green) exiting the channel (PDB Id- 8OZ0).

      Domain I is followed by domain J-K, which lies close to the left foot of the 40S ribosomal subunit as per previous biochemical experiments (Yu et al 2011). However, the minimum distance connecting domain I at the 601st nucleotide to the 682nd nucleotide of domain J-K (at the predicted location) is ~300 Å, which might be difficult to span with the ~80 nucleotides (from 601 to 682) if they are present as a double-helical strand. We suppose there could be instances of J-K domain repositioning in the EMCV IRES-48S PIC such that the domain I apical region can contact the 40S head while the start codon is simultaneously placed at the P site (Author response image 9).

      Author response image 9.

      Rotated views of EMCV IRES domains- I apical part in contact with 40S head and tRNAi and predicted location of J-K domain in contact with eIF4G, close to the left foot of 40S (predicted from PDB Id- 8OZ0). The minimum distance connecting 601st nucleotide in I domain to 682nd nucleotide in J-K domain is 295.5 Å.
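
      The two distance arguments above (a ~173 Å gap bridged by 40 single-stranded nucleotides, and a ~295.5 Å gap that ~80 base-paired nucleotides may not span) can be checked with simple arithmetic. The sketch below is illustrative only: the per-residue rise values are assumed textbook-scale numbers and are not taken from the response, the function names are hypothetical, and the calculation ignores path curvature and steric constraints.

```python
# Rough upper bounds on how far an RNA segment can reach.
# ASSUMPTIONS (not from the author response): approximate rise per residue.
SSRNA_MAX_RISE_A = 5.9  # Å per nucleotide for a fully extended single strand (assumed)
AFORM_RISE_A = 2.8      # Å per base-pair step in an A-form duplex (assumed)


def max_span_single_strand(n_nt, rise=SSRNA_MAX_RISE_A):
    """Upper bound (Å) on the end-to-end reach of n_nt unpaired nucleotides."""
    return n_nt * rise


def max_span_duplex(n_steps, rise=AFORM_RISE_A):
    """Upper bound (Å) on the length of a helix with n_steps stacked steps."""
    return n_steps * rise


# Gap 1: C-785 to G-825, measured as 173 Å in the response.
linker_nt = 825 - 785
span_ss = max_span_single_strand(linker_nt)

# Gap 2: nucleotide 601 to 682, measured as 295.5 Å in the response.
# Generous bound: every nucleotide is counted as one helical step,
# although a hairpin would pair them and roughly halve the length.
segment_nt = 682 - 601
span_duplex = max_span_duplex(segment_nt)

print(f"single strand, {linker_nt} nt: up to {span_ss:.0f} Å (gap 173 Å)")
print(f"duplex, {segment_nt} steps: up to {span_duplex:.0f} Å (gap 295.5 Å)")
```

      Under these assumed values, the extended single strand can exceed the 173 Å gap, whereas even the generous duplex bound falls short of 295.5 Å, which is consistent with the reasoning given for both gaps.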

      We lack any details on the other IRES domains, such as domain I lower stem, domain J-K, or L; therefore, we refrained from commenting on these in our manuscript.

      (8) Supplementary Figure 1 is missing labels for the RNA ladders.

      The size of the DNA ladder used is mentioned.

      References:

      Bhattacharjee S, Abaeva IS, Brown ZP, Arhab Y, Fallah H, Hellen CUT, Frank J, Pestova TV. The mechanism of ribosomal recruitment during translation initiation on Type 2 IRESs. bioRxiv [Preprint]. 2025 Jun 11:2025.06.11.659010. doi: 10.1101/2025.06.11.659010. PMID: 40568087; PMCID: PMC12191231.

      Brito Querido J, Sokabe M, Díaz-López I, Gordiyenko Y, Fraser CS, Ramakrishnan V. The structure of a human translation initiation complex reveals two independent roles for the helicase eIF4A. Nat Struct Mol Biol. 2024 Mar;31(3):455-464. doi: 10.1038/s41594-023-01196-0. Epub 2024 Jan 29. PMID: 38287194; PMCID: PMC10948362.

      Brito Querido J, Sokabe M, Kraatz S, Gordiyenko Y, Skehel JM, Fraser CS, Ramakrishnan V. Structure of a human 48S translational initiation complex. Science. 2020 Sep 4;369(6508):1220-1227. doi: 10.1126/science.aba4904. PMID: 32883864; PMCID: PMC7116333.

      Chamond N, Deforges J, Ulryck N, Sargueil B. 40S recruitment in the absence of eIF4G/4A by EMCV IRES refines the model for translation initiation on the archetype of Type II IRESs. Nucleic Acids Res. 2014;42(16):10373-84. doi: 10.1093/nar/gku720. Epub 2014 Aug 26. PMID: 25159618; PMCID: PMC4176346.

      Dorn G, Gmeiner C, de Vries T, Dedic E, Novakovic M, Damberger FF, Maris C, Finol E, Sarnowski CP, Kohlbrecher J, Welsh TJ, Bolisetty S, Mezzenga R, Aebersold R, Leitner A, Yulikov M, Jeschke G, Allain FH. Integrative solution structure of PTBP1-IRES complex reveals strong compaction and ordering with residual conformational flexibility. Nat Commun. 2023 Oct 13;14(1):6429. doi: 10.1038/s41467-023-42012-z. PMID: 37833274; PMCID: PMC10576089.

      Duke GM, Hoffman MA, Palmenberg AC. Sequence and structural elements that contribute to efficient encephalomyocarditis virus RNA translation. J Virol. 1992 Mar;66(3):1602-9. doi: 10.1128/JVI.66.3.1602-1609.1992. PMID: 1310768; PMCID: PMC240893.

      Fernández N, Fernandez-Miragall O, Ramajo J, García-Sacristán A, Bellora N, Eyras E, Briones C, Martínez-Salas E. Structural basis for the biological relevance of the invariant apical stem in IRES-mediated translation. Nucleic Acids Res. 2011 Oct;39(19):8572-85. doi: 10.1093/nar/gkr560. Epub 2011 Jul 8. PMID: 21742761; PMCID: PMC3201876.

      Hashem Y, des Georges A, Dhote V, Langlois R, Liao HY, Grassucci RA, Pestova TV, Hellen CU, Frank J. Hepatitis-C-virus-like internal ribosome entry sites displace eIF3 to gain access to the 40S subunit. Nature. 2013 Nov 28;503(7477):539-43. doi: 10.1038/nature12658. Epub 2013 Nov 3. PMID: 24185006; PMCID: PMC4106463.

      Imai S, Suzuki H, Fujiyoshi Y, Shimada I. Dynamically regulated two-site interaction of viral RNA to capture host translation initiation factor. Nat Commun. 2023 Aug 28;14(1):4977. doi: 10.1038/s41467-023-40582-6. PMID: 37640715; PMCID: PMC10462655.

      Kim H, Kim K, Kwon T, Kim DW, Kim SS, Kim YJ. Secondary structure conservation of the stem-loop IV sub-domain of internal ribosomal entry sites in human rhinovirus clinical isolates. Int J Infect Dis. 2015 Dec;41:21-8. doi: 10.1016/j.ijid.2015.10.015. Epub 2015 Oct 27. PMID: 26518063.

      Lomakin IB, Hellen CU, Pestova TV. Physical association of eukaryotic initiation factor 4G (eIF4G) with eIF4A strongly enhances binding of eIF4G to the internal ribosomal entry site of encephalomyocarditis virus and is required for internal initiation of translation. Mol Cell Biol. 2000 Aug;20(16):6019-29. doi: 10.1128/mcb.20.16.6019-6029.2000. PMID: 10913184; PMCID: PMC86078.

      López de Quinto S, Martínez-Salas E. Conserved structural motifs located in distal loops of aphthovirus internal ribosome entry site domain 3 are required for internal initiation of translation. J Virol. 1997 May;71(5):4171-5. doi: 10.1128/JVI.71.5.4171-4175.1997. PMID: 9094703; PMCID: PMC191578.

      Lozano G, Francisco-Velilla R, Martinez-Salas E. Ribosome-dependent conformational flexibility changes and RNA dynamics of IRES domains revealed by differential SHAPE. Sci Rep. 2018 Apr 3;8(1):5545. doi: 10.1038/s41598-018-23845-x. PMID: 29615727; PMCID: PMC5882922.

      Maloney A, Joseph S. Validating the EMCV IRES Secondary Structure with Structure-Function Analysis. Biochemistry. 2024 Jan 2;63(1):107-115. doi: 10.1021/acs.biochem.3c00579. Epub 2023 Dec 11. PMID: 38081770; PMCID: PMC10896073.

      Pestova TV, Hellen CU, Shatsky IN. Canonical eukaryotic initiation factors determine initiation of translation by internal ribosomal entry. Mol Cell Biol. 1996 Dec;16(12):6859-69. doi: 10.1128/MCB.16.12.6859. PMID: 8943341; PMCID: PMC231689.

      Petrychenko V, Yi SH, Liedtke D, Peng BZ, Rodnina MV, Fischer N. Structural basis for translational control by the human 48S initiation complex. Nat Struct Mol Biol. 2024 Sep 17. doi: 10.1038/s41594-024-01378-4. Epub ahead of print. PMID: 39289545.

      Roberts LO, Belsham GJ. Complementation of defective picornavirus internal ribosome entry site (IRES) elements by the coexpression of fragments of the IRES. Virology. 1997 Jan 6;227(1):53-62. doi: 10.1006/viro.1996.8312. PMID: 9007058.

      Robertson ME, Seamons RA, Belsham GJ. A selection system for functional internal ribosome entry site (IRES) elements: analysis of the requirement for a conserved GNRA tetraloop in the encephalomyocarditis virus IRES. RNA. 1999 Sep;5(9):1167-79. doi: 10.1017/s1355838299990301. PMID: 10496218; PMCID: PMC1369840.

      Sweeney TR, Abaeva IS, Pestova TV, Hellen CU. The mechanism of translation initiation on Type 1 picornavirus IRESs. EMBO J. 2014 Jan 7;33(1):76-92. doi: 10.1002/embj.201386124. Epub 2013 Dec 15. PMID: 24357634; PMCID: PMC3990684.

      Velazquez MA, Nuthalapati SS, Hankinson J, Fominykh K, Lulla V, Sweeney TR, Hill CH. Structural and mechanistic insights into translation initiation on the enterovirus Type 1 IRES. bioRxiv [Preprint]. 2025 Oct 3: 2025.10.04.680434. doi: 10.1101/2025.10.04.680434.

      Yu Y, Sweeney TR, Kafasla P, Jackson RJ, Pestova TV, Hellen CU. The mechanism of translation initiation on Aichivirus RNA mediated by a novel type of picornavirus IRES. EMBO J. 2011 Aug 26;30(21):4423-36. doi: 10.1038/emboj.2011.306. PMID: 21873976; PMCID: PMC3230369.

    1. Author response:

      We thank the editors and reviewers for their careful consideration of our manuscript and for their constructive feedback, which we will address in detail in our revised version. We value that Reviewer 1 considered that “data they compiled and submitted to public databases is a valuable resource for the community.” We are also encouraged by Reviewer #2 when they stated that “The data set is very nice, and the annotations are extremely rigorous and more in-depth than other datasets that include these tissues, since these investigators have enriched significantly for this tissue of interest. Their use of PAGA to identify potential developmental relationships within the data is rigorous. I also would like to specifically point out how incredibly gorgeous the microscopy of the lmx1bb phenotype is in Figure 7. Wow.” We were encouraged by Reviewer #3’s comments that “The computational analysis is thorough, and the findings are clearly described. In situ hybridization provides corroboration of cell identities in many cases. This resource atlas will be of particular interest for studies of inner ear morphogenesis.”

      We spent significant time and effort considering and addressing the reviewers’ public criticisms.

      Below we address the criticisms of the reviewers’ Public Reviews individually.

      Public Reviews:

      Reviewer #1 (Public review):

      Weaknesses:

      Many of the clusters have not been annotated or rely on published data. For the ones for which no HCRs or UMAPs are shown, it is therefore difficult to estimate which of the markers are indeed the most cell type/state-specific ones.

      Major comments:

      (1) It would be very useful if the cluster numbers in the Excel files also had the associated cell type annotations as a second column (at least for the ones that are known). E.g., in Supplemental Table 2, the text states which clusters represent which neuromast and ear cell type, but these are not mentioned in the Excel table.

      Thank you for the suggestion. We will include additional annotations in the revised version.

      (2) Many of the clusters have not been annotated or rely on published data. For the ones for which no HCRs or UMAPs are shown, it is therefore difficult to estimate which of the markers are indeed the most cell-type/state-specific ones.

      We recognize the need to evaluate potential new markers. We will include a heat map of markers and clusters to assess cell-type/state specificity in the revised version.

      (3) Uploading the data to gEAR (https://umgear.org/dataset_explorer.html), a web-based, publicly available ear database, would further increase the usefulness of this study to the broader community.

      We appreciate the suggestion to upload to gEAR and will upload to the database in the near future.

      Method:

      The authors should provide the details about how many cells were sequenced for each ear developmental stage, how many cells were present per cluster (page 8), and how many cells were present in each subcluster of ear and lateral line clusters (page 10).

      We will add cell numbers for each cluster in the revised version as an additional column in the supplemental tables.

      Reviewer #2 (Public review):

      Weaknesses:

      A missed opportunity is that the authors describe creating an additional scRNAseq dataset from lmx1bb mutants, but do not show any comparative scRNAseq analyses that would identify broader sets of differentially expressed genes. It seems almost as if a key element of the study was removed at the last minute, and as a result, the discussion of changes in epcam expression in lmx1bb mutants in Figure 7 seems somewhat tacked onto the end of the study and not motivated by the analyses presented in the manuscript.

      Overall, I do not think this study requires any major revisions to be appropriate and useful to the community. This study would be potentially stronger with a more formal analysis of what gene expression changes occurred in otic tissue in lmx1bb mutants, but it is also useful without this. I did have a couple of minor suggestions for the presentation of some aspects that would have made it easier for me as a reader.

      We will include analysis of the lmx1bb mutant data in the revised version and value the suggestions for improved presentation. We will work on improving the presentation of the mutant data, including a UMAP with the WT cells in one color and the mutant cells in another color.

      Reviewer #3 (Public review):

      Weaknesses:

      The manuscript is incomplete. Important details that would allow replicable analysis are not provided, with notebooks not available on the referenced GitHub site, and additional files are missing.

      Python notebooks will be added shortly, and files for mapping in Drops data will be provided at the GitHub site.

      The authors make a detailed description of hair cells and supporting cells that are consistent with previous findings (Figures 2 and 3). By contrast, the analysis of distinct cell types that have not been previously well characterized in zebrafish is somewhat incomplete. Markers are described for cells forming the semicircular canals, including ccn1l1 (Figure 4). The authors report an intriguing pattern of its expression before overt bud formation; however, they provide no detailed expression analysis to support this assertion.

      The authors also identify new markers for subsets of periotic mesenchyme (Figure 6). These include epyc and otos, which mark distinct populations within the mammalian inner ear - cochlea supporting cells, spiral limbus, and ligament, respectively. Identification of the equivalent of the spiral ligament would be of particular interest. However, the expression analysis is not of sufficient resolution to identify which cell types these represent in the zebrafish inner ear.

      Thank you for your input regarding the analysis of the periotic mesenchyme. In the revised version, we will attempt to improve resolution of different populations, first by comparing epyc and otos expression by HCR. It is unclear how to correlate any patterns with structures that have yet to evolve, but we will look for similarities and differences to studies performed in mice (PMID: 37720106).

      Differences in gene expression are reported for lmx1bb mutants. However, none of the single-cell data for mutants is provided, and the table (S8) of differential gene expression is missing. Significantly more detail would be needed to interpret these findings.

      We will include analysis of the lmx1bb mutant data in the revised version and value the suggestions for improved presentation.

    1. Author response:

      We thank the Reviewing Editor and reviewers for their thoughtful and constructive evaluation of our manuscript, Programmed Delayed Splicing: A Mechanism for Timed Inflammatory Gene Expression. We are encouraged that the reviewers found the study valuable and the experimental design strong for the core findings. We appreciate the reviewers’ careful attention to the limits of inference in several parts of the manuscript, and will address these points in a revised version. We especially want to acknowledge that this paper has benefited from the abiding interest in splicing regulation of the editors and reviewers, who have meticulously improved nearly every aspect of this multifaceted work in its present state.

      Our planned revisions will focus on five areas. First, we will more carefully evaluate and discuss the extent to which the hybrid-capture strategy may impose position-dependent constraints on apparent splicing behavior, particularly across 5′ and 3′ introns. Second, we will clarify the use of the term “bottleneck introns,” distinguishing descriptive use in the main text from the ranked subsets used in downstream analyses. Third, we will revise the framing of the reporter assays to make explicit that these measure steady-state reporter output and do not, on their own, resolve all downstream kinetic consequences of delayed splicing. Fourth, we will clarify the interpretation of the actinomycin D experiments as providing estimates of intron excision behavior under transcriptional arrest rather than a complete time-resolved model of splicing during TNF induction. Fifth, we will substantially revise the scope and stated limitations of the deep learning-aided interpretations of data in this work.

      Reviewer #1

      We thank Reviewer #1 for the positive assessment of the hybrid-capture strategy, the splice-site reporter experiments, and the potential value of the neural-network-based analysis. We appreciate the reviewer’s view that these approaches help extend a well-established system for studying temporal gene expression in TNF-stimulated macrophages. We address the main concerns raised in the public review below.

      (1) While evidence is provided that these introns are distinct from previously published splicing kinetics studies, “bottleneck” introns are not adequately placed in context for assessment of how they are similar or different.

      We appreciate this point and agree that the current manuscript does not yet place these introns in sufficiently clear context relative to prior literature. Our study builds on foundational work describing regulated changes in splicing kinetics, widespread intron retention, and detained introns as biologically meaningful modes of gene regulation, including transcript-specific regulation of splicing in response to stress (Pleiss, Mol Cell., 2007), widespread functional intron retention in mammals (Braunschweig, Genome Res., 2014), and the definition of detained introns as a distinct class of post-transcriptionally spliced introns (Boutz, Genes Dev., 2015). In revision, we will expand the comparison to previously described classes of delayed or retained introns and clarify more explicitly how the introns studied here are defined in the setting of inducible inflammatory transcripts and their temporal resolution over the course of stimulation. We will also revise the relevant Results and Discussion text so that the distinction is made directly in the manuscript rather than relying on inference from the broader presentation.

      (2) Splicing reporters are a good approach, but the complexities of post-transcriptional gene expression regulation are not adequately addressed.

      We agree that the interpretive limits of the reporter assays should be stated more clearly and consistently. In revision, we will revise the presentation of the minigene experiments to make explicit that these are steady-state reporter assays and therefore do not, on their own, resolve all downstream kinetic consequences of delayed splicing in the endogenous context. At the same time, we believe the assay remains informative because it provides a controlled system in which the contribution of splice donor sequence can be tested directly in matched reporter constructs. In that sense, the reporter experiments are valuable as a reductionist test of whether weak donor sequences are sufficient to alter reporter output, even if they do not fully recapitulate the broader endogenous post-transcriptional environment. We will emphasize that these data support an association between weak donor sites and altered reporter output, while moderating any broader mechanistic claims that extend beyond what the assay directly measures.

      (3) Deep learning models are a potentially powerful tool for identifying novel regulatory sequences; however, their use here is underdeveloped.

      We appreciate this concern and agree that the deep-learning section should be revised substantially. In a revised manuscript, we will clarify the training setup, the definition of the slow-intron subsets used in downstream analyses, and the interpretation of the attribution and motif analyses. Alongside, we believe the assay remains informative because it provides a controlled system in which the contribution of splice donor sequence can be tested directly in matched reporter constructs. In that respect, the reporter experiments are valuable as a reductionist test of whether weak donor sequences are sufficient to alter reporter output, even if they do not fully recapitulate the broader endogenous post-transcriptional environment. We will revise the framing of these results so that they are presented more explicitly as identifying candidate sequence features associated with delayed splicing, rather than as direct evidence of specific causal regulatory mechanisms.

      Reviewer #2

      We thank Reviewer #2 for the thoughtful and detailed comments, and for recognizing the strengths of the measurement strategy and the clarity of the manuscript. We appreciate the reviewer’s view that the study will be of interest to a broad audience, and we agree that several conclusions will be strengthened by additional analysis and clearer explanation. We address the main concerns raised in the public review below.

      (1) Concern regarding possible bias of the hybrid-capture strategy toward introns closer to the 3′ end, and whether 5′ introns should be treated separately in some analyses.

      We thank the reviewer for this careful and important point. We agree that this is a potential limitation of the approach and that it should be addressed more explicitly in the manuscript. Our assay begins with poly(A)-selected RNA and then enriches transcripts of interest through terminal-exon capture, so the molecules analyzed are completed, polyadenylated transcripts rather than nascent partial transcripts. This feature is important for reducing ambiguity arising from incomplete transcription, particularly in the chromatin-associated fraction. At the same time, we agree that for introns near the 5′ end, the assay may have limited power to distinguish very rapid splicing from moderately rapid splicing if excision is largely complete by the time the transcript is fully synthesized and polyadenylated.

      In revision, we will address this concern directly in two ways. First, we will revise the Results and Discussion to clarify that the assay provides a population-level measure of splice completion in completed transcripts and that interpretation is strongest for introns whose excision is not already fully resolved before transcript completion. Second, we will more systematically evaluate whether apparent slow splicing covaries with transcript position, distance from the 3′ end, and intron length, and we will perform sensitivity analyses with and without the most 5′ introns to determine which conclusions are robust to these positional constraints. We will also examine transcript coverage patterns in greater detail to better assess the extent to which library construction and cDNA generation may contribute to apparent positional bias. Our preliminary inspection suggests that transcript position is not the sole determinant of the observed heterogeneity, but we agree that a more explicit treatment of this issue is warranted in the revised manuscript.

      (2) Request for more detailed discussion of alternative library-construction choices.

      We appreciate this suggestion and agree that the revised manuscript would benefit from a fuller discussion of the strengths and limitations of the current enrichment strategy. We chose poly(A) selection followed by terminal-exon capture because this design enriches completed transcripts of interest and reduces ambiguity from nascent partial transcripts, which is particularly important in the chromatin-associated fraction. This approach also provides greater read depth over the selected inflammatory transcripts, enabling more informative intron-level comparisons within the targeted dataset. In revision, we will clarify this rationale more explicitly in the manuscript. We will also discuss the tradeoffs of this design relative to alternative exon-targeting strategies and how those alternatives might provide different, but complementary, views of splicing kinetics.

      (3) Questions regarding biological replicates, error bars, and statistical analysis in Figure 1C and other plots.

      We agree that the replicate structure and intended interpretation of these plots should be clarified more explicitly. In revision, we will revise the figure legends and Methods to distinguish panels that display a single bulk RNA-seq time course (for example, Figure 1C) from panels that summarize distributions across many introns (for example, Figure 2 and Supplementary Figure 6). We will also add statistical comparisons where they are most appropriate and informative, such as in sequence-feature comparisons like Supplementary Figure 4C, while making clear that some CoSI panels are intended as descriptive summaries of intron-level heterogeneity rather than replicate-based inferential plots.

      (4) Concern that intron half-lives may be time-dependent during TNF induction, and that the logic of the actinomycin D measurements is therefore unclear.

      We appreciate this point and agree that the manuscript should distinguish more clearly between two related but non-identical quantities: the CoSI trajectories observed during ongoing TNF induction, and the interruption-based half-life estimates derived from actinomycin D treatment. The actinomycin D experiments were performed using multiple post-treatment timepoints, but they were designed to estimate intron excision behavior after transcriptional arrest under a defined set of conditions, rather than to measure whether an individual intron’s effective splicing rate changes across all phases of the TNF response. We agree that these estimates should therefore be interpreted as constrained measurements under the assay conditions used, rather than as a complete time-resolved model of splicing kinetics during induction. In revision, we will clarify this point in the Results, Methods, and Discussion, and we will more explicitly acknowledge that effective splicing behavior could vary across the induction time course.

      (5) Concern that the interpretation of Supplementary Figure 6 is unclear, particularly why delayed splicing in non-immediate groups appears to peak later rather than at the earliest time points.

      We appreciate this point and agree that the current presentation of Supplementary Figure 6 does not explain this behavior clearly enough. Our interpretation is not that delayed splicing is the sole determinant of early versus later induction classes. Rather, the earliest time points reflect a combination of transcriptional induction timing and RNA processing state. In this framework, the dip in CoSI shortly after stimulation reflects the appearance of newly induced, incompletely spliced transcripts, and the later kinetic groups appear to recover from this dip more slowly than the immediate-early group. Thus, the strongest signal of delayed splicing may become most apparent only after sufficient transcript accumulation, rather than necessarily at the very earliest time point. In revision, we will revise the text to make this logic clearer and will consider a more intuitive visualization of these group-specific CoSI trajectories.

      (6) Concern that the deep-learning setup does not make clear whether the model input and output are time-dependent.

      We appreciate this concern and agree that the current manuscript does not explain the model setup clearly enough. Briefly, we will clarify the role of the three TNF timepoints in model training, including the fact that these outputs were modeled jointly and that time itself was not provided as an explicit input to the model. We will also revise the Results and Methods so that the scope and interpretation of the resulting analyses are more explicit.

      Reviewer #3

      We thank Reviewer #3 for the positive assessment of the targeted capture design, the evaluation of overall interest of the findings, and the improvements in the current version. We appreciate the reviewer’s view that the study is intriguing and that the manuscript has been strengthened in revision. We agree, however, that the manuscript should more clearly distinguish what is directly demonstrated from what remains mechanistically unresolved. We address the main concerns raised in the public review below.

      (1) The study still does not fully resolve the downstream consequences of delayed splicing, including whether bottleneck introns lead primarily to delayed production of mature transcripts, reduced productive transcript output, or some combination of the two.

      We agree with this assessment. The current data do not fully resolve whether delayed splicing primarily delays mature transcript production, reduces productive transcript output, or reflects some combination of the two. In revision, we will further moderate the framing of the downstream consequences of delayed splicing and will revise the Abstract, Results, and Discussion so that the manuscript consistently presents these possibilities as alternatives that the present data do not distinguish.

      (2) The minigene reporter assays measure a steady-state level of the transcript and do not provide direct insight into kinetics.

      We agree and will revise the manuscript to make this limitation explicit throughout. In particular, we will ensure that the reporter assays are described consistently as steady-state reporter assays that support an association between splice donor strength and altered reporter output, while avoiding stronger claims that they directly resolve endogenous splicing kinetics or downstream transcript fate.

      (3) Given that the detailed analyses were performed on a selected subset of inflammation-induced transcripts, a broader evolutionary interpretation should be restrained.

      We agree that the broader evolutionary and mechanistic framing should be more carefully defined. In revision, we will restrain these interpretations so that they remain closely aligned with the inflammation-focused and targeted-transcript scope of the current study, and we will moderate language that extends beyond what is directly supported by the present dataset.

      Closing Remarks

      We again thank the reviewers for their constructive comments. We believe that the planned revisions will strengthen the manuscript by clarifying the scope of the mechanistic conclusions, sharpening the interpretation of the experimental approaches, and more carefully defining the role of the computational analyses. We appreciate the opportunity to revise the work and to provide this provisional response to accompany the Reviewed Preprint.

    1. Author response:

      Reviewer #1

      The results showing that hh and vvl drive tracheal invagination independently of trh are reported in Figure 5 of (Matsuda et al. 2015 eLife 4:e09646).

      Reviewer #2

      Many images primarily show lateral views of whole embryos, which can make it difficult to fully assess some phenotypes; higher-magnification or sectional views would enhance clarity. There are also some minor inconsistencies in the description of invagination phenotypes, particularly regarding whether all trh+ cells remain in a 2D plane versus indications of partial invagination in hh vvl double mutants blocking apoptosis, which would benefit from further clarification.

      The data in our previous eLife publication (DOI: 10.7554/eLife.09646; ref. 1) were mostly projection views. Therefore, it is hard to conclude whether the airway progenitors of hh vvl double mutants failed to invaginate or invaginated to form sacs. We will provide magnified views of the progenitor invagination in hh vvl double mutants and describe the degree of invagination in these phenotypes.

      Reviewer #1

      The results showing the dpp requirement for trh maintenance are partially reported in Figure 6 of (Matsuda et al. 2015 eLife 4:e09646).

      Reviewer #2

      Finally, some statements in the abstract, especially regarding the role of grn, are not directly supported by data in this study and could be better aligned with the scope of the presented results.

      trh-lacZ (1-eve-1) has been used as the earliest and strongest enhancer trap line to mark the airway primordia and the airway progenitors. Perdurance of beta-galactosidase protein makes it difficult to conclude whether the marker signal reflects an active transcriptional state of the trh locus. In our previous eLife publication, we showed that Trh protein and trh transcripts are not detectable in H99 grn hh vvl quadruple mutants or in grn hh vvl triple mutants (Figure 5H and Figure 5-figure supplement 2A of DOI: 10.7554/eLife.09646, respectively; ref. 1), although trh-lacZ signals are detected in grn hh vvl triple mutants.

      Similarly, although we previously showed trh-lacZ expression in dpp mutant combinations, Figure 2 in the current manuscript shows that even strong trh-lacZ signals do not always correlate with trh transcripts in dpp mutants. Therefore, in the current manuscript we included the data on dpp-driven positive regulation of trh transcripts at later stages, since these have not been shown before.

      The assessments and advice of the Editors and the Reviewers are indispensable for improving the manuscript. We will address all of the Reviewers' comments (the weaknesses listed in the Public Reviews, and the major and minor issues in the Recommendations for the authors) both experimentally and in the text.

      Sincerely yours,

      Christos Samakovlis on behalf of all authors

      • (1) Matsuda, R., Hosono, C., Samakovlis, C. & Saigo, K. Multipotent versus differentiated cell fate selection in the developing Drosophila airways. eLife 4, e09646 (2015).
    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Zacharia and colleagues investigate the role of the C-terminus of IFT172 (IFT172c), a component of the IFT-B subcomplex. IFT172 is required for proper ciliary trafficking and mutations in its C-terminus are associated with skeletal ciliopathies. The authors begin by performing a pull-down to identify binding partners of His-tagged CrIFT172<sub>968-C</sub> in Chlamydomonas reinhardtii flagella. Interactions with three candidates (IFT140, IFT144, and a UBX-domain containing protein) are validated by AlphaFold Multimer with the IFT140 and IFT144 predictions in agreement with published cryo-ET structures of anterograde and retrograde IFT trains. They present a crystal structure of IFT172c and find that a part of the C-terminal domain of IFT172 resembles the fold of a non-canonical U-box domain. As U-box domains typically function to bind ubiquitin-loaded E2 enzymes, this discovery stimulates the authors to investigate the ubiquitin-binding and ubiquitination properties of IFT172c. Using in vitro ubiquitination assays with truncated IFT172c constructs, the authors demonstrate partial ubiquitination of IFT172c in the presence of the E2 enzyme UBCH5A. The authors also show a direct interaction of IFT172c with ubiquitin chains in vitro. Finally, the authors demonstrate that deletion of the U-box-like subdomain of IFT172 impairs ciliogenesis and TGFbeta signaling in RPE1 cells.

      However, some of the conclusions of this paper are only partially supported by the data, and presented analyses are potentially governed by in vitro artifacts. In particular, the data supporting autoubiquitination and ubiquitin-binding are inconclusive. Without further evidence supporting a ubiquitin-binding role for the C-terminus, the title is potentially misleading.

      Strengths:

      (1) The pull-down with IFT172 C-terminus from C. reinhardtii cilia lysates is well performed and provides valuable insights into its potential roles.

      (2) The crystal structure of the IFT172 C-terminus is of high quality.

      (3) The presented AlphaFold-multimer predictions of IFT172c:IFT140 and IFT172c:IFT144 are convincing and agree with experimental cryo-ET data.

      Weaknesses:

      (1) The crystal structure of HsIFT172c reveals a single globular domain formed by the last three TPR repeats and C-terminal residues of IFT172. However, the authors subdivide this globular domain into TPR, linker, and U-box-like regions that they treat as separate entities throughout the manuscript. This is potentially misleading as the U-box surface that is proposed to bind ubiquitin or E2 is not surface accessible but instead interacts with the TPR motifs. They justify this approach by speculating that the presented IFT172c structure represents an autoinhibited state and that the U-box-like domain can become accessible following phosphorylation. However, additional evidence supporting the proposed autoinhibited state and the potential accessibility of the U-box surface following phosphorylation is needed, as it is not tested or supported by the current data.

      We thank the reviewer for this comment. IFT172C contains a TPR region and a U-box-like region, which are admittedly tightly bound to each other. While it is possible that this region exists and functions as a single domain, we chose to classify these regions as two different domains for the following reasons.

      (1) The TPR and U-box-like regions belong to two different structural classes.

      (2) The TPR region is connected to the U-box-like region by a long linker, which seems poised to regulate the relative movement between these regions.

      (3) Many ciliopathy mutations map to the interface between the TPR region and the U-box-like region, hinting at a regulatory mechanism governed by this interface.

      That said, we agree that the proposed autoinhibited state and its potential relief by phosphorylation remain hypotheses that require experimental validation. We have revised the manuscript to present this more clearly as a speculative model rather than an established mechanism. We clearly acknowledge this limitation on pg. 16-17 of the revised discussion: ‘The IFT172 U-box domain appears to be in an auto-inhibited state in our crystal structure of HsIFT172C2 (Fig. 2E), potentially explaining the absence of a robust auto-ubiquitination activity in vitro. This structural inhibition is reminiscent of the RING ubiquitin ligase CBL [59], where phosphorylation and substrate binding trigger a conformational change that activates ligase activity [59,75]. Intriguingly, the phosphosite database [76] lists four residues (T1533, S1549, T1689, Y1691) at the U-box/TPR interface as phosphorylation sites (Fig. S2D). Phosphorylation of these residues could potentially alleviate the auto-inhibited state, suggesting a possible regulatory mechanism. Furthermore, a 30-residue linker connects the U-box domain to the last TPR of IFT172, likely providing significant conformational flexibility (Fig. 2A-B). This flexibility may be functionally crucial for the U-box domain, allowing it to adopt different conformations as needed for its various roles. However, we note that the proposed autoinhibition model and its potential regulation by phosphorylation remain hypothetical and require future experimental validation.’

      (2) While in vitro ubiquitination of IFT172 has been demonstrated, in vivo evidence of this process is necessary to support its physiological relevance.

      We thank the reviewer for this important point. We agree that in vivo evidence of IFT172 ubiquitination would strengthen the physiological relevance of our findings. While our current study focuses on the in vitro characterization of this activity, we have revised the manuscript to more clearly state that demonstration of IFT172 ubiquitination activity in cells, including identification of bona fide substrates, is required to establish its physiological significance (p. 16). We consider this an important direction for future studies.

      (3) The authors describe IFT172 as being autoubiquitinated. However, the identified E2 enzymes UBCH5A and UBCH5B can both function in E3-independent ubiquitination (as pointed out by the authors) and mediate ubiquitin chain formation in an E3-independent manner in vitro (see ubiquitin chain ladder formation in Figure 3A). In addition, point mutation of known E3-binding sites in UBCH5A or TPR/U-box interface residues in IFT172 has no effect on the mono-ubiquitination of IFT172c1. Together, these data suggest that IFT172 is an E3-independent substrate of UBCH5A in vitro. The authors should state this possibility more clearly and avoid terminology such as "autoubiquitination" as it implies that IFT172 is an E3 ligase, which is misleading. Similarly, statements on page 10 and elsewhere are not supported by the data (e.g. "the low in vitro ubiquitination activity exhibited by IFT172" and "ubiquitin conjugation occurring on HsIFT172C1 in the presence of UBCH5A, possibly in coordination with the IFT172 U-box domain").

      We now consider this possibility and tone down our statements about the autoubiquitination activity of IFT172 in both the abstract and results/discussion parts of the revised version of the manuscript. We no longer refer to IFT172 as having auto-ubiquitination activity in the manuscript.

      (4) Related to the above point, the conclusion on page 11, that mono-ubiquitination of IFT172 is U-box-independent while polyubiquitination of IFT172 is U-box-dependent appears implausible. The authors should consider that UBCH5A is known to form free ubiquitin chains in vitro and structural rearrangements in F1715A/C1725R variants could render additional ubiquitination sites or the monoubiquitinated form of IFT172 inaccessible/unfavorable for further processing by UBCH5A.

      We agree, and the conclusion on pg. 11 has now been changed to: “Therefore, while mutations in the IFT172 U-box domain affect the formation of higher molecular weight ubiquitin conjugates, the prominent mono-ubiquitination of IFT172 is likely attributable to the E3-independent activity of UbcH5a, as this event is not impacted by these U-box mutations, rather than indicating an intrinsic auto-ubiquitination capacity of IFT172 itself.”

      (5) Identification of the specific ubiquitination site(s) within IFT172 would be valuable as it would allow targeted mutation to determine whether the ubiquitination of IFT172 is physiologically relevant. Ubiquitination of the C1 but not the C2 or C3 constructs suggests that the ubiquitination site is located in TPRs ranging from residues 969-1470. Could this region of TPR repeats (lacking the IFT172C3 part) suffice as a substrate for UBCH5A in ubiquitination assays?

      We thank the reviewer for raising this important point about ubiquitination site identification. While not included in our manuscript, we did perform mass spectrometry analysis of ubiquitination sites using wild-type IFT172 and several mutants (P1725A, C1727R, and F1715A). As shown in Author response image 1, we detected multiple ubiquitination sites across these constructs. The wild-type protein showed ubiquitination at positions K1022, K1237, K1271, and K1551, while the mutants displayed slightly different patterns of modification. However, we should note that the MS intensity signals for these ubiquitinated peptides were relatively low compared to unmodified peptides, making it difficult to draw strong conclusions about site specificity or physiological relevance.

      Author response image 1.

      Consistent with the reviewer's suggestion, all detected ubiquitination sites fall within the TPR-containing region (residues 1022-1551), which is present in the C1 construct but absent from C2 and C3, explaining the construct-dependent ubiquitination pattern. We did not test the TPR region alone as a UBCH5A substrate, but this would be an informative experiment for future studies.

      (6) The discrepancy between the molecular weight shifts observed in anti-ubiquitin Western blots and Coomassie-stained gels is noteworthy. The authors show the appearance of a mono-ubiquitinated protein of ~108 kDa in anti-ubiquitin Western blots. However, this molecular weight shift is not observed for total IFT172 in the corresponding Coomassie-stained gels (Figures 3B, D, F). Surprisingly, this MW shift is visible in an anti-His Western blot of a ubiquitination assay (Fig 3C). Together, this raises the concern that only a small fraction of IFT172 is being modified with ubiquitin. Quantification of the percentage of ubiquitinated IFT172 in the in vitro experiments could provide helpful context.

      We acknowledge that the ubiquitin conjugation of IFT172 in vitro is weak, as stated in the manuscript (p. 16). The discrepancy between anti-ubiquitin Western blots and Coomassie-stained gels is consistent with only a small fraction of IFT172 being modified, which is expected given that the reaction likely reflects E3-independent ubiquitination by UBCH5A rather than a robust enzymatic activity of IFT172 itself. The anti-His Western blot (Fig. 3C) is more sensitive than Coomassie staining, explaining why the shift is visible there but not on Coomassie. We have not performed formal quantification of the ubiquitinated fraction, but based on the Coomassie data, we estimate it to be a minor proportion of total IFT172, consistent with the toned-down conclusions in our revised manuscript. The identification of physiological substrates and in vivo validation will be important future directions to establish the biological relevance of these observations.

      (7) The authors propose that IFT172 binds ubiquitin and demonstrate that GST-tagged HsIFT172C2 or HsIFT172C3 can pull down tetra-ubiquitin chains. However, ubiquitin is known to be "sticky" and to have a tendency for weak, nonspecific interactions with exposed hydrophobic surfaces. Given that only a small proportion of the ubiquitin chains bind in the pull-down, specific point mutations that identify the ubiquitin-binding site are required to convincingly show the ubiquitin binding of IFT172.

      We appreciate the reviewer's point regarding the potential for non-specific ubiquitin interactions and the value of mutational analysis for confirming specificity. While further mutagenesis of the predicted ubiquitin-binding interface was not performed for this revision, we note that our data show comparable tetra-ubiquitin pull-down by both the larger HsIFT172C2 construct and, importantly, the isolated HsIFT172C3 U-box domain itself (Fig. 4D). This localization of binding to the smaller U-box domain, coupled with our AlphaFold model predicting a specific interface with ubiquitin (Fig. 4E-F) and the observation that a mutation elsewhere (D1605R, Fig. 4C) does not abrogate this binding, collectively suggests a degree of specificity. We have revised the manuscript to present these findings more cautiously and to acknowledge the need for future studies to definitively map the binding site. Specifically, we have toned down the conclusion in the section on pg. 12-13 of the revised manuscript, including a toned-down heading: “IFT172 U-box domain pulls down ubiquitin in vitro”.

      (8) The authors generated structure-guided mutations based on the predicted Ub-interface and on the TPR/U-box interface and used these for the ubiquitination assays in Fig 3. These same mutations could provide valuable insights into ubiquitin binding assays as they may disrupt or enhance ubiquitin binding (by relieving "autoinhibition"), respectively. Surprisingly, two of these sites are highlighted in the predicted ubiquitin-binding interface (F1715, I1688; Figure 4E) but not analyzed in the accompanying ubiquitin-binding assays in Figure 4.

      We thank the reviewer for emphasizing the importance of mutational analysis to confirm the specificity of ubiquitin binding and for specifically inquiring about residues like F1715 and I1688 at the predicted ubiquitin interface. We tested purified HsIFT172C1 constructs containing the F1715A mutation (along with P1725A and C1727R variants) in pull-down assays with GST-Ubiquitin, see Author response image 2.

      Author response image 2.

      However, these experiments did not reveal a conclusive difference in ubiquitin binding for any of the tested variants compared to wild-type IFT172. The I1688A mutant, unfortunately, yielded insoluble protein and could not be evaluated. It is conceivable that the F1715A mutation was not disruptive enough to significantly alter binding, and future studies with different substitutions might be more informative. Nevertheless, our observations that the isolated HsIFT172C3 U-box domain itself pulls down tetra-ubiquitin (Fig. 4D), that our AlphaFold model predicts a specific interface (Fig. 4E-F), and that a mutation elsewhere (D1605R, Fig. 4C) does not abrogate this binding, collectively suggest a degree of specificity. We have revised the manuscript to present these ubiquitin binding findings cautiously, acknowledging the need for further investigation to definitively map the binding site and its functional relevance.

      (9) If IFT172 is a ubiquitin-binding protein, it might be expected that the pull-down experiments in Figure S1 would identify ubiquitin, ubiquitinated proteins, or E2 enzymes. These were not observed, raising doubt that IFT172 is a ubiquitin-binding protein.

      We acknowledge that the absence of ubiquitin or ubiquitinated proteins in our pull-down/MS experiment (Fig. S1) could raise questions about the ubiquitin-binding capacity of IFT172. However, several technical factors likely explain this. First, IFT172 appears to bind ubiquitin with low affinity, as indicated by our in vitro pull-downs and the AF-predicted interface. Second, we used extensive washes to remove non-specific interactors, which would also remove weak but potentially genuine ubiquitin interactions. Third, we did not include ubiquitination-preserving reagents such as NEM in our pull-down buffers, exposing ubiquitinated proteins to DUB-mediated deubiquitination during the experiment. These factors combined would strongly select against the detection of ubiquitin-related interactors under our experimental conditions.

      (10) The cell-based experiments demonstrate that the U-box-like region is important for the stability of IFT172 but does not demonstrate that the effect on the TGFb pathway is due to the loss of ubiquitin-binding or ubiquitination activity of IFT172.

      We acknowledge that our current data cannot definitively distinguish whether the TGFβ pathway defects arise from reduced IFT172 protein stability or from specific loss of ubiquitin-related functions of the U-box domain. Our experiments demonstrate that the U-box region is required for both IFT172 stability and proper TGFβ signaling, but we agree that establishing a direct mechanistic link between ubiquitin-binding/conjugation and signaling would require additional experiments such as point mutations that selectively disrupt ubiquitin-related activity without affecting protein stability. We have revised the discussion (p. 18-19) to more clearly acknowledge this limitation. Addition to text: “However, we note that our current experiments cannot distinguish whether these signaling effects result specifically from loss of ubiquitin-related functions of the U-box domain or from the reduced levels of functional IFT172 protein in the heterozygous U-box deleted cells. Targeted point mutations that selectively disrupt ubiquitin binding without affecting protein stability would be required to resolve this question.”

      (11) The challenges in experimentally validating the interaction between IFT172 and the UBX-domain-containing protein are understandable. Alternative approaches, such as using single domains from the UBX protein, implementing solubilizing tags, or disrupting the predicted binding interface in Chlamydomonas flagella pull-downs, could be considered. In this context, the conclusion on page 7 that "The uncharacterized UBX-domain-containing protein was validated by AF-M as a direct IFT172 interactor" is incorrect as a prediction of an interaction interface with AF-M does not validate a direct interaction per se.

      We agree with the reviewer that our AlphaFold-Multimer (AF-M) predictions alone do not constitute experimental validation of a direct interaction. We appreciate the reviewer's understanding of the technical challenges in validating this interaction experimentally. We have revised our text (p. 7) to state that "The uncharacterized UBX-domain-containing protein was predicted by AF-M as a potential direct IFT172 interactor" and discuss the AF-M predictions as computational evidence that suggests, but does not prove, a direct interaction.

      Reviewer #2 (Public review):

      Summary:

      Cilia are antenna-like extensions projecting from the surface of most vertebrate cells. Protein transport along the ciliary axoneme is enabled by motor protein complexes with multimeric so-called IFT-A and IFT-B complexes attached. While the components of these IFT complexes have been known for a while, precise interactions between different complex members, especially how IFT-A and IFT-B subcomplexes interact, are still not entirely clear. Likewise, the precise underlying molecular mechanism in human ciliopathies resulting from IFT dysfunction has remained elusive.

      Here, the authors investigated the structure and putative function of the to-date poorly characterised C-terminus of IFT-B complex member IFT172 using AlphaFold predictions, crystallography and biochemical analyses including proteomics analyses followed by mass spectrometry, pull-down assays, and TGFbeta signalling analyses using Chlamydomonas flagella and RPE cells. The authors thereby provide novel insights into the crystal structure of IFT172 and identify novel interaction sites between IFT172 and the IFT-A complex members IFT140/IFT144. They suggest a U-box-like domain within the IFT172 C-terminus could play a role in IFT172 auto-ubiquitination as well as in TGFbeta signalling regulation.

      As a number of disease-causing IFT172 sequence variants resulting in mammalian ciliopathy phenotypes have previously been identified in the IFT172 C-terminus, the authors also investigate the effects of such variants on auto-ubiquitination. This revealed no mutational effect on mono-ubiquitination, which the authors suggest could be independent of the U-box-like domain, but reduced overall IFT172 ubiquitination.

      Strengths:

      The manuscript is clear and well written and experimental data is of high quality. The findings provide novel insights into IFT172 function, IFT complex-A and B interactions, and they offer novel potential mechanisms that could contribute to the phenotypes associated with IFT172 C-terminal ciliopathy variants.

      Weaknesses:

      Some suggestions/questions are included in the comments to the authors below.

      Reviewer #3 (Public review):

      Summary:

      Zacharia et al report on the molecular function of the C-terminal domain of the intraflagellar transport IFT-B complex component IFT172 by structure determination and biochemical in vitro and cell culture-based assays. The authors identify an IFT-A binding site that mediates a mutually exclusive interaction to two different IFT-A subunits, IFT144 and IFT140, consistent with interactions suggested in anterograde and retrograde IFT trains by previous cryo-electron tomography studies. Additionally, the authors identify a U-box-like domain that binds ubiquitin and conveys ubiquitin conjugation activity in the presence of the UbcH5a E2 enzyme in vitro. RPE1 cell lines that lack the U-box domain show a reduction in ciliation rate with shorter cilia, and heterozygous cells manifest TGF-beta signaling defects, suggesting an involvement of the U-box domain in cilium-dependent signaling.

      Strengths:

      (1) The structural analyses of the C-terminal domain of IFT172 combine crystallography with structure prediction using state-of-the-art algorithms, which gives high confidence in the presented protein structures. The structure-based predictions of protein interactions are validated by further biochemical experiments to assess the specific binding of the IFT172 C-terminal domains with other proteins.

      (2) The finding that the IFT172 C-terminus interactions with the IFT-A components IFT140 and IFT144 appear mutually exclusive confirms a suggested role in mediating the binding of IFT-B to IFT-A in anterograde and retrograde IFT trains, which is of very high scientific value.

      (3) The suggested molecular mechanism of IFT train coordination explains previous findings in Chlamydomonas IFT172 mutants, in particular an IFT172 mutant that appeared defective in retrograde IFT, as well as mutations identified in ciliopathy patients.

      (4) The identification of other IFT172 interactors by unbiased mass spectrometry-based proteomics is very exciting. Analysis of stoichiometries between IFT components suggests that these interactors could be part of IFT trains, either as cargos or additional components that may fulfill interesting functions in cilia and flagella.

      (5) The authors unexpectedly identify a U-box-like fold in the IFT172 C-terminus and thoroughly dissect it by sequence and mutational analyses to reveal unexpected ubiquitin binding and potential intrinsic ubiquitination activity.

      (6) The overall data quality is very high. The use of IFT172 proteins from different organisms suggests a conserved function.

      Weaknesses:

      (1) Interaction studies were carried out by pulldown experiments, which identified more IFT172 interaction partners. Whether these interactions can be seen in living cells remains to be elucidated in subsequent studies.

      We agree with the reviewer that validation of protein-protein interactions in living cells provides important physiological context. While our pulldown experiments have identified several promising interaction partners and the AF-M predictions provide computational support for these interactions, we acknowledge that demonstrating these interactions in vivo would strengthen our findings. However, we believe our current biochemical and structural analyses provide valuable insights into the molecular basis of IFT172's interactions, laying important groundwork for future cell-based studies.

      (2) The cell culture-based experiments in the IFT172 mutants are exciting and show that the U-box domain is important for protein stability and point towards involvement of the U-box domain in cellular signaling processes. However, the characterization of the generated cell lines falls behind the very rigorous analysis of other aspects of this work.

      We thank the reviewer for noting that the characterization of our cell lines could be more rigorous. In the revised version of the manuscript, we have addressed this by providing additional validation data for all four engineered RPE1 cell lines. First, we performed Sanger sequencing to confirm precise in-frame integration of the GFP tag at the targeted loci and to exclude unintended insertions or deletions (indels), both for the full-length IFT172-eGFP lines (Fig. S6) and for the IFT172∆U-box-eGFP lines (Fig. S7). Second, we performed anti-IFT172 immunoblotting on all four cell lines alongside parental RPE1 cells, confirming expression of both the full-length and U-box-truncated IFT172 proteins (Fig. S8). Notably, the immunoblot revealed reduced steady-state levels of the IFT172∆U-box protein compared to full-length IFT172, providing direct biochemical evidence that loss of the U-box domain compromises IFT172 protein stability consistent with the ciliogenesis phenotype described in the main text. Together, these data verify the integrity of the edited loci at both the genomic and protein levels, and strengthen the validation of the cellular models used in this study.

      Overall, the authors succeeded in characterizing an understudied protein domain of the ciliary intraflagellar transport machinery and gained important molecular insights into its role in primary cilia biology, beyond IFT. By identifying an unexpected functional protein domain and novel interaction partners, the work makes an important contribution to our understanding of how ciliary processes might be regulated by ubiquitination on a molecular level. Based on this work, it will be important for future studies in the cilia community to consider direct ubiquitin binding by IFT complexes.

      Conceptually, the study highlights that protein transport complexes can exhibit additional intrinsic structural features for potential auto-regulatory processes. Moreover, the study adds to the functional diversity of small U-box and ubiquitin-binding domains, which will be of interest to a broader cell biology and structural biology audience.

      Additional comments:

      The authors investigate the consequences of the U-box deletion on ciliary TGF-beta signaling. While a cilium-dependent effect of TGF-beta signaling on the phosphorylation of SMAD2 has been demonstrated, the precise function of cilia in AKT signaling has not been fully established in the field. Therefore, the relevance of this finding is somewhat unclear. It may help to discuss relevant literature on the topic, such as Shim et al., PNAS, 2020.

      We appreciate the reviewer's comment highlighting that the role of primary cilia in AKT signaling is not as well established as for SMAD2/3. However, we note that a direct functional link between AKT signaling and ciliogenesis has been demonstrated, showing that AKT regulates ciliogenesis initiation through a Rab11-effector switch mechanism (Walia et al., 2019; PMID: 31204173, co-authored by the corresponding author of this study). Furthermore, Shim et al. (PMID: 33753495) demonstrated a cilia-dependent reciprocal activation of AKT1 and SMAD2/3. In the revised manuscript (p. 19, ref. 97), we have expanded the discussion to cite these studies and provide a clearer literature context for the cilia-AKT connection, while acknowledging that the precise mechanism by which the IFT172 U-box domain influences AKT activation requires further investigation.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Points for the discussion:

      (1) The discussion should mention that IFT-A subunits IFT121, IFT122 and IFT144 share a similar domain organization to IFT172 (TPRs terminating in Zn-finger-like domains). Do the authors consider these as potential ubiquitin-binding proteins with E3 ligase activity? The possibility that these Zn-finger-like regions share a common origin, and function to stabilize the proteins or mediate IFT subunit interactions without a role in ubiquitin biology should be considered.

      We appreciate this important point. We agree that the shared domain architecture across IFT121, IFT122, IFT144, and IFT172 raises the question of whether these C-terminal domains primarily serve structural rather than ubiquitin-related roles. We have added a discussion paragraph (p. 16) acknowledging that a structural/stabilizing function is the more parsimonious explanation, while noting that whether IFT172's U-box-like domain has additionally acquired ubiquitin-related activity remains an open question.

      (2) From their modeling data, do the authors have an explanation for why a substitution as conservative as D1605E would cause disease?

      The D1605E substitution maps to the IFT172-IFT-A interaction interface (Fig. 1F). While this is a conservative change, D1605 is located at a tightly packed protein-protein interface where even the addition of a single methylene group (the difference between aspartate and glutamate) could introduce steric clashes with residues of IFT140 or IFT144, or alter the precise geometry of hydrogen bonds or salt bridges critical for the interaction. Unfortunately, this level of detail is beyond the resolution of AlphaFold models. However, the fact that this residue is positioned directly at the binding interface provides a plausible structural rationale for its pathogenicity.

      (3) The authors speculate that the L1615P mutation in the Chlamydomonas fla11 strain causes a faulty switch to retrograde IFT and this provides a molecular basis for the retrograde IFT phenotype. However, because the mutation is also within the IFT144 binding site, why is anterograde IFT also not affected?

      The fla11 L1615P mutation resides in helix αA, which participates in both IFT144 (anterograde) and IFT140 (retrograde) interactions. The predominantly retrograde phenotype can be rationalized by the fundamentally different structural roles of the IFT172 C-terminus in anterograde versus retrograde trains. In anterograde trains, the IFT172 C-terminus acts as a flexible tether in stoichiometric excess (2:1 IFT-B:IFT-A ratio), providing an avidity effect that likely compensates for reduced binding affinity caused by L1615P (Lacey et al., 2023). Additional lateral interactions between IFT-B subunits further stabilize the anterograde polymer independently of the IFT172-IFT144 link. In contrast, the retrograde train requires the IFT172 C-terminus to adopt a rigid, resolved conformation that is integral to the IFT-A dimeric interface, with no redundant lateral interactions to compensate (Lacey et al., 2024). The helix-breaking L1615P mutation would specifically disrupt this precise structural requirement, explaining the selective retrograde IFT defect in fla11. We have added this discussion to the revised manuscript (p. 16).

      Minor:

      (1) On page 5, the authors describe the fla11 phenotypes including accumulation of IFT particles at the tip and accumulation of ubiquitinated proteins in the cilium. Could the authors please expand on how this suggests that IFT172 could be involved in ciliary ubiquitination events and discuss an alternative scenario of impaired assembly of functional retrograde IFT in this strain leading to accumulation of ubiquitinated proteins?

      In the revised manuscript (p. 16), we have expanded the discussion of the fla11 phenotype to address this point. We now discuss how the distinct structural roles of the IFT172 C-terminus in anterograde versus retrograde trains explain the selective retrograde IFT defect in fla11, and explicitly note that the accumulation of ubiquitinated proteins in fla11 cilia may reflect impaired retrograde IFT-mediated clearance rather than a direct role of IFT172 in ciliary ubiquitination.

      (2) The authors should also expand on the literature of known UBX-IFT interactions in their manuscript (e.g. Raman et al. PMID 26389662).

      We have expanded the discussion of UBX-IFT interactions in the revised manuscript (p. 7) by citing the work of Raman et al. (PMID 26389662), who identified a direct interaction between the UBX-domain protein UBXN10 and IFT-B via CLUAP1/IFT38 for VCP-mediated regulation of IFT complex integrity. This provides important context for our identification of a UBX-domain protein as an IFT172 interactor.

      (3) On page 11, I1688 is incorrectly referred to as I688.

      Fixed.

      Reviewer #2 (Recommendations for the authors):

      (1) The finding that the interaction with IFT140/144 is mutually exclusive is very interesting. Could you speculate on or do you have any data regarding the effects to the overall IFT-complex conformation and downstream biological effects depending on which partner is bound?

      I am not a structural biologist so this may be an irrelevant/impossible-to-answer question: I was also wondering, as Ref 46 has shown that the dynein-2 motor complex binds to the edge of IFT-B2 (for assembled trains): Could the IFT172 C-terminus be involved here or somehow influence this interaction? In your mass spec data from Cr cilia using CrIFT172_968-C you don't mention pulling down dynein-2 components so there doesn't seem to be a direct interaction, but could the IFT-B2 conformation depend on whether IFT172 has bound IFT140 or IFT144, and hence could this interaction influence dynein-2 binding?

      We thank the reviewer for this insightful question. Based on recent cryo-ET structures of anterograde and retrograde IFT trains (Lacey et al., 2023; 2024), the switch from IFT144 to IFT140 binding fundamentally changes IFT172's structural role. In anterograde trains, the IFT172 C-terminus acts as a flexible tether tolerating the 2:1 IFT-B:IFT-A stoichiometry and permitting long polymer formation. In retrograde trains, it adopts a rigid conformation integral to the IFT-A dimeric interface, driving the formation of discrete retrograde units with distinct architecture.

      Regarding Dynein-2: while IFT172 does not directly bind Dynein-2 (consistent with our MS data), the reviewer's intuition is correct that IFT172's binding partner influences Dynein-2 association. In anterograde trains, autoinhibited Dynein-2 binds a composite surface formed between adjacent IFT-B2 repeats. When IFT172 switches to IFT140 at the ciliary tip, the resulting train depolymerization destroys this composite binding site, releasing Dynein-2 from its cargo mode to function as an active retrograde motor. The IFT172 binding switch may thus indirectly act as a structural checkpoint for Dynein-2 activation.

      (2) The data provided regarding TGFbeta signalling effects in cells with heterozygous U-box-like domain deletions are interesting. While secondary effects of impaired ciliogenesis due to homozygous deletion of the U-box-like domain can cause difficulties in analysing cell signalling effects, it would still be interesting to check the effects of bi-allelic human IFT172 disease variants in this region as well (the human disease phenotype is recessive and human mutations are likely hypomorphic variants still allowing for ciliogenesis).

      Also, while there may be secondary effects, it would still be interesting to check homozygous U-box deleted cells as an aggravated effect would further support the data from the het cells.

      We agree that testing bi-allelic human disease variants would strengthen the physiological relevance of our findings. While generating knock-in RPE1 lines was beyond the scope of this revision, we have obtained preliminary data from patient-derived fibroblasts carrying bi-allelic IFT172 missense variants in the U-box region (NPH2161). TGF-β1 stimulation time courses in these fibroblasts show altered p-SMAD2 kinetics compared to control fibroblasts, consistent with the phenotype observed in our heterozygous U-box deleted RPE1 cells (see Author response image 3).

      Author response image 3.

      While these results are preliminary and require further replication, they support the involvement of the IFT172 U-box domain in TGF-β signaling regulation in a disease-relevant context. Regarding homozygous U-box deleted cells, the severe reduction in IFT172 protein levels and ciliogenesis defects (Fig. 5B,D) make it difficult to separate U-box-specific effects from secondary consequences of impaired cilia formation, as the reviewer notes. We consider this an important direction for future studies using targeted point mutations rather than domain deletions.

      (3) Figure 5 E-G: Overall, the effects upon TGFB1 addition are rather small compared to previously published data, e.g., Clement et al., Cell Reports 2013, where one of the authors is the senior author. Are RPE cells less responsive or do you have another theory? Did you check TGFB receptor levels to ensure the differences are not due to different levels of receptor expression? I feel it could be interesting to also check ciliary phospho-SMAD localisation by IF. In Clement et al., loss of IFT88 results in reduced phospho-SMAD2 levels; do you have any theory why these opposite effects compared to the IFT172 loss of function could occur?

      We thank the reviewer for this insightful comment. The Tg737orpk fibroblasts used in Clement et al. (2013), which harbor a hypomorphic mutation in IFT88, exhibit severely stunted cilia. This defect broadly disrupts cilium-dependent signaling pathways, including R-SMAD activation, and is therefore expected to produce more pronounced signaling phenotypes. In contrast, our study utilizes RPE-1 cells with structurally intact cilia, enabling us to investigate more specific alterations in ciliary signaling associated with IFT172 function rather than the global effects of cilia loss. Consequently, the more modest effects observed in our system are consistent with the less severe structural and functional perturbation. Both fibroblasts and RPE-1 cells are known to express TGF-β receptors and to respond robustly to TGF-β stimulation, making it unlikely that differences in receptor abundance alone account for the observed discrepancies. We also note that increasing evidence supports a role for the primary cilium in fine-tuning TGF-β signaling output by coordinating both canonical (R-SMAD-mediated) and non-canonical (e.g., AKT/ERK-mediated) pathways. Our data raise the possibility that loss of the IFT172 U-box domain, or reduced IFT172 levels, may differentially affect this balance, rather than simply attenuating signaling uniformly, as seen with more severe ciliary defects such as IFT88 disruption in Tg737orpk cells. We agree that the current dataset does not fully resolve the underlying mechanism. We therefore consider it an important direction for future work to examine, in greater detail, the localization and phosphorylation status of key canonical and non-canonical signaling components in the context of the primary cilium by IF analyses.

      (4) In the summary conclusion at the end of the discussion, the authors propose that IFT172 could directly influence the fate of ubiquitinated TGFB receptors. Do you have any data supporting the theory that TGFB ubiquitination is influenced by IFT172?

      We acknowledge that our current data are insufficient to establish a direct link between IFT172-dependent ubiquitination events and TGF-β receptor regulation. Accordingly, we have revised the Discussion (page 19) to remove our previous hypothesis proposing a role for IFT172 in modulating TGF-β receptor ubiquitination.

      While our experiments demonstrate that the U-box region is required for both IFT172 stability and proper TGF-β signaling, we agree that establishing a direct mechanistic connection between ubiquitin-related activity of IFT172 and signaling outcomes would require additional approaches such as targeted point mutations that selectively disrupt ubiquitin-binding or conjugation functions.

      Furthermore, we note that our current data do not allow us to distinguish whether the observed signaling phenotypes arise specifically from the loss of ubiquitin-related functions of the U-box domain or from reduced levels of functional IFT172 protein in the heterozygous U-box–deleted cells.

      (5) Wording:

      Abstract

      "IFT72..is associated with several disease variants causing ciliopathies". I would change this to "..and several disease-causing IFT172 variants have been identified in ciliopathy patients".

      Corrected.

      Introduction

      "Another cohort of patients with milder ciliopathy resembling BBS also presented with ...". I would reword this to "Another cohort of patients with phenotypically slightly different ciliopathy features resembling BBS also presented with ...". It`s not necessarily less severe (they may die of cardiovascular complications in their early thirties for example due to metabolic syndrome, they are intellectually impaired, become blind...), but rather different.

      Changed according to the reviewer’s recommendations.

      Reviewer #3 (Recommendations for the authors):

      (1) Recommended modifications:

      (a) The RPE lines generated should be described better, i.e. sequencing information should be provided, or some kind of evidence that the lines are what they are supposed to be.

      As also noted above, we acknowledge that the characterization presented for the RPE cell lines was insufficient in the initial version of the manuscript. In the revised version, we have addressed this limitation by including detailed sequencing analyses to validate the modifications introduced. Specifically, we provide sequencing data confirming both the integration of the GFP tag and the successful deletion of the U-box domain in all four engineered RPE cell lines. These data verify the integrity of the edited loci and exclude the presence of unintended insertions or deletions at the targeted regions. The corresponding results are presented in Figures S6 and S7 of the revised manuscript, thereby strengthening the validation of the cellular models used in this study.

      (b) It would be more convincing if more than one clone of the RPE lines were presented, as this could rule out possible clonal effects.

      We acknowledge that only a single clone was characterized for each of the four genotypes (IFT172-FL homozygous, IFT172-FL heterozygous, IFT172∆U-box homozygous, IFT172∆U-box heterozygous), and we agree that independent clones would provide stronger protection against clonal artifacts. Generating and validating additional clones was not feasible within the scope of this revision. However, several features of our data mitigate this concern. First, the phenotypes scale with allele dosage: the homozygous ∆U-box line shows the strongest reduction in IFT172 protein level, ciliation, and cilium length, while the heterozygous line shows intermediate defects (Fig. 5B, D and Fig. S8). A clonal off-target effect would not be expected to produce this dose-dependent pattern across two independently isolated lines. Second, the reduced steady-state IFT172 level in the ∆U-box lines (Fig. S8) is consistent with our in vitro observation that the U-box/TPR interface is required for protein stability, providing an independent biochemical rationale for the cellular phenotype. Third, Sanger sequencing of all four lines confirmed precise in-frame integration with no indels at the targeted locus (Figs. S6, S7). We have added a sentence to the Discussion (p. 20) acknowledging that confirmation in additional independent clones remains an important goal for follow-up work.

      (c) Figure 5C: distribution of the GFP-tagged IFT172∆U-box protein could be quantified to support the statement.

      In the revised version of the manuscript, we have included additional quantification of GFP fluorescence across all four cell lines to support our conclusions regarding IFT172 ciliary localization. The corresponding data for each cell line are presented in Figure S5C–F.

      (d) The final sentences include quite bold statements about a general function of IFT172 in signal regulation. Yet, the evidence is the weakest part of the work. It is only shown in i) one cell line, ii) in one cell clone that is not extensively characterized, and iii) for one signaling pathway that is not the best-studied cilia signaling pathway. Therefore, I recommend a more moderate statement.

      The last sentence of the Abstract has been toned down and now reads: “Our findings suggest that IFT172, beyond its structural role in bridging IFT-A and IFT-B complexes within IFT trains, harbors a conserved U-box-like domain with potential involvement in ciliary ubiquitination processes and signaling, providing new insights into the molecular mechanisms underlying IFT172-related ciliopathies.”

      (e) The order of the figures is not followed in the main text, which is distracting.

      The order of figures is now consecutive in the revised manuscript.

      (2) Questions and comments to consider:

      (a) It is unclear why tetra-ubiquitin chains have been used.

      We thank the reviewer for this question. Recent evidence suggests that ubiquitin chains, rather than monomeric ubiquitin, act as sorting and signaling cues at the primary cilium (Shinde et al., 2020). To probe the ubiquitin-binding activity of IFT172, we therefore used a tetrameric ubiquitin chain as a model substrate, which better reflects the multivalent nature and binding avidity expected for physiological polyubiquitin signals than a ubiquitin monomer. Specifically, we used a recombinantly expressed linear (Met1-linked) tetra-ubiquitin chain, generated as a genetically encoded fusion. Linear ubiquitin chains are well-established non-degradative signaling chains recognized by a dedicated class of ubiquitin-binding domains, making them a suitable probe for detecting ubiquitin-binding activity outside the canonical proteasomal pathway. In addition, monomeric ubiquitin (~8 kDa) is poorly retained during membrane transfer in Western blotting, which further precluded its reliable use as a probe in our pull-down assays. Together, these considerations motivated the use of tetrameric ubiquitin as a biologically and technically appropriate substrate for assessing IFT172's ubiquitin-binding activity.

      (b) Figure 4D: described in the text as "pulldown tetraubiquitin at comparable levels", which is not obvious from the figure presented, it appears reduced by at least 30%.

      We thank the reviewer for this observation. As described on page 10 of the manuscript and evident from Figure 4D, the purified GST–HsIFT172C3 construct underwent substantial proteolytic cleavage during purification. This degradation limited our ability to include amounts of intact GST–HsIFT172C3 comparable to those of the full-length GST–HsIFT172C2 construct in the pull-down assays. Importantly, when accounting for the reduced proportion of full-length GST–HsIFT172C3 present in the assay, the tetra-ubiquitin pull-down efficiencies of the two constructs are expected to be comparable. This is supported by the Coomassie staining shown in Figure 4D, which reflects the relative abundance of the intact protein species used in the experiment.

      (c) With the proposed model, why would the fla11 mutant only affect retrograde IFT?

      We have revised the Discussion (p. 16) to provide a plausible explanation for why only retrograde IFT is affected in the fla11 mutant.

      (3) Minor copy-editing:

      (a) Page 3, first paragraph: led := leads.

      (b) Kinesin-2 and Dynein-2 should be hyphenated.

      (c) Page 4: wwp1 should be WWP1.

      (d) Bonafide should be italicized: bona fide.

      (e) Some abbreviations appear uncommon and therefore somewhat distracting: TGFB instead of TGF-beta, Cr in instances where specifically referred to the organism.

      (f) Imprecise lab jargon: "very C-terminal".

      (g) Lab jargon: "purified a C-terminal construct".

      (h) Lab jargon: "pull-downs".

      (i) Page 8: "DALI" only abbreviated.

      (j) Page 9: "Appearance ... were observed" should be "was".

      (k) Page 11: "I688" should be "I1688".

      (l) Page 12: "PDs" unclear.

      These minor points have been corrected.

      We have revised the text and figures to use the widely accepted nomenclature consistently, with TGF-β referring to the signaling pathway and TGF-β1 referring specifically to the ligand.

      We have further revised the text to use “Chlamydomonas reinhardtii” when referring to the organism and “Cr” when referring to the protein.

      We have removed the informal phrases "very C-terminal" and "purified a C-terminal construct" from the revised manuscript. We have retained the term "pull-down," as this is well-established and widely used terminology in the biochemistry literature to describe the affinity-based co-isolation assays used here. PD has been replaced with pull-down.

      The grammatical error on page 9 ("Appearance ... were observed") has been corrected to "was observed".

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study aims to investigate the development of infants' responses to music by examining neural activity via EEG and spontaneous body kinematics using video-based analysis. The authors also explore the role of musical pitch in eliciting neural and motor responses, comparing infants at 3, 6, and 12 months of age.

      Strengths:

      A key strength of the study lies in its analysis of body kinematics and modeling of stimulus-motor coupling, demonstrating how the amplitude envelope of music predicts infant movement, and how higher musical pitch may enhance auditory-motor synchronization.

      EEG data provide evidence for enhanced neural responses to music compared to shuffled auditory sequences. These findings encourage further investigation of the proposed developmental trajectory of neural responses to music and their link to musical behavior in infants.

      Comments on revisions:

      I thank the authors for the considerable effort devoted to revising the manuscript and addressing the raised questions and comments. I particularly appreciate the additional analyses and the extended arguments included in the discussion. I believe that this paper represents a valuable contribution to the literature on music development.

      One remaining comment concerns the evoked response observed in the shuffled condition, which I still find intriguing. Considering that the auditory events in the shuffled condition display a clear rise time, particularly for those events that were selected based on being preceded and followed by longer periods of silence, one would expect to observe an evoked response emerging from baseline. However, this pattern is not evident in the presented curves. The authors may further examine and discuss the shape and characteristics of these response patterns.

      We thank the Reviewer for highlighting this intriguing aspect of our data. We entirely agree that from a purely bottom-up, acoustic perspective, one would expect a clear onset-locked evoked response, such as a P1/P2 complex in adults or its developmental equivalent, given the prominent acoustic rise times and the surrounding periods of silence (such as those accounted for in the control analyses).

      The fact that these responses are not present in the curves for the shuffled condition was striking to us as well. We interpret this severe attenuation not as a failure of sensory perception, but potentially as a consequence of higher-level cognitive modulation. Specifically, because the shuffled condition completely lacks structural regularities, the brain might be unable to build reliable temporal and/or melodic expectations. In the absence of a learnable structure, the auditory system likely down-weights the processing of these random sequences to conserve cognitive resources, leading participants to attentionally disengage.

      This phenomenon aligns with both developmental and adult models of auditory processing. For instance, the "Goldilocks effect" demonstrates that infants systematically withdraw attention from auditory sequences that are entirely unpredictable (Kidd et al., 2014). Similarly, adult auditory literature suggests that while predictable patterns automatically capture attention, random and unpredictable acoustic streams could be actively tuned out (Dayan et al., 2000; Esber & Haselgrove, 2011).

      Following the Reviewer’s helpful suggestion to further discuss the characteristics of these response patterns, we have expanded our description and interpretation of the shuffled condition curves in the revised manuscript. We added the following text to the Methods and Discussion to explicitly address the dampened shape of these responses:

      p. 9: “Importantly, and in line with the adults’ data, all infant groups exhibited enhanced P1 amplitudes in response to music compared to shuffled music. Actually, across all groups, shuffled music did not elicit clear ERPs as the ones elicited by music”.

      p. 20: “This process was markedly dampened or interrupted by shuffled music (Bianco et al., 2024, 2025; Lense et al., 2022), a finding that could be interpreted as evidence of disengagement from such highly unpredictable sequences (Dayan et al., 2000; Esber & Haselgrove, 2011; Kidd et al., 2014).”

      Reviewer #2 (Public review):

      Summary:

      Infants' auditory brain responses reveal processing of music (clearly different from shuffled music patterns) from the age of 3 months; however, they do not show a related increase in spontaneous movement activity to music until the age of 12 months.

      Strengths:

      This is a nice paper, well designed, with sophisticated analyses and presenting clear results filling an important gap about early infant sensitivity, detection, and differentiation of musical sounds. The addition of EEG recordings (specifically ERPs) in response to music presentations at 3 different infant ages in the first postnatal year is important, and the manipulation of the music stimuli into shuffled, high and low pitch to capture differences in brain response processing and spontaneous movements is interesting. Further, the movement analysis based on Quantity of Movements (QoM) and movement subdivision into 10 distinct Principal Movements (PMs) is novel and creative.

      Overall, results show that ERP responses to music occur earlier than QoM responses in early development, and that even at 12 months, motor responses to music remain coarse and not rhythmically aligned with the music tempo. This work increases our fundamental understanding of infants' early music perception in relation to auditory processing and motor response.

      Comments on revisions:

      The authors have addressed my questions in their revision. I have no other questions. Thanks again for the opportunity to read and evaluate this interesting work.

      We thank the Reviewer for their time, their positive evaluation of our revised manuscript, and their constructive feedback throughout the review process, which has greatly helped us to strengthen this paper.

      Reviewer #3 (Public review):

      Summary

      This study provides a detailed investigation of neural auditory responses and spontaneous movements in infants listening to music. Analyses of EEG data (event-related potentials and steady-state responses) first highlighted that infants at 3, 6 and 12 months of age and adults showed enhanced auditory responses to music compared with shuffled music. 6-month-olds also exhibited an enhanced P1 response to high-pitch vs low-pitch stimuli, but the other groups did not. In addition, whole-body spontaneous movements of infants were decomposed into 10 principal components. Kinematic analyses revealed that the quantity of movement was higher in response to music than shuffled music only at 12 months of age. Although Granger causality analysis suggested that infants' movement was related to the music intensity changes, particularly in the high-pitch condition, infants did not exhibit phase-locked movement responses to musical events, and the low movement periodicity was not coordinated with music.

      Strengths

      This study investigates an important topic on the development of music perception and translation to action and dance. It targets a crucial developmental period that is difficult to explore. It evaluates two modalities by measuring neural auditory responses and kinematics, while cross-modal development is rarely evaluated. Overall, the study fills a clear gap in the literature.

      Besides, the study uses state-of-the-art analyses. Detailed investigations were performed, as well as exploratory analyses in supplementary information. The discussion is rich in neurodevelopmental interpretations and comparisons with the literature. All steps are clearly detailed. The manuscript is very clear, well-written and pleasant to read. Figures are well-designed and informative. The authors' responses to previous reviews are also detailed and informative.

      Comments on revisions:

      The authors answered all my questions.

      Thank you very much for your positive evaluation and for taking the time to review our revisions. We deeply appreciate your insightful comments across the review rounds, which have helped us improve the clarity and rigor of our paper.

    1. Author response:

      The following is the authors’ response to the previous reviews

      eLife Assessment

      This study presents results supporting a model that tumorous germline stem cells (GSCs) in the Drosophila ovary mimic the stem cell niche and inhibit the differentiation of neighboring cells. The valuable findings show that GSC tumors often contain non-mutant cells whose differentiation is suppressed by the GSC tumorous cells. However, the evidence showing that the GSC tumors produce BMP ligands to suppress differentiation of non-mutant cells is incomplete due to concerns about the new HCR data.

      Thank you for this assessment. Our responses follow each of the reviewers' concerns below, including those regarding the HCR data.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This preprint from Shaowei Zhao and colleagues presents results that suggest tumorous germline stem cells (GSCs) in the Drosophila ovary mimic the ovarian stem cell niche and inhibit the differentiation of neighboring non-mutant GSC-like cells. The authors use FRT-mediated clonal analysis driven by a germline-specific gene (nos-Gal4, UASp-flp) to induce GSC-like cells mutant for bam or bam's co-factor bgcn. Bam-mutant or bgcn-mutant germ cells produce tumors in the stem cell compartment (the germarium) of the ovary (Fig. 1). These tumors contain non-mutant cells - termed SGC for single-germ cells. 75% of SGCs do not exhibit signs of differentiation (as assessed by bamP-GFP) (Fig. 2). The authors demonstrate that the block in differentiation in SGCs is a result of suppression of bam expression (Fig. 2). They present data suggesting that in 73% of SGCs BMP signaling is low (assessed by dad-lacZ) (Fig. 3) and proliferation is less in SGCs vs GSCs. They present genetic evidence that mutations in BMP pathway receptors and transcription factors suppress some of the non-autonomous effects exhibited by SGCs within bam-mutant tumors (Fig. 4). They show data that bam-mutant cells secrete Dpp, but this data is not compelling (see below) (Fig. 5). They provide genetic data that loss of BMP ligands (dpp and gbb) suppresses the appearance of SGCs in bam-mutant tumors (Fig. 6). Taken together, their data support a model in which bam-mutant GSC-like cells produce BMPs that act on non-mutant cells (i.e., SGCs) to prevent their differentiation, similar to what is seen in the ovarian stem cell niche.

      Strengths:

      (1) Use of an excellent and established model for tumorous cells in a stem cell microenvironment

      (2) Powerful genetics allow them to test various factors in the tumorous vs non-tumorous cells

      (3) Appropriate use of quantification and statistics

      Thank you for your valuable comments, and we greatly appreciate them.

      Weaknesses:

      (1) What is the frequency of SGCs in nos>flp; bam-mutant tumors? For example, are they seen in every germarium, or in some germaria, etc or in a few germaria.

      This concern was addressed in the rebuttal. The line number is 106, not line 103.

      (2) Does the breakdown in clonality vary when they induce hs-flp clones in adults as opposed to in larvae/pupae?

      This concern was addressed in the rebuttal. However, these statements are not on lines 331-335 but instead start on line 339. Please be accurate about the line numbers cited in the rebuttal. They need to match the line numbers in the revised manuscript.

      We have rechecked the line numbers and confirmed that the mismatch arose from the Word-to-PDF conversion process on the eLife website. As this issue has recurred and reviewers’ file-format preferences are unknown to us, we have added a clarifying note at the beginning of each response letter: “Please note that the line numbers cited refer to the revised manuscript in the Microsoft Word format”.

      (3) Approximately 20-25% of SGCs are bam+, dad-LacZ+. Firstly, how do the authors explain this? Secondly, of the 70-75% of SGCs that have no/low BMP signaling, the authors should perform additional characterization using markers that are expressed in GSCs (i.e., Sex lethal and nanos).

      The authors did not perform additional staining for GSC-enriched proteins like Sex lethal and nanos.

      The 70-75% of SGCs that have low BMP signaling display the following characteristics: 1) dot-like spectrosomes, 2) positivity for Dad-lacZ, and 3) absence of bamP-GFP expression. This combination of traits is sufficient to classify them as GSC-like cells. Neither Sex lethal nor Nanos is expressed exclusively in GSCs (Chau et al., 2009; Li et al., 2009), rendering them unsuitable for distinguishing GSC-like from cystoblast-like cells.

      (4) All experiments except Fig. 1I (which shows a single germarium with no quantification) were performed with nos-Gal4, UASp-flp. Have the authors performed any of the phenotypic characterizations (i.e., figures other than figure 1) with hs-flp?

      In the rebuttal, the authors stated that they used nos>flp for all figures except for Fig. 1I. It would be more convincing for them to prove in Fig. 1 that there is no phenotypic difference between the two methods and then switch to the nos>FLP method for the rest of the paper.

      We appreciate this suggestion. These data are included in Figure 1-figure supplement 3 in the revised manuscript.

      (5) Does the number of SGCs change with the age of the female? The experiments were all performed in 14-day-old adult females. What happens when they look at young females (e.g., 2-day-old)? I assume that the nos>flp is working in larval and pupal stages and so the phenotype should be present in young females. Why did the authors choose this later age? For example, is the phenotype more robust in older females? Or do you see more SGCs at later time points?

      The authors did not supply any data to prove that the clones were larger in 14-day-old flies than in younger flies. Additionally, the age of "younger" flies was not specified. Therefore, the authors did not satisfactorily answer my concern.

      We appreciate this critical comment. Figure 1J includes the SGC phenotype data from 1-, 7-, and 14-day-old flies. Both 1- and 7-day-old flies are younger flies in our analyses. The evidence that germline clones were larger in 14-day-old flies than in younger flies was provided in Figure 1-figure supplement 2 in the revised manuscript.

      (6) Can the authors distinguish one copy of GFP versus 2 copies of GFP in germ cells of the ovary? This is not possible in the Drosophila testis. I ask because this could impact on the clonal analyses diagrammed in Fig. 4A and 4G and in 6A and B. Additionally, in most of the figures, the GFP is saturated so it is not possible to discern one vs two copies of GFP.

      In the rebuttal, the authors stated that they cannot differentiate one vs two copies of GFP. They used other clone labeling methods in Fig. 4 and 6. I think that the authors should make a statement in the manuscript that they cannot distinguish one vs two copies of GFP for the record.

      Thank you for this suggestion. Such a statement has been added in the revised manuscript (Lines 177-178).

      (7) More evidence is needed to support the claim of elevated Dpp levels in bam or bgcn mutant tumors. The current results with dpp-lacZ enhancer trap in Fig 5A,B are not convincing. First, why is the dpp-lacZ so much brighter in the mosaic analysis (A) than in the no-clone analysis (B)? It is expected that the level of dpp-lacZ in cap cells should be invariant between ovaries and yet LacZ is very faint in Fig. 5B. I think that if the settings in A matched those in B, the apparent expression of dpp-lacZ in the tumor would be much lower and likely not statistically significant. Second, they should use RNA in situ hybridization with a sensitive technique like hybridization chain reaction (HCR) - an approach that has worked well in numerous Drosophila tissues including the ovary.

      The HCR FISH in Fig. 5 of the revised manuscript needs an explanation for how the mRNA puncta were quantified. Currently, there is no information in the methods. What is meant by relative dpp levels? I think that the authors should report in an unbiased manner the "number" of dpp or gbb puncta in TFs. For the germaria, I think that they should report the number of puncta of dpp or gbb divided by the total area in square pixels counted. Additionally, the background fluorescence is noticeably much higher in bamBG/delta86 germaria, which would (falsely) increase the relative intensity of dpp and gbb in bam mutants. Although I commend the authors for performing HCR FISH, these data are still not convincing to me.

      We appreciate these critical comments. Due to variable puncta sizes and frequent clustering in TF and cap cells (see Figure 5A, C), direct quantification of puncta number was unreliable. Therefore, we quantified mean fluorescence intensity instead, as described in the revised figure legend of Figure 5 (Lines 603-604). In Author response image 1A, B (modified from Figure 5A, C), magenta ovals indicate empty background fluorescence areas, which appear similar between w<sup>1118</sup> (wild-type control) and bam<sup>-/-</sup> germaria. In Author response image 1B, the yellow oval outlines a neighboring germarium, not an empty area (see the DAPI channel).

      Author response image 1.

      In situ-HCR results of dpp and gbb in wild-type and bam mutant germaria. Magenta ovals indicate empty areas displaying only background fluorescence. In panel (B), the yellow oval outlines a neighboring germarium, not an empty area (see the DAPI channel below).
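
      The two quantification strategies discussed in this exchange (puncta count per unit area, as the reviewer suggests, versus mean fluorescence intensity, as used in the response) can be contrasted in a minimal sketch. This is an illustration under stated assumptions, not the authors' analysis pipeline: the function names are hypothetical, the background-subtraction step is an assumed way of handling the background difference the reviewer raises, and the pixel values are invented for demonstration only.

```python
from statistics import mean


def background_corrected_mean(roi_pixels, background_pixels):
    """Mean ROI intensity minus the mean intensity of an empty background region.

    Assumption: subtracting an empty-area mean is one simple way to keep a
    brighter overall background from inflating the measured signal.
    """
    return mean(roi_pixels) - mean(background_pixels)


def puncta_density(n_puncta, area_px):
    """Number of puncta divided by the counted area in square pixels."""
    if area_px <= 0:
        raise ValueError("area_px must be positive")
    return n_puncta / area_px


def relative_level(sample_value, control_value):
    """Express a measurement relative to the control (control = 1.0)."""
    return sample_value / control_value


# Hypothetical pixel intensities, for illustration only (not data from the study).
control_roi, control_bg = [12, 14, 13, 15], [10, 10, 11, 9]
mutant_roi, mutant_bg = [30, 28, 32, 30], [20, 19, 21, 20]

control_signal = background_corrected_mean(control_roi, control_bg)
mutant_signal = background_corrected_mean(mutant_roi, mutant_bg)

print("relative level (mutant / control):", relative_level(mutant_signal, control_signal))
print("puncta per square pixel:", puncta_density(50, 1000))
```

      Intensity-based measures stay usable when puncta cluster and cannot be counted individually, but they depend on how the background is handled; density-based measures avoid that dependence but require puncta that can be resolved one by one.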

      (8) In Fig 6, the authors report results obtained with the bamBG allele. Do they obtain similar data with another bam allele (i.e., bamdelta86)?

      The authors did not try any experiments with the bamdelta86 allele, despite this allele being molecularly defined, whereas the bamBG allele is not defined.

      While we agree that repeating the experiments in Figure 6 with bam<sup>Δ86</sup> would be helpful, our mosaic analysis strategy for two genes on different chromosome arms is technically complex (see genotypes in Source data 1). Switching from bam<sup>BG</sup> to bam<sup>Δ86</sup> would necessitate extensive and time-consuming genetic recombination. Given that both alleles induce the SGC phenotype indistinguishably (Figure 1J), we believe that repeating these experiments with bam<sup>Δ86</sup> would not alter our key conclusion. We appreciate your understanding regarding this technical complexity.

      Reviewer #2 (Public review):

      In the current version, Zhang et al. have made substantial improvements to the manuscript. It is now easier to read, and the data are more solid compared with the previous version, supporting their conclusion that tumor GSCs secrete stemness factors (BMPs and Dpp) to suppress the differentiation of neighboring wild-type GSCs. This study should benefit a broad readership across developmental biology, germ cell biology, stem cell biology, and cancer biology.

      Thank you for your valuable comments, and we greatly appreciate them.

      However, the following suggestions may further improve the clarity and rigor of the research content:

      (1) Clarification of sample size (n).

      Each germarium can contain highly variable numbers of SGCs, sometimes reaching 50-100. When reporting "n" values, the authors are encouraged to also indicate the number of germaria analyzed. For example, in lines 126-128:

      "Notably, 74% of SGCs (n = 132) were GFP-negative, while the remaining 26% were GFP-positive (Figure 2B, C). This suggests that SGCs can be categorized into two distinct groups: those resembling GSCs (GSC-like) and those resembling cystoblasts (cystoblast-like)." Please clarify how many germaria were examined to obtain n = 132.

      We appreciate this comment. In 14-day-old fly ovaries, each germarium that met our criterion for quantifying the SGC phenotype contains approximately 1.5 SGCs (see Figure 1K). For the specific analysis of the “132” SGCs presented in Figure 2C, we did not record the number of germaria from which they originated.

      In addition, it is unclear whether the authors intend to suggest that the GFP-negative SGCs are GSC-like or cystoblast-like; this point should be clarified.

      Thank you for this suggestion. We intend to suggest that the bamP-GFP-negative SGCs are GSC-like; this information has been added in the revised manuscript (Line 129).

      (2) Improvement of Fig. 6 in situ hybridization images.

      The in situ hybridization images in Fig. 6 are not fully convincing. The control images, in particular, would benefit from higher resolution and enlarged views of the germarium region.

      Thank you for this valuable suggestion. The enlarged views of both the control and bam<sup>-/-</sup> germarium regions were included in Figure 5A, C in the revised manuscript.

      In panel C, abundant signals are also present outside the germarium, which may complicate interpretation and should be clarified or controlled for.

      In the right panel of Figure 5C, the abundant signals noted by the reviewer originate from neighboring germaria (see the DAPI channel), not from empty areas, which would be expected to show only background fluorescence. For more details, please refer to our response to Question (7) raised by Reviewer #1.

      Alternatively, the authors could strengthen the in situ analysis by using bam mutants or bam dpp / bam gbb double mutants as controls to better define signal specificity.

      We appreciate this comment. Homozygous dpp or gbb mutants are lethal, precluding the generation of dpp bam or gbb bam double-mutant flies. Additionally, the GFP signal was drastically reduced during our HCR processing, preventing mosaic clone analysis.

      Reviewer #3 (Public review):

      Zhang et al. investigated how germline tumors influence the development of neighboring wild-type (WT) germline stem cells (GSC) in the Drosophila ovary. They report that germline tumors generated by differentiation-arrested mutations (bam and bgcn) inhibit the differentiation of neighboring WT GSCs by arresting them in an undifferentiated state, resulting from reduced expression of the differentiation-promoting factor Bam. They find that these tumor cells produce low levels of the niche-associated signaling molecules Dpp and Gbb, which suppress bam expression and consequently inhibit the differentiation of neighboring WT GSCs non-cell-autonomously. Based on these findings, the authors propose that germline tumors mimic the niche to suppress the differentiation of the neighboring wild-type germline stem cells.

      Strengths:

      The study uses a well-established in vivo model to address an important biological question concerning the interaction between germline tumor cells and wild-type (WT) germline stem cells in the Drosophila ovary. If the findings are substantiated, this study could provide valuable insights that are applicable to other stem cell systems.

      Thank you for your valuable comments, and we greatly appreciate them.

      Weaknesses:

      The authors have addressed some of my concerns in the revised submission. However, the data presented do not allow the authors to distinguish whether the failed differentiation of WT stem cells/germline cells results from "arrested differentiation due to the loss of the differentiation niche" or from "direct inhibition by tumor-derived expression of niche-associated molecules Dpp and Gbb".

      Blocking Dpp or Gbb secretion specifically from germline tumor cells promoted differentiation of neighboring wild-type germ cells (Figure 6). This indicates that BMP ligands secreted by germline tumors are required to inhibit this differentiation. However, we cannot rule out the possibility that disruption of the differentiation niche also contributes to the SGC phenotype, a point highlighted in the manuscript (Line 204).

      The critical supporting data, HCR in situ results, are not sufficiently convincing.

      Below, we provide a point-by-point reply addressing each of your specific recommendations.

      Recommendations for the authors:

      Reviewer #3 (Recommendations for the authors):

      It is surprising that the authors failed to induce germline tumors at the adult stage, as this has been reported by many labs and would allow for time course analysis of the SGC phenotype. As a result, the data in this manuscript address only events occurring after germline tumor formation (with clonal induction at the larval stage) and focus on the already present "arrested wild-type germ cells", without providing insight into the process by which these arrested germ cells are formed.

      In our hands, inducing germline clones by the hs-FLP method at the adult stage was efficient in males but not in females, despite subjecting adult flies to intensive heat-shock at 37°C.

      The HCR in situ data exhibit a high background.

      Regarding the background issue, please see our response to Reviewer #1’s Question (7).

      First, the signal appears stronger in TF cells than in cap cells.

      As demonstrated by Li et al. (Li et al., 2016), dpp-lacZ (P4-lacZ) signals are also stronger in TF cells than in cap cells (see their Figure 4D').

      Second, both dpp and gbb are detected broadly in somatic cells including escort cells. These observations are inconsistent with published data.

      As shown in Figure 5A and C, dpp and gbb were detected broadly in somatic cells of bam<sup>-/-</sup> germaria, but not in those of w<sup>1118</sup> (wild-type) controls. To our knowledge, no previous study has reported the expression pattern of these ligands in a bam mutant background.

      To demonstrate the tumor-derived dpp and gbb, the HCR in situ analysis could be performed in the germarium with mosaic clones. If these niche-associated molecules are indeed expressed in tumor cells, the authors should observe a mosaic expression pattern of these molecules, with signal "ON" in tumor cells and "OFF" in neighbouring arrested germ cells.

      This is a great idea and was indeed our original approach. However, GFP signal was drastically reduced during our HCR processing, ultimately precluding mosaic clone analysis.

      References

      Chau, J., Kulnane, L.S., and Salz, H.K. (2009). Sex-lethal facilitates the transition from germline stem cell to committed daughter cell in the Drosophila ovary. Genetics 182, 121-132.

      Li, X., Yang, F., Chen, H., Deng, B., Li, X., and Xi, R. (2016). Control of germline stem cell differentiation by Polycomb and Trithorax group genes in the niche microenvironment. Development 143, 3449-3458.

      Li, Y., Minor, N.T., Park, J.K., McKearin, D.M., and Maines, J.Z. (2009). Bam and Bgcn antagonize Nanos-dependent germ-line stem cell maintenance. Proc Natl Acad Sci U S A 106, 9304-9309.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment:

      This study presents a valuable theoretical exploration on the electrophysiological mechanisms of ionic currents via gap junctions in hippocampal CA1 pyramidal-cell models, and their potential contribution to local field potentials (LFPs) that is different from the contribution of chemical synapses. The biophysical argument regarding electric dipoles appears solid, but the evidence can be more convincing if their predictions are tested against experiments. A shortage of model validation and strictly comparable parameters used in the comparisons between chemical vs. junctional inputs makes the modeling approach incomplete; once strengthened, the finding can be of broad interest to electrophysiologists, who often make recordings from regions of neurons interconnected with gap junctions.

      We gratefully thank the editors and the reviewers for the time and effort in rigorously assessing our manuscript, for the constructive review process, for their enthusiastic responses to our study, and for the encouraging and thoughtful comments. We especially thank you for deeming our study to be a valuable exploration on the differential contributions of active dendritic gap junctions vs. chemical synapses to local field potentials. We thank you for your appreciation of the quantitative biophysical demonstration on the differences in electric dipoles that appear in extracellular potentials with gap junctions vs. chemical synapses.

      However, we are surprised by aspects of the assessment that resulted in deeming the approach incomplete, especially given the following with specific reference to the points raised:

      (1) Testing against experiments: With specific reference to gap junctions, quantitative experimental verification becomes extremely difficult because of the well-established non-specificities associated with gap junctional modulators (Behrens et al., 2011; Rouach et al., 2003). In addition, genetic knockouts of gap junctional proteins are either lethal or involve functional compensation (Bedner et al., 2012; Lo, 1999), together making causal links to specific gap junctional contributions infeasible with currently available techniques.

      In addition, the complex interactions between co-existing chemical synaptic, gap junctional, and active dendritic contributions from several cell-types make the delineation of the contributions of specific components infeasible with experimental approaches. A computational approach is the only quantitative route to specifically delineate the contributions of individual components to extracellular potentials, as seen from studies that have addressed the question of active dendritic contributions to field potentials (Halnes et al., 2024; Ness et al., 2018; Reimann et al., 2013; Sinha & Narayanan, 2015, 2022) or spiking contributions to local field potentials (Buzsaki et al., 2012; Gold et al., 2006; Schomburg et al., 2012). The biophysically and morphologically realistic computational modeling route is therefore invaluable in assessing the impact of individual components to extracellular field potentials (Einevoll et al., 2019; Halnes et al., 2024).

      Together, we emphasize that the computational modeling route is currently the only quantitative methodology to delineate the contributions of gap junctions vs. chemical synapses to extracellular potentials.

      (2) Model validation: The model used in this study was adopted from a physiologically validated model from our laboratory (Roy & Narayanan, 2021). Please note that the original model was validated against several physiological measurements along the somatodendritic axis. We sincerely regret our oversight in not mentioning clearly that we have used an existing, thoroughly physiologically-validated model from our laboratory in this study.

      (3) Comparisons between chemical vs. junctional inputs: We had taken elaborate precautions in our experimental design to match the intracellular electrophysiological signatures with reference to synchronous as well as oscillatory inputs, irrespective of whether inputs arrived through gap junctions or chemical synapses. A new Supplementary Figure S3 has been added to address this concern raised by the reviewers.

      In the revised manuscript, we have addressed all the concerns raised by the reviewers in detail. We have provided point-by-point responses to reviewers’ helpful and constructive comments below. We thank the editors and the reviewers for this constructive review process, which helped us in improving our manuscript with specific reference to emphasizing the novelty of our approach and conclusions. The specific changes incorporated into the revised manuscript are detailed below.

      Reviewer #1 (Public review):

      This manuscript makes a significant contribution to the field by exploring the dichotomy between chemical synaptic and gap junctional contributions to extracellular potentials. While the study is comprehensive in its computational approach, adding experimental validation, network-level simulations, and expanded discussion on implications would elevate its impact further.

      We gratefully thank you for your time and effort in rigorously assessing our manuscript, for the enthusiastic response, and the encouraging and thoughtful comments on our study. In what follows, we have provided point-by-point responses to the specific comments.

      Strengths

      Novelty and Scope

      The manuscript provides a detailed investigation into the contrasting extracellular field potential (EFP) signatures arising from chemical synapses and gap junctions, an underexplored area in neuroscience. It highlights the critical role of active dendritic processes in shaping EFPs, pushing forward our understanding of how electrical and chemical synapses contribute differently to extracellular signals.

      We thank you for the positive comments on the novelty of our approach and how our study addresses an underexplored area in neuroscience. The assumptions about the passive nature of dendritic structures had indeed resulted in an underestimation of the contributions of gap junctions to extracellular potentials. Once the realities of active structures are accounted for, the contributions of gap junctions increase by several orders of magnitude compared to passive structures (Fig. 1D).

      Methodological Rigor

      The use of morphologically and biophysically realistic computational models for CA1 pyramidal neurons ensures that the findings are grounded in physiological relevance. Systematic analysis of various factors, including the presence of sodium, leak, and HCN channels, offers a clear dissection of how transmembrane currents shape EFPs.

      We thank you for your encouraging comments on the experimental design and methodological rigor of our approach.

      Biological Relevance

      The findings emphasize the importance of incorporating gap junctional inputs in analyses of extracellular signals, which have traditionally focused on chemical synapses. The observed polarity differences and spectral characteristics provide novel insights into how neural computations may differ based on the mode of synaptic input.

      We thank you for your positive comments on the biological relevance of our approach. We also gratefully thank you for emphasizing the two striking novelties unveiling the dichotomy between gap junctions and chemical synapses in their contributions to field potentials: polarity differences and spectral characteristics.

      Clarity and Depth

      The manuscript is well-structured, with a logical progression from synchronous input analyses to asynchronous and rhythmic inputs, ensuring comprehensive coverage of the topic.

      We sincerely thank you for the positive comments on the structure and comprehensive coverage of our manuscript encompassing different types of inputs that neurons typically receive.

      Weaknesses and Areas for Improvement

      Generality and Validation

      The study focuses exclusively on CA1 pyramidal neurons. Expanding the analysis to other cell types, such as interneurons or glial cells, would enhance the generalizability of the findings. Experimental validation of the computational predictions is entirely absent. Empirical data correlating the modeled EFPs with actual recordings would strengthen the claims.

      We thank you for raising this important point. The prime novelty and the principal conclusion of this study is that gap junctional contributions to extracellular field potentials are orders of magnitude higher when the active nature of cellular compartments is accounted for. The lacuna in the literature has been consequent to the assumption that cellular compartments are passive, resulting in the dogma that gap junctional contributions to field potentials are negligible. Despite knowledge about active dendritic structures for decades now, this assumption has kept studies from understanding or even exploring the contributions of gap junctions to field potentials. The rationale behind the choice of a computational approach to address the lacuna was as follows:

      (1) The complex interactions between co-existing chemical synaptic, gap junctional, and active dendritic contributions from several cell-types make the delineation of the contributions of specific components infeasible with experimental approaches. A computational approach is the only quantitative route to specifically delineate the contributions of individual components to extracellular potentials, as seen from studies that have addressed the question of active dendritic contributions to field potentials (Halnes et al., 2024; Ness et al., 2018; Reimann et al., 2013; Sinha & Narayanan, 2015, 2022) or spiking contributions to local field potentials (Buzsaki et al., 2012; Gold et al., 2006; Schomburg et al., 2012). The biophysically and morphologically realistic computational modeling route is therefore invaluable in assessing the impact of individual components to extracellular field potentials (Einevoll et al., 2019; Halnes et al., 2024).

      (2) With specific reference to gap junctions, quantitative experimental verification becomes extremely difficult because of the well-established non-specificities associated with gap junctional modulators (Behrens et al., 2011; Rouach et al., 2003). The non-specific actions of gap junctional modulators are tabulated in Table 2 of Szarka et al. (2021). In addition, genetic knockouts of gap junctional proteins are either lethal or involve functional compensation (Bedner et al., 2012; Lo, 1999), together making causal links to specific gap junctional contributions infeasible with currently available techniques.

      We highlight the novelty of our approach and of the conclusions about differences in extracellular signatures associated with active-dendritic chemical synapses and gap junctions, against these experimental difficulties. We emphasize that the computational modeling route is currently the only quantitative methodology to delineate the contributions of gap junctions vs. chemical synapses to extracellular potentials. Our analyses clearly demonstrate that gap junctions do contribute to extracellular potentials if the active nature of the cellular compartments is explicitly accounted for (Fig. 1D). We also show theoretically well-grounded and mechanistically elucidated differences in polarity (Figs. 1–3) as well as in spectral signatures (Figs. 5–8) of extracellular potentials associated with gap junctional vs. chemical synaptic inputs. Together, our fundamental demonstration in this study is the critical need to account for the active nature of cellular compartments in studying gap junctional contributions to extracellular potentials, with CA1 pyramidal neuronal dendrites used as an exemplar.

      In the revised version of the manuscript, we have emphasized the motivations for the approach we took, highlighting the specific novelties both in methodological and conceptual aspects, finally emphasizing the need to account for other cell types and gap junctional contributions therein. Importantly, we have emphasized the non-specificities associated with gap-junctional blockers as the reason why experimental delineation of gap junctional vs. chemical synaptic contributions to LFP becomes tedious. We believe that these points underscore the need for the computational approach that we took to address this important question, apart from the novelties of the study.

      In response to your constructive comments, we have added the following to the revised version of the manuscript, in the Introduction section as motivation for the specific route we took:

      “Given the complexity arising from the concurrent activity of chemical synapses, gap junctions, and active dendritic conductances across multiple neuronal populations, experimentally isolating the contributions of individual components to extracellular potentials remains highly challenging. To address this limitation, we employed a computational modeling approach, which provides a quantitative framework for systematically dissecting the distinct roles of specific cellular and synaptic elements. This strategy is consistent with previous studies that have successfully used computational methods to elucidate the contributions of active dendritic mechanisms to LFPs (Halnes et al., 2024; Ness et al., 2018; Reimann et al., 2013; Sinha & Narayanan, 2015, 2022) or spiking contributions to LFPs (Buzsaki et al., 2012; Gold et al., 2006; Schomburg et al., 2012). In addition, experimentally isolating the contribution of gap junctions is complicated by non-specific effects of available pharmacological modulators targeting these connections (Behrens et al., 2011; Rouach et al., 2003). Most genetic knockouts of gap junctional proteins are either lethal or trigger functional compensatory mechanisms (Bedner et al., 2012; Lo, 1999), thereby rendering causal attribution of specific gap junctional contributions infeasible with currently available experimental approaches. Consequently, biophysically and morphologically detailed computational modeling provides a crucial means to evaluate the impact of individual neuronal components on extracellular field potentials (Einevoll et al., 2019; Halnes et al., 2024).”

      We thank you for raising this point as this allowed us to expand on the specific motivations for the approach we took, and to present the specific novelties of our study to the analyses of extracellular field potentials. Thank you.

      Role of Active Dendritic Currents

      The paper emphasizes active dendritic currents, particularly the role of HCN channels in generating outward currents under certain conditions. However, further discussion of how this mechanism integrates into broader network dynamics is warranted.

      We thank you for this constructive suggestion. We agree that it is important to consider the implications for broader network dynamics of the outward HCN currents that are observed with synchronous inputs. In the revised manuscript, we have elaborated on the implications of the outward HCN current for network dynamics in detail. The following paragraph has been added to the Discussion subsection on “Outward HCN currents regulate extracellular potentials”:

      “HCN channels play a critical role in shaping hippocampal network dynamics by modulating neuronal excitability, oscillatory behavior, and susceptibility to pathological states (Kessi et al., 2022; Magee, 1998; Mishra & Narayanan, 2025; Nolan et al., 2004). The outward-like properties of the HCN current we observed may have specific functional implications at different scales. At the cellular scale, the manifestation of outward current during action potentials or plateau potentials could contribute to afterhyperpolarization, thereby regulating firing properties. In cortical and hippocampal pyramidal neurons, most single-neuron processing occurs in their elaborate dendritic branches, where there is spatiotemporal summation of different synaptic potentials, plateau potentials, backpropagating action potentials, and dendritic spikes (Johnston & Narayanan, 2008; Major et al., 2013; Stuart & Spruston, 2015). Considering the heavy expression of HCN channels in the dendrites of hippocampal and cortical pyramidal neurons (Kole et al., 2006; Lorincz et al., 2002; Magee, 1998; Williams & Stuart, 2000), the backpropagating action potentials, plateau potentials, or dendritic spikes at dendritic locations could yield outward currents. These outward currents could act as a hyperpolarizing mechanism that suppresses spatiotemporal summation of the different dendritic potentials.

      At the network scale, such regulation of dendritic potentials and somatic firing could contribute to overall reduction in firing rates of different neurons in the network. For instance, as inhibitory neurons typically elicit action potentials at higher frequencies, somatic outward HCN currents would occur more frequently in inhibitory neurons that express HCN channels compared to excitatory neurons. However, the heavy expression of HCN channels in the dendrites and the higher prevalence of dendritic spikes and plateau potentials in dendrites (Basak & Narayanan, 2018; Larkum et al., 2022; Moore et al., 2017) imply that the impact on outward HCN currents might be higher. Thus, the presence of outward HCN currents would regulate the network balance of excitation and inhibition in an activity-dependent manner. Additionally, the outward component of the current through HCN channels could contribute to stabilization of network synchrony by promoting spike phase coherence and to modulation of spike-LFP phase relationships (Das et al., 2017; Ness et al., 2016, 2018; Seenivasan & Narayanan, 2020; Sinha & Narayanan, 2015, 2022).

      Together, the outward HCN current could play critical roles in regulating several cellular and network functions including spatiotemporal summation within single neurons, amplitude and phase of different oscillations, excitatory-inhibitory interactions, and rate and temporal coding involved in spatial navigation (Hussaini et al., 2011; Nolan et al., 2004; O'Keefe & Recce, 1993). In the context of brain rhythms, future investigations are needed to explore ripple-frequency oscillations, specifically to assess whether high-frequency network interactions are modulated by HCN outward currents. Importantly, future studies could specifically focus on delineating the prevalence and specific contributions of outward currents through HCN channels to single-neuron and network physiology.”

      We thank you for highlighting this point, as it allowed us to elaborate the broader roles of HCN channels to single-cell computation, network dynamics, and field potentials. Thank you.

      Analysis of Plasticity

      While the manuscript mentions plasticity in the discussion, there are no simulations that account for activity-dependent changes in synaptic or gap junctional properties. Including such analyses could significantly enhance the relevance of the findings.

      We thank you for this constructive suggestion. Please note that we have presented consistent results for both fewer and more gap junctions in our analyses (Figure 1 with 217 gap junctions and Supplementary Figure 1 with 99 gap junctions). Thus, our fundamentally novel result that gap junctions onto active dendrites differentially shape LFPs holds true irrespective of the relative density of gap junctions onto the neuron. In other words, the conclusions about gap junctional contributions to LFPs are invariant to plasticity in the number of gap junctions.

      We had only briefly mentioned plasticity in the Introduction to highlight the different modes of synaptic transmission and to emphasize that plasticity has been studied in both chemical synapses and gap junctions, playing a role in learning and adaptation. However, it seems that this wording inadvertently suggested that our study includes plasticity simulations. Therefore, we have removed that sentence from the Introduction in the revised manuscript to ensure clarity.

      In the ‘Limitations of analyses and future studies’ section in Discussion, we suggested investigating the impact of plasticity mechanisms—specifically, activity-dependent plasticity of ion channels—on synaptic receptors vs. gap junctions and their effects on extracellular field potentials under various input conditions and plasticity combinations across different structures. We fully agree with the reviewer that such studies would offer valuable insights and further enhance the broader relevance of our findings. However, while our study implies this direction, it was not the primary focus of our investigation.

      In the revised manuscript, we have also expanded on intrinsic/synaptic plasticity and how they could contribute to LFPs (Sinha & Narayanan, 2015, 2022), while also pointing to simulations with different numbers of gap junctions in this context. The following specific changes have been incorporated into the revised manuscript:

      Discussion subsection “Limitations of analyses and future directions”

      “We demonstrated that the contribution of gap junctions to extracellular field potentials remains consistent regardless of the number of gap junctions. Specifically, we showed that the distinct positive LFP deflections persisted irrespective of their relative density on neurons (Fig. 1 with 217 gap junctions and Supplementary Fig. 1 with 99 gap junctions). Previous studies have quantitatively demonstrated that intrinsic and synaptic plasticity modulate hippocampal LFPs and phase coding (Sinha & Narayanan, 2015, 2022). Future analyses should also assess the impact of activity-dependent plasticity in ion channels (on dendrites, axonal initial segments, and other compartments), in synaptic receptors, and in gap junctions (Andersen et al., 2006; Coulon & Landisman, 2017; Johnston & Narayanan, 2008; Magee & Grienberger, 2020; Mishra & Narayanan, 2021; Neves et al., 2008; O'Brien, 2014; Pereda, 2014; Vaughn & Haas, 2022) on extracellular potentials with various kinds of gap junctional inputs and different combinations of plasticity in various structures. Interactions among different forms of plasticity and how co-dependent plasticity in different components alters extracellular field potentials could provide deeper insights about physiological changes during learning and pathological changes observed in different neurological disorders (Sinha & Narayanan, 2022).”

      We thank you for highlighting this as this allowed us to improve on the specific focus of the manuscript and the study. Thank you.

      Frequency-Dependent Effects

      The study demonstrates that gap junctional inputs suppress high-frequency EFP power due to membrane filtering. However, it could delve deeper into the implications of this for different brain rhythms, such as gamma or ripple oscillations.

      We sincerely thank you for these insightful comments, with which we fully agree. As it so happens, this manuscript forms the first part of a broader study where we explore the implications of gap junctions for ripple-frequency oscillations. The ripple oscillations part of the work was presented as a poster at the Society for Neuroscience (SfN) annual meeting 2024 (Sirmaur & Narayanan, 2024). There, we simulate a neuropil made of hundreds of morphologically realistic neurons to assess the role of different synaptic inputs (excitatory, inhibitory, and gap junctional) and active dendrites in ripple-frequency oscillations. We demonstrate there that the conclusions from single-neuron simulations in this current manuscript extend to a neuropil with several neurons, each receiving excitatory, inhibitory, and gap junctional inputs, especially with reference to high-frequency oscillations. Our network-based analyses unveiled a dominant mediatory role of patterned inhibition in ripple generation, with recurrent excitations through chemical synapses and gap junctions in conjunction with return-current contributions from active dendrites playing regulatory roles in determining ripple characteristics (Sirmaur & Narayanan, 2024).

      Our principal goal in this study, therefore, was to lay the single-neuron foundation for network analyses of the impact of gap junctions on LFPs. We are preparing the network part of the study, with a strong focus on ripple-frequency oscillations, for submission for peer review separately. Please see the abstract of our poster presented at the Society for Neuroscience annual meeting 2024 on the topic here: https://tinyurl.com/57ehvsep.

      In the revised manuscript, we have mentioned the results from our SfN abstract with reference to network simulations and high-frequency oscillations, while also presenting discussions from other studies on the role of gap junctions in synchrony and LFP oscillations. The following has been added to the revised manuscript under the Discussion subsection “High-frequency LFP power was suppressed with gap junctional inputs”:

      “In this context, our analyses lay the foundation for network analyses of the impact of gap junctions on LFPs. The conclusions from the single-neuron simulations in this study extend to a neuropil with several neurons, each receiving synaptic and gap junctional inputs, especially with reference to high-frequency ripple oscillations (Sirmaur & Narayanan, 2024). A neuropil made of hundreds of morphologically realistic pyramidal neurons was used to assess the role of different synaptic inputs (excitatory, inhibitory, and gap junctional), with different patterns of stimulation and active dendritic contributions, in ripple-frequency oscillations. Network-based analyses have unveiled a dominant mediatory role of patterned inhibition in ripple generation, with recurrent excitations through chemical synapses and gap junctions, in conjunction with return-current contributions from active dendrites, playing modulatory roles in governing ripple characteristics (Sirmaur & Narayanan, 2024). Future studies could expand on these conclusions to explore the implications of frequency-dependent filtering (with reference to gap junctional coupling) on high-frequency extracellular oscillations.”

      We thank you for highlighting this point as it allowed us to expand on the implications for our analyses to brain rhythms, especially with reference to high-frequency oscillations. Thank you.

      Visualization

      Figures are dense and could benefit from more intuitive labeling and focused presentations. For example, isolating key differences between chemical and gap junctional inputs in distinct panels would improve clarity.

      We thank you for this constructive suggestion. We used a consistent visualization throughout, where we place the outcomes associated with chemical synapses and gap junctions in the same figure, adjacent to each other. We believe that this offers a visually intuitive distinction between the outcomes for chemical synapses and gap junctions, rather than placing them in different figures. Splitting them would place the outcomes in different figures and require turning pages or placing two different figures adjacent to each other for quantitative comparison. We respectfully request that we be allowed to retain this form of visualization in the figures. Thank you.

      Contextual Relevance

      The manuscript touches on how these findings relate to known physiological roles of gap junctions (e.g., in gamma rhythms) but does not explore this in depth. Stronger integration of the results into known neural network dynamics would enhance its impact.

      We sincerely appreciate your valuable suggestion and acknowledge the importance of integrating our results into established neural network dynamics, particularly their implications for gamma rhythms. We have addressed this aspect in the revised version of our manuscript. We have added this to the Discussion subsection on “High-frequency LFP power was suppressed with gap junctional inputs” of the revised manuscript:

      “In the context of oscillations and gap-junctional coupling, electrical synapses have been shown to regulate the emergence and stability of the network interactions underlying rhythms of different frequencies, especially gamma-frequency oscillations (Bocian et al., 2009; Buhl et al., 2003; Draguhn et al., 1998; Hormuzdi et al., 2001; Konopacki et al., 2004; LeBeau et al., 2003; Posluszny, 2014; Traub et al., 2003). Specifically, both genetic and pharmacological manipulations of gap junctions have been shown to disrupt gamma rhythms. Genetic deletion of connexin-36 impairs the gamma oscillations associated with awake, active behavioral states (Buhl et al., 2003; Hormuzdi et al., 2001). High-frequency oscillations in the hippocampus have been shown to be sensitive to pharmacological agents like carbenoxolone and octanol that are known to inhibit gap junctions. Carbenoxolone has been known to reduce the transient gamma-frequency oscillations while octanol abolishes the persistent gamma rhythm (Draguhn et al., 1998; Hormuzdi et al., 2001; Posluszny, 2014; Traub et al., 2003). In the context of our results, where we demonstrate that the relative contributions of gap-junctional coupling to high-frequency extracellular potentials are low (Figs. 6–7), how do gap junctions contribute to enhanced extracellular gamma oscillations in these circuits?

      It should be noted that in hippocampal circuits, gamma oscillations emerge predominantly due to interactions between inhibitory interneurons through GABA<sub>A</sub> receptors (Buzsaki & Wang, 2012; Colgin, 2016; Colgin & Moser, 2010; Wang, 2010; Wang & Buzsaki, 1996; Whittington et al., 1995). Thus, the presence of additional gap junctional coupling between these inhibitory neurons allows for tighter synchrony between these reciprocally inhibition-coupled neurons. In other words, the presence of gap junctions increases the probability of action potential generation in other neurons that are electrically coupled to them, together increasing the population of inhibitory neurons that elicit synchronous action potentials. When these synchronous action potentials act on the adjacent cells, both excitatory and inhibitory, the transmembrane GABA<sub>A</sub> receptor currents yield stronger gamma-frequency oscillations in the extracellular potentials (Draguhn et al., 1998; Hormuzdi et al., 2001; Posluszny, 2014; Traub et al., 2003). Thus, the stronger high-frequency oscillations observed in these scenarios are owing to the enhanced synchrony that is brought about by the gap-junctional coupling, which translates to stronger transmembrane inhibitory receptor currents.

      These observations also strongly emphasize the utility of the computational approach we took in this study towards discerning the specific roles of gap junctions. Gap junctional coupling has strong physiological roles in terms of enhancing synchronous activity across the neurons that it couples, and is often expressed along with other receptors that connect the sets of neurons. Thus, the specific contributions of different neuronal components need to be studied with reference to how they contribute to physiological characteristics vs. their contributions to extracellular potentials. Computational modeling therefore offers an ideal route to understand the specific contributions of different neural-circuit components to extracellular field potentials and rhythms therein (Buzsaki et al., 2012; Einevoll et al., 2019; Einevoll et al., 2013; Sinha & Narayanan, 2022).”

      We thank you for highlighting this point as this allowed us to delineate the impact of gap junctions to regulating synchrony across connected neurons vs. modulating field potentials. Thank you.

      Reviewer #2 (Public review):

      This computational work examines whether the inputs that neurons receive through electrical synapses (gap junctions) have different signatures in the extracellular local field potential (LFP) compared to inputs via chemical synapses. The authors present the results of a series of model simulations where either electric or chemical synapses targeting a single hippocampal pyramidal neuron are activated in various spatio-temporal patterns, and the resulting LFP in the vicinity of the cell is calculated and analyzed. The authors find several notable qualitative differences between the LFP patterns evoked by gap junctions vs. chemical synapses. For some of these findings, the authors demonstrate convincingly that the observed differences are explained by the electric vs. chemical nature of the input, and these results likely generalize to other cell types. However, in other cases, it remains plausible (or even likely) that the differences are caused, at least partly, by other factors (such as different intracellular voltage responses due to, e.g., the unequal strengths of the inputs). Furthermore, it was not immediately clear to me how the results could be applied to analyze more realistic situations where neurons receive partially synchronized excitatory and inhibitory inputs via chemical and electric synapses.

      We gratefully thank you for your time and effort in rigorously assessing our manuscript, for the enthusiastic response, and the encouraging and thoughtful comments on our study. In what follows, we have provided point-by-point responses to the specific comments.

      Strengths

      The main strength of the paper is that it draws attention to the fact that inputs to a neuron via gap junctions are expected to give rise to a different extracellular electric field compared to inputs via chemical synapses, even if the intracellular effects of the two types of input are similar. This is because, unlike chemical synaptic inputs, inputs via gap junctions are not directly associated with transmembrane currents. This is a general result that holds independent of many details such as the cell types or neurotransmitters involved.

      We gratefully thank you for the positive comments and the encouraging words about the novel contributions of our study. We are particularly thankful to you for your comment on the generality of our conclusions that hold for different cell types and neurotransmitters involved.

      Another strength of the article is that the authors attempt to provide intuitive, non-technical explanations of most of their findings, which should make the paper readable also for non-expert audiences (including experimentalists).

      We sincerely thank you for the positive comments about the readability of the paper.

      Weaknesses

      The most problematic aspect of the paper relates to the methodology for comparing the effects of electric vs. chemical synaptic inputs on the LFP. The authors seem to suggest that the primary cause of all the differences seen in the various simulation experiments is the different nature of the input, and particularly the difference between the transmembrane current evoked by chemical synapses and the gap junctional current that does not involve the extracellular space. However, this is clearly an oversimplification: since no real attempt is made to quantitatively match the two conditions that are compared (e.g., regarding the strength and temporal profile of the inputs), the differences seen can be due to factors other than the electric vs. chemical nature of synapses. In fact, if inputs were identical in all parameters other than the transmembrane vs. directly injected nature of the current, the intracellular voltage responses and, consequently, the currents through voltage-gated and leak currents would also be the same, and the LFPs would differ exactly by the contribution of the transmembrane current evoked by the chemical synapse. This is evidently not the case for any of the simulated comparisons presented, and the differences in the membrane potential response are rather striking in several cases (e.g., in the case of random inputs, there is only one action potential with gap junctions, but multiple action potentials with chemical synapses). Consequently, it remains unclear which observed differences are fundamental in the sense that they are directly related to the electric vs. chemical nature of the input, and which differences can be attributed to other factors such as differences in the strength and pattern of the inputs (and the resulting difference in the neuronal electric response).

      We thank you for raising this important point. We would like to emphasize that our experimental design and analyses quantitatively account for the spatial distribution and temporal pattern of specific kinds of inputs that arrive through gap junctions and chemical synapses. We submit that our analyses quantitatively demonstrate that the fundamental difference between the gap junctional and chemical synaptic contributions to extracellular potentials is the absence of the direct transmembrane component from gap junctional inputs. We elucidate these points below:

      (1) Spatial distribution: The inputs were distributed randomly across the basal dendrites, irrespective of whether they were through gap junctions or chemical synapses. For both chemical synapses and gap junctions, the inputs were of the same nature: excitatory.

      (2) Different numbers of inputs: We have presented consistent results for both fewer and more gap junctions or chemical synapses in our analyses (see Figure 1 with 217 gap junctions or 245 chemical synapses and Supplementary Figure 2 with 99 gap junctions or 30 chemical synapses). Our fundamentally novel result that gap junctions onto active dendrites shape LFPs holds true irrespective of the relative density of gap junctions onto the neuron.

      (3) Synchronous inputs (Figs. 1–3): For chemical synapses, the waveforms are in the shape of postsynaptic potentials. For gap junctional inputs, the waveforms are in the shape of postsynaptic potentials or dendritic spikes (to respect the active nature of inputs from the other cell). Here, the electrical response of the postsynaptic cell is identical irrespective of whether inputs arrive through gap junctions or chemical synapses: an action potential. We quantitatively matched the strengths such that the model generated a single action potential in response to synchronous inputs, irrespective of whether they arrived through chemical synaptic or gap junctional inputs. We mechanistically analyzed the contributions of different cellular components and show that the direct transmembrane current in chemical synapses is the distinguishing factor that determines the dichotomy between the contributions of gap junctions vs. chemical synapses to extracellular potentials (Figs. 2–3). In the revised manuscript, we have shown the intracellular responses to demonstrate that they are electrically matched (new Supplementary Figure 3).

      (4) Random inputs (Fig. 4): For random inputs, we did not account for the number of action potentials that arrived, as the only observation we made here was with reference to the biphasic nature of the extracellular potentials with gap junctional inputs in the “No Sodium” scenario. We note that in the “No Sodium” scenario, the time-domain amplitudes were comparable for the field potentials (Fig. 4B, Fig. 4D).

      (5) Rhythmic inputs (Figs. 5–8): For rhythmic inputs, please note that the intracellular and extracellular waveforms for every frequency are provided in supplementary figures S5–S11. It may be noted that the intracellular responses are comparable. In simulations for assessing spike-LFP comparison, we tuned the strengths to produce a single spike per cycle, ensuring fair comparison of LFPs with gap junctions vs. chemical synapses.

      Taken together, we demonstrate through explicit sets of simulations and analyses that the differences in LFPs were not driven by the strength or patterns of the inputs but rather by the differences in direct transmembrane currents, which are subsequently reflected in the LFPs. In the revised manuscript, we have emphasized these points in the Discussion section, apart from providing intracellular traces for cases where they were not provided before (new Supplementary Figure 3):

      Discussion subsection “Dominance of active dendritic currents with LFP associated with gap junctions”

      “Our analyses quantitatively demonstrate that the fundamental difference between the gap junctional and chemical synaptic contributions to extracellular potentials is the absence of the direct transmembrane component from gap junctional inputs. A multitude of factors suggests that the observed LFP differences result not from variations in input strength or patterns but rather from differences in direct transmembrane currents, which are subsequently reflected in the LFP signals.

      First, the inputs were distributed randomly across the basal dendrites, irrespective of whether they were through gap junctions or chemical synapses. For both chemical synapses and gap junctions, the inputs were exclusively excitatory in nature.

      Second, the results remained consistent regardless of the number of gap junctions or chemical synapses (Fig. 1 with 217 gap junctions or 245 chemical synapses and Supplementary Fig. 2 with 99 gap junctions or 30 chemical synapses). Our fundamentally novel result that gap junctions onto active dendrites shape LFPs holds true irrespective of the relative density of gap junctions onto the neuron.

      Third, for synchronous chemical synaptic inputs, the waveforms resembled typical postsynaptic potentials, whereas for gap junctional inputs, the waveforms showed characteristics of postsynaptic potentials or dendritic spikes (accounting for the active nature of inputs from the potential presynaptic cells). The electrical response of the postsynaptic cell remains identical, producing an action potential regardless of whether inputs arrive via gap junctions or chemical synapses. We quantitatively matched the strengths such that the model generated a single action potential in response to synchronous inputs, irrespective of whether they arrived through chemical synaptic or gap junctional inputs. We mechanistically analyzed the contributions of different cellular components and show that the direct transmembrane current in chemical synapses is the distinguishing factor that determines the dichotomy between the contributions of gap junctions vs. chemical synapses to extracellular potentials (Figs. 2–3).

      Fourth, for random inputs, the models were not specifically tuned to generate a single action potential. Here, the inputs served as a proxy for asynchronous inputs arriving from other subregions at random times.

      Finally, the intracellular responses were comparable for chemical synaptic and gap junctional rhythmic inputs (Supplementary Fig. S5-S11). Here, the model was tuned to elicit a single spike per cycle in simulations evaluating spike-LFP interactions, ensuring a fair comparison between LFPs from gap junctional and chemical synaptic inputs.”

      We have added a new Supplementary Figure 3 to the revised manuscript and have referred to this figure in the Results subsection. We thank you for raising these points, as it allowed us to elaborate on the several novelties and implications of our methodology and conclusions. Thank you.

      Some of the explanations offered for the effects of cellular manipulations on the LFP appear to be incomplete. More specifically, the authors observed that blocking leak channels significantly changed the shape of the LFP response to synchronous synaptic inputs - but only when electric inputs were used, and when sodium channels were intact. The authors seemed to attribute this phenomenon to a direct effect of leak currents on the extracellular potential - however, this appears unlikely both because it does not explain why blocking the leak conductance had no effect in the other cases, and because the leak current is several orders of magnitude smaller than the spike-generating currents that make the largest contributions to the LFP. An indirect effect mediated by interactions of the leak current with some voltage-gated currents appears to be the most likely explanation, but identifying the exact mechanism would require further simulation experiments and/or a detailed analysis of intracellular currents and the membrane potential in time and space.

      We thank you for raising this important question. Leak channels were among the several contributors to the positive deflection observed in LFPs associated with gap junctions. This effect was present not only in gap junctional models with intact sodium conductance but also in the no-sodium model, where the amplitude of the positive deflection was reduced across other models as well (Fig. 2F, I). Furthermore, even in the absence of leak conductance, a small positive deflection was still observed (Fig. 2F), leading us to further investigate other transmembrane currents over time and across spatial locations, from the proximal to the distal dendritic ends relative to the soma (Fig. 3D). We had observed that the dominant contributor in the case of chemical synapses was the inward synaptic current (Fig. 3A), whereas for gap junctions, the primary contributors were leak conductance along with other outward currents, such as potassium and HCN currents (Fig. 3D). Together, the direct transmembrane component of chemical synapses provides a dominant contribution to extracellular potentials. This dominance translates to differences in the relative contributions of indirect currents (including leak currents) to extracellular potentials associated with chemical synaptic vs. gap junctional inputs. Our analyses of the exact ionic mechanisms (Fig. 3) demonstrate the involvement of several ion channels contributing to the indirect component in either scenario.

      In every simulation experiment in this study, inputs through electric synapses are modeled as intracellular current injections of pre-determined amplitude and time course based on the sampled dendritic voltage of potential synaptic partners. This is a major simplification that may have a significant impact on the results. First, the current through gap junctions depends on the voltage difference between the two connected cellular compartments and is thus sensitive to the membrane potential of the cell that is treated as the neuron "receiving" the input in this study (although, strictly speaking, there is no pre- or postsynaptic neuron in interactions mediated by gap junctions). This dependence on the membrane potential of the target neuron is completely missing here. A related second point is that gap junctions also change the apparent membrane resistance of the neurons they connect, effectively acting as additional shunting (or leak) conductance in the relevant compartments. This effect is completely missed by treating gap junctions as pure current sources.

      We thank you for raising this important point. We agree with the analyses presented by the reviewer on the importance of network simulations and bidirectional gap junctions that respect the voltages in both neurons. However, the complexities of LFP modeling preclude modeling of networks of morphologically realistic models with patterns of stimulations occurring across the dendritic tree. LFP modeling studies predominantly use “post-synaptic” currents to analyze the impact of different patterns of inputs arriving onto a neuron, even when chemical synaptic inputs are considered. Explicitly, individual neurons are separately simulated with different patterns of synaptic inputs, the transmembrane current at different locations is recorded, and the extracellular potential is then computed using the line source approximation (Buzsaki et al., 2012; Gold et al., 2006; Halnes et al., 2024; Ness et al., 2018; Reimann et al., 2013; Schomburg et al., 2012; Sinha & Narayanan, 2015, 2022). Even in scenarios where a network is analyzed, a hybrid approach involving the outputs of a point-neuron-based network being coupled to an independent morphologically realistic neuronal model is employed (Hagen et al., 2016; Martinez-Canada et al., 2021; Mazzoni et al., 2015). Given the complexities associated with the computation of electrode potentials arising as a distance-weighted summation of several transmembrane currents, these simplifications become essential.
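
      As a rough illustration of the distance-weighted summation described above, the following Python sketch computes the potential at a single electrode with a line-source approximation. It is not the code used in the study: the function name, the extracellular conductivity of 0.3 S/m, and the two-segment geometry are assumptions chosen for illustration, and the closed-form weight assumes straight segments with uniformly distributed current in a homogeneous, purely resistive medium.

```python
import math

def line_source_potential(currents, starts, ends, electrode, sigma=0.3):
    """Potential (V) at one electrode from straight line-source segments.

    currents  : transmembrane current of each segment in A (positive = outward)
    starts    : (x, y, z) start point of each segment in m
    ends      : (x, y, z) end point of each segment in m
    electrode : (x, y, z) electrode position in m
    sigma     : extracellular conductivity in S/m (assumed value)
    """
    total = 0.0
    for i_m, a, b in zip(currents, starts, ends):
        length = math.dist(a, b)
        r1 = math.dist(electrode, a)
        r2 = math.dist(electrode, b)
        # Integral of 1/r along the segment, divided by its length.
        # Singular if the electrode lies on the segment (r1 + r2 == length).
        weight = math.log((r1 + r2 + length) / (r1 + r2 - length))
        weight /= 4.0 * math.pi * sigma * length
        total += i_m * weight
    return total

# Hypothetical example: a 1 nA sink next to a 1 nA source (net current zero).
starts = [(0.0, 0.0, 0.0), (0.0, 0.0, 100e-6)]
ends = [(0.0, 0.0, 100e-6), (0.0, 0.0, 200e-6)]
currents = [-1e-9, 1e-9]
electrode = (50e-6, 0.0, 50e-6)  # closer to the sink segment

phi = line_source_potential(currents, starts, ends, electrode)
print(f"potential at electrode: {phi * 1e6:.3f} uV")
```

      Because the computation is linear in the currents, the same function can be applied to one current type at a time (for example, only the synaptic current or only the leak current) to separate its contribution to the total potential.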

      Our approach models gap junctional currents in the same way that other models incorporate synaptic currents in LFP modeling (Buzsaki et al., 2012; Gold et al., 2006; Halnes et al., 2024; Ness et al., 2018; Reimann et al., 2013; Schomburg et al., 2012; Sinha & Narayanan, 2015, 2022). As gap junctions are typically implemented as resistors from the other neuronal compartment, we accounted for gap-junctional variability in our model by randomizing the scaling factors and the exact waveforms that arrive through individual gap junctions at specific locations. Thus, the inputs were not pre-determined by “pre” neurons. Instead, the recorded voltages from potential synaptic partner neurons were randomized across locations and scaled using factors at the dendrites before being injected into the target neuron (Supplementary Fig. S1). While incorporating a network of interconnected neurons is indeed important, we utilized a biophysical, morphologically realistic CA1 neuron model with different sets of input patterns to model LFPs, which were derived from the total transmembrane currents across all compartments of the multi-compartmental neuron model. Given the complexity of this approach, adding further network-level interactions or pre-post connections would have been computationally demanding.
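
      To make the distinction discussed in this exchange concrete, the sketch below contrasts an ohmic gap junction, whose current depends on the target voltage, with a pre-computed current injection. It is an illustrative toy and not the model used in the study: a single passive compartment at steady state, with hypothetical parameter values (10 nS leak, 1 nS junction, partner compartment held at -45 mV) and no active conductances or dendritic geometry.

```python
def gap_junction_current(v_partner, v_target, g_gap):
    """Ohmic junctional current (A) flowing into the target compartment."""
    return g_gap * (v_partner - v_target)

def steady_state_fixed_injection(i_inj, g_leak, e_leak):
    """Steady-state voltage (V) of a passive compartment with a fixed current."""
    return e_leak + i_inj / g_leak

def steady_state_ohmic_coupling(v_partner, g_gap, g_leak, e_leak):
    """Steady-state voltage (V) when the junction is a resistor to a partner
    held at v_partner; the junction adds g_gap to the input conductance."""
    return (g_leak * e_leak + g_gap * v_partner) / (g_leak + g_gap)

# Hypothetical parameters in SI units (assumptions for illustration only).
G_LEAK = 10e-9      # leak conductance, S
E_LEAK = -65e-3     # leak reversal potential, V
G_GAP = 1e-9        # junctional conductance, S
V_PARTNER = -45e-3  # voltage of the coupled compartment, V

# Current pre-computed as if the target stayed at rest, then injected as-is.
i_precomputed = gap_junction_current(V_PARTNER, E_LEAK, G_GAP)
v_fixed = steady_state_fixed_injection(i_precomputed, G_LEAK, E_LEAK)

# Same junction treated as a resistor that senses the target voltage.
v_ohmic = steady_state_ohmic_coupling(V_PARTNER, G_GAP, G_LEAK, E_LEAK)

print(f"fixed injection: {v_fixed * 1e3:.2f} mV")
print(f"ohmic coupling:  {v_ohmic * 1e3:.2f} mV")
```

      In this toy case the fixed injection depolarizes the compartment slightly more than the ohmic junction, because the ohmic current shrinks as the target depolarizes and the junction adds to the input conductance. How large this difference is in a morphologically realistic, active model cannot be inferred from the sketch.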

      In the revised manuscript, we have elaborated on the general methodology used in LFP modeling studies to introduce synaptic currents. We have emphasized that our study extends this approach to modeling gap junctional inputs, while also highlighting randomization of locations and the scaling process in assigning gap junctional synaptic strengths. The following paragraphs were specifically added to the revised version of the manuscript:

      Methods subsection “Chemical synaptic and gap junctional inputs: Characteristics and temporal dynamics”:

      “The complexities of LFP modeling preclude modeling of networks of morphologically realistic models with patterns of stimulations occurring across the dendritic tree. LFP modeling studies predominantly use post-synaptic currents to analyze the impact of different patterns of inputs arriving onto a neuron, even when chemical synaptic inputs are considered. Explicitly, individual neurons are separately simulated with different patterns of synaptic inputs, the transmembrane current at different locations is recorded, and the extracellular potential is then computed using the line source approximation (Buzsaki et al., 2012; Gold et al., 2006; Halnes et al., 2024; Ness et al., 2018; Reimann et al., 2013; Schomburg et al., 2012; Sinha & Narayanan, 2015, 2022). Even in scenarios where a network is analyzed, a hybrid approach involving the outputs of a point-neuron-based network being coupled to an independent morphologically realistic neuronal model is employed (Hagen et al., 2016; Martinez-Canada et al., 2021; Mazzoni et al., 2015). Given the complexities associated with the computation of electrode potentials arising as a distance-weighted summation of several transmembrane currents, these simplifications become essential.”

      “Our approach models gap junctional currents in the same way that other models incorporate synaptic currents in LFP modeling (Buzsaki et al., 2012; Gold et al., 2006; Halnes et al., 2024; Ness et al., 2018; Reimann et al., 2013; Schomburg et al., 2012; Sinha & Narayanan, 2015, 2022). As gap junctions are typically implemented as resistors from the other neuronal compartment, we accounted for gap-junctional variability in our model by randomizing the scaling factors and the exact waveforms that arrive through individual gap junctions at specific locations from potential presynaptic sources.”

      We thank you for highlighting these points, as it allowed us to place our methodology in the specific context of the literature. Thank you.

      One prominent claim of the article that is emphasized even in the abstract is that HCN channels mediate an outward current in certain cases. Although this statement is technically correct, there are two reasons why I do not consider this a major finding of the paper. First, as the authors acknowledge, this is a trivial consequence of the relatively slow kinetics of HCN channels: when at least some of the channels are open, any input that is sufficiently fast and strong to take the membrane potential across the reversal potential of the channel will lead to the reversal of the polarity of the current. This effect is quite generic and well-known and is by no means specific to gap junctional inputs or even HCN channels. Second, and perhaps more importantly, the functional consequence of this reversed current through HCN channels is likely to be negligible. As clearly shown in Supplementary Figure S3, the HCN current becomes outward only for an extremely short time period during the action potential, which is also a period when several other currents are also active and likely dominant due to their much higher conductances. I also note that several of these relevant facts remain hidden in Figure 3, both because of its focus on peak values, and because of the radically different units on the vertical axes of the current plots.

      We thank you for raising this point and agree with you on every point. Please note that we do not assert that the outward HCN currents are exclusively associated with gap junctional inputs. Rather, our results show that synchronous inputs generate outward HCN currents in both chemical synapses (Fig. 3B; positive/outward HCN currents, except in the no sodium or leak model) and gap junctions (Fig. 3D; positive/outward HCN currents). We emphasized this in the case of gap junctions because, in the absence of inward synaptic currents, HCN (acting as outward currents with synchronous inputs) contributed to the positive deflection observed in the LFPs. While HCN would also contribute in the case of chemical synapses, its effect was negligible due to the presence of large inward synaptic currents. Since LFPs reflect the collective total transmembrane currents, the dominant contributors differ between these two scenarios, which we aimed to highlight. Since HCN exhibited outward currents in our synchronous input simulations, we have elaborated on this mechanism in the supplementary figure (Fig. S3). Our intention was not to emphasize this effect for only one synaptic mode but rather to highlight HCN's contribution to the positive deflection as one of the contributing factors.

      We agree that HCN currents are relatively small in magnitude; therefore, our conclusions were based on HCN being one of the several contributing factors. Leak conductance and other outward conductances, including HCN currents (Fig. 3D), collectively contribute to the positive deflections observed in the case of gap junctional synchronous inputs.
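
      The sign reversal discussed in this exchange follows from the ohmic form of the channel current: if the slow HCN gate stays partly open while a fast input drives the membrane potential above the channel's reversal potential, the driving force changes sign. A minimal sketch, assuming a reversal potential of -30 mV, a maximal conductance of 1 nS, and an open fraction of 0.1 held fixed over the brief depolarization (all illustrative values, not taken from the manuscript), with positive current defined as outward:

```python
def hcn_current(v_m, open_fraction, g_max=1e-9, e_h=-30e-3):
    """HCN current (A) for a given membrane potential; positive = outward.

    v_m           : membrane potential in V
    open_fraction : fraction of open channels (0 to 1), assumed fixed here
                    because HCN gating is slow relative to a spike
    g_max, e_h    : assumed maximal conductance (S) and reversal potential (V)
    """
    return g_max * open_fraction * (v_m - e_h)

OPEN_FRACTION = 0.1  # hypothetical value, unchanged during the fast transient

i_rest = hcn_current(-65e-3, OPEN_FRACTION)  # below reversal: inward current
i_peak = hcn_current(20e-3, OPEN_FRACTION)   # above reversal: outward current

print(f"near rest:   {i_rest * 1e12:+.2f} pA")
print(f"spike peak:  {i_peak * 1e12:+.2f} pA")
```

      The sketch only reproduces the sign change; it says nothing about how long the outward phase lasts or how it compares in size with other currents active during the spike.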

      In the revised manuscript, we have provided the following clarifications in the Results subsection on “Synchronous inputs: Outward transmembrane currents from active dendrites contribute to positive deflection in extracellular potentials associated with gap junctional inputs”:

      “It is important to note that despite their relatively small magnitude, the outward HCN currents (Fig. 3D) substantially contribute to positive extracellular potential deflections associated with gap junctional inputs (Fig. 2), together with leak and other outward conductances.”

      “While outward HCN currents (Fig. 3B) are also expected to influence LFPs under chemical synaptic input, their impact was minimal due to the predominance of large inward synaptic currents (Fig. 3A). As LFPs reflect the summation of all transmembrane currents, the dominant contributors vary across different modes of synaptic transmission.”

      We thank you for emphasizing this point. This allowed us to expand on the specific roles of HCN channels and potential contributions of the outward nature of the HCN current. We have also expanded the Discussion subsection on “Outward HCN currents regulate extracellular potentials” to elaborate on this aspect as well. Thank you.

      Finally, I missed an appropriate validation of the neuronal model used, and also the characterization of the effects of the in silico manipulations used on the basic behavior of the model. As far as I understand, the model in its current form has not been used in other studies. If this is the case, it would be important to demonstrate convincingly through (preferably quantitative) comparisons with experimental data using different protocols that the model captures the physiological behavior of at least the relevant compartments (in this case, the dendrites and the soma) of hippocampal pyramidal neurons sufficiently well that the results of the modeling study are relevant to the real biological system. In addition, the correct interpretation of various manipulations of the model would be strongly facilitated by investigating and discussing how the physiological properties of the model neuron are affected by these alterations.

      We thank you for raising this important point. The CA1 pyramidal neuronal model used in this study is built with ion-channel models derived from biophysical and electrophysiological recordings from these cells. As mentioned in the Methods section “Dynamics and distribution of active channels” and Supplementary Table S1, models for individual channels, their gating kinetics, and channel distributions across the somatodendritic arbor (wherever known) are all derived from their physiological equivalents. Importantly, these values were derived from previously validated models from the laboratory, which contain these very ion channel models and the exact same morphology (Roy & Narayanan, 2021). Please compare Supplementary Table S1 with Table 1 from (Roy & Narayanan, 2021). Please note that this model was validated against several physiological measurements along the somatodendritic axis (Fig. 1 of (Roy & Narayanan, 2021)).

      In the revised manuscript, we have explicitly mentioned this while also mentioning the different physiological properties that were used for the validation process from (Roy & Narayanan, 2021):

      Methods subsection “Pyramidal neuron model”

      “All parameters and their corresponding values for the neuronal model were derived from previously validated models (Roy & Narayanan, 2021). These CA1 models were validated against several physiological measurements along the somato-dendritic axis (Roy & Narayanan, 2021).”

      “These channel distributions and the associated parametric values (Supplementary Table S1) were demonstrated to satisfy 22 different somato-dendritic measurements (Roy & Narayanan, 2021). Specifically, six physiological measurements (input resistance, maximal impedance amplitude, resonance frequency, resonance strength, total inductive phase, and back-propagating action potential) were validated with respective electrophysiological ranges at three somato-dendritic locations (soma, ~150 µm dendrite, and ~300 µm dendrite) each (6×3=18 measurements). In addition, action potential firing frequency at each of 100 pA, 150 pA, 200 pA, and 250 pA (4 measurements) was also matched in the model to fall within the respective ranges of corresponding electrophysiological measurements. The electrophysiological ranges of intrinsic measurements were derived from respective somato-dendritic recordings (Malik et al., 2016; Narayanan et al., 2010; Narayanan & Johnston, 2007, 2008; Spruston et al., 1995). Together, the CA1 pyramidal model neuron used in this study was tuned to match several electrophysiological characteristics and ion-channel distributions (Roy & Narayanan, 2021).”

      We thank you for pointing us to this slip in elaborating on how the model was validated. We have now rectified this. Thank you.

      References

      Andersen, P., Morris, R., Amaral, D., Bliss, T., & O'Keefe, J. (2006). The hippocampus book. Oxford University Press.

      Basak, R., & Narayanan, R. (2018). Spatially dispersed synapses yield sharply-tuned place cell responses through dendritic spike initiation. Journal of Physiology, 596(17), 4173-4205. https://doi.org/10.1113/JP275310

      Bedner, P., Steinhauser, C., & Theis, M. (2012). Functional redundancy and compensation among members of gap junction protein families? Biochim Biophys Acta, 1818(8), 1971-1984. https://doi.org/10.1016/j.bbamem.2011.10.016

      Behrens, C. J., Ul Haq, R., Liotta, A., Anderson, M. L., & Heinemann, U. (2011). Nonspecific effects of the gap junction blocker mefloquine on fast hippocampal network oscillations in the adult rat in vitro. Neuroscience, 192, 11-19. https://doi.org/10.1016/j.neuroscience.2011.07.015

      Bocian, R., Posluszny, A., Kowalczyk, T., Golebiewski, H., & Konopacki, J. (2009). The effect of carbenoxolone on hippocampal formation theta rhythm in rats: in vitro and in vivo approaches. Brain Res Bull, 78(6), 290-298. https://doi.org/10.1016/j.brainresbull.2008.10.005

      Buhl, D. L., Harris, K. D., Hormuzdi, S. G., Monyer, H., & Buzsaki, G. (2003). Selective impairment of hippocampal gamma oscillations in connexin-36 knock-out mouse in vivo. J Neurosci, 23(3), 1013-1018. https://doi.org/10.1523/jneurosci.23-03-01013.2003

      Buzsaki, G., Anastassiou, C. A., & Koch, C. (2012). The origin of extracellular fields and currents--EEG, ECoG, LFP and spikes. Nat Rev Neurosci, 13(6), 407-420. https://doi.org/10.1038/nrn3241

      Buzsaki, G., & Wang, X. J. (2012). Mechanisms of gamma oscillations. Annual Review of Neuroscience, 35, 203-225. https://doi.org/10.1146/annurev-neuro-062111-150444

      Colgin, L. L. (2016). Rhythms of the hippocampal network. Nat Rev Neurosci, 17(4), 239-249. https://doi.org/10.1038/nrn.2016.21

      Colgin, L. L., & Moser, E. I. (2010). Gamma oscillations in the hippocampus. Physiology (Bethesda), 25(5), 319-329. https://doi.org/10.1152/physiol.00021.2010

      Coulon, P., & Landisman, C. E. (2017). The Potential Role of Gap Junctional Plasticity in the Regulation of State. Neuron, 93(6), 1275-1295. https://doi.org/10.1016/j.neuron.2017.02.041

      Das, A., Rathour, R. K., & Narayanan, R. (2017). Strings on a Violin: Location Dependence of Frequency Tuning in Active Dendrites. Front Cell Neurosci, 11, 72. https://doi.org/10.3389/fncel.2017.00072

      Draguhn, A., Traub, R. D., Schmitz, D., & Jefferys, J. G. (1998). Electrical coupling underlies high-frequency oscillations in the hippocampus in vitro. Nature, 394(6689), 189-192. https://doi.org/10.1038/28184

      Einevoll, G. T., Destexhe, A., Diesmann, M., Grun, S., Jirsa, V., de Kamps, M., Migliore, M., Ness, T. V., Plesser, H. E., & Schurmann, F. (2019). The Scientific Case for Brain Simulations. Neuron, 102(4), 735-744. https://doi.org/10.1016/j.neuron.2019.03.027

      Einevoll, G. T., Kayser, C., Logothetis, N. K., & Panzeri, S. (2013). Modelling and analysis of local field potentials for studying the function of cortical circuits. Nat Rev Neurosci, 14(11), 770-785. https://doi.org/10.1038/nrn3599

      Gold, C., Henze, D. A., Koch, C., & Buzsaki, G. (2006). On the origin of the extracellular action potential waveform: A modeling study. J Neurophysiol, 95(5), 3113-3128. https://doi.org/10.1152/jn.00979.2005

      Hagen, E., Dahmen, D., Stavrinou, M. L., Linden, H., Tetzlaff, T., van Albada, S. J., Grun, S., Diesmann, M., & Einevoll, G. T. (2016). Hybrid Scheme for Modeling Local Field Potentials from Point-Neuron Networks. Cereb Cortex, 26(12), 4461-4496. https://doi.org/10.1093/cercor/bhw237

      Halnes, G., Ness, T. V., Næss, S., Hagen, E., Pettersen, K. H., & Einevoll, G. T. (2024). Electric Brain Signals: Foundations and Applications of Biophysical Modeling. Cambridge University Press. https://doi.org/10.1017/9781009039826

      Hormuzdi, S. G., Pais, I., LeBeau, F. E., Towers, S. K., Rozov, A., Buhl, E. H., Whittington, M. A., & Monyer, H. (2001). Impaired electrical signaling disrupts gamma frequency oscillations in connexin 36-deficient mice. Neuron, 31(3), 487-495. https://doi.org/10.1016/s0896-6273(01)00387-7

      Hussaini, S. A., Kempadoo, K. A., Thuault, S. J., Siegelbaum, S. A., & Kandel, E. R. (2011). Increased size and stability of CA1 and CA3 place fields in HCN1 knockout mice. Neuron, 72(4), 643-653. https://doi.org/10.1016/j.neuron.2011.09.007

      Johnston, D., & Narayanan, R. (2008). Active dendrites: colorful wings of the mysterious butterflies. Trends Neurosci, 31(6), 309-316. https://doi.org/10.1016/j.tins.2008.03.004

      Kessi, M., Peng, J., Duan, H., He, H., Chen, B., Xiong, J., Wang, Y., Yang, L., Wang, G., Kiprotich, K., Bamgbade, O. A., He, F., & Yin, F. (2022). The Contribution of HCN Channelopathies in Different Epileptic Syndromes, Mechanisms, Modulators, and Potential Treatment Targets: A Systematic Review. Front Mol Neurosci, 15, 807202. https://doi.org/10.3389/fnmol.2022.807202

      Kole, M. H., Hallermann, S., & Stuart, G. J. (2006). Single Ih channels in pyramidal neuron dendrites: properties, distribution, and impact on action potential output. J Neurosci, 26(6), 1677-1687. https://doi.org/10.1523/JNEUROSCI.3664-05.2006

      Konopacki, J., Kowalczyk, T., & Golebiewski, H. (2004). Electrical coupling underlies theta oscillations recorded in hippocampal formation slices. Brain Res, 1019(1-2), 270-274. https://doi.org/10.1016/j.brainres.2004.05.083

      Larkum, M. E., Wu, J., Duverdin, S. A., & Gidon, A. (2022). The Guide to Dendritic Spikes of the Mammalian Cortex In Vitro and In Vivo. Neuroscience, 489, 15-33. https://doi.org/10.1016/j.neuroscience.2022.02.009

      LeBeau, F. E., Traub, R. D., Monyer, H., Whittington, M. A., & Buhl, E. H. (2003). The role of electrical signaling via gap junctions in the generation of fast network oscillations. Brain Res Bull, 62(1), 3-13. https://doi.org/10.1016/j.brainresbull.2003.07.004

      Lo, C. W. (1999). Genes, gene knockouts, and mutations in the analysis of gap junctions. Dev Genet, 24(1-2), 1-4. https://doi.org/10.1002/(SICI)1520-6408(1999)24:1/2%3C1::AID-DVG1%3E3.0.CO;2-U

      Lorincz, A., Notomi, T., Tamas, G., Shigemoto, R., & Nusser, Z. (2002). Polarized and compartment-dependent distribution of HCN1 in pyramidal cell dendrites. Nat Neurosci, 5(11), 1185-1193. https://doi.org/10.1038/nn962

      Magee, J. C. (1998). Dendritic hyperpolarization-activated currents modify the integrative properties of hippocampal CA1 pyramidal neurons. J Neurosci, 18(19), 7613-7624. https://doi.org/10.1523/jneurosci.18-19-07613.1998

      Magee, J. C., & Grienberger, C. (2020). Synaptic Plasticity Forms and Functions. Annual Review of Neuroscience, 43, 95-117. https://doi.org/10.1146/annurev-neuro-090919-022842

      Major, G., Larkum, M. E., & Schiller, J. (2013). Active properties of neocortical pyramidal neuron dendrites. Annual Review of Neuroscience, 36, 1-24. https://doi.org/10.1146/annurev-neuro-062111-150343

      Malik, R., Dougherty, K. A., Parikh, K., Byrne, C., & Johnston, D. (2016). Mapping the electrophysiological and morphological properties of CA1 pyramidal neurons along the longitudinal hippocampal axis. Hippocampus, 26(3), 341-361. https://doi.org/10.1002/hipo.22526

      Martinez-Canada, P., Ness, T. V., Einevoll, G. T., Fellin, T., & Panzeri, S. (2021). Computation of the electroencephalogram (EEG) from network models of point neurons. PLoS Comput Biol, 17(4), e1008893. https://doi.org/10.1371/journal.pcbi.1008893

      Mazzoni, A., Linden, H., Cuntz, H., Lansner, A., Panzeri, S., & Einevoll, G. T. (2015). Computing the Local Field Potential (LFP) from Integrate-and-Fire Network Models. PLoS Comput Biol, 11(12), e1004584. https://doi.org/10.1371/journal.pcbi.1004584

      Mishra, P., & Narayanan, R. (2021). Stable continual learning through structured multiscale plasticity manifolds. Curr Opin Neurobiol, 70, 51-63. https://doi.org/10.1016/j.conb.2021.07.009

      Mishra, P., & Narayanan, R. (2025). The enigmatic HCN channels: A cellular neurophysiology perspective. Proteins, 93(1), 72-92. https://doi.org/10.1002/prot.26643

      Moore, J. J., Ravassard, P. M., Ho, D., Acharya, L., Kees, A. L., Vuong, C., & Mehta, M. R. (2017). Dynamics of cortical dendritic membrane potential and spikes in freely behaving rats. Science, 355(6331). https://doi.org/10.1126/science.aaj1497

      Narayanan, R., Dougherty, K. J., & Johnston, D. (2010). Calcium Store Depletion Induces Persistent Perisomatic Increases in the Functional Density of h Channels in Hippocampal Pyramidal Neurons. Neuron, 68(5), 921-935. https://doi.org/10.1016/j.neuron.2010.11.033

      Narayanan, R., & Johnston, D. (2007). Long-term potentiation in rat hippocampal neurons is accompanied by spatially widespread changes in intrinsic oscillatory dynamics and excitability. Neuron, 56(6), 1061-1075. https://doi.org/10.1016/j.neuron.2007.10.033

      Narayanan, R., & Johnston, D. (2008). The h channel mediates location dependence and plasticity of intrinsic phase response in rat hippocampal neurons. J Neurosci, 28(22), 5846-5860. https://doi.org/10.1523/JNEUROSCI.0835-08.2008

      Ness, T. V., Remme, M. W. H., & Einevoll, G. T. (2016). Active subthreshold dendritic conductances shape the local field potential. Journal of Physiology, 594(13), 3809-3825. https://doi.org/10.1113/JP272022

      Ness, T. V., Remme, M. W. H., & Einevoll, G. T. (2018). h-Type Membrane Current Shapes the Local Field Potential from Populations of Pyramidal Neurons. J Neurosci, 38(26), 6011-6024. https://doi.org/10.1523/jneurosci.3278-17.2018

      Neves, G., Cooke, S. F., & Bliss, T. V. (2008). Synaptic plasticity, memory and the hippocampus: a neural network approach to causality. Nat Rev Neurosci, 9(1), 65-75. https://doi.org/10.1038/nrn2303

      Nolan, M. F., Malleret, G., Dudman, J. T., Buhl, D. L., Santoro, B., Gibbs, E., Vronskaya, S., Buzsaki, G., Siegelbaum, S. A., Kandel, E. R., & Morozov, A. (2004). A behavioral role for dendritic integration: HCN1 channels constrain spatial memory and plasticity at inputs to distal dendrites of CA1 pyramidal neurons. Cell, 119(5), 719-732. https://doi.org/10.1016/j.cell.2004.11.020

      O'Brien, J. (2014). The ever-changing electrical synapse. Curr Opin Neurobiol, 29, 64-72. https://doi.org/10.1016/j.conb.2014.05.011

      O'Keefe, J., & Recce, M. L. (1993). Phase relationship between hippocampal place units and the EEG theta rhythm. Hippocampus, 3(3), 317-330. https://doi.org/10.1002/hipo.450030307

      Pereda, A. E. (2014). Electrical synapses and their functional interactions with chemical synapses. Nat Rev Neurosci, 15(4), 250-263. https://doi.org/10.1038/nrn3708

      Posluszny, A. (2014). The contribution of electrical synapses to field potential oscillations in the hippocampal formation. Front Neural Circuits, 8, 32. https://doi.org/10.3389/fncir.2014.00032

      Reimann, M. W., Anastassiou, C. A., Perin, R., Hill, S. L., Markram, H., & Koch, C. (2013). A biophysically detailed model of neocortical local field potentials predicts the critical role of active membrane currents. Neuron, 79(2), 375-390. https://doi.org/10.1016/j.neuron.2013.05.023

      Rouach, N., Segal, M., Koulakoff, A., Giaume, C., & Avignone, E. (2003). Carbenoxolone blockade of neuronal network activity in culture is not mediated by an action on gap junctions. Journal of Physiology, 553(Pt 3), 729-745. https://doi.org/10.1113/jphysiol.2003.053439

      Roy, A., & Narayanan, R. (2021). Spatial information transfer in hippocampal place cells depends on trial-to-trial variability, symmetry of place-field firing, and biophysical heterogeneities. Neural Netw, 142, 636-660. https://doi.org/10.1016/j.neunet.2021.07.026

      Schomburg, E. W., Anastassiou, C. A., Buzsaki, G., & Koch, C. (2012). The spiking component of oscillatory extracellular potentials in the rat hippocampus. J Neurosci, 32(34), 11798-11811. https://doi.org/10.1523/JNEUROSCI.0656-12.2012

      Seenivasan, P., & Narayanan, R. (2020). Efficient phase coding in hippocampal place cells. Physical Review Research, 2(3), 033393. https://doi.org/10.1103/PhysRevResearch.2.033393

      Sinha, M., & Narayanan, R. (2015). HCN channels enhance spike phase coherence and regulate the phase of spikes and LFPs in the theta-frequency range. Proc Natl Acad Sci U S A, 112(17), E2207-2216. https://doi.org/10.1073/pnas.1419017112

      Sinha, M., & Narayanan, R. (2022). Active Dendrites and Local Field Potentials: Biophysical Mechanisms and Computational Explorations. Neuroscience, 489, 111-142. https://doi.org/10.1016/j.neuroscience.2021.08.035

      Sirmaur, R., & Narayanan, R. (2024). Distinct extracellular signatures of chemical and electrical synapses impinging on active dendrites differentially contribute to ripple-frequency oscillations. Society for Neuroscience annual meeting, Chicago, USA.

      Spruston, N., Schiller, Y., Stuart, G., & Sakmann, B. (1995). Activity-dependent action potential invasion and calcium influx into hippocampal CA1 dendrites. Science, 268(5208), 297-300. https://doi.org/10.1126/science.7716524

      Stuart, G. J., & Spruston, N. (2015). Dendritic integration: 60 years of progress. Nat Neurosci, 18(12), 1713-1721. https://doi.org/10.1038/nn.4157

      Szarka, G., Balogh, M., Tengolics, A. J., Ganczer, A., Volgyi, B., & Kovacs-Oller, T. (2021). The role of gap junctions in cell death and neuromodulation in the retina. Neural Regen Res, 16(10), 1911-1920. https://doi.org/10.4103/1673-5374.308069

      Traub, R. D., Cunningham, M. O., Gloveli, T., LeBeau, F. E., Bibbig, A., Buhl, E. H., & Whittington, M. A. (2003). GABA-enhanced collective behavior in neuronal axons underlies persistent gamma-frequency oscillations. Proc Natl Acad Sci U S A, 100(19), 11047-11052. https://doi.org/10.1073/pnas.1934854100

      Vaughn, M. J., & Haas, J. S. (2022). On the Diverse Functions of Electrical Synapses. Front Cell Neurosci, 16, 910015. https://doi.org/10.3389/fncel.2022.910015

      Wang, X. J. (2010). Neurophysiological and computational principles of cortical rhythms in cognition. Physiol Rev, 90(3), 1195-1268. https://doi.org/10.1152/physrev.00035.2008

      Wang, X. J., & Buzsaki, G. (1996). Gamma oscillation by synaptic inhibition in a hippocampal interneuronal network model. J Neurosci, 16(20), 6402-6413. https://doi.org/10.1523/jneurosci.16-20-06402.1996

      Whittington, M. A., Traub, R. D., & Jefferys, J. G. (1995). Synchronized oscillations in interneuron networks driven by metabotropic glutamate receptor activation. Nature, 373(6515), 612-615. https://doi.org/10.1038/373612a0

      Williams, S. R., & Stuart, G. J. (2000). Site independence of EPSP time course is mediated by dendritic I(h) in neocortical pyramidal neurons. J Neurophysiol, 83(5), 3177-3182. https://doi.org/10.1152/jn.2000.83.5.3177

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The authors Hall et al. establish a purification method for snake venom metalloproteinases (SVMPs). By generating a generic approach to purify this divergent class of recombinant proteins, they enhance the field's accessibility to larger quantities of SVMPs with confirmed activity and, for some, characterized kinetics. In some cases, the recombinant protein displayed comparable substrate specificity and substrate recognition compared to the native enzyme, providing convincing evidence of the authors' successful recombinant expression strategy. Beyond describing their route towards protein purification, they further provide evidence for self-activation upon Zn2+ incubation. They further provide insights on how to design high-throughput screening (HTS) methods for drug discovery and outline future perspectives for the in-depth characterization of these enzyme classes to enable the development of novel biomedical applications.

      Strengths:

      The study is well-presented and structured in a compelling way. The purification strategy results in highly pure protein products, well characterized by size exclusion chromatography and SDS-PAGE and confirmed by mass spectrometry analysis. Further, a significant portion of the manuscript focuses on enzyme activity, thereby validating function. Particularly convincing is the comparability between recombinant vs. native enzymes; this is successfully exemplified by insulin B digestion. By testing the fluorogenic substrate, the authors provide evidence that their production method of recombinant protein can open up possibilities in HTS. Since their purification method can be applied to three structurally variable SVMP classes, this demonstrates the robust nature of the approach.

      We thank the reviewer for their positive assessment of our work.

      Weaknesses:

      The universal applicability of the approach could be emphasized more clearly. The potential for this generic protocol for recombinant SVMP zymogen production to be adapted to other SVMPs is somewhat obscured by the detailed optimization steps. A general schematic overview would strengthen the manuscript, presented as a final model, to illustrate how this strategy can be extended to other targets with similar features. Such a schematic might, for example, outline the propeptide fusion design, including its tags, relevant optimizations during expression, lysis, purification (e.g., strategies for metal ion removal and maintenance of protease inactivity), as well as the controllable auto-activation.

      In the revised version of the manuscript, we moved the detailed description of the optimisation of SVMP expression, including mature SVMP expression, Marimastat addition, active site mutations and fusion of propeptides, into the supplement as supplementary text. We hope this improves the clarity and flow. As suggested, we now include a new figure outlining the SVMP production strategy and optimisation steps in the revised manuscript (new Figure S1).

      The product obtained from the purification protocol appears to be a heterogeneous mixture of self-activated and intact protein species. The protocol would benefit from improved control over the self-activation process. The Methods section does not indicate whether removal of residual metal ions was attempted during the purification, which could influence premature activation.

      We agree that improved control of self-activation would be desirable. However, there is an issue: previous studies reported that (1) SVMP zymogens are processed within secretory cells of the venom gland (Portes-Junior et al., 2014), and (2) mature SVMPs accumulate in secretory vesicles during venom production (Carneiro et al., 2002). Accordingly, preventing the auto-processing of SVMP zymogens is difficult to achieve because this would require Zn<sup>2+</sup> depletion within the insect cells during production, which would result in cytotoxicity. We have included this information in the updated Discussion section of the revised manuscript.

      Additionally, it has not been discussed whether the shift to pH 8 in the purification process is necessary from the initial steps onwards, given that a lower pH would be expected to maintain enzyme latency.

      The shift to pH 8 is required for the affinity purification of the SVMP zymogens from the medium, which relies on the poly-histidine tag and immobilized metal affinity chromatography (IMAC). At lower pH, the histidines would become protonated, preventing binding of the His-tag to the column. Thus, when using the His-tag, the shift to pH 7.5 or pH 8 is necessary.
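
      The pH argument above can be illustrated with the Henderson-Hasselbalch relation. The following is a minimal sketch only: it assumes a side-chain pKa of about 6.0 for histidine (a textbook value, not taken from the manuscript; the effective pKa within a His-tag may differ), and the function name `protonated_fraction` is hypothetical.

```python
def protonated_fraction(ph: float, pka: float = 6.0) -> float:
    """Fraction of a titratable group that is protonated at a given pH.

    Uses the Henderson-Hasselbalch relation. The default pKa of 6.0 is an
    assumed textbook value for the histidine imidazole side chain.
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


if __name__ == "__main__":
    # Compare an acidic condition with the pH values named in the response.
    for ph in (5.5, 6.0, 7.5, 8.0):
        print(f"pH {ph}: {protonated_fraction(ph):.3f} protonated")
```

      Under this assumption, about half of the histidines are protonated at pH 6, whereas only a few percent remain protonated at pH 7.5 to 8, which is consistent with the need to raise the pH before IMAC.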

      The characterization of PIII activity using the fluorogenic peptide effectively links the project to its broader implications for drug design. However, the absence of comparable solutions for PI and PII classes limits the overall scope and impact of the finding.

      We agree that such assays would be extremely useful. However, the development of fluorescence-based high-throughput assays to test for PI and PII SVMP activity is beyond the scope of this study. Here, our overarching objective is to report a broadly applicable production method for PI, PII and PIII SVMPs.

      Overall, the authors successfully purified active SVMP proteins of all three structurally diverse classes in high quality and provided convincing evidence throughout the manuscript to support their claims. The described method will be of use for a broader community working with self-activating and cytotoxic proteases.

      Thank you.

      Reviewer #2 (Public review):

      Summary:

      The aim of the study by Hall et al. was to establish a generic method for the production of Snake Venom Metalloproteases (SVMPs). These have been difficult to purify in the mg quantities required for mechanistic, biochemical, and structural studies.

      Strengths:

      The authors have successfully applied the MultiBac system and describe with a high level of detail the downstream purification methods applied to purify the SVMP PI, PII, and PIII. The paper carefully presents the non-successful approaches taken (such as expression of mature proteins, the use of protease inhibitors, prodomain segments, and co-expression of disulfide-isomerases) before establishing the construct and expression conditions required. The authors finally convincingly describe various activity assays to demonstrate the activity of the purified enzymes in a variety of established SVMP assays.

      We thank the reviewer for their positive assessment of our work.

      Weaknesses:

      The manuscript suffers from a lack of bottoming out and stringent scientific procedures in the methodology and the characterization of the generated enzymes.

      As an example, a further characterization of the generated protein fragments in Figure 3 by intact mass spectroscopy would have aided in accurate mass determination rather than relying on SEC elution volumes against a standard. Protein shape and charge can affect migration in SEC.

      We agree that intact MS would be useful to determine the mass of the produced SVMPs. In this manuscript, we performed SEC as a purification step, removing aggregates. Furthermore, SEC allowed determining if the SVMPs form monomers or dimers. MS characterisation of intact SVMPs (and their PTMs) is not trivial and beyond the scope of this manuscript (see below).

      Also, the analysis of N-linked glycosylation demonstrates some reactivity of PIII to PNGase F, but fails to conclude whether one or more sites are occupied, or whether other types of glycosylation are present. Again, intact mass experiments would have resolved such issues.

      We concur that glycosylation of SVMPs is an important question. However, analysing the glycosylation of the SVMPs is beyond the scope of this manuscript; it is a project in its own right. Intact MS can indeed provide information on glycosylation but is not very precise. Unambiguous assignment of the number and occupancy of glycosylation sites is more challenging, especially for large, glycosylated proteins such as our PIII SVMP zymogen. In practice, confident mapping of glycosylation sites would require peptide-level mass spectrometry following enzymatic digestion (ideally trypsin and Multi-Enzymatic Limited Digestion). Sample preparation, method optimization, MS acquisition, and data analysis together would require a significant investment. Moreover, we do not have access to the native PIII SVMP from Echis carinatus sochureki venom. This is the main point of our manuscript: we describe a protocol to produce SVMPs which could not be purified from venom. Therefore, a comparison of the glycosylation of the recombinant SVMP and the native SVMP unfortunately cannot be performed (see below).

      The activity assays in Figure 4 are not performed consistently with kinetic assays and degradation assays performed for some, but not all, enzymes, and there is no Echis ocellatus comparison in Figure 4h.

      This is correct. The suggested control experiment is not possible for the PII SVMP and PIII SVMP because we cannot purify the native PII and PIII SVMPs from Echis venom. We have highlighted this information in the revised manuscript in the insulin B degradation section.

      Overall, whilst not affecting the main conclusion, this leaves the reader with an impression of preliminary data being presented. For consistency, application of the same assays to all enzymes (high-grade purified) would have provided the reader with a fuller picture.

      In the revised manuscript, we included new data showing the requested characterisations of all three SVMPs.

      We have included the respective assays in Figure 5 and Supplementary Figure S11. In the original manuscript, we had omitted these assays as the data show no enzymatic activity in the respective assays. Specifically, we show that (1) PII does not cause insulin B degradation (Fig. S11b), (2) that the PI and PII SVMPs do not degrade the fluorogenic peptide which is prototypic for PIII SVMPs and MMPs (Fig. S11a), (3) PI and PIII do not cause platelet aggregation because they lack the entire disintegrin domain (PI) or the RGD motif (PIII) (Fig. 5a), and (4) that the PI and PII SVMPs, like the PIII SVMP, are not pro-coagulant and do not cause blood clotting (Fig. 5d,5e and Fig. S11c). We also included this new information in the main text of our revised manuscript.

      Overall, the data presented demonstrate a very credible path for the production of active SVMPs for further downstream characterization. The generality of the approach to all SVMPs from different snakes remains to be demonstrated by the community, but if generally applicable, the method will enable numerous studies with the aim of either utilizing SVMPs as therapeutic agents or enabling the generation of specific anti-venom reagents, such as antibodies or small molecule inhibitors.

      Thank you.

      Reviewer #3 (Public review):

      Summary:

      The presented study describes the long journey towards the expression of members of the SVMP toxin family from snake venom, which are toxins of major importance in a snakebite scenario. As their functional analysis has in the past relied on challenging isolation, the toxins' heterologous expression offers a potential solution to some major obstacles hindering a better understanding of toxin pathophysiology. Through a series of laborious and elegantly crafted experiments, including the reporting of various failed attempts, the authors establish the expression of all three SVMP subtypes and prove their activity in bioassays. The expression is carried out as naturally occurring zymogens that autocleave upon exposure to zinc, which is a novel modus operandi for yielding fusion proteins and also sheds some new light on the potential mechanism that snakes use to activate enzymatic toxins from zymogenic preforms.

      Strengths:

      The manuscript draws from an extensive portfolio of well-reasoned and hypothesis-driven experiments that lead to a stepwise solution. The wet-lab data generated is outstanding, although not all experiments along this rocky road to victory were successful. A major strength of the paper is that, translationally speaking, it opens up novel routes for biodiscovery since a first reliable platform for expression of an understudied, yet potent toxin class is established. The discovered strategy to pursue expression as zymogens could see broad application in venom biotechnology, where several toxin types are pending successful expression. The work further provides better insights into how snake toxins are processed.

      We thank the reviewer for their positive assessment of our work.

      Weaknesses:

      The manuscript contains several chapters reporting failed experiments, which makes it difficult to follow in places.

      Based on a similar comment from Reviewer 1, we have now moved the ‘failed’ experiments reporting on SVMP expression optimisation to the supplement as new supplementary text. We hope that the revisions have improved the clarity and overall readability of our manuscript.

      The reporting of experimental details, especially sample sizes and replicates, could be optimised.

      The number of replicates has now been added to the figure legends in the revised manuscript. Detailed experimental information is found in the revised Methods part.

      At the time of writing, it remains unclear whether the glycosylations detected on the PIII SVMP could have an impact on the bioactivities measured, which is a major aspect, and future follow-ups should clarify this.

      A detailed analysis of glycosylation of the PIII SVMP is beyond the scope of our manuscript (see above, response to Reviewer 2). Our manuscript describes a generic protocol to produce active SVMPs. Importantly, we cannot purify the native PIII SVMP from Echis carinatus sochureki venom. Therefore, it is not possible to compare our PIII SVMP with the native PIII SVMP.

      We agree that this is an important question, and we will aim in the future to perform such a comparison of a different insect cell-produced PIII with a native PIII SVMP that can be readily purified from venom.

      Finally, the work, albeit of critical importance, would benefit from a more down-to-earth evaluation of its findings, as various persistent obstacles still need to be overcome.

      We consider cytotoxicity to be the principal bottleneck in SVMP production. In this study, we present a strategy to overcome this bottleneck.

      Major comments to the manuscript:

      (1) Lines 148-149: "indicating that expressing inactivated SVMPs could be a viable, although inefficient, approach". I think this text serves a good purpose to express some thoughts on the nature of how the current draft is set up. It is quite established that various proteases cause extreme viability losses to their expression host (whether due to toxicity, but surely also because of metabolic burden), which is why their expression as inactive fusion proteins is the default strategy in all cases I have thus far seen. I believe that, especially in venom studies, this is of importance given the increased toxicity often targeting cellular integrity, and especially here, because Echis are known to feed on arthropods at younger life history stages, making it very likely that some venom components are especially active against insects and other invertebrates. With that in mind, I would argue that exploring their production in inactive form is the obvious strategy one would come up with and not really the conclusion of a series of (well-conducted and scientifically sound!) experiments. For me, the insight of inactive expression is largely confirmatory of what is established, unless I miss something in the authors' rationale. If yes, it would be important to clarify that in the online version.

      We agree that producing zymogens represents a straightforward strategy and, in hindsight, we wish we had tested this first; it would have saved us, and apparently many others, significant effort. However, realising this and implementing the approach took us considerable time and insight, as described in this manuscript. The alternative strategies we describe in the manuscript, in particular the use of inhibitors and active-site mutation, have been successfully applied for recombinant production of diverse enzymes before, including enzymes that are toxic to host cells.

      We have revised the manuscript as requested and moved the optimisation of SVMP expression to the Supplement. We hope this has improved the clarity and overall readability of the text and thus addresses the reviewer’s comment.

      (2) Line 173: Here, Alphafold 3 was used, whereas in previous sections (e.g., line 153, line 210), it was Alphafold 2. I suggest using one release across the manuscript.

      Thank you for bringing this to our attention. In the revised version of the manuscript, we clarified that all models were generated using AlphaFold 3.

      (3) Lines 252-254: I fully agree, the PIII SVMP is glycosylated. Glycosylation is an important mediator of snake venom activity, and several works have described its importance in the field. This raises the question of which glycosylations have been introduced here in the SVMP, and whether these belong to those found in snakes. This is important as insects facilitate thousands of N- and O-glycosylations to modulate the activity of their proteome, of which many are specific to insects. If some of these were integrated into the SVMP, this could have an impact on downstream bioassays and also antigenicity (the surface would be somewhat different from natural toxins, causing different selection).

      We agree that glycosylation is important and warrants a follow-up in the future.

      However, most publications we found reported that de-glycosylation has a negative effect on stability and solubility of SVMPs, which is expected to have a knock-on effect on toxin activity (e.g. Andrade-Silva et al., 2025; DOI: 10.1021/acs.jproteome.5c00249). It will be difficult to separate the two effects from each other. We found only a few examples where SVMP glycosylation (sialylation and N-glycosylation) modulated proteolytic and haemorrhagic functions, including interaction with substrates such as fibrinogen (Schluga et al., 2024; https://doi.org/10.3390/toxins16110486; Chen et al., 2008; DOI: 10.1111/j.1742-4658.2008.06540.x; Nikai et al., 2000; DOI: 10.1006/abbi.2000.1795. PMID: 10871038). In our manuscript, we show that our PIII SVMP is very cytotoxic and highly active in casein, fibrinogen and ESO10 degradation assays, with a K<sub>M</sub> and k<sub>cat</sub>/K<sub>M</sub> comparing favourably with other SVMPs and MMPs. We are not aware of a specific substrate for this particular PIII SVMP that depends on a distinct glycosylation pattern. Recombinant production of SVMPs with specific glycosylation pattern requirements would be a challenge in all commonly used expression systems (yeast, plant, insect cells and mammalian cells). In fact, insect cell expression systems could be advantageous in this respect because the Sf21 and High Five (Hi5) lepidopteran cell lines we utilised are well-characterized for their ability to perform post-translational modifications on complex secreted proteins:

      (1) N-Glycan conservation: Both Sf21 and Hi5 cells typically produce N-glycans that are trimmed to a core 'paucimannose' structure (Man3GlcNAc2), often with an alpha1,6-fucosylation. While snakes can produce more complex, sialylated N-glycans, glycomic studies of native venoms (e.g., Bothrops venom) have demonstrated that high-mannose and paucimannose structures are also prevalent in native SVMPs. Therefore, the recombinant glycoforms produced in our system are not 'unnatural' in the snake venom context but rather represent a subset of the native glycan microheterogeneity.

      (2) Occupancy vs structure: The critical function of glycosylation in PIII SVMPs is often thought to be structural, facilitating correct folding and protecting the large metalloprotease and disintegrin-like domains from proteolytic degradation. Because Sf21 and Hi5 cells recognize the same N-glycosylation sequon (Asn-X-Ser/Thr) as reptilian cells, the site-occupancy remains consistent with the native protein, preserving the overall topography of the toxin.

      (3) Activity and authentic self-processing: We acknowledge that insect-specific alpha1,3-fucosylation can occur in Hi5 cells and is potentially antigenic. As the recombinant SVMPs will be used for binder selections and for testing in silico designed binders, useful binders will be selected based on neutralising activity against venom toxins. Here, our assays focused on auto-activation and proteolytic activity, which is primarily driven by the catalytic Zn<sup>2+</sup>-site and the protein backbone.

      As stated above, analysis of the glycosylation pattern of the PIII SVMP is a project on its own and beyond the scope of this manuscript.

      We have incorporated some of the above information into the discussion section of the revised manuscript to clarify that insect cell glycosylation does not recapitulate the full diversity of SVMP glycosylation observed in native venoms.

      (4) General comment for the bioassays: It would be good to specify the replicates again and report the data, including standard deviations.

      We included this information in the figure legends.

      Discussion:

      I think the data generated in the study is very valuable and will be instrumental for pushing the frontiers in SVMP research, but still I would like to see a bit of modesty in their discussion. As I have pointed out above, it is unclear which effect the glycosylations may have (i.e., are the glycosylations found reminiscent of natural ones?), despite their being functionally important. Also, yes, isolation of SVMPs is challenging, but the reality is that their expression is equally challenging, as evidenced by the heaps of presented negative data (with which I have no problems, I think reporting such is actually important). So far, the "generic" protocol has been used to express one member per structural class of Echis SVMP, but no evidence is provided that it would work equally well on other members from taxonomically more distant snakes (e.g., the pIII known from Naja oxiana). It is very likely, but at the time of writing, purely speculative.

      We have expressed additional PIII SVMPs from Echis and Daboia species and will report their production and characterisation in due course.

      Lastly, the reality is also that the expression in insect cells can only be carried out by highly specialized labs (even in the expression world, as most laboratories work with bacterial or fungal hosts), whereas the isolation can be attempted in most venom labs. That said, production in insect cells also has economic repercussions as it will be very challenging to generate yields that are economically viable versus other systems, which is pivotal because the authors talk about bioprospecting and the toxins used in snakebite agent research.

      We thank the reviewer for this perspective on the practicalities of protein expression. However, we respectfully disagree with the characterization of insect cell expression as an inaccessible or economically non-viable platform for toxin research. We offer the following points:

      (1) Prevalence and accessibility: Contrary to the suggestion that insect cell expression is restricted to highly specialized labs, the Baculovirus Expression Vector System (BEVS) has become a cornerstone of modern biologics production, structural biology and biochemistry. For instance, our MultiBac system (which is but one of several systems currently widely in use) is utilised by over 1,000 laboratories and institutions, academic and pharma/biotech, worldwide. The maturation of commercially available kits, automated platforms, and standardized protocols has moved this technology into the mainstream, making it a standard tool for any lab requiring high-quality eukaryotic proteins.

      (2) Biological necessity: Bacterial (E. coli) and fungal (P. pastoris) systems are widely accessible; however, they appear to be fundamentally incapable of producing functional SVMPs. SVMPs require complex disulfide-bond formation, intricate folding, and N-glycosylation for stability and solubility. Bacterial systems have been widely tried by us and others but typically result in very low expression or misfolded inclusion bodies. Of note, we had originally invested significant effort to adapt P. pastoris to the production of the eukaryotic proteins we are interested in, without success, before moving on to the MultiBac system. The SVMPs that we analysed here are highly cytotoxic, which makes the baculovirus/insect cell system a logical choice, given that the cells are no longer 'living' after infection with the baculovirus (but are more akin to membrane-enveloped bioreactors). Thus, one can make the argument that insect cells represent the most accessible middle ground, providing the folding apparatus and post-translational modifications (PTMs) required for biological relevance, and it is possible to produce mg amounts of SVMP proteins per litre of cell culture, as reported here in our manuscript.

      (3) Economic viability and bioprospecting: Regarding the economic argument, we contend that viability in bioprospecting is defined by functional yield rather than simple volume. Producing large quantities of non-functional or misfolded protein in a cheaper system is economically inefficient. Furthermore, for snakebite research, the ability to produce specific, pure isoforms recombinantly without the contamination of other toxic venom components found in native isolations is essential for high-throughput screening and drug design.

      (4) Scalability: Historically, insect cell production was seen as expensive, but current bioreactor technology and reduction in consumables and media costs allow for significant scaling. Many therapeutic reagents (vaccines, viral vectors, protein biologics) are produced routinely in baculovirus/insect cells. For the purposes of bioprospecting and lead identification, the yields provided by our Hi5/Sf21 system are sufficient for rigorous downstream bioassays and structural characterization.

      Again, I believe the paper is highly important and excellently crafted, but I think especially the discussion should see some refinement to address the drawbacks and to evaluate the paper's findings with more modesty.

      Thank you. We included the discussion about glycosylation patterns.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) It is not entirely clear to me if the final constructs are indeed "fusion-proteins" (line 172, 974), in the sense of chimeric proteins. From the current description, it appears that the prodomain is encoded in the same gene rather than fused as a separate domain. Thus, referring to these constructs as fusion proteins may overstate the degree of protein engineering involved in the study.

      This is correct. In the revised manuscript, ‘fusion protein’ is only used in the context of the propeptide SVMP fusion construct to avoid confusion.

      (2) Figure 2J: It is difficult to assess how much protein is secreted relative to the intracellular amounts. The blot is surely misleading, as the effective protein dilution differs substantially between intracellularly vs. extracellularly. Providing an estimate of the relative dilution of extracellular protein would help clarify the extent of secretion.

      We estimate that the SNP and SN fractions are at least 10-times more concentrated than the media fraction. The blot is analytical and not quantitative.

      (3) The manuscript appears to use both AlphaFold 2 and AlphaFold 3 for structural predictions. Clarification on the choice of the version and its impact on results would improve consistency.

      In the revised version of the manuscript, we clarify that all structural models were generated using AlphaFold 3.

      (4) Figure S3b and others: a clear description of the antibodies used in the Western blots would be appreciated (including in the methods).

      We included this information in the figure legends and a paragraph in the methods section for Western blots in the revised manuscript.

      (5) MTT cytotoxicity testing would be more convincing if done in a concentration-dependent manner.

      We repeated this assay using different concentrations of SVMPs and show the results as a new Figure 5f in the revised manuscript.

      (6) Figure S3c: It could be interesting to show the sequence coverage to get an impression of what part of the protein is there.

      We have included this information as Supplementary Figure S4d in the revised manuscript.

      Reviewer #2 (Recommendations for the authors):

      Overall, the study is presented in a step-by-step manner, and its conclusions are valid.

      (1) As suggested in the public review, further characterization of the purified material would be good, for example, by intact mass-spectroscopy to characterize the enzymes in further detail.

      Preliminary MALDI-MS analysis (performed in Loic Quinton’s laboratory) of our PIII SVMP revealed a broad and heterogeneous mass distribution, consistent with heterogeneity caused by the presence of multiple glycoforms (which is not unlike the microheterogeneity in native snake venom). However, owing to the inherent limitations of MALDI-MS for the analysis of glycoproteins, our data do not allow determination of the number of occupied N-glycosylation sites or the identification of additional types of glycosylation.

      Moreover, the relatively large molecular mass of these proteins (zymogen 70.2 kDa protein only, mature PIII 50.6 kDa protein only) makes analysis by electrospray ionisation mass spectrometry technically challenging.

      An MS-based deep analysis of the glycosylation patterns would therefore be a project on its own, and beyond the scope of the present manuscript.

      (2) The studies involving PII appear challenging due to low yields and stability of the enzyme and the mentioned self-degradation. Some studies, such as the casein-degradation, would benefit from working with a well-characterized batch of enzymes to ensure that it is not auto-degrading during the experiment.

      We believe that the finding that the PII SVMP degrades itself after incubation with Zn<sup>2+</sup> is an important observation. It is novel to the best of our knowledge. Moreover, the key message of our manuscript is that we can produce and characterise novel SVMPs that cannot be readily purified from venom (and thus are not well characterised).

      Besides, there are very few intact PII SVMPs in venom (e.g. Suntravat et al. BMC Molecular Biol 2016); the vast majority cleaves itself into a PI and a disintegrin.

      (3) Figure 4h. Degradation of insulin is only shown for recombinant PIII, not the native enzyme, and therefore doesn't convey any information with respect to how well they compare.

      We do not have any native PII or PIII SVMPs available for a comparison with the recombinant SVMPs (in our manuscript we show expression of new, uncharacterised SVMPs). We included the PIII SVMP in the original manuscript to show that the enzyme is active and has a different specificity compared to the PI SVMP. In the revised manuscript, we also included the PII SVMP insulin B degradation assay in Supplementary Figure S11b.

      (4) Figure 5a. Inconsistent use of enzymes - data for PII is presented (both as mature protein and Zymogen) and compared to PIII, but not PI, as both zymogen and mature protein. The current data presentation is confusing and gives the idea of the manuscript assembled with figures produced during the exploratory phase of the study, and not from subsequent experiments systematically conducted for the purposes of clarity and completeness.

      In the revised manuscript, we included the missing enzymatic characterisations in Figure 5 (panel a and e) and Supplementary Figure S11a-c. These data were initially not included because the respective enzymes are inactive in these assays.

      (5) The manuscript would benefit from editing to make it more concise. For an early-career reader, it is of interest and utility to follow the thought and experimental processes that led to the successful solution, but there is a risk of losing the reader's interest along the way by going through expression experiments that did not "work" in the typical sense of the word. To this reviewer, there is no added value in a full paragraph around co-expression with disulfide isomerase, as it did not improve the protein yield. A single sentence, "co-expression with PDI did not improve yields," with a reference to a supplemental figure would convey that message.

      We have moved the optimisation of SVMP expression to the Supplementary Information, which we hope has improved the clarity and flow of the main text.

      We note that the hypothesis that co-expression of protein disulfide isomerases (PDIs) enhances yields of functional SVMPs, given the high expression of PDIs in snake venom gland cells, is well established in the field. While we consider PDIs (and other chaperones) likely to play an important role in SVMP expression, we were unable to demonstrate this effect using the baculovirus-insect cell expression system and hypothesize that efficient insect and/or baculoviral PDIs are already present.

      (6) Similarly with N-linked glycosylation, the section needs a headline (line 241) and firming up of a sentence like "and possibly not all of the glycosylation..." which is vague and appears to state that it was not really of interest to pursue this further. My view is that either an experiment is done properly with a stated aim and purpose, interpreted, and then, based on whether the results are of interest to the main story or not, they are included. If N-linked glycosylation is to be included in the manuscript, it should be with a purpose (e.g., N-linked glycosylation affects enzyme activity). As it stands, the message is "there is some N-linked glycosylation" without further explanation, and this generates information without justifying the inclusion hereof.

      Please see our reply above regarding an in-depth characterisation of insect cell glycosylation of the recombinant PIII SVMP without access to the native enzyme for comparison. In our revised manuscript, we confirm that the PIII SVMP is glycosylated and that this at least partly accounts for the apparent discrepancy in molecular weight observed in SEC and SDS PAGE. We have modified the text to clarify the purpose of the PNGase deglycosylation experiment.

      (7) The manuscript, in its current form, appears to have been copied from a Thesis with very detailed step-by-step logic and description. While this is useful in a scholarly context, a scientific manuscript should be presented more compactly, assuming the readers know basic biochemistry.

      We trust that this Reviewer finds the revised version of our manuscript more compact and concise. 

      Reviewer #3 (Recommendations for the authors):

      (1) Material and Methods plus Figures:

      Please report the number of replicates per experiment and how data is presented (means/ medians/ standard deviation/ others), and add error bars to the plots where needed.

      In the revised manuscript we have included the number of repeats in the figure legends.

      (2) Abstract

      Line 4: I would not say that SVMPs are the most potent viper toxins. This place is probably taken by some of the highly neurotoxic PLA2, such as Crotoxin. Nevertheless, SVMPs are surely some of the most important toxins responsible for pathophysiological effects stemming from viper envenoming, but I would suggest rephrasing for accuracy.

      In the revised manuscript, we have modified this sentence.

      (3) Introduction

      Lines 27-31: I would like to see a reference supporting the existence of all SVMP types across vipers.

      We have included references supporting the existence of PI, PII and PIII SVMPs in viper venom. We also rewrote the sentence to state that “representatives of all three sub-classes are present in different viper venoms.” This clarifies that we do not say that all classes are present in all venoms.

      Lines 59-60: I am not sure if this should be considered such an important impediment. Essentially, many vipers yield double- to triple-digit mg amounts of crude venom per specimen from only a single milking.

      We have rewritten this text in the revised manuscript.

      Currently, it is not possible to purify any given SVMP of interest from venom; for E. ocellatus in particular, SVMP isoform mixtures are typically purified rather than individual enzymes (see also the introduction section of our manuscript, line 57ff). Also, many SVMPs are not present in sufficient amounts in the venom. Here, we provide an approach to recombinantly produce any SVMP of interest, independent of its abundance in the venom.

      (4) Results

      Line 102: The fall armyworm's genus name is Spodoptera, not Spotoptera. Please correct the typo.

      Done. Apologies for our oversight.

      Line 311: Please provide the data at least as a supplement.

      In the revised manuscript, we have included this experiment in Supplementary Figure S6c.

      Lines 432-433: It would be useful to clarify whether the protein should have a pro-coagulant activity (or not).

      We have changed this sentence as follows in the revised manuscript: This shows that our recombinantly produced SVMPs have no pro-coagulant activity, which was unknown before.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) The pathogenic mechanism of the E182STOP variant is unclear. The mutant protein does not appear to affect WT protein localization, arguing against a dominant-negative effect. Yet, overexpression of HSD17B7-E182* alone causes toxicity in zebrafish and mislocalizes cholesterol in HEI-OC1 cells, suggesting a gain-of-function or toxic effect. In addition, the variant mRNA is expressed at a low level, consistent with nonsense-mediated decay. This apparent complexity and inconsistency need clearer explanation.

      We appreciate the reviewer’s careful evaluation of this mechanistic complexity. Based on our combined molecular, cellular, and in vivo data, we propose that the pathogenic effect of the HSD17B7-E182* variant reflects a composite mechanism, rather than a classical dominant-negative effect.

      At the transcript level, the E182* variant introduces a premature termination codon and shows markedly reduced mRNA abundance, consistent with partial degradation by nonsense-mediated mRNA decay. This reduction is expected to decrease overall HSD17B7 dosage, contributing a loss-of-function component. Unlike wild-type HSD17B7, the truncated HSD17B7<sup>E182*</sup> mislocalizes cholesterol in HEI-OC1 cells, and its overexpression alone reduces hair cell MET function and the startle response in zebrafish embryos. We therefore propose that the truncated protein disturbs local cholesterol homeostasis, thereby exerting a toxic or ectopic gain-of-function effect.

      We have revised the manuscript to clarify the dual-mechanism model.

      (2) The link to human deafness is based on a single heterozygous patient with no syndromic features. Given that nearly all known cholesterol metabolism disorders are syndromic, this raises concerns about causality or specificity. The term "novel deafness gene" is premature without additional cases or segregation data.

      We thank the reviewer for this important point. We fully agree that, based on a single heterozygous case without segregation data, it is premature to designate HSD17B7 as a novel deafness gene. Therefore, we have revised the manuscript to describe HSD17B7 as a "candidate deafness gene".

      (3) The localization of HSD17B7 should be clarified better: In HEI-OC1 cells, HSD17B7 localizes to the ER, as expected. In mouse hair cells, the staining pattern is cytosolic and almost perfectly overlaps with the hair cell marker used, Myo7a. This needs to be discussed. Without KO tissue, HSD17B7 antibody specificity remains uncertain.

      We thank the reviewer for the constructive comments regarding HSD17B7 localization and antibody specificity.

      Regarding subcellular localization, the original Figure 1K was intended to demonstrate the expression of HSD17B7 in mouse hair cells. To address this concern, we performed additional immunostaining on dissected organ of Corti sections at P1, P4, and P7 using higher magnification. Using parvalbumin as a hair cell marker, HSD17B7 displayed a partially punctate intracellular pattern in hair cells (revised Figure 1K). This pattern is consistent with localization to membrane-associated compartments, including the endoplasmic reticulum, and agrees with the ER-associated localization observed in HEI-OC1 cells and zebrafish hair cells. In mature hair cells, ER-associated signals may appear cytosolic and overlap with general hair cell markers such as Myo7a.

      Regarding antibody specificity, although HSD17B7 knockout tissue was not available, we performed complementary validation experiments in HEI-OC1 cells. Cells were transfected with pCMV-Flag, pCMV-Flag-hHSD17B7WT, or pCMV-hHSD17B7WT-EGFP constructs and stained with anti-Flag, anti-EGFP, and anti-HSD17B7 antibodies. The HSD17B7 antibody signal showed strong co-localization with both FLAG- and EGFP-tagged HSD17B7 (revised Figure S1A and B), supporting its specificity.

      Reviewer #2 (Public review):

      (1) The statement that HSD17B7 is "highly" expressed in sensory hair cells in mice and zebrafish seems incorrect for zebrafish:

      (a) The data do not support the notion that HSD17B7 is "highly expressed" in zebrafish. Compared to other genes (TMC1, TMIE, and others), the HSD17B7 level of expression in neuromast hair cells is low (Figure 1F), and by extension (Figure 1C), also in all hair cells. This interpretation is in line with the weak detection of an mRNA signal by ISH (Figure 1G I"). On this note, the staining reported in I" does not seem to label the cytoplasm of neuromast hair cells. An antisense probe control, along with a positive control (such as TMC1 or another), is necessary to interpret the ISH signal in the neuromast.

      We thank the reviewer for this detailed evaluation and agree that the description of HSD17B7 expression in zebrafish hair cells requires clarification.

      To address this, we performed a quantitative comparison of average expression levels within neuromast hair cells using log-normalized single-cell RNA-seq data. This analysis shows that hsd17b7 is expressed at a level comparable to several known MET-associated genes (e.g., tmc1 and lhfpl5a) (revised Figure 1D). Regarding the pseudotime heatmap (Figure 1F), we now state that this analysis illustrates temporal expression dynamics within neuromast hair cell development.

      In addition, we have clarified the interpretation of the whole-mount in situ hybridization data by emphasizing that the signal indicates spatial enrichment rather than high transcript abundance.

      We have updated the figure panels, legends, and corresponding text in the Results section to reflect these changes.

      (b) However, this is correct for mouse cochlear hair cells, based on single-cell RNA-seq published databases and immunostaining performed in the study. However, the specificity of the anti-HSD17B7 antibody used in the study (in immunostaining and western blot) is not demonstrated. Additionally, it stains some supporting cells or nerve terminals. Was that expression expected?

      To assess antibody specificity, we performed validation experiments using distinct epitopes. In HEI-OC1 cells transfected with pCMV-Flag-HSD17B7 or pCMV-HSD17B7-EGFP constructs, immunostaining with anti-HSD17B7 showed strong co-localization with both the FLAG and EGFP tags (revised Figure S1B). In addition, western blot analyses using the same constructs confirmed the specific detection of HSD17B7 protein (revised Figure S1B). These validation data have now been included as supplementary figures in the revised manuscript and provide independent supporting evidence for the specificity of the anti-HSD17B7 antibody.

      (2) A previous report showed that HSD17B7 is expressed in mouse vestibular hair cells by single-cell RNAseq and immunostaining in mice, but it is not cited: Spatiotemporal dynamics of inner ear sensory and non-sensory cells revealed by single-cell transcriptomics. Jan TA, Eltawil Y, Ling AH, Chen L, Ellwanger DC, Heller S, Cheng AG. Cell Rep. 2021 Jul 13;36(2):109358. doi: 10.1016/j.celrep.2021.109358.

      We have now cited this reference in the revised manuscript.

      (3) Overexpressed HSD17B7-EGFP C-terminal fusion in zebrafish hair cells shows a punctiform signal in the soma but apparently does not stain the hair bundles. One limitation is the consequence of the C-terminal EGFP fusion to HSD17B7 on its function, which is not discussed.

      We thank the reviewer for raising this important technical point. The apparent absence of an HSD17B7-EGFP signal in hair bundles is primarily due to the imaging strategy and the selection of representative images. In zebrafish hair cells, the EGFP signal within hair bundles is extremely strong. To better visualize the intracellular distribution of HSD17B7 within the hair cell soma, we selected representative confocal optical sections that were focused on the cell body rather than on the apical hair bundle plane. As a result, the hair bundle signal is not visible in the images shown.

      Importantly, we agree that C-terminal EGFP fusion may potentially influence protein localization or function. We have therefore revised the Discussion to discuss this limitation and to clarify that our central conclusions regarding HSD17B7 function are primarily supported by loss-of-function analyses, rescue experiments using untagged mRNA, and cholesterol perturbation phenotypes, rather than relying solely on EGFP-tagged overexpression constructs.

      (4) A mutant Zebrafish CRISPR was generated, leading to a truncation after the first 96 aa out of the 340 aa total. It is unclear why the gene editing was not done closer to the ATG. This allele may conserve some function, which is not discussed.

      Targeting regions close to the ATG is indeed a commonly used strategy for CRISPR-mediated gene disruption. In this study, sgRNA selection was guided by online CRISPR design tools (CRISPRscan), prioritizing predicted cutting efficiency and specificity. This strategy resulted in a frameshift mutation introducing a premature stop codon after amino acid 96 of the 340-aa Hsd17b7 protein.

      Importantly, this truncation removes most of the conserved catalytic core required for 17β-hydroxysteroid dehydrogenase activity, including key motifs involved in NAD(P)-binding and substrate recognition. Therefore, although the mutation does not occur immediately adjacent to the ATG, the resulting allele is predicted to lack enzymatic function. We have clarified this rationale and discussed the functional consequences of the truncation in the revised manuscript.

      (5) The hsd17b7 mutant allele has a slightly reduced number of genetically labeled hair cells (quantified as a 16% reduction, estimated at 1-2 HC of the 9 HC present per neuromast). On a note, it is unclear what criteria were used to select HC in the picture. Some Brn3C:mGFP positive cells are apparently not included in the quantifications (Figure 2F, Figure 5A).

      Upon re-evaluation, we recognized that the original figure annotations were not sufficiently clear and may have led to confusion regarding hair cell selection. In the original images, the absence of dashed outlines around some Brn3c:mGFP<sup>+</sup> cells may have been misinterpreted as their exclusion from analysis. To address this issue, we have revised Figures 2F and 5A by updating the annotations to ensure that all Brn3c:mGFP<sup>+</sup> hair cells within each neuromast are clearly visible and unambiguously included (revised Figures 2F and 6A). Corresponding figure legends have also been revised to clarify the criteria used for hair cell identification and quantification.

      (6) The authors used FM4-64 staining to evaluate the hair cell mechanotransduction activity indirectly. They found a 40% reduction in labeling intensity in the HCs of the lateral line neuromast. Because the reduction of hair cell number (16%) is inferior to the reduction of FM4-64 staining, the authors argue that it indicates that the defect is primarily affecting the mechanotransduction function rather than the number of HCs. This argument is insufficient. Indeed, a scenario could be that some HC cells died and have been eliminated, while others are also engaged in this path and no longer perform the MET function. The numbers would then match. If single-cell staining can be resolved, one could determine the FM4-64 intensity per cell. It would also be informative to evaluate the potential occurrence of cell death in this mutant. On another note, the current quantification of the FM4-64 fluorescence intensity and its normalization are not described in the methods. More importantly, an independent and more direct experimental assay is needed to confirm this point. For example, using a GCaMP6-T2A-RFP allele for Ca2+ imaging and signal normalization. 

      We have revised the FM4-64 quantification strategy. Instead of measuring fluorescence intensity at the neuromast level, FM4-64 uptake was re-quantified at the single hair cell level. Hair cells within each neuromast were identified based on mGFP labeling, and the mean FM4-64 fluorescence intensity was measured for each individual hair cell. The average FM4-64 intensity per hair cell was then calculated for each neuromast and used for group comparisons (revised Figures 2F, 6B, and 8B, Figure S5B). The updated quantification method, normalization procedure, and analysis pipeline have now been described in the revised Methods section.
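      The two-step averaging described above (mean intensity per hair cell, then the average of those values per neuromast) can be sketched as follows. This is only an illustrative sketch, not the authors' analysis pipeline: the function name, the neuromast labels, and the intensity values are hypothetical, and the segmentation and normalization steps from the revised Methods are not reproduced here.

```python
from collections import defaultdict
from statistics import mean


def mean_intensity_per_neuromast(cells):
    """Average per-cell FM4-64 intensities within each neuromast.

    `cells` is an iterable of (neuromast_id, cell_mean_intensity) pairs,
    where each pair is one mGFP-identified hair cell (hypothetical input format).
    """
    by_neuromast = defaultdict(list)
    for neuromast_id, intensity in cells:
        by_neuromast[neuromast_id].append(intensity)
    # One value per neuromast: the mean over its individual hair cells.
    return {nid: mean(values) for nid, values in by_neuromast.items()}


# Made-up example values (arbitrary fluorescence units).
measurements = [
    ("nm1", 100.0), ("nm1", 120.0), ("nm1", 80.0),
    ("nm2", 60.0), ("nm2", 40.0),
]
per_neuromast = mean_intensity_per_neuromast(measurements)
```

      Because each neuromast contributes a single averaged value, a neuromast with fewer surviving hair cells is not automatically assigned a lower intensity, which is the confound the reviewer raised.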

      As supportive evidence, we further analyzed single-cell RNA-seq data from control and hsd17b7 mutant hair cells (revised Figure 3). This analysis revealed dysregulation of multiple genes involved in the MET machinery, including reduced expression of tip-link–associated components and altered expression of other MET-related genes. While these transcriptional changes do not constitute a direct functional assay, they are consistent with perturbation of MET-associated pathways and complement the FM4-64 findings.

      (7) The authors used an acoustic startle response to elicit a behavioral response from the larvae and evaluate the "auditory response". They found a significant decrease in the response (movement trajectory, swimming velocity, distance) in the hsd17b7 mutant. The authors conclude that this gene is crucial for the "auditory function in zebrafish".

      This is an overstatement:

      (a) First, this test is adequate as a screening tool to identify animals that have lost completely the behavioral response to this acoustic and vibrational stimulation, which also involves a motor response. However, additional tests are required to confirm an auditory origin of the defect, such as Auditory Evoked Potential recordings, or for the vestibular function, the Vestibulo-Ocular Reflex. 

      We thank the reviewer for highlighting the limitations in interpreting the acoustic startle assay. We have revised the manuscript to avoid overstatement and now describe the observed phenotype as a reduction in the behavioral response to acoustic and vibrational stimulation, rather than concluding a specific impairment of auditory function.

      (b) Secondly, the behavioral defects observed in the mutant compared to the control are significantly different, but the differences are slight, contained within the Standard Deviation (20% for velocity, 25% for distance). To this point, the Figure 2 B and C plots are misleading because their y-axes do not start at 0.

      We have corrected Figures 2B and 2C so that the y-axes start at zero, thereby providing a more transparent visualization of the behavioral differences. The figure legends have also been revised to clarify the presentation of the data.

      (8) Overexpression of HSD17B7 in cell line HEI-OC1 apparently "significantly increases" the intensity of cholesterol-related signal using a genetically encoded fluorescent sensor (D4H-mCherry). However, the description of this quantification (per cell or per surface area) and the normalization of the fluorescent signal are not provided. 

      The quantification of the D4H-mCherry signal in HEI-OC1 cells was performed at the single-cell level. Specifically, individual cells were segmented based on morphology, and the mean fluorescence intensity of D4H-mCherry per cell was measured. To account for variability in cell size and imaging conditions, fluorescence intensity was normalized to the background signal measured from cell-free regions in the same field of view. We have now clarified the quantification strategy and normalization procedure in the revised Methods and Results sections.
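      As a rough illustration of the per-cell normalization described above, the sketch below divides the mean D4H-mCherry intensity of one segmented cell by the mean of a cell-free background region in the same field. The ratio form is an assumption made for illustration (the response does not state whether background was divided out or subtracted), and the function name and pixel values are hypothetical.

```python
def normalized_cell_intensity(cell_pixels, background_pixels):
    """Return mean cell intensity relative to mean background intensity.

    Assumes a ratio-based normalization; the actual procedure is the one
    given in the revised Methods, which may differ.
    """
    if not cell_pixels or not background_pixels:
        raise ValueError("both pixel lists must be non-empty")
    cell_mean = sum(cell_pixels) / len(cell_pixels)
    background_mean = sum(background_pixels) / len(background_pixels)
    if background_mean <= 0:
        raise ValueError("background mean must be positive")
    return cell_mean / background_mean


# Made-up pixel values for one cell and one cell-free region.
ratio = normalized_cell_intensity([200, 220, 180], [10, 10, 10])
```

      Normalizing within each field of view makes values from images acquired under slightly different conditions more comparable, at the cost of sensitivity to how the background region is chosen.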

      (9) When this experiment is conducted in vivo in zebrafish, a reduction in the "DH4 relative intensity" is detected (same issue with the absence of a detailed method description). However, as the difference is smaller than the standard deviation, this raises questions about the biological relevance of this result.

      We have now clarified the quantification strategy and normalization procedure in the revised Methods and Results sections.

      (10) The authors identified a deaf child as a carrier of a nonsense mutation in HSD17B7, which is predicted to terminate the HSD17B7 protein before the transmembrane domain. However, as no genetic linkage is possible, the causality is not demonstrated.

      We thank the reviewer for raising this important point. Unfortunately, we were unable to obtain the parents' genetic testing data to perform formal genetic and linkage analysis. To address this limitation, we have revised the manuscript to avoid causal overstatement and now describe the HSD17B7 E182* variant as a candidate pathogenic variant associated with hearing loss. Importantly, our functional analyses in zebrafish and cell-based systems demonstrate that the E182* truncation abolishes key biological activities of HSD17B7, including subcellular localization, cholesterol regulation, mechanotransduction-related activity, and behavioral responses. These convergent functional data provide biological support for the potential pathogenic relevance of this variant.

      (11) Previous results obtained from mouse HSD17B7-KO (citation below) are not described in sufficient detail. This is critical because, in this paper, the mouse loss-of-function of HSD17B7 is embryonically lethal, whereas no apparent phenotype was reported in heterozygotes, which are viable and fertile. Therefore, it seems unlikely that heterozygous mice exhibit hearing loss or vestibular defects; however, it would be essential to verify this to support the notion that the truncated allele found in one patient is causal.

      Hydroxysteroid (17beta) dehydrogenase 7 activity is essential for fetal de novo cholesterol synthesis and for neuroectodermal survival and cardiovascular differentiation in early mouse embryos. Jokela H, Rantakari P, Lamminen T, Strauss L, Ola R, Mutka AL, Gylling H, Miettinen T, Pakarinen P, Sainio K, Poutanen M. Endocrinology. 2010 Apr;151(4):1884-92. doi: 10.1210/en.2009-0928. Epub 2010 Feb 25.

      We thank the reviewer for raising this important point. We acknowledge that previous work has shown that complete loss of Hsd17b7 in mice is embryonically lethal, whereas heterozygous animals are viable and fertile (Jokela et al., 2010). Notably, this study primarily focused on embryonic development, cholesterol metabolism, and cardiovascular and neuroectodermal survival, and auditory or vestibular functions were not specifically examined. Therefore, subtle or sensory organ–specific phenotypes in heterozygous mice cannot be excluded.

      The human variant identified in this study (E182*) is a nonsense mutation predicted to truncate the HSD17B7 protein prior to the transmembrane and cytoplasmic domains. We therefore present it as a candidate loss-of-function variant, providing supportive human genetic evidence that is consistent with our functional analyses in zebrafish hair cells, rather than as definitive proof of causality. We have revised the manuscript to clarify these points and to acknowledge this limitation.

      (12) The authors used this truncated protein in their startle response and FM4-64 assays. First, they show that contrary to the WT version, this truncated form cannot rescue their phenotypes when overexpressed. Secondly, they tested whether this truncated protein could recapitulate the startle reflex and FM4-64 phenotypes of the mutant allele. At the homozygous level (not mentioned by the way), it can apparently do so to a lesser degree than the previous mutant. Again, the differences are within the Standard Deviation of the averages. The authors conclude that this mutation found in humans has a "negative effect" on hearing, which is again not supported by the data. 

      We thank the reviewer for this important comment. We agree that the overexpression strategy employed in this study does not fully replicate the endogenous heterozygous state observed in patients, and that the magnitude of the observed effects varies across samples. Accordingly, our experiments were not intended to demonstrate a definitive causal role of the HSD17B7 <sup>E182*</sup> variant in hearing loss.

      Instead, the overexpression assays were designed to assess whether the truncated HSD17B7 protein displays abnormal cellular properties and whether its presence can interfere with processes relevant to hair cell function. Under these conditions, HSD17B7<sup>E182*</sup> exhibited aberrant subcellular localization, altered intracellular cholesterol distribution, and was associated with reduced FM4-64 uptake and changes in startle-associated behaviors, whereas the wild-type protein did not.

      We revised the manuscript to moderate our conclusions. Rather than claiming that the E182* mutation has a definitive “negative effect on auditory function,” we now describe it as a functionally compromised allele that disrupts cholesterol distribution and MET-related activity under overexpression conditions, providing mechanistic support consistent with our zebrafish loss-of-function data and the identification of this variant in a patient with hearing loss. For context, the original "negative effect" statement was based on the finding that overexpression of the E182* variant in wild-type embryos compromised MET function and produced a startle response defect.

      (13) The authors looked at the distribution of HSD17B7 in a cell line. The WT version goes to the ER, while the truncated one forms aggregates. An interesting experiment consisted of co-expressing both constructs (Figure S6) to see whether the truncated version would mislocalize the WT version, which could be a mechanism for a dominant phenotype. However, this is not the case.

      We thank the reviewer for raising this important point regarding a potential dominant-negative mechanism. Consistent with the reviewer’s interpretation, we found that HSD17B7<sup>WT</sup> predominantly localizes to the endoplasmic reticulum, whereas the truncated HSD17B7<sup>E182*</sup> protein forms intracellular aggregates. Importantly, we further observed that the E182* mutation markedly reduces the stability of both HSD17B7 mRNA and protein, resulting in substantially decreased abundance of the truncated protein (Figure S6B–E). As a consequence, the cellular levels of HSD17B7<sup>E182*</sup> are abnormally low.

      Based on these findings, we consider it unlikely that the E182* variant exerts its effect through interference with the wild-type protein. Our results suggest that the heterozygous c.544G>T (p.E182*) variant contributes to auditory dysfunction through two potential pathogenic mechanisms: (1) haploinsufficiency caused by reduced HSD17B7 expression, and (2) functional impairment due to altered protein subcellular localization and cholesterol distribution.

      We have revised the Results and Discussion sections. Our conclusions now emphasize that the functional impact of this variant is attributable to decreased effective HSD17B7 dosage, consistent with the observed defects in cholesterol synthesis, MET-related activity, and auditory-associated phenotypes in our model.

      (14) Through mass spectrometry of HSD17B7 proteins in the cell line, they identified a protein involved in ER retention, RER1. By biochemistry and in a cell line, they show that truncated HSD17B7 prevents the interaction with RER1, which would explain the subcellular localization.

      Consistent with the reviewer’s interpretation, wild-type HSD17B7 interacts with RER1, a protein known to participate in ER retention, whereas this interaction is lost in the truncated HSD17B7 variant. We propose that RER1 is an interacting partner of HSD17B7, providing a mechanistic explanation for the protein's subcellular localization.

      (15) Information and specificity validation of the HSD17B7 antibody are not presented. It seems that it is the same used on mice by IF and on zebrafish by Western. If so, the antibody could be used on zebrafish by IF to localize the endogenous protein (not overexpression as done here). Secondly, the specificity of the antibody should be verified on the mutant allele. That would bring confidence that the staining on the mouse is likely specific.

      We thank the reviewer for raising this important point regarding antibody specificity and validation. Information on the HSD17B7 antibody and its validation has been provided in our response to comment 1, where we described the use of antibodies recognizing different epitopes and the experimental strategies employed to assess specificity (revised Figure S1A and B).

      Although the same antibody was used for Western blot analysis in zebrafish samples, its performance in immunofluorescence staining of zebrafish tissues was suboptimal, with relatively high background. For this reason, we did not rely on this antibody for endogenous Hsd17b7 localization in zebrafish by immunofluorescence and instead employed tagged constructs for subcellular localization analyses. This approach provides more reliable and interpretable localization information under the current experimental conditions.

      Recommendations for the authors:

      Reviewing Editor Comments:

      Suggested revisions to help improve the study and the eLife Assessment:

      (1) FM4-64 uptake: Isolate the effect of hair cell loss and MET reduction.

      (2) Clarify the mechanistic model: Is the mutant protein pathogenic due to toxicity, lack of expression or function, or both? Come up with a clearer causal chain of events.

      (3) Mouse immunostaining: Validate the HSD17B7 antibody, and since mouse RNAseq data (gEAR database) suggest that HSD17B7 expression increases dramatically between P0-P5, show this developmental progression by immunostaining of the mouse organ of Corti at P0, P3, and P5.

      (4) The HSD17B7-E182* expression disrupts cholesterol (D4H staining) in OC1 cells. This should also be demonstrated in the mutant zebrafish.

      (5) Structural modeling of E182* is uninformative; half the protein is absent. This kind of analysis is better suited for missense variants. Suggest removing this analysis.

      We thank the Reviewing Editor for these constructive suggestions. The major points raised here substantially overlap with the concerns raised in the public reviews. In response, we have:

      (1) revised FM4-64 quantification and interpretation to better distinguish hair cell loss from MET impairment;

      (2) clarified the mechanistic model: the mutation decreases mRNA abundance and significantly reduces protein levels; moreover, expression of the p.E182* mutation disrupts the interaction between HSD17B7 and the ER retention receptor RER1, leading to aberrant subcellular localization and altered cholesterol distribution, thereby exacerbating HC dysfunction;

      (3) provided additional validation of the HSD17B7 antibody using antibodies targeting distinct epitopes, and extended mouse organ of Corti immunostaining to postnatal stages P1, P4, and P7 to demonstrate the developmental upregulation of HSD17B7 expression;

      (4) added in vivo zebrafish experiments demonstrating that expression of HSD17B7<sup>E182*</sup> disrupts cholesterol distribution in hair cells, consistent with the effects observed in HEI-OC1 cells using D4H staining;

      (5) removed the structural modeling of the E182* variant.

      Recommendations for the authors:

      The recommendations from Reviewer #1 and Reviewer #2 were carefully considered and addressed. Most of these points overlap with the public reviews and the Reviewing Editor's comments and have been addressed through a revised mechanistic interpretation, additional clarifications in the Methods, more moderate claims regarding auditory function and human genetics, and the removal or revision of potentially misleading analyses. In addition, a number of minor issues were corrected, including missing or incorrect references, repetitive or unclear statements in the Introduction, insufficient methodological details, imprecise terminology, and typographical or formatting errors. Collectively, these revisions improve the clarity, rigor, and transparency of the study without altering its central conclusions.

    1. Author response:

      We thank the editors and reviewers for thoroughly reviewing our manuscript and offering thoughtful and constructive feedback. We appreciate the positive reception of our work and welcome the opportunity to address the lingering concerns. In the coming revisions, we will directly address the question of the miniprotein’s specificity and increase the precision of the language used to discuss our findings.

    1. Author response:

      eLife Assessment

      This study presents a valuable theoretical exploration on the electrophysiological mechanisms of ionic currents via gap junctions in hippocampal CA1 pyramidal-cell models, and their potential contribution to local field potentials (LFPs) that is different from the contribution of chemical synapses. The biophysical argument regarding electric dipoles appears solid, but the evidence can be more convincing if their predictions are tested against experiments. A shortage of model validation and strictly comparable parameters used in the comparisons between chemical vs. junctional inputs makes the modeling approach incomplete; once strengthened, the finding can be of broad interest to electrophysiologists, who often make recordings from regions of neurons interconnected with gap junctions.

      We gratefully thank the editors and the reviewers for the time and effort in rigorously assessing our manuscript, for the constructive review process, for their enthusiastic responses to our study, and for the encouraging and thoughtful comments. We especially thank you for deeming our study to be a valuable exploration on the differential contributions of active dendritic gap junctions vs. chemical synapses to local field potentials. We thank you for your appreciation of the quantitative biophysical demonstration on the differences in electric dipoles that appear in extracellular potentials with gap junctions vs. chemical synapses.

      However, we are surprised by aspects of the assessment that resulted in deeming the approach incomplete, especially given the following with specific reference to the points raised:

      (1) Testing against experiments: With specific reference to gap junctions, quantitative experimental verification becomes extremely difficult because of the well-established non-specificities associated with gap junctional modulators (Behrens et al., 2011; Rouach et al., 2003). The non-specific actions of gap junctional modulators are tabulated in Table 2 of (Szarka et al., 2021), reproduced below. In addition, genetic knockouts of gap junctional proteins are either lethal or involve functional compensation (Bedner et al., 2012; Lo, 1999), together making causal links to specific gap junctional contributions with currently available techniques infeasible.

      In addition, the complex interactions between co-existing chemical synaptic, gap junctional, and active dendritic contributions from several cell-types make the delineation of the contributions of specific components infeasible with experimental approaches. A computational approach is the only quantitative route to specifically delineate the contributions of individual components to extracellular potentials, as seen from studies that have addressed the question of active dendritic contributions to field potentials (Halnes et al., 2024; Ness et al., 2018; Reimann et al., 2013; Sinha & Narayanan, 2015, 2022) or spiking contributions to local field potentials (Buzsaki et al., 2012; Gold et al., 2006; Schomburg et al., 2012). The biophysically and morphologically realistic computational modeling route is therefore invaluable in assessing the impact of individual components to extracellular field potentials (Einevoll et al., 2019; Halnes et al., 2024).

      Together, we emphasize that the computational modeling route is currently the only quantitative methodology to delineate the contributions of gap junctions vs. chemical synapses to extracellular potentials.

      (2) Model validation: The model used in this study was adopted from a physiologically validated model from our laboratory (Roy & Narayanan, 2021). Please note that the original model was validated against several physiological measurements along the somatodendritic axis. We sincerely regret our oversight in not mentioning clearly that we have used an existing, thoroughly physiologically-validated model from our laboratory in this study.

      (3) Comparisons between chemical vs. junctional inputs: We had taken elaborate precautions in our experimental design to match the intracellular electrophysiological signatures with reference to synchronous as well as oscillatory inputs, irrespective of whether inputs arrived through gap junctions or chemical synapses.

      In a revised manuscript, we will address all the concerns raised by the reviewers in detail. We have provided point-by-point responses to reviewers’ helpful and constructive comments below. We thank the editors and the reviewers for this constructive review process, which we believe will help us in improving our manuscript with specific reference to emphasizing the novelty of our approach and conclusions.

      Reviewer #1 (Public review):

      This manuscript makes a significant contribution to the field by exploring the dichotomy between chemical synaptic and gap junctional contributions to extracellular potentials. While the study is comprehensive in its computational approach, adding experimental validation, network-level simulations, and expanded discussion on implications would elevate its impact further.

      We gratefully thank you for your time and effort in rigorously assessing our manuscript, for the enthusiastic response, and the encouraging and thoughtful comments on our study. In what follows, we have provided point-by-point responses to the specific comments.

      Strengths

      Novelty and Scope

      The manuscript provides a detailed investigation into the contrasting extracellular field potential (EFP) signatures arising from chemical synapses and gap junctions, an underexplored area in neuroscience. It highlights the critical role of active dendritic processes in shaping EFPs, pushing forward our understanding of how electrical and chemical synapses contribute differently to extracellular signals.

      We thank you for the positive comments on the novelty of our approach and how our study addresses an underexplored area in neuroscience. The assumptions about the passive nature of dendritic structures had indeed resulted in an underestimation of the contributions of gap junctions to extracellular potentials. Once the realities of active structures are accounted for, the contributions of gap junctions increase by several orders of magnitude compared to passive structures (Fig. 1D).

      Methodological Rigor

      The use of morphologically and biophysically realistic computational models for CA1 pyramidal neurons ensures that the findings are grounded in physiological relevance. Systematic analysis of various factors, including the presence of sodium, leak, and HCN channels, offers a clear dissection of how transmembrane currents shape EFPs.

      We thank you for your encouraging comments on the experimental design and methodological rigor of our approach.

      Biological Relevance

      The findings emphasize the importance of incorporating gap junctional inputs in analyses of extracellular signals, which have traditionally focused on chemical synapses. The observed polarity differences and spectral characteristics provide novel insights into how neural computations may differ based on the mode of synaptic input.

      We thank you for your positive comments on the biological relevance of our approach. We also gratefully thank you for emphasizing the two striking novelties unveiling the dichotomy between gap junctions and chemical synapses in their contributions to field potentials: polarity differences and spectral characteristics.

      Clarity and Depth

      The manuscript is well-structured, with a logical progression from synchronous input analyses to asynchronous and rhythmic inputs, ensuring comprehensive coverage of the topic.

      We sincerely thank you for the positive comments on the structure and comprehensive coverage of our manuscript encompassing different types of inputs that neurons typically receive.

      Weaknesses and Areas for Improvement

      Generality and Validation

      The study focuses exclusively on CA1 pyramidal neurons. Expanding the analysis to other cell types, such as interneurons or glial cells, would enhance the generalizability of the findings. Experimental validation of the computational predictions is entirely absent. Empirical data correlating the modeled EFPs with actual recordings would strengthen the claims.

      We thank you for raising this important point. The prime novelty and the principal conclusion of this study is that gap junctional contributions to extracellular field potentials are orders of magnitude higher when the active nature of cellular compartments is accounted for. The lacuna in the literature has been consequent to the assumption that cellular compartments are passive, resulting in the dogma that gap junctional contributions to field potentials are negligible. Despite knowledge about active dendritic structures for decades now, this assumption has kept studies from understanding or even exploring the contributions of gap junctions to field potentials. The rationale behind the choice of a computational approach to address the lacuna was as follows:

      (1) The complex interactions between co-existing chemical synaptic, gap junctional, and active dendritic contributions from several cell-types make the delineation of the contributions of specific components infeasible with experimental approaches. A computational approach is the only quantitative route to specifically delineate the contributions of individual components to extracellular potentials, as seen from studies that have addressed the question of active dendritic contributions to field potentials (Halnes et al., 2024; Ness et al., 2018; Reimann et al., 2013; Sinha & Narayanan, 2015, 2022) or spiking contributions to local field potentials (Buzsaki et al., 2012; Gold et al., 2006; Schomburg et al., 2012). The biophysically and morphologically realistic computational modeling route is therefore invaluable in assessing the impact of individual components to extracellular field potentials (Einevoll et al., 2019; Halnes et al., 2024).

      (2) With specific reference to gap junctions, quantitative experimental verification becomes extremely difficult because of the well-established non-specificities associated with gap junctional modulators (Behrens et al., 2011; Rouach et al., 2003). The non-specific actions of gap junctional modulators are tabulated in Table 2 of (Szarka et al., 2021). In addition, genetic knockouts of gap junctional proteins are either lethal or involve functional compensation (Bedner et al., 2012; Lo, 1999), together making causal links to specific gap junctional contributions with currently available techniques infeasible.

      We highlight the novelty of our approach and of the conclusions about differences in extracellular signatures associated with active-dendritic chemical synapses and gap junctions, against these experimental difficulties. We emphasize that the computational modeling route is currently the only quantitative methodology to delineate the contributions of gap junctions vs. chemical synapses to extracellular potentials. Our analyses clearly demonstrate that gap junctions do contribute to extracellular potentials if the active nature of the cellular compartments is explicitly accounted for (Fig. 1D). We also show theoretically well-grounded and mechanistically elucidated differences in polarity (Figs. 1–3) as well as in spectral signatures (Figs. 5–8) of extracellular potentials associated with gap junctional vs. chemical synaptic inputs. Together, our fundamental demonstration in this study is the critical need to account for the active nature of cellular compartments in studying gap junctional contributions to extracellular potentials, with CA1 pyramidal neuronal dendrites used as an exemplar.

      In a revised version of the manuscript, we will emphasize the motivations for the approach we took, highlighting the specific novelties both in methodological and conceptual aspects, finally emphasizing the need to account for other cell types and gap junctional contributions therein. Importantly, we will emphasize the non-specificities associated with gap-junctional blockers as the reason why experimental delineation of gap junctional vs. chemical synaptic contributions to LFP becomes tedious. We hope that these points will underscore the need for the computational approach that we took to address this important question, apart from the novelties of the manuscript.

      Role of Active Dendritic Currents

      The paper emphasizes active dendritic currents, particularly the role of HCN channels in generating outward currents under certain conditions. However, further discussion of how this mechanism integrates into broader network dynamics is warranted.

      We thank you for this constructive suggestion. We agree that it is important to consider the implications for broader network dynamics of the outward HCN currents that are observed with synchronous inputs. In a revised manuscript, we will elaborate on the implications of the outward HCN current to network dynamics in detail.

      Analysis of Plasticity

      While the manuscript mentions plasticity in the discussion, there are no simulations that account for activity-dependent changes in synaptic or gap junctional properties. Including such analyses could significantly enhance the relevance of the findings.

      We thank you for this constructive suggestion. Please note that we have presented consistent results for both fewer and more gap junctions in our analyses (Figure 1 with 217 gap junctions and Supplementary Figure 1 with 99 gap junctions). Our fundamentally novel result that gap junctions onto active dendrites differentially shape LFPs therefore holds true irrespective of the relative density of gap junctions onto the neuron, indicating that the conclusions about gap junctional contributions to LFPs are invariant to plasticity in the number of gap junctions.

      We had only briefly mentioned plasticity in the Introduction to highlight the different modes of synaptic transmission and to emphasize that plasticity has been studied in both chemical synapses and gap junctions, playing a role in learning and adaptation. However, if this wording inadvertently suggests that our study includes plasticity simulations, we will remove it from the Introduction in the updated manuscript to ensure clarity.

      In the ‘Limitations of analyses and future studies’ section in Discussion, we suggested investigating the impact of plasticity mechanisms—specifically, activity-dependent plasticity of ion channels—on synaptic receptors vs. gap junctions and their effects on extracellular field potentials under various input conditions and plasticity combinations across different structures. We fully agree with the reviewer that such studies would offer valuable insights and further enhance the broader relevance of our findings. However, while our study implies this direction, it was not the primary focus of our investigation.

      In the revised manuscript, we will expand on intrinsic/synaptic plasticity and how they could contribute to LFPs (Sinha & Narayanan, 2015, 2022), while also pointing to simulations with different numbers of gap junctions in this context.

      Frequency-Dependent Effects

      The study demonstrates that gap junctional inputs suppress high-frequency EFP power due to membrane filtering. However, it could delve deeper into the implications of this for different brain rhythms, such as gamma or ripple oscillations.

      We sincerely thank you for these insightful comments, with which we fully agree. As it happens, this manuscript forms the first part of a broader study where we explore the implications of gap junctions for ripple-frequency oscillations. The ripple oscillations part of the work was presented as a poster at the Society for Neuroscience (SfN) annual meeting 2024 (Sirmaur & Narayanan, 2024). There, we simulate a neuropil made of hundreds of morphologically realistic neurons to assess the roles of different synaptic inputs (excitatory, inhibitory, and gap junctional) and active dendrites in ripple-frequency oscillations. We demonstrate there that the conclusions from single-neuron simulations in this current manuscript extend to a neuropil with several neurons, each receiving excitatory, inhibitory and gap-junctional inputs, especially with reference to high-frequency oscillations. Our network-based analyses unveiled a dominant mediatory role of patterned inhibition in ripple generation, with recurrent excitations through chemical synapses and gap junctions in conjunction with return-current contributions from active dendrites playing regulatory roles in determining ripple characteristics (Sirmaur & Narayanan, 2024).

      Our principal goal in this study, therefore, was to lay the single-neuron foundation for network analyses of the impact of gap junctions on LFPs. We are preparing the network part of the study, with a strong focus on ripple-frequency oscillations, for submission for peer review separately.

      In a revised manuscript, we will mention the results from our SfN abstract with reference to network simulations and high-frequency oscillations, while also presenting discussions from other studies on the role of gap junctions in synchrony and LFP oscillations.

      Visualization

      Figures are dense and could benefit from more intuitive labeling and focused presentations. For example, isolating key differences between chemical and gap junctional inputs in distinct panels would improve clarity.

      We thank you for this constructive suggestion. In the revised manuscript, we will enhance the visualization of the figures to ensure a clearer and more intuitive distinction between chemical synapses and gap junctions.

      Contextual Relevance

      The manuscript touches on how these findings relate to known physiological roles of gap junctions (e.g., in gamma rhythms) but does not explore this in depth. Stronger integration of the results into known neural network dynamics would enhance its impact.

      We sincerely appreciate your valuable suggestion and acknowledge the importance of integrating our results into established neural network dynamics, particularly their implications for gamma rhythms. We will address this aspect more comprehensively in the revised version of our manuscript.

      Reviewer #2 (Public review):

      This computational work examines whether the inputs that neurons receive through electrical synapses (gap junctions) have different signatures in the extracellular local field potential (LFP) compared to inputs via chemical synapses. The authors present the results of a series of model simulations where either electric or chemical synapses targeting a single hippocampal pyramidal neuron are activated in various spatio-temporal patterns, and the resulting LFP in the vicinity of the cell is calculated and analyzed. The authors find several notable qualitative differences between the LFP patterns evoked by gap junctions vs. chemical synapses. For some of these findings, the authors demonstrate convincingly that the observed differences are explained by the electric vs. chemical nature of the input, and these results likely generalize to other cell types. However, in other cases, it remains plausible (or even likely) that the differences are caused, at least partly, by other factors (such as different intracellular voltage responses due to, e.g., the unequal strengths of the inputs). Furthermore, it was not immediately clear to me how the results could be applied to analyze more realistic situations where neurons receive partially synchronized excitatory and inhibitory inputs via chemical and electric synapses.

      We gratefully thank you for your time and effort in rigorously assessing our manuscript, for the enthusiastic response, and the encouraging and thoughtful comments on our study. In what follows, we have provided point-by-point responses to the specific comments.

      Strengths

      The main strength of the paper is that it draws attention to the fact that inputs to a neuron via gap junctions are expected to give rise to a different extracellular electric field compared to inputs via chemical synapses, even if the intracellular effects of the two types of input are similar. This is because, unlike chemical synaptic inputs, inputs via gap junctions are not directly associated with transmembrane currents. This is a general result that holds independent of many details such as the cell types or neurotransmitters involved.
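      As an illustration of this point only, the following is a minimal sketch under the standard point-source approximation for extracellular potentials, in which each transmembrane current I at distance r from the electrode contributes I / (4πσr). It is not the model used in the manuscript: the two-compartment geometry, the current amplitudes, the conductivity value, and the function name `extracellular_potential` are all hypothetical choices made for the example. A chemical synapse is represented by an inward synaptic current at the input site balanced by an outward return current elsewhere; a gap junctional input is represented by outward return currents only, because the input current itself does not cross the membrane of the recorded cell.

```python
import math


def extracellular_potential(sources, electrode, sigma=0.3):
    """Point-source approximation of the extracellular potential (volts).

    sources   : list of ((x, y, z) in metres, transmembrane current in amperes)
                positive current = outward (source), negative = inward (sink)
    electrode : (x, y, z) position of the recording site in metres
    sigma     : extracellular conductivity in S/m (0.3 is an assumed value)
    """
    phi = 0.0
    for position, current in sources:
        r = math.dist(position, electrode)  # distance from source to electrode
        phi += current / (4.0 * math.pi * sigma * r)
    return phi


# Hypothetical two-compartment cell: dendrite at the origin, soma 100 um away.
dendrite = (0.0, 0.0, 0.0)
soma = (0.0, 0.0, 100e-6)
electrode = (20e-6, 0.0, 0.0)  # recording site close to the dendrite

# Chemical synapse: 1 nA inward synaptic current at the dendrite,
# balanced by 1 nA outward return current at the soma.
chemical = [(dendrite, -1e-9), (soma, 1e-9)]

# Gap junction: 1 nA enters intracellularly (no transmembrane term at the
# input site); it leaves as outward return currents (assumed 0.6 / 0.4 split).
gap_junction = [(dendrite, 0.6e-9), (soma, 0.4e-9)]

phi_chemical = extracellular_potential(chemical, electrode)
phi_gap = extracellular_potential(gap_junction, electrode)

print(f"chemical synapse: {phi_chemical * 1e6:+.2f} uV")
print(f"gap junction:     {phi_gap * 1e6:+.2f} uV")
```

      In this toy configuration the potential near the input site is negative for the chemical synapse (dominated by the local sink) and positive for the gap junctional input (outward currents only), which is one simple way to see why the two input types can produce opposite polarities even when the intracellular response is similar.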

      We gratefully thank you for the positive comments and the encouraging words about the novel contributions of our study. We are particularly thankful to you for your comment on the generality of our conclusions that hold for different cell types and neurotransmitters involved.

      Another strength of the article is that the authors attempt to provide intuitive, non-technical explanations of most of their findings, which should make the paper readable also for non-expert audiences (including experimentalists).

      We sincerely thank you for the positive comments about the readability of the paper.

      Weaknesses

      The most problematic aspect of the paper relates to the methodology for comparing the effects of electric vs. chemical synaptic inputs on the LFP. The authors seem to suggest that the primary cause of all the differences seen in the various simulation experiments is the different nature of the input, and particularly the difference between the transmembrane current evoked by chemical synapses and the gap junctional current that does not involve the extracellular space. However, this is clearly an oversimplification: since no real attempt is made to quantitatively match the two conditions that are compared (e.g., regarding the strength and temporal profile of the inputs), the differences seen can be due to factors other than the electric vs. chemical nature of synapses. In fact, if inputs were identical in all parameters other than the transmembrane vs. directly injected nature of the current, the intracellular voltage responses and, consequently, the currents through voltage-gated and leak currents would also be the same, and the LFPs would differ exactly by the contribution of the transmembrane current evoked by the chemical synapse. This is evidently not the case for any of the simulated comparisons presented, and the differences in the membrane potential response are rather striking in several cases (e.g., in the case of random inputs, there is only one action potential with gap junctions, but multiple action potentials with chemical synapses). Consequently, it remains unclear which observed differences are fundamental in the sense that they are directly related to the electric vs. chemical nature of the input, and which differences can be attributed to other factors such as differences in the strength and pattern of the inputs (and the resulting difference in the neuronal electric response).

      We thank you for raising this important point. We would like to emphasize that our experimental design and analyses quantitatively account for the spatial distribution and temporal pattern of specific kinds of inputs that arrive through gap junctions and chemical synapses. We submit that our analyses quantitatively demonstrate that the fundamental difference between the gap junctional and chemical synaptic contributions to extracellular potentials is the absence of the direct transmembrane component from gap junctional inputs. We elucidate these points below:

      (1) Spatial distribution: The inputs were distributed randomly across the basal dendrites, irrespective of whether they were through gap junctions or chemical synapses. For both chemical synapses and gap junctions, the inputs were of the same nature: excitatory.

      (2) Different numbers of inputs: We have presented consistent results for both fewer and more gap junctions or chemical synapses in our analyses (see Figure 1 with 217 gap junctions or 245 chemical synapses and Supplementary Figure 2 with 99 gap junctions or 30 chemical synapses). Our fundamentally novel result that gap junctions onto active dendrites shape LFPs holds true irrespective of the relative density of gap junctions onto the neuron.

      (3) Synchronous inputs (Figs. 1–3): For chemical synapses, the waveforms are in the shape of postsynaptic potentials. For gap junctional inputs, the waveforms are in the shape of postsynaptic potentials or dendritic spikes (to respect the active nature of inputs from the other cell). Here, the electrical response of the postsynaptic cell is identical irrespective of whether inputs arrive through gap junctions or chemical synapses: an action potential. We quantitatively matched the strengths such that the model generated a single action potential in response to synchronous inputs, irrespective of whether they arrived through chemical synapses or gap junctions. We mechanistically analyze the contributions of different cellular components and show that the direct transmembrane current in chemical synapses is the distinguishing factor that determines the dichotomy between the contributions of gap junctions vs. chemical synapses to extracellular potentials (Figs. 2–3). In a revised manuscript, we will show the intracellular responses to demonstrate that they are electrically matched.

      (4) Random inputs (Fig. 4): For random inputs, we did not account for the number of action potentials that arrived, as the only observation we made here was with reference to the biphasic nature of the extracellular potentials with gap junctional inputs in the “No Sodium” scenario. We note that in the “No Sodium” scenario, the time-domain amplitudes were comparable for the field potentials (Fig. 4B, Fig. 4D).

      (5) Rhythmic inputs (Figs. 5–8): For rhythmic inputs, please note that the intracellular and extracellular waveforms for every frequency are provided in supplementary figures S5–S11. It may be noted that the intracellular responses are comparable. In simulations for assessing spike-LFP comparison, we tuned the strengths to produce a single spike per cycle, ensuring fair comparison of LFPs with gap junctions vs. chemical synapses.

      Taken together, we demonstrate through explicit sets of simulations and analyses that the differences in LFPs were not driven by the strength or patterns of the inputs but rather by the differences in direct transmembrane currents, which are subsequently reflected in the LFPs. In a revised manuscript, we will add a section to emphasize these points apart from providing intracellular traces for cases where they are not provided.

      Some of the explanations offered for the effects of cellular manipulations on the LFP appear to be incomplete. More specifically, the authors observed that blocking leak channels significantly changed the shape of the LFP response to synchronous synaptic inputs - but only when electric inputs were used, and when sodium channels were intact. The authors seemed to attribute this phenomenon to a direct effect of leak currents on the extracellular potential - however, this appears unlikely both because it does not explain why blocking the leak conductance had no effect in the other cases, and because the leak current is several orders of magnitude smaller than the spike-generating currents that make the largest contributions to the LFP. An indirect effect mediated by interactions of the leak current with some voltage-gated currents appears to be the most likely explanation, but identifying the exact mechanism would require further simulation experiments and/or a detailed analysis of intracellular currents and the membrane potential in time and space.

      We thank you for raising this important question. Leak channels were among the several contributors to the positive deflection observed in LFPs associated with gap junctions. This effect was present not only in gap junctional models with intact sodium conductance but also in the no-sodium model, where the amplitude of the positive deflection was reduced across other models as well (Fig. 2F, I). Furthermore, even in the absence of leak conductance, a small positive deflection was still observed (Fig. 2F), leading us to further investigate other transmembrane currents over time and across spatial locations, from the proximal to the distal dendritic ends relative to the soma (Fig. 3D). We had observed that the dominant contributor in the case of chemical synapses was the inward synaptic current (Fig. 3A), whereas for gap junctions, the primary contributors were leak conductance along with other outward currents, such as potassium and HCN currents (Fig. 3D). Together, the direct transmembrane component of chemical synapses provides a dominant contribution to extracellular potentials. This dominance translates to differences in the relative contributions of indirect currents (including leak currents) to extracellular potentials associated with chemical synaptic vs. gap junctional inputs. Our analyses of the exact ionic mechanisms (Fig. 3) demonstrate the involvement of several ion channels contributing to the indirect component in either scenario.

      In every simulation experiment in this study, inputs through electric synapses are modeled as intracellular current injections of pre-determined amplitude and time course based on the sampled dendritic voltage of potential synaptic partners. This is a major simplification that may have a significant impact on the results. First, the current through gap junctions depends on the voltage difference between the two connected cellular compartments and is thus sensitive to the membrane potential of the cell that is treated as the neuron "receiving" the input in this study (although, strictly speaking, there is no pre- or postsynaptic neuron in interactions mediated by gap junctions). This dependence on the membrane potential of the target neuron is completely missing here. A related second point is that gap junctions also change the apparent membrane resistance of the neurons they connect, effectively acting as additional shunting (or leak) conductance in the relevant compartments. This effect is completely missed by treating gap junctions as pure current sources.
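
      The two effects described in this comment can be made concrete with a minimal sketch. It assumes a purely ohmic gap junction and a single passive compartment whose coupled partner is held at a fixed potential; the function names and all numerical values are hypothetical and are not taken from the model in the manuscript.

```python
def gap_junction_current_pA(g_gj_nS, v_partner_mV, v_target_mV):
    """Ohmic gap-junction current flowing into the target compartment.

    Units: nS * mV = pA. The current depends on the voltage of BOTH
    compartments, unlike a pre-determined current injection.
    """
    return g_gj_nS * (v_partner_mV - v_target_mV)


def input_resistance_MOhm(g_leak_nS, g_gj_nS=0.0):
    """Steady-state input resistance of one passive compartment.

    Assumes the partner compartment is held at a fixed potential, so the
    junction simply adds to the leak conductance (shunting effect).
    1 / nS = GOhm, hence the factor 1e3 to report MOhm.
    """
    return 1e3 / (g_leak_nS + g_gj_nS)


# Hypothetical values: a 1 nS junction, partner depolarized to -20 mV.
i_at_rest = gap_junction_current_pA(1.0, -20.0, -65.0)         # target at rest
i_depolarized = gap_junction_current_pA(1.0, -20.0, -30.0)     # target depolarized

r_uncoupled = input_resistance_MOhm(10.0)                      # leak only
r_coupled = input_resistance_MOhm(10.0, g_gj_nS=1.0)           # leak + junction
```

      In this sketch the junctional current shrinks as the target compartment depolarizes towards its partner, and the coupled compartment has a lower input resistance than the uncoupled one. Neither effect is captured when the junction is replaced by a fixed current source.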

      We thank you for raising this important point. We agree with the analyses presented by the reviewer on the importance of network simulations and bidirectional gap junctions that respect the voltages in both neurons. However, the complexities of LFP modeling preclude modeling of networks of morphologically realistic models with patterns of stimulations occurring across the dendritic tree. LFP modeling studies predominantly use “post-synaptic” currents to analyze the impact of different patterns of inputs arriving on to a neuron, even when chemical synaptic inputs are considered. Explicitly, individual neurons are separately simulated with different patterns of synaptic inputs, the transmembrane current at different locations recorded, and the extracellular potential is then computed using line source approximation (Buzsaki et al., 2012; Gold et al., 2006; Halnes et al., 2024; Ness et al., 2018; Reimann et al., 2013; Schomburg et al., 2012; Sinha & Narayanan, 2015, 2022). Even in scenarios where a network is analyzed, a hybrid approach involving the outputs of a point-neuron-based network being coupled to an independent morphologically realistic neuronal model is employed (Hagen et al., 2016; Martinez-Canada et al., 2021; Mazzoni et al., 2015). Given the complexities associated with the computation of electrode potentials arising as a distance-weighted summation of several transmembrane currents, these simplifications become essential.
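
      The distance-weighted summation mentioned above can be illustrated with a minimal sketch. Note that it uses the simpler point-source approximation rather than the line source approximation referred to in the text, and it assumes a homogeneous, purely resistive extracellular medium. The function name, the conductivity of 0.3 S/m, and all currents and positions are illustrative assumptions, not values from the study.

```python
import math


def point_source_potential_uV(currents_nA, positions_um, electrode_um,
                              sigma_S_per_m=0.3):
    """Extracellular potential (microvolts) at one electrode site.

    Point-source approximation: phi = sum_k I_k / (4 * pi * sigma * r_k),
    where I_k is the transmembrane current of compartment k (assumed
    convention: outward positive) and r_k its distance to the electrode.
    """
    phi_volts = 0.0
    for i_nA, pos in zip(currents_nA, positions_um):
        r_m = math.dist(pos, electrode_um) * 1e-6   # micrometres -> metres
        phi_volts += (i_nA * 1e-9) / (4.0 * math.pi * sigma_S_per_m * r_m)
    return phi_volts * 1e6                           # volts -> microvolts


# Hypothetical three-compartment example: two indirect (channel) currents
# plus one direct inward synaptic current at a dendritic site.
positions = [(0.0, 0.0, 0.0), (0.0, 50.0, 0.0), (0.0, 100.0, 0.0)]
electrode = (30.0, 0.0, 0.0)
indirect = [0.4, 0.3, 0.0]      # nA, outward currents
synaptic = [0.0, 0.0, -0.7]     # nA, inward current at the synapse

total = [a + b for a, b in zip(indirect, synaptic)]
phi_total = point_source_potential_uV(total, positions, electrode)
phi_indirect = point_source_potential_uV(indirect, positions, electrode)
phi_synaptic = point_source_potential_uV(synaptic, positions, electrode)
```

      Because the summation is linear in the currents, the potential computed from all currents equals the sum of the indirect and direct synaptic contributions. In this simplified picture, removing the direct transmembrane term changes the potential by exactly that term's distance-weighted contribution.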

      Our approach models gap junctional currents in the same way that other models incorporate synaptic currents in LFP modeling (Buzsaki et al., 2012; Gold et al., 2006; Halnes et al., 2024; Ness et al., 2018; Reimann et al., 2013; Schomburg et al., 2012; Sinha & Narayanan, 2015, 2022). As gap junctions are typically implemented as resistors from the other neuronal compartment, we accounted for gap-junctional variability in our model by randomizing the scaling factors and the exact waveforms that arrive through individual gap junctions at specific locations. Thus, the inputs were not pre-determined by “pre” neurons. Instead, the recorded voltages from potential synaptic partner neurons were randomized across locations and scaled using factors at the dendrites before being injected into the target neuron (Supplementary Fig. S1). While incorporating a network of interconnected neurons is indeed important, we utilized a biophysical, morphologically realistic CA1 neuron model with different sets of input patterns to model LFPs, which were derived from the total transmembrane currents across all compartments of the multi-compartmental neuron model. Given the complexity of this approach, adding further network-level interactions or pre-post connections would have been computationally demanding.

      In a revised manuscript, we will introduce the general methodology used in LFP modeling studies to introduce synaptic currents. We will emphasize that our study extends this approach to modeling gap junctional inputs, while also highlighting randomization of locations and the scaling process in assigning gap junctional synaptic strengths.

      One prominent claim of the article that is emphasized even in the abstract is that HCN channels mediate an outward current in certain cases. Although this statement is technically correct, there are two reasons why I do not consider this a major finding of the paper. First, as the authors acknowledge, this is a trivial consequence of the relatively slow kinetics of HCN channels: when at least some of the channels are open, any input that is sufficiently fast and strong to take the membrane potential across the reversal potential of the channel will lead to the reversal of the polarity of the current. This effect is quite generic and well-known and is by no means specific to gap junctional inputs or even HCN channels. Second, and perhaps more importantly, the functional consequence of this reversed current through HCN channels is likely to be negligible. As clearly shown in Supplementary Figure S3, the HCN current becomes outward only for an extremely short time period during the action potential, which is also a period when several other currents are also active and likely dominant due to their much higher conductances. I also note that several of these relevant facts remain hidden in Figure 3, both because of its focus on peak values, and because of the radically different units on the vertical axes of the current plots.
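
      The polarity reversal described in this comment follows directly from the ohmic form of the current, as the minimal sketch below shows. It assumes that the open fraction of the slow HCN gate stays constant over the time scale of a spike and uses a reversal potential of -30 mV; both are illustrative assumptions, as are the function name and the conductance values.

```python
def hcn_current_pA(g_h_nS, open_fraction, v_mV, e_h_mV=-30.0):
    """Ohmic HCN current (pA), outward positive.

    The gate is slow, so open_fraction is treated as fixed here; the
    sign of the current is then set only by (V - E_h).
    """
    return g_h_nS * open_fraction * (v_mV - e_h_mV)


# Same conductance and open fraction, two membrane potentials.
i_rest = hcn_current_pA(1.0, 0.2, -65.0)    # below E_h: inward current
i_spike = hcn_current_pA(1.0, 0.2, 20.0)    # above E_h: outward current
```

      Any channel that remains partly open while the membrane potential crosses its reversal potential would show the same sign change, which is why the effect is generic rather than specific to HCN channels or gap junctional inputs.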

      We thank you for raising this point and agree with you on every point. Please note that we do not assert that the outward HCN currents are exclusively associated with gap junctional inputs. Rather, our results show that synchronous inputs generate outward HCN currents in both chemical synapses (Fig. 3B; positive/outward HCN currents, except in the no sodium or leak model) and gap junctions (Fig. 3D; positive/outward HCN currents). We emphasized this in the case of gap junctions because, in the absence of inward synaptic currents, HCN (acting as outward currents with synchronous inputs) contributed to the positive deflection observed in the LFPs. While HCN would also contribute in the case of chemical synapses, its effect was negligible due to the presence of large inward synaptic currents. Since LFPs reflect the collective total transmembrane currents, the dominant contributors differ between these two scenarios, which we aimed to highlight. Since HCN exhibited outward currents in our synchronous input simulations, we have elaborated on this mechanism in the supplementary figure (Fig. S3). Our intention was not to emphasize this effect for only one synaptic mode but rather to highlight HCN's contribution to the positive deflection as one of the contributing factors.

      We agree that HCN currents are relatively small in magnitude; therefore, our conclusions were based on HCN being one of the several contributing factors. Leak conductance and other outward conductances, including HCN currents (Fig. 3D), collectively contribute to the positive deflections observed in the case of gap junctional synchronous inputs.

      We will account for all of these points appropriately in a revised manuscript.

      Finally, I missed an appropriate validation of the neuronal model used, and also the characterization of the effects of the in silico manipulations used on the basic behavior of the model. As far as I understand, the model in its current form has not been used in other studies. If this is the case, it would be important to demonstrate convincingly through (preferably quantitative) comparisons with experimental data using different protocols that the model captures the physiological behavior of at least the relevant compartments (in this case, the dendrites and the soma) of hippocampal pyramidal neurons sufficiently well that the results of the modeling study are relevant to the real biological system. In addition, the correct interpretation of various manipulations of the model would be strongly facilitated by investigating and discussing how the physiological properties of the model neuron are affected by these alterations.

      We thank you for raising this important point. The CA1 pyramidal neuronal model used in this study is built with ion-channel models derived from biophysical and electrophysiological recordings from these cells. As mentioned in the Methods section “Dynamics and distribution of active channels” and Supplementary Table S1, models for individual channels, their gating kinetics, and channel distributions across the somatodendritic arbor (wherever known) are all derived from their physiological equivalents. Importantly, these values were derived from previously validated models from the laboratory, which contain these very ion channel models and the exact same morphology (Roy & Narayanan, 2021). Please compare Supplementary Table S1 with Table 1 of Roy & Narayanan (2021). Please note that this model was validated against several physiological measurements along the somatodendritic axis (Fig. 1 of Roy & Narayanan, 2021).

      In a revised manuscript, we will explicitly mention this while also mentioning the different physiological properties that were used for the validation process from (Roy & Narayanan, 2021). We sincerely regret not mentioning these details in the current version of our manuscript.

      We will fix these in a revised version of the manuscript.

      References

      Bedner, P., Steinhauser, C., & Theis, M. (2012). Functional redundancy and compensation among members of gap junction protein families? Biochim Biophys Acta, 1818(8), 1971-1984. https://doi.org/10.1016/j.bbamem.2011.10.016

      Behrens, C. J., Ul Haq, R., Liotta, A., Anderson, M. L., & Heinemann, U. (2011). Nonspecific effects of the gap junction blocker mefloquine on fast hippocampal network oscillations in the adult rat in vitro. Neuroscience, 192, 11-19. https://doi.org/10.1016/j.neuroscience.2011.07.015

      Buzsaki, G., Anastassiou, C. A., & Koch, C. (2012). The origin of extracellular fields and currents--EEG, ECoG, LFP and spikes. Nat Rev Neurosci, 13(6), 407-420. https://doi.org/10.1038/nrn3241

      Einevoll, G. T., Destexhe, A., Diesmann, M., Grun, S., Jirsa, V., de Kamps, M., Migliore, M., Ness, T. V., Plesser, H. E., & Schurmann, F. (2019). The Scientific Case for Brain Simulations. Neuron, 102(4), 735-744. https://doi.org/10.1016/j.neuron.2019.03.027

      Gold, C., Henze, D. A., Koch, C., & Buzsaki, G. (2006). On the origin of the extracellular action potential waveform: A modeling study. J Neurophysiol, 95(5), 3113-3128. https://doi.org/10.1152/jn.00979.2005

      Hagen, E., Dahmen, D., Stavrinou, M. L., Linden, H., Tetzlaff, T., van Albada, S. J., Grun, S., Diesmann, M., & Einevoll, G. T. (2016). Hybrid Scheme for Modeling Local Field Potentials from Point-Neuron Networks. Cereb Cortex, 26(12), 4461-4496. https://doi.org/10.1093/cercor/bhw237

      Halnes, G., Ness, T. V., Næss, S., Hagen, E., Pettersen, K. H., & Einevoll, G. T. (2024). Electric Brain Signals: Foundations and Applications of Biophysical Modeling. Cambridge University Press. https://doi.org/10.1017/9781009039826

      Lo, C. W. (1999). Genes, gene knockouts, and mutations in the analysis of gap junctions. Dev Genet, 24(1-2), 1-4. https://doi.org/10.1002/(SICI)1520-6408(1999)24:1/2%3C1::AID-DVG1%3E3.0.CO;2-U

      Martinez-Canada, P., Ness, T. V., Einevoll, G. T., Fellin, T., & Panzeri, S. (2021). Computation of the electroencephalogram (EEG) from network models of point neurons. PLoS Comput Biol, 17(4), e1008893. https://doi.org/10.1371/journal.pcbi.1008893

      Mazzoni, A., Linden, H., Cuntz, H., Lansner, A., Panzeri, S., & Einevoll, G. T. (2015). Computing the Local Field Potential (LFP) from Integrate-and-Fire Network Models. PLoS Comput Biol, 11(12), e1004584. https://doi.org/10.1371/journal.pcbi.1004584

      Ness, T. V., Remme, M. W. H., & Einevoll, G. T. (2018). h-Type Membrane Current Shapes the Local Field Potential from Populations of Pyramidal Neurons. J Neurosci, 38(26), 6011-6024. https://doi.org/10.1523/jneurosci.3278-17.2018

      Reimann, M. W., Anastassiou, C. A., Perin, R., Hill, S. L., Markram, H., & Koch, C. (2013). A biophysically detailed model of neocortical local field potentials predicts the critical role of active membrane currents. Neuron, 79(2), 375-390. https://doi.org/10.1016/j.neuron.2013.05.023

      Rouach, N., Segal, M., Koulakoff, A., Giaume, C., & Avignone, E. (2003). Carbenoxolone blockade of neuronal network activity in culture is not mediated by an action on gap junctions. Journal of Physiology, 553(Pt 3), 729-745. https://doi.org/10.1113/jphysiol.2003.053439

      Roy, A., & Narayanan, R. (2021). Spatial information transfer in hippocampal place cells depends on trial-to-trial variability, symmetry of place-field firing, and biophysical heterogeneities. Neural Netw, 142, 636-660. https://doi.org/10.1016/j.neunet.2021.07.026

      Schomburg, E. W., Anastassiou, C. A., Buzsaki, G., & Koch, C. (2012). The spiking component of oscillatory extracellular potentials in the rat hippocampus. J Neurosci, 32(34), 11798-11811. https://doi.org/10.1523/JNEUROSCI.0656-12.2012

      Sinha, M., & Narayanan, R. (2015). HCN channels enhance spike phase coherence and regulate the phase of spikes and LFPs in the theta-frequency range. Proc Natl Acad Sci U S A, 112(17), E2207-2216. https://doi.org/10.1073/pnas.1419017112

      Sinha, M., & Narayanan, R. (2022). Active Dendrites and Local Field Potentials: Biophysical Mechanisms and Computational Explorations. Neuroscience, 489, 111-142. https://doi.org/10.1016/j.neuroscience.2021.08.035

      Sirmaur, R., & Narayanan, R. (2024). Distinct extracellular signatures of chemical and electrical synapses impinging on active dendrites differentially contribute to ripple-frequency oscillations. Society for Neuroscience annual meeting (https://www.abstractsonline.com/pp8/?_gl=1*1bxo7m*_gcl_au*MTc5MTQ0NjE0NC4xNzI3MDcwOTMw*_ga*MTMxMTE5OTcyMy4xNzI3MDcwOTMx*_ga_T09K3Q2WDN*MTcyNzA3MDkzMS4xLjEuMTcyNzA3MDkzNy41NC4wLjA.#!/20433/presentation/13949), Chicago, USA.

      Szarka, G., Balogh, M., Tengolics, A. J., Ganczer, A., Volgyi, B., & Kovacs-Oller, T. (2021). The role of gap junctions in cell death and neuromodulation in the retina. Neural Regen Res, 16(10), 1911-1920. https://doi.org/10.4103/1673-5374.308069

    1. Author Response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      Summary:

      This study resolves a cryo-EM structure of the GPCR, GPR30, in the presence of bicarbonate, which the authors' lab recently identified as the physiological ligand. Understanding the ligand and the mechanism of activation is of fundamental importance to the field of receptor signaling. This solid study provides important insight into the overall structure and suggests a possible bicarbonate binding site.

      Strengths:

      The overall structure, and proposed mechanism of G-protein coupling are solid. Based on the structure, the authors identify a binding pocket that might accommodate bicarbonate. Although assignment of the binding pocket is speculative, extensive mutagenesis of residues in this pocket identifies several that are important to G-protein signaling. The structure shows some conformational differences with a previous structure of this protein determined in the absence of bicarbonate (PMC11217264). To my knowledge, bicarbonate is the only physiological ligand that has been identified for GPR30, making this study an important contribution to the field. However, the current study provides novel and important circumstantial evidence for the bicarbonate binding site based on mutagenesis and functional assays.

      Weaknesses:

      Bicarbonate is a challenging ligand for structural and biochemical studies, and because of experimental limitations, this study does not elucidate the exact binding site. Higher resolution structures would be required for structural identification of bicarbonate. The functional assay monitors activation of GPR30, and thus reports on not only bicarbonate binding, but also the integrity of the allosteric network that transduces the binding signal across the membrane. However, biochemical binding assays are challenging because the binding constant is weak, in the mM range.

      The authors appropriately acknowledge the limitations of these experimental approaches, and they build a solid circumstantial case for the bicarbonate binding pocket based on extensive mutagenesis and functional analysis. However, the study does fall short of establishing the bicarbonate binding site.

      We thank the reviewer for this thoughtful and constructive assessment of our revised manuscript. We are grateful for the recognition of the overall quality of the cryo-EM structure and the proposed mechanism of G-protein coupling, as well as for highlighting the importance of identifying bicarbonate as a physiological ligand for GPR30 and the contribution this work makes to the receptor signaling field. We also appreciate the reviewer’s careful and balanced discussion of the inherent challenges posed by bicarbonate as a low-affinity, small, negatively charged ligand, and we fully agree that, given current experimental limitations, our data provide circumstantial, rather than definitive, evidence for the binding site and that higher-resolution structures would be required for direct visualization. Importantly, we value the reviewer’s acknowledgement that we transparently describe these limitations and that our extensive mutagenesis and functional analyses nonetheless build a solid case for the proposed bicarbonate-binding pocket, which we believe will serve as a useful framework for future biochemical and structural investigation.

      Reviewer #1 (Recommendations for the authors):

      Overall, the authors do a good job responding to the previous review, with updated structures and experimental data. I have two comments on the current version:

      (1) When the authors compare their structure to a previously published structure of the same receptor, they say that the previous structure came out while the current manuscript was in revision (line 255). This is not correct. The previous manuscript was published May 14, 2024, and the current manuscript was received by eLife on May 20, 2024. This sentence should be corrected to "During the preparation of this manuscript..."

      We corrected the sentence accordingly (line 259).

      (2) Line 173: what other structures are the authors referring to? Citations should be included here.

      Is Line 193 correct? We added citations (line 190).

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, "Cryo-EM structure of the bicarbonate receptor GPR30," the authors aimed to enrich our understanding of the role of GPR30 in pH homeostasis by combining structural analysis with a receptor function assay. This work is a natural development and extension of their previous work on Nature Communications (PMID: 38413581). In the current body of work, they solved the cryo-EM structure of the human GPR30-G-protein (mini-Gsqi) complex in the presence of bicarbonate ions at 3.15 Å resolution. From the atomic model built based on this map, they observed the overall canonical architecture of class A GPCR and also identified 3 extracellular pockets created by ECLs (Pockets A-C). Based on the polarity, location, size, and charge of each pocket, the authors hypothesized that pocket A is a good candidate for the bicarbonate binding site. To identify the bicarbonate binding site, the authors performed an exhaustive mutant analysis of the hydrophilic residues in Pocket A and analyzed receptor reactivity via calcium assay. In addition, the human GPR30-G-protein complex model also enabled the authors to elucidate the G-protein coupling mechanism of this special class A GPCR, which plays a crucial role in pH homeostasis.

      Strengths:

      As a continuation of their recent Nature Communications publication, the authors used cryo-EM coupled with mutagenesis and functional studies to elucidate bicarbonate-GPR30 interaction. This work provided atomic-resolution structural observations for the receptor in complex with G-protein, allowing us to explore its mechanism of action, and will further facilitate drug development targeting GPR30. There were 3 extracellular pockets created by ECLs (Pockets A-C). The authors were able to filter out 2 of them and hypothesized that pocket A was a good candidate for the bicarbonate binding site based on the polarity, location, and charge of each pocket. From there, the authors identified the key residues on GPR30 for its interaction with the substrate, bicarbonate. Together with their previous work, they mapped out amino acids that are critical for receptor reactivity.

      Weaknesses:

      When we see a reduction of a GPCR-mediated downstream signaling, several factors could potentially contribute to this observation: 1) a reduced total expression of this receptor due to the mutation (transcription and translation issue); 2) a reduced surface expression of this receptor due to the mutation (trafficking issue); and 3) a dysfunctional receptor that doesn't signal due to the mutation. In the current revision, based on the gating strategy, the surface expression of the HA-positive WT GPR30-expressing cells is only 10.6% of the total population, while the surface expression levels of the mutants range from 1.89% (P71A) to 64.4% (D111A). Combining this information with the functional readout in Figure 3F and G, as well as their previous work, the authors concluded that mutations at P71, E115, D125, Q138, C207, D210, and H307 would decrease bicarbonate responses. Among those sites, E115, Q138, and H307 were from their previous Nature Comm paper.

      Authors claim P71 and C207 make a structural-stability contribution, as their mutations result in a significant reduction in surface expression: P71A (1.89%) and C207A (2.71%). However, compared to 10.6% of the total population in the WT, (P71A is 17.8% of the WT, and C207A is 25.6% of the WT), this doesn't rule out the possibility that the mutated receptor is also dysfunctional: at 10 mM NaHCO3, RFU of WT is ~500, RFU of P71 and C207 are ~0.

      The authors also interpret "The D125ECL1A mutant has lost its activity but is located on the surface" and only mention "D125 is unlikely to be a bicarbonate binding site, and the mutational effect could be explained due to the decreased surface expression". Again, compared to 10.6% of the total population in the WT, D125A (3.94%) is 37.2% of the WT. At 10 mM NaHCO3, the RFU of the WT is ~500, the RFU of D125 is ~0. This doesn't rule out the possibility that the mutated receptor is also dysfunctional. It is not clear why D125A didn't make it to the surface.

      Other mutants that the authors didn't mention much in their text: D111A (64.4%, 607.5% of WT surface expression), E121A (50.4%, 475.5% of WT surface expression), R122 (41.0%, 386.8% of WT surface expression), N276A (38.9%, 367.0% of WT surface expression) and E218A (24.6%, 232.1% of WT surface expression) all have similar RFU as WT, although the surface expression is about 2-6 times more. On the other hand, Q215A (3.18%, 30% of WT surface expression) has similar RFU as WT, with only a third of the receptor on the surface.

      Altogether, the wide range of surface expression across the different cell lines, combined with the different receptor function readouts, makes the cell functional data only partially support their structural observations.
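
      The relative surface-expression figures quoted in the comments above can be reproduced with a short calculation. The sketch below takes the percentages of HA-positive cells exactly as stated in the review text and divides each by the wild-type value; the variable and function names are hypothetical.

```python
# Percentage of HA-positive (surface-expressing) cells, as quoted in the review.
WT_SURFACE_PCT = 10.6

mutant_surface_pct = {
    "P71A": 1.89,
    "C207A": 2.71,
    "Q215A": 3.18,
    "D125A": 3.94,
    "E218A": 24.6,
    "N276A": 38.9,
    "R122": 41.0,
    "E121A": 50.4,
    "D111A": 64.4,
}


def relative_to_wt(pct, wt_pct=WT_SURFACE_PCT):
    """Surface expression expressed as a percentage of the wild-type value."""
    return 100.0 * pct / wt_pct


relative_surface = {name: relative_to_wt(pct)
                    for name, pct in mutant_surface_pct.items()}

for name, rel in sorted(relative_surface.items(), key=lambda kv: kv[1]):
    print(f"{name}: {rel:.1f}% of WT surface expression")
```

      Expressing each mutant relative to the wild type in this way makes the spread explicit: it runs from well under a fifth of wild-type surface expression to roughly six times that level.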

      We sincerely thank the reviewer for their careful reading and thoughtful evaluation of our manuscript on the cryo-EM structure of the bicarbonate receptor GPR30. We greatly appreciate the reviewer’s positive assessment of the overall significance of combining structural determination with extensive mutagenesis and functional assays to advance understanding of bicarbonate–GPR30 interactions and G-protein coupling, as well as their recognition that these atomic-level insights will be valuable for future mechanistic studies and drug-development efforts. We are also grateful for the reviewer’s constructive critique regarding the interpretation of reduced signaling in the context of variable surface expression across mutants, which highlights an important point about disentangling effects of expression/trafficking from intrinsic receptor dysfunction; these comments are highly insightful and will help us strengthen the clarity and rigor of our presentation and conclusions in the revised manuscript.

      Reviewer #2 (Recommendations for the authors):

      In this revision, the authors have made a significant effort to improve and validate the structural observations, as well as address the comments in the previous submission. They updated the functional assays and evaluated the receptor function by measuring intracellular calcium mobilization, which is a more direct measurement for the downstream signaling of hGPR30-Gq signaling. They also used flow cytometry with an HA-antibody for a more direct measurement of the surface expression of the receptor, replacing their previous assay that normalized to the housekeeping gene Na-K-ATPase.

      I appreciate the effort the authors made to address the previous comments made by the reviewers. However, there are still some concerns about the current data.

      (1) The authors have addressed my previous comment on untangling the mixture of their previous and new data in the "insights into bicarbonate binding" section. They have made it clear that the importance of E115, Q138, and H307 in the receptor-bicarbonate interaction was shown in their Nature Communications paper.

      (2) The authors have addressed my previous comment on adding some content about the physiological concentration of HCO3, or referring more to their previous work about the rationale to select the bicarbonate dose in their functional assay.

      (3) The authors have updated Figure 3.

      (4) The authors have updated Supplemental Figure 1 to show the full gel with molecular weight markers in the supplemental data to demonstrate the sample purity.

      (5) The authors have updated the predicted model using AF3.

      (6) The authors added E218A as suggested before.

      Some new suggestions for this R1:

      (1) The wide range of surface expression across the different cell lines, combined with the different receptor function readouts, makes the cell functional data only partially support their structural observations.

      We acknowledge this limitation. The wide range of surface expression among cell lines, together with differences in assay modalities, may introduce variability that complicates direct quantitative comparisons and therefore only partially supports the structural observations. Future work using more standardized expression systems and matched functional readouts will be important to strengthen the structure–function linkage.

      (2) Line 101, "ICL1 and ECL1 contain short α helices", no α helix of ICL1 is shown in Figure 2C

      We removed the word “ICL1” (line 98).

      (3) For the unsolved region of ECL2, could the author put a dashed line connecting ECL2 with TM4? In the current Figure 2B, it looks like ECL2 connects TM3 and TM5.

      According to the suggestion, we corrected Figure 2B.

      (4) I appreciate that the authors updated the predicted model with AF3, but they didn't make it clear why they had the comparison between their cryo-EM structure (bicarbonate-activated G-protein-incorporated GPR30) and the predicted AF3 model (inactive GPR30)

      We wished to emphasize the value of experimental structures over predictions alone. The comparison covers structural features that are independent of receptor activation, such as disulfide (SS) bonds.

      (5) I appreciate that the authors have addressed my previous comment on adding some content about the physiological concentration of HCO3, but it was still not clear to me why they picked 11 mM in Figure 3G for the bar graph. Also, since a dose-response curve was made in Figure 3F, why not just calculate and report the EC50 of NaHCO3 for each mutant?

      Thank you for your comment. We calculated the EC50 of the calcium response and assessed its correlation with the receptors’ cell surface expression. We chose 11 mM in Fig. 3G because our previous paper in Nature Communications showed that the EC50 in the IPs assay was around 11 mM. However, the calcium response was more sensitive and gave a lower value than expected. Therefore, following your advice, we deleted the bar graph of responses at 11 mM, calculated the EC50, and plotted the correlations among cell surface expression, EC50, and maximum response (Figure 3F-I, Supplementary File 1). Moreover, we revised the explanation of this mutagenesis study (lines 139-154 and 217-230).
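      As an illustration of the analysis described above, the following is a minimal sketch of how an EC50 can be read off a dose-response curve and then correlated with surface expression. It is not the authors' fitting procedure, which is not specified here: it assumes a simple log-linear interpolation at the half-maximal response rather than a full nonlinear regression. The function names (`hill`, `estimate_ec50`, `pearson_r`) and all numerical values are hypothetical and synthetic, not data from the study.

```python
import math


def hill(conc, bottom, top, ec50, slope=1.0):
    """Hill (log-logistic) response at a given agonist concentration."""
    return bottom + (top - bottom) / (1.0 + (ec50 / conc) ** slope)


def estimate_ec50(concs, responses):
    """Estimate EC50 by log-linear interpolation at the half-maximal response.

    Assumes concs are sorted ascending and the responses rise from a
    baseline towards a plateau within the tested range.
    """
    bottom, top = min(responses), max(responses)
    half = bottom + (top - bottom) / 2.0
    points = list(zip(concs, responses))
    for (c_lo, r_lo), (c_hi, r_hi) in zip(points, points[1:]):
        if r_lo <= half <= r_hi:
            if r_hi == r_lo:
                return c_lo
            frac = (half - r_lo) / (r_hi - r_lo)
            log_ec50 = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_ec50
    raise ValueError("responses never cross the half-maximal level")


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Synthetic curve generated from a known EC50 of 3 mM (hypothetical values).
concs = [0.1, 0.3, 1.0, 3.0, 10.0, 30.0, 100.0]  # mM NaHCO3
responses = [hill(c, bottom=0.0, top=100.0, ec50=3.0) for c in concs]
print(round(estimate_ec50(concs, responses), 2))
```

      In practice, one EC50 per mutant would be paired with that mutant's surface expression (as % of WT) and passed to `pearson_r`. Because the interpolation depends on how well the tested range brackets the plateau, a four-parameter nonlinear fit is generally preferable when the curve does not saturate.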

      (6) In the previous submission and comments, E218 was in close contact with bicarbonate in the previous Figure 4D (the bicarbonate is deleted in the new structure). I thank the authors for making an E218A mutant and performing the functional assay. As mentioned above, E218A (24.6%, 232.1% of WT surface expression) has a similar functional readout as WT. Doesn't this also indicate that E218A is partially broken, so you will need twice as much as WT to have the same downstream signal?

      Thank you for your comment. In our revised manuscript, we described the relationship between cell surface expression and EC50 and found that cell surface expression and the response to bicarbonate are not correlated, as you noted in your review comment (Figure 3F-I, Supplementary File 1). Several possibilities could explain this: GPR30 localization to specific sites on the plasma membrane (PM) might limit the response stoichiometry; GPR30 might also act intracellularly, blunting the increased response expected from higher GPR30 expression on the PM; surplus GPR30 on the PM might be nonfunctional; or E218A might be less functional and require twice as much receptor as WT. We will examine cell surface expression of GPR30 and its response to bicarbonate in a future study.

      I would suggest that the authors in future studies consider using the Tet-on inducible cell lines, such as HEK293 Flp-In Trex. These cell lines will allow the authors to fine-tune the surface expression of their mutants to the same level with different doses of Tetracycline in their stable cell lines.

      We appreciate your advice. We’ll introduce Tet-on inducible cell lines for future research.

      Reviewer #3 (Public review):

      Summary

      GPR30 responds to bicarbonate and plays a role in regulating cellular pH and ion homeostasis. However, the molecular basis of bicarbonate recognition by GPR30 remains unresolved. This study reports the cryo-EM structure of GPR30 bound to a chimeric mini-Gq in the presence of bicarbonate, revealing mechanistic insights into its G-protein coupling. Nonetheless, the study does not identify the bicarbonate-binding site within GPR30.

      Strengths

      The work provides strong structural evidence clarifying how GPR30 engages and couples with Gq.

      Weaknesses

      Several GPR30 mutants exhibited diminished responses to bicarbonate, but their expression levels were also reduced. As a result, the mechanism by which GPR30 recognizes bicarbonate remains uncertain, leaving this aspect of the study incomplete.

      We sincerely thank the reviewer for this thoughtful and balanced assessment of our manuscript, including the clear summary of the central advance and the constructive identification of remaining limitations. We particularly appreciate the recognition that our cryo-EM analysis provides strong structural evidence for how GPR30 engages and couples with Gq, and we agree that pinpointing the bicarbonate-binding site remains a critical open question. In the revised manuscript, we will make this point more explicit, clarify the interpretation of the mutagenesis results in light of reduced receptor expression for some variants, and further strengthen the presentation and discussion of what our current data do, and do not, allow us to conclude regarding bicarbonate recognition by GPR30.

      Reviewer #3 (Recommendations for the authors):

      The authors have removed the bicarbonate assignment from their model and have addressed all of my concerns. In this study, or in future work, it would be advisable for the authors to explore the use of bicarbonate mimetics with higher binding affinity to facilitate more definitive structural characterization.

      Thank you for this constructive suggestion. We agree that exploring bicarbonate mimetics with higher binding affinity would be an important next step to enable more definitive structural characterization of GPR30 and to strengthen mechanistic conclusions. In future work, we plan to pursue the identification and/or design of such mimetics, guided by the architecture and mutational landscape of the extracellular pocket described here, and to combine these ligands with optimized cryo-EM sample preparation and complementary functional assays to better stabilize and visualize the bound state.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study examines the role of the long non-coding RNA Dreg1 in regulating Gata3 expression and ILC2 development. Using Dreg1-deficient mice, the authors show a selective loss of ILC2s but not T or NK cells, suggesting a lineage-specific requirement for Dreg1. By integrating public chromatin and TF-binding datasets, they propose a Tcf1-Dreg1-Gata3 regulatory axis. The topic is relevant for understanding epigenetic regulation of ILC differentiation.

      Strengths:

      (1) Clear in vivo evidence for a lineage-specific role of Dreg1.

      (2) Comprehensive integration of genomic datasets.

      (3) Cross-species comparison linking mouse and human regulatory regions.

      Weaknesses:

      (1) Mechanistic conclusions remain correlative, relying on public data.

      We agree that the mechanistic conclusions of our study are indeed correlative, and we mention this in the discussion. The primary contribution of the study is the discovery that Dreg1 is necessary for ILC2 development, shown with the new knockout mouse model. Re-analysing good-quality publicly available data on rare cell populations is an appropriate approach and in line with DORA guidelines for ethical research.

      (2) Lack of direct chromatin or transcriptional validation of Tcf1-mediated regulation.

      The most appropriate way to examine direct Tcf1 target genes in primary cells is to examine the association of Tcf1 binding with the changes that occur in Tcf1-bound genes after Tcf7 knockout. By analysing publicly available data on ILC progenitors we indeed did this. We revealed that Tcf1 bound to Dreg1 and that Dreg1 was not expressed when Tcf1 was knocked out in ILC progenitors. In addition we examined H3K27ac at the Dreg1 locus in the same ILC progenitors to demonstrate that Tcf1 appears to be important for decorating the Dreg1 gene with this histone modification. We believe that this analysis is sufficient to conclude that Tcf1 is required for the expression of Dreg1 in ILC progenitors.

      (3) Human enhancer function is not experimentally confirmed.

      We agree that the potential human enhancer of GATA3 we identified has not been confirmed in human ILCs. However, a previous study showed clear evidence that this region has GATA3 enhancer activity in human T cells. Therefore, while not specific to ILC2s, the region where the DREG1 homologues lie does indeed harbour enhancer activity.

      (4) Insufficient methodological detail and limited mechanistic discussion.

      We have now made the changes suggested by the reviewer to both the methods/figure legends and also the discussion.

      Reviewer #1 (Recommendations for the authors):

      The authors generated Dreg1-deficient mice and demonstrated that loss of this locus selectively reduces ILC2s but not T or NK cells, indicating a lineage-specific requirement for Dreg1 in ILC development. By analyzing publicly available chromatin accessibility and transcription factor-binding datasets, they link Dreg1 expression to Tcf1-dependent chromatin activation and extend their findings to human data by identifying a syntenic GATA3 enhancer that produces homologous Dreg lncRNAs in ILC2s. While the study addresses an interesting question, most of the mechanistic interpretations rely heavily on publicly available datasets rather than the authors' own functional evidence. To establish causality and reinforce the overall conclusions, I provide below some comments and suggestions for additional experiments and clarifications that would considerably strengthen the manuscript.

      (1) In Figure 3, the authors use public datasets to argue that Tcf1 regulates Dreg1 expression by modulating chromatin accessibility and H3K27ac at its locus. However, since these data are derived from heterogeneous external sources, the conclusions remain associative. To better support causality, the authors should generate matched datasets from their own sorted progenitor populations and perform CUT&Tag for Tcf1 and H3K27ac in wild-type and Tcf7 knockout progenitors to directly test whether Tcf1 binding establishes an active chromatin state at Dreg1. Also, complementing this with nascent RNA or pre-mRNA quantification would link chromatin activation to transcriptional output. These experiments are technically feasible in progenitors and would substantially strengthen the claim that Tcf1 directly drives Dreg1 activation during ILC development.

      We believe that utilising publicly available data sufficiently answers this question while also adhering to ethical considerations. The ILC populations used to produce the publicly available data were akin to those we examined in our analyses, and the data were of sufficient quality. Moreover, they give us access to data from Tcf1-deficient mice. Redoing large-scale chromatin profiling on rare cell types would require hundreds of mice to achieve sufficient cell numbers. Repeating this solely for “originality” contradicts the 3Rs principles (replacement, reduction, refinement) when high-quality public data already exist, and we feel it would require years of redundant work. In addition, we believe the fact that the data derive from heterogeneous external sources, yet align well, only strengthens our conclusions. We have now added mention of our use of publicly available data in the discussion.

      (2) In Figure 4, the authors provide correlative evidence from public datasets suggesting that the human region syntenic to the murine Dreg1 locus acts as a distal enhancer of GATA3 and gives rise to two ILC2-specific lncRNAs. To substantiate this claim, the authors should perform CUT&Tag for H3K27ac in human ILC2s to confirm enhancer activation and use 3C or HiChIP to demonstrate physical interaction with the GATA3 promoter. These experiments should be doable by fusing pooled ILC2 samples and would provide more direct evidence that this region actively regulates GATA3 expression.

      Assessing the activity of a distal enhancer region on its target gene in primary human cells is extremely difficult, due to a number of technical and biological complications such as enhancer redundancy. This is why we chose to reanalyse an extensive enhancer deletion screen performed in human T cells by Chen et al., AJHG 2023. This analysis clearly showed that deletion of the region we identified as harbouring Dreg1 homologues affected GATA3 expression, thus confirming its enhancer activity. While we agree with the reviewer that specific profiling of human ILC populations for H3K27ac and 3D genome architecture would provide further correlative evidence, this would be a time-consuming and costly endeavour with human material, and ultimately the definitive proof in ILCs would require specific deletion of this region in ILC2s. We have mentioned this caveat in the discussion.

      (3) Several figure legends lack essential methodological details. Figure 1 should specify how NK and ILC populations were gated, including intermediate steps and markers used. The same applies to Supplementary Figure 1, and particularly to Supplementary Figure 2, where gating strategies for progenitors are shown but not explained. Figure 2 should also indicate that these analyses were performed in bone marrow. Clearer legends are crucial for interpreting and reproducing the data.

      We have made the suggested changes.

      (4) It is also unclear throughout the manuscript whether the authors performed any ATACseq experiments themselves or relied entirely on public datasets. This information should be stated explicitly in the main text and figure legends, not only in the Methods section. Similarly, the source of the ChIPseq or CUT&Run datasets should be clearly indicated alongside the relevant figures.

      We apologise for not making this clearer and have now stated explicitly in the text where the data were publicly sourced.

      (5) As the authors themselves suggest, performing experiments that selectively suppress Dreg1 transcription using antisense oligonucleotides or CRISPR interference at the Dreg1 promoter would provide more valuable mechanistic insights. Conducting these experiments in their own system would allow them to determine whether Dreg1 functions through its RNA product or as a DNA enhancer element, thereby strengthening the causal link between Dreg1 activity and Gata3 regulation.

      We agree with the reviewer; however, in our opinion, this is beyond the scope of this manuscript. The strength of this manuscript lies in the findings from the novel Dreg1 knockout mouse strain. Future studies will focus on understanding how Dreg1 influences Gata3 expression.

      (6) The discussion would benefit from a clearer and more integrated explanation of how Dreg1 fits into the transcriptional network that controls ILC2 differentiation. The authors could elaborate on whether Dreg1 fine-tunes Gata3 expression or functions as part of a regulatory loop with Tcf1, and better explain how this mechanism might be conserved in humans. In addition, the authors should explicitly acknowledge the limitations of relying on publicly available datasets and emphasize the need for direct experimental validation to support their mechanistic interpretation.

      We have now made these suggested inclusions.

      Reviewer #2 (Public review):

      The authors investigate the role of the long non-coding RNA Dreg1 for the development, differentiation, or maintenance of group 2 ILC (ILC2). Dreg1 is encoded close to the Gata3 locus, a transcription factor implicated in the differentiation of T cells and ILC, and in particular of type 2 immune cells (i.e., Th2 cells and ILC2). The center of the paper is the generation of a Dreg1-deficient mouse. While Dreg1-/- mice did not show any profound αβ T or γδ T cell, ILC1, ILC3, and NK cell phenotypes, ILC2 frequencies were reduced in various organs tested (small intestine, lung, visceral adipose tissue). In the bone marrow, immature ILC2 or ILC2 progenitors were reduced, whereas a common ILC progenitor was overrepresented, suggesting a differentiation block. Using ATAC-seq, the authors find that the promoter of Dreg1 is open in early lymphoid progenitors, and the acquisition of chromatin accessibility downstream correlates with increased Dreg1 expression in ILC2 progenitors. Examining publicly available Tcf1 CUT&Run data, they find that Tcf1 was specifically bound to the accessible sites of the Dreg1 locus in early innate lymphoid progenitors. Finally, the syntenic region in the human genome contains two non-coding RNA genes with an expression pattern resembling mouse Dreg1.

      The topic of the manuscript is interesting. However, there are various limitations that are summarized below.

      (1) The authors generated a new mouse model. The strategy should be better described, including the genetic background of the initially microinjected material. How many generations was the targeted offspring backcrossed to C57BL/6J?

      The mice were backcrossed for at least 2 generations to C57BL/6. This information is now included in the methods section.

      (2) The data is obtained from mice in which the Dreg1 gene is deleted in all cells. A cell-intrinsic role of Dreg1 in ILC2 has not been demonstrated. It should be shown that Dreg1 is required in ILC2 and their progenitors.

      We now provide new mixed bone marrow irradiation chimera data that shows that the effect is intrinsic to Dreg1-deficient ILC2 cells (Figure 1F and Supplementary Figure 1E-G).

      (3) The data on how Dreg1 contributes to the differentiation and or maintenance of ILC2 is not addressed at a very definitive level. Does Dreg1 affect Gata3 expression, mRNA stability, or turnover in ILC2? Previous work of the authors indicated that knockdown of Dreg1 does not affect Gata3 expression (PMID: 32970351).

      We have indeed shown that Dreg1-deficient ILC2P have reduced levels of Gata3 (Figure 2H); however, we have not determined the exact mechanisms by which Dreg1 controls ILC2 development.

      (4) How Dreg1 exactly affects ILC2 differentiation remains unclear.

      We agree with the reviewer; however, this article is focused on the first description of the Dreg1 knockout mice and the surprisingly specific effect on ILC2 development.

      Reviewer #2 (Recommendations for the authors):

      (1) Relating to point 2 of public review:

      It should be shown that Dreg1 is required in ILC2 and their progenitors. Mixed bone marrow chimeras would be an adequate strategy.

      We have now done this and clearly shown that the effect is intrinsic to Dreg1-deficient ILC2s.

      (2) Relating to point 3 of public review:

      Minimally, Gata3 expression should be analyzed in ILC2, ILC2P, and the ILC progenitors by qRT-PCR and antibody stain.

      We have indeed shown reduced Gata3 levels by antibody stain in Figure 2H.

      (3) Relating to point 4 of public review:

      The manuscript would benefit from additional data studying ILC2 differentiation in (competitive) adoptive transfer experiments or using in vitro differentiation assays.

      We have performed the mixed bone marrow chimera experiments, which test the competitiveness of Dreg1-deficient bone marrow against wild-type control. In this case, the WT ILC2s outcompeted the Dreg1-deficient ILC2s for the same niche.

    1. Author response:

      eLife Assessment

      This valuable study reports a spatiotemporal atlas of mouse placental development and explores the role of glycogen trophoblast cells in fetal viability. Solid data are presented to support the main conclusion. This work will be of great interest to developmental and reproductive biologists.

      We thank the editors for this positive and balanced assessment of our study. We are encouraged that the spatiotemporal mouse placental atlas and the functional analysis of glycogen trophoblast cells were considered valuable, and that the data were viewed as providing solid support for the main conclusions.

      In the revised manuscript, we will further clarify the scope of these conclusions, particularly regarding the contribution of GC-associated glycogen metabolism to fetal viability in the global Ano6 knockout model. We will also refine the wording where needed to ensure that the mechanistic interpretation accurately reflects the strength of the available evidence.

      Public Reviews:

      Reviewer #1 (Public review):

      In this manuscript, the authors combine single-nucleus RNA sequencing with spatial transcriptomics to generate a spatiotemporal atlas of mouse placental development and explore the role of glycogen trophoblast cells in fetal viability. The study integrates several computational approaches, including trajectory analysis, regulatory network inference, and spatial mapping, together with histology and glycogen measurements. Based on these analyses, the authors propose that glycogen trophoblast cells provide metabolic support that is important for maintaining placental function and fetal survival.

      One of the main strengths of the study is the quality and scope of the dataset. The integration of snRNA-seq with Stereo-seq spatial transcriptomics provides a detailed view of placental organization across regions and developmental stages. This type of combined spatial and transcriptional analysis is still relatively rare in placental biology and represents an important contribution to the field. The atlas itself will likely be a valuable resource for future studies.

      Another strength is the effort to connect transcriptional findings with tissue-level validation. The glycogen staining and biochemical measurements support the interpretation that glycogen trophoblast cells contribute to placental metabolic function. The spatial analyses identifying macrophage accumulation in the labyrinth region of mutant placentas are also interesting and illustrate how spatial approaches can reveal microenvironmental changes that are difficult to detect otherwise.

      The main limitation of the study is that the conclusion that glycogen cells are essential mediators of metabolic support for fetal viability remains partly indirect. The transcriptomic and spatial data strongly suggest a role for these cells, but it is still difficult to determine whether glycogen cell dysfunction is the primary cause of fetal lethality or a consequence of broader placental abnormalities. Clarifying this point would strengthen the central message of the paper.

      Similarly, the macrophage accumulation observed in the labyrinth appears consistent with a response to tissue stress or injury, but its relationship to glycogen cell function is not fully explained. A clearer discussion of whether this represents a primary mechanism or a secondary effect would improve the interpretation.

      Overall, this is a strong dataset and a useful spatial atlas of placental development. The study provides convincing descriptive insight into glycogen trophoblast biology, and with some clarification of the mechanistic conclusions, the manuscript will be even stronger.

      We thank the reviewer for this constructive assessment of our manuscript. We are pleased that the reviewer recognized the quality and scope of the dataset, particularly the integration of snRNA sequencing with Stereo-seq spatial transcriptomics to generate a spatiotemporal atlas of mouse placental development. We also appreciate the reviewer’s view that this atlas represents a valuable resource for the placental biology and developmental biology communities. We also appreciate the reviewer’s important point that the causal relationship between glycogen trophoblast cell dysfunction, placental metabolic impairment, and fetal viability should be presented with appropriate caution. In the revised manuscript, we will clarify that our data support a strong association between impaired glycogen trophoblast cell function, altered placental glycogen metabolism, and fetal lethality in the global Ano6 knockout model, but do not by themselves establish glycogen trophoblast dysfunction as the sole or primary cause of fetal loss. We will revise the relevant sections to avoid overstatement and to distinguish more clearly between direct experimental evidence, correlative spatial-transcriptomic observations, and mechanistic interpretation. Similarly, we agree that the macrophage accumulation observed in the labyrinth region is most appropriately interpreted as a spatially localized immune or tissue-stress response in the mutant placenta. In the revised manuscript, we will expand the discussion to clarify that, while this observation may reflect downstream consequences of placental dysfunction and altered tissue homeostasis, the current data do not establish macrophage accumulation as a primary mechanism linking glycogen trophoblast defects to fetal lethality. We will therefore frame this finding as an important microenvironmental alteration revealed by the spatial atlas, rather than as definitive evidence of a direct causal pathway.

      Reviewer #2 (Public review):

      This manuscript constructs a spatiotemporal transcriptomic atlas (STAMP) of the mouse placenta from E9.5-E18.5 by integrating Stereo-seq and snRNA-seq, and identifies two glycogen trophoblast cell (GC) subtypes (GC-1 and GC-2), a spatial transition from the junctional zone (JZ) to the decidua, and metabolic defects in Ano6-null placentas including GC persistence, glycogen accumulation, reduced glycogenolysis metabolites, and partial rescue by maternal glucose supplementation. The breadth of the dataset and the integration of atlas construction with PAS/TEM/LC-MS analyses are impressive, and the study has the potential to provide a valuable resource for the placental biology community.

      However, in its current form, the central claim that "GC-mediated metabolic support is essential/indispensable for fetal viability" is not sufficiently disentangled from the complex phenotype of a global Ano6 knockout model. In addition, the stage-level biological replication in the atlas and the claim of "single-cell resolution" require more careful presentation. Therefore, while the study is interesting and potentially impactful, substantial revisions are required, particularly to recalibrate the strength of the conclusions and causal interpretations.

      Major comments

      (1) The most significant concern is that the manuscript overinterprets the phenotype observed in a global Ano6 knockout as direct evidence that GC glycogen metabolism is essential for fetal viability. The authors themselves report multiple severe placental abnormalities in the knockout, including reduced placental size and weight, structural defects in the labyrinth, impaired vascularization, and accumulation of abnormal regions. Previous studies cited in the manuscript also indicate that Ano6 deficiency leads to defects in syncytiotrophoblast formation, impaired maternofetal exchange, and perinatal lethality.

      In this context, the current data support an association between GC metabolic defects and fetal lethality, but do not establish that GC glycogen metabolism is the primary causal driver. The conclusion should therefore be moderated (e.g., "contributes to" rather than "is essential for"), unless additional placenta-specific or GC-specific functional validation is provided.

      (2) Maternal glucose supplementation is an interesting functional experiment, but in its current form, it provides supportive rather than definitive mechanistic evidence. While survival improves (from ~3% to ~10%), the rescue remains partial. Moreover, the readouts are largely limited to metabolite restoration (glucose, G1P, G6P) in the placenta and fetal liver.

      To support a stronger causal claim, the authors should assess whether glucose supplementation also rescues placental morphology (especially labyrinth structure), GC number and PAS staining, ultrastructural glycogen features (TEM), and fetal growth and developmental outcomes.

      (3) The atlas is constructed from nine placentas across developmental stages, suggesting limited biological replication per stage. It remains unclear how robust the observed temporal trends are to litter effects, sex differences, or sectioning variability.

      Furthermore, the "single-cell resolution" is not directly measured but inferred via image segmentation and reference-based mapping (e.g., TACCO). This should be more explicitly stated, as it represents computational inference rather than direct single-cell measurement.

      The authors should:

      - clearly report biological replicates per stage (including litter and sex),

      - demonstrate reproducibility of key patterns across independent samples,

      - refine the wording to reflect segmentation- and reference-based single-cell inference.

      (4) The proposed developmental trajectory (JZ progenitor → GC precursor → GC-1 → GC-2) and the claim of GC migration from JZ to decidua are based on spatial distribution and computational trajectory analyses (Monocle, CytoTRACE).

      While this is a compelling model, it remains inferential. The language throughout the manuscript should be softened (e.g., "consistent with spatial transition" rather than "migration"). Ideally, additional experimental validation, such as stage-resolved RNAscope/immunostaining quantification or lineage tracing, would strengthen this claim.

      (5) The manuscript concludes that ANO6 deficiency leads to impaired glycogen utilization, based primarily on the observation that differentiation markers and glycogenolytic enzyme transcripts are unchanged.

      However, this demonstrates what is not altered rather than what is mechanistically responsible for the defect. A more direct mechanistic link is needed, such as changes in enzyme activity, altered intracellular localization, or effects on ion homeostasis or membrane biology.

      (6) The statistical framework requires clarification. Several analyses use n = 4-8 placentas or "independent experiments," but it is unclear whether these represent independent litters or multiple samples from the same dam.

      Given the risk of pseudoreplication in placental studies, the authors should define whether n refers to placentas or litters, report the number of dams per genotype, and ensure appropriate statistical treatment (e.g., litter-based analysis or mixed-effects models).
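      The litter-based analysis mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis used in the manuscript: each record is a hypothetical (genotype, dam ID, measurement) tuple, the values are invented, and the function name `litter_means` is illustrative. The idea is to collapse placentas from the same dam into one litter mean so that n counts litters rather than placentas.

```python
from collections import defaultdict
from statistics import mean


def litter_means(records):
    """Collapse per-placenta measurements to one mean per (genotype, dam).

    records: iterable of (genotype, dam_id, value) tuples.
    Returns a dict mapping genotype -> list of litter means, ordered by dam ID.
    """
    grouped = defaultdict(list)
    for genotype, dam_id, value in records:
        grouped[(genotype, dam_id)].append(value)

    by_genotype = defaultdict(list)
    for (genotype, _dam_id), values in sorted(grouped.items()):
        by_genotype[genotype].append(mean(values))
    return dict(by_genotype)


# Hypothetical glycogen measurements (arbitrary units), several placentas per dam.
records = [
    ("WT", "dam1", 10.0), ("WT", "dam1", 12.0),
    ("WT", "dam2", 11.0),
    ("KO", "dam3", 20.0), ("KO", "dam3", 22.0), ("KO", "dam3", 24.0),
    ("KO", "dam4", 18.0),
]

summary = litter_means(records)
for genotype, means in summary.items():
    # n here is the number of litters, not the number of placentas.
    print(genotype, "n_litters =", len(means), "litter means =", means)
```

      Here seven placentas reduce to two litters per genotype, which is the effective sample size for a litter-based test. A mixed-effects model with dam as a random effect retains the per-placenta data while accounting for the same dependence, and is usually the better choice when litter sizes are unbalanced.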

      We thank the Reviewer for the careful evaluation of our manuscript and for recognizing the breadth of the STAMP dataset and the value of integrating spatial transcriptomics, snRNA-seq, PAS, TEM and LC-MS analyses.

      We agree that the current manuscript overstates some mechanistic conclusions. In the revision, we will moderate the central claim and more clearly acknowledge that the global Ano6 knockout model has complex placental defects.

      Comment 1: Causality in the global Ano6 knockout model

      We agree that our current data do not prove that GC glycogen metabolism is the primary cause of fetal lethality in the global Ano6 knockout model. In the revised manuscript, we will avoid presenting GC dysfunction as the sole causal mechanism. We will replace stronger terms such as “essential” or “indispensable” with more measured wording such as “contributes to” or “supports.” We will frame impaired GC-associated glycogen metabolism as one important component of Ano6-null placental dysfunction.

      Comment 2: Maternal glucose supplementation

      We agree that maternal glucose supplementation provides supportive, but not definitive, mechanistic evidence. In the revision, we will describe the partial survival rescue more cautiously and will not use it as proof of GC-specific causality. Where possible, we will also assess whether glucose supplementation affects additional phenotypes, including fetal growth, placental morphology, GC abundance and PAS/glycogen readouts.

      Comment 3: Biological replication and single-cell resolution

      We agree that the replication structure and the wording of “single-cell resolution” need clarification. We will report the number of placentas, litters and available sex information for each stage. We will also revise the wording to make clear that the spatial single-cell annotation is based on image segmentation and snRNA-seq reference mapping, rather than direct single-cell measurement by Stereo-seq alone.

      Comment 4: GC trajectory and spatial transition

      We agree that the proposed GC trajectory and JZ-to-decidua transition remain inferential. We will soften the language throughout the manuscript, using terms such as “spatial transition,” “redistribution,” or “consistent with migration” rather than stating that migration has been directly proven.

      Comment 5: Mechanism of impaired glycogen utilization

      We agree that unchanged GC markers and glycogenolytic enzyme transcripts do not reveal the direct mechanism. In the revision, we will state more clearly that these data argue against gross GC differentiation defects or transcriptional loss of glycogenolytic enzymes, but that the direct mechanism may involve enzyme activity, localization, ion homeostasis or ANO6-dependent membrane biology.

      Comment 6: Statistical framework

      We agree that the statistical framework needs clearer reporting. We will define what each n represents, including placenta, section, litter, dam or independent experiment, and will revise the analysis or description where needed to minimize concerns about pseudoreplication.

      Overall, we appreciate these comments and will use them to make the revised manuscript more precise, transparent and appropriately cautious.

  3. Apr 2026
    1. Author response:

      The following is the authors’ response to the current reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors present a new autofocusing method, LUNA (Locking Under Nanoscale Accuracy), designed to overcome severe focus drift, a major challenge in long-term time-lapse microscopy. Using this method, they address a fundamental question in bacterial cold shock response: whether cells halt growth and division following an abrupt temperature downshift. Through single-cell analysis, the authors uncover a multi-phase adaptation process with distinct growth deceleration dynamics, and show that bacterial cells adapt to cold shock in a largely uniform manner across the population. Overall, this work provides new insights into the bacterial cold shock response at the single-cell level, extending beyond what can be inferred from population-level measurements.

      Strengths:

      (1) The LUNA method shows improved performance compared to existing autofocusing systems, achieving nanoscale precision over a large focusing range. Its focusing speed is sufficient for the experiments presented, with potential for further improvement through faster motors and optimized control algorithms, suggesting broad applicability. Theoretical simulations and experimental validation together provide strong support for the method's robustness.

      (2) Using LUNA, the authors address a long-standing question in bacterial physiology: whether cells arrest growth and division during the acclimation phase following cold shock. Single-cell analyses across the full course of cold adaptation reveal features that are obscured in bulk-culture studies. Cells continue to grow and divide at reduced rates while maintaining cell size regulation, and exhibit a three-phase adaptation program with distinct growth dynamics. This response appears uniform across the population, with no evidence for bet-hedging. Overall, the experiments are well designed, and the analyses are solid and support the authors' conclusions.

      (3) The authors further propose a model describing how population-level optical density (OD) depends on cell dry mass density, volume, and concentration. Following cold shock, cells grow more slowly and exhibit smaller sizes, explaining the apparently unchanged OD. This model provides a valuable conceptual framework for interpreting OD-based growth measurements, a widely used method in microbiology, and will be of broad interest to the field.

      Weaknesses:

      No major weaknesses identified.

      Comments on revisions:

      The authors have thoroughly addressed all of my questions. I thank them for their clear clarifications and thoughtful revisions, and I greatly appreciate their efforts in improving the manuscript.

      We sincerely thank the reviewer for the encouraging comments and positive assessment. We greatly appreciate the reviewer’s constructive feedback during the review process, which helped us improve the manuscript.

      Reviewer #2 (Public review):

      Summary:

      This study presents LUNA, an autofocus method that compensates for focus drift during rapid temperature changes. Using this approach, the authors show that E. coli cells continue to grow and divide during cold shock, revealing a coordinated, multi-phase adaptation process that could not be deduced from traditional population measurements. They propose a scattering-theory-based model that reconciles the paradox between growth differences of the bacteria at the single-cell level vs population level.

      Strengths:

      (1) The LUNA approach is pretty creative, turning coma aberration from what is normally a nuisance into an exploit. LUNA enabled long-term single-cell imaging during rapid temperature downshifts.

      (2) The authors show that the long-assumed growth arrest during cold shock from population-level measurements is misleading. At the single-cell level, bacteria do not stop growing or dividing but undergo a continuous, three-phase adaptation process. Importantly, this behavior is highly synchronized across the population and not based on bet-hedging.

      (3) Finally, the authors propose a model to resolve a long-standing paradox between single-cell vs population behavior: if cells keep growing, why does optical density (OD) of the culture stop increasing? Using light-scattering theory, they show that OD depends not only on cell number but also on cell volume, which decreases after cold shock. As a result, OD can remain flat, or even decrease, despite continued biomass accumulation. This demonstrates that OD is not a reliable proxy for growth under non-steady conditions.

      Weaknesses:

      (1) While the authors theoretically explain the advantages of LUNA over existing autofocus methods, it is unclear whether practical head-to-head comparisons have been performed, apart from the comparison to Nikon PFS shown in Video S1. As written, the manuscript gives the impression that only LUNA can solve this problem, but such a claim would require more systematic and rigorous benchmarking against alternative approaches.

      (2) No mutants/inhibitors used to test and challenge the proposed model.

      (3) Cells display a high degree of synchronization, but they are grown in confined microfluidic channels under highly uniform conditions. It is unclear to what extent this synchrony reflects intrinsic biology versus effects imposed by the microfluidic environment.

      (4) To further test and generalize the model, it would be informative to also examine bacterial responses at intermediate temperatures rather than focusing primarily on a single cold-shock condition.

      Comments on revisions:

      The authors have addressed my comments in their response, but have chosen not to incorporate most of them into the manuscript. Readers may refer to the peer review section for further details.

      We thank the reviewer for these additional comments and careful suggestions, and we appreciate that the points raised are valuable for a broader discussion of the topic. In the revised manuscript, we have incorporated the comments most directly relevant to the scope and central conclusions of the study, and have clarified these points in the text where appropriate. Specifically, we have clarified several key issues, including the interpretation of the OD lag as a “combined effect,” the performance and application scope of LUNA, the alignment of cell-cycle progression after cold shock, and relevant methodological details.

      For the remaining contextual issues, we have kept the detailed discussion in the response to reviewers rather than expanding the manuscript extensively, so as to preserve the focus and readability of the main text. We hope that the revisions now better acknowledge the reviewer’s concerns while maintaining a concise presentation of the central findings.


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors developed a new autofocusing method, LUNA (Locking Under Nanoscale Accuracy), to address severe focus drift, a major challenge in time-lapse microscopy. Using this method, they tackle a fundamental question in bacterial cold shock: whether cells halt growth and division following an abrupt temperature downshift. Overall, the experimental design, modeling, and data analysis are solid and well executed. However, several points require clarification or further support to fully substantiate the authors' conclusions.

      Strengths:

      (1) The LUNA method outperforms existing autofocusing systems with nanoscale precision over a large focusing range. The focusing time is reasonable for the presented experiments, and the authors note potential improvements by using faster motors and optimized control algorithms, suggesting broad applicability. The theoretical simulations and experimental validation provide solid support for the robustness of the method.

      (2) Using LUNA, the authors address a long-standing question in bacterial physiology: whether cells arrest growth and division after an abrupt cold shock. Single-cell analyses monitoring the entire course of cold adaptation and steady-state growth reveal features that are obscured in bulk-culture studies: cells continue to grow at reduced rates with smaller cell sizes, resulting in an apparently unchanged population-level OD. The experiments are well designed and analyses are generally solid and largely support the authors' conclusions.

      (3) The authors also propose a model describing how population-level OD measurements depend on cell dry mass density, volume, and concentration. This provides a valuable conceptual contribution to the interpretation of OD-based growth measurements, which remain a gold-standard method in microbiology.

      We thank the reviewer for acknowledging the strengths of our study.

      Weaknesses:

      (1) It is unclear whether the authors' model explaining the population-level OD during acclimation is broadly applicable. Most analyses focus on a shift from 37˚C to 14˚C, where the model agrees well with experimental data. However, in the 37˚C to 12˚C experiment, OD600 decreases after cold shock (Fig. 5e), and the computed OD does not match the experimental measurements (Fig. S16a). Although the authors attribute this discrepancy to a "complicated interplay," no further explanation is provided, which limits confidence in the model's general applicability.

      Thank you for this careful evaluation of the model's generality. In the experiment with a temperature shift from 37°C to 12°C, the measured OD600 values were 0.243 at 0 hours and 0.242 at 5 hours. In comparison, our model-computed OD600 values were 0.243 at 0 hours and 0.271 at 5 hours. The absolute difference between the measured and computed values at 5 hours is therefore approximately 0.03.

      Given the typical experimental variability in OD600 measurements and the limited linear range of the OD-to-biomass approximation (generally considered reliable below ~0.5), this deviation is quantitatively modest. We appreciate your valuable feedback and are happy to provide further clarification if needed.

      (2) The manuscript proposes that cell-cycle progression becomes synchronized across the population after cold shock, but the supporting evidence is not fully convincing. If synchronization refers primarily to the uniform reduction in growth rate following cold shock, this could plausibly arise from global translation inhibition affecting all cells. However, the additional claim that "cells encountering a relatively late CSR will accelerate division to maintain synchronization" is not strongly supported by the presented data.

      We appreciate your critical reading, which has helped us identify ambiguities in our terminology and strengthen the clarity of our work. Regarding the term “synchronization”, we would like to clarify that it refers to two different scenarios: (i) the synchrony in the timing of growth rate changes after cold shock. The cells initiate the slowdown in growth almost simultaneously, suggesting a highly coordinated, non-stochastic population-level response to cold shock; (ii) the synchrony in division cycle progression.

      In the sentence you referenced “cells encountering a relatively late CSR will accelerate divisions to maintain synchronization”, we intended to describe that cells maintain consistent progression of the division cycle after cold shock, meaning that after the same number of elapsed cycles, different cells are at a similar stage in their division timing (Figure 4f, 4g, Figure S14). The term “accelerate” refers to our observation that cells which complete a given cycle later than others tend to have shorter subsequent inter-division intervals, thereby “catching up” to maintain alignment in cycle number across the population. We acknowledge that using “synchronization” in this scenario may be ambiguous, and we will replace it with more precise phrasing “progression of division cycle” to accurately convey this finding.

      (3) Several technical terms used in the method development section are not clearly defined and may be unfamiliar to a broad readership, which makes it difficult to fully understand the methodology and evaluate its performance. Examples include depth of focus, focusing precision, focusing time, focusing frequency, and drift threshold value. In addition, the reported average focusing time per location (~0.6 s) lacks sufficient context, limiting the reader's ability to assess its significance relative to existing autofocusing methods.

      Thank you for your valuable comments and suggestions. In response, we have added more detailed descriptions in the Methods section of the revised version.

      The reviewer noted that the reported average focusing time (~0.6 s) lacks sufficient context, which may limit readers’ ability to assess its significance relative to existing autofocusing methods. We would like to clarify that the core innovation of this work lies in the proposed theoretical framework for autofocusing, which offers advantages over existing methods in terms of focusing precision and range. While focusing time is a practically relevant performance metric, it is presented here as an implementation-dependent parameter rather than a central theoretical contribution of this study. In our experimental setup, an average focusing time of 0.6 s proved sufficient for routine time-lapse imaging in microscopy, thereby demonstrating the practical usability of LUNA.

      Reviewer #2 (Public review):

      Summary:

      This study presents LUNA, an autofocus method that compensates for focus drift during rapid temperature changes. Using this approach, the authors show that E. coli cells continue to grow and divide during cold shock, revealing a coordinated, multi-phase adaptation process that could not be deduced from traditional population measurements. They propose a scattering-theory-based model that reconciles the paradox between growth differences of the bacteria at the single-cell level vs population level.

      Strengths:

      (1) The LUNA approach is pretty creative, turning coma aberration from what is normally a nuisance into an exploit. LUNA enabled long-term single-cell imaging during rapid temperature downshifts.

      (2) The authors show that the long-assumed growth arrest during cold shock from population-level measurements is misleading. At the single-cell level, bacteria do not stop growing or dividing but undergo a continuous, three-phase adaptation process. Importantly, this behavior is highly synchronized across the population and not based on bet-hedging.

      (3) Finally, the authors propose a model to resolve a long-standing paradox between single-cell vs population behavior: if cells keep growing, why does optical density (OD) of the culture stop increasing? Using light-scattering theory, they show that OD depends not only on cell number but also on cell volume, which decreases after cold shock. As a result, OD can remain flat, or even decrease, despite continued biomass accumulation. This demonstrates that OD is not a reliable proxy for growth under non-steady conditions.

      We thank the reviewer for acknowledging the strengths of our study.

      Weaknesses:

      (1) While the authors theoretically explain the advantages of LUNA over existing autofocus methods, it is unclear whether practical head-to-head comparisons have been performed, apart from the comparison to Nikon PFS shown in Video S1. As written, the manuscript gives the impression that only LUNA can solve this problem, but such a claim would require more systematic and rigorous benchmarking against alternative approaches.

      Thank you for your insightful comment regarding the comparison of LUNA with other autofocus methods.

      In our study, we primarily compared LUNA with the Nikon PFS system (as shown in Video S1) because Nikon PFS is one of the most widely used commercial autofocus systems in single-cell time-lapse imaging, and its manufacturer provides well-defined performance parameters (e.g., focusing precision within 1/3 depth-of-focus, response time <0.7 s), which facilitates a quantitative comparison. For other commercial systems, such as Olympus ZDC, Zeiss Definite Focus, Leica AFC, and ASI CRISP, the publicly available specifications are often less clearly defined, or are measured under inconsistent conditions, making a direct head-to-head comparison challenging and potentially misleading. Additionally, in our preliminary experiments, we also tested an Olympus microscope and observed severe focus drift during slow cooling processes. From a physical perspective, LUNA is specifically designed to meet the demanding requirements of single-cell experiments, including a wide focusing range and high precision, while existing commercial systems may not physically achieve the combination of range and accuracy needed for such extreme conditions.

      (2) No mutants/inhibitors used to test and challenge the proposed model.

      We agree that such approaches would provide valuable mechanistic insights and further strengthen the validation of the model presented in this study. In the current work, our primary goal was to introduce the LUNA autofocusing method and demonstrate its capability to resolve the bacterial cold shock response at the single-cell level with unprecedented precision. As such, we focused on characterizing the wild-type physiological dynamics under cold shock, which already revealed several previously unreported phenomena. We acknowledge that the use of genetic mutants or chemical inhibitors targeting specific cold shock proteins or regulatory pathways would be a logical and powerful next step to dissect the underlying molecular mechanisms and test the causality of the observed growth dynamics. We plan to address this in future work by incorporating such perturbations to further test and refine the model.

      (3) Cells display a high degree of synchronization, but they are grown in confined microfluidic channels under highly uniform conditions. It is unclear to what extent this synchrony reflects intrinsic biology versus effects imposed by the microfluidic environment.

      The reviewer raises a pertinent question regarding whether the observed high degree of cell synchronization represents an intrinsic biological phenomenon or an artifact induced by the microfluidic environment.

      Over the past decade, microfluidic chips, including the specific design used in our work, have become a widely accepted and powerful tool in microbial physiology research. A broad consensus has emerged within the community that the microenvironment within these microchannels does not significantly interfere with or perturb the natural physiological behavior of microorganisms (Dusny, C. & Grünberger, Curr Opin Biotechnol. 63, 26-33 (2020)). This understanding is also supported by the fact that key findings obtained with microfluidic single-cell technologies are reproducible by other methods. For example, the adder model of cell-size homeostasis in E. coli, first observed in microfluidic chips, has been repeatedly validated by different methods (Taheri-Araghi, S. et al. Curr. Biol. 25, 385-391 (2015)). Therefore, while we acknowledge the importance of considering environmental effects, we are confident that the synchronization we report reflects the genuine biological dynamics of E. coli cells.

      (4) To further test and generalize the model, it would be informative to also examine bacterial responses at intermediate temperatures rather than focusing primarily on a single cold-shock condition.

      We thank the reviewer for this thoughtful suggestion. In designing our experiments, we aimed to study the bacterial cold shock response at the single-cell level. A key feature of this response is that it is typically triggered only when the temperature drops below a certain threshold within a short time duration. We therefore chose to lower the temperature from 37 °C to 14 °C as rapidly as possible. This approach allowed us to leverage the unique capabilities of LUNA while also providing an opportunity to explore this biological process in greater detail.

      We agree that investigating bacterial responses across intermediate temperatures would be highly informative for understanding how temperature changes affect cellular physiology. However, this direction addresses a distinct scientific question that lies beyond the scope of the current work. We fully acknowledge its value and intend to explore it in future studies.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Major points:

      (1) To strengthen the generality of the conclusions regarding cold shock response, it would be helpful to include a similar single-cell analysis of growth and division (cell size and concentration) for the 37˚C to 12˚C temperature shift. In this case, the experimental acclimation lasts ~5 hours, whereas the model predicts ~2 hours (Fig. S16a). Examining whether the model still holds or whether additional factors (e.g., further reductions in cell size) contribute to the observed OD decrease would clarify this discrepancy.

      We thank the reviewer for this valuable suggestion. Our model for explaining the population-level OD dynamics during acclimation does not depend on single-cell time-lapse microscopy data. Instead, the single-cell inputs used for parameterization were obtained from flow cytometry measurements, which quantify population-wide single-cell distributions. Therefore, the model is not intrinsically restricted to a specific imaging-based experimental setup or to a particular temperature shift.

      Most of the quantitative analysis presented in the manuscript focuses on the 37°C to 14°C transition, where the model shows strong agreement with experimental OD measurements. We selected this condition because it provides high-quality, internally consistent datasets at both the single-cell and population levels. However, the modeling framework itself is mechanistic and parameter-based, rather than temperature-specific. In principle, it can be applied to other temperature shifts, provided that the corresponding single-cell growth and state-transition parameters are experimentally determined.

      Regarding the temperature shift from 37°C to 12°C, the model demonstrates good agreement with the experimental observation that acclimation lasts approximately 5 hours. The minor deviations in several data points during the acclimation period can be attributed to systematic errors in the measurement of cell concentration and volume, as illustrated in the lower panel of Figure S16a. We are open to extending our analysis to additional temperature shifts in future work to further validate the model’s generality.

      (2) Related to weakness #2, it would be helpful for the authors to clarify their definition of "synchronization" and to provide additional explanation or evidence supporting this claim. In particular, further discussion of the data in Fig. 4f, 4g, and S14 could help strengthen the proposed hypothesis.

      We thank the reviewer for this constructive suggestion. In our previous response (public review weakness #2), we clarified the definition of “synchronization” in the revised manuscript by explicitly distinguishing between two types of synchrony: (i) the synchrony in the timing of growth rate changes after cold shock, and (ii) the synchrony in division cycle progression. For the latter, we now use the more precise term “progression of division cycle” to avoid ambiguity. Furthermore, we have expanded the discussion of the data in Figures 4f, 4g, and S14 to better support the claim that cells actively maintain alignment in cycle progression. We hope these revisions address the reviewer’s concern and strengthen the evidence for our hypothesis.

      Minor points:

      (1) Line 78: "... and concluded that the OD lag is actually the outcome of the synergy of changes in bacterial concentration and volume, ..." The term synergy usually implies a combined effect greater than the sum of individual effects. Are the changes in bacterial concentration and volume synergistic here?

      We agree with your observation that the term "synergy" in scientific contexts typically implies an interaction effect that is greater than the sum of individual effects. In our original phrasing, we intended to convey that the observed OD lag is a result of the combined contributions from both changes in bacterial concentration and changes in cell volume, rather than being dominated by a single factor. We did not mean to imply a super-additive interaction between these two variables.

      We acknowledge that the relationship between bacterial concentration and cell volume can be complex and may even exhibit interdependence under certain conditions (e.g., under nutrient limitation at high OD). However, using "synergy" could indeed be misleading. To ensure terminological precision and avoid any potential misinterpretation, we will revise the text in the revised manuscript. We will replace "synergy" with a more neutral and accurate phrase "combined effect".

      (2) Figure 2d: Why does the focusing time increase even after temperature stabilizes following the downshift? Does focus drift depend not only on rapid cooling but also on the lower steady-state temperature? Additional explanation would be helpful.

      As noted in the Methods section ("Time-lapse imaging of bacteria under CS"), the objective lens heater was stopped when the temperature was lowered, which caused a slightly longer focusing time. Prior to the temperature downshift, the objective heater maintained the objective at a temperature close to that of the sample (37°C), minimizing any thermal gradient between them. After the temperature decrease to 14°C, the sample chamber was precisely controlled at the target low temperature, while the objective lens, now without active heating, gradually equilibrated to ambient room temperature (approximately 22–25°C). This created a stable temperature mismatch between the relatively warmer objective and the colder sample. Such a temperature gradient can cause minor thermal expansion or contraction of the objective lens barrel, leading to a small but persistent shift in the focal plane. Consequently, the focusing time remained slightly elevated (∼0.6 s) compared to the 37°C condition (∼0.3 s), even after the sample temperature had stabilized. This offset reflects the steady-state thermal disequilibrium between the objective and the sample, rather than a transient cooling effect. We hope this explanation clarifies the reviewer’s concern.

      (3) Line 234: "Reanalysis of the protein synthesis dynamics after CS revealed increase in CSPs synthesis (Figure 3e)." A citation is needed here. Additionally, the dataset referenced here was generated using a 37˚C to 10˚C cold shock.

      We thank the reviewer for the insightful comments and the careful reading of our manuscript. We have now added the appropriate citation in the main text (Zhang, Y. et al. Molecular Cell 70, 274–286 (2018)). The dataset used in this reanalysis was generated under a 37°C to 10°C cold shock, rather than 12°C, and we have clarified this in the Methods section to avoid any ambiguity.

      We would also like to clarify our rationale for using this published dataset in the present context. To our knowledge, no published dataset exists with comparable protein synthesis dynamics specifically at 12°C. Our intention here was to reference a well-characterized cold-shock dataset to support the qualitative point that CSP synthesis increases and ribosome synthesis decreases after cold shock. In cold shock studies, many qualitative conclusions are broadly consistent across low-temperature conditions (e.g., below ~15°C, and in some cases more broadly below ~20°C), including the observation that the ribosomal protein fraction is relatively insensitive to temperature change (Herendeen, S. L. et al. Journal of Bacteriology. 139, 185–194 (1979), Knapp, B. D. & Huang, K. C. Annual Review of Biophysics. 51, 499–526 (2022)). We appreciate the reviewer’s valuable feedback, which has helped us improve the clarity and accuracy of our work.

      (4) Figure 3f and 3g: How is growth rate defined here, and why do the elongation rate and growth rate yield different results? My understanding is that, during steady-state growth, cell elongation rate increases as cells progress through a single cell cycle prior to division, whereas G0 cells exhibit reduced elongation rate following cold shock. Is this correct? More explanation is also needed for "linear growth in growth mode" (Line 267).

      Thank you for this important comment. In our manuscript, we use:

      Elongation rate = dL/dt (the absolute rate of increase in cell length; y-axis in Figure 3f)

      Growth rate = (dL/dt)/L (i.e., λ, y-axis in Figure 3g; also referred to in some studies as the instantaneous growth rate)

      Because these are different quantities, they do not necessarily follow the same trend across the cell cycle. To clarify the logic behind our “growth mode” classification (also see Willis & Huang, Nat Rev Microbiol 2017):

      For a rod-shaped cell growing in length L,

      (1) Exponential growth means the elongation rate is proportional to cell size, i.e.,

      𝑑𝐿/𝑑𝑡 ∝ 𝐿

      or equivalently,

      (𝑑𝐿/𝑑𝑡)/𝐿 = constant

      (2) Linear growth means the elongation rate is constant throughout the cell cycle, i.e.,

      𝑑𝐿/𝑑𝑡 = constant

      which implies that

      (𝑑𝐿/𝑑𝑡)/𝐿

      decreases as the cell elongates.

      Based on these two basic cases, additional growth modes (e.g., super-exponential, sub-exponential, sub-linear) can also be defined, as illustrated in the Author response image 1.

      Author response image 1.

      With this definition, our interpretation of Figure 3f and 3g is as follows: before cold shock, cells are consistent with approximately exponential growth (red line in Figure 3g), whereas after cold shock, the G0 cells are better described as undergoing approximately linear growth (yellow line in Figure 3f).
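      The distinction between the two growth modes can be illustrated numerically. The following is a minimal sketch on synthetic length traces, not the measured data: the birth length, cycle duration, and sampling interval are hypothetical values chosen only so that both traces double in length over one cycle.

```python
import numpy as np

def rates(t, L):
    """Return elongation rate dL/dt and growth rate (dL/dt)/L for a length trace."""
    dLdt = np.gradient(L, t)
    return dLdt, dLdt / L

t = np.linspace(0.0, 60.0, 601)   # minutes (hypothetical time axis)
L0 = 2.0                          # hypothetical birth length, micrometres

# Exponential mode: dL/dt is proportional to L, so L doubles over the cycle.
k = np.log(2.0) / 60.0
L_exp = L0 * np.exp(k * t)

# Linear mode: dL/dt is constant, chosen so that L also doubles over the cycle.
v = L0 / 60.0
L_lin = L0 + v * t

elong_exp, lam_exp = rates(t, L_exp)
elong_lin, lam_lin = rates(t, L_lin)

# Skip the two end points, where np.gradient uses one-sided differences.
inner = slice(1, -1)
print("exponential: spread of growth rate    =", lam_exp[inner].max() - lam_exp[inner].min())
print("linear:      spread of elongation rate =", elong_lin[inner].max() - elong_lin[inner].min())
print("linear:      growth rate at start, end =", lam_lin[1], lam_lin[-2])
```

      In the exponential trace the growth rate stays flat while the elongation rate rises; in the linear trace the elongation rate stays flat while the growth rate falls as the cell lengthens.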

      (5) Figure S12: Why are the curves not continuous across GN, G0, G1, and G2?

      In this figure, we present two different metrics: elongation rate (𝑑𝐿/𝑑𝑡) in panel (a) and growth rate (𝜆 = (𝑑𝐿/𝑑𝑡)/𝐿) in panel (b). During bacterial division, the cell length approximately halves while the growth rate remains constant under steady-state conditions. As a result, elongation rate, which is proportional to the instantaneous length, also halves at each division event, leading to the observed discontinuities at the time points corresponding to divisions (GN, G0, G1, and G2). In contrast, growth rate is inherently continuous across divisions, as shown in panel (b), although minor apparent discontinuities may appear due to the finite temporal resolution of our measurements. We hope this explanation clarifies the figure.
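      A minimal sketch of why only the elongation rate jumps at division, assuming idealized exponential growth with exact halving at division. The growth rate, birth length, division interval, and sampling interval are hypothetical, and the rates are taken from the analytical expression rather than from finite differences across the division event.

```python
import numpy as np

lam = np.log(2.0) / 30.0   # hypothetical constant growth rate, 1/min
L_birth = 2.0              # hypothetical birth length, micrometres
dt = 0.5                   # hypothetical sampling interval, min

t = np.arange(0.0, 90.0, dt)
age = np.mod(t, 30.0)                 # time since the most recent division
L = L_birth * np.exp(lam * age)       # length resets to L_birth at each division

elongation = lam * L                  # dL/dt for exponential growth
growth_rate = elongation / L          # (dL/dt)/L

i = int(np.searchsorted(t, 30.0))     # first sample of the second cycle
ratio = elongation[i] / elongation[i - 1]
print("elongation rate just after / just before division:", ratio)
print("growth rate constant across divisions:", np.allclose(growth_rate, lam))
```

      With finite sampling the ratio is close to, but not exactly, one half, because the last sample before division is taken slightly before the cell reaches its final length.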

      (6) Figure 4d: X-axis labels are missing.

      Thank you for your insightful comment. The six panels share identical axes in Figure 4d. To enhance the visual focus on the data trends across different generations, we intentionally displayed the X-axis label and numerical tick labels only on the first panel. The subsequent panels show only the tick marks without the numerical labels, as their scale is identical to that of the first panel.

      (7) Line 285 and Figure 4e: "The changes in λ are highly synchronized in time, with the exact time lag between any pair of ξ not exceeding 2 min ..." What is the definition of time lag?

      In our study, the term "time lag" refers to the absolute difference between the times at which the large, sudden drop in the λ curve occurs for any pair of ξ. Essentially, it quantifies how closely the dynamic changes in λ are aligned across different groups. A time lag of zero would indicate perfect synchrony, while a value within 2 minutes implies that the variations in λ for any pair of ξ occur nearly simultaneously.
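      One way to compute such a time lag is sketched below. It is an illustration under stated assumptions, not the procedure used in the manuscript: the drop is located as the largest single-step decrease in λ, and the curves, their labels, and the 1-min sampling are all hypothetical.

```python
import numpy as np
from itertools import combinations

def drop_time(t, lam):
    """Time of the sample reached by the largest single-step decrease in lam."""
    step = np.diff(lam)
    k = int(np.argmin(step))          # index of the most negative step
    return float(t[k + 1])

def max_pairwise_lag(t, curves):
    """Drop time per curve and the largest absolute difference over all pairs."""
    times = {name: drop_time(t, lam) for name, lam in curves.items()}
    lags = [abs(times[a] - times[b]) for a, b in combinations(times, 2)]
    return times, max(lags)

t = np.arange(0.0, 60.0, 1.0)         # minutes, hypothetical 1-min sampling

# Hypothetical lambda curves whose sudden drops occur at 20, 21 and 22 min.
curves = {
    "xi_1": np.where(t < 20, 0.030, 0.005),
    "xi_2": np.where(t < 21, 0.030, 0.005),
    "xi_3": np.where(t < 22, 0.030, 0.005),
}

times, lag = max_pairwise_lag(t, curves)
print(times, lag)
```

      On noisy data a smoothed derivative or a change-point method would likely be more robust than a single-step difference.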

      (8) Figure S14: Why can the elapsed cycles take negative values?

      In Figure S14, we plotted the centered values. Specifically, at each time point, we calculated the mean elapsed cycle number across all lineages, and then subtracted this mean from each group’s value. The resulting values are presented in the figure as “Elapsed cycles (zero-centered)”. Thus, negative values are expected and meaningful: they represent lineages that are progressing more slowly than the average at that time point. This transformation helps to highlight the relative differences among groups over time, while removing the overall temporal trend (which is already shown in Figure 4g).
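      The centering step can be sketched as follows. The array values are made up for illustration; rows are taken to be time points and columns lineage groups, which is an assumed layout.

```python
import numpy as np

# Rows are time points, columns are lineage groups (hypothetical values).
elapsed = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 0.8, 1.2],
    [2.1, 1.5, 2.4],
])

# Mean across groups at each time point, kept as a column for broadcasting.
mean_per_time = elapsed.mean(axis=1, keepdims=True)
centered = elapsed - mean_per_time

print(centered)
```

      Each row of the centered array sums to zero, so any group that lags the average at a given time point necessarily takes a negative value.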

      (9) Figure 5 legend: Fitting for the acclimation has a R2 of -0.263 (Pearson correlation coefficient -0.00). R^2 should not be negative, and it doesn't agree with the calculated Pearson correlation coefficient.

      Thank you for this important observation. Indeed, R<sup>2</sup> should normally fall within the range [0, 1]. This discrepancy arises because the fitting model used differs from the default linear regression, and we did not specify this in the original figure legend. In the revised manuscript, this has been corrected. The explanation why R<sup>2</sup> is negative here is as follows:

      The linear fit used is y = a·x (i.e., no intercept, forced through the origin). This is based on the physical principle that when OD is zero (no bacteria), the total bacterial mass must also be zero. For ordinary least-squares models with an intercept, R<sup>2</sup> ranges from 0 to 1. For no-intercept models this guarantee no longer holds when R<sup>2</sup> is computed as 1 − SS<sub>res</sub>/SS<sub>tot</sub> with the total sum of squares (SS<sub>tot</sub>) taken about the mean of y: R<sup>2</sup> becomes negative whenever the origin-constrained fit performs worse than the horizontal line at the mean of y. Here, R<sup>2</sup> = -0.263 simply indicates that, for these specific data points, the origin-constrained linear fit does not outperform that constant baseline. Regarding the Pearson correlation: the near-zero coefficient (-0.00) suggests no significant linear trend between X and Y, which is consistent with the poor fit performance.
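      The response does not state which definition of SS<sub>tot</sub> the fitting software applied, so the sketch below computes both common conventions for an origin-constrained least-squares fit. The data are made up (y fluctuates around 1 with no trend in x) and the function names are illustrative. With SS<sub>tot</sub> taken about the mean of y, R<sup>2</sup> can be negative; with SS<sub>tot</sub> taken about zero, a least-squares slope cannot give a negative value.

```python
import numpy as np

def fit_through_origin(x, y):
    """Least-squares slope a for the no-intercept model y = a * x."""
    return float(np.dot(x, y) / np.dot(x, x))

def r2_centered(y, y_hat):
    """R^2 with the total sum of squares taken about the mean of y."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def r2_uncentered(y, y_hat):
    """R^2 with the total sum of squares taken about zero."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum(y ** 2)
    return float(1.0 - ss_res / ss_tot)

# Hypothetical data with no linear trend between x and y.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.0, 1.1, 0.9, 0.9, 1.1])

a = fit_through_origin(x, y)
y_hat = a * x
pearson = float(np.corrcoef(x, y)[0, 1])

print("slope:", a)
print("R^2 (centered):  ", r2_centered(y, y_hat))
print("R^2 (uncentered):", r2_uncentered(y, y_hat))
print("Pearson r:", pearson)
```

      A negative centered R<sup>2</sup> together with a Pearson coefficient near zero is the pattern expected when the data show no trend and the line is nevertheless forced through the origin.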

      (10) Language and typos: The manuscript contains grammatical errors and typos that require careful proofreading (one example: Line 56 "..., and reflection-based approaches ...").

      We thank the reviewer for the careful reading and for drawing our attention to the language and typographical issues in the manuscript. In the revised version, we will carefully proofread the entire text and correct any errors and inconsistencies, including the example pointed out in line 56.

      Reviewer #2 (Recommendations for the authors):

      (1) The LUNA section is extremely technical and advanced for most biologists - it might be useful to include a few sentences in simple language why LUNA helps solve the biology question.

      We thank the reviewer for the valuable suggestion. We have now added a concise, plain-language overview at the end of the LUNA section (Performance Analysis of LUNA):

      “In brief, LUNA locks the focal plane with nanometer-scale precision over an ultra-large range rapidly, ensuring stable focus during long-term imaging for reliable observation of fine subcellular structures and dynamics.”

      (2) The suggestions I included in the weakness section are not mandatory to perform, but will be helpful to at least discuss in the paper.

      We thank the reviewer for the thoughtful comment and for acknowledging that the suggestions in the weakness section are not mandatory. We have carefully considered each point raised and have provided detailed responses in the point-by-point reply. While we recognize the potential value of these suggestions for further expanding the study, we respectfully believe that incorporating them into the current manuscript would go beyond the intended scope of this work.

      Thanks

      Otherwise, great job with the paper!

      We are truly grateful to the reviewer for the encouraging feedback and appreciate the time and effort invested in improving our manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) Concerns persist regarding the interpretation of data and the validation of experiments. First, the presence of T cells, NKT cells, and neutrophils in both the control and METH-treated hippocampi suggests that blood contamination rather than immune cell infiltration is the cause. Since the authors claim that METH disrupts the blood-brain barrier, increasing the infiltration of these immune cells, identifying the source of these immune cells is critical.

      We sincerely appreciate the valuable suggestions you have provided. Your professional perspective impresses us. Based on your suggestion, we conducted a systematic review and in-depth analysis of the experimental process.

      As you point out, the T cells, NKT cells and neutrophils detected by single-cell sequencing of the mouse hippocampus may be blood-derived. However, this does not mean that the presence of these cell types in the control group is abnormal, because many published studies also report these cells in the hippocampus of control mice. Nevertheless, clarifying the origin and location of these cells would further strengthen the research hypothesis. Although the role of such cells has not yet been systematically discussed in the field of methamphetamine neurotoxicity, we believe that these findings still have reference value for subsequent research in this field.

      Our response is based on the following description:

      (1) Insufficient perfusion during the extraction of the hippocampus may lead to a certain degree of blood contamination.

      The single-cell sequencing technique used in this study detects mRNA from the entire cell. To keep the cells in an optimal physiological state and to minimize the stress response caused by experimental handling, we perfused the anesthetized mice with cold PBS for approximately 3 min (this has been added to the Materials and methods, Lines 165-166), completed the dissection and collection of the hippocampus on ice within 2 min, and immediately placed the tissue in tissue preservation solution. The perfusion time or volume may have been insufficient, so that not all blood was expelled. All subsequent tissue dissociation steps were carried out in preservation solution or PBS buffer, which to some extent reduced the potential interference of blood components with the experimental results. Additionally, T cells, NKT cells and neutrophils in the capillary perivascular spaces of the hippocampal tissue may have remained, been captured, and been reflected in the final sequencing data.

      (2) The presence of T cells, NKT cells, and neutrophils in the brain tissue of normal mice has been supported by existing literature. Moreover, several studies have specifically described the localization of these immune cell types within the brain parenchyma.

      Contemporary studies have completely changed the view of brain immunity from envisioning the brain as isolated and inaccessible to peripheral immune cells to an organ in close physical and functional communication with the immune system for its maintenance, function, and repair. Circulating immune cells reside in special niches in the brain’s borders, the choroid plexus, meninges, and perivascular spaces, from which they patrol and sense the brain in a remote manner [1].

      A large-scale mouse brain cell atlas study also reported that approximately 8% of non-neuronal cells are immune cells, including microglia, boundary-associated macrophages, lymphocytes, dendritic cells, and monocytes [2].

      Hang Yao et al. demonstrated through flow cytometry that neutrophils were present in the hippocampal tissues of both healthy control mice and depressed mice (Fig.2 H) [3]. Wei Su et al. identified through single-cell sequencing that dendritic cells, neutrophils, macrophages, T cells, and NKT cells were present in the brain tissues of non-transgenic (Non-Tg) control mice (Fig.1a-b), and the localization of these cells was explicitly characterized as brain parenchyma in the study [4]. Tomomi M Yoshida et al. discovered through immunohistochemistry (IHC) and single-cell sequencing techniques that there were a certain number of CD3+ and CD4+ T cells in the hippocampus and other regions of the brain, and they observed that these cells were located outside the blood vessels (Fig.1a-c, g) [5].

      (3) Both the analysis of immune cells within blood vessels and those in the brain parenchyma contribute to elucidating the immune effects in the hippocampal microenvironment under chronic METH exposure, as well as their interactions with other cell types. At present, understanding of the relationship between methamphetamine neurotoxicity and the immune system is still largely limited to the central resident immune cells, such as microglia, astrocytes and oligodendrocytes [6]. Adaptive immune cells and myeloid cells recruited from the circulation have also been implicated in brain development, function, and aging. Their depletion during developmental stages can disrupt critical neural processes, including glial cell maturation, neuronal activity, and myelinogenesis. However, the precise developmental stage at which lymphocyte infiltration into the central nervous system occurs remains to be elucidated [7].

      Our data indicate that during chronic METH abuse, T cells are more active and participate in the regulation of cytokines through complement signaling. At the same time, the frequency of cell communication between endothelial cells and epithelial cells is increased. Moreover, microglia upregulated cell chemotaxis and migration processes, as well as communication with immune cells such as T cells, which to some extent also suggests enhanced infiltration of T cells. However, we also recognize that the current conclusions regarding immune cell infiltration, being based on sequencing data and literature reports, lack direct experimental support. We are currently conducting morphological analysis using the same batch of brain tissue samples to further validate these findings.

      Immunofluorescence staining and flow cytometry can be used to further determine the locations of these immune cells in the hippocampus. The classical routes through which peripheral immune cells enter the brain mainly include the BBB and the choroid plexus. In June 2025, Kim N. Green et al. published a study in Neuron further revealing that, during development and in inflammatory disease, immune cells can also infiltrate the brain parenchyma through a newly identified channel, the medial ventricle, thereby further confirming that these cells can migrate to the central nervous system under specific physiological or pathological conditions [8].

      (1) Castellani G, Croese T, Peralta Ramos JM, Schwartz M. Transforming the understanding of brain immunity. Science. 2023 Apr 7;380(6640):eabo7649. doi: 10.1126/science.abo7649.

      (2) Zhang M, Pan X, Jung W, Halpern AR, Eichhorn SW, Lei Z, Cohen L, Smith KA, Tasic B, Yao Z, Zeng H, Zhuang X. Molecularly defined and spatially resolved cell atlas of the whole mouse brain. Nature. 2023 Dec;624(7991):343-354. doi: 10.1038/s41586-023-06808-9.

      (3) Yao H, Jiang SY, Jiao YY, Zhou ZY, Zhu Z, Wang C, Zhang KZ, Ma TF, Hu G, Du RH, Lu M. Astrocyte-derived CCL5-mediated CCR5+ neutrophil infiltration drives depression pathogenesis. Sci Adv. 2025 May 23;11(21):eadt6632. doi: 10.1126/sciadv.adt6632.

      (4) Su W, Saravia J, Risch I, Rankin S, Guy C, Chapman NM, Shi H, Sun Y, Kc A, Li W, Huang H, Lim SA, Hu H, Wang Y, Liu D, Jiao Y, Chen PC, Soliman H, Yan KK, Zhang J, Vogel P, Liu X, Serrano GE, Beach TG, Yu J, Peng J, Chi H. CXCR6 orchestrates brain CD8+ T cell residency and limits mouse Alzheimer's disease pathology. Nat Immunol. 2023 Oct;24(10):1735-1747. doi: 10.1038/s41590-023-01604-z.

      (5) Yoshida TM, Nguyen M, Zhang L, Lu BY, Zhu B, Murray KN, Mineur YS, Zhang C, Xu D, Lin E, Luchsinger J, Bhatta S, Waizman DA, Coden ME, Ma Y, Israni-Winger K, Russo A, Wang H, Song W, Al Souz J, Zhao H, Craft JE, Picciotto MR, Grutzendler J, Distasio M, Palm NW, Hafler DA, Wang A. The subfornical organ is a nucleus for gut-derived T cells that regulate behaviour. Nature. 2025 Jul;643(8071):499-508. doi: 10.1038/s41586-025-09050-7.

      (6) Shi S, Sun Y, Zan G, Zhao M. The interaction between central and peripheral immune systems in methamphetamine use disorder: current status and future directions. J Neuroinflammation. 2025 Feb 15;22(1):40. doi: 10.1186/s12974-025-03372-z.

      (7) Castellani G, Croese T, Peralta Ramos JM, Schwartz M. Transforming the understanding of brain immunity. Science. 2023 Apr 7;380(6640):eabo7649. doi: 10.1126/science.abo7649.

      (8) Hohsfield LA, Kim SJ, Barahona RA, Henningfield CM, Mansour K, Vallejo KD, Tsourmas KI, Kwang NE, Ghorbanian Y, Angulo JAA, Gao P, Pachow C, Inlay MA, Walsh CM, Xu X, Lane TE, Green KN. Identification of the velum interpositum as a meningeal-CNS route for myeloid cell trafficking into the brain. Neuron. 2025 May 28:S0896-6273(25)00351-4. doi: 10.1016/j.neuron.2025.05.004.

      (2) Secondly, the pseudotime analysis, which suggests altered neural stem cell (NSC) differentiation, is not conclusively supported by the current data and requires further validation.

      We sincerely appreciate your valuable feedback, which we find highly relevant and constructive. It is important to acknowledge that the sequencing data presented in our study currently lacks experimental validation. Nevertheless, considering that existing research on the effects of METH on neural stem cell differentiation predominantly emphasizes observational phenomena and remains limited in terms of in vivo experimental evidence and mechanistic investigations, we aim to contribute our analytical findings as a reference for further scholarly exploration in this field.

      Our study utilized pseudotime analysis (powered by Monocle2) to reconstruct an "imaginary timeline" (pseudo-time) based on intercellular gene expression similarities, thereby modeling the dynamic state transitions of cells during continuous biological processes. Drawing upon single-cell RNA sequencing data captured as "snapshots" from hippocampal astrocytes, neural stem cells, and neuroblasts in mice four weeks after METH exposure, we applied computational algorithms to integrate the originally discrete cellular states into a continuous pseudo-time trajectory. This approach was employed to elucidate the differentiation stages of these cell populations, identify potential branching points in their developmental pathways, and uncover the key regulatory genes driving the differentiation process. Pseudotime analysis, as a computational approach grounded in mathematical modeling, yields inferences that are contingent upon the underlying assumptions of the algorithms employed. Consequently, experimental validation through methodologies such as time-series sampling and lineage tracing is essential to substantiate the derived biological interpretations. In light of the insufficiency of such empirical verification to date, our conclusions concerning alterations in the dynamic behavior of neural stem cell differentiation remain preliminary and require further experimental support.

      In Figures 5C and 5F, we present the expression profiles of the four genes exhibiting the most statistically significant differences across the differentiation trajectory. In Figures 5B and 5E, we conducted GO and KEGG functional enrichment analyses on the genes that showed significant differential expression at different differentiation stages. While no studies within the current METH research domain have reported on the potential effects of these genes on neural stem cell differentiation, emerging evidence from related fields provides preliminary insights into their functional roles. For instance, the Flt1 gene (also known as VEGFR1), which encodes a vascular endothelial growth factor receptor, has been demonstrated to play a critical role in the conversion of Müller glial cells into neurons within the zebrafish retina [1] and serves as a critical regulator in promoting definitive neural stem cell survival [2]. This is consistent with the intricate interconnection between neurons, neural stem cells, and vascular cells identified in our cell communication analysis. The Hspb1 gene plays a significant role in ferroptosis and autophagy processes of nerve cells [3, 4] and may be closely related to the self-renewal ability of neural stem cells; METH may impair neural stem cell function by disrupting autophagy, leading to reduced self-renewal capacity and altered differentiation potential [5]. In the METH group, Sox11 has been shown to play a critical role in early differentiation and neuronal growth, both during perinatal development and in adult neurogenesis [6]. The Fos gene plays a critical regulatory role in the differentiation of neural stem cells into neurons and in modulating neuronal functional activities [7]. Alterations in Ccl5 expression levels may indicate astrocyte-mediated inflammatory responses, which could represent one of the underlying mechanisms through which METH promotes the differentiation of neural stem cells into astrocytes.

      Thank you very much for your thoughtful questions and valuable suggestions. These suggestions have helped us gain a deeper understanding of the areas where we can improve, and have guided us toward more meaningful directions for future research.

      (1) Mitra S, Devi S, Lee MS, Jui J, Sahu A, Goldman D. Vegf signaling between Müller glia and vascular endothelial cells is regulated by immune cells and stimulates retina regeneration. Proc Natl Acad Sci U S A. 2022 Dec 13;119(50):e2211690119. doi: 10.1073/pnas.2211690119.

      (2) Wada T, Haigh JJ, Ema M, Hitoshi S, Chaddah R, Rossant J, Nagy A, van der Kooy D. Vascular endothelial growth factor directly inhibits primitive neural stem cell survival but promotes definitive neural stem cell survival. J Neurosci. 2006 Jun 21;26(25):6803-12. doi: 10.1523/JNEUROSCI.0526-06.2006.

      (3) Meng J, Fang J, Bao Y, Chen H, Hu X, Wang Z, Li M, Cheng Q, Dong Y, Yang X, Zou Y, Zhao D, Tang J, Zhang W, Chen C. The biphasic role of Hspb1 on ferroptotic cell death in Parkinson's disease. Theranostics. 2024 Aug 1;14(12):4643-4666. doi: 10.7150/thno.98457.

      (4) Sisto A, van Wermeskerken T, Pancher M, Gatto P, Asselbergh B, Assunção Carreira ÁS, De Winter V, Adami V, Provenzani A, Timmerman V. Autophagy induction by piplartine ameliorates axonal degeneration caused by mutant HSPB1 and HSPB8 in Charcot-Marie-Tooth type 2 neuropathies. Autophagy. 2025 May;21(5):1116-1143. doi: 10.1080/15548627.2024.2439649.

      (5) Gu C, Wang Z, Luo W, Ling H, Cui X, Deng T, Li K, Huang W, Xie Q, Tao B, Qi X, Peng X, Ding J, Qiu P. Impaired olfactory bulb neurogenesis mediated by Notch1 contributes to olfactory dysfunction in mice chronically exposed to methamphetamine. Cell Biol Toxicol. 2025 Feb 20;41(1):46. doi: 10.1007/s10565-025-10004-y.

      (6) Rasetto NB, Giacomini D, Berardino AA, Waichman TV, Beckel MS, Di Bella DJ, Brown J, Davies-Sala MG, Gerhardinger C, Lie DC, Arlotta P, Chernomoretz A, Schinder AF. Transcriptional dynamics orchestrating the development and integration of neurons born in the adult hippocampus. Sci Adv. 2024 Jul 19;10(29):eadp6039. doi: 10.1126/sciadv.adp6039.

      (7) Pagin M, Pernebrink M, Pitasi M, Malighetti F, Ngan CY, Ottolenghi S, Pavesi G, Cantù C, Nicolis SK. FOS Rescues Neuronal Differentiation of Sox2-Deleted Neural Stem Cells by Genome-Wide Regulation of Common SOX2 and AP1(FOS-JUN) Target Genes. Cells. 2021 Jul 12;10(7):1757. doi: 10.3390/cells10071757.

      Reviewer #2 (Public review):

      (1) Despite this potential novelty, the study has numerous weaknesses. Notably, single-cell RNA sequencing was unable to capture an adequate number of neuronal populations. Neurons accounted for only approximately 0.6% of the total nuclei, representing a significant underrepresentation compared to their actual physiological proportion. Given that the behavioral effects of METH are likely mediated by neuronal dysfunction, readers would reasonably expect to see transcriptional changes in neurons. The authors should explain why they were unable to capture a sufficient number of neurons and justify how this incomplete dataset can still provide meaningful scientific insights for researchers studying METH-induced hippocampal damage and behavioral alterations.

      Thank you sincerely for bringing this important issue to our attention.

      Firstly, this represents an unavoidable technical bottleneck. The single-cell sequencing (scRNA-seq) we performed detects mRNA at the whole-cell level, a process that requires cells with high structural integrity, robust viability, and minimal exposure to external stimuli. During the preparation of single-cell suspensions, mature neurons, owing to their highly differentiated state, morphological rigidity, and very long axons, often fail to maintain structural integrity. These cells typically die during the dissociation process and are excluded prior to sequencing. To retain a substantial amount of neuron-related data, an alternative technique, single-nucleus RNA sequencing (snRNA-seq), should be employed. This method does not require viable cells and profiles only the nuclei of individual cells, thereby capturing mRNA solely from the nuclear compartment. Consequently, mRNA from the cytoplasm and organelles is not represented.

      Secondly, numerous studies have shown that the neurological damage caused by chronic exposure to methamphetamine exhibits a high degree of similarity in clinical manifestations and pathogenesis to neurodegenerative diseases (such as Alzheimer's disease, Parkinson's disease, etc.) [1-4].

      We fully acknowledge the central role of neurons in cognitive functions and the pathogenesis of cognitive disorders. However, although decades of neuron-centric research have yielded significant advances, major challenges remain in elucidating disease origins, identifying early pathological events, and developing effective therapeutic strategies. For example, current models fail to adequately explain early disease events. Many pathological hallmarks of cognitive disorders, such as amyloid plaques, neurofibrillary tangles, and α-synuclein aggregation, emerge in the extracellular space long before overt neuronal loss or dysfunction occurs, and are increasingly recognized to be initiated or modulated by non-neuronal cells, including astrocytes and microglia [5]. Furthermore, the critical contribution of the neural microenvironment is often overlooked. Neuronal function and survival are highly dependent on this microenvironment, which is predominantly established and maintained by non-neuronal cell types such as astrocytes, oligodendrocytes, microglia, vascular endothelial cells, pericytes, and interstitial cells and matrix [6-10]. Additionally, systemic factors such as metabolic dysregulation, peripheral inflammation, and vascular pathology are closely associated with cognitive disorders. These factors often initially impact non-neuronal cells, particularly those forming the blood-brain barrier (e.g., endothelial cells) or mediating immune responses (e.g., microglia), before exerting downstream effects on neurons [11,12]. Finally, current neuron-targeted therapeutic approaches face significant limitations, highlighting an urgent need for novel intervention strategies.

      During the development of chronic neurodegenerative diseases, structural or functional abnormalities of neurons are the direct cause of clinical symptoms (such as cognitive decline), but this process is often regulated by auxiliary cell types such as glial cells, immune cells, and stromal cells, which together form a complex network of pathological mechanisms. Notably, the chronic and persistent progression of disease usually results from the failure of these auxiliary cells to provide effective support and nutrition to neurons; in some pathological states they even transform into effector cells that promote neuronal damage [13,14]. In recent years, a growing body of evidence has demonstrated that glial cells, immune cells, and stromal cells exert critical regulatory functions in the pathogenesis of neurodegenerative diseases. These cell types not only help maintain neural microenvironmental homeostasis during the early stages of disease progression but also display substantial functional heterogeneity in modulating inflammatory responses, synaptic plasticity, the repair of neuronal injury, the link between genetic risk and environmental factors, and the propagation of pathological proteins [15-19]. These results indicate that non-neuronal cells have the potential to become key therapeutic targets in clinical interventions: (1) compared with neurons themselves, they are more accessible to drugs or biological agents (such as antibodies); (2) non-neuronal cells (especially glial cells) exhibit high plasticity and reactivity during the course of disease, providing a window of opportunity for intervening in their functional states (such as inhibiting harmful activation and promoting protective functions); (3) they can serve as early intervention targets before irreversible damage occurs to neurons, helping to prevent or delay disease progression; and (4) interventions aimed at these targets are diverse, including immunomodulatory, anti-inflammatory, vascular-protective, and metabolic-regulation strategies, which are usually more feasible in practice than directly protecting fragile neurons.

      Early pharmacological studies have extensively characterized the neurotoxic effects of METH, including the induction of autophagy, apoptosis, oxidative stress, endoplasmic reticulum stress, and dopaminergic neurotoxicity [20]. However, therapeutic options and pharmacological interventions for METH abuse remain limited [21]. In recent years, increasing attention has been directed toward the impact of METH on non-neuronal cells. Research into mechanisms such as neuroinflammatory responses, blood-brain barrier disruption, and immune modulation is progressively contributing to a more comprehensive understanding of METH-induced neural injury [22-24]. Moreover, METH induces widespread damage across multiple organ systems and diverse cell types throughout the body. Beyond its effects on neurons, various cell types exhibit distinct responses to METH exposure, which differ significantly depending on the duration of exposure. Our dataset comprises high-quality whole-cell mRNA sequencing data from multiple cell types within the hippocampus of mice subjected to chronic METH exposure, offering substantial data support and a robust foundation for in-depth investigation into the pathological mechanisms underlying METH-induced neural damage.

      Thirdly, the selection of scRNA-seq was guided by our experimental objectives and prior research experience. Our earlier investigations have primarily centered on astrocytes, endothelial cells, and microglia. This single-cell sequencing study is intended to enhance our understanding of these neural support cells, comprehensively explore their underlying mechanisms and cellular interactions, and ultimately provide a solid foundation and reference for future research. However, our experience and infrastructure in the field of neuronal research remain relatively limited. To ensure the generation of high-quality data and to systematically advance the experimental objectives, we have prioritized the analysis of the neural microenvironment as the central focus of this study.

      Fourthly, the hippocampal region is a brain area with highly specialized and collaborative characteristics, which can be further divided into the ventral hippocampus, the dorsal hippocampus, and multiple subregions such as DG, CA1, CA2, and CA3. The neurons in these subregions exhibit strong heterogeneity, and the experimental methods we currently adopt are still unable to precisely distinguish the neurons in these different regions, which may to some extent affect the accuracy of data interpretation. To address the impact of neuronal heterogeneity, we believe that single-cell spatial transcriptomics technology can be adopted for in-depth research. However, due to the high cost of this technology, it is currently difficult to apply it in our research group.

      (1) Lappin JM. Rare but relevant: Methamphetamine and Parkinson's disease. Addiction. 2025 Apr;120(4):797-800. doi: 10.1111/add.16695. Epub 2024 Oct 22. PMID: 39434702.

      (2) Lappin JM, Darke S. Methamphetamine and heightened risk for early-onset stroke and Parkinson's disease: A review. Exp Neurol. 2021 Sep;343:113793. doi: 10.1016/j.expneurol.2021.113793. Epub 2021 Jun 21. PMID: 34166684.

      (3) Shukla M, Vincent B. The multi-faceted impact of methamphetamine on Alzheimer's disease: From a triggering role to a possible therapeutic use. Ageing Res Rev. 2020 Jul;60:101062. doi: 10.1016/j.arr.2020.101062.

      (4) Shrestha P, Katila N, Lee S, Seo JH, Jeong JH, Yook S. Methamphetamine induced neurotoxic diseases, molecular mechanism, and current treatment strategies. Biomed Pharmacother. 2022 Oct;154:113591. doi: 10.1016/j.biopha.2022.113591.

      (5) Gabitto MI, et al. Integrated multimodal cell atlas of Alzheimer's disease. Nat Neurosci. 2024 Dec;27(12):2366-2383. doi: 10.1038/s41593-024-01774-5.

      (6) Stogsdill JA, Harwell CC, Goldman SA. Astrocytes as master modulators of neural networks: Synaptic functions and disease-associated dysfunction of astrocytes. Ann N Y Acad Sci. 2023 Jul;1525(1):41-60. doi: 10.1111/nyas.15004.

      (7) Terreros-Roncal J, et al. Impact of neurodegenerative diseases on human adult hippocampal neurogenesis. Science. 2021 Nov 26;374(6571):1106-1113. doi: 10.1126/science.abl5163.

      (8) Zhu K, Fu Y, Zhao Y, Niu B, Lu H. Perineuronal nets: Role in normal brain physiology and aging, and pathology of various diseases. Ageing Res Rev. 2025 Jun;108:102756. doi: 10.1016/j.arr.2025.102756.

      (9) Depp C, Doman JL, Hingerl M, Xia J, Stevens B. Microglia transcriptional states and their functional significance: Context drives diversity. Immunity. 2025 May 13;58(5):1052-1067. doi: 10.1016/j.immuni.2025.04.009.

      (10) Sweeney MD, Zhao Z, Montagne A, Nelson AR, Zlokovic BV. Blood-Brain Barrier: From Physiology to Disease and Back. Physiol Rev. 2019 Jan 1;99(1):21-78. doi: 10.1152/physrev.00050.2017.

      (11) Nation DA, et al. Blood-brain barrier breakdown is an early biomarker of human cognitive dysfunction. Nat Med. 2019 Feb;25(2):270-276. doi: 10.1038/s41591-018-0297-y.

      (12) Montagne A, Zhao Z, Zlokovic BV. Alzheimer's disease: A matter of blood-brain barrier dysfunction? J Exp Med. 2017 Nov 6;214(11):3151-3169. doi: 10.1084/jem.20171406. Epub 2017 Oct 23.

      (13) Huang Q, Wang Y, Chen S, Liang F. Glycometabolic Reprogramming of Microglia in Neurodegenerative Diseases: Insights from Neuroinflammation. Aging Dis. 2024 May 7;15(3):1155-1175. doi: 10.14336/AD.2023.0807.

      (14) Shi FD, Yong VW. Neuroinflammation across neurological diseases. Science. 2025 Jun 19;388(6753):eadx0043. doi: 10.1126/science.adx0043.

      (15) Xu X, Mei B, Yang Y, Li J, Weng J, Yang Y, Zhu Q, Zhang H, Liu X. Astrocytes Lingering at a Crossroads: Neuroprotection and Neurodegeneration in Neurocognitive Dysfunction. Int J Biol Sci. 2025 Apr 28;21(7):3122-3143. doi: 10.7150/ijbs.109315.

      (16) Bedolla A, et al. Adult microglial TGFβ1 is required for microglia homeostasis via an autocrine mechanism to maintain cognitive function in mice. Nat Commun. 2024 Jun 21;15(1):5306. doi: 10.1038/s41467-024-49596-0.

      (17) Castellani G, Croese T, Peralta Ramos JM, Schwartz M. Transforming the understanding of brain immunity. Science. 2023 Apr 7;380(6640):eabo7649. doi: 10.1126/science.abo7649.

      (18) Chen YH, Jin SY, Yang JM, Gao TM. The Memory Orchestra: Contribution of Astrocytes. Neurosci Bull. 2023 Mar;39(3):409-424. doi: 10.1007/s12264-023-01024-x.

      (19) Deng Q, Wu C, Parker E, Liu TC, Duan R, Yang L. Microglia and Astrocytes in Alzheimer's Disease: Significance and Summary of Recent Advances. Aging Dis. 2024 Aug 1;15(4):1537-1564. doi: 10.14336/AD.2023.0907.

      (20) Jayanthi S, Daiwile AP, Cadet JL. Neurotoxicity of methamphetamine: Main effects and mechanisms. Exp Neurol. 2021 Oct;344:113795. doi: 10.1016/j.expneurol.2021.113795.

      (21) Paulus MP, Stewart JL. Neurobiology, Clinical Presentation, and Treatment of Methamphetamine Use Disorder: A Review. JAMA Psychiatry. 2020 Sep 1;77(9):959-966. doi: 10.1001/jamapsychiatry.2020.0246.

      (22) Shi S, Sun Y, Zan G, Zhao M. The interaction between central and peripheral immune systems in methamphetamine use disorder: current status and future directions. J Neuroinflammation. 2025 Feb 15;22(1):40. doi: 10.1186/s12974-025-03372-z.

      (23) Pang L, Wang Y. Overview of blood-brain barrier dysfunction in methamphetamine abuse. Biomed Pharmacother. 2023 May;161:114478. doi: 10.1016/j.biopha.2023.114478.

      (24) Shaerzadeh F, Streit WJ, Heysieattalab S, Khoshbouei H. Methamphetamine neurotoxicity, microglia, and neuroinflammation. J Neuroinflammation. 2018 Dec 12;15(1):341. doi: 10.1186/s12974-018-1385-0.

      (2) Another significant weakness of this study is the lack of a cohesive hypothesis or overarching conclusion regarding how METH impacts neural populations. The authors provide a largely descriptive account of transcriptional alterations across various cell types, but the manuscript lacks clear, biologically meaningful conclusions. This descriptive approach makes it difficult for readers to identify the key findings or take-home messages. To improve clarity and impact, the authors should focus on developing and presenting a few plausible hypotheses or mechanistic scenarios regarding METH-induced neurotoxicity, grounded in their scRNA-seq data. Including schematic figures to illustrate these hypotheses would also help readers better understand and interpret the study.

      We sincerely appreciate your valuable comments on our article. As you pointed out, the current research lacks experimental verification to further support our conclusions. To enhance the clarity and readability of the mechanism explanation, we have added several hypothetical diagrams (such as Figures 7, 8, and 9) in the discussion section to present the biological mechanisms reflected by the data more intuitively. Additionally, relevant verification work is underway, such as marking specific cell types with marker proteins. Author response image 1 shows some of our preliminary, unpublished experimental results, whose trends are consistent with the conclusions of this article. However, because complete verification will take more time, these results have not yet been included in the current manuscript, to ensure the rigor of the data. Finally, we would like to thank you again for your constructive suggestions.

      Author response image 1.

      (3) The final major weakness of this study is its poor readability. It appears that the authors did not adequately proofread the manuscript, as there are numerous typographical errors (e.g., line 333: trisulting; line 756: essencial), unsupported scientific claims lacking citations (e.g., lines 485, 503, 749-753), and grammatically incorrect sentences (e.g., lines 470-472, 540-543, 749-753). In addition, many paragraphs are unorganized and overly descriptive, which further hinders clarity. Some figures are also problematic - too small in size and overcrowded with text in fonts that are difficult to read. It is recommended that the authors carry out quality control. There are too many typographical and grammatical errors to list individually; the authors should carefully review and revise the entire manuscript to address all of these issues.

      We truly appreciate your thoughtful feedback and sincerely apologize for any inconvenience experienced by you and other readers.

      The text of this research manuscript was manually entered, which unfortunately resulted in some spelling and grammatical errors. In response, we have carefully revised the entire manuscript using word processing tools in the second version. Meanwhile, we have restructured and organized some lengthy paragraphs to enhance the clarity and readability of the content.

      Regarding the issue you raised about certain viewpoints lacking citation support, we have added the necessary references to those sections and reviewed the entire text to ensure all scientific claims are properly supported. 

      As for the image clarity, we made sure the submitted images met the 600 dpi resolution requirement. However, we acknowledge that there were clarity issues in the final published version. We have since re-adjusted and re-uploaded the images to improve their quality.

      We are committed to continuously improving the manuscript and enhancing the overall quality of our academic presentation. Thank you sincerely for your kind attention to our work, your careful review, and the valuable suggestions you provided.

      Reviewer #3 (Public review):

      (1) While the bioinformatics analyses are extensive, the study is primarily descriptive at the molecular level. The absence of experimental validation, such as targeted mRNA/protein quantification and gene knockdown/overexpression to confirm the causal relationship between these identified genes and METH-induced cognitive deficits, is a notable limitation.

      We sincerely appreciate your valuable comments and suggestions. Indeed, there are still certain limitations in our manuscript in some aspects. It may not be able to systematically answer specific questions, and it is also difficult to fully clarify the functional roles of certain genes or specific cell types through experimental evidence.

      Although our manuscript still has certain limitations, we believe that the publication of this research is expected to provide new perspectives and theoretical support for the in-depth exploration of METH toxicity damage-related fields, thereby promoting the progress of research in this direction:

      (1) At present, the single-cell sequencing datasets on chronic damage caused by METH are still relatively limited, especially in terms of studies at the whole-cell level. Our dataset is expected to fill the research gap in this field to some extent, providing reference and support for subsequent related research.

      (2) During the sampling process of the sequencing experiment, we ensured high cell viability and sequencing quality. The experiment exhibited good reproducibility (each group consisted of 10 mice, and 2 mice from each group were selected to mix their hippocampal tissues into one sample), and the obtained data had high credibility.

      (3) The effects of METH have a wide distribution pattern across various organs and tissues. Through single-cell sequencing data, the common and differential expression patterns of related genes under different conditions can be systematically analyzed, which is helpful for future targeted knockout studies of these genes and provides a predictive basis for the evaluation of intervention measures, thereby enabling precise regulation of gene functions.

      (4) This is conducive to the orderly implementation of our subsequent research plans. Our subsequent research plan can be further developed based on a specific aspect of this study. We are indeed planning to do exactly that. During our earlier research on astrocytes, we discovered that astrocytes have two phenotypes (protective and inflammatory) in neuroinflammation. Given that astrocytes in the hippocampus show great variability depending on their location, the cells they come into contact with, and the stimuli they receive, we aim to investigate the changes in the function of astrocyte subpopulations in chronic METH-induced cognitive impairment. We focused on the role of the cAMP signaling pathway in the transformation of astrocyte phenotypes and attempted to link changes in astrocyte energy metabolism to their inflammatory phenotype. In addition, we found that endothelial cells can be easily distinguished into many subpopulations, which are related to their specific functions in immune responses, material transport, vascular growth regulation, energy metabolism, and other processes. We believe that single-cell technology can help us find the key mechanisms and intervention targets of chronic METH abuse-induced damage with greater precision.

      (2) While the discussion extensively covers the functional implications of specific molecular pathways and cell types, it would greatly benefit from a comparison of these findings with existing RNA sequencing data from other METH models in hippocampal tissue.

      We are very grateful for your professional suggestions, which have been of great help in improving the quality of our manuscript. We agree that comparing our findings with existing RNA sequencing data from other METH models in hippocampal tissue would strengthen the discussion. In response to your suggestion, we have reviewed the relevant literature and databases, and have contacted database administrators and original authors to request access to the relevant data. However, as data integration still requires some time, we may not be able to conduct a detailed analysis of these data in this revised version. For now, we can only discuss the conclusions reported by some authors.

      Palsamy Periyasamy et al. published a scRNA sequencing (live-cell) study on chronic METH exposure at almost the same time as ours. They adopted a similar 4-week model of gradually increasing METH exposure and conducted sequencing analysis on glial cells in the cerebral cortex of mice [1]. The changes they observed in cortical astrocytes in the circadian rhythm, adherens junctions, Rap1 signaling pathway, and cAMP signaling pathway were similar to those we observed in astrocytes of the hippocampal region (Discussion, Lines 892-897). Similarly, in oligodendrocytes, we observed an upregulation trend of key genes regulating the circadian rhythm, such as Per2, Per3, and Nr1d1 (Discussion, Lines 916-939). This result is consistent with their research findings. Nonetheless, we believe that the changes in oligodendrocytes in terms of metabolic regulation and axonal function homeostasis are more significant.

      Pingming Qiu et al. further confirmed the correlation between the NF-κB signaling pathway in hippocampal astrocytes under METH action and neuroinflammation, neuroinjury, and learning and memory impairments in mice by integrating the GEO dataset [2]. This conclusion is also consistent with the sequencing results and analysis conclusions we obtained (Results, Lines 473-476).

      In terms of the neuro-immune system disorder caused by chronic METH exposure, our research findings are consistent with those of Biao Wang et al. [3]. Both studies observed that METH exposure may involve the participation of related immune cells (such as T cells and monocytes) and may be related to the regulation of the innate immune response and the homeostasis of myeloid cells. Through the identification and analysis of cell subtypes, we further revealed that these signals may be closely related to the interaction between microglia and other immune cells mediated by MHC molecules (Discussion, Lines 870-894).

      Currently, the research results related to METH are still scattered and unsystematic. The research models differ from one another, and there are relatively few studies on chronic exposure or in vivo experiments. Closely related sequencing datasets are also scarce. We hope that this dataset can comprehensively and in detail depict the molecular map of the mouse hippocampus after chronic METH exposure (although, due to technical limitations, mature neurons die during dissociation, making it impossible to obtain the relevant data). In addition, we hope to integrate single-cell sequencing data and spatial transcriptome data of the mouse hippocampus after chronic METH exposure, providing a reliable data foundation and theoretical support for subsequent research in this field.

      Finally, we would like to express our sincere gratitude for your valuable suggestions and support. Although we still need some time to further refine the manuscript based on your opinions, we sincerely hope that more readers will provide us with constructive feedback to promote the continuous improvement and deepening of this research.

      (1) Oladapo A, Deshetty UM, Callen S, Buch S, Periyasamy P. Single-Cell RNA-Seq Uncovers Robust Glial Cell Transcriptional Changes in Methamphetamine-Administered Mice. Int J Mol Sci. 2025 Jan 14;26(2):649. doi: 10.3390/ijms26020649.

      (2) Li K, Ling H, Wang X, Xie Q, Gu C, Luo W, Qiu P. The role of NF-κB signaling pathway in reactive astrocytes among neurodegeneration after methamphetamine exposure by integrated bioinformatics. Prog Neuropsychopharmacol Biol Psychiatry. 2024 Feb 8;129:110909. doi: 10.1016/j.pnpbp.2023.110909.

      (3) Wu L, Liu X, Jiang Q, Li M, Liang M, Wang S, Wang R, Su L, Ni T, Dong N, Zhu L, Guan F, Zhu J, Zhang W, Wu M, Chen Y, Chen T, Wang B. Methamphetamine-induced impairment of memory and fleeting neuroinflammation: Profiling mRNA changes in mouse hippocampus following short-term and long-term exposure. Neuropharmacology. 2024 Dec 15;261:110175. doi: 10.1016/j.neuropharm.2024.110175.

      (3) The conclusion that "prolonged METH use may progressively impair cognitive function" may not be uniformly supported by the behavioral data: Figures 1C and F (discrimination and preference indexes) exhibited that the 4-week test further declined in the METH group compared to the 2-week. In contrast, Figure 1E and H present a contradictory pattern.

      Thank you very much for this detailed and constructive observation. Our conclusion that "prolonged use of METH may progressively impair cognitive function" rests mainly on the discrimination index and preference index shown in Figures 1C and 1F. These two indices are usually calculated from the total time mice spend exploring the novel and familiar objects, and they are widely used as measures of cognitive function in the literature [1-3], thus providing strong support for our conclusion. The exploration frequency data serve two purposes: they reflect the curiosity of mice towards novel objects, and they allow the average duration of each exploration to be calculated as "total exploration time / exploration frequency", which indicates learning interest and the degree of focus during exploration. We believe this is also informative about the effect of METH on learning. We acknowledge that Figure 1H shows no statistically significant difference in exploration frequency between novel and familiar objects at the 4-week time point. This may be because our tests allow mice to explore freely in a stress-free environment, and there is substantial variability among individual mice within each group. However, the mean values still differ between the two groups. In addition, unlike at the 2-week test, mice at the 4-week test had already undergone one NOR test and may have formed memories that were retained at the later assessment. Moreover, we believe that long-term saline injections may affect the emotional state of the control mice, because the injection procedure does not give them the pleasure that METH provides.
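The quantities discussed in this response can be illustrated with a minimal sketch. It is not the authors' analysis code. The formulas used for the discrimination index and the preference index are common conventions in the novel object recognition (NOR) literature and are assumptions here, since the response does not state them. Only the mean time per exploration (total exploration time divided by exploration frequency) is taken directly from the text. All function and variable names are hypothetical.

```python
def nor_indices(t_novel, t_familiar, n_novel, n_familiar):
    """Summarise one animal's NOR test session.

    t_novel, t_familiar: total exploration time (s) for each object.
    n_novel, n_familiar: number of exploration bouts for each object.
    """
    total_time = t_novel + t_familiar
    total_bouts = n_novel + n_familiar
    if total_time <= 0 or total_bouts <= 0:
        raise ValueError("animal did not explore either object")

    return {
        # Assumed convention: (novel - familiar) / total, range -1..1
        "discrimination_index": (t_novel - t_familiar) / total_time,
        # Assumed convention: fraction of time spent on the novel object
        "preference_index": t_novel / total_time,
        # From the response: total exploration time / exploration frequency
        "mean_time_per_exploration": total_time / total_bouts,
    }


# Illustrative numbers only, not data from the study
example = nor_indices(t_novel=30.0, t_familiar=10.0, n_novel=12, n_familiar=8)
print(example)
```

Under these assumed definitions, an animal with no preference has a discrimination index of 0 and a preference index of 0.5, so the time-based indices and the bout-based measures can diverge when animals make many short visits to one object.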

      (1) Riva M, Moriceau S, Morabito A, Dossi E, Sanchez-Bellot C, Azzam P, Navas-Olive A, Gal B, Dori F, Cid E, Ledonne F, David S, Trovero F, Bartolomucci M, Coppola E, Rebola N, Depaulis A, Rouach N, de la Prida LM, Oury F, Pierani A. Aberrant survival of hippocampal Cajal-Retzius cells leads to memory deficits, gamma rhythmopathies and susceptibility to seizures in adult mice. Nat Commun. 2023 Mar 18;14(1):1531. doi: 10.1038/s41467-023-37249-7.

      (2) Lu Y, Chen X, Liu X, Shi Y, Wei Z, Feng L, Jiang Q, Ye W, Sasaki T, Fukunaga K, Ji Y, Han F, Lu YM. Endothelial TFEB signaling-mediated autophagic disturbance initiates microglial activation and cognitive dysfunction. Autophagy. 2023 Jun;19(6):1803-1820. doi: 10.1080/15548627.2022.2162244.

      (3) Arroyo-García LE, Tendilla-Beltrán H, Vázquez-Roque RA, Jurado-Tapia EE, Díaz A, Aguilar-Alonso P, Brambila E, Monjaraz E, De La Cruz F, Rodríguez-Moreno A, Flores G. Amphetamine sensitization alters hippocampal neuronal morphology and memory and learning behaviors. Mol Psychiatry. 2021 Sep;26(9):4784-4794. doi: 10.1038/s41380-020-0809-2.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review)

      Summary:

      This manuscript investigates whether newborns can use speaker identity to separate verbal memories, aiming to shed light on the earliest mechanisms of language learning and memory formation. The authors employ a well-designed experimental paradigm using functional near-infrared spectroscopy (fNIRS) to measure neural responses in newborns exposed to familiar and novel words, with careful counterbalancing and acoustic controls. Their main finding is that newborns show differential neural activation to novel versus familiar words, particularly when speaker identity changes, suggesting that even at birth, infants can use indexical cues to support memory.

      Strengths:

      Major strengths of the work include its innovative approach to a longstanding question in developmental science, the use of appropriate and state-of-the-art neuroimaging methods for this age group, and a thoughtful experimental design that attempts to control for order and acoustic confounds. The study addresses a significant gap in our understanding of how infants process and remember speech, and the data are presented transparently, with clear reporting of both significant and non-significant results.

      Weaknesses:

      However, there are notable weaknesses that limit the strength of the conclusions. The main recognition effect is restricted to a specific subgroup of participants and emerges only during a particular testing window, raising questions about the robustness and generalizability of the findings. The sample size, while typical for infant neuroimaging, is modest, and the statistical power is further reduced by missing data and group-dependent effects. Additionally, the claims regarding episodic memory and evolutionary implications are somewhat overstated, as the paradigm primarily demonstrates memory retention over a few minutes without evidence of the rich, contextually bound recall characteristic of fully developed episodic memory.

      Overall, the authors have achieved their primary aim of demonstrating that speaker identity can facilitate memory separation in newborns, providing valuable preliminary evidence for early indexical processing in language learning. The results are intriguing and likely to stimulate further research, but the limitations in effect robustness and theoretical interpretation mean that the findings should be viewed as an important step forward rather than a definitive answer. The methods and data will be of interest to researchers studying infant cognition, memory, and language, and the study highlights both the promise and the challenges of probing complex cognitive processes in the earliest stages of life.

      We thank the reviewer for their thoughtful and positive assessment of our work, and for giving us the opportunity to clarify points that may have been unclear in the original manuscript.

      First, considering that the recognition response was quite consistent in previous studies, we expected the effect to emerge within a specific testing window, in either the first or the second block, depending on task difficulty. Accordingly, our analytical approach was designed to reflect this expectation, which was subsequently confirmed by the results. Second, the main recognition effect is not restricted to a specific subgroup of participants. Recognition responses were observed in both groups in the left IFG and bilateral STG. The only group-specific modulation was found in the right IFG, where the effect was primarily driven by Group A. This suggests that activity in this specific region may be influenced by contextual factors such as the nature and amount of recently processed stimuli. We have clarified these points in the revised manuscript to avoid the impression that the core effect is limited to a subset of participants or not generalizable across studies. 

      Regarding the sample size, a formal calculation was initially attempted based on the effect size reported in a closely related ANOVA-based study (Benavides-Varela et al., 2011; Study 2: Word recognition after intervening melodies, main effect for the comparison same vs novel word [F(1,26) = 19.318; p < 0.0001; effect size f = .87]). However, inputting this information into dedicated software (G*Power; α = 0.05; number of groups = 1; number of measurements = 2) leads to an estimated sample size of N = 5 to 7 (depending on the desired power, range = 0.80-0.95). This sample size is unrealistically small and not representative of current research standards in the field. A proper formal power analysis for the LMM is otherwise hard to perform, as we lack information about the expected variance and random-effects structure. We therefore aligned our sample size with prior newborn studies using similar stimuli and experimental designs, and with fNIRS studies in newborns and infants (for recent meta-analyses see De Roever et al., 2018; Boek et al., 2023; Gemignani et al., 2023; which examined studies with mean N = 24; N range = 1-86 and sample sizes often including various conditions and groups). Note also that our design includes a within-subject comparison, and our analytical approach models subject-level variance and handles unbalanced datasets and missing data (which are common in infant studies), thereby improving statistical sensitivity. We have now explicitly clarified this choice in the Introduction.

      Finally, we revised the discussion to ensure that interpretations are aligned with our findings, by including a limitations section and a more explicit note regarding theories of memory.

      Episodic memory is a multifaceted construct that matures over time through the integration of the what–who-where–when information. The present study does not aim to demonstrate the presence of a fully developed episodic memory system at birth; rather, it shows that specific features of episodic-like processing (i.e., what–who) are already bound from the first days of life. Future studies may track the progressive integration of additional episodic-related components leading to a mature episodic memory system.

      Reviewer #1 (Recommendations for the authors):

      (1) I wonder why a control condition with same-speaker interference was not included. Adding such a control would allow you to directly test whether the observed effects are truly due to speaker changes, rather than other acoustic or procedural factors. If it is not feasible to add this condition, please discuss its absence explicitly and clarify how it impacts the interpretation of your findings.

      We thank the reviewer for raising the issue of a same-speaker interference control. A similar control has been tested previously using a closely related paradigm, showing that recognition does not persist when neonates hear another word produced by the same speaker during the retention period (Benavides-Varela et al., 2011). As noted in the manuscript, there were some methodological differences between that study and the current one. Most importantly, in the present study familiarization was reduced (from ten to five blocks) and the retention interval increased (two to three minutes), making the current paradigm more demanding. We reasoned that, if newborns forgot the word in the prior (less challenging) study, they would also forget it here if a same-speaker interference control had been implemented. With the current manipulation, despite the difficulty of the paradigm, the recognition response was observed. This pattern suggests that speaker change, rather than general procedural factors, is central to the observed effect. Given these prior findings and the ethical constraints of testing newborns, we believe that adding a new same-speaker control is not essential. We have now made this rationale more explicit in the manuscript (discussion section, limitations, p. 16), hoping that this clarification will make our methodological choices clearer.

      (2) It wasn't clear if Group A and Group B have the same number of infants, and whether they were randomly assigned. Please specify.

      Participants were initially assigned to Group A or Group B in a counterbalanced way to maintain comparable group sizes. Due to attrition and subsequent exclusion for various reasons (e.g., low signal quality, fussiness, technical issues), the final sample consisted of 17 infants in Group A and 15 infants in Group B. We have now specified this information in the revised manuscript (p. 20).

      (3) Please specify the exact number of fNIRS channels assigned to each region of interest (ROI), as it is currently difficult to map the channel numbers in Supplementary Table 2 to the optode montage shown in Figure 2. Additionally, report the percentage of usable channels after quality control.

      The inferior frontal gyrus left and right ROIs comprised 4 channels each, the superior temporal gyrus left and right ROIs 5 channels each, and the parietal lobe left and right ROIs 7 channels each. This information has been added to the methods section, along with the average number of channels contributing to each ROI after data rejection and the percentage of channels rejected throughout the recording (p. 23).

      (4) Also, a formal power analysis to justify your sample size would be helpful for evaluating the reliability of your findings and is increasingly expected in developmental neuroimaging research.

      Thanks for this suggestion. As stated in the public response, we agree that power analyses constitute an important component of methodological rigor in the field. In our case, a formal calculation was initially attempted based on the effect size reported in a closely related ANOVA-based study (Benavides-Varela et al., 2011; Study 2: Word recognition after intervening melodies, main effect for the comparison same vs novel word [F(1,26) = 19.318; p < 0.0001; effect size f = .87]).

      However, inputting this information into dedicated software (G*Power; α = 0.05; power range = 0.80-0.95; number of groups = 1; number of measurements = 2) leads to an estimated sample size of N = 5 to 7, which is unrealistically small and not representative of current research standards in the field. A proper formal power analysis for the LMM is otherwise hard to perform, as we lack information about the expected variance and random-effects structure. We therefore aligned our sample size with prior newborn studies using similar stimuli and experimental designs, and with fNIRS studies in newborns and infants (for recent meta-analyses see De Roever et al., 2018; Boek et al., 2023; Gemignani et al., 2023; which examined studies with mean N = 24; N range = 1-86 and sample sizes often including various conditions and groups). Note also that our design includes a within-subject comparison, and our analytical approach models subject-level variance and handles unbalanced datasets and missing data (which are common in infant studies), thereby improving statistical sensitivity.
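As a sanity check on the effect size quoted above, the standard conversion from an ANOVA F statistic to Cohen's f (via partial eta squared) can be sketched as follows. This is an illustrative calculation only, not a reproduction of the G*Power analysis: the sample-size estimate itself also depends on settings such as the assumed correlation between repeated measures, which the response does not report. The function name is hypothetical.

```python
import math


def cohens_f_from_F(F, df_effect, df_error):
    """Convert an ANOVA F statistic to Cohen's f.

    partial eta^2 = (F * df_effect) / (F * df_effect + df_error)
    f             = sqrt(eta^2 / (1 - eta^2))
    """
    eta_p2 = (F * df_effect) / (F * df_effect + df_error)
    return math.sqrt(eta_p2 / (1.0 - eta_p2))


# Values quoted in the response: F(1,26) = 19.318
f = cohens_f_from_F(19.318, df_effect=1, df_error=26)
print(f"Cohen's f = {f:.3f}")
```

With the quoted F(1,26) = 19.318 this conversion gives an f of roughly 0.86, close to the f = .87 reported in the response and far above the conventional threshold for a large effect (f = 0.40). An effect this large is why the resulting sample-size estimate is so small.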

      (5) The manuscript references episodic memory explicitly in the abstract and introduction, emphasizing the role of speaker identity in enabling episodic-like memory from birth. However, this concept is not sufficiently addressed or delineated in the discussion. Episodic memory is generally understood as recalling events with contextual details, involving complex integrative processes that extend beyond simple recognition of auditory stimuli. Your paradigm demonstrates memory retention over a few minutes but does not provide strong evidence for the hallmark features of episodic memory, such as contextual binding or autobiographical recollection. Moreover, infant speech recognition and memory formation in early life are influenced by the immediacy and complexity of sensory input, which may not necessarily engage fully developed episodic systems. Clarifying these distinctions and making sure your interpretations and claims are consistent with them would enhance the conceptual clarity of the manuscript.

      We agree that episodic memory is a multifaceted construct that, in its mature form, entails the ability to retrieve past events with contextual detail, typically involving autobiographical recollection and the integration of what–who–where–when information (Tulving, 1993). Our study does not aim to demonstrate the presence of a fully developed episodic memory system at birth, nor do we claim that newborns’ performance satisfies all hallmark criteria of mature episodic memory.

      Here, we focused on sensitivity to speaker identity as a contextual dimension relevant to memory formation. Within this narrower sense, both the patterns of activation and the localization of the response provide evidence for early source–content binding (i.e., what–who), which can be considered a foundational aspect of episodic-like processing. Following up on this foundational step, future studies may track the gradual integration of additional aspects (where–when), ultimately leading to the maturation of a fully functional human episodic memory system.

      We have now clarified this point in the revised manuscript (p. 17).

      (6) Please add a dedicated limitations section. This should address the group-dependent nature of your main effects, the timing-specific recognition response, and any other methodological constraints that may impact the generalizability of your results.

      We thank the reviewer for this comment. We have done our best to set out the limitations of our study in the text (p. 16), specifically regarding the reasons for the lack of a control condition and the effects of frequent changes in sleeping states in newborns.

      (7) Consider revising sections where claims may be overstated, particularly regarding episodic memory and evolutionary implications.

      These sections have now been revised in the abstract and throughout the manuscript to ensure that interpretations remain proportionate to the data and consistent with current theoretical frameworks.

      Reviewer #2 (Public review):

      Summary:

      Previous studies by some of the authors of the present manuscript showed that healthy human newborns memorize recently learned nonsense words. They exposed neonates to a familiarization period (several minutes) during which multiple repetitions of a bisyllabic word were presented, uttered by the same speaker. Then they exposed neonates to an "interference period" during which newborns listened to music or to the same speaker uttering a different pseudoword. Finally, neonates were exposed to a test period in which they heard the familiarized word again. Interestingly, when the interference was music, the recognition of the word remained. Recognition of the word was measured using the NIRS technique, which estimates regional brain oxygenation at the scalp level. Specifically, the brain response to the word in the test was reduced, unveiling a familiarity effect, while an increase in regional brain oxygenation corresponds to the detection of a "new word" due to a novelty effect. In those studies, music did not erase the memory traces for a word (familiarity effect), whereas a different word uttered by the same speaker did.

      The current study aims at exploring whether and how word memory is interfered with by other speech properties, specifically changes in the speaker, given that young children can distinguish speakers by processing the speech. The authors' main hypothesis anticipates that new speaker recognition would produce less interference in the familiarized word because somehow neonates "separate" the processing of both words (the familiarized word uttered by one speaker, and the interfering word uttered by a different speaker), memorizing both words as different auditory events.

      From my point of view, this hypothesis is interesting, since the results would contribute to estimating the role of the speaker in word learning and speech processing early in life.

      Strengths:

      (1) New data from neonates. Exploring neonates' cognitive abilities is a big challenge, and we need more data to enrich the knowledge of the early steps of language acquisition.

      (2) The study contributes new data showing the role of speaker (recognition) on word learning (word memory), a quite unexplored factor. The idea that neonates include speakers in speech processing is not new, but its role in word memory has not been evaluated before. The possible interpretation is that neonates integrate the process of the linguistic and communicative aspects of speech at this early age.

      (3) The study proposes a quite novel analytic approach. The new mixed models allow exploring the brain response considering an unbalanced design. More than the loss of data, which is frequent in infants' studies, the familiarization, interference and learning processes may take place at different moments of the experiment (e.g. related to changes in behavioural states along the experiment) or expressed in different regions (e.g. related to individual variations in optodes' locations and brain anatomy).

      Weaknesses:

      I did not find major weaknesses. However, I would like to have more discussion or explanation on the following points.

      (1) It would be fine to report the contribution of each infant to the analysis, i.e. how many good blocks, 1 to 5 in sequence 1 and 2, were provided by each infant.

      (2) Why did the factor "blocknumber" range from 0 to 4? The authors should explain what block zero means and why not 1 to 5.

      (3) I suggest trying to integrate the changes in brain activity across the 3 phases, that is, whether changes in familiarization relate to changes in the test and interference phases. For instance, in Figure 2, the brain response that distinguishes between same and novel words occurred over IFG and STG in both hemispheres. However, in the right STG there was no initial increase in the brain response, and the response for the same word was higher than the one for novel words in the 5th block.

      (4) Similarly, it is quite amazing that the brain did not increase the activity with respect to the familiarization during the interference phase, mainly over the left hemisphere, even if both the word and speaker changed. Although the discussion considers these findings, an integrated discussion of the detection of novel words and the detection of a novel speaker over time may benefit from a greater integration of the results.

      Appraisal:

      The authors achieved their aims because the design and analytic approaches showed significant differences. The conclusions are based on these results. Specifically, the hypothesis that neonates would memorize words after interference, when interfered speech is pronounced by a different speaker, was supported by the data in blocks 2 and 5, and the potential mechanisms underlying these findings were discussed, such as separate processing for different speakers, likely related to the recognition of speaker identity.

      I think the discussion is well-structured, although I may suggest integrating the changes into the three phases of the study. Maybe comparing with other regions, not related to speech processing.

      Evaluating neonates is a challenge because their physiology is constantly changing. For instance, within 9 minutes, newborns may transition between different behavioral states and experience different physiological needs.

      We thank the reviewer for their constructive and positive appraisal of our work and for drawing attention to points that benefited from further clarification or discussion in the manuscript.

      In the following, we address each point in turn, using the numbering of the reviewer’s identified concerns.

      (1) In the Methods section (“Data Processing and Analysis”, p. 22), we have added detailed information about the number of data points contributed by each infant to the analyses.

      (2) The factor “blocknumber” ranged from 0 to 4 for statistical purposes, allowing Block 0 to serve as the reference (intercept) in the model. This coding facilitated the interpretation of parameter estimates. We now clarify this in the revised manuscript (p. 7).

      (3) Thanks for this relevant suggestion. In the Discussion, we now explicitly discuss the relationship across phases. We also acknowledge that a thorough examination of these issues lies beyond the scope of the present study, as it will require future work based on multivariate and connectivity analyses.

      (4) We thank the reviewer for this comment. In the revised manuscript, we have expanded the Discussion to clarify the absence of a strong novelty response during interference. The discussion highlights how the temporal properties of the hemodynamic response and the functional demands of each phase jointly shape the observable fNIRS signal in newborns, with purely sensory novelty effects likely increasing with maturation.

      Finally, we agree that evaluating the transitions of sleeping states can further strengthen and clarify the results obtained in the present study. This has now been added as one of the limitations of this study.
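
      The zero-based coding of the block factor described in point (2) can be illustrated with a minimal sketch. It uses made-up response values (not data from the study) and an ordinary least-squares line rather than the linear mixed model used in the manuscript; the function name `fit_line` and all numbers are assumptions for illustration only. The point is simply that when blocks are coded 0 to 4, the intercept is the fitted response at the first block, whereas coding 1 to 5 pushes the intercept to a hypothetical "block before the first" while leaving the slope unchanged.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
        (xi - x_bar) ** 2 for xi in x
    )
    return y_bar - slope * x_bar, slope


# Hypothetical responses for five consecutive blocks (illustrative values only).
response = [0.50, 0.42, 0.34, 0.26, 0.18]

blocks_zero_based = [0, 1, 2, 3, 4]  # first block is the reference (intercept)
blocks_one_based = [1, 2, 3, 4, 5]   # alternative coding, for comparison

intercept_0, slope_0 = fit_line(blocks_zero_based, response)
intercept_1, slope_1 = fit_line(blocks_one_based, response)

print(f"zero-based: intercept={intercept_0:.2f}, slope={slope_0:.2f}")
print(f"one-based:  intercept={intercept_1:.2f}, slope={slope_1:.2f}")
```

      With zero-based coding the intercept coincides with the fitted value for the first block, which is what makes the parameter estimates directly interpretable.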

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary

      The manuscript by K.H. Lee et al. presents Spyglass, a new open-source framework for building reproducible pipelines in systems neuroscience. The framework integrates the NWB (Neurodata Without Borders) data standard with the DataJoint relational database system to organize and manage analysis workflows. It enables the construction of complete pipelines, from raw data acquisition to final figures. The authors demonstrate its capabilities through examples, including spike sorting, LFP filtering, and sharp-wave ripple (SWR) detection. Additionally, the framework supports interactive visualizations via integration with Figurl, a platform for sharing neuroscience figures online.

      Strengths:

      Reproducibility in data analysis remains a significant challenge within the neuroscience community, posing a barrier to scientific progress. While many journals now require authors to share their data and code upon publication, this alone does not ensure that the code will execute properly or reproduce the original results. Recognizing this gap, the authors aim to address the community's need for a robust tool to build reproducible pipelines in systems neuroscience.

      We appreciate the summary and the recognition of the key need for maximally reproducible scientific workflows.

      Weaknesses:

      The issues identified here may serve as a foundation for future development efforts.

      (1) User-friendliness:

      The primary concern is usability. The manuscript does not clearly define the intended user base within a modern systems neuroscience lab. Improving user experience and lowering the barrier to entry would significantly enhance the framework's potential for broad adoption. The authors provide an online example notebook and a local setup notebook. However, the local setup process is overly complex, with many restrictive steps that could discourage new users. A more streamlined and clearly documented onboarding process is essential. Additionally, the lack of Windows support represents a practical limitation, particularly if the goal is widespread adoption across diverse research environments.

      We agree that usability is critical, and we now clarify that Spyglass

      “… is designed to be used by everyone in a laboratory who works with the data, both as a general-purpose tool to enable the development of new analysis pipelines and a tool that allows those pipelines and associated results to be frozen and packaged to enable reproducibility…”

      To address the local setup issue, we have now created an interactive quick start program to guide new users through the setup (scripts/install.py). It leads the user through a few prompts with sensible defaults to reduce the complexity of the setup, and it helps the user install the Spyglass dependencies and create the DataJoint configuration file. We also validate the configuration to make sure the setup was successful (scripts/validate.py). Combined, these should reduce the complexity and setup time for most users while allowing expert users to configure Spyglass as they need. We thank the reviewer for the suggestion.

      We also agree that the lack of support for Windows is a key issue, and that is something we plan to address in the coming years. We note that it may be possible to run Spyglass under the Windows Subsystem for Linux (WSL 2), which allows users to run Linux programs on a Windows machine without the need for a virtual machine or dual boot setup.

      (2) Dependency management and long-term sustainability:

      The framework depends on numerous external libraries and tools for data processing. This raises concerns about long-term maintainability, especially given the short lifespan of many academic software projects and the instability often associated with Python's backward compatibility. It would be helpful for the authors to clarify how flexible and modular the pipeline is, and whether it can remain functional if upstream dependencies become deprecated or change substantially.

      This is a very good point that reflects a broad challenge to maintainability and reproducibility. We now explicitly raise this point in our Limitations section, and note that

      “…even in cases where reproducing a result would require installing older versions of software, the results themselves remain accessible within NWB files referenced in Spyglass, ensuring that previous results can be built on even as packages evolve.”

      The merge table pattern also allows us to update (version) our pipelines as software changes. For example, we have already done so for changes in SpikeInterface versions for the version 1 pipeline for spike sorting. New and older versions of the pipeline (v0 and v1) are accessed through the merge table SpikeSortingOutput. This allows the user to have consistent results despite the version change.

      (3) Extensibility for custom pipelines:

      A further limitation is the insufficient documentation regarding the creation of custom pipelines. It is unclear how a user could adapt Spyglass to implement their own analysis workflows, especially if these differ from the provided examples (e.g., spike sorting, LFP analysis that are very specific to the hippocampal field). A clearer explanation or example of how to extend the framework for unrelated or novel analyses would greatly improve its utility and encourage community contributions.

      Here we failed to provide the required links to the documentation. We now explicitly refer to the documentation on Custom Pipelines, which includes a link to a YouTube video walking users through the creation of such a pipeline:

      Specifically, Spyglass uses DataJoint syntax to define tables as Python classes (see online documentation on Custom Pipelines and this video for examples).

      (4) Flexibility vs. Standardization:

      The authors may benefit from more explicitly defining the intended role of the framework: is Spyglass designed as a flexible, general-purpose tool for developing custom data analysis pipelines, or is its primary goal to provide a standardized framework for freezing and preserving pipelines post-publication to ensure reproducibility? While both goals are valuable, attempting to fully support both may introduce unnecessary complexity and result in a tool that is not well-suited for either purpose. The manuscript briefly touches on this tradeoff in the introduction, and the latter (pipeline preservation) may be the more natural fit for the package. If so, this intended use should be clearly communicated in the documentation to help users understand its scope and strengths.

      We appreciate this point, and have now clarified in the beginning of the Results section that

      It is both a general-purpose tool to enable the development of new analysis pipelines and a tool that allows those pipelines and associated results to be frozen and packaged to enable reproducibility.

      In practice, our lab uses Spyglass to systematize analyses to enable rapid application across many datasets. Then, once a paper has been finalized, we can export the data and the code in a package that enables reproduction. Being able to do both things is, in our view, a key strength of Spyglass. More broadly, we feel it is critical that there be a clear path for users to take their analysis code and make it reproducible. That process normally involves a very substantial amount of work, and our goal was to reduce the burden on users and make this a straightforward extension of how analyses are carried out.

      Impact:

      This work represents a significant milestone in advancing reproducible data analysis pipelines in neuroscience. Beyond reproducibility, the integration of cloud-based execution and shareable, interactive figures has the potential to transform how scientific collaboration and data dissemination are conducted. The authors are at the forefront of this shift, contributing valuable tools that push the field toward more transparent and accessible research practices.

      We appreciate this positive assessment.

      Reviewer #1 (Recommendations for the authors):

      (1) "The authors write: ‘the relational database, a well-known data structure that uses tables to organize data.’ This phrasing may be misleading… It would be more accurate to describe them as ‘well-established’ rather than ‘well-known.’"

      We have made this change.

      (2) The statement "It makes it easy to apply the same analysis to multiple datasets, as users need to specify only the data and parameters for computation ("what") rather than the execution details ("how")." would benefit from further elaboration. Specifically, how does this approach compare in practice to using a simple configuration file (e.g., YAML or JSON) to manage parameters and execution logic? A comparison or example would help ground the claim.

      We agree one could in principle do something similar with configuration files, but this is a discipline that the user must impose on themselves, as configuration files in general have no constraint on how they are to be used. On the other hand, a system like Spyglass enforces the separation of data from parameters by design. We have now added a brief comment on this point in the Results:

      “It provides a structure to organize and systematize the analysis parameters, data, and outputs into different tables. This contrasts with user-generated configuration files where each user could adopt their own idiosyncratic approach to specifying parameters and data.”

      We also come back to this point in the Discussion:

      Other approaches do away with the relational database altogether. For example, DataLad uses version control tools such as git and git-annex to manage both code and data as files [39]. This enables the creation of a data analysis environment and decentralized data sharing. For building analysis pipelines, it may be combined with other tools for managing the sequential execution of scripts. For example, Snakemake [40] (and related projects such as Cobrawap [41]) allows the users to gather and define the input, output, and the associated scripts to execute for each analysis step, thereby tracking the dependency between steps. But because these tools do not provide any formal structure for data analysis or parameter specification, they lack the advantages of the relational database that we discussed, such as being able to easily organize or search for the records of previous analysis based on specific parameters, efficient data sharing and access management to multiple users, and built-in data integrity checks based on constraints native to the database (e.g. primary keys).

      (3) The sentence ‘It enables easy access to multiple datasets via queries’ may overstate the benefit… clarify what specific advantages database queries offer.

      We agree that this is an important feature and we added the following as an example of the advantage of being able to query the database:

      It enables easy access to multiple datasets via queries (e.g. to find all datasets with recordings from a particular brain region or that used a particular behavioral paradigm)
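
      As an illustration of the kind of query meant here, the following minimal sketch uses Python's built-in `sqlite3` module as a stand-in for the database. The table names (`session`, `electrode`), column names, and file names are hypothetical and are not the Spyglass schema; Spyglass itself expresses such queries through DataJoint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE session (
    nwb_file            TEXT PRIMARY KEY,
    behavioral_paradigm TEXT NOT NULL
);
CREATE TABLE electrode (
    nwb_file     TEXT NOT NULL REFERENCES session(nwb_file),
    electrode_id INTEGER NOT NULL,
    brain_region TEXT NOT NULL,
    PRIMARY KEY (nwb_file, electrode_id)
);
""")

# Hypothetical entries, for illustration only.
conn.executemany("INSERT INTO session VALUES (?, ?)", [
    ("rat1_day1.nwb", "w_track"),
    ("rat2_day1.nwb", "open_field"),
    ("rat3_day4.nwb", "w_track"),
])
conn.executemany("INSERT INTO electrode VALUES (?, ?, ?)", [
    ("rat1_day1.nwb", 0, "hippocampus"),
    ("rat1_day1.nwb", 1, "cortex"),
    ("rat2_day1.nwb", 0, "hippocampus"),
    ("rat3_day4.nwb", 0, "cortex"),
])

# Find every dataset recorded from a given region under a given paradigm.
rows = conn.execute("""
    SELECT DISTINCT s.nwb_file
    FROM session AS s
    JOIN electrode AS e ON e.nwb_file = s.nwb_file
    WHERE e.brain_region = ? AND s.behavioral_paradigm = ?
    ORDER BY s.nwb_file
""", ("hippocampus", "w_track")).fetchall()

matching_files = [row[0] for row in rows]
print(matching_files)
```

      The same question asked of a folder of files would require opening each file in turn; with a database it is a single declarative query.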

      (4) ‘Specifically, Spyglass uses DataJoint syntax to define tables as Python classes’ lacks clarity… Expanding this explanation with a brief, concrete example would

      We agree that this sentence does not provide information on how to use DataJoint syntax to define a table. We carefully considered adding that syntax to the manuscript, but we are concerned that doing so here and in other places where syntax examples could be used would decrease the readability of the document. We also noted that other papers that present analysis frameworks typically provide much less information.

      Nevertheless, it is clear that users would benefit from a concrete example, and as we mentioned above, we have added a link to the documentation describing how to make custom schema and pipelines, as well as a YouTube video that we created to walk users through this process.

      (5) The authors write: "Selection tables associate parameter entries with data object entries." This terminology is confusing. From a naming perspective, it is not immediately obvious what a "selection table" is or how it differs from other components. Moreover, shouldn't parameter entries be associated with a specific pipeline rather than directly with data objects? Further clarification is needed.

      We appreciate that our terminology was not clear. The idea behind a selection table is that there are many data entries and many potential sets of parameters that can be used to analyze each of those entries. We have now revised this section of the text and added an explanatory paragraph:

      An analysis pipeline consists of sets of tables downstream of the Common tables. In each step in the analysis, the user populates one of four table types (Figure 2A):

      Data tables contain pointers to data objects in either the original NWB file or ones generated by an upstream analysis.

      Parameter tables contain a list of the parameters needed to fully specify the desired analysis.

      Selection tables allow users to select and pair a data entry and a parameter entry, defining the input to the Compute table.

      Compute tables execute the computations to carry out the analysis using the Data and Parameters specified in the Selection table entry. These results are then stored and can serve as Data for downstream analysis.

      This design has multiple features that we have found to be beneficial. First, Parameter tables store the full set of parameters needed to specify a given analysis. For example, a Parameter table entry for a firing rate analysis of a single neuron might specify the bin size and smoothing to be used for that analysis. Multiple such entries can be defined, allowing a user to select the most appropriate one for the question being addressed. Second, because Selection tables specify which Parameter table entry was used for a given analysis on the associated Data table entry, they provide the key information needed to know which parameters were used to generate the entry in the downstream Compute table. Third, it is simple to associate a given Data table entry with multiple Parameter table entries and then re-run the analysis on those pairs. This enables a user to understand how their choice of parameters impacts their results, something that is otherwise difficult to manage and track.
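
      The four table types can be sketched in a few lines. The example below is a simplified stand-in, not Spyglass or DataJoint code: it uses Python's built-in `sqlite3`, and every table name, column name, and value is hypothetical. It follows the firing-rate example above, with the bin size as the only parameter, and shows one Data entry paired with two Parameter entries through the Selection table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE spike_data (          -- Data table
    data_id     TEXT PRIMARY KEY,
    spike_times TEXT NOT NULL      -- stands in for a pointer to an NWB data object
);
CREATE TABLE rate_params (         -- Parameter table
    params_name TEXT PRIMARY KEY,
    bin_size    REAL NOT NULL      -- seconds
);
CREATE TABLE rate_selection (      -- Selection table: pairs a data entry with a parameter entry
    data_id     TEXT NOT NULL REFERENCES spike_data(data_id),
    params_name TEXT NOT NULL REFERENCES rate_params(params_name),
    PRIMARY KEY (data_id, params_name)
);
CREATE TABLE rate_result (         -- Compute table: one result per Selection entry
    data_id     TEXT NOT NULL,
    params_name TEXT NOT NULL,
    peak_rate   REAL NOT NULL,     -- spikes per second in the fullest bin
    PRIMARY KEY (data_id, params_name),
    FOREIGN KEY (data_id, params_name)
        REFERENCES rate_selection(data_id, params_name)
);
""")

conn.execute("INSERT INTO spike_data VALUES (?, ?)", ("neuron_1", "0.1,0.2,0.3,0.9,1.6"))
conn.executemany("INSERT INTO rate_params VALUES (?, ?)",
                 [("fine_bins", 0.5), ("coarse_bins", 1.0)])
# Pair the same data entry with both parameter sets.
conn.executemany("INSERT INTO rate_selection VALUES (?, ?)",
                 [("neuron_1", "fine_bins"), ("neuron_1", "coarse_bins")])

# Selecting a parameter set that was never defined is rejected by the database.
try:
    conn.execute("INSERT INTO rate_selection VALUES (?, ?)", ("neuron_1", "undefined_params"))
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True


def populate(conn):
    """Compute a result for every Selection entry that does not have one yet."""
    pending = conn.execute("""
        SELECT s.data_id, s.params_name, d.spike_times, p.bin_size
        FROM rate_selection AS s
        JOIN spike_data AS d ON d.data_id = s.data_id
        JOIN rate_params AS p ON p.params_name = s.params_name
        LEFT JOIN rate_result AS r
            ON r.data_id = s.data_id AND r.params_name = s.params_name
        WHERE r.data_id IS NULL
    """).fetchall()
    for data_id, params_name, spike_times, bin_size in pending:
        counts = {}
        for t in (float(x) for x in spike_times.split(",")):
            bin_index = int(t // bin_size)
            counts[bin_index] = counts.get(bin_index, 0) + 1
        peak_rate = max(counts.values()) / bin_size
        conn.execute("INSERT INTO rate_result VALUES (?, ?, ?)",
                     (data_id, params_name, peak_rate))


populate(conn)
results = dict(conn.execute(
    "SELECT params_name, peak_rate FROM rate_result WHERE data_id = ?", ("neuron_1",)
).fetchall())
print(results, fk_rejected)
```

      Because every result row is keyed by the Selection entry that produced it, the parameters behind any result can always be looked up, and re-running the same data with a new parameter set only requires one more Selection entry.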

      (6) Including ‘fitting state-space models’ as a standard example may be misleading… Presenting it as a routine task might set unrealistic expectations.

      We agree and have changed “standard” to “a diverse range of”.

      (7) Figure 2 would benefit from clearer sequential logic. For example, the object ‘LFPSelection’ appears after a method call referencing it.

      We agree that the figure was not explained adequately. We now make it clear in the caption that the method call creates the entry in the LFPSelection table, and is thus upstream of the picture of the table entry that was created.

      (8) Example 3 would be strengthened by a comparison to SpikeInterface, a framework increasingly adopted by the community.

      Here we clearly did not explain the spike sorting pipeline sufficiently thoroughly. As we now clarify in the text:

      This pipeline uses SpikeInterface [19] to perform the operations critical for spike sorting, but also tracks all of the parameters used and provides a system for tracking multiple sorting curations.

      Thus, Spyglass takes advantage of the special purpose routines within SpikeInterface, but also provides an organizational framework for the outputs, and, equally critically, allows direct use of the outputs of sorting in downstream analyses with the ability to go back and know which sorting parameters were used for that analysis.

      (9) The authors state: "These are saved as Docker containers and optionally uploaded to DANDI." However, it is unclear how end users are expected to interact with these containers. Additional guidance or an example interaction would be valuable.

      We agree that this interaction was not described in the text, and we have now added the following to explain how a user might interact with these containers:

      ...This can be done by (i) hosting the database on the cloud and granting access to users outside the lab; or (ii) exporting and sharing parts of the database that were used by the project. Spyglass facilitates the second option by providing functions that automatically log the table entries and NWB files used for creating figures of a manuscript in a Python environment (Table 1, 05_Export). The dependencies of these entries are traced through the database to compile the complete set of raw, intermediate, and plotted NWB files and their corresponding database entries. These are stored in the `Export` table, which also generates a bash script to create SQL dumps of the identified database entries.

      To upload these files to DANDI, users must first register a new dandiset for their project and record their API and dandiset ID. With this information, they can then use the method `DandiPath.compile_dandiset()` to automatically validate, organize, and upload all project files to the DANDI archive. Additionally, this process stores the archive information for each file in the `DandiPath` table, allowing `fetch_nwb` to automatically stream data from the DANDI cloud storage when not available locally.

      To create a shareable Docker image of the project, we provide a template repository spyglass-export-docker. Users first download a local copy of this repo and copy the SQL dump file, environment YAML, and figure-generating notebooks generated during Spyglass export into the appropriate folders. Running the provided Docker Compose scripts then generates two linked Docker containers: one running the reconstructed Spyglass SQL database, and a second connected to this database and running a JupyterHub with a Python environment matching that used when generating the figures. These can be readily shared with new users to provide them immediate access to all steps of the analysis process and the corresponding data through DANDI streaming.
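
      The dependency tracing step mentioned above (compiling the complete set of raw, intermediate, and plotted entries behind a figure) amounts to walking a dependency graph upstream. The following is a minimal sketch of that idea only, not the Spyglass implementation: the entry names, the `upstream` mapping, and the function `trace_dependencies` are all hypothetical.

```python
from collections import deque

# Hypothetical dependency map: each entry lists the entries it was computed from.
upstream = {
    "figure_panel_a": ["ripple_times", "decoded_position"],
    "ripple_times": ["lfp_band"],
    "lfp_band": ["lfp"],
    "lfp": ["raw_session.nwb"],
    "decoded_position": ["sorted_spikes", "position"],
    "sorted_spikes": ["raw_session.nwb"],
    "position": ["raw_session.nwb"],
    "unrelated_analysis": ["raw_other.nwb"],
}


def trace_dependencies(logged_entries, upstream):
    """Return the logged entries plus everything they depend on, directly or indirectly."""
    needed = set(logged_entries)
    queue = deque(logged_entries)
    while queue:
        entry = queue.popleft()
        for parent in upstream.get(entry, []):
            if parent not in needed:
                needed.add(parent)
                queue.append(parent)
    return needed


# Entries logged while generating a figure; everything upstream is added to the export.
export_set = trace_dependencies(["figure_panel_a"], upstream)
print(sorted(export_set))
```

      Entries that the figure never touched (here `unrelated_analysis` and its raw file) stay out of the export, which keeps the shared package limited to what is needed for reproduction.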

      (10) The phrase "not requiring a central location to track available files and providing a user-friendly Python API" is somewhat vague. Does this imply that multiple sources can exist for the same NWB file? How does the system handle potential version conflicts, such as when an NWB file is modified locally? A clearer explanation would help users understand the system's behavior in collaborative scenarios.

      This is an important point that we now explain in the manuscript:

      Critically, the downloaded files are never modified locally within Spyglass, and any attempt to access a modified file would result in a DataJoint error. This ensures that each user is working on the same underlying data even if they are at different sites.

      To provide interested readers with more details, we also now point them to the repo for more information:

      We point interested readers to the Kachery GitHub repo (https://github.com/magland/kachery) for further descriptions.

      (11) "The concept of a ‘kachery zone’ in Figure 4 is ambiguous. Is this storage local or in the cloud? If a third-party storage system is involved, it should be explicitly labeled and described in the diagram."

      We agree that the depiction of a Kachery zone in Figure 4 is hard to understand. For the reviewer’s reference, a Kachery zone defines a list of users that have permissions to upload and download a particular set of files that have been linked to that zone. This is explained in the tutorials, and to simplify the figure we have replaced the Kachery zone with a remote computer.

      (12) If one of the manuscript's goals is to showcase the functionality of the pipeline, Figure 5 would be more informative if it also illustrated the workflow or steps involved in generating the displayed figures.

      We have added a supplementary figure (Supplementary Figure 1) related to Figure 5 that illustrates the main data workflow used in generating the figure. In addition, we note that the code for generating Figure 5 and the supplementary figure is included in the code repository for the paper (https://github.com/LorenFrankLab/spyglass-paper/).

      (13) In the conclusion, the authors write: "By contrast, Spyglass begins with a shared data format that includes the raw data and offers both transparent data management and reproducible analysis pipelines using a formal data structure." However, the tools discussed in the previous paragraph seem to offer similar capabilities. The real challenge in transparent data management often lies in the technical overhead associated with setting up and maintaining a database, particularly when collaborating across labs.

      Here we may not have explained the differences between Spyglass and these other approaches sufficiently clearly. The various tools mentioned in the paragraph above this one do not begin with a shared format nor do they include a formal data structure. That said, we agree that maintaining a database accessible across labs is a key challenge. We note here that we provide tutorials to ease this process, which are linked and described in the manuscript (e.g. Table 1).

      (14) Specifying a preferred IDE… may not be necessary. This recommendation could be made optional or omitted.

      We agree that it may not be necessary, but we have also noted that users come to Spyglass with a very wide range of expertise, and in our lab it has been helpful to specify the IDE.

      Reviewer #2 (Public review):

      Summary:

      This valuable paper presents Spyglass, a comprehensive software framework designed to address the critical challenges of reproducibility and data sharing in neuroscience.

      The authors have developed a robust ecosystem built on community standards such as NWB and DataJoint, and demonstrate its utility by applying it to datasets from two independent labs, successfully validating the framework's ability to reproduce and extend published findings. While the framework offers a powerful blueprint for modern, reproducible research, its immediate broad impact may be tempered by the significant upfront investment required for adoption and its current focus on electrophysiological data. Nevertheless, Spyglass stands as an important and practical contribution, providing a well-documented and thoughtfully designed path toward more transparent and collaborative science.

      Strengths:

      (1) Principled solution to a foundational challenge:

      The work offers a concrete and comprehensive framework for reproducibility in neuroscience, moving beyond abstract principles to provide an implemented, end-to-end ecosystem.

      (2) Pragmatic and robust architectural design:

      Features such as the "cyclic iteration" motif for spike-sorting curation and the "merge" motif for pipeline consolidation demonstrate deep, practical experience with neurophysiological analysis and address real-world challenges.

      (3) Cross-laboratory validation:

      The successful replication and extension of published hippocampal decoding findings across independent datasets strongly support the framework's utility and underscore its potential for enabling reproducible science.

      (4) Accessibility through documentation and demos:

      Extensive tutorials and the availability of a public demo environment lower some of the barriers to adoption.

      We appreciate the Reviewer’s recognition of these strengths.

      Weaknesses:

      (1) High barrier to adoption:

      The requirement to convert all data into NWB, maintain a relational database, and train users in structured workflows is a significant hurdle, particularly for smaller labs.

      We agree that this is a significant hurdle, but we also believe that it comes with many advantages. It is also increasingly easy to do given the many community-supported tools, regardless of a lab's resources. These points are discussed in detail in the “Why NWB?” section.

      We also note that, to our knowledge, there is no simpler alternative that provides the key features of Spyglass.

      (2) Limited tool integration:

      The current pipelines, while useful, still resemble proof-of-principle demonstrations.

      Closer integration with established analysis libraries such as Pynapple and others could broaden the toolkit and reduce duplication of effort.

      Here we clearly failed to explain that we have integrated other libraries, including Pynapple. We now make this clear in the Results section:

      Our goal was to take advantage of other open source packages, and we have therefore integrated support for Pynapple [21], a general purpose neural data analysis package. We also built our pipelines to take advantage of other community-developed, open-source packages, like GhostiPy [20], SpikeInterface [19], DeepLabCut [2] and Moseq [29].

      We also have added a specific reference to the relevant function call in the Practical use cases and extensions section:

      For example, the user can conveniently read specific data types from the NWB file by first ingesting it into Spyglass and accessing database tables with Spyglass functions (e.g. fetch_nwb) or even load those objects in a format compatible with Pynapple [21] (fetch_pynapple).

      Pynapple support is actually aided by our design choice of relying on NWB. Because Pynapple can load NWB files, the data from any analysis stored in an NWB file that Pynapple can read can be loaded as Pynapple objects. We have provided methods to do so.

      (3) Experimental metadata support:

      While NWB provides a solid foundation for storing neurophysiology data streams, it still lacks broad and standardized support for experimental metadata, including descriptions of conditions, subject details, and procedures, as well as links across datasets. This limitation constrains one of Spyglass's key promises: enabling reproducible, cross-laboratory science. The authors should clarify how Spyglass plans to address or mitigate this gap - for example, by adopting or contributing to metadata extensions, providing templates for experimental conditions, or integrating with complementary systems that manage metadata across datasets.

      This is an important point. First, NWB provides methods for creating new metadata extensions, and our laboratory has contributed to multiple such extensions and has adopted metadata extensions as they come to exist (for example, we are currently integrating the ndx-pose extension, which has broader support for pose estimation algorithms such as DLC and SLEAP, enabling us to capture relationships between body parts). These extensions, once incorporated into NWB, make it easy to create parallel Spyglass tables that read in the associated metadata. Second, we note that by storing the metadata from the NWB file in a database, Spyglass naturally supports searches across datasets where the metadata is the same (e.g. all the datasets from a given subject or using a given behavioral apparatus).

      That said, for these searches to be easy, the underlying NWB files need to use the same ontologies (naming systems). Creating shared naming systems within and across labs is very challenging, but even here having a database helps greatly, as it provides a way to find all the names used for a given field and to thereby make an effort to standardize them.
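
      As a minimal sketch of the kind of name audit described above, the following plain-Python example groups the different spellings used for one metadata field. It assumes the rows have already been retrieved from the database; the records, the field name `apparatus`, and the helpers `normalize` and `name_variants` are hypothetical and are not part of Spyglass or NWB.

```python
from collections import defaultdict


def normalize(name: str) -> str:
    """Drop case, spaces, hyphens, and underscores so spelling variants compare equal."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def name_variants(records, field):
    """Group the distinct spellings of `field` by their normalized form."""
    groups = defaultdict(set)
    for record in records:
        value = record.get(field)
        if value:
            groups[normalize(value)].add(value)
    # Only keys with more than one spelling are candidates for standardization.
    return {key: sorted(values) for key, values in groups.items() if len(values) > 1}


# Hypothetical rows standing in for the result of a database query.
records = [
    {"subject": "rat01", "apparatus": "W-track"},
    {"subject": "rat02", "apparatus": "w_track"},
    {"subject": "rat03", "apparatus": "Wtrack"},
    {"subject": "rat04", "apparatus": "linear track"},
]

variants = name_variants(records, "apparatus")
print(variants)
```

      Here the three spellings of the same track would be reported together, while the single spelling of the linear track would not be flagged.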

      Finally, while Spyglass aims to enable reproducibility, it will not be possible to solve all standardization issues of the field. We believe that Spyglass is an important step forward in standardization and reproducibility in that it encourages users to use the same data format and processing. To our knowledge, there is no software like it in the field of systems neuroscience. Limitations of the field and of current progress do not invalidate the contribution of Spyglass as a framework.

      We now mention all these issues in the Limitations section of the Discussion.

      (4) Cross-laboratory interoperability:

      While demonstrated across two datasets, the manuscript does not fully address how Spyglass will handle the diversity of metadata standards, acquisition systems, and lab-specific practices that remain major obstacles to reproducibility.

      We agree that the current version of Spyglass does not fully address this diversity. Nevertheless, we note that the NWB standard is increasingly widely adopted in our field, and that by building on this standard, it is much simpler to create structures that store relevant data across labs.

      (5) Visualization limitations:

      Beyond the export system and Figurl, NWB offers relatively few options for interactive data exploration. The ability to explore data flexibly and discover new phenomena remains limited, which constrains one of the potential strengths of standardized pipelines.

      We agree that there are many other tools, and we have considered additional integrations. We have chosen not to proceed in this direction because the various visualization tools are well constructed, and therefore already easy to use with data retrieved from Spyglass. Thus, users can choose to use Matplotlib, Seaborn, or any of many other visualization tools and apply those to data accessed through Spyglass without the need for more explicit integration.

      Spyglass is well-positioned to become a community framework for reproducible neuroscience workflows, with the potential to set new standards for transparency and data sharing. With expanded modality coverage, tighter integration of existing community tools, stronger solutions for cross-lab interoperability, and richer visualization capabilities, it could have a transformative impact on the field.

      We appreciate this summary and will continue to try to make Spyglass more powerful, generalizable, and accessible to the community.

      Reviewer #2 (Recommendations for the authors):

      (1) Documentation/User onboarding:

      While extensive documentation exists, new users may feel overwhelmed. A single Quickstart or "golden path" guide and a one-command validation script would substantially improve usability.

      As mentioned in the response to reviewer 1, we have added an interactive quickstart program to walk users through installation and setup (scripts/install.py) and validate the install (scripts/validate.py). This should greatly reduce the complexity of the set-up process and allow new users to use Spyglass quickly and confidently. We thank the reviewer for the suggestion.

      (2) Permission handling and multi-user scaling:

      Current ad hoc solutions (like cautious deletes) may not scale well in large collaborations. This should be acknowledged, but it is not a fatal weakness given the framework's early stage.

      This is a fair point and we now mention this when cautious delete is introduced in the Methods:

      Though this is not a formal permission-management system, it serves to prevent accidental deletions. We note that this system does incur additional overhead, and while that has not been an issue for us, it is possible that this would become problematic in use for much larger cross-laboratory collaborations.
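
      To illustrate the idea of a check that guards deletions without being a full permission system, here is a minimal plain-Python sketch. It assumes, for illustration only, that a delete is allowed when the requesting user shares a team with the session's experimenter; the names `TEAMS`, `SESSION_OWNER`, `shares_team`, and `cautious_delete` and all data are hypothetical and do not reproduce the Spyglass API.

```python
# Hypothetical team membership and session ownership, standing in for database tables.
TEAMS = {"team_a": {"alice", "bob"}, "team_b": {"carol"}}
SESSION_OWNER = {"session_001": "alice", "session_002": "carol"}


def shares_team(user: str, owner: str) -> bool:
    """Return True if both users belong to at least one common team."""
    return any(user in members and owner in members for members in TEAMS.values())


def cautious_delete(store: dict, session_id: str, user: str) -> None:
    """Delete a session only if `user` shares a team with its experimenter."""
    owner = SESSION_OWNER[session_id]
    # This lookup is the extra overhead paid on every delete.
    if not shares_team(user, owner):
        raise PermissionError(f"{user} cannot delete {session_id} owned by {owner}")
    del store[session_id]


store = {"session_001": "data", "session_002": "data"}

cautious_delete(store, "session_001", "bob")  # bob and alice share team_a

blocked = None
try:
    cautious_delete(store, "session_002", "bob")  # bob and carol share no team
except PermissionError as err:
    blocked = str(err)

print(sorted(store), blocked)
```

      The membership lookup runs on every delete, which is the kind of overhead that could grow with the number of users and teams.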

      (3) Benchmarking and performance evaluation:

      More systematic testing (e.g., reproducibility across independent users, computational efficiency) would be reassuring, but the lack of it does not invalidate the proof-of-principle demonstration.

      We agree. So far at least two other labs have adopted this system and we are working with a consortium funded by the Simons Foundation to use Spyglass as a data sharing system across a larger number of labs.

      (4) Support for Cloud solution:

      To lower the barrier to adoption, the authors should consider cloud integration, such as preconfigured Docker/Cloud templates or hosted options, so end-users do not need to maintain databases and storage locally.

      We agree that cloud-based solutions could be a good option for some labs, although we note that the cost of cloud-based computing can be very high. There is also the burden of moving and storing the data to where it needs to be processed, which can be particularly time intensive with the large-scale data being generated by many laboratories.

      At the reviewer’s suggestion, we have added docker-compose support to lower the barrier to adoption. This includes:

      docker-compose.yml with health checks and persistent storage

      .env.example configuration template

      This allows one-command database setup: `docker compose up -d`

      (5) Integration of greater modalities:

      The authors should consider expanding support to other major data types, particularly calcium imaging, photometry, and other optical physiology data.

      We entirely agree that pipelines to ingest and process these datatypes would be very valuable, and we would welcome collaborations with experts and the general community to build these pipelines. We are, for example, working with a collaborating lab on a photometry pipeline. However, we only have so many people to build and maintain Spyglass, so we are limited by the capacity and expertise of our developers.

      (6) Integrate more community tools:

      Closer integration with community tools such as Pynapple, Neurosift, and SpikeInterface would broaden functionality and position Spyglass as a hub rather than a parallel ecosystem.

      As we mentioned in our responses to Reviewer 1, we entirely agree, and in fact we have already integrated Pynapple support into Spyglass. Because we store files in the NWB format and Pynapple supports NWB, it was easy for us to convert any data we have into the Pynapple format upon request, thus making it easily analyzable by the Pynapple package. Moreover, we use SpikeInterface for the SpikeSorting pipeline, and similarly provide pipelines built on other open source projects. As we now clarify in the text:

      Spyglass includes pipelines for a diverse range of analysis tasks in systems neuroscience, such as the analysis of LFP, spike sorting, video and position processing, and fitting state-space models for decoding neural data. Tutorials for all pipelines are available on the Spyglass documentation website (Table 1). Our goal was to take advantage of other open source packages, and we have therefore integrated support for Pynapple [21], a general purpose neural data analysis package. We also built our pipelines to take advantage of other community-developed, open-source packages, like GhostiPy [20], SpikeInterface [19], DeepLabCut [2] and Moseq [29].

      (7) Direct Dandi archive upload functionality:

      Scripts and tutorials for uploading data directly from Spyglass to DANDI, with validation of metadata completeness, would provide users with a direct pipeline from raw data to a public archive.

      The tutorials for DANDI upload are included as part of the export tutorial notebook (https://lorenfranklab.github.io/spyglass/latest/notebooks/05_Export/). We agree that this was not apparent from the manuscript before and have noted this within the Manuscript table describing these notebooks.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors wanted to determine whether the set-19 gene, one of 38 SET-domain containing genes in C. elegans, has a clear function in vivo with respect to lysine methylation. The question is not only whether it can modify this histone tail residue, but also what the impact of a loss of this locus is on the inheritance of repressive chromatin states.

      Strengths:

      The authors clearly achieved their goal, and it is convincingly shown that SET-19 is indeed a somatic cell histone methyltransferase with a striking specificity for H3K23. There is recombinant protein work and quantitative mapping in vivo of histone marks and transcriptional changes, and the authors rule out some other hypotheses that have been in the literature. Overall, this provides a compelling argument that SET-19 is indeed the major somatic cell HMT for this residue. Interestingly, the phenotypes are rather minimal, consistent with redundancy in the physiological roles of histone methylation, and redundancy as well in HMT function. For the most part, the data are not over-interpreted. The genetic alleles used, assuming they are confirmed, were revealing and well-documented.

      Thanks very much for the positive comments on our work.

      The alleles used in this study were confirmed by PCR and Sanger sequencing, and the sequence information will be added in the revised manuscript.

      Weaknesses:

      The major weaknesses are easily fixed. They mainly reflect a slight overstatement of certain data (claiming insignificance, when it is not clear how that was determined) and claiming a bit too much about SET-32, which was independently claimed to be an H3K23 HMT. Clearly, the two SET domain enzymes are not redundant, nor is the claim that SET-32 has no role in H3K23 methylation completely convincing, especially in germline or embryonic conditions. Finally, the imaging is not of very high quality, nor are the images fully quantitated. These points can be easily remedied.

      Thanks very much for the comments.

      We agree that some interpretations in the original manuscript were too strong, particularly regarding the negative results and the role of SET-32. Our in vitro assays show that SET-32 exhibits H3K23me1 activity and, at higher SAM concentrations, activity toward H3K23me2/3. These findings indicate that SET-32 does have a role in H3K23 methylation. SET-32 is expressed in germ cells, oocytes, and embryos. It is quite likely that redundancy of H3K23 methyltransferase activity exists in these tissues. In the revised manuscript, we will tone down the interpretations and expand the Discussion section to include this possibility. We will also replace the relevant images with higher-quality versions and provide quantitative analyses for Figures 6a and 6b.

      Reviewer #2 (Public review):

      Summary:

      This manuscript identifies SET-19 as a somatic H3K23 methyltransferase in C. elegans, building on previous genetic evidence for a role of set-19 in H3K23me3 regulation. The authors combine quantitative mass spectrometry, western blotting, in vitro methyltransferase assays, ChIP-seq, and RNA-seq to show that loss of set-19 causes a strong reduction of H3K23me3, particularly in somatic tissues, and is associated with derepression of a subset of genes enriched for H3K23me3. They further conclude that SET-19 is dispensable for canonical feeding RNAi and for transgenerational or intergenerational inheritance of RNAi, distinguishing its function from other heterochromatin-associated methyltransferases such as SET-25, SET-32, and the H3K27 HMTs. Overall, the work adds an important piece to the H3K23 methylation pathway and tissue-specific chromatin regulation in C. elegans.

      Strengths:

      Very strong genetic and biochemical evidence for SET-19 as the major H3K23me3 HMT.

      The mass spectrometry and western blot data convincingly demonstrate a strong reduction of H3K23me3 in two independent set-19 alleles and rescue by GFP::SET-19, which is a major strength (Figure 1, including Figure 1f).

      The in vitro methyltransferase assays (Figure 2) showing robust H3K23me1/2/3 activity for SET-19 SET+CC and only modest H3K23me activity for SET-32, together with the SAM titration experiment in Figure 2C, are very informative and nicely support the conclusion that SET-19 is a high-activity H3K23 methyltransferase compared to SET-32.

      The ChIP-seq analysis is central to the conclusions that H3K23me3 is enriched on chromosome arms, co-localizes with H3K9me3/H3K27me3, and is strongly reduced in set-19 mutants.

      Thanks very much for the positive comments on our work.

      Weaknesses:

      (1) The global reduction of H3K23me3 in Figure 3b,c and Figure S4c is convincing, but the correlation analysis between H3K23me3 loss and mRNA changes in Figure 3g could be strengthened. Currently, the analysis appears to focus on broad categories; it would be helpful to provide:

      Representative genome browser tracks (e.g., exemplary gene coverage plots) for several genes that show clear H3K23me3 peaks in wild type, reduction in set-19, and concomitant upregulation of mRNA levels, and for a few genes that retain H3K23me3 and do not change expression. This would make the link between chromatin changes and transcriptional output more concrete.

      Thanks very much for the suggestion.

      To address this point, we will include representative genome browser tracks for selected genes in the revised manuscript. These examples will help better illustrate the relationship between H3K23me3 loss and mRNA expression changes.

      (2) In Figure S4C, the authors note a pronounced reduction of H3K23me3 mainly on chromosome arms, but in the current data, it appears that the impact might be arm-specific (i.e., stronger reduction in one arm than the other in a chromosome), with a notable pattern at the X chromosome tip where H3K23me3 seems increased. This is potentially interesting and should be briefly commented on in the Results or Discussion, for example, whether this reflects compensatory activity of another HMT, changes in chromatin organization, or could be a technical artifact.

      Thanks very much for bringing up this point.

      As shown in Figure S4C, the overall chromosomal distribution pattern of H3K23me3 is broadly similar between wild type and set-19 mutants, with pronounced enrichment over one chromosomal arm, whereas the center and the opposite arm show relatively lower signal. In set-19 mutants, this asymmetry becomes more pronounced, with a larger difference between the highly enriched arm and the lower-signal regions. This pattern is particularly evident on chromosomes I, II, V, and X. These observations suggest that the effect of set-19 loss on H3K23me3 is not uniform across chromosomal regions.

      Substantial H3K23me3 signal remains in specific regions in set-19 mutants, suggesting that additional enzyme(s) also contribute to H3K23me3 methylation. For example, SET-19 appears to function predominantly in somatic tissues, yet the ChIP-seq assays were performed using whole animals, including the germline. Alternatively, there might be compensatory activity of another HMT. In the revised manuscript, we will state these points more explicitly in the Results section and discuss the residual and locally increased H3K23me3 signals.

      (3) Figure 3d suggests that some actively expressed genes can also display relatively high H3K23me3 levels, which complicates a simple model of H3K23me3 as exclusively repressive. If feasible, a limited additional analysis stratifying genes by both H3K23me3 and H3K9me3/H3K27me3 status might clarify whether these highly expressed, H3K23me3 marked genes differ in other chromatin features.

      Thanks very much for the suggestion.

      To address this point, we will perform additional stratified analyses of H3K23me3-marked genes according to their H3K9me3 and/or H3K27me3 status. We will also compare highly and weakly expressed H3K23me3-marked genes to examine whether they differ in other chromatin features, including H3K9me3, H3K27me3, and, if feasible, H3K4me3 and H3K36me3.

      (4) The authors argue that SET-19 primarily affects H3K23me3 and not other canonical repressive marks, based largely on mass spectrometry. It would significantly strengthen the mechanistic conclusions if the authors could assess H3K9me3 and H3K27me3 profiles in set-19 mutants, ideally by ChIP-seq or at least by focused ChIP-qPCR at a subset of loci that lose H3K23me3 and are derepressed at the RNA level. This would address whether H3K23me3 loss occurs independently of changes in other heterochromatin marks, or whether there is crosstalk.

      Thanks very much for the suggestions.

      As suggested, H3K9me3 and H3K27me3 ChIP-seq in wild-type and set-19 mutants will be performed. We will compare their genome-wide distributions and identify loci with significantly altered H3K9me3 and/or H3K27me3 enrichment. These analyses should help clarify whether H3K23me3 loss occurs largely independently of H3K9me3/H3K27me3 changes or reflects potential crosstalk among these repressive chromatin marks. In addition, we will examine H3K9me3 and H3K27me3 enrichment at genes showing both H3K23me3 loss and increased mRNA expression in set-19 mutants to assess whether derepression at these loci is accompanied by changes in other canonical repressive marks.

    1. Author response:

      [These author responses are to reviews from another journal.]

      Reviewer #1:

      This manuscript investigates the behaviour of a variety of clock proteins in cultured cells when epitope tagged and transiently expressed and tries to draw general implications for endogenous function of circadian clock proteins.

      Clock proteins are expressed at low levels in most cells, and so the clock interacting proteins (other kinases, phosphatases, ubiquitin-conjugated enzymes, etc.) are likewise probably at low abundance. Over-expression of one or two or even three components of a multicomponent system is going to produce odd and obscure non-physiological imbalances. The authors do not extend detailed study of these imbalances to more physiologic levels so the importance of their observations to clock function is not clear, and importantly, they are not tested in more biologically relevant models.

      To study the function of components within a system, the steady state must be perturbed in one way or another. This can be achieved through pharmacological treatment, mutagenesis, downregulation, or overexpression. Such interventions are inherently non-physiological, and the relevance of the resulting observations must therefore be carefully validated.

      In our study, the purpose of PER2 overexpression was to investigate its subcellular dynamics in the absence and presence of CRYs, specifically CRY1. This is far less trivial than it might appear at first glance, because our data clearly show that PER2 overexpression triggers, within 24 h, the accumulation of endogenous CRY1 (Fig. 1A), due to PER2-mediated stabilization of CRY1 (Fig. 4). PER2 overexpression also induces the accumulation of endogenous PER1, CK1, and BMAL1 (Fig. 2).

      This effect was not considered in previous studies, such as Yagita et al. (2002), in which PER2 subcellular localization was assessed at a single time point following transient transfection. Yagita et al. found roughly equal proportions of cells with PER2 exclusively in the nucleus, exclusively in the cytoplasm, or distributed between both compartments. Such extreme cell-to-cell variability cannot be explained solely by PER2’s shuttling dynamics, as that would imply synchronous export in one cell and synchronous import in another.

      Our time-resolved analysis of DOX-induced PER2 expression strongly suggests that the variability reported by Yagita et al. reflects a heterogeneous population of unsynchronized cells at different temporal stages along a trajectory from cytoplasmic PER2 (unbound) to nuclear PER2 fully saturated with CRYs (bound), owing to stabilization of endogenous CRYs. Similarly, Öllinger et al. (2014) analyzed PER2 nuclear export in cells constitutively expressing PER2-Dendra. Under such steady-state conditions, PER2-Dendra is already in complex with endogenous CRYs. The slow export rate and lack of dependence on additional CRY1 expression therefore likely reflect export of the complex, which is intrinsically slow.

      Thus, prior to our work, no data on the true shuttling dynamics of PER2 were available.

      Importantly, our results show not only that CRY1 promotes nuclear accumulation of PER2 (as reported by Öllinger et al.) but also that, conversely, PER2 promotes cytosolic accumulation of CRY1, depending on their expression ratio. Since CRY1 is predominantly nuclear and PER2 predominantly cytosolic, and because a PER2 dimer can bind one or two CRY1 molecules, our data suggest that the shuttling equilibrium depends on PER2 saturation state: a PER2 dimer bound to one CRY1 remains cytosolic, whereas a dimer bound to two CRY1 is nuclear.

      These observations are novel and have not been reported previously. They were only possible through time-resolved analysis of overexpressed proteins.

      A number of the findings are confirmatory rather than novel - the phosphorylation-regulated nuclear-cytoplasmic shuttling of CK1 and PER proteins is long known, and it's not clearly stated what is novel here. 

      We acknowledge prior work by Milne et al. (2001), who showed that kinase-dead CK1 is predominantly nuclear and that prolonged treatment with leptomycin B (16 h) enhances its nuclear localization. We cite this study at the beginning of the relevant paragraph. While we confirm these earlier observations, our work extends them in several important and novel ways:

      (1) Rapid dynamics of CK1 localization – We show that pharmacological inhibition of CK1 with PF670 induces rapid (within 1 h) depletion of CK1δ from the centrosome, accompanied by nuclear accumulation and elevated CK1δ levels. These kinetics have not previously been reported. We also show that proteasome inhibition with MG132 enhances centrosomal staining, indicating that centrosomal binding sites are not saturated. Together, the data show that CK1δ equilibrates rapidly between its binding partners.

      (2) Integration of localization with protein stability – We relate the known localization patterns of WT CK1 and the kinase-dead mutant K38R to CK1 degradation dynamics and further compare them to the tau-like kinase mutant CK1δ-R1178Q. This integration of subcellular localization data with turnover mechanisms provides new mechanistic insight.

      (3) Comprehensive regulatory model – In the revised manuscript, we now include a schematic summarizing how CK1δ is posttranslationally regulated via subcellular shuttling, nuclear degradation, and dynamic interactions with binding partners (Figure EV5C). To our knowledge, such a comprehensive view of CK1δ regulation, linking localization, stability, and partner association, has not been presented before.

      We believe these additions clearly distinguish our findings from prior reports and highlight the novel aspects of our study.

      The formation of PER and CRY and CK1 complexes likewise is well established. The finding that formation of multiprotein complexes stabilize otherwise unstable over-expressed proteins is interesting but not novel.

      We fully agree that the existence of PER–CRY–CK1 complexes is well established. It is also known that PER2 stabilizes CRY1 by occupying the FBXL3 binding site and that CRY1 promotes the nuclear accumulation of PER2. We do not present these established interactions as novel findings.

      Our novel contribution, as outlined above, is the discovery that the shuttling and subcellular localization of PER2 and CRY1 are mutually dependent on their expression ratio. Specifically, we show for the first time that the steady-state shuttling distribution of PER2 alone is cytosolic due to its rapid nuclear export, whereas CRY1 is predominantly nuclear (known). Given that CRY1 facilitates the nuclear import of PER2 (known) and that a PER2 dimer can bind either one or two CRY1 molecules, our data showing that cytoplasmic PER2-CRY1 foci contain less CRY1 than nuclear foci lead us to conclude that cytoplasmic PER2 complexes contain one CRY1 molecule, while nuclear complexes contain two.

      This model provides a mechanistic explanation for the distribution of PER2 between the cytosol and nucleus and for the relatively lower cytosolic CRY1 levels. Most importantly, we further show (for the first time) that CK1-mediated phosphorylation of PER2 displaces CRY1. This phosphorylation event would produce PER2 dimers with one or no CRY1 bound, promoting their export to the cytosol. We believe this represents a novel and potentially important mechanism for regulating circadian clock function.

      The results from many of the imaging assays are not quantitated, and the figures often show single cells. It's hard to draw statistical significance from these.

      The phenotypes we report here are the result of multiple technical and biological replicates (n > 3). Image analysis and statistical analysis were performed when required. We show additional examples in the EVs.

      There are a number of phenomena seen whose physiological relevance is unclear. In figure 1, forced over-expression of CRY1 and PER2 leads to formation of nuclear foci. It is unlikely these foci form at non-overexpressed levels, and so the general interest and relevance is not high nor investigated. This reduces the impact of the finding.

      It has been shown that PERs and CRYs do not form thermodynamically stable, large (detectable) foci under physiological conditions, as we have stated in the manuscript. Whether these proteins have the propensity to form smaller, more dynamic structures of physiological relevance is an interesting question that could be explored elsewhere, but it is not relevant to our study. In our work, these foci are simply convenient markers for analyzing the interaction and subcellular (co)localization of clock proteins under investigation. In the revised version, we have kept the analysis of these foci and the discussion of their potential relevance to a minimum in order to avoid confusion and unnecessary discussions.

      The finding that CK1δ is kept in the dephosphorylated state by binding to PER has been established previously by Johnson and colleagues and should perhaps be mentioned (Qin JBR 2015, doi: 10.1177/0748730415582127).

      There is clearly a misunderstanding here. Qin et al.’s data show that, in a cell-free system, CK1ε phosphorylates PER2 and also autophosphorylates its C-terminal tail (autoradiograph, Fig. 1E).  

      However, because PER2 phosphorylation is carried out by CK1ε that is tightly anchored to PER2, there is competition between PER2 phosphorylation and tail autophosphorylation. As a result, the kinetics of tail phosphorylation are slower (Fig. 3B and quantification in C) than those observed with free CK1ε (as seen in the presence of the p53 substrate, Fig. 3A,C). We believe that this is also happening in the cell.

      Author response image 1.

      Our data, in contrast, address a different point. It has been known from the Virshup lab for decades that CK1δ/ε undergo futile cycles of (auto)phosphorylation and dephosphorylation, resulting in an active, dephosphorylated kinase in cells because cellular phosphatases are more efficient than CK1 autophosphorylation. We now show that CK1δ is also efficiently dephosphorylated when bound to PER2 (Fig. 3). Nevertheless, despite dephosphorylation of PER2-bound CK1δ, PER2 itself becomes hyperphosphorylated, indicating that cellular phosphatases act differently on these two substrates. To clarify this point, we inhibited phosphatases with calyculin A (CalA). Under these conditions, both PER2 and PER2-bound CK1δ became efficiently hyperphosphorylated (new Fig. 3).

      The degradation of kinase-active but not inactive CK1 is only shown here with 50-fold overexpressed protein so it's interesting, but the relevance to circadian biology is not made clear. The fact that over-expressed CK1 is degraded primarily in the nucleus is interesting, but needs further characterization - is this affected by the epitope tag? Is it true of endogenous CK1 or only over-expressed CK1? Is this not seen with e.g. other forms of CK1, e.g. lacking the C-terminus?

      The observation that unassembled kinase is rapidly degraded is most clearly demonstrated by overexpression experiments. However, Fig. 3 shows that overexpression of CRY1 and PER2 leads to the accumulation of elevated levels of endogenous CK1δ (untagged), indicating that endogenous kinase is likewise degraded in the absence of a stabilizing binding partner. In addition, we present data showing that overexpression of tagged CK1δ reduces the levels of endogenous, untagged CK1δ, further supporting the conclusion that unassembled endogenous CK1δ is unstable and subject to degradation.

      Further characterization of the CK1 degradation pathway is of considerable interest and could form the basis of a separate study, particularly to identify the components that mediate activity-dependent nuclear export and activity-dependent nuclear degradation. The Δ-tail kinase is expressed at very low levels, although interpretation is complicated by the possibility that this reflects pleiotropic effects.

      The final figure, showing that nuclear CK1 is the form responsible for shortening rhythms, is interesting. Is this because massive increases in nuclear CK1 alter PER, or BMAL/CLOCK, or proteasome activity?  

      Our data show that cells expressing either nuclear or cytosolic CK1 are viable, proliferate normally, and maintain a functional circadian clock. Therefore, overexpression of the kinase does not produce pleiotropic effects.

      To assume it's due to PER phosphorylation is in disagreement with the studies of Meng et al. Neuron 2008 DOI 10.1016/j.neuron.2008.01.019.

      The data are not in disagreement with Meng et al.; in fact, they align quite well. Meng et al. showed that CK1ε-tau shortens the circadian period, which we had also previously reported for CK1δ-tau-like (Marzoll et al., 2022). We now demonstrate that CK1δ-tau-like is enriched in the nucleus, contributing to its period-shortening phenotype. Furthermore, we show that active CK1δ (but not CK1δ-K38R) promotes cytoplasmic accumulation of PER:CRY complexes, consistent with PER2 degradation in the cytosol as described by Meng et al.

      Taken together, these findings suggest that PER proteins acquire their CK1 in the nucleus, and this interaction determines the circadian period length. Following a time delay—set by the kinetics of PER2 phosphorylation—PER2:CRY complexes are exported to the cytosol along with their bound CK1, where they are subsequently degraded.

      Reviewer #2:

      Interactions between the circadian clock proteins PER1/2 with CK1d/e and CRY1/2 influence each of their stability, subcellular localization, and activity, as countless studies over the last two decades have shown. However, many questions still remain, especially in light of newer models of the transcription-translation feedback loop (TTFL) in which the repression phase relies on two distinct mechanisms, a phosphorylation-dependent displacement of the transcription factor by CK1-PER-CRY complexes from DNA early in repression, and a CRY1-dependent sequestration of the transcription factor activation domain later in repression. In particular, questions remain about mechanisms triggering nuclear entry/export and activity of these proteins in the cytoplasm and nucleus.

      Here, the authors utilize a system of induced and/or transient overexpression of proteins with or without fluorophores to track subcellular localization, stability, and interactions. As the authors point out throughout the manuscript, the overexpression of these clock proteins often causes them to behave differently from the endogenous proteins. It looks as though the authors have done their best to account for these changes, and they have certainly been rigorous in pointing them out, but there is concern that some of the conclusions may be influenced by this overexpression. For example, the relevance of work related to the overexpression-dependent foci is unclear.

      Same answer as to Reviewer 1: It has been shown that PERs and CRYs do not form thermodynamically stable, large (detectable) foci under physiological conditions, as we have stated in the manuscript. Whether these proteins have the propensity to form smaller, more dynamic structures of physiological relevance is an interesting question that could be explored elsewhere, but it is not relevant to our study. In our work, these foci are simply convenient markers for analyzing the interaction and subcellular (co)localization of the clock proteins under investigation. In the revised version, we have kept the analysis of these foci and the discussion of their potential relevance to a minimum in order to avoid confusion.

      The findings that the stability of the kinase depends on localization, its intrinsic activity, and interaction with PER2 are interesting and important. Use of the CKBD deletion to show that CK1 stabilization depends on its anchoring interaction with PER2 is a nice touch. The authors bring up an excellent point that most of the potential phosphorylation sites on PER1 and PER2 have not been functionally characterized aside from the phosphoswitch mechanism. Their observation that CK1 eventually induces cytoplasmic localization of the CK1-PER-CRY1 complex and the release of CRY1 is intriguing. In particular, the finding that pretreatment of PER2 with CK1 in vitro blocked its ability to interact with CRY1 is very interesting. However, the absence of mechanistic data to explore this in more detail limits the impact of this conclusion. Using the system they have established here to identify the site(s) on PER2 and/or CRY1 that lead to this would help to solidify this work and increase the impact of this work. Overall, there are some interesting findings here but the inclusion of some competing viewpoints and mechanistic data would strengthen the impact of the work.

      Major

      (1) The characterization of the tau-like CK1 mutant R178C as less active than the wild type enzyme is not entirely correct: it is less active on the FASP region as described, but it has increased activity on S478 in the phosphodegron that is independent of inhibition from the FASP region (Gallego et al. PNAS, 2007 and Philpott et al. eLife, 2020). It is still possible that some of the period shortening effects of the mutant could arise from enhanced nuclear accumulation, but the oversimplified description of the mutant as less active should be corrected.

      In the revised version, we discuss that the enhanced nuclear localization of the Tau-like kinase may contribute, at least in part, to period shortening, similar to how forced nuclear overexpression of wild-type kinase also shortens the period. We emphasize, however, that CK1 Tau is compromised in its priming-dependent activity, whereas its priming-independent activity is context-specific and enhanced toward the β-TrCP site.

      (2) One of the main conclusions from the paper, that CK1 induces cytoplasmic localization of the CK1-PER2-CRY1 complex and subsequent release of CRY1, would be strengthened significantly by identifying the phosphorylation site(s) responsible for the cytoplasmic localization of the complex and the release of CRY1. The system they have developed here seems ideal to identify these sites.

      We fully agree with the reviewer. We substituted the known phosphorylation sites in PER2 surrounding the CRY-binding domain, but this had no effect on the phosphorylation-dependent release of CRY1. Therefore, a more systematic analysis will be required, including the possibility that phosphorylations in CRY1 itself may contribute. To this end, we are generating PER2 and CRY1 variants in which all Ser/Thr residues are replaced by Ala. Using these constructs alongside the wild-type versions, we will systematically create hybrids by PCR in which specific regions containing phosphorylation sites are exchanged.

      Nevertheless, this will require considerable time and effort. We believe this investigation exceeds the scope of the present manuscript, and we will address it in future work.

      (3) The concept of delayed release of CRY1 presented here is an interesting one. It's unclear why the authors have also not incorporated prior findings (Ukai-Tadenuma et al. Cell, 2012, Koike et al. Science, 2012) that peak levels of CRY1 are expressed in a later phase than CRY2, PER1, and PER2. It seems like figure EV6 should reflect the observation that CRY2 is the predominant cryptochrome present during early repression (Koike et al. Science, 2012).

      The reviewer is absolutely right: the expression phases of CRY1, CRY2, PER1, and PER2 are important. I have recently discussed these issues in detail in a News & Views article in The EMBO Journal, commenting on a paper by Smyllie et al. In this News & Views article, I discuss that the presently available data suggest that CRY1 is always present throughout the circadian cycle and keeps circadian transcription partially repressed even at peak phases of expression. In the revised version, I refer to these publications, including those mentioned by the reviewer. However, I would like to keep the model presented in the supplementary figure as simple as possible and specifically focused on the work presented in this manuscript, rather than presenting a comprehensive conceptual model of the circadian clock.

      (4) The model presented in figure EV6 and described throughout the text shows that PER-CRY complexes interact with CK1 in the nucleus, and not in the cytoplasm prior to nuclear entry. Prior work on endogenous protein complexes has shown that CK1-PER-CRY complexes exist in the cytoplasm very early on in the repression phase (Aryal et al. Mol Cell, 2017; ref. 14 in the manuscript). Work by Sancar and colleagues (Cao et al. PNAS, 2020) also shows with endogenous proteins that CK1d has a circadian pattern of nuclear entry (or possibly retention) concomitant with PER2 that is dependent on the presence of PERs and CRYs. Together, these data seem to be inconsistent with your model.

      We think the data are not inconsistent. The recent Smyllie et al. paper in EMBO Journal shows that PER2 is present in both the cytosol and the nucleus at all times when it is expressed, but cytosolic PER2 is not saturated with CRY, which is more nuclear. Our data demonstrate that PER2 shuttles between the cytosol and the nucleus depending on its occupancy with CRYs (see schematic Fig. 1). Occupancy, in turn, depends on expression levels and binding affinities, including those of CRY2 and PER1. Consequently, PER2 complexes could shuttle continuously throughout the circadian cycle—either because they are not saturated with CRYs due to the balance between expression levels, freely available CRY, and binding affinity, or later in the cycle because CRYs are displaced by phosphorylation. If PER2 acquires casein kinase in the nucleus early in the cycle, it will shuttle out to the cytosol together with the bound CK1. We believe this does occur, but early in the circadian cycle the saturation of PER2 with casein kinase is likely to be very low due to the limited availability of CK1 in the nucleus. I am aware that not everyone will share this interpretation point by point, but discussing it in greater length and detail exceeds the scope of the present manuscript.

      Reviewer #3:

      This manuscript by Serrano and co-workers is a tight body of work that provides much needed insights into the regulation of clock proteins by CK1D, and into the regulation of CK1D itself. While the whole paper relies on artificial overexpression of chimeric/tagged proteins that may have significant differences in the function, the stability and subcellular distribution of the endogenous proteins they are supposed to model, this limitation has been clearly stated by the authors, and nevertheless their study still provides important insights.

      While the authors have specified which Ck1d isoform (Ck1d1) they are overexpressing in their model cell lines, they may have thought to consider that the overexpression of one Ck1 homologue may affect the endogenous expression of the other homologues and their isoforms, e.g. Ck1d1 overexpression may cause an increase in Ck1d2 or Ck1e, which would in turn affect the conclusions.

      We show in revised Fig. 3 that overexpression of CK1δ1 reduces the expression of endogenous CK1δ1/2. This is consistent with our prediction that overexpressed and endogenous CK1 (including CK1ε) compete for the same stabilizing binding partners, leading to rapid degradation of unassembled kinases.

      Moreover, the antibody they used for endogenous Ck1d (which is ab85320, also mentioned as AF12G4 but that is the clone number, not the catalogue number) is discontinued and its specificity against Ck1d1, Ck1d2 or even the highly identical Ck1e, has not been clearly demonstrated. We know from Fig 3 that it can detect Ck1d1 but it would be great if the authors would provide additional evidence for the specificity of this antibody, for example by overexpressing Ck1d1/Ck1d2/Ck1e to see really which "endogenous" Ck1 we are seeing.

      Are the three bands for example seen in Fig 4A corresponding to the different isoforms? This simple experiment would reinforce the conclusions. 

      We show in the revised figure that the antibody recognizes CK1δ1 and CK1δ2, but not CK1ε. In U2OS cells, the antibody detects a single band (Figure); we do not know whether this represents predominantly one splice isoform or both, which are not resolved. However, this distinction is not relevant for our interpretation, because overexpression of tagged CK1δ1 reduces the expression of whichever endogenous kinase is present.

      There are no minor comments, as the figures, the figure legends and main text are all of good quality and ready for publication.

      Reviewers’ Responses to Point-by-Point Response to Peer Review 

      Referee #1:

      I appreciated the additional efforts by the authors to improve the manuscript. Unfortunately, the underlying approach of forced over-expression remains artifact-prone, and has been largely supplanted by readily available knockin and targeted mutagenesis methods. Over-expression may give clues, but I think more rigorous mechanistic validation is needed to make this compelling. I cannot support publication of this manuscript.

      Referee #2:

      In their response to reviewers, the authors make the valid point that the steady state of a system is usually perturbed to study it. In this study, they have used overexpression of the clock proteins PER2, CRY1 and CK1 to study their effects on subcellular dynamics and stability. In justifying this choice, they refer to several papers that similarly overexpressed at least one of these components, stating that their time-resolved approach brings novel insights. However, there is a missed opportunity here to translate any lessons learned from overexpression studies to a system where the proteins are expressed at physiological levels and stoichiometry.

      The authors reply to reviewer 1 stating that they conclude PER proteins acquire CK1 in the nucleus, but this does not account for other studies showing an apparent PER-CK1 complex in the cytoplasm during the early phases of repression and/or a pattern of PER-dependent nuclear entry of CK1 (Lee et al. 2001, Cell; Aryal et al. 2017 Mol Cell; Cao et al. 2021 PNAS). Given that all 3 of these studies were done with native expression levels, it seems incumbent upon the authors to demonstrate that their conclusions from the overexpression study are physiologically relevant by translating them in some way to a more native system. This also addresses a point made by reviewer 2, major concern 4 that was not satisfactorily addressed by the authors. Perhaps they could validate their hypothesis of PER shuttling and interactions with CK1 or CRY1 that alter this in a native system similar to Aryal or Cao et al. with the use of nuclear export inhibitors?

      The response to reviewer 2, major concern 1 is thoughtful and much appreciated. However, simplifying the effects of the tau mutation on CK1 as having a decreased rate on priming-dependent phosphorylation but not priming-independent is not quite true: the tau mutation also decreases the rate of priming-independent phosphorylation of S662 (in humans) (Philpott et al. 2020, eLife).

      Other papers appearing in this journal seem to all include at least one major new mechanistic insight. Although the authors do a diligent job in characterizing the overexpressed proteins in this system, some of their conclusions are at odds with prior studies of the system in more native conditions, so the potential impact of this work is unclear. To verify these conclusions or test new ones (i.e., that CK1 disrupts PER-CRY1 interactions), they should use their insights to generate mutations or make perturbations in a native system and demonstrate that they still hold.

      Referee #3:

      The authors have adequately addressed the reviewers' comments, and it is my opinion that the manuscript is ready for publication. It is true, as previously mentioned by other reviewers, that the evidence presented relies on overexpression, which for the other reviewers seems to preclude publication. However, I find this to be too strict an opinion.

      If the authors had indeed provided evidence using CRISPR-Cas9-mediated genetic manipulation and tagging/mutating endogenous genes for all their experiments, thereby providing more physiological evidence of how clock proteins interact, they would probably have submitted their manuscript to an alternative journal with a higher impact.

      As it stands, it is my opinion that, considering the evidence and limitations of the study, this manuscript is a good match for the journal.

      Author Rebuttal:

      Apologies for the delayed reply regarding our manuscript. In the meantime, we have added several new experiments which address the comments of the reviewers and more. These are now included as Figures 1C, EV3, 4D, 6E, 6F, EV6D, and EV7.

      Figure 1C reinforces our observations from Figure 1B showing that induction of stably-integrated PER2 also results in accumulation of endogenous CRY1 at a timescale that is compatible with the gradual localization of overexpressed PER2 into the nucleus.

      Figure EV3 addresses several technical comments from Reviewers #3 and #1, respectively: Figure EV3A shows that our CK1δ antibody recognizes CK1δ1 and CK1δ2, but not CK1ε. Figures EV3B and C clearly show how overexpression of our transgenic CK1δ results in decreased endogenous CK1δ, which further demonstrates the rapid turnover of active kinase.

      Figure 4D addresses the comment from Reviewer #2. We clearly show that CK1δ is not kept in a dephosphorylated state by binding to PER. In addition to our direct comment to this point, Figure 4D shows that CK1δ, regardless of whether it is expressed alone or in complex with PER2, is phosphorylated to a similar extent when the cells are treated with the phosphatase inhibitor CalA. As indicated in our direct response, we are more interested in the observation that cellular phosphatases act differently on PER2 compared to CK1δ despite being in the same PER:CK1δ complex (as shown by the clear stabilization of overexpressed CK1δ by co-expression of PER2).

      Figures 6E, 6F, and EV6D demonstrate that our observations from overexpression systems are also observed in a more physiological context, addressing comments from Reviewers #1 and #2. Figure 6E shows that dephosphorylation of PER2 leads to its relocalization from the cytosol to the nucleus, while Figure 6F analyzes the subcellular localization of PER2 in the context of a functional circadian clock in U2OS cells. The latter demonstrates that PER2 is predominantly nuclear early in the circadian cycle, but redistributes to the cytosol at later time points. We included these experiments in response to the reviewer’s request for a more physiological context. Since we are not a mouse lab, this cell-based system represents the most physiological model we can provide. Figure 6F shows the dynamics of endogenous PER2 from DEX-synchronized cells. At early timepoints, PER2 is predominantly nuclear, likely due to the incorporation of CRY1 forming the PER:CRY complex. At later timepoints, PER2 is redistributed between the cytoplasm and nucleus due to PER2 phosphorylation. Importantly, these results are consistent with and recontextualize the results from Liu et al. (Xie et al., PNAS, 2023) showing that hypophosphorylated PER2 at early timepoints post-DEX is predominantly nuclear and that hyperphosphorylated PER2, which appears later post-DEX, is predominantly cytoplasmic.

      Finally, Figure EV7 provides a model of how the subcellular distribution of CK1δ affects its assembly into the PER:CRY complex, emphasizing how nuclear kinase enacts its role in the circadian clock.

      Response to Reviewers:

      We were disappointed by the categorical rejection of overexpression experiments. Without a specific discussion of why they would be inappropriate or not sufficient in the context of the work presented here, the blanket assertion that overexpression inevitably produces artifacts functions more as a rhetorical device than as a substantiated scientific argument. The fact that the term ‘physiological’ generally carries a positive connotation, whereas ‘overexpression’ is often perceived negatively, does not in itself justify the categorical rejection of experiments.

      While we appreciate that some reviewers may personally prefer alternative strategies, we believe that the suitability of any approach must be evaluated in light of the specific biological questions being addressed. I cannot see a single specific point in the reviewers’ responses indicating that any of our experiments yielded artificial results. It is true that targeted knock-in and mutagenesis methods are available; however, these approaches are simply not suited to the questions raised in this manuscript. We also fully agree that, whenever possible, insights from overexpression studies should be validated in systems with a functional clock where proteins are expressed at physiological levels. We did this using U2OS cells, and we note the compatibility of our results with those in the literature using endogenously-tagged constructs. We have cited several recent studies that have investigated the subcellular distribution and circadian dynamics of endogenous or endogenously-tagged clock proteins in mice (Cao et al, 2021; Smyllie et al, 2016, 2022, 2025) and U2OS cells (Öllinger et al, 2014; Gabriel et al, 2021; Xie et al, 2023). While we cannot substantially expand on these previous observations, we confirm them in the revised version by demonstrating the nuclear-to-cytoplasmic relocalization of PER2 in U2OS cells over the course of a circadian cycle. In addition, we show that this process is, in principle, reversible: when CK1 is inhibited with PF670, overexpressed hyperphosphorylated cytosolic PER2 becomes dephosphorylated and accumulates in the nucleus.

      Overall, we consider our approach not only complementary but also essential, as it enables us to address two key questions that would otherwise be difficult or even impossible to resolve:

      (1) Mutual impact of PER2 and CRY1 on subcellular dynamics and the role of PER2 phosphorylation

      Evidence from mouse liver (Cao et al, 2021), mouse SCN (Smyllie et al, 2022, 2025), and U2OS cells (Xie et al, 2023) indicates that a substantial fraction of PER2 remains cytoplasmic throughout its expression cycle, even in the presence of CRY1, which promotes PER’s nuclear import. The mechanisms underlying this cytoplasmic retention remain unclear, and no circadian function has yet been attributed to the cytosolic PER2 pool. Our study addresses how PER2 abundance, phosphorylation state, and stoichiometry relative to CRY1 govern their interaction and subcellular dynamics. This is physiologically relevant because PER1/2 and CRY1/2 proteins oscillate in expression and degradation out of phase, such that their concentrations, stoichiometry, and phosphorylation state vary systematically over the circadian cycle. Transient transfection and inducible overexpression combined with time-lapse microscopy are essential here, as they uniquely allow us to modulate protein ratios and CK1δ levels and to resolve their dynamics.

      Previous work established that CRY1 is nuclear and promotes PER2 nuclear accumulation (Smyllie et al, 2022). Our data extend this by showing that subcellular distribution is determined by the CRY1:PER2 ratio. While CRY1 alone is nuclear we show that PER2 alone is cytoplasmic due to rapid nuclear export. Mixed conditions reveal ratio-dependent shifts: at low CRY1-to-PER2 ratios, CRY1 relocalizes to the cytoplasm, whereas at high ratios, PER2 is retained in the nucleus. We explain this behavior by PER2 dimerization: dimers bound to two CRY1 molecules remain nuclear, while dimers bound to a single CRY1 localize to the cytosol. Such species can be expected to form in a physiological context depending on binding affinities and rhythmic expression levels and ratios across circadian time. Importantly, we show that CK1δ-mediated phosphorylation destabilizes PER2 and CRY1 interactions. From this, we infer that PER2 dimers with only a single bound CRY1 transiently form and accumulate in the cytosol, consistent with the lower CRY1-to-PER2 ratio we observe in the cytosol and that has also been reported in the SCN (Smyllie et al, 2025). With continued phosphorylation, PER2 dimers lose CRY1 altogether, while the released CRY1 accumulates in the nucleus. We suggest that this mechanism supports and extends the late repressive phase of the circadian cycle. Recent data show that hypophosphorylated PER2 is predominantly nuclear, whereas hyperphosphorylated PER2 is largely cytoplasmic in mouse liver (Cao et al, 2021; Xie et al, 2023), linking our data to a physiological context.

      Taken together, these findings suggest a mechanism whereby stoichiometry, subunit composition, and CK1δ phosphorylation determine PER:CRY complex composition and localization. Crucially, these complexes and their dynamic relocalization could only be observed using inducible overexpression; knock-in strategies at endogenous levels would not be able to capture such states.

      (2) Posttranslational regulation and subcellular homeostasis of CK1δ and impact on the clock

      Previous work has shown that nuclear export of CK1δ depends on its kinase activity (Milne et al, 2001). Here, we further demonstrate that unassembled CK1δ is subject to degradation, with nuclear turnover accelerated by its catalytic activity. Thus, when evaluating the impact of CK1δ mutants on the circadian clock, one must consider not only kinase activity but also protein stability and subcellular distribution. We find that CK1δ availability for PER2 differs between cytosol and nucleus. In particular, nuclear CK1δ is limiting, and its abundance directly determines circadian period length. This is significant because subcellular CK1δ availability and posttranslational regulation have not previously been examined or incorporated into circadian clock models, as the kinase has been assumed to be non-limiting given its constant expression throughout the circadian cycle. Complex formation between CK1δ and PER is a well-established determinant of circadian timing, with CK1δ overexpression known to shorten period length. Our data explain why: the binding equilibrium between CK1δ and PER must be finely tuned. Previous studies suggested that PER associates with CK1δ in the cytosol and enters the nucleus as a PER:CRY:CK1δ complex (Lee et al, 2001; Aryal et al, 2017). Our data suggest that nuclear PER is not saturated with CK1δ. This is because levels of free, active CK1δ in the nucleus are low, owing to its rapid export or degradation by the nuclear proteasome, which limits its availability for PER binding.

      Our overexpression studies support this mechanism. NES-tagged CK1δ overexpression does not alter circadian period length, because it fails to increase nuclear CK1δ levels: Each PER molecule can coimport only one kinase, a process already occurring in wild-type cells, and the few co-imported molecules rapidly equilibrate with the nuclear pool, where they are subject to export or degradation. In contrast, NLS-tagged CK1δ overexpression directly increases nuclear kinase abundance by antagonizing export, thereby enhancing PER binding and shortening circadian period. This multilayered regulation of CK1δ stability and localization and its consequences for PER2 availability would not have been revealed without targeted overexpression. Our findings therefore fill a key knowledge gap and remain fully consistent with previous studies (Lee et al, 2001; Aryal et al, 2017; Cao et al, 2021).

      Conclusion: In sum, our findings are novel and physiologically relevant, aligning with data from mouse liver and SCN. While studies at strictly endogenous protein levels are important and necessary, perturbation of steady state is a standard strategy to uncover and observe novel mechanisms. Endogenous-level experiments would demand technically unrealistic systems (for example, even the simplest case, analyzing the subcellular dynamics of PER2 alone, would require cells lacking PER1, CRY1/2, and CK1δ/ε). Moreover, adjustment of PER2-to-CRY1 ratios cannot be achieved with stably integrated genes and of course not at physiological expression levels. Thus, inducible overexpression is not merely practical but currently the most feasible approach to dissect these dynamics. We complement our findings with data from U2OS cells with a functional clock, showing that the availability of nuclear CK1δ directly determines circadian period length. Although specific aspects of our extended model require further experimental validation, no published evidence contradicts it to date. Mechanistic discussions of the circadian clock have so far focused primarily on PER protein degradation. Our model broadens this perspective by incorporating CK1δ homeostasis, PER:CRY complex composition, subcellular localization, and their regulation by phosphorylation. In doing so, it provides a detailed framework to be critically tested and refined in future studies.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      This manuscript investigates how dentate gyrus (DG) granule cell subregions, specifically suprapyramidal (SB) and infrapyramidal (IB) blades, are differentially recruited during a high cognitive demand pattern separation task. The authors combine TRAP2 activity labeling, touchscreen-based TUNL behavior, and chemogenetic inhibition of adult-born dentate granule cells (abDGCs) or mature granule cells (mGCs) to dissect circuit contributions.

      This manuscript presents an interesting and well-designed investigation into DG activity patterns under varying cognitive demands and the role of abDGCs in shaping mGC activity. The integration of TRAP2-based activity labeling, chemogenetic manipulation, and behavioral assays provides valuable insight into DG subregional organization and functional recruitment. However, several methodological and quantitative issues limit the interpretability of the findings. Addressing the concerns below will greatly strengthen the rigor and clarity of the study.

      Major points:

      (1) Quantification methods for TRAP+ cells are not applied consistently across panels in Figure 1, making interpretation difficult. Specifically, Figure 1F reports TRAP+ mGCs as density, whereas Figure 1G reports TRAP+ abDGCs as a percentage, hindering direct comparison. Additionally, Figure 1H presents reactivation analysis only for mGCs; a parallel analysis for abDGCs is needed for comparison across cell types.

      In Figure 1G and 1H we report TRAP+ abDGCs as a percentage rather than density because we are analyzing colocalization of the two markers, which are very sparse in this population. Given the very low number of double-labeled abDGCs, calculating density would not be practical. In the revised manuscript we have clarified the rationale for using these measures. As noted in the current text, we did not observe abDGCs co-expressing TRAP and c-Fos; we have made this point more explicit to guide interpretation of these data.

      (2) The anatomical distribution of TRAP+ cells is different between low- and high-cognitive demand conditions (Figure 2). Are these sections from dorsal or ventral DG? Is this specific to dorsal DG, as it is preferentially involved in cognitive function? What happens in ventral DG?

      The sections shown in Figure 2 were obtained from the dorsal dentate gyrus (see Methods, “Histology and imaging”: stereotaxic coordinates −1.20 to −2.30 mm relative to bregma, Paxinos atlas). From a feasibility standpoint, it is not possible to analyze the entire longitudinal extent of the hippocampus with these low-throughput histological approaches. We therefore focused on the dorsal DG, for which there is a strong functional rationale. A large body of work indicates that the dorsal hippocampus, and specifically the dorsal DG, is preferentially involved in spatial memory and in the fine contextual discrimination that underlies pattern separation. The dorsal hippocampus is critical for encoding and distinguishing similar spatial representations, a core component of the high-cognitive demand task used here. In contrast, the ventral DG is more strongly associated with emotional regulation and affective memory processing and is less implicated in high-resolution spatial encoding. For these reasons, the present study was designed to assess TRAP+ cell distributions specifically in the dorsal DG.

      (3) The activity manipulation using chemogenetic inhibition of abDGCs in AsclCreER; hM4 mice was performed; however, because tamoxifen chow was administered for 4 or 7 weeks, the labeled abDGC population was not properly birth-dated. Instead, it consisted of a heterogeneous cohort of cells ranging from 0 to 5-7 weeks old. Thus, caution should be taken when interpreting these results, and the limitations of this approach should be acknowledged.

      We agree that prolonged tamoxifen administration results in labeling a heterogeneous population of abDGCs spanning approximately 0 to 5–7 weeks of age, rather than a precisely birth-dated cohort. This is a limitation of this approach and we have included discussion of this in more detail in the revised manuscript.

      (4) There is a major issue related to the quantification of the DREADD experiments in Figure 4, Figure 5, Figure 6, and Figure 7. The hM4 mouse line used in this study should be quantified using HA, rather than mCitrine, to reliably identify cells derived from the Ascl lineage. mCitrine expression in this mouse line is not specific to adult-born neurons (off-targets), and its expression does not accurately reflect hM4 expression.

      We agree that mCitrine is not a reliable marker for localizing hM4Di, as it is well known that mCitrine can be expressed in a Cre-independent manner in this mouse line. As suggested, we have removed the figure that showed mCitrine and have performed immunohistochemical localization of the DREADD with an antibody against the HA tag. This is now shown in Figure 5.

      (5) Key markers needed to assess the maturation state of abDGCs are missing from the quantification. Incorporating DCX and NeuN into the analysis would provide essential information about the developmental stage of these cells.

      The goal of this study was to examine activity patterns of adult-born versus mature granule cells, rather than to assess maturation state. The adult-born neurons analyzed were 25–39 days old, an age at which most cells have progressed beyond the DCX<sup>+</sup> stage and are expected to express NeuN based on prior work. We therefore do not think that including DCX or NeuN quantification would provide additional information relevant to the aims or interpretation of this study.

      Minor points:

      (1) The labeling (Distance from the hilus) in Figure 2B is misleading. Is that the same location as the subgranular zone (SGZ)? If so, it's better to use the term SGZ to avoid confusion.

      We have updated Figure 2B, the Methods, and the main text to state more explicitly that this location is the boundary between the subgranular zone (SGZ) and the hilus.

      (2) Cell number information is missing from Figures 2B and 2C; please include this data.

      We have now added the cell number information to the figure legends. In Figures 2B and 2C, each point corresponds to a single cell, with an equal number of mice per group. The total number of TRAP<sup>+</sup> cells per mouse is shown in Figure 1F, which reports TRAP<sup>+</sup> cell densities by group.

      (3) Sample DG images should clearly delineate the borders between the dentate gyrus and the hilus. In several images, this boundary is difficult to discern.

      We made the DG-hilus boundaries clearer in the sample images to improve visualization and interpretation.

      (4) In Figure 6, it is not clear how tamoxifen was administered to selectively inhibit the more mature 6-7-week-old abDGC population, nor how this paradigm differs from the chow-based approach. Please clarify the tamoxifen administration protocol and the rationale for its specificity.

      We apologize for the confusion here. The protocol used in Figure 6 is the same tamoxifen chow–based approach as in Figure 5, differing only in the duration of tamoxifen exposure. Mice in Figure 5 received tamoxifen chow for 7 weeks, whereas mice in Figure 6 received it for 4 weeks, restricting labeling to a younger and narrower cohort of adult-born DGCs. Thus, the population targeted in Figure 6 is younger than that in Figure 5 and does not correspond to mature 6–7-week-old neurons. By contrast, the experiment in Figure 4 targets a more mature population, consisting predominantly of ~5-week-old adult-born neurons as well as mature granule cells, which are Dock10-positive and express Cre endogenously, allowing selective manipulation of this later-stage population.

      We have corrected the paragraph accordingly and clarified the age range of the labeled populations in the revised manuscript.

      Comments on revisions:

      I appreciate the authors' careful and thorough revisions. They have addressed all of my previous concerns satisfactorily, and the manuscript is now significantly strengthened. I have no further concerns.

      Reviewer #2 (Public review):

      In this study, the authors investigate how increasing cognitive demand shapes activity patterns in the dorsal dentate gyrus (DG). Using a touchscreen-based TUNL task combined with TRAP/c-Fos tagging, birth-dating of adult-born granule cells (abDGCs), and chemogenetic inhibition, they show that higher task demand increases mature granule cell (mGC) recruitment and enhances suprapyramidal (SB) versus infrapyramidal (IB) blade bias. Functionally, mGC inhibition reduces overall activity and impairs performance without disrupting blade bias, whereas inhibition of ≤7-week-old abDGCs increases mGC activity, abolishes blade bias, and impairs discrimination under high-demand conditions. These findings suggest that effective pattern separation depends not only on overall DG activity levels but also on the spatial organization of recruited ensembles.

      The integration of touchscreen TUNL with temporally controlled activity tagging and birth-dated cohorts is technically strong. Quantification of SB-IB bias and radial/apical distributions adds anatomical precision beyond bulk activity measures. The comparison between mGC and abDGC inhibition is conceptually compelling and supports dissociable functional roles. Overall, the data convincingly demonstrate that increasing cognitive demand amplifies blade-biased DG recruitment and that mGCs and abDGCs differentially contribute to both behavioral performance and network organization.

      However, how abDGCs are integrated into the mGC network under high cognitive demand remains unresolved. Additional experiments are needed to clarify how abDGCs shape spatial recruitment patterns and whether they directly inhibit or indirectly regulate mGC activity to maintain high performance.

      Furthermore, the authors frame "high cognitive demand" as a multidimensional construct encompassing broad behavioral challenge. It would strengthen the work to delineate how local abDGC-mGC circuit interactions regulate specific task components in real time. This will require higher temporal resolution approaches, as TRAP and c-Fos labeling integrate activity over prolonged windows and primarily reflect sustained engagement rather than moment-to-moment computations.

      The central conclusion that dentate function depends on coordinated spatial recruitment rather than total activity magnitude is supported by the data, although mechanistic interpretations should be tempered given methodological limitations.

      Overall, this work advances models of adult neurogenesis by emphasizing a critical-period modulatory role of abDGCs in organizing DG network activity during high-demand discrimination. The combined behavioral and circuit-level framework is likely to be influential in the field.

      Reviewer #3 (Public review):

      This study examines the role of dentate gyrus neuronal populations, reflecting neurogenesis and anatomical location (suprapyramidal vs infrapyramidal blade), in a mnemonic discrimination task that taxes the pattern separation functions of the dentate. The authors measure dentate gyrus activity resulting from cognitive training and test whether adult neurogenesis is required for both the anatomical patterns of activity and performance in the cognitive task. The authors find that more cognitively challenging variants of the task evoked more dentate activity, but also distinct patterns of activity (more activity in the suprapyramidal blade, less in the infrapyramidal blade). Using chemogenetic approaches they silence mature vs immature dentate gyrus neurons and find that only mature neurons (either the general population or specifically mature adult-born neurons), and not immature adult-born neurons, are required for the difficult version of the task. Inhibition of mature adult-born neurons furthermore increased overall activity in the dentate and reduced the biased pattern of activity across the blades, consistent with evidence that adult-born neurons broadly regulate dentate gyrus activity.

      Comments on revisions:

      I appreciate the efforts the authors have taken to revise this manuscript. I have only minor concerns with this revised version of the manuscript:

      Methods state that significance is defined as P<0.05 but some results are interpreted as significant when P=0.05. Either the alpha value needs to change or the interpretation needs to change.

      We have corrected the statement in the Methods section to define statistical significance as P ≤ 0.05, which aligns with how significance was interpreted throughout the manuscript.

      I believe the statistical results for group and blade effects for the ANOVAs, in Figs 2,3 & 4, appear to be switched (blade should be significant, not group).

      We thank the reviewer for pointing out this mistake. We have corrected the reported statistical results for the group and blade effects in the manuscript accordingly.

      I appreciate that sometimes there is not a perfect overlap between immunohistochemical signals, but I continue to believe that the spatially non-overlapping TRAP and EdU signals in Fig 3 are caused by these 2 markers being in different cells. A Z-stack or orthogonal projection could verify/disprove this concern.

      We agree that limited overlap in single optical sections can raise the possibility that TRAP and EdU signals originate from different cells. However, based on our imaging conditions and inspection across focal planes, the signals are consistent with being present within the same cells, with partial spatial separation likely reflecting subcellular localization and/or sectioning effects.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      The manuscript presents a compelling new in vitro system based on isogenic co-cultures of human iPSC-derived hepatocytes and macrophages, enabling the modelling of hepatic immune responses with unprecedented physiological relevance. The authors show that co-culture leads to enhanced maturation of hepatocytes and tissue-resident macrophage identity, which cannot be achieved through conditioned media alone. Using this system, they functionally validate immune-driven hepatotoxic responses to a panel of drugs and compare the system's predictive power to that of monocyte-derived macrophages. The results underscore the necessity of macrophage-hepatocyte crosstalk for accurate modelling of liver inflammation and drug toxicity in vitro.

      The manuscript is clearly written and addresses a key limitation in liver organoid systems: the lack of immune complexity and tissue-specific macrophage imprinting. Nevertheless, several conclusions would benefit from a more careful interpretation of the data, and some important controls or explanations are missing, particularly in the flow cytometry gating strategies, stress marker validation, and cluster interpretations.

      Strengths:

      (1) Novelty and Relevance: The study presents a highly innovative co-culture system based on isogenic human iPSCs, addressing an unmet need in modelling immune-mediated hepatotoxicity.

      (2) Mechanistic Insight: The reciprocal reprogramming between iHeps and iMacs, including induction of KC-specific pathways and hepatocyte maturation markers, is convincingly demonstrated.

      (3) Functional Readouts: The application of the model to detect IL-6 responses to hepatotoxic compounds enhances its translational relevance.

      Weaknesses:

      (1) Several key claims, particularly those derived from PCA plots and DEG analyses, are overinterpreted and require more conservative language or further validation.

      We agree that PCA does not allow maturation trajectories to be followed. We had presented the idea that co-culture promotes maturation as a hypothesis, which we later validated by examining the expression of key hepatocyte markers and by Pearson correlation with fetal hepatocytes.

      (2) The purity of sorted hepatocytes and macrophages is not convincingly demonstrated; contamination across gates may confound transcriptomic readouts.

      We agree and have highlighted and addressed this limitation in our discussion. A small amount of contamination is an unavoidable limitation of bulk sequencing of sorted populations. However, the TPM values of ALB in the iMacs, for example, are extremely low, especially when compared with the hepatocytes, indicating that the level of contamination is likely to be very low. Likewise, the expression of CSF1R in the co-cultured iHeps is also extremely low. This has been included in Supp Fig 1F and G.

      (3) Stress response genes and ER stress/apoptosis signatures are not properly assessed, despite being potentially activated in the system.

      This has been included in Supp Fig 2C, where we’ve included the expression of ATF4, CASP3 and CASP9. Although there’s a significant difference in ATF4 expression between Day 0 and Day 7 iHep only/Co-culture, there is no significant difference between the Day 7 iHep only and Day 7 iHep Co-culture. There are no significant differences in CASP3 and CASP9 expression across all the samples.

      (4) Some figure panels and legends lack statistical annotations, and microscopy validation of morphological changes is missing.

      Although we agree that the morphology changes would be interesting, we think that this question is unfortunately outside the scope of our study. Although Kupffer cells are in direct contact with hepatocytes, they migrate from the liver parenchyma into the sinusoidal spaces where they primarily reside. We do not think that the morphology would add much to the paper, especially given that this is a 2D model.

      (5) The co-culture model with monocyte-derived macrophages is not fully characterised, making comparisons less informative.

      Although we agree that it would be interesting to look more closely at the monocyte-derived macrophage co-cultures as well, we think that this would be more suited to a future study as the transcriptomic analysis would likely include confounding effects of patient specific transcriptomic changes, and our primary focus was on developing an isogenic co-culture system.

      Reviewer #2 (Public review):

      Summary:

      This study builds on work by Glass and Guilliams showing that mouse Kupffer cells depend on the surrounding cells, including endothelium, hepatocytes, and stellate cells, for their identity. Herein, the authors extend the work to human systems. It nicely highlights why taking monocyte-derived macrophages and pretending they are Kupffer cells is simply misleading.

      Strengths:

      Many, including human cells, difficult culture assays, and important new data.

      Weaknesses:

      This reviewer identified minor queries only, rather than 'weaknesses' as such.

      Reviewer #3 (Public review):

      Summary:

      In this study, the authors establish a human in vitro liver model by co-culturing induced hepatocyte-like cells (iHEPs) with induced macrophages (iMACs). Through flow cytometry-based sorting of cell populations at days 3 and 7 of co-culture, followed by bulk RNA sequencing, they demonstrate that bidirectional interactions between these two cell types drive functional maturation. Specifically, the presence of iMACs accelerates the hepatic maturation program of iHEPs, while contact-dependent cues from iHEPs enhance the acquisition of Kupffer cell identity in iMACs, indicating that direct cell-cell interactions are critical for establishing tissue-resident macrophage characteristics.

      Functionally, the authors show that iMAC-derived Kupffer-like cells respond to pathological stimuli by producing interleukin-6 (IL-6), a hallmark cytokine of hepatic immune activation. When exposed to a panel of clinically relevant hepatotoxic drugs, the co-culture system exhibited concentration-dependent modulation of IL-6 secretion consistent with reported drug-induced liver injury (DILI) phenotypes. Notably, this response was absent when hepatocytes were co-cultured with monocyte-derived macrophages from peripheral blood, underscoring the liver-specific phenotype and functional relevance of the iMAC-derived Kupffer-like cells. Collectively, the study proposes this co-culture platform as a more physiologically relevant model for interrogating macrophage-hepatocyte crosstalk and assessing immune-mediated hepatotoxicity in vitro.

      Strengths:

      A major strength of this study lies in its systematic dissection of cell-cell interactions within the co-culture system. By isolating each cell type following co-culture and performing comprehensive transcriptomic analyses, the authors provide direct evidence of bidirectional crosstalk between iMACs and iHEPs. The comparison with single-culture controls is particularly valuable, as it clearly demonstrates how co-culture enhances functional maturation and lineage-specific gene expression in both cell types. This approach allows for a more mechanistic understanding of how hepatocyte-macrophage interactions contribute to the acquisition of tissue-specific phenotypes.

      Weaknesses:

      (1) Overreliance on bulk RNA-seq data:

      The primary evidence supporting cell maturation is derived from bulk RNA sequencing, which has inherent limitations in resolving heterogeneous cellular states and functional maturation. The conclusions regarding hepatocyte maturation are based largely on increased expression of a subset of CYP genes and decreased AFP levels - markers that, while suggestive, are insufficient on their own to substantiate functional maturation. Additional phenotypic or functional assays (e.g., metabolic activity, protein-level validation) would significantly strengthen these claims.

      We have added a discussion on the limitations of our study.

      (2) Insufficient characterization of input cell populations:

      The manuscript lacks adequate validation of the cellular identities prior to co-culture. Although the authors reference previously published protocols for generating iHEPs and iMACs, it remains unclear whether the cells used in this study faithfully retain expected lineage characteristics. For example, hepatocyte preparations should be characterized by flow cytometry for ALB and AFP expression, while iMACs should be assessed for canonical macrophage markers such as CD45, CD11b, and CD14 before co-culture. Without these baseline data, it is difficult to interpret the magnitude or significance of any co-culture-induced changes.

      We apologise for this oversight. Some of these markers were used to determine the purity of the iMacs before co-culture, but we had omitted the plots for brevity. We have now added the purity plots in Supp Fig 2E, showing that the iMacs were more than 90% pure before co-culture. We acknowledge the concern about cross-contamination in bulk sequencing and have added, in Supp Fig 2G and H, the expression of ALB in the iMac fraction and of CSF1R in the iHep fraction, showing minimal contamination with our gating strategy.
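
      As a rough illustration of the marker-based contamination argument, the sketch below compares the TPM of a lineage marker in the "wrong" sorted fraction with its TPM in the fraction where it is expected. It is not the authors' analysis: the function name and the TPM values are hypothetical, and it assumes that the marker is expressed only by the contaminating cell type and that both cell types contribute similar amounts of RNA per cell.

```python
def contamination_upper_bound(marker_tpm_in_sorted, marker_tpm_in_source):
    """Rough upper bound on the RNA fraction contributed by the other cell type.

    Assumption (not from the study): the marker is expressed only by the
    contaminating cell type, and RNA content per cell is comparable.
    """
    if marker_tpm_in_source <= 0:
        raise ValueError("marker must be detected in its source population")
    if marker_tpm_in_sorted < 0:
        raise ValueError("TPM values cannot be negative")
    # Cap at 1.0: a ratio above 1 would mean the marker is not lineage-specific.
    return min(1.0, marker_tpm_in_sorted / marker_tpm_in_source)


# Hypothetical TPM values for illustration only (not taken from the study).
alb_in_imac_fraction = 4.0
alb_in_ihep_fraction = 8000.0

bound = contamination_upper_bound(alb_in_imac_fraction, alb_in_ihep_fraction)
print(f"Hepatocyte-derived RNA in the iMac fraction: at most {bound:.2%}")
```

      The same call with CSF1R values would give the corresponding bound for macrophage RNA in the iHep fraction.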

      (3) Quantitative assessment of IL-6 production is insufficient:

      The analysis of drug-induced IL-6 responses is based primarily on relative changes compared to control conditions. However, percentage changes alone are inadequate to capture the biological relevance of these responses. Absolute cytokine production levels - particularly in response to LPS stimulation - should be reported and directly compared to PBMC-derived macrophages to determine whether iMAC-derived Kupffer-like cells exhibit enhanced cytokine output. Moreover, the Methods section should clearly describe how ELISA results were normalized or corrected to account for potential differences in cell number, viability, or culture conditions.

      We apologise if this was unclear. The cytokine production from dosed cells was normalized based on the viability of cells measured from the same well.
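
      One way such a per-well normalization could be computed is sketched below. The response does not give the formula, so dividing the measured IL-6 concentration by the viability fraction of the same well is an assumption, as are the function name, the units, and the example values.

```python
def normalize_by_viability(il6_pg_per_ml, viability_fraction):
    """Scale a cytokine reading by the viability measured in the same well.

    Assumption (not stated in the source): viability is expressed as a
    fraction of the vehicle control, and normalization is a simple division.
    """
    if il6_pg_per_ml < 0:
        raise ValueError("cytokine concentration cannot be negative")
    if viability_fraction <= 0:
        raise ValueError("viability must be positive to normalize")
    return il6_pg_per_ml / viability_fraction


# Hypothetical wells for illustration only (not data from the study).
wells = [
    {"well": "A1", "il6_pg_per_ml": 120.0, "viability": 1.00},
    {"well": "A2", "il6_pg_per_ml": 90.0, "viability": 0.60},
]

for w in wells:
    value = normalize_by_viability(w["il6_pg_per_ml"], w["viability"])
    print(w["well"], round(value, 1))
```

      Normalizing within each well keeps a drop in IL-6 that merely reflects cell death from being read as immune suppression.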

      (4) Unclear mechanistic interpretation of IL-6 modulation:

      The observed changes in IL-6 production upon drug treatment cannot be interpreted solely as evidence of Kupffer cell-specific functionality. For instance, IL-6 suppression by NSAIDs such as diclofenac is well known to result from altered prostaglandin synthesis due to COX inhibition, while leflunomide's effects are linked to metabolite-induced modulation of immune cell proliferation and broader cytokine networks. These mechanisms are distinct from Kupffer cell identity and may not directly reflect liver-specific macrophage function. Consequently, changes in IL-6 secretion alone - particularly without additional mechanistic evidence or analysis of other cytokines - are insufficient to conclude that co-culture with hepatocytes drives the acquisition of bona fide Kupffer cell maturity.

      We fully agree with the reviewer and have highlighted this in our discussion.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) GSE ID for RNA-seq data has not been provided.

      This has been included.

      (2) Line 291: Can the authors specify what they mean by "state-of-the-art"?

      What we mean here is what others in the field have also recently described. We have rewritten this to be clearer.

      (3) Lines 299-300: check sentence for grammar mistakes.

      We have rewritten and clarified this.

      (4) Figure 1B: The PCA does not really allow for following maturation trajectories. Also, all samples (day 3 Co-iHep, day 7 Co-iHep, day 7 iHep) look as if they cluster more or less together. Therefore, the conclusion drawn in lines 303-305 does not hold. Why is day 3 iHep not also shown here?

      We agree that PCA does not allow maturation trajectories to be followed. We had presented the idea that co-culture promotes maturation as a hypothesis, which we later validated by examining the expression of key hepatocyte markers and by Pearson correlation with fetal hepatocytes.

      (5) Can the authors show that the cells that they are sorting in the double negative gate are indeed hepatocytes? Typically, these cells are big in cell size; therefore, showing the FSC/SSC gate would also be important.

      We have added the FSC/SSC gate in supp fig. 1E to show that the populations have different sizes.

      (6) Can the authors provide microscopy pictures of iHeps, iMacs, and the co-cultured cells for the reader to appreciate whether the morphology of cells already changes during the co-culture experiments?

      Although we agree that the morphology changes would be interesting, we think that this question is unfortunately outside the scope of our study. Although Kupffer cells are in direct contact with hepatocytes, they migrate from the liver parenchyma into the sinusoidal spaces where they primarily reside. We do not think that the morphology would add much to the paper, especially given that this is a 2D model.

      (7) Please show expression of apoptotic and ER stress genes comparing Day7 iHeps and Co-iHeps, since genes such as c-Fos and Ppp2r3b can also be associated with cellular stress.

      This has been included in Supp Fig 2C, where we’ve included the expression of ATF4, CASP3 and CASP9. Although there’s a significant difference in ATF4 expression between Day 0 and Day 7 iHep only/Co-culture, there is no significant difference between the Day 7 iHep only and Day 7 iHep Co-culture. There are no significant differences in CASP3 and CASP9 expression across all the samples.

      (8) In addition to the genes shown in Figure 1E, could the authors extract a longer gene list of maturing hepatocytes and display them all in bar graphs or heatmaps, or similar? E.g., Albumin expression is shown later, but why not show it already here?

      There are not many differences in the canonical hepatocyte markers, which is why we chose to show only the genes that differed. This is seen in the later ALB expression plot, where ALB expression did not differ after 7 days of co-culture. Instead, we have included a new heatmap in Supp Fig 2B showing the top 40 genes that contribute to the similarity by Pearson correlation.
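
      For readers unfamiliar with this kind of comparison, a minimal sketch of a Pearson correlation between two expression profiles follows. The log2(TPM + 1) transform, the gene ordering, and all values are assumptions made for illustration; the response does not describe the exact preprocessing.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


def log_tpm(values):
    """log2(TPM + 1), a common (assumed) transform before correlating."""
    return [math.log2(v + 1.0) for v in values]


# Hypothetical TPM values over the same ordered gene list (illustration only).
reference_profile = [120.0, 15.0, 300.0, 8.0, 60.0]
ihep_only = [40.0, 30.0, 90.0, 25.0, 55.0]
ihep_coculture = [100.0, 18.0, 250.0, 10.0, 58.0]

for name, profile in [("iHep only", ihep_only), ("Co-iHep", ihep_coculture)]:
    r = pearson(log_tpm(reference_profile), log_tpm(profile))
    print(f"{name}: r = {r:.3f}")
```

      A higher coefficient for one condition would indicate a profile closer to the reference, which is the sense in which correlation supports a maturation claim that PCA alone cannot.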

      (9) Along these lines, how do the authors ensure that they are culturing only hepatocytes and do not have a mixture of cells that may "dilute" the hepatocyte signature?

      Unfortunately, this is a limitation of our methodology, although the expression of key hepatic markers is routinely confirmed by qPCR to ensure that the majority of the cells are hepatocyte-like.

      (10) Lines 347-350: similar to the interpretation of the PCA for hepatocytes, this is a completely random interpretation. The expression of ALB in the co-cultured iMacs indicates that there are some hepatocytes that ended up in the macrophage gate.

      We agree and have highlighted and addressed this limitation in our discussion. A small amount of contamination is an unavoidable limitation of bulk sequencing of sorted populations. However, the TPM values of ALB in the iMacs, for example, are extremely low, especially when compared with the hepatocytes, indicating that the level of contamination is likely to be very low. Likewise, the expression of CSF1R in the co-cultured iHeps is also extremely low. This has been included in Supp Fig 1F and G.

      (11) Figure 2D: Among the pathways shown, there are also stress pathways (acute phase response, HMGB1). Also for these cells, control of apoptotic and ER stress signatures is necessary.

      As mentioned, we have included some stress genes in Supp Fig 2C to address this.

      (12) Lines 385-386: Why would FCGRA3 indicate tissue residency? Is there literature to support this statement?

      CD16 is a marker often used to distinguish Kupffer cells from the surrounding cells, although it is also expressed by non-classical monocytes. We have clarified the text here (Lines 356-357).

      (13) Figure 3E: ALB and other genes were at the same or even lower levels expressed in D7 compared to D3. Why is that? Are the cells starting to de-differentiate after 7 days? Please discuss.

      This is a very interesting question that we have also wondered about, although we do not yet have an answer. We hypothesize that this might be due to the activation of cell proliferation/developmental programmes as the cells are kept together for longer, as shown by the expression of morphogens like OSM and IGF-2 after co-culture. We have added some discussion of this (Lines 532-540).

      (14) Line 459: Word "in" is double

      We thank the reviewer for catching this. It has been corrected.

      (15) Figure 5: The findings are interesting, but the co-culture model remains somewhat unclear. Can the authors show, e.g., using qRT-PCR, how hepatocytes are developing in this culture system? If the development with monocyte-derived macrophages is altered, then one would expect that also the cellular response is different.

      We agree with the reviewer, but we think that this question would be better answered in a follow-up study. We aimed to determine whether the addition of isogenic iMacs would change the drug response of iHeps, and used the PBMC-derived macrophages here as a control. A more complete study taking into account the genetic background of the donor PBMC-derived macrophages would be much more informative, but is outside the scope of the present study.

      (16) Lines 482-484: The authors talk about LPS-treated cultures and refer to Figure 4. However, there is no graph shown for LPS.

      We apologise for being unclear here. The co-cultures were co-treated with LPS during the drug stimulation assays, as it has been shown that LPS increases the sensitivity of the liver toward hepatotoxic drugs. We have clarified this in the main text (Lines 435-437).

      Reviewer #2 (Recommendations for the authors):

      (1) It would be nice to add some protein production by the hepatocytes. For example, can they produce albumin or some other protein that can be measured? Perhaps I missed this.

      Albumin and urea levels were assessed in the hepatocytes prior to co-culture (Supp Fig 1C); however, we did not measure changes in protein levels after co-culture, as the co-culture also contains a significant number of macrophages, which we thought might affect the readout. Instead, after co-culture the primary analysis was done on the RNA levels of ALB and cytochrome genes after sorting (Fig 3).

      (2) Was there an increase in hepatocyte number? Did one cell outgrow the other, or did they maintain numbers?

      The relative proportion of the iHeps remained the same, although we did see an expansion in the iMac population after 7 days by flow cytometry in Fig 1D.

      (3) What happens if the iMACs and the iHeps are grown in Costar chambers with pore sizes too small to allow for cell contact, but allowing supernatant to be continuously exposed to both cell types?

      With regard to the question of direct contact, we were primarily focused on the acquisition of a KC-like phenotype in the iMacs, which is why we chose to use conditioned iHep media as part of the iMac experimental setup. However, it would be very interesting to see in a follow-up study whether the converse is also true, and whether secreted factors from the iMacs alone would be sufficient to drive the changes we observed in the iHeps after co-culture.

      (4) The discussion could use a brief paragraph on some limitations and what could be added to the co-culture system. For example, could stellate cells and sinusoidal endothelium also impart KC identity? Would growing KCs on endothelium provide a more natural substratum?

      Once again, these are very interesting questions which are unfortunately outside of the scope of our study. However, we have included a short section discussing this in the paper, as we do think that it would be interesting to look at iMacs educated by hepatocyte vs stellate cells for example (Lines 530-536).

      (5) The axonal guidance pathway in early iMACs is interesting. A recent report in vivo showed that macrophages migrate from the liver parenchyma into the sinusoids in neonates when they are still immature. The process could be chemotaxis, or it could be repulsion by parenchyma. Numerous axonal guidance molecules are repulsive, pushing axons away (robo/slit, etc). The migration of Kupffer cells into sinusoids could be a repulsive rather than a chemoattractant pathway. Did the RNA seq data provide any interesting molecules in this regard?

      Reviewer #3 (Recommendations for the authors):

      This manuscript presents a conceptually well-designed approach to modeling hepatocyte-macrophage crosstalk in vitro. The authors develop a co-culture system aimed at recapitulating key aspects of Kupffer cell (KC) identity and hepatocyte maturation. The data convincingly show that macrophages acquire KC-like features under co-culture conditions. However, several major issues limit the strength of the conclusions, the depth of mechanistic insight, and the translational impact of the work.

      First, the study relies heavily on bulk RNA-seq data with minimal functional or protein-level validation - particularly for hepatocyte maturation. To substantiate claims of functional maturation, additional assays measuring albumin secretion, urea production, and CYP activity are essential. Furthermore, the omission of zonation-associated markers (e.g., GLUL, CPS1, CYP2E1) leaves a critical gap in assessing whether the iHEPs achieve physiologically relevant functional states.

      Second, statistical interpretation and reporting are inconsistent. Significant and non-significant findings are frequently conflated, which risks overinterpretation. For instance, the reported reduction in HNF4A expression is not statistically significant, and AFP expression is only significantly reduced in Day 7 co-iHEPs - yet these distinctions are not clearly stated.

      Third, although the authors emphasize the role of cell-cell contact in promoting KC identity, no experiments (e.g., transwell separation, adhesion-blocking assays) directly test this claim. As a result, the mechanistic basis for this conclusion remains speculative.

      Finally, while the data support enhanced macrophage differentiation toward a KC-like phenotype, the evidence that co-culture significantly promotes hepatocyte maturation is far less convincing and requires additional functional, mechanistic, and statistical validation before firm conclusions can be drawn.

      Minor comments:

      (1) Methodology: The choice of a 2.5:1 iHEP:iMAC ratio is not justified. This proportion does not reflect physiological hepatocyte-to-KC ratios in vivo and should be either rationalized or benchmarked against native liver composition.

      We admit that the ratio here is on the higher side, but it has been previously reported that there can be between 20 and 40 macrophages per 100 hepatocytes (1:5 to 1:2.5) in the adult mouse liver (Baratta et al., 2009), while in the developing mouse liver the ratio is closer to 1:4 (Lopez et al., 2011). We chose 1:2.5 as we anticipated that not all of the macrophages would be able to attach and that some would thus be lost during media changes, as evidenced by flow cytometry on Day 3 of the co-culture, where only 20% of the cells had clear CD45 and CD14 expression. We have clarified our methodology in the paper (Lines 141-143).
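      As a quick check of the arithmetic behind the ratios quoted above, the reported densities can be converted into macrophage:hepatocyte ratios. This is only an illustrative sketch; the function name is hypothetical and is not taken from the study.

```python
from fractions import Fraction


def hepatocytes_per_macrophage(macrophages_per_100_hepatocytes):
    """Return N for a macrophage:hepatocyte ratio written as 1:N.

    Hypothetical helper, used here only to check the quoted ratios.
    """
    return Fraction(100, macrophages_per_100_hepatocytes)


# Densities quoted in the response: 20 to 40 macrophages per 100 hepatocytes.
for density in (20, 40):
    n = hepatocytes_per_macrophage(density)
    print(f"{density} macrophages per 100 hepatocytes -> 1:{float(n)}")
```

      Under these numbers, the chosen 1:2.5 seeding ratio corresponds to the upper end of the cited adult range (40 macrophages per 100 hepatocytes).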

      (2) Effect of iMAC on iHEP (Section 3.2, Supplementary Figure 1E):

      (2.1) The authors should explain why Day 3 co-cultured iHEPs show stronger transcriptomic similarity to primary hepatocytes than Day 7 cells. Possible biological mechanisms (e.g., transient paracrine signaling or temporal changes in maturation dynamics) should be discussed.

      We have added some discussion for this (Lines 309-311, 536-540).

      (2.2) The figure legend refers to "fetal hepatocytes," while the correlation map states "hepatocytes." This discrepancy must be clarified. Moreover, if fetal hepatocytes are used as the reference, and the goal is to assess maturation, comparisons to adult hepatocytes are necessary. 

      The comparison was done against fetal hepatocytes, and this has been clarified in the figure. We chose to use fetal hepatocytes here as it would be unfair to compare iPSC-derived cells that are less than 3 weeks old to adult human tissue, and any similarities or differences between the mono/co-cultures and the adult tissue might be due to the shifting transcriptomic landscape during development. However, we do recognise the nuanced nature of using “maturation” here; what we mean is that the iPSC-derived cells become more similar to their in-vivo counterparts.

      (2.3) Baseline characterization of both cell types before co-culture is insufficient. For iHEPs, flow cytometry data on ALB and AFP positivity rates should be presented, along with post-co-culture changes. For iMACs, marker expression (CD45, CD11b, CD14) should be shown before and after co-culture. The methods mention CD163, CX3CR1, and CD11b, but these data are absent from the results. Additionally, the gating strategy for cell sorting prior to bulk RNA-seq must be clearly described - including how potential cross-contamination of cell fractions (e.g., macrophages in the hepatocyte population) was excluded.

      We apologise for this oversight. Some of the markers were used to determine the purity of the iMacs before co-culture, and we did not include these plots for brevity. We have now added the purity plots in Supp Fig 2E, showing that the iMacs were more than 90% pure before co-culture. We acknowledge the concern about cross-contamination for bulk sequencing, and have added in Supp Fig 2G and H the expression of ALB in the iMac fraction, as well as the expression of CSF1R in the iHep fraction, showing minimal contamination with our gating strategy.

      (3) IGF2 Expression: The observed upregulation of IGF2, a fetal marker, contradicts the conclusion that co-culture promotes hepatocyte maturation. This inconsistency should be addressed, and possible explanations (e.g., transient fetal-like activation driven by macrophage-derived signals) discussed. The lack of statistical significance for this finding must also be explicitly noted.

      We thank the reviewer for pointing this out. The expression of IGF2 was actually significantly different when comparing the Day 0 Hepatocyte only and Day 7 Hepatocyte only to the Day 3 Co-cultured Hepatocytes, but the significance is lost with the Day 7 co-cultured Hepatocytes. One possible explanation, as the reviewer suggested, is that a transient program is activated upon co-culture and subsequently downregulated. We have updated the figure and text, and added some discussion to reflect this (Lines 309-311, 536-540).

      (4) Effect of iHEP on iMAC: The reported upregulation of KC-related genes is overstated. Changes in LYVE1 and ID1 are not statistically significant (Figure 2G), yet they are presented as meaningful. Clear separation of statistically significant results from non-significant trends is critical to avoid overinterpretation.

      We apologise for this. It was never our intention to present these markers as significant; rather, we included them because we thought that they would be of interest to the audience. We have clarified the text to reflect that these are non-significant trends (Lines 367-369).

      (5) Mimicking In Vivo Clinical Responses:

      (5.1) The authors' conclusion that IL-6 responses are not recapitulated when iMACs are replaced by monocyte-derived macrophages (MoMs) is not fully supported by the data presented. In fact, the MoM co-cultures exhibit a noticeable trend toward increased IL-6 production (e.g., approximately 150% with LTG at 66.6 µM and 400 µM), suggesting that some degree of responsiveness is retained. To substantiate the claim that the observed cytokine modulation is unique to iKC-containing co-cultures, the authors should perform direct statistical comparisons of absolute IL-6 secretion levels between iKC and MoM co-cultures at each drug concentration. Such analyses are essential to determine whether the differences are statistically significant and biologically meaningful, and to clarify whether the observed effects truly reflect KC-specific functionality rather than general macrophage activation.

      (5.2) The effects of drug exposure on hepatocytes themselves are not addressed. It is important to evaluate whether the co-culture remains viable under treatment, whether it recovers after drug withdrawal, and whether there is evidence of cytotoxicity or irreversible phenotypic loss.

      (6) Interpretation of IL-6 Modulation and Model Specificity:

      The authors show that IL-6 secretion in their co-culture system varies in response to multiple hepatotoxic drugs and parallels some reported clinical trends - notably, a concentration-dependent decrease with diclofenac (DIC) and leflunomide (LFM). They further report that this pattern is not observed in hepatocyte-PBMC-derived macrophage co-cultures, and they conclude that iMAC/iKC-like cells are essential for capturing immune-mediated hepatotoxic responses. However, the data presented do not fully justify such a conclusion. Several key mechanistic issues weaken the interpretation:

      (6.1) Mechanistic ambiguity in the DIC response: The decrease in IL-6 following DIC exposure is most likely attributable to reduced prostaglandin E₂ (PGE₂) production via COX inhibition, which secondarily suppresses IL-6 signaling. This effect is a general pharmacological property of NSAIDs and is not necessarily reflective of Kupffer cell-specific pathways. Direct evidence - such as prostanoid quantification or PGE₂ rescue experiments - is required to establish that the observed effects are liver-specific rather than nonspecific NSAID responses.

      (6.2) Pharmacogenetic complexity in the LFM response: LFM-induced hepatotoxicity is highly variable and largely dependent on CYP2C9 polymorphisms, which determine conversion to the active metabolite teriflunomide. Because hepatotoxicity and the associated cytokine responses are not universal among patients, a simplified co-culture model lacking metabolic diversity cannot be assumed to faithfully reproduce patient-specific immune responses. The observed IL-6 suppression could arise from differences in metabolic activation, intracellular exposure, or indirect signaling changes rather than from intrinsic KC-specific mechanisms.

      These points significantly undermine the authors' claim that IL-6 modulation provides definitive evidence of model specificity or predictive value. At minimum, the manuscript should (i) explicitly acknowledge these mechanistic limitations, (ii) include supporting data such as prostanoid profiling, CYP2C9 modulation, or teriflunomide quantification, and (iii) temper its claims regarding the model's capacity to recapitulate immune-mediated hepatotoxicity. Without such evidence, the current interpretation risks overstating the functional significance and translational relevance of the co-culture system.

      We fully agree with the reviewer and have highlighted this in our discussion (Lines 540-551).

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      The analysis of neural morphology across Heliconiini butterfly species revealed brain area specific changes associated with new foraging behaviours. While the volume of the centre for learning and memory, the mushroom bodies, was known to vary widely across species, new, valuable results show conservation of the volume of a center for navigation, the central complex. The presented evidence is convincing for both volumetric conservation in the central complex and fine neuroanatomical differences associated with pollen feeding, delivered by experimental approaches that are applicable to other insect species. This work will be of interest to evolutionary biologists, entomologists, and neuroscientists.

      Many thanks for your assessment and time handling this manuscript. We value the constructive input of both reviewers and believe that the result is an improved publication.

      Public Reviews:

      Reviewer #1 (Public review):

      The authors previously reported that Heliconius, one genus of the Heliconiini butterflies, evolved to be efficient foragers to feed on pollen of specific plants and have massively expanded mushroom bodies. Using the same image dataset, the authors segmented the central complex and associated brain regions and found that the volume of the central complex relative to the rest of the brain is largely conserved across the Heliconiini butterflies. By performing immunostaining to label a specific subset of neurons, the authors found several potential sites of evolutionary divergence in the central complex neural circuits, including the number of GABAergic ellipsoid body ring neurons and the innervation patterns of Allatostatin A expressing neurons in the noduli. These neuroanatomical data will be helpful to guide future studies to understand the evolution of the neural circuits for vector-based navigation.

      We thank Reviewer 1 for the constructive feedback and criticism, which have strengthened this publication.

      Strengths:

      The authors used a sufficiently large scale of dataset from 307 individuals of 41 species of Heliconiini butterflies to solidify the quantitative conclusions and present new microscopy data for fine neuroanatomical comparison of the central complex.

      Weaknesses:

      (1) Although the figures display a concise summary of anatomical findings, it would be difficult for non-experts to learn from this manuscript to identify the same neuronal processes in the raw confocal stacks. It would be helpful to have instructive movies to show a step-by-step guide for identification of neurons of interest, segmentations, and 3D visualizations (rotation) for several examples, including ER neurons (to supplement texts in line 347-353) and Allatostatin A neurons.

      We approached this with the following logic:

      All 3D segmentations were animated, to illustrate how they are generated from raw imaging data. This means we are providing a video file for each major species group (Heliconius/outgroup-Heliconiini) for Figure 4 (general CX anatomy), Figure 7 (ER neuron projections), Figure S5 (ER neuron/bulb anatomy). This visual connection should help the reader relate 3D segmentations to image stacks. We have also added a reference to these videos in the relevant Figure captions.

      We also annotated image stacks, but did so selectively. We annotated key stacks of Figure 4 (general CX anatomy), Figure 7 (ER neuron projections) and Figure S5 (ER neuron/bulb anatomy), and include a reference to them in the figure captions.

      We refrained from annotating stacks of Figures 5, 6, 8 and S4. This is because we believe that the annotations we have performed in the figure panels will be sufficient for readers interested in the finer detail of these anatomies who are familiar with general CX anatomy.

      We believe that our approach will help the reader to gain a visual illustration of those parts of the manuscript which report key results and novel insights, such as ER neuronal variation, and that the data and figures collectively provide accessible information sufficient for this purpose.

      Text changes in Figure captions 4, 7 and S5: “See animated 3D segmentations and annotated stacks in file repository.”

      (2) Related to (1), it was difficult for me to assess if the data in Figure 7 support the authors' conclusions that ER neuron number increased in Heliconius melpomene. By my understanding, the resolution of this dataset isn't high enough to trace individual axons and therefore authors do not rule out that the portion of "ER ring neurons" in Heliconius may not innervate the ER, as stated in Line 635 "Importantly, we also found that some ER neurons bypass the ellipsoid body and give rise to dense branches within distinct layers in the fan-shaped body (ER-FB)". If they don't innervate the ellipsoid body, why are they named as "ER neurons"?

      Thank you for pointing this out. We believe this is primarily a nomenclature issue, but have tried to clarify it in the text.

      Ultimately, the neurons of this group that project to the EB and form the actual ring neurons, and those that project to the FB and whose function is so far unclear, emerge from the same lineage, DALv2 (as determined by Kandimalla et al 2023), and therefore have a common developmental origin (also noted by Homberg et al 2018). To acknowledge this common developmental origin and to simplify nomenclature, and thereby make the text easier for non-experts to follow, we specify which DALv2 progeny project to which areas, but refer to both adult neuron populations as “ER neurons”. We have changed the following text to state our definition explicitly, which we hope mitigates the understandable confusion.

      Lines 354-357: “Here, we refer to these neurons, as well as those neurons projecting to the fan-shaped body (GU neurons in [66]), as ER neurons due to their common developmental origin [45,66] and to simplify anatomical descriptions.”

      Lines 386-387: “Whether these ER neurons solely branch in the fan-shaped body, as shown for GU neurons elsewhere [66] or have additional side branches entering the ellipsoid body is not clear.”

      (3) Discussions around the lines 577-584 require the assumption that each ellipsoid body (EB) ring neuron typically arborises in a single microglomerulus to form a largely one-to-one connection with TuBu neurons within the bulb (BU), and therefore, the number of BU microglomeruli should provide an estimation of the number of ER neurons. Explain this key assumption or provide an alternative explanation.

      Thank you for this. We do not think that our hypothesis necessarily requires any specific assumptions regarding the ratio of microglomeruli to ER or TuBu neurons. Even in Drosophila the ratio of ER to MG is only approximately 1:1, as some microglomeruli seem to combine into one. In other species this relationship might be very different. Indeed, our data suggest that in outgroup-Heliconiini the ratio is 4.4 microglomeruli to 1 ER neuron, and in Heliconius it is 3.4. However, as these MG numbers are extrapolated and cannot be precisely counted, they may be too imprecise to support a definite conclusion, which is why we do not mention this in the text. Importantly, extrapolation in its current form is a valid additional way for us to describe overall bulb anatomy (alongside bulb volume and average microglomerulus size).

      In any case, the inference we make here is that a bulb anatomy conserved in volume, MG number and MG size supports our assumption that the additional neurons in the ER neuron group/DALv2 progeny do not arborize in the bulb, but do so in the SMP/SLP region and in the fan-shaped body. We believe we have described this inference accurately in the current manuscript.

      An additional point, not mentioned in the manuscript but emerging from lineage annotations of connectome data, is that some DALv2 progeny have been identified as MBONs as well as being GABAergic, and these could potentially be the ER-FB neurons that we describe (Schlegel et al 2024 Nature). We refrain from mentioning this in the manuscript, as it is too speculative, but we thought the reviewer may be interested in this observation.

      (4) The details of antibody information are missing in the Key resource table. Instead of citing papers, list the catalogue numbers and identifier for commercially available antibodies, and describe the antigen, and whether they are monoclonal or polyclonal. Are antigens conserved across species?

      We have now added substantial information to Table 2, including research resource identifiers (RRIDs) and antigen descriptions, as well as information about specificity and conservation. In the text itself, in line 757, we already provide publications that have illustrated conservation very extensively.

      We believe that with the additional information provided in Table 2, all necessary information is now provided.

      (5) I did not understand why authors assume that foraging to feed on pollens is a more difficult cognitive task than foraging to feed on nectar. Would it be possible that they are equally demanding tasks, but pollen feeding allows Heliconius to pass more proteins and nucleic acids to their offspring and therefore they can develop larger mushroom bodies?

      This is an excellent point. Our current understanding is that pollen feeding is a cognitively more demanding task because a) the density of pollen resources is lower than that of nectar resources, and b) the competition for pollen is higher (pollen is depleted quickly, and Heliconius compete with each other and with other taxa, including hummingbirds). There is therefore a benefit to high foraging efficiency, which favours the evolution of learning. This is likely reinforced by the long lives of Heliconius, which live up to a year compared to ~4 weeks for most outgroups, and by the temporal stability of major pollen resources, which means that a memorised location provides a benefit for long periods of time (Young and Montgomery 2020 Proc B).

      We now refer to an additional publication (Young and Montgomery 2020 Proc B) in lines 103-104 for a fuller description of the ecology of pollen feeding, and in the current manuscript simply focus on the impact of mushroom body expansion on the CX.

      Reviewer #2 (Public review):

      Summary:

      In this study, Farnsworth et al. ask whether the previously established expansion of mushroom bodies in the pollen foraging Heliconius genus of Heliconiini butterflies co-evolved with adaptations in the central complex. Heliconius trap line foraging strategies to acquire pollen as a novel resource require advanced spatial memory mediated by larger mushroom bodies, but the authors show that related navigation circuits in the central complex are highly conserved across the Heliconiini tribe, with a few interesting exceptions. Using general immunohistochemical stains and 3D reconstruction, the authors compared volumes of central complex regions, and unlike the mushroom bodies, there was no evidence of expansion associated with pollen feeding. However, a second dataset of neuromodulator and neuropeptide antibody labeling reveals more subtle differences between pollen and non-pollen foragers and highlights sub-circuits that may mediate species-specific differences in behavior. Specifically, the authors found an expansion of GABAergic ER neurons projecting to the fan-shaped body in Heliconius, which may enhance their ability to path-integrate. They also found differences in Allatostatin A immunoreactivity, particularly increased expression in the noduli associated with pollen feeding. These differences warrant closer examination in future studies to determine their functional implication on navigation and foraging behaviors.

      We thank Reviewer 2 for the constructive and thorough review. We believe that addressing these criticisms has improved this publication.

      Strengths:

      The authors leveraged a large morphological data set from the Heliconiini to achieve excellent phylogenetic coverage across the tribe with 41 species represented. Their high-quality histology resolves anatomical details to the level of specific, identifiable tracts and cell body clusters. They revealed differences at a circuit level, which would not be obvious from a volumetric comparison. The discussion of these adaptations in the context of central complex models is useful for generating new hypotheses for future studies on the function of ER-FB neurons and the role of Allatostatin A modulation in navigation.

      The conclusions drawn in this paper are measured and supported by rigorous statistics and evidence from micrographs.

      Weaknesses:

      The majority of results in this study do not reveal adaptations in the central complex associated with pollen foraging. However, reporting conserved traits is useful and illustrates where developmental or functional constraints may be acting. The implied hypothesis in the introduction is that expansion of mushroom bodies in Heliconius co-evolved with central complex adaptations, so it may be helpful to set up the alternate hypotheses in the beginning.

      Thank you for this relevant comment. We have added to the text in lines 124-128, as follows:

      “Indeed, these circumstances permit us to test the hypotheses that modifications in the mushroom bodies either occurred in isolation from other integrative centres, or that they occurred in concert with specific changes in centres, such as the central complex. This provides insights into the functional flexibility of two interacting, integrative centres across evolutionary time.”

      In the main text, the authors describe differences in GABAergic neurons "across several species" but only one Heliconius and one outgroup species seem to be represented in the figures. ER numbers in Figure 7H are only compared for these two species. If this data is available for other species, it would strengthen the paper to add them to the analysis, since this was one of the most intriguing findings in the study. I would want to know if the increased ER number is a trend in Heliconius or specific to H. melpomene.

      This points to imprecise phrasing on our part. We do have additional data for other species, but unfortunately not to an extent that would permit quantification of cell numbers, which is why we chose to put these data into the supplement (Fig. S4).

      We modified the text to point more directly to the additional data in Fig. S4; lines 362-368 now read:

      “…, we noticed a pronounced difference in a portion of projections leading into the fan-shaped body and a strong difference in signal inside layer III in our two focal species H. melpomene and D. iulia, as well as other representatives of the Heliconiini tribe (Figure S4A-B, Figure 7). To understand how these differences could have occurred, we quantified ER neuron numbers in our focal species, and identified a significant difference, reflecting a 35% increase in Heliconius (t = 4.221, P = 0.004; Figure 7H).”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Add a detailed description about each of the tiff files that were deposited at https://doi.org/10.5281/zenodo.15304965. It was hard for me to relate these raw images with the Figure panels. For instance, "Melp_GAD_26-F_detailed_conc.tif" in the Figure 7 folder seems to be used to make Figure 7L and N, but that information is cryptic.

      We agree with the reviewer. We added further descriptions, and have created a detailed readme file which explains which original file refers to which figure. Together with the efforts for Reviewer 1’s first comment, we hope that this updated version of our repository is easier to understand.

      In addition, we corrected the image orientation in some of the supplied files, which was originally incorrect.

      (2) Add descriptions about the dataset for large-scale volumetric analysis. With the current methods and texts, it is hard to understand what kinds of staining and microscopes were used. I initially thought that they could be micro-CT data.

      We have made two improvements:

      We have added an additional readme file to explain the different datasets, and which datasets were used for each figure, to relate them to the original data deposited at zenodo.org (see your previous comment).

      We have added descriptions in several places in the manuscript file, i.e.

      Lines 133-135, now reading “To assess evidence of volumetric changes in the central complex and associated neuropils, we drew data from a large dataset of immunostained brains from 307 individuals of 41 species, …”

      Lines 144-149, now reading “We used a combination of phylogenetic comparative analysis across a large dataset of brains immunostained against the structural marker synapsin in 41 species and 307 individuals, and more targeted sampling of species that represent the behavioural and neuroanatomical diversity of Heliconiini for more fine-scale assessments of patterns of divergence in substructures of the CX with various antibodies (Figure 1A-B).”

      (3) Line 275: Non-expert readers would need an explanation about what the gamma lobe is.

      Agreed, and added in line 273:

      “Some of the ventral projections seemed to directly originate from the γ lobe, a portion of the mushroom body, thus potentially labelling projections of mushroom body output neurons into the fan-shaped body (Figure 5a-c) [12,21].”

      (4) Figures 4 I-L are missing.

      We modified the figure caption accordingly, and address annotated differences more directly. This section now reads

      “G/H: Labelling reveals two distinguishable layers in the fan-shaped body while additional staining elsewhere reveals further detail (arrows in G/H-2/3). Thicker tract conflations indicate the columnar architecture determined through the four columnar neuron bundles (arrowheads in G/H-3). Labelling in the EB reveals two pronounced layers (arrows in G/H-1/2), while obvious columns could not be indicated. PB protocerebral bridge, FB fan-shaped body, EB ellipsoid body. A anterior, P posterior. Scale bars are 50 μm.”

      (5) In the current version of Figure 1B, AOTU is displayed with the mushroom body. The authors can emphasize its relation to the central complex by showing it on the right side of panels together with the central complex.

      Great suggestion. We have done this now. We have kept the AOTU at the scale of the MB, as indicated by the different scale bars at the bottom of the figure, because we show the CX at a slightly larger scale.

      (6) Figure 1C: What do the colors of the lines represent?

      We now changed these colours so that they correspond to the colours chosen in Figures 2 and S2 as well as in a previous publication of the lab, added an asterisk next to Heliconius aoede, and added text to the figure legend:

      “Colour indicates focal groups here and elsewhere [29]. The asterisk at the branch of H. aoede indicates a secondary loss of pollen feeding.”

      (7) Figures 2A and B: What does the size of the circles represent? I guess that small ones are individuals, and larger ones are species averages. Plots with only species averages would be easier to see. It is difficult to distinguish Heliconius and Heliconius aoede in these panels. It would be easier if Heliconius circles were outlined with thin black lines.

      Thanks for this. We wanted to keep both the averages and individual data points in one figure, so as not to overcrowd the manuscript with additional figures. We hope that the changes we made address the confusion sufficiently. We made the following modifications to Figures 2, S1 and S2:

      (1) Added text in the figure legend clarifying what solid and transparent circles indicate (“Solid data points indicate species averages, while opaque circles indicate individual data points.”)

      (2) Added, as suggested, additional contours, to all Heliconius data points, and added corresponding text to the legend (“Black contours indicate Heliconius sp. data points.”)

      (3) Changed opacity settings of individual data points.

      Reviewer #2 (Recommendations for the authors):

      (1) Line 391 and Methods. It was unclear how the extrapolated microglomeruli numbers were calculated. Please clarify this in the methods.

      Agreed. We substantially modified the text to address this.

      Lines 392-396: “We generated high resolution images of the bulb to determine its size (Figure S5 C-F), and 3D segmented seven microglomeruli per individual with which we generated an extrapolated approximation of total microglomeruli number by dividing bulb volume with average microglomerulus volume. This was necessary as most microglomeruli were not discernible from each other (Figure S5 G-H).”

      Lines 862-873: “To segment the bulb, we created high resolution images and were particularly careful to only segment the area of the bulb that comprised large synapses/glomeruli, excluding parts of the LEa/IT projection. This was essential, because we relied on extrapolating the total number of microglomeruli from a subset of segmented microglomeruli and the total volume that contained microglomeruli, which means any section containing tracts and not glomerular structures would skew the estimated total number of microglomeruli. Extrapolation was necessary, as not all microglomeruli were visually discernible. We achieved an unskewed bulb volume by leaving out dense pieces of tubulin-positive tract material. We segmented seven microglomeruli per individual from the posterior section of the bulb, where they were most clearly visible, to get the most comparable impression across individuals and species. We then calculated average microglomerulus size and divided bulb volume by this to determine an approximation of microglomeruli number.”
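
      The extrapolation described in these two passages amounts to dividing the segmented bulb volume by the mean volume of the segmented microglomeruli. A minimal sketch of that arithmetic follows; the function name and all volumes are hypothetical and chosen only for illustration, not taken from the dataset.

```python
from statistics import mean


def estimate_microglomeruli_count(bulb_volume, segmented_volumes):
    """Approximate the total count as bulb volume / mean microglomerulus volume.

    Both arguments must use the same volume unit.
    """
    if bulb_volume <= 0:
        raise ValueError("bulb_volume must be positive")
    if not segmented_volumes or any(v <= 0 for v in segmented_volumes):
        raise ValueError("segmented_volumes must be non-empty and positive")
    return bulb_volume / mean(segmented_volumes)


# Hypothetical volumes (cubic micrometres), for illustration only.
example_bulb_volume = 14000.0
example_segmented = [6.0, 7.0, 8.0, 7.0, 6.5, 7.5, 7.0]  # seven microglomeruli
estimated_count = estimate_microglomeruli_count(example_bulb_volume, example_segmented)
```

      Because the estimate scales linearly with bulb volume, any tract material included in the segmentation inflates the count proportionally, which is why the tracts were excluded.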

      (2) Line 439. It would be helpful to add that Kaiser et al. studied honeybees.

      Agreed! Now reads in lines 443-444

      “Moreover, Kaiser et al. [75] identified Allatostatin A expression in three fan-shaped and two ellipsoid body layers in the honey bee brain, …”

      (3) Line 492. "outcome" should be "outcomes".

      We believe that this refers to original line 481. Corrected. Thank you.

      (4) Figure 3B. If there is significance to the colors and triangle directions, please include a key/legend.

      We have added:

      “Cell type depictions are examples with localisation inside each neuropil being purely visual (as well as their colour), while triangles indicate approximate output sites.”

      We also corrected the following issues that were noted during our revisions:

      line 587, wrong reference.

      We updated references 37 and 44, which are now respectively

      Hodge, E. A. et al. Modality-specific long-term memory enhancement in Heliconius butterflies. Philos Trans R Soc Lond B Biol Sci 380, 20240119 (2025).

      Hodge, E. A. et al. Conservation of sensory pathways implies a localised change in the mushroom bodies is associated with cognitive evolution in Heliconius butterflies. Evol qpag005 (2026) doi:10.1093/evolut/qpag005.

      Figure S5 had an error in panels C and D: the images were swapped, so that panel C showed the H. melpomene images that belong in panel D, and vice versa. The other panels were correct. We have corrected this.

      In the data submitted on Zenodo: we corrected a few inconsistencies in channel colours and orientation in the .tiff files for Fig 6, 8 and S4.

      We added important bulb 3D segmentation files to the repository on Zenodo.

    1. Author response:

      We would like to express our sincere gratitude to the editors and the two reviewers for providing their constructive and valuable comments that will greatly guide us in improving the manuscript. We will revise the manuscript according to their critiques and suggestions. The existing code for this study, along with preliminary code developed in response to the review comments, has been made publicly available at https://github.com/cbaiming/miRTarDS. We now provide detailed responses to each reviewer below.

      Reviewer #1 (Public review):

      The author presents a new method for microRNA target prediction based on (1) a publicly available pretrained Sentence-BERT language model that the author fine-tunes using MeSH information and (2) downstream classification analysis for microRNA target prediction. In particular, the author's approach, named "miRTarDS", attempts to solve the microRNA target prediction problem by utilizing disease information (i.e., semantic similarity scores) from their language model. The author then compares the prediction performance with other sequence- and disease-based methods and attempts to show that miRTarDS is superior or at least comparable to existing methods. The author's general approach to this microRNA target prediction problem seems promising, but fails to demonstrate concrete computational evidence that miRTarDS outperforms other existing methods. The author's claim that disease information-based language models are sufficient is unfounded. The manuscript requires substantial rewriting and reorganization for readers with a strong background in biomedical research.

      We appreciate the reviewer’s careful examination of modeling, benchmarking, and interpretation, and we are particularly encouraged that they found the proposed method promising. We will make corresponding revisions to the manuscript based on the reviewer’s comments.

      A major issue related to the author's claim of computational advance of miRTarDS: The author does not introduce existing biomedical-specific language models, and does not compare them against miRTarDS's fine-tuned model. The performance of miRTarDS is largely dependent on the semantic embedding of disease terms. The author shows in Figure 5 that MeSH-based fine-tuning leads to a substantial improvement in MeSH-based correlation compared to the publicly available pretrained SBERT model "multi-qa-MiniLM-L6-cos-v1" without sacrificing a large amount of BIOSSES-based correlation. However, the author does not compare the performance of MeSH- and BIOSSES-based correlation with existing language models such as ChatGPT, BioBERT, PubMedBERT, and more. Also, the substantial improvement in MeSH-based correlation is a mere indication that the MeSH-based fine-tuning strategy was reasonable and not that it's superior to the publicly available pretrained SBERT model "multi-qa-MiniLM-L6-cos-v1".

      We thank the reviewer for the constructive suggestions regarding the benchmarking of language models. We acknowledge that the performance of miRTarDS largely depends on the semantic embeddings of disease terms. In the revision, we will therefore: 1) conduct a literature review to introduce existing biomedical-specific language models, and 2) perform a horizontal comparison between our fine-tuned model and these existing models, to more comprehensively evaluate the model’s capabilities.

      Another major issue is in the author's claim that disease-information from miRTarDS's language model is "sufficient" for accurate microRNA target prediction. Available microRNA targets with experimental evidence are largely biased for those with disease implications that have been reported in the biomedical literature. It's possible that their language model is biased by existing literature that has also been used to build microRNA target databases. Therefore, it is important that the author provides strong evidence that excludes the possibility of data leakage circularity. Similar concerns are prevalent across the manuscript, and so I highly recommend that the author reassess the evaluation frameworks and account for inflated performance, biased conclusions, and self-confirming results.

      We thank the reviewer for the comment. We recognize that existing experimentally validated microRNA targets may be biased toward those reported in biomedical literature as disease‑related. To mitigate this bias, we used the K‑Nearest Neighbors (KNN) method to extract predicted microRNA targets whose numbers of miRNA‑disease and gene‑disease entries closely match those of the experimentally validated microRNA targets. We then applied Positive‑Unlabeled (PU) Learning to classify the two groups. PU‑Learning is designed to address scenarios where only a subset of the training data is explicitly labeled as positive, while the remaining data are unlabeled—with the unlabeled set containing both potential positives and true negatives—which is highly suitable for the application context of this manuscript [1]. Preliminary results show that after applying the new data extraction and classification approach, model performance drops to around F1=0.73 (the MISIM method also shows a decline, with F1 around 0.58; detailed code is available on GitHub). The specific reasons for this require further investigation.
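
      As a rough illustration of the matching step described above, the sketch below pairs each validated target with its nearest predicted target in the space of (miRNA disease entries, gene disease entries). The function name, tuple layout, Euclidean distance, and matching without reuse are all assumptions made for this example and may differ from the code in the repository; the PU-learning classifier itself is not shown.

```python
import math


def match_unlabeled_to_positives(positives, unlabeled, k=1):
    """For each positive pair, select its k nearest unlabeled pairs without reuse.

    Each item is a tuple: (pair_id, n_mirna_disease_entries, n_gene_disease_entries).
    Returns the ids of the selected unlabeled pairs, in selection order.
    """
    available = {u[0]: u for u in unlabeled}
    matched = []
    for _, p_mirna, p_gene in positives:
        # Rank the remaining unlabeled pairs by distance in count space.
        ranked = sorted(
            available.values(),
            key=lambda u: math.hypot(u[1] - p_mirna, u[2] - p_gene),
        )
        for candidate in ranked[:k]:
            matched.append(candidate[0])
            del available[candidate[0]]
    return matched


# Hypothetical counts, for illustration only.
example_positives = [("P1", 10, 5), ("P2", 50, 40)]
example_unlabeled = [("U1", 11, 5), ("U2", 48, 41), ("U3", 200, 3), ("U4", 9, 6)]
example_matched = match_unlabeled_to_positives(example_positives, example_unlabeled)
```

      Matching on entry counts is intended to keep the classifier from separating the two groups simply by how heavily each miRNA or gene has been studied.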

      Last but not least, the manuscript requires a deeper and more careful description and computational encoding of microRNA biology. I'd advise the author to include an expert in microRNA biology to improve the quality of this manuscript. For example, the author replaces the mature miRNA notation with the pre-miRNA notation to maintain computational encoding consistency across databases. However, the mature microRNA suffix ("-3p" or "-5p") is critical, as the 3p and 5p mature microRNAs have different seed sequences and thus different mRNA targets. The 3p mature microRNA would most likely not target an mRNA targeted by the 5p mature microRNA.

      We thank the reviewer for the critique and suggestion. We fully agree with the reviewer that the distinction between the 3p and 5p mature strands is critical for determining mRNA targeting, as they possess distinct seed sequences. In our study, we relied on the miRNA–disease associations provided by the HMDD database, which annotates interactions at the pre-miRNA level: “… the enriched functions of each mature miRNA are aggregated to the corresponding miRNA precursor.” [2] Furthermore, existing literature suggests that the pre-miRNA level can be appropriate and informative for disease association analyses: “Compared with the mature miRNA method, the pre-miRNA method is more useful for studying disease association.” [3] We also find that, in some cases, both strands cooperate to regulate the same or complementary pathways [4]. We acknowledge the reviewer’s point as an important consideration for future revision. We plan to consult or collaborate with biologists to enhance the quality of the manuscript in biology.

      Reviewer #2 (Public review):

      This study introduces a novel knowledge-driven approach, miRTarDS, which enables microRNA-Target Interaction (MTI) prediction by leveraging the disease association degree between a miRNA and its target gene. The core hypothesis is that this single feature is sufficient to distinguish experimentally validated functional MTIs from computationally predicted MTIs in a binary classification setting. To quantify the disease association, the authors fine-tuned a Sentence-BERT (SBERT) model to generate embeddings of disease descriptions and compute their semantic similarity. Using only this disease association feature, miRTarDS achieved an F1 score of 0.88 on the test set.

      We thank the reviewers for their positive feedback, especially for their recognition of the novelty of this manuscript.

      Strengths:

      The primary strength is the innovative use of the disease association degree as an independent feature for MTI classification. In addition, this study successfully adapts and fine-tunes the Sentence-BERT (SBERT) model to quantify the semantic similarity between biomedical texts (disease descriptions). This approach establishes a critical pathway for integrating powerful language models and the vast growth in clinical/disease data into biochemical discovery, like MTI prediction.

      We would like to thank the reviewer again for their positive feedback. We appreciate their recognition of the novelty of our work, as well as their acknowledgment that the proposed method paves the way for integrating language models with clinical/disease data into biochemical discovery.

      Weaknesses:

      The main weakness lies in its definition of the ground-truth dataset, which serves as a foundation for methodological evaluation. The study defines the Negative Set as computationally predicted MTIs that lack experimental evidence. However, the absence of experimental validation does not equate to non-functionality. Similarly, the miRAW sets are classified by whether the target and miRNA could form a stable duplex structure according to RNA structure prediction. This definition is biologically irrelevant, as duplex stability does not fully encapsulate the complex in vivo binding of miRNAs within the AGO protein complex.

      We thank the reviewers for their constructive feedback. We have realized that treating predicted MTI as a negative class may pose some issues. Therefore, we have decided to adopt Positive Unlabeled (PU) Learning in subsequent updates. This classification method can be applied to datasets such as ours, which contain only positive classes and lack negative ones [1]. We used the miRAW dataset to enable a horizontal comparison of our method with traditional sequence-based prediction approaches. We acknowledge that miRAW may overlook some biological insights, and we plan to optimize the construction of test datasets in the future. Some preliminary explorations have already been conducted, and the relevant code is available on GitHub.

      Furthermore, we will make the following revisions: 1) clearly specify the version of miRBase and incorporate more miRNA-related databases; 2) conduct a further literature review on miRNA biological mechanisms to strengthen the biological content of the manuscript; 3) perform a more comprehensive evaluation of the model’s performance; and 4) attempt to identify representative MTIs that have been overlooked by existing prediction tools but can be predicted by our proposed method.

      References

      (1) Li, F., Dong, S., Leier, A., Han, M., Guo, X., Xu, J., ... & Song, J. (2022). Positive-unlabeled learning in bioinformatics and computational biology: a brief review. Briefings in Bioinformatics, 23(1), bbab461.

      (2) Huang, Z., Shi, J., Gao, Y., Cui, C., Zhang, S., Li, J., ... & Cui, Q. (2019). HMDD v3.0: a database for experimentally supported human microRNA–disease associations. Nucleic Acids Research, 47(D1), D1013-D1017.

      (3) Wang, H., & Ho, C. (2023). The human pre-miRNA distance distribution for exploring disease association. International Journal of Molecular Sciences, 24(2), 1009.

      (4) Mitra, R., Adams, C. M., Jiang, W., Greenawalt, E., & Eischen, C. M. (2020). Pan-cancer analysis reveals cooperativity of both strands of microRNA that regulate tumorigenesis and patient survival. Nature Communications, 11(1), 968.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This is an important paper that reports in vivo physiological abnormalities in the hippocampus of a rat model of traumatic brain injury (TBI). In this study, authors focused on changes in theta-gamma phase coupling and action potential entrainment to theta, phenomena hypothesized to be critical for cognition. While the authors provide solid evidence of deficits in both features post-TBI, the study would have been stronger with a more hypothesis-driven approach and consideration of alterations of the animal's behavioral state or sensorimotor deficits beyond memory processes.

      We would like to thank the reviewers for their comments on our manuscript. By incorporating their feedback, we were able to make our hypotheses clearer, expand our analyses to compare physiological processes across similar behavioral states, and address extrahippocampal input and potential sensorimotor confounds in our data.

      Specifically, we have added new data in Figure 5 showing how theta amplitude correlates with theta-gamma PAC and entrainment strength. We have also added supplementary Figure 1 demonstrating that there are no differences in exploration or movement velocity in injured animals compared to shams. Supplementary Figures 2, 3, and 4 were added to compare oscillatory power while animals were still, moving at a higher velocity, and following a broadband power shift correction respectively. We also added Supplementary Figure 7 demonstrating that there were no differences in firing rates between sham and injured animals while they were still or moving and Supplementary Figure 8 showing no changes in pyramidal cell bursting. Finally, we added Supplementary Figure 10 showing that there was no difference in velocity or distance traveled during testing in the MWM between sham and injured animals and that learning curves were similar across groups before sham/injury surgery. We believe that the addition of this data significantly improves our manuscript by more strongly controlling for the animal’s behavioral state in our analyses and provides strong evidence that significant sensory/motor deficits were not present in injured animals at this injury level and time point post injury. Below we address specific points raised by the reviewers.

      Reviewer #1 (Public review):

      Summary:

      This study investigated how traumatic brain injury affects oscillatory and single-unit hippocampal activity in awake-behaving rats.

      Strengths:

      The use of high-density laminar electrodes enabled precise localization of recording sites. To ensure an unbiased, rigorous approach, single-unit analysis was performed by a reviewer who was blind to experimental conditions. A proof of concept study was undertaken to characterize the pathology that resulted from the specific TBI model used in the main study. There was an effort to link abnormalities in hippocampal activity to memory disruption by running a cohort of rats on the Morris Water Maze task.

      Weaknesses:

      The paper is written as if the experiment was exploratory and not hypothesis-driven despite the fact that there is a wealth of experimental evidence about this TBI model that could have informed very specific predictions to test a hypothesis that is only hinted at in the discussion. The number of rats used for the spatial working memory experiment is not reported. Some of the statistics are not completely reported. It is also unclear what the rationale was for recording single units in a novel and familiar environment. Furthermore, this analysis comparing single-unit activity between familiar and novel environments is quite rudimentary. There are much more rigorous analyses to answer the question of how hippocampal single-unit firing patterns differ across changes in environments. There are details lacking about the number of units recorded per session and per rat, all of which are usually reported in studies that record single units. Spatial working memory assessment is delegated to a single panel of a supplementary figure. More importantly, there is no effort to dissociate between spatial working memory deficits and other motor, motivational, or sensory deficits that could have been driving the lower "memory score" in the experimental group.

      In order to address these important concerns, we have made the following changes:

      (1) We have updated the results section to include more rationale for the recordings and analyses used to clarify our hypotheses. In addition, we hope that our extensive characterization will lay the groundwork to inform future studies investigating circuit-specific disruptions following TBI and neuromodulatory therapies.

      (2) The number of rats used for the spatial working memory experiment is reported in the text and figure legend.

      (3) We have added supplemental Table 2 to include the requested statistical information (t-statistic, degrees of freedom, and 1 vs 2-tailed analyses).

      (4) Unfortunately, we did not have adequate occupancy to robustly extract and compare place cell properties across groups and environments which obscured the rationale of our study design and limited us to more rudimentary analyses. While animals did actively explore the two environments, the relatively short recording time limited the spatial sampling of the two-dimensional environment. We were able to extract putative place cells and found some evidence that place cells in TBI rats had lower spatial information content than in shams (as has previously been described). However, we did not feel that place cell analyses were rigorous enough to include in this manuscript due to the limited spatial sampling. Future studies in the lab will assess how TBI affects place cell information content, stability, and phase precession with better occupancy.

      (5) We have added Supplemental Table 1 that includes the total number of units recorded for each animal.

      (6) The spatial working memory deficit we report in the MWM is not a novel finding in this model of TBI. However, we wanted to ensure that LFPI in our hands at this injury level reproduced this known deficit. Importantly, the swim speed and distance traveled during testing did not differ between groups, suggesting that differences were not due to motor deficits. Additionally, the learning curves before sham/LFPI surgery were the same across groups. This data has been added to the manuscript in Supplementary Figure 10. While we did not test animals in a version of the task where the platform was visibly marked, previous studies have demonstrated that sham and injured rats perform comparably in a version of the MWM where the platform is visible or when a constant start location is used. These citations have been added to the manuscript.

      Reviewer #1 (Recommendations for the authors):

      For a more rigorous way of analyzing changes in hippocampal firing patterns across environments, see Wills et al 2005 for example.

      Addressed in point 4 above

      Spatial working memory tasks should always be compared with a control task to rule out confounding performance variables. Examples would be to use a variant of the MWM task that does not require the hippocampus such as using a visible escape platform.

      Addressed in point 6 above

      Statistics are typically reported including a t-statistic and degrees of freedom, not just the p-value. In addition, the authors should indicate whether the t-test is one or two-tailed.

      Addressed in point 3 above

      Reviewer #2 (Public review):

      Summary:

      The authors investigate changes in theta-gamma phase amplitude coupling, and action potential entrainment to theta following traumatic brain injury (TBI). Both phenomena are widely hypothesized to be important for cognition, and the authors report deficits in both after TBI. The manuscript is well-written, the figures are well-constructed, and the authors' use of high-level analysis methods for TBI EEG data collected from awake, behaving animals is welcome.

      Major Comments:

      The animal n's are small (4 sham and 5 injured). In Figure 3, for instance, one wonders if panels D and E might have shown significant differences if more animals had been recorded.

      There are conflicting reports regarding the effect of LFPI on single cell firing rates. This is likely due to differential task demands and variations in LFPI severity across studies. We agree that the firing rates do appear to be trending; however, overall firing rate changes can be difficult to interpret. Because firing rates are influenced by behavior and brain state, we further separated firing rates into epochs when animals were moving or still and found similar trends that did not reach significance (data added in Supplementary Figure 7). We also assessed bursting in pyramidal cells to investigate whether potential changes in bursting influenced overall firing rates, and we found no differences between sham and injured animals across conditions (data added in Supplementary Figure 8). While the n’s are small when considered by animal, the number of units is actually fairly large, so if there were robust effects (as there were for the entrainment analyses), we would expect to see significant differences.

      The text focuses on deficits in the theta and gamma bands, but the reduction in power appears to be broadband (see Figure 1F, especially Pyramidal cell layer panel). Therefore, the overall decrease in broadband (in the injured population) must be normalized between sham and injured animals before a selective comparison between sham and injured animals can be conducted. That is the only way that selective narrow bands i.e., theta and low gamma can be compared between the two cohorts. A brief discussion of the significance of a broadband decrease would be appreciated.

      This is an excellent point that has now been addressed with the addition of Supplementary Figure 4. We used a well-established method (Donoghue et al 2020) to flatten power spectra in order to compare specific frequency bands in the context of a broadband shift. After applying this correction, we show that theta power is still reduced in injured rats compared to shams. While there is no difference in gamma power between groups in the corrected power spectra, this result should be interpreted with caution especially since there is not a large distinct peak in the gamma frequency range in the power spectrum of either sham or injured animals. However, if this is interpreted to mean that gamma power is not different between sham and injured animals, it makes the PAC data even more compelling. While there is clearly a broadband shift, the frequency range of this shift is still limited in the frequency domain to ~4-90Hz which contains physiologically relevant frequencies associated with synaptic currents. Importantly, the power spectra of sham and injured animals converge at low (<4Hz) and high (>100Hz) frequencies. This suggests that slow oscillations which could include delta and respiration-associated oscillations are not affected by TBI (though sleep recordings would be needed to properly address this). High-frequency activity can include ripples and HFOs which need to be separately extracted when comparing between groups due to their transient nature. However, overall spiking activity including the depolarizing spike and the after hyperpolarization significantly contribute to power in the high frequency range. Because this general high-frequency power is not different between groups, it suggests that the limited range of the broadband power reduction still contains important physiological signals. This broadband shift may result from a global reduction in or desynchronization of synaptic input to CA1. The specific mechanisms behind this broadband shift and the consequences it has on coding information in the hippocampus are fascinating questions that we hope will be specifically investigated in future studies. This point is now addressed in the Discussion.
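
      To illustrate the idea of flattening a spectrum before comparing narrow bands, the sketch below fits a straight line to the power spectrum in log-log space and subtracts it. This is a deliberately simplified stand-in and not the algorithm of Donoghue et al. 2020, which fits the aperiodic component together with oscillatory peaks; the function name and the synthetic spectra are assumptions made for this example.

```python
import math


def flatten_spectrum(freqs, power):
    """Subtract a least-squares line, fitted in log-log space, from a power spectrum.

    Returns (offset, exponent, residual): the line is offset - exponent * log10(f),
    and residual is the flattened spectrum in log10 power units.
    """
    x = [math.log10(f) for f in freqs]
    y = [math.log10(p) for p in power]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    offset = mean_y - slope * mean_x
    residual = [yi - (offset + slope * xi) for xi, yi in zip(x, y)]
    return offset, -slope, residual


# Synthetic spectra, for illustration only: 1/f^2 background with a bump at 8 Hz.
freqs = list(range(1, 101))
sham = [100.0 / f**2 * (1.0 + 3.0 * math.exp(-((f - 8) ** 2) / 2.0)) for f in freqs]
injured = [0.5 * p for p in sham]  # purely broadband (multiplicative) reduction

sham_offset, sham_exponent, sham_residual = flatten_spectrum(freqs, sham)
inj_offset, inj_exponent, inj_residual = flatten_spectrum(freqs, injured)
```

      In this toy case the broadband scaling changes only the fitted offset, and the flattened residuals of the two spectra are identical. A band-specific difference that survives flattening therefore cannot be attributed to the broadband shift alone.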

      Reviewer #2 (Recommendations for the authors):

      Minor Comments:

      Please define your reference waveform for theta - is it theta recorded on the channel containing the cell? Average theta for all electrodes in SP? SP + SO? Theta for the nominal "St. pyr." channel? Please define.

      For all entrainment analyses, entrainment was measured referenced to the theta oscillation recorded from st. pyr. on the specific shank where the unit was detected. We added clarification in the results and methods sections regarding this point.

      Similarly, even though the peak of the theta wave appears from the figures to be taken as 0 degrees, please explicitly state this in the text.

      This has been added to the results and methods.

      Did the authors check for any difference between interneurons in SP and interneurons in SO?

      This is an excellent suggestion that we had hoped to investigate as it could inform whether specific interneuron populations were affected. However, we did not record enough units in st. ori to make this comparison.

      On page 8, Figures 3E and 3F are incorrectly labeled 4E and 4F.

      This has been fixed.

      Figure 1, panel C: please add a numerical scale to the colored scale bar.

      This has been added.

      Figure 1, panel F: how was the significance between the frequency bands calculated?

      Statistics were done using a t-test at each frequency point with significance set at α=0.01 for multiple comparisons. This has been clarified in the figure legend and methods.

      Figure 3, panel A legend: Please add "Spike at 0 ms omitted for clarity.”

      This has been added.

      Figure 4, panel A, right side: please provide the MVL for this cell, so that readers have a benchmark for evaluating the MVL as a parameter. A sample poorly entrained cell, with MVL, would also be informative.

      We added the MVL for this cell. We were unable to add a poorly entrained cell without making the figure more confusing.
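
      For readers who want a benchmark for the MVL, the sketch below computes the mean vector length from a list of spike phases, with 0 rad taken as the peak of theta as stated above. The function name and the example phases are hypothetical; this is a generic circular-statistics calculation, not the analysis code used for the manuscript.

```python
import cmath
import math


def mean_vector_length(phases_rad):
    """Return (MVL, preferred phase in radians) for a list of spike phases.

    Each spike is treated as a unit vector at its theta phase; the MVL is the
    length of the average vector (0 = no phase preference, 1 = perfect locking).
    """
    if not phases_rad:
        raise ValueError("phases_rad must not be empty")
    resultant = sum(cmath.exp(1j * p) for p in phases_rad) / len(phases_rad)
    return abs(resultant), cmath.phase(resultant)


# Hypothetical spike phases, for illustration only.
locked_mvl, locked_phase = mean_vector_length([0.0, 0.0, 0.0, 0.0])
spread_mvl, _ = mean_vector_length([0.0, math.pi / 2, math.pi, 3 * math.pi / 2])
partial_mvl, partial_phase = mean_vector_length([0.0, math.pi / 2])
```

      Spikes that all fall at the theta peak give an MVL of 1, spikes spread evenly around the cycle give an MVL near 0, and two spikes a quarter cycle apart give an MVL of about 0.71.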

      Raw data must be provided for the Morris Water Maze experiments described in Supplementary Figure 3.

      We added data showing no difference in the swim velocity or distance traveled between the sham and injured groups during memory testing as well as data showing that the two groups had similar learning curves during training before sham/injury surgery. See Supplementary Figure 10.

      Antibody 22C11 for APP has been shown to be non-specific when used for immunocytochemistry (it may be fine for Westerns). In addition, using a biotinylated secondary with an ABC kit for visualization risks contamination by post-injury changes in biotin. Reviewed in Xiong et al., 2023, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10580020/.

      As is standard practice in neuropathology, negative controls were run for all of these experiments (identical preparations minus the primary antibody). No non-specific staining was present that could be misinterpreted as APP-positive axonal profiles in either sham or injured tissue. While beyond the scope of this response, there are many reasons the authors of the cited paper may have had non-specific staining, including a concentration 450X that of the one utilized here and the absence of an antigen-retrieval technique in their protocol.

      Tummala et al. used in vivo calcium-imaging after TBI and also investigated single-cell activity in familiar and novel environments, and when moving or still. The authors could consider discussing their work.

      We have added a citation for this paper.

      Reviewer #3 (Public review):

      Summary:

      In this study, the authors studied the effects of traumatic brain injury created by the LFPI procedure on CA1 at the network level. The major findings seem to be that TBI reduces theta and gamma power in CA1, reduces phase-amplitude coupling between the theta and gamma bands, and disrupts the gamma entrainment of interneurons. I think the authors have made some important discoveries that could help advance the understanding of TBI effects at the physiological level. However, more investigation into the relationship of the behavioral and brain states to the observed effects would help clarify the interpretations for the readers.

      Strengths:

      The authors in this study were able to combine behavioral verification of the TBI model with the laminar electrophysiological recordings of the CA1 region to bring forward network-level anomalies such as the temporal coordination of network-level oscillations as well as in the firing of the interneurons. Indeed, it seems that the findings may serve future studies to functionally better understand and/or refine the therapies for the TBI.

      Weaknesses:

      Discoveries made in the paper and their broad interpretations can be helped with further characterization and comparison among the brain and behavioral states both during immobility and movement. The impact of brain injury in several parts of the brain can alter brain-wide LFP and/or behavior. The altered behavior and/or LFP patterns might then lead to reduced spiking and unreliable LFP oscillations in the hippocampus. Hence, claims made in the abstract such as "These results reveal deficits in information encoding and retrieval schemes essential to cognition that likely underlie TBI-associated learning and memory impairments, and elucidate potential targets for future neuromodulation therapies" do not have enough evidence to test whether the disruptions were information encoding and retrieval related or due to sensorimotor and/or behavioral deficits that could also occur during TBI.

      Movement velocity is already known to be correlated with the entrainment of spikes to the theta rhythm and also, in some cases, to gamma oscillations. So, it is important to disentangle the differences in behavioral variables and the observed effects. As an example, the authors' claims of disrupted temporal coding (as shown in the graphical abstract) might have suffered from these confounds. The observed results of reduced entrainment might, on one hand, be due to the decreased LFP power (induced by injury in different brain areas) resulting in altered behavior and/or the unreliable oscillations of the LFP bands such as theta and gamma, rather than memory encoding and retrieval related disruption of spike synchrony to the rhythms, while on the other hand, they may simply be due to reduced excitability in the neurons particularly in the behavioral and brain state in which the effects were observed, rather than disrupted temporal code. Hence, further investigations into dissociating these factors could help readers mechanistically understand the interesting results observed by the authors.

      We appreciate the Reviewer’s insights into disentangling the complex interactions between power, entrainment, and excitability, and have attempted to dissociate these further in our analyses. Regarding the broad effects of TBI, we agree that TBI affects many brain regions outside of the hippocampus as well as white matter pathways containing axons from areas where pathology is not visible, which likely results in widespread changes to LFPs across regions and altered behavior. Here we report disrupted network activity in the hippocampus which is likely a consequence of numerous pathologies across multiple brain regions. In the discussion, we speculate that disrupted power and coupling comes from desynchronization of inputs (especially those from the mEC and MS) as well as changes to local circuits within the hippocampus which combine to disrupt temporal coding. While the disrupted processes we report in the hippocampus are implicated in computational processes thought to support learning and memory, we acknowledge that results from this study do not causally reveal a specific mechanism that is directly responsible for cognitive impairments. We have changed the language of the quoted sentence from the abstract to make our claim less causal as we agree that the direct effects of these results on cognition are difficult to quantify due to the fact that animals were not performing a spatial navigation task with measurable outcomes during recordings. We have also removed the graphical abstract as we believe it is an oversimplification of the results given new analyses.

      Regarding the possible contribution of sensory and motor deficits or differences in behavioral states to the observed changes, we agree that it is essential to consider potential sensorimotor deficits as well as the animal’s behavioral state when comparing oscillations and single unit activity in the hippocampus, especially since these phenomena have been extensively linked to movement velocity and exploration. To address this, we have added Supplementary Figure 1 showing that there are no differences in movement velocity or exploration time between sham and injured animals. Because animals were simply foraging during electrophysiological experiments, we do not expect there to be any major additional behavioral differences that would influence oscillations or spiking once locomotion is controlled for, though differences in attention or arousal cannot be ruled out. Additionally, analyses throughout the manuscript are performed independently during periods when animals were moving or still. Data in Figures 1 and 2 also only include data from the familiar environment to rule out any effects of novelty on hippocampal oscillations. Supplementary Figures 2 and 3 were added to demonstrate that TBI-associated reductions in power were consistent when animals were still and when a higher threshold for movement (>20 cm/sec) was used. Finally, Supplementary Figure 10 was added showing no differences in swim velocity or distance traveled in the MWM between sham and injured animals, further suggesting that there are no significant sensorimotor deficits at this injury level and timepoint. Additionally, previous studies have demonstrated that sham and injured rats perform comparably in a version of the MWM where the platform is visible or when a constant start location is used, which provides further support that sensorimotor deficits are not responsible for memory deficits in this task (see above).

      Regarding the contribution of neuronal excitability to the reported changes, we agree that changes in the excitability of neurons could have a strong effect on entrainment. Importantly, we show that the disrupted oscillations recorded in the injured hippocampus do not coincide with significant changes in neuronal firing rates between sham and injured animals. We have added Supplementary Figure 7 demonstrating this holds true both when animals are still and when they are moving. Additionally, we have added Supplementary Figure 8 showing no differences in pyramidal cell bursting between sham and injured animals. While this suggests that there are not major changes in excitability, homeostatic plasticity mechanisms may impact firing rates and bursting, and the extent of these effects and their role on entrainment are unclear. This point was added to the Discussion.

      To address the effects of LFP power on entrainment strength, Figure 5 has been updated to show theta and gamma entrainment strength as well as theta-gamma PAC as a function of theta amplitude. We found that, during periods of comparable theta power, interneurons from sham and injured animals are similarly entrained to theta, but pyramidal cells from injured animals become significantly more entrained to theta than in shams. We address the potential implications of these results in the Discussion.

      Reviewer #3 (Recommendations for the authors):

      The authors have stated on page 7 and Figure 2E, "Taken together, injured rats show a decrease in the strength of theta-gamma PAC that is specific to st. pyr, and a shift in peak gamma amplitude to a later phase of theta in both st. pyr and st. rad". Is the shift in the peak position greater than expected by chance?

      We are unaware of a rigorous method that would allow us to compare this shift statistically. We have reported the observed shift and avoided calling the shift significant for that reason.

      The authors state on page 9 "cells (sham familiar=1.63±0.23 Hz, n=51, injured familiar=2.11±0.20 Hz, n=141, p=0.446; sham novel=1.84±0.18 Hz, n=55, injured novel=2.23±0.21 Hz, n=134, p=0.170; mean±SEM; ks-test; Fig 4E) between sham and injured groups, but a higher percentage of pyramidal cells were active (firing rate >0.1Hz) in both the familiar and novel environment in injured rats compared to shams (sham=74%, injured=87%, p=0.025, Fisher's exact test; Fig 4F)." Do the authors mean Figures 3E and 3F respectively in place of Figures 4E and 4F?

      This has been fixed.

      Regarding the finding of similar firing rates and differences in the overlap of the neurons that were active in between injured and control animals, it is imperative to study the differences in behaviors of the animals. First of all, it seems appropriate to quantify and compare the immobility and mobile periods as well as the movement velocity of the animals in both groups. Then, it would be interesting to see if any behavioral variables correlate with the firing characteristics of the cells in both the sham and the injured animals. Since hippocampal cells have been known to have different levels of recruitment and firing rates according to different behavioral states such as movement velocity, some of the similarities or differences in neural findings might as well be attributed to the differences in behaviors in between the groups. However, some differences may be observed in the injured rats despite similar behavior and the LFP powers. In other words, studying the effects of injury during similar behavioral (e.g. firing rate as a function of movement velocity) and brain states (e.g. categorical effects of awake theta state, type two theta, and ripple states on firing rates and the entrainment) might help dissociate some effects that might only be due to difference in the behavior caused by the injury throughout the brain and might as well have less to do with specific injury induced local circuits level deficits in the hippocampus. The results in Figures 4, 5, and 6 reveal such interesting differences and hence, it becomes even more important to quantify and correlate behavioral states (movement velocity and theta/ripple) to the neuronal characteristics (LFP power, PAC, firing rates, and entrainment) presented in Figure 3.

      These are excellent points, and we have addressed them in the following ways:

      We added Supplementary Figure 1 demonstrating that there were no differences in movement velocity between sham and injured animals during electrophysiological recordings.

      Power and PAC analyses were done exclusively when the animal was moving to compare across similar behavioral states. Additionally, these analyses were constrained to recordings from the familiar environment to rule out any effects of novelty. Because animals were simply foraging during recordings we do not expect other behavioral factors besides movement velocity to play a major role in these processes. We have also added Supplementary Figures 2 and 3 which demonstrate that TBI-associated differences in oscillatory power follow similar trends when animals are still (Sup. Fig 2) or when a higher movement threshold (>20cm/sec) is used (Sup Fig 3). We also added Supplementary Figures 7 and 8 showing that there were no significant differences in firing rates or bursting while animals were still or while they were moving.

      The Discussion was expanded to discuss how TBI may disrupt circuits outside the hippocampus which may contribute to our findings. Additionally, we acknowledge the limitation that these recordings were not obtained while animals were doing a quantitatively measurable spatial navigation task which limits our ability to assess whether changes are truly behaviorally relevant.

      We have also updated Figure 5 to show entrainment across different levels of theta power.

      Elaborating on the abovementioned point, Figures 4B and 4E depict a finding that mean entrainment is reduced in the injured during immobility. The following factors may contribute to the results:

      (1) Reduction in theta power during immobility (reduced attention and/or LFP profile due to brain-wide injury), which makes theta cycles unreliable, which can contribute to the results.

      (2) Changes in neural firing properties during immobility, such as reduced burst rates or firing rates during immobility.

      (3) As the authors claimed in the graphical abstract, there might be an actual disruption of temporal code associated with the memory encoding. It would be awesome if the temporal disruption could be investigated during the comparable theta power and behavioral states. This analysis would test whether there is an unconfounded disruption in the temporal code in the hippocampus due to the injury. In any case, it would be ideal to isolate the epochs during sleep in which animals were in theta state and exclude ripple states to make a definitive assessment of the aforementioned factors. These further investigations would also help the interpretations made by authors in the discussion section such as "This can disrupt type II theta which occurs when animals are not actively moving and exploring the environment. We found that single unit entrainment to theta was substantially decreased in injured rats when they were not moving, a phenomenon not seen in shams, which suggests a disruption in type II theta. This provides further evidence that cholinergic signaling may be dysfunctional following TBI."

      (1) While theta power is reduced in injured animals, it can still be reliably detected even at rest. We added Supplementary Figure 2 showing power spectra while animals were not moving, and a distinct peak can be seen in the theta frequency range. Additionally, clear peaks in entrainment can be seen in the theta frequency band in Fig 4B while animals were still. This suggests that theta can still be reliably detected in injured animals even when they are not moving. However, we agree that reduced attention or arousal could contribute to these changes, and this point has been added to the Discussion.

      (2) We added Supplementary Figures 7 and 8 showing no differences in firing rates or bursting parameters between groups during periods of immobility.

      (3) We updated Figure 5 which now shows entrainment strength as a function of theta amplitude. We found that the theta entrainment strength of both pyramidal cells and interneurons increased with increasing theta amplitudes. We address potential implications of these changes in the Discussion.

      On page 10 the authors state, "theta entrainment strength drastically increased when rats began moving in injured but not sham animals." It is unclear if the effect was confined to the periods when rats started movement. Also, it would be of interest to investigate whether movement epochs and velocity were affected in the periods when the effects were observed.

      This was not confined to the exact points when the rats started moving. We removed the word “began” for clarity. See point regarding velocity above.

      On page 12 the authors state, "On test day, injured rats had a lower memory score than shams (sham=114.8 ± 21.8, n=9; injured=51.5±6.8, n=14; p=0.020; mean ± SEM; Welch's t-test) indicating poor spatial memory (Sup Fig 3A)." The result is the validation of the TBI injury on a hippocampal-dependent Morris water maze task. However, it would be nice to see the quantification of the movement velocity in the water maze and the trajectory length in each group to further dissect whether animals were constrained in the movement and hence, they could not get to the platform or they forgot where it was located. Also, it would help to compare the rats' performance after sham or TBI surgeries to their performance during the training before the surgeries (assuming the data during the training periods were recorded as well).

      We have added Supplemental Figure 10 to include all of this information. Importantly, movement velocity and distance traveled were not different between groups on testing day, and the learning curves of both groups were the same before sham/injury surgery.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study utilises fNIRS to investigate the effects of undernutrition on functional connectivity patterns in infants from a rural population in Gambia. fNIRS resting-state data recording spanned ages 5 to 24 months, while growth measures were collected from birth to 24 months. Additionally, executive functioning tasks were administered at 3 or 5 years of age. The results show an increase in left and right frontal-middle and right frontal-posterior connections with age and, contrary to previous findings in high-income countries, a decrease in frontal interhemispheric connectivity. Restricted growth during the first months of life was associated with stronger frontal interhemispheric connectivity and weaker right frontal-posterior connectivity at 24 months of age. Additionally, the study describes some connectivity patterns, including stronger frontal interhemispheric connectivity, which is associated with better cognitive flexibility at preschool age.

      Strengths:

      The study analyses longitudinal data from a large cohort (n = 204) of infants living in a rural area of Gambia. This already represents a large sample for most infant studies, and it is impressive, considering it was collected outside the lab in a population that is underrepresented in the literature. The research question regarding the effect of early nutritional deficiency on brain development is highly relevant and may highlight the importance of early interventions. The study may also encourage further research on different underrepresented infant populations (i.e., infants not residing in Western high-income countries) or in settings where fMRI is not feasible.

      The preprocessing and analysis steps are carefully described, which is very welcome in the fNIRS field, where well-defined standards for preprocessing and analysis are still lacking.

      We thank the reviewer for highlighting the strengths of this work.

      Weaknesses:

      While the study provides a solid description of the functional connectivity changes in the first two years of life at the group level and investigates how restricted growth influences connectivity patterns at 24 months, it does not explore the links between adverse situations and developmental trajectories for functional connectivity. Considering the longitudinal nature of the dataset, it would have been interesting to apply more sophisticated analytical tools to link undernutrition to specific developmental trajectories in functional connectivity. The authors mention that they lack the statistical power to separate infants into groups according to their growing profiles. However, I wonder if this aspect could not have been better explored using other modelling strategies and dimensional reduction techniques. I can think about methods such as partial least squares correlation, with age included as a numerical variable and measures of undernutrition.

      We agree with the reviewer that this complex and rich longitudinal dataset would benefit from more sophisticated analytical approaches to characterise developmental trajectories in functional connectivity and to more directly link them to measures of undernutrition. However, conducting such analyses would require substantial additional methodological development, model validation, and careful interpretation, which fall beyond the scope and timeline of the present manuscript. Our aim here was to provide a clear and robust characterisation of functional connectivity changes during the first two years of life and to examine associations with growth outcomes at a specific developmental stage, while ensuring methodological transparency and statistical reliability. Importantly, these more advanced trajectory-based analyses are currently being pursued in the final phase of the BRIGHT project (BRIGHT IMPACT), in collaboration with expert statisticians and data scientists. This ongoing work aims specifically to leverage the longitudinal richness of the dataset to model developmental trajectories and their associations with early-life adversity and nutritional factors. We therefore see the present study as an important foundation for these forthcoming analyses.

      Connectivity was assessed in 6 big ROIs. While the authors justify this choice to reduce variability due to head size and optodes placement, this also implies a significant reduction in spatial resolution. Individual digitalisation and co-registration of the optodes to the head model, followed by image reconstruction, could have provided better spatial resolution. This is not a weakness specific to this study but rather a limitation common to most fNIRS studies, which typically analyse data at the channel level since digitalisation and co-registration can be challenging, especially in complex setups like this. However, the BRIGHT project has demonstrated that it is possible and that differences in placement affect activation patterns, which become more localised when data is co-registered at the subject level (Collins-Jones et al., 2021). Could the co-registration of individual data have increased sensitivity, particularly given that longitudinal effects are being investigated?

      We agree with the reviewer that the fNIRS community should work toward more precise methods for spatial registration of optodes, not only at the group level but also at the subject level, in order to make more precise inferences about the locations of activations. However, we followed a very thorough offline procedure to model headgear placement based on each participant’s photographs, which we believe complements the coregistration work performed by Collins-Jones et al. (2021). As reported in the fNIRS data acquisition section, “Infants were excluded from further analysis if the band was excessively high over the front above the eyebrows” (line 409, methods section). Moreover, channel displacement was measured from the photos, and channels with a displacement “equal or greater than 1.6 cm were renumbered, so that each channel was shifted either backward or forward one full channel location in space” (line 413, methods section). While these practices are thoroughly followed in the BRIGHT project, we are aware that they are not part of the standard procedure in many infant fNIRS studies. We hope that this work provides guidance for other researchers on how to coregister infant fNIRS data.

      Considering the spatial resolution of fNIRS, which is on the order of centimetres, and the thorough procedure combining fNIRS–MRI coregistration with channel displacement assessment based on photographs, we do not think that individual-level coregistration would have significantly increased the sensitivity of the results.

      I believe that a further discussion in the manuscript on the application of global signal regression and its effects could have been beneficial for future research and for readers to better understand the negative correlations described in the results. Since systemic physiological changes affect HbO/HbR concentrations, resulting in an overestimation of functional connectivity, regressing the global signal before connectivity computation is a common strategy in fNIRS and fMRI studies. However, the recommendation for this step remains controversial, likely depending on the case (Murphy & Fox, 2017). I understand that different reasons justify its application in the current study. In addition to systemic physiological changes originating from brain tissue, fNIRS recordings are contaminated by changes occurring in superficial layers (i.e., the scalp and skull). While having short-distance channels could have helped to quantify extracerebral changes, challenges exist in using them in infant populations, especially in a longitudinal study such as the one presented here. The optimal source-detector distance that minimises sensitivity to changes originating from the brain would increase with head size, and very young participants would require significantly shorter source-detector distances (Brigadoi & Cooper, 2015). Thus, having them would have been challenging. Under these circumstances (i.e., lack of short channels and external physiological measures), and considering that the amount the signal is affected by physiological noise (either coming from the brain or superficial tissue) might change through development, the choice of applying global signal regression is justified. Nevertheless, since the method introduces negative correlations in the data by forcing connectivity to average to zero, I believe a further discussion of these points would have enriched the interpretation of the results.

      We added a paragraph discussing the choice of using GSR in our pipeline in the discussion of the manuscript as follows: “Importantly, these results remained significant even without GSR, indicating that our findings are not solely driven by preprocessing choices. While the use of GSR in FC studies remains debated (Murphy & Fox, 2017), in the absence of short channels (which are difficult to use reliably with infants (Emberson et al., 2016)) and external physiological measures, applying GSR represented the most appropriate preprocessing option. In fact, failure to correct for systemic physiological fluctuations can, in fact, lead to artificially elevated connectivity estimates in fNIRS data (Abdalmalak et al., 2022)” (line 250, discussion section).
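      To illustrate the trade-off discussed above, the following is a minimal sketch of global signal regression on synthetic data, assuming the global signal is the mean across channels and is removed from each channel by ordinary least squares. The function names, channel count, and noise levels are hypothetical and are not the BRIGHT preprocessing pipeline.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def global_signal_regression(channels):
    """Regress the across-channel mean out of every channel and return the residuals."""
    n_t = len(channels[0])
    global_signal = [sum(ch[t] for ch in channels) / len(channels) for t in range(n_t)]
    g_mean = sum(global_signal) / n_t
    g = [v - g_mean for v in global_signal]
    g_ss = sum(v * v for v in g)
    cleaned = []
    for ch in channels:
        ch_mean = sum(ch) / n_t
        centred = [c - ch_mean for c in ch]
        # Least-squares weight of the global signal in this channel.
        beta = sum(c * v for c, v in zip(centred, g)) / g_ss
        cleaned.append([c - beta * v for c, v in zip(centred, g)])
    return cleaned


def mean_pairwise_correlation(channels):
    """Average Pearson correlation over all distinct channel pairs."""
    pairs = [(i, j) for i in range(len(channels)) for j in range(i + 1, len(channels))]
    return sum(pearson(channels[i], channels[j]) for i, j in pairs) / len(pairs)


# Synthetic example (assumed values): one shared systemic fluctuation plus channel noise.
random.seed(0)
n_channels, n_samples = 6, 500
systemic = [random.gauss(0, 1) for _ in range(n_samples)]
raw = [[s + random.gauss(0, 0.5) for s in systemic] for _ in range(n_channels)]

before = mean_pairwise_correlation(raw)
after = mean_pairwise_correlation(global_signal_regression(raw))
print("mean correlation without GSR:", round(before, 3))
print("mean correlation with GSR:", round(after, 3))
```

      In this toy case the shared fluctuation inflates the correlations before regression, and afterwards the residuals sum to zero across channels at every time point, so the average pairwise correlation is pushed below zero. This is the source of the negative correlations that the reviewer raises.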

      Reviewer #2 (Public review):

      Strengths:

      The article addresses a topic of significant importance, focusing on early life growth faltering in low-income countries (a key marker of undernutrition) and its impact on brain functional connectivity (FC) and cognitive development. The study's strengths include the laborious data collection process, as well as the rigorous data preprocessing methods employed to ensure high data quality. The use of cutting-edge preprocessing techniques further enhances the reliability and validity of the findings, making this a valuable contribution to the field of developmental neuroscience and global health.

      We thank the reviewer for highlighting the strengths of this work.

      Weaknesses:

      The study fails to fully leverage its longitudinal design to explore neurodevelopmental changes or trajectories, as highlighted by all three reviewers. The revised manuscript still primarily focuses on FC values at a single age stage (i.e., 24 months) rather than utilizing the longitudinal data to investigate how FC evolves over time or predicts cognitive development. Although the authors acknowledge that analyzing changes in FC (ΔFC) would reduce degrees of freedom (to ~30) and risk interpretability, they do not report or discuss these results, even as exploratory findings.

      As suggested, we added the table reporting the results of the associations between changes in functional connectivity (ΔFC) between 5 and 24 months and cognitive flexibility in the supplementary materials (Table SI3). We additionally explored the relationship between changes in growth and cognitive flexibility as suggested by Reviewer #3 and we reported these additional analyses in the text as follows: “We also explored whether changes in growth and changes in functional connectivity between 5 and 24 months were associated with cognitive flexibility at preschool age, but we did not find any significant association (Table SI3 and Table SI4).” (line 213, results section).

      Furthermore, the study lacks specificity in identifying which specific brain networks are affected by growth faltering, as the current exploratory analyses mainly provide an overall conclusion that infant brain network development is impacted without pinpointing the precise neural mechanisms or networks involved.

      We added this limitation in the discussion as follows: “While the impact of undernutrition on brain development has been documented in LMICs (46), herein, we provided empirical evidence that growth faltering specifically in infants younger than five months of age impacts observable development of functional brain networks in the second year of life. Future studies may be needed to pinpoint which specific brain networks are impacted” (line 279, discussion section).

      Reviewer #3 (Public review):

      Summary

      This study aimed to investigate whether the development of functional connectivity (FC) is modulated by early physical growth, and whether these might impact cognitive development in childhood. This question was investigated by studying a large group of infants (N=204) assessed in Gambia with fNIRS at 5 visits between 5 and 24 months of age. Given the complexity of data acquisition at these ages and following data processing, data could be analyzed for 53 to 97 infants per age group. FC was analyzed considering 6 ensembles of brain regions and thus 21 types of connections. Results suggested that: i) compared to previously studied groups, this group of Gambian infants have different FC trajectory, in particular with a change in frontal inter-hemispheric FC with age from positive to null values; ii) early physical growth, measured through weight-for-length z-scores from birth on, is associated with FC at 24 months. Some relationships were further observed between FC during the first two years and cognitive flexibility, in different ways between 4- and 5-year-old preschoolers, but results did not survive corrections for multiple comparisons.

      Strengths

      The question investigated in this article is important for understanding the role of early growth and undernutrition on brain and behavioral development in infants and children. The longitudinal approach considered is highly relevant to investigate neurodevelopmental trajectories. Furthermore, this study targets a little studied population from a low-/middle-income country, which was made possible by the use of fNIRS outside the lab environment. The collected dataset is thus impressive and it opens up a wide range of analytical possibilities.

      We thank the reviewer for highlighting the strengths of this work.

      Weaknesses

      Data analyses were constrained by the limited number of children with longitudinal data on NIRS functional connectivity. Nevertheless, considering more advanced statistical modelling approaches would be relevant to further explore neurodevelopmental trajectories as well as relationships with early growth and later cognitive development.

      While in this study we selected specific FC and outcome variables based on our hypothesis, the final phase of the BRIGHT project, known as BRIGHT IMPACT, aims to apply advanced statistical models to integrate a range of project variables into a single comprehensive analysis. We have acknowledged this in the discussion as follows: “Applying more advanced statistical modelling methods and structural equation modelling analyses may provide greater insight with further investigations in contexts of adversity and, in turn, establish which outcomes are predicted by FC” (line 309, discussion section).

      The abstract and end of the discussion should make it clearer that the associations between FC and cognitive flexibility are results that need to be confirmed, insofar as they did not survive correction for multiple comparisons.

      We have acknowledged this in the abstract as follows: “Our results highlight the measurable effects that poor growth in early infancy has on brain development and the possible subsequent impact on pre-school age cognitive development, underscoring the need for early life interventions throughout global settings of adversity”.

      We have acknowledged this in the discussion as follows: “While our results are consistent with previous studies, we acknowledge that the significant associations between early FC and later cognitive flexibility do not withstand multiple comparisons. Therefore, we encourage future studies that may replicate these findings with a larger sample” (line 300, discussion section).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) In Figure 1 B and C the authors should indicate that the results refer to HbO.

      We have added this specification to the figure caption as suggested.

      (2) Figure SI2. Please indicate in the caption that these are the results when pre-processing did not include global signal regression.

      We have added this specification to the figure caption as suggested.

      Reviewer #3 (Recommendations for the authors):

      (1) The sentence l529-531 ("To investigate whether FC early in life predicted...") should be more explicit as it is not clear which of the two variables is regressed by the other: is it the measure of cognitive flexibility that is regressed by FC, as the hypothesis suggests? Were other variables considered in the regression model? (For linear regression with only one "prediction" variable, the square root of the coefficient of determination R² is equal to the absolute value of the correlation between the two variables.)

      Yes, it is the measure of cognitive flexibility that is regressed by FC. We have rephrased it in the text as follows: “we regressed later cognitive flexibility against FC that showed a significant change across the first two years of life”. There were no other variables in the regression model.
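
      The reviewer's remark about single-predictor regression can be checked numerically. The sketch below is only an illustration: the function names and the toy FC and cognitive flexibility values are invented for this example and are not data from the study. It fits an ordinary least-squares line and confirms that the square root of R² matches the absolute value of the Pearson correlation.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simple_regression_r2(x, y):
    """R^2 of an ordinary least-squares fit of y on a single predictor x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1.0 - ss_res / ss_tot

# Hypothetical toy values (NOT study data): FC as predictor,
# cognitive flexibility score as outcome.
fc = [0.10, 0.25, 0.05, 0.40, 0.30, 0.15]
flexibility = [3.0, 4.5, 2.0, 5.5, 5.0, 2.5]

r = pearson_r(fc, flexibility)
r2 = simple_regression_r2(fc, flexibility)
print(f"|r| = {abs(r):.6f}, sqrt(R^2) = {math.sqrt(r2):.6f}")
```

      With one predictor and no covariates, as in the authors' model, reporting R² or the correlation therefore conveys the same strength of association; only the sign is lost in R².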

      (2) A summary table of the statistical results for FC-cognitive flexibility associations should be included as for other analyses, in addition to Figure 3B.

      We added a table of the results for the association between FC and cognitive flexibility in the supplementary materials (Table SI2, page 10), matching the same colours of Table 2. We referenced the table in the text in the main manuscript (line 211, result section).

      (3) Figure 3B: The legend should specify that these results did not survive corrections for multiple comparisons.

      We have specified this in the legend of Figure 3 as suggested.

      (4) For the young pre-schooler group, it seems that the age is around 4 years (age mean +/- SD=47.96 +/- 2.77 months) and not 3 years as indicated at several places in the manuscript.

      We found only one instance in which we erroneously said that the younger preschoolers were around 3 years. We replaced “Gambian infants from BRIGHT were cross-sectionally assessed at the age of 3 or 5 years for cognitive flexibility” with “Gambian infants from BRIGHT were cross-sectionally assessed between the age of 3 and 5 years for cognitive flexibility” (line 489, method section).

      (5) The authors use the term "intra-hemispheric" connections for the ones within each of the 6 sections. This might be misleading since fronto-posterior connections are also intra-hemispheric ones. Specifying "short-range" or "within-section" connections might be clearer.

      As suggested by the reviewer, we replaced “intra-hemispheric” with “intra-hemispheric within section” where appropriate through the whole manuscript.

      (6) Abstract: what is the justification for using the term "optimal" for describing developmental trajectories of FC?

      The term “optimal” refers to knowledge about typical developmental trajectories, coming especially from fMRI studies, as mentioned in the introduction: “Based on data from fMRI, current models hypothesize that FC patterns mature throughout early development (23–27), where in typically developing brains, adult-like networks emerge over the first years of life as long-range functional connections between pre-frontal, parietal, temporal, and occipital regions become stronger and more selective (28–31). [...]. Importantly, normative developmental patterns may be disrupted and even reversed in clinical conditions that impact development; e.g., increased short-range and reduced long-range FC have been observed in preterm infants (36) and in children with autism spectrum disorder (37, 38)” (line 93-106, introduction).

      (7) The confidence interval should be added in Figure SI3.

      As suggested, confidence intervals have been added in Figure SI3.

      (8) Other scatterplot examples of associations might be added as supplementary information.

      As suggested, we added several additional scatterplots to Figure SI3 (with confidence intervals as noted in the comment above) to show other associations between changes in growth and FC at 24 months.

      (9) Figure SI6: % in x-axis is still indicated.

      We apologize for the oversight; all the percentage signs have now been removed from the x-axis tick labels.

      (10) The authors might show the (even not significant) results of the associations between changes in growth and cognitive flexibility in supplementary information.

      As suggested, we added the table reporting the results of the associations between changes in growth (DWLZ) and cognitive flexibility in the supplementary materials (Table SI3). We additionally explored the relationship between changes in functional connectivity and cognitive flexibility as suggested by Reviewer #2 and we reported these additional analyses in the text as follows: “We also explored whether changes in growth and changes in functional connectivity between 5 and 24 months were associated with cognitive flexibility at preschool age, but we did not find any significant association (Table SI3 and Table SI4).” (line 213, results section).

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      Hoverflies are known for their sexually dimorphic visual systems and exquisite flight behaviors. This valuable study reports how two types of visual descending neurons differ between males and females in their motion- and speed-dependent responses, yet surprisingly, the behavior they control lacks any sexual dimorphism. The results convincingly support these findings, which will be of interest for studies of visuomotor transformations and network-level brain organization.

      This statement perfectly recapitulates our findings.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Hoverflies are known for a striking sexual dimorphism in eye morphology and early visual system physiology. Surprisingly, the male and female flight behaviors show only subtle differences. Nicholas et al. investigate the sensori-motor transformation of sexually dimorphic visual information to flight steering commands via descending neurons. The authors combined intra- and extracellular recordings, neuroanatomy, and behavioral analysis. They convincingly demonstrate that descending neurons show sexual dimorphisms - in particular at high optic flow velocities - while wing steering responses seem relatively monomorphic. The study highlights a very interesting discrepancy between neuronal and behavioral response properties.

      Thank you for this summary. Most of the statement perfectly recapitulates the main findings of our paper. However, we want to emphasize that some hoverfly flight behaviors are strongly sexually dimorphic, especially those related to courtship and mating. Indeed, only male hoverflies pursue targets at high speed, chase away territorial intruders, and pursue females for mating. However, other flight behaviours, such as those related to optomotor responses and flights between flowers when feeding, are not sexually dimorphic. We have amended the Introduction and Discussion to make the difference between flight behaviors more clear. Please see lines 77 and 305 onwards.

      More specifically, the authors focused on two types of descending neurons that receive inputs from well-characterized wide-field sensitive tangential cells: OFS DN1, which receives inputs from so-called HS cells, and OFS DN2, which receives input from a set of VS cells. Their likely counterparts in Drosophila connect to the neck, wing, and haltere neuropils. The authors characterized the visual response properties of these two neuronal classes in both male and female hoverflies and identified several interesting differences. They then presented the same set of stimuli, tracked wing beat amplitude, and analyzed the sum and the difference of right and left wing beat amplitude as a readout of lift or thrust, and yaw turning, respectively. Behavioral responses showed little to no sexual dimorphism, despite the observed neuronal differences.

      Thank you for this very nice summary of our work. We want to clarify that LPTC input to DN1 and DN2 has not been shown directly in hoverflies using e.g. dye coupling, or dual recordings. Instead, the presumed HS and VS input is inferred from morphological and physiological DN evidence, and comparisons to similar data in Drosophila and blowflies. We have amended the Introduction to clarify this. Please see line 64 onwards. The rest of the paragraph perfectly recapitulates the main findings of our paper.

      Strengths:

      I find the question very interesting and the results both convincing and intriguing. A fundamental goal in neuroscience is to link neuronal responses and behavior. The current study highlights that the transformations - even at the level of descending neurons to motoneurons - are complex and less straightforward than one might expect.

      Thank you.

      Weaknesses:

      The authors investigated two types of descending neurons, but it was not clear to me how many other descending neurons are thought to be involved in wing steering responses to wide-field motion. I would suggest providing a more in-depth overview of what is known about hoverflies and Drosophila, since the conclusions drawn from the study would be different if these two types were the only descending neurons involved, as opposed to representing a subset of the neurons conveying visual information to the wing neuropil.

      This is a great point. There are around 1000 fly descending neurons identified in Drosophila, of which many could respond to widefield motion, without being specifically tuned to widefield motion. In Drosophila, at least 35 descending neuron types receive input in the part of the brain where the LPTC outputs are located, and at least 29 descending neuron types project to the wing motor neuropil. Thus, it is more than likely that other neurons project visual widefield motion information to the wing neuropil. Furthermore, we only measured wing beat amplitude (WBA) as seen in the horizontal plane, as we were filming from above. As such, other wing angle changes and rotations are not quantified. We have amended our Introduction (see line 53 onwards) and Discussion (see line 320 onwards) to address these important points.

      Both neuronal classes have counterparts in Drosophila that also innervate neck motor regions. The authors filled the hoverfly DNs in intracellular recordings to characterize their arborization in the ventral nerve cord. In my opinion, these anatomical data could be further exploited and discussed a bit more: is the innervation in hoverflies also consistent with connecting to the neck and haltere motor regions? Are there any obvious differences and similarities to the Drosophila neurons mentioned by the authors? If the arborization also supports a role in neck movements, the authors could discuss whether they would expect any sexual dimorphism in head movements.

      These are all great points. We did not see any clear arborizations to the frontal nerve (FN), where we would expect to find the neck motor neurons (NMNs). In addition, while we did see fine arborizations throughout the length of the thoracic ganglion, we saw no strong outputs projecting directly to the haltere nerve (HN). In the revised version of the MS we have modified figure 4 (morphological characterization) to show a magnification of the thoracic ganglion to clarify this.

      There are important differences between the morphology of DN1 and DN2 in hoverflies and DNHS1 and DNOVS2 in Drosophila, in terms of their projections in the thoracic ganglion. For example, in Drosophila DNOVS2, there are several fine branches along the length of the neuron in the thoracic ganglia. Similarly, we found fine branches in Eristalis tenax DN2; in addition, however, we found a wide branch projecting to the area of the thoracic ganglion where the prothoracic and pterothoracic nerves likely get their inputs, which we also found in Eristalis tenax OFS DN1 (Figure 4). This suggests that both neurons could contribute to controlling the wings and/or the forelegs (which is why we quantified the WBA). In Drosophila DNOVS1, there is a similar wide branch to the prothoracic and pterothoracic nerves. Furthermore, while Drosophila DNHS1 and DNOVS2 have different morphology, DN1 and DN2 in Eristalis looked similar. We have modified the Results section to make this clear, see line 193 onwards.

      In addition, our revised version of the MS includes analysis of the movement of different body parts (the head angle, fore- and hindleg extension) to investigate this further, and to look for sexual dimorphism. Unfortunately, however, this did not include the halteres, as they cannot be seen well in the videos. The new data can be seen in Figure 7.

      Reviewer #2 (Public review):

      Summary:

      Many fly species exhibit male-specific visual behaviors during courtship, while little is known about the circuit underlying the dimorphic visuomotor transformations. Nicholas et al focus on two types of visual descending neurons (DNs) in hoverflies, a species in which only males exhibit high-speed pursuit of conspecifics. They combined electrophysiology and behavior analysis to identify these DNs and characterize their response to a variety of visual stimuli in both male and female flies. The results show that the neurons in both sexes have similar receptive fields but exhibit speed-dependent dimorphic responses to different optic flow stimuli.

      This statement perfectly recapitulates the main findings of our paper. As mentioned above, while hoverfly flight behaviors related to courtship and mating are strongly sexually dimorphic, other flight behaviours, such as those related to optomotor responses and flights between flowers when feeding, are not. We have amended the Introduction and Discussion to make the difference between flight behaviors more clear. Please see lines 77 and 305 onwards.

      Strengths:

      Hoverflies, though not a common model system, show very interesting dimorphic behaviors and provide a unique and valuable entry point to explore the brain organization behind sexual dimorphism. The findings here are not only interesting on their own right but will also likely inspire those working in other systems, particularly Drosophila.

      Thank you.

      The authors employed rigorous morphology, electrophysiology, and behavior methods to deliver a comprehensive characterization of the neurons in question. The precision of the measurements allowed for identifying a subtle and nuanced neuronal dimorphism and set a standard for future work in this area.

      Thank you.

      Weaknesses:

      Cell-typing using receptive field preferred directions (RFPDs): if I understood correctly, this classification method mostly relies on the LPDs near the center of the receptive field (median within the contour in Fig.1). I have two concerns here. First, this method is great if we are certain there are only two types of visual DNs as described in the manuscript. But how certain is this? Given the importance of vision in flight control, I would expect many DNs that transmit optic flow information to the motor center. I'd also like to point out that there are other lobula plate tangential cells (LPTCs) than HS and VS cells, which are much less studied and could potentially contribute to dimorphic behaviors.

      This is very true, and important. As mentioned above, in Drosophila there are 35 descending neuron types with inputs on the dorsal surface of the brain (labelled DNp1-35), suggesting that they could receive input from LPTCs. However, only 3 of these have been shown physiologically and morphologically to receive LPTC input, in blowflies and Drosophila (DNHS1, DNOVS1, DNOVS2). Note that in both blowflies and fruitflies DNOVS1 gives graded responses, and no action potentials, meaning that we would not be able to record from it using extracellular electrophysiology.

      We previously used clustering techniques to show that in Eristalis, we can reliably distinguish two types of optic flow sensitive DNs from extracellular electrophysiological data, based on a range of receptive field parameters, and we think that these correspond to DNHS1 and DNOVS2 in Drosophila (Nicholas et al, J Comp Physiol A, 2020, cited in paper). As mentioned above in response to Reviewer 1, this does not mean that there are no other neurons that could respond to widefield optic flow, and which might be involved in the WBA we recorded in the paper. However, the point of this paper was not to conclusively show that there are only two optic flow sensitive descending neurons. The point was to say that there are two quite distinct optic flow sensitive neurons that have similar receptive fields in males and females, while their velocity response functions differ between males and females.

      We have modified the Introduction (see lines 53 and 64 onwards) and Discussion to make these important points clear to the Reader, including a mention of the 45-60 LPTCs that exist in the lobula plate, and what their role might be.

      Second, this method feels somewhat impoverished given the richness of the data. The authors have nicely mapped out the directional tuning for almost the entire visual field. Instead of reducing this measurement to 2 values (center and direction), I was wondering if there is a better method to fully utilize the data at hand to get a better characterization of these DNs. As the authors are aware, local features alone can be ambiguous in characterizing optic flows. What's more, taking into account more global features can be useful for discovering potentially new cell types.

      This is a great point, and we did analyse other receptive field properties in this study (shown in previous supp fig 1). In addition, and as mentioned above, we have published a clustering analysis across receptive field properties of these neurons (Nicholas et al, J Comp Physiol A, 2020, cited in paper). The point that we attempted to make in this paper was that by using two strikingly simple metrics, we can reliably distinguish which of the two neuron types we are recording from simply based on azimuthal location and overall directional preference. This makes automated analysis very straightforward. Indeed, we now use this routinely to ID what neuron we are recording from computationally, rather than making a human-based assumption.

      However, we agree that this needs to be shown, and that further in-depth analysis was warranted. Therefore, we have provided additional receptive field analysis and clustering (see new supplementary figure 1) and associated text. We also want to highlight that all data is uploaded to Data Dryad for anyone interested in doing additional in-depth analyses.

      Line 131, it wasn't clear to me why full-screen stimuli were used for comparison here, instead of the full receptive field maps. Male flies exhibit sexual dimorphic behaviors only during courtship, which would suggest that small-sized visual stimuli (mimicking an intruder or female conspecific) would be better suited to elicit dimorphic neuronal responses. A similar comment applies to the later results as well. Based on the receptive field mapping in Figure 1, I'm under the impression that these 2 DN types are more suited to detect wide-field optic flows, those induced by self-motion as mentioned in the manuscript. The results are still very interesting, but it's good to make this point clear early on to help set appropriate expectations. Conversely, this would also suggest that there are other visual DN types that are responsible for the courtship-related sexually dimorphic behaviors.

      Thank you for mentioning these important points. Our reasoning for using full-screen stimuli for the analysis on line 131 was that since we used the small sinusoidal gratings for mapping the receptive fields, and to subsequently classify the neurons, it would be unfair to use the same data to investigate potential sexual dimorphism. I.e., we selected neurons that fulfilled certain criteria, and then we cannot rightfully use the same criteria to determine differences. This was not explicitly mentioned in the paper, so we have modified the text to make this clear to the Reader, see lines 142 onwards.

      However, in Supp Figure 2d/e we show that there are no striking receptive field differences between males and females in terms of receptive field center nor directional preference. In Supp Figure 2f we also show that there is no difference between male and female receptive field height and width. We have modified the text to draw the Reader’s attention to this figure, and also mention the additional analysis done in response to the comment above.

      As a side note, I personally expected at least DN1 to have a smaller receptive field in males, as the hoverfly HSN is strikingly sexually dimorphic (Nordström et al, Curr Biol 2008). However, while optic flow sensitive DNs do respond to small objects (see e.g. the J Comp Physiol paper mentioned above) we did not detect any obvious sexual dimorphism in receptive field properties. Indeed, we think that a different subset of DNs control parts of target pursuit behavior (target selective DNs (TSDNs)). This is now addressed in the modified version of the paper, see line 89-92.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) I think that the additional measurement of head turns in response to some of the stimuli that showed the strongest sexual dimorphism would be very interesting, but I fully acknowledge that this might be beyond the scope of the current paper or technically too challenging, requiring additional cameras and a whole new tracking software, etc.

      We have added an additional figure to the paper, with associated text, showing the response of the head, fore- and hindlegs to the same stimuli, as far as we could extract them with only one camera filming from above. The new data can be found in the new figure 7, and associated text.

      (2) Are the onset measurements for WBD comparable across flight manoeuvres, given that they are limited to a single projection plane?

      This is a great point, and we have now added this caveat in the text, see line 261-262.

      (3) Line 62 - typo: DNp15 not NDp15.

      Thank you, fixed.

      Reviewer #2 (Recommendations for the authors):

      (1) Related to a comment earlier, in the Introduction, it is mentioned that there are 3 optic flow-sensitive DNs in Drosophila and blowfly. However, I don't see convincing evidence for this in the cited references, none of which have exclusively surveyed all the DNs.

      We have revised this to say that 3 neurons have been identified morphologically and physiologically, but that does not mean that there are no others. Please see line 60 onwards.

      (2) Line 142 and Supplementary Figure 3, this is stated in the next section, but I think it's better to make it clear that DN2 in females has a higher spontaneous rate before mentioning the starfield. Please also specify if the stationary starfield affects the DN2 rate at all in the female flies.

      Great points. We now describe the spontaneous rate before mentioning the responses to moving starfield stimuli, and highlight that there is no difference between no stimulus (pre-stimulation) and a stationary stimulus. Please see lines 155 onwards.

      (3) Line 34, 'redress' should be 'to address'.

      Thank you, fixed.

      (4) Line 59, a bit unclear to me what this sentence is trying to say. Also, I wouldn't say LPTCs are 'indirect' in the sensorimotor transformation -- it's a necessary link in this pathway, no?

      That was indeed a strange sentence. We have simplified it to the following: “LPTCs project to the inferior posterior slope[6], where they synapse with descending neurons[7,8]. In Drosophila at least 35 descending neuron types have their inputs in the posterior surface of the brain (named DNp1-35) [9].”

      (5) Figures:

      This is a formatting problem. The figure legends are separated from the figures, and there are no titles on the figures to indicate which one is which.

      We are sorry about this. We have added labels to the figures.

      Figure 1: What kind of geographic projections are these? The azimuth axis is not labeled.

      These stimuli were not perspective corrected, and therefore the RF maps simply reflect the visual monitor. We have clarified this in the figure legend, including mentioning that the axis label is the same for elevation and azimuth.

      Figure 2a: The error bars are not aligned to the angular axis.

      These have now been aligned.

      Supplement Figure 2b: I'm not sure why there are two measurements at each stimulus orientation. The bottom panel is confusing -- what do you mean by 'receptive field location'? And what does this red arrow/line mean in the bottom panel?

      Thank you for pointing this out. The figure was supposed to help the reader understand our transformations, so it’s great to know that it needed further explanation. To address this, we have added extra text and panel labels, please see lines 520 onwards.

      (6) Methods:

      Line 356: Maybe a picture or schematic drawing would be helpful to explain the setup. For instance, it's unclear what 32 degrees here refers to.

      This is a great suggestion, and a pictogram explaining the set-up can now be seen in Supplementary Fig. 6b.

      Line 404: What does it mean that 'spatially interpolate 10 times'?

      This sentence has been changed to “After subtracting the spontaneous rate, calculated for 0.8 s preceding stimulus onset (dotted line, inset, Fig. 1b, e), we interpolated the resulting local maximum responses to a ten-fold higher spatial resolution (colour coding, Fig. 1a, d).”
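
      The two steps in the revised sentence (baseline subtraction, then ten-fold spatial interpolation) could look roughly like the sketch below. It rests on assumptions that the text does not confirm: the spontaneous rate is taken as the spike count in the 0.8 s before stimulus onset divided by 0.8 s, the interpolation is bilinear, and all function names and numbers are hypothetical.

```python
def spontaneous_rate(spike_times, onset, window=0.8):
    """Mean spike rate (spikes/s) in the `window` seconds before `onset`."""
    n = sum(1 for t in spike_times if onset - window <= t < onset)
    return n / window

def subtract_baseline(responses, baseline):
    """Subtract the spontaneous rate from every local response."""
    return [[r - baseline for r in row] for row in responses]

def upsample_1d(values, factor):
    """Linearly interpolate so each interval is split into `factor` steps."""
    out = []
    for i in range(len(values) - 1):
        for k in range(factor):
            t = k / factor
            out.append(values[i] * (1 - t) + values[i + 1] * t)
    out.append(values[-1])
    return out

def upsample_2d(grid, factor):
    """Bilinear upsampling: interpolate along rows, then along columns."""
    rows = [upsample_1d(row, factor) for row in grid]
    cols = [upsample_1d(list(col), factor) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]

# Hypothetical example: 2 x 3 grid of local maximum responses (spikes/s).
spikes = [0.1, 0.3, 0.5, 0.7, 1.2]           # made-up spike times (s)
baseline = spontaneous_rate(spikes, onset=0.8)
raw = [[15.0, 35.0, 25.0],
       [5.0, 45.0, 5.0]]
coarse = subtract_baseline(raw, baseline)
fine = upsample_2d(coarse, factor=10)
print(len(fine), "x", len(fine[0]), "interpolated map")
```

      In this sketch a grid of n sample points becomes 10(n - 1) + 1 points along each axis, and the original sample values are preserved at the coarse grid positions. Other interpolation schemes (for example, spline-based) would satisfy the same description.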

      Line 405: How to determine the center from the 50% contour?

      We have modified the Methods to explain how this was done, please see lines 478 onwards.

      Line 408: Please explain more explicitly how LPD and LMS are computed.

      We have modified the Methods to explain how this was done, please see lines 488 onwards.

      Line 418: Is reference 42 correct? I could be wrong, but this reference seems to be talking about target-selective DNs rather than optic flow-sensitive DNs?

      Yes, this reference is correct. In a supp figure to ref 42, we show data from optic flow sensitive neurons, but not their receptive fields. Thanks for checking.

      Line 426: Are the full-screen stimuli presented in 8 directions too? Do I understand correctly that the preferred direction vector for the full-screen stimuli is extracted from a cosine fit, which is slightly different from the 'receptive field preferred direction' in the receptive field mapping measurement, which is the median of all the 'local preferred direction' (which are from the cosine fit)?

      We have modified the text to make this clear, please see lines 519 onwards, as well as the receptive field analysis, please see lines 474 onwards.
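
      For readers unfamiliar with the cosine fit the reviewer mentions, the following is a minimal sketch, assuming responses are sampled at equally spaced motion directions (8 in the reviewer's question). Under that assumption the least-squares fit of r(θ) = offset + amplitude · cos(θ − θ₀) has a closed form given by the first Fourier component. The function name and the synthetic tuning curve are illustrative only and do not come from the manuscript.

```python
import math

def cosine_fit(directions_deg, responses):
    """Least-squares fit of offset + amplitude * cos(theta - preferred).

    Assumes the directions are equally spaced around the full circle, in
    which case the fit reduces to the first Fourier component.
    Returns (preferred_direction_deg, amplitude, offset).
    """
    n = len(responses)
    c = sum(r * math.cos(math.radians(d)) for d, r in zip(directions_deg, responses))
    s = sum(r * math.sin(math.radians(d)) for d, r in zip(directions_deg, responses))
    preferred = math.degrees(math.atan2(s, c)) % 360.0
    amplitude = 2.0 * math.hypot(s, c) / n
    offset = sum(responses) / n
    return preferred, amplitude, offset

# Synthetic tuning curve (made-up numbers): preferred direction 60 degrees,
# amplitude 20 spikes/s, offset 30 spikes/s, sampled at 8 directions.
directions = [0, 45, 90, 135, 180, 225, 270, 315]
rates = [30.0 + 20.0 * math.cos(math.radians(d - 60.0)) for d in directions]

pd_deg, amp, off = cosine_fit(directions, rates)
print(f"preferred direction = {pd_deg:.1f} deg, amplitude = {amp:.1f}, offset = {off:.1f}")
```

      The same fit can be applied at each receptive field location to obtain a local preferred direction. Combining the local fits into a single receptive field preferred direction is not shown here, because the handling of angular wraparound when taking the median is not specified in this response.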

    1. Author response:

      The following is the authors’ response to the previous reviews

      eLife Assessment

      This study provides valuable insight into the role of actin protrusions in mediating early pre-endocytic steps of human papillomavirus entry at the cell surface. Using state-of-the-art microscopy in an immortalized keratinocyte model, the authors present mostly solid evidence that filopodia actively promote the transfer of heparan sulfate-coated virions from the extracellular matrix to the viral entry factor CD151. Remaining gaps in the mechanistic model could be further supported by including a more expansive analysis of the fixed microscopy samples and live cell imaging to distinguish virion transfer from direct binding.

      We thank the editorial team for the improved eLife assessment. Regarding the remaining gap, we agree that it is not clear why the large majority of the virions are indeed transferred rather than binding directly.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors' goal was to arrest PsV capsids on the extracellular matrix using cytochalasin D. The cohort was then released and interaction with the cell surface, specifically with CD151 was assessed.

      The model that fragmented HS associated with released virions mediates the dominant mechanism of infectious entry has only been suggested by research from a single laboratory and has not been verified in the 10+ years since publication. The authors are basing this study on the assumption that this model is correct, and these data are referred to repeatedly as the accepted model despite much evidence to the contrary. The discussion in lines 65-71 concerning virion and HSPG affinity changes is greatly simplified. The structural changes in the capsid induced by HS interaction and the role of this priming for KLK8 and furin cleavage have been well researched. Multiple laboratories have independently documented this. If this study aims to verify the shedding model, additional data needs to be provided.

      Comment of the authors: the above paragraph is copied from the very first review and describes the situation before revision.

      Note on revisions:

      The authors did an excellent job in their revision to include data from the effect of proteolytic priming on their observed virion transfer to the cell body. All other minor issues were addressed adequately.

      We are grateful that the referee acknowledges that we addressed all issues adequately.

      The work could be especially critical to understanding the process of in vivo infection. 

      We agree, and would like to point out that a similar comment was raised by the reviewing editor assigned to our original submission, John Schiller. For unknown reasons, he was no longer involved in the evaluation of the revision.

      Reviewer #2 (Public review):

      The study design involves infecting HaCaT cells (immortalised keratinocytes mimicking basal cells of a target tissue) and observing virus localization with and without actin polymerization inhibition by cytochalasin D (cytoD) to analyze virion transfer from the ECM to the cell via filopodial structures, using cellular proteins as markers.

      In the context of the model system, the authors stress in the revised version the importance of using HaCaT cells as a relevant 'polarized' cell model for infection. The term 'polarized' is used in the cell biological literature for epithelial cells to describe a strict apical vs. basolateral demarcation of the plasma membrane with an established diffusion barrier of the tight junction. However, HaCaT cells do not form tight junctions. In squamous epithelia, such barriers are only found in granular layers of the epithelium. The published work cited in support of their claims either does not refer to polarity or only in the context of other cells such as CaCo-2 cells.

      We thank the reviewer for this important clarification and fully agree. HaCaT cells do not form tight junctions and therefore do not fulfill the classical definition of polarized epithelial cells with a strict apical-basolateral diffusion barrier. In response to this comment, we have removed the term “polarized” in reference to HaCaT cells throughout the revised manuscript. Our intention was not to imply classical epithelial polarity, but rather to emphasize that HaCaT cells represent a functionally relevant keratinocyte model that recapitulates key early steps of HPV infection observed in vivo, particularly abundant ECM deposition enabling strong virion binding to the ECM.

      We now state on line 120: “PsVs that bind to the ECM at sites distal from the cell body are unable to establish direct contact with entry receptors, until the cell migrates onto them or they are transported along cell protrusions towards the cell body (Schelhaas et al., 2008; Smith et al., 2008). Both cell migration and protrusion transport depend on actin dynamics (Schaks et al., 2019). We aimed for blocking these active recruitment mechanisms in HaCaT cells, a cell line that is widely used as a cell culture model for HPV infection. They resemble primary keratinocytes in several key aspects: they are not virally transformed and produce large amounts of ECM, promoting interactions between viruses and ECM components and thereby facilitating infection (Bienkowska-Haba et al., 2018; Gilson et al., 2020). In addition, subconfluent HaCaT cells form filopodia and filopodial transport is used for the recruitment of ECM-bound virus particles to the cell body (Schelhaas et al., 2008, Smith et al., 2008). Together, these features make HaCaT cells a suitable model for studying active PsV recruitment from the ECM to the cell surface.”

      Overall, the matter of polarity would be important, if indeed the virus could only access cell-associated HSPGs as primary binding receptor, or the elusive secondary receptor via the ECM in the used model system (HaCaT cells), if they would locate exclusively basolaterally.

      We apologize for not having stressed enough that virions also bind directly to the upper cell membrane, which is not imaged. To make clear that HaCaT cells are still a suitable model for studying active recruitment, we worked on the following issues throughout the manuscript (this is an outline; for details see below):

      (1) We now discuss adequately that virions reach cell surface receptors either by passive diffusion or by active transport mechanisms, the latter involving actin dynamics (filopodial transport and cell migration), to which we refer in the revised manuscript as active recruitment.

      (2) We explain why the large majority of virions in the microscopic assay are actively recruited virions.

      (3) We explain the difference between biochemical infection assays, which do not differentiate between passive and active recruitment, and microscopic assays, which study the basal cell membrane and thereby primarily actively recruited virions.

      This is at least not the case for binding, as observed in several previous publications (just two examples: Becker et al, 2018, Smith et al., 2008). With only a rather weak attempt at experimental verification of their model system with regards to polarity of binding, the authors then go on to base their conclusions on this unverified assumption.

      We agree with the reviewer that strict epithelial polarity would only be relevant if HPV binding or receptor accessibility were confined to the basolateral membrane, which is not the case in HaCaT cells, as shown previously (e.g., Becker et al., 2018; Smith et al., 2008). However, our conclusions do not rely on strictly polarity-dependent binding.

      We added the following paragraphs clarifying that (i) in HaCaT cells PsVs also bind by passive diffusion to the upper cell membrane and that (ii) at the basal membrane the large majority of imaged PsVs is actively recruited.

      Line 332: “…, the lower PCC at 0 min/CytD suggests that without active recruitment less PsVs reach CD151. At 30 min after CytD, the PCC has reached the level of 0.1 as in the control, which is in line with the idea of fast recruitment as observed in Figure 4. To follow how the basal cell membrane is populated with PsVs over time, as additional analysis we determined the PsVs per µm<sup>2</sup> in ROIs placed in the cell body region. At 0 min, CytD reduces the PsV density to 19 - 33%, albeit the effect is not significant, and at 180 min/CytD the same PsV density as in the control is reached (Supplementary Figure 6A and B). Overall, under CytD there was a trend towards less PsVs present (Supplementary Figure 6A and B). Hence, both Figure 5C and Supplementary Figure 6A and B suggest that active virion transport is required to reach efficiently the basal membrane.”

      Line 447: “Throughout all experiments, we observe at 0 min/CytD only few PsVs at the basal membrane (Figure 1A, Supplementary Figure 6A and B; see also PCC at 0 min between PsVs and CD151 in Figure 5C), suggesting that in the absence of active recruitment the access to the basal membrane via passive diffusion is limited. We wondered, how many PsVs may bind to the cell membrane without a diffusion barrier? For this reason, we incubated EDTA-detached HaCaT cells in suspension with PsVs for 1 h at 4 °C, followed by re-attachment for 1 h. Under these conditions, we find, despite a shorter incubation time (1 h versus 5 h), a roughly 3-fold larger PsV density (1.7 PsVs/µm<sup>2</sup> (Supplementary Figure 6D)) than the highest density observed in the other experiments. However, it should be noted that values of the different experiments cannot be directly compared. Aside from the different treatments, another difference lies in the size of the imaged membrane. The re-attachment of cells is not complete after 1 h (compare size of adhered membranes in Supplementary Figure 6A and 1A), wherefore the membranes are likely strongly ruffled, which results in the underestimation of the membrane area. As a result, we overestimate the PsVs per µm<sup>2</sup> adhered membrane (please note that we cannot re-attach cells for longer times as we then lose PsVs due to endocytosis). In any case, the experiment suggests that PsVs bind more efficiently to membrane surface receptors without a diffusion barrier. We conclude that in our assay PsVs cannot readily bypass the active PsV recruitment by diffusing directly to the basal cell membrane, which is plausible, because to make this happen a 55 nm large PsV must diffuse through the narrow gap between glass-coverslip and adhered cell.”

      Line 538: “The analyzed PsVs hardly bind to the basal cell surface directly by diffusion (Supplementary Figure 6, compare PsV maxima density at 0 min/CytD in A and B to C). Therefore, the actin-driven virion transport would play a decisive role in HPV infection if cells would form a monolayer with a disruption at which ECM is present and that is approached by PsVs, a scenario similar to in vivo infection. In addition, cell migration could establish contact between PsVs and the cell surface.”

      Line 548: “…that can readily bind to the upper cell membrane. We are not aware of a PsV translocation mechanism from the upper to the basal membrane. Therefore, in our assay, PsVs bound to the upper membrane are not expected to show up at the basal membrane. Comparing 0 min of control and CytD (Supplementary Figure 6A and B), we find that compared to the control 19 - 33% of the PsVs reach the basal membrane in the absence of active transport, or in other words, most likely by passive diffusion. Actually, the range from 19 – 33% must be a strong overestimate as PsVs in the control are in transit and many actively recruited PsVs are already internalized during the 5 h incubation period. For this reason, we propose that most likely much less than 10% of the PsVs reach the basal membrane by diffusion. Moreover, in the absence of the diffusion barrier, the density of bound PsVs is strongly increased (Supplementary Figure 6D), showing indirectly that at the basal membrane the binding sites are difficult to access without active recruitment. Taken together, we propose the large majority of PsVs analyzed in our assay are ECM bound and actively recruited to the basal cell membrane.”

      This is one example of several in the manuscript, where claims for foundational premises, observations, and/or conclusions remain undocumented or not supported by experimental data.

      Another such example is the assumption of transfer of the virus from ECM to the tetraspanin CD151. Here, the conclusions are based on the poorly documented inability of the virus to bind to the cell body, which is in stark contrast to several previous publications, and raises questions.

      We hope with the above changes we made clear that virions can also directly bind to the cell body. We also added a paragraph discussing differences between biochemical and microscopic assays.

      Line 568: “In this scenario, sub-confluent HaCaT cells, or even better single HaCaT cells, would be an ideal model system for the microscopic study of these very early infection steps that involve ECM attachment and subsequent active recruitment, as supposed to occur during in vivo infection of basal keratinocytes after binding of virions to the basement membrane (Bienkowska-Haba et al., 2018; Day and Schelhaas, 2014; Kines et al., 2009; Schiller et al., 2010). In contrast, in biochemical infection assays, virions diffusing to HSPGs on the cell surface, and by this bypassing active recruitment, are assayed together with the actively recruited virions. Should cells secrete little ECM and are grown to confluency, the passively binding virions are supposed to strongly dominate the infection rate in a biochemical infection assay.”

      There are a number of important additional issues with the manuscript:

      First, none of the inhibitors have been tested in their system for efficacy and specificity, but rely on published work in other cell types. This considerably weakens the confidence on the conclusion drawn by the authors.

      We use the inhibitors CytD, blebbistatin, leupeptin and furin inhibitor I. The references below are examples reporting the use of these inhibitors on HaCaT cells in the context of HPV infection.

      Furin inhibitor I:

      Cruz et al., Cleavage of the HPV16 Minor Capsid Protein L2 during Virion Morphogenesis Ablates the Requirement for Cellular Furin during De Novo Infection. Viruses, 2015; doi.org/10.3390/v7112910

      Cytochalasin D/Blebbistatin:

      Schelhaas et al., Human papillomavirus type 16 entry: retrograde cell surface transport along actin-rich protrusions. PLoS Pathog., 2008; doi.org/10.1371/journal.ppat.1000148

      Smith et al., Virus activated filopodia promote human papillomavirus type 31 uptake from the extracellular matrix. Virology, 2009; doi.org/10.1016/j.virol.2008.08.040

      Leupeptin/Furin inhibitor I:

      Cerqueira et al., Kallikrein-8 Proteolytically Processes Human Papillomaviruses in the Extracellular Space To Facilitate Entry into Host Cells. J. Virology, 2015; doi.org/10.1128/jvi.00234-15

      Moreover, the reversible inhibitory effect of CytD, the key inhibitor used in this study, on transport and infection is validated in this study. However, we now discuss these data more critically in the context of directly binding virions.

      Line 485: “Hence, the infection assay suggests that the treatment is largely reversible and only slightly harmful, if at all. However, the luciferase infection assay does not distinguish between actively recruited PsVs and PsVs that bind passively by diffusion to the upper membrane. The latter fraction likely dominates the total infection rate and should be less affected by CytD than the fraction of actively recruited PsVs. Therefore, if the infection pathway of a small fraction of actively recruited PsVs is irreversibly inhibited, we may not be able to detect this effect on the background of unaffected passively binding PsV.”

      Second, the authors aim to study transfer from ECM to the cell body and effects thereof. However, there are still substantial amounts of viruses that bind to the cell body compared to ECM-bound viruses in close vicinity to the cells.

      Regarding direct binding to the cell body, please see our detailed reply above.

      This is in part obscured by the small subcellular regions of interest that are imaged by STED microscopy, or by the use of plasma membrane sheets. This remains an issue despite the added Suppl. Fig. 1, where also only subcellular regions are being displayed. As a consequence, the obtained data from time point experiments is skewed, and remains for the most part unconvincing, largely because the origin of virions in time and space cannot be taken into account. This is particularly important when interpreting the association with HS, the tetraspanin CD151, and integrin alpha 6, as the low degree of association could be originating from cell bound and ECM-transferred virions alike.

      We hope with the above explanations it is plausible that the imaged virions primarily reach the basal membrane by active recruitment.

      Third, the use of fixed images in a time course series also does not allow one to assess a potential contribution of cell membrane retraction upon cytoD treatment due to destabilisation of cortical actin, or of cell spreading upon cytoD washout. The microscopic analysis uses an extension of a plasma membrane stain as a marker for ECM-bound virions; this may introduce a bias and skew the analysis.

      The referee is correct in pointing out that cell spreading after CytD wash off would affect our analysis, e.g. by increasing the overlap between PsVs and the cell body although no active recruitment via filopodial transport and cell migration occurs. An argument speaking against this possibility is the lack of increase in the PCC between PsVs and F-actin after CytD removal, if the protease inhibitor leupeptin was present (Figure 2B and D). Leupeptin prevents PsV/phalloidin overlap despite restored actin polymerization after washout of both inhibitors, suggesting that priming is required for increased PsV–actin association and is too slow to change PCC within 60 min. These results support that the observed overlap reflects active, priming-dependent recruitment rather than cell morphology changes.

      We state on line 252: “Moreover, the experiment suggests that without PsV priming the PCC between PsV-L1 and F-actin does not increase, for instance, due to cell spreading after CytD removal.”

      On line 494, we state “However, we assume that this is rather unlikely, as cell spreading would increase the PCC between PsVs and F-actin under a condition where PsVs are not-primed (and therefore not actively recruited) but cell spreading occurs, which is not the case in Figure 2B and D (CytD/leupeptin).”
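
      The PCC argument above can be illustrated with a minimal sketch. It assumes the PCC is the ordinary Pearson correlation computed pixel-wise between two equally sized intensity images, and it uses a horizontally flipped copy of one channel as a randomized reference. The function names and the toy 4 x 4 images are hypothetical and are not taken from the authors' analysis pipeline.

```python
from math import sqrt


def pearson(img_a, img_b):
    """Pixel-wise Pearson correlation coefficient of two 2D intensity images."""
    xs = [v for row in img_a for v in row]
    ys = [v for row in img_b for v in row]
    if len(xs) != len(ys):
        raise ValueError("images must contain the same number of pixels")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("PCC is undefined for a constant image")
    return cov / sqrt(var_x * var_y)


def flip_horizontal(img):
    """Mirror an image left-to-right to obtain a randomized reference."""
    return [row[::-1] for row in img]


# Hypothetical toy data: both channels are bright on the right-hand side.
psv = [[0, 0, 5, 9], [0, 1, 6, 8], [0, 0, 4, 7], [0, 0, 3, 9]]
actin = [[1, 0, 4, 8], [0, 2, 5, 9], [1, 0, 5, 6], [0, 1, 2, 8]]

pcc_original = pearson(psv, actin)
pcc_flipped = pearson(psv, flip_horizontal(actin))
print(f"original: {pcc_original:.2f}, flipped: {pcc_flipped:.2f}")
```

      In this toy case the flipped image gives a much lower PCC than the original, because the overlap of the two signals is destroyed. On real images with a high object density, the flipped value can remain substantial, which is the concern raised in the next comment.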

      Fourth, while the use of randomisation during image analysis is highly recommended to establish significance (flipping), it should be done using only ROIs that have a similar density of objects for which correlations are being established. For instance, if one flips an image with half of the image showing the cell body, and half of the image ECM, it is clear that association with cell membrane structures will only be significant in the original. But given the high density of objects on the plasma membrane, I am not convinced that doing the same by flipping only the plasma membrane will not also yield numbers similar to the original.

      Regarding the association of PsVs with CD151 and HS, we corrected for random background with reference to a calibration line that describes the random background association in dependence of the density of objects. We now refer to this issue on line 343: “…, the fraction of PsVs closely associated with CD151 is around 10% (Figure 5D, control), after correction for random background association, for which we used a calibration line based on the same density of PsVs in flipped images (see Supplementary Figure 7).”

      In the legend of Supplementary Figure 7 we state: “…The fraction of closely associated PsVs (PsV-L1 maxima with a distance ≤ 80 nm to the next nearest CD151 maximum) in the Control of Figure 5 was analyzed on original and flipped images (for an example of a flipped image see Supplementary Figure 5A)…on flipped images, we often find values more than half of the values of the original images, demonstrating that many PsVs have a distance ≤ 80 nm to CD151 merely by chance, in the following referred to as background association…We take the altogether 24 fraction values obtained on flipped images (12 values from Control and CytD each), and plot the fraction of closely associated PsVs against the average CD151 maxima density in the respective images. As can be seen in (C), the fraction increases with the maxima density, as the chance of a distance ≤ 80 nm increases with the maxima density. The fitted linear regression line describes how the background association depends on the maxima density. As a result, the background association (y) can be calculated for any maxima density (x) with the equation y = 2.04 • x. The CytD/0 min condition may be overcorrected, if it includes many images with CD151 flipped onto peripheral PsVs that actually are distal to CD151 (for an example ROI see Supplementary Figure 5A). On the other hand, PsVs right at the cell border, where CD151 staining tends to be strong (Supplementary Figure 5A), after flipping have less CD151 than before, contributing to undercorrection.”

      When omitting the CytD/0 min values, we obtain essentially the same calibration line.
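
      The density-dependent background correction described above can be sketched as follows. The sketch assumes a least-squares line through the origin, which matches the intercept-free form y = 2.04 • x but is not confirmed as the fitting procedure actually used. All numbers, the implied units (fraction in percent, density in maxima per µm<sup>2</sup>), and the function names are hypothetical illustrations, not data from the study.

```python
def fit_through_origin(densities, fractions):
    """Least-squares slope of a line y = slope * x forced through the origin."""
    sum_xx = sum(x * x for x in densities)
    if sum_xx == 0:
        raise ValueError("at least one non-zero density is required")
    return sum(x * y for x, y in zip(densities, fractions)) / sum_xx


def corrected_fraction(measured_fraction, maxima_density, slope):
    """Subtract the chance association expected at the given maxima density."""
    return measured_fraction - slope * maxima_density


# Hypothetical calibration data from flipped images:
# x = CD151 maxima density, y = fraction of PsVs within 80 nm of CD151 (%).
flipped_density = [2.0, 4.0, 6.0, 8.0]
flipped_fraction = [4.0, 8.5, 12.0, 16.5]

slope = fit_through_origin(flipped_density, flipped_fraction)

# Hypothetical original image: 20 % closely associated PsVs at density 5.0.
corrected = corrected_fraction(20.0, 5.0, slope)
print(f"slope: {slope:.2f}, corrected fraction: {corrected:.2f} %")
```

      Because the expected background scales with the maxima density, two images with the same measured fraction but different CD151 densities receive different corrections, which is the point of using a calibration line rather than a single flipped-image value.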

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      There are further issues that are not pertaining to the study design that I find important.

      Fig.1

      There are few, if any, filopodia in untreated cells. It would be good to quantify their abundance to substantiate that resting HaCaT cells are indeed a good model for filopodial transport vs. membrane retraction / spreading.

      We see filopodia in untreated HaCaT cells (although quite variable in abundance, please see control cells in e.g. Figure 3 and 8 and Supplementary Figure 2).

      In HaCaT ECM, a good part of the virus also binds to laminin-332. Would this not also confound the analysis?

      We agree with the reviewer that in HaCaT-derived ECM, virus binding is not restricted to heparan sulfate (HS), and that laminin-332 represents an additional relevant binding partner. Indeed, viruses bound to laminin-332 may likewise be transported toward the cell body via laminin-binding integrins. We therefore consider laminin-332 to act as a parallel attachment factor alongside HS rather than as a mutually exclusive alternative.

      However, the primary aim of this study was not to comprehensively map all ECM binding partners, but to analyze the actin-dependent transport of ECM-bound virus particles. HS was chosen as a representative and well-characterized ECM marker for initial virus attachment. Importantly, inhibition of actin dynamics by cytochalasin D blocks this transport process downstream of initial binding. Thus, irrespective of whether the virus is initially bound to HS, laminin-332, or both, the readout reflects interference with the same actin-dependent transport mechanism.

      Consequently, the presence of laminin-332 binding does not confound our analysis, as the experimental outcome is determined by inhibition of transport rather than by the specific ECM attachment factor. Nonetheless, we acknowledge laminin-332 as an important parallel interaction partner. We had already mentioned it in the first version of the manuscript but removed the sentence during the last revision; it has now been added again. On line 593 we state: “Finally, not all PsVs bound to the ECM are expected to bind to HS but could also bind to laminin 332 (Culp et al., 2006).”

      Fig.2

      Would benefit from live cell analysis. There are considerable amounts of virions on the cell body, which partially contradicts statements from Fig. 1. The fast transfer to the cell body after cyto D washout is based on the assumption that filopodia formation and transport along them (and not membrane extension) occurs quickly. Is this reasonable? Does membrane extension and migration occur between 0 min and later time points?

      Regarding membrane extension after CytD removal, that in the analysis may be indistinguishable from active recruitment transfer, please see our reply above (no PCC increase between PsV-L1 and F-actin after CytD removal if leupeptin is employed). Regarding migration, we now included this possibility as an active recruitment mechanism that may occur in parallel to filopodial transport (please see our reply above).

      Fig.4

      How are the subcellular ROIs chosen? Is there not a bias by not studying a full cell?

      In Figure 4 we are specifically interested in the time course of PsV diminishment from the cell periphery. The ROIs are generated with reference to the membrane staining, using the cell body delineation as a starting point. For details about how ROIs are generated, please see legend of Figure 4 and materials and methods.

      Fig. 5/6

      The data needs a better analysis on correlation by using randomisation as explained above.

      Please see our reply above. The association between PsVs and CD151 or HS has been corrected using a calibration line based on the same density of objects.

      Fig. 8. Why does blebbistatin block the transport only partially? Previous work on actin retrograde flow suggests that in the absence of myosin II function the transport stops completely. Would this not be a concern when interpreting the cytoD data?

      Is the referee referring to Schelhaas et al., 2008 that we cite in the paper? In this paper, in HeLa cells blebbistatin reduced the directed particle motion by 82%, but not completely.

      Suppl. Fig. 1A, B: Intended to address the issue of viruses binding to the cell body, it unfortunately falls short. It would have been better to analyse complete cells rather than ROIs, or better even, a comprehensive analysis of cell islets (boundary cells vs. central cells, with cell body to cell periphery).

      This experiment addresses the increase in PsV density resulting from active recruitment. Outlining entire cells would include also PsVs close to the cell edge that have not been actively recruited.

      Regarding cell islets (we call them patches of confluent cells as islets may be confused with e.g. more structured Langerhans islets), there are hardly any PsVs at the basal membrane. We state on line 135: “Frequently, we observe patches of confluent cells which are common to HaCaT cells. Cells at the center of these patches are dismissed during imaging, because hardly any PsVs are bound to their basal membrane, indicating that PsVs do rather not reach this area by passive diffusion. Instead, we focus on isolated HaCaT cells or cells at the periphery of cell patches. At these cells, we find more PsVs per cell than one would expect from the employed ≈ 50 viral genome equivalents (vge) per cell, indicating that PsVs are unequally distributed between the cells.”

      Is the difference between untreated and cytoD treated significant?

      We stated in the Figure legend that the difference is not significant (the exact p value is p = 0.089). We now have revised the Figure (previously Supplementary Figure 1A and B, now Supplementary Figure 6A and B), showing the PsV density at the basal membrane over time, also for the experiment shown in Figure 6. The now revised Figure (Supplementary Figure 6A and B) is discussed together with the re-attachment experiment (Supplementary Figure 6C and D), in order to compare the PsV accessibility to the cell membrane with and without diffusion barrier. Please see our reply above (paragraph starting at line 447).

    1. Author response:

      We are particularly encouraged by the consensus that our study provides a substantial resource and that the bioinformatic framework is biologically grounded and convincing, while appropriately noting that further experimental validation will be required. We fully agree with this point. As clarified in the revised manuscript, the lineage relationships we describe are inferred from integrative transcriptomic analyses and are intended to provide a mechanistic and conceptual framework rather than definitive proof of cellular origin. We have further strengthened the Discussion to explicitly acknowledge these limitations and outline future directions, including lineage tracing and functional validation studies.

      At the same time, we respectfully note that such experimental validation would require a substantial extension of this work and likely 2–3 years of additional studies, including development of appropriate model systems. We believe these efforts represent an important next phase of investigation rather than a revision-level addition to the current manuscript. Our primary goal here is to present a high-resolution human transcriptomic resource and a coherent framework that identifies biologically plausible epithelial intermediates linking normal fallopian tube hierarchy to malignant states.

      Given the reviewers’ positive evaluation and recognition of the value and rigor of the dataset and analyses, we respectfully request consideration to proceed with publication as an eLife Version of Record without further experimental revision. We believe that the timely dissemination of these findings will provide a useful resource for the field and help guide the experimental studies needed to test the hypotheses generated here.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Weaknesses:

      In my view, the presentation of the data is in some cases not ideal. The phrasing of some conclusions (e.g., group-attacks and wolf-pack-hunting by the bacteria) is in my opinion too strong based on the herein provided data.

      We agree with your comment and have replaced the terms “Group-attacks” and “wolf-pack-hunting” by “attacks” throughout the manuscript.

      Reviewer #1 (Recommendations for the authors):

      (1) Figure 2AB, please add the name of the statistical test and the number of replicates that the data is based on to the figure legend.

      We thank Reviewer #1 for highlighting the need for more detail. We have revised the manuscript accordingly. The captions of figures 2, 3, 4 and S1 were revised to include the name of the statistical test and the number of replicates. Asterisks indicate significant differences in a multiple comparison test (one-way ANOVA with post hoc Tukey test), * P ≤ 0.05, ** P ≤ 0.01, *** P ≤ 0.001.

      (2) Figure 2C is this figure referred to in the text?

      We apologize for this oversight. Figure 2C was replaced by new figures 2C and 2D and the old figure 2C is now referenced in the manuscript as Fig 3B1.

      (3) Movie 1, could the movie please also be provided as .mp4? I suggest including individual images across time in the main figure so that readers do not rely on opening a supplementary file for this key finding of the study.

      In the revised manuscript, all the videos were converted to mp4 format and individual images across time were included in Figure 2C and 2D (Chronological snapshots of one attack) and in figure 3B1 (Chronological snapshots of the complete event), thereby improving the readability of the manuscript.

      (4) Figure 3A2 (text l. 355), I am afraid I do not find this figure.

      Fig. 3A2, which previously corresponded to Fig. 3B1, now corresponds to Fig. 2C and Fig. 2D. This has been corrected in the revised version of the manuscript.

      (5) Lines 356ff, I am afraid that I find it hard to follow what the authors refer to as the right cell or the left cell. I suggest either adding labels to the movies or providing individual images across multiple timepoints into the main figure that can be labelled and bring across the point.

      Arrows have been added to videos 3–5 to clearly indicate the cells referred to in the text and facilitate tracking across time.

      (6) In general, for all the microscopy, on how many cells have these phenomena been observed? What is n=x? Has this been quantified?

      We thank the reviewer for pointing this out.

      In the caption of Fig. 3, the sentence “(A) Percentage of motile A. pacificum ACT03. (B) A. pacificum ACT03 attacked by V. atlanticus LGP32 and (C) A. pacificum ACT03 lysis after 0, 15, 30, 45 and 60 min of interaction.” was replaced by “(A) Cumulative percentage of motile A. pacificum ACT03 cells. (B) Cumulative number of cells attacked by V. atlanticus LGP32 and (C) Cumulative cell lysis after 0, 15, 30, 45 and 60 minutes of interaction.” In the Fig. 3 caption, the sentence “All percentages were determined based on a minimum of 2,000 cells of A. pacificum ACT03.” was also added.

      In Fig. 4 caption, the sentence “All percentages were determined based on a minimum of 2,000 cells of A. pacificum ACT03.” was added.

      In Fig. S1 caption, the sentence “All percentages were determined based on a minimum of 2,000 cells of A. pacificum ACT03.” was added.

      (7) Figure S1A, does this figure show means plus/minus standard deviation? If yes, please add this to the figure legends.

      In Fig. S1 caption, the sentence “Error bars represent the standard deviation of the mean of three independent experiments” was added.

      How do the authors explain the big variation in the test condition and not in the control?

      Regarding the higher variation observed in the test condition compared to the control, this may, on the one hand, reflect biological variability between independent batches of 60-h V. atlanticus cultures used to prepare the supernatants, and, on the other hand, a heterogeneity in the physiological status of independent algal batches (N = 3; 2 × 10^4 cells; see Materials and Methods, Co-culture assay), which may not be perfectly synchronized. In contrast, the control condition consists of A. pacificum cultures incubated in fresh medium without bacterial supernatant, for which algal motility is highly reproducible and thus shows very little variation.

      (8) Line 375, "The lysis phase corresponded to initial vesicle formation followed by the bursting of A. pacificum ACT03 cells (Movie 5) and was induced by the old-starved culture supernatant of V. atlanticus LGP32 (Fig. S1)." Is this reference to Figure S1 correct? S1 shows motility, doesn't it? I don't see how this data supports the statement made in this sentence.

      We apologize for this unclear message.

      "The lysis phase corresponded to initial vesicle formation followed by the bursting of A. pacificum ACT03 cells (Video 5) and was induced by the old-starved culture supernatant of V. atlanticus LGP32 (Fig. S1)." was replaced by "The lysis phase corresponded to initial vesicle formation followed by the bursting of A. pacificum ACT03 cells (Fig. 3C and 3C1)."

      And “We next tested whether this lytic effect was mediated by thermostable molecule(s) secreted by Vibrio.” was replaced by “We next tested whether this lytic effect was linked to Vibrio culture supernatant and mediated by thermostable molecule(s) secreted by Vibrio.”

      (9) Line 388ff, "Group attacks were observed on non-degraded A. pacificum ACT03 cells, but not on previously lysed cells." No reference to a figure is provided. I am afraid I don't see the data that this statement is based on.

      As it is impossible to show a lack of attack, we just clarified the basis of our experiment.

      “To this end, A. pacificum ACT03 in exponential growth phase was first exposed for 30 minutes to the supernatant of a 60-hour culture of V. atlanticus LGP32, which induced 25% lysis of A. pacificum ACT03 cells. Next, the corresponding V. atlanticus LGP32 cells were added. During exposure, attacks were observed only on undegraded A. pacificum ACT03 cells, but not on previously lysed cells” was replaced by “To this end, A. pacificum ACT03 in exponential growth phase was first exposed for 30 minutes to the supernatant of a 126-hour culture of V. atlanticus LGP32, which induced lysis of 70% of the A. pacificum ACT03 cells (Figures 3C and 3C1, arrow 2 and video 4). Next, cells of V. atlanticus LGP32 from a 60-hour culture, capable of attacking A. pacificum ACT03 cells (Fig. 3B), were added. For 1 hour of exposure, no attack was observed on the previously lysed algae.”

      (10) Figure 4a, Based on the labeling of the figure, in particular the x-axis, it is not fully clear to me what I am looking at.

      Figure 4A has been reworked and its legend modified. We hope that this graph is clearer now.

      (11) Line 428, did the authors consider complementing the pvuD deletion mutant and testing for gain of function when providing the gene in trans?

      We did not investigate pvuD in this study and did not construct a pvuD deletion mutant. We therefore assume that the recommendation refers to pvuB, which was the focus of our work. Unfortunately, we did not perform this experiment. However, several lines of evidence support the implication of PvuB and the vibrioferrin uptake system in this process: (i) the loss of attack behaviour is specific to the mutant in the vibrioferrin uptake pathway and (ii) our expression and proteomic data show a strong induction of vibrioferrin uptake components under starvation and iron-manipulated conditions, which correlate with the attack phenotype.

      (12) Use of the term "group attack" in parentheses in the text, but in the section header and title. Is there really sufficient actual data to say that this is a "group attack"? What exactly are the indications for this being a behaviour of a group?

      We agree with you. The terms “group attacks” and “wolf-pack hunting” were replaced by the more neutral term “attacks” throughout the manuscript.

      (13) Table S1 and S2, those tables give a nice overview. Do the authors provide the raw data based on which they make a claim on "+" and "-" in the individual categories? I would prefer to see the actual data or at least have the possibility to look into this.

      In the revised versions of Tables 1 and 2, we have improved the captions and clarified the meaning of each column in order to avoid any ambiguity between the results of this study and the bibliographic information.

      Specifically regarding Table 2:

      We do not present any visuals of the interaction between Vibrio and Alexandrium because these species all look alike. Regarding the other algae species tested in interaction with Vibrio, phenomena other than lysis or cell attack have been observed and are the subject of specific laboratory studies.

      (14) Line 456 "first study", line 40f "first evidence of a new mechanism". I suggest toning this down a bit and being clearer in the abstract about this being a working model that can be suggested based on individual bits of data.

      We thank Reviewer #1 for this helpful suggestion.

      In the summary:

      “This is the first evidence of a new mechanism that could to be involved in regulating Alexandrium spp. blooms and giving Vibrio a competitive advantage in obtaining nutrients from the environment.” was replaced by “The interaction model we propose here suggests that Vibrio could play a role in regulating the proliferation of Alexandrium spp., giving it a competitive advantage in obtaining nutrients from the environment.”

      In the discussion:

      “Considering predator as a free organism that feeds at the expense of another, this study is the first evidence of the capacity of some Vibrio to develop a predatory strategy against an alga. This behaviour differs from parasitism, because the survival of Vibrio is not exclusively dependent on algae in environment” was replaced by “Consider a predator as a free-living organism that kills its prey and feeds on it, this study provides data suggesting the ability of Vibrios to develop an original predator-like behaviour to kill and feed on algae.”

      (15) Line 469 "Overall, these observations show that V. atlanticus LGP32 is able of wolf-pack hunting behaviour." I see the similarities. I feel that the term "show" is a bit too strong here, or I suggest referring to "wolf-pack-like behaviour".

      The sentence “Overall, these observations show that V. atlanticus LGP32 is able of wolf-pack hunting attack behaviour” was replaced by “Overall, these observations suggest that V. atlanticus LGP32 can exhibit a predator-like behaviour”

      Reviewer #2 (Public review):

      As weaknesses, Reviewer #2 includes:

      (1) A lack of early, clear definitions for several important terms used in the paper, including 'predation', 'coordination' and 'coordinated action', 'group attack', and 'wolf-pack hunting', along with a corresponding lack of criteria for what evidence would warrant use of some of these labels. (For example, does mere simultaneity of attacks of an A. pacificum cell by many V. atlanticus cells constitute "coordination"? Or, as it seems to us, does coordination require some form of signalling between predator cells?)

      The term “Coordinate” was replaced by “simultaneous” throughout the manuscript.

      The terms “Group attack” and “wolf pack hunting” were replaced by “attack” throughout the manuscript.

      (2) Absence of controls for cell density in the test for starvation effects on predatory behaviour; unclear how the length of incubation affects the density of V. atlanticus cells.

      We thank the reviewer for pointing this out.

      A cell density experiment was already performed (cf. Fig. 4A).

      The sentence “All percentages were determined based on a minimum of 2,000 cells of A. pacificum ACT03.” was added to the captions of Fig. 3, Fig. 4 and Fig. S1.

      (3) Lack of clarity in some of the methodological descriptions

      The Methodology has been checked and some improvements have been made.

      Reviewer #2 (Recommendations for the authors):

      (A) Title

      (1) Could 'induces' be better than 'promotes'?

      We agree with Reviewer #2. The initial title, “Starvation of the bacterium Vibrio atlanticus promotes lightning group-attacks on the dinoflagellate Alexandrium pacificum”, was replaced by “Starvation of the bacterium Vibrio atlanticus induces simultaneous attacks on the dinoflagellate Alexandrium pacificum”.

      (B) Abstract

      (1) Perhaps define pycosphere in the abstract - many readers might not know this word.

      We have revised the abstract to define the term phycosphere and added the sentence “This occurs in the microenvironment surrounding phytoplankton cells, the phycosphere. An interface rich in nutrients and organic molecules exuded by the cell.”

      (2) Perhaps "on dinoflagellates".

      We thank Reviewer #2 for this suggestion. We have revised the abstract by replacing “on the dinoflagellates species” with “on dinoflagellates”.

      (3) Line 33 - The word 'prey' is used without a claim of predation having yet been made; only killing has been claimed so far.

      We agree and have replaced the word “prey” by “algae” in the abstract.

      (4) Line 34 - It is unclear whether the description refers to the 'attack stage' or to 'wolf-pack attack' in general. The sentence is written in such a way that it seems to refer to 'wolf-pack attack'. However, this would seem to be incorrect, with the description being specific to V. atlanticus.

      To avoid this ambiguity, we have removed the sentence “resembles the ‘wolf-pack attack’ strategy” from the abstract.

      (5) Line 35 - Should there be a 'consumption phase'?

      We agree with Reviewer #2; “degradation” was replaced by “consumption”.

      (6) If predation is claimed later in the manuscript (which it is), it should be explicitly claimed in the abstract.

      We thank Reviewer #2 for this helpful suggestion.

      We have revised the abstract. The sentence “Results showed that Vibrio atlanticus was able to coordinate lightning group attacks then kill the dinoflagellate Alexandrium pacificum ACT03” was replaced by “The results showed that Vibrio atlanticus was capable of attacking and killing the dinoflagellate Alexandrium pacificum ACT03”.

      (C) Main text

      (1) Line 54 - Perhaps "Among HAB-causing organisms...".

      We agree with the reviewer’s suggestion and have revised the wording.

      (2) Line 56 - "that, together with..., form the "Alexandrium tamarense" complex".

      We agree with the reviewer’s suggestion and have revised the sentence.

      (3) Line 57 - What this "complex" is and its significance should be explained.

      “Among them, Alexandrium pacificum is a flagellated eukaryotic unicellular organism that together with Alexandrium tamarense and Alexandrium fundyense form the "Alexandrium tamarense" complex (Hadjadji et al., 2020)” was replaced by

      “Among them, Alexandrium pacificum is a flagellated eukaryotic unicellular organism that together with Alexandrium tamarense and Alexandrium fundyense form the "Alexandrium tamarense" complex, responsible for paralytic shellfish poisoning worldwide (Hadjadji et al., 2020)”

      (4) Line 58 - What is a Rephy survey?

      We clarified this point: “by rephy survey” was replaced by “by the French phytoplankton observation and monitoring network (Rephy)”.

      (5) Line 59 - 'resulting in' instead of 'resulting of'.

      We agree with the reviewer and have replaced “resulting of” with “resulting in”.

      (6) Line 65 - It seems that ', influencing the time of appearance of blooms' would be more correct than the current phrasing. The current phrasing is unclear regarding the relation between species, tolerance range, and the time of appearance of blooms.

      To address this point, “Depending on the phytoplankton species, the tolerance range of physicochemical parameters is different and influences the time of appearance of blooms” was replaced by “Depending on the species of phytoplankton, tolerance to physicochemical parameters varies, which influences when blooms occur.”

      (7) Line 76 - Run-on sentence which should probably be split after the reference to Wang et al., 2020.

      We agree with the reviewer and have split the sentence.

      (8) Line 89 - What are these observations?

      This sentence was reformulated.

      “Based on observations from the natural environment showing a potent relationship between Vibrio and Alexandrium algae bloom events, this study aim to determine in vitro, the main factors implicated in this relationship” was replaced by “This study aims to describe observations made in the natural environment between Vibrio bacteria and Alexandrium algal blooms, and to determine in vitro the main factors involved in this relationship.”

      (9) Line 94 - This is the first clear reference to a predator-prey interaction, and it is stated as if it's established. Is it not a central goal of the study to demonstrate that predation is even happening?

      Based on the title and abstract, I would have expected the major claims of the paper highlighted in the abstract to be:

      (i) that predation of algae by bacteria occurs in this system,

      (ii) there is a social component of predation,

      (iii) claims about what induces this predatory behaviour.

      The summary has been amended accordingly, and the term “predation” has been removed, along with all sentences referring to it.

      (10) Line 99 - What does n.d. mean?

      This point was addressed in the revised version.

      (11) Line 97 section - specify qPCR.

      This point was clarified in the revised version.

      (12) Line 139 - Mentioning the oligonucleotides in this part of the methods seems out of place. Would this not fit better in the section on Gene expression analysis?

      This sentence was discarded from this paragraph.

      (13) Line 147 - Where did the co-cultured phytoplankton species come from?

      To address this point, a reference to Table 2 was added.

      (14) Line 149 - Is it known if the phytoplankton strains had all grown to the same density after 24 hours?

      The doubling time of dinoflagellates in laboratory culture is between 5 and 7 days. During the duration of the experiments, the dinoflagellate concentration did not change significantly.

      The sentence “(doubling time between 5 and 7 days)” was added.

      (15) Line 150 - Was the density of the Vibrio cultures at the different incubation times measured? Density might play an important role in predation, and so it would be important to control for density in these assays.

      The concentrations of live Vibrio in each individual culture were not measured. However, the role of Vibrio density in attacks was measured and is shown in Figure 4A and observed in Fig. 2B.

      (16) Line 153 - How long was the co-incubation?

      The incubation times were added in the revised version.

      (17) Line 158 - What is meant by "independent experiments", more exactly?

      To clarify this point, “Data are the means of three independent experiments” was replaced by “The data come from three independent experiments using independent phytoplankton cultures and independent bacterial cultures.”

      (18) Line 161 - Perhaps give the source information about the Vibrio strain at its first mention.

      A reference has been added in the revised preprint.

      (19) Line 163 - line 141 refer to multiple non-axenic species, whereas here "the algal strain" is referred to.

      And

      (20) Line 164 - language phrasing throughout the manuscript could use some polishing, e.g., "this means that additional bacteria...".

      To address this comment, “As the algal strain used in the study is not axenic, means that additional bacteria, other than the V. atlanticus LGP32, are potentially present in the experiments.” was replaced by “As the A. pacificum ACT03 strain (table 2) used in the study is not axenic, there is potential for bacteria other than V. atlanticus LGP32 to be present in the experiments.”

      (21) Line 208 - Why were both magnitude and p-value criteria used rather than just p-values?

      In the present proteomic approach, each experimental condition was measured six times, and the mean value was used to reduce random noise. We then selected differences that were large enough to matter biologically: this is a central criterion, and a change of at least 2-fold was required in order to focus exclusively on biologically relevant differences, which allowed us to control for effect size. The differences also had to be statistically significant, so we applied a threshold of P < 0.01, to be sure that there is less than a 1% chance the result happened randomly. We considered that using both criteria makes the results meaningful and trustworthy, not just a small or random fluctuation.
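
      As an illustration only, the two-criterion selection described above could be sketched as follows. This is not the authors' analysis pipeline: the class and function names, the example values, and the choice to treat a 2-fold decrease the same way as a 2-fold increase are all assumptions made for the sketch, and the p-value is taken as given from whichever statistical test was applied upstream.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class ProteinResult:
    """Hypothetical container for one protein's measurements."""
    name: str
    control: List[float]   # replicate abundances, control condition
    treated: List[float]   # replicate abundances, test condition
    p_value: float         # supplied by an upstream statistical test


def fold_change(result: ProteinResult) -> float:
    """Ratio of mean treated abundance to mean control abundance."""
    return mean(result.treated) / mean(result.control)


def is_differential(result: ProteinResult,
                    min_fold: float = 2.0,
                    alpha: float = 0.01) -> bool:
    """Keep a protein only if it passes BOTH criteria.

    Effect size: at least `min_fold` up or down (symmetry is an assumption).
    Significance: p-value strictly below `alpha`.
    """
    fc = fold_change(result)
    large_enough = fc >= min_fold or fc <= 1.0 / min_fold
    return large_enough and result.p_value < alpha


# Invented example values, six replicates per condition.
results = [
    ProteinResult("protA", [1.0] * 6, [2.5] * 6, 0.001),   # large and significant
    ProteinResult("protB", [1.0] * 6, [1.2] * 6, 0.0001),  # significant but small
    ProteinResult("protC", [1.0] * 6, [3.0] * 6, 0.04),    # large but not significant
    ProteinResult("protD", [4.0] * 6, [1.0] * 6, 0.002),   # 4-fold decrease
]
selected = [r.name for r in results if is_differential(r)]
```

      In this sketch, a protein with a very small p-value but a modest change (protB) is excluded, as is a protein with a large change but weak statistical support (protC), which is the practical effect of requiring both criteria.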

      (22) Line 270 - Were these three replicate experiments also "independent"; if yes, in what sense?

      “All experiments were conducted in triplicate” was replaced by “The experiments were performed using biological triplicates, each of which was analyzed in triplicate.”

      (23) Line 296 - Perhaps "the temperature-sensitivity (or resistance) of" rather than "the nature of".

      The modification was made in the new manuscript.

      (24) Line 307 - The sentence mentions only one influential period that was removed from the dataset, but the word 'whenever' suggests multiple occurrences.

      We agree; “whenever” was replaced by “because”.

      (25) Line 325 - line 327 - The rationale behind the first part of the following sentence isn't clear to me, and what is meant by the second part is also not clear.

      To clarify this point, “This result is consistent with the difficulty that Vibrio has in growing at temperatures below 20°C and with the complex interacting factors driving bloom dynamics (Laanaia et al., 2013)” was replaced by “This result is consistent with the difficulty Vibrio has in growing at temperatures below 20°C and with the many environmental factors that influence the dynamics of algae proliferation (Laanaia et al., 2013)."

      (26) Line 327 - line 328 - Hard to interpret; does this refer to living algal cells, or all algal cells, living and degraded?

      To improve clarity, “Interestingly, in spring 2015, the mean densities of all Alexandrium cells and of free-living Vibrio were positively correlated” was replaced by “Interestingly, in spring 2015, the mean densities of Alexandrium cells (living and degraded) and of free-living Vibrio were positively correlated”

      (27) Figure 2 - These results strongly point to predation, but why the Vibrio population would already be elevated in the co-culture treatment relative to the control immediately after inoculation (0 hrs) is not clear.

      The experiments were not conducted at the same time, and the first value on the graphs corresponds to the concentration of vibrio determined after 1 hour of exposure/incubation and not at time 0. Figures 2A and 2B have been modified accordingly, and substantial changes have been made to the relevant section of the results.

      (28) Line 348 - There's no mention of Figure 2C in the main text, or of the statistical test associated with it in the Figure 2 legend.

      To address this comment, Figure 2C has now been cited in the main text, and the statistical analysis method has been added to the Figure 2 caption.

      (29) Line 352 - Text descriptions of videos are not easy to connect with the video content. Label the file names the same as how they are referred to in the text.

      We agree. The sentence “Epifluorescence microscopy observation of GFP-labelled V. atlanticus LGP32 (previously grown in Zobell medium) in interaction showed that A. pacificum ACT03 cells that had lost their motility were attacked individually by V. atlanticus LGP32 before being lysed (Fig. 2C and Video 1).” was rephrased and replaced by “Epifluorescence microscopy observation of GFP-labelled V. atlanticus LGP32 (previously grown in Zobell medium) in interaction showed that V. atlanticus LGP32 simultaneously attacks A. pacificum ACT03 cells (Fig. 2C and Video 1).”

      (30) Movie 1 could be cut to remove uninteresting footage at the start. What indicates lysis? Is the deformation of the cells an indication of lysis?

      To respond to this comment, Video 1 has been shortened and, in the caption, “degraded” was replaced by “lysed”.

      (31) Line 353 - Video could be zoomed in more on a few typical attacks to remove visual noise.

      A chronological overview of an attack has been added to Figure 2 corresponding to Figure 2D, and a chronological overview of the overall event has been added to Figure 3 corresponding to Figure 3B1.

      (32) Line 355 - There does not seem to be a Figure 3A2.

      To address this point, Fig. 2 and Fig. 3 have been revised for clarity (see above).

      (33) Figure 3 - Can the authors fully exclude an effect of bacterial density as distinct from an effect of growth/starvation phase? It would be helpful to determine bacterial viable population densities at 12, 36, 60, and 126 hrs of incubation in Zobell medium, and to control for density in testing for effects on algae.

      Information on Vibrio densities incubated in Zobell medium for 12, 36, 60, and 126 hours has now been included in the results section “Attack of A. pacificum ACT03 is activated by V. atlanticus LGP32 starvation.”

      (34) Line 363 - It is unclear how the degradation of the flagella is apparent from movie 3. It would be helpful to have a comparison with healthy flagella.

      Alexandrium cells with intact flagella move so quickly that it is impossible for us to follow them and film their flagella with the tools at our disposal.

      For greater clarity, arrows have been added to videos 3, 4 and 5.

      (35) Line 364 - Sudden change from referring to the recording as 'video' instead of movie. What is meant by erratic swimming? The cell does not seem to move much.

      To address this comment, “Movie” was replaced by “Video” throughout the manuscript and “erratic swimming” was replaced by “irregular swimming”.

      (36) Line 365 - How did you observe the detachment of the flagellum?

      The detachment of the flagellum can be observed using a confocal microscope. This process was filmed and presented in Video 3. Arrows have been added to the video to clearly indicate the flagellum detachment.

      (37) Line 368 - Perhaps this is due to it not being clear regarding which movie is meant, but there is no clear attack visible in movie 4.

      To make this clearer, arrows have been added to Video 4 to indicate attached cells.

      And the sentence in the caption of the video 4 “Vibrio, filmed under a confocal microscope, attacks in groups one immobilized Alexandrium cell then moves on to attack — still as a group — another cell without touching the other whole cells, suggesting active communication between Vibrio cells” was rewritten and replaced by “This video, recorded under a confocal microscope, shows Vibrios simultaneously attacking a first immobilized Alexandrium cell, then moving on to attack a second cell without ever targeting the other cells present, suggesting active communication between the Vibrio bacteria.”

      (38) Line 369 - It seems the peak attach % was reached at 45 minutes, not 15-30 minutes.

      We apologize for the confusion. For clarity, in the caption of Fig. 3, the sentence “(A) Percentage of A. pacificum ACT03 motile cells. (B) cells attacked by V. atlanticus LGP32 and (C) cells lysis after 0, 15, 30, 45 and 60 min of interaction” was replaced by “(A) Cumulative percentage of motile A. pacificum ACT03 cells. (B) Cumulative number of cells attacked by V. atlanticus LGP32 and (C) Cumulative cell lysis after 0, 15, 30, 45 and 60 minutes of interaction.”

      (39) Line 382 - "clearly show role of nutrient limitation", see comment re controlling for any role of bacterial density.

      To address this point, information on Vibrio densities was added to the manuscript (cf. comment 33).

      (40) Line 385 - line 386 - Phrasing unclear.

      We have revised the text accordingly: “To this aim, A. pacificum ACT03 in exponential growth phase was first exposed for 30 min to supernatant from 60 hours starved V. atlanticus LGP32 Zobell media that induced 25% lysis of A. pacificum ACT03 cells and next to the corresponding V. atlanticus LGP32 cells. Group attacks were observed on non-degraded A. pacificum ACT03 cells, but not on lysed cells.” was replaced by “To this end, A. pacificum ACT03 in exponential growth phase was first exposed for 30 minutes to the supernatant of a 126-hour culture of V. atlanticus LGP32, which induced lysis of 70% of the A. pacificum ACT03 cells (Figures 3C and 3C1, arrow 2 and video 4). Next, cells of V. atlanticus LGP32 from a 60-hour culture, capable of attacking A. pacificum ACT03 cells (Fig. 3B), were added. For 1 hour of exposure, no attack was observed on the previously lysed algae.”

      (41) Line 413 - Is this the only pathway for quorum sensing in V. atlanticus?

      Indeed, the last two sentences of this paragraph are unclear.

      To address this point:

      “By targeted mutagenesis of key genes involved in QS pathways ΔluxM (HAI-1 production), ΔluxS (AI-2 production) and ΔluxR (high-density QS master regulator) did not lead to any change in the attack behaviour of V. atlanticus LGP32 (Fig. 4C).” was replaced by “Targeted mutagenesis of key genes involved in two of the three known QS pathways in vibrios (Fig. S3), ΔluxM (HAI-1 production), ΔluxS (AI-2 production), and ΔluxR (main high-density QS regulator), did not result in any changes in the attack behavior of V. atlanticus LGP32 (Fig. 4C).”

      And “Taken together these results showed that attack by V. atlanticus LGP32 is not link to QS.” was replaced by “Combined with the absence of overexpression of the CqsS gene (inducible by CAI-1) involved in the last known QS pathway in Vibrio (Fig. S3), these results indicated that the attack by V. atlanticus LGP32 is most likely unrelated to QS.”

      (42) The references to tropism aren't clear.

      You're right, there's no reason to use the term tropism here. We have removed it.

      (43) Line 439 - Why was H3BO4 used as a control for the addition of FeCl3?

      For clarity, the sentence “Boron being known to be a regulator or capable of being transported by vibrioferrin (Romano et al., 2013; Weerasinghe et al., 2013), we tested its potential involvement in the interaction but no effect was evidenced here.” was replaced by “Given that boron is known for its role in regulating a global bacterial cellular response to phytoplankton and to bind to vibrioferrin (Romano et al., 2013; Weerasinghe et al., 2013), we tested its potential involvement in simultaneous vibrio attacks. Compared to the Zobell control, no effect on the number of attacks was observed”

      (44) Line 441 - line 449 - Should explicitly say in text that no attacks were observed for any species other than the Alexandrium and Gymnodinium species.

      We agree and have explicitly stated in the text that no attacks were observed for any species other than Alexandrium and Gymnodinium.

      (45) Line 454 - line 455 - The last part of this sentence seems a strange statement, since

      (i) it has long been known that predatory bacteria can eat a wide range of eukaryotes, (ii) one of the cited papers (Perez et al) actually highlights a case of bacterial predation on algae, and (iii) in the next paragraph the authors themselves highlight Streptomyces predation of algae.

      To make this clearer, “Among predators, predatory bacteria are found in a wide variety of environments, and like bacteriophages and predatory protists, they have been reported to prey exclusively on other bacteria” was replaced by “Among predators, predatory bacteria are found in a wide variety of environments and, like bacteriophages and predatory protists, feed primarily on other bacteria, although a few cases of predation on microbial eukaryotes have also been reported.”

      (46) Line 455 - Better to clarify the authors' definition of a predator at the start of the paper. The offered definition seems more like a definition of 'consumer' than 'predator', as the latter normally involves both the killing and consumption of other organisms, not just consumption with some kind of "expense".

      To address this comment:

      - “predator behaviour” was replaced by “predator-like behaviour”

      - and “Considering predator as a free organism that feeds at the expense of another, this study is the first evidence of the capacity of some Vibrio to develop a predatory strategy against an alga. This behaviour differs from parasitism, because the survival of Vibrio is not exclusively dependent on algae in environment” was replaced by “Consider a predator as a free-living organism that kills its prey and feeds on it, this study provides data suggesting the ability of Vibrios to develop an original predator-like behaviour to kill and feed on algae.”

      (47) Line 457 - Don't see the benefit of trying to distinguish from parasitism here, especially since parasitism can be facultative, whereas the authors' phrasing suggests that it is always obligate.

      You are right, this sentence has been deleted.

      (48) Line 463 - line 464 - The authors should clearly explain exactly what detailed aspects of Myxococcus and Lysobacter predation they think the "attack stage" of V. atlanticus resembles.

      Accordingly, “The second stage, the ‘attack stage’ corresponding to physical contact between Vibrio and Alexandrium resembles the ‘wolf-pack attack’ strategy described for Myxococcus xanthus and Lysobacter regardless of the prey species used, M. xanthus must be in close proximity to prey cells in order to induce their lysis and to benefit from their biomass (Martin, 2002; Perez et al., 2014)” was replaced by “The second stage, the ‘attack stage’ corresponding to the physical contact between Vibrios and Alexandrium, is similar to the strategy used by Myxococcus xanthus and Lysobacter. These bacteria must be in close proximity to their prey in order to cause lysis and utilize their biomass, regardless of the prey's species (Martin, 2002; Genovesi et al., 2013; Perez et al., 2016; Zhang et al., 2020)”

      (49) Line 466 - line 467 - The comparison to bacteria clustering around lysed cells is surprising since the authors show that V. atlanticus does not attack already lysed cells.

      The sentence was rephrased: “This phenomenon is comparable to that of bacteria clustering around lysed ciliate cells” was replaced by “Visually, this phenomenon resembles bacteria clustering around lysed ciliate cells.”

      (50) Line 469 - Missing is a statement of exactly what criteria constitute "wolf-pack hunting behaviour" and exactly how V. atlanticus meets those criteria.

      To address this point, “wolf-pack hunting behaviour” was replaced by “predator-like behaviour”.

      'Able of' should be corrected to 'Capable of'.

      We agree and have reworded the sentence.

      (51) Line 470 - Consider starting a new paragraph for the material on quorum sensing.

      Accordingly, we have separated the section concerning QS pathway from the section concerning iron pathway.

      (52) As part of their discussion on the role of iron uptake, can the authors comment on any relationship between starvation and iron uptake, and in particular the observations that, while general nutrient deprivation induces attacks, supplementation with a specific nutrient (iron) also induces attacks (Figure 4D)? Do bacteria starved for general growth substrates take up more iron than growing bacteria?

      To respond to this comment, “Future study could demonstrate further the role of vibrioferrin in group attack, by adding iron-saturated vibrioferrin to algae-Vibrio co-cultures.” was replaced by “Interestingly, if a general nutrient deficiency causes attacks, iron supplementation increases the number of attacks (Figure 4D), suggesting the importance of iron absorption in the attack behavior. Future studies should determine whether nutrient deficiency increases the iron absorption capacity of Vibrios and whether this plays a major role in the attack mechanism.”

      (53) Line 486 - Of what is boron known to be a regulator?

      To respond to this comment, “Given that boron is known for its regulatory properties and for being transportable by vibrioferrin“ was replaced by “Given that boron is known for its role in regulating a global bacterial cellular response to phytoplankton and to bind to vibrioferrin”.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      This work demonstrates that MORC2 undergoes phase separation (PS) in cells to form nuclear condensates, and the authors demonstrate convincingly the interactions responsible for this phase separation. Specifically, the authors make good use of crystallography and NMR to identify multiple protein:protein interactions and use EMSA to confirm protein:DNA interactions. These interactions work together to promote in vitro and in-cell phase separation and boost ATPase activity by the catalytic domain of MORC2.

      However, the authors have very weak evidence supporting their potentially valuable claim that MORC2 PS is important for the appropriate gene regulatory role of MORC2 in cells. Exploring causal links between PS and function is an important need in the phase separation field, particularly as regards the role of condensates in gene regulation, and is a non-trivial matter. Any study with convincing data on this matter will be very important. For this reason, it is crucial to properly explore the alternative possibility that soluble complexes, existing in the same conditions as phase-separated condensates, are the functional species. It is also critical to keep in mind that, while a specific protein domain may be essential for PS, this does not mean its only important function pertains to PS.

      In this study, the authors do not sufficiently explore the role that soluble MORC2 complexes may play alongside MORC2 condensates. Neither do they include enough data to solidly show that domain deletion leads to phenotypes via a loss of phase separation per se, rather than the loss of phase separation being a microscopically visible result, not cause, of an underlying shift in protein function. For these reasons, the authors' conclusions regarding the functional role of MORC2 condensates are based on incomplete data. This also dampens the utility of this work as a whole, since the very nice work detailing the mechanism of MORC2 PS is not paired with strong data showing the importance of this observation.

      We thank the reviewer for this thoughtful and constructive critique. We agree that establishing a causal link between phase separation (PS) and biological function—particularly in transcriptional regulation—is a central and non-trivial challenge in the condensate field. We also appreciate the reviewer’s emphasis on two critical alternative interpretations: (i) that soluble MORC2 complexes, rather than condensates, may represent the primary functional species, and (ii) that loss of phase separation upon domain deletion could reflect a downstream consequence of altered protein function rather than its cause.

      To address these concerns, we have performed a series of new experiments specifically designed to decouple condensate formation from condensate dynamics, thereby allowing us to more rigorously interrogate the functional relevance of MORC2 condensates.

      First, to overcome the limitation of domain deletions, which may affect MORC2 function beyond phase separation, we introduced a micropeptide-based kill switch (KS) at the C terminus of MORC2. This strategy has recently emerged as a powerful approach to selectively reduce condensate dynamics without disrupting protein expression, folding, or domain architecture [1]. Importantly, unlike the CC3 or IDRa deletions, MORC2+KS robustly forms nuclear condensates but exhibits markedly reduced internal dynamics, as demonstrated by FRAP analyses showing minimal fluorescence recovery after photobleaching (Fig. 6a-c). This strategy therefore allows us to perturb condensate material properties independently of MORC2 domain integrity.

      Second, we systematically compared the transcriptional consequences of rescuing MORC2-knockout HeLa cells with MORC2FL, condensation-deficient mutants (ΔCC3 and ΔIDRa), and the dynamics-defective MORC2+KS (Fig. 6d). Despite being expressed at substantially higher levels than MORC2FL (Fig. 6e), all three mutants showed a striking and consistent failure to restore MORC2-dependent transcriptional regulation (Fig. 6f-h). This effect was particularly pronounced for transcriptionally repressed genes, including two sets of high-confidence MORC2 targets reported in prior studies (Fig. 6i and Fig. S10). These findings demonstrate that neither increased protein abundance nor the mere presence of condensate-like structures alone is sufficient to restore MORC2 function.

      Third, our data instead support a model in which both soluble MORC2 complexes and dynamic MORC2 condensates are required for full transcriptional regulation activity. While soluble MORC2 is likely involved in target recognition and complex assembly, our results indicate that proper condensate formation—and critically, condensate dynamics—are essential for effective transcriptional repression and activation. The inability of the MORC2+KS mutant to rescue transcriptional defects, despite intact condensate formation, points away from a model in which MORC2 condensates represent only microscopically visible byproducts of MORC2 activity.

      We believe these new data strengthen the manuscript by pairing the detailed mechanistic dissection of MORC2 phase separation with direct functional evidence, enhancing the conceptual impact and biological significance of the study.

      Strengths:

      Static light scattering and crystallography are nicely used to demonstrate the dimerization of MORC2FL and to discover the structure of the CC3 domain dimer, presumably responsible for the dimerization of MORC2FL (Figure 1).

      Extensive use of deletion mutants in multiple cell lines is used to identify regions of MORC2 that are important for forming condensates in the nucleus: the IBD, IDR, and CC3 domains are found to be essential for condensate formation, while the CW domain plays an unknown role in condensate morphology (Figure 3). The authors use NMR to further identify that the IBD domain seems to interact with the first third of the centrally located IDR, termed IDRa, but not with the latter two-thirds of the IDR domain (Figure 4). This leads them to propose that phase separation is the product of IBD:IDRa interaction, CC3 dimerization, and an unknown but important role for the CW domain.

      Based on the observation that removal of the NLS resulted in diffuse cytoplasmic localization, they hypothesized that DNA may play an important role in MORC2 PS. EMSA was used to demonstrate interaction between DNA and several MORC2 domains: CC1, CC2, IDR, and TCD-CC3-IBD. Further in vitro microscopy with purified MORC2 showed that DNA addition significantly reduces MORC2 saturation concentration (Figure 5).

      These assays convincingly demonstrate that MORC2 phase separates in cells, and identify the protein domains and interactions responsible for this phenomenon, with the notable caveat that the role of the CW domain here is left unexplored.

      We thank the reviewer for their positive and detailed assessment of the strengths of our study. Our understanding of the CW domain’s function remains preliminary. Although we observed that the CW domain can influence condensate size, the IDR, IBD, and CC3 domains constitute the core structural elements driving phase separation. Consequently, the CW domain was not a primary focus of the current study. Nonetheless, investigating its functional contributions represents an interesting avenue for future work.

      Weaknesses:

      Although the authors demonstrated phase separation of MORC2FL, their evidence that this plays a functional role in the cell is incomplete.

      Firstly, looking at differentially upregulated genes under MORC2FL overexpression, the authors acknowledge that only 10% are shared with differentially regulated genes identified in other MORC2FL overexpression studies (Figure 6c, d). No explanation is given for why this overlap is so low, making it difficult to trust conclusions from this data set.

      We thank the reviewer for raising this important concern. In response, we have improved the quality and robustness of our RNA-seq analysis by repeating the experiments with optimized sample handling and increased sequencing depth. Using this updated dataset, we identified a considerably higher overlap between MORC2-regulated genes in our study and those reported previously.

      Specifically, we observed 84 overlapping genes with the study by Nikole L. Fendler et al. [2], corresponding to approximately 32% of the MORC2-regulated genes reported in that work (Fig. 6i). In addition, we identified 102 overlapping genes with the dataset reported by Iva A. Tchasovnikarova et al. [3], representing approximately 22% of the genes identified in that study (Fig. S10b).

      We note that complete concordance with previous reports is not expected, given substantial differences in experimental design. For example, Fendler et al. employed a doxycycline-inducible MORC2 expression system [2], whereas our study relies on transient overexpression in MORC2-knockout HeLa cells. In contrast, Tchasovnikarova et al. compared transcriptomes between MORC2 knockout and wild-type cells [3], rather than MORC2 rescue conditions. Moreover, RNA-seq results are inherently influenced by cell line batch variability, sequencing depth, and analysis pipelines, all of which differ across studies.

      Taken together, we consider an overlap in the range of ~20–30% to be reasonable and biologically meaningful in the context of these experimental differences, and we believe that the revised RNA-seq data provide a more reliable foundation for our conclusions regarding MORC2-dependent transcriptional regulation.
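
      The overlap figures quoted above can be reproduced with a few lines of set arithmetic. The following is a minimal sketch, not the authors' pipeline: the function names are hypothetical, the gene identifiers in the usage note are placeholders, and the fraction is assumed to be computed relative to the previously published gene list, as the response describes.

```python
def overlap_stats(our_genes, reference_genes):
    """Return (number of shared genes, shared fraction of the reference list).

    Assumption: the percentage is expressed relative to the reference
    (previously published) gene list, as described in the response.
    """
    ours = set(our_genes)
    reference = set(reference_genes)
    shared = ours & reference
    fraction = len(shared) / len(reference) if reference else 0.0
    return len(shared), fraction


def implied_reference_size(n_shared, fraction):
    """Back-calculate the approximate reference list size from a reported
    overlap count and a rounded fraction (only as precise as the rounding)."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return round(n_shared / fraction)
```

      For example, `overlap_stats(["A", "B", "C"], ["B", "C", "D", "E"])` gives two shared genes covering half of the reference list. Applying `implied_reference_size` to the reported values (84 genes at about 32%, 102 genes at about 22%) suggests reference lists of roughly 260 and 460 genes; these are rough back-calculations from rounded percentages, not numbers stated in the response.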

      Secondly, of the 21 genes shared in this study and in earlier studies, the authors note that the differential regulation is less pronounced when a phase-separation-deficient MORC2 mutant is overexpressed, rather than MORC2FL (Figure 6e). This is taken as evidence that phase separation is important for the proper function of MORC2. However, no consideration is made for the alternative possibility that the mutant, lacking the CC3 dimerization domain, may result in non-functional complexes involving MORC2, eliminating the need for a PS-centric conclusion. To take the overexpression data as solid evidence for a functional role of MORC2 PS, the authors would need to test the alternative, soluble complex hypothesis. Furthermore, there seems to be low replicate consistency for the MORC2 mutant condition (Figure S6a), with replicate 3 being markedly upregulated when compared to replicates 1 and 2.

      We thank the reviewer for raising these important concerns. In the revised manuscript, we have substantially strengthened both the experimental evidence and the data presentation to directly address the alternative “soluble complex” interpretation as well as the issue of replicate consistency. Specifically, we now provide data that clarify the functional impact of phase-separation-deficient MORC2 mutants and explicitly show replicate-level RNA-seq analyses. Fig. 6 and Fig. S10 support these improvements and enhance both the robustness and transparency of our transcriptional analyses. Collectively, these revisions directly address the reviewer’s concerns regarding the functional interpretation of MORC2 phase separation.

      Thirdly, the authors close by examining the in-cell PS capabilities and ATPase activity of several disease-associated mutants of MORC2 (Figure 7). However, the relevance of these mutants to the past 6 figures is unclear. None of these mutations is in regions identified as important for PS. Two of the mutations result in a higher percentage of the cell population being condensate-positive, but this is not seemingly connected to ATPase activity, as only one of these two mutants has increased ATPase activity. Figure 7 does not add any support to the main hypotheses in the paper, and nowhere in the paper do the authors investigate the protein regions where the mutations in Figure 7 are found.

      We thank the reviewer for raising this point regarding Fig. 7. At the current stage, the results for disease-associated mutations are primarily descriptive. While we observed that certain mutations clustered at the N-terminus can affect MORC2 condensate formation, ATPase activity, and DNA binding, we did not identify a mechanistic explanation for these correlations. Notably, the T424R mutation, previously reported to significantly enhance ATPase activity [4], also increased both intracellular condensate formation and in vitro DNA binding in our experiments. In contrast, other mutations did not show such consistent effects. Previous studies have established that MORC2’s ATP-binding and DNA-binding activities are independent [4]. Our results further suggest that MORC2’s phase separation behavior is independent of both ATP and DNA binding affinity, although existing evidence hints at potential cross-regulatory interactions among these three functions.

      We would also like to emphasize an additional observation that may help contextualize the relevance of N-terminal mutations. Although deletion of the MORC2 N-terminus does not prevent the remaining C-terminal region from forming nuclear condensates, these C-terminal condensates exhibit a marked loss of fluorescence recovery in FRAP assays (Fig. S11). This finding suggests that while the N-terminus is not strictly required for condensate assembly, it plays an important role in regulating condensate fluidity. Accordingly, disease-associated mutations distributed across the N-terminal region may influence MORC2 function by modulating condensate material properties rather than condensate formation per se. Based on this hypothesis, we evaluated the fluidity of condensates formed by the E236G and T424R mutants. FRAP measurements indicated substantially reduced fluorescence recovery in E236G, whereas T424R exerted minimal effects (Fig. 7e, f).

      Overall, our interpretation of the results in Fig. 7 is still at a preliminary stage. Nevertheless, the role of the MORC2 N-terminus in modulating condensate fluidity, together with the observed impairment caused by the E236G mutation, appears to be robust, although the underlying mechanism remains to be elucidated. We have incorporated additional discussion on this point and consider it an important direction for future study.

      Reviewer #1 (Recommendations for the authors):

      (1) Why does MORC2 overexpression lead to changes in gene regulation that are so different from past MORC2 overexpression studies? This is unsettling to me.

      (2) Likewise, why is replicate 3 for the MORC2ΔCC3 variant so different from replicates 1 and 2? Perhaps repeating this experiment would be helpful, both for showing better repeatability and perhaps as regards pulling out a stronger phenotype.

      We have repeated the experiments and obtained improved data quality.

      (3) A better explanation of the relevance of Figure 7 to the story of the rest of the paper, especially the phase-separation of MORC2, would be important to improving this paper.

      We thank the reviewer for this suggestion. We have performed additional experiments and expanded the discussion.

      (4) Are expression levels of mutant proteins in Figure 7 uniform between mutants? If not, is it possible that expression levels might account for the difference in condensate-positive cells between mutants?

      We cannot fully exclude the possibility that differences in expression levels may contribute to the observed differences among mutants. In our experiments, equal amounts of plasmid DNA were used for transfection across all conditions. We did not directly quantify post-transfection protein expression levels by immunoblotting or similar approaches. Moreover, even if certain mutations do affect protein expression, fully normalizing expression levels across mutants would be technically challenging.

      Importantly, we note that MORC2 does not form condensates in all transfected cells, even when EGFP fluorescence indicates robust expression levels that are comparable to, or even exceed, those observed in condensate-positive cells. This observation suggests that high expression alone is not sufficient to drive MORC2 phase separation in cells. Therefore, we do not favor the interpretation that the E236G and T424R mutations enhance MORC2 condensation simply by increasing MORC2 protein expression levels.

      Minor:

      (1) I would suggest considering using the term "dynamic" rather than "liquid-like", as FRAP is technically a measurement of the dynamicity of a protein within a volume, rather than a measurement of the actual fluidity of that volume.

      We thank the reviewer for this helpful suggestion. We agree that FRAP measurements primarily report protein mobility and condensate dynamics rather than the physical fluidity of the condensates. We have therefore revised the manuscript to replace “liquid-like” with “dynamic” where conclusions are based on FRAP analyses.
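
      To make concrete what FRAP reports here, the sketch below shows one common way to normalize a recovery trace and estimate a mobile fraction. It is an illustrative assumption rather than the analysis used in the manuscript: the function names, the choice of the first post-bleach frame as the zero point, and the three-frame plateau window are all hypothetical, and background subtraction and correction for acquisition photobleaching are omitted.

```python
def normalize_frap(intensities, n_prebleach):
    """Scale a FRAP trace so that the pre-bleach mean is 1 and the first
    post-bleach frame is 0. Returns only the post-bleach part of the trace.

    Assumes intensities are already background-subtracted (not done here).
    """
    if n_prebleach < 1 or n_prebleach >= len(intensities):
        raise ValueError("need at least one pre-bleach and one post-bleach frame")
    pre = sum(intensities[:n_prebleach]) / n_prebleach
    post = intensities[n_prebleach]  # first frame after the bleach pulse
    if pre == post:
        raise ValueError("no bleaching detected; cannot normalize")
    return [(f - post) / (pre - post) for f in intensities[n_prebleach:]]


def mobile_fraction(intensities, n_prebleach, n_plateau=3):
    """Estimate the mobile fraction as the mean of the last n_plateau
    normalized frames (a plateau estimate, not a fitted curve)."""
    norm = normalize_frap(intensities, n_prebleach)
    tail = norm[-n_plateau:]
    return sum(tail) / len(tail)
```

      With a toy trace such as `[100, 100, 100, 20, 40, 55, 60, 60, 60]` and three pre-bleach frames, the estimated mobile fraction is 0.5. A value like this describes how much of the bleached protein pool exchanges on the timescale of the experiment, which is why “dynamic” is the more accurate descriptor than “liquid-like”.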

      (2) A further investigation of the role of the CW domain would be very interesting, since it clearly has a major role in condensate morphology. Perhaps CW confers important heterotypic interactions which contribute to compositional control of the MORC2 condensates, and thus function and morphology? However, due to the complexity of this specific question and the potentially marginal improvement offered by this paper, I do not think this is a critical addition.

      We thank the reviewer for this insightful suggestion. We have noted this possibility in the Discussion as an important avenue for future investigation.

      (3) Why is TCD not tested alone by EMSA for affinity to DNA in Figure 5?

      Our inference regarding the DNA-binding capacity of the TCD domain was based on comparative EMSA analyses. Specifically, we found that the TCD–CC3–IBD fragment was able to bind DNA, whereas the CC3–IBD fragment alone showed no detectable DNA binding. From this comparison, we inferred that the TCD domain is responsible for the observed DNA-binding activity.

      Because the TCD domain does not affect MORC2 condensate formation, it was not a central focus of the present study, which primarily aims to elucidate the mechanisms underlying MORC2 phase separation and its functional relevance. For this reason, we did not further test TCD alone by EMSA in Figure 5.

      Reviewer #2 (Public review):

      Summary:

      The study by Zhang et al. focuses on how phase separation of the chromatin-associated protein MORC2 could regulate gene expression. Their study shows that MORC2 forms dynamic nuclear condensates in cells. In vitro, MORC2 phase separation is driven by dimerization and multivalent interactions involving the C-terminal domain. A key finding is that the intrinsically disordered region (IDR) of MORC2 exhibits strong DNA binding. They report that DNA binding enhances MORC2's phase separation and its ATPase activity, offering new insights into how MORC2 contributes to chromatin organization and gene regulation. The authors try to correlate MORC2's condensate-forming ability with its gene silencing function, but this warrants additional controls and validation. Moreover, they investigate the effect of disease-linked mutations in the N-terminal domain of MORC2 on its ability to form cellular condensates, ATPase activity, and DNA-binding, though the findings appear inconclusive in the manuscript's current form.

      Thank you for your thorough and constructive review of our manuscript. In response to the concerns raised regarding the functional relevance of MORC2 condensate formation, we have redesigned and expanded the experiments presented in Fig. 6 and Fig. S6 to directly link MORC2’s condensate-forming capacity with its transcriptional regulatory function. These new experiments provide additional controls and validation, strengthening the causal relationship between MORC2 condensate dynamics and gene regulation.

      At the current stage, the results for disease-associated mutations are descriptive. While we observed that certain mutations clustered at the N-terminus can affect MORC2 condensate formation, ATPase activity, and DNA binding, we did not identify a mechanistic explanation for these correlations. Notably, the T424R mutation, previously reported to significantly enhance ATPase activity [4], also increased both intracellular condensate formation and in vitro DNA binding in our experiments. In contrast, other mutations did not show such consistent effects. Previous studies have established that MORC2’s ATP-binding and DNA-binding activities are independent [4]. Our results further suggest that MORC2’s phase separation behavior is also independent of both ATP and DNA binding, although existing evidence hints at potential cross-regulatory interactions among these three functions.

      Strengths:

      The authors determined a 3.1 Å resolution crystal structure of the dimeric coiled-coil 3 (CC3) domain of MORC2, revealing a hydrophobic interface that stabilizes dimer formation. They present extensive evidence that MORC2 undergoes liquid-liquid phase separation (LLPS) across multiple contexts, including in vitro, in cellulo, and in vivo. Through systematic cellular screening, they identified the C-terminal domain of MORC2 as a key driver of condensate formation. Biophysical and biochemical analyses further show that the IDR within the C-terminal domain interacts with the C-terminal end region (IBD) and also exhibits strong DNA-binding capacity, both of which promote MORC2 phase separation. Together, this study emphasizes that interactions mediated by multiple domains (CC3, IDR, and IBD) drive MORC2 phase separation. Finally, the authors quantified the effect of removing the CC3 on the upregulation and downregulation of target gene expression.

      We thank the reviewer for their appreciation of the key findings presented in this manuscript.

      Weaknesses:

      Though the findings appear compelling in isolation, the study lacks discussion on how its findings compare with previous studies. Particularly in the context of MORC2-DNA binding, there are previous studies extensively exploring MORC2-DNA binding (Tan, W., Park, J., Venugopal, H. et al. Nat Commun 2025), and its effect on ATPase activity (ref 22). The contradictory results in ref 22 about the impact of DNA-binding on ATPase activity, and ATPase activity on transcriptional repression, warrant proper discussion. The authors performed extensive in-cellulo screening for the investigation of domain contribution in MORC2 condensate formation, but the study does not consider/discuss the possibility of some indirect contributions from the complex cellular environment. Alternatively, the domain-specific contributions could be quantified in vitro by comparing phase diagrams for their variants. While the basis of this study is to investigate the mechanism of MORC2 condensate-mediated gene silencing, the findings in Figure 6 appear incomplete because the CC3 deletion not only affects phase separation of MORC2 but also dimerization. Furthermore, their investigation on disease-linked MORC2 mutations appears very preliminary and inconclusive because there are no obvious trends from the data. Overall, the discussion appears weak as it is missing references to previous studies and, most importantly, how their findings compare to others'.

      We thank the reviewer for their careful assessment of MORC2’s DNA-binding properties and its relationship with ATPase and transcriptional activities. We would like to offer the following clarifications to address these concerns, which will also be incorporated into the Discussion section of the revised manuscript.

      First, recent work by Tan et al. [5] similarly identified multiple DNA-binding sites in MORC2, consistent with our findings, though there are discrepancies in the precise binding regions. In particular, they reported that isolated CC1 and CC2 domains do not bind 60 bp dsDNA, which contrasts with our observations. We attribute this difference to the types of DNA used in the assays. In our study, we employed 601 DNA, a defined nucleosome-positioning sequence, which differs substantially from randomly designed short dsDNA. For instance, prior work by Christopher H. Douse et al. [54] also confirmed that MORC2’s CC1 domain can bind 601 DNA.

      Second, in the study by Fendler et al. [2], DNA binding was reported to reduce MORC2’s ATPase activity, an observation that appears inconsistent with the results presented in our Fig. 5j. A critical distinction between the two studies lies in the experimental systems used: Fendler et al. [2] employed truncated MORC2 constructs and 35 bp double-stranded DNA (dsDNA), whereas our experiments utilized full-length MORC2 and 601 DNA (a sequence with high nucleosome assembly potential). These differences, including the absence of potentially regulatory C-terminal regions in the truncated construct and the varying length and structural properties of the DNA substrates, introduce variables that substantially complicate direct comparison of ATPase activity outcomes.

      Separately, Douse et al. [4] demonstrated that the efficiency of HUSH complex-dependent epigenetic silencing decreases as MORC2’s ATP hydrolysis rate increases, implying an inverse relationship between ATPase activity and silencing function. Notably, our current work has not established a direct mechanistic link between MORC2 phase separation and its ATPase activity. Thus, we refrain from inferring that the effect of MORC2 phase separation on transcriptional repression is mediated through modulation of its ATPase function; this remains an important question to address in future studies.

      Finally, we have redesigned and expanded the experiments presented in Fig. 6 and Fig. S6 to directly link MORC2’s condensate-forming capacity with its transcriptional regulatory function.

      Reviewer #2 (Recommendations for the authors):

      Major concerns:

      (1) Unaddressed discrepancies with the previous study:

      (a) Inadequate discussion of Reference 22 and apparent contradictions. Notably, Reference 22 provides evidence for reduced ATPase activity upon DNA binding, in contrast to the current study's observations. Moreover, Reference 22 demonstrates that ATP hydrolysis (ATPase activity) is inversely associated with MORC2-mediated gene silencing, whereas this study concludes that 'the silencing function of MORC2 requires its ATPase activity'. These apparent contradictions warrant a more thorough discussion to reconcile the differences, including potential mechanistic explanations and experimental context that could account for the discrepancies. Additionally, the authors should discuss potential reasons why Ref. 22 may not have observed phase separation during MORC2 biophysical analysis. For instance, in Ref. 22, SEC-MALS was performed at 2 mg/mL (~16 µM) MORC2 FL in the presence of 150 mM NaCl, conditions that could influence phase behavior based on the current manuscript's results. Addressing whether differences in protein construct, buffer composition, or experimental design might account for this discrepancy would strengthen the discussion.

      We thank the reviewer for pointing out the apparent discrepancies between our results and those reported in Ref. 22. We agree that these differences warrant explicit discussion, and we have revised the Discussion accordingly to clarify the experimental and conceptual distinctions between the two studies.

      First, regarding the effect of DNA binding on ATPase activity, Ref. 22 examined MORC2 ATPase activity under conditions where MORC2 does not undergo detectable phase separation, whereas our ATPase assays were performed under conditions in which MORC2 readily forms condensates in the presence of DNA. We therefore propose that the observed increase in ATPase activity in our study may reflect a distinct biochemical regime in which phase separation and/or high local protein concentration modulates enzymatic activity. Importantly, our data do not exclude the possibility that DNA binding per se can inhibit ATPase activity under non-condensing conditions, as reported in Ref. 22.

      Second, with respect to transcriptional repression, Ref. 22 reported an inverse correlation between ATP hydrolysis and MORC2-mediated silencing, whereas our study finds that ATPase activity is required for efficient repression. We suggest that these observations are not necessarily contradictory but may reflect different regulatory layers of MORC2 function. Specifically, ATP binding and hydrolysis may be required for MORC2 structural remodeling and chromatin engagement, while excessive or dysregulated ATP hydrolysis could impair stable silencing complexes, as suggested previously [4]. We now explicitly discuss this possibility in the revised manuscript.

      Finally, we appreciate the reviewer’s suggestion regarding the absence of phase separation in Ref. 22. Indeed, SEC-MALS experiments in Ref. 22 were conducted at ~16 µM MORC2 in the presence of 150 mM NaCl (the purification condition is 500 mM NaCl, 10% glycerol), conditions that, based on our phase diagrams, are close to or above the saturation concentration but are also strongly influenced by ionic strength. This combination of factors may explain why the UV peak from SEC-MALS is not indicative of a homogeneous sample [3].
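
      For readers comparing the concentrations quoted in the two studies, the conversion between mass and molar concentration is sketched below. This is only an illustrative helper with a hypothetical name. The molecular weight of about 125 kDa is not stated anywhere in the exchange; it is back-calculated from the reviewer's own figures (2 mg/mL corresponding to roughly 16 µM) and should be treated as an assumption.

```python
def mg_per_ml_to_micromolar(conc_mg_per_ml, mw_kda):
    """Convert a protein mass concentration (mg/mL) to micromolar.

    mg/mL is numerically equal to g/L, and 1 kDa corresponds to 1000 g/mol,
    so molarity (mol/L) = (g/L) / (g/mol).
    """
    if mw_kda <= 0:
        raise ValueError("molecular weight must be positive")
    molar = conc_mg_per_ml / (mw_kda * 1000.0)  # mol/L
    return molar * 1e6  # micromolar


# Assumed molecular weight, implied by the reviewer's 2 mg/mL ~ 16 uM figure.
ASSUMED_MORC2_MW_KDA = 125.0
```

      Calling `mg_per_ml_to_micromolar(2.0, ASSUMED_MORC2_MW_KDA)` returns approximately 16, matching the value cited by the reviewer; with a different construct or tag the molecular weight, and therefore the molar concentration, would shift accordingly.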

      (b) The DNA binding capacity of individual MORC2 domains was tested in Fig. 5. IDR appears to be the strongest DNA binder among them. Is this the effect of IDR being isolated from the rest of the protein? A recent paper (Tan, W., Park, J., Venugopal, H. et al. Nat Commun 2025) also investigated DNA binding capacity of different regions of MORC2 using hydrogen-deuterium exchange experiments and EMSA. Interestingly, it can be seen in Figure S9 that the DNA binding capacity of different regions differs when they are tested together versus in isolation (MORC2 1-603 vs 1-265; 1-495; 496-603). In line with the above, MORC2 IDR's interaction with DNA warrants additional investigation, taking the system as a whole to avoid misinterpretation arising from non-specific interactions.

      We appreciate the reviewer’s insightful comments regarding domain-specific DNA binding and the potential caveats of studying isolated regions. In Figure 5, our EMSA analyses show that the isolated IDR exhibits the strongest DNA-binding signal among the tested fragments. We agree that this observation may, at least in part, reflect the removal of structural or regulatory constraints imposed by the full-length protein.

      Consistent with the reviewer’s point, Tan et al. [5] demonstrated that DNA-binding behavior of MORC2 regions differs when analyzed in isolation versus in the context of larger constructs. We have now incorporated this comparison into the Discussion and explicitly note that DNA binding by the IDR should be interpreted as a contextual and potentially cooperative property rather than an autonomous function.

      Importantly, our conclusions do not rely on the IDR acting as an independent DNA-binding module in vivo. Rather, we propose that the IDR contributes to DNA engagement and phase behavior within the architectural framework of full-length MORC2. We now emphasize this limitation and highlight the need for future studies that probe DNA binding in the context of intact MORC2 or minimally perturbed constructs.

      (2) MORC2 DNA binding impacting phase separation and ATPase activity:

      While it is clear that MORC2:DNA interaction facilitates MORC2 phase separation, the impact on ATPase activity is not conclusive. First, they observe an opposite trend (compared to ref. 22) for DNA binding on MORC2's ATPase activity. Secondly, it is not clear if the increase in ATPase activity is mediated by DNA binding or phase separation. The ATPase activity was measured at 1 µM MORC2 protein concentration in the presence of DNA, where MORC2 appears to phase separate. To draw more definitive conclusions, additional controls are necessary. Specifically, a phase separation-deficient mutant (from this study) and a DNA-binding-deficient mutant (see ref. 22) should be included to disentangle the contributions of DNA binding and phase separation to ATPase activity. The choice of ATP-binding-deficient mutant N39A as a negative control seems inconclusive in this regard. Additionally, why is there an increase in ATP hydrolysis rate for the ATP-binding-deficient mutant in the presence of DNA, resulting in ATP hydrolysis rates similar to WT MORC2? This raises further questions about the underlying mechanism.

      We agree with the reviewer that disentangling the contributions of DNA binding and phase separation to ATPase activity is challenging and that our current data do not fully resolve this issue. As noted, ATPase assays were performed at protein concentrations (1 µM) where MORC2 undergoes DNA-induced phase separation, making it difficult to distinguish whether enhanced ATP hydrolysis arises directly from DNA binding or indirectly from condensate formation.

      We acknowledge that inclusion of additional mutants, such as phase-separation-deficient or DNA-binding-deficient variants, would provide a more definitive mechanistic separation of these effects. However, generating and validating such mutants in a manner that preserves overall protein integrity is beyond the scope of the current study. Accordingly, we have revised the text to present our findings more cautiously and to frame the observed ATPase enhancement as a correlation rather than a causal mechanism.

      Regarding the ATP-binding–deficient N39A mutant, we agree that its behavior in the presence of DNA raises interesting mechanistic questions. We now explicitly note this unexpected observation and discuss possible explanations, including partial ATP binding, altered oligomeric states, or indirect effects mediated by condensate formation.

      (3) Dissecting the domain-specific contribution in MORC2 phase separation:

      (a) While in cellulo data indicate that the presence of IDR, NLS, CC3, and IBD is all essential for MORC2 condensate formation, it is not clear if this is the effect of the complex cellular environment or whether it is intrinsic for MORC2 phase separation ability. In lines 256-259, the authors suggest IDRa interaction with IBD may serve as a nucleation mechanism for LLPS. In other places, it has been mentioned that CC3 dimerization acts as a scaffold for condensate formation. It is not clear if all of these are essential for MORC2 phase separation, or one of them is essential while the other domain(s) facilitates the phase separation. Though Figure 3 provides a qualitative overview of the contribution of different regions in MORC2 phase separation in cellulo (influenced by the complex cellular environment and substrate interactions), the absolute domain contribution in phase separation would be better studied in vitro by quantitatively comparing phase diagrams (for example, c-sat vs temperature) of different domain deletion constructs.

      We thank the reviewer for highlighting the distinction between intrinsic phase separation propensity and effects that depend on cellular context. Our in cellulo screening was designed to identify regions required for condensate formation under physiological conditions, where chromatin, binding partners, and macromolecular crowding are present. We agree that this approach does not directly quantify the intrinsic phase separation contribution of individual domains.

      While CC3 dimerization, IDR–IBD interactions, and nuclear localization all contribute to condensate formation, our data do not imply that these elements are mechanistically equivalent. Rather, we propose that CC3 provides a structural scaffold, while IDR-mediated interactions lower the energetic barrier for condensation. We have revised the manuscript to clarify this hierarchical model and to avoid implying that all domains contribute equally or independently.

      We agree that quantitative in vitro phase diagrams would provide valuable insight into intrinsic domain contributions. Whereas the MORC2ΔCC3-IBD (1–900) and CC3-IBD (900–1032) fragments each fail to undergo phase separation, mixing the IDR with the CC3–IBD fragment drives robust phase separation; moreover, phase separation is entirely abrogated in the absence of domain–domain interactions. Collectively, these observations indicate that phase separation is contingent on specific domain combinations and their interactions.

      (b) Similarly, for line 228-231: 'Notably, condensates formed exclusively in the nucleus and not in the cytoplasm of transfected HeLa cells, suggesting that chromatin-associated nuclear factors, such as DNA, may contribute to the nucleation or stabilization of MORC2 condensates.' This is an important observation made by the authors. Since MORC2 readily phase separates in vitro under physiological conditions, it is important to discuss why MORC2 does not make condensates in the cytoplasm (in the case of MORC2deltaNLS). In this regard, how does the concentration of overexpressed EGFP-MORC2 constructs compare with in vitro tested droplets of MORC2?

      We thank the reviewer for highlighting this important conceptual point. Although MORC2 readily undergoes phase separation in vitro under physiological buffer conditions, the absence of condensate formation in the cytoplasm of cells expressing MORC2ΔNLS underscores the importance of the nuclear environment in promoting MORC2 assembly.

      The cytoplasm differs fundamentally from the nucleus not only in overall molecular composition but also in the availability of high-valency scaffolds such as chromatin. We propose that chromatin-associated components, particularly DNA, provide a platform that locally concentrates MORC2 and increases its effective valency, thereby facilitating nucleation or stabilization of condensates in the nucleus. In contrast, the cytoplasm lacks such scaffolds, even when MORC2 is expressed at appreciable levels. In cultured cells, MORC2 is seldom observed in the cytoplasm. While specific experimental contexts may facilitate its cytoplasmic localization, such observations are rarely reported [6]. In transfection-based systems, MORC2 predominantly displays droplet-like behavior in the nucleus. Notably, in endogenous EGFP–MORC2 chimeric mice, we detected punctate MORC2 structures in the neuronal cytoplasm of the brain and spinal cord. The functional significance and biophysical state of cytoplasmic MORC2 remain largely unexplored.

      With respect to protein concentration, while EGFP-MORC2 is robustly expressed in cells, direct comparison between cellular expression levels and the protein concentrations used in vitro is inherently challenging. Importantly, in vitro phase separation is driven by bulk protein concentration under defined conditions, whereas in cells, effective local concentration and interaction valency are strongly shaped by spatial confinement and chromatin association. We have revised the manuscript text to emphasize this distinction and to avoid interpreting nuclear specificity as a purely concentration-dependent phenomenon.

      (c) Lines 227-228: '... CW domain restricts condensate overgrowth or fusion', this inference is based on CTDdeltaCW puncta being larger in size (Figure 3a). However, in Figure 4h MORC2deltaIDRb and MORC2deltaIDRc also result in larger puncta. Making a final conclusion that the CW domain restricts condensate overgrowth or fusion warrants additional investigation.

      We thank the reviewer for pointing out the limitation of our original conclusion. We agree that, because enlarged puncta are seen both for CTDΔCW (Figure 3a) and for MORC2ΔIDRb and MORC2ΔIDRc (Figure 4h), our inference that the CW domain regulates condensate size was insufficiently rigorous.

      Re-analysis of existing data identifies clear phenotypic disparities between the mutants: MORC2ΔIDRb/ΔIDRc mutants show two distinct phenotypes (reduced puncta number with enlarged size, or unchanged puncta number with uniform enlargement), and their total puncta area per cell is comparable to the WT. By contrast, CTDΔCW mutants display markedly larger puncta relative to the WT. Based on this distinction, we have revised our conclusion to a more cautious formulation: "These observations suggest that the CW domain may participate in regulating initial nucleation size and the exact molecular mechanisms require further investigation."

      (4) MORC2 condensate-mediated gene silencing:

      This is one of the key investigations of this study where the authors evaluate the ability of MORC2 condensates to regulate gene silencing (transcriptional repression). The major concern here is that the authors are drawing their conclusion based on a CC3 domain deletion mutant of MORC2 and comparing it with wild-type MORC2. Notably, the CC3 domain is responsible for MORC2 dimerization, and as the authors quote, 'The dimeric assembly of CC3 is essential for maintaining the structural integrity of the protein', the absence of CC3 would have a direct impact on its function (such as ATPase activity). With these considerations, it is not clear whether the effect of CC3 domain deletion on gene regulation is an effect of no phase separation or a consequence of loss of function. This necessitates additional validation by including other controls, such as IBD domain deletion mutant, IDRa domain deletion mutant, where the phase separation is impeded without affecting dimerization.

      We appreciate the reviewer’s concern regarding the interpretation of CC3 deletion experiments. We agree that CC3 deletion affects both dimerization and phase separation, complicating attribution of gene regulatory effects solely to condensate formation. Our intention was not to claim that loss of repression arises exclusively from impaired phase separation, but rather to demonstrate that disrupting the capacity to form dynamic condensates correlates with impaired silencing.

      To directly address these concerns, we have performed a series of new experiments specifically designed to decouple condensate formation, condensate dynamics, and protein abundance, thereby allowing us to more rigorously interrogate the functional relevance of MORC2 condensates.

      First, to overcome the limitation of domain deletions, which may affect MORC2 function beyond phase separation, we introduced a micropeptide-based kill switch (KS) at the C terminus of MORC2. This strategy has recently emerged as a powerful approach to selectively reduce condensate dynamics without disrupting protein expression, folding, or domain architecture [1]. Importantly, unlike the CC3 or IDRa deletions, MORC2+KS robustly forms nuclear condensates but exhibits markedly reduced internal dynamics, as demonstrated by FRAP analyses showing minimal fluorescence recovery after photobleaching (Fig. 6a-c). This strategy therefore allows us to perturb condensate material properties independently of MORC2 domain integrity.

      Second, we systematically compared the transcriptional consequences of rescuing MORC2-knockout HeLa cells with MORC2FL, condensation-deficient mutants (ΔCC3 and ΔIDRa), and the dynamics-defective MORC2+KS (Fig. 6d). Despite being expressed at substantially higher levels than MORC2FL (Fig. 6e), all three mutants showed a striking and consistent failure to restore MORC2-dependent transcriptional regulation (Fig. 6f-h). This effect was particularly pronounced for transcriptionally repressed genes, including two sets of high-confidence MORC2 targets reported in prior studies (Fig. 6i and Fig. S10). These findings demonstrate that neither increased protein abundance nor the mere presence of condensate-like structures alone is sufficient to restore MORC2 function.

      Third, our data instead support a model in which both soluble MORC2 complexes and dynamic MORC2 condensates are required for full transcriptional activity. While soluble MORC2 is likely involved in target recognition and complex assembly, our results indicate that proper condensate formation and, critically, condensate dynamics are essential for effective transcriptional repression and activation. The inability of the MORC2+KS mutant to rescue transcriptional defects, despite intact condensate formation, argues against a model in which MORC2 condensates are merely microscopically visible byproducts of MORC2 activity.

      We believe these new data strengthen the manuscript by pairing the detailed mechanistic dissection of MORC2 phase separation with direct functional evidence, enhancing the conceptual impact and biological significance of the study.

      (5) Uncertain impact of pathogenic MORC2 mutations:

      Line 356-365: While the statements such as "disease-associated mutations primarily affect enzymatic and phase behaviors rather than DNA affinity" and "these findings provide mechanistic insight into how specific mutations may contribute to distinct pathological outcomes" are conceptually compelling, the data presented in Figure 7b-d do not appear to fully support these conclusions. For many of the mutants, the differences from WT across key parameters (condensation, ATPase activity, and DNA binding) are either modest or statistically insignificant. As such, drawing a unified mechanistic conclusion from these datasets may overstate what the data actually support.

      We agree that the effects of disease-associated MORC2 mutations described in Fig. 7 are modest and, in some cases, statistically insignificant. Our intention was to document observable trends rather than to propose a unified mechanistic framework. We have revised the manuscript to temper these conclusions and to emphasize the descriptive nature of these data.

      (6) Important conceptual clarifications:

      (a) Intrinsically disordered regions (IDRs) are not synonymous with phase separation. As the authors show, it is a combination of IDR-mediated interactions and CC3 dimerization that contributes towards the phase separation of MORC2. While IDRs can act as scaffolds for multivalent weak interactions that may promote biomolecular condensate formation, many IDRs serve other roles, such as mediating transient interactions, signaling, or regulatory functions, without undergoing phase separation. Researchers should avoid generalizing the assumption that the mere presence of IDRs in a protein implies its ability for phase separation. In this regard, authors should consider restructuring some of their generalized statements: Line 87-88: 'Recent studies suggest that intrinsically disordered regions (IDRs) can drive liquid-liquid phase separation (LLPS)' and Line 159-161: 'we noticed a long unstructured region at its C-terminus (Fig. S1b), a characteristic often associated with proteins capable of phase separation'.

      We agree that IDRs are not synonymous with phase separation and have revised the Introduction to avoid generalized statements. The revised text now emphasizes that IDRs can contribute to phase separation in a context-dependent manner and act in concert with structured oligomerization domains such as CC3-IBD.

      (b) Liquid-liquid phase separation: I would suggest switching the phrase to just phase separation. The rationale is that the in vitro studies of MORC2 (FRAP, droplet imaging) do not show liquid-like behavior, but perhaps liquid-solid. The FRAP studies suggest liquid-like behavior for some of the constructs. Given the differences in viscoelastic properties across the in vitro and in cellulo studies, it is better to generalize to "phase separation". Movies for droplet fusion and FRAP, wherever applicable, would be much appreciated. As the nature of in vitro MORC2 droplets appears different than in cells, movie representations of the above would enable readers to better assess the viscoelastic nature of the droplets (whether liquid, gel, etc).

      We appreciate the reviewer’s insight regarding the viscoelastic properties of MORC2. Our experimental data indeed show a disparity in dynamics between the two environments: while in vitro MORC2-FL condensates exhibit relatively low internal mobility, the in cellulo MORC2-FL puncta display high dynamics, characterized by rapid internal recovery in FRAP assays and droplet fusion events (Fig. S2f).

      This contrast suggests that the intracellular microenvironment plays a critical role in regulating the material state of MORC2 condensates. Consequently, we have focused on providing in vivo fusion data, as we believe in vitro characterizations (such as fusion or FRAP under various artificial conditions) may not faithfully represent the physiological behavior of MORC2. We have revised the manuscript to use the more general term “phase separation” or “condensation” and have added a discussion on these limitations to avoid overinterpreting the material properties observed in vitro.

      (7) Methods:

      (a) Figure 6 S2b: If phase separation occurs at, say, 1.8 µM protein concentration, this indicates that the protein has reached its saturation concentration (c-sat). Beyond c-sat, any additional protein should partition into the dense phase, while the concentration of the dilute phase remains constant. However, in this figure, the dilute phase concentration appears to increase with increasing total protein concentration, which is inconsistent with expected phase separation behavior. As the methods section does not have any sub-section for the sedimentation assay, it becomes difficult to understand how this experiment was performed, whether there is any technical discrepancy in the way soluble and pellet fractions were handled and processed for loading onto the gels. This is also the case with Figure 3d.

      We thank the reviewer for carefully examining the sedimentation assay and for raising this important conceptual point. We agree that, for an ideal two-phase system at thermodynamic equilibrium, the concentration of the dilute phase is expected to remain constant once the saturation concentration (c-sat) is reached.

      In our study, the sedimentation assay was used as an operational readout to assess concentration-dependent partitioning rather than to quantitatively define equilibrium phase boundaries. The assay involves centrifugation-based separation of supernatant and pellet fractions followed by SDS–PAGE analysis, and therefore does not necessarily report the equilibrium concentrations of coexisting dilute and dense phases. In particular, this approach can be influenced by incomplete physical separation of phases, kinetic trapping, and redistribution of material during handling, especially in systems where condensate maturation or internal reorganization occurs on longer timescales.

      Consequently, the apparent increase in the supernatant fraction with increasing total protein concentration likely stems from kinetic limitations and inherent technical constraints of the sedimentation assay, rather than a genuine deviation from classical phase separation behavior. These caveats are now explicitly stated in the Methods section, and similar limitations of centrifugation-based assays for defining the equilibrium phase behavior of biomolecular condensates have been reported previously.
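
      For reference, the ideal behavior against which the assay is being compared follows from simple mass balance (the lever rule). The sketch below is illustrative only: it assumes a two-phase system at equilibrium, the function name `ideal_partition` and the dense-phase concentration of 1000 µM are hypothetical choices, and 1.8 µM is the reviewer's example value for c-sat rather than a measured one.

```python
def ideal_partition(c_total, c_sat, c_dense):
    """Ideal two-phase mass balance (lever rule).

    All concentrations share one unit (e.g. µM). Returns a tuple
    (dilute_phase_concentration, dense_phase_volume_fraction).
    Assumes thermodynamic equilibrium and complete phase separation,
    which a centrifugation-based assay may not satisfy.
    """
    if not 0 < c_sat < c_dense:
        raise ValueError("require 0 < c_sat < c_dense")
    if not 0 <= c_total < c_dense:
        raise ValueError("require 0 <= c_total < c_dense")
    if c_total <= c_sat:
        # Below saturation: a single phase, nothing partitions out.
        return c_total, 0.0
    # Above saturation: the dilute phase is pinned at c_sat and the
    # excess protein is accommodated by a growing dense-phase volume.
    phi = (c_total - c_sat) / (c_dense - c_sat)
    return c_sat, phi


if __name__ == "__main__":
    C_SAT = 1.8       # µM, the reviewer's illustrative value
    C_DENSE = 1000.0  # µM, hypothetical dense-phase concentration
    for c_total in (0.5, 1.0, 1.8, 2.5, 5.0, 10.0):
        dilute, phi = ideal_partition(c_total, C_SAT, C_DENSE)
        print(f"total={c_total:5.1f}  dilute={dilute:4.2f}  dense volume fraction={phi:.5f}")
```

      In this idealized picture the dilute-phase concentration plateaus at c-sat, so a supernatant signal that keeps rising with total protein points to incomplete separation or non-equilibrium effects of the kind described above.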

      (b) Figure 4: The NMR comparisons appear to be primarily qualitative, lacking quantitative analyses such as chemical shift perturbation (CSP) and intensity ratio plots, which would offer deeper mechanistic insights. The NMR spectra detailing interactions among the IDR domains need to be quantified.

      We thank the reviewer for the suggestion. We have now performed quantitative CSP analyses for the NMR data shown in Fig. 4, and the corresponding CSP plots have been added to the revised manuscript (Fig. S7).

      As expected for interactions mediated by intrinsically disordered regions involved in phase separation, the observed CSPs are generally small. Notably, the CSP profile of IDRa closely matches that observed for the full-length IDR, whereas IDRb and IDRc show minimal perturbations. These results indicate that the interaction is primarily mediated by IDRa, with little contribution from the remaining regions.

      Peak intensities were also analyzed but did not reveal additional residue-specific trends. Together, the quantitative CSP data support our conclusion that the interaction is weak, dynamic, and region-specific, consistent with an IDR-driven, phase-separation-related mechanism. We have added this statement to the Methods: CSPs were calculated in Hz at 600 MHz using the following equation:

      Minor comments:

      (1) Line 59-60: The Authors mention the HUSH-complex and then the MORC protein family, but do not discuss the relation between the two.

      We thank the reviewer for this comment. We have revised the Introduction to explicitly state that MORC2 may serve as a component of the HUSH complex and to clarify the functional relationship between MORC family proteins and HUSH-mediated transcriptional repression.

      (2) Line 74: 'Despite their structural similarities...', similarities between what all?

      We agree that this statement was ambiguous. We have revised the text to explicitly specify that the comparison refers to structural similarities among MORC family members.

      (3) Line 75: 'MORC-mediated repression remains...', this is the first time the word 'repression' is mentioned in the text and directly as an outstanding question.

      We have revised the Introduction to introduce the concept of transcriptional repression earlier and to provide appropriate context before posing it as an outstanding question.

      (4) The third paragraph does address issues in comments 1 and 3 to some extent, but the introduction needs some restructuring to provide a proper flow of information.

      We agree that the Introduction required restructuring. We have revised this section to improve logical flow, better integrate prior studies, and more clearly articulate the motivation and scope of the present work.

      (5) Line 83-85: How does the presence of IDRs suggest potential regulatory mechanisms?

      We have revised this sentence to clarify that IDRs may contribute to regulatory mechanisms by enabling multivalent and dynamic interactions, rather than implying that IDRs inherently confer regulatory function or phase separation capability.

      (6) Line 106-107: 'To determine whether MORC2 has N- and C-terminal dimerization interfaces similar to those...', reference 14 has already established that CC3 (denoted as CC4 in ref 14) is responsible for dimerization. Consider acknowledging their work in this regard?

      We thank the reviewer for this reminder. We have now explicitly acknowledged Ref. 14, which previously established the role of CC3 (denoted CC4 in that study) in MORC2 dimerization.

      (7) Lines 117-122: Are the authors comparing morphology from negative stain EM with AlphaFold predicted structure (Figure S1a and S1b)? If so, providing a zoomed-in inset from Figure S1a would be helpful.

      Yes, the comparison was intended to relate the negative-stain EM morphology to the AlphaFold-predicted architecture. We have added a zoomed-in inset in Fig. S1a to facilitate clearer comparison.

      (8) Line 152-153: '...even under varying physiological conditions', what are these varying conditions? Are the authors trying to point towards any of their specific results?

      We have revised this phrase to explicitly refer to variations in salt concentration and protein concentration tested in our in vitro assays.

      (9) Line 154-155: 'The dimeric assembly of CC3 is essential for maintaining the structural integrity of the protein', if it has been established, then please provide a reference.

      We thank the reviewer for this suggestion. For MORC family proteins, C-terminal coiled-coil–mediated dimerization is necessary for correct homodimer formation and functional stability (Xie et al., 2019, Cell Commun Signal. 17:160, Ref 14 in the revised manuscript).

      (10) Line 159-161: 'we noticed a long unstructured region at its C-terminus (Figure S1b), a characteristic often associated with proteins capable of phase separation25.', again authors are generalizing a statement which is, in most cases, context-dependent. For example, ref 25 mentions that unstructured regions or IDRs serve as a scaffold for multivalent interactions.

      We agree with the reviewer and have revised this sentence to avoid generalization. The revised text now emphasizes that IDRs may facilitate multivalent interactions in a context-dependent manner, rather than being intrinsically indicative of phase separation. Additionally, we have explicitly cited the mechanistic insight from Reference 25 that IDRs serve as scaffolds for multivalent interactions, to strengthen the logical link between the structural feature and its potential functional relevance.

      (11) Methods section for NMR (Line 665-667) mentions that nucleotides were added to a final concentration of 10 mM. There is no figure or section for MORC2 NMR with added nucleotides/DNA.

      We thank the reviewer for pointing this out. The nucleotide (ATP) addition was part of preliminary NMR trials and is not directly associated with the figures presented. We have deleted this in the Methods section to avoid confusion.

      (12) Line 285-294: Authors compare the effect of DNA binding on the phase separation of both MORC2FL and MORC2 CTDdeltaCW and conclude that DNA-induced condensation is primarily mediated through interactions with the IDR-NLS region. This appears not to be backed by proper control experiments. The authors do not show whether DNA binding mediates any phase separation for the isolated NTD or not? Similarly, what is the effect of DNA binding on MORC2 deltaIDR?

      We thank the reviewer for this insightful comment and agree that additional controls are essential for rigorously dissecting the contribution of DNA binding to MORC2 phase separation. Our interpretation that DNA-enhanced condensation is primarily mediated through the IDR–NLS region was based on comparative analyses of MORC2FL and MORC2 CTDΔCW, together with EMSA results demonstrating that DNA binding activity is conferred by the IDR–NLS–containing region. We acknowledge, however, that DNA binding alone is not sufficient to infer phase separation behavior.

      To address this point, we have performed additional analyses using the isolated NTD’ (residues 1–536) and MORC2 ΔIDR–NLS mutants (Fig. S6). The isolated NTD’ exhibited detectable DNA binding [4] but did not undergo DNA-induced condensation under conditions in which MORC2FL and MORC2 CTDΔCW (residues 537–1032) readily formed condensates, indicating that DNA binding by itself is insufficient to drive phase separation. In parallel, MORC2 ΔIDR–NLS mutants showed severely compromised solubility and stability in vitro, which limited their quantitative characterization in phase separation assays. Nevertheless, under the conditions tested, these mutants did not display DNA-enhanced condensation comparable to MORC2FL.

      Taken together, these observations support a model in which the IDR–NLS region plays a critical role in coupling DNA binding to condensation, while additional domains are required to sustain robust phase separation. We have revised the manuscript text to clarify the experimental scope and to avoid overinterpreting the contribution of DNA binding in the absence of fully reconstituted control systems.

      (13) How did the authors assign the backbone amide NMR chemical shifts for MORC2?

      Backbone assignments of MORC2 IBD (1004-1032) were obtained using SOFAST versions of standard triple-resonance experiments, including HNCACB and CBCACONH, recorded at 298 K. Residual assignment ambiguities were resolved using <sup>15</sup>N-edited HMQC-NOESY-HMQC spectra.

      (14) Line 256: 'The partial compaction of IDRa...', what does the author mean here with 'partial compaction'? How did they measure compaction here?

      Regarding the term “partial compaction”, we apologize for the typographical error: this phrase was erroneously used in place of “key component”.

      (15) Line 312-315: Why is there even a MORC2 readout for MORC2 KO cells with only EGFP? Also, the authors suggest that IDR deletion may impair mRNA stability or transcription; however, the expression levels of MORC2 deltaIDR and MORC2 deltaCC3 do not appear drastically different in Figure 3a.

      We thank the reviewer for raising these points. The apparent MORC2 signal in MORC2 knockout cells transfected with EGFP alone is due to the presence of residual MORC2 mRNA. Although CRISPR–Cas9–mediated knockout introduces a frameshift that prevents MORC2 protein expression, the mRNA can still be detected by RNA-seq. This is because nonsense-mediated decay (NMD), which targets transcripts with premature stop codons for degradation, is not always 100% efficient. Therefore, some MORC2 transcripts remain and produce detectable RNA-seq reads, even though no functional protein is expressed.

      Regarding the apparent discrepancy in expression levels, Fig. 3a displays only EGFP-positive cells, within which the fluorescence intensity of MORC2ΔIDR and MORC2ΔCC3 appears comparable to that of WT MORC2. However, the overall fraction of EGFP-positive cells is markedly reduced for these mutants compared to WT. Thus, while expression levels among successfully transfected cells are similar, fewer cells express detectable levels of the ΔIDR or ΔCC3 constructs across the total population. We therefore interpret this reduction in EGFP-positive cell fraction as reflecting impaired expression efficiency of these mutants, potentially arising from altered transcriptional output, mRNA stability, or protein stability. We have revised the manuscript text to clarify this distinction and to avoid overinterpreting the underlying mechanism in the absence of direct measurements.

      Author response image 1.

      EGFP, EGFP–MORC2 (FL), EGFP–MORC2 (ΔCC3), and EGFP–MORC2 (ΔIDR) were re-expressed in MORC2-knockout HeLa cells. Confocal imaging revealed that full-length MORC2 formed condensates in the nucleus, whereas mutants lacking either the CC3 or IDR domain failed to exhibit such behavior. Notably, under identical experimental conditions, we observed a marked reduction in the transfection efficiency of the EGFP-MORC2 (ΔIDR) construct. In contrast to the other variants, EGFP signals for ΔIDR were detectable in only a small fraction of the total cell population, despite consistent DNA loading and protocol synchronization. This observation suggests that the IDR might be required not only for biomolecular condensation but also for maintaining the steady-state levels of the MORC2 mRNA/protein or overall cellular fitness.

      (16) Line 330: 'MORC2 deltaCC3 failed to repress any of the 18 downregulated targets...'. This does not appear to be entirely true as repression of some targets (LBH, TGFB2, GADD45A) are closer to MORC2 FL than the EGFP control.

      We thank the reviewer for pointing out this inconsistency and for highlighting the need for precise wording. We have updated the dataset and revised the text to describe the results more accurately. We now describe that the mutants impair MORC2FL-mediated transcriptional regulation, consistent with the overall trend observed across these target genes.

      (17) Line 347-350: Based on the percent of cells with condensates, the authors conclude that CMT2Z-linked E236G and SMA-linked T424R mutants promote MORC2 phase separation. Again, the effect of these mutations on MORC2 condensation in cells may be direct or indirect. This can be investigated by comparing the in vitro effect of these mutations on MORC2 phase separation.

      We thank the reviewer for raising this important point and fully agree that the effects of disease-associated MORC2 mutations on condensate formation in cells may arise from either direct alteration in intrinsic phase separation propensity or indirect influences mediated by the cellular environment.

      In our study, disease-associated MORC2 mutants were assessed for condensate formation in HEK293F cells. Attempts were made to characterize these mutants in vitro; however, the E236G mutant exhibited markedly reduced solubility and stability upon purification, which precluded reliable in vitro phase separation analysis. We therefore evaluated the impact of E236G in cells and found that this mutation significantly impaired the dynamics of nuclear MORC2 condensates. For the T424R mutant, we note that its intracellular condensates displayed FRAP recovery kinetics comparable to those of WT MORC2, suggesting broadly similar dynamic properties of the assemblies formed in cells, but not necessarily implying a direct enhancement of intrinsic phase separation.

      In light of these considerations, we have revised the text in Lines 347–350 to avoid attributing a direct causal role of these mutations in promoting MORC2 phase separation. Instead, we now describe the observed increase in the fraction of cells containing condensates as a descriptive cellular correlation. We further emphasize that systematic in vitro characterization of disease-associated MORC2 mutants will be required to distinguish direct from indirect effects and represents an important direction for future investigation.

      (18) The discussion section lacks referencing to individual figures in the results section as well as previous literature.

      We agree with the reviewer that the Discussion would benefit from clearer integration with both the Results figures and prior literature. In the revised manuscript, we have substantially restructured the Discussion to explicitly reference key figures when interpreting experimental findings and to more clearly distinguish conclusions drawn from specific datasets. In addition, we have expanded citations to previous studies where relevant, particularly in the context of MORC2 DNA binding, ATPase regulation, chromatin association, and disease-linked mutations. These revisions aim to better situate our findings within the existing literature and to guide readers more clearly between experimental observations and their interpretation.

      Reviewer #3 (Public review):

      Summary:

      The manuscript by Zhang et al. demonstrates that MORC2 undergoes liquid-liquid phase separation (LLPS) to form nuclear condensates critical for transcriptional repression. Using a combination of in vitro LLPS assays, cellular studies, NMR spectroscopy, and crystallography, the authors show that a dimeric scaffold formed by CC3 drives phase separation, while multivalent interactions between an intrinsically disordered region (IDR) and a newly defined IDR-binding domain (IBD) further promote condensate formation. Notably, LLPS enhances MORC2 ATPase activity in a DNA-dependent manner and contributes to transcriptional regulation, establishing a functional link between phase separation, DNA binding, and transcriptional control. Overall, the manuscript is well-organized and logically structured, offering mechanistic insights into MORC2 function, and most conclusions are supported by the presented data. Nevertheless, some of the claims are not sufficiently supported by the current data and would benefit from additional evidence to strengthen the conclusions.

      Thank you for your insightful review and constructive suggestions, which have been invaluable in refining our manuscript.

      The following suggestions may help strengthen the manuscript:

      Major comments:

      (1) The central model proposes that multivalent interactions between the IDR and IBD promote MORC2 LLPS. However, the characterization of these interactions is currently limited. It is recommended that the authors perform more systematic analyses to investigate the contribution of these interactions to LLPS, for example, by in vitro assays assessing how the IDR or IBD individually influence MORC2 phase separation.

      We appreciate the reviewer’s insightful comment regarding the characterization of IDR–IBD interactions. In this study, we combined NMR spectroscopy, domain deletion analysis (in vivo), and in vitro phase separation assays to demonstrate that interactions between the IDR and IBD contribute to MORC2 condensate formation. To systematically assess the individual contributions of the IDR and IBD to MORC2 phase separation, we performed in vitro reconstitution assays using purified domain constructs (Fig. S6). Neither the isolated IDR nor the IBD alone exhibited phase separation under buffer conditions approximating the physiological environment, indicating that each domain is individually insufficient to drive condensation. Upon the addition of 10% PEG8000, phase separation was selectively observed for the IDR but not for the IBD, suggesting that the IDR possesses an intrinsic propensity for phase separation that can be enhanced by molecular crowding. Importantly, when the IDR and IBD were mixed, phase separation was robustly induced, supporting a model in which cooperative inter-domain interactions between the IDR and IBD promote MORC2 condensation. In the absence of PEG, no phase separation was observed for the IDR–IBD mixture. These observations imply that IDR–IBD interactions cannot drive phase separation on their own, but require cooperation with CC3-mediated dimerization to achieve this process, which is the central point we wish to emphasize.

      (2) The authors mention that DNA binding can promote MORC2 LLPS. It is recommended that they generate a phase diagram to systematically assess how DNA influences phase separation.

      We agree that constructing a full phase diagram would provide a more systematic evaluation of the effect of DNA on MORC2 phase separation. In the current study, we assessed DNA-dependent condensation across multiple protein and DNA concentrations, which consistently showed that DNA enhances MORC2 phase separation. At low protein concentration (0.5 µM), phase separation requires sufficient DNA, whereas increasing either DNA or protein concentration promotes liquid droplet formation. At high DNA and protein concentrations, amorphous structures dominate, indicating a transition away from dynamic assemblies. We have clarified this point in the Results and Discussion sections and now note that a comprehensive phase diagram analysis represents an important direction for future work.

      (3) The authors use the N39A mutant as a negative control to study the effect of DNA binding on ATP hydrolysis. Given that N39A is defective in DNA binding, it could also be employed to directly test whether DNA binding influences MORC2 phase separation.

      We thank the reviewer for this constructive suggestion. Purified wild-type MORC2(1–603) exhibited weak but detectable ATPase activity, whereas the N39A mutant was completely inactive [5]. For this reason, N39A was used in this study as an ATP-binding-deficient negative control [3]. However, no evidence has been provided to demonstrate that the N39A mutant is defective in DNA binding. Importantly, both our results and previous studies [5-6] indicate that MORC2 engages DNA via multiple domains, suggesting that a single point mutation is unlikely to significantly compromise its overall DNA-binding capacity.

      (4) Many of the cellular and in vitro LLPS experiments employ EGFP fusions. The authors should evaluate whether the EGFP tag influences MORC2 phase separation behavior.

      We appreciate the reviewer’s concern regarding the potential influence of the EGFP tag. The use of EGFP fusions in our study was primarily to maintain consistency with the in-cell experiments. Importantly, we confirmed that EGFP alone does not undergo phase separation in cells, and this observation is consistent with previous studies [7]. Additionally, in vitro phase separation of MORC2 was independently validated using Cy3–labeled CTD (Fig. S5), which recapitulated the condensate formation seen with EGFP-fused protein. Together, these results indicate that the EGFP tag does not significantly influence MORC2 phase separation, supporting the validity of our conclusions.

      Reviewer #3 (Recommendations for the authors):

      (1) The authors claim to have obtained nucleic acid-free protein, but no data are provided to support this assertion. It is recommended that they include appropriate validation to confirm the absence of nucleic acids.

      We thank the reviewer for highlighting this point. To validate that the purified MORC2 protein is indeed free of nucleic acid contamination, we now provide additional experimental evidence (e.g., A260/280 measurements, agarose gel analysis, or EMSA in Fig. 5), which has been added to the Methods section and Table S2.

      Note: Agarose gel analysis of MORC2 constructs to confirm the absence of nucleic acids. The pET32 vector served as the positive control, and 0.05 mg of each protein preparation was analyzed. E denotes E. coli and H denotes HEK293F.

      (2) The FRAP recovery curves are not normalized to 0, making comparison difficult. The authors should normalize the post-bleach intensity to 0 and re-plot the curves to allow a more standard interpretation of mobile fractions.

      We agree with the reviewer and have now normalized the FRAP recovery curves by setting the post-bleach intensity to 0. The revised plots are presented in Figs. 2f, j, l; 6c; and 7f, allowing for more direct comparison of mobile fractions across different conditions.
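
      As an illustration of the kind of normalization described here, the following is a minimal sketch, not the procedure actually used for the revised figures. It assumes a single-trace rescaling in which the mean pre-bleach intensity maps to 1 and the first post-bleach frame maps to 0; the function names, the `bleach_index` parameter, the `tail` parameter, and the example intensities are all hypothetical.

```python
def normalize_frap(intensities, bleach_index):
    """Rescale a FRAP trace so the first post-bleach frame is 0 and the
    mean pre-bleach intensity is 1 (assumed convention, see lead-in).

    intensities  : raw fluorescence values, one per frame
    bleach_index : index of the first post-bleach frame (must be >= 1)
    """
    if bleach_index < 1 or bleach_index >= len(intensities):
        raise ValueError("bleach_index must leave pre- and post-bleach frames")
    pre = sum(intensities[:bleach_index]) / bleach_index  # mean pre-bleach level
    post = intensities[bleach_index]                      # first post-bleach frame
    span = pre - post
    if span <= 0:
        raise ValueError("post-bleach intensity must be below pre-bleach level")
    return [(value - post) / span for value in intensities]


def mobile_fraction(normalized, tail=3):
    """Estimate the mobile fraction as the mean of the last `tail` frames
    of a normalized trace (a simple plateau estimate, not a curve fit)."""
    tail_values = normalized[-tail:]
    return sum(tail_values) / len(tail_values)


# Hypothetical trace: two pre-bleach frames, then bleaching and recovery.
raw_trace = [100.0, 100.0, 20.0, 40.0, 52.0, 60.0, 60.0, 60.0]
normalized_trace = normalize_frap(raw_trace, bleach_index=2)
plateau = mobile_fraction(normalized_trace)
```

      With this rescaling, the plateau of each curve can be read directly as an approximate mobile fraction, which is what makes curves from different conditions comparable on one axis.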

      (3) The HSQC spectra for IBD appear inconsistent: the peak positions in Fig. 4C do not align with those shown in panels D-F. The authors should verify the spectral assignments and ensure consistency across figures.

      We thank the reviewer for pointing this out. The apparent inconsistency arose from the fact that different spectral regions were displayed in Fig. 4c versus Fig. 4d-f for visualization purposes, which may have given the impression of mismatched peak positions. The spectral assignments themselves are consistent across all panels.

      To avoid confusion, we have now adjusted the spectral window shown in Fig. 4c to match that used in Fig. 4d-f. The revised figure ensures consistent presentation of the same spectral region across all panels.

      References:

      (1) Zhang, Y., Stöppelkamp, I., Fernandez-Pernas, P. et al. Probing condensate microenvironments with a micropeptide killswitch. Nature 643, 1107–1116 (2025).

      (2) Fendler NL, Ly J, Welp L, et al. Identification and characterization of a human MORC2 DNA binding region that is required for gene silencing. Nucleic Acids Res. 53(4):gkae1273 (2025).

      (3) Tchasovnikarova, I., Timms, R., Douse, C. et al. Hyperactivation of HUSH complex function by Charcot–Marie–Tooth disease mutation in MORC2. Nat Genet 49, 1035–1044 (2017).

      (4) Douse, C. H. et al. Neuropathic MORC2 mutations perturb GHKL ATPase dimerization dynamics and epigenetic silencing by multiple structural mechanisms. Nat Commun 9, 651 (2018).

      (5) Tan, W., Park, J., Venugopal, H. et al. MORC2 is a phosphorylation-dependent DNA compaction machine. Nat Commun 16, 5606 (2025).

      (6) Sánchez-Solana B, Li DQ, Kumar R. Cytosolic functions of MORC2 in lipogenesis and adipogenesis. Biochim Biophys Acta. 1843(2):316-326 (2014).

      (7) Li, C.H., Coffey, E.L., Dall’Agnese, A. et al. MeCP2 links heterochromatin condensates and neurodevelopmental disease. Nature 586, 440–444 (2020).

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      (1) Behavioral labels rely on video-based scoring, which may not fully capture subtle or hidden movements.

      This is very true; certainly, this work is only a starting point. But the techniques used for this manuscript, despite starting with video-based scoring, specifically did allow us to differentiate behaviors that were too subtle to recognize in the video. For the revision, we will describe how this work leads to future studies in which we will be able to explore other means of collecting behavioral labels, potentially directly from simultaneous recordings of multiple muscles.

      (2) The relationship between brain activity and behavior is correlational, but sometimes interpreted more strongly.

      We will comb through the manuscript and make edits to be more precise and technically correct in presenting this relationship, and clarify that our suggestion of a causal link is only indirect and related to previous work (Mukherjee et al. 2019).

      (3) The manuscript could be clearer and more accessible to readers outside the field.

      We will edit the manuscript in multiple places to make technical and field-specific aspects more accessible. As part of this, in appreciation of Reviewer 2’s comments, we will take additional care to elaborate on and clarify our need for, and interpretation of, SHAP values and classifier structure.

      Reviewer #2 (Public review):

      (1) I have several concerns regarding the methodological comparisons used to establish the superiority of the proposed XGBoost classifier. In particular, the comparison between the XGBoost classifier and previously used QDA approaches (Figure 3) may not be entirely well-matched. The QDA framework was originally designed primarily to detect gape events and does not explicitly assign labels to MTM movements. As a result, the apparent advantage of XGBoost in identifying MTMs may partly reflect differences in task formulation rather than intrinsic differences in classification performance. From visual inspection, gape detection performance appears broadly comparable across methods.

      A more informative benchmark would involve comparing XGBoost to an extended pipeline in which QDA-based gape detection is combined with a secondary movement-detection stage, distinguishing MTMs from periods of no movement. Such a comparison would better isolate the contribution of classifier architecture per se. Without this control analysis, the strength of the claim that XGBoost provides superior performance for behavioral decoding remains somewhat uncertain.

      The revision will further clarify that, as the reviewer notes, the primary improvement in XGB classification compared to QDA (in multi-class aggregated metrics) comes specifically from its ability to classify MTMs, and that for gapes, QDA and XGB perform on par. We will be more explicit that our goal in constructing the classifier is not to compare “classifier architecture” (that is, not to find the very best classifier possible) but rather to take the next step by generating an instance of a classifier that performs demonstrably better on aggregated orofacial movements. We will update the manuscript to make our claims clearer in this regard, and to explain how the current XGB classifier can, once validated, be bootstrapped by future techniques (possibly using more informative data sources) to more fully characterize orofacial movements.

      (2) The presentation of the neural ensemble analyses is considerably less comprehensive and intuitive than that of the behavioral analyses. The manuscript would benefit from more direct visualization of inferred neural state transitions. For example, plotting predicted neural states in a manner analogous to the behavioral states illustrated in Figure 6B would improve interpretability and help readers understand how neural dynamics relate temporally to behavioral changes.

      In addition, the interpretation that GC ensemble dynamics drive behavioral state transitions may require further clarification. If GC activity plays a causal role in initiating behavioral changes, one might expect a consistent brain-to-behavior lag across changepoints. However, Figure 6 appears to show such lag primarily at the second transition but not at the first. This raises questions about how uniformly the proposed causal interpretation applies across state boundaries, and additional analysis or discussion is needed.

      We are happy to update the figures (likely by adding another panel to Figure 6) to clearly show inference of neural state transitions, in a manner similar to how we have shown behavioral state transitions in Fig. 6B. In addition, we will do a more comprehensive job of describing and referencing earlier work in which we have unpacked these analyses in greater detail—work that makes it clear why we would predict a lag-relationship for one set of change points and not the other.

      (3) The neural ensemble analyses primarily focus on constructing higher-level behavioral state variables rather than directly testing how individual movement subtypes relate to neural activity. The behavioral interpretation of the inferred state structure, therefore, remains somewhat unclear. While this approach is consistent with previous work from the authors and with broader state-transition frameworks of gustatory processing, it is not immediately obvious that this is the most informative level of analysis for the present dataset.

      In particular, it would strengthen the manuscript to examine whether GC neurons or ensembles also encode lower-level motor structure, such as the occurrence of gapes or specific MTM subtypes. Demonstrating selective or mixed encoding across hierarchical levels (movement motifs versus abstract behavioral states) would help clarify the functional interpretation of the reported neural dynamics. At present, the manuscript largely assumes that GC activity reflects higher-order behavioral states without directly testing alternative representational possibilities.

      The reviewer makes a good point. While previous work from the lab (Li et al. 2016) has assessed the relationship of GC activity with both the onset of gaping (i.e., the behavioral state transition) and individual gapes and found only a relationship with onset of gaping (findings that we now explicitly describe in the revision), we have not performed a similar analysis for MTMs. We will do so and add it to the paper.

      (4) Because direct behavioral ground truth for intra-oral ingestive movements is difficult to obtain, MTM subtypes are inferred primarily through clustering of EMG waveform features. Although the authors demonstrate statistical separability and cross-session stability of these clusters, it remains unclear whether they correspond to discrete motor programs or instead reflect a structured partitioning of a continuous behavioral space shaped by feature selection and preprocessing choices. Perhaps some additional robustness analyses or convergent validation (e.g., alternative clustering methods, feature perturbation tests, or stronger neural and behavioral dissociations) would help clarify the biological significance of the inferred subtype structure.

      We admit (in fact, we have done so in the text) that we are not yet at the point of being able to “split hairs” to this degree (although we, like R2, see that as a goal). In the meantime, we will expand the section of Results text in which we describe the fact that the clustering of behaviors is observed both in “waveform space” (Fig. 4E was generated using standardized waveforms) and in “feature space” (Fig. 4B, C, and F), and that as such the clusters are NOT simply a partitioning of continuous, unimodal behavioral space. We will report convergent results from alternative (k-means) clustering methods to further support that conclusion. Finally, we will describe (in the Discussion section) ways to more rigorously test and extend this claim in future work.

      Reviewer #3 (Public review):

      Some aspects of the EMG-based movement classification pipeline warrant careful interpretation. The training dataset used for classifier development is relatively small and is derived from a subset of trials in which mouth movements were clearly visible in video recordings. While the classifier performs well on this labeled dataset, it is not entirely clear how representative these labeled examples are of the full range of EMG signals present in the larger dataset.

      Very good point. We will update the text to note this qualification to the reader. We will also, however, highlight the fact that our focus on a highly reliable and representative (i.e., agreed upon by 2 independent, blind scorers) subset of labels allows us to perform more targeted analyses and make more targeted interpretation in our results. And we will also be more pointed in the revision, as we have noted above, about the fact that this work is only scratching the surface of what can be accomplished in this domain, and that future work will involve STARTING with the waveforms that aren't accounted for in terms of gapes and MTMs.

      The interpretation of the three identified MTM subtypes also remains somewhat tentative. The study convincingly demonstrates that distinct waveform-defined clusters exist in the EMG data, but the functional significance of these clusters as ingestive "behaviors" is less clear. As acknowledged by the authors, the specific roles of these movement patterns in the ingestion process remain speculative.

      We share R3’s desire for clarity on this point—we do not wish to imply that we understand more than we understand—and will be sure to fine-tune our language to make clearer and more explicit the fact that the distinction in the roles of the MTM subtypes in ingestion at this point remains speculative.

      Finally, several conclusions in the Discussion rely on relatively strong mechanistic language when describing the relationship between GC dynamics and ingestive behavior. The data clearly demonstrate a temporal association between GC state transitions and changes in the frequencies of the different MTM subtypes. However, the results primarily support the interpretation that similar cortical dynamics are associated with ingestive and rejection-related behaviors rather than definitively establishing that these behaviors "are governed by the same underlying neural mechanisms".

      We will soften our language to clarify which of our Discussion suggestions are speculation, highlighting for the reader the fact that our data, while consistent with evidence suggesting a causal link between the GC transition and gaping (Li et al., 2016; Mukherjee et al., 2019), do not prove a causal neural-behavioral link for MTMs.

      References:

      Li, Jennifer X., et al. “Sensory Cortical Activity Is Related to the Selection of a Rhythmic Motor Action Pattern.” The Journal of Neuroscience, vol. 36, no. 20, May 2016, pp. 5596–607. https://doi.org/10.1523/JNEUROSCI.3949-15.2016.

      Mukherjee, Narendra, et al. “Impact of Precisely-Timed Inhibition of Gustatory Cortex on Taste Behavior Depends on Single-Trial Ensemble Dynamics.” eLife, edited by Laura L. Colgin et al., vol. 8, June 2019, p. e45968. https://doi.org/10.7554/eLife.45968.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review): 

      Summary: 

      The authors have used a macaque (two animals only) to follow the migration of 'seeded' TDP-43 protein in neuronal pathways - thus mimicking the spread of ALS in the human CNS. Previous experiments in rodents failed to demonstrate this, posing interesting and important biological differences, possibly related to the UMN-LMN system in higher order apes and humans.

      Strengths: 

      An important step forward. 

      Weaknesses: 

      No weaknesses were identified by this reviewer. Only 2 animals were used, but that is appropriate given the sensate status of the macaque. In the opinion of this reviewer, the results are entirely convincing. 

      Reviewer #2 (Public review): 

      Summary: 

      There are astonishingly few papers trying to reproduce the process of initiation and spreading that Braak's studies have suggested and postulated. The authors should be applauded for pioneering such a difficult experiment. They overexpressed the TDP-43 protein in the motor neuron pool of the brachioradialis muscle and showed that by this technique, motor neurons in this pool died, and the muscle got denervated. They had evidence of a spreading process from the spinal cord to the cortex, demonstrated by showing widespread deposits of phosphorylated TDP-43 bilaterally in the cervical cord and the motor cortex. By their experiment, they created a dying-backwards model, not a model of corticofugal spread, like that shown by Braak. No muscle weakness was observed, not even in the brachioradialis.

      Strengths: 

      The strength of this innovative study is the fact that this spreading experiment uses the phylogenetically young connectome of primates (macaques). They also made the thought-provoking observation of spreading from the cord to the motor cortex, not the corticofugal spread model observed by Heiko Braak. This is thought-provoking because this enables the observer to compare their model with the findings in humans. 

      Weaknesses: 

      The following aspects are not a weakness but need to be better explained for the interested reader - and potentially improved in future studies for which the authors laid the foundation: 

      (1) Why do the authors use the brachioradialis motor neuron pool to overexpress TDP-43? More is known about other muscles and how they are embedded in the motor connectome of primates. Why not the biceps brachii or the hand extensors or - even better - the small muscles of the hand? These are known to be strongly monosynaptically connected with the motor cortex. The authors should explain this. I am unclear if there was a specific reason which I did not see or understand. In my view, the brachioradialis is not the best representative of the primate connectome, for example, to examine this model and compare it with the corticofugal spread. 

      The brachioradialis muscle was chosen primarily for reasons of animal welfare; our concern when designing the experiments was that the muscle we chose for injection might become very wasted and weak before the experiment had been completed. If we had injected a hand muscle, this would have affected manipulation, feeding and grooming behaviours, whereas had we injected biceps brachii or forearm extensors, this would have affected more important behaviours requiring strength for body support in the home cage (e.g. climbing, swinging, etc.). The advantage of choosing brachioradialis is that there is some functional redundancy; in macaques, compared to biceps brachii, brachioradialis has a relatively minor role in elbow flexion and supination of the forearm. We therefore reasoned that there should be physiological compensation for any weakness in brachioradialis, and thus minimal effects on normal behaviour.

      A secondary practical consideration was the importance of good quality MR imaging of the injected muscle and the positioning of the focussing coil; because of the physical constraints related to the monkey sitting in our narrow-bore scanner, the forearm muscles were the optimal choice. 

      With reference to the ‘primate connectome’, whilst hand muscles are known to have strong cortico-motoneuronal connections, we have shown previously that monosynaptic cortico-motoneuronal connections are as strong in muscles innervated by the deep radial nerve (like brachioradialis) as in intrinsic hand muscles (Witham et al, 2016).

      Finally, for the purposes of these experiments, all we required was a method for inoculating TDP-43 into a motor neuron pool within the spinal cord, without direct surgical trauma to the spinal cord. Our aim was to test the hypothesis that extracellular TDP-43 is sufficient to cause spreading neuronal changes in macaque, similar to those observed in human ALS/MND; our aim was not to replicate the actual pattern of human MND observed clinically.

      These points will be addressed in a revised version of the manuscript. 

      (2) In Braak's experiment, only (seemingly soluble) non-phosphorylated TDP-43 "crossed" synapses. Phosphorylated TDP-43 did not do this. The authors of this study saw phosphorylated TDP-43 in motor neurons and the cortex. Is there any potential explanation for how it crosses synapses? If it really does, there is an obvious difference to the human situation which needs to be emphasized and explained (in the future).

      To clarify, there was no evidence of phosphorylated TDP-43 crossing synapses. It is more likely that excess non-phosphorylated TDP-43 crossed synapses, and that this then subsequently led to TDP-43 phosphorylation.  

      (3) There were significant deposits of phosphorylated TDP-43 in oligodendrocytes in humans. Whilst I understand that one experiment cannot solve every question - I am curious about whether the authors saw anything in oligodendrocytes? 

      We have not looked at this.

      (4) Which was the pattern of damage? Of course, this pattern is not likely to have a monosynaptic pattern - like in humans........but was there a pattern? Did it have a physiologically meaningful basis? Was there any relation to the corticofugal monosynaptic pattern? What are the differences? The authors speak of "multiple waves". Does this mean that if this were a corticofugal model, for example, oculomotor neurons would also degenerate? 

      The description of ‘multiple waves’ in paragraph 2 of the discussion section is entirely hypothetical, based on the assumption that there are different mechanisms by which TDP-43 spreads through the nervous system, from slow local spread by diffusion to more rapid long-range axonal spread to widely separated regions. For the neuropathological staging analysis, we therefore looked at different brain regions (hypoglossal nuclei, reticular formation, inferior olives, frontal cortex, temporal cortex and hippocampal formation). This analysis only showed loss of motor neurons in the spinal cord ipsilateral to the side of the muscle injections, in segments consistent with the location of brachioradialis motoneurons. We did not demonstrate a ‘pattern of damage’ as described in humans in our experiments because this is a pre-symptomatic pre-clinical model, with no established ‘damage’ from each wave. We speculate that this is because animals were terminated too early in the disease process.

      However, whilst there was no established neuronal degeneration outside the cervical spinal cord, the observation that there were more pTDP-43 positive Betz cells in left (contralateral to the brachioradialis injection) New M1 than Old M1 (see Figure 6I and J) would support spread via monosynaptic connections to motoneurons; New M1 is where most monosynaptic cortico-motoneuronal connections originate.

      Reviewer #3 (Public review): 

      Summary: 

      In this paper by Jones and colleagues, a non-human primate model is described in which wild-type TDP-43 is expressed in the cervical spinal cord. This gave rise to loss of motor neurons in the ventral horn at that level in the cervical spinal cord. MRI of the muscles revealed increased intensity in the most affected muscle, the brachioradialis, suggesting that this muscle becomes denervated. At the neuropathological level, TDP-43 and pTDP-43 staining in the cytoplasm is increased, not only at the specific level of the cervical spinal cord, but also at a distance.

      Strengths: 

      A clear strength is the state-of-the-art focal expression of the TDP-43 transgene at a focal site in the cervical spinal cord. This is achieved by combining a general expression of a flipped loxP flanked TDP-43 vector using AAV9 intrathecal administration, followed by an intramuscular AAV2 hSyn CRE-TdTomato vector in the brachioradialis muscle in order to induce focal recombination and expression of TDP-43 in motor neurons innervating this muscle on one side.

      Another strength is the non-human primate background, which is much closer to the human situation. 

      Weaknesses: 

      Given the complexity and cost of the model, the n is very low. 

      As is common in most studies in non-human primates, we have carried out all statistical analysis within one animal (e.g. the comparison of motoneuron numbers between left and right cord). We then show that results are reproducible in two animals. Although the number of animals is lower than in a typical rodent study, we see this as an advantage of the model, adhering to the 3Rs principle of ‘reduction’.

      The design of the experiments and the results shown about the toxicity induced by this focal TDP-43 expression do not allow us to conclude that it is a good model for ALS for several reasons. It is not clear that the TDP-43 overexpression results in spreading weakness or in spreading motor neuron loss. The neuropathological changes described suggest that there is a kind of stress response, which extends to regions away from the site of primary damage, but more is needed to provide convincing evidence that there is spreading of disease pathology reminiscent of human ALS. 

      As already noted in our response to Reviewer 2 (point 1), animal welfare is an important consideration when designing these complex experiments in primates. We could not therefore justify allowing the animals to survive until extensive wasting and weakness were evident, recapitulating the human disease. 

      The model developed in these experiments is therefore a pre-symptomatic pre-clinical model, in which animals are terminated before pathology leading to widespread motor neuron loss is evident. At post mortem we do have evidence of motor neuron loss in the segments supplying brachioradialis (C4-C8).

      Stress of various forms, including blunt trauma (e.g. Anderson et al, 2021), stab/electrode insertion injury (e.g. Zambusi et al, 2022), chemical (e.g. arsenite) exposure (e.g. Huang et al, 2024), or hypoxia (Marcus et al, 2021) can result in pathological nucleocytoplasmic translocation of TDP-43. In our model, there was no direct trauma to the brain or spinal cord ante mortem, excluding one major cause of tissue stress. Hypoxia during the process of euthanasia is possible, but we would expect there would not be enough time before death for this to manifest as TDP-43 translocation. In the literature TDP-43 translocation due to stress is diffuse; we have demonstrated that in our model the TDP-43 pathology is not diffuse but selective. For example, there was no evidence of disease in the oculomotor nuclei; in the primary motor cortex (M1) there are significantly more pathological changes in the evolutionarily younger ‘NewM1’ compared to the neighbouring ‘OldM1’.

      It is therefore improbable that our findings could be explained by ‘a kind of stress response’. Our findings are better explained by spread of the TDP-43 protein.

      Reviewer #4 (Public review): 

      Summary: 

      In this manuscript, the authors present data describing the development of a model of ALS in rhesus macaques. They use a viral intersectional model to overexpress TDP-43 in a population of motor neurons and then study the spread of the pathology about 7 months later. They demonstrate that both the cervical spinal cord and motor cortex (new and old M1) are full of TDP-43, suggesting that the pathology spreads from the single motor pool to presumably related neurons. 

      Strengths: 

      This is a super-important study in two main ways: 

      (1) This could be the birth of a really important model, one that is really needed for making progress in understanding ALS and the development of therapeutics. There are shortfalls with all the rodent models. Models dependent on cell cultures are superb for understanding cell-autonomous processes, but miss out on connectivity, particularly the long-range connectivity. Organoids may ultimately prove to be beneficial, but they would need cortex, spinal cord, and muscle, and translatability from them is not assured. So a NHP model is needed, and this may be it.

      Furthermore, the Methods are meticulously described and will undoubtedly facilitate reproducibility. 

      (2) The concept of the spread of pathology has been proposed for some time, I think, based initially on the detailed clinical observations of Ravits and colleagues. The authors have looked at this directly and provide supporting evidence for this interesting hypothesis. They show spread locally and contralaterally in the spinal cord (although a figure would be nice) and to the motor cortex. 

      Taking only these 2 points into account is more than sufficient for me to be enthusiastic about this work. 

      Weaknesses: 

      I'd like to make a couple of points that if addressed, could, in my view, help the authors strengthen this work. 

      (1) We don't know how many MNs were transduced by the rAAV. There was no tdTom expression, for whatever reason. The authors show an image of a control experiment with a single MN transduced, but there should be a red motor pool, at least in the control experiments. The impression that I get is that very few were transduced, and, in my mind, this makes the findings even more interesting - maybe you don't need many "starter" MNs. 

      Unfortunately, we cannot know how many motoneurons were transduced.

      However, the reviewer may be correct, that it is actually only a small fraction of the brachioradialis pool. This is supported by the evidence for rather focal denervation seen on MRI.

      (2) Continuing on this point, this leads the authors to conclude that all BR MNs have died. They support this by the reduced MN count (see point 3). Firstly, do we know how many BR MNs there are in the rhesus macaque, and does the reduction seen correspond to this number? Secondly, and more importantly, the muscle looks normal on MRI at 28 weeks - it does not look like a denervated muscle. The authors state that it has maybe been reinnervated, but by what, if all the BR MNs are dead? This does not seem like a plausible explanation to me. Muscle histology, NMJs, and fibre typing would have been useful to understand what's going on with the MNs. (And electrophysiology would have been wonderful, but beyond the scope of this study.) 

      To clarify, we did not conclude that all brachioradialis motor neurons had died, but rather that all transfected motor neurons in the brachioradialis pool had died. As noted above, when these cells die and the muscle is denervated, the MRI signal changes occupy only a small volume of the muscle and are transient. We would not expect to see long-term MRI changes in muscle anatomy after this limited denervation-reinnervation event.

      Analysis of muscle histology, including fibre typing, is outwith the scope of this initial paper reporting the model; we hope that this will form the basis of a future publication.

      (3) Some MN biologists, like me, fuss a lot about how to count MNs, which is almost as difficult as counting the number of angels on the head of a pin. Every method has its problems. Focusing on the two methods here: (a) ChAT immunohistochemistry is pretty good in healthy states, but we don't know what happens to ChAT expression in different diseases, particularly when you have a new model. If its expression is decreased, then it is not a good marker for MNs; (b) Identifying MNs based on the size and morphology of neurons in the ventral horn is also insufficient. For example, ~30% of neurons in a typical pool are small gamma MNs, and a significant proportion (depending on the muscle) of the remainder will be small alpha MNs. So what one is counting is, at best, the large alpha MNs, not all the MNs in a pool. And in ALS, it's these largest MNs that are affected at the earliest stages. The small ones might be fine. So results will be skewed. (Hence, it would be interesting to see if the muscle had a higher proportion of Type I fibres after being reinnervated by S-type MNs.) 

      This is an interesting point, and we agree that each method used to quantify MN number carries its own limitations. The problem of MN identification is heightened in an MND-like pathological state, especially when considering evidence of reduced ChAT activity in spinal motoneurons in end-stage disease in post mortem human samples (Oda et al, 1995), and more recent evidence from Casas et al. (2013), who demonstrated early presymptomatic reduction in ChAT expression in SOD1G93A mice. It is important to note that this was a modest reduction (to 76% of control levels), not complete abolition of signal. ChAT immunoreactivity was still present and motor neurons were still identifiable as ChAT-positive at this pre-clinical stage of disease. As counts in our study were performed based on detecting ChAT in cells, it seems unlikely that we would miss cells. However, we cannot rule this out. If this did occur, it would mean that the reduced motoneuron counts which we observed reflect not only cell death, but also profound motoneuron dysfunction, which is presumably the proximal precursor to cell death.

      We acknowledge that size-based criteria applied to ChAT-positive neurons will preferentially capture large alpha motor neurons, and that gamma motor neurons and small alpha motor neurons are likely underrepresented in our counts. Our counts therefore reflect the large alpha motor neuron population rather than the total motor neuron pool. We believe that this is not a critical limitation in the context of the present study. Large alpha motor neurons are the population of primary pathological interest in ALS and related MND, being the earliest and most severely affected subtype. The selective vulnerability of fast-fatigable large alpha motor neurons in ALS is well established, and their preferential loss is the defining feature of disease progression in both human post mortem tissue and rodent models (Lalancette-Hébert et al., 2016). In this respect, our size threshold selects for precisely the population whose degeneration is most relevant to the disease phenotype we are modelling. 

      We intend to include comments on these important points in the revised version of the manuscript.

      In response to the final point regarding muscle histology and proportions of Type I fibres, as stated above, reporting of muscle histology, including fibre typing, is planned for a separate publication.

      (4) Statistics. These are complex experiments looking at the spread of a disease. The experimental unit is therefore the monkey, n=2. In each monkey, multiple sections are analysed, which are key technical replicates and often summative. For example, do we care about the average cell number in Figures 4D, E, 5 I, J or 6G, H, or rather the total cell number? Do the error bars mean anything? To be clear, I am by no means minimising the importance of the overall convincing findings. But I do not think this statistical analysis is particularly meaningful. 

      Here, the experimental unit is the tissue slice, mounted on a slide for histological analysis, and not the monkey. All statistical comparisons are made within a single animal. We then show that the findings can be replicated in two animals, both of which show significant results. This is the standard approach taken in primate neuroscience, given the need to reduce animal numbers to the minimum consistent with producing convincing results.
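
      To make the slice-as-unit approach concrete, the following is a minimal sketch of one way a within-animal comparison could be run. It assumes paired motoneuron counts from the two sides of the cord in each section. The response does not state which test was actually used, so the exact sign-flip permutation test, the function name `paired_sign_flip_test` and the counts below are illustrative assumptions, not the authors' analysis or data.

```python
from itertools import product

def paired_sign_flip_test(side_a, side_b):
    """Exact two-sided sign-flip permutation test on paired per-section counts.

    Returns the mean paired difference (side_a - side_b) and the p-value.
    Every possible assignment of signs to the differences is enumerated,
    so this is only practical for a small number of sections.
    """
    diffs = [a - b for a, b in zip(side_a, side_b)]
    observed = abs(sum(diffs))
    n_extreme = 0
    n_total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        stat = abs(sum(s * d for s, d in zip(signs, diffs)))
        if stat >= observed:
            n_extreme += 1
        n_total += 1
    return sum(diffs) / len(diffs), n_extreme / n_total

# Hypothetical counts per section from ONE animal (not data from the study).
control_side = [12, 14, 11, 13, 15, 12, 14, 13]
injected_side = [8, 9, 7, 10, 9, 8, 10, 9]

mean_diff, p_value = paired_sign_flip_test(control_side, injected_side)
print(f"mean difference per section: {mean_diff:.2f}, p = {p_value:.4f}")
```

      In a sketch like this the p-value describes how consistent the side-to-side difference is across sections of a single animal; replication is then shown by repeating the comparison in the second animal.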

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      The manuscript by Shan et al seeks to define the role of the CHI3L1 protein in macrophages during the progression of MASH. The authors argue that the Chil1 gene is expressed highly in hepatic macrophages. Subsequently, they use Chil1 flx mice crossed to Clec4F-Cre or LysM-Cre to assess the role of this factor in the progression of MASH using a high-fat, high-fructose diet (HFFC). They found that loss of Chil1 in KCs (Clec4F Cre) leads to enhanced KC death and worsened hepatic steatosis. Using scRNA seq they also provide evidence that loss of this factor promotes gene programs related to cell death. From a mechanistic perspective they provide evidence that CHI3L serves as a glucose sink and thus loss of this molecule enhances macrophage glucose uptake and susceptibility to cell death. Using a bone marrow macrophage system and KCs they demonstrate that cell death induced by palmitic acid is attenuated by the addition of rCHI3L1. While the article is well written and potentially highlights a new mechanism of macrophage dysfunction in MASH, and the authors have addressed some of my concerns, there are some concerns about the current data that continue to limit my enthusiasm for the study. Please see my specific comments below.

      Major:

      (1) The authors' interpretation of the results from the KC (Clec4F) and MdM KO (LysMCre) experiments is flawed. The authors have added new data that suggest LysM-Cre only leads to a 40% reduction of Chil1 in KCs and that this explains the difference in the phenotype compared to the Clec4F-Cre. However, this claim would be made stronger using flow sorted TIM4hi KCs as the plating method can lead to heterogenous populations and thus an underestimation of knockdown by qPCR. Moreover, in the supplemental data the authors show that Clec4f-Cre x Chil1flx leads to a significant knockdown of this gene in BMDMs. As BMDMs do not express Clec4f this calls into question the rigor of the data. I am still concerned that the phenotype differences between Clec4f-Cre and LysM-Cre are not related to the degree of knockdown in KCs but rather to some other aspect of the model (microbiota etc). It would be more convincing if the authors could show the CHI3L reduction via IF in the tissue of these mice.

      We thank the reviewer for these constructive comments. We have performed FACS sorting of KCs (CD45<sup>+</sup> F4/80<sup>hi</sup> CD11b<sup>low</sup> TIM4<sup>hi</sup>) and MoMFs (CD45<sup>+</sup> F4/80<sup>low</sup> CD11b<sup>hi</sup> Ly6G<sup>-</sup> TIM4<sup>-</sup>) from Chil1<sup>fl/fl</sup>, Lyz2<sup>∆Chil1</sup> and Clec4f<sup>∆Chil1</sup> mice. Compared with Chil1<sup>fl/fl</sup> mice, Chil1 mRNA levels were reduced by more than 90% in KCs from Clec4f<sup>∆Chil1</sup> mice but were unchanged in MoMFs (Revised Figure S3B). In Lyz2<sup>∆Chil1</sup> mice, Chil1 mRNA levels were reduced by more than 90% in MoMFs but by only roughly 40% in KCs, again compared with Chil1<sup>fl/fl</sup> mice (Revised Figure S5B). These revised data support the phenotypic difference between Lyz2-CKO and Clec4f-CKO mice.

      We agree with the reviewer that the significant knockdown of Chil1 in BMDMs from Clec4f<sup>∆Chil1</sup> mice is confusing. To maintain the rigor of our data, we have removed this part from our manuscript.

      Additionally, we performed immunofluorescence staining to detect Chi3l1 expression in liver tissues of these mice. The results show a reduction of Chi3l1 expression in KCs (TIM4+F4/80+ cells) of both Lyz2<sup>∆Chil1</sup> and Clec4f<sup>∆Chil1</sup> mice, with a more pronounced decrease in Clec4f<sup>∆Chil1</sup> mice (Author response image 1).

      Author response image 1.

      The expression of Chi3l1 in liver tissues of Chil1<sup>fl/fl</sup>, Lyz2<sup>∆Chil1</sup> and Clec4f<sup>∆Chil1</sup> mice. Immunofluorescent staining to detect Chi3l1 (green) expression in liver sections of Chil1<sup>fl/fl</sup>, Lyz2<sup>∆Chil1</sup> and Clec4f<sup>∆Chil1</sup> mice under normal chow diet. TIM4 (KC marker, white) and F4/80 (macrophage marker, red) are shown; nuclei were counterstained with DAPI. Scale bar = 20 µm and 10 µm (inset).

      (2) Figure 4 suggests that KC death is increased with KO of Chil1. The authors have added new data with TIM4 that better characterize this phenotype. The lack of TIM4 low, F4/80 hi cells further supports that their diet model is not producing any signs of the inflammatory changes that occur with MASLD and MASH. This is also supported by the absence of meaningful changes in the CD11b hi, F4/80 int cells (predominantly monocytes and early MdMs). It is also concerning that loss of KCs does not lead to an increase in Mo-KCs, as has been demonstrated in several studies (PMID: 37639126, PMID: 33997821). This would suggest that the degree of resident KC loss is trivial.

      We appreciate the reviewer’s insightful comment. We agree that our data show no substantial generation of monocyte-derived Kupffer cells (MoKCs) within the 16-week HFHC model. However, we do not believe the degree of resident KC loss is trivial, since 60% of KCs have died by 16 weeks compared with week 0 (Revised Figure 5D). Instead, our observations align with a phased replacement model: recruited monocytes first differentiate into monocyte-derived macrophages (MoMFs), which we see accumulate (Revised Figure 5D), and only later adopt a KC phenotype. Consistent with this, our 16-week model shows significant EmKC loss and MoMF expansion, but not yet the emergence of TIM4-MoKCs. This timing is supported by prior studies, where TIM4KCs were observed at 24 weeks, but not at 16 weeks, on similar diets (PMID: 33440159; PMID: 32888418). Therefore, we interpret our findings as capturing an earlier phase of MASLD progression, characterized by EmKC death and MoMF accumulation, prior to their full differentiation into MoKCs.

      (3) The authors demonstrated that Clec4f-Cre itself was not responsible for the observed phenotype, which mitigates my concerns about this influencing their model.

      We thank the reviewer for this comment and are pleased they agree that our control experiment using Clec4f-Cre alone confirms that the phenotype is specific to our genetic manipulation and not an artifact of the Cre driver.

      (4) I remain somewhat concerned about the conclusion that Chil1 is highly expressed in liver macrophages. The author agrees that mRNA levels of this gene are hard to see in the datasets; however, they argue that IF demonstrates clear evidence of the protein, CHI3L. The IF in the paper only shows a high power view of one KC. I would like to see what percentage of KCs express CHI3L and how this changes with HFHC diet. In addition, showing the knockout IF would further validate the IF staining patterns.

      We thank the reviewer for their thoughtful and constructive feedback. We agree that our initial conclusion regarding Chil1 expression in liver macrophages relied heavily on prior observations and was not sufficiently supported by the data presented. In response, we have revised our conclusion to state: "Hepatic macrophages express Chi3l1 and upregulate its expression following HFHC feeding." (Revised manuscript, page 4, lines 136-137)

      To strengthen this finding, we have replaced the original high-power image of a single Kupffer cell with a representative low-power view showing multiple F4/80+ macrophages (Revised Figure 1A). Furthermore, we performed quantitative colocalization analysis, which revealed that under normal chow diet (NCD), approximately 8% of F4/80+ macrophages are Chi3l1-positive. This proportion significantly increases to 15% upon HFHC feeding (Revised Figure 1A).

      Additionally, to validate the specificity of the Chi3l1 immunofluorescence signal, we have included staining of liver sections from Chil1 knockout mice. In contrast to wild-type mice, Chi3l1 signal was completely absent within F4/80+ macrophages in Chil1<sup>-/-</sup> mice, confirming the specificity of the staining (Revised Figure 1B, Revised manuscript, page 4, lines 152-157).

      Minor:

      (1) The authors have answered my question about liver fibrosis. In line with their macrophage data their diet model does not appear to induce even mild MASH.

      We thank the reviewer for this observation. We agree that under our HFHC dietary conditions, the mice do not develop MASH pathology. However, we believe this early-stage model is a strength of our study, as it allows us to dissect the initial role of the Chi3l1-glucose interaction in regulating Kupffer cell fate during early MASLD, prior to the onset of significant fibrosis. This approach enables us to capture early macrophage adaptations (such as Chi3l1 upregulation) that might otherwise be masked or become secondary to the overt inflammation and scarring characteristic of late-stage MASH models.

      Reviewer #2 (Public review):

      In the revised version of the manuscript, the authors have attempted to address my questions, however, a number of my original concerns still remain.

      Firstly, I had asked for a validation of the different CRE lines used - Lysm and Clec4f. The authors have now looked at BMDMs and KCs (steady state) from these animals. They conclude LysM only targets BMDMs not KCs, while CLEC4F targets both KCs and BMDMs. This I do not understand, BMDMs do not express CLEC4F so why are they targeted with this CRE? Additionally, BMDMs are not the correct control here, rather the authors should look at the incoming moMFs in the livers of these mice in the MASLD setting. Similarly, the KO in the MASLD KCs should be verified.

      We thank the reviewer for these constructive comments. We have performed FACS sorting of KCs (CD45<sup>+</sup> F4/80<sup>hi</sup> CD11b<sup>low</sup> TIM4<sup>hi</sup>) and MoMFs (CD45<sup>+</sup> F4/80<sup>low</sup> CD11b<sup>hi</sup> Ly6G<sup>-</sup> TIM4<sup>-</sup>) from Chil1<sup>fl/fl</sup>, Lyz2<sup>∆Chil1</sup> and Clec4f<sup>∆Chil1</sup> mice fed NCD or HFHC for 4 weeks. Compared with Chil1<sup>fl/fl</sup> mice, Chil1 mRNA levels were reduced by more than 90% in KCs from Clec4f<sup>∆Chil1</sup> mice but were unchanged in MoMFs at both 0 and 4 weeks (Revised Figure S3B). In Lyz2<sup>∆Chil1</sup> mice, Chil1 mRNA levels were reduced by more than 90% in MoMFs but by only roughly 40% in KCs at both 0 and 4 weeks, again compared with Chil1<sup>fl/fl</sup> mice (Revised Figure S5B). These revised data support the phenotypic difference between Lyz2-CKO and Clec4f-CKO mice.

      Then I had asked for validation of macrophage expression of Chil1 in other MASLD human and mouse datasets. The authors have looked into this, but the data provided do not suggest it is highly expressed by these cells either in the other mouse models or in the human. Nevertheless, they include a statement suggesting a similar expression pattern (although also being expressed by other cells). This is not an accurate discussion of the data and hence must be revised. This also prompted me to take another look at their data and this has left me querying the data in Figure 1D. Is the percent expressed 1%? In Figure 1C the scale goes from 0-100 but here 0-1. If we are talking about expression in 1% of cells which would fit with the additional public mouse data now analysed then how relevant are any of these claims? How sure are the authors that the effects seen are through KCs/moMFs? In figure 1D all cells profiled by scRNA-seq should be shown not just MFs to get a better sense of this data. What is macrophage expression of Chil1 compared with all other liver cells?

      We thank the reviewer for the thoughtful feedback. We agree that the expression pattern of Chil1 should be described more accurately. To address this point, we examined four additional publicly available scRNA-seq datasets, including two mouse MASLD models and two human MASLD datasets (Author response image 2). Across these studies, the cell type with the highest Chil1 expression varied, whereas Chil1 transcripts were detected at relatively low frequency in macrophages (~1% of cells; Author response image 2C, E, K). To better present these data, we regenerated the UMAP plots to include all captured liver non-parenchymal cells, defined using the top two lineage-specific markers (Author response image 3A–B). Consistent with Figure 2A–C, violin plots show that Chil1 is highly expressed in neutrophils, with only modest expression detected in macrophages (Author response image 3C). Further analysis of monocyte/macrophage subsets indicates that approximately 1% of MoMFs or KCs express Chil1 (Author response image 3D–F). As the reviewer noted, the y-axis in Author response image 3F ranges from 0–1%, reflecting the low transcriptional detection frequency of Chil1 in macrophages, which is consistent with the additional public datasets analyzed.
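
      For readers less familiar with the "percent expressed" axis of a dot plot, the sketch below shows how such a detection frequency is typically computed: the share of cells in each cluster with at least one detected transcript of the gene. The function name `percent_expressing`, the cluster labels and the counts are hypothetical and chosen only to illustrate the arithmetic; they are not taken from any dataset discussed here.

```python
from collections import defaultdict

def percent_expressing(counts, clusters, threshold=0):
    """Percentage of cells per cluster whose count for one gene exceeds threshold."""
    totals = defaultdict(int)
    positive = defaultdict(int)
    for count, cluster in zip(counts, clusters):
        totals[cluster] += 1
        if count > threshold:
            positive[cluster] += 1
    return {cluster: 100.0 * positive[cluster] / totals[cluster] for cluster in totals}

# Hypothetical per-cell transcript counts for a single gene (illustration only).
clusters = ["KC"] * 200 + ["Neutrophil"] * 20
counts = [1, 2] + [0] * 198 + [3] * 15 + [0] * 5

detection = percent_expressing(counts, clusters)
for cluster, pct in detection.items():
    print(f"{cluster}: {pct:.1f}% of cells with detected transcripts")
```

      A detection frequency computed this way depends on sequencing depth and capture efficiency, which is one reason a low value for a low-abundance transcript need not rule out protein being present.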

      We also recognize that mRNA detection by scRNA-seq does not necessarily reflect protein abundance. Therefore, we assessed Chi3l1 protein expression in hepatic macrophages using immunofluorescence staining for F4/80, TIM4, and Chi3l1 in liver sections from mice fed either normal chow diet (NCD) or HFHC diet. These analyses show that Chi3l1 protein is detectable in both KCs (TIM4<sup>+</sup>F4/80<sup>+</sup>) and MoMFs (TIM4<sup>-</sup>F4/80<sup>+</sup>) (Revised Figure 1A). Quantitative colocalization analysis revealed that under NCD conditions, approximately 8% of F4/80<sup>+</sup> macrophages are Chi3l1-positive, which increases to ~15% following HFHC feeding (Revised Figure 1A). To confirm antibody specificity, we additionally performed staining in Chil1 knockout mice. In contrast to wild-type mice, Chi3l1 signal was completely absent in F4/80<sup>+</sup> macrophages from Chil1<sup>-/-</sup> mice, validating the specificity of the staining (Revised Figure 1B). Together, these results suggest that low-abundance Chil1 transcripts may be under-detected by scRNA-seq, whereas immunofluorescence captures accumulated protein. Importantly, our functional experiments using Clec4f-Cre-mediated deletion directly support that the observed phenotypes are mediated through Kupffer cells, regardless of expression levels in other liver cell types.

      In response to the reviewer’s comments, we have made the following revisions:

      (1) Softened our conclusion to: “Hepatic macrophages express CHI3L1 and upregulate its expression following HFHC feeding” (Revised manuscript, page 4, lines 136–137).

      (2) Included representative low-magnification images showing multiple F4/80<sup>+</sup> macrophages along with quantitative analysis (Revised Figure 1A).

      (3) Added immunofluorescence staining of Chil1<sup>-/-</sup> liver sections demonstrating complete absence of Chi3l1 signal in F4/80<sup>+</sup> macrophages, validating antibody specificity (Revised Figure 1B).

      (4) Regenerated UMAP plots to display all liver non-parenchymal cells and clearly indicate the low detection frequency of Chil1 transcripts in macrophages (Author response image 3).

      (5) Revised the relevant text to more accurately describe Chil1 expression patterns in hepatic macrophages (Revised manuscript, page 4, lines 136–157).

      Author response image 2.

      Analysis of Chil1 expression in additional single-cell RNA sequencing datasets. (A-C) Chil1 expression in a mouse model of NASH. (A) t-SNE projection of cell clusters from scRNA-seq data (GSE1283338) of livers from C57BL/6J mice fed a control or NASH diet for 30 weeks. (B) Dot plot showing scaled Chil1 expression across all identified cell clusters. (C) Dot plot of scaled Chil1 expression after excluding the neutrophil cluster, highlighting expression in macrophage populations. Analyzed cell clusters and cell numbers: KC_H (healthy, 1178); KC3_Control (1142); KC_N (NASH, 1045); KN_RM (recruited macrophage in KC niche, 950); Proliferating_KC (364); PDC_Control (356); Ly6CHi_RM (320); LSEC (299); NK_NKT (393); B_cell (244); DC_1 (107); DC_2 (118); Ly6CLo_RM (127); Hepatocyte (57); PDC_NASH (46); Neutrophil (21). (D-E) Chil1 expression during NAFLD progression in a mouse Western diet model. (D) t-SNE projection of cell clusters from scRNA-seq data (GSE156059) of livers from C57BL/6J mice fed a Western diet with fructose/sucrose for 12, 24, and 36 weeks. (E) Dot plot showing scaled Chil1 expression across all identified cell clusters. Analyzed cell clusters and cell numbers: capsule macs (250), LAMs (1419), Ly6chi monocytes (6912), mac1 (638), moKCs (767), Patrolling monocytes (690), Prolif.macs (521), Resident KCs (3629), Transitioning monocytes (3615). (F-H) Chil1 expression in human cirrhotic liver biopsies. (F) t-SNE projection of cell clusters from scRNA-seq data (GSE136103) of healthy and cirrhotic human liver samples. (G) Dot plot showing scaled Chil1 expression across major cell lineages. (H) Dot plot of scaled Chil1 expression specifically within the mononuclear phagocyte (MP) population. Analyzed cell clusters and cell numbers: B cell (1951); cycling (967); Epithelia (3751); ILC (10091); mast cell (2511); Mesenchyme (2382); MP (10874); pDC (317); Plasma cell (877); T cell (19076). (I-K) Chil1 expression in a human NAFLD explant. (I) t-SNE projection of cell clusters from scRNA-seq data (GSE190487) of a human NAFLD liver explant. (J) Dot plot showing scaled Chil1 expression across all identified cell clusters. (K) Dot plot of scaled Chil1 expression within the MP subpopulations. Analyzed cell clusters and cell numbers: B cell (1278); Cycling (152); MP (2897); pDC (391); Plasma cell (85); T cell (1551); KC (403); SAMac (scar-associated macrophages, 723); TM (tissue monocytes, 1265).

      Author response image 3.

      Hepatic macrophages express Chi3l1. (A-D) Wildtype C57BL/6J mice were fed either a normal chow diet (NCD) or HFHC for 16 weeks. NPCs were isolated and subjected to BD Rhapsody scRNA sequencing. (A) Uniform manifold approximation and projection (UMAP) plots illustrate the clustering of NPCs from the livers of mice fed NCD and HFHC. Major cell types are colored. (B) Heatmap showing the mean expression of the top two markers of each cell type. (C) Violin plots show the RNA expression of Chil1 in NCD and HFHC livers for each cell cluster. (D) UMAP plots depict the clustering of monocytes/macrophages in the livers of mice fed NCD and HFHC. Cell clusters are color-coded. (E) Dot plot displays the scaled gene expression levels of lineage-specific marker genes in different cell clusters. (F) Dot plot shows the scaled gene expression levels of Chil1 in the indicated cell clusters.

      The cell death data had also previously concerned me, in that 40-60% of KCs were TUNEL +ve. I do not understand how 60% are +ve at 8 weeks but then they have more or less the same number of TIM4+ cells at 16 weeks. How can this be? Why do the TUNEL +ve cells not die? This concern remains as I don't understand how they reached these numbers given the images. Additionally, larger images were not provided to be sure that the images in the figure are representative. Now in the images provided, there are clearly cells which are TIM4+ where the TUNEL does not overlap; likely it is in a LSEC or other neighbouring cell. Indeed, also taking Fig S11b as an example, there are ~7 KCs and at best 1 expresses TUNEL, so how do they get to 60%?

      We thank the reviewer for this constructive feedback. We agree that the sustained TUNEL positivity without corresponding KC depletion presents an apparent paradox. Based on our data, we propose that TUNEL-positive KCs represent cells in a prolonged stressed or pre-apoptotic state rather than undergoing immediate clearance. This interpretation is supported by the relatively stable TIM4+ cell numbers between 8 and 16 weeks, which would be inconsistent with rapid cell death and removal. Previous studies (PMID: 33440159; PMID: 32888418) have similarly documented gradual KC loss during MASLD progression, supporting our view that KC death occurs over an extended timeframe rather than acutely.

      Regarding quantification concerns, we acknowledge that the representative images in the original figure may have been misleading. To address this, we have now quantified KC apoptosis using low-magnification fields across multiple liver sections to ensure statistical rigor. Figure S11B (now Revised Figure S9B) presents these data, showing that under NCD conditions, KC apoptosis rates are minimal in both genotypes. Following HFHC feeding, apoptosis rates are comparable between Chil1<sup>fl/fl</sup> and Lyz2<sup>∆Chil1</sup> mice. Importantly, we have replaced all TIM4/TUNEL co-staining images with low-magnification representative images in the revised figures (Revised Figure 1A, 1B, 5E, S9A, S9B). These images better reflect the quantitative data and confirm that the originally highlighted high-magnification fields were not representative of global apoptosis rates.

      Reviewer #3 (Public review):

      This paper investigates the role of Chi3l1 in regulating the fate of liver macrophages in the context of metabolic dysfunction leading to the development of MASLD. I do see value in this work, but some issues exist that should be addressed as well as possible.

      Here are my comments:

      (1) Chi3l1 has been linked to macrophage functions in MASLD/MASH, acute liver injury, and fibrosis models before (e.g., PMID: 37166517), which limits the novelty of the current work. It has even been linked to macrophage cell death/survival (PMID: 31250532) in the context of fibrosis, which is a main observation from the current study.

      We thank the reviewer for raising this important point and acknowledge previous studies linking Chi3l1 to macrophage function in liver disease. However, several aspects of our work extend beyond these prior reports. First, although global Chi3l1 deficiency has been shown to promote macrophage apoptosis in toxin-induced fibrosis models (PMID: 31250532), our study demonstrates that Chi3l1 differentially regulates the fate of two distinct hepatic macrophage subsets in MASLD: embryo-derived Kupffer cells (KCs) and monocyte-derived macrophages (MoMFs). To our knowledge, this subset-specific regulation of hepatic macrophages has not been previously described. Second, we identify a previously unrecognized metabolic mechanism by which Chi3l1 regulates macrophage survival. Specifically, we find that Chi3l1 binds glucose and promotes glucose uptake, thereby protecting the highly glucose-dependent KCs from metabolic stress–induced death, while exerting minimal effects on MoMFs. This mechanism is distinct from the previously reported Fas/Akt-mediated pathway (PMID: 31250532) and highlights a metabolic checkpoint controlling macrophage subset-specific vulnerability. Third, our findings reveal context- and cell type-dependent roles of Chi3l1. While myeloid-specific deletion of Chi3l1 has been reported to ameliorate steatohepatitis and fibrosis (PMID: 37166517), our KC-specific deletion model shows that loss of Chi3l1 in KCs exacerbates disease, indicating a previously unrecognized protective role of Chi3l1 in KCs during early MASLD. Together, these findings provide new insights into macrophage subset-specific regulation, identify a novel glucose-related metabolic mechanism, and reveal context-dependent functions of Chi3l1 in MASLD pathogenesis.

      (2) The LysCre-experiments differ from experiments conducted by Ariel Feldstein's team (PMID: 37166517). What is the explanation for this difference? - The LysCre system is neither specific to macrophages (it also depletes in neutrophils, etc), nor is this system necessarily efficient in all myeloid cells (e.g., Kupffer cells vs other macrophages). The authors need to show the efficacy and specificity of the conditional KO regarding Chi3l1 in the different myeloid populations in the liver and the circulation.

      We thank the reviewer for raising this important point regarding the specificity of the genetic models and the apparent discrepancy with the study by Feldstein and colleagues (PMID: 37166517). To address these concerns, we performed additional experiments to directly assess the efficiency and cell-type specificity of Chi3l1 deletion in our models.

      (1) Efficiency and specificity of LysM-Cre and Clec4f-Cre models

      We isolated KCs (CD45<sup>+</sup> F4/80<sup>hi</sup> CD11b<sup>low</sup> TIM4<sup>hi</sup>) and MoMFs (CD45<sup>+</sup> F4/80<sup>low</sup> CD11b<sup>hi</sup> Ly6G<sup>-</sup> TIM4<sup>-</sup>) by FACS from Chil1<sup>fl/fl</sup>, Lyz2<sup>∆Chil1</sup> and Clec4f<sup>∆Chil1</sup> mice fed either NCD or HFHC diet. Consistent with the known specificity of these Cre lines, Clec4f-Cre resulted in >90% reduction of Chil1 mRNA in KCs with no significant change in MoMFs (Revised Figure S3B), confirming efficient KC-specific deletion. In contrast, LysM-Cre reduced Chil1 expression by >90% in MoMFs but only ~40% in KCs (Revised Figure S5B). These data support the reviewer’s concern that LysM-Cre mediates incomplete recombination in KCs, whereas the Clec4f-Cre model provides KC-specific deletion, explaining why the phenotype observed in Lyz2<sup>∆Chil1</sup> mice is relatively modest.

      (2) Relationship to the study by Feldstein et al.

      We agree that our LysM-Cre results appear different from those reported by Feldstein and colleagues. However, considering the new recombination data and differences in disease models, we believe the findings are complementary rather than contradictory. First, the disease models differ substantially. Feldstein et al. used a CDAA-HFAT diet for 10 weeks, which rapidly induces severe inflammation and fibrosis, whereas our study employed a long-term HFHC diet, modeling the more gradual metabolic progression of MASLD. These distinct disease contexts may engage different CHI3L1-dependent pathways. Second, the mechanistic focus differs. Feldstein et al. reported that myeloid Chi3l1 promotes steatohepatitis and fibrosis through inflammatory macrophage recruitment and IL13Rα2-mediated stellate cell activation. In contrast, our study identifies a metabolic mechanism in which CHI3L1 binds glucose and promotes glucose uptake, protecting metabolically vulnerable KCs from stress-induced death. Finally, and importantly, KC-specific deletion using Clec4f-Cre recapitulates the key phenotypes observed in our study, including effects on KC survival and metabolic regulation. This confirms that the observed effects are KC-autonomous and not due to broader Cre activity in other myeloid populations.

      Together, these additional experiments clarify the recombination efficiency of our models and demonstrate that our conclusions are supported by KC-specific genetic evidence.

      (3) The conclusions are exclusively based on one MASLD model. I recommend confirming the key findings in a second, ideally a more fibrotic, MASH model.

      We thank the reviewer for this valuable suggestion. To address this point, we tested our key findings in an additional MASH model using a methionine–choline-deficient (MCD) diet. First, we examined Chi3l1 expression in this model. Wild-type mice fed an MCD diet for 6 weeks showed significantly increased Chi3l1 mRNA and protein levels in liver tissues compared with NCD controls, confirming diet-induced upregulation (Revised Figure 3A–B). To determine the functional contribution of Kupffer cell–derived Chi3l1, we subjected Clec4f<sup>ΔChil1</sup> mice and Chil1<sup>fl/fl</sup> controls to MCD feeding for 6 weeks. Body weight was comparable between genotypes throughout the feeding period (Revised Figure 3C). However, KC-specific deletion of Chi3l1 significantly exacerbated MCD diet–induced liver pathology, including increased steatosis, inflammation, and fibrosis, as indicated by higher MASLD activity scores, enhanced Oil Red O staining, increased Sirius Red deposition, and elevated α-SMA expression (Revised Figure 3D). Consistent with these histological findings, Clec4f<sup>ΔChil1</sup> mice exhibited an increased liver index, whereas serum ALT levels remained comparable between groups, suggesting increased hepatic lipid accumulation rather than aggravated hepatocellular injury (Revised Figure 3E). In addition, serum and hepatic triglyceride levels and serum cholesterol were significantly elevated, while hepatic cholesterol levels were not significantly different from controls (Revised Figure 3E). Together, these results validate our findings in an independent MASH model and further support a protective role for Kupffer cell–derived Chi3l1 in limiting steatosis and disease progression (Revised manuscript, page 5, lines 188–205).

      (4) Very few human data are being provided (e.g., no work with own human liver samples, work with primary human cells). Thus, the translational relevance of the observations remains unclear.

      We thank the reviewer for raising this important point. We agree that additional human validation would further strengthen the translational relevance of our findings. We initially attempted to examine macrophage cell death in human liver samples by performing TUNEL and F4/80 co-staining on human liver cancer tissues. However, we did not detect clear colocalization in these samples. We speculate that this may reflect differences in disease context and stage, as the available samples represent end-stage liver disease, whereas our study focuses on early MASLD progression. Despite this limitation, we provide several lines of evidence supporting the human relevance of our findings. First, analysis of multiple public human MASLD scRNA-seq datasets demonstrates Chi3l1 expression in hepatic macrophages (Figure 2F–K). Second, analysis of public bulk RNA-seq datasets shows that Chi3l1 expression positively correlates with MASLD disease activity and progression (Revised Figure 1E–F). Third, our observations are consistent with previous clinical studies reporting elevated CHI3L1 levels in patients with MASLD/MASH and advanced liver disease. We acknowledge that functional validation in primary human macrophages or human liver tissues would further strengthen the translational significance of this work. This limitation and future direction have now been added to the Discussion (Revised manuscript, page 10, lines 409–411).

      Comments on revisions:

      The authors have done a thorough job addressing my comments. However, I am not convinced about the MCD diet model, which is somewhat hidden in the Supplementary Files. Neither seems MASH different nor are any fibrosis data shown to support the conclusions. I am not satisfied with this part of the revised manuscript, and I do not agree that the second MASH model would support the conclusions.

      We thank the reviewer for their continued careful evaluation and for highlighting the need for clearer presentation of the MCD model data. To address this concern, we have substantially revised this section of the manuscript. First, the MCD model results have now been moved from the Supplementary Figure to a new main figure (Revised Figure 3) to improve visibility and clarity. Second, we have added additional fibrosis analyses, including Sirius Red staining and α-SMA immunostaining, to directly assess fibrotic changes. These analyses show that MCD feeding induces significant collagen deposition in control mice and that fibrosis is further increased in Clec4f<sup>ΔChil1</sup> mice (Revised Figure 3D). Importantly, the MCD model recapitulates the key phenotypes observed in the HFHC model, with KC-specific Chi3l1 deletion leading to increased MASLD progression. These findings support the conclusion that the protective role of Kupffer cell–derived Chi3l1 is not restricted to a single dietary model, but is observed across distinct models of steatohepatitis. We hope that these revisions clarify the results and strengthen the evidence supporting our conclusions.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Minor:

      Line 73 - should be moMfs not moKCs

      We thank the reviewer for this helpful comment. The term moKCs was used intentionally in line 73 to refer to monocyte-derived Kupffer cells, rather than MoMFs (monocyte-derived macrophages). To avoid potential confusion, we have clarified the terminology in the revised manuscript.

      Methods: diet is mentioned for 6 weeks but for HFHC should be 16.

      The correction has been made in the Methods section (page 3, line 115).

      Liver/body weight ratios are >3 then I think it is body/liver weight ratio?

      We thank the reviewer for this query. The reported values represent liver-to-body weight ratios, calculated as (liver weight ÷ body weight) × 100%. A value of ~3% is consistent with the expected range for mice with MASLD-associated hepatomegaly.

      This clarification has been added to the revised figure legend.
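
      The calculation described above can be written out as a minimal sketch. The function name and the example weights are hypothetical and are not data from the study; they only illustrate why a liver-to-body ratio expressed as a percentage lands near 3.

```python
def liver_index(liver_weight_g: float, body_weight_g: float) -> float:
    """Return the liver-to-body weight ratio as a percentage."""
    if body_weight_g <= 0:
        raise ValueError("body weight must be positive")
    return 100.0 * liver_weight_g / body_weight_g


# Hypothetical example values, not measurements from the study.
example = liver_index(1.2, 40.0)
print(f"liver index = {example:.1f}%")
```

      For the same hypothetical weights, the inverse ratio (body weight divided by liver weight) would be roughly 33, which is how the two conventions can be told apart.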

      Figure 5F - what happens in Clec4f-CRE mice fed HFHC?

      We thank the reviewer for this question. Western blot analysis showed that HFHC feeding upregulated Chi3l1 protein in the livers of Clec4f-Cre mice (Author response image 4), similar to the increase observed in wild-type mice.

      Author response image 4.

      Expression of Chi3l1 in the serum of Clec4f-Cre mice. (A) Western blot detecting Chi3l1 expression in the serum of Clec4f-Cre mice before and after HFHC feeding. n = 3 mice/group.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study presents convincing findings that oligodendrocytes play a regulatory role in spontaneous neural activity synchronisation during early postnatal development, with implications for adult brain function. Utilising targeted genetic approaches, the authors demonstrate how oligodendrocyte depletion impacts Purkinje cell activity and behaviours dependent on cerebellar function. Delayed myelination during critical developmental windows is linked to persistent alterations in neural circuit function, underscoring the lasting impact of oligodendrocyte activity.

      Strengths:

      (1) The research leverages the anatomically distinct olivocerebellar circuit, a well-characterized system with known developmental timelines and inputs, strengthening the link between oligodendrocyte function and neural synchronization.

      (2) Functional assessments, supported by behavioral tests, validate the findings of in vivo calcium imaging, enhancing the study's credibility.

      (3) Extending the study to assess the long-term effects of early-life myelination disruptions adds depth to the implications for both circuit function and behavior.

      We appreciate this positive evaluation.

      Weaknesses:

      (1) The study would benefit from a closer analysis of myelination during the periods when synchrony is recorded. Direct correlations between myelination and synchronized activity would substantiate the mechanistic link and clarify if observed behavioral deficits stem from altered myelination timing.

      We appreciate the reviewer’s thoughtful suggestion and have expanded the manuscript to clarify how oligodendrocyte maturation relates to the development of Purkinje-cell synchrony. The developmental trajectory of Purkinje-cell synchrony has already been comprehensively characterized by Good et al. (2017, Cell Reports 21: 2066–2073): synchrony drops from a high level at P3–P5 to adult-like values by P8. We found that myelination in the cerebellum begins to appear at P5–P7 (Figure S1A, B), indicating that the timing of Purkinje cell desynchronization coincides with the initial appearance of oligodendrocytes and myelin in the cerebellum. To determine whether myelin growth could nevertheless modulate this process, we quantified ASPA-positive oligodendrocyte density and MBP-positive bundle thickness and area at P10, P14, P21 and adulthood (Fig. 1J, K, Fig. S1E). Both metrics increase monotonically and clearly lag behind the rapid drop in synchrony, indicating that myelination may not be the primary trigger for the desynchronization. When oligodendrocytes were ablated during the second postnatal week, synchrony was reduced (new Fig. 2). Thus, once myelination is underway, oligodendrocytes become critical for maintaining synchrony, acting not as the initiators but as the stabilizers and refiners of the mature network state.

      We have now added a new subsection to the Discussion (lines 451–467) in which we propose a two-phase model. Phase I (P3–P8): High early synchrony is generated by non-myelin mechanisms (e.g. transient gap junctions, shared climbing-fiber input). Phase II (P8 onward): As oligodendrocytes proliferate and ensheath axons, they fine-tune conduction velocity and stabilize the mature, low-synchrony network state.

      We believe these additions fully address the reviewer’s concerns.

      (2) Although the study focuses on Purkinje cells in the cerebellum, neural synchrony typically involves cross-regional interactions. Expanding the discussion on how localized Purkinje synchrony affects broader behaviors - such as anxiety, motor function, and sociality - would enhance the findings' functional significance.

      We appreciate the reviewer’s helpful suggestion and have expanded the Discussion (lines 543–564) to clarify how localized Purkinje-cell synchrony can influence broader behavioral domains. In the revised text we note that changes in PC synchrony propagate into thalamic, prefrontal, limbic, and parietal targets, thereby impacting distributed networks involved in motor coordination, affect, and social interaction. Our optogenetic rescue experiments further support this framework, as transient resynchronization of PCs normalized sociability and motor coordination while leaving anxiety-like behavior impaired. This dissociation highlights that different behavioral domains rely to varying degrees on precise cerebellar synchrony and underscores how even localized perturbations in Purkinje timing can acquire system-level significance.

      (3) The authors discuss the possibility of oligodendrocyte-mediated synapse elimination as a possible mechanism behind their findings, drawing from relevant recent literature on oligodendrocyte precursor cells. However, there are no data presented supporting this assumption. The authors should explain why they think the mechanism behind their observation extends beyond the contribution of myelination or remove this point from the discussion entirely.

      We thank the reviewer for pointing out that our original discussion of oligodendrocyte-mediated synapse elimination was not directly supported by data in the present manuscript. Because we are actively analyzing this question in a separate, follow-up study, we have deleted the speculative passage to keep the current paper focused on the demonstrated, myelination-dependent effects. We believe this change sharpens the mechanistic narrative and fully addresses the reviewer’s concern.

      (4) It would be valuable to investigate the secondary effects of oligodendrocyte depletion on other glial cells, particularly astrocytes or microglia, which could influence long-term behavioral outcomes. Identifying whether the lasting effects stem from developmental oligodendrocyte function alone or also involve myelination could deepen the study's insights.

      We thank the reviewer for raising this point and have performed the requested analyses. Using IBA1 immunostaining for microglia and S100b for Bergmann glia, we quantified cell density and marker signal intensity at P14 and P21. Neither microglial nor Bergmann-glial measures differed between control and oligodendrocyte-ablated mice at either time point (new Figure S2). These results indicate that the behavioral phenotypes we report are unlikely to arise from secondary activation or loss of other glial populations.

      We have now added these results (lines 275–286) and also discuss myelination and other oligodendrocyte functions (lines 443–450). It remains difficult to disentangle conduction-related effects from myelination-independent trophic roles of oligodendrocytes. We therefore note explicitly that future work employing stage-specific genetic tools or acute metabolic manipulations will be required to parse these contributions more definitively.

      (5) The authors should explore the use of different methods to disturb myelin production for a longer time, in order to further determine if the observed effects are transient or if they could have longer-lasting effects.

      We agree that distinguishing transient from enduring effects is critical. Importantly, our original submission already included data demonstrating a persistent deficit of PC population synchrony (Fig. 4, previous Fig. 3): (i) at P14—the early age after oligodendrocyte ablation—population synchrony is reduced, and (ii) the same deficit is still present in adults (P60–P70) despite full recovery of ASPA-positive cell density and MBP-area and -thickness (Fig. 2H-K, Fig. S1E, and Fig. 4). We also performed the ablation of oligodendrocytes after the third postnatal week. Despite a similar acute drop in ASPA-positive cells, neither population synchrony nor anxiety-, motor-, or social behaviors differed from littermate controls. Thus, extending myelin disruption beyond the developmental window does not exacerbate or prolong the phenotype, whereas a short perturbation within that window leaves a permanent timing defect. These findings strengthen our conclusion that it is the developmental oligodendrocyte/myelination program itself—rather than ongoing adult myelin production—that is essential for establishing stable network synchrony. We now highlight this point explicitly in the revised Discussion (lines 507–522).

      (6) Throughout the paper, there are concerns about statistical analyses, particularly on the use of the Mann-Whitney test or using fields of view as biological replicates.

      We appreciate the reviewer’s guidance on appropriate statistical treatment. To address these concerns we have re-analyzed all datasets that contained multiple measurements per animal (e.g., fields of view, lobules, or trials) using nested statistics with animal as the higher-order unit. Specifically, we applied a two-level nested ANOVA when more than two groups were compared and a nested t-test when two conditions were present. The re-analysis confirmed all original conclusions. Because the nested models yielded comparable effect sizes to the Mann–Whitney tests, we have retained the mean ± SEM for ease of comparison with prior literature but now also report all values for each mouse in Table 1. In cases where a single measurement per mouse was compared between two groups, we used the Mann–Whitney test and present the results in the graphs as median values.
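
      The principle of treating the animal, rather than the field of view, as the unit of analysis can be sketched as follows. This is a simplified stand-in and not the nested ANOVA or nested t-test described above: it collapses repeated measurements to one mean per animal and then computes a Welch t statistic on those means. All names and values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def animal_means(measurements):
    """Collapse repeated measurements (animal id -> list of values) to one mean per animal."""
    return [mean(values) for values in measurements.values()]


def welch_t(group_a, group_b):
    """Welch t statistic for two lists of per-animal means (unequal variances allowed)."""
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


# Hypothetical synchrony values: several fields of view per mouse.
control = {"c1": [0.50, 0.52, 0.48], "c2": [0.55, 0.57], "c3": [0.45, 0.47, 0.46]}
ablated = {"a1": [0.30, 0.32], "a2": [0.35, 0.33, 0.34], "a3": [0.28, 0.30]}

ctrl_means = animal_means(control)
abl_means = animal_means(ablated)
t_stat = welch_t(ctrl_means, abl_means)
# n is the number of animals (3 per group), not the number of fields of view.
print(len(ctrl_means), len(abl_means), round(t_stat, 2))
```

      Averaging within each animal first keeps the sample size equal to the number of mice, so repeated fields of view from one animal cannot inflate the apparent n.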

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      In this important work, it is demonstrated that certain high-resolution cryo-EM structures can be obtained by using concentrated cell extracts without purification. The compelling results with the mammalian ribosomes demonstrate the utility of this approach for this molecule and complexes with elongation factor 2. Moreover, this work also demonstrates the utility of 2D template matching for particle picking for structure determination by single-particle averaging pipelines.

      We thank the reviewers for their valuable comments and suggestions, which have helped us to improve the manuscript. We provide a response to the referees’ comments below.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript by Seraj et al. introduces a transformative structural biology methodology termed "in extracto cryo-EM." This approach circumvents the traditional, often destructive, purification processes by performing single-particle cryo-EM directly on crude cellular lysates. By utilizing high-resolution 2D template matching (2DTM), the authors localize ribosomal particles within a complex molecular "crowd," achieving near-atomic resolution (~2.2 Å). The biological centerpiece of the study is the characterization of the mammalian translational apparatus under varying physiological states. The authors identify elongation factor 2 (eEF2) as a nearly universal hibernation factor, remarkably present not only on non-translating 80S ribosomes but also on 60S subunits. The study provides a detailed structural atlas of how eEF2, alongside factors like SERBP1, LARP1, and IFRD2, protects the ribosome's most sensitive functional centers (the PTC, DC, and SRL) during cellular stress.

      Strengths:

      The "in extracto" approach is a significant leap forward. It offers the high resolution typically reserved for purified samples while maintaining the "molecular context" found in in situ studies. This addresses a major bottleneck in structural biology: the loss of transiently bound or labile factors during biochemical purification.

      The finding that eEF2 binds and sequesters 60S subunits is a major biological insight. This suggests a "pre-assembly" hibernation state that allows for rapid mobilization of the translation machinery once stress is relieved, which was previously uncharacterized in mammalian cells.

      The authors successfully captured eIF5A and various hibernation factors in states that are traditionally disrupted. The identification of eIF5A across nearly all translating and non-translating states highlights the power of this method to detect ubiquitous but weakly bound regulators.

      The manuscript beautifully illustrates the "shielding" mechanism of the ribosome. By mapping the binding sites of eEF2 and its co-factors, the authors provide a clear chemical basis for how the cell prevents nucleolytic cleavage of ribosomal RNA during nutrient deprivation.

      Weaknesses:

      (1) While 2DTM is a powerful search tool, it inherently relies on a known structural "template." There is a risk that this methodology may be "blind" to highly divergent or novel macromolecular complexes that do not share sufficient structural similarity with the search model. The authors should discuss the limitations of using a vacant 60S/80S template in identifying highly remodeled stress-induced complexes. For instance, what happens if an empty 40S subunit is used as a template? In the current work, while 60S and 80S particles are picked, none are 40S. The authors should comment on this.

      Thank you for your comment. As noted by the reviewer, 2DTM inherently favors particles that share sufficient similarity with the search template and may underrepresent highly remodeled or structurally divergent complexes. Importantly, once particles are identified, subsequent 2D/3D classification and refinement are not constrained by the template used for particle picking. Consistent with this, we observe classes displaying additional or altered densities absent in the original template, indicating that template matching does not preclude the detection of remodeled ribosomal states, although highly divergent species may still escape detection.

      Regarding the use of a 40S subunit as a template for 2DTM, we tested two templates: a complete 40S subunit and the 40S body alone. Using these 40S templates, we captured several 40S-, 43S-, and 48S-containing complexes, as well as 80S particles. As expected, no individual 60S classes emerged with 40S-TM. 40S-TM yielded 80S classes similar to those obtained with 60S-TM, although the number of particles was lower than with 60S-TM, resulting in lower resolution of these classes. Since this study focuses on ribosome hibernation, we chose to proceed with the 60S-TM results and do not report results using 40S-TM. We reported 40S-TM results in another study from our groups (Zottig et al., bioRxiv, 2025), which focuses on translation initiation on 40S subunits and was deposited as a preprint after this submission.

      We have added a comment and reference describing the use of the 40S template in the initial section of Results and Discussion: “This result echoes our concurrent finding that using 40S or partial 40S templates yields a variety of initiation complexes and 80S classes, revealing densities beyond those in the template [44].”
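
      The dependence of template matching on similarity to the search model can be illustrated with a toy example. The sketch below is not the 2DTM implementation used in the study, which searches projections of a 3D template over many orientations; it only slides a small 2D template over an image and scores each position by normalized cross-correlation. All function names and the tiny test image are hypothetical.

```python
from math import sqrt


def ncc(patch, template):
    """Normalized cross-correlation between two equally sized 2D lists."""
    p = [v for row in patch for v in row]
    t = [v for row in template for v in row]
    mean_p, mean_t = sum(p) / len(p), sum(t) / len(t)
    num = sum((a - mean_p) * (b - mean_t) for a, b in zip(p, t))
    den = sqrt(sum((a - mean_p) ** 2 for a in p) * sum((b - mean_t) ** 2 for b in t))
    return num / den if den else 0.0  # flat patches carry no signal


def match_template(image, template):
    """Return (best score, (row, col)) over all template positions in the image."""
    th, tw = len(template), len(template[0])
    best_score, best_pos = -2.0, None
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            patch = [row[c:c + tw] for row in image[r:r + th]]
            score = ncc(patch, template)
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_score, best_pos


# Toy 5x5 image with the template embedded at row 2, column 1.
template = [[1, 2], [3, 4]]
image = [[0] * 5 for _ in range(5)]
image[2][1:3] = [1, 2]
image[3][1:3] = [3, 4]

score, position = match_template(image, template)
print(position)
```

      A region that differs strongly from the template scores lower and may fall below a detection threshold, which is the sense in which highly divergent complexes can escape picking even though later classification is not bound to the template.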

      (2) In the GTPase center, the authors identify density for "DRG-like" proteins. However, due to limited local resolution in that specific region, they are unable to definitively distinguish between DRG1 and DRG2. While the structural similarity is high, the functional implications differ, and the identification remains somewhat speculative. The authors should acknowledge this in the text.

      We agree with this comment and address it in the main text:

      “Whereas the overall shape and secondary structure resemble DRG1 or DRG2, the local resolution is insufficient to distinguish between these or other similarly structured proteins. Both yeast and mammalian counterparts are reported to function with a companion factor (Tma146p or Gir2 in yeast; or DFRP1 and DFRP2 in mammals), but our maps do not contain density that could correspond to DFRP1/2 near the putative DRG1/2 density. Future work will elucidate the function of these or other DRG-like GTPases in the context of an elongation complex.”

      (3) While "in extracto" is superior to purified SPA, the act of cell lysis (even rapid permeabilization) still involves a change in the chemical environment (pH, ion concentration, and dilution of metabolites). The authors could strengthen the manuscript by discussing how post-lysis changes might affect the occupancy of factors like GTP vs. GDP states.

      Thank you for pointing this out. Cell lysis can indeed lead to a change in the chemical environment, although we do not know how post-lysis changes may specifically affect the occupancy of factors, such as GTP- vs. GDP-bound states. We tried to minimize this effect by performing a rapid permeabilization. Our efforts to optimize our protocols are ongoing, and we expect to have a better answer to this question in the future.

      Nevertheless, to address this reviewer’s concern, our discussion states: “Additional optimization of buffer conditions may be required to more accurately represent the translation states observed in cells, as ionic conditions are known to affect the conformation of the ribosomes (e.g. rotated/non-rotated) and binding of protein factors”.

      (4) The study provides excellent snapshots of stationary states (translating vs. hibernating), but the kinetic transition, specifically how the 60S-eEF2 complex is recruited back into active translation, is not well discussed. On page 13, the authors present eEF2 bound to 60S but do not mention anything regarding which nucleotide is bound to the factor. It only becomes clear that it is GDP after looking at Figure S9. This should be clarified in the text. Similarly, the observations that eEF2 is bound to GDP in the 60S and 80S raise questions as to how the factor dissociates from the ribosome. This could also be discussed.

      Thank you for bringing this to our attention. We now state in the main text that eEF2 is bound with GDP on the 60S subunit.

      As for the kinetic transitions of 60S-eEF2 complexes, like this reviewer, we are fascinated by the possible roles and mechanisms of the 60S-eEF2 complex. The averaged particle ensembles derived from cryo-EM data do not report on the kinetics or transition pathways directly. We acknowledge in the main text that “Future studies will bring insights into the roles of the protein(s) and into the functions and transitions of 60S•eEF2 complexes to the pool of translating ribosomes”.

      Overall Assessment:

      The work reported in this manuscript likely represents the future of structural proteomics. The combination of high-resolution structural biology with minimal sample perturbation provides a new standard for investigating the cellular machines that govern life. After addressing minor points regarding template bias, protein identification, and transition dynamics, this work may become a landmark in the field of translation.

      Reviewer #2 (Public review):

      In this manuscript, the authors describe using "in extracto" cryo-EM to obtain high-resolution structures of mammalian ribosomes from concentrated cell extracts without further purification or reconstitution. This approach aims to solve two related problems. The first is that purified ribosomes often lose cellular cofactors, which are often reconstituted in vitro; this precludes the ability to find novel interactions. The second is that while it is possible to perform cryo-EM on cellular lamella, FIB milling is a slow and laborious process, making it unfeasible to collect datasets sufficiently large to allow for high-resolution structure determination. Extracts should contain all cellular cofactors and allow for grid preparation similar to standard single-particle analysis (SPA) approaches. While cryo-EM of cell extracts is not in itself novel, this manuscript uses 2D template matching (2DTM) for particle picking prior to structure determination using more standard SPA pipelines. This should allow for improved picking over other approaches in order to obtain large datasets for high-resolution SPA.

      This manuscript has two main results: novel structures of ribosomes in hibernating states; and a proof-of-principle for in extracto cryo-EM using 2DTM. Overall, I think the results presented here are strong and serve as a proof-of-principle for an approach that may be useful to many others. However, without presenting the logic of how parameters were optimized, this manuscript is limited in its direct utility to readers.

      Thank you for this valuable comment. We have expanded our Methods section “Optimization of 2DTM in RRL data” to present the logic behind parameter optimization, in the paragraph beginning with “We optimized high-resolution template matching procedures…”

      Reviewer #3 (Public review):

      Summary:

      The authors describe a new structural biology framework termed "in extracto cryo-EM," which aims to bridge the gap between single-particle cryo-EM of purified complexes and in situ cryo-electron tomography (cryo-ET). By utilizing high-resolution 2D template matching (2DTM) on mammalian cell lysates, the authors sought to visualize the translational apparatus in a near-native environment while maintaining near-atomic resolution. The study identifies elongation factor 2 (eEF2) as a major hibernation factor bound to both 60S and 80S particles and describes a variety of hibernation scenarios involving factors such as SERBP1, LARP1, and CCDC124.

      Strengths:

      (1) The use of 2DTM effectively overcomes the signal-to-noise challenges posed by the dense and viscous nature of cellular extracts, yielding maps as high as 2.2 Å.

      (2) The discovery of eEF2-GDP as a ubiquitous shield for ribosomal functional centers, particularly its unexpected stabilization on the 60S subunit, provides a compelling model for ribosome preservation during stress.

      Weaknesses:

      (1) Representative nature of cell samples and lower detection limit

      The cells used in this study (MCF-7, BSC-1, and RRL) are either fast-growing cancer cell lines or specialized protein-synthetic systems. For cells with naturally low ribosomal abundance (such as quiescent primary cells), achieving the target concentration (e.g., A260 > 1000 ng/uL) would require an exponentially larger starting cell population.

      Is there a defined lower limit of ribosomal concentration in the raw lysate below which the 2DTM algorithm fails to yield high-resolution classes? In ribosome-sparse lysates, A260 becomes an unreliable proxy for ribosome density due to the high background of other RNA species and proteins. How do the authors estimate specific ribosome abundance in such heterogeneous fields?

      We have not tested these specific points, but we found that 2DTM can yield high-resolution reconstructions even with 1-2 particles per micrograph. This would require a substantially larger dataset than in this work, yet could provide a viable strategy for dilute or low-abundance samples. Other optimizations, including lysate concentration, may help as well. We have added the following text to reflect these points:

      “Additional optimization of buffer conditions may be required to more accurately represent the translation states observed in cells, as ionic conditions are known to affect the conformation of the ribosomes (e.g. rotated/non-rotated) and binding of protein factors [91-94]. For cells or samples with lower abundance of ribosomes or other macromolecules/complexes of interest, a lysate concentration step or collection of a larger dataset may be considered.”

      (2) Quantitation in heterogeneous lysates and crowding effects

      The authors utilize A260 as a key quality control measure before grid preparation. However, if extreme physical concentration is required to see enough particles, the background concentration of other cytoplasmic components also increases. This may lead to molecular crowding or sample viscosity that interferes with the formation of optimal thin ice. How do the authors calculate or estimate the specific abundance of ribosomes in the cryo-EM field of view when they represent a much smaller percentage of the total cellular content?

      We reported A260 as a reference that may be useful to achieve particle distributions resembling those in our work, rather than as a key quality control measure. Accordingly, we do not use it to estimate ribosome concentration or the specific abundance of ribosomes; instead, we’d recommend adjusting the sample concentration/dilution by grid screening.

      This reviewer mentions the important aspect of ice thickness. We found that the highest population of ribosome particles is found in thicker ice regions, and these particles have been used to make up the majority of our datasets leading to high-resolution reconstructions. We have added this observation to “Optimization of 2DTM in RRL data”.

      (3) Optimization of sample preparation

      The authors describe lysates as dense and viscous, requiring multiple blotting steps (2-3 times) for 3-8 seconds. Have the authors tested whether a larger molecular weight cutoff (e.g., 100 kDa) during concentration could improve the ribosome-to-background ratio without losing small factors like eIF5A (approx. 17 kDa)? Could repeated blotting of a concentrated, viscous lysate introduce shearing forces or increased exposure to the air-water interface that perturbs the native conformation of the complexes?

      We strived to minimize the number of steps in sample preparation, so we did not extensively test concentration steps. We also found that a concentration step can be omitted; the eIF5A-containing structure from the RRL dataset was determined without this step. We agree with the reviewer that repeated blotting may change ribosome complex equilibrium and result in a different distribution of functional states than in cells. However, we did not find evidence of perturbation of the native conformations of complexes, as the positions of ribosomes and factors are nearly identical to those observed in previous studies, including the recent high-resolution structures from cells that we cite.

      (4) The regulatory switch and mechanism of eEF2

      The finding that eEF2-GDP occupies dormant ribosomes is striking. What drives eEF2 from its canonical role in translocation to this hibernation state? Is this transition purely driven by stoichiometry (lack of mRNA/tRNA) and the GDP/GTP ratio, or is there a role for post-translational modifications? How do these eEF2-bound dormant ribosomes rapidly re-enter the translation pool upon stress relief?

      We are glad that this reviewer is fascinated by the eEF2-GDP occupancy on dormant ribosomes (just like we are)! These are important open questions that require further research, as our cryo-EM analyses cannot directly address the kinetic or mechanistic aspects of the mentioned processes. We did explore the known modification/phosphorylation sites in eEF2 densities but did not find evidence for such modifications, which does not rule out the possibility of transient or new modifications.

      (5) Hibernation diversity and LARP1 contextualization

      The study reveals that hibernation strategies vary across cell types. Does the high hibernation rate in RRL reflect a physiological state, or does it hint at “preparation-induced stress” due to resource exhaustion or mRNA degradation in the cell-free system? How do the authors reconcile their discovery of LARP1 on 80S particles with recent 2024 reports that primarily describe LARP1 as an SSU-bound repressor?

      Based on the high abundance of hibernating ribosomes in RRL (relative to many other samples we have tested so far), we speculate that this scenario may result from the stresses induced during lysate preparation: first, the rabbits are treated with phenylhydrazine inducing cell stress, then lysates are treated with micrococcal nuclease to degrade endogenous mRNAs. In addition, the specialization of reticulocytes may contribute to the distinct expression of stress/hibernation factors.

      As for LARP1, our finding is consistent with the 2024 work by Saba et al., who reported LARP1 binding to both 40S subunits and 80S ribosomes. They also noted that LARP1-bound ribosomes are “non-translating”, consistent with our structures.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) In Figure 3, it would be easier for the reader if the authors would report the % of particles in each class. Also, indicating body rotation and head swiveling values would help.

      Because our high-resolution maps result from a combination of data sets (e.g., RRL with an mRNA and RRL without an mRNA), we specify the particle percentages in the corresponding classification schemes in supplemental figures. To avoid excessive labeling in this figure, body rotation and head swiveling values for the new classes are shown in Figure 4.

      (2) Page 16, what is 'elongation factor 1'? It doesn't seem the authors refer to eEF1A?

      Thank you for pointing out this inconsistency; this is indeed eEF1A. We have corrected the text.

      (3) Page 16, after 'individual 60S subunits', there is a missing full stop.

      Thanks. Corrected.

      Reviewer #2 (Recommendations for the authors):

      I am not an expert in ribosome biology and do not have any specific comments on the various states presented here. Instead, I will mainly focus on the image processing aspects of this manuscript.

      Major points:

      (1) Were any AI-based particle pickers, such as crYOLO, topaz, or warp tested? While more traditional template-based or LoG pickers were shown to be inferior to 2DTM, it is unclear if AI methods would perform just as well. Given that a major point of this manuscript is the image processing pipeline, and that these AI tools have been widely adopted in the field, I think this is an important consideration.

      We used other particle pickers before using 2DTM and have listed them in the Supplementary Information: please see Table S1 for a complete list of particle pickers evaluated in this study. Since our present work focuses on a sample preparation method, a more extensive evaluation of particle picking methods is beyond the scope of this study.

      (2) While the methods used to obtain the structures presented are detailed, I think it would also be useful to provide some logic for how parameters were determined or optimized. This would serve as a useful foundation for readers who wish to try out this in extracto approach on their own specimens. Some of these optimizations seem quite specific, such as optimization of angular search parameters, but with no clear logic: e.g., why is the out-of-plane search coarser than the in-plane search; what is the effect of increasing the angular step sizes? Some seem inconsistent, e.g., why is e2pdb2mrc.py sometimes used and the cisTEM simulate used other times? Some are poorly described, such as "the defocus search turned on for micrographs with thicker ice" where there is no mention of how ice thickness is assessed and how thick is too thick. I think a workflow figure with accompanying text would help the reader understand the logic used in this work and how to apply that logic to their own projects.

      To address the comments in (2), we provide separate responses addressing each comment:

      (1) Provide some logic for how parameters were determined or optimized:

      The logic behind determining and optimizing search parameters is a balance between search precision and computational cost. In practice, users must weigh the benefit of finer sampling against the substantial increase in runtime, particularly for large datasets. For example, enabling defocus searching with a 200 Å step size and a 1000 Å range increases the computational time approximately 11-fold compared to running the same search with defocus disabled (since every defocus plane in both the positive and negative directions is searched), making such increases prohibitive when GPU resources are limited. In such cases, reducing the defocus search to a 250 Å step size and a 500 Å range can dramatically shorten runtime while preserving nearly the same number of reliable matches. In summary, we found that optimizing the defocus search, the in-plane and out-of-plane angles, and the image/micrograph pixel size can substantially reduce the processing time while sacrificing only a small percentage of particles.

      We have expanded our parameter optimization paragraph in “Optimization of 2DTM in RRL data”, as mentioned in a previous response.
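
      The 11-fold figure quoted above follows from counting defocus planes. The sketch below is a minimal illustration, assuming the search visits the nominal plane plus offsets at every step up to the range in both directions, and that runtime scales roughly linearly with the number of planes. The function name is hypothetical and is not part of cisTEM.

```python
def defocus_planes(search_range, step):
    """Count defocus planes: the nominal plane plus offsets at
    +/- step, +/- 2*step, ... up to +/- search_range (same units)."""
    if step <= 0 or search_range < 0:
        raise ValueError("step must be positive and search_range non-negative")
    per_direction = int(search_range // step)
    return 2 * per_direction + 1


# Settings quoted in the response (values in Angstrom).
fine = defocus_planes(1000, 200)    # 5 planes each way + nominal = 11
coarse = defocus_planes(500, 250)   # 2 planes each way + nominal = 5

# Assumption: cost is proportional to the number of planes searched.
print(fine, coarse, coarse / fine)
```

      Under these assumptions the reduced search costs about 5/11 of the full search, which is consistent with the shorter runtime described above.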

      (2) Some seem inconsistent, e.g., why is e2pdb2mrc.py sometimes used and the cisTEM simulate used other times?

      e2pdb2mrc.py is simpler to use and was used in the beginning of the project. Later, we switched to using the simulate program since it performed slightly better. Either software is suitable to generate templates for 2DTM.

      (3) Some are poorly described, such as "the defocus search turned on for micrographs with thicker ice" where there is no mention of how ice thickness is assessed and how thick is too thick.

      We did not quantitatively assess ice thickness; instead, we tested whether it is advantageous to include the defocus search. To this end, we first performed CTF estimation and grouped micrographs based on their fit resolution. From each group, we selected ten micrographs representing the highest and lowest fit resolutions. Template matching was then performed using identical parameters, once with defocus search enabled and once with it disabled. The number of picked particles for each micrograph under both conditions was compared. When a significant difference was observed (most commonly for icy micrographs with low fit resolution), we enabled defocus search for that group of images. Enabling the defocus search sometimes yielded twice as many matches. We found that these images/datasets appeared to have a higher background compared to in vitro reconstituted samples. The template-matching results from these micrographs were subsequently combined with results from groups processed with defocus search disabled.

      To address this point, we have included this description in “Optimization of 2DTM in RRL data”.
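
      The on/off comparison described above can be sketched as a simple per-group decision. This is only an illustration: the response does not state a numerical criterion for a "significant difference", so the `min_gain` threshold, the function names, and the counts below are all hypothetical placeholders rather than the authors' procedure or data.

```python
def match_gain(counts_off, counts_on):
    """Ratio of total 2DTM matches with defocus search on vs. off,
    computed over the same set of test micrographs."""
    total_off = sum(counts_off)
    total_on = sum(counts_on)
    if total_off == 0:
        return float("inf") if total_on > 0 else 1.0
    return total_on / total_off


def enable_defocus_search(counts_off, counts_on, min_gain=1.5):
    """Decide per group; min_gain is an assumed cutoff, not a published value."""
    return match_gain(counts_off, counts_on) >= min_gain


# Placeholder match counts for four test micrographs (not measured data).
off = [4, 6, 5, 3]
on = [9, 11, 10, 6]
print(match_gain(off, on), enable_defocus_search(off, on))
```

      Groups for which the decision is negative would be processed with defocus search disabled, and the results from both kinds of groups combined afterwards, as described in the response.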

      (4) I think a workflow figure with accompanying text would help the reader understand the logic used in this work and how to apply that logic to their own projects.

      Thanks for this suggestion. We have added a workflow figure as Figure 1—figure supplement 2.

      Minor Points:

      (1) While the image processing described seems appropriate, I think it is still necessary to include Fourier shell correlation plots for the final structures as supplemental data.

      Thank you for pointing out this inadvertent omission. We have added FSC curves in Figure 3—figure supplement 3.

      (2) One of the initial workflows used is a Relion 3 pipeline, which is, at this point, quite dated. Is there a reason Relion 4 or 5 was not used instead?

      The project started when Relion 3 was the latest version.

    1. Author response:

      We thank the editors and reviewers for their careful evaluation of our manuscript, “GM-CSF regulates ILC states and myeloid cell signaling during ulceration in Crohn’s disease.” We appreciate the constructive feedback and agree that strengthening the mechanistic understanding of GM-CSF signaling in the regulation of ILC populations will significantly improve the study.

      The reviewers identified a key gap regarding the downstream mechanisms by which GM-CSF maintains ILC3 populations and limits ILC1 expansion. In response, we will focus our revision on defining the myeloid-mediated pathways downstream of GM-CSF that regulate ILC states.

      Specifically, we plan to: 

      (1) Characterize myeloid cell responses to GM-CSF signaling

      We will perform additional analyses of both our Xenium spatial transcriptomics and zebrafish single-cell RNA-seq datasets to identify transcriptional changes in macrophages and monocytes associated with GM-CSF signaling. This will include differential gene expression and pathway enrichment analyses to uncover candidate signaling pathways (e.g., cytokine and STAT5-associated programs) that may mediate ILC regulation.

      (2) Strengthen spatial niche analysis in human tissue

      We will refine our Xenium-based analyses to better define the cellular microenvironments surrounding GM-CSF-producing cells, including higher-resolution visualization and quantification of receptor-expressing target cells and signaling niches within ulcerated regions.

      (3) Further define immune cell populations in the zebrafish model

      We will enhance the definition of ILC subsets by incorporating additional marker-based analyses and clarifying their relationship to human ILC populations. In parallel, we will more thoroughly characterize the myeloid compartment in csf2rb-deficient zebrafish to determine how GM-CSF signaling impacts these populations.

      (4) Clarify analysis methods and presentation

      We will address all points related to statistical testing, data visualization, and figure clarity raised by the reviewers, including the use of appropriate statistical comparisons for multi-group analyses and improved annotation of gene modules and data sources.

      Together, these revisions will provide a clearer mechanistic framework linking GM-CSF signaling in myeloid cells to the maintenance of ILC3 populations and suppression of inflammatory ILC1 responses.

      We believe these additions will substantially strengthen the manuscript and address the reviewers’ concerns. We appreciate the opportunity to revise our work and look forward to submitting a revised version.

    1. Author response:

      We would like to thank the editors and the reviewers for their thoughtful and constructive assessment of our manuscript. We appreciate the reviewers' positive recognition of our research and their thoughtful assessment of our data.

      In the upcoming revision, we will incorporate rigorous statistical analysis (p-values) for our binding assays, optimize the structural figures and summary tables for better clarity, and discuss the recent preprint paper alongside the nuances of Egl-BicD stoichiometry. Regarding the suggestion for CLIP-seq, we agree that a global analysis would be a valuable extension of this work. However, as our lab’s core expertise is in structural biology, and the in vivo functional studies in this manuscript were conducted through a collaboration to validate our structural findings, we feel that such a large-scale genomic study falls beyond the scope of the current structural report.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Zare-Eelanjegh et al. investigate how the endoplasmic reticulum, the nucleus, and the cell periphery are mechanically linked by indenting intact cells with specially shaped atomic force probes that double as drug injection devices. Fluorescence lifetime imaging of the membrane tension reporter Flipper-TR reveals that these three compartments are mechanically linked and that the actin cytoskeleton, microtubules, and lamins modulate this coupling in complex ways.

      Strengths:

      (1) The study makes an important advance by applying FluidFM to probe organelle mechanics in living cells, a technically demanding but powerful approach.

      (2) Experimental design is quantitative, the data are clearly presented, and the conclusions are broadly consistent with the measurements.

      Weaknesses:

      (1) Calcium-dependent effects: Indentation can evoke cytoplasmic Ca<sup>2+</sup> elevations that drive myosin contraction and reshape the internal membrane network (e.g., vesiculation: PMID: 9200614, 32179693), possibly confounding the Flipper-TR responses; without simultaneous/matching Ca<sup>2+</sup> imaging, cell viability assays (e.g., Sytox), and intracellular Ca<sup>2+</sup> sequestration or myosin inhibition experiments, a more complex mechanochemical coupling cannot be excluded, weakening conclusions.

      (2) Baseline measurements: Flipper-TR lifetime images acquired without indentation do not exclude potential light-induced or time-dependent changes, which weakens the conclusions.

      (3) Indentation depth versus nuclear stiffness/tension: Because lamin-A/C depletion softens nuclei, a given force may produce a deeper pit and thus greater membrane stretch. It is unclear how the cytoskeletal perturbations affect indentation depth, which weakens the conclusions.

      Reviewer #2 (Public review):

      Summary:

      This useful study combines atomic force microscopy with genetic manipulations of the lamin meshwork and microinjection of cytoskeletal depolymerizing drugs to probe the mechanical responses of intracellular organelles to combinations of cytoskeletal perturbations. This study demonstrates both local and distal responses of intracellular organelles to mechanical forces and shows that these responses are affected by disruption of the actin, microtubule, and lamin cytoskeletal systems. Interpretation of these effects is limited by the absence of key data determining whether acute microinjection of cytoskeleton-depolymerizing drugs has complete or partial effects on the targeted cytoskeletal networks.

      Strengths:

      This study uses a sensitive micromanipulation system to apply and visualize the effects of force on intracellular organelles.

      Weaknesses:

      The choice to deliver cytoskeleton-depolymerizing drugs by local microinjection is unusual, and it is unclear to what extent actin and microtubule filaments are actually depolymerized immediately after microinjection and on the minutes-length timescale being evaluated in this study. This omission limits the interpretation of these data.

      Reviewer #3 (Public review):

      Summary:

      Using an approach developed by the authors (FluidFM) combined with FLIM, they discover that a mechanical force applied over the cell nucleus triggers mechanical responses dependent on the Lamina composition.

      Strengths:

      The authors present a new approach to study mechano-transduction in living cells, with which they uncover lamin-dependent properties of the nucleus.

      Weaknesses:

      (1) The transfer of the mechanical response from the Lamina to the ER is not fully covered.

      (2) In Figure 4D, WT dots are the same for each compartment. Why do the authors not make one graph for each compartment with WT, A-KO, B-KD, and A-KO/B-KD together?

      (3) In Figure 1E, the authors showed well how the probe deforms the nucleus. It is not indicated in the material and methods section or in the figure legend, where, in Z, the acquisition of FLIM images was made or if it is a maximum projection. I assume it was made at a plane in the middle of the nucleus to see the nuclear envelope border and the ER at the same time. Did the authors look at the nuclear membrane facing upward, where most of the deformation should occur? Are there more lifetime changes? In Figure D, before injection of CytoD, we can clearly see a difference at the pyramidal indentation site with two different lifetime colors.

      (4) A great result of this article regards the importance of Lamins, A and B, in triggering the response to a mechanical force applied to the nucleus. Could 3D imaging for LaminA and LaminB be performed at the different time points of indentation to see how the lamin meshworks are deformed and how they return to the basal state? This could be correlated with the FLIM results described in the article.

      (5) Lamins form a meshwork underneath the nuclear membrane. They are connected to the cytoskeletons mainly by the LINC complex. Results presented here show that the cytoskeletons are implicated in transferring the stimulus from the nuclear envelope to the ER. Could the authors perform the same experiments using Nesprin-2 and/or Nesprin-1 and/or SUN1/2 knockdowns to determine if this transmission is occurring through the LINC complex or rather in a passive way by modifying the nuclear close surroundings?

      (6) The authors used cytoskeleton drugs, CytoD and Nocodazole, with their FluidFM probe, but did not show if the drugs actually worked and to what extent by performing actin or microtubule stainings. In the original paper describing FluidFM, 15 s were enough to obtain a full FITC-positive cell after injection. Here, the experiments are around 5 minutes long. I therefore question the rationale behind the injection of the drugs compared to direct incubation, besides affecting only the cell currently under indentation.

      We thank the reviewers for their constructive criticisms and suggestions. Accordingly, we amended the manuscript and the figures.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Calcium-dependent effects: Indentation can evoke cytoplasmic Ca<sup>2+</sup> elevations that drive myosin contraction and reshape the internal membrane network (e.g., vesiculation: PMID: 9200614, 32179693) that may affect Flipper-TR signals independent of membrane tension; without simultaneous Ca<sup>2+</sup> imaging, cell viability assays (e.g., Sytox costaining), intracellular Ca<sup>2+</sup> sequestration or myosin inhibition, a more complex mechanochemical coupling cannot be excluded. Tracking ER morphology during the experiments with luminal and membrane markers would further clarify this point.

      The goal of our article is to demonstrate and quantify tension propagation and tension homeostasis across the organelles that govern the mechanosensitivity, and thus the mechanoresponse, of the cell. To this end, the test cells (drug-injected cells) were compared with a control group of non-drug-injected cells (Fig. 2 and Fig. 3), so that any overall responses of the cells to indentation, e.g. potential changes in Ca<sup>2+</sup> sequestration, are covered by the control group.

      Interestingly, CytoD injection during indentation with cylindrical probes alone produced higher tension at the NE than in the control group of non-drug-injected cells. This indicates that the effect of F-actin disturbance is larger than that of the indentation process itself, at least where the cells were stimulated using cylindrical probes. That was also the reason why, in the next steps of this study (varying the indentation site from the nucleus to the ER or cell periphery, and comparing WT cells with cells of varied lamina composition), only cylindrical probes, which minimize the indentation effect on the NE and the ER, were used.

      Lastly, to examine the response to tension changes and calcium dynamics simultaneously, we have meanwhile extended our study and analyzed cells treated with different cytoskeleton-disturbing drugs (e.g., CytoD), subjected to viscoelasticity measurements using AFM indentation (i.e., cell relaxation studies following indentation), and injected with drugs perturbing the regulation of Ca<sup>2+</sup> homeostasis (i.e., Thapsigargin), combined with simultaneous Ca<sup>2+</sup> imaging, for which another manuscript is in preparation.

      (2) Baseline measurements: FlipperTR lifetime images acquired without indentation, collected with identical timing and illumination, are needed as controls to gauge potential light-induced or time-dependent changes.

      For every cell, a baseline corresponding to its tension in the relaxed state (without indentation) was quantified from a Flipper-TR image taken before the indentation and injection processes (“before”). As explained in the manuscript (lines 180-184), this baseline tension value was then subtracted from the tension measured by time-lapse Flipper-TR imaging over the 3-4 min of stimulation (indentation + injection), as well as immediately or 5 min post-stimulus. The control group (i.e., non-drug-injected cells where the effect of F-actin depolymerization was studied, or WT cells where the effect of lamina composition was studied) was always treated in the same manner as the test group. As such, tension increases due to light-induced or time-dependent changes, or to indentation alone, were excluded.
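
      The baseline correction described above amounts to subtracting each cell's "before" value from its time-lapse readouts and then comparing test and control cells handled identically. A minimal sketch follows; the function names are hypothetical and the numbers are placeholders in arbitrary units, not measured Flipper-TR data.

```python
def baseline_corrected(readouts, baseline):
    """Subtract the per-cell 'before' value from each time-lapse readout."""
    return [value - baseline for value in readouts]


def mean(values):
    return sum(values) / len(values)


# Placeholder readouts (arbitrary units), one test and one control cell.
test_cell = baseline_corrected([4.5, 5.0, 5.5], baseline=4.0)
control_cell = baseline_corrected([4.25, 4.5, 4.75], baseline=4.0)

# Because both groups share the same imaging and indentation protocol,
# changes common to both cancel in the difference between groups.
print(mean(test_cell) - mean(control_cell))
```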

      (3) Indentation depth versus nuclear stiffness/tension: Because laminA/C depletion softens nuclei, a given force may arguably produce a deeper pit and thus greater (not less) membrane stretch. Demonstrating that pit geometry depends only on applied force - and not on genetic or pharmacological perturbations - is necessary to rule out alternative interpretations.

      We thank the reviewer for raising this important point regarding the relationship between indentation depth and nuclear stiffness. To address whether pit geometry depends on applied force rather than genetic perturbations, we analyzed the piezo movement required to reach the 150 nN force setpoint across all experimental conditions (WT, LMNA KO, LMNB KD, and LMNA KO/LMNB KD cells).

      Our results (Fig. S6) demonstrate that there is no statistically significant difference in the piezo displacement from the contact point to the 150 nN setpoint between any of the experimental groups (Kruskal-Wallis H-test: H = 1.744, p = 0.627). This indicates that for a constant applied force of 150 nN, the indentation depth is equivalent across all conditions despite differences in nuclear stiffness.

      Therefore, the observed differences in tension response, and possibly in membrane stretch, cannot be attributed to variations in indentation depth but rather reflect intrinsic differences in the molecular mechanical response to equivalent mechanical stimuli.

      This has been added in the manuscript in lines 282-286.
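
      As a consistency check on the reported statistic, the p-value can be recomputed from H. The sketch below assumes the standard asymptotic approximation, in which H for four groups (WT, LMNA KO, LMNB KD, LMNA KO/LMNB KD) is compared against a chi-square distribution with 4 - 1 = 3 degrees of freedom. The response does not state which software was used, and the function name is hypothetical.

```python
import math


def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with 3 degrees
    of freedom (closed form, valid for this df only)."""
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


# Reported Kruskal-Wallis statistic for the piezo displacement comparison.
p_value = chi2_sf_df3(1.744)
print(round(p_value, 3))  # approximately 0.627, matching the reported p
```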

      Reviewer #2 (Recommendations for the authors):

      (1) Please clarify the distinctions between the pyramidal and cylindrical probes. The manuscript alludes to sharpening the cylindrical probe to facilitate membrane rupture. Do both probes rupture the plasma membrane upon force application? If so, at what applied force does this occur? It seems that PM rupture would also affect tension on intracellular membranes during and especially after force application.

      Yes, both cylindrical and pyramidal probes rupture the PM as well as the nuclear membrane when targeting the nucleus. In the HeLa cells used for this study, pyramidal probes puncture the membrane at a higher force of 100 nN, compared to rupture forces between 10 nN and 50 nN for the sharpened cylindrical probes used here. This is explained in the manuscript in lines 112-115 for cylindrical probes and was revised for pyramidal probes in lines 115-119.

      (2) Also re: probes: it is clear from Figure 1 that the total volume displacement induced by the pyramidal probe is far greater than the cylindrical probe. This greater displaced volume seems to be a very reasonable explanation for the increased membrane tension detected with the pyramidal probe, but this interpretation is not discussed.

      That is a good point, thank you! This has been added in lines 138-140.

      (3) Both cytochalasin D and nocodazole work by preventing new polymerization of monomers, which acutely affects new assembly and, over time, leads to loss of polymerized filaments. On the timescale of the experiments shown, it seems possible that acute effects on new filament assembly may be occurring, but that pre-assembled filaments may remain stable. It may thus be a misinterpretation to describe these conditions as "without actin fibers" or "without MTs". Further complicating matters, it is possible that the kinetics of filament disassembly may be altered by combinatorial treatment and/or in lamin knockout conditions versus wild-type cells. For instance, it has been shown that microtubule depolymerization increases actin contractility (see PMID 33089509). For these reasons, control experiments showing the extent of actin and/or microtubule disassembly in each condition tested are essential to interpret the data shown.

      Thank you for raising this valid point. This has been corrected and noted as "less actin fibers" and "less MTs". Regarding the timescale over which the drugs (e.g., CytoD and Nocodazole) affect filament assembly, a higher concentration of 50 µM of each of CytoD and Nocodazole, leading to a final concentration of 0.5 µM, was used for intracellular injection. This final, physiologically relevant concentration was expected to act as fast as 12 min for CytoD and 1-5 min for Nocodazole when delivered directly inside the cell, excluding the time required to cross the plasma membrane. In particular, our study examines the dynamic response of cells and focuses on the early effects of the drugs on tension and the deviation from the control groups, rather than on the steady state reached at longer time points. The time estimates rely on values reported in the literature. For instance, a recent comprehensive study quantified actin dynamics and its interaction with CytoD using high-resolution images of single actin filaments acquired by total internal reflection fluorescence (TIRF) microscopy, and reported a value of approximately 150 s (read from the graphs presented in Fig. 2D and 2F) as the starting point of inhibition of actin filament polymerization after introducing a flow of 5 nM CytoD into a chamber containing actin filaments.<sup>1</sup> In another study, a half-time of 40 s for the complete disassembly of microtubules in monocytes was reported for cells incubated with 1 µM Nocodazole.<sup>2</sup> This part was also included in the SI file, section “Mechanochemical stimulation”.

      (4) The presentation of some of the data could be clarified. For instance, it is unclear how some time course experiments can be non-significant but the endpoint analysis can be significant (for instance, Figure 3C vs. Figure 3D.)

      We agree that some instances require clearer interpretation. Indenting the cell nucleus with cylindrical probes induced higher tension in CytoD-injected cells than in control cells at both the ER and NE, during and after the stimulus (Fig. 2E-F and Fig. 3C-D). Time-lapse tension analysis of these cells at the ER and NE showed close-to-significant and significant differences between test and control groups, respectively. p-values of 0.087 for Fig. 2E (bottom row, ER) and 0.042 for Fig. 3C (top row, ER) were obtained at the ER for the last time point during the stimulus. For the “after stimulus” condition, significant differences between CytoD-injected and control cells were obtained at both the ER and NE. The ER’s complex morphology consists of many curved structures of lumens and disks that can deform when subjected to external mechanical perturbation, making it prone to absorbing stress and strain when directly targeted. That could explain the similar tension levels in CytoD-injected and control cells during ER indentation. Notably, unlike nucleus-targeted cells, ER-targeted cells show increased tension at the ER and NE in CytoD-injected cells compared to control cells only after stimulation. This suggests fundamental differences in the mechanical coupling of the nucleus and the ER to the cytoskeleton. While the nucleus maintains direct, structural actin connections through the nuclear lamina and LINC complexes<sup>3</sup>, making it immediately sensitive to actin disruption, the ER relies on indirect, signaling-mediated cytoskeletal interactions<sup>4,5</sup>. Thus, the ER functions as a dynamic tension buffer that engages cytoskeletal support primarily during active repair processes following mechanical perturbation. This explains why nuclear probing reveals immediate tension differences in actin-disrupted cells, while ER probing only shows post-retraction effects. Consequently, in ER-targeted experiments, statistical analysis detects significant differences between test and control groups after probe removal, but not during probe contact. This is now explained more clearly in the manuscript (line 236).

      References

      (1) Mitani, T. et al. Microscopic and structural observations of actin filament capping and severing by Cytochalasin D. bioRxiv, 2025.2001.2028.635382 (2025).

      (2) Cassimeris, L. U., Wadsworth, P. & Salmon, E. D. Dynamics of microtubule depolymerization in monocytes. J Cell Biol 102, 2023-2032 (1986).

      (3) Maurer, M. & Lammerding, J. The Driving Force: Nuclear Mechanotransduction in Cellular Function, Fate, and Disease. Annu Rev Biomed Eng 21, 443-468 (2019).

      (4) Shi, X. et al. Actin nucleator formins regulate the tension-buffering function of caveolin-1. J Mol Cell Biol 13, 876-888 (2022).

      (5) van Vliet, A. R. & Agostinis, P. PERK and filamin A in actin cytoskeleton remodeling at ER-plasma membrane contact sites. Molecular & Cellular Oncology 4, e1340105 (2017).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In the manuscript entitled 'The Role of ATP Synthase Subunit e (ATP5I) in Mediating the Metabolic and Antiproliferative Effects of Biguanides', Lefrancois G et al. identify ATP5I, a subunit of F1Fo-ATP synthase, as a key target of medicinal biguanides. ATP5I stabilizes F1Fo-ATP synthase dimers, essential for cristae morphology, but its role in cancer metabolism is understudied. The research shows ATP5I interacts with a biguanide analogue, and its knockout in pancreatic cancer cells mimics biguanide treatment effects, including altered mitochondria, reduced OXPHOS, and increased glycolysis. ATP5I knockout cells resist biguanide-induced antiproliferative effects, but reintroducing ATP5I restores the effects of metformin and phenformin. These findings highlight ATP5I as a promising mitochondrial target for cancer therapies. The manuscript is well written.

      Strengths:

      The experiments were performed using systematic and well-accepted methods.

      Weaknesses:

      The significance of the target molecule and mechanisms may help in understanding the molecular mechanisms of metformin.

      We greatly appreciate the reviewer’s insightful comment regarding the importance of the target molecule and its mechanisms in elucidating metformin’s molecular actions. ATP5I plays a key role in the dimerization and assembly of the F1F0-ATP synthase complex. To address this, we performed Blue Native-PAGE followed by western blotting using an antibody against the β-subunit of the F1 domain. Our results show that metformin affects the oligomeric state of the F1F0-ATP synthase in a way that partially reproduces the effect of the KO of ATP5I (Fig 2G). This provides direct evidence that metformin acts on-target through ATP5I.

      Reviewer #2 (Public review):

      Summary:

      The mechanism(s) by which the therapeutic drug metformin lowers blood glucose in type 2 diabetes and inhibits cell proliferation at higher concentrations remain contentious. Inhibition of complex 1 of the mitochondrial respiratory chain with consequent changes in cellular metabolites which favour allosteric activation of phosphofructokinase-1, allosteric inhibition of fructose bisphosphatase-1 and cAMP signalling and activation of AMPK which phosphorylates transcription factors are candidate mechanisms. The current manuscript proposes the e-subunit of ATP-synthase as a putative binding protein of biguanides and demonstrates that it regulates the expressivity of the Complex 1 protein NDUFB8.

      Strengths:

      (1) The metformin conjugate and metformin show comparable efficacy on inhibition of cell proliferation in the millimolar range.

      (2) Demonstration of compromised expression of the Complex I protein NDUFB8 by the ATP5I knockout and its reversal by ATP5I expression is an important strength of the study. This shows that the decreased "sensitivity" to metformin in the ATP5I knock-out cells could be due to various proteins.

      (3) Demonstration of converse effects of ATP5I KO and re-expression of ATP5I on the NAD/NADH ratio.

      Weaknesses:

      (1) The interpretation of the cellular co-localization of the biotin-biguanide conjugate with TOMM20 (Figure 1-D) as mitochondrial "accumulation" of the conjugate is overstated because it cannot exclude binding of the conjugate to the mitochondrial membrane. It would have been more convincing if additional incubations with the biotin-biguanide conjugate in combination with metformin had shown that metformin is competitive with the biotin-conjugate.

      We appreciate the reviewer’s comment and agree that the resolution provided by fluorescence microscopy makes it challenging to pinpoint the specific mitochondrial compartment where the biotin-biguanide conjugate localizes, even with additional markers such as TOMM20 antibodies for the outer mitochondrial membrane. While it remains a possibility that the conjugate binds to the mitochondrial surface, another plausible explanation is that the biotin moiety may facilitate entry into mitochondria through a biotin-specific transporter, adding further mechanistic intricacies. Furthermore, while a competition assay with metformin might help investigate interactions with mitochondrial targets and transporters (OCT family), it would not compete for biotin-mediated transport. Thus, while we acknowledge the reviewer’s suggestion, we believe such an experiment may not provide conclusive evidence regarding the conjugate’s mitochondrial localization or mechanism of entry. Instead, we revised the manuscript to more accurately describe the findings as "mitochondrial association" rather than "mitochondrial accumulation," ensuring that our interpretation remains consistent with the resolution and limitations of the data presented.

      (2) The manuscript reports the identification of 69 proteins by mass spectrometry of the pull-down assay of which 30 proteins were eluted by metformin. However, no Mass Spectrometry data is presented of the peptides identified. The methodology does not state the minimum number of peptides (1, 2?) that were used for the identification of the 31/69 proteins.

      We added a comprehensive table summarizing these findings (Figure 1- figure supplement 2). We considered all peptides and decided to perform stringent validation tests for those chosen to be further studied.

      (3) The validation of ATP5I was based on the use of recombinant protein (which was 90% pure) for the SPR and the use of a single antibody to ATP5I. The validity of the immunoblotting rests on the assumption that there is no "non-specific" immunoactivity in the relevant mol wt range. Information on the validation of the antibody would be helpful.

      Regarding the recombinant protein used for SPR, its purity was evaluated using a Coomassie-stained gel. For the antibody used in immunoblotting, its specificity was validated through knockout cell lines (Figure 2A), ensuring minimal concerns about non-specific immunoactivity within the relevant molecular weight range. Unfortunately, the KO data comes in the paper after the first immunoblots are presented. We outlined this validation in the methods section.

      (4) Knock-out of ATP5I markedly compromised the NAD/NADH ratio (Fig.3A) and cell proliferation (Figure 3D). These effects may be associated with decreased mitochondrial membrane potential which could explain the low efficacy of metformin (and most of the data in Figures 3-5). This possibility should be discussed. Effects of [metformin] on the NAD/NADH ratio in control cells and ATP5I-KO would have been helpful because the metformin data on cell growth is normalized as fold change relative to control, whereas the NAD/NADH ratio would represent a direct absolute measurement enabling comparison of the absolute effect in control cells with ATP5I KO.

      The mitochondrial membrane potential depends on a functional electron transport chain which drives proton pumping from the matrix to the intermembrane space. Metformin can decrease the mitochondrial membrane potential and this is usually explained as a consequence of complex I inhibition [1]. It has been published that metformin requires this membrane potential to accumulate in mitochondria so the actions of metformin are self-limiting due to this requirement. The reviewer is right that ATP5I KO cells could be resistant to metformin because they may have a lower membrane potential. We do not believe this to be the case because the response to phenformin, another biguanide that can enter mitochondria through the membrane without the need of the OCT transporters [2], is also affected in ATP5I KO cells. Of note, compensatory mechanisms such as enhanced glycolysis, as observed in ATP5I KO cells (elevated ECAR and increased sensitivity to 2-D-deoxyglucose), and the ATPase activity of F<sub>1</sub>F<sub>0</sub>-ATP synthase could potentially help maintain membrane potential suggesting that this might not be an issue in the ATP5I KO cells. Chandel and colleagues already proposed that reversal of the F<sub>1</sub>F<sub>0</sub>-ATPase keeps this membrane potential in metformin-treated cells [3].

      Nevertheless, to address this point experimentally, we measured the mitochondrial membrane potential using tetramethylrhodamine methyl ester (TMRE) and ATP levels using luciferase-based assays (CellTiter-Glo) in ATP5I KO cells. We now show that ATP levels are not significantly reduced in ATP5I KO cells, likely because of compensatory glycolysis (Figure 5D), while the mitochondrial membrane potential remains close to normal (Figure 6D and E).

      We did not measure the NAD<sup>+</sup>/NADH ratio in control and KO cells treated with metformin because we now provide a more direct measurement of metformin acting on ATP5I: the oligomerization state of the F<sub>1</sub>F<sub>0</sub>-ATPase (Figure 2G) as well as a Seahorse Bioenergetic Stress test (Figure 6A-C). Both figures provide results consistent with targeting of ATP5I by biguanides. We also discuss that targeting ATP5I can result in complex I inhibition due to the well-known role of F<sub>1</sub>F<sub>0</sub>-ATPases in cristae formation and the assembly of the respiratory complexes. We do not believe ATP5I is the only target of metformin, and in the paper we acknowledged and discussed other proposed targets in the introduction, the results section (page 8) and the discussion.

      (5) Figure-6 CRISPR/Cas9 KO at 16mM metformin in comparison with 70nM rotenone and 2 micromolar oligomycin (in serum-containing medium). The rationale for the use of such a high concentration of metformin has not been explained. In liver cells metformin concentrations above 1mM cause severe ATP depletion, whereas therapeutic (micromolar) concentrations have minimal effects on cellular ATP status. The 16mM concentration is ~2 orders of magnitude higher than therapeutic concentrations and likely linked to compromised energy status. The stronger inhibition of cell proliferation by 16mM metformin compared with rotenone or oligomycin raises the issue of whether the changes in gene expression may be linked to the greater inhibition of mitochondrial metabolism. Validation of the cellular ATP status and NAD/NADH with metformin as compared with the two inhibitors could help the interpretation of this data.

      NALM-6 cells are very glycolytic, have low respiration rates, and weak dependence on ATP5I (DepMap score: -0.47) [4]. The concentration of 16 mM metformin was chosen based on the IC<sub>50</sub> for this cell line. Both ATP status and NAD<sup>+</sup>/NADH ratios will depend on the extent of the compensatory glycolysis. On the other hand, our genetic screening evaluates cell proliferation as an integration of all metabolic activities required for the process. This unbiased screening revealed a common pathway affected by metformin and oligomycin that differs from the pathway affected by rotenone, which is consistent with the finding that metformin acts on the F<sub>1</sub>F<sub>0</sub>-ATPase. Our new Seahorse data demonstrate that oligomycin has a markedly reduced effect in metformin-treated cells, supporting a shared mechanism of action. Notably, uncouplers restore respiration in both metformin-treated and ATP5I knockout cells, which aligns with the mechanism we propose (please see our new section on the Seahorse Mito Stress test and the new discussion). In the discussion, we acknowledged, based on existing literature, that the cellular context may play a significant role in determining the response to this drug.

      Reviewer #3 (Public review):

      Most of the data are based on measurements of the oxygen consumption rate (OCR) and extracellular acidification rate (ECAR) measured by the Seahorse analyser in control and ATP5l KO cells. However, these measurements are conducted by a single injection of a biguanide, followed over time and presented as fold change. By doing so, the individual information on the effect of metformin and derivate on control and KO cells are lost. In addition, the usual measurement of OCR is coupled with certain inhibitors and uncouplers, such as oligomycin, FCCP, and Antimycin A/rotenone, to understand the contribution of individual complexes to respiration. Since biguanides and ATP5l KO affect protein levels of components of complex I and IV, it would be informative to measure their individual contributions/effects in the Seahorse. To further strengthen the data, it would be helpful to obtain measurements of actual ATP levels in these cells, as this would explain the activation of AMPK.

      Thank you for this valuable comment. We have now performed the suggested analysis, which is presented in the new Figure 6. The data are consistent with our proposition that biguanides target ATP5I, but they also suggest the possibility of additional targets, such as Complex I, as proposed by other groups. Please see our new section on the Seahorse Mito Stress test and the new discussion. We also measured ATP (Figure 5D) and the mitochondrial membrane potential (Figure 6D and E). These measurements reflect the powerful compensation provided by glycolysis.

      The authors report on alterations in mitochondrial morphology upon ATP5l KO, which is measured by subjective quantifications of filamentous versus puncta structures. Fiji offers great tools to quantify the mitochondrial network unbiasedly and with more accuracy using deconvolution and skeletonization of the mitochondria, providing the opportunity to measure length, shape, and number quantitatively. This will help to understand better, whether mitochondria are really fragmented upon ATP5l KO and rescued by its re-introduction.

      Thanks for the suggestion. We used the Mitochondrial analyzer plugin from ImageJ/Fiji to redo Figures 2 and 4, quantifying details of the mitochondrial network and reporting differences in branch number, length, endpoints and diameter.

      Finally, the authors report in the last part of the paper a genetic CRISPR/Cas9 KO screen in NALM-6 cells cultured with high amounts of metformin to identify potential new mediators of metformin action. It is difficult to connect that to the rest of the paper because a) different concentrations of metformin are used and b) the metabolic effects on energy consumption are not defined. They argue about the molecular function of the obtained hits based on literature and on a comparison of the pattern of genetic alterations based on treatments with known inhibitors such as oligomycin and rotenone. However, a direct connection is not provided, thus the interpretation at the end of the results that "the OMA1-DEL1-HRI pathway mediates the antiproliferative activity of both biguanides and the F1ATPase inhibitor oligomycin" while increasing glycolysis, needs to be toned down. This is an interesting observation, but no causality is provided. In general, this part stands alone and needs to be better connected to the rest of the paper.

      NALM-6 cells are very glycolytic, have low respiration rates, and weak dependence on ATP5I [4], forcing us to use higher concentrations of metformin to inhibit their growth. Recent results show that metformin targets PEN2 in the cytosol to increase AMPK activity, controlling both the glucose-lowering and the life span extension abilities of metformin [5]. This work raises the question of whether the antiproliferative and anticancer effects of metformin are due to a mitochondrial activity or are controlled by this new pathway of AMPK activation. Hence, the genetic screening was performed to find, in an unbiased way, how metformin works. The results provide compelling evidence for mitochondria, and in particular the ATP synthase, as potential targets of metformin, and a foundation for future studies. We added the following text to the beginning of this section: “Several candidate targets have been reported for biguanides and our results presented so far suggest a new one. Clues about drug mechanism of action can be obtained in unbiased manner using genetic perturbation [6]. To obtain an unbiased observation of biological processes affected by metformin, we performed a genome-wide pooled CRISPR/Cas9 KO screen in NALM-6 cells cultured in the presence of metformin at a concentration affecting growth (16 mM).”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) In Figure 1B, the total ACC antibody is missing, and the total AMPK should be replaced, especially since they claim pAMPK increases with metformin and BFB treatment. Additionally, the streptavidin pull-down image in Figure 1F needs to be resized to show the fully cropped section.

      We repeated this experiment three times and added the new figures to the supplemental data. We corrected the main figure in the manuscript with a representative blot for total ACC (Fig 1B).

      (2) Clarify whether ATP5I alone activates mitochondrial respiratory activity or if it functions in a complex with other proteins. Also, explain how metformin affects ATP5I-is it phosphorylated directly or through an upstream target

      ATP5I interacts directly with ATP5L and both proteins form part of the peripheral stack of the F<sub>1</sub>F<sub>0</sub>-ATP synthase. ATP5I and ATP5L play demonstrated roles in the dimerization of the F<sub>1</sub>F<sub>0</sub>-ATP synthase. We discussed that they may affect other functions of the enzyme as part of the peripheral stack which interact with the OSCP (oligomycin sensitivity conferring protein) located in the F1 portion of the enzyme. Further work is needed to understand how ATP5I may affect the interactions between the F0 and F1 parts of the enzyme. We did not investigate whether metformin affects the phosphorylation of ATP5I, but this remains an important question for future studies. The PhosphoSitePlus database indicates that ATP5I undergoes phosphorylation and acetylation at multiple sites, suggesting potential regulatory mechanisms worth exploring.

      (3) Ensure that all immunofluorescence (IF) images include a scale bar.

      Done

      Reviewer #2 (Recommendations for the authors):

      (1) Details of the mass spectrometry analysis and the number of peptides for the proteins identified would increase the merit of the study.

      We added a comprehensive table summarizing these findings (Figure 1- figure supplement 2). We considered all peptides and decided to perform stringent validation tests for those chosen to be further studied.

      (2) The lower NAD/NADH ratios in the ATP5I KO cell lines and the higher ratios with ATP5I expression are convincing data of the cellular redox state of these cells (with variable NDUFB8). Other data sets (e.g. OCR and ECAR and Relative growth, %) are normalized to the respective control and therefore do not show the relative effect of metformin (in control cells) to the ATP5I knock-out. The effects of metformin concentration on the NAD/NADH ratio would provide a direct measure of the extent to which metformin mimics ATP5I KO. This data would be clearer to interpret than Figure 3GHKL; Figures 5EF; S1; S2).

      We did not measure the NAD<sup>+</sup>/NADH ratio in control and KO cells treated with metformin because we now provide a more direct measurement of metformin acting on ATP5I: the oligomerization state of the F<sub>1</sub>F<sub>0</sub>-ATPase and its vestigial assembly intermediates (Figure 2G) as well as a Seahorse Bioenergetic Stress test (Figure 6A-C). Both figures provide results consistent with targeting of ATP5I by biguanides. We also discuss that targeting ATP5I can result in complex I inhibition due to the well-known role of F<sub>1</sub>F<sub>0</sub>-ATPase oligomerization in cristae formation and the assembly of the respiratory complexes.

      (3) Figure 6: NAD/NADH data for metformin (16mM) and rotenone (70 nM) /oligomycin 2 uM) would establish whether the concentrations are "matched" to allow a comparison of their gene signatures.

      We chose those concentrations because they have similar effects on cell growth, since the NAD/NADH ratio depends on the extent of glycolytic compensation induced by blocking respiration.

      (4) Intramitochondrial accumulation of the biotin conjugate could be demonstrated in Figure 1D from competition between metformin and the biotin-conjugate.

      We appreciate the reviewer’s comment and agree that the resolution provided by fluorescence microscopy makes it challenging to pinpoint the specific mitochondrial compartment where the biotin-biguanide conjugate localizes, even with additional markers such as TOMM20 antibodies for the outer mitochondrial membrane. While it remains a possibility that the conjugate binds to the mitochondrial surface, another plausible explanation is that the biotin moiety may facilitate entry into mitochondria through a biotin-specific transporter, adding further mechanistic intricacies. Furthermore, while a competition assay with metformin might help investigate interactions with mitochondrial targets and transporters (OCT family), it would not compete for biotin-mediated transport. Thus, while we acknowledge the reviewer’s suggestion, we believe such an experiment may not provide conclusive evidence regarding the conjugate’s mitochondrial localization or mechanism of entry. Instead, we revised the manuscript to more accurately describe the findings as "mitochondrial association" rather than "mitochondrial accumulation," ensuring that our interpretation remains consistent with the resolution and limitations of the data presented.

      Reviewer #3 (Recommendations for the authors):

      In addition to my comments for the public review, the manuscript would be strengthened by the following points:

      (1) The abstract needs to be streamlined to communicate more clearly what the paper is about. The last part of the results is not mentioned and is completely disconnected from the ATP5I KO story.

      We have significantly modified our abstract to include both the genetic screening significance and our new findings on the F<sub>1</sub>F<sub>0</sub>-ATP synthase oligomerization.

      (2) Quantifications of the western blots (Figure 1B) are missing. Seems like AMPK total protein levels go down with BFB.

      We quantified the blots.

      (3) How often was the pull-down repeated (Figure 1F)? It would be also important to show this in other cell types, such as pancreatic cancer cells.

      The pull-down was an initial large-scale discovery experiment performed once. However, the findings were subsequently validated in KP-4 pancreatic cancer cells in three independent experiments. As a direct readout of metformin’s impact on ATP5I, we assessed the oligomerization state of the F1ATPase and compared the effects of metformin with those of ATP5I knockout. We show that metformin partially phenocopies the ATP5I KO phenotype, and we reproduced this effect in a second cell line, U2OS osteosarcoma cells.

      (4) Does the KO of ATP5l affect other subunits of the v-ATP5a?

      Yes—we added an immunoblot to document this in Fig. 2A. Notably, ATP5I knockout also reduces ATP5L and OSCP levels.

      (5) Does metformin and BFB itself affect mitochondrial morphology and respiration?

      To evaluate the activity of BFB in comparison with metformin, we performed immunoblot analyses of the AMPK pathway, growth assays, and microscopy-based assessment of mitochondrial morphology. These data are shown in Fig. 1B–D. A more comprehensive analysis of metformin’s effects on mitochondrial respiration has now been added as Fig. 6, using Seahorse measurements and multiple respiratory inhibitors.

      (6) Since there is a strong increase in ECAR, does this correspond to an increase in glucose uptake? Are the proteins or genes involved altered or how to explain the increased flux through glycolysis in ATP5l KO cells?

      This is a very interesting idea, as our CRISPR screen identified several genes that could potentially enhance glycolysis as a vulnerability in metformin-treated cells. In future work, we will explore this biology in greater depth.

      (7) Line 242, for easier understanding, states clearly that metformin reduces growth by x-percent.

      Yes, it is a 65-fold change. We added it to the text.

      (8) The conclusion at the end of the result section is not supported by the data or not well explained. I guess oligomycin will stop the action of metformin on vATP5l, or how to explain this?

      We clarified the conclusion.

      (1) Xian, H., Liu, Y., Rundberg Nilsson, A., Gatchalian, R., Crother, T. R., Tourtellotte, W. G., Zhang, Y., Aleman-Muench, G. R., Lewis, G., Chen, W., Kang, S., Luevanos, M., Trudler, D., Lipton, S. A., Soroosh, P., Teijaro, J., de la Torre, J. C., Arditi, M., Karin, M. & Sanchez-Lopez, E. Metformin inhibition of mitochondrial ATP and DNA synthesis abrogates NLRP3 inflammasome activation and pulmonary inflammation. Immunity 54, 1463-1477 e1411, (2021).

      (2) Hawley, S. A., Ross, F. A., Chevtzoff, C., Green, K. A., Evans, A., Fogarty, S., Towler, M. C., Brown, L. J., Ogunbayo, O. A., Evans, A. M. & Hardie, D. G. Use of cells expressing gamma subunit variants to identify diverse mechanisms of AMPK activation. Cell metabolism 11, 554-565, (2010).

      (3) Wheaton, W. W., Weinberg, S. E., Hamanaka, R. B., Soberanes, S., Sullivan, L. B., Anso, E., Glasauer, A., Dufour, E., Mutlu, G. M., Budigner, G. S. & Chandel, N. S. Metformin inhibits mitochondrial complex I of cancer cells to reduce tumorigenesis. eLife 3, e02242, (2014).

      (4) Hlozkova, K., Pecinova, A., Alquezar-Artieda, N., Pajuelo-Reguera, D., Simcikova, M., Hovorkova, L., Rejlova, K., Zaliova, M., Mracek, T., Kolenova, A., Stary, J., Trka, J. & Starkova, J. Metabolic profile of leukemia cells influences treatment efficacy of L-asparaginase. BMC Cancer 20, 526, (2020).

      (5) Ma, T., Tian, X., Zhang, B., Li, M., Wang, Y., Yang, C., Wu, J., Wei, X., Qu, Q., Yu, Y., Long, S., Feng, J. W., Li, C., Zhang, C., Xie, C., Wu, Y., Xu, Z., Chen, J., Yu, Y., Huang, X., He, Y., Yao, L., Zhang, L., Zhu, M., Wang, W., Wang, Z. C., Zhang, M., Bao, Y., Jia, W., Lin, S. Y., Ye, Z., Piao, H. L., Deng, X., Zhang, C. S. & Lin, S. C. Low-dose metformin targets the lysosomal AMPK pathway through PEN2. Nature 603, 159-165, (2022).

      (6) Bruno, P. M., Liu, Y., Park, G. Y., Murai, J., Koch, C. E., Eisen, T. J., Pritchard, J. R., Pommier, Y., Lippard, S. J. & Hemann, M. T. A subset of platinum-containing chemotherapeutic agents kills cells by inducing ribosome biogenesis stress. Nat Med 23, 461-471, (2017).

    1. Author response:

      The following is the authors’ response to the previous reviews

      Joint Public Review:

      (1) Problems associated with averaging: The authors intended to focus on the oviposition clock in individual females, however due to the inherent noise in the oviposition rhythm they had to resort to averaging across Lomb-Scargle periodograms generated from individual time-series. They then tested whether the averaged periodogram contains a significant frequency. However, this reduction in noise also reduces the ability to compare differences in power of the rhythm across individuals. Furthermore, this method makes it especially difficult to distinguish the contribution of subsets of the circuit on the proportion of rhythmic flies and the power of the rhythm. In this revised version the authors use two manipulations to disrupt the molecular clock, which could have different success rates based on the type and number of cells targeted. Unfortunately, the type of averaging used prevents the detection of any such effects. It is to be noted that, indeed, individual-level differences in period between the PdfDicerGal4 > perRNAi and UAS-perRNAi lines help the authors to establish that there is a significant reduction in period length when the molecular clock is abolished in PDF cells. These individual measurements are now very helpful in discerning the effect of manipulations carried out on different circadian neural subsets, some of which could have been missed if only averages were considered.

      First, it is important to emphasize that we are certainly not "averaging across Lomb-Scargle periodograms". As explained in the paper (and at length in the Supplementary Material), we first detrend each individual time series, then average _all_ the resulting time series (and not only those of rhythmic individuals), and finally take the Lomb-Scargle periodogram of this average series. Nevertheless, we agree with the reviewer that the use of averages reduces our ability to understand what happens at the individual level. The problem is that in most cases the presence of noise has made it difficult to draw any meaningful conclusions from individual records. One fortunate exception is the one mentioned by the reviewer. Averaging, on the other hand, has allowed us to extract some useful information in those cases.
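
      Purely as an illustration of the order of operations described above (detrend each fly, average all detrended series, then compute a single periodogram), here is a minimal sketch. It is not the analysis code used in the paper: the linear detrending, the classical normalized Lomb-Scargle formula, the synthetic data, and every function name are assumptions made for the example.

```python
import math
import random


def detrend_linear(t, y):
    """Subtract the least-squares straight line from one time series (assumed detrending)."""
    n = len(t)
    mt, my = sum(t) / n, sum(y) / n
    slope = (sum((a - mt) * (b - my) for a, b in zip(t, y))
             / sum((a - mt) ** 2 for a in t))
    return [b - (my + slope * (a - mt)) for a, b in zip(t, y)]


def lomb_scargle(t, y, periods):
    """Classical normalized Lomb-Scargle power at each trial period."""
    n = len(y)
    mean = sum(y) / n
    y0 = [v - mean for v in y]
    var = sum(v * v for v in y0) / (n - 1)
    power = []
    for p in periods:
        w = 2 * math.pi / p
        # Time offset that makes the sine and cosine terms orthogonal.
        tau = math.atan2(sum(math.sin(2 * w * a) for a in t),
                         sum(math.cos(2 * w * a) for a in t)) / (2 * w)
        c = [math.cos(w * (a - tau)) for a in t]
        s = [math.sin(w * (a - tau)) for a in t]
        yc = sum(a * b for a, b in zip(y0, c))
        ys = sum(a * b for a, b in zip(y0, s))
        power.append((yc ** 2 / sum(a * a for a in c)
                      + ys ** 2 / sum(a * a for a in s)) / (2 * var))
    return power


def population_periodogram(t, all_series, periods):
    """Detrend every fly, average ALL detrended series, then take one periodogram."""
    detrended = [detrend_linear(t, y) for y in all_series]
    average = [sum(col) / len(col) for col in zip(*detrended)]
    return lomb_scargle(t, average, periods)


def synthetic_population(n_flies=20, period=24.0, seed=0):
    """Hypothetical data: a shared rhythm plus a linear trend and independent noise."""
    rng = random.Random(seed)
    t = [2.0 * i for i in range(96)]  # one sample every 2 h for 8 days
    series = [[math.sin(2 * math.pi * a / period) + 0.01 * a + rng.gauss(0, 1)
               for a in t] for _ in range(n_flies)]
    return t, series


if __name__ == "__main__":
    t, series = synthetic_population()
    periods = [16 + 0.5 * k for k in range(33)]  # trial periods from 16 h to 32 h
    power = population_periodogram(t, series, periods)
    print("peak period (h):", periods[power.index(max(power))])
```

      Because the average is taken before the periodogram, individual noise is suppressed first; the cost, as noted above, is that individual-level differences in power can no longer be read from the result.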

      (2) Sensitivity to sample size: Averaging reduces the effect of random background noise but noise reduction is dependent upon sample size. Comparing genotypes with different sample sizes in addition to varying signal-to-noise ratios (which might also change with neural manipulations) makes it difficult to estimate how much of the rhythm structure is contributed by a given neuronal subset; thus, whenever possible, comparisons should be made between groups that include similar numbers of flies. This problem is compounded when the averaged periodogram is composed of both rhythmic and weakly rhythmic individuals. For instance, in the main text the reported value of period length of pdfDicerGal4 > perRNAi is 20.74h (see also Fig 2J) but in the Supplementary figure 2S1 this is close to 22h, while the values reported for the control are largely similar (24.35h in Fig 2H versus ~24h in Fig 2S1). A difference of 3.6h between control and experimental flies is much greater than 2h. Which estimate (average versus individual) is more reliable in predicting the behavior of these flies is difficult to determine without further experiments.

      In most of the experiments analyzed for this paper the numbers of flies for control and experimental genotypes are very similar. In the remaining ones, the number of flies for experimental genotypes is roughly twice the number of flies for control genotypes. As mentioned, noise reduction depends on sample size. This implies that, when a genotype is assessed as rhythmic, the sample size used is evidently large enough. On the other hand, when a genotype is assessed as arrhythmic, it is important to know whether the sample size is large enough. It is for this reason that we have used many more flies for arrhythmic genotypes than for their control genotypes.
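
      The dependence of noise reduction on sample size can be made concrete under a simple assumption that is not stated in the response: each fly's detrended record is a shared rhythmic signal plus independent noise of equal variance. The symbols below are introduced here only for illustration.

```latex
% Illustrative assumption: y_i(t) = s(t) + \varepsilon_i(t), with the
% \varepsilon_i independent across the N flies and Var[\varepsilon_i(t)] = \sigma^2.
\bar{y}(t) = \frac{1}{N}\sum_{i=1}^{N} y_i(t)
           = s(t) + \frac{1}{N}\sum_{i=1}^{N}\varepsilon_i(t),
\qquad
\operatorname{Var}\!\left[\bar{y}(t) - s(t)\right] = \frac{\sigma^{2}}{N}.
```

      Under this assumption, doubling the number of flies lowers the noise standard deviation of the averaged series by a factor of about 1.41 (the square root of 2); correlated noise or phase differences between flies would weaken this gain.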

      Regarding the period difference between the average of rhythmic individuals and the denoised population average, notice first that they are not necessarily exactly the same thing, since our population average uses all flies, and the denoising might introduce some variations over the underlying periods (which would be undetectable without the denoising). Also, and more importantly, Fig. 2S1 shows that for the average of the individual periods the error bars are large, and thus, statistically, the reported value for the population average falls within the confidence interval for the individual average.

      (3) Based on the newly provided data for individual fly periodograms the reader can visually evaluate the rhythmicity associated with each genotype. Such visual inspection did not reveal any clear difference between the proportion of rhythmic individuals between experimental and parental GAL4 and/or UAS controls, except for experiments using per01 mutant animals. This is surprising since if these circuits are controlling the oviposition rhythm, perturbing them should affect most individuals in a similar way.

      The problem here is that, given the amount of noise present in this behavior, it is difficult to obtain any reliable information from individual records, since, by its random nature, in a given experiment noise might be disturbing the expected behavior of individuals in very different ways. That is the reason why we have resorted to population averages.

      Other comments

      Disrupting the clock in the 5th sLNv and 3 Cry+ LNds (and weakly in a small subset of DN1) affected egg-laying. Although the work emphasizes the importance of the LNd, the role of the 5th sLNv should be discussed.

      As mentioned in the paper, what the experiments show is that the 3 Cry+ LNds and 5th sLNv (usually called E cells) are candidates to be the main drivers of the oviposition rhythm, but the connectomics show that only 2 Cry+ LNds are connected to the oviposition circuit. In order to be more accurate, throughout the corresponding section (now called "The molecular clock in E neurons is necessary for rhythmic egg-laying") of the corrected manuscript we have always referred to the cells marked by the driver as E-cells. In the Discussion, we have added a line commenting that, in the connectome, the 5th sLNv is not connected to any cells of the oviposition circuit.

      Minor corrections:

      In subsection "Two Cry+ LNd neurons directly oviIN", there was a mistake in the use of "E1" and "E2" (their meanings were interchanged). We have corrected this section, giving the correct definitions. We have also corrected some minor English typos.

      Joint Recommendations for the authors:

      (1) Line 234 'to disrupt the molecular clock in (those) neurons', Please clearly describe the cell types in which MB122B driver works.

      We have clarified the cell types in which the MB122B driver is expressed (line 236).

      (2) Line 235 gen cycle, should be gen'e' cycle

      The typo has been corrected.

      (3) The authors should provide the raw data in repositories as per journal policy of eLife.

      The data are now available at the following links:

      https://github.com/srisaug/flywork/blob/main/RawData_Rivaetal_eLife2025_Fig4_+> UAS-perRNAi.zip

      https://github.com/srisaug/flywork/blob/main/RawData_Rivaetal_eLife2025_Fig4_M 122Bsplit-Gal4>+.zip

      https://github.com/srisaug/flywork/blob/main/RawData_Rivaetal_eLife2025_Fig4_MB122Bsplit-Gal4>UAS-perRNAi.zip

      https://github.com/srisaug/flywork/blob/main/RawData_Rivaetal_eLife2025_Figures1

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Shahbazi et al used a recurrent neural network model trained to control a musculoskeletal model of the arm to investigate how neural populations accommodate activity patterns underpinning savings. The paper draws upon the recent finding of a "uniform shift" in preparatory activity in monkey motor cortex associated with savings, and leverages full access to a computational model to establish causality.

      Strengths:

      The paper is well written, and the figures are clearly presented. The key finding that the uniform shift first reported based on neural recordings by Sun et al. emerges in artificial neural networks performing a similar task is interesting and well-backed by their analyses. Manipulating this uniform shift to show that it drives behavioural savings is an important causal confirmation of the proposal by Sun et al.

      Weaknesses / Comments:

      As mentioned earlier, the core results are well backed by the analyses. Most of my comments relate to adding more controls and additional questions that could be explored with the model to strengthen the paper.

      (1) Savings are quantified as more rapid relearning of the FF upon re-exposure (e.g., Figure 3). This finding is based on backpropagation through time, but would this hold when using a different optimiser, e.g., FORCE?

      This is an interesting question, and indeed, there are an increasing number of studies addressing how different neural network learning rules may affect the kinds of representations that arise after learning (Codol et al., 2024). However, the focus of the present paper is not on which neural network approaches or which specific optimisers produce savings; rather, the focus is on the basis and neural geometry of savings when it emerges.

      We have added a short paragraph to the Discussion section [lines 349-355] to address this:

      “The present results are based on RNNs trained in an error-based approach using backpropagation through time (Werbos, 1990) using the Adam optimizer (Kingma and Ba, 2014). Other techniques for training RNNs have been proposed including the FORCE algorithm (Sussillo and Abbott, 2009). In addition, several recent reports have demonstrated success using reinforcement learning approaches to train neural networks in the context of sensorimotor control tasks (Lillicrap et al., 2015; Codol et al., 2024a). An interesting avenue for future work is to determine how the present results may or may not generalize to different neural network architectures and learning rules.”

      (2) The authors should include a "null model" showing that training on a different reaching task following NF, as opposed to FF2, won't show something akin to a uniform shift during preparation due to the adoption of TDR and having similar targets.

      This is a critical point. Training on a different reaching task other than FF2 (e.g. a different force field) will indeed result in a uniform shift, but critically, a shift in a different direction in neural state space than the uniform shift associated with FF2. The central focus of the present paper is to show that when there remains a non-zero projection of preparatory neural activity along the direction of the uniform shift associated with a given learning task, this residual projection underlies savings when networks are subsequently re-exposed to the same task.

      In the Results section we had included a short paragraph to describe control simulations that we performed that address this concept. We have expanded this text and added a Figure and the results of statistical tests to better describe this control [lines 179-187]:

      “As an additional control we trained networks after the growing up phase on an opposing force field (CCW) and then as above, exposed the networks to a NF washout phase, and then to a CW force field. In this case no savings was observed in the CW force field, either for initial lateral deviation, or for learning rate (Figure 3). In fact, we observed that initial lateral deviation is larger for the novel force field (t(39)=-4.918, p=1.6e-5). This observation is in line with the finding that learning opposing force fields sequentially results in interference (Sun et al., 2022). The results of these control simulations underscore that the savings effect observed in our main study was learning-specific—it was due to prior learning of the CCW force field, and not a general effect of learning any novel dynamics.”
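
      The idea of a residual projection of preparatory activity along the uniform-shift direction, which this response relies on, can be sketched numerically. This is a toy illustration and not the authors' analysis code: the three-unit "hidden states", the four targets, the numbers, and the function names are all hypothetical, and the uniform shift is assumed here to be the across-target mean change in preparatory state.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def uniform_shift_direction(prep_before, prep_after):
    """Unit vector of the across-target mean change in preparatory state."""
    diffs = [[a - b for a, b in zip(after, before)]
             for before, after in zip(prep_before, prep_after)]
    shift = mean_vector(diffs)
    norm = math.sqrt(sum(x * x for x in shift))
    return [x / norm for x in shift]


def residual_projection(prep_before, prep_washout, direction):
    """Mean projection of (washout - baseline) preparatory states onto direction."""
    projections = [sum((w - b) * d for w, b, d in zip(wash, before, direction))
                   for before, wash in zip(prep_before, prep_washout)]
    return sum(projections) / len(projections)


# Toy preparatory states for four targets in a 3-unit state space (made up).
baseline = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [-1.0, 0.0, 0.0], [0.0, -1.0, 0.0]]
after_ff = [[x, y, z + 2.0] for x, y, z in baseline]   # learning shifts every target alike
washout = [[x, y, z + 0.5] for x, y, z in baseline]    # part of the shift persists

direction = uniform_shift_direction(baseline, after_ff)
residual = residual_projection(baseline, washout, direction)

# A field whose shift points the opposite way sees the same residual with opposite sign.
opposite = [-d for d in direction]
residual_opposite = residual_projection(baseline, washout, opposite)
print(direction, residual, residual_opposite)
```

      In this toy case the residual is positive along the direction of the previously learned field and negative along the opposite direction, which is the qualitative distinction the control simulation is meant to probe.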

      (3) The analyses of network activity during movement preparation (Figure 4) nicely replicate the key finding in Sun et al, but I think the authors could leverage the full access to their network and go further, e.g., by examining changes (or the lack of) during execution in FF2 with respect to FF (and perhaps in a future NF2 with respect to NF), including whether execution activity also lives in parallel hyperplanes, etc.

      We agree that a visualization of the neural activity during movement would be beneficial to the reader. To address this we have added a new Figure (Fig. 6) and associated text [lines 210-219]. The Figure shows the neural trajectories when the RNNs are first exposed to the FF1 and when they are first exposed to FF2 (after NF2 washout). Trajectories are plotted in 3D corresponding to the first 3 principal components, starting at the go cue and ending 200 ms into the movement, for each of the 8 movement targets.

      “The neural trajectories for preparation and for movement can be visualized in principal component space. Figure 6 shows trajectories during planning and early execution for initial FF1 and FF2 exposure. Hidden unit activity was subjected to a principal components analysis, and neural trajectories within the first three PCs are shown for movements to each of the eight movement targets. Filled circles indicate neural state 200 ms prior to the go cue. During the preparatory period trajectories travel along PC1 and then disperse across PC2 and PC3 into the circular pattern indicated by the filled stars, which indicate time of the go cue (also see Figure 5A). After the go cue neural trajectories shift back along PC1 and rotate along oscillatory patterns characteristic of populations of motor cortical neurons in non-human primates during movement (Churchland and Shenoy, 2024).”

      (4) Related to the above, while the results are interesting and the paper is well done, I kept wishing that the authors had done "more" with their model. This could be one or two final sections on "predictions" that would nicely complement their "validation" of the uniform shift, and that, in my opinion, would greatly increase the impact of the paper. In particular:

      (a) What would be the effect of learning more "tasks"? For example, is there a limit on how many fields can be learned? (You show something related by manipulating network size, but this is slightly different.)

      These are interesting questions and to some extent they are already addressed in the paper. Of course, the number of tasks that a network is able to learn will be related to how much those tasks overlap in a control space. Indeed, this idea goes back to early theoretical accounts of connectionist models such as Hopfield nets and capacity for representing information (Hopfield, 1982; Hopfield et al., 1983). The control simulations that we described in the paper [lines 179-187 and Figure 4] are a test of one extreme version of this, in which two tasks are in direct opposition to each other (opposite force fields), and in this situation no savings emerges. We believe it is an interesting question, but a comprehensive exploration of the nature of task overlap in upper limb reaching learning tasks is beyond the scope of the present paper.

      (b) Figure 5 is a nice causal demonstration that the uniform shift is related to savings. However, and related to comment #3, it'd be interesting to see more details about how the behaviour and the network activity changes as preparatory activity shifts along this axis, in particular regarding how moving the preparatory states affect the organisation and dynamics of upcoming execution activity -these are the kind of intuitions that modelling studies like this one can provide.

      This has been addressed above by the changes we made to address the reviewer’s comment #3.

      (c) The authors focus on a task design that spans baseline, FF, NF, FF2 to replicate the original study by Sun et al. However, it would be interesting if they generated predictions for neural changes to other types of tasks that have been studied behaviourally. These could include, for example: (i) modelling a visuomotor rotation or a mirror reversal task; (ii) having to adapt to a FF in the opposite direction; (iii) investigating the role of adding an explicit context and having the networks learn multiple FF; and (iv) trying to learn FF fields in opposite directions, perhaps restricted to specific targets. As the authors know, all these questions and more have been studied with similar behavioural paradigms, and it would be nice to see what neural predictions are generated by this model.

      See responses above, e.g. to comment 4. We have clarified the text and provided a new Figure to illustrate our opposite FF control simulations. The other suggestions about visuomotor rotations and contextual cues are interesting and potentially important questions that we are working on, but we believe they are beyond the scope of the current paper, which is focused specifically on the question of savings in FF learning.

      (5) On the Discussion: When extrapolating from neural network results to animals, the fact that your networks can learn implicitly doesn't mean that animals do learn implicitly. Indeed, I think the consensus view is that different perturbations may lead to the expression of different types of savings (e.g., FF vs VR, which seems to be more explicit). Besides, these different mechanisms may be primarily implemented by brain regions less directly tied to motor control (e.g., cerebellum, parietal cortex?), which are not directly implemented in the authors' model.

      Of course the reviewer is correct that our simulations are not evidence that savings in motor tasks learned by animals is only implicit, and we do not make any such claims in the paper. The model we describe in the present paper is not meant to be a comprehensive model of motor learning in humans/animals. Indeed, the pure “context free” type of learning that we implement in our simulations basically cannot occur in animals, because there is always some information that provides context. Indeed, there are computational models of motor learning that include these effects, e.g. the COIN model (Heald et al., 2021). Our model, however, provides a useful window into what the context-free component of savings may look like. The approach we describe in the present paper is a powerful way to probe the context-free component of savings in isolation in a way that is not possible (at least not readily) in animals/humans. We have modified the text in the Discussion [lines 372-379] to better articulate this point.

      “The simulations described here do not constitute evidence that savings in motor learning tasks is exclusively implicit in animals and humans. The purely context-free learning implemented in our simulations is highly unrealistic, as some form of contextual information is invariably available. Indeed, computational models of motor learning that incorporate contextual effects already exist, e.g. (Heald et al. 2021). Nevertheless, our simulations provide a useful window into what the context-free component of savings may look like. This approach offers a powerful means of probing the context-free component of savings in isolation—something that is not readily achievable in animal or human experiments.”

      Reviewer #2 (Public review):

      Summary:

      Shahbazi et al. trained recurrent neural networks (RNNs) to simulate human upper limb movement during adaptation to a force field perturbation. They demonstrated that throughout adaptation, the pattern of motor commands to the muscles of the simulated arm changed, allowing the perturbed movements to regain their typical, perturbation-free straight-line paths. After this initial learning block (FF1), the network encountered null-fields to wash out the adaptation, before re-experiencing the force in a second learning block (FF2). Upon re-exposure, the network learned faster than during initial learning, consistent with the savings observed in behavioral studies of adaptation. They also found that as the number of hidden units in the RNN increased, so did the probability of exhibiting savings. The authors concluded that these results propose a neural basis for savings that is independent of context and strategic processes.

      Strengths:

      The paper addresses an important and controversial topic in motor adaptation: the mechanism underlying motor memory. The RNN simulation reproduces behavioral hallmarks of adaptation, and it provides a useful illustration of the pattern of muscle activity underlying human-like movements under both normal and perturbing conditions. While the savings effect produced by the network, though significant, appears somewhat small, the simulation demonstrating an increase in savings with a greater number of hidden units is particularly intriguing.

      Weaknesses:

      (1) To be transparent, savings in motor adaptation have been a primary focus of my own research. Some core findings presented in this paper are at odds with the ideas I and others have previously put forward. While I don't want to impose my agenda on the authors of this paper, I do think the authors should address these issues.

      (a) The authors acknowledge the ongoing debate in the literature regarding the mechanisms underlying savings, particularly whether it stems from explicit or implicit learning processes. However, it remains unclear how the current work addresses this debate. There is already a considerable body of research, particularly in visuomotor adaptation, demonstrating that savings is predominantly driven by explicit strategies. For example, when people are asked to report their strategy, they recall a strategy that was useful during the first learning block (Morehead et al. 2015). Furthermore, savings are abolished under experimental manipulations designed to eliminate strategic contributions (e.g., Haith et al., 2015; Huberdeau et al., 2019; Avraham et al., 2021). The authors briefly state that their findings support the hypothesis that a neural basis of memory retention underlying savings can be independent of cognitive or strategic learning components, and that savings can be characterized as implicit. While these statements may be true, it is not clear how this work substantiates these claims.

      We have addressed a similar point raised by Reviewer 1, see point #5 above. Our work represents an example of how savings can occur from implicit mechanisms in the absence of explicit contextual cues. Our goal is not to resolve the debate about how this occurs in humans/animals. Rather, our model provides a useful window into what the context-free component of savings may look like. Our approach is a powerful way to probe the context-free component of savings in isolation in a way that is not possible (at least not readily) in animals/humans. We have modified the text in the Discussion [lines 372-379] to better articulate this point.

      “The simulations described here do not constitute evidence that savings in motor learning tasks is exclusively implicit in animals and humans. The purely context-free learning implemented in our simulations is not meant to be a full model of biological learning, as in biological systems some form of contextual information is invariably available. Indeed, computational models of motor learning that incorporate contextual effects already exist, e.g. (Heald et al. 2021). Nevertheless, our simulations provide a useful window into what the context-free component of savings may look like. This approach offers a powerful means of probing the context-free component of savings in isolation—something that is not readily achievable in animal or human experiments.”

      (b) Our research has also demonstrated that if implicit adaptation is completely washed out after the initial learning block, it not only fails to exhibit savings but is actually attenuated relative to the first learning block (Avraham et al., 2021). This phenomenon of attenuation upon relearning can also be seen in other studies of visuomotor adaptation (e.g., Leow et al., 2020; Yin and Wei, 2020; Hamel et al., 2021; Hamel et al., 2022; Wang and Ivry, 2023; Hadjiosif et al., 2023). More recently, we have shown that this attenuation is due to anterograde interference arising from the experience with the washout block (Avraham and Ivry, 2025). We illustrated that the implicit system is highly susceptible to interference; it doesn't require exposure to salient opposite errors and can occur even following prolonged exposure to veridical feedback. The central thesis of this paper, namely that implicit savings can emerge through RNNs, is at odds with these empirical results. The authors should address this discrepancy.

      These empirical results are interesting and intriguing, and we agree that they are relevant in the context of the debate about the relative contributions and interactions between explicit and implicit learning systems and savings. Importantly, contextual interference is impossible in our model, since there are no contextual cues about which force field is present or absent. Interactions between an explicit system and an implicit learning system are also impossible in our model, since there is no possibility of context-driven explicit learning or memory. The approach we have taken in the present paper is not to model a full explicit plus implicit learning system but rather to probe how savings may emerge from a purely implicit learning mechanism alone and to compare the neural geometry underlying this implicitly driven savings to the neural recording results from monkey electrophysiology studies. Nevertheless, we have added some text to the Discussion [lines 380-391] to situate our findings in the context of the studies mentioned above by the reviewer.

      “Recent empirical work suggests that relearning after washout of implicit adaptation can be attenuated rather than facilitated, a phenomenon attributed to anterograde interference from the washout phase (Avraham et al., 2021; Hadjiosif et al., 2023; Hamel et al., 2022, 2021; Leow et al., 2020; Wang and Ivry, 2025; Yin and Wei, 2020). The savings observed in our simulations differs from these behavioral findings. Crucially, our model excludes both contextual interference (since no cues signal which force field is present) and explicit-implicit interactions (since context-driven explicit learning is absent). Our goal was not to model a complete explicit-implicit system, but rather to probe how savings may emerge from a purely implicit mechanism and to compare the underlying neural geometry to monkey electrophysiology data. Our results suggest that high-dimensional neural circuits possess an intrinsic capacity for savings via persistent preparatory traces. How and when this capacity may be masked by interference or explicit-implicit interactions in biological systems remains an open question for future work.”

      (2) This brings me to the question about neural correlates: The results are linked to activity in the primary motor cortex. How does that align with the well-established role of the cerebellum in implicit motor adaptation? And with the studies showing that savings are due to explicit strategies, which are generally associated with prefrontal regions?

      The modeling approach we use in the present paper is area agnostic, and we do not include different neural modules to represent specific brain areas such as cerebellum or prefrontal regions. In the current approach we specifically exclude explicit strategies, as a way to specifically probe implicit mechanisms alone. Also see response to reviewer 1 comment 5 above.

      (3) The analysis on the complexity of the neural network (i.e., the number of hidden units) and its relationship to savings is very interesting. It makes sense to me that more complex networks would show more savings. I'm not sure I follow the authors' explanation, but my understanding is that increased network complexity makes it more difficult to override the formed memory through interference (e.g., from the experience with NF2). Also, the results indicate that a network with 32 units led to a less-than-chance level of networks exhibiting savings (Figure 3b). What behavioral output does this configuration produce? Could this behavior manifest as attenuation upon relearning? Furthermore, if one were to examine an even smaller, simpler network (perhaps one more closely reflecting cerebellar circuits), would such a model predict attenuation rather than savings?

      These are interesting and potentially important questions for future work to explore. Our interpretation of the results for smaller networks is that these small RNNs fail to show savings presumably because the learned FF behavior is 'erased' during washout, owing to their limited capacity to retain the FF learning in a distinct neighborhood in neural state space. Our paper is focused specifically on the relationship between savings, implicit learning, and neural capacity via network size, in the context of the monkey electrophysiology results in motor cortex. It would be interesting in future work to explore a cerebellar-like modeling approach.

      (4) The authors emphasize that their network did not receive any explicit contextual signals related to the presence or absence of the force field (FF), thus operating in a 'context-free' manner. From my understanding, some existing models of context's role in motor memories (e.g., Oh and Schweighofer, 2019; Heald et al., 2021) propose that memory-related changes can be observed even without explicit contextual information, as contextual changes can be inferred from sudden or significant environmental shifts (e.g., the introduction or removal of perturbations). Given this, could the observed savings in the current simulation be explained by some form of contextual retrieval, inferred by the network from the re-presentation of the perturbation in FF2?

      It is important to note that this is not possible in the context of the modeling approach described in the present paper. For example, in trial 1 of FF2, because the network has no contextual cue signaling the FF’s presence, the network has no information before movement begins that a FF will be present during movement (recall that the FF is velocity-dependent, and so is zero before movement begins). Once the network encounters the FF during movement, some component of its response could arguably be described as contextual inference derived from effector state (similar to the account described in the COIN model), but strictly speaking the model is only responding to what it encounters in the moment. Any change in behaviour due to prior learning (e.g. savings) is due to the interaction between the residual learning-related neural state (e.g. the uniform shift), the effector state in the moment, and the errors encountered during movement. We don’t interpret this as “inference” in the traditional sense of an explicit learning system.

      (5) If there is residual hidden unit activity related to the FF at the end of the NF2 phase, how does the simulated movement revert back to baseline? Are there any differences in the movement trajectory, beyond just lateral deviation, between NF1 and NF2? The authors state that "changes in the preparatory hidden unit activity did not result in substantive changes in the motor commands (Figure 5b), which emphasizes that the uniform shift resides in the null space of motor output." However, Figure 5b appears to show visible changes in hidden unit activity. Don't these changes reflect a pattern of muscle activity that is the basis for behavior? These changes are indeed small, but it seems that so is the effect size for savings (Figure 3a). Could this suggest that there is not, in fact, a complete washout of initial learning during NF2 within the network?

      This is precisely the point of the paper, i.e. to show that neural activity during the preparatory period before movement onset is different, even though the behaviour during the preparatory period is the same (i.e. no muscle activity and no movement). This recapitulates the empirical findings from the neural data reported in the Sun et al. (2022) paper.

      The reviewer asks “Don't these changes reflect a pattern of muscle activity that is the basis for behavior?” Yes indeed they do, but not during the NF and not during the preparatory activity prior to movement onset.

      The reviewer asks “Could this suggest that there is not, in fact, a complete washout of initial learning during NF2 within the network?” We addressed this in the paper (Results/Washout) by comparing kinematics after washout to those prior to FF learning; e.g. any differences in lateral deviation of the hand path over the entire reach trajectory were in the range of 0.1 mm, which is less than 0.25 % of the lateral deviation encountered in the FF and only 0.1 % of the reach distance (10 cm).
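
      The two percentages quoted in this response can be checked with a line of arithmetic. The symbol d_FF (the lateral deviation encountered in the FF) is introduced here only for illustration; its value is not given in the response, so only the lower bound implied by the stated "less than 0.25 %" is shown.

```latex
\frac{0.1\ \text{mm}}{100\ \text{mm}} = 0.1\,\%,
\qquad
\frac{0.1\ \text{mm}}{d_{\mathrm{FF}}} < 0.25\,\%
\;\Longrightarrow\;
d_{\mathrm{FF}} > \frac{0.1\ \text{mm}}{0.0025} = 40\ \text{mm}.
```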

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) Figure 1c, lower panel: Is this from the early or late stage of FF1?

      This is an example movement after learning in a null field (NF). We have clarified this in the Figure caption.

      (2) Please clarify what the two panels in Figure 1e represent.

      We have clarified in the Figure caption that these are activity from two example hidden units.

      (3) If Figure 2c is intended to illustrate the changes in motor commands for individual muscles, consider reorganizing the plots by muscle to more clearly show the change for each muscle from NF1 to FF1.

      The point here is not to make fine-grained comparisons between specific muscles, but rather to show a general example of how muscle activity differs. For the sake of visual simplicity in a Figure that already has many components, we have decided to keep Figure 2c the same.

      (4) The text mentions that no savings were observed when the network was trained on CCW followed by CW perturbations. However, no data or statistical analysis is presented to support this claim. I wonder if the authors would expect attenuated learning when exposed to the CW perturbation, given a memory of the opposite perturbation.

      We have added a Figure to provide data for the FF opposite control.

      (5) The relevance of the discussion on choking under pressure to the paper wasn't clear.

      We have modified the relevant text in the Discussion section [lines 356-363] to clarify the relevance of the present work to other recent work on how complex features of motor behaviour can arise due to the dynamics of preparatory neural activity in motor cortex.

      References

      Avraham G, Morehead JR, Kim HE, Ivry RB. 2021. Reexposure to a sensorimotor perturbation produces opposite effects on explicit and implicit learning processes. PLoS Biol 19:e3001147. doi:10.1371/journal.pbio.3001147

      Codol O, Krishna NH, Lajoie G, Perich MG. 2024. Brain-like neural dynamics for behavioral control develop through reinforcement learning. bioRxiv. doi:10.1101/2024.10.04.616712

      Hadjiosif AM, Morehead JR, Smith MA. 2023. A double dissociation between savings and long-term memory in motor learning. PLoS Biol 21:e3001799. doi:10.1371/journal.pbio.3001799

      Hamel R, Dallaire-Jean L, De La Fontaine É, Lepage JF, Bernier PM. 2021. Learning the same motor task twice impairs its retention in a time- and dose-dependent manner. Proc Biol Sci 288:20202556. doi:10.1098/rspb.2020.2556

      Hamel R, Lepage J-F, Bernier P-M. 2022. Anterograde interference emerges along a gradient as a function of task similarity: A behavioural study. Eur J Neurosci 55:49–66. doi:10.1111/ejn.15561

      Heald JB, Lengyel M, Wolpert DM. 2021. Contextual inference underlies the learning of sensorimotor repertoires. Nature 600:489–493. doi:10.1038/s41586-021-04129-3

      Hopfield JJ. 1982. Neural networks and physical systems with emergent collective computational abilities. Proc Natl Acad Sci U S A 79:2554–2558. doi:10.1073/pnas.79.8.2554

      Hopfield JJ, Feinstein DI, Palmer RG. 1983. “Unlearning” has a stabilizing effect in collective memories. Nature 304:158–159. doi:10.1038/304158a0

      Leow L-A, Marinovic W, de Rugy A, Carroll TJ. 2020. Task errors drive memories that improve sensorimotor adaptation. J Neurosci 40:3075–3088. doi:10.1523/JNEUROSCI.1506-19.2020

      Wang T, Ivry RB. 2025. Contextual effects during sensorimotor adaptation are an emergent property of population coding in a cerebellar-inspired model. Sci Adv 11:eadr4540. doi:10.1126/sciadv.adr4540

      Yin C, Wei K. 2020. Savings in sensorimotor adaptation without an explicit strategy. J Neurophysiol 123:1180–1192. doi:10.1152/jn.00524.2019

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Ritzau-Jost et al. investigate the potential contribution of AP broadening in homeostatic upregulation of neuronal network activity with a specific focus on dissociated neuronal cultures. In cultures obtained from a few brain regions from mice or rats using different culture conditions and examined by different laboratories, AP half-width remained stable despite chronic activity block with TTX. The finding suggests that AP width is not significantly modulated by changes in sodium channel activity.

      Strengths:

      The collaborative nature of the study amongst the neuronal culture experts and the rigorous electrophysiological assessments provide compelling support for the main conclusion.

      Weaknesses:

      Given the negative nature of the results, a couple of remaining issues (such as the cell density of cultures and the presentation of imaging experiments with a voltage sensor) warrant further consideration. In addition, a discussion of the reasons for the seeming stability of AP half-width to sodium channel modulation might help extend the scope of the study beyond the presentation of a negative conclusion.

      We would like to thank the reviewer for positively evaluating our manuscript. Please find below our detailed point-to-point response to the reviewer’s comments.

      Reviewer #2 (Public review):

      Summary:

      This study reexamined the idea that action potential broadening serves as a homeostatic mechanism to compensate for changes in network activity. The key finding was that, while action potential broadening does occur in certain neurons - such as CA3 pyramidal cells - it is far from a universal response. This is important because it helps resolve longstanding discrepancies in the field, thereby contributing to a better understanding of network dynamics. The replication of these findings across multiple laboratories further strengthened the study's rigor.

      Strengths:

      Mechanisms of network homeostasis are essential to understand network dynamics.

      Weaknesses:

      No weaknesses were noted by this reviewer.

      We would like to thank the reviewer for the positive evaluation of our manuscript. Please find below our detailed point-to-point response to the reviewer’s comments.

      Reviewer #3 (Public review):

      Summary:

      The manuscript "Unreliable homeostatic action potential broadening in cultured dissociated neurons" by Ritzau-Jost et al. investigates action potential (AP) broadening as a mechanism underlying homeostatic synaptic plasticity. Given the existing variability in the literature concerning AP broadening, the authors address an important and timely research question of considerable interest to the field.

      The study systematically demonstrates cell-type- and model-specific AP broadening in hippocampal neurons after chronic treatment with either tetrodotoxin (TTX) or glutamatergic transmission blockers. The findings indicate AP broadening in CA3 pyramidal neurons in organotypic cultures after TTX treatment, but notably not in dissociated hippocampal neurons under identical conditions. However, blocking glutamatergic neurotransmission caused AP broadening in dissociated hippocampal neurons. Moreover, extensive evaluations in neocortical dissociated cultures robustly challenge previous findings by revealing a lack of AP broadening following TTX treatment. Additionally, the proposed role of BK-type potassium channels in mediating AP broadening is convincingly questioned through complementary electrophysiological and voltage-imaging experiments.

      Strengths:

      The manuscript exhibits an outstanding experimental design, employing state-of-the-art techniques and a rigorous multi-lab validation approach that greatly enhances scientific reliability. The experimental results are meticulously illustrated, and the conclusions drawn are justified and supported by the presented data. Furthermore, the manuscript is comprehensively and clearly written.

      Weaknesses:

      Concerning the statistical analyses employed, it is advisable to consider the Kruskal-Wallis test with corrections for multiple comparisons when evaluating more than two experimental groups.

      We would like to thank the reviewer for the positive evaluation of our manuscript. In the following we first address the comment regarding the statistical tests used. Please also find below the detailed response to the reviewer’s further comments. Indeed, we did not apply a correction for multiple comparisons in Figure 2. This seems justified because in this exceptional case we are more worried about type II errors (false negative). The Kruskal-Wallis test does not seem appropriate for this type of data, for which only the comparison between the control and respective TTX data is relevant. Instead, we followed the reviewer’s suggestion by applying corrections for false discovery rate (FDR). We thank the reviewer for pointing out this statistical issue and have addressed it in the revised manuscript (lines 121–128):

      “Even though AP durations varied up to 2-fold between conditions, statistically significant homeostatic AP broadening was not detectable in any of the tested conditions (Fig. 2B). To minimize type II errors (false negative) we intentionally did not apply a correction for multiple comparisons. The only significance was observed in condition III but in an opposite direction (i.e. AP narrowing with TTX, P=0.026; Fig. 2B). However, this is likely a false positive because application of corrections for false discovery rate results in P=0.268 for both Benjamini–Hochberg and Bonferroni correction.”
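A minimal sketch of how the Bonferroni and Benjamini–Hochberg adjustments are computed may help explain why both can return the same adjusted value when a single P value is small and all others are large. The P values below are hypothetical placeholders, not data from the study, and the count of eleven tests is an assumption based on the eleven recording conditions mentioned in the response.

```python
def bonferroni(pvals):
    """Bonferroni adjustment: multiply each P value by the number of tests."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted P values (step-up FDR procedure)."""
    m = len(pvals)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest P value so that adjusted
    # values never increase as the raw P value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical P values for eleven comparisons (illustration only).
pvals = [0.02, 0.31, 0.44, 0.52, 0.58, 0.63, 0.71, 0.77, 0.84, 0.90, 0.95]
bonf = bonferroni(pvals)
bh = benjamini_hochberg(pvals)

print(f"smallest raw P:      {pvals[0]:.3f}")
print(f"Bonferroni adjusted: {bonf[0]:.3f}")
print(f"BH adjusted:         {bh[0]:.3f}")
```

For the smallest P value, the Benjamini–Hochberg adjustment is the minimum of p(k)·m/k over all ranks k. When every other P value is large, that minimum is attained at k = 1, which is exactly the Bonferroni product p·m.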

      Recommendations for the authors:

      Reviewing Editor Comments:

      The main and most important observation of the study is that the AP does not change in most cases examined. A discussion of the mechanisms of the changes in CA3 neurons would significantly strengthen the compelling evidence presented. The individual reviews are also provided, in case the authors find them useful to include other aspects suggested by the reviewers.

      We would like to thank the Reviewing Editor for handling our manuscript and for the positive evaluation of our work. The main focus of our study was the analysis of homeostatic plasticity in cultured neurons of the neocortex. We agree that the findings in CA3 neurons are interesting. As explained in more detail below, we have carefully discussed the mechanisms of the changes in CA3 neurons in the revised manuscript.

      Reviewer #1 (Recommendations for the authors):

      Major points

      (1) AP widths measured in the present study under basal conditions are generally larger than the value reported in previous work by Li et al. 2020 (~1.5 ms). In particular, rat cortical cultures prepared using the same conditions show that the mean AP half-width in controls of the present study (~2.5 ms) is closer to the mean AP half-width in TTX-treated neurons in Li et al. (~2.0 ms).

      We thank the reviewer for the detailed and positive feedback as well as for the thoughtful questions. The inconsistency between the action potential half-durations reported in our data and in Li et al. is partially due to differences in the way the half-duration was measured. In Li et al. the exact method is unfortunately not defined, but from a personal communication with the authors we know that they measured half-duration based on the AP amplitude between AP peak and AP voltage threshold. In contrast, we measured half-duration based on the AP amplitude between AP peak and the resting membrane potential preceding current injections. When we instead measure AP half-duration from voltage threshold, the average half-duration is 1.97 ms (compared to 2.64 ms from baseline, n = 106 cells; average across conditions I–IV, control and TTX merged). Thus, a substantial part of the discrepancy in half-duration is due to methodological differences in the way the half-duration was measured.
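The effect of the reference level on the measured half-duration can be illustrated with a minimal sketch. The waveform below is a synthetic, piecewise-linear spike, and the resting potential (-70 mV), threshold (-50 mV) and peak (+30 mV) are assumed values chosen for illustration; none of this is the analysis code or data of either study.

```python
def crossing_time(t0, v0, t1, v1, level):
    """Linearly interpolate the time at which the trace crosses `level`."""
    return t0 + (level - v0) * (t1 - t0) / (v1 - v0)


def half_width(times, volts, reference):
    """Width of the spike at half of the amplitude between `reference` and peak."""
    peak_idx = max(range(len(volts)), key=lambda i: volts[i])
    half_level = reference + 0.5 * (volts[peak_idx] - reference)

    # Rising flank: step back from the peak to the first sample below half level.
    i = peak_idx
    while i > 0 and volts[i - 1] >= half_level:
        i -= 1
    # Falling flank: step forward from the peak to the first sample below half level.
    j = peak_idx
    while j < len(volts) - 1 and volts[j + 1] >= half_level:
        j += 1
    if i == 0 or j == len(volts) - 1:
        raise ValueError("trace does not cross the half-amplitude level on both flanks")

    t_up = crossing_time(times[i - 1], volts[i - 1], times[i], volts[i], half_level)
    t_down = crossing_time(times[j], volts[j], times[j + 1], volts[j + 1], half_level)
    return t_down - t_up


def synthetic_ap():
    """Hypothetical spike: rest at -70 mV, 1 ms upstroke to +30 mV, 4 ms decay."""
    times, volts = [], []
    for k in range(81):              # 0 to 8 ms in 0.1 ms steps
        t = k / 10.0
        if t < 1.0:
            v = -70.0
        elif t <= 2.0:
            v = -70.0 + 100.0 * (t - 1.0)
        elif t <= 6.0:
            v = 30.0 - 25.0 * (t - 2.0)
        else:
            v = -70.0
        times.append(t)
        volts.append(v)
    return times, volts


times, volts = synthetic_ap()
hw_from_rest = half_width(times, volts, reference=-70.0)        # resting potential
hw_from_threshold = half_width(times, volts, reference=-50.0)   # assumed threshold
print(f"half-width from rest:      {hw_from_rest:.2f} ms")
print(f"half-width from threshold: {hw_from_threshold:.2f} ms")
```

Because the threshold lies above the resting potential, the half-amplitude level measured from threshold sits higher on the spike, so the same waveform yields a shorter half-duration than when it is measured from rest.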

      One parameter that is not stated in either study is cell plating density, which can potentially bias the neuronal network activity levels of cultures. Could the authors comment on the possible contribution of neuronal culture density to AP half-width under basal recording conditions and its sensitivity to chronic TTX treatment? Are there any data available? For example, cultures used by Li et al may have been plated at a high density and experienced high activity level during culturing, which could have contributed to the enhanced sensitivity to chronic activity suppression by TTX.

      We agree that neuronal culture density is an important factor influencing neuronal activity and hence potentially also the sensitivity to chronic activity suppression. In our experiments, the number of plated cells per cover slip varied between conditions about 3-fold: 30–50k cells for conditions I and II, 25–30k cells for conditions III, VII, XI, 50k cells for condition IV, 65k for conditions V, VI and VIII, and 70k cells for conditions IX and X. Li et al. do not provide the cell density or the number of plated cells. Despite the difference in the number of plated cells in our dataset across various laboratories, we did not observe a systematic effect of cell number on baseline AP half-duration. Furthermore, we observed strongly different baseline activity across our various experimental conditions (Fig. 3A), which did not correlate with cell density. Also, we did not notice an impact of baseline activity on the sensitivity to chronic activity suppression with TTX (cf. Fig. 3A and 2B). We have now added the number of plated cells per condition to the methods section as well as the following paragraph to the discussion section (lines 256–262):

      “The sensitivity to chronic TTX treatment might depend on baseline neuronal activity, which is in part related to neuronal culture density[37]. However, TTX did not induce AP broadening despite different baseline activities (Fig. 3A) and a nearly threefold variation in the number of plated cells per cover slip between conditions (25k – 70k cells per coverslip).”

      In addition, a discussion of the reasons for the seeming stability of AP half-width to sodium channel modulation might help extend the scope of the study beyond the presentation of a negative conclusion.

      We thank the reviewer for this suggestion and have added a paragraph to the end of the discussion emphasizing potential advantages of cell-type specific AP broadening (lines 353–362):

      “Despite the lack of homeostatic, TTX-induced AP broadening in dissociated cultures, AP duration was broadened upon Kyn-treatment in dissociated cultures and using TTX in CA3 neurons in organotypic cultures. Because BK-channels control AP duration in CA3 neurons of organotypic cultures[79], homeostatic BK-channel downregulation as proposed by Li et al. may be involved in AP broadening in this specific cell type. While the reasons for the variable occurrence of homeostatic AP broadening remain unknown, this may render neuronal circuitries more robust to perturbations. The regulation of AP duration therefore might represent one element in the repertoire of neuronal plasticity that is, similar to other plasticity mechanisms, not generally shared, but specifically expressed in some cell types and neuronal compartments.”

      (2) In this study, CA3 neurons in organotypic cultures were the only cells that showed AP broadening with TTX treatment. Notably, CA3 neurons show strong recurrent activity in general and would be expected to have experienced high levels of activity in culture. For CA3 neurons in organotypic cultures, does IbTx increase basal AP half-width?

      We thank the reviewer for this interesting idea. Even though, to our knowledge, there is no study investigating the effect of IbTx on AP width in CA3 neurons of organotypic cultures, Raffaelli et al. (DOI 10.1113/jphysiol.2004.062661) reported ~15% AP broadening using the BK-channel blocker paxilline. Therefore, TTX-induced broadening in CA3 neurons might be related to BK-channel-dependent AP repolarisation, consistent with the model proposed by Li et al. Because organotypic cultures show increased activity for longer cultivation periods and higher connectivity compared to acute slices (De Simoni et al., DOI 10.1113/jphysiol.2003.039099), the effect of TTX may be aggravated in organotypic cultures compared to acute slices or in vivo. However, the lack of a TTX-effect was not dependent on background neuronal activity or culture density in our recordings (see above as well as lines 306–310 of the revised manuscript).

      (3) Figures 4E-G. In experiments to test the efficacy of IbTx with GEVI, larger fields of view of neuron(s) used for recordings should be included. As shown, it is difficult to discern the quality of the preparation and does not provide a representative indication of the type of signals measured.

      We thank the reviewer for this suggestion and have included an image of a representative neuron expressing the GEVI in Fig. 4E.

      Minor points

      (1) Lines 222-228. With respect to cell-type specificity of TTX-induced AP broadening, the observed lack of effect of TTX in dissociated hippocampal cultures might suggest that the cultures are predominantly DG granule cells and CA1 neurons, with few CA3 neurons surviving. Could the authors comment?

      We thank the reviewer for this interesting hypothesis and have discussed it in the manuscript as a potential explanation for our different findings in the hippocampus (lines 263–270):

      “Although we mainly focus on neocortical cultured neurons (condition I to VIII, Fig. 2) because Li et al. used neocortical neurons, the absence of AP broadening in hippocampal neurons (group IX to XI) could in principle be explained by the selective loss of CA3 neurons, which show AP broadening in organotypic cultured neurons (Fig. 1A and B). However, CA3 neurons were shown to survive in dissociated cultures following region-specific microdissection[40], and CA1 neurons are generally more stress-sensitive to excitotoxicity with glutamate or NMDA than CA3 and DG neurons[42], arguing against a general selective loss of CA3 neurons in dissociated cultures.”

      (2) Figures 3D, E. To what extent is the observed increase in sEPSC amplitude due to an increase in sEPSC frequency? Is quantal amplitude increased following TTX treatment, a postsynaptic strength parameter that one would not expect to be affected by a change in AP width, but that is known to undergo up-scaling with chronic TTX treatment?

      We would like to thank the reviewer for the question. We cannot rule out an interplay between sEPSC amplitude and frequency. We did not measure quantal amplitude in the presence of TTX. Our experiments were designed to test whether TTX successfully induced homeostatic plasticity, but not to attribute the observed effect to pre- and postsynaptic mechanisms. We have added the following statement to the revised manuscript, to highlight the possible interaction of sEPSC amplitude and frequency (lines 176–178):

      “These changes in sEPSC amplitude and frequency are not specific for somatic, pre- or postsynaptic adaptations. However, the results show that blocking AP firing with TTX successfully induced homeostatic plasticity under our experimental conditions.”

      (3) Line 132. Could the authors explain the rationale for using AP amplitude as a measure of neuronal "viability"?

      In a response to Cell, Li et al. suggested that the lack of a TTX effect was due to recordings from unhealthy neurons and that small AP amplitudes could indicate impaired cell viability. Indeed, we also believe that cells which appear morphologically less healthy tend to have small and slow APs. A mechanistic rationale could be a change in resting membrane potential or changes in the expression of voltage-gated sodium and potassium channels. However, AP amplitudes were not affected following TTX treatment in any of the eleven recording conditions (Fig. 2D) or in a cross-conditional comparison (Fig. 2E). In the revised manuscript, we have now added a possible rationale (lines 134–137):

      “Because unhealthy neurons tend to have small and slow APs, possibly due to changes in resting membrane potential or expression of voltage-gated sodium and potassium channels, we first analyzed AP amplitude as a measure of neuronal viability.”

      Reviewer #3 (Recommendations for the authors):

      I propose addressing the following questions, either through additional experiments (recommended) or a deeper theoretical discussion:

      (1) Since the authors demonstrate that blocking glutamatergic neurotransmission in dissociated hippocampal neurons causes AP broadening, do similar phenomena occur in organotypic cultures and dissociated neocortical neurons?

      We thank the reviewer for the interesting question. In dissociated hippocampal cultures, we show that AP duration is maintained following treatment with TTX and NBQX, while Kyn-treatment leads to AP broadening (Figure 1C). To our knowledge, the effect of Kyn on AP duration has not been studied in neocortical dissociated cultured neurons. However, Kyn induced AP broadening in CA3 neurons of hippocampal organotypic cultures (Zbili et al., DOI 10.1073/pnas.2110601118) while CNQX did not induce such broadening in CA1 neurons (Karmarkar and Buonomano, DOI 10.1111/j.1460-9568.2006.04692.x). Both findings are in accord with our recordings from dissociated hippocampal cultures. These data, however, do not allow inference as to whether AP broadening is a cell-type specific or blocker-specific mechanism in hippocampal organotypic cultures. Because the main focus of our study is the absence of AP broadening in neocortical cultured neurons as described by Li et al., we adjusted the corresponding discussion section (lines 299–322):

      “In contrast, APs were not significantly broader following synaptic block by NBQX (Fig. 1C, D), in accord with recordings from CA1 neurons in organotypic cultures using CNQX. TTX-induced broadening may therefore be cell-type specific or due to a differential effect of the glutamate receptor blockers on NMDA receptors which are blocked by Kyn but not NBQX/CNQX or TTX and which have recently been demonstrated to be important for the induction of synaptic homeostatic plasticity[41].”

      (2) Are BK channels involved in AP broadening observed in CA3 pyramidal neurons in organotypic cultures?

      We thank the reviewer for the question. BK channels control spike duration in CA3 neurons of organotypic cultures (~15% broadening upon block by paxilline; Raffaelli et al., DOI 10.1113/jphysiol.2004.062661). Even though there are no available data on the contribution of BK channels to homeostatic spike broadening in this cell type, CA3 neurons in organotypic cultures thereby fulfil the two necessary preconditions of the model proposed by Li et al. (namely, the control of the resting AP width by BK-channels and TTX-induced AP broadening). We include this possibility in the discussion (lines 355–357):

      “Because BK-channels control AP duration in CA3 neurons of organotypic cultures[79], homeostatic BK-channel downregulation as proposed by Li et al. may be involved in AP broadening in this specific cell type.”

      (3) AP broadening consistently occurs in CA3 neurons within organotypic cultures; what molecular or cellular mechanisms underpin this phenomenon, and is there a potential contribution from glial cells?

      We thank the reviewer for this interesting question. CA3 neurons show AP broadening upon chronic inactivity across various studies, a phenomenon that has not been observed in CA1 or DG neurons. Recordings from CA3 neurons served as a positive example for TTX-induced AP broadening in our study, in contrast to a lack of broadening in dissociated (neocortical and hippocampal) cultured neurons. The discrepancy between the results in dissociated and organotypic cultured neurons could indeed be due to interactions with glial cells. We have added this possibility to the discussion in the revised version of the manuscript (lines 270–273):

      “Altered cell-cell interactions with glia and neurons in organotypic and dissociated neuronal cultures could instead contribute to the different findings in various hippocampal preparations.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors investigated the role of an E3 ubiquitin ligase ITCH in regulating the viral life cycle of SARS-CoV-2. The authors showed that ITCH mediates ubiquitination of the membrane (M) and envelope (E) proteins of SARS-CoV-2. Ubiquitination of E and M results in enhanced interactions between the structural proteins and redistribution of the structural proteins into autophagosomes. The authors claim that the enhanced interactions between structural proteins and trafficking of the structural proteins into autophagosomes contribute to SARS-CoV-2 replication and egress, prompting ITCH as a potential antiviral target. ITCH also alters the cellular distribution of host proteases important for spike cleavage, which protects and stabilizes spike against cleavage. The authors also demonstrated that SARS-CoV-2 replication is augmented by ITCH, with virus replication significantly impaired in cells lacking ITCH expression.

      Strengths:

      The authors provided high-quality data with appropriate experimental controls to justify their claims and conclusions. The mechanistic analyses are excellent and presented in a logical manner. The investigation of the role of ubiquitination in coronavirus assembly and egress is novel as most previous studies focused on its role in mediating innate immune responses.

      Weaknesses:

      Although the authors showed that ITCH ubiquitinates E and M proteins, the claim that such ubiquitination promotes virion assembly and egress is circumstantial. The enhanced interaction between the structural proteins and targeting of ubiquitinated structural proteins into autophagosomes does not necessarily result in increased virion production and release as suggested by the authors. There is a disconnect between the ubiquitination of structural proteins and the role of ITCH in augmenting virus replication as shown in Fig. 6A and B. In addition, the authors showed that the catalytic activity of ITCH is important for the localization and maturation of host proteases. However, the underlying mechanism is unknown. Also, it is unclear how protection of spike from cleavage conferred by ITCH explains its role in promoting replication, as a lack of spike cleavage would inevitably compromise entry. The major weakness of the manuscript is the lack of experimental data that explains the molecular role of ITCH in relation to its phenotype observed during SARS-CoV-2 infection.

      We sincerely thank the reviewer for the positive evaluation of the quality, rigor, and novelty of our study. We particularly appreciate the thoughtful comments regarding the mechanistic link between ITCH-mediated ubiquitination and viral assembly/egress, as well as the broader implications for SARS-CoV-2 replication.

      Our data support a model in which ITCH-mediated ubiquitination of the structural proteins M and E enhances their interactions and promotes their trafficking into autophagosomal compartments, ultimately contributing to increased virion production and release. The phenotypic outcomes observed in Fig. 6A-B (replaced by re-measured viral infectious titer and genomic copy number in the culture medium of vT2-WT and vT2-KO cells) are consistent with our earlier findings in Figs. 1-5, which demonstrate that ITCH promotes SARS-CoV-2 replication. Thus, the replication defect observed in ITCH-deficient cells aligns with the mechanistic effects of ITCH on structural protein ubiquitination and trafficking.

      We agree with the reviewer that directly linking ubiquitination of structural proteins to virion production would further strengthen the mechanistic connection. However, direct detection of ubiquitinated virions in vitro, particularly by electron microscopy (EM), remains technically challenging. Our laboratory has not yet established an EM-based platform optimized for high-resolution SARS-CoV-2 virion analysis. Furthermore, it is possible that ubiquitin chains conjugated to structural proteins are cleaved during or after virion egress, which would complicate their detection in released particles. These technical and biological considerations currently limit direct visualization of ubiquitinated virions.

      Regarding the role of ITCH in regulating the localization and maturation of host proteases, our recent studies [1, 2] have demonstrated that ITCH is involved in Golgi fragmentation, leading to altered furin distribution and impaired cathepsin L maturation. These findings provide mechanistic insight into how ITCH catalytic activity may influence host protease processing. We have incorporated this discussion into the revised manuscript (last paragraph of the Discussion section) to better contextualize our observations.

      With respect to spike cleavage, although S1/S2 processing is required for SARS-CoV-2 entry, accumulating evidence suggests that excessive intracellular cleavage may be detrimental to virion stability. For example, in Vero cells lacking TMPRSS2, virions containing cleaved S1 and S2 are less stable [3]. Additionally, the D614G substitution renders the spike protein more resistant to cleavage, reduces S1 shedding, and enhances incorporation of intact spike into virions, thereby increasing infectivity and stability [4-6]. These findings suggest that maintaining intact spike during intracellular assembly may be advantageous for the viral life cycle. In this context, ITCH-mediated modulation of host protease distribution and spike processing may help preserve spike integrity within assembling virions.

      Taken together, the ability of ITCH to (i) enhance structural protein interactions, (ii) facilitate trafficking through autophagosomal pathways, and (iii) promote incorporation of intact spike into virions provides a coherent mechanistic framework explaining how ITCH enhances virion production and release. While additional studies will be required to further dissect the precise molecular details, our data collectively support a functional link between ITCH ubiquitin ligase activity and SARS-CoV-2 assembly and egress.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript Qiwang Xiang et al. investigated the role of the E3 ubiquitin ligase ITCH in the life cycle of SARS-CoV-2. They claim the following:

      (i) ITCH promotes virion assembly by interacting with E and M proteins and enhancing their K63-linked ubiquitination

      (ii) ITCH-mediated ubiquitination promotes autophagosome-dependent secretion of viral particles.

      (iii) ITCH stabilizes the viral spike protein by impairing its processing by furin and cathepsin L proteases.

      The manuscript provides an interesting exploration of ITCH's role in the SARS-CoV-2 life cycle but requires additional work to strengthen key claims and address potential confounding factors.

      Strengths:

      The experiments are sufficiently clear in documenting that ITCH activity is critical for efficient SARS-CoV-2 replication and for K63-linked ubiquitination of the M and E proteins.

      Weaknesses:

      The manuscript does not convincingly demonstrate how ITCH-mediated ubiquitination of E and M impacts virus assembly and release. Identifying the specific lysine residues in M and E targeted by ITCH, and generating mutant VLPs or recombinant viruses, would strengthen the conclusions.

      Most of the conclusions rely on ITCH overexpression data, which may have off-target effects on Golgi integrity and vesicular trafficking. For instance, Figure 4F provides evidence of altered Golgi morphology and TGN46 fragmentation, raising concerns that ITCH overexpression could indirectly mislocalize furin, affecting S1/S2 cleavage of the spike protein. In addition, inhibition of furin activity may also lead to off-target effects, given its role in processing numerous host proteins.

      Similarly, ITCH overexpression is likely to indirectly affect cathepsin-L maturation. In addition, the manuscript does not clarify how impaired cathepsin L activity would influence virus assembly or release.

      A major concern is also the lack of quantification and statistical analysis of immunofluorescence images throughout the manuscript, which undermines the reliability of these observations.

      We sincerely thank the reviewer for recognizing the importance of ITCH in SARS-CoV-2 replication and for the constructive and insightful suggestions to further strengthen the manuscript.

      Regarding the impact of ITCH-mediated ubiquitination of E and M on virus assembly and release, our data support a model in which ITCH promotes K63-linked ubiquitination of the E and M proteins, facilitating their recruitment to p62-positive autophagosomal compartments. This recruitment likely enhances the spatial proximity and interaction frequency of structural proteins within assembly sites, thereby promoting efficient virion assembly and subsequent release via autophagosome-dependent secretory pathways.

      We agree that identifying the specific lysine residues in M and E targeted by ITCH and generating mutant VLPs or recombinant viruses would provide a more direct mechanistic link. These are important and technically demanding experiments that require extensive mutagenesis and reverse genetics approaches. While beyond the scope of the current study, we fully acknowledge their value and plan to pursue these directions in future work to further refine the mechanistic understanding of ITCH-dependent ubiquitination during coronavirus assembly.

      Regarding the reliance on ITCH overexpression systems, we acknowledge the reviewer’s concern that ectopic ITCH expression may affect Golgi integrity and vesicular trafficking. Indeed, our recent studies [1, 2] demonstrate that ITCH catalytic activity disrupts Golgi structure, resulting in altered furin distribution and impaired cathepsin L maturation. These findings provide mechanistic context for the phenotypes observed in the present study and suggest that ITCH regulates host protease localization through defined cellular pathways rather than nonspecific overexpression artifacts. We have now expanded the Discussion section (last paragraph) to clarify this mechanistic framework.

      Importantly, SARS-CoV-2 infection itself significantly activates endogenous ITCH, and therefore our ectopic expression system likely mimics infection-induced ITCH activation rather than representing a purely artificial condition. In addition, key phenotypes, such as reduced viral replication and altered structural protein behavior, are consistently observed in ITCH-deficient cells, supporting the physiological relevance of ITCH activity in the viral life cycle.

      Regarding cathepsin L (CTSL) maturation, we have expanded the Discussion to clarify how impaired CTSL activity may influence viral assembly and egress. ITCH inhibits CTSL maturation, thereby reducing excessive spike cleavage into smaller fragments. Although CTSL-mediated spike processing facilitates genome release following endocytosis [7, 8], CTSL is a lysosomal protease, and lysosomes are exploited by β-coronaviruses as egress organelles [9]. Excessive lysosomal proteolysis may therefore compromise virion integrity during egress. In this context, ITCH-mediated inhibition of CTSL maturation may preserve spike stability within assembling or trafficking virions, thereby promoting the production and release of infectious particles during the replication phase.

      Regarding quantification and statistical analysis of immunofluorescence data, we appreciate this important point. In the revised manuscript, we have included expanded image panels with increased cell numbers and quantitative colocalization analyses to enhance the rigor of these observations.

      Reviewer #3 (Public review):

      Summary:

      Xiang et al. investigated the role of ubiquitin E3 ligase ITCH in SARS-CoV-2 replication. First, they described the role of ITCH on the structural proteins. Here, the ubiquitination of E and M (but not S) leads to an enhanced interaction and presumably virion assembly. In addition, E and M ubiquitination seems to be necessary for p62-guided sequestration into autophagosomes for secretion. Furthermore, ITCH regulates S proteolytic cleavage by changing furin localization and inhibiting CTSL protease maturation. In addition, SARS-CoV-2 infection upregulates ITCH phosphorylation, whereas knockout of ITCH reduces SARS-CoV-2 replication.

      Strengths:

      The proposed study is of interest to the virology community because it aims to elucidate the role of ubiquitination by ITCH in SARS-CoV-2 proteins. Understanding these mechanisms will address broadly applicable questions about coronavirus biology and enhance our knowledge of ubiquitination's diverse functions in cell biology.

      Weakness:

      The involvement of ubiquitin ligases in SARS-CoV-2 replication is not entirely new (see E3 Ubiquitin Ligase RNF5; Yuan et al., 2022; Li et al., 2023). While the data generally support the conclusions, additional work is needed to confirm the role of ITCH in SARS-CoV-2 replication in a biologically relevant context. The vast majority of data is based on transient overexpression experiments of ITCH, which ultimately leads to massive ubiquitination of several viral and host cell factors, including potentially low-affinity substrates not typically recognized under physiological conditions. In addition to that, nearly all experiments were done in cells co-overexpressing ITCH and the viral structural proteins (or cellular proteases) in HEK293T cells. Therefore, a proteomic analysis of protein ubiquitination in a) SARS-CoV-2-infected cells (ideally several cell types) and b) SARS-CoV-2-infected v2T-ITCH-KO cells would verify the ITCH-related ubiquitination of e.g., E and M and would strengthen the whole manuscript. In addition, the few key experiments using SARS-CoV-2 infected cells were performed in VeroE6 cells, which are neither human nor lung-derived. Only in one experiment were lung-derived Calu3 cells included.

      Moreover, the manuscript names ITCH as a central regulator of SARS-CoV-2 replication. If ITCH is beneficial for E and M interaction and thereby aids virion assembly, showing its effect on VLP production would be desirable. Clarifications regarding data acquisition and data analysis could strengthen the manuscript and its conclusions.

      We sincerely thank the reviewer for the thoughtful evaluation and for highlighting the importance of demonstrating physiological relevance.

      We agree that the involvement of E3 ubiquitin ligases in SARS-CoV-2 replication is not entirely unprecedented. Accordingly, we have expanded the Introduction to discuss RNF5 and other E3 ligases previously implicated in SARS-CoV-2 biology (e.g., Yuan et al., 2022; Li et al., 2023), thereby clarifying how ITCH differs mechanistically.

      Regarding the reliance on transient overexpression systems, we acknowledge the reviewer’s concern. Importantly, SARS-CoV-2 infection itself significantly induces ITCH phosphorylation and activation. Therefore, our ectopic expression system likely mimics infection-driven ITCH activation rather than representing a purely artificial condition. Moreover, key findings, including reduced viral replication and diminished E/M ubiquitination, were validated in ITCH knockout cells, supporting the physiological relevance of ITCH-dependent structural protein ubiquitination under endogenous conditions.

      We appreciate the suggestion to perform a global proteomic analysis of ubiquitinated proteins in (i) SARS-CoV-2-infected cells and (ii) SARS-CoV-2-infected ITCH-KO cells. Such analyses would indeed provide a comprehensive and unbiased assessment of ITCH-dependent ubiquitination events. While this approach is beyond the scope of the current study, we fully recognize its value and plan to pursue it in future investigations to further refine the mechanistic understanding of ITCH-mediated ubiquitination during coronavirus assembly.

      With respect to the cellular models used, Vero E6/TMPRSS2 cells are widely established for SARS-CoV-2 propagation due to their robust viral replication, rapid growth, and reduced culture-adapted mutations. Compared with Calu-3 cells, which grow more slowly and may acquire specific adaptations in certain viral genes during prolonged passage, Vero E6/TMPRSS2 cells maintain high viral stability and reproducibility, making them suitable for mechanistic studies. Nevertheless, we agree that human lung-derived systems are highly relevant, and we have included Calu-3 cell data where feasible to support translational relevance.

      Regarding the role of ITCH in virion assembly, our data in Fig. 2 demonstrate that ITCH-mediated K63-linked ubiquitination enhances the interaction between E and M proteins, supporting a functional role in virus-like particle (VLP) formation. We agree that direct visualization and quantification of VLP production by EM would further strengthen this conclusion. Such experiments require additional optimization and will be pursued in future work to provide more direct structural evidence.

      Finally, in response to the reviewer’s comments on data acquisition and analysis, we have expanded image panels, increased the number of quantified cells, and included quantitative colocalization analyses with appropriate statistical evaluation in the revised manuscript to enhance rigor and reproducibility.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The authors should compare the infectivity of SARS-CoV-2 generated in cell lines expressing or lacking ITCH to investigate the effects of ITCH on infectivity, possibly by measuring RNA to PFU ratio and determining the S cleavage pattern in purified virions.

      We re-measured the viral infectious titer and genomic copy number in the culture medium of vT2-WT and vT2-KO cells infected at an MOI of 0.0001 for 24 h. ITCH ablation reduced the viral copy number by approximately 8-fold (Fig. 6B), while the infectious titer (TCID<sub>50</sub>) decreased by at least 25-fold (Fig. 6A), indicating that loss of ITCH markedly impairs the formation of infectious viral particles. This finding is consistent with the role of ITCH in promoting Spike (S) protein cleavage.

      As suggested, to assess the S cleavage pattern in secreted virions, we precipitated proteins from the culture medium of SARS-CoV-2–infected cells with or without ITCH expression. Analysis of the precipitated S proteins revealed that the loss of ITCH markedly altered the integrity of full-length S in SARS-CoV-2 virions (Fig. S7A).

      (2) The authors should strengthen the connection between ubiquitination of structural proteins and viral egress by measuring infectious virus particles in the supernatants from cells with or without ITCH expression by plaque assay. However, this cannot be accurately achieved without performing the experiment described in point 1 as cleavage of spike and infectivity would affect the results.

      While a plaque assay was not performed, we quantified infectious viral particles in the supernatants using the TCID<sub>50</sub> assay. These analyses showed that loss of ITCH resulted in a marked reduction in infectious virion production (>25-fold; Fig. 6A). In contrast, viral genomic copy numbers, which reflect both infectious and non-infectious particles, were reduced by approximately 8-fold (Fig. 6B). The disproportionate reduction in infectious titer relative to viral copy number (approximately threefold difference) is consistent with a defect in virion infectivity, most likely due to impaired S cleavage in the absence of ITCH (Fig. S7A). The reduction in viral copy numbers suggests that ITCH-dependent ubiquitination of viral structural proteins contributes to efficient viral assembly and egress.
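
      The fold-change comparison above can be made explicit with a short calculation. The sketch below is illustrative only: the function name `specific_infectivity_drop` is hypothetical, and the inputs are the approximate fold reductions quoted in this response (at least 25-fold for infectious titer, about 8-fold for genome copies), not raw measurements.

```python
def specific_infectivity_drop(titer_fold: float, genome_fold: float) -> float:
    """Fold reduction in infectious units per genome copy.

    titer_fold:  fold reduction in infectious titer (KO vs WT).
    genome_fold: fold reduction in viral genome copies (KO vs WT).
    """
    if titer_fold <= 0 or genome_fold <= 0:
        raise ValueError("fold changes must be positive")
    return titer_fold / genome_fold


# Approximate values quoted in the response (assumed, not raw data).
drop = specific_infectivity_drop(titer_fold=25.0, genome_fold=8.0)
print(f"Infectivity per genome copy falls about {drop:.1f}-fold")
```

      Because the titer reduction is reported as a lower bound, the resulting ratio of roughly threefold should also be read as a lower bound.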

      (3) The authors should strengthen the connection between ubiquitination of structural proteins and virion assembly by EM.

      We appreciate the reviewer’s insightful comment. However, detecting ubiquitinated virions in vitro via electron microscopy (EM) remains technically challenging. At present, our laboratory has not yet established an EM-based system optimized for SARS-CoV-2 virion analysis. Moreover, it is also possible that ubiquitin chains present on virions may be cleaved during or after the viral egress process, further complicating their detection.

      Reviewer #2 (Recommendations for the authors):

      Supp. Figure 2: the authors should provide sequencing data for both ITCH-KO clones for consistency.

      The sequences for both ITCH-KO clones have now been included (Fig. S2C).

      Figure 2: All interaction data between structural proteins and p62 rely on ITCH overexpression. It would be helpful to include data in ITCH-KO cells as controls to validate these findings.

      As suggested, we performed E-based immunoprecipitation in wild-type (WT) and ITCH-knockout (KO) cells and found that E pulled down less p62 in the absence of ITCH, confirming that ITCH-mediated ubiquitination of E facilitates its interaction with p62 (Fig. 3C).

      Figure 3H: Verify the middle LC3B panel, as it does not match the merge panel. Please, correct any discrepancies.

      We thank the reviewer for pointing out this error. Fig. 3H (now Fig. 3J) has been corrected accordingly.

      Figure 4F: the labeling of the different panels seems incorrect.

      We have corrected the figure labeling.

      The authors should perform cell viability assays in clomipramine-treated cells. In addition, the authors should clarify whether clomipramine's antiviral effects depend on ITCH expression, given the comparable virus copy numbers in treated WT (Fig. S7B) and ITCH-KO cells (Fig. S7C)

      We thank the reviewer for this helpful comment. As shown in Author response image 1, while clomipramine (Clom) treatment for 48 hours resulted in a modest reduction in cell number compared with the DMSO control, no apparent cell death was detected under these conditions.

      Author response image 1.

      Vero-TMPRSS2 (A) or Vero-ITCH-KO (B) cells were treated with DMSO or clomipramine (Clom) for 48 h, and cell viability was assessed by calcein AM staining (n = 3).

      Reviewer #3 (Recommendations for the authors):

      Results:

      Fig.2A and 2E display controversial results with different outcomes depending on the used bait. In my opinion, in both approaches, the overexpressed ITCH should be able to ubiquitinate M and E (since they are co-expressed). However, the interaction of E and M is not affected by the overexpression of ITCH or ITCH-CS when E is used as a bait (Fig.2A). In contrast, the interaction of E and M is enhanced in the presence of overexpressed ITCH (Fig.2E), when M is used as a bait.

      We thank the reviewer for pointing this out. It should be noted that the blots display only the major (un-ubiquitinated) bands of E and M. When M was used as the bait, more E (main band, un-ubiquitinated form) was co-precipitated in the presence of ectopically expressed ITCH. In contrast, when E was used as the bait, comparable levels of M (main band, un-ubiquitinated form) were detected regardless of ITCH expression. These results suggest that ubiquitin-modified M can bind more E, whereas ubiquitin-modified E does not significantly affect its interaction with M. A more detailed explanation has been added to the revised text.

      Fig.3A+3F: The authors claim a reduced E secretion when ITCH-KO cells or shRNA-treated p62 cells are used. I believe an input loading control of the supernatant displaying an equal amount of e.g. BSA is missing.

      In response to the reviewer’s suggestion, we have now included Coomassie Brilliant Blue (CBB) staining of the culture medium (now shown in Fig. 3A and Fig. 3F).

      Fig.3B: ITCH does not interact with E (or M) alone in the displayed data. The data is comparable with data observed for the interaction with S (Supp.4A). However, the author claims that ITCH interacts with M and E but not S (page 11).

      We would like to clarify that in ECL-based Western blotting, strong signals can mask weaker ones due to contrast limitations. In this experiment, ectopic expression of ITCH produced a strong signal that obscured the endogenous ITCH band. Upon longer exposure, the endogenous ITCH signal becomes visible. Additionally, our data presented in Fig. 1 and the new data in Fig. 3C demonstrate the interaction between the relevant proteins.

      Fig 3F: A scrambled control is missing. Moreover, it would be desirable to see if overexpression of p62 would enhance E release to verify that ITCH ubiquitination and p62-positive autophagosomes are necessary for E release.

      We appreciate the reviewer’s comment. Proteins in the culture medium were precipitated using TCA, and Coomassie Brilliant Blue (CBB) staining has been included (now shown in Fig. 3F). Additionally, E release was examined in the presence of overexpressed p62, and the results showed that p62 overexpression increased the level of E detected in the medium (now shown in Fig. 3G).

      Fig.3: Overall, an experiment using, e.g. cycloheximide (protein synthesis inhibitor) and MG132 (proteasome inhibitor) would strengthen the hypothesis that E and M are not degraded in a lysosome after ITCH overexpression. In my opinion, a colocalization experiment with LAMP1 is unsuitable to draw this conclusion. Would the overexpression of a deubiquitinating enzyme diminish M, E and p62 interaction? Does ITCH/p62 only regulate the release of the overexpressed single E or M protein, or does it also affect VLP release? An experiment analyzing purified VLPs produced in ITCH- or ITCH-CS overexpressing cells would be desirable.

      We thank the reviewer for these important questions. As suggested, we performed additional CHX and MG132 experiments. As shown in Fig. 3H and Fig. S3I, degradation of both E and M proteins was blocked by MG132 treatment, indicating that they are degraded via the proteasome pathway. Notably, MG132 treatment did not rescue the ITCH-mediated decrease of E/M levels, suggesting that the ITCH-dependent reduction of E and M is not mediated through the proteasome pathway. In addition, our recent back-to-back studies [1, 2] demonstrated that ITCH overexpression inhibits lysosomal function by impairing hydrolase maturation, suggesting that ITCH-mediated ubiquitination of E or M is unlikely to promote their degradation through the lysosomal pathway. Together, these data suggest that ITCH-mediated reduction of E and M is not due to enhanced degradation but is instead associated with their secretion.

      Overexpression of deubiquitinating enzymes specifically targeting E or M (which remain to be identified) would likely reduce their interaction with p62.

      Our data (Fig. 2) indicate that ITCH-mediated ubiquitination of E and M enhances their mutual interaction, supporting a role for this process in virus-like particle (VLP) formation. p62 would then facilitate the release of VLPs by promoting the secretion of ubiquitinated E and M.

      Fig.4A: PPC site mutation indicated in yellow. There is no yellow color.

      We have revised the label to read “PPC site mutation indicated in red and green”.

      Fig.4C: Why should the overexpression of ITCH or ITCH-CS affect the S protein cleavage when the cleavage site is anyhow mutated?

      In this analysis, we aimed to verify that neither ITCH nor ITCH-CS affects the cleavage pattern of the mutated S protein. As these data are already presented in Fig. 4D (now Fig. 4C), the redundant result has been removed, and the corresponding description has been added to the revised text.

      Fig.4C: Lysates from the single expression of S wt protein (-ITCH/ +ITCH-CS; as indicated in Fig.4B) is missing for comparison to S mut protein.

      As these controls and related data are already presented in Fig. 4D (now Fig. 4C), the redundant result here has been removed.

      Fig. 4D: Lane 5 and Lane 7 are labeled similarly. ITCH+ in Lane 5 needs to be removed.

      We thank the reviewer for pointing out this error. The labeling (now Fig. 4C) has been corrected.

      Fig 4G: A theoretical MOI of 1 does not lead to an infection of all cells. Therefore, including a third marker for infection control, e.g., N protein, would be helpful. This would clarify whether the changes in furin localization are due to infection.

      We thank the reviewer for raising this point. Our goal was to examine whether SARS-CoV-2 infection affects the localization of furin (mouse antibody) relative to the Golgi marker (rabbit antibody). As suitable E, N, or M antibodies raised in goat or donkey were not available, we could not include those markers in this experiment. However, we did confirm M protein expression in parallel, and the infection efficiency was higher than 80% (Author response image 2). To further validate that the observed changes in furin localization were due to viral infection, we have now included additional images showing a larger field of view containing more cells.

      Author response image 2.
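
      The reviewer's remark that a theoretical MOI of 1 does not infect every cell follows from the standard Poisson model of infection, in which the expected fraction of cells receiving at least one infectious particle is 1 - exp(-MOI). The sketch below assumes a single round of infection with randomly distributed particles. The function name `fraction_infected` is hypothetical, and the model ignores later rounds of viral spread, which can raise the infected fraction above this single-round estimate.

```python
import math


def fraction_infected(moi: float) -> float:
    """Expected fraction of cells with at least one infectious particle (Poisson model)."""
    if moi < 0:
        raise ValueError("MOI must be non-negative")
    return 1.0 - math.exp(-moi)


# Compare a few multiplicities of infection under the single-round assumption.
for moi in (0.1, 1.0, 3.0):
    print(f"MOI {moi}: {fraction_infected(moi):.1%} of cells infected")
```

      Under these assumptions, an MOI of 1 corresponds to roughly 63% of cells infected in the first round, which is why an independent infection marker is useful when interpreting single-cell localization data.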

      Fig.4: Generally, the colocalization of proteases with TGN46 should be analyzed quantitatively using, for example, Manders' overlap coefficient. This would be needed to draw the conclusion stated in the manuscript.

      We appreciate the reviewer’s suggestion. We have now included the colocalization analysis in Fig. 4E and F.
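
      For readers unfamiliar with the metric requested above, Manders' overlap coefficient for two channels is the sum of pixelwise intensity products divided by the square root of the product of the summed squared intensities. The sketch below is a minimal pure-Python illustration on flattened pixel lists. The function name `manders_overlap` and the toy intensities are hypothetical, and this is not the authors' analysis pipeline, which would typically involve background subtraction and dedicated image-analysis software.

```python
import math
from typing import Sequence


def manders_overlap(ch1: Sequence[float], ch2: Sequence[float]) -> float:
    """Manders' overlap coefficient for two equally sized, flattened channels."""
    if len(ch1) != len(ch2):
        raise ValueError("channels must contain the same number of pixels")
    # Sum of pixelwise products of the two channel intensities.
    numerator = sum(a * b for a, b in zip(ch1, ch2))
    # Square root of the product of the summed squared intensities.
    denominator = math.sqrt(sum(a * a for a in ch1) * sum(b * b for b in ch2))
    if denominator == 0:
        raise ValueError("at least one channel has no signal")
    return numerator / denominator


# Toy intensities (hypothetical): a protease channel and a TGN46 channel.
protease = [10.0, 50.0, 0.0, 5.0]
tgn46 = [12.0, 45.0, 3.0, 0.0]
print(f"Overlap coefficient: {manders_overlap(protease, tgn46):.3f}")
```

      For non-negative intensities the coefficient ranges from 0 (no overlap) to 1 (complete overlap), so values can be compared across conditions and tested statistically over many cells.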

      Fig.4/5: Overview IF pictures displaying additional cells would be desirable to clarify furin/cathepsin L localization in ITCH/ITCH-CS expressing cells. Otherwise, it looks (in my opinion) very subjective.

      In response to the reviewer’s suggestion, we have included additional images with a larger field of view encompassing more cells for Fig. 4 and 5 (presented in Fig. S5B and S5H).

      Fig.5D/G: MOI is missing in the figure legend.

      As suggested, the MOI information has been added to the figure legend.

      Fig.5D/G/6C/F: Infection control (e.g., N-protein) is missing in the Western Blots.

      We have added M protein as an infection control to the figures.

      Fig.6: Why is the overall amount of ITCH reduced during the course of infection?

      We thank the reviewer for raising this point. As shown in Fig. 6C and F, ITCH was significantly activated during viral infection, as indicated by its phosphorylation at the T222 site. This activation promotes ITCH self-ubiquitination, which may account for the decrease in total ITCH levels over the course of infection.

      Fig.6A: Would an overexpression of ITCH enhance viral replication?

      Moderate upregulation of ITCH promotes viral replication, whereas excessive ITCH overexpression leads to cell death, which in turn partially reduces viral titers.

      Discussion:

      Is there an explanation of how ITCH changes furin localization and CTSL maturation?

      Our recent back-to-back studies [1, 2] demonstrated that ectopic ITCH expression disrupts Golgi integrity, resulting in altered furin distribution and impaired CTSL maturation. The relevant discussion has now been incorporated into the revised text (last paragraph of the Discussion section).

      It would also be helpful to discuss the role of other known ubiquitin ligases like RNF5 in the replication of SARS-CoV-2 and other CoVs. Since the pandemic began, many interactome and host-factor studies in various cell types have been published. None of these studies identified ITCH so far. Could you comment on this?

      As suggested, we have included additional known ubiquitin ligases involved in SARS-CoV-2 replication and in other viral systems (see the third paragraph of the Introduction).

      Overall, in my opinion, the figure legends need to be improved. It is often not clear if ITCH is endogenously detected or overexpressed.

      We thank the reviewer for the helpful suggestion. Additional details have been incorporated into the figure legends.

      (1) Xiang Q, Lu Y, Wang H, Chen H, Chen P, Zhao X, et al. ITCH regulates Golgi integrity and proteotoxicity in neurodegeneration. Science Advances 2025; 11:eado4330.

      (2) Xiang Q, Liu Y, Wang J. Golgi fragmentation driven by the USP11-ITCH axis triggers autolysosomal failure in neurodegeneration. Autophagy 2026.

      (3) Peacock TP, Goldhill DH, Zhou J, Baillon L, Frise R, Swann OC, et al. The furin cleavage site in the SARS-CoV-2 spike protein is required for transmission in ferrets. Nature Microbiology 2021; 6:899-909.

      (4) Zhang L, Jackson CB, Mou H, Ojha A, Peng H, Quinlan BD, et al. SARS-CoV-2 spike-protein D614G mutation increases virion spike density and infectivity. Nature Communications 2020; 11:1-9.

      (5) Plante JA, Liu Y, Liu J, Xia H, Johnson BA, Lokugamage KG, et al. Spike mutation D614G alters SARS-CoV-2 fitness. Nature 2021; 592:116-21.

      (6) Daniloski Z, Jordan TX, Ilmain JK, Guo X, Bhabha G, Sanjana NE. The Spike D614G mutation increases SARS-CoV-2 infection of multiple human cell types. eLife 2021; 10:e65365.

      (7) Jaimes JA, Millet JK, Whittaker GR. Proteolytic cleavage of the SARS-CoV-2 spike protein and the role of the novel S1/S2 site. iScience 2020; 23:101212.

      (8) Zhao M-M, Yang W-L, Yang F-Y, Zhang L, Huang W-J, Hou W, et al. Cathepsin L plays a key role in SARS-CoV-2 infection in humans and humanized mice and is a promising target for new drug development. Signal Transduction and Targeted Therapy 2021; 6:1-12.

      (9) Ghosh S, Dellibovi-Ragheb TA, Kerviel A, Pak E, Qiu Q, Fisher M, et al. β-Coronaviruses use lysosomes for egress instead of the biosynthetic secretory pathway. Cell 2020; 183:1520-35.e14.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      In this remarkable study, the authors use some of their recently developed oxytocin receptor knockout voles (Oxtr1-/- KOs) to re-examine how oxytocin might influence partner preference. They show that shorter cohabitation times lead to decreased huddling time and partner preference in the KO voles, but with longer periods preference is still established, i.e., the KO animals have a slower rate of forming preference or are less sensitive to whatever cues or experiences lead to the formation of the pair bond as measured by this assay. This helps relate the authors' recent study to the rest of the literature on oxytocin and partner preference in prairie voles. To better understand what might lead to slower partner preference, they quantified changes to the durations and frequency of huddling. In separate assays, they also found that Oxtr1-/- KOs interacted more with stranger males than wild-type females. In a partner choice assay, they found that wild-type males prefer wild-type females more than Oxtr1-/- KO females. They then performed bulk RNA-Seq profiling of nucleus accumbens of both wild-type and Oxtr1-/- KO males and females, either housed with animals of the same sex or paired with a wild-type of the opposite sex. 13 differentially expressed genes were identified, mostly due to downregulation in wild-type females. These genes were also identified in a module lost in the Oxtr1-/- voles by correlated expression profiling. They also compared results of transcriptional profiling in female and male wild-type vs Oxtr1-/- voles (independently of bonding state) and found hundreds of differentially expressed genes in nucleus accumbens, mostly in females and often with some relation to neural development and/or autism. Some of the reduction in the transcript was confirmed with in-situs, as well as compared to changes in transcription in the lateral septum and paraventricular nucleus (PVN) of the hypothalamus. Finally, they find fewer oxytocin+ and AVP+ neurons in the anterior PVN.

      Strengths:

      This is an important study helping to reveal the effects of oxytocin receptor knockout on behavior and gene expression. The experiments are thorough and reveal a surprising number of genetic and anatomical differences, with some sexual dimorphism as well, and the authors have more carefully examined the behavioral changes after shorter and longer periods of partner preference formation.

      We thank Reviewer #1 for the positive assessment of the study’s significance and for recognizing the value of our behavioral and transcriptional analyses in refining the role of oxytocin signaling in pair bonding.

      Weaknesses:

      It is surprising that given all the genetic changes identified by the authors, the behavioral phenotypes are fairly mild. The extent of gene changes also might be underreported given the variability in the behavior and relatively low number of animals profiled.

      Pair bonding is a robust behavior composed of distinct modules that are supported by redundant and compensatory neural pathways. Our findings support a model in which Oxtr functions in parallel with other mechanisms to modulate specific components of social attachment. We have addressed this point in the Discussion. We have also updated our Results and Methods sections to more clearly reflect our cohort size, which is comparable to that of similar studies.

      Reviewer #1 (Recommendations for the authors):

      How do the wild-type males 'know' which animal is which during the three-chamber assay test of Figure 4B? Do the Oxtr1-/- KO females act in some way different from the wild types in this experiment?

      We thank the reviewer for this question. During follow-up analyses prompted by reviewer requests to characterize the behaviors underlying the apparent bias in WT male choice, we discovered a labeling error in the metadata used to analyze these assays. The error flipped the genotypes of the tethered stimulus animals at the ends of the chamber. After correcting this error and reanalyzing the data, we find that naïve WT males do not show a significant preference for naïve WT females over naïve Oxtr<sup>1-/-</sup> females. We have reconfirmed the metadata used in all assays in this study; no other datasets or conclusions are affected.

      While overall choice frequency is equivalent for males and females, our revised analyses demonstrate that Oxtr loss nonetheless alters the dynamics of social interactions in a sex-specific manner. In particular, the presence of an Oxtr<sup>1-/-</sup> male significantly alters WT females’ social behavior—enhancing prosocial engagement and reducing aggression—independent of which male is ultimately chosen. These findings support the conclusion that Oxtr function modulates early reciprocal social interactions rather than categorical choice outcomes.

      MOAT and LOAT seem like cumbersome acronyms, more so than something simpler like vole 1 vs vole 2.

      We have replaced these acronyms throughout the manuscript with simpler, descriptive terminology: winner (MOAT) and loser (LOAT).

      Only three animals per condition seemed to have been used for RNA-Seq studies in Figure 5. Given the high behavioral variability in the earlier figures, did the authors screen for animals with exemplar or similar behavior within groups? The lack of significance of other genes or across other groups might just be due to a low-powered experiment given the high behavioral and genetic variability.

      We thank the reviewer for raising the important point regarding behavioral preselection, which has been performed in some similar studies. For our study, animals were not preselected based on exemplar or matched behavioral performance prior to tissue collection, as doing so would risk introducing variation in gene expression patterns due to the experience of complex social interactions. Instead, given that our prairie vole lines are maintained on an outbred background, tissue from three animals was pooled for each RNA-seq sample to reduce inter-individual variability and to capture representative transcriptional states within each experimental group. While this approach increases robustness to individual variability, we acknowledge that it may limit sensitivity to detect low-expression, behavior-linked gene transcripts.

      On lines 426-429, the authors state that "While there was no significant difference in Oxtr transcript levels by genotype (padj = 0.753) – consistent with minimal nonsense-mediated decay despite a premature stop codon – we have previously shown that no functional protein is produced in Oxtr1-/- animals (52)." This assertion could use strengthening, even if just to explain how this was verified in their previous publication. What is the evidence for nonsense decay and a full knockout of functional receptors at the protein level?

      We agree that this point benefits from clarification. Although Oxtr transcript levels were not significantly different by genotype (padj = 0.753), consistent with minimal nonsense-mediated decay, transcript abundance alone does not reflect receptor functionality. In our prior study, we directly assessed Oxtr protein function using receptor autoradiography and found a complete absence of specific ligand binding in Oxtr<sup>1-/-</sup> animals across brain regions that show robust Oxtr binding in wild-type voles, demonstrating a full loss of functional receptor protein. We have clarified this in our manuscript.

      Reviewer #2 (Public review):

      Summary:

      This manuscript uses a recently published oxytocin receptor null prairie vole line to examine the effects of this mutation on pair bonding behavior and PVN gene expression. Results reveal that Oxtr sex specifically influences early courtship behavior and partner preference formation as well as suppressing promiscuity toward novel potential mates. PVN gene expression varies between Oxtr null and WT prairie voles.

      Strengths:

      Behavioral analyses extend beyond the typical reporting of frequency and duration. The gene expression models and analyses are well-done and convincing. The experimental designs and approaches are strong.

      We thank Reviewer #2 for highlighting the strengths of the gene expression modeling and behavioral analyses.

      Weaknesses:

      More details and background literature explaining the role of the Oxt system in pair bonding behaviors is necessary, particularly for the Introduction. The authors overstate several times that Oxtr expression is not necessary for partner preference formation, based on their previous findings. However, it does appear, particularly, in the short cohabitation that it is necessary. Thus, the nuanced answer may be that Oxt may accelerate partner preference formation. Improving the presentation of the statistics and figures will make the manuscript more reader-friendly.

      We thank the reviewer for this thoughtful feedback and agree that additional background on the oxytocin (Oxt) system’s role in pair bonding will strengthen the manuscript. We have revised the introduction to expand our discussion of prior pharmacological and comparative studies suggesting that Oxt signaling modulates multiple components of pair bonding.

      Finally, in response to the reviewer’s suggestion, we have improved the presentation of figures and statistical reporting by interlacing figures with figure legends and updating the supplementary statistics table.

      Reviewer #2 (Recommendations for the authors):

      Major concerns

      (1) The Introduction provides a "broad strokes" approach to link the oxytocin and vasopressin systems as neuromodulators of social attachment processes. This study is a follow-up to a recent publication by the senior authors' groups which reported that the Oxtr null prairie voles were able to form typical pair bonds. Now, the authors are revisiting the same question by developing a series of behavioral assays to probe distinct aspects of pair bonding behavior. However, the Introduction lacks a nuanced examination of how the oxytocin system has been shown to regulate an array of social behaviors in prairie voles and other social species.

      We thank the reviewer for this observation and agree that the original Introduction did not capture the breadth and nuance of oxytocin system involvement in social behavior. We have substantially revised the Introduction in response to the reviewer’s suggestion to include a more detailed discussion of the role played by oxytocin signaling in social behaviors displayed across multiple phyla, including during the early stages of pair bonding.

      (2) In addition, there seems to be relevant viral Oxtr KD and KO studies in prairie voles which could be referenced to reflect differences between acute pharmacological Oxtr inhibition and prolonged viral KD of Oxtr on behavioral outcomes. This could also be put into context with the authors' first paper in prairie voles and others' work with mice showing how congenital Oxtr null rodent models may result in behavioral changes that are not reflected in the pharmacological or viral manipulation research. This could help justify the approach of the current study.

      We thank the reviewer for suggesting this comparison and have included a section in the Discussion comparing pharmacological manipulations and global knockouts, as well as the discrepancies in phenotype that arise between these methods. This expanded discussion clarifies why a congenital genetic model provides complementary insights: it allows us to identify which components of pair bonding are robust to developmental loss of Oxtr and which remain sensitive, thereby distinguishing between Oxtr-dependent behavioral modules and those supported by parallel mechanisms. Additionally, we now describe in the Introduction viral manipulations of Oxtr in prairie voles during the early phase of interactions between the sexes, to contextualize our study in the broader field.

      (3) On lines 129-130: The authors state, "We previously found that Oxtr is not required for the display of partner preference following 1 week of cohabitation". While this is the general conclusion of their previous publication, this seems like a rather large overgeneralization. There are many studies that have documented the functional regulation and necessity of the Oxt system for partner preference behavior in prairie voles. Therefore, it would be more appropriate to state that their previous study demonstrated that "Oxtr null prairie voles are able to develop a partner preference", but not that Oxtr is not necessary for partner preference formation. This may be a question about when the KO occurs, whether it be congenital or conditional.

      (4) This statement is repeated in Lines 350-352. However, the authors can now qualify this statement at this point in the manuscript with their new data which suggests that Oxtr null voles fail to form a partner preference after short cohabitation, but WT still form such preferences. This would suggest the qualification of this statement should be on the onset of partner preference formation as Oxtr is necessary for partner preference formation after a "short" cohabitation. Therefore, both findings are more in line with previous results which suggest that Oxt signaling accelerates partner preference formation.

      We have revised this language throughout the manuscript to state that our prior work demonstrated that Oxtr null voles are capable of forming a partner preference after extended cohabitation.

      (5) It appears Supplementary Table 1 is not scaled to the page size, so not all statistical results are clear. This limits the accuracy of my review.

      This table has been reformatted to ensure all statistical results are properly scaled to page size.

      (6) It is not always clear what statistical analyses are being performed. For example, how were the data in Figures 4G-H analyzed? What statistics were used and the output should be more readily available.

      During follow-up behavioral analyses prompted by Reviewer #1's requests to characterize the basis of the apparent WT male bias, we discovered a labeling error in the metadata associated with a subset of naïve three-chamber choice assays. In these cases, the genotypes of the tethered stimulus animals had been inadvertently flipped. After correcting this error and reanalyzing the data, we find that naïve WT males do not show a significant preference for naïve WT females over naïve Oxtr1-/- females. We have rechecked the metadata for all assays included in this study and confirmed that this was the only instance in which such an error occurred. We further analyzed the temporal dynamics of naïve choice and found that Oxtr function modulates early reciprocal social interactions but does not affect the genotype ultimately chosen.

      To improve the clarity of the statistical analyses performed, we have reformatted our presentation of figure legends and our statistics table. All statistical tests, sample sizes, and relevant parameters (including exact tests used, correction methods where applicable, and definitions of units of analysis) are explicitly stated in the figure legends and compiled in the supplementary statistical summary table, in accordance with eLife reporting guidelines.

      (7) Oxytocin plays a critical role in development as early as embryogenesis. It may be useful to frame some of the Introduction and Discussion recognizing the congenital deletion of Oxtr may affect much of development. With that in mind, it is not surprising to see changes in gene expression associated with neurodevelopmental disorders.

      We now explicitly acknowledge in both the Introduction and Discussion that congenital Oxtr deletion likely impacts neural development which provides context for the observed enrichment of neurodevelopmental gene expression changes.

      Minor concerns

      (1) It was not clear why vasopressin was referenced in the Introduction. Specifically, the study documents that Oxtr null prairie voles have a reduction in Avp neurons in the PVN, which would suggest some aspects of Oxt signaling regulate Avp expression. However, the Introduction is not focused on how Oxt regulates the Avp system but rather on how each is a modulator of social attachment. It would improve the justification of this study to focus on Avp expression if the Introduction presented this concept.

      We thank the reviewer for pointing out the need for greater clarity around our reference to vasopressin (Avp) in the Introduction. In the Introduction, we now simply state that the potential for pair bonding is correlated with the patterns of expression of Oxtr and V1ar. The goal of this study was to find evidence of behavior and gene expression changes due to the chronic loss of Oxtr, which led to our finding that a population of Avp neurons is lost in the animals lacking Oxtr. As we did not intend to justify our study on this basis, we have clarified our Discussion to include previous studies where OT manipulation affects Avp neurons.

      (2) Figures and supplemental figures need figure legends.

      We have re-arranged the figure legends for each figure (including the supplementary figures) to follow the figures for easier readability and accessibility.

      (3) Figure 1 Timeline is focused more on the male timeline with "bond formation" and "bond maintenance" reflecting the days required to form a partner preference for males. The figure should be revised to reflect similar time points for female pair bonding.

      Figures have been revised to reflect each sex's bonding timeline.

      (4) Figure 1 has a color theme with females represented by red/pink and males represented by dark/light blue. However, this is not true for Figures 1C and 1D. Please revise these color schemes.

      Color schemes have been standardized across all figures.

      (5) It is not clear what is being graphed in Figures 2 and 3. The duration graphs have many more data points than the frequency graphs. Can this be explained?

      We thank the reviewer for pointing out this lack of clarity. The difference in the number of data points reflects how these measures are defined. Duration plots are generated at the level of individual huddle events, specifically pooling all huddles whose duration falls within the top quartile for a given animal, whereas frequency plots are generated at the level of individual animals and therefore contain one data point per subject. As a result, duration graphs necessarily include more data points than frequency graphs. The figure legends and Methods section now explicitly state the unit of analysis for each metric and clarify why the number of data points differs between duration and frequency plots.
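
      To make the two units of analysis concrete, the following is a minimal sketch only; it is not the analysis code used in the study. The animal names, the huddle durations, the inclusive 75th-percentile cutoff, and the definition of frequency as the number of huddles per animal are all assumptions chosen for illustration.

```python
from statistics import quantiles


def top_quartile_events(durations):
    """Return the huddle durations at or above this animal's 75th percentile.

    The inclusive percentile method and the >= comparison are assumptions.
    """
    if len(durations) < 2:
        return list(durations)
    q3 = quantiles(durations, n=4, method="inclusive")[2]
    return [d for d in durations if d >= q3]


def summarize(huddles_by_animal):
    """Build event-level duration points and animal-level frequency points."""
    duration_points = []   # one entry per top-quartile huddle event
    frequency_points = []  # one entry per animal (assumed: huddle count)
    for durations in huddles_by_animal.values():
        duration_points.extend(top_quartile_events(durations))
        frequency_points.append(len(durations))
    return duration_points, frequency_points


# Hypothetical huddle durations (seconds) for two hypothetical animals.
example = {
    "vole_A": [5, 12, 30, 45, 60, 90, 120, 200],
    "vole_B": [8, 15, 22, 40, 75, 110],
}
duration_points, frequency_points = summarize(example)
```

      In this toy input each animal contributes two top-quartile huddles but only one frequency value, so the duration list is twice as long as the frequency list.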

      (6) What are the black bars in Figure 4H meant to represent?

      We thank the reviewer for this question. In the original submission, the black bars in Figure 4H were intended to indicate time periods showing statistically significant convergence in the chooser’s preference for the MOAT (More Of Assay Time, now winner) animal, based on the sliding preference index analysis. However, as mentioned above, during revision we identified a metadata error affecting the dataset used to generate this figure. After correcting the error, the figure was fully reanalyzed and regenerated. As a result, Figure 4H now presents a different analysis and no longer includes these black bars, and the conclusions drawn from this panel have been revised accordingly. The updated figure, legend, Results text and statistics table now accurately reflect the new analysis.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      …other neurons such as AWB, AWA, and ADL are also involved in the coding process. These neurons likely communicate with different interneurons to contribute to 1-oct-induced outputs. The authors' conclusion that loss of tax-4 reduces attractive responses and that osm-9 mutants reduce repulsive responses is not entirely convincing. TAX-4 is required for both AWC (an attractive neuron) and AWB (a repulsive neuron), and osm-9 is essential for ASH, ADL, and AWA (attraction-associated). Therefore, the observed effects on the attractive and repulsive responses could be more complex. Additionally, the interpretation of results involving the use of IAA to reduce the contribution of AWC at lower concentrations lacks clarity. A more effective approach might involve using transgenically expressed miniSOG or histamine (HisCl1) to specifically inhibit AWC neurons.

      We agree that the sensory inputs into chemotactic behavior are likely more complex, involving other neurons besides ASH and AWC. We now explicitly discuss this possibility in the Discussion (lines 449-467).

      We have also utilized transgenically expressed HisCl1 in ASH and AWC to address this concern. Crucially, we observe that some of the effects of the broad mutations are reproduced by inactivating ASH and AWC. This finding validates our overall hypothesis that sensory-driven behavior is a balance of simultaneous afferent inputs of opposite valence AND shows that ASH and AWC are involved as expected. We are currently performing a comprehensive analysis of sensory inputs into locomotory decision making, including the neurons mentioned in the Reviewer’s comment.

      We also agree that using IAA is not a very clean way to inactivate AWC. The AWC HisCl results referenced above should alleviate this concern. However, the IAA result does put our findings into a broader context of multi-sensory integration which demonstrates the potential usefulness and selective advantages of the dual-input coding architecture that we are hypothesizing.

      Furthermore, they did not observe significant entrainment of AIB activity with the 2.2 mM 1-oct application. This might be due to the animals being anesthetized with 1 mM tetramisole hydrochloride, which could affect neural activity and/or feedback from locomotion. 

      We now mention these caveats: “It is possible that immobilization and anesthetization may be affecting AIB responses to sensory activity and/or proprioceptive feedback from locomotion. However, it is also possible that motor feedback from RIM was obscuring the sensory signal.” (line 357)

      It is unclear whether subtracting AVA activity from AIB activity provides a valid measure. Similarly, it is unclear how the behavioral data from freely moving worms compares to the whole-network calcium imaging results obtained from immobilized worms.

      Ray and Gordus 2025 (Current Biology 35:5534) recently demonstrated that AIB activity can be modeled as the additive convolution of AVA, AWC, and AIA activity, lending validity to our subtractive approach. In their study, AVA was the major contributor, but addition of AWC and AIA signals (i.e. sensory inputs) resulted in significantly greater accuracy. We have now mentioned their work in the manuscript: “To address this possibility, we subtracted AVA activity, representing the motor state, from the AIB activity (AVA closely mirrors RIM), based on the observation that AIB activity can be modeled as the sum of convolutions of motor activity and sensory activity.” (lines 360-363)
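
      The logic of the subtraction can be illustrated with a toy calculation. This is a sketch under stated assumptions, not the model of Ray and Gordus or the analysis pipeline used in the manuscript: the traces, the kernels, and the function name causal_convolve are all hypothetical. If AIB is the sum of a motor term and a sensory term, and the motor term is well approximated by AVA itself, subtracting AVA leaves the sensory term.

```python
def causal_convolve(signal, kernel):
    """Causal discrete convolution, truncated to the length of the signal."""
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            if t - k >= 0:
                acc += weight * signal[t - k]
        out.append(acc)
    return out


# Hypothetical toy traces: a binary motor state (AVA) and a stimulus train.
ava = [0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0]
stimulus = [0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0]

# Assumed kernels: an identity motor kernel (AIB's motor term equals AVA)
# and a short decaying sensory kernel.
motor_kernel = [1.0]
sensory_kernel = [0.5, 0.25]

motor_term = causal_convolve(ava, motor_kernel)
sensory_term = causal_convolve(stimulus, sensory_kernel)

# Simulated AIB trace: sum of the motor and sensory convolutions.
aib = [m + s for m, s in zip(motor_term, sensory_term)]

# Subtracting AVA from AIB isolates the sensory contribution.
residual = [a - v for a, v in zip(aib, ava)]
```

      With the identity motor kernel assumed here, the residual equals the sensory term exactly. With a non-trivial motor kernel, a plain subtraction would only approximate it.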

      The relationship between network activity in freely moving worms and immobilized worms has been explored by Kato et al 2015 (Cell 163:656-669); we now refer to this work on line 131 “These transitions are related to network state changes which drive spontaneous reversals during foraging in freely moving worms. Immobilization and anesthetization, necessary for confocal imaging, distort certain aspects of these motor command sequences compared to freely moving worms executing the motor commands and receiving proprioceptive feedback. However, the intrinsic motor programs remain intact under these conditions.” (lines 131-136)

      Reviewer #2 (Public review):

      tax-4, but not osm-9 mutants were used in chemotaxis and imaging assays. It would have been nice to have osm-9 results as well for these assays. The mutants are not specific to AWC and ASH. Cell-specific rescue of these neurons would have strengthened the proposed model.

      Osm-9 data are now included in the chemotaxis assays (Fig. 4E).

      Cell-specific HisCl data are now included for ASH and AWC (Fig. 4F, G, 5D), confirming our proposed model.

      Limited tax-4 data were included in the imaging (Fig. 6), but unfortunately, NeuroPAL imaging in tax-4 has proven to be technically difficult. NeuroPAL images in the tax-4 background appear different, perhaps because of developmental effects on gene expression due to the lack of sensory input (recall that the NeuroPAL color scheme is based on the relative expression levels of 40+ neuronal promoters). Inactivation of individual sensory neurons using HisCl1 or other transgenes may be the simpler approach.

      The Results and Discussion have been significantly rewritten to incorporate these new data

      We are currently working on a comprehensive study of the sensory inputs into locomotory decision making in the context of chemosensation, which we expect to reveal roles of other neurons besides ASH and AWC and provide a fuller picture of the complexities of this system.

      Reviewer #3 (Public review):

      (1) It is not clear precisely how important AWC is (compared to other cells) for the attractive response, though the presence of odor-off behavior implicates it. This could be resolved by looking at additional mutants (tax-4 is broad).

      We have addressed this concern using transgenically-expressed HisCl1 which has demonstrated a clear role for AWC in overall chemotaxis and locomotory decision making upon encountering the 1-oct/buffer interface in microfluidics devices (Fig. 4F, G, 5D).

      (2) Relatedly, dose-dependent chemotaxis data (Figure 4C, D) should be provided for osm-9 animals to get a sense of the degree to which dose-dependence is explained by ASH.

      Osm-9 data now included (Fig. 4E)

      The Results and Discussion have been significantly rewritten to incorporate these new data

      (3) Figure 4A, B should include average traces with errors, as there are several ways the responses can vary across conditions.

      Averaged traces with error bars now shown (Fig. 4A, B)

      (4) The data in Figure 6G does not appear to have error bars.

      Error bars now shown for 6G

      Also, it would help to include a more conventional demonstration of AIB responding to stimuli (e.g. averaging stimulus-aligned responses as a percent of the fluorescence value at stimulus onset to perform the desired subtraction).

      Fig. 6G top panel shows the stimulus-aligned responses of AIB with no subtraction performed. The 6 sequential stimulations are shown as a single continuous trace, consistent with the experimental protocol utilized. Averaging was performed across the 12 individuals of the sample set. However, we did not calculate the average of responses within a dataset (i.e. first plus second plus third etc.) to avoid obscuring any sensitization/desensitization that might be occurring with multiple stimuli.

      Subtracted calcium traces are harder to interpret. As it stands, the evidence that sensory signals are persisting in AIB and not being shunted by proprioceptive feedback in microfluidic devices is not strong.

      Addressing the point about proprioceptive feedback in microfluidics devices, the following sentence was added in the Results section: “Immobilization distorts certain aspects of these motor command sequences compared to freely moving worms executing the motor commands and receiving proprioceptive feedback, but the intrinsic motor programs remain intact.” (lines 131-136).

      To add context for the AIB-AVA subtraction, Ray and Gordus 2025 (Current Biology 35:5534) recently demonstrated that AIB activity can be modeled as the additive convolution of AVA, AWC, and AIA activity, lending validity to our subtractive approach. In their study, AVA was the major contributor, but addition of AWC and AIA signals (i.e. sensory inputs) resulted in significantly greater accuracy. We have now mentioned their work in the manuscript: “To address this possibility, we subtracted AVA activity, representing the motor state, from the AIB activity (AVA closely mirrors RIM), based on the observation that AIB activity can be modeled as the sum of convolutions of motor activity and sensory activity.” (lines 360-363)

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Figure 1: The number of replicates (n) is missing.

      In Fig. 1D, only a single trial is shown as a representative example rather than averages, which would necessitate error bars. The Results and Figure Legend text has been updated to clarify this, and the average CI is now included in the first Results section (lines 111, 976)

      Figure 4: The sample size (n = 3-5) is relatively small, which may limit the statistical power.

      Sample size was increased to 5 for all data points shown on the new graph (Fig. 4E), as noted in the figure legend (line 1019).

      Figure 4: The 0.22 mM concentration significantly affects both AWC and ASH. It is also unclear whether this concentration also affects other neurons, such as AWB, ADL, and AWA.

      We have not performed an exhaustive analysis of other neurons in these datasets. These analyses are difficult and time-consuming, so we have opted to present a dataset which supports our hypothesis that multiple afferent pathways of opposite valence act in a balanced way to drive chemotaxis. We are currently performing an in-depth analysis of the sensory inputs into the circuit, which we expect to present in a future study.

      Reviewer #2 (Recommendations for the authors):

      The tax-4 and osm-9 experiments are great, but I recommend clarifying that tax-4 and osm-9 are expressed in other neurons as well. The text gives the impression that these mutants are specific to AWC and ASH, respectively. The authors should note these caveats.

      This concern is thoroughly addressed in the descriptions and rationale presented for the use of ASH and AWC HisCl strains.

      The authors should also provide the code used to interpret their results.

      Code will be provided through Zenodo.org

      Reviewer #3 (Recommendations for the authors):

      It would help to clarify (early on) the degree to which you are attributing responses to particular cells (e.g. AWC) as opposed to a class of cells with AWC as an example.

      This concern is thoroughly addressed in the descriptions and rationale presented for the use of ASH and AWC HisCl strains.

      The NeuroPAL imaging and analysis (especially Figures 3D, E) is a bit distracting and appears non-essential. If possible, it would help to combine Figures 2 and 3 with a focus on panels 3ABC to streamline the narrative.

      We would prefer to keep the present format so the reader can appreciate the power of the whole-brain approach for analyzing network activity and behavioral outputs in the context of sensory-motor responses. Specifically, our insight that attractive and aversive afferent inputs were activated simultaneously was wholly dependent on this approach. Otherwise, there would have been little to no reason for examining AWC activity at aversive 1-oct concentrations, which was essentially the foundation of the study.

      To highlight this point, we have added the following sentence in the Discussion: “This novel insight highlights the value of the whole-brain approach (enabled by the NeuroPAL system) for studying the network dynamics underlying sensory driven behaviors.” Lines 431-433.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors test the hypothesis, using an effort-exertion task and an effort-based decision-making task while recording brain dynamics with EEG, that the brain processes reward outcomes for effort differently when they are earned for oneself versus for others.

      Strengths:

      The strengths of this experiment include what appears to be a novel finding of opposite signed effects of effort on the processing of reward outcomes when the recipient is self versus others. Also, the experiment is well-designed, the study seems sufficiently powered, and the data and code are publicly available.

      Weaknesses:

      There is some concern about the fact that participants report feeling less subjective effort, but also more disliking of tasks when they were earning rewards for others versus self. The concern is that participants worked with less vigor during self-versus-others trials and this may partly account for a key two-way Recipient x Effort interaction on the size of the Reward Positivity EEG component. Of note, participants took longer to complete tasks when working for others. While it is true that, in all cases, participants met the requisite task demands (they pressed the required number of buttons) they did so more sluggishly when earning rewards for others. The Authors argue that this reflects less motivation when working for others, which is a plausible explanation. The Authors also try to rule out this diminished vigor as a confounding explanation by showing that the two way interaction remains even when including reaction times (and also self-reported task liking) as a covariate. Nevertheless, it is possible that covariates do not fully account for the effects of differential motivation levels which would otherwise explain the two-way interaction. As such, I think a caveat is warranted regarding this particular result.

      We thank Reviewer #1 for the continued positive assessment and for continuing to highlight the caveat regarding the potential influence of differential vigor on the observed RewP interaction effects.

      We agree that a caveat is warranted. As detailed in our previous response (R5), we had already conducted control analyses addressing this concern; however, we acknowledge that these results were not incorporated into the manuscript itself. We have now addressed this by adding the covariate analyses to the Results section, along with an explicit caveat in the Discussion.

      Before describing the specific revisions, we would like to offer a minor clarification: the covariates in our control analyses were trial-by-trial response speed and self-reported effort ratings, rather than task liking ratings as noted in the summary above. Neither response speed nor effort rating predicted RewP amplitudes, and the critical Recipient × Effort and Recipient × Effort × Magnitude interactions remained significant and essentially unchanged. However, as the reviewer rightly pointed out, covariates may not fully capture the effects of differential motivation. Specifically, we have made the following revisions:

      First, we added the covariate control analyses to the Results section: “To rule out the possibility that the differential vigor between self- and other-benefiting trials drove the Recipient × Effort and Recipient × Effort × Magnitude interactions on the RewP, we conducted two control analyses by including trial-by-trial response speed and subjective effort ratings as separate covariates in the RewP model. Neither response speed (b = -0.07, p = .641) nor effort rating (b = 0.10, p = .186) predicted RewP amplitudes, and the critical Recipient × Effort and Recipient × Effort × Magnitude interactions remained significant and essentially unchanged (see Supplementary Table S3 for full regression estimates)” (page 12, para. 1).

      Second, we added a caveat to the Discussion section acknowledging this alternative explanation, which reads, “Another concern is that participants exhibited less vigor when working for others, as indicated by slower response speed and lower subjective effort ratings for other- versus self-benefiting trials. Although our control analyses confirmed that neither covariate predicted RewP amplitudes and the critical interactions remained significant, covariates may not fully capture the effects of differential motivation, and this alternative explanation cannot be entirely ruled out” (page 22, para. 2, lines 9–12; page 23, para. 1).

      Reviewer #2 (Public review):

      Summary:

      Measurements of the reward positivity, an electrophysiological component elicited during reward evaluation, have previously been used to understand how self-benefitting effort expenditure influences processing of rewards. The present study is the first to complement those measurements with electrophysiological reward after-effects of effort expenditure during prosocial acts. The results provide solid evidence that effort adds reward value when the recipient of the reward is the self but discounts reward value when the beneficiary is another individual.

      Strengths:

      An important strength of the study is that amount of effort, the prospective reward, the recipient of the reward, and whether the reward was actually gained or not were parametrically and orthogonally varied. In addition, the researchers examined whether the pattern of results generalized to decisions about future efforts. The sample size (N=40) and mixed-effects regression models are also appropriate for addressing the key research questions. Those conclusions are plausible and adequately supported by statistical analyses.

      We sincerely appreciate Reviewer #2’s positive evaluation of our manuscript and thank the reviewer for recognizing the strength of our experimental design and analysis approach.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This is a wonderful and landmark study in the field of human embryo modeling. It uses patterned human gastruloids and conducts a functional screen on neural tube closure, and identifies positive and negative regulators, and defines the epistasis among them.

      Strengths:

      The above was achieved following optimization of the micro-pattern-based gastruloid protocol to achieve high efficiency, and then optimized to conduct and deliver CRISPRi without disrupting the protocol. This is a technical tour de force as well as one of the first studies to reveal new knowledge on human development through embryo models, which has not been done before.

      The manuscript is very solid and well-written. The figures are clear, elegant, and meaningful. The conclusions are fully supported by the data shown. The methods are well-detailed, which is very important for such a study.

      Thank you for this feedback! We are excited for the possibilities of this method to discover genes required for various morphogenetic processes associated with human embryonic development.

      Weaknesses:

      This reviewer did not identify any meaningful, major, or minor caveats that need addressing or correcting.

      A minor weakness is that one can never find out if the findings in human embryo models can be in vitro revalidated in humans in vivo. This is for obvious and justified ethical reasons. However, the authors acknowledge this point in the section of the manuscript detailing the limitations of their study.

      Reviewer #2 (Public review):

      Summary:

      This manuscript is a technical report on a new model of early neurogenesis, coupled to a novel platform for genetic screens. The model is more faithful than others published to date, and the screening platform is an advance over existing ones in terms of speed and throughput.

      Thank you for this feedback! We agree that the robust symmetry breaking observed in our model, the comparisons to the human embryo in our cell type analysis, and the ability to conduct large-scale genetic screens represent advancements in the modeling of human neural tube closure that may be built upon in the future.

      Strengths:

      It is novel and useful.

      Weaknesses:

      The novelty of the results is limited in terms of biology, mainly a proof of concept of the platform and a very good demonstration of the hierarchical interactions of the top regulators of GRNs.

      The value of the manuscript could be enhanced in two ways:

      (1) by showing its versatility and transforming the level of neural tube to midbrain and hindbrain, and looking at the transcriptional hierarchies there.

      We thank the reviewer for this valuable suggestion and will keep this in mind for future work. As accurate answers to this question would require the development of robust midbrain and hindbrain organoid models, we believe that this question is outside the scope of the present work.

      (2) by relating the patterning of the organoids to the situation in vivo, in particular with the information in reference 49. The authors make a statement "To compare our findings with in vivo gene expression patterns, we applied the same approach to published scRNA-seq data from 4-week-old human embryos at the neurula stage" but it would be good to have a more nuanced reference: what stage, what genes are missing, what do they add to the information in that reference?

      We agree that a more comprehensive comparison of in vitro and in vivo data would add value to the study. We have added an analysis of the human Week 3 data, as neurulation occurs between Weeks 3 and 4 of human embryogenesis (new Figure 1F). We see our in vitro cell types in both datasets. We also included volcano plots in our supplementary figure to show major differences in gene expression (new Figure S1G). Somewhat surprisingly, embryo samples show higher expression of hemoglobin subunits and other hypoxia-related genes than organoids do, which may indicate hypoxic stress during sample handling during ex vivo experimentation (Schelshortn, et al., 2008) or, alternatively, reflect differences in the metabolic environment between embryos and organoids. We did not find any differences that would have affected our transcription factor candidate selection.

      Recommendations for the authors:

      Reviewing Editor Comments:

      The reviewers were very enthusiastic about the work and provided suggestions for textual changes that will clarify the figures, methods, and results for readers.

      Reviewer #2 (Recommendations for the authors):

      (1) In Figure 1:

      (a) What is the orientation of the images in 1C?

      We have specified in the text and figure legend that this is a top-down view of an outer organoid.

      In this panel, what is the problem with ZO-1 in D4?

      We believe this is non-specific staining of dead cells that are shed into the lumen during folding and closure. We have added this interpretation to the figure legend and added two supplementary time-lapse videos (new Supplementary Video 1 and new Supplementary Video 2) of organoid closure that show dead cells being shed into the lumen, in support of this interpretation.

      (b) What is the three-dimensional organization of these structures, if any? Or are they two-dimensional? In a way, this also refers to 1C.

      We have clarified in the text and figure legend that these organoids are three-dimensional, and that Fig. 1B-C are top-down views.

      (c) Why can't we see FOXG1 among the forebrain markers? This is a very characteristic one.

      We see sparse FOXG1 expression in the human embryo samples at Week 4 (new Figure 1F), which may indicate that FOXG1 expression is upregulated later in the human embryo, after neural tube closure. By this time, however, we do see high levels of other forebrain-associated transcription factors, including OTX2, LHX2, and SIX3.

      (d) The Figure 1 legend needs to be clear about the issues raised here.

      We have updated the Figure 1 legend to address these points.

      (2) Figure 2, could they explain in the text better how they organize the ML gene expression? What are their criteria?

      We thank the reviewer for catching this critical omission. We have added details of our mediolateral axis generation to the Methods section under “Single cell RNA sequencing analysis.”

      (3) Explain how and why the 77 genes were picked up?

      We have clarified at our first mention of 77 genes that this is a subset of our original 78 candidate genes, which were selected as described in the text (last paragraph in the results section “Identifying transcription factor candidates for regulation of anterior neurulation”). We have added a line in the Methods section stating that we were unable to clone a functional guide plasmid against one of our candidates (NR6A1).

      (4) The authors mention the value of the geometry and the mechanics in neural tube closure, but they make no attempt to unravel these inputs, or at least the genes, from their screen, associated with them.

      We have rewritten this discussion of the literature to emphasize the active role of the neural ectoderm compared to the surface ectoderm, in order to justify the genetic analysis of the neural ectoderm rather than the surface ectoderm. We have clarified that our goal is to find upstream developmental drivers (transcription factors) of folding and closure, rather than investigate mechanical mechanisms of this process.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Liao et al. present SCOPE (Spatial reConstruction via Oligonucleotide Proximity Encoding), a method for reconstructing spatial organization from diffusion-defined DNA barcode interactions without the use of optical imaging. In SCOPE, hydrogel beads bearing unique DNA barcodes contain both "sender" and "receiver" oligonucleotides. Upon enzymatic release, sender oligos diffuse locally and hybridize to receiver oligos on neighboring beads, forming chimeric molecules that encode spatial proximity. Sequencing these products yields an interaction matrix, which is then used to reconstruct a spatial coordinate map.

      The authors demonstrate reconstruction of synthetic two-dimensional shapes, a large multicolor Snellen eye chart, and the interior surface of three-dimensional molds. The work expands the conceptual and experimental landscape of optics-free spatial sequencing.

      Thank you for this accurate summary of the work.

      Strengths:

      SCOPE employs bidirectional sender and receiver oligonucleotides on every bead, rather than using asymmetric transmitter-receiver architectures found in other diffusion-based methods. The symmetric design may improve detection sensitivity and reconstruction strategies, and represents a meaningful variation on optics-free spatial encoding.

      A notable strength of this study is the physical scale achieved. The authors reconstruct a Snellen chart spanning approximately 704 mm² and demonstrate molded 3D structures on the order of 75-100 mm³. Although some larger-scale warping is evident, and is discussed as potentially due to non-uniform diffusion, the relative local positioning across these large areas appears impressively accurate.

      The authors extend reconstruction beyond two-dimensional arrays to three-dimensional molded surfaces. This demonstrates that the assay and the computational methods for interpreting proximity graphs can support non-planar spatial relationships, expanding the scope of optics-free spatial inference.

      Thank you for highlighting these strengths of SCOPE.

      Weaknesses:

      Although the method is discussed in the context of spatial genomics and potential tissue applications, it is currently demonstrated only on engineered two-dimensional bead arrays and three-dimensional shapes fabricated in molds. It remains unclear how SCOPE would perform in heterogeneous biological environments, where diffusion may exhibit additional non-uniformities. A biological proof-of-concept, even limited in scope, would help define the method's strengths and limitations more clearly.

      We concur with the reviewer that a biological proof-of-concept is a key next step, and that diffusion will be more heterogeneous in this more complex environment. To this end, we are actively working to further develop SCOPE for use in tissue sections, with the goal of capturing transcriptomes, accessible chromatin, and genomes. As part of this work, we also hope to systematically explore a range of tissue permeabilization and tissue clearing approaches to mitigate the impact of heterogeneity on performance.

      The reconstruction of three-dimensional structures lacks strong sampling from volume interiors. This is speculated to be due to several possible factors; however, this limitation constrains the method to reconstruction of volume surfaces rather than comprehensive three-dimensional profiling.

      Thank you for highlighting this important limitation. The 3D reconstructions are indeed constrained by undersampling of volume interiors. We anticipate that this might be addressed via relatively minor adjustments to the protocol, e.g. using light- or base-labile linkers to trigger oligo release, with the expectation that this will improve reaction consistency throughout the volume. However, even if we are unable to resolve this issue, we note that surface-resolved reconstructions may be useful for some goals, e.g. embedding a bead-packed gel within a tissue lumen, such as the gut. This could enable surface beads to capture RNA transcripts from adjacent cells, while bead–bead associations serve to define the surface topology.

      The reconstruction workflow involves multiple preprocessing steps and embedding choices. While these appear to work well for synthetic shapes with known geometry, it is less clear how parameter choices would be made in contexts where ground truth is unknown. Clarifying how reconstruction robustness is assessed without prior knowledge of spatial structure would help readers understand how the method could be practically deployed, particularly in more heterogeneous tissue contexts.

      Thank you for the opportunity to clarify. The computational pipeline used for 2D SCOPE reconstruction is designed to operate on a standardized input format and can be applied to arbitrary datasets without prior knowledge of spatial structure. For example, as shown in Figure 3, both the circle and “swoosh” geometries were reconstructed using the same algorithm and identical initial parameters. While certain hyperparameters are pre-specified (e.g. the number of k-nearest neighbors used to compute the pairwise distance matrix for UMAP), these are fixed across datasets. Other parameters, such as UMAP’s “min_dist,” are selected via an automated heuristic grid search that proceeds without user intervention. The agreement with ground truth in these controlled settings, together with the reproducibility of stochastic reconstructions (see Figure 3E-F), supports the robustness of the approach.

      Importantly, there was one exception. Reconstruction of the Snellen eye chart dataset required a manual step, involving an initial 3D UMAP embedding followed by a 2D projection to “flatten” the result. We suspect this reflects radial non-uniformities in sender/receiver oligo diffusion at larger spatial scales. Addressing such confounders algorithmically by explicitly modeling diffusion heterogeneity represents an important area for future work, with the goal of entirely eliminating the need for manual intervention.

      Finally, we note that these benchmark shapes represent somewhat contrived examples, and the geometries encountered in practice may often be much less complex. For example, in conventional spatial genomics, the geometry consists of a bead monolayer forming a flat, regular surface on a rectangular slide of known dimensions. Regardless of the tissue architecture overlaid on this surface, the reconstruction problem is defined by the bead monolayer itself, inferred through sender-receiver interactions.
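
      To make the k-nearest-neighbor step mentioned above more concrete, the following is a minimal sketch, not the SCOPE pipeline itself. It assumes that each sequenced chimeric molecule has already been reduced to a (sender barcode, receiver barcode) pair; the function names `interaction_counts` and `top_k_neighbors` and the toy barcodes are hypothetical. It counts bead-bead interactions symmetrically and keeps, for each bead, its k most frequently observed partners, which is the kind of neighbor list an embedding method would take as input.

```python
from collections import Counter, defaultdict


def interaction_counts(read_pairs):
    """Count bead-bead interactions, ignoring sender/receiver direction.

    read_pairs: iterable of (sender_barcode, receiver_barcode) tuples
    (a hypothetical, already-parsed input format).
    """
    counts = Counter()
    for sender, receiver in read_pairs:
        if sender == receiver:
            continue  # a bead paired with itself says nothing about neighbors
        counts[tuple(sorted((sender, receiver)))] += 1
    return counts


def top_k_neighbors(counts, k):
    """Return, for each bead, its k partners with the highest counts."""
    partners = defaultdict(list)
    for (a, b), n in counts.items():
        partners[a].append((n, b))
        partners[b].append((n, a))
    return {
        bead: [name for _, name in sorted(found, key=lambda t: (-t[0], t[1]))[:k]]
        for bead, found in partners.items()
    }


# Toy example with three beads; B and C interact most often.
reads = [("A", "B"), ("B", "A"), ("A", "C"),
         ("B", "C"), ("B", "C"), ("B", "C"), ("A", "A")]
counts = interaction_counts(reads)
neighbors = top_k_neighbors(counts, k=1)
```

      Ties are broken alphabetically by barcode here purely to keep the output deterministic; a real pipeline would likely normalize counts by bead depth before ranking.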


    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study demonstrates, through a series of EEG and MEG experiments, that the human brain automatically categorizes words from alphabetic and non-alphabetic languages, and it unpacks the neural mechanisms of this process from multiple angles. The work examines not only univariate repetition-suppression (RS) effects, but also how repeating or alternating languages influences the representational similarity of words within and across language categories.

      Strengths:

      The univariate RS effects across multiple experiments lend support to some of the main conclusions.

      Weaknesses:

      I have reservations about the logic underlying the multivariate analyses, and I believe the implications of the control experiments merit fuller discussion.

      (1) Question 1: Logic of the multivariate analyses

      The original text states:

      "The processing of intra-language similarity was quantified as correlation distances between neural responses to two words of the same language, which occurred more frequently and would be inhibited in the Rep-Cond (vs. Alt-Cond) due to habituation (Fig. 1c)...".

      I argue that this passage conflates two levels. Building a representational dissimilarity matrix (RDM) is a data-analysis step; it cannot be equated with a cognitive computation. Hence, there is no sense in which this computation occurs "more frequently" in one condition. RDM construction rests on the pairwise similarity of activity patterns, so even if a task engaged no cognitive computation of representational similarity, we could still compute an RDM. Conversely, if a task factor alters the RDM, we must explain how that factor changes the underlying neural patterns, not claim that it triggers specific cognitive processing. Therefore, I neither understand what "more frequent processing" the authors refer to, nor accept their account of the multivariate results.

      The multivariate result pattern, briefly, is that distances between words, both within and across languages, are larger under the repetition condition. One plausible interpretation is that a word representation comprises two parts: language-type (alphabetic vs. non-alphabetic) and fine-grained identity features (visual shape, orthography, semantics, phonology, etc.). Repetition of language type may, via RS, reduce the weight of the first component, thereby increasing the relative contribution of fine-grained features and amplifying inter-word differences. This could explain the multivariate findings.

      Thank you for these insightful comments regarding the logic of the multivariate analyses. In the revision, we will clarify that the multivariate analyses were conducted to assess correlation distances between neural responses to pairs of words, either within the same language or across different languages. The processing of intra-language similarity was assessed rather than defined by conducting the multivariate analyses. We will further elaborate the rationale underlying our experimental design, specifically why the processing of intra-language similarity is expected to occur more frequently in the repetition condition (Rep-Cond) than in the alternation condition (Alt-Cond).

      We also appreciate the alternative account of the observed neural repetition suppression (RS) effects in terms of language-type versus fine-grained identity feature processing. This perspective will be incorporated into the revised Discussion. In particular, we will outline the patterns of neural activity predicted by an account that assumes an increasing contribution of fine-grained features, and evaluate the extent to which our findings are consistent with these predictions.

      (2) Question 2:

      For unlearned languages, people cannot distinguish lexical from sub-lexical levels. What, then, determines (i) the RS-effect difference between letters and radicals in familiar languages and words in unlearned ones, and (ii) the similarity of repetition effects between words in unlearned and familiar languages? An explicit account is needed.

      Thank you for this helpful suggestion. In the revised manuscript, we will include a dedicated paragraph addressing these two issues. Specifically, we will provide a more precise account of the differences in repetition suppression (RS) effects between letters and radicals in familiar languages, as well as the similar RS effects observed for unlearned and familiar languages. These additions will help clarify the interpretation of the neural RS effects associated with visual word processing and strengthen the theoretical implications of our findings.

      Reviewer #2 (Public review):

      Summary:

      This study investigates how the human brain categorizes visual words from distinct writing systems (alphabetic vs. non-alphabetic) as a neural basis for the social-categorization function of language. Using a repetition suppression paradigm combined with electroencephalography and magnetoencephalography, the authors conducted nine experiments with independent participants to identify the neural network underlying language-based categorization, characterize its temporal dynamics, and test whether this process operates independently of linguistic properties such as semantic meaning and pronunciation.

      Strengths:

      (1) The study employs a well-validated design with clear control conditions and systematically manipulates key variables, including writing system, language familiarity, and native language background. The use of nine experiments with independent participant samples strengthens the reliability and replicability of the results.

      (2) The work combines EEG and MEG, cross-validating findings across imaging modalities to support the reported neural effects. A combination of univariate, multivariate, and connectivity analyses is used to characterize neural responses and network interactions.

      (3) Results are consistent across multiple language groups and for both familiar and unfamiliar languages, supporting the generalizability of the identified neural mechanism beyond specific languages or prior experience.

      Weaknesses:

      The authors provide compelling evidence that the identified neural network supports the categorization of words by language, including computations of intra-language similarity and inter-language difference. However, the conceptual framing of this finding as directly reflecting the social-categorization function of language may be premature. While the task captures spontaneous language categorization, it does not involve social evaluation or intergroup processes. The connection to social categorization is inferred from prior literature rather than demonstrated within the current experimental design. Clarifying this distinction would strengthen the conceptual precision of the manuscript.

      Thank you for raising this important point. In the revised Discussion, we will include an additional paragraph to clarify several related issues. First, prior research suggests that language can serve as a socially relevant category cue. Second, these findings imply that rapid categorization of words by language may occur in the human brain. Third, our results identify a neural network supporting such rapid language-based categorization but do not directly test how this process relates to social categorization. Highlighting these points will help delineate the scope of our findings and point to important directions for future research.

      We'll work on a revision of the manuscript and will submit the revision when it's ready.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      (1) One should be careful in interpreting the comparison between MCF10a and Beas2b cells as used in this study. The conditions may not necessarily be representative of the actual properties of breast and bronchial epithelia. How much of the epithelial organization is reconstituted under these experimental conditions remains to be established. This is particularly obvious for bronchial cells, which would need quite specific culture conditions to build a proper bronchial layer. In this study, they seemed to be on the verge of a mesenchymal phenotype (large gaps, huge protrusions, cells growing on top of each other, as mentioned in the manuscript).

      We thank the reviewer for this important point. We agree that our experimental conditions do not fully recapitulate the in vivo architecture of either breast or bronchial epithelia. As the reviewer points out, the two cell lines need specific culture conditions to grow in an in vivo-like architecture, such as acinar structures for mammary tissue and a pseudostratified architecture for bronchial tissue. It would certainly be interesting to grow the cell lines in these organotypic architectures and study the fate of oncogenic mutant cells. However, this would be an independent study and is outside the scope of the current manuscript. Here, we intend to compare these two well-established epithelial lines from mammary and bronchial epithelial tissues, which have distinct intrinsic mechanical and organisational properties, in minimal culture conditions, and to study how the source of the epithelial cells alone can change the fate of oncogenic cells present in the wild-type population. We have now also performed experiments with the MDCK cell line, which, unlike the BEAS2B line, has well-defined cell-cell adhesions [Supplementary figure. 4a] and epithelial morphology, and have shown that the fate of HRasV<sup>12</sup> mutants here also differs from that in the MCF10A cell line.

      (2) As an alternative to Beas2b, comparison of MCF10a with another cell line capable of more robust in vitro epithelial organization, but ideally with different adhesive and/or tensile properties, would be highly interesting, as it may narrow down the parameters involved in the segregation of oncogenic cells.

      We agree with the reviewer and, in line with this suggestion, we have repeated the key experiments using Madin-Darby Canine Kidney (MDCK) cells, a well-established model epithelial cell line. Our results show that even though MDCK cells have significantly distinct properties compared to BEAS2B cells (MDCK being more epithelial-like than BEAS2B), the dynamics of the HRasV<sup>12</sup> clusters in both these systems are similar [Supplementary figure. 4b], and distinctly different from those in the mammary epithelial cells (MCF10A). We did not observe the formation of an actin belt around HRasV<sup>12</sup> clusters in MDCK monolayers, whereas such a belt does form in MCF10A monolayers. Additionally, in MDCK cells, the HRasV<sup>12</sup> mutant clusters are not under compaction or jamming; instead, they form protrusions similar to the ones seen in BEAS2B monolayers. These results solidify our hypothesis of tissue-specific differences in the mechanics of cancer initiation.

      (3) While the seminal description of tissue properties based on interfacial tensions (Brodland 2002) is clearly key to interpreting these data, the actual "Differential Interfacial Tension Hypothesis" poses that segregation results from global differences, i.e., juxtaposition of two tissues displaying different intrinsic tensions. On the contrary, the results of the present work support a different scenario, where what counts is the actual difference in tension ALONG the tissue boundary, in other words, that segregation is driven by high HETEROTYPIC interfacial tension. This is an important distinction that should be clarified.

      We thank the reviewer for this insightful comment. As correctly noted, Brodland’s 2002 work provided a foundational formulation of the Differential Interfacial Tension Hypothesis (DITH), which frames tissue organization in terms of effective interfacial tensions.

      While in its original form DITH emphasised segregation as a consequence of global differences in the intrinsic (bulk) tensions of juxtaposed tissues, our results specifically show that segregation is determined by local interfacial mechanics between transformed and host cells. These local interfacial dynamics, however, are related to the global contractility of cells. In our experiments with blebbistatin, we observed a loss in the efficiency of segregation upon reducing global contractility, which inhibits the formation of the interfacial actomyosin belt that serves as the source of the interfacial tension between healthy and mutant populations. Therefore, the differences in local interfacial mechanics stem from the intrinsic global contractility of the cells under discussion.

      We have also clarified this distinction in the discussion and have explicitly stated that while DITH provided the foundation for conceptualizing tissue mechanics, our findings on transformed cell-healthy cell interactions specifically demonstrate that a higher efficiency of segregation is driven by high heterotypic interfacial tension at the tissue boundary.

      (4) Related: The fact that actomyosin accumulates at the heterotypic interface is key here. It would be quite informative to better document the pattern of this accumulation, which is not clear enough from the images of the current manuscript: Are we talking about the actual interface between mutant and wt cells (membrane/cortex of heterotypic contacts)? Or is it more globally overactivated in the whole cell layer along the border? Some better images and some quantification would help.

      We agree that a detailed visualisation of actomyosin distribution would strengthen our conclusions. We have now added a few more images of the interface to the Supplementary Data [Supplementary figure. 5], which show that cortical actin accumulates in individual cells at the wild-type cell-mutant cell interface, and that actin levels go up in both wild-type and mutant populations at the interface. This is also clear from the quantifications of different regions of interest [Figure 2e], which were obtained by segmenting individual cells in these regions and quantifying actin intensity in each cell.

      (5) In the case of Beas2b cells, mutant cells show higher actin than wt cells, while actin is, on the contrary, lower in mutant MCF10a cells (Author response image 2). Has this been taken into account in the model? It may be in line with the idea that HRas may have a different action on the two cell types, a possibility that would certainly be worth considering and discussing.

      We thank the reviewer for raising this important point. While a direct experimental dissection of how the HRasV<sup>12</sup> mutation affects actin levels in BEAS2B and MCF10A cells individually is beyond the scope of the present study, we do not rule out the possibility that an HRasV<sup>12</sup> mutation may exert cell-type-specific biochemical effects on actin regulation in these two epithelial systems.

      Although the difference in actin between the mutant and wild-type cells has not been incorporated into the model presented in the manuscript, we have now shown how actin levels change in response to the interfacial tension formed between the mutant and wild-type cells by adding mechanochemical feedback to the model. Rather than prescribing intrinsic differences in actin levels between mutant and wild-type cells, we asked whether the feedback between the actin cytoskeleton and mechanical stress alone is sufficient to generate the observed actin reorganization. To address this, we incorporate a mechanochemical feedback loop (MCFL-I), originally developed in our earlier work [35], into the vertex model framework. This feedback captures the experimentally observed coupling between cell shape, actomyosin organization, and mechanical stress (i.e., heterotypic interfacial tension), and has previously been shown to reproduce biologically realistic epithelial behaviours such as dynamic cell shapes and heterogeneous actomyosin distributions [35].

      In this framework, actin is not introduced as an explicit or intrinsic variable. Instead, changes in actomyosin organization emerge dynamically in response to mechanical stresses. Specifically, MCFL-I allows the preferred area and preferred perimeter of cells to evolve depending on cell shape and actomyosin binding, rather than remaining fixed. From these evolving parameters, we compute the normalized contractility, which we interpret as a proxy for bulk actin, and the normalized line tension, which we interpret as a proxy for junctional actin. These normalized quantities provide size-independent measures of actomyosin organization across the tissue.
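
      For orientation, the role of the preferred area and preferred perimeter can be illustrated with the conventional vertex-model energy, in which each cell is penalized quadratically for deviating from a preferred area and a preferred perimeter, and edges on the mutant-wild-type interface carry an additional line tension Λ. The sketch below is a generic textbook form under stated assumptions, not the model used in the manuscript: the stiffness constants `k_area` and `k_perim` and all function names are hypothetical, and the MCFL-I feedback that updates the preferred values is not reproduced here.

```python
import math


def polygon_area(vertices):
    """Area of a simple polygon from its ordered vertices (shoelace formula)."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_perimeter(vertices):
    """Sum of edge lengths around the polygon."""
    n = len(vertices)
    return sum(math.dist(vertices[i], vertices[(i + 1) % n]) for i in range(n))


def cell_energy(vertices, a0, p0, k_area=1.0, k_perim=1.0):
    """Quadratic penalty for deviating from preferred area a0 and perimeter p0."""
    area = polygon_area(vertices)
    perimeter = polygon_perimeter(vertices)
    return k_area * (area - a0) ** 2 + k_perim * (perimeter - p0) ** 2


def interface_energy(edge_lengths, tension):
    """Extra energy from a line tension acting on heterotypic interface edges."""
    return tension * sum(edge_lengths)


# A unit square cell: zero energy when its shape matches the preferred values.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
relaxed = cell_energy(square, a0=1.0, p0=4.0)
strained = cell_energy(square, a0=0.5, p0=3.0)
boundary = interface_energy([1.0, 0.5], tension=0.2)
```

      In a feedback scheme of the kind described above, `a0` and `p0` would be updated over time rather than held constant, so the same cell shape can move between the relaxed and strained cases.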

      The equations for MCFL-I can be written as:

      Thus, with MCFLs, the vertex model does not have fixed 𝐴<sub>0</sub> and 𝑃<sub>0</sub>. The cells dynamically change these parameters depending on the vertex model dynamics. The constitutive relations for the and are given below [1]:

      Here, is the fraction of myosin bound to actin as a function of cell area 𝐴. This nonlinear dependence arises from the load- or strain-dependent binding of myosin to actin, and is a model parameter which is proportional to the binding affinity of myosin to actin in the absence of any strain. We consider this parameter to be the same for both mutant and wild-type cells. Importantly, both mutant and wild-type cells obey identical mechanochemical rules in the model. Differences in actin organization arise solely due to differences in mechanical stress generated by differential interfacial tension. Positive differential interfacial tension compresses mutant cells within clusters. This will lead to different 𝐴<sub>0</sub> and 𝑃<sub>0</sub> across the monolayer via MCFL-I, and thus reduced bulk actin and increased junctional actin [Appendix figure. 4], consistent with experimental observations. Conversely, when differential interfacial tension is weak or negative, mutant and wild-type cells experience similar stresses, and the model predicts minimal differences in actin organization [Appendix figure. 5].

      Thus, while HRasV<sup>12</sup>-dependent biochemical effects may indeed differ between BEAS2B and MCF10A cells, our results demonstrate that mechanical interactions at mutant–wild-type interfaces are sufficient to generate distinct actin signatures in the two tissues, without invoking cell-type-specific actin regulation. We have added the details of the mechanochemical feedback loop in the model to the Appendix to emphasize that the model tests the sufficiency of mechanics-driven actin reorganization rather than excluding additional biochemical contributions.

      The normalized line tension appears to be negative even for Λ > 0. This is, however, just an artefact of the colorbar limits we have used to compare with the Λ < 0 case. If we plot with different colorbar limits, we see that the interface has as shown in Author response image 1.

      Author response image 1.

      Reviewer #2 (Public review):

      (1) It is unclear what the mechanistic origin of the shape-tension coupling is, which is used in the vertex model, and how important that coupling is for the presented results. The authors claim that the shape-tension coupling is due to the anisotropic distribution of stress fibers when cells are under external stress. It is unclear why the stress fibers should affect an effective line tension on the cell boundaries and why the stress fibers should be sensitive to the magnitude of the internal isotropic cell pressure. In experiments, it makes sense that stress fibers form when cells are stretched. Similar stress fibers form when the cytoskeleton or polymer networks are stretched. It is unclear why the stress fibers should be sensitive to the magnitude of internal isotropic cell pressure. If all the surrounding cells have the same internal pressure, then the cell would not be significantly deformed due to that pressure, and stress fibers would not form. The authors should better justify the use of the shape-tension coupling in the model and also present simulation results without that coupling. I expect that most of the observed behavior is already captured by the differential tension, even if there is no shape-tension coupling.

      The reviewer is correct in stating that most of the observed behaviour is already captured by the differential tension, without the shape-tension coupling. However, the shape-tension coupling has been used here in accordance with the experimental observation that the cells at the interface are aligned and elongated along the interface [Fig. 2h], which cannot be captured without the shape-tension coupling. The difference between shape indices of cells at the interface and away from the boundary is plotted versus the interfacial tension in the case of no shape-tension coupling [Appendix figure 2]. The red dashed line represents the experimental value of the shape index difference. The blue line is the shape index difference between two randomly chosen groups of cells (half of the total number of cells in each group is taken). At zero line-tension, the difference in shape index between interface cells and cells away from the interface is the same as that between randomly chosen groups of cells, which is expected since there should be no interface at zero line-tension. The no shape-tension data presented here are averaged over 19 seeds. Although the results without shape-tension coupling reach experimental values at high enough differential tension [Appendix figure 3], a closer inspection of the simulation results shows that the cells are just squeezed and are aligned perpendicular to the interface, which is contrary to what is seen in experiments [Fig. 2h].

      Calculating the average of the absolute value of the dot product of the nematic director and the interface edge for simulations with and without shape-tension coupling [Appendix figure 3] clearly shows that with shape-tension coupling, the cells align and elongate along the interface as is seen in experiments, given by an interface dot product value > 0.5 at high enough line-tension values. Further, shape-tension coupling or biased edge tension has been used before to model cell elongation during embryo elongation [45], and here we use it as an active line-tension force, which elongates cells along the interface, in addition to the differential tension, which is passive. This additional quantification of the alignment and elongation of cells along the interface will be added to the Appendix.
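
      The alignment measure described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function name, the input layout (one director and one interface edge vector per interface cell) and the sample numbers are all assumptions made for the example. Values near 1 indicate cells elongated along the interface; values near 0 indicate cells elongated perpendicular to it.

```python
import math

def interface_alignment(directors, edges):
    """Mean |n . e| over paired cell directors and interface edge vectors.

    directors: list of (nx, ny) long-axis directions, one per interface cell.
    edges: list of (ex, ey) interface edge vectors for the matching cells.
    Both vectors are normalized here; the absolute value makes n and -n
    equivalent, as required for a nematic (headless) director.
    """
    total = 0.0
    for (nx, ny), (ex, ey) in zip(directors, edges):
        n_norm = math.hypot(nx, ny)
        e_norm = math.hypot(ex, ey)
        total += abs(nx * ex + ny * ey) / (n_norm * e_norm)
    return total / len(directors)

# Hypothetical data: three interface cells whose interface edges run along x.
edges = [(1.0, 0.0)] * 3
aligned = [(1.0, 0.1), (-1.0, 0.05), (0.9, -0.2)]        # elongated along the interface
perpendicular = [(0.0, 1.0), (0.1, -1.0), (-0.05, 1.0)]  # squeezed against it

print(interface_alignment(aligned, edges))        # close to 1
print(interface_alignment(perpendicular, edges))  # close to 0
```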

      (2) The observed difference of shape indices between the interfacial and bulk cells in simulations in the absence of differential line tension is concerning. This suggests that either there are not enough statistics from the simulations or that something is wrong with the simulations. For all presented simulation results, the authors should repeat multiple simulations and then present both averages and standard deviations. This way, it would be easier to determine whether the observed differences in simulations are statistically significant.

      The difference in shape indices between the interfacial and bulk cells in simulations has now been calculated over 11 different seed values. The observed differences in simulations, along with the standard deviations have been plotted in Figure 4b. This figure will be updated to include the standard deviations. The nonzero difference in shape index in the absence of differential line tension for low values of stress threshold is due to the shape-tension coupling acting even at low differential tension. Thus, a non-zero, sufficiently high value of the stress threshold is required in our model with shape-tension coupling. This has also been stated in section 4 of the paper. The importance of the shape-tension coupling has been stated in response to the previous point.

      (3) The authors should also analyze the cell line tension data in simulations and make a comparison with experiments.

      The line tension for each edge can be calculated as .

      Although the line tension distributions look similar to the ones obtained from Bayesian Force Inference, a better comparison is between the normalized line tension and the actin seen in experiments, as we have discussed under point (4) raised by Reviewer 1.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) The authors claim that the negative tension Lambda<0 resembles the Beas2b phenotype. This is not consistent with the expression of actin in Figure 2f, which seems very similar in all four regions of interest (ROIs). Also, the segregation index data for Beas2b in Figure 1h looks very different from the demixing parameter in Figure 4f for the negative value of Lambda.

      In the model presented in the previous version of the manuscript, actin differences have not been incorporated. We have only added an interfacial line tension, which might arise only at the interface between cells. In response to comment (4) from Reviewer 1, we have considered a vertex model with mechanochemical feedback and interfacial line tension to understand how actin distribution in the tissue is affected by interfacial tension. The results presented match very well with experimental images.

      The reviewer has rightly pointed out that the segregation index (SI) data presented in Fig. 1h have a different trend compared to those in Fig. 4f. However, it is essential to note that in the simulation, the initial condition is one in which the mutant cluster is already fully segregated, and thus, at the initial time point. This is not the case in experiments, and at initial time points. Thus, the two plots are not directly comparable and only show how SI changes in our simulations. It is more effective to compare the final time points in Fig. 2f with those in Fig. 4e, where we observe that Mcf10a has a higher SI compared to Beas2b, and the case with Λ > 0 has a higher SI than the case with Λ < 0. This supports our claim that Λ < 0 resembles the Beas2b phenotype and Λ > 0 resembles the Mcf10a phenotype.

      (2) It is unclear how the threshold pressure Pi_0 is implemented for the shape-tension coupling in the vertex model. Is the value of the additional tension gamma_ij equal to 0 if the internal pressure is below that threshold?

      The stress threshold is implemented for the shape-tension in the vertex model in the following way. The line tension forces can be written as:

      where, and . If the stress on the cell is below the threshold, then for those cells.

      (3) In vertex model simulations, the authors use identical parameters for wild-type and mutant cells. This does not seem to be consistent with experimental observations in Figure 2, where the expression of actin is different, and also, cell shape indices are different for the wild-type and mutant cells. The authors should comment on how that choice affects their simulation results.

      We thank the reviewer for this comment. As noted in our response to comment (4) from Reviewer 1, we have now run our simulations after adding a mechanochemical feedback to the model. Here, both wild-type and mutant cells follow identical mechanochemical rules within the vertex model. This choice does not imply that the cells are mechanically identical in the tissue; rather, it allows us to test whether differences in cell shape and actin organization can emerge purely from mechanical interactions.

      By incorporating the mechanochemical feedback loop (MCFL-I), the model captures how heterotypic interfacial tension redistributes mechanical stresses between mutant and wild-type cells. These stresses lead to differences in cell area, perimeter, and shape, which are then translated via MCFL-I into distinct bulk and junctional actin signatures. Consequently, even though the intrinsic parameters are the same, the emergent mechanical environment reproduces the experimentally observed differences in actin intensity and cell shape indices (as shown in Figure 2).

      Thus, our approach demonstrates that the experimentally observed heterogeneity between mutant and wild-type cells can arise solely from interface-driven mechanical effects, without prescribing any cell-type-specific parameters in the model.

      (4) Also provide data for cell line tensions in the vertex model, which can then be compared with the experimental data in Figure 2. This is especially important because the differential cell line tension at the interface of mutants and wild-type cells seems to be playing a very important role.

      The cell tensions from the vertex model have been plotted in the response to main comment (3) from Reviewer 2. Since the interfacial tension has been included by hand as an extra term in the vertex model, it is not trivial to compare the line tensions from the vertex model directly with the experimental data. However, we can gain insight into the tensions by looking at the normalised tension and normalised contractility plotted in response to comment (4) from Reviewer 1. Those plots are from a vertex model with mechanochemical feedback, and they match well with experimental actin images.

      (5) In Figure 2j, the authors should report the relative cell pressure and line tension for all four ROIs. The data is only shown for the wild-type cells and for mutants in clusters, even though the figure caption states that the data is presented for all four ROIs. It would also be useful to report the cell tension at the interface between the mutant cells and wild-type cells since this is the key parameter for the vertex model simulations.

      We agree and have updated the graph [Figure 2j].

      (6) The tangential motion of cells around oncogenic clusters only shows up towards the end of Supplementary Video 3. It is unclear whether this is a transient effect or whether this tangential motion would persist for a longer time.

      We thank the reviewer for raising this point. In our experiments, tangential cell motion in the wild-type population along the boundary of the oncogenic cluster consistently emerges as the oncogenic cluster becomes compacted. We have plotted the tangential velocity of interfacial wild-type cells over time (Supplementary Fig. 6b), and show that this motion persists at the cluster–wild-type interface until the end of the time-lapse recordings in all cases.

      (7) It is very awkward that the authors are representing an integral of the tangential velocity over different loops in Figures 3c and 4i. Thus, it is very hard to separate how much of the increase in the integrated velocity is due to larger loops and how much is due to changes in the average tangential velocity. Since different loops have different perimeters, it would have been better to report the average tangential velocity by dividing the integrated tangential velocity by the perimeter length of each loop. In the methods, the authors state that the concentric circles go from the center to a point twice the radius of the mutant cluster, but this is not consistent with the image in Figure 3c, where the concentric circles seem to go only to the boundary of the mutant cluster.

      We thank the reviewer for raising the point regarding the dependence of the loop-integrated tangential velocity on the perimeter length. While the circulation (loop-integrated tangential velocity) indeed scales with loop size, it increases with radius only if tangential velocity components are directionally coherent along the loop.

      In our data, concentric-loop analysis centered on mutant clusters reveals a systematic increase in tangential motion with radius, with the largest values occurring at the outermost loops corresponding to the cluster–tissue interface. In contrast, applying the identical analysis to randomly selected wild-type regions does not yield any monotonic increase with radius, despite the increasing perimeter of the loops, and instead shows fluctuations around zero. This control demonstrates that the observed increase around mutant clusters is not a trivial geometric consequence of larger loop size but reflects the emergence of coherent tangential motion specifically at the mutant cluster boundary.

      To further address the reviewer’s concern, we additionally computed the mean tangential velocity by normalizing the loop-integrated tangential velocity by the loop perimeter. As shown in Supplementary Fig. 6a, this normalization preserves the same qualitative trend: tangential motion peaks near the periphery of mutant clusters, whereas no such trend is observed in wild-type regions. We therefore conclude that both metrics capture the same physical phenomenon: enhanced tangential cell motion localized to the mutant cluster boundary, consistent with the behavior observed in the time-lapse videos.
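
      The relation between the two metrics discussed above can be illustrated with a short sketch. It is not the analysis pipeline used in the manuscript: the function name, the callable velocity field and the test flow are assumptions chosen for the example. For a flow with constant tangential speed, the loop-integrated velocity (circulation) grows linearly with the loop radius, whereas the perimeter-normalized mean stays constant, which is the geometric effect the normalization removes.

```python
import math

def loop_tangential_metrics(velocity, center, radius, n_points=360):
    """Return (circulation, mean tangential velocity) on a circular loop.

    velocity: function (x, y) -> (vx, vy); a hypothetical stand-in for an
    interpolated velocity field, for example one obtained from PIV.
    """
    cx, cy = center
    ds = 2.0 * math.pi * radius / n_points  # arc length per sample point
    circulation = 0.0
    for k in range(n_points):
        theta = 2.0 * math.pi * k / n_points
        x = cx + radius * math.cos(theta)
        y = cy + radius * math.sin(theta)
        vx, vy = velocity(x, y)
        # Counter-clockwise unit tangent to the circle at angle theta.
        tx, ty = -math.sin(theta), math.cos(theta)
        circulation += (vx * tx + vy * ty) * ds
    perimeter = 2.0 * math.pi * radius
    return circulation, circulation / perimeter

def uniform_swirl(x, y):
    """Hypothetical flow with unit tangential speed at every radius."""
    r = math.hypot(x, y)
    return (-y / r, x / r)

for r in (1.0, 2.0):
    circulation, v_mean = loop_tangential_metrics(uniform_swirl, (0.0, 0.0), r)
    print(r, circulation, v_mean)  # circulation doubles with r, v_mean does not
```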

      Author response image 2.

      From simulation data

      (8) The authors should comment on how jamming and unjamming are related to shape indices because some readers may not be familiar with them.

      We have updated the same in the text of Results 2.

      (9) In the captions of Figure 3, the authors state that the bronchial epithelium gets kinetically arrested. This is not evident from the data in Figure 3d, where the velocity magnitude drops just a little bit for the bronchial epithelium, and it remains much higher compared to the mammary epithelium at long times.

      We agree with this comment: describing Beas2b cells as "kinetically arrested" is misleading, since their motion remains much higher even after the initial drop. We have updated the text in the caption accordingly.

      (10) It is unclear why the authors have used the segregation index for analyzing experiments and the demixing parameter for analyzing simulations. Both parameters are trying to quantify the same thing, so it would have been better to use the same quantity for both experiments and simulations to enable easier comparison.

      We agree that using the same quantity for both experiments and simulation would enable easier comparison. Thus, we have replaced the demixing parameter with segregation index in Figure 4. 

      (11) It is unclear what experimental data were used for shape indices in Figure 4c. Was it the data from Mcf10a or Beas2b? It is also unclear which ROIs were used because different ROIs have very different shape indices in experiments, according to Figure 2e,f.

      We have used the experimental ∆(𝑆ℎ𝑎𝑝𝑒 𝑖𝑛𝑑𝑒𝑥) = 0.75, which is a rough estimate of the difference between the shape indices for ROI 2 (interface), and ROI 1, ROI 3 and ROI 4 (away from interface) from Fig. 2e for Mcf10a.

      (12) The authors find that the differences in shape indices are non-zero even for Lambda=0 for some threshold pressure parameters Pi_0 in Figure 4c. This should not happen because all the cells are identical in that case. This suggests that either there are not enough statistics from the simulations or that something is wrong with the simulations. How is this simulation data obtained? Is it from a single simulation, or is this averaged over a certain number of simulations? Authors should perform multiple simulations and report both the mean values and the standard deviation.

      We have addressed this in the response under main comments (1) and (2) from Reviewer 2.

      (13) It is unclear how the cell extrusion was simulated in the vertex model.

      Extrusion probability calculation: Simulations with just a single mutant cell were run for a range of differential interfacial line tension values (Λ = 0, 0.1, 0.4, 0.8, 1.2, 1.6) with shape tension coupling. The simulation was run till the area of the mutant cell fell below a threshold area = 0.1, after which we consider the mutant cell to be extruded. 9 different random initial seeds were run and analysed. Each seed gives a binary result – either extruded or not. This was used to calculate the extrusion probability. We have added this section to the Appendix.
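
      The counting step described above can be sketched as follows. The function name and the per-seed areas are hypothetical; only the area threshold of 0.1 and the use of 9 seeds come from the description. With 9 seeds, the estimated probability can only take values in steps of 1/9.

```python
def extrusion_probability(min_areas, area_threshold=0.1):
    """Fraction of seeds in which the mutant cell area fell below the threshold.

    min_areas: smallest mutant-cell area reached in each seed's simulation.
    Each seed contributes a binary outcome: extruded (True) or not (False).
    """
    extruded = [area < area_threshold for area in min_areas]
    return sum(extruded) / len(extruded)

# Hypothetical minimum mutant-cell areas from 9 seeds at one value of Lambda.
areas_by_seed = [0.05, 0.32, 0.08, 0.09, 0.41, 0.02, 0.07, 0.55, 0.03]
p_extrusion = extrusion_probability(areas_by_seed)
print(p_extrusion)  # 6 of the 9 hypothetical seeds fall below 0.1
```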

      (14) The authors claim that HRas^V12 clusters in bronchial epithelium grew on top of one another, but it is not clear how this can be observed in Figure 2b or in any other Figure.

      We thank the reviewer for raising this point. Our original statement that cells were growing on top of each other was based on observations from the Z-stack images, which allowed us to resolve cell positions along the apico–basal axis. However, since these Z-stack data are not included in the current manuscript, we agree that this claim cannot be directly supported by the figures shown. We have therefore removed this statement from the text and restricted our conclusions to what is directly supported by the presented data.

      (15) In the main text, the authors state that bronchial epithelial cells exhibited higher F-actin intensities compared to mammary bronchial cells, but this difference is not statistically significant according to Figure 5e.

      We agree with the reviewer and have changed the text accordingly: although the F-actin intensities appeared visually higher in bronchial epithelium, the difference was not statistically significant.

      (16) The definition of eccentricity is incorrect in the text. The authors state that the eccentricity is quantified as the ratio of the length of the minor axis to the major axis of an ellipse. According to this definition, the eccentricity would be 1 for a circle and not 0.

      We have updated the definition of eccentricity in the text to the correct one, including the correct equation.

      (17) It is unclear whether the active force F_act is used in the vertex model simulations. The active force is defined, but then its value is never specified. Note that the motility force is also an active force, so it is unclear why the motility and active forces were separated.

      In our model, the line tension force arising from the shape-tension coupling is the active force. We agree that the motility force is also an active force. However, in the absence of any directional movement, as in the homeostatic tissues discussed here, we have discounted the role of the motility force in the model presented here.

      (18) The authors use inconsistent naming for different types of epithelia throughout the manuscript. Mcf10a cells are referred to as either mammary epithelium or breast epithelium, and Beas2b cells are referred to as either lung epithelium or bronchial epithelium. Because of the very broad spectrum of journal readers, it may not be obvious to all readers that different names refer to the same cell types.

      We have updated the text to keep the naming consistent throughout.

      (19) Many references to individual figure panels in the main text are incorrect. The authors should carefully check all the references to figures.

      We apologize for these errors. We have updated the incorrect references after carefully reviewing the entire manuscript.

      (20) In Figure 5, panel b is incorrectly labeled as d.

      We have corrected the same.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The aim of this work is to directly image collagen in tissue using a new MRI method with positive contrast. The work presents a new MRI method that allows very short, powerful radio frequency (RF) pulses and very short switching times between transmission and reception of radio frequency signals.

      Strengths:

      The experiments with and without the removal of 1H hydrogen, which is not firmly bound to collagen, on tissue samples from tendons and bones, are very well suited to prove the detection of direct hydrogen signals from collagen. The new method has great potential value in medicine, as it allows for better investigation of ageing processes and many degenerative diseases in which functional tissue is replaced by connective tissue (collagen).

      Weaknesses:

      It is clear that, due to the relatively long time intervals between RF excitation and signal readout, standard hardware in whole-body MRI systems can only be used to examine surrounding water and not hydrogen bound to collagen molecules.

      We agree that this is a regrettable situation (see also Discussion section). We are hoping that current and future efforts of MRI manufacturers towards improved hardware will eventually enable the technique for broader application.

      Reviewer #2 (Public review):

      Summary:

      This work presents direct magnetic resonance imaging (MRI) of collagen, which is not possible with conventional MRI or other tomographic imaging modalities.

      Strengths:

      The experimental work is impressive, and the presentation of results is clear and convincing. Through a series of thoughtfully prepared experiments, I found the evidence that the images reflect direct measurements of collagen to be highly compelling.

      Due to the technical demands, direct collagen imaging is unlikely to become widespread for routine clinical work, at least not anytime soon. That said, this work is nonetheless transformative and will likely be highly significant for research and perhaps clinical trials.

      Reviewer #3 (Public review):

      The paper is well written and well presented. The topic is important, and its significance is explained succinctly and accurately. I am only capable of reviewing the clinical aspects of this work, which is very largely technical in nature. Several clinical points are worth considering:

      (1) Tendons typically display large magic angle effects as a result of their highly ordered collagen structure (cortical bone much less so), and so it would have been of interest to know what orientation the tendons had to B<sub>0</sub> (in vitro and in vivo). This could affect the signal level at the longer echo time and thus the signal on the subtracted images.

      We have added arrows in the images showing the direction of the main magnetic field. For the in vivo case, the subject lay in the superman position, with B0 pointing from the hand towards the shoulder.

      (2) The in vivo transverse image looks about mid-forearm, where tendons are not prominent. A transverse image of the lower forearm, where there is an abundance of tendons, might have been preferable.

      We have added a distal view of the forearm, where more tendon structures are observed.

      (3) The in vivo images show the interosseous membrane as a high signal on both the shorter and longer TE images. The structure contains ordered collagen with fibres at different oblique angles to the radius and ulna, and thus potentially to B<sub>0</sub>. Collagen fibres may have been at an orientation towards the magic angle, and this may account for the high signal on the longer TE image and the low signal on the subtracted image.

      This is certainly an interesting take. While the magic angle effect is well established for collagen-bound water, the orientation effects on the macromolecular collagen signal are still to be investigated. Our initial experience suggests that the direct collagen signal is not as sensitive to orientation as the bound water.

      Regarding the described observation for the interosseous membrane, we expect the high signal to come from collagen-bound water (yet not quite at the magic angle), which hardly decays between the two TEs, as their difference is small compared to the T<sub>2</sub>* of this signal. Hence, this signal is removed in the subtraction image, and only the macromolecular collagen signal remains, which appears to be very low. Working with samples of the interosseous membrane may provide further insights into why this is the case.

      (4) Some of the signals attributed to the muscle may be from an attachment of the muscle to the aponeurosis.

      We have added the aponeurosis as a possible signal contributor in the muscle tissue.

      (5) There is significant collagen in subcutaneous tissues, so the designation "skin" may more correctly be "skin and subcutaneous tissue".

      We have updated the label accordingly.

      (6) Cortical bone is very heterogeneous, with boundaries between hard bone and soft tissue with significant susceptibility differences between the two across a small distance. This might be another mechanism for ultrashort T<sub>2</sub>* tissue values in addition to the presence of collagen. The two effects might be distinguished by also including a longer TE spin echo acquisition.

      Solid cortical bone may also have an ultrashort T<sub>2</sub>* in its own right.

      The described effect is clearly of importance for bone water but has a negligible effect on the macromolecular signal. We would like to support this by a brief, coarse estimation. 𝑇<sub>2</sub>* can be approximated by 1/𝑇<sub>2</sub>* = 1/𝑇<sub>2</sub> + 1⁄𝑇<sub>2</sub>′, where 1⁄𝑇<sub>2</sub>′ = 𝛾∆𝐵 = 𝛾∆𝜒𝐵<sub>0</sub> (Ref. 1).

      The susceptibility difference reported for the interface between bone and water is ∆𝜒 = 2.5 ppm (Refs. 2 and 3), which at 3T leads to a 𝑇<sub>2</sub>′ ≈ 3000 𝜇𝑠. From our recorded FIDs, we use a 𝑇<sub>2</sub>* of 10 μs and thus obtain 𝑇<sub>2</sub> = 10.03 𝜇𝑠.

      As can be seen, the change in the transverse relaxation constant due to susceptibility is negligible compared to the intrinsic decay of the macromolecular collagen signal. Notably, this is not the case for the pore water signal where T<sub>2</sub>s are on the order of milliseconds (Ref. 2).
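
      The arithmetic of this estimate can be reproduced with a few lines. One assumption is made here that is not stated in the response: 𝛾 is taken as the proton gyromagnetic ratio divided by 2π (about 42.58 MHz/T), because this choice reproduces the quoted 𝑇<sub>2</sub>′ of roughly 3000 μs. Using 𝛾 in rad/s instead would give a 𝑇<sub>2</sub>′ of about 500 μs and a 𝑇<sub>2</sub> of about 10.2 μs, which leaves the conclusion unchanged.

```python
# Assumption: gamma is used as gamma/(2*pi) for 1H, in Hz/T.
GAMMA_BAR_HZ_PER_T = 42.577e6
delta_chi = 2.5e-6   # bone-water susceptibility difference (2.5 ppm)
b0_tesla = 3.0       # main field strength
t2_star_us = 10.0    # T2* taken from the recorded FIDs, in microseconds

# Susceptibility-induced rate: 1/T2' = gamma * delta_chi * B0 (in 1/s).
rate_t2_prime = GAMMA_BAR_HZ_PER_T * delta_chi * b0_tesla
t2_prime_us = 1e6 / rate_t2_prime

# Invert 1/T2* = 1/T2 + 1/T2' to obtain the intrinsic T2.
t2_us = 1.0 / (1.0 / t2_star_us - 1.0 / t2_prime_us)

print(round(t2_prime_us), "us for T2' and", round(t2_us, 2), "us for T2")
```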

      A footnote was added in the Introduction section regarding this topic.

      (7) It may be worth noting that in disease T<sub>2</sub>* may be increased. As a result, the subtraction image may make abnormal tissue less obvious than normal tissue. Magic angle effects may also produce this appearance.

      This is an important point regarding image interpretation. For this reason, it is advantageous that the original anatomical images prior to subtraction are also available, as they will show such effects. They can be used in conjunction with the collagen-specific image to provide further insights regarding tissue disease. Increased T<sub>2</sub>* of diseased tissue has so far been reported for the bound water components due to a reduction of dipolar interactions between bound water and collagen (Ref. 4). A potential related change in T<sub>2</sub> for the macromolecular collagen component itself is certainly of interest and an avenue to explore in future work.

      (8) It may be worth distinguishing fibrous connective tissue (loose or dense), which may be normal or abnormal, from fibrosis, which is an abnormal accumulation of fibrous connective tissue in damaged tissue. Fibrosis typically has a longer T<sub>2</sub> initially and decreases its T<sub>2</sub>* over time. In places, the context suggests that fibrous connective tissue may be more appropriate than fibrosis.

      We are aware of this important distinction. We therefore checked the manuscript for references to fibrosis, making sure that the meaning is as intended.

      Overall, the paper appears very well constructed and describes thoughtful and important work.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) It should be stated that various methods with very short echo times (e.g. SWIFT by Garwood et al.) have been described in the past. This work shows for the first time that direct signals from collagen can be systematically detected in tissue samples.

      We have expanded a sentence in the introduction and reference selected publications studying short-T<sub>2</sub> water signal in collagen, including SWIFT.

      (2) It should be noted that the 1H atoms bound to collagen are located at different sites (at different amino acids of the protein) of the molecule and have different frequencies, and that further signal analyses are of interest.

      We have included additional information regarding distinct resonances of proton-binding sites of collagen in the introduction. The discrete observation of such signals requires advanced NMR methodology such as magic-angle spinning and RF decoupling, which is not a suitable approach for in vivo MRI. Without such methods, the broad lineshapes overlap strongly and are rather observed as a single decaying exponential with the dipolar oscillation as we observe in the FIDs.

      (3) Is it certain that the bump at 30 microseconds comes from 'dipolar coupling'? Is the development time probably too short for chemical shift-induced interference or J-coupling effects?

      30 microseconds is an extremely short interval in which to accumulate phase, so large resonance offsets are required to observe significant changes. To investigate the nature of the bump, we also collected data on a Bruker 7T NMR spectrometer (see Author response image 1). Overall, the same signal characteristics are observed as at 3T. In particular, the position of the bump is the same, excluding chemical shift as a source. However, with the higher field strength, chemical shift becomes significant for the signal phase, as observed by the change in the phase behavior at 50 microseconds, when the collagen component has decayed.

      While J-coupling is independent of field strength, its typical range is from single-digit to tens of hertz. In contrast, dipolar couplings are on the order of thousands of hertz, which coincides with the values extracted from our signal model.
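
      The order-of-magnitude argument above can be made concrete with the phase accumulated by a frequency offset, φ = 2π·Δf·t. The sketch below uses illustrative magnitudes only (10 Hz for J-coupling and 5 kHz for dipolar coupling); these are assumptions chosen to match the stated ranges, not values taken from the manuscript or its signal model.

```python
import math

def accumulated_phase_deg(offset_hz, time_s):
    """Phase in degrees accumulated by a frequency offset over a given time."""
    return math.degrees(2.0 * math.pi * offset_hz * time_s)

evolution_time_s = 30e-6  # 30 microseconds, the position of the bump

# Illustrative magnitudes only (assumed, not measured values).
examples = [
    ("J-coupling, 10 Hz", 10.0),
    ("dipolar coupling, 5 kHz", 5000.0),
]
for label, offset_hz in examples:
    phase = accumulated_phase_deg(offset_hz, evolution_time_s)
    print(label, round(phase, 3), "degrees")
```

      Under these assumed values, a 10 Hz coupling accumulates about a tenth of a degree in 30 microseconds, whereas a 5 kHz coupling accumulates tens of degrees, so only the latter can shape the signal on this timescale.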

      To clarify this point, we extended the respective sentence in the Results section.

      Author response image 1.

      (4) It should be noted that short RF pulses have a relatively high energy content, and whether there are any particular stresses on patients during the examination (SAR, nerve stimulation?).

      SAR is an important issue in ZTE MRI. Since imaging bandwidths are large and excitation is performed with the imaging gradient being on, broadband pulses are necessary. Hence, significant RF deposition occurs and in vivo the flip angle can often not be optimized for the maximum signal, but will be limited by the SAR limit. We have added an explanation in the Discussion section.

      Peripheral nerve stimulation is generated by rapid switching of strong gradients. However, ZTE sequences are usually operated without switching gradients on and off, but with only minor adjustments of the gradient direction between TR intervals. Therefore, PNS is not a relevant issue.

      (5) In the Results section, Part B, 'substantial signal intensity' should be written instead of 'substantial image intensity'.

      We have changed this as suggested.

      References

      (1) Chavhan GB, Babyn PS, Thomas B, Shroff MM, Haacke EM. Principles, techniques, and applications of T2*-based MR imaging and its special applications. Radiographics. 2009 Sep-Oct;29(5):1433-49. doi: 10.1148/rg.295095034. PMID: 19755604; PMCID: PMC2799958.

      (2) Seifert, AC, Wehrli, SL, and Wehrli, FW (2015), Bi-component T<sub>2</sub>* analysis of bound and pore bone water fractions fails at high field strengths. NMR Biomed., 28, 861–872. doi: 10.1002/nbm.3305.

      (3) Hopkins JA, Wehrli FW. Magnetic susceptibility measurement of insoluble solids by NMR: magnetic susceptibility of bone. Magn Reson Med. 1997 Apr;37(4):494-500. doi: 10.1002/mrm.1910370404. PMID: 9094070.

      (4) Loegering IF, Denning SC, Johnson KM, Liu F, Lee KS, Thelen DG. Ultrashort echo time (UTE) imaging reveals a shift in bound water that is sensitive to sub-clinical tendinopathy in older adults. Skeletal Radiol. 2021 Jan;50(1):107-113. doi: 10.1007/s00256-020-03538-1. Epub 2020 Jul 8. PMID: 32642791; PMCID: PMC7677198.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      (1) Based on the effects observed with OC vs. Ntf3 cKO, it is unclear whether OC is indeed exerting its non-cell-autonomous effects via Ntf3. Knocking out both Ntf3 and OC and comparing the effects to those seen with just OC cKO alone could provide more insight on this point.

      In this study, we did not intend to demonstrate that Onecut transcription factors exert their non-cell-autonomous action on spinal interneuron development by regulating Ntf3 expression, and we do not state in the manuscript that this is the case. We only show that Onecut factors and Ntf3, the expression of which they regulate, contribute to the non-cell-autonomous regulation of spinal interneuron development by the motor neurons. We are convinced that Onecut factors could regulate multiple independent factors and pathways involved in extrinsic regulation of interneuron development, as supported by the regulation of the expression of multiple secreted factors or membrane proteins in motor neurons detected in the reported RNA-sequencing experiment (this manuscript and [1]). This possibly also includes the intercellular transfer of the Onecut homeoproteins during spinal cord development, as demonstrated in cell culture for multiple homeoproteins including human Onecut factors [2], a process that we are currently investigating. Knocking out both OC and Ntf3 in the motor neurons, beyond being technically extremely challenging (1/64 probability of obtaining triple-mutant embryos), would not address this question, as it would simply result in the addition of two different defects.

      Also, a quantitative summary of the effects of Ntf3 overexpression in motor neurons in the chick is lacking.

      A quantitative summary of the effects of Ntf3 overexpression in the chicken embryonic spinal cord is provided in Figure S2.

      (2) How the authors assess changes in the spatial distribution of interneurons is unclear. In Figures 2 and 4, the control distributions (despite reporting the same populations in the same regions) look different, suggesting large sample-to-sample variance in distribution. Although the authors report that several sections in each level were taken from at least three animals for each condition, it's unclear how variance within WT or cKO sections was accounted for in the final statistical evaluation. It seems at a glance that a comparison between control samples in Figure 2 and Figure 4 could report statistically significant differences, which would be problematic. A more rigorous report of sample-to-sample variance and a more in-depth explanation of the statistical methods are needed.

      The experimental procedure used to analyze the spatial distribution of spinal interneurons at different stages of development is described in detail in the “Statistical analyses” paragraph of the Materials and Methods section of the manuscript, and has been repeatedly used by ourselves [3,4] and by others (see for example [5-7]) to conduct similar analyses.

      We also noticed that the distribution of the different analyzed interneuron populations in the control embryos showed some differences between the cOc1/Oc2<sup>-/-</sup> and the cNtf3<sup>-/-</sup> lines. Several parameters can account for this observation. First, this study was conducted over a period of 15 years, with different investigators each contributing to different steps of the analysis. Second, the genetic background of these two lines is not identical, impacting both the duration of the gestation (hence, the embryonic stage of the performed analyses, even if the embryos were collected on the same gestation day) and possibly the distribution of some interneuron populations. Third, because of changes in the availability of the primary antibodies used to label the interneuron populations of interest, the same antibodies were not used throughout the study, as stated in the Materials and Methods section, although the same antibody was used by the same investigator to label the same interneuron population in each mouse line at each developmental stage.

      A detailed description of the number of sections and embryos included in each analysis as well as the whole statistical workflow that was used for the distribution analyses, which takes into account variance within control or mutant samples, will be provided in the revised version of the manuscript.

      Reviewer #2 (Public review):

      (1) The study primarily quantifies interneuron numbers and distribution at different levels of the spinal cord and under different genetic manipulations. Experimental details are lacking, defining how many sections were analyzed (several are noted in the methods) and how the rostrocaudal levels of the spinal cord were precisely aligned.

      A detailed description of the number of sections and embryos included in each analysis as well as the whole statistical workflow that was used for the distribution analyses will be provided in the revised version of the manuscript. The rostrocaudal levels of the spinal cord were precisely aligned using the distribution of Foxp1 in the Lateral Motor Columns (LMCs) at brachial or lumbar levels of the spinal cord [8,9], which will also be indicated in the revised version.

      In different figures, the values and distributions shown for controls vary quite a lot. For example, in Figure 2B vs Figure 4B, the number of FoxP2+ V1 neurons at brachial levels is ~350 vs 125. Similarly, the control distributions in 2I and 4I are quite different. This makes it challenging to determine whether the conclusions regarding the impact of each genetic manipulation on interneuron numbers and distribution are valid.

      Multiple factors may explain these observations. First, this study spans a 15-year period, with different researchers contributing to various stages of the analysis. Second, the genetic backgrounds of the two mouse lines are not identical, affecting both gestation length (thus influencing the embryonic stage at which analyses were performed, even when embryos were collected on the same gestational day) and potentially the distribution of certain interneuron populations. Third, due to changes in the availability of primary antibodies used to label the targeted interneuron populations, the same antibodies were not consistently employed throughout the study, as noted in the Materials and Methods section, though each investigator used the same antibody for a given interneuron population and developmental stage within each mouse line.

      (2) The relationship between OC and NT3 deletion data is not entirely clear. Both deletions presumably lead to changes in interneuron distribution, but is there any reverse relationship between the two that relates to relative changes in NT3 levels? The authors do not directly compare NT3 and OC KO IN distributions. Similarly, one might expect a decrease in interneuron numbers in OC mutants, which is only reported for V2c neurons. However, the image presented in Figure 2G shows an equal number of V2c INs in control and mutant.

      This study was not designed to demonstrate that Onecut transcription factors influence spinal interneuron development in a non-cell-autonomous manner through Ntf3 regulation, nor do we claim this in the manuscript. Instead, we show that Onecut factors and Ntf3, whose expression they control, contribute to the non-cell-autonomous regulation of spinal interneuron development by motor neurons. We believe Onecut factors may regulate multiple independent factors and pathways involved in the extrinsic control of interneuron development. For instance, as noted earlier [2], we observed intercellular transfer of Onecut homeoproteins during spinal cord development, suggesting alternative mechanisms for non-cell-autonomous regulation.

      The two mouse lines studied here consist, on the one hand, of a combination of OC inactivation and increased Ntf3 expression and, on the other hand, of Ntf3 inactivation. Therefore, a reverse relationship between the changes in interneuron distribution is not expected. Furthermore, gain-of-function and loss-of-function experiments in mouse models frequently generate phenotypes that are not inverse to each other [10-13].

      (3) It is not clear that the behavioral phenotypes seen in the olig2-cre mediated deletion of NT3 can be attributed to changes in interneuron development. How about a role of NT3 in oligodendrocytes? There is a big gap between the embryonic changes shown here and behavior, with no in-between circuit-level changes in locomotor circuits shown.

      We agree that the motor behavior changes that we recorded in Ntf3 conditional mutant mice are, as stated, “consistent with the hypothesis that Ntf3 produced by MNs is required to generate locomotor circuits with properly coordinated activity” but do not demonstrate a direct causal relationship. However, investigating the intrinsic activity of the spinal locomotor circuits independently of, for example, the oligodendrocyte contribution may prove to be extremely challenging and was beyond the scope of this study. In addition, to the best of our knowledge, Ntf3 has not been shown to be expressed in healthy oligodendrocytes in vivo, and TrkC has not been reported to be displayed by these cells under the same conditions.

      A more restricted manipulation would be deleting TrkC from specific interneuron populations. Related to this, although TrkC is shown to be broadly expressed in ventral interneurons, it is not shown specifically to colocalize with any of the interneuron markers. The authors should validate that the receptor is expressed in the subsets that they are investigating.

      We agree that investigating the consequences of inactivating the TrkC receptor in specific interneuron populations would be extremely informative. However, this experiment is also very challenging to perform, as most of the driver lines available to target spinal interneuron populations additionally target multiple neuronal populations outside of the spinal cord that are also involved in the control of movements and could therefore induce confounding effects on motor behavior analyses [14-20].

      We thank the reviewer for suggesting that we investigate in more detail the interneuron populations that display TrkC receptors. This will be included in the revised version of the manuscript.

      (4) The rationale for following up on NT3 seems to be the chick electroporation experiments; however, no changes in distribution are shown in those experiments, and only a very minor decrease in Chx10 interneurons. Shouldn't NT3 overexpression lead to substantial decreases in IN numbers according to the authors' model? The "data not shown", which presumably refers to distribution, would be important to show here, to further support this rationale.

      Chicken spinal cord electroporation only allows spinal cord development to be studied within a limited time window, given the high mortality rate observed after longer incubation. At the stage at which we collected the electroporated embryos for analysis, interneuron migration has barely been initiated, and distribution cannot yet be studied. Consistently, we are not aware of any report of interneuron distribution analysis in electroporated chicken embryonic spinal cord, in contrast to mouse embryos [3-7].

      (5) The idea that NT3 downregulation causes an increase in IN numbers is not intuitive. Also, considering the DTA experiments in Figure 1, showing that MN ablation leads to a decrease in several IN subtypes and no changes in V2a neurons. It would be helpful for the reader if the authors could synthesize their results in the discussion and reconcile their experimental findings.

      We agree. This will be included in the revised version of the manuscript.

      Reviewer #3 (Public review):

      (1) The manuscript relies heavily on quantifying numbers and the spatial distribution of interneuron populations. However, these do not seem to be consistent in control animals across experiments, making it difficult to interpret any changes observed in genetic manipulations. Specifically, in Figures 2 and 4, the same markers are being used to quantify V1, V2a, V2b, and V2c interneurons in controls vs. OC (Figure 2) or Ntf3 (Figure 4) conditional knockouts, but the numbers of neurons and their distribution in control animals are variable between these two figures. For example, there seems to be a mean of >300 V1 neurons in E12.5 brachial sections of Fig. 2 controls, but this number is <150 in Fig. 4 controls. The cell distribution scoring is similarly variable between these controls without any explanation. The same is true for E14.5 controls used in Figure S1 vs. Figure S3.

      We indeed observed variations in the quantifications and distributions of the analyzed interneuron populations in control embryos between the cOc1/Oc2<sup>⁻/⁻</sup> and cNtf3<sup>⁻/⁻</sup> lines. Several factors may explain this discrepancy. First, the study was carried out over 15 years, with different investigators contributing to distinct stages of the analysis—meaning interneuron distribution was not assessed by the same researchers in both lines. Second, the genetic backgrounds of the two lines differ, affecting gestation length (and thus the embryonic stage at analysis, even when embryos were collected on the same gestational day) as well as potentially altering the distribution of certain interneuron populations. Third, changes in the availability of primary antibodies targeting the interneuron populations of interest led to inconsistencies in antibody use across the study, as detailed in the Materials and Methods section. However, each investigator consistently used the same antibody for a given interneuron population and developmental stage within each mouse line.

      (2) Neurotrophic factors generally promote neuronal survival. However, in this study, the loss of Ntf3 leads to increased numbers of interneurons. This finding is in disagreement with previous observations in slice cultures of spinal cords, as stated in the discussion. This discrepancy makes it even more important that the cell counts reported in the figures discussed above are robust.

      Considering that neurotrophic factors only support neuronal survival would neglect their important function in neuronal differentiation, which has been broadly demonstrated. Severe immunotoxic ablation of motor neurons or anti-serum blockade of Ntf3 activity severely depleted inhibitory, but not excitatory, interneurons in a highly apoptosis-prone organotypic culture model of embryonic rat spinal cord slices, which was rescued by Ntf3 in the first model [21]. Opposite results were obtained in vivo by other researchers using mouse models lacking almost all MNs due to the elimination of skeletal muscles, where the number of spinal INs remained unaffected [22,23]. Combined with our results, these in vivo observations suggest that Ntf3 is involved in interneuron differentiation rather than in their survival. Consistently, Ntf3 has been shown to promote neuronal differentiation [24].

      (3) The claim that phenotypes are non-cell autonomously driven by motor neurons is not well supported. In Olig2-Cre conditional knockouts of Onecut and Ntf3, there is no confirmation that the loss of these factors is specific to motor neurons. Therefore, it cannot be ruled out that other cell populations may be mediating the phenotypes.

      Combined conditional inactivation of Oc1 and Oc2 has been reported in [1]. Conditional inactivation of Ntf3 only impacts motor neurons as it is the only cell population in the ventral spinal cord wherein this factor is produced (this study and [25-27]). Furthermore, Olig2-Cre has been shown to be active in motor neurons and in V3 interneurons (see for example [10]), which, for this reason, have not been studied in the frame of this project as stated in the manuscript.

      (4) The claim that interneuron development is regulated by OC control of Ntf3 expression in motor neurons is not well supported. The authors show that loss of OC1/2 leads to an increase in Ntf3 expression in motor neurons. If this pathway were controlling interneurons, loss of OC function and overexpression of Ntf3 would have the same phenotype, which is not the case. Additionally, it would also be expected that loss of OC function and loss of Ntf3 function would have inverse phenotypes, which is also not the case. The phenotypes from OC loss of function and Ntf3 loss of function seem distinct from one another. The authors state that too little and too much Ntf3 are both bad for interneuron development, but there is no data to support their claim that OC1/2 mutants have altered interneuron development because of higher Ntf3 expression.

      This study was not aimed at proving that Onecut transcription factors mediate their non-cell-autonomous effects on spinal interneuron development through Ntf3 regulation, nor do we make this claim in the manuscript. Rather, we demonstrate that Onecut factors and Ntf3, whose expression they control, participate in the non-cell-autonomous regulation of spinal interneuron development by motor neurons. We propose that Onecut factors likely modulate multiple independent factors and pathways involved in the extrinsic regulation of interneuron development, as evidenced by the regulation of various secreted factors and membrane proteins in motor neurons observed in our RNA-sequencing data (this study and [1]). This may also involve intercellular transfer of Onecut homeoproteins during spinal cord development, a mechanism that was previously shown in cell culture for several homeoproteins, including human Onecut factors [2], and that we are currently exploring.

      (5) It is not clear that interneurons being studied express the Ntf3 receptor TrkC, which makes it difficult to assess whether changes in Ntf3 signaling are directly responsible for the phenotype.

      The immunofluorescence experiment in Figure 3C shows that the TrkC receptor is present in cell populations surrounding motor neurons at E12.5, a stage at which only the pre-motor interneuron populations reported in the manuscript are present. However, we thank the reviewer for suggesting that we investigate in more detail the interneuron populations that display TrkC receptors. This will be included in the revised version of the manuscript.

      (6) While the behavioral phenotypes are consistent with Ntf3 playing a role in motor circuits, there is no evidence to suggest that Ntf3's influence on premotor interneurons being studied is driving or contributing to this phenotype, as discussed by the authors.

      We acknowledge that the motor behavior changes observed in Ntf3 conditional mutant mice—as noted—are “consistent with the hypothesis that MN-derived Ntf3 is necessary for the formation of locomotor circuits with properly coordinated activity,” but they do not establish a direct causal link. However, analyzing the intrinsic activity of spinal locomotor circuits was beyond the scope of this study.

      (1) Toch, M. et al. Onecut-dependent Nkx6.2 transcription factor expression is required for proper formation and activity of spinal locomotor circuits. Sci Rep 10, 996 (2020). https://doi.org/10.1038/s41598-020-57945-4

      (2) Lee, E. J. et al. Global Analysis of Intercellular Homeodomain Protein Transfer. Cell Rep 28, 712-722 e713 (2019). https://doi.org/10.1016/j.celrep.2019.06.056

      (3) Harris, A. et al. Onecut factors and Pou2f2 regulate the distribution of V2 interneurons in the mouse developing spinal cord. Front Cell Neurosci 13 (2019). https://doi.org/10.3389/fncel.2019.00184

      (4) Kabayiza, K. U. et al. The Onecut Transcription Factors Regulate Differentiation and Distribution of Dorsal Interneurons during Spinal Cord Development. Front Mol Neurosci 10, 157 (2017). https://doi.org/10.3389/fnmol.2017.00157

      (5) Deska-Gauthier, D. et al. Embryonic temporal-spatial delineation of excitatory spinal V3 interneuron diversity. Cell Rep 43, 113635 (2024). https://doi.org/10.1016/j.celrep.2023.113635

      (6) Bikoff, J. B. et al. Spinal Inhibitory Interneuron Diversity Delineates Variant Motor Microcircuits. Cell 165, 207-219 (2016). https://doi.org/10.1016/j.cell.2016.01.027

      (7) Hayashi, M. et al. Graded Arrays of Spinal and Supraspinal V2a Interneuron Subtypes Underlie Forelimb and Hindlimb Motor Control. Neuron 97, 869-884 e865 (2018). https://doi.org/10.1016/j.neuron.2018.01.023

      (8) Rousso, D. L., Gaber, Z. B., Wellik, D., Morrisey, E. E. & Novitch, B. G. Coordinated actions of the forkhead protein Foxp1 and Hox proteins in the columnar organization of spinal motor neurons. Neuron 59, 226-240 (2008). https://doi.org/10.1016/j.neuron.2008.06.025

      (9) Roy, A. et al. Onecut transcription factors act upstream of Isl1 to regulate spinal motoneuron diversification. Development 139, 3109-3119 (2012). https://doi.org/10.1242/dev.078501

      (10) Debrulle, S. et al. Vsx1 and Chx10 paralogs sequentially secure V2 interneuron identity during spinal cord development. Cell Mol Life Sci 77, 4117-4131 (2020). https://doi.org/10.1007/s00018-019-03408-7

      (11) Brunklaus, A. et al. Brain 145, 3816-3831 (2022).

      (12) Scekic-Zahirovic, J. et al. EMBO J 35, 1077-1097 (2016).

      (13) Wong, J. C. Epilepsy Curr 25, 347-349 (2025).

      (14) Hafler, B. P., Choi, M. Y., Shivdasani, R. A. & Rowitch, D. H. Expression and function of Nkx6.3 in vertebrate hindbrain. Brain Res 1222, 42-50 (2008). https://doi.org/10.1016/j.brainres.2008.04.072

      (15) Nardelli, J., Thiesson, D., Fujiwara, Y., Tsai, F. Y. & Orkin, S. H. Expression and genetic interaction of transcription factors GATA-2 and GATA-3 during development of the mouse central nervous system. Dev Biol 210, 305-321 (1999).

      (16) Bretzner, F. & Brownstone, R. M. J Neurosci 33, 14681-14692 (2013).

      (17) Chopek, J. W., Zhang, Y. & Brownstone, R. M. J Neurophysiol 126, 1978-1990 (2021).

      (18) Miyagi, S., Kato, H. & Okuda, A. Cell Mol Life Sci 66, 3675-3684 (2009).

      (19) French, C. A. et al. Mol Psychiatry 24, 447-462 (2019).

      (20) Khouri-Farah, N., Guo, Q., Perry, T. A., Dussault, R. & Li, J. Y. H. Nat Neurosci 28, 2022-2033 (2025).

      (21) Bechade, C., Mallecourt, C., Sedel, F., Vyas, S. & Triller, A. J Neurosci 22, 8779-8784 (2002).

      (22) Grieshammer, U., Lewandoski, M., Prevette, D., Oppenheim, R. W. & Martin, G. R. Muscle-specific cell ablation conditional upon Cre-mediated DNA recombination in transgenic mice leads to massive spinal and cranial motoneuron loss. Dev Biol 197, 234-247 (1998). https://doi.org/10.1006/dbio.1997.8859

      (23) Kablar, B. & Rudnicki, M. A. Development in the absence of skeletal muscle results in the sequential ablation of motor neurons from the spinal cord to the brain. Dev Biol 208, 93-109 (1999). https://doi.org/10.1006/dbio.1998.9184

      (24) Dutton, R., Yamada, T., Turnley, A., Bartlett, P. F. & Murphy, M. Regulation of spinal motoneuron differentiation by the combined action of Sonic hedgehog and neurotrophin 3. Clin Exp Pharmacol Physiol 26, 746-748 (1999). https://doi.org/10.1046/j.1440-1681.1999.03108.x

      (25) Buck, C. R., Seburn, K. L. & Cope, T. C. Neurotrophin expression by spinal motoneurons in adult and developing rats. J Comp Neurol 416, 309-318 (2000).

      (26) Henderson, C. E. et al. Neurotrophins promote motor neuron survival and are present in embryonic limb bud. Nature 363, 266-270 (1993). https://doi.org/10.1038/363266a0

      (27) Usui, N. et al. Role of motoneuron-derived neurotrophin 3 in survival and axonal projection of sensory neurons during neural circuit formation. Development 139, 1125-1132 (2012). https://doi.org/10.1242/dev.069997