  1. Aug 2025
    1. Reviewer #1 (Public review):

      Summary:

      The authors have studied how a virus (EMCV) uses its RNA (Type 2 IRES) to hijack the host's protein-making machinery. They use cryo-EM to extract structural information about the recruitment of the viral Type 2 IRES to the ribosomal pre-IC. The authors propose a novel interaction mechanism in which the EMCV Type 2 IRES mimics 28S rRNA and interacts with ribosomal proteins and initiator tRNA (tRNAi).

      Strengths:

      (1) Obtaining structural insights into Type 2 IRES-based initiation is novel.

      (2) The study allows a good comparison with other IRES-based initiation systems.

      (3) The manuscript is well-written and clearly explains the background, methods, and results.

      Weaknesses:

      (1) The main weakness of the work is the low resolution of the structure. This limits the possibility of data interpretation at the molecular level.

      However, despite the moderate resolution of the cryo-EM reconstructions, the model fits well into the density. The analysis of the EMCV IRES-48S PIC structure is thorough and includes meaningful comparisons to previously published structures (e.g., PDB IDs 7QP6 and 7QP7). These comparisons showed that Map B1 represents a closed conformation, in contrast to Map A in the open state (Figure 2). Additionally, the proposed 28S rRNA mimicry strategy, supported by structural superposition with the 80S ribosome and by sequence similarity between domain I of the IRES and the h38 region of 28S rRNA (Fig. 4), is well-justified.

      (2) The lack of experimental validation of the functional importance of regions like the GNRA and RAAA loops is another limitation of this study.

      (3) Minor modifications related to data processing and biochemical studies will further validate and strengthen the findings.

      a) In the cryo-EM data section, the authors should include an image showing rejected particles during 2D classification. This would help readers understand why, despite having over 22k micrographs with sufficient particle distribution and good contrast, only a smaller number of particles were used in the final reconstruction. Additionally, employing map-sharpening tools such as Ewald sphere correction, Bayesian polishing, or reference-based motion correction might further improve the quality of the maps. Targeting high-resolution structures would be particularly informative.

      b) The strategic modelling of different IRES domains into the density, particularly the domain into the region above the 40S head, is appreciable. However, providing the full RNA tertiary structure (RNAfold) of the EMCV IRES (nucleotides 280-905) would better explain the logic behind the model building and its molecular interpretation.

      c) Although the authors compare their findings with other types of IRESs (Types 1, 3, and 4), there is no experimental validation of the functional importance of regions like the GNRA and RAAA loops. Including luciferase-based assays or mutational studies of these regions for validation of structural interpretations is strongly recommended.

    2. Reviewer #2 (Public review):

      Summary:

      The field of protein translation has long sought the structure of a Type 2 Internal Ribosome Entry Site (IRES). In this work, Das and Hussain pair cryo-EM with algorithmic RNA structure prediction to present a structure of the Type 2 IRES found in Encephalomyocarditis virus (EMCV). Using medium to low resolution cryo-EM maps, they resolve the overall shape of a critical domain of this Type 2 IRES. They use algorithmic RNA prediction to model this domain onto their maps and attempt to explain previous results using this model.

      Strengths:

      (1) This study reveals a previously unknown/unseen binding modality used by IRESes: a direct interaction of the IRES with the initiator tRNA.

      (2) Use of an IRES-associated factor to assemble and pull down an IRES bound to the small subunit of the ribosome from cellular extracts is innovative.

      (3) Algorithmic modeling of RNA structure to complement medium to low resolution cryo-EM maps, as employed here, can be implemented for other RNA structures.

      Weaknesses:

      (1) Maps at the resolution presented prevent unambiguous modelling of the EMCV-IRES. This, combined with the lack of any biochemical data, calls into question any inferences made at the level of individual nucleotides, such as the GNRA loop and CAAA loop (Figure 4).

      (2) The EMCV IRES contains an upstream AUG at position 826, where the PIC can assemble (Pestova et al 1996; PMID 8943341). It is unclear if this start codon was mutated in this study. If it were not mutated, placement of AUG-834 over AUG-826 in the P-site is unexplained.

      (3) The claims the authors make about (i) the general overall shape and binding site of the IRES, (ii) its gross interaction with the two ribosomal proteins, (iii) the P-in state of the 48S, and (iv) the rearrangement of the ternary complex are all warranted. Their claims about individual nucleotides or smaller stretches of the IRES, made without any supporting biochemical data, are not warranted by the data.

    3. Reviewer #3 (Public review):

      Summary:

      Type II IRESs, such as those from encephalomyocarditis virus (EMCV) and foot-and-mouth disease virus (FMDV), mediate cap-independent translation initiation by using the full complement of eukaryotic initiation factors (eIFs), except the cap-binding protein eIF4E. The molecular details of how type II IRESs interact with the ribosome and initiation factors to promote recruitment have remained unclear. Das and Hussain used cryo-electron microscopy to determine the structure of a translation initiation complex assembled on the EMCV IRES. The structure reveals a direct interaction between the IRES and the 40S ribosomal subunit, offering mechanistic insight into how type II IRES elements recruit the ribosome.

      Strengths:

      The structure reveals a direct interaction between the IRES and the 40S ribosomal subunit, offering mechanistic insight into how type II IRES elements recruit the ribosome.

      Weaknesses:

      While this reviewer acknowledges the technical challenges inherent in determining the structure of such a highly flexible complex, the overall resolution remains insufficient to fully support the authors' conclusions, particularly given that cryo-EM is the sole experimental approach presented in the manuscript.

      The study is biologically significant; however, the authors should improve the resolution or include complementary biochemical validation.

    4. Author response:

      Reviewer #1 (Public review):

      Summary:

      The authors have studied how a virus (EMCV) uses its RNA (Type 2 IRES) to hijack the host's protein-making machinery. They use cryo-EM to extract structural information about the recruitment of the viral Type 2 IRES to the ribosomal pre-IC. The authors propose a novel interaction mechanism in which the EMCV Type 2 IRES mimics 28S rRNA and interacts with ribosomal proteins and initiator tRNA (tRNAi).

      Strengths:

      (1) Obtaining structural insights into Type 2 IRES-based initiation is novel.

      (2) The study allows a good comparison with other IRES-based initiation systems.

      (3) The manuscript is well-written and clearly explains the background, methods, and results.

      We thank Reviewer 1 for appreciating our efforts and for finding the structural insights into type 2 IRES-based initiation presented in this study novel.

      Weaknesses:

      (1) The main weakness of the work is the low resolution of the structure. This limits the possibility of data interpretation at the molecular level.

      However, despite the moderate resolution of the cryo-EM reconstructions, the model fits well into the density. The analysis of the EMCV IRES-48S PIC structure is thorough and includes meaningful comparisons to previously published structures (e.g., PDB IDs 7QP6 and 7QP7). These comparisons showed that Map B1 represents a closed conformation, in contrast to Map A in the open state (Figure 2). Additionally, the proposed 28S rRNA mimicry strategy, supported by structural superposition with the 80S ribosome and by sequence similarity between domain I of the IRES and the h38 region of 28S rRNA (Fig. 4), is well-justified.

      We agree that the low resolution of the map has limited data interpretation at the molecular level, and we thank the reviewer for appreciating our findings at this resolution. Because of the limited resolution, we have reported findings for stretches or regions such as loops and stems, rather than for individual nucleotides and interactions.

      (2) The lack of experimental validation of the functional importance of regions like the GNRA and RAAA loops is another limitation of this study.

      We acknowledge that, apart from cryo-EM, this study includes no experiments probing the importance of regions such as the GNRA and RAAA loops. However, we have cited earlier reports that demonstrate the importance of these regions for overall IRES activity. The essentiality of the RAAA loop for type 2 IRES activity was demonstrated previously (López de Quinto and Martínez-Salas, 1997; cited in the manuscript), and the conservation of this loop across the type 2 IRES family (Manuscript Figure 6B) further supports its importance. This loop and its flanking G-C stem are similar to h38 of 28S rRNA, and the RAAA loop appears to use a mimicry mechanism to interact with the 40S ribosomal protein uS19, which highlights its importance for the interaction with the 40S subunit. Destabilising the G-C stem also compromises IRES activity, as shown for the FMDV IRES (Fernández et al 2011). Previous studies in which the GNRA (GCGA) loop of the EMCV IRES was mutated showed reduced IRES activity (Roberts and Belsham, 1997; Robertson et al 1999), suggesting that these regions are important in viral IRES biology; these reports are cited in the manuscript. Beyond the EMCV IRES, mutation of the GUAA loop (a GNRA-type loop) of the FMDV IRES also significantly reduced IRES activity (López de Quinto and Martínez-Salas, 1997). In our study, we observe that the GCGA loop interacts with tRNAi in the EMCV IRES-48S PIC, which further implicates this loop. Moreover, incubation of the FMDV IRES with 40S ribosomes decreased SHAPE reactivity at the domain 3 apex (nucleotides 170-200) (Lozano et al 2018), which corresponds to the EMCV IRES domain I apex. We will also attempt to address the concern about the lack of experimental validation of the GNRA and RAAA loops by performing biochemical assays.

      (3) Minor modifications related to data processing and biochemical studies will further validate and strengthen the findings.

      a) In the cryo-EM data section, the authors should include an image showing rejected particles during 2D classification. This would help readers understand why, despite having over 22k micrographs with sufficient particle distribution and good contrast, only a smaller number of particles were used in the final reconstruction. Additionally, employing map-sharpening tools such as Ewald sphere correction, Bayesian polishing, or reference-based motion correction might further improve the quality of the maps. Targeting high-resolution structures would be particularly informative.

      We thank the reviewer for these suggestions and will apply the suggested processing steps, which may further improve the quality of the maps. We will include an image of the rejected 2D classes in the revised manuscript. We acknowledge the reviewer's point about the large number of micrographs relative to the small number of particles in the final reconstruction. The total number of micrographs is the sum of multiple datasets prepared and collected at different times. Of these, around 8000 micrographs have very few particles and a poor particle distribution, so the number of particles per micrograph is heterogeneous across the compiled dataset. We obtained only 237054 ‘good particles’ after multiple rounds of 2D and 3D classification, and the final reconstruction contains 28439 of them (~12%). This class was obtained by masked classification on the IRES and ternary complex densities, so only the particles showing the best density for both the IRES and the ternary complex were used to reconstruct this map. A second set of particles, with density for only part of the IRES and tRNA and no density for eIF2, yields another map (26792 particles, 11.3%). In total, therefore, 55231 particles (23.3%) show IRES density.
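      As an editorial check, not part of the authors' processing pipeline, the percentages quoted in the response can be reproduced from the stated particle counts. The sketch assumes, as the response implies, that every percentage is taken relative to the 237054 good particles; the variable and function names are illustrative only.

```python
# Particle counts as stated in the author response; names are illustrative.
GOOD_PARTICLES = 237_054  # after multiple rounds of 2D and 3D classification

maps = {
    "IRES + ternary complex (final map)": 28_439,
    "partial IRES + tRNA, no eIF2 density": 26_792,
}


def percent(n: int, total: int) -> float:
    """Return n as a percentage of total."""
    return 100.0 * n / total


# Fraction of the good particles that contributes to each map.
for name, count in maps.items():
    print(f"{name}: {count} particles ({percent(count, GOOD_PARTICLES):.1f}%)")

# Particles with any IRES density = sum of the two maps.
with_ires = sum(maps.values())
print(f"total with IRES density: {with_ires} ({percent(with_ires, GOOD_PARTICLES):.1f}%)")
```

      Run as is, this prints 12.0%, 11.3%, and 55231 (23.3%), matching the figures in the response.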

      b) The strategic modelling of different IRES domains into the density, particularly the domain into the region above the 40S head, is appreciable. However, providing the full RNA tertiary structure (RNAfold) of the EMCV IRES (nucleotides 280-905) would better explain the logic behind the model building and its molecular interpretation.

      We thank the reviewer for appreciating the modelling of the domain I apex in the cryo-EM density. We attempted to predict the full tertiary structure of the IRES; however, using the full-length sequence (nucleotides 280-905) gave models of extremely low confidence, and a few domains did not conform to the secondary structure of the EMCV IRES reported by Duke et al 1992. We therefore predicted the tertiary structure of each EMCV IRES domain independently of the other domains. Furthermore, 3D models of FMDV IRES domains 2, 3, and 4 (corresponding to EMCV IRES domains H, I, and J-K) have been predicted from SHAPE reactivity values using the RNAComposer server (Figure 3 in Lozano et al 2018). The predicted architecture of the FMDV IRES domain 3 apex agrees with our model of the EMCV IRES domain I apex.

      c) Although the authors compare their findings with other types of IRESs (Types 1, 3, and 4), there is no experimental validation of the functional importance of regions like the GNRA and RAAA loops. Including luciferase-based assays or mutational studies of these regions for validation of structural interpretations is strongly recommended.

      We have discussed how other IRESs, such as type 1 and type 5 (Aichi virus), might use strategies similar to that of the EMCV IRES to assemble the 48S PIC, given the similarity in motif sequence and position across these viral IRESs. Like the EMCV IRES, type 1 IRESs (e.g., poliovirus, coxsackievirus) harbour a GNRA loop, a motif known for long-range RNA-RNA interactions, preceded by a C-rich loop in their longest domain. The segment harbouring the GNRA loop is highly conserved across the type 1 family of IRESs (Kim et al 2015). The Aichi virus IRES (type 5) harbours a GNRA loop in its longest domain, domain J. Deletion of this GNRA loop compromised IRES activity, whereas substitution mutations in this region either increased IRES activity or left it unaltered (Yu et al 2011). We have hypothesized that these IRESs (type 1 and type 5) might use the GNRA motifs in their longest domain (domain IV in type 1 and domain J in type 5) in the same way as the EMCV IRES, in which the GNRA loop lies in the longest domain (I) and is preceded by a C-rich loop. Because all of these IRESs require the eIF2 ternary complex to form the 48S PIC, the GNRA loop could potentially mediate long-range interactions with tRNAi. In addition, as in the EMCV IRES, the GNRA-containing domain of type 1 and type 5 IRESs is placed before the eIF4G-binding domain (domain J-K in the EMCV IRES, domain V in poliovirus, domain K in Aichi virus). Hence, we suggest that these IRESs may use a similar strategy to interact with tRNAi during 48S PIC formation.

      Reviewer #2 (Public review):

      Summary:

      The field of protein translation has long sought the structure of a Type 2 Internal Ribosome Entry Site (IRES). In this work, Das and Hussain pair cryo-EM with algorithmic RNA structure prediction to present a structure of the Type 2 IRES found in Encephalomyocarditis virus (EMCV). Using medium to low resolution cryo-EM maps, they resolve the overall shape of a critical domain of this Type 2 IRES. They use algorithmic RNA prediction to model this domain onto their maps and attempt to explain previous results using this model.

      Strengths:

      (1) This study reveals a previously unknown/unseen binding modality used by IRESes: a direct interaction of the IRES with the initiator tRNA.

      (2) Use of an IRES-associated factor to assemble and pull down an IRES bound to the small subunit of the ribosome from cellular extracts is innovative.

      (3) Algorithmic modeling of RNA structure to complement medium to low resolution cryo-EM maps, as employed here, can be implemented for other RNA structures.

      We thank Reviewer 2 for the positive and encouraging comments on our work, including the appreciation of our ‘innovative’ approach of using an IRES-associated factor to assemble and pull down the IRES-bound ribosomal complex.

      Weaknesses:

      (1) Maps at the resolution presented prevent unambiguous modelling of the EMCV-IRES. This, combined with the lack of any biochemical data, calls into question any inferences made at the level of individual nucleotides, such as the GNRA loop and CAAA loop (Figure 4).

      We understand the reviewer's concerns about the resolution of the EMCV IRES-48S PIC map. However, we would like to point out that we refrained from commenting on individual nucleotides or molecular interactions in the manuscript. Instead, we discuss loops, RNA stretches, or motifs that can be inferred with more confidence, as shown in Manuscript Figure 4. The EMCV IRES can interact directly with the 40S ribosome through its domains H and I (Chamond et al 2014); however, the details of this interaction were unknown. Based on the placement of part of domain I in the cryo-EM map, we observe that the CAAA loop of the domain I apex interacts with the 40S ribosome. This is also reflected in previously reported SHAPE data (Supplementary Figures 2 and 8 in Chamond et al 2014), where reactivity clearly decreases in the presence of the 40S ribosome. In addition, incubation of the EMCV IRES with rabbit reticulocyte lysate (RRL) protected regions of the domain I apex, including the CAAA loop (Figure 4b in Maloney and Joseph, 2024).

      Furthermore, a similar decrease in SHAPE reactivity is evident for the FMDV IRES domain 3 apex (equivalent to domain I in the EMCV IRES) in the presence of the 40S ribosome (Lozano et al 2018).

      Thus, these studies are consistent with the placement of the IRES model in the cryo-EM map.

      We aim to improve the resolution of the maps for better clarity and add biochemical experiments to justify the possible interactions.

      (2) The EMCV IRES contains an upstream AUG at position 826, where the PIC can assemble (Pestova et al 1996; PMID 8943341). It is unclear if this start codon was mutated in this study. If it were not mutated, placement of AUG-834 over AUG-826 in the P-site is unexplained.

      We thank the reviewer for raising this point, which we omitted from the manuscript. The EMCV IRES does not require scanning and directly positions AUG-834 at the P site (Pestova et al 1996). In Pestova et al 1996, the toeprint at AUG-834 is much more intense than that at AUG-826. In addition, AUG-834 lies in a favourable Kozak context, whereas AUG-826 has a poor Kozak context, and synthesis of the polypeptide requires placement of AUG-834 at the P site. In our cryo-EM map, tRNAi is in a P_IN state, which indicates start-codon recognition; we therefore reasoned that AUG-834, rather than AUG-826, is more likely to be placed at the P site. We did not mutate AUG-826, and we will state this in the revised manuscript.

      (3) The claims the authors make about (i) the general overall shape and binding site of the IRES, (ii) its gross interaction with the two ribosomal proteins, (iii) the P-in state of the 48S, and (iv) the rearrangement of the ternary complex are all warranted. Their claims about individual nucleotides or smaller stretches of the IRES, made without any supporting biochemical data, are not warranted by the data.

      We thank the reviewer for finding our major claims warranted, and we intend to make further improvements to support our assessment of smaller stretches and individual nucleotides.

      Reviewer #3 (Public review):

      Summary:

      Type II IRESs, such as those from encephalomyocarditis virus (EMCV) and foot-and-mouth disease virus (FMDV), mediate cap-independent translation initiation by using the full complement of eukaryotic initiation factors (eIFs), except the cap-binding protein eIF4E. The molecular details of how type II IRESs interact with the ribosome and initiation factors to promote recruitment have remained unclear. Das and Hussain used cryo-electron microscopy to determine the structure of a translation initiation complex assembled on the EMCV IRES. The structure reveals a direct interaction between the IRES and the 40S ribosomal subunit, offering mechanistic insight into how type II IRES elements recruit the ribosome.

      Strengths:

      The structure reveals a direct interaction between the IRES and the 40S ribosomal subunit, offering mechanistic insight into how type II IRES elements recruit the ribosome.

      Weaknesses:

      While this reviewer acknowledges the technical challenges inherent in determining the structure of such a highly flexible complex, the overall resolution remains insufficient to fully support the authors' conclusions, particularly given that cryo-EM is the sole experimental approach presented in the manuscript.

      The study is biologically significant; however, the authors should improve the resolution or include complementary biochemical validation.

      We thank Reviewer 3 for acknowledging the technical challenges in this study and finding our study biologically significant. We understand the concerns related to low resolution and the requirement of complementary biochemical validation for our reported observations and interpretations in the manuscript. We are attempting to improve the resolution and complement the interpretations with biochemical experiments.

    1. eLife Assessment

      This valuable investigation provides new and solid evidence for a specific cognitive deficit in cerebellar degeneration patients. The authors use three tasks that modulate complexity and violations of cognitive expectations. They show specific slowing of reaction times in the presence of violations but not with task complexity. While some alternative interpretations of the results are possible and are discussed, the work provides a new, invaluable data point in describing the cognitive contribution of cerebellar processing.

    2. Reviewer #1 (Public review):

      Summary:

      The authors test the hypothesis that the contribution of the cerebellum to cognitive tasks is similar to its contribution to motor tasks, and is related to the processing of prediction errors (here: violation of expectations, VE). In three experiments, they find that cerebellar patients differ from controls in measures of VE, but not of task complexity. The findings show that cerebellar disease results in deficits in VE processing in cognitive tasks, and make a valuable contribution to the field. The authors were able to test a large number of patients with a cerebellar disease known to primarily affect the cerebellum (i.e., SCA6).

      Strengths:

      A strength of the study is that it is hypothesis-driven and that the three experiments are very well thought out. Furthermore, a comparatively large group of patients with spinocerebellar ataxia type 6 (SCA6) was tested, a disease which affects primarily the cerebellum.

      Weaknesses:

      - Acquisition of brain MRI scans would have been useful for lesion-behaviour mapping, but this does not limit the significance of the behavioural findings.
      - Exp. 1 and 2: Was the lack of difference in accuracy an unexpected finding? How meaningful are the paradigms used when accuracy was the same in cerebellar patients and controls?
      - Exp. 1 and 2: Cerebellar patients have motor dysfunction, which impacts reaction time. Can the authors exclude that this contributed at least in part to their findings? Are there any correlations with the SARA score (upper limb function) or with oculomotor dysfunction (e.g., presence of nystagmus)?
      - Data on the attention probes that were administered would be of interest. Were there any differences in attention between patients and controls, and any correlations with the findings?

      Comments on revisions:

      I am not sure I can follow the authors' interpretation that the cerebellum contributes to prediction errors but not to predictions; are these two not tightly connected? Could it rather be that patients with a slowly progressive chronic disease compensate substantially? It is not so rare for cerebellar patients to perform no differently from controls on cognitive tasks, even when a difference would be expected (e.g., based on fMRI data in controls). Another likely contributing factor is age: patients and controls are often middle-aged or elderly, which adds variability and decreases the chance of detecting group differences.

    3. Author response:

      The following is the authors’ response to the original reviews

      Joint Public Review:

      Summary:

      In this study, Daniel et al. used three cognitive tasks to investigate behavioral signatures of cerebellar degeneration. In the first two tasks, the authors found that if an equation was incorrect, reaction times slowed significantly more for cerebellar patients than for healthy controls. In contrast, the slowing of reaction times when the task required more operations was comparable to that of healthy controls. In the third task, the authors show increased errors in cerebellar patients when they had to judge whether a letter string corresponded to an artificial grammar.

      Strengths:

      Overall, the work is methodologically sound and the manuscript well written. The data do show some evidence for specific cognitive deficits in cerebellar degeneration patients.

      Thank you for the thoughtful summary and constructive feedback. We are pleased that the methodological rigor and clarity of the manuscript were appreciated, and that the data were recognized as providing meaningful evidence regarding cognitive deficits in cerebellar degeneration.

      Weaknesses:

      The current version has some weaknesses in the visual presentation of results. Overall, the study lacks a more precise discussion of how the patterns of deficits relate to the hypothesized cerebellar function. The reviewers and the editor agreed that the data are interesting and point to a specific cognitive deficit in cerebellar patients. However, we were somewhat confused by the interpretation of the results in the discussion. If the cerebellum (as proposed in the introduction) is involved in forming expectations in a cognitive task, should patients not show problems in both the expected (1+3=4) and unexpected (1+3=2) conditions? Without having formed the correct expectation, how can they correctly say "yes" in the expected condition? Yet no increase in error rate is observed, only slowing in the unexpected condition. If the patients make up for the lack of prediction by using some other strategy, why do they slow down only in the unexpected case? If the cerebellum is NOT involved in making the prediction, but only in detecting the mismatch between the predicted and the real outcome, why do the patients not show specifically more errors in the unexpected condition?

      Thank you for asking these important questions and initiating an interesting discussion. While decision errors and processing efficiency are not fully orthogonal and are likely related, they are not necessarily the same internal construct. The data from Experiments 1 and 2 suggest impaired processing efficiency rather than increased decision error. Reaction time slowing without increased error rates suggests that the CA group can form expectations but respond more slowly, possibly due to reduced processing efficiency. Thus, this analysis of our data suggests that the cerebellum is not essential for forming expectations, but it plays a critical role in processing their violations.

      Relatedly, a few important questions remain open in the literature concerning the cerebellum’s role in expectation-related processes. The first is whether the cerebellum contributes to the formation of expectations or to the processing of their violations. In Experiments 1 and 2, the CA group did not show impairments in the complexity manipulation. Solving these problems requires the formation of expectations during the reasoning process. Given the intact performance of the CA group, these results suggest that they are not impaired in forming expectations. However, in both Experiments 1 and 2, patients exhibited selective impairments in solving incorrect problems compared to correct problems. Since expectation formation is required in both conditions, but only incorrect problems involve a VE, we hypothesize that the cerebellum is involved in VE processes. We suggest that the CA group can form expectations in familiar tasks but is impaired in processing unexpected compared to expected outcomes. This supports the notion that the cerebellum contributes to VE processing rather than to forming expectations.

      In Experiment 3, participants learn a novel rule (grammar) during training, forming new expectations about how strings of letters should be structured. During testing, they are asked to identify whether a novel string follows the rule. We examined sensitivity in distinguishing grammatical from non-grammatical letter strings, thereby taking into account a baseline ability to identify expected strings. In both the low-similarity and high-similarity conditions, there are expectations regarding whether the strings follow the rule. However, in the high-similarity condition there is more uncertainty about which strings follow the grammatical rule, as reflected in lower sensitivity (d prime). Because group differences emerged only in the low-similarity condition, these results suggest that the CA group is impaired only when the rules are more certain. We therefore suggest that forming cognitive expectations does not necessarily depend on the cerebellum. Rather, we propose that the cerebellum is critical for processing rule-based VE (detection or processing of detected errors) under conditions of greater certainty. One question that remains for future studies is whether the cerebellum contributes to detecting a mismatch between the expectation and sensory evidence, or to processing a detected VE.
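      For readers less familiar with the sensitivity measure mentioned above, the following is a minimal sketch of the textbook signal-detection index, d' = z(H) - z(FA). It assumes that the hit rate is the proportion of grammatical strings endorsed as grammatical and the false-alarm rate is the proportion of non-grammatical strings endorsed as grammatical. It applies no correction for rates of exactly 0 or 1. The authors' exact computation is not described in this exchange, and the rates in the usage lines are hypothetical, not data from the study.

```python
from statistics import NormalDist


def d_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """Standard signal-detection sensitivity: d' = z(H) - z(FA).

    Assumption: both rates lie strictly between 0 and 1; no log-linear
    or 1/(2N) correction for extreme rates is applied in this sketch.
    """
    if not (0.0 < hit_rate < 1.0 and 0.0 < false_alarm_rate < 1.0):
        raise ValueError("rates must be strictly between 0 and 1")
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(false_alarm_rate)


# Hypothetical rates, for illustration only (not data from the study).
print(round(d_prime(0.80, 0.20), 3))  # well-separated responses: higher d'
print(round(d_prime(0.60, 0.40), 3))  # more overlap, i.e. lower sensitivity
```

      A lower d' means grammatical and non-grammatical strings are harder to tell apart, independent of any overall tendency to answer "grammatical". This is why a sensitivity measure, rather than raw accuracy, accounts for baseline responding.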

      We suggest that these key questions are relevant to both motor and non-motor domains and were not fully addressed even in the well-studied motor domain. Importantly, while previous experimental manipulations (refs. 17, 19, 40, 94–96) have provided important insights into the cerebellar role in these processes, some may have confounded these internal constructs due to task design limitations (e.g., lack of baseline conditions). Notably, some of these previous studies did not include control conditions, such as correct trials, in which there was no VE. In addition, other studies did not include a control measure (e.g., a complexity effect), which limits their ability to infer the specific cerebellar role in expectation manipulation.

      Thus, the experimental design used across our three experiments provides a valuable and novel perspective, allowing us to distinguish between some, but not all, of the processes involved in the formation of expectations and their violations. For instance, to our knowledge, this is the first study to demonstrate a selective impairment in rule-based VE processing in cerebellar patients across both numerical reasoning and artificial grammar tasks. If feasible, future studies should disentangle different forms of VE by operationalizing them in experimental tasks in an orthogonal manner. This will allow a more detailed and well-defined mechanistic account of cerebellar function in motor and non-motor domains.

      Recommendations for the authors:

      Editors comments:

      The Figures are somewhat sub-standard and should be improved before the paper is made the VOR. Ensure consistent ordering of the group factor (CA, NT) and experimental factor across Figures 3, 4, and 6 (panels A). Having the patient group as columns in Figure 4a and as rows in Figure 6a is very confusing.

      We have standardized the layout across Figures 2, 4, and 6 so that the group factor (CA, NT) and experimental conditions are consistently ordered. In all panels, the group factor now appears as a column.

      Subpanels should be numbered A,B,C... not A, B1, B2.

      Subpanel labels have been updated to follow the standard A, B, C format across all figures.

      Fonts should have a 100% aspect ratio - they should not be stretched (Figure 6B).

      We have corrected the font aspect ratios in all figures (e.g., Figure 6B) to ensure proper proportions and readability. 

      Colors should be more suitable for print: use a CMYK color scheme (i.e., avoid neon colors such as the neon green for the CA).

      The color scheme across all figures has been revised to be print-friendly using CMYK-compatible, colorblind-accessible palettes. Neon green for the CA group was replaced with a more muted, distinguishable color.

      Abstract: "The CA group exhibited a disproportionate cost when comparing expected problems compared to unexpected problems" - I recommend switching unexpected and expected, as the disproportional cost is on the former.

      We have changed the wording of the sentence accordingly. 

      Upon re-reading, the details for the AGL task were not clear to us. Please do not rely on the reference (78) for the details - your paper should contain enough information for the reader to understand the experimental details. For you to appreciate the depth of our not-understanding, here is a simple question: The test strings either followed the grammar in Fig 5 or they did not. If they did not, how exactly was similarity to the grammar measured? If they did, what was the difference between the "Grammatical-high" and "Grammatical-low" trials? If the string was grammatical, there should not be a notion of similarity, no? Or were these trials arbitrarily split in half? 

      We have clarified that 50% of the test strings followed the grammar of the training strings. We also elaborated on the calculation of chunk strength as a measure of similarity between the training and testing strings, as in previous studies. The differences between low and high similarity are explained in the paper. Specifically, for each test string, we calculated chunk strength by summing the frequencies of all relevant substrings (e.g., bigrams and trigrams) that appeared in the training set. Test strings whose chunk-strength values fell above the median for grammatical items were classified as "high similarity," while those falling below the median were classified as "low similarity." Thus, grammatical strings can be of either low or high similarity; this is precisely the beautiful aspect of this experimental manipulation, showing the importance of uncertainty. We used a 2 × 2 fully orthogonal design (grammaticality × similarity).
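      The chunk-strength procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: all function names are hypothetical, chunks are taken to be bigrams and trigrams (the "e.g." in the text leaves the exact set open), the median split is assumed to be computed separately within each grammaticality class, and strings exactly at the median are assigned to "low" because the text does not say how ties were handled.

```python
from collections import Counter
from statistics import median


def chunks(s, sizes=(2, 3)):
    """All contiguous substrings of the given lengths (bigrams and trigrams by default)."""
    return [s[i:i + n] for n in sizes for i in range(len(s) - n + 1)]


def chunk_frequencies(training_strings, sizes=(2, 3)):
    """Count how often each chunk occurs across the training set."""
    counts = Counter()
    for s in training_strings:
        counts.update(chunks(s, sizes))
    return counts


def chunk_strength(test_string, freqs, sizes=(2, 3)):
    """Sum of training-set frequencies of every chunk in the test string.

    Counter returns 0 for chunks never seen during training.
    """
    return sum(freqs[c] for c in chunks(test_string, sizes))


def classify(test_items, training_strings):
    """test_items: list of (string, is_grammatical) pairs.

    Returns {string: (grammaticality, similarity)}, i.e. the cells of the
    2 x 2 grammaticality x similarity design. Assumption: the median split
    is done within each grammaticality class; ties go to "low".
    """
    freqs = chunk_frequencies(training_strings)
    strengths = {s: chunk_strength(s, freqs) for s, _ in test_items}
    labels = {}
    for grammatical in (True, False):
        group = [s for s, g in test_items if g == grammatical]
        if not group:
            continue
        cut = median(strengths[s] for s in group)
        for s in group:
            similarity = "high" if strengths[s] > cut else "low"
            labels[s] = ("grammatical" if grammatical else "non-grammatical", similarity)
    return labels
```

      Because similarity is split within each grammaticality class, grammatical strings end up in both the high- and low-similarity cells, which is what makes the two factors orthogonal.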

      Experimental details of the task should be added to the Method section. In the results you should only mention the experimental details that are necessary for understanding the experiments, but details such as the number of trials, etc, can be moved to the methods. 

      We have now moved the experimental task details to the Methods section.

      Reviewer #1 (Recommendations for the author):

      Studies have been done online and not in the lab. Could that have affected the results?

      We addressed this in the Methods section, referring to established protocols for online neuropsychological testing[9–12]. Our results align with similar in-lab findings in both the subtraction and AGL tasks, supporting the online approach's robustness. 

      Figure 2, B1; Figure 4, B1; Figure 6B: How many patients performed worse than the (worst-performing) controls? There appears to be quite some overlap between patients and controls. In the patients who performed worse, was there any difference from the other patients (e.g. disease severity as assessed by SARA score, repeat length, data from attention probes)?

      We appreciate the reviewer's thoughtful comment. We considered conducting individual-level comparisons to identify patients who performed worse than the lowest-performing controls. However, defining "worse" based on the performance of the lowest control is only one possible criterion. Other definitions, such as a specific number (1/2/3?) of standard deviations below the control mean, are also commonly used in the literature, and each may yield different conclusions. This variability highlights the lack of a standardized threshold for what constitutes "worse" or "impaired" performance at the individual level. Given this ambiguity, and in line with prior studies that focus on average group differences rather than the prevalence of "impairment," we chose not to include these individual-level comparisons. We believe this approach better aligns with the goals and design of the current study. That said, we agree that examining individual variability is important and may be more appropriate in future studies with larger samples, in which such percentages would be more robust measures. However, given the rarity of the disease, this would also be a challenge for future studies.  
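      The point that different individual-level criteria can disagree can be made concrete with a short sketch. The functions below are hypothetical illustrations, not part of the study's analysis; they assume a score for which higher values mean better performance, and they count how many patients fall below (a) the worst control or (b) the control mean minus k sample standard deviations.

```python
from statistics import mean, stdev


def n_below_worst_control(patients, controls):
    """Criterion (a): patients scoring below the lowest-scoring control."""
    worst = min(controls)
    return sum(p < worst for p in patients)


def n_below_sd_cutoff(patients, controls, k):
    """Criterion (b): patients scoring below mean(controls) - k * SD(controls).

    Uses the sample standard deviation; higher scores are assumed to be better.
    """
    cutoff = mean(controls) - k * stdev(controls)
    return sum(p < cutoff for p in patients)
```

      For example, with invented toy scores (not study data) of controls [1, 2, 3, 4, 5] and patients [0.5, 1.2, 1.8, 3.0], criterion (a) flags 1 patient, criterion (b) with k = 1 flags 2, and criterion (b) with k = 2 flags none, so the "proportion impaired" depends on the threshold chosen.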

      SARA ataxia scale does not include oculomotor function. In SCA6 oculomotor deficits are frequent, e.g., downbeat nystagmus. Please include information on oculomotor dysfunction.

      We thank the reviewer for this important observation. While it is true that the SARA scale does not explicitly assess oculomotor function, our experimental design – in all three experiments – has control conditions that help account for general processing differences, including those that could arise from oculomotor deficits. These conditions, such as the correct trials and the complexity effects, allow us to isolate effects specifically related to the violation of expectation while minimizing the influence of broader performance factors, such as eye movement abnormalities. We also note that, while some patients can experience oculomotor symptoms such as downbeat nystagmus, none of our tasks required precise visual tracking or gaze shifts. In our experimental tasks, stimuli were centrally presented, and no visual tracking or saccadic responses were required. Moreover, the response time windows and stimulus durations (>2–5 s) were sufficient to mitigate the effects of delayed visual processing due to oculomotor impairment.

      Why was MoCA used and not the CCAS-Schmahmann scale to assess cognitive function?

      We selected the MoCA due to its broad clinical utility, time efficiency, and ability to detect mild cognitive impairment specifically in CA[101,102].  

      Were there any signs of depression in the patient group that could have affected the results?

      None of the patients had a clinical diagnosis of depression or were undergoing psychiatric treatment.  

      "Additionally, the interaction between group and expectancy was insignificant when RT was the depended vaibale..." = variable

      This has been corrected to "variable" in the revised manuscript.

      Reviewer #2 (Recommendations for the authors):

      The terms 'unexpected' and 'expected' conditions are confusing. [...] Terming this 'violation of expectation' seems unnecessarily complicated to me. 

      We thank the reviewer for raising this important concern. We recognize that the terms "expected" and "unexpected" can be ambiguous without clarification, and that "violation of expectation" (VE) may initially appear unnecessary. Our choice to use VE terminology is grounded in an established theoretical framework that distinguishes between mere stimulus correctness and prediction mechanisms. Specifically, VE captures the internal processing of mismatches between anticipated and observed outcomes, which we believe is central to the cerebellar function under investigation. While simpler, technical alternatives (e.g., "correct" vs. "incorrect") could describe the stimuli, we find that VE more accurately reflects the mental constructs under study and is consistent with previous literature in both motor and cognitive domains. 

      Both tasks provide an error (or violation of expectation) that is non-informative and therefore unlikely to be used to update a forward model. The authors draw on motor literature to formulate a cognitive task where the presence of an error would engage the cerebellum and lead to longer reaction times in cerebellar patients. But in the motor domain, a mismatch between sensory feedback and expectations would lead to an updating of the internal forward model. It seems unlikely to me in the arithmetic and alphabetic addition tasks that patients would update their internal model of addition according to an error presented at the end of each trial. If the error processed in these tasks does not lead to the updating of the internal forward model, can the authors discuss to what extent the cerebellum will be engaged similarly in these tasks, and what exactly connects cerebellar processing in these motor and cognitive tasks?

      We thank the reviewer for this thoughtful and important comment. We fully agree that the current tasks do not directly probe learning-related updating of internal models. As stated in the paper, the goal of the present study was not to support or refute a specific claim regarding the cerebellum's role in learning processes. Rather, our focus was on examining cerebellar involvement in the processing of VE. While we were inspired by models from the motor domain, our design was not intended to induce learning or adaptation per se, but to isolate the processing of unexpected outcomes. We agree that the tasks in their current form are unlikely to engage forward model updating in the same way as sensorimotor adaptation paradigms. That said, we believe the current findings can serve as a basis for future research exploring the relationship between cerebellar prediction error processing and learning over time. As we also noted in the paper, this is a direction we propose and are actively pursuing in ongoing research.

      The colour scheme is difficult for anyone with colour blindness or red-green visual impairment. Please adjust.

      All figures have been revised to use CMYK-compatible, colorblind-safe palettes, and neon colors have been removed.

      The introduction is a bit difficult to understand, because the authors draw on a number of different theories about cerebellar functioning, without clearly delineating how these relate to each other. For example: a) In the paragraph beginning with 'notably': If the cerebellum is required for sequential operations, why does it show the impairment with the rotation of the letters?

      We understand the concern that if the cerebellum is involved in sequential operations, its involvement in mental letter rotation, which can be considered a "continuous transformation," may appear contradictory. We note that the boundary between continuous and stepwise, procedural operations is not always clear-cut and may vary depending on the participant's strategy or prior knowledge, which is not fully known to the researchers. Furthermore, to our knowledge, prior work on mental rotation has not directly investigated the impact of VE during this task. However, both of these considerations are debatable. 

      More importantly, a careful reading of our paper suggests that our experiments were designed to examine VE within tasks that involve sequential processing. Notably, we are not claiming that the cerebellum is involved in sequential or procedural processing per se. Rather, our findings point to a more specific role for the cerebellum in processing VE that arises during the construction of multistep procedural tasks. In fact, the results indicate that while the cerebellum may not be directly involved in the procedural process itself, it is critical when expectations are violated within such a context. This distinction is made possible in our study by the inclusion of a control condition (the complexity effect), which allows for a unique dissociation in our experimental design—one that, to our knowledge, has not been sufficiently addressed in previous studies.

      Additionally, in the case of arithmetic problem solving, such as the tasks used in prior studies cited in our manuscript[21], there is substantial evidence that these problems are typically solved through stepwise, procedural operations. Arithmetic reasoning, used in Experiments 1 and 2, has been robustly associated with procedural, multi-step strategies, which may be more clearly aligned with traditional views of cerebellar involvement in sequential operations. Thus, we propose that the role of the cerebellum in continuous transformations should be further examined. 

      We suggest a more parsimonious account: the cerebellum contributes to VE, a topic that has been extensively examined before. To reconcile our findings with previous ones, we propose that the cerebellum's contribution may not be limited to either continuous or stepwise operations per se, but rather extends to a domain-general process: the processing of VE. This theoretical framework can explain performance patterns across both mental rotation tasks and stepwise, procedural arithmetic.   

      The authors mention the generation of predictions, the processing of prediction errors (or violations of expectations), sequential processing, and continuous transformations as functions of the cerebellum, but it is unclear whether the authors are trying to dissociate these from each other or whether ALL of these functions have informed task design.

      We propose that the cerebellum's contribution may not be limited to either continuous transformations or stepwise, procedural operations per se, but rather extends to a domain-general process: the processing of VE. We would like to clarify that we do not claim the cerebellum contributes only to continuous transformations, as suggested in some earlier work[21]. Rather, the cerebellum may contribute to continuous transformations, but we propose that it also supports multi-step, procedural processes. Within that framework, across three separate experiments, we demonstrated that the cerebellum also contributes to procedural, multi-step reasoning tasks.  

      Minor Comments

      Typo under paragraph beginning with 'notably' - cerebellum role should be cerebellar role.

      Corrected as suggested.

      When mentioning sequences as a recruiting feature for the cerebellum in the introduction, Van Overwalle's extensive work in the social domain should be referenced for completeness.

      Thank you for the suggestion. We have now cited Van Overwalle’s work on cerebellar involvement in sequence processing within the social domain in the revised Introduction.

    1. eLife Assessment

      This study provides fundamental insights into eukaryotic phosphate homeostasis by demonstrating how yeast vacuoles dynamically regulate cytosolic phosphate levels. The conclusions are convincing, supported by an elegant combination of in vitro assays and in vivo measurements. This study will be of interest to cell biologists, particularly for those who are working in the field of phosphate metabolism.

    2. Reviewer #1 (Public review):

      The manuscript by Bru et al. focuses on the role of vacuoles as a phosphate buffering system for yeast cells. The authors describe here the crosstalk between the vacuole and the cytosol using a combination of in vitro analyses of vacuoles and in vivo assays. They show that the luminal polyphosphatases of the vacuole can hydrolyze polyphosphates to generate inorganic phosphate, yet they are inhibited by high concentrations. This balances the synthesis of polyphosphates against the inorganic phosphate pool. Their data further show that the Pho91 transporter provides a valve for the cytosol as it gets activated by a decline in inositol pyrophosphate levels. The authors thus demonstrate how the vacuole functions as a phosphate buffering system to maintain a constant cytosolic inorganic phosphate pool.

      This is a very consistent and well-written manuscript with a number of convincing experiments, where the authors use isolated vacuoles and cellular read-out systems to demonstrate the interplay of polyphosphate synthesis, hydrolysis, and release. The beauty of this system the authors present is the clear correlation between product inhibition and the role of Pho91 as a valve to release Pi to the cytosol to replenish the cytosolic pool. I find the paper overall an excellent fit and only have a few issues, including:

      (1) Figure 3: The authors use in their assays 1 mM ZnCl2 or 1 mM MgCl2. Is this concentration in the range of the vacuolar luminal ion concentration? Did they also test the effect of Ca2+, as this ion is also highly concentrated in the lumen?

      (2) Regarding the concentration of 30 mM K-PI, did the authors also use higher and lower concentrations? I agree that there is inhibition by 30 mM, but they cannot derive conclusions on the luminal concentration if they use just one in their assay. A titration is necessary here.

      (3) What are the consequences on vacuole morphology if the cells lack Pho91?

      (4) Discussion: The authors do not refer to the effect of calcium, even though I would expect that the levels of the counterion should affect the phosphate metabolism. I would appreciate it if they would extend their discussion accordingly.

      (5) I would appreciate a brief discussion on how phosphate sensing and control are done in human cells. Do they use a similar lysosomal buffer system?

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript presents a well-conceived and concise study that significantly advances our understanding of polyphosphate (polyP) metabolism and its role in cytosolic phosphate (Pi) homeostasis in a model unicellular eukaryote. The authors provide evidence that yeast vacuoles function as dynamic regulatory buffers for Pi homeostasis, integrating polyP synthesis, storage, and hydrolysis in response to cellular metabolic demands. The work is methodologically sound and offers valuable insights into the conserved mechanisms of phosphate regulation across eukaryotes.

      Strengths:

      The results demonstrate that the vacuolar transporter chaperone (VTC) complex, in conjunction with luminal polyphosphatases (Ppn1/Ppn2) and the Pi exporter Pho91, establishes a finely tuned feedback system that balances cytosolic Pi levels. Under Pi-replete conditions, inositol pyrophosphates (InsPPs) promote polyP synthesis and storage while inhibiting polyP hydrolysis, leading to vacuolar Pi accumulation.

      Conversely, Pi scarcity triggers InsPP depletion, activating Pho91-mediated Pi export and polyP mobilization to sustain cytosolic phosphate levels. This regulatory circuit ensures metabolic flexibility, particularly during critical processes such as glycolysis, nucleotide synthesis, and cell cycle progression, where phosphate demand fluctuates dramatically.

      From my viewpoint, one of the most important findings is the demonstration that vacuoles act as a rapidly accessible Pi reservoir, capable of switching between storage (as polyP) and release (as free Pi) in response to metabolic cues. The energetic cost of polyP synthesis, driven by ATP and the vacuolar proton gradient, highlights the evolutionary importance of this buffering system. The study also draws parallels between yeast vacuoles and acidocalcisomes in other eukaryotes, such as Trypanosoma and Chlamydomonas, suggesting a conserved role for these organelles in phosphate homeostasis.

      Weaknesses:

      While the manuscript is highly insightful, referring to yeast vacuoles as "acidocalcisome-like" may warrant further discussion. Canonical acidocalcisomes are structurally and chemically distinct (e.g., electron-dense, in most cases spherical, and not routinely subjected to morphological changes, and enriched with specific ions), whereas yeast vacuoles have well-established roles beyond phosphate storage. A comment on this terminology could strengthen the comparative analysis and avoid potential confusion in the field.

    4. Reviewer #3 (Public review):

      Bru et al. investigated how inorganic phosphate (Pi) is buffered in cells using S. cerevisiae as a model. Pi is stored in cells in the form of polyphosphates in acidocalcisomes. In S. cerevisiae, the vacuole, which is the yeast lysosome, also fulfills the function of Pi storage organelle. Therefore, yeast is an ideal system to study Pi storage and mobilization.

      They can recapitulate in their previously established system, using isolated yeast vacuoles, findings from their own and other groups. They integrate the available data and propose a working model of feedback loops to control the level of Pi on the cellular level.

      This is a solid study, in which the biological significance of their findings is not entirely clear. The data analysis and statistical significance need to be improved and included, respectively. The manuscript would have benefited from rigorously testing the model, which would also have increased the impact of the study.

    5. Author response:

      Reviewer #1 (Public review): 

      The manuscript by Bru et al. focuses on the role of vacuoles as a phosphate buffering system for yeast cells. The authors describe here the crosstalk between the vacuole and the cytosol using a combination of in vitro analyses of vacuoles and in vivo assays. They show that the luminal polyphosphatases of the vacuole can hydrolyse polyphosphates to generate inorganic phosphate, yet they are inhibited by high concentrations. This balances the synthesis of polyphosphates against the inorganic phosphate pool. Their data further show that the Pho91 transporter provides a valve for the cytosol as it gets activated by a decline in inositol pyrophosphate levels. The authors thus demonstrate how the vacuole functions as a phosphate buffering system to maintain a constant cytosolic inorganic phosphate pool. 

      This is a very consistent and well-written manuscript with a number of convincing experiments, where the authors use isolated vacuoles and cellular read-out systems to demonstrate the interplay of polyphosphate synthesis, hydrolysis, and release. The beauty of this system the authors present is the clear correlation between product inhibition and the role of Pho91 as a valve to release Pi to the cytosol to replenish the cytosolic pool. I find the paper overall an excellent fit and only have a few issues, including: 

      (1) Figure 3: The authors use in their assays 1 mM ZnCl2 or 1 mM MgCl2. Is this concentration in the range of the vacuolar luminal ion concentration? Did they also test the effect of Ca2+, as this ion is also highly concentrated in the lumen? 

      The concentrations inside vacuoles can reach those values. However, given that polyP is a potent chelator of divalent metal ions, what would matter are the concentrations of free Zn2+ or Mg2+ inside the organelle. These are not known. This is not critical since we use those two conditions only as a convenient tool to differentiate Ppn1 and Ppn2 activity in vitro. In our initial characterisation of Ppn2 (10.1242/jcs.201061), we had also tested Mn, Co, Ca, Ni, Cu. Only Zn and Co supported activity. Ca did not. Andreeva et al. (10.1016/j.biochi.2019.06.001) reached similar conclusions and extended our results.

      (2) Regarding the concentration of 30 mM K-PI, did the authors also use higher and lower concentrations? I agree that there is inhibition by 30 mM, but they cannot derive conclusions on the luminal concentration if they use just one in their assay. A titration is necessary here. 

      The concentration of 30 mM was not arbitrarily chosen. It is the luminal Pi concentration that the vacuoles reached when luminal Pi entered a plateau. We consider this an upper limit because polyP kept increasing, whereas luminal Pi did not. Thus, there is in principle no physiological motivation for testing higher values. However, we will probably add a titration to the revised version.
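      If a Pi titration is added, one simple way to summarise it is an IC50 estimate. The sketch below is only an illustration under a strong assumption that is not stated in the manuscript: that relative polyphosphatase activity follows a simple hyperbolic inhibition curve, v/v0 = 1 / (1 + [Pi]/IC50), with a Hill coefficient of 1. Rearranging gives v0/v - 1 = [Pi]/IC50, a line through the origin, whose slope is estimated by least squares. The function name is hypothetical.

```python
def estimate_ic50(pi_mM, relative_activity):
    """Estimate IC50 (mM) assuming v/v0 = 1 / (1 + [Pi]/IC50).

    pi_mM: Pi concentrations of the titration points.
    relative_activity: activity at each point divided by uninhibited activity.
    Points at zero Pi or zero activity carry no slope information and are skipped.
    """
    xs, ys = [], []
    for c, a in zip(pi_mM, relative_activity):
        if c > 0 and a > 0:
            xs.append(c)
            ys.append(1.0 / a - 1.0)  # linearised: v0/v - 1 = [Pi] / IC50
    if not xs:
        raise ValueError("need at least one point with Pi > 0 and activity > 0")
    # Least-squares slope of a line through the origin: sum(x*y) / sum(x*x).
    slope = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return float("inf") if slope <= 0 else 1.0 / slope
```

      Linearisation weights noisy low-activity points heavily, so with real data a direct nonlinear fit (and a check of whether a Hill coefficient of 1 is adequate) would be preferable; the sketch only shows the logic of deriving a single inhibition parameter from a titration rather than from one concentration.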

      (3) What are the consequences on vacuole morphology if the cells lack Pho91? 

      We had not observed significant abnormalities during a screen of the genome-wide deletion collection of yeast (10.1371/journal.pone.0054160).

      (4) Discussion: The authors do not refer to the effect of calcium, even though I would expect that the levels of the counterion should affect the phosphate metabolism. I would appreciate it if they would extend their discussion accordingly. 

      We will pick this up in the discussion. However, the situation is much more complex because major pools of counterions (up to hundreds of mM) are constituted by vacuolar lysine, arginine, polyamines, Mg, Zn etc. Their interplay with polyP is probably complex and worth treating in a dedicated project.

      (5) I would appreciate a brief discussion on how phosphate sensing and control are done in human cells. Do they use a similar lysosomal buffer system? 

      Mammalian cells have their Pi exporter XPR1 mainly on a lysosome-like compartment (10.1016/j.celrep.2024.114316). Whether and how it functions there for Pi export from the cytosol is not entirely clear. We will address this situation in the revision.

      Reviewer #2 (Public review): 

      Summary: 

      This manuscript presents a well-conceived and concise study that significantly advances our understanding of polyphosphate (polyP) metabolism and its role in cytosolic phosphate (Pi) homeostasis in a model unicellular eukaryote. The authors provide evidence that yeast vacuoles function as dynamic regulatory buffers for Pi homeostasis, integrating polyP synthesis, storage, and hydrolysis in response to cellular metabolic demands. The work is methodologically sound and offers valuable insights into the conserved mechanisms of phosphate regulation across eukaryotes. 

      Strengths: 

      The results demonstrate that the vacuolar transporter chaperone (VTC) complex, in conjunction with luminal polyphosphatases (Ppn1/Ppn2) and the Pi exporter Pho91, establishes a finely tuned feedback system that balances cytosolic Pi levels. Under Pi-replete conditions, inositol pyrophosphates (InsPPs) promote polyP synthesis and storage while inhibiting polyP hydrolysis, leading to vacuolar Pi accumulation. 

      Conversely, Pi scarcity triggers InsPP depletion, activating Pho91-mediated Pi export and polyP mobilization to sustain cytosolic phosphate levels. This regulatory circuit ensures metabolic flexibility, particularly during critical processes such as glycolysis, nucleotide synthesis, and cell cycle progression, where phosphate demand fluctuates dramatically. 

      From my viewpoint, one of the most important findings is the demonstration that vacuoles act as a rapidly accessible Pi reservoir, capable of switching between storage (as polyP) and release (as free Pi) in response to metabolic cues. The energetic cost of polyP synthesis, driven by ATP and the vacuolar proton gradient, highlights the evolutionary importance of this buffering system. The study also draws parallels between yeast vacuoles and acidocalcisomes in other eukaryotes, such as Trypanosoma and Chlamydomonas, suggesting a conserved role for these organelles in phosphate homeostasis. 

      Weaknesses: 

      While the manuscript is highly insightful, referring to yeast vacuoles as "acidocalcisome-like" may warrant further discussion. Canonical acidocalcisomes are structurally and chemically distinct (e.g., electron-dense, in most cases spherical, and not routinely subjected to morphological changes, and enriched with specific ions), whereas yeast vacuoles have well-established roles beyond phosphate storage. A comment on this terminology could strengthen the comparative analysis and avoid potential confusion in the field. 

      Yeast vacuoles show all major chemical features of acidocalcisomes. They are acidified, contain high concentrations of Ca, polyP (which make them electron-dense, too), other divalent ions, such as Mg, Zn, Mn etc, and high concentrations of basic amino acids. Thus, they clearly have an acidocalcisome-like character. In addition, they have hydrolytic, lysosome-like functions and, depending on the strain background, they can be larger than acidocalcisomes described e.g. in protists. We will elaborate this point, which is obvious to us but probably not to most readers, in the revised version.

      Reviewer #3 (Public review): 

      Bru et al. investigated how inorganic phosphate (Pi) is buffered in cells using S. cerevisiae as a model. Pi is stored in cells in the form of polyphosphates in acidocalcisomes. In S. cerevisiae, the vacuole, which is the yeast lysosome, also fulfills the function of Pi storage organelle. Therefore, yeast is an ideal system to study Pi storage and mobilization. 

      They can recapitulate in their previously established system, using isolated yeast vacuoles, findings from their own and other groups. They integrate the available data and propose a working model of feedback loops to control the level of Pi on the cellular level. 

      This is a solid study, in which the biological significance of their findings is not entirely clear. The data analysis and statistical significance need to be improved and included, respectively. The manuscript would have benefited from rigorously testing the model, which would also have increased the impact of the study.

      It is not clear to us what the reviewer would see as a more rigorous test of the model.

    1. eLife Assessment

      This important study suggests that adolescent mice exhibit lower accuracy than adult mice in a sound discrimination task when the sound frequencies are very similar. The evidence supporting this observation is solid and suggests that it arises from cognitive control differences between adolescent and adult mice. The adolescent period is largely understudied, despite its contribution to shaping the adult brain, which makes this study interesting for a broad range of neuroscientists.

    2. Reviewer #1 (Public review):

      Summary:

      Praegel et al. explore the differences in learning an auditory discrimination task between adolescent and adult mice. Using freely-moving (Educage) and head-fixed paradigms, they compare behavioral performance and neuronal responses over the course of learning. The mice were initially trained for seven days on an easy pure-tone frequency Go/No-go task (frequency difference of one octave), followed by seven days of a harder version (frequency difference of 0.25 octave). While adolescents and adults showed similar performance on the easy task, adults performed significantly better on the harder task. Quantifying the lick bias of both groups, the authors then argue that the difference in performance is not due to a difference in perception, but rather to a difference in cognitive control. The authors then used Neuropixels recordings across 4 auditory cortical regions to quantify the neuronal activity related to the behavior. At the single-cell level, the data show earlier stimulus-related discrimination for adults compared to adolescents in both the easy and hard tasks. At the neuronal population level, adults displayed higher decoding accuracy and shorter onset latency in the hard task than adolescents. Such differences were due not only to learning but also to age, as concluded from recordings in novice mice. After learning, neuronal tuning properties had changed in adults but not in adolescents. Overall, the differences between adolescent and adult neuronal data correlate with the behavioral results in showing that learning a difficult task is more challenging for younger mice.
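      One common way to separate discrimination from lick bias in a Go/No-go task is signal-detection theory: licks on Go trials are hits, licks on No-go trials are false alarms, sensitivity is d' = z(H) - z(FA), and bias is the criterion c = -(z(H) + z(FA))/2, where negative c indicates a liberal tendency to lick regardless of the stimulus. This is a minimal sketch; the review does not say how the authors defined lick bias, so this formula, the log-linear correction, and the function name are assumptions.

```python
from statistics import NormalDist


def go_nogo_sdt(go_licks, go_trials, nogo_licks, nogo_trials):
    """Return (d_prime, criterion_c) for a Go/No-go session.

    Assumption: log-linear correction (add 0.5 to counts, 1 to totals)
    so that perfect or zero lick rates stay finite.
    Negative c = liberal bias (licking on most trials, including No-go).
    """
    z = NormalDist().inv_cdf
    hit_rate = (go_licks + 0.5) / (go_trials + 1)
    fa_rate = (nogo_licks + 0.5) / (nogo_trials + 1)
    d = z(hit_rate) - z(fa_rate)
    c = -0.5 * (z(hit_rate) + z(fa_rate))
    return d, c
```

      With this decomposition, a group that licks on most trials of both types shows low d' and a clearly negative c, which is the pattern one would expect from an inability to withhold licking rather than from a purely perceptual deficit.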

      Strengths:

      The behavioral task is well designed, with the comparison of easy and difficult tasks allowing for a refined conclusion regarding learning across age. The experiments with optogenetics and novice mice address the research question in a convincing way.

      The analysis, including the systematic comparison of task performance across the two age groups, is most interesting and reveals differences in learning (or learning strategies?) that are compelling.

      Neuronal recording during both behavioral training and passive sound exposure is particularly powerful and allows interesting conclusions.

      Weaknesses:

      The weaknesses listed by this reviewer were addressed by adequate revisions.

    3. Reviewer #2 (Public review):

      Summary:

      The authors aimed to find out how, and how well, adult and adolescent mice discriminate tones of different frequencies and whether there are differences in processing at the level of the auditory cortex that might explain differences in behavior between the two groups. Adolescent mice were found to be worse at sound frequency discrimination than adult mice. The performance difference between the groups was most pronounced when the sounds were close in frequency and thus difficult to distinguish, and could, at least in part, be attributed to the younger mice's inability to withhold licking in no-go trials. By recording the activity of individual neurons in the auditory cortex when mice performed the task or were passively listening, as well as in untrained mice, the authors identified differences in the way that the adult and adolescent brains encode sounds and the animals' choice that could potentially contribute to the differences in behavior.

      Strengths:

      The study combines behavioural testing in freely-moving and head-fixed mice, optogenetic manipulation and high-density electrophysiological recordings in behaving mice to address important open questions about age differences in sound-guided behavior and sound representation in the auditory cortex.

      Weaknesses:

      The weaknesses listed by this reviewer were addressed by adequate revisions.

    4. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public review):

      Summary:

      Praegel et al. explore the differences in learning an auditory discrimination task between adolescent and adult mice. Using freely-moving (Educage) and head-fixed paradigms, they compare behavioral performance and neuronal responses over the course of learning. The mice were initially trained for seven days on an easy pure-tone Go/No-go task (frequency difference of one octave), followed by seven days of a harder version (frequency difference of 0.25 octave). While adolescents and adults showed similar performance on the easy task, adults performed significantly better on the harder task. Quantifying the lick bias of both groups, the authors then argue that the difference in performance is not due to a difference in perception, but rather to a difference in cognitive control. The authors then used Neuropixels recordings across 4 auditory cortical regions to quantify the neuronal activity related to the behavior. At the single-cell level, the data show earlier stimulus-related discrimination for adults compared to adolescents in both the easy and hard tasks. At the neuronal population level, adults displayed a higher decoding accuracy and lower onset latency in the hard task as compared to adolescents. Such differences were due not only to learning but also to age, as concluded from recordings in novice mice. After learning, neuronal tuning properties had changed in adults but not in adolescents. Overall, the differences between adolescent and adult neuronal data correlate with the behavioral results in showing that learning a difficult task is more challenging for younger mice.

      Strengths:

      The behavioral task is well designed, with the comparison of easy and difficult tasks allowing for a refined conclusion regarding learning across age. The experiments with optogenetics and novice mice address the research question in a convincing way.

      The analysis, including the systematic comparison of task performance across the two age groups, is most interesting and reveals differences in learning (or learning strategies?) that are compelling.

      Neuronal recording during both behavioral training and passive sound exposure is particularly powerful and allows interesting conclusions.

      Weaknesses:

      The presentation of the paper must be strengthened. Inconsistencies, missing information or confusing descriptions should be fixed.

      We have carefully re-read the manuscript and reviewed it for inconsistencies. We made several corrections in the figures. For example, we removed redundant lines from violin plots and statistics, applied consistent labels, matched y- and x-limits of graphics, and adjusted labels. We also clarified descriptions of some experiment by adding explanations to the text.

      The recording electrodes cover regions in the primary and secondary cortices. It is well known that these two regions process sounds quite differently (for example, one has tonotopy, the other does not), and separating recordings from both regions is important to conclude anything about sound representations. The authors show that the conclusions are the same across regions for Figure 4, but is this also the case for the subsequent analyses? Compared with the original manuscript, the authors have now done the analysis for AUDp and AUDv separately, and say that the differences are similar in both regions. The data, however, show that this is not the case (Fig. S7). And even if it were the case, how would it be compatible with the published literature?

      To address this and previous concerns about regional differences, the manuscript now includes 4 figures (4-1, 4-3, 6-2, 7-1) and 5 supplemental tables (3, 4, 5, 6, 8) that explicitly compare results across brain regions.

      Following the reviewer’s request for subsequent analysis, we now added a new supplemental figure (Fig. S6-2) and two new supplementary tables (Tables S5, S6). We show that similar to expert mice (supplementary Table 3, and supplementary Table 4), the firing properties of adolescent and adult novice mice differ across auditory subregions (supplementary Table 5). We also show that the different auditory subregions have different firing properties (supplementary Table 6). With respect to task engagement, we show that (similar to Fig. S4-2) the neuronal discriminability in different auditory subregions is similar in both novice and expert mice (Fig. S6-2).

      Following the comment on Fig. S7-1, we made three changes to the revised manuscript. First, we now highlight that the differences in firing properties between adolescent and adult neurons in AUDp and AUDv were distinct, but not significantly different within age-group comparisons. Second, we clearly state that the learning-related changes in the measured parameters are different between AUDp and AUDv. Note, however, that the greater changes in adult neurons after learning remain consistent between AUDp and AUDv. Third, we softened our original claim but still highlight the stronger learning-induced plasticity in adults.

      Regarding the concern that different regions should show different patterns due to their known differences (e.g., tonotopy): we of course agree that different areas differ functionally (as shown in our own previous work and here as well). However, it is still plausible, and biologically reasonable, that developmental changes may proceed in a similar direction across different areas, even if their baseline coding properties differ.

      Reviewer #2 (Public review):

      Summary:

      The authors aimed to find out how, and how well, adult and adolescent mice discriminate tones of different frequencies and whether there are differences in processing at the level of the auditory cortex that might explain differences in behavior between the two groups. Adolescent mice were found to be worse at sound frequency discrimination than adult mice. The performance difference between the groups was most pronounced when the sounds were close in frequency and thus difficult to distinguish, and could, at least in part, be attributed to the younger mice's inability to withhold licking in no-go trials. By recording the activity of individual neurons in the auditory cortex when mice performed the task or were passively listening, as well as in untrained mice, the authors identified differences in the way that the adult and adolescent brains encode sounds and the animals' choice that could potentially contribute to the differences in behavior.

      Strengths:

      The study combines behavioural testing in freely-moving and head-fixed mice, optogenetic manipulation and high-density electrophysiological recordings in behaving mice to address important open questions about age differences in sound-guided behavior and sound representation in the auditory cortex.

      Weaknesses:

      For some of the analyses that the authors conducted it is unclear what the rationale behind them is and, consequently, what conclusion we can draw from them.

      We have carefully re-read the manuscript and reviewed it for analyses that lacked a clear rationale or conclusion. To address this, we have made several changes to clarify the reasoning and strengthen the interpretation of the results.

      Reviewer #1 (Recommendations for the authors):

      It would have helped if the authors had highlighted the changes they made to the manuscript compared to the original version - especially since many replies to the reviewers' comments were as vague as "...we fixed some of the wording so it adheres to the data shown", or "we refined our interpretation", without further details.

      The revised version has improved substantially, and the main claims have been discussed in a more objective way. Important new analyses have been added to allow for a refined interpretation of the results. However, the presentation of the data could still be strengthened significantly (in response to comment A from last review).

      We apologize for the lack of detail in some of our previous responses. Our intention was to keep the replies concise, assuming that the side-by-side version with tracked changes would make the edits sufficiently clear. However, we understand the need for greater transparency. Thus, below we provide the following five lists describing the major changes: (1) List of specific reviewer recommendations, (2) list of corrections in figures, (3) list of clarity issues, (4) list of fixed mistakes, (5) list of new figures. We hope this breakdown makes the revisions clearer and more accessible.

      List of specific reviewer recommendations:

      l.108 mentions a significant change in the vertical line of Fig. 1F. Could this significance be indicated and quantified in the figure?

      We quantified and indicated the significance of the vertical line in Fig. 1f and Fig. 1i.

      Fig. 1G: the thick and thin lines should be defined, as well as the grey and white dots (same values for adolescents, not for adults).

      (a) We removed the thin inner lines from the violin plot. We define the bar (thick line) of the violin plot in an additional sentence in the methods section under data analysis (LL820-823). (b) We adjusted the marker outlines in the adult data (Fig. 1G).

      The figure axis legends should be consistent (trials in Fig. 1D vs # trials in Fig. 1F).

      We adjusted the axis legend to # trials in Fig. 1D.

      l.110: is d' always calculated based on the last 100 trials of a session, or only for Figure 1F? (etc.)

      d’ is always calculated based on the last 100 trials. To clarify this, we added a description in the methods section (L830).
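      For readers less familiar with the signal-detection quantities used throughout (d' here, and the criterion that the revision refers to as lick bias), a minimal sketch is given below. It assumes trials are coded as 'go'/'nogo' with a boolean lick response, and applies a log-linear correction so that hit or false-alarm rates of 0 or 1 do not produce infinite z-scores; the function name and the correction are illustrative assumptions, not necessarily the authors' implementation. A more negative criterion corresponds to a stronger tendency to lick.

```python
from statistics import NormalDist

_z = NormalDist().inv_cdf  # inverse of the standard normal CDF


def dprime_and_criterion(stimuli, licks, last_n=100):
    """d' and criterion c over the last `last_n` trials (hypothetical helper).

    stimuli: sequence of 'go' / 'nogo' labels (anything other than 'go' counts as No-go)
    licks:   sequence of booleans, True if the mouse licked on that trial
    """
    stim = list(stimuli)[-last_n:]
    lick = list(licks)[-last_n:]
    n_go = sum(s == "go" for s in stim)
    n_nogo = len(stim) - n_go
    hits = sum(l for s, l in zip(stim, lick) if s == "go")
    false_alarms = sum(l for s, l in zip(stim, lick) if s != "go")
    # Log-linear correction (assumption): keeps both rates strictly between 0 and 1
    hit_rate = (hits + 0.5) / (n_go + 1)
    fa_rate = (false_alarms + 0.5) / (n_nogo + 1)
    d_prime = _z(hit_rate) - _z(fa_rate)
    # Criterion c: negative values indicate a liberal bias, i.e. a tendency to lick
    criterion = -0.5 * (_z(hit_rate) + _z(fa_rate))
    return d_prime, criterion
```

      Because only the last `last_n` trials enter the computation, earlier trials in a session do not affect the result.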

      List of corrections in the figures:

      (1) We removed the internal lines from violin plots throughout Fig. 1-7.

      (2) We removed the underline of the statistics throughout Fig. 1-7.

      (3) We consistently applied ‘adolescent’ and ‘adult’ figure labels and titles with lowercase letters throughout Fig. 1-7.

      (4) We applied consistent labelling of ‘time (ms)’ throughout Fig. 1-7.

      (5) We matched the size of dashed lines throughout Fig. 1-6.

      (6) We adjusted the x-label of Fig. 1d, Fig. S1-1a, Fig. 3c, Fig. 3h-i, Fig. 4d to ‘# trials’.

      (7) We removed the x-label of ‘Experimental Group’ from Fig. 1 to enhance consistency with other figures.

      (8) We removed misaligned dots from the violin plots in Fig. 1g, Fig. 2f, Fig. 3f,g.

      (9) We corrected the plot in Fig. S1-1b.

      (10) We adjusted the y-limits of Fig. S1-1c to be consistent with Fig. S1-1d,e.

      (11) We adjusted the x-labels and y-labels of Fig. 2, Fig. S3-1, Fig. S3-2 and Fig. 3b to ‘freq. (kHz)’.

      (12) We added the age of adolescent and adult mice to the schematic timeline in Fig. 2a.

      (13) We added a label of the reinforcement delay to the schematic trial structure in Fig. 3b.

      (14) We added within-group statistics to Fig. 3e and the figure legend.

      (15) We adjusted the x-label of Fig. 3d to ‘# sessions’.

      (16) We adjusted the x-label of Fig. 3d and Fig. S3-1b to ‘# licks’.

      (17) We changed the y-label in Fig. S3-1a, and Fig. S3-2d, e to ‘lick ratio’ to avoid confusion with the lick rate (Hz) that was calculated in Fig. 4 and Fig. 6.

      (18) We replaced the titles ‘CAMKII’ with ‘dTomato’ in Fig. S3-2 to correctly highlight that both the experimental and control injection were CAMKII injections.

      (19) We adjusted the x-labels and y-labels of Fig. 2, Fig. S3-1, Fig. S3-2 and Fig. 3b to ‘freq. (kHz)’.

      (20) We adjusted the y-label of Fig. S4-1c to ‘# neurons’.

      (21) We matched the x-ticks in Fig. 4e,f.

      (22) We matched the x-ticks in Fig. 6d-g.

      (23) We changed the x-label in Fig. 4g, S4-2 and S6-2 to ‘duration (ms)’ to match the figure label with the manuscript.

      (24) We consistently label ‘Hit’, ‘Miss’, ‘FA’ and ‘CR’ with capital letters in Fig. 4d-e.

      (25) We replaced the double figure label ‘C.’ in Fig. S4-2 with ‘D.’.

      (26) We adjusted the dot-size in Fig. 5 to be equal for all graphs.

      (27) We added ticks to the experimental timeline in Fig. 6a.

      (28) We corrected the y-label in Fig. 7c. Now it correctly reflects 5 attenuations from 72-32 dB SPL.

      (29) We matched the y-label of Fig. 7e-h and Fig. S7-1.

      List of clarity issues:

      (1) We replaced the term ‘lower response bias’ with ‘higher lick bias’ (L24) to accurately describe the more negative (lower) criterion-bias, which highlights a higher tendency to lick.

      (2) We replaced the term ‘response bias’ with ‘lick bias’ to consistently describe the calculated criterion-bias (L24, L149, L164, L455, L456, L468).

      (3) We clarified that the age-related differences were ‘more pronounced’ instead of simply ‘higher’ to accurately reflect not only the increase in adolescent lick bias, but also the decrease in adult lick bias (L31).

      (4) In L83, we clarified that adolescent sound representations are not merely ‘distinct’ but ‘not fully mature’.

      (5) We clarified in L180 that the impulsive responses we observed in adolescent mice could be related to being ‘less impacted by punishments’.

      (6) We clarified the differences in firing properties of auditory sub-regions analyzed in Supplementary Table 3 (L287-295).

      (7) We explained and clarified the reference to Fig. 3j (LL252-253).

      (8) We added statistics to Fig. S4-2 to support our claim that there are no differences in the onset latency, duration of discriminability and maximal discriminability between different sub-regions within age groups (LL314-315).

      (9) We expanded our explanation of the results in Table 3 (LL370-379).

      (10) We separated the reference to Fig. 6b and Fig. 6c to clarify their meaning (LL358-361).

      (11) We clarified the differences in basic firing properties during the FRA protocol in Fig. 7 (LL409-418).

      (12) We expanded our explanation of the differences of the learning related firing properties in AUDp and AUDv of Fig. S7-1 (LL426-433).

      (13) We changed the term ‘plasticity profiles’ to ‘learning related plasticity’ to further clarify our limitation that L5/6 and L2/3 may exhibit distinct learning related changes (L496).

      (14) We changed the term ‘sluggish’ (L481) to ‘delayed’ to more precisely explain differences between adolescent and adult tuning properties.

      (15) We clarified that the running d’ was calculated in bins of 25 trials, instead of ‘the last 25 trials’ (LL845-846).
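      The distinction drawn in item (15), between a running d' computed in bins of 25 trials and a single value computed over ‘the last 25 trials’, can be sketched as follows. This is an illustration only: it assumes consecutive, non-overlapping bins, drops a trailing partial bin, and uses a hypothetical 'go'/'nogo' trial coding with a log-linear rate correction; the authors' exact binning may differ.

```python
from statistics import NormalDist

_z = NormalDist().inv_cdf  # inverse standard normal CDF


def _dprime(stim, lick):
    """d' for one block of trials, with a log-linear rate correction (assumption)."""
    n_go = sum(s == "go" for s in stim)
    n_nogo = len(stim) - n_go
    hits = sum(l for s, l in zip(stim, lick) if s == "go")
    false_alarms = sum(l for s, l in zip(stim, lick) if s != "go")
    return _z((hits + 0.5) / (n_go + 1)) - _z((false_alarms + 0.5) / (n_nogo + 1))


def running_dprime(stimuli, licks, bin_size=25):
    """d' in consecutive, non-overlapping bins of `bin_size` trials (hypothetical helper).

    A trailing bin with fewer than `bin_size` trials is dropped (assumption).
    """
    stimuli, licks = list(stimuli), list(licks)
    n_bins = len(stimuli) // bin_size
    return [
        _dprime(stimuli[i * bin_size:(i + 1) * bin_size],
                licks[i * bin_size:(i + 1) * bin_size])
        for i in range(n_bins)
    ]
```

      The result is one d' value per bin, i.e. a learning curve across the session, rather than one value summarizing only the final trials.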

      List of fixed mistakes:

      (1) We corrected and matched the age to more accurately reflect the age mice were recorded (P37-42 and P77-82).

      (2) We corrected the attenuation range from 72-42 to 72-32 dB SPL to correctly reflect the 5 attenuations used in the protocol.

      (3) We corrected the number of channels shown in the voltage trace from 10 to 11 (Fig. S4-1a).

      (4) We corrected the number of neurons recorded in novice adolescent mice in the legend of Fig. 6 from 140 to 130 (Fig. 6b).

      (5) We removed redundant, or double brackets, commas, dots, and semi-colons in the figure legends.

      (6) We corrected the LME statistics in Table 2.

      List of new figures and tables:

      (1) We added a new supplementary figure to accompany Figure 6. Specifically, Fig. S6-2 shows the interaction of the three measured discriminability properties (onset delay, duration of discriminability, and maximal discriminability) in novice compared to expert mice in the easy and hard task (Go compared to No Go). The figure compares the different auditory sub-regions (similar to Fig. S4-2). We show that the discriminability properties within the different groups are not significantly different among the four sub-regions.

      (2) Supplementary Table 5: We compared the firing properties in different auditory subregions in novice mice, and found (similar to expert mice) that the firing properties differ between adult and adolescent mice across the four different sub-regions.

      (3) Supplementary Table 6: We compared the firing properties between different subregions, separately for adolescent and adult novice mice. Similar to expert mice, we found that different auditory subregions differ in their auditory firing properties.

      Reviewer #2 (Recommendations for the authors):

      The authors largely addressed my suggestions.

      Comparing hit vs correct rejection trials in the population decoding analysis (L313-314): The authors acknowledge that comparing these two trial types conflates choice and stimulus decoding but I am not convinced that the changes to the manuscript text make this clear enough to the reader.

      Thank you for pointing this out. We have made additional revisions to clarify this and other issues more explicitly, as follows:

      (1) We have expanded the explanation of how our population decoding analysis conflates stimulus and choice, and we acknowledge the limitations of this approach in the Abstract (L28), the Results section (L324-326, LL367-370) and the Discussion (LL516-519).

      (2) We replaced the analysis of impulsivity in the head-fixed task. Instead of analyzing all ITIs, we focus only on ITIs following FA trials (Fig. S3-1c,d). This is more consistent with the analysis in the Educage (Fig. S2-1), where we show that adolescents exhibit increased impulsivity after FA trials. We found a similar result for ITIs following FA trials in the head-fixed task.

      (3) To provide complementary insight, we now further justify our use of the Fisher separation metric alongside decoding accuracy in Figure 5, with a clearer rationale provided in LL343-345.

      (4) We also clarified our reasoning for focusing on 62 dB SPL in the FRA-based analysis in LL400-403.

    1. eLife Assessment

      This study presents a valuable finding on the representational structure of task encoding in the prefrontal cortex. The evidence supporting the claims of the authors is solid, representing an impressive data collection effort and best-practice fMRI analyses. However, at least including visual regions as a control and controlling for behavioral differences in the task in representation analyses would have strengthened the study. The work will be of interest to cognitive neuroscientists interested in the neural basis of cognitive control.

    2. Reviewer #1 (Public review):

      Summary:

      Bhandari and colleagues present tour-de-force analyses that compare the representational geometry in the lateral prefrontal cortex and primary auditory cortex between two complex cognitive control tasks, one having a "flat" structure, where subjects are asked to memorize all the stimulus-action mappings in the task by rote, and one having a "hierarchical" task structure that allows clustering of task conditions and renders certain stimulus dimensions irrelevant for choices. They discovered that the lPFC geometry is high-dimensional in nature in that it allows above-chance separation between different dichotomies of task conditions. The separability is significantly higher for task-relevant features than task-irrelevant ones. They also found task features that are represented in an "abstract" format (e.g., audio features), i.e., the neural representation generalizes across specific task conditions that share this variable. The neural patterns in lPFC are highly relevant for behavior, as they are correlated with subjects' reaction times and choices.

      Strengths:

      Typically, the geometry of coding patterns is characterized from single-unit firing; this manuscript demonstrates that such geometry can be recovered using fMRI BOLD signals, which is both surprising and important. The tasks are well designed and powerful in revealing the differences in neural geometry, and the analyses are all done in a rigorous way. I am thus very enthusiastic about this paper and identify no major issues.

      I am curious about the consequence of dimensionality collapse in lPFC. The authors propose a very interesting idea that separability is critical for cognitive control; indeed, separability is high for task-relevant information. What happens when task-relevant separability is low or task-irrelevant separability is high? Does this lead to behavioral errors? Maybe a difference score between the separability of task-relevant and task-irrelevant features is a signature of the strength of cognitive control?

      Weaknesses:

      The authors show a difference between flat and hierarchical tasks, but the two tasks differ in accuracy, with the flat task having more errors. Could this difference in task difficulty/errors contribute to the reported task differences?

    3. Reviewer #2 (Public review):

      Summary:

      The authors study the influence of tasks on the representational geometry of the lPFC and auditory cortex (AC). In particular, they use two context-dependent tasks: a task with a hierarchical structure and a task with a flat structure, in which each context/stimulus maps to a specific response. Their primary finding is that the representational geometry in the lPFC, in contrast to AC, aligns with the optimal organization of the task. They conclude that the geometry of representations adapts, or is tailored, to the task in the lPFC, therefore supporting control processes.

      Strengths:

      (1) Dataset: The dataset is impressive and well sampled. Having data from both tasks collected in the same subjects is a great property. If it is publicly available, it will be a significant contribution to the community.

      (2) Choice of methods: The choice of analyses is largely well suited to the questions at hand: cross-condition generalization and RSA + regression, in combination with ANOVAs, are well suited to characterizing task representations.

      (3) I found some of their results, in particular those presented in Figures 4 and 5, to be compelling.

      (4) The correlation analysis with behavior is also a nice result.
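      Cross-condition generalization, one of the analyses named in point (2), asks whether a decoder trained to separate a dichotomy on some task conditions still separates it on held-out conditions. A toy sketch follows. It assumes binary condition features, a simple difference-of-means linear decoder, and holding out one condition from each side of the dichotomy per split; the function name and these choices are illustrative assumptions and not the authors' pipeline, which may use a different classifier and split scheme.

```python
import itertools

import numpy as np


def cross_condition_generalization(patterns, labels, feature):
    """Toy cross-condition generalization score for one binary dichotomy (hypothetical).

    patterns: (n_trials, n_units) array of activity patterns (e.g. voxels)
    labels:   (n_trials, n_features) array of 0/1 condition features
    feature:  column of `labels` that defines the dichotomy
    Assumes each side of the dichotomy contains at least two observed conditions.
    """
    patterns = np.asarray(patterns, dtype=float)
    labels = np.asarray(labels, dtype=int)
    # Encode each trial's binary feature vector as one integer condition code
    codes = labels @ (2 ** np.arange(labels.shape[1]))
    conditions = np.unique(codes)
    y = labels[:, feature]
    side0 = conditions[((conditions >> feature) & 1) == 0]
    side1 = conditions[((conditions >> feature) & 1) == 1]

    accuracies = []
    # Hold out one condition from each side; train on all remaining conditions
    for c0, c1 in itertools.product(side0, side1):
        test = np.isin(codes, [c0, c1])
        train = ~test
        mu0 = patterns[train & (y == 0)].mean(axis=0)
        mu1 = patterns[train & (y == 1)].mean(axis=0)
        # Difference-of-means linear decoder with a midpoint threshold
        w = mu1 - mu0
        threshold = w @ (mu0 + mu1) / 2
        predicted = (patterns[test] @ w > threshold).astype(int)
        accuracies.append(np.mean(predicted == y[test]))
    return float(np.mean(accuracies))
```

      With additive coding of each feature along its own axis, held-out conditions are classified almost perfectly; with unstructured condition means, the score is expected to sit near chance (0.5).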

      Weaknesses:

      (1) Choice of ROIs: A strength of fMRI is its spatial coverage of the whole brain. In this study, however, the authors focus on only two ROIs: the lPFC and auditory cortex. Though I understand the justification for choosing lPFC from decades of research, the choice of AC as a control feels somewhat arbitrary: AC is known to have worse SNR in fMRI data, and limiting the 'control' to a single region also seems arbitrary. For example, why not also include visual regions, given that the task also involves two visual features?

      (2) Construction of ROIs: The choice and construction of the ROIs feel a bit arbitrary, as the lPFC region was constructed out of 10 parcels from Schaefer, while the AC was constructed with a different methodology (Neurosynth). Did both ROIs have the same number of voxels/vertices? It would be helpful to include a visualization of these masks as a figure.

      (3) Task dimensionality: In some ways, the main findings (that representation dimensionality is tailored to the task) seem to follow obviously from the choice of the two tasks, particularly from a normative modeling perspective. For example, the flat task is effectively a memorization task and is incompressible in the sense that there are no heuristics to solve it. In contrast, the hierarchical task admits several strategies: an uncompressed (memorized) strategy and a compressed strategy. This is analogous to other studies evaluating representations during 'rich' vs. 'lazy'/kernel learning in ANNs. However, it seems unlikely (if not impossible) to form a 'rich' representation in the flat task. Posed another way, the flat task will necessarily always have a higher dimensionality than the hierarchical task. Thus, is their hypothesis (that representational geometry is tailored to the task) actually falsifiable? I understand that the authors posit alternative hypotheses, e.g., "a fully compressed global axis with no separation among individual stimulus inputs could support responding [in the flat task]" (p. 36). But is this a realistic outcome, for example, in the space of all possible computational models performing this task? I understand that directly addressing this comment is challenging (without additional data collection or modeling work), but perhaps some additional discussion around this would be helpful.

      (4) Related to the above: The authors have a section on p. 27, "Local structure of lPFC representational geometry of the flat task shows high separability with no evidence for abstraction". I understand that a generalization analysis can be done in the feature space, but in practice, the fact that the flat task doubles as a memorization task implies that there are no useful abstractions, so it seems to follow trivially that there would be no abstract representations. In fact, the use of task abstractions in the stimulus space would be detrimental to task performance here. I could understand the use of this analysis as a control, but the phrasing of this section seems to indicate that this is a surprising result.

      (5) Statistical inferences: Throughout the manuscript, the authors appear to conflate failure to reject the null with acceptance of the null. For example, p. 24: "However, unlike left lPFC, paired t-tests showed no reliable difference in the separability of the task-relevant features vs the orthogonal, task-irrelevant features... Therefore, the overall separability of pAC representations is not shaped by either task-relevance or task structure."

    4. Reviewer #3 (Public review):

      Summary:

      In this paper, Bhandari, Keglovits, et al. explore the representational structure of task encoding in the lateral prefrontal cortex. Through an impressive fMRI data-collection effort, they compare and contrast neural representations across tasks with different high-level stimulus-response structures. They find that the lateral prefrontal cortex shows enhanced encoding of task-relevant information, but that most of these representations do not generalize across conditions (i.e., have low abstraction). This appears to be driven in part by the representation of task conditions being clustered by the higher-order task properties ('global' representations), with poor generalization across these clusters ('local' representations). Overall, this paper provides an interesting account of how task representations are encoded in the PFC.

      Strengths:

      (1) Impressive dataset, which may provide further opportunities for investigating prefrontal representations.

      (2) Clever task design, allowing the authors to combine several features within a complex paradigm.

      (3) Best-practice analysis for decoding, similarity analyses, and assessments of representational geometry.

      (4) Extensive analyses to quantify the structure of PFC task representations.

      Weaknesses:

      (1) The paper would benefit from improved presentational clarity: more scaffolding of design and analysis decisions, clearer grounding to understand the high-level interpretations of the analyses (e.g., context, cluster, abstraction), and better visualizations of the key findings.

      (2) The paper would benefit from stronger theoretical motivation for the experimental design, as well as a refined discussion on the implications of these findings for theories of cognitive control.

    5. Author response:

      We thank the reviewers and editors for their careful and constructive assessment of our manuscript. We have provided a provisional response to the eLife assessment and the reviewers’ public comments below, addressing their main concerns and outlining our planned revisions, which we believe will substantially strengthen our paper.

      eLife Assessment

      This study presents a valuable finding on the representational structure of task encoding in the prefrontal cortex. The evidence supporting the claims of the authors is solid, representing an impressive data collection effort and best-practice fMRI analyses. However, at least including visual regions as a control and controlling for behavioral differences in the task in representation analyses would have strengthened the study. The work will be of interest to cognitive neuroscientists interested in the neural basis of cognitive control.

      We plan to address both specific methodological weaknesses mentioned in the assessment in our forthcoming revision. First, the revision will include analyses of an early visual cortex ROI as an additional control region, allowing us to test whether the primary auditory cortex findings generalize to the sensory cortex across input modalities. Preliminary results indicate that the early visual cortex ROI exhibits a similar pattern of results, with evidence for coding both task-relevant and task-irrelevant visual dimensions across both tasks, as well as the context dimension specifically in the hierarchy task. Second, we will include behavioral performance as a covariate for the relevant statistical comparison across tasks to mitigate concerns over performance-related confounds. In addition, we will include a set of control analyses that demonstrate that equating the amount of data for pattern analyses across the two tasks by subsampling from the hierarchy task, while reducing our overall power, does not appreciably alter our results. We note that our analyses of representational geometries relied only on neural data from correct trials and, in the first-level modelling of the fMRI data, already controlled for differences in trial-by-trial response times. Therefore, our analyses of decoding and representation similarity are not directly affected by differences in performance across the two tasks. Finally, we have provided clarifications regarding Reviewer 2’s questions about the size and construction of the regions of interest employed in the study, as well as about the language employed to discuss null results.  

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Bhandari and colleagues present tour-de-force analyses that compare the representational geometry in the lateral prefrontal cortex and primary auditory cortex between two complex cognitive control tasks: one with a "flat" structure, in which subjects must memorize all the stimulus-action mappings in the task by rote, and one with a "hierarchical" task structure that allows clustering of task conditions and renders certain stimulus dimensions irrelevant for choices. They discovered that the lPFC geometry is high-dimensional in nature, in that it allows above-chance separation between different dichotomies of task conditions. The separability is significantly higher for task-relevant features than task-irrelevant ones. They also found task features that are represented in an "abstract" format (e.g., audio features), i.e., the neural representation generalizes across specific task conditions that share this variable. The neural patterns in lPFC are highly relevant for behavior, as they are correlated with subjects' reaction times and choices.

      Strengths:

      Typically, the geometry of coding patterns is characterized from single-unit firing; this manuscript demonstrates that such geometry can be recovered using fMRI BOLD signals, which is both surprising and important. The tasks are well designed and powerful in revealing the differences in neural geometry, and analyses are all done in a rigorous way. I am thus very enthusiastic about this paper and identify no major issues.

      I am curious about the consequences of dimensionality collapse in lPFC. The authors propose a very interesting idea that separability is critical for cognitive control; indeed, separability is high for task-relevant information. What happens when task-relevant separation is low or task-irrelevant separation is high, and does this lead to behavioral errors? Maybe a difference score between the separability of task-relevant and task-irrelevant features is a signature of the strength of cognitive control?

      We appreciate the reviewer’s positive evaluation of our paper.

      Weaknesses:

      The authors show a difference between flat and hierarchical tasks, but the two tasks are different in accuracy, with the flat task having more errors. Will this difference in task difficulty/errors contribute to the task differences in results reported?

      To address the Reviewer’s concern that the difference in behavioural performance between the two tasks may have influenced our results, we will take several approaches. First, we will include behavioral performance as a covariate for the relevant statistical comparison across tasks. This should ensure that any differences we observe across tasks are over and above those that can be explained by the difference in behavioral performance. Second, we will include a set of decoding analyses that control for differences in performance across the tasks. We note that all our analyses of representational geometries relied on neural data from correct trials only. In addition, the first-level modelling of the fMRI data already controlled for trial-by-trial variability in response times. Therefore, our decoding and representational similarity analyses should not be directly affected by differences in performance across the two tasks. However, one possible issue with this approach is that the larger number of errors in the flat task means that less data was available for estimating multivoxel patterns in the flat task than in the hierarchy task, resulting in differential power to detect decoding effects across the two tasks. This difference was not substantial: on average, 21.7 runs per participant were available for the flat task and 23.8 runs per participant for the hierarchy task. Moreover, rerunning our analyses with the number of runs equated for each participant does not meaningfully alter the pattern of results. These additional analyses will be included in the supplement of the forthcoming revised manuscript.
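      As an illustration of the run-equating control described above, the following is a minimal sketch that randomly subsamples one participant's runs so that both tasks contribute the same number. It rests on stated assumptions: runs are represented only by hypothetical integer indices in acquisition order, and the helper name `equate_runs` is ours, not the authors' pipeline. In practice one would repeat the subsampling over several seeds and average the resulting decoding accuracies.

```python
import random

def equate_runs(flat_runs, hier_runs, seed=0):
    """Hypothetical helper: subsample the task with more runs so that
    both tasks contribute the same number of runs for one participant."""
    rng = random.Random(seed)
    n = min(len(flat_runs), len(hier_runs))
    # Keep every run of the smaller task; draw n runs without replacement
    # from the larger one. Sorting restores acquisition order, since runs
    # are indexed by integers in the order they were acquired.
    flat_sel = sorted(rng.sample(flat_runs, n)) if len(flat_runs) > n else list(flat_runs)
    hier_sel = sorted(rng.sample(hier_runs, n)) if len(hier_runs) > n else list(hier_runs)
    return flat_sel, hier_sel

# Hypothetical participant: 22 usable flat-task runs, 24 hierarchy-task runs.
flat_runs = list(range(22))
hier_runs = list(range(24))
flat_sel, hier_sel = equate_runs(flat_runs, hier_runs, seed=1)
```

      Subsampling the larger task, rather than discarding trials within runs, keeps each retained run's pattern estimate intact while matching the amount of data entering the pattern analyses.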

      Reviewer #2 (Public review):

      Summary:

      The authors study the influence of tasks on the representational geometry of the lPFC and auditory cortex (AC). In particular, they use two context-dependent tasks: a task with a hierarchical structure and a task with a flat structure, in which each context/stimulus maps to a specific response. Their primary finding is that the representational geometry in the lPFC, in contrast to AC, aligns with the optimal organization of the task. They conclude that the geometry of representations adapts, or is tailored, to the task in the lPFC, therefore supporting control processes.

      Strengths:

      (1) Dataset:

      The dataset is impressive and well-sampled. Having data from both tasks collected in the same subjects is a great property. If it is publicly available, it will be a significant contribution to the community.

      (2) Choice of methods:

      The choice of analyses is largely well suited to the questions at hand: cross-condition generalization and RSA + regression, in combination with ANOVAs, are well suited to characterizing task representations.

      (3) I found some of their results, in particular those presented in Figures 4 and 5, to be compelling.

      (4) The correlation analysis with behavior is also a nice result.

      We thank the reviewer for noting the strengths of the paper. We respond to the weaknesses noted below. 

      Weaknesses:

      (1) Choice of ROIs:

      A strength of fMRI is its spatial coverage of the whole brain. In this study, however, the authors focus on only two ROIs: the lPFC and auditory cortex. Though I understand the justification for choosing lPFC from decades of research, the choice of AC as a control feels somewhat arbitrary: AC is known to have worse SNR in fMRI data, and limiting the 'control' to a single region is restrictive. For example, why not also include visual regions, given that the task also involves two visual features?

      We agree with the reviewer that the whole-brain fMRI data certainly provide ample opportunities to explore the nature of these representations across the brain. Our focus in this paper is squarely on the principles of coding and flexibility in the lPFC. We believe that a whole-brain exploration addresses a separate question that would be out of the scope of this study. To clarify, we are not arguing that the lPFC is the only region in the brain that employs the coding principles that our study brings to light. Our contention is only that lPFC employs these principles, and it differs at least from the primary sensory cortex. The questions of whether these principles generalize beyond lPFC (quite likely) and, if so, how broadly, are distinct from the ones addressed in the manuscript. We intend to follow up with another manuscript that addresses these questions.

      Nevertheless, given the focus of this paper, we agree that a second control region, which allows one to test whether the primary auditory cortex findings generalize to the sensory cortex more broadly, would strengthen our claims. We will include an early visual cortex ROI in our forthcoming revision. Preliminary results indicate that the early visual cortex ROI shows a similar set of findings, with evidence for coding of task-relevant and task-irrelevant visual dimensions across both tasks, and also, specifically in the hierarchy task, of the context dimension. These results will be detailed in the forthcoming revision.

      (2) Construction of ROIs:

      The choice and construction of the ROIs feel a bit arbitrary, as the lPFC region was constructed out of 10 parcels from Schaefer, while the AC was constructed from a different methodology (neurosynth). Did both parcels have the same number of voxels/vertices? It would be helpful to include a visualization of these masks as a figure.

      We defined the lPFC ROIs by selecting Schaefer parcels in the frontal lobe that were previously mapped onto the Control A resting-state network identified by Yeo et al. (2011). This network aligns with the multiple-demand network, which has also been identified in the macaque, where it includes the lPFC regions that abut the principal sulcus. Prior results from these regions in the monkey brain provide the scientific premise for our hypotheses. Each of the two lPFC ROIs (one per hemisphere) was constructed from 5 Schaefer parcels. These parcels cluster into the same functional network and tend to behave similarly in univariate analyses. Given that our hypotheses do not distinguish between the different parcels, we elected to improve power by merging them into left and right lPFC ROIs.

      On the other hand, the same approach could not be used to identify the primary auditory cortex. As Yeo et al. noted in their paper, the 17 resting-state networks they identified did not adequately parcellate the somatomotor and auditory cortices into distinct networks, likely due to their proximity (see Fig. 14 and related text in Yeo et al. (2011)). We therefore relied on a different approach to define the primary auditory cortex, using an association test in Neurosynth to obtain a map of regions associated with the term “primary auditory”. In the revised manuscript, we will also include an early visual cortex ROI, defined similarly using a term-based association test in Neurosynth.

      Our lPFC ROIs and pAC ROIs are of similar size. In the left hemisphere, the lPFC ROI (constructed by merging Schaefer parcels 128 through 132) has, on average, 624.55 voxels, and the left pAC ROI (defined with Neurosynth) has, on average, 628 voxels. In the right hemisphere, the lPFC ROI (constructed by merging Schaefer parcels 330 through 334) has, on average, 470.8 voxels, and the right pAC ROI has, on average, 568 voxels. A table reporting the sizes of our parcels and ROIs is included in the supplement. In our forthcoming revision, we will additionally include a supplementary figure visualizing the ROI masks.

      (3) Task dimensionality:

      In some ways, the main finding, that representational dimensionality is tailored to the task, seems to follow obviously from the choice of the two tasks, particularly from a normative modeling perspective. For example, the flat task is effectively a memorization task and is incompressible in the sense that there are no heuristics to solve it. In contrast, the hierarchical task can be solved with either an uncompressed (memorized) strategy or a compressed strategy. This is analogous to other studies evaluating representations during 'rich' vs. 'lazy'/kernel learning in ANNs. However, it seems unlikely (if not impossible) to form a 'rich' representation in the flat task. Posed another way, the flat task will always necessarily have a higher dimensionality than the hierarchical task. Thus, is their hypothesis, that representational geometry is tailored to the task, actually falsifiable? I understand the authors posit alternative hypotheses, e.g., "a fully compressed global axis with no separation among individual stimulus inputs could support responding [in the flat task]" (p. 36). But is this a realistic outcome, for example, in the space of all possible computational models performing this task? I understand that directly addressing this comment is challenging (without additional data collection or modeling work), but some additional discussion around this would be helpful.

      We thank the reviewer for this comment, which gives us a chance to clarify our argument.

      As noted by the reviewer, whether a network takes advantage of the compressibility of a task depends on its learning regime (i.e., rich vs. lazy). One way to frame our question regarding the lPFC’s coding strategy, then, is to ask whether it operates in a rich or a lazy learning regime (which would predict, respectively, task-tailored vs. task-agnostic representations). The reviewer’s concern is that the two task structures we employed are differentially compressible, so that observing tailored representations is inevitable and our hypotheses are not falsifiable.

      First, it is important to clarify the theoretical premise behind our design and how it relates logically to our hypotheses. Under a lazy learning regime, a network would encode high-dimensional representations of both tasks, regardless of their compressibility. Under a rich learning regime, on the other hand, representational dimensionality will likely be shaped by the tasks’ structure. If the two tasks differ in their compressibility, only in the rich learning regime would the network learn representations of different dimensionality. Observing representations whose dimensionality is tailored to the task structure therefore rules out the possibility that the lPFC is operating in a lazy regime, whereas observing equally high-dimensional representations of both tasks would be consistent with it. Our hypotheses are therefore testable.

      The second point of clarification is that, contrary to the reviewer’s assertion, the flat task is, in fact, compressible: the task can be solved with a categorical representation of the response categories, with no sensitivity to the different specific stimuli within each category. Indeed, a simple three-layer feedforward artificial neural network with only 2 units in the hidden layer can be trained to perform the flat task perfectly, demonstrating this compressibility. We agree with the reviewer that, across the space of possible architectures, the two tasks may differ in compressibility, particularly at the local level; as noted above, however, this does not make our hypotheses untestable.

      Finally, as a third point of clarification, our focus in this paper is on understanding the nature of coding in the lPFC in particular. Arguments based on a normative modelling perspective properly apply to the representations learned by an agent (such as an ANN or a human) as a whole. In a minimal feedforward ANN with a single hidden layer trained in a regime which encourages compression (i.e. a rich learning regime), it would indeed be the case that the representational dimensionality in that hidden layer would be higher for less compressible tasks. However, when applied to humans, such an argument applies to the brain as a whole rather than to an individual region of the brain like the lPFC. As such, it is less straightforward to predict how a single region might represent a task without additional information about the region’s inputs, outputs and broader position in a network. Even for a highly compressible task, a particular brain region may nevertheless be sensitive to all task dimensions. Conversely, even when a task is not compressible, a particular population within the brain may be invariant to some task features. For example, the primary auditory cortex is expected to be invariant to visual task dimensions.

      Therefore, how a task is represented in the lPFC in particular (as opposed to the whole brain) depends on its computational function and coding principles, which remain debated. For instance, if, as some accounts (such as the guided activation theory) posit, the primary function of the lPFC is to encode ‘context’ and shape downstream processing based on context, we might expect to see only the abstract coding of the auditory context in the hierarchy task (and, perhaps, of the response categories across both tasks, as they encode the ‘context’ for the lower-level response decision), while the lPFC remains invariant to lower-level features of the input. In our paper, we specifically contrast two accounts of lPFC coding that have emerged in the literature: one positing that the lPFC learns a representation tailored to the structure of the task, and another positing that the lPFC encodes a high-dimensional representation that privileges sensitivity to many task features and their non-linear mixture at the cost of generalization. Regardless of the compressibility of the tasks in question, how the lPFC encodes the two tasks is an empirical question.

      In our forthcoming revision, we will clarify these points in the discussion. We will also include the results of neural network simulations alluded to above.

      (4) Related to the above:

      The authors have a section on p. 27: "Local structure of lPFC representational geometry of the flat task shows high separability with no evidence for abstraction" - I understand a generalization analysis can be done in the feature space, but in practice, the fact that the flat task doubles as a memorization task implies that there are no useful abstractions, so it seems to trivially follow that there would be no abstract representations. In fact, the use of task abstractions in the stimulus space would be detrimental to task performance here. I could understand the use of this analysis as a control, but the phrasing of this section seems to indicate that this is a surprising result.

      As explained above, there is no need for high local separability in the flat task. The lPFC could have completely abstracted over the individual trial types that contributed to each response category, encoding only the response categories. Indeed, as also noted above, a simple three-layer feedforward artificial neural network with only 2 units in the hidden layer can be trained to perform the flat task perfectly, with each hidden unit coding for one of the two response categories.
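      To make the compressibility argument concrete, the following sketch hand-wires (rather than trains) a three-layer network with two hidden units that performs a flat task perfectly. The task layout, with 8 stimulus conditions arbitrarily assigned to two response categories, is a hypothetical stand-in for the actual flat task, and the weights are set by hand only to show that such a solution exists; this is not the authors' simulation. Because each hidden unit responds identically to every stimulus in its category, the hidden layer discards all within-category stimulus identity.

```python
# Hypothetical flat task: 8 stimulus conditions, each arbitrarily
# assigned to one of two response categories (illustrative only).
CATEGORY = [0, 1, 1, 0, 1, 0, 0, 1]
N_STIM = len(CATEGORY)

def relu(x):
    return max(0.0, x)

# Input -> hidden weights: hidden unit k receives weight 1 from every
# stimulus assigned to category k, so each unit codes one category.
W_HIDDEN = [[1.0 if CATEGORY[s] == k else 0.0 for s in range(N_STIM)]
            for k in range(2)]
# Hidden -> output weights: identity, so output k reads hidden unit k.
W_OUT = [[1.0, 0.0], [0.0, 1.0]]

def forward(stim):
    """Return (hidden activations, chosen response) for one stimulus."""
    x = [1.0 if i == stim else 0.0 for i in range(N_STIM)]  # one-hot input
    hidden = [relu(sum(w * xi for w, xi in zip(row, x))) for row in W_HIDDEN]
    out = [sum(w * h for w, h in zip(row, hidden)) for row in W_OUT]
    return hidden, out.index(max(out))

accuracy = sum(forward(s)[1] == CATEGORY[s] for s in range(N_STIM)) / N_STIM
```

      Any two stimuli from the same category (for example, conditions 0 and 3 here) produce the same hidden vector, which is the fully compressed, category-only code that the flat task permits.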

      (5) Statistical inferences:

      Throughout the manuscript, the authors appear to conflate failure to reject the null with acceptance of the null. For example, p. 24: "However, unlike left lPFC, paired t-tests showed no reliable difference in the separability of the task-relevant features vs the orthogonal, task-irrelevant features... Therefore, the overall separability of pAC representations is not shaped by either task-relevance of task structure."

      We thank the reviewer for pointing these out. These sentences will be corrected in the revision. For instance, the sentence above will be modified to “Therefore, we find no evidence that the overall separability of pAC representations is shaped by either task-relevance or task structure.”

      Reviewer #3 (Public review):

      Summary:

      In this paper, Bhandari, Keglovits, et al. explore the representational structure of task encoding in the lateral prefrontal cortex. Through an impressive fMRI data-collection effort, they compare and contrast neural representations across tasks with different high-level stimulus-response structures. They find that the lateral prefrontal cortex shows enhanced encoding of task-relevant information, but that most of these representations do not generalize across conditions (i.e., have low abstraction). This appears to be driven in part by the representation of task conditions being clustered by the higher-order task properties ('global' representations), with poor generalization across these clusters ('local' representations). Overall, this paper provides an interesting account of how task representations are encoded in the PFC.

      Strengths:

      (1) Impressive dataset, which may provide further opportunities for investigating prefrontal representations.

      (2) Clever task design, allowing the authors to manipulate several features within a complex paradigm.

      (3) Best-practice analysis for decoding, similarity analyses, and assessments of representational geometry.

      (4) Extensive analyses to quantify the structure of PFC task representations.

      Weaknesses:

      (1) The paper would benefit from improved presentational clarity: more scaffolding of design and analysis decisions, clearer grounding to understand the high-level interpretations of the analyses (e.g., context, cluster, abstraction), and better visualizations of the key findings.

      (2) The paper would benefit from stronger theoretical motivation for the experimental design, as well as a refined discussion on the implications of these findings for theories of cognitive control.

      We thank the reviewer for highlighting the strengths of our paper and for their feedback on the writing. We have reviewed these helpful suggestions with an eye to which ones we can implement in our revision to improve clarity. Our forthcoming revision will 1) provide clearer scaffolding to aid the reader in understanding our design, our analyses and our interpretation of the results, 2) incorporate the MDS-based visualization of the representational geometries, currently presented in the Supplement, as a figure panel in the main text, 3) provide a justification in the Introduction for the particular task structures we chose, and 4) add a new paragraph to the Discussion highlighting the implications of our findings for cognitive control.

    1. eLife Assessment

      The study introduces new tools for measuring the intracellular calcium concentration close to transmitter release sites, which may be relevant for synaptic vesicle fusion and replenishment. This approach yields important new information about the spatial and temporal profile of calcium concentrations near the site of entry at the plasma membrane. This experimental work is complemented by a coherent, open-source, computational model that successfully describes changes in calcium domains. The conclusions are solid and well supported by the data.

    2. Reviewer #1 (Public Review):

      This paper describes technically impressive measurements of calcium signals near synaptic ribbons in zebrafish bipolar cells. The data presented provides high spatial and temporal resolution information about calcium concentrations along the ribbon at various distances from the site of entry at the plasma membrane. This is important information. The experiments appear to be well-done and provide strong evidence for the main conclusions reached.

      Strengths

      The technical aspects of the measurements are impressive. The authors use calcium indicators bound to the ribbon and high-speed line scans to resolve changes with a spatial resolution of ~250 nm and temporal resolution of less than 10 ms. These spatial and temporal scales are much closer to those relevant for vesicle release than previous measurements. Hence the results provide a unique window onto these events.

      The use of calcium indicators with very different affinities and of different intracellular calcium buffers helps provide confirmation of key results.

    3. Reviewer #2 (Public review):

      Summary:

      The study introduces new tools for measuring intracellular Ca2+ concentration gradients around retinal rod bipolar cell (rbc) synaptic ribbons. This is done by comparing the Ca2+ profiles measured with mobile Ca2+ indicator dyes versus ribbon-tethered (immobile) Ca2+ indicator dyes. The Ca2+ imaging results provide a straightforward demonstration of Ca2+ gradients around the ribbon and validate their experimental strategy. This experimental work is complemented by a coherent, open-source, computational model that successfully describes changes in Ca2+ domains as a function of Ca2+ buffering. In addition, the authors try to demonstrate that there is heterogeneity among synaptic ribbons within an individual rbc terminal.

      Strengths:

      The study introduces a new set of tools for estimating Ca2+ concentration gradients at ribbon AZs, and the experimental results are accompanied by an open-source, computational model that nicely describes Ca2+ buffering at the rbc synaptic ribbon. In addition, the dissociated retinal preparation remains a valuable approach for studying ribbon synapses. Lastly, excellent EM.

      Comments on revisions:

      Several concerns were raised about the kinetic analyses, and the authors have carefully acknowledged the critiques. The ideal outcome would have been a more complete kinetic readout and analysis (in particular, a better readout of rise time would have improved the results). In the absence of a suitable readout of the rise time, the authors scaled back their claims and improved on the description of the falling phase of the signals. The authors have given a reasonable response under the circumstances.

      In addition, the authors provided more context to their results.

      I have no further concerns.

    4. Reviewer #3 (Public review):

      Summary:

      In this study, the authors have developed a new Ca indicator conjugated to a peptide that likely recognizes synaptic ribbons, and have measured microdomain Ca near synaptic ribbons in retinal bipolar cells. This interesting approach allows one to measure Ca close to transmitter release sites, which may be relevant for synaptic vesicle fusion and replenishment. Though microdomain Ca at the active zone of ribbon synapses has been measured by Hudspeth and Moser, the new study uses a peptide that recognizes synaptic ribbons, potentially measuring the Ca concentration relatively proximal to the release sites.

      Strengths:

      The study is, in principle, technically well done, and the peptide approach is technically interesting, as it allows one to image Ca near particular protein complexes. The approach is potentially applicable to other types of imaging.

      Weaknesses:

      Peptides may not be entirely specific, and a genetic approach tagging particular active zone proteins with fluorescent Ca indicator proteins may well be more specific. Readers should be aware of this when interpreting the results.

    5. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review): 

      This paper describes technically impressive measurements of calcium signals near synaptic ribbons in goldfish bipolar cells. The data presented provides high spatial and temporal resolution information about calcium concentrations along the ribbon at various distances from the site of entry at the plasma membrane. This is important information. Important gaps in the data presented mean that the evidence for the main conclusions is currently inadequate.

      Strengths 

      The technical aspects of the measurements are impressive. The authors use calcium indicators bound to the ribbon and high-speed line scans to resolve changes with a spatial resolution of ~250 nm and temporal resolution of less than 10 ms. These spatial and temporal scales are much closer to those relevant for vesicle release than previous measurements.

      The use of calcium indicators with very different affinities and of different intracellular calcium buffers helps provide confirmation of key results. 

      Thank you very much for this positive evaluation of our work.

      Weaknesses 

      Multiple key points of the paper lack a statistical test or summary data from populations of cells. For example, the text states that the proximal and distal calcium kinetics in Figure 2A differ. This is not clear from the inset to Figure 2A - where the traces look like scaled versions of each other. Values for time to half-maximal peak fluorescence are given for one example cell but no statistics or summary are provided. Figure 8 shows examples from one cell with no summary data. This issue comes up in other places as well. 

      Thank you for this fair and valuable feedback. Following the Editor's suggestion as well, we have now removed the rise-time kinetic fitting results from the manuscript and retain only the bi-exponential decay time constant values. Further, we explicitly detail the issues with kinetic fitting and state that precise quantitative conclusions should not be drawn from the differences in kinetic parameters (pages 7 and 27-28).

      We have included the results of paired t-tests comparing the amplitudes of proximal vs. distal calcium signals shown in Fig. 2A & B, Fig. 3C & D, Fig. 4C & D, Fig. 5A-D, and Fig. 8E & F. Because proximal and distal calcium signals were obtained from the same ribbons, within 500 nm of each other, the two signals are closely related; as the Reviewer pointed out, “the traces look like scaled versions of each other”. For experiments where we make comparisons across cells or different calcium indicators, as shown in Fig. 3E & F, Fig. 5E, and Fig. 8B & C, we have included the results of unpaired t-tests. We have also included the t-test statistics in the respective figure legends in the revised version.
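      The choice between paired and unpaired tests described above can be illustrated with a short sketch. Proximal and distal signals measured on the same ribbon share ribbon-to-ribbon variability, which a paired test removes by working on within-ribbon differences. The amplitude values below are made up for illustration, the function names are hypothetical, and the unpaired version assumes Student's pooled-variance t-test (the response does not say whether a pooled or Welch test was used); p-values are omitted to keep the sketch dependency-free.

```python
import math
from statistics import mean, stdev

def paired_t(a, b):
    """Paired t statistic and degrees of freedom for matched samples."""
    d = [x - y for x, y in zip(a, b)]  # within-ribbon differences
    return mean(d) / (stdev(d) / math.sqrt(len(d))), len(d) - 1

def unpaired_t(a, b):
    """Student's t statistic (pooled variance) for independent samples."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2

# Made-up peak amplitudes (arbitrary units) from four ribbons.
proximal = [1.2, 1.5, 1.1, 1.4]
distal = [1.0, 1.2, 1.0, 1.1]

t_paired, df_paired = paired_t(proximal, distal)
t_unpaired, df_unpaired = unpaired_t(proximal, distal)
```

      With these made-up numbers, the paired statistic (about 4.7 on 3 degrees of freedom) is more than twice the unpaired one (about 2.2 on 6 degrees of freedom), because the shared per-ribbon offset is removed before testing.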

      In Figure 8, we show example fluorescence traces from two different cells at the bottom of panel A and example traces from different ribbons of RBC a in panel D; the summary data are shown in panels B-C and E-F, with statistics provided in the figure legends.

      The rise time measurements in Figure 2 are very different for low and high affinity indicators, but no explanation is given for this difference. Similarly, the measurements of peak calcium concentration in Figure 4 are very different with the two indicators. That might suggest that the high affinity indicator is strongly saturated, which raises concerns about whether that is impacting the kinetic measurements. 

      Yes, we do believe that the high-affinity indicator is partially saturated, and therefore the measurement with the low-affinity indicator dye is a more accurate reflection of the underlying Ca2+ signal. We now state this more explicitly in the text. Further, we note that the rise time values are no longer listed, owing to the lack of statistical significance for such comparisons, as noted above.

      Reviewer #2 (Public review): 

      Summary: 

      The study introduces new tools for measuring intracellular Ca2+ concentration gradients around retinal rod bipolar cell (rbc) synaptic ribbons. This is done by comparing the Ca2+ profiles measured with mobile Ca2+ indicator dyes versus ribbon-tethered (immobile) Ca2+ indicator dyes. The Ca2+ imaging results provide a straightforward demonstration of Ca2+ gradients around the ribbon and validate their experimental strategy. This experimental work is complemented by a coherent, open-source, computational model that successfully describes changes in Ca2+ domains as a function of Ca2+ buffering. In addition, the authors try to demonstrate that there is heterogeneity among synaptic ribbons within an individual rbc terminal. 

      Strengths: 

      The study introduces a new set of tools for estimating Ca2+ concentration gradients at ribbon AZs, and the experimental results are accompanied by an open-source, computational model that nicely describes Ca2+ buffering at the rbc synaptic ribbon. In addition, the dissociated retinal preparation remains a valuable approach for studying ribbon synapses. Lastly, excellent EM. 

      Thank you very much for this positive evaluation of our work.

      Comments on revisions: 

      Specific minor comments: 

      (1) Rewrite the final sentence of the Abstract. It is difficult to understand. 

      Thank you for pointing that out. We have updated the final sentence of the Abstract.

      (2) Add a definition in the Introduction (and revisit in the Discussion) that delineates between micro- and nano-domain. A practical approach would be to round up and round down. If you round up from 0.6 um, then it is microdomain, which means ~ 1 um or higher. Likewise, round down from 0.3 um to nanodomain? If you are using confocal, or even STED, the resolution for Ca imaging will be in the 100 to 300 nm range. The point of your study is that your new immobile Ca2+-ribbon indicator may actually be operating on a tens of nm scale: nanophysiology. The Results are clearly written in a way that acknowledges this point, but maybe make such a "definition" comment in the intro/discussion in order to: 1) demonstrate the power of the new Ca2+ indicator to resolve signals at the base of the ribbon (effectively nano), and 2) (Discussion) acknowledge that some are achieving nanoscopic resolution (50 to 100 nm?) with light microscopy (as you ref'd Neef et al., 2018 Nat Comm).

      Thank you for the valuable comments. We have now provided this information in the introduction and discussion.  

      (3) Suggested reference: Grabner et al. 2022 (Sci Adv, Supp video 13, and Fig S5). Here, rod Cav channels are shown to be expressed on both sides of the ribbon, at its base, and they are within nanometers of other AZ proteins. This agrees with the conclusions from your imaging work.  

      Thank you for the valuable suggestion. We have now provided this information in the introduction and discussion.

      (4) In the Discussion, add a little more context about what is known about synaptic transmission in the outer and inner retina. First, state that the postsynaptic receptors (for example: mGluR6 in On BCs vs. KARs in Off BCs vs. AMPARs in HCs), and possibly the synaptic cleft (ground squirrel), are known to have a significant impact on signaling in the outer retina. In the inner retina, there are many more unknowns. For example, when I think of the pioneering Palmer J Physiol study, which you cite, I think of NMDARs vs. AMPARs, and of uncertainty about what type of postsynaptic cell was patched (GC or AC...). Once you have informed the reader that the postsynapse is known to have a significant impact on signaling, then promote your experimental work that addresses presynaptic processes: "...the new tool and results allow us to explore release heterogeneity, ribbon by ribbon in dissociated preps, which we eventually plan to use at ribbon synapses within slices... to better understand how the presynapse shapes signaling...". 

      Thank you for the valuable comments. We have now provided this information in the introduction and discussion.

      Reviewer #3 (Public review): 

      Summary: 

      In this study, the authors have developed a new Ca indicator conjugated to the peptide, which likely recognizes synaptic ribbons and have measured microdomain Ca near synaptic ribbons at retinal bipolar cells. This interesting approach allows one to measure Ca close to transmitter release sites, which may be relevant for synaptic vesicle fusion and replenishment. Though microdomain Ca at the active zone of ribbon synapses has been measured by Hudspeth and Moser, the new study uses the peptide recognizing synaptic ribbons, potentially measuring the Ca concentration relatively proximal to the release sites. 

      Strengths: 

      The study is, in principle, technically well done, and the peptide approach is technically interesting, which allows one to image Ca near the particular protein complexes. The approach is potentially applicable to other types of imaging. 

      Thank you very much for this appreciation.

      Weaknesses: 

      Peptides may not be entirely specific, and a genetic approach tagging particular active zone proteins with fluorescent Ca indicator proteins may well be more specific. Although the authors are aware of this, and the peptide approach is widely used for ribbon synapses, they should keep this limitation in mind when interpreting the results. 

      We acknowledge the reviewer’s point and believe that peptide-based and genetic approaches to measuring local calcium signals both have their merits, each with separate advantages and disadvantages.  

      Reviewer #1 (Recommendations for the authors): 

      The revisions helped with some concerns about the original paper, but some issues were not adequately addressed. I have left two primary concerns in my public review. To summarize those: 

      The difference in kinetics between proximal and distal locations is emphasized and quantified in the paper, but the quantification consists of a fit to the average responses. This does not give an idea of whether the observed difference is significant or not. Without an estimate of the error across measurements, the quoted difference in kinetics is not interpretable. 

      Thank you for this feedback. Since the kinetics information is a minor part of the manuscript, we have followed the Editor’s advice to significantly tone down the comparison of kinetic fit parameters (completely removing the rise-time comparisons), in order to put more focus on the better-documented conclusions. We also note that we did establish statistical significance of the differences in fluorescence signal amplitudes. 

      Somewhat relatedly, the difference in amplitude and kinetics of the calcium signals measured with low- and high-affinity indicators is quite concerning. The authors added one sentence stating that the high-affinity indicator might be saturated. This is not adequate. Should we distrust the measurements using the high-affinity indicator? The differences between the results using the low- and high-affinity indicators are in some cases large, e.g. larger than the differences cited as a key result between distal and proximal locations. This issue needs to be dealt with directly in the paper. 

      Thank you for this feedback. Yes, measurements with high-affinity (HA) indicators cannot report Ca2+ concentration as accurately as those with low-affinity indicators. However, the value of HA indicators lies in their ability to detect low-amplitude signals that lower-affinity indicators may miss due to their lower signal-to-noise resolution. We added a sentence on page 12 to further stress this point.

      Related to the point about statistics, it is not clear how to relate the horizontal lines in Figure 8 to the actual measurements. To evaluate the conclusions from that figure, it is critical to understand what is plotted and what the error bars on the plotted data represent. 

      We apologize for the earlier ambiguity in Fig. 8. In this figure, we first compare proximal (panel B) and distal (panel C) calcium signals across several RBCs, labeled RBC-a through RBC-d. Each RBC contains multiple ribbons, and for each cell, we present the average calcium signals from multiple ribbons using box plots in panels B and C. In these box plots, the horizontal lines represent the average calcium signal for each cell, while the size of the error bars reflects the variability in proximal and distal calcium signals among the ribbons within that RBC.

      For example, RBC-a had five identifiable ribbons. In panels D–F, we use RBC-a to illustrate the variability in calcium signals across individual ribbons. Specifically, we distinguished proximal and distal calcium signals from five ribbons (ribbons 1–5) within RBC-a. When feasible, we acquired multiple x–t line scans at a single ribbon, shown now as individual data points, to assess variability in calcium signals recorded from the same ribbon.

      The box plots in panels E and F display the average calcium signal (horizontal lines) for each ribbon, based on multiple recordings. These plots demonstrate considerable variability between ribbons of RBC-a. Importantly, the lack of or minimal error bars for repeated measurements at the same ribbon indicates that the proximal and distal calcium signals are consistent within a ribbon. These findings emphasize that the observed variability among ribbons and among cells reflects true biological heterogeneity in local calcium domains, rather than experimental noise.

    1. eLife Assessment

      This useful study presents a hierarchical computational model that integrates locomotion, navigation, and learning in Drosophila larvae. The evidence supporting the model is solid, as it qualitatively replicates empirical behavioral data, but the experimental data is incomplete. While some simplifications in neuromechanical representation and sensory-motor integration are limiting factors, the study could be of use to researchers interested in computational modeling of biological movement and adaptive behavior.

    2. Reviewer #1 (Public review):

      Summary:

      The paper presents a three-layered hierarchical model for simulating Drosophila larva locomotion, navigation, and learning. The model consists of a basic locomotory layer that generates crawling and turning using a coupled oscillator framework, incorporating intermittency in movement through alternating runs and pauses. The intermediate layer enables navigation by allowing larvae to actively sense and respond to odor gradients, facilitating chemotaxis. The adaptive learning layer integrates a spiking neural network model of the Mushroom Body, simulating associative learning where larvae modify their behavior based on past experiences. The model is validated through simulations of free exploration, chemotaxis, and odor preference learning, demonstrating close agreement with empirical behavioral data. This modular framework provides a valuable advance for modeling larva behavior.

      Strengths:

      Every modeling paper requires certain assumptions and abstractions. The main strength of this paper lies in its modular and hierarchical approach to modeling behavior, making connections to influential theories of motor control in the brain. The authors also provide a convincing discussion of the experimental evidence supporting their layered behavioral architecture. This abstraction is valuable, offering researchers a useful conceptual framework and marking a significant step forward in the field. Connections to empirical larval movement are another major strength.

      Weaknesses:

      While the model represents a conceptual advance in the field, some of its assumptions and choices fall behind state-of-the-art approaches. One limitation is the paper's simplified representation of larval neuromechanics, in which the body is reduced to a two-segment structure with basic neural control. Another limitation is the absence of an explicit neuromuscular control system, which would better capture the role of segmental central pattern generators (CPGs) and neuronal circuits in regulating peristalsis and turning in Drosophila larvae. Many detailed neuromechanical models, as cited by the authors, have already been published. These abstractions overlook valuable experimental studies that detail segmental dynamics during crawling and the larval connectome.

      The strength of the model could also be its weakness. The model follows a subsumption architecture, where low-level behaviors operate autonomously while higher layers modulate them. However, this approach may underestimate the complexity of real neural circuits, which likely exhibit more intricate feedback mechanisms between sensory input and motor execution.

    3. Reviewer #2 (Public review):

      Summary:

      Sakagiannis et al. propose a hierarchically layered architecture for larval locomotion and foraging. They progress from exploration to chemotaxis and to an odour preference test after associative learning.

      Strengths:

      A new locomotion model based on two oscillators that also incorporates peristaltic strides.

      Weaknesses:

      • The model is not always clearly or sufficiently explained (chemotaxis and odour test).

      • Data analysis of the model movement is not very thorough.

      • Comparisons with the locomotion of behaving animals are missing for chemotaxis and for the odour preference test after associative learning.

      • Overall it is hard to judge the descriptive and predictive value of the model.

    4. Reviewer #3 (Public review):

      Summary:

      This paper presents a framework for a multilevel agent-based model of the Drosophila larva, using a simplified larval body and locomotor equations coupled to oscillators and sensory input. The model itself is built upon significant existing literature, particularly Wystrach, Lagogiannis, and Webb 2016 and Jürgensen et al. 2024. The aim is to generate an easily configurable, well-documented platform for organism-scale behavioral simulation in specific experiments. The authors demonstrate qualitative similarity between in vivo behavioral experiments and calibrated models.

      Strengths:

      The goal is excellent - a system to rapidly run computational experiments that align naturally with behavioral experiments would be well-suited to develop intuitions and cut through hypotheses. The authors provide quantitative descriptions that show that the best-fit parameters in their models produce results that agree with several properties of larval locomotion.

      The description of model calibration in the appendix is clear and explains several aspects of the model better than the main text.

      In addition, the code is well-organized using contemporary Python tooling and the documentation is nicely in progress (although it remains incomplete). However, see notes for difficulties with installation.

      Weaknesses:

      (1) As presented here, the modeling itself is described in an unclear fashion and without a particular scientific question. The majority of the effort appears to be calibrating modest extensions of existing models and applying them to very simple experiments. This could be an effective first part of a paper on the software tool, but the paper needs to point to a scientific question or, if it is a tool paper, to a gap in the current state of modeling tools that must be filled to address scientific goals. While the manuscript has a good overview of larval behavioral papers, the discussion of modeling is more of an afterthought. However, the paper is a modeling paper and its contribution is to modeling, and particularly given this work's minor adaptations of existing models, it is unclear what the principal contribution is intended to be.

      (2) While the models presented do qualitatively agree with experimental data in specific situations, there is no effort to challenge the model assumptions or compare them to alternative models. The fact that the data are consistent with the models in a small number of simple experiments does not mean that the models are correct. Moreover, given the highly empirical nature of the modeling, I wonder which results largely reflect the model putting out what was put in, particularly with regard to kinematic results like frequency and body length, or the effect of learning simply changing the sensory gain constant. At this level of empirical modeling, it would appear quite difficult to integrate the type of cell-type-specific perturbation or functional observation that is common in larval experiments.

      (3) The central framing of a "layered control architecture" does not have a significant impact on the work presented here, and the paper would do better with less emphasis on it. Given the limited empirical models, there are only so many parameters through which different components can influence one another, and as best as I can tell from the paper, only chemotaxis and the modulation of a chemotactic gain constant are incorporated so far. However, since these are empirical functions, they say little about how the layers are actually controlled by the nervous system; indeed, the larval nervous system appears to have many levels of local and long-range modulation of circuits at both the sensory and motor layers. It is not clear how this aspect would contribute beyond the well-appreciated concept of a relatively finite set of behavioral primitives in an insect brain, particularly for the fly larva. What would be a contradictory model, and how would the authors differentiate between that and the one they currently propose? If focusing only on olfactory learning and chemotaxis, how does the current framing add to the existing understanding?

      (4) The paper uses experimental data to calibrate the models, however, the experiments are not described at all in the text.

    5. Author response:

      We thank all three anonymous reviewers for their thoughtful evaluations of our manuscript and for recognizing the conceptual advance in combining agent-based behavioral simulations with systems neuroscience models. We are especially encouraged by the acknowledgement of the framework’s potential to support simulation of neural control of individual animal behavior in realistic sensory environments.

      Below, we respond to each reviewer’s public comments in turn. Throughout, we have aimed to clarify our rationale for modeling choices, acknowledge limitations, and outline concrete steps for improvement in the revised manuscript.

      Furthermore, the call for a better description of the model implementation, voiced by all three reviewers, and additional requests from community members have prompted us to write a separate, technically detailed description of the publicly available larvaworld software package, as well as of the readily implemented models, in the form of a preprint (Sakagiannis et al., 2025, bioRxiv, DOI: https://doi.org/10.1101/2025.06.15.659765).

      Reviewer #1:

      We are happy to read that this reviewer considers the proposed behavioral architecture ‘a significant step forward in the field’, and that she/he recognizes the strengths of our work in the modular and hierarchical approach that provides connections to influential theories of motor control in the brain, in the experimental evidence it is based on, and in the valuable abstractions that we have chosen for the larval behavioral modeling.

      The reviewer raises important points about the simplifications we have made, both conceptually and in the specific implementation of larval behaviors. Our main goal in this study is to introduce a conceptual framework that integrates agent-based modeling with systems neuroscience models in a modular fashion. To serve this purpose, we aimed for a minimal yet representative implementation at the motor layer of the architecture, calibrated to larval locomotion kinematics. This choice enables efficient simulation while allowing us to test top-down modulation and adaptive mechanisms in higher layers without the computational overhead of a full neuromechanical model. In addition to chemotaxis, we have recently used this simplified approach to model thermotaxis in larvae (Kafle et al., 2025, iScience, DOI: https://doi.org/10.1016/j.isci.2025.112809).

      The reviewer notes the absence of explicit segmental neuromuscular control or central pattern generators (CPGs). We deliberately abstracted from these mechanisms, representing the larval body as two segments with basic kinematic control, to focus on reproducing overall locomotor patterns. This bisegmental simplification, which we illustrate in Supplemental Video “Bisegmental larva-body simplification”, retains the behavioral features relevant to our current aims. However, the modular structure of the framework means that more detailed neuromechanical models—incorporating CPG dynamics or connectome-derived circuit models—can be integrated in future work without altering the architecture as a whole.

      We fully agree that real neural circuits are more complex than a strict subsumption architecture implies. In the Drosophila larva, there is clear evidence for ascending sensory feedback from the motor periphery to premotor and higher brain circuits, as well as neuromodulatory influences. These add layers of complexity beyond the predominantly descending control in our present model. At the same time, both larval and adult connectome data show that across-level descending and ascending connections are sparse compared to the dense within-layer connectivity. We see value in casting our model as a hierarchical control system precisely to make the strengths and limitations of such an abstraction explicit. The revised manuscript will include further discussion of these points.

      In summary, our design choices reflect a trade-off: by limiting the biological detail in the lower layers, we gain computational efficiency and maintain a clear modular structure that can host models at different levels of abstraction. This ensures that the architecture remains both a tool for immediate behavioral simulation and a scaffold for integrating richer neural and biomechanical models as they become available.

      Reviewer #2:

      We thank the reviewer for recognizing the novelty of our locomotory model, particularly the implementation of peristaltic strides based on our new analyses of empirical larval tracks, and for providing constructive feedback that will help us improve the manuscript.

      The reviewer highlights the need for clearer explanations of the chemotaxis and odor preference modules. We expand these sections in the revised manuscript with more explicit descriptions of model structure, parameterization, and calibration. As mentioned above, we have also prepared a separate preprint dedicated to the larvaworld Python package, which contains detailed implementation notes and hands-on tutorials that allow users to adapt or extend individual modules.

      Regarding the comparison to empirical behavior in chemotaxis, our present analysis is indeed primarily qualitative. However, we would like to emphasize that the temporal profile of odor concentration at the larval head in our simulations matches that measured in Gomez-Marin et al. (Nature Comm., 2011, DOI: https://doi.org/10.1038/ncomms1455) using only one additional free parameter, while all parameters of the basic locomotory model had been fitted to a separate exploration dataset before and were kept fixed in the chemotaxis experiments. In addition to the simulation of chemotaxis in the present paper, we recently used larvaworld in a practical model application to estimate a species-specific parameter of thermotaxis from experiments across different drosophilids (Kafle et al., 2025, iScience, DOI: https://doi.org/10.1016/j.isci.2025.112809).

      The preference index in our simulations was computed using the same definition as in the established experimental group assay for larval memory retention, enabling a direct quantitative comparison between simulated and empirical results. Variability in the simulated outcomes arose naturally from inter-individual differences in body length and locomotory parameters, derived from real larval measurements, as well as from the random initial orientation of each individual in the arena. These factors contributed to variation in individual tracks and ultimately produced preference index values that closely matched those observed experimentally. In the revised manuscript, we also discuss handedness, as highlighted by the reviewer, as another meaningful expression of inter-individual variability in Drosophila larvae and insects more generally.

      Finally, we acknowledge the reviewer’s concern about the scalability and broader applicability of the model. While the present paper focuses on three specific behavioral paradigms (exploration, chemotaxis, odor preference), the modular structure of the architecture is designed for flexibility: modules at any layer can be exchanged for more detailed or alternative implementations, and new sensory modalities or behaviors can be integrated without redesigning the system. The larvaworld package, associated codebase, and documentation are openly available to encourage adoption and adaptation by the larval research community.

      Reviewer #3:

      This public review provides an excellent account of our central aim to build an easily configurable, well-documented platform for organism-scale behavioral simulation, and we are happy to read that the reviewer considers this an excellent goal.

      We thank the reviewer for her/his recognition of our well-organized code using contemporary Python tooling. We are currently further improving code readability and documentation, and we will release a new version of the larvaworld Python package. We also agree with the reviewer’s assessment that understanding the model calibration currently requires reading the appendix. For the revised manuscript, we thus aim to improve our description of all calibration and modeling steps along the way. We will also make sure to improve the description of the experimental datasets used for calibration.

      We recognize that our description of the paper’s scientific contribution could be clearer. In revision, we will sharpen the Introduction and Discussion to highlight our main contributions:

      (1) Promoting a shift from isolated neural circuit modeling to integrated agent-based simulations in realistic environments.

      (2) Proposing the layered behavioral architecture, adopting the subsumption paradigm for modular integration.

      (3) Providing the larvaworld software as a ready-to-use, extensible modeling platform.

      (4) Implementing an empirically calibrated locomotory model and demonstrating its integration with navigation and learning modules in replicated behavioral paradigms.

      We agree with the reviewer that the next challenge is to integrate the empirically based behavioral simulations presented here with functional brain models capable of reproducing or predicting experimental findings at the level of cellular neurophysiology, including the effects of cell-type-specific manipulations such as gene knock-down or optogenetic activation/inhibition. However, based on our experience with systems-level modeling, we deliberately invested in behavioral simulation because functional models of the nervous system—including our own—often lack translation into simulated agent behavior. In many cases, model output is limited to one or more variables that can at best be interpreted as a behavioral bias, and most often represents an “average animal” that fails to capture inter-individual differences. By linking our spiking mushroom body model to behavioral simulations in a group of individual agents during memory retention tests (Figure 6C,D), we were able to achieve a first successful direct comparison between simulated and experimental behavior metrics—in this case, the behavioral preference index reported in Jürgensen et al. (iScience, 2024, DOI: https://doi.org/10.1016/j.isci.2023.108640).

      Finally, we reiterate that the layered behavioral architecture is designed to promote a modular modeling paradigm. Our adoption of a subsumption architecture does not conflict with the concept of behavioral primitives; on the contrary, the notion that such primitives follow (semi-)autonomous motor programs and can be combined into more complex behaviors was the starting point for our implementation of the architecture in the fly larva. In our view, a genuinely contradictory paradigm for neural control of behavior would require a non-modular, strictly non-hierarchical organization of the nervous system and, by extension, of behavioral control.

    1. eLife Assessment

      NeuroSC is an accessible and interactive tool for streamlined observation of neuronal morphology, membrane contact, and synaptic connectivity across developmental stages in the nematode C. elegans. This important tool relies on solid electron microscopy datasets. This resource will be of high interest to C. elegans researchers interested in nervous system wiring and circuit function.

    2. Reviewer #2 (Public review):

      Summary

      The past several years have seen publication of both new (Witvliet et al., 2021) and newly analyzed (Cook et al., 2019; Moyle et al., 2021; Brittin et al., 2021) data for the C. elegans connectome. The increase in data availability for a single species allows researchers to examine variability due to both stochastic events and changes over development. The volume of these data is huge. To help make these data more accessible to the community, the authors present a new online tool that allows examination of 3D models for C. elegans neurons in the central neuropil across development. In addition to visualizing the overall structure of the neuronal processes and the locations of synapses, the NeuroSC tool also allows users to probe into the C-PHATE visualization results, which this group previously pioneered to describe similarities in neuron adjacency (Moyle et al., 2021).

      Strengths

      The ability to visualize the data from both a connectomics and a contactomics perspective across developmental time has significant power. The original C. elegans connectome (White et al., 1986) presented its circuits as line drawings, with chemical and electrical synapses indicated by arrows and bars. While these line drawings are incredibly useful, they were necessary simplifications for a 2D publication and lack details of the complex architecture seen within each EM image. Koonce et al. take advantage of their own and others' segmented image data of each neuronal process within the nerve ring to create a web interface where users can visualize 3D models for their neuron of choice. The C-PHATE visualization is intended to allow users to explore similarities among different neurons in terms of adjacency and then go directly to the 3D models for these neurons. The 3D models it generates are beautiful and will likely show up in many future presentations and publications. The tool doesn't require any additional downloads and is open source. This revision includes an option where hovering over an individual neuron, synapse, or contact pulls up a statistics panel. The addition of text to the video tutorials in the revision is very useful.

      Weaknesses

      There are several bugs with this tool, which make it a bit clunky to use and suggest a lack of rigorous testing. There are also issues with data availability. I was disappointed that my "recommendations for the authors", which focused on the user interface, were not addressed in the response to reviewers.

    3. Reviewer #3 (Public review):

      Summary:

      This work provides graphical tools for reconstructing the detailed anatomy of a nervous system from a series of sections imaged by electron microscopy. Contact between neuronal processes can direct outgrowth and is necessary for connectivity, and thus for function. A bioinformatic approach is used to group neurons according to shared features (e.g., contact, synapses) in a hierarchy of "relatedness" that can be interrogated at each step. In this work, Koonce et al. analyze vEM data sets for the C. elegans nerve ring (NR), a dense fascicle of processes from 181 neurons. In this bioinformatic approach, the clustering algorithm Diffusion Condensation (DC) groups neurons according to similar cell biological features in iterations that remove chunks of differences in feature data with each step, ultimately merging all NR neurons into one cluster. DC results are displayed with C-PHATE, a 3D visualization tool, to produce a trajectory that can be interrogated for cell identities and other features at each iterative step. In previous work by these authors, this approach was used to identify subgroups of neuronal processes, or "strata", in the NR that can be grouped by physical contact and connectivity. Here they expand their analysis to include a series of available vEM data sets across C. elegans larval development. This approach suggests that strata initially established during embryonic development are largely preserved in the adult. Importantly, exceptions involving stage-specific reorganization of neuronal placement in specific strata were also detected. A case study featured in the paper demonstrates the utility of this approach for visualizing the integration of newly generated neurons into the existing NR anatomy. Visualization tools used in this work are publicly available at NeuroSCAN.

      Strengths:

      A web-based app, NeuroSCAN, that individual researchers can use to interrogate the structure and organization of the C. elegans nerve ring across development.

      Weaknesses:

      minor revisions

      Comments on Revisions:

      The authors have satisfactorily addressed my critiques.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review)

      Comment 

      Koonce et al. have generated a web-based visualization tool for exploring C. elegans neuronal morphology, contact area between neurons, and synaptic connectivity data. Here, the authors integrate volumetric segmentation of neurons and visualization of contact area patterns of individual neurons generated from Diffusion Condensation and C-PHATE embedding based on previous work from adult volumetric electron microscopy (vEM) data, extended to available vEM data for earlier developmental stages, which effectively summarizes modularity within the collated C. elegans contactomes to date. Overall, NeuroSC's relative ease of use for generating visualizations, its ability to quickly toggle between developmental stages, and its integration of a concise visualization of individual neurons' contact patterns strengthen its utility.

      We thank the reviewer for this positive assessment of our work.

      Comment

      NeuroSC provides an accessible and convenient platform. However, many of the characteristics of NeuroSC overlap with that of an existing tool for visualizing connectomics data, Neuroglancer, which is a widely-used and shared platform with data from other organisms. The authors do not make clear their motivation for generating this new tool rather than building on a system that has already collated previous connectomics data. Although the field will benefit from any tool that collates connectomics data and makes it more accessible and user-friendly, such a tool is only useful if it is kept up-to-date, and if data formatting for submitting electron microscopy data to be added to the tool is made clear. It is unclear from this manuscript whether NeuroSC will be updated with recently published and future C. elegans connectomes, or how additional datasets can be submitted to be added in the future.

      We have added new language to state the motivations for developing NeuroSC more explicitly (Introduction, lines 98-111, and Discussion, lines 375-384). In a new Discussion section, we also compare the features of NeuroSC with those of other existing tools, such as Neuroglancer and Webknossos (lines 393-417).

      Briefly, the functional features of NeuroSC are substantially different from (and absent in) other web-based tools for navigating EM datasets, including Neuroglancer. This is because the intended use of NeuroSC is substantially different from (and purposefully synergistic with) the intended use, and the tools available, in Neuroglancer.

      Neuroglancer is a versatile tool designed primarily for web-based visualization and sharing of large EM datasets. NeuroSC was purposefully not designed to provide this type of access to the primary EM data, because these features were already available through tools like Neuroglancer.

      Instead, the explicit goal of NeuroSC is to provide a platform specifically optimized for examining neuronal relationships across connectomic datasets. NeuroSC builds on the segmentations emerging from programs like Neuroglancer, but its tools are tailored to explore relationships such as contact profiles in the context of neuronal morphologies and synaptic positions, and across datasets that represent different animals or different developmental stages.

      To achieve this, all datasets in NeuroSC were optimized to facilitate comparisons of segmented neuronal features across different connectomes, including: 1) alignment of the neurons being compared when the segmentations are displayed; 2) synchronization of the 3D windows; 3) implementation of a ‘universal color code’ across datasets for each neuron and relationship, for easy visual comparison; 4) use of specific neuronal names to label instances of the same cells across all available datasets. The use of precise neuronal names across separate datasets allows these objects to be integrated with other catalogued datasets, including genomic and neuronal activity profiles.

      The formatting and display of the datasets used in NeuroSC were accompanied by the development of new tools, including: 1) rendering of the contact profiles of all neurons in the context of the morphology of the cell and its synapses, and 2) C-PHATE diagrams to inspect multidimensional relationship hierarchies based on these contact profiles. In NeuroSC, C-PHATEs can be navigated and compared across multiple stages of development while visualizing neuronal reconstructions, allowing users to compare neuronal relationships across individual datasets.

      We agree with the reviewer that these tools are most useful when integrated. With that intention in mind, we designed NeuroSC as a series of modular, open-source tools that could be integrated into other programs, including Neuroglancer. In that sense, our intent was not to produce another free-standing tool. We wanted a set of tools that, if useful, could be integrated into other existing web-based connectomic resources to enhance the user experience of navigating complex EM datasets and to draw biological meaning from the relationships between neurons. Additionally, we intentionally designed NeuroSC so that new methods of understanding neuron relationships can be integrated as they arise. We have dedicated a more detailed section of the Discussion (lines 369-417) to better convey this intention and to directly address the unique abilities of NeuroSC as a complement to powerful existing tools, including Neuroglancer.

      Comment

      The interface for visualizing contacts and synapses would be improved with better user access to the quantitative underlying data. When contact areas or synapses are added to the viewer, adding statistics on the magnitude of the contact area, the number of synapses, and the rank of these values among the neuron's top connections, would make the viewer more useful for hypothesis generation. Furthermore, synapses are currently listed individually, with names that are not very legible to the web user. Grouping them by pre- and postsynaptic neurons and linking these groups across developmental stages would also be an improvement.

      We thank the reviewer for this insightful comment and have implemented several improvements to address these suggestions. Specifically, we have added new features to enhance user access to quantitative data within the NeuroDevSCAN viewer:

      Cell, Patch, and Synapse Statistics: Users can now see a statistics panel when clicking on a rendered neuron, contact patch, or synapse. These panels provide the following information, respectively (highlighted in lines 303-315):

      Cell Stats: Clicking on a cell rendering shows ‘cell stats’, which displays the total volume and surface area of the selected neuron within the defined neuropil area of our datasets (see Methods).

      Contact Stats: Clicking on a patch rendering shows ‘contact stats’. This pop-up displays quantifications of the selected contact relationship. Rank compares the summed surface area of contacts ("patches") between the two neurons with that of all other contact relationships, both for the primary neuron and for the whole nerve ring. A rank of 1, for example, means that this neuron pair shares the largest contact surface area among the relationships examined. “Total surface area” is displayed in square nanometers and is the summed surface area of all patches of this identity. Contact percentages are presented in two ways: (1) as the proportion of the primary cell's total surface area occupied by the contact in question, and (2) as the proportion of the total surface area of the nerve ring occupied by that same contact (showcased in Figure S5).
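
      As a rough illustration of the quantities in the contact-stats panel (the two ranks, the total surface area and the two percentages), the Python sketch below computes them from a list of contact patches. The patch tuples, the function name and the toy areas are hypothetical and are not the NeuroSC data model or code. Areas are assumed to share one unit (for example, square nanometers), and tied pairs are given the better rank.

```python
from collections import defaultdict


def contact_stats(patches, primary, partner, cell_area, ring_area):
    """Summarize one contact relationship (hypothetical data model).

    patches: iterable of (neuron_a, neuron_b, area) tuples, one per contact patch.
    cell_area: total surface area of the primary neuron.
    ring_area: total surface area of the nerve ring.
    All areas are assumed to be in the same unit (e.g. nm^2).
    """
    # Sum patch areas per unordered neuron pair, so A-B and B-A are pooled.
    totals = defaultdict(float)
    for a, b, area in patches:
        totals[frozenset((a, b))] += area

    pair = frozenset((primary, partner))
    if pair not in totals:
        raise ValueError(f"no contact between {primary} and {partner}")
    total = totals[pair]

    # Rank 1 = largest summed area; tied pairs share the best rank.
    cell_totals = sorted((v for k, v in totals.items() if primary in k), reverse=True)
    ring_totals = sorted(totals.values(), reverse=True)

    return {
        "total_area": total,
        "rank_in_cell": cell_totals.index(total) + 1,
        "rank_in_ring": ring_totals.index(total) + 1,
        "pct_of_cell": 100.0 * total / cell_area,
        "pct_of_ring": 100.0 * total / ring_area,
    }


# Made-up patch areas for illustration only.
patches = [
    ("AIML", "PVQL", 300.0),
    ("PVQL", "AIML", 200.0),
    ("AIML", "AIYL", 400.0),
    ("AIYL", "PVQL", 1000.0),
]
stats = contact_stats(patches, "AIML", "PVQL", cell_area=10_000.0, ring_area=100_000.0)
print(stats)
```

      In this toy example, AIML-PVQL is the largest contact of AIML but only the second largest in the whole "ring", which shows why the panel reports both ranks.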

      Synapse Stats: Clicking on a synapse rendering now shows ‘synapse stats’, which displays the number of synapses of the selected identity within the primary neuron, including any polyadic synapse combinations involving the primary neuron (showcased in Figure S7).
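
      A minimal sketch of how synapse counts of this kind could be tallied. It assumes a hypothetical record format of (presynaptic neuron, postsynaptic partners, type), with the type using the same C/E/U letters as the viewer. Synapses that share the same presynaptic neuron, the same set of partners and the same type are treated as one identity. This grouping rule is an assumption for illustration and is not documented NeuroSC behavior. Because electrical synapses have no intrinsic direction, a real implementation might also merge reversed pairs.

```python
from collections import Counter


def synapse_stats(synapses, primary):
    """Count synapses per identity among those involving `primary`.

    synapses: iterable of (pre, posts, kind) tuples, where posts lists the
    postsynaptic partners (more than one for polyadic synapses) and kind is
    'C' (chemical), 'E' (electrical) or 'U' (undefined).
    """
    counts = Counter()
    for pre, posts, kind in synapses:
        if primary == pre or primary in posts:
            # Sorting the partners makes (A -> B, C) and (A -> C, B) one identity.
            counts[(pre, tuple(sorted(posts)), kind)] += 1
    return counts


# Made-up synapse records for illustration only.
synapses = [
    ("AIML", ("PVQL",), "C"),
    ("AIML", ("PVQL",), "C"),
    ("AIML", ("RIAL", "PVQL"), "C"),
    ("AIYL", ("AIML",), "E"),
    ("RIAL", ("AIYL",), "C"),
]
counts = synapse_stats(synapses, "AIML")
for identity, n in counts.items():
    print(identity, n)
```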

      (1) Grouping and Readability Improvements: While individual synapses are still visualized, their display has been made more legible. We have condensed the lengthy naming scheme to improve clarity and codified the synapse type using the superscript letters C, E and U to represent chemical, electrical and undefined synapses, respectively. As explained and shown in Figure S7, we also added arrows to indicate the direction of presumed information flow at each synapse.

      (2) Developmental Linkage: We can link objects across datasets via cellular identity, but individual synapses in the datasets do not yet have identities attributed to their spatial coordinates. This prevents us from linking specific synapses across development beyond their connectivity (i.e., that a given synapse connects cell X to cell Y). This is also addressed in R1.11.

      Together, these improvements substantially enhance the utility of the viewer for hypothesis generation by making key quantitative data readily accessible.

      Comment

      While the DC/C-PHATE visualizations are a useful tool for the user, it is difficult to understand when grouping or splitting of cell contact patterns is biologically significant. DC is a deterministic algorithm applied to a contactome from a single organism, and the authors do not provide quantitative metrics of distances between individual neurons or a number of DC iterations on the C-PHATE plot, nor is the selection process for the threshold for DC described in this manuscript. In the application of DC/C-PHATE to larval stage nerve ring strata organization shown by the authors, qualitative observations of C-PHATE plots colored based on adult data seem to be the only evidence shown for persistent strata during development (Figure 3) or changing architectural motifs across stages (Figure 4). Quantitation of differences in neuron position within the DC hierarchy, or differences in modularity across stages, is needed to support these conclusions. Furthermore, illustrating the quantitative differences in C-PHATE plots used to make these conclusions will provide a more instructive guide for users of NeuroSC in generating future hypotheses.

      There are several ways to visualize DC outputs, and one way to quantitatively compare the DC clustering events of neurons is with Sankey diagrams. To make the inclusion of these resources clearer, we have highlighted them in lines 175-178 (Supplemental Tables 3-6): ‘DC outputs for each stratum across animals can also be inspected using Sankey diagrams (Supplemental Tables 3-6). These spreadsheets detail the neuron members at each iteration of DC, allowing the user to derive quantitative comparisons of clustering events.’
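
      To illustrate how DC membership tables could be turned into quantitative comparisons, the sketch below takes two partitions of neurons, for example two DC iterations or the same iteration in two datasets. It computes the link widths of a Sankey diagram (neurons shared by each pair of clusters). It also computes the Rand index as one possible summary of agreement. The Rand index is our choice for illustration and is not a metric reported in the manuscript. The partitions are made up.

```python
from collections import Counter
from itertools import combinations


def sankey_flows(part_a, part_b):
    """Link widths for a Sankey diagram: neurons shared by each (cluster_a, cluster_b) pair."""
    shared = part_a.keys() & part_b.keys()
    return Counter((part_a[n], part_b[n]) for n in shared)


def rand_index(part_a, part_b):
    """Fraction of neuron pairs that both partitions treat the same way (together or apart)."""
    shared = sorted(part_a.keys() & part_b.keys())
    pairs = list(combinations(shared, 2))
    if not pairs:
        raise ValueError("need at least two shared neurons")
    agree = sum(
        (part_a[x] == part_a[y]) == (part_b[x] == part_b[y]) for x, y in pairs
    )
    return agree / len(pairs)


# Hypothetical cluster labels (neuron -> cluster id) in two datasets.
l1 = {"AIML": 1, "PVQL": 1, "AIYL": 2, "RIAL": 2}
l4 = {"AIML": 1, "PVQL": 2, "AIYL": 2, "RIAL": 2}
flows = sankey_flows(l1, l4)
print(flows, rand_index(l1, l4))
```

      Cluster ids are only compared within a partition, so the labels in the two datasets do not need to correspond.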

      As the reviewer points out, DC is a deterministic algorithm that iteratively clusters neurons based on the similarity of their contact profiles. To better explain the selection of the threshold, the number of DC iterations and the quantitative metrics between neurons, we have added new text to the Diffusion Condensation methods section. Briefly:

      Number of DC iterations: During Diffusion Condensation (DC), we track the modularity of the resulting clusters at each iteration and select the iteration with the highest modularity to define the clusters that represent the strata (Moyle et al., 2021; Brugnone et al., 2019). Mathematically, modularity is calculated by comparing the actual number of edges within clusters to the expected number of such edges in a randomized network with the same degree distribution (Newman et al., 2006). A higher modularity value implies that nodes within the same cluster are more densely connected to each other than to nodes in other clusters. We now explain this more clearly in lines 562-567.
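
      A minimal sketch of this iteration-selection rule. It assumes the graph is a weighted, symmetric neuron contact matrix and that each DC iteration yields one cluster label per neuron. The modularity function follows Newman's standard definition: observed within-cluster weight minus the weight expected under a degree-preserving null model. The toy graph and partitions are invented for illustration and do not come from the datasets.

```python
import numpy as np


def modularity(A, labels):
    """Newman modularity of a partition of a weighted, undirected graph.

    Q = (1 / 2m) * sum_ij [A_ij - k_i k_j / (2m)] * delta(c_i, c_j)
    """
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    k = A.sum(axis=1)                            # weighted degree of each node
    two_m = A.sum()                              # 2m: each undirected edge counted twice
    same = labels[:, None] == labels[None, :]    # delta(c_i, c_j)
    return float(((A - np.outer(k, k) / two_m) * same).sum() / two_m)


def best_iteration(A, partitions):
    """Index of the partition (one per DC iteration) with the highest modularity."""
    scores = [modularity(A, p) for p in partitions]
    return int(np.argmax(scores)), scores


# Toy graph: two triangles (0-1-2 and 3-4-5) joined by the single edge 2-3.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

partitions = [
    [0, 1, 2, 3, 4, 5],   # early iteration: every neuron is its own cluster
    [0, 0, 0, 1, 1, 1],   # intermediate iteration: two clusters
    [0, 0, 0, 0, 0, 0],   # final iteration: everything merged
]
best, scores = best_iteration(A, partitions)
print(best, [round(s, 3) for s in scores])
```

      The first and last iterations are poor partitions (negative and zero modularity, respectively), so the maximum falls on an intermediate iteration, which is the behavior the rule relies on.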

      Threshold for merging points: The threshold (epsilon) used to merge data points in each iteration is set as a small fraction of the spatial extent of the data: for each coordinate dimension (x, y, z), we compute the range (maximum minus minimum), take the maximum of these three values, and divide it by 10,000. This process is performed iteratively for each round of clustering until all data points cluster into a single point. We have updated the manuscript to clarify this threshold selection and included this information in the revised algorithm description and pseudocode. We now better explain this in lines 556-559.
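
      The epsilon rule can be stated in a few lines of code. The sketch below computes epsilon from an array of (x, y, z) coordinates and then merges points closer than epsilon. The merge step (single linkage via union-find, with a strict `<` comparison) is an assumption made for illustration and may differ from the revised Box 1 pseudocode. The coordinates are made up.

```python
import numpy as np


def merge_threshold(X):
    """Epsilon = (largest per-dimension range of the coordinates) / 10,000."""
    X = np.asarray(X, dtype=float)
    ranges = X.max(axis=0) - X.min(axis=0)   # max - min for x, y and z
    return float(ranges.max()) / 10_000


def merge_close_points(X, eps):
    """Label points so that any two points closer than eps end up in one cluster.

    Uses single linkage via union-find; labels are 0..k-1 in order of first appearance.
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    for i in range(n):
        for j in range(i + 1, n):
            if dists[i, j] < eps:
                parent[find(i)] = find(j)

    relabel = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(n)]


X = [[0, 0, 0], [10_000, 0, 0], [0, 5_000, 0], [0.5, 0, 0]]
eps = merge_threshold(X)          # x has the largest range (10,000), so eps = 1.0
print(eps, merge_close_points(X, eps))
```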

      Distances between neurons in DC/C-PHATE: In our previous description in Box 1, Algorithm 1, we had provided a general DC algorithm for any high-dimensional dataset. We have now revised the algorithm to indicate how we used DC for these EM datasets.

      Distances between neurons are determined by the pixel overlap between their segmented shapes in the EM dataset. We use these distances to build a graph with weighted edges, in which the weight of the edge represents the pixel overlap (the adjacency in the actual EM segmentation). Affinities between neurons, which are a proxy for their distance in the graph, are then computed as now revised in Box 1, Algorithm 1. This process is done iteratively as neurons cluster. To better communicate this, we have changed the text in lines 533-538.  
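
      A hedged sketch of the graph construction. Pairwise pixel overlaps become the weights of a symmetric adjacency matrix. That matrix is then row-normalized into a Markov (diffusion) operator, as is common in diffusion-based methods. The row normalization is an assumption: the affinity computation actually used is the one in the revised Box 1, Algorithm 1. The overlap values below are made up, and each neuron pair is assumed to be listed once.

```python
import numpy as np


def overlap_graph(neurons, overlaps):
    """Symmetric weighted adjacency matrix from pairwise pixel overlaps.

    neurons: list of neuron names (fixes the row/column order).
    overlaps: dict mapping (neuron_a, neuron_b) -> overlapping pixel count,
    with each unordered pair listed once.
    """
    index = {name: i for i, name in enumerate(neurons)}
    W = np.zeros((len(neurons), len(neurons)))
    for (a, b), pixels in overlaps.items():
        i, j = index[a], index[b]
        W[i, j] += pixels
        W[j, i] += pixels   # contact is symmetric
    return W


def diffusion_operator(W):
    """Row-normalize W into a Markov matrix (each row sums to 1)."""
    degree = W.sum(axis=1, keepdims=True)
    # A neuron with no contacts would divide by zero; leave its row as zeros.
    degree[degree == 0] = 1.0
    return W / degree


neurons = ["AIML", "PVQL", "AIYL"]
W = overlap_graph(neurons, {("AIML", "PVQL"): 30, ("AIML", "AIYL"): 10})
P = diffusion_operator(W)
print(P)
```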

      Comment

      R1.5. While the case studies presented by the authors help to highlight the utility of the different visualizations offered by the NeuroSC platform, the authors need to be more careful with the claims they make from these correlative observations. For example, in Figure 4, the authors use C-PHATE clustering patterns to make conclusions about changes in clustering patterns of individual neurons across development based on single animal datasets. In this and many other cases presented in this study with the limited existing datasets, it is difficult to differentiate between developmental changes and individual variability between the neurite positions, contacts, and synapse differences within these data. This caveat needs to be clearly addressed.

      We now better explain in the manuscript that the selected case study, of the AVF neuron outgrowth, is not one of just correlation based solely on an EM dataset. Instead, the case study represents the NeuroSC-driven exploration of a biologically significant event supported by several independent datasets, as now explained in lines 257-276.

      Briefly, we agree with the reviewer that differences across individual EM datasets are insufficient evidence for conclusions about developmental changes. But the strength of NeuroSC is its ability to combine and compare multiple datasets. This bolsters observations that are not possible from any single dataset and provides new insights that lead to new hypotheses. We now better explain that we are not looking at single connectomes in isolation and then deriving conclusions, but instead using NeuroSC to compare across 9 EM datasets. We better explain how the tools in NeuroSC, including C-PHATE, enabled comparisons across these multiple connectomes to identify apparent differences in neuronal relationships. We then explain that, using NeuroSC, we could examine these variations in neuronal relationships at the level of individual, cell biological differences in neuronal morphology between the developmental datasets. As the reviewer points out, these variations could reflect development or simply differences between individual animals. In the case of AVF, the features are absent in all early specimens, then arise and persist in all specimens after a certain time point. This led us to hypothesize that they result from a developmental event. Because the segmented objects in NeuroSC are linked to neuronal identities, we can also cross-reference our observations from the EM datasets with information in other datasets and the literature. In the specific case of postembryonic AVF outgrowth, developmental lineage information and molecular profiles show that AVF is a postembryonically born neuron (Sulston et al. 1977, Sun et al 2022, Poole et al 2024, wormatlas.org). Using the postembryonic EM datasets, we can now tie this knowledge to the outgrowth dynamics of its neurites.

      Our findings using NeuroSC provide a proof of concept of the utility of the resource and extend our understanding of how the outgrowth of this neuron affects the relationships between the neural circuits in the nerve ring.

      Comment

      R1.6. Given that recent studies have also quantified contact area between neurons across multiple connectomes (Cook et al., Current Biology, 2023; Yim et al., Nature Communications, 2024), and that the authors use a slightly different approach to quantify contact area, a direct comparison between contact area values obtained in this study with prior studies seems appropriate.

      We acknowledge that there are multiple different approaches to calculate adjacencies. In the papers cited above, there are 3 different algorithms used:

      (1) Brittin 2019 (Python parsing of TrakEM2 segmentations with boundary thresholds), used in Cook et al. 2023, Moyle 2021 and this study.

      (2) Witvliet 2021 (Matlab 2D masks), used in Cook et al 2023.

      (3) Yim 2024 (3D masks), used in Yim et al 2024.

      To briefly describe the different approaches, and the methods we chose for this paper:

      Algorithm 1 (used in this study) defines adjacency based on distances between boundary points in TrakEM2 segmentations, allowing threshold tuning to accommodate differences in resolution and image quality across datasets—an important feature for consistent cross-dataset comparisons.
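
      The boundary-point idea behind Algorithm 1 can be sketched as follows. In each EM section, count the boundary points of one neuron that lie within a distance threshold of the other neuron's boundary, then sum over sections to obtain an adjacency profile. This is a simplified illustration, not the Brittin et al. (2019) code. The input format (a dict from z index to (x, y) boundary points, in pixels) and the one-directional count are assumptions. Choosing `threshold` per dataset mirrors the threshold tuning described above.

```python
import numpy as np


def slice_adjacency(boundary_a, boundary_b, threshold):
    """Number of boundary points of A within `threshold` pixels of some boundary point of B."""
    a = np.asarray(boundary_a, dtype=float)
    b = np.asarray(boundary_b, dtype=float)
    if len(a) == 0 or len(b) == 0:
        return 0
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)   # all A-B point distances
    return int((d.min(axis=1) <= threshold).sum())


def adjacency_profile(slices_a, slices_b, threshold):
    """Sum per-slice adjacency over the z slices in which both neurons appear.

    slices_a, slices_b: dict mapping z index -> list of (x, y) boundary points.
    """
    shared_z = slices_a.keys() & slices_b.keys()
    return sum(slice_adjacency(slices_a[z], slices_b[z], threshold) for z in shared_z)


# Made-up boundary points for two neurons across three sections.
slices_a = {0: [(0, 0), (0, 1), (0, 5)], 1: [(0, 0)]}
slices_b = {0: [(1, 0), (1, 1)], 1: [(10, 10)], 2: [(0, 0)]}
print(adjacency_profile(slices_a, slices_b, threshold=1.5))
```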

      Algorithm 2 infers contact via morphological dilation of VAST segmentations, identifying adjacency through overlapping expanded boundaries. 

      Algorithm 3 uses voxelwise contact detection with directional surface area measurements and normalization to account for dataset size differences. 
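
      Algorithms 2 and 3 can be contrasted with two compact NumPy sketches that work on label images (one integer id per neuron). These are simplified illustrations, not the Witvliet et al. or Yim et al. code. A one-pixel, 4-connected dilation stands in for the morphological dilation step of Algorithm 2. Counting shared voxel faces along each axis stands in for the directional surface-area measurement of Algorithm 3, whose normalization step is omitted. The label arrays and voxel size are made up.

```python
import numpy as np


def dilate4(mask):
    """One-pixel binary dilation of a 2D mask with a 4-connected neighborhood."""
    out = mask.copy()
    out[1:, :] |= mask[:-1, :]
    out[:-1, :] |= mask[1:, :]
    out[:, 1:] |= mask[:, :-1]
    out[:, :-1] |= mask[:, 1:]
    return out


def dilation_contact(labels2d, a, b):
    """Algorithm 2 style: pixels of neuron b reached by a one-pixel dilation of neuron a."""
    return int((dilate4(labels2d == a) & (labels2d == b)).sum())


def voxel_face_contact(labels3d, a, b, voxel_size=(1.0, 1.0, 1.0)):
    """Algorithm 3 style: area of voxel faces shared by a and b, summed over z, y and x."""
    dz, dy, dx = voxel_size
    face_area = {0: dy * dx, 1: dz * dx, 2: dz * dy}   # face orthogonal to each axis
    area = 0.0
    for axis in range(3):
        lo = [slice(None)] * 3
        hi = [slice(None)] * 3
        lo[axis] = slice(None, -1)
        hi[axis] = slice(1, None)
        first, second = labels3d[tuple(lo)], labels3d[tuple(hi)]
        n = np.count_nonzero((first == a) & (second == b))
        n += np.count_nonzero((first == b) & (second == a))
        area += n * face_area[axis]
    return area


labels2d = np.array([[1, 1, 0], [2, 2, 0], [0, 0, 0]])
labels3d = np.zeros((2, 2, 2), dtype=int)
labels3d[0], labels3d[1] = 1, 2        # neuron 1 in section 0, neuron 2 in section 1
print(dilation_contact(labels2d, 1, 2), voxel_face_contact(labels3d, 1, 2, (40.0, 4.0, 4.0)))
```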

      In NeuroSC, we use algorithm 1, mostly because we had tested the rigor of this method in Moyle et al. (2021), where we showed that results were robust across a range of thresholds. This flexibility enables tailored application across datasets of varying quality and scale. That is critical for NeuroSC’s mission of curating datasets across differing methodologies to allow direct comparisons of relationships. We detail the methodology for defining the threshold for each dataset in the Methods (lines 492-521), and the thresholds are listed in Supplementary Table 1. Another difference between our analysis and the previously cited work is that we chose to include all individually resolved neurons, including post-embryonic cells, without collapsing them into left/right or dorsal/ventral symmetry classes. In this way, our approach retains the full cellular resolution of the nervous system.

      Comment

      Neuroglancer is not mentioned at all in the manuscript, despite it being a very similar and widely accepted platform for vEM data visualization across model organisms. An explicit comparison of NeuroSC and Neuroglancer would be appropriate, given the similarity of the tools. Currently, published C. elegans data (Witvliet et al., 2021; Yim et al., 2024) use Neuroglancer-based viewers, and directly comparing NeuroSC and highlighting its strengths relative to Neuroglancer would strengthen the paper.

      In the original manuscript we had not mentioned tools like Neuroglancer because we envisioned them as distinct from NeuroSC in intended use and output. But, as explained in our response to comment R1.2, in the revised version we have included a section in the Introduction (lines 98-108) and in the Discussion (lines 369-417) that compares these types of web-based tools and highlights their synergies.

      Comment

      Assigning shorthand names to strata, such as "shallow reflex circuit" (page 4, line 172), may oversimplify this group of neurons. Either more detailed support for shorthand names of C-PHATE modules should be included, or less speculative names for strata should be used.

      We appreciate this comment and understand that the original language used in the manuscript to describe strata categorizations risks oversimplification. We have now clarified the text to communicate two points. First, strata are labeled by numbers (Stratum 1, Stratum 2, Stratum 3 and Stratum 4) rather than by functional features of the neurons that belong to them. Second, the assignment of ‘strata’ is just one level of classification available via DC/C-PHATE (as explained below).

      To be sure, we have observed and published (Moyle et al., Nature 2021) that within a given stratum, many neurons share the functional identities that we have used as summary descriptors for the strata (e.g., shallow reflex circuits for Stratum 1; sensory and integrative circuits in Strata 3 and 4; command interneurons in Stratum 2). However, those cell types are not the only members of the strata. We have adjusted the language in lines 197-204 to reflect this more clearly: “Stratum 1, which contains most neurons contributing to shallow reflex circuits that control aversive head movements in response to noxious stimuli, displayed the fewest changes among the developmental connectomes (Figure 3B–F; Supplementary Table 3). In contrast, C. elegans exhibit tractable behaviors that adapt to changing environmental conditions (Flavell et al., 2020). Strata 3 and 4 contain most neurons involved in circuits associated with such learned behaviors, including mechano- and thermosensation. This is reflected in Strata 3 and 4 showing the most change in neuronal relationships across postembryonic development.”

      Comment

      The authors state that NeuroSC can be applied to other model organisms. Since model organisms with greater neuron numbers include more individual neurons per cell class, the authors should support this by quantitatively demonstrating how DC/C-PHATE relationships correlate with shared functional roles among C. elegans neurons.

      We now clarify in the manuscript that, like in other organisms, C. elegans neurons are also grouped into functional classes with shared characteristics. In the context of the cylindrical nerve ring of the animal, these neuronal classes are sometimes bilaterally symmetric (forming left-right pairs), four-fold symmetric and six-fold symmetric. We now explain in the discussion that the DC/CPHATE analyses group these neuron classes and their relationships (lines 442-451). In the specific section mentioned by the reviewer, we now also add new text to contextualize this concept and how it might relate to the possible use of these tools in organisms with larger nervous systems: ‘However, our previous work has demonstrated that DC/CPHATE clustering of C. elegans neurons consistently pulls out clusters of shared neuron classes and shared functional roles Moyle et al. (2021). Building on this foundation, we envision applying similar clustering approaches to larger connectomes, aiming to identify classes and functionally related neuronal groups in more complex nervous systems. We suggest that contact profiles, along with neuron morphologies and synaptic partners, can act as ‘fingerprints’ for individual neurons and neuron classes. These ‘fingerprints’ can be aligned across animals of the same species to create identities for neurons. Frameworks for systematic connectomics analysis in tractable model systems such as C. elegans are critical in laying a foundation for future analyses in other organisms with up to a billion-fold increase in neurons (Toga et al., 2012).’

      Comment

      Lack of surface smoothing in NeuroSC leads to processes sometimes appearing to have gaps, which could be remedied by smoothing with a surface mesh. 

      We thank the reviewer for the suggestion, and understand the visibility of gaps in certain neuron processes can be distracting. But this was an intentional choice, with our main goal being to show the most accurate representation of the available data segmentation and avoid any rendering interpretations. In this way, we render the data with the highest fidelity we can and as close as possible to the ground truth of the EM segmentation. We have added language to describe this in the methods, lines 490-491, and in Figure legend 5b.

      Comment

      Toggling between time points while maintaining the same neurons and contact area in NeuroSC is a really valuable feature. The tool would be improved even more by extending this feature to synapses, specifically by allowing the user to add an entire group of synapses to the viewer at once (e.g. "all synapses between AIM and PVQ"), and to keep this synapse group invariant when toggling between developmental stages.

      We thank the reviewer for this suggestion. In response we have now implemented a new feature to ‘clone’ a rendered scene across time while preserving the original elements to ease comparisons. Once the user has rendered a scene, they can use the in-viewer developmental slider to clone the renderings and assigned colors, but display the renderings of the newly selected timepoint. These renderings populate a new window tab which can be dragged to align developmental stage windows side by side. We have added a sentence to account for this in lines 315-317 and to the legend of supplemental Figure S11. 

      Reviewer #2 (Public review)

      Comment

      The ability to visualize the data from both a connectomics and contactomics perspective across developmental time has significant power. The original C. elegans connectome (White et al., 1986) presented their circuits as line drawings with chemical and electrical synapses indicated through arrows and bars. While these line drawings remain incredibly useful, they were also necessary simplifications for a 2D publication and they lack details of the complex architecture seen within each EM image. Koonce et al take advantage of segmented image data of each neuronal process within the nerve ring to create a web interface where users can visualize 3D models for their neuron of choice. The C-PHATE visualization allows users to explore similarities among different neurons in terms of adjacency and then go directly to the 3D model for these neurons. The 3D models it generates are beautiful and will likely be showing up in many future presentations and publications. The tool doesn't require any additional downloading and is open source.

      We thank the reviewer for this positive assessment of our work.

      Comment

      While it's impossible to create one tool that will satisfy all potential users, I found myself wanting to have numbers associated with the data. For example, knowing the number of connections or the total surface area of contacts between individual neurons wasn't possible through the viewer, which limits the utility of taking deep analytical dives. While connectivity data is readily accessible through other interfaces such as Nemanode and WormWiring, a more thorough integration may be helpful to some users.

      We thank the reviewer for this feedback and in response have now implemented displays with quantitative information in NeuroSC. Now, upon hovering over a contact patch or synapse, the user will see the quantitative data of the relationship. For contact patches, you will see the total area shared between two neurons in that dataset. On hovering over a synapse, you will see how many synapses there are in total with the same members and throughout the dataset. We agree that this improves user analyses, (see also R1.3 response).

      Comment

      There were several issues with the user interface that made it a bit clunky to use. For example, as I added additional neurons to the filter search box, the loading time got longer and longer. I ran an experiment uploading all of the amphid neurons, one pair at a time. Each additional neuron pair added an additional 5-10 seconds to the loading. By the time I got to the last pair, it took over a minute to load. Issues like these, some of which may be unavoidable given the size of the data, could be conveyed through better documentation. I did not find the tutorial very helpful and the supplementary movies lacked any voiceover, so it wasn't always clear what they were trying to show.

      We appreciate that some of the more complex models can take a while to load. One of our core goals is to keep the high resolution of our models to most accurately represent the EM data, so we had to compromise between resolution and loading times. But to address this concern we have now added a ‘loading’ prompt that reassures the user when there is a wait. We also added, as suggested, text guidance throughout all of the supplemental videos (Supplemental Videos 1-4).

      Reviewer #3 (Public review)

      Comment

      A web-based app, NeuroSC, that individual researchers can use to interrogate the structure and organization of the C. elegans nerve ring across development. In the opinion of this reviewer, only minor revisions are required.

      We thank the reviewer for this positive assessment of our work.

      Comment

      Contact is defined by length, why not contact area? How are these normalized for changes in the overall dimensions of neurons during development?

      To clarify our methodology: the adjacency algorithm that we use generates a 2D adjacency profile by summing the number of adjacent boundary points per EM section, which are then summed across all EM z slices.

      Contact area can be derived by multiplying the adjacency length in each slice by the pixel resolution and the z-thickness. Prompted by the reviewer, we now also calculate and display contact surface areas, along with their ranks among all contact relationships for a given neuron. These can be inspected directly in the interface by clicking on a rendered cell or contact patch (Figure S5 and lines 308-312). We believe these additional surface area metrics enhance the interpretability and utility of the viewer.
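
      The conversion from adjacency length to contact area can be written as a single sum. In this sketch, each adjacent boundary point is assumed to contribute one pixel of contact length. The pixel size and section thickness below are placeholder values, not the parameters of any particular dataset.

```python
def contact_area(adjacent_points_per_slice, pixel_size_nm, section_thickness_nm):
    """Approximate contact area (nm^2) from per-slice adjacency counts.

    Each adjacent boundary point is treated as one pixel of contact length, so a
    slice with n points contributes n * pixel_size_nm * section_thickness_nm.
    """
    return sum(n * pixel_size_nm * section_thickness_nm for n in adjacent_points_per_slice)


# Hypothetical values: three sections, 2 nm pixels, 50 nm sections.
area_nm2 = contact_area([120, 80, 100], pixel_size_nm=2.0, section_thickness_nm=50.0)
area_um2 = area_nm2 / 1e6   # 1 um^2 = 1e6 nm^2
print(area_nm2, area_um2)
```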

      We apply normalization at the level of the adjacency threshold to account for dataset-specific differences such as contrast, boundary definition, and age-related changes in neuropil packing density. This normalization is applied before running the adjacency algorithm. We do not normalize by individual neuron size, as the contact data are intended to reflect relational differences between neurons, rather than absolute morphological scaling. In fact, our addition of a scale-spheroid within each rendered model emphasizes the large increase in spatial scale that the nerve ring experiences during larval growth.  

      Comment

      Figure 1, C&D, explanation unclear for how the adjacency matrix is correlated with C-Phate schematic in D.

      We thank the reviewer for the comment and have clarified this section by adding detail to the explanation of how an adjacency matrix is computed (lines 149-155), as well as a description in the legend of Figure 1C. Additionally, we revised Figure 1C and D to simplify the neuron representations and colors and the adjacency heat map gradient. We also extended the area of contact between neurons in Figure 1C to better reflect what would be considered a “contact”. Lastly, we changed the color of the z-plane arrow and label from black to white and adjusted their placement to make them more visible. This highlights how adjacency is computed for each z slice.

      Comment

      Figure 4, panels F & G, unclear why AVF is shown in panel G (L3) but not panel F (L1). Explanation (see below) should be provided earlier, i.e., AVF is not generated until the end of the L1.

      We have now clarified this important point by adding labels to Figure 4 panels F and G, ‘Pre-AVF outgrowth’ and ‘Post-AVF outgrowth’ respectively. Briefly, the point is that AVF grows into the nerve ring after the L2 stage, and that is why it is absent in panel F (L1 stage, now with the label ‘Pre-AVF outgrowth’).  

      Comment

      Line 146 What is the justification for the statement: "By end of Larval Stage 1 (L1), neuronal differentiation has concluded...."? This statement is confusing since this sentence also states that "90% of neurons in the neuropil...have entered the nerve ring..." which would suggest that at least 10% additional NR neurons have NOT fully differentiated.

      We have fixed this sentence in the text. It now reads: ‘By larval stage 1 (L1), 90% of the neurons in the neuropil (161 of the 181 neurons) have grown into the nerve ring and adopted characteristic morphologies and positions.’

      Comment

      Lines 171-175 What is meant by the statement that the "degree of these changes mapped onto...plasticity"? What are examples of "behavioral plasticity"?

      We have added the following new lines of text (lines 200-204) and now also cite a review of C. elegans behaviors to clarify and give context to behavioral plasticity: ‘C. elegans exhibit tractable behaviors which can adapt due to changing environmental conditions (Flavell et al., Genetics 2020). Strata 3 and 4 contain most neurons belonging to circuits associated with such learned behaviors, including chemo-, mechano- and thermosensation. This is seemingly reflected by strata 3 and 4 harboring the most readily recognized set of changes in neuronal relationships across postembryonic development.’

      Comment

      Lines 189-190 The meaning of this sentence is unclear, "The logic in....merge events."

      This sentence has been deleted and we have instead refocused our descriptions of C-PHATES comparisons by neuronal clustering trajectories and cluster members (rather than iterations).

      Comment

      Lines 193-208: This section reports varying levels of convergence across larval development in C-PHATE maps for the interneurons AIML and PVQL. The iterations leading to convergence varied: 16 (L1), 14 (L2), 22 (L3), 20 (L4), 14 (adult). The authors suggest that these differences are biologically significant and reflect the reorganization of AIML and PVQL contact relationships, especially between L4 and the adult. Are these differences in iterations significant?

      We agree this could be confusing and instead of focusing on comparing the iteration at which each merging event occurs, we now focus on examining the differences in members of clusters, before and after the merge event. Cluster membership is easier to interpret than the differences in the number of DC iterations (lines 224-229).

      Lines 240-241 States that AVF neurons "terminally differentiate in the embryo" which is not correct. AVF neurons are generated from neuronal precursors (P0 and P1) at the end of the L1 stage which accounts for their outgrowth into the NR during the L2 stage. 

      We thank the reviewer for the correction and have edited the text to read: ‘AVF neurons are generated from neuronal precursors (P0 and P1) at the end of the L1 stage (Sulston et al., 1983; Sun and Hobert, 2023; Poole et al., 2024; Hall and Altun, 2008; Sulston and Horvitz, 1977). AVF neurons do not grow into the nerve ring until the L2 stage, and continue to grow until the adult stage’ (lines 261-266).

      Comment

      Lines 289-315. A detailed and highly technical description of website architecture would seem more appropriate for the Methods section.

      We agree and have moved this section to the methods as suggested (lines 663-690).

      Comment

      Line 307 "source data is" should be "source data are"

      Thank you- we have fixed this grammatical error.

      Comment

      Line 324 "circuits identities" should be "circuit identity".

      Thank you- we have fixed this grammatical error.

      Comment

      Trademark/copyright conflict with these sites? https://compumedicsneuroscan.com/about/ https://www.neuroscanai.com/

      We thank the reviewer for drawing our attention to this. To avoid potential conflicts, we have proactively altered the name to NeuroSC throughout the paper.

    1. eLife Assessment

      This valuable study reports convincing evidence about associations between 35 polygenic indices (PGIs) for social, behavioral, and psychological traits, along with some non-fatal health conditions (e.g., BMI), and all-cause mortality in data from Finnish population-based surveys and a twin cohort linked with administrative registers. PGIs for education, depression, alcohol use, smoking, BMI, and self-rated health showed the strongest associations with all-cause mortality, on the order of a ~10% increase in risk per PGI standard deviation. Effect sizes from twin-difference analyses tended to be slightly larger than those from population cohorts, the opposite of the pattern generally observed when testing PGI associations with their target phenotypes, supporting the robustness of the findings to confounding by population stratification.

    2. Reviewer #1 (Public review):

      Lahtinen et al. evaluated the association between polygenic scores and mortality. This question has been intensely studied (Sakaue 2020 Nature Medicine, Jukarainen 2022 Nature Medicine, Argentieri 2025 Nature Medicine), and most studies use PRS as an instrument to attribute death to different causes. The present study focuses on polygenic scores for non-fatal outcomes and separates causes of death into "external" and "internal". The majority of the results are descriptive, and the data do not have the power to distinguish effect sizes for the most interesting comparisons: (1) external vs. internal causes of death and (2) the PGI effect vs. the effect of the measured phenotype. I have two main comments:

      (1) The authors should clarify whether the p-values reported in the text remain significant after adjustment for multiple testing. Some of the larger effects might remain significant, for example in Figure 2C (note that the low prediction accuracy of PGIs in older age groups has been extensively studied; see Jiang, Holmes, and McVean, 2021, PLoS Genetics).

      (2) The authors might check whether PGI + phenotype improves performance over phenotype alone. This is similar to, but slightly different from, Model 2 in Table 1.
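      For the reviewer's first point, a minimal sketch of two standard adjustments for testing many PGIs at once is given here: Bonferroni, which controls the family-wise error rate, and Benjamini-Hochberg, which controls the false discovery rate. With 35 PGIs, a Bonferroni threshold at α = 0.05 corresponds to p < 0.05/35 ≈ 0.0014. This is a generic illustration of the techniques, not the authors' analysis; the p-values in the usage line are made up.

```python
import numpy as np


def bonferroni(pvals):
    """Family-wise error control: multiply each p-value by the number of tests."""
    p = np.asarray(pvals, dtype=float)
    return np.minimum(p * p.size, 1.0)


def benjamini_hochberg(pvals):
    """False-discovery-rate control: step-up adjusted p-values (q-values)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity: each adjusted value is the minimum over all larger ranks.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted


p = [0.01, 0.04, 0.03, 0.005]  # illustrative values only
print(np.round(benjamini_hochberg(p), 3).tolist())  # [0.02, 0.04, 0.04, 0.02]
```

      Benjamini-Hochberg is less conservative than Bonferroni, which matters when many correlated PGIs are tested and some true associations are expected.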

    3. Reviewer #2 (Public review):

      Summary:

      This study provides a comprehensive evaluation of the association between polygenic indices (PGIs) for 35 lifestyle and behavioral traits and all-cause mortality, using data from Finnish population- and family-based cohorts. The analysis was stratified by sex, cause of death (natural vs. external), age at death, and participants' educational attainment. Additional analyses focused on the six most predictive PGIs, examining their independent associations after mutual adjustment and adjustment for corresponding directly measured baseline risk factors.

      Strengths:

      Large sample size with long-term follow-up.

      Use of both population- and family-based analytical approaches to evaluate associations.

      Weaknesses:

      It is unclear whether the PGIs used for each trait represent the most current or optimal versions based on the latest GWAS data.

      If the Finnish data used in this study also contributed to the development of some of the PGIs, there is a risk of overestimating their associations with mortality due to overfitting or "double-dipping." Similar inflation of effect sizes has been observed in studies using the UK Biobank, which is widely used for PGI construction.

    1. eLife Assessment

      In this valuable study, the authors developed long-term imaging tools to simultaneously monitor the temporal and spatial dynamics of excitatory and inhibitory synapses and reported that excitatory and inhibitory synapses need to develop synergistically during synaptogenesis to maintain balance. While the analysis and quantification of the imaging data are incomplete, there is convincing evidence that the developed tools are feasible. If these tools can function stably in vivo, their applications will be much broader.

    2. Reviewer #1 (Public review):

      Summary:

      By imaging the dynamics of synaptic proteins in cultured neurons, this study presents significant findings regarding the dynamics of excitatory and inhibitory synaptic proteins during development. The evidence shows that the ratios of excitatory and inhibitory synaptic proteins are stable during synapse development. This discovery advances our understanding of the complex mechanisms governing synapse formation. The strength of the evidence is robust, as it is supported by a combination of biological assays and endogenous labeling.

      Strengths:

      This research sheds light on the dynamics of excitatory and inhibitory synapses during development. It is crucial to understand that while excitatory and inhibitory synapses develop independently, the ratio of their numbers remains relatively stable during development, maintaining a stable excitatory/inhibitory balance.

      Important findings and implications in the research include:

      (1) Persistent Synapse Dynamics: Excitatory and inhibitory synapses remain highly dynamic even in mature neurons (DIV12-14), challenging the dogma that synaptic structures are stable after the synaptogenesis stage.

      (2) Maintained E/I Balance: Despite ongoing synapse turnover (formation/elimination) and presynaptic terminal reduction, the overall density and ratio of excitatory-to-inhibitory synapses remain relatively stable during circuit maturation (Figure 7).

      (3) Developmental Shifts: While presynaptic compartments decrease over time, postsynaptic sites increase, suggesting independent regulation of pre- and postsynaptic elements within a stable E/I framework.

      Weaknesses:

      This study focuses on specific synaptic proteins within synapses, which may not fully represent the dynamics of other synaptic machinery; also, whether similar observations exist in vivo is still unknown. Further research is needed to explore the implications of these findings in more complex neuronal environments.

    3. Reviewer #2 (Public review):

      Summary:

      Garbett et al. identified a critical need to begin to understand the interplay between the assembly, maturation, and elimination of excitatory and inhibitory synapses. They also detail the lack of reliable tools to address this gap in knowledge. Here, the authors developed synaptic reporters expressed by lentiviruses (mClover3-Homer1c, HaloTag-Syb2, and tdTomato-Gephyrin). They combined these reporters with resonance scanning confocal imaging to measure synapses over a 15-hour period during neuron development and in mature neurons in primary hippocampal cultures. Using these reporters in the same neuron, the authors compared the ratios of postsynaptic excitatory and inhibitory specializations that co-localize with presynaptic terminals during development and in mature neurons and found that they are stable across time points. Finally, the authors developed CRISPR/Cas9 tools (TKIT) to knock in endogenous fluorescent tags (GFP/tdTomato-Gephyrin) or epitope tags (HA-Bassoon and HA-Homer1) to begin to study synapse dynamics using endogenous proteins. I believe this paper highlights an important gap in knowledge and begins to offer methodologies to determine the dynamic coordination between excitatory and inhibitory synapses.

      Strengths:

      (1) The experiments are well-designed and carefully controlled.

      (2) The authors carefully validated the reporter and TKIT constructs.

      (3) The authors provide strong proof-of-principle for the use of the reporter constructs to track synapse formation, maintenance, and elimination over a 15-hour period.

      (4) Ingenious use of technologies (reporters, TKIT, and resonance scanning confocal microscopy) to develop a platform for future studies of synapse dynamics.

      (5) Strong evidence supporting that the ratio of excitatory and inhibitory synapses (those that oppose syb2) stays constant through development.

      Weaknesses:

      Overall, this is a well-executed study that develops tools to simultaneously image excitatory and inhibitory synapse dynamics and represents an important first step to address the fundamental question regarding the coordination between these two types of synapses.

      Minor weaknesses of the manuscript include:

      (1) The lack of a characterization of endogenous Homer1-positive excitatory synapses using TKIT.

      (2) Discussion about other approaches to study excitatory and inhibitory synapses using endogenous proteins (e.g., intrabodies - FingR or nanobodies) should be included.

      (3) The activity state of a neuron and/or a synapse might alter the dynamic properties (formation, maintenance, and/or elimination). A discussion on whether the overexpression of Homer1 and/or gephyrin might alter synapse/neuron activity would provide greater interpretability of the results. A discussion of the potential limitations and benefits of the reporter and TKIT approaches would be beneficial.

      (4) A description and interpretation of the computational approach to calculate particle tracking would be helpful. I found that particle tracking figures, while elegant, are difficult to interpret.

    4. Reviewer #3 (Public review):

      In the present study, the authors describe the development of new tools and imaging strategies to assess the concomitant development of excitatory and inhibitory synapses in dissociated neuron cultures. To this end, they generate fluorescently tagged constructs of excitatory and inhibitory synapse marker proteins using either conventional overexpression or CRISPR-based strategies. They then image these marker proteins over a timespan of 15 hours to assess synaptic dynamics at different developmental timepoints. Based on their data, they conclude that excitatory and inhibitory synapse development occur in concert to maintain a functional balance despite individual synapse turnover.

      Overall, this study addresses an interesting question, i.e., the interplay between the development of excitatory and inhibitory synapses, which has important implications, particularly for neurodevelopmental disorders in which the balance of excitation and inhibition is disrupted. The experiments are technically solid and well-executed, and the individual images are highly compelling.

      However, a number of aspects remain to be addressed in order for the study to support the claims made by the authors. First, the novelty aspect of the development of the fluorescently tagged synaptic proteins is unclear, since reporters of this nature are in routine use in many labs. Second, the analysis of the acquired images often seems incomplete, with only example images but no quantification shown, or the distinction between spatial and temporal dynamics appearing unclear. Third, given this incomplete analysis, the interpretations of the authors are not always convincingly supported by the data presented. In conclusion, substantial improvements are required to render the main messages of the study clear and compelling.

    1. eLife Assessment

      This paper presents valuable findings on the processing of sound mixtures in the auditory cortex of ferrets, a species widely used for studies of auditory processing. Using the convenient and relatively high-resolution method of functional ultrasound imaging, the authors provide convincing evidence that background noise invariance emerges across the auditory cortical processing hierarchy. They also draw informative comparisons with previously published fMRI data obtained in humans. This work will be of interest to researchers studying the auditory cortex and the neural mechanisms underlying auditory scene analysis and hearing in noise.

    2. Reviewer #1 (Public review):

      This is a very interesting paper addressing the hierarchical nature of the mammalian auditory system. The authors use an unconventional technique to assess brain responses -- functional ultrasound imaging (fUSI). This measures blood volume in cortex at a relatively high spatial resolution. They present dynamic and stationary sounds in isolation and together, and show that the effect of the stationary sounds (relative to the dynamic sounds) on blood volume measurements decreases as one ascends the auditory hierarchy. Since the dynamic/stationary nature of sounds is related to their perception as foreground/background sounds, this suggests that neurons in higher levels of the cortex may be increasingly invariant to background sounds.

      The study is interesting, well conducted and well written. In the revised manuscript, the authors have addressed all the points I raised in my review.

    3. Reviewer #2 (Public review):

      Summary:

      Noise invariance is an essential computation in sensory systems for stable perception across a wide range of contexts. In this paper, Landemard et al. perform functional ultrasound imaging across primary, secondary and tertiary auditory cortex in ferrets to uncover the mesoscale organization of background invariance in auditory cortex. Consistent with previous work, they find that background invariance increases throughout the cortical hierarchy. Importantly, they find that background invariance is largely explained by progressive changes in spectro-temporal tuning across cortical stations which are biased towards foreground sound features. To test if these results are broadly relevant, they then re-analyze human fMRI data and find that spectro-temporal tuning fails to explain background invariance in human auditory cortex.

      Strengths:

      (1) Novelty of approach: Though the authors have published on this technique previously, functional ultrasound imaging offers unprecedented temporal and spatial resolution in a species where large-scale calcium imaging is not possible and electrophysiological mapping would take weeks or months. Combining mesoscale imaging with a clever stimulus paradigm, they address a fundamental question in sensory coding.

      (2) Quantification and execution: the results are generally clear and well supported by statistical quantification.

      (3) Elegance of modeling: The spectrotemporal model presented here is explained clearly and most importantly, provides a compelling framework for understanding differences in background invariance across cortical areas.

      Comments on revised version:

      The authors have addressed all of my previous concerns, and their publicly shared data are easy to view. This is a nice contribution to the field.

    4. Reviewer #3 (Public review):

      This paper investigates invariance to natural background noise in the auditory cortex of ferrets and humans. The authors first replicate, in ferrets, a finding from human neuroimaging showing that invariance to background noise increases along the cortical hierarchy (i.e. from primary to non-primary auditory cortex). Next, the authors ask whether this pattern of invariance could be explained by differences in tuning to low-level acoustic features across primary and non-primary regions. The authors conclude that this tuning can explain the spatial organization of background invariance in ferrets, but not in humans. The conclusions of the paper are well supported by the data.

      The paper is very straightforwardly written, with a generally clear presentation including well-designed and visually appealing figures. Not only does this paper provide an important replication in a non-human animal model commonly used in auditory neuroscience, but also it extends the original findings in three ways. First, the authors reveal a more fine-grained gradient of background invariance by showing that background invariance increases across primary, secondary and tertiary cortical regions. Second, the authors address a potential mechanism that might underlie this pattern of invariance by considering whether differences in tuning to frequency and spectrotemporal modulations across regions could account for the observed pattern of invariance. The spectrotemporal modulation encoding model used here is a well-established approach in auditory neuroscience and seems appropriate for exploring potential mechanisms underlying invariance in auditory cortex, particularly in ferrets. Third, the authors provide a more complete picture of invariance by additionally analyzing foreground invariance, a complementary measure not explored in the original study.

      Comments on author revisions:

      The authors have thoroughly addressed the concerns raised in my initial review.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      (1) Changes in blood volume due to brain activity are indirectly related to neuronal responses. The exact relationship is not clear; however, we do know two things for certain: (a) each measurable unit of blood volume change depends on the response of hundreds or thousands of neurons, and (b) the time course of the volume changes is slow compared to the potential time course of the underlying neuronal responses. Both of these mean that important variability in neuronal responses will be averaged out when measuring blood changes. For example, if two neighbouring neurons have opposite responses to a given stimulus, this will produce opposite changes in blood volume, which will cancel each other out in the blood volume measurement due to (a). This is important in the present study because blood volume changes are implicitly being used as a measure of coding in the underlying neuronal population. The authors need to acknowledge that this is a coarse measure of neuronal responses and that important aspects of neuronal responses may be missing from the blood volume measure.

      The reviewer is correct: we do not measure neuronal firing but use blood volume as a proxy for bulk local neuronal activity, which does not capture the richness of single neuron responses. This is why the paper focuses on large-scale spatial representations as well as cross-species comparison. For this latter purpose, fMRI responses are on par with our fUSI data, with both neuroimaging techniques showing the same weakness. We have now added this point to the discussion: 

      “Second, we used blood volume as a proxy for local neuronal activity. Thus, our signal ignores any heterogeneity that might exist at the level of local neuronal populations. However, our main findings are related to the large-scale organization of cortical responses and how they relate to those of humans. For this purpose, the functional spatial resolution of our signal, driven by the spatial resolution of neurovascular coupling, should be adequate. In addition, using hemodynamic signals provides a much better comparison with human fMRI data, where the same limitations are present.”

      (2) More importantly for the present study, however, the effect of (b) is that any rapid changes in the response of a single neuron will be cancelled out by temporal averaging. Imagine a neuron whose response is transient, consisting of rapid excitation followed by rapid inhibition. Temporal averaging of these two responses will tend to cancel out both of them. As a result, blood volume measurements will tend to smooth out any fast, dynamic responses in the underlying neuronal population. In the present study, this temporal averaging is likely to be particularly important because the authors are comparing responses to dynamic (nonstationary) stimuli with responses to more constant stimuli. To a first approximation, neuronal responses to dynamic stimuli are themselves dynamic, and responses to constant stimuli are themselves constant. Therefore, the averaging will mean that the responses to dynamic stimuli are suppressed relative to the real responses in the underlying neurons, whereas the responses to constant stimuli are more veridical. On top of this, temporal following rates tend to decrease as one ascends the auditory hierarchy, meaning that the comparison between dynamic and stationary responses will be differently affected in different brain areas. As a result, the dynamic/stationary balance is expected to change as you ascend the hierarchy, and I would expect this to directly affect the results observed in this study.

      It is not trivial to extrapolate from what we know about temporal following in the cortex to know exactly what the expected effect would be on the authors' results. As a first-pass control, I would strongly suggest incorporating into the authors' filterbank model a range of realistic temporal following rates (decreasing at higher levels), and spatially and temporally averaging these responses to obtain modelled cerebral blood flow measurements. I would want to know whether this model showed similar effects to those in Figure 2. My guess is that this model would not predict the effects shown by the authors in Figure 2. Nevertheless, this is an important issue to address and to provide a control for.
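      To make the reviewer's averaging argument concrete, the toy example here passes a transient response (100 ms of excitation followed by 100 ms of inhibition) and a sustained response through a 2-s causal moving average, a crude stand-in for the slow hemodynamic response. The sampling rate, durations, boxcar kernel, and the helper name `causal_boxcar` are assumptions chosen for illustration; a realistic control would use a hemodynamic kernel and the model's actual temporal channels.

```python
import numpy as np


def causal_boxcar(signal: np.ndarray, width: int) -> np.ndarray:
    """Causal moving average over the last `width` samples (crude slow-signal stand-in)."""
    kernel = np.ones(width) / width
    return np.convolve(signal, kernel)[: signal.size]


fs = 1000                      # samples per second (assumed)
n = 5 * fs                     # 5 s of simulated neuronal response
transient = np.zeros(n)
transient[:100] = 1.0          # 100 ms of excitation ...
transient[100:200] = -1.0      # ... followed by 100 ms of inhibition
sustained = np.ones(n)         # constant response to a stationary sound

slow_transient = causal_boxcar(transient, 2 * fs)   # 2-s averaging window
slow_sustained = causal_boxcar(sustained, 2 * fs)

print(np.abs(slow_transient).max())  # approximately 0.05
print(slow_sustained[-1])            # approximately 1.0
```

      In this toy setting the transient response is attenuated about 20-fold relative to the sustained one, which is the kind of bias the reviewer is concerned about; whether it affects the reported invariance depends on the real temporal tuning of each region.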

      We understand the reviewer’s concern about potential differences in response dynamics to stationary vs. non-stationary sounds. The reviewer seems concerned that responses to foregrounds may be suppressed in non-primary fields because foregrounds are not stationary, and non-primary regions could struggle to track and respond to these sounds. However, we observed the opposite: non-primary regions overrepresent non-stationary (dynamic) sounds relative to stationary ones. For this reason, we are inclined to think that this explanation cannot account for our findings.

      We agree that temporal following rates might differ across regions of the auditory hierarchy. In fact, we do show that tuning to temporal rates differs across regions and partly explains the differences in background invariance we observe. In this regard, we think the reviewer’s suggestion is already implemented by our spectrotemporal model, which incorporates the full range of realistic temporal following rates (up to 128 Hz). The temporal averaging is done as we take the output of the model (which varies continuously through time) and average it in the same window as we used for fUSI data. When we fit this model to the ferret data, we find that voxels in non-primary regions, especially VP (tertiary auditory cortex), tend to be more tuned to low temporal rates (Figure 2F, G), and that background invariance is stronger in voxels tuned to low rates. This is, however, not true in humans, suggesting that background invariance in humans relies on different computational mechanisms. We have added a sentence to clarify this: “The model included a range of realistic temporal rates and this axis was the most informative to discriminate foregrounds from backgrounds.”

      (3) I do not agree with the equivalence that the authors draw between the statistical stationarity of sounds and their classification as foreground or background sounds. It is true that, in a common foreground/background situation - speech against a background of white noise - the foreground is non-stationary and the background is stationary. However, it is easy to come up with examples where this relationship is reversed. For example, a continuous pure tone is perfectly stationary, but will be perceived as a foreground sound if played loudly. Background music may be very non-stationary but still easily ignored as a background sound when listening to overlaid speech. Ultimately, the foreground/background distinction is a perceptual one that is not exclusively determined by physical characteristics of the sounds, and certainly not by a simple measure of stationarity. I understand that the use of foreground/background in the present study increases the likely reach of the paper, but I don't think it is appropriate to use this subjective/imprecise terminology in the results section of the paper.

      We appreciate the reviewer’s comment that the classification of our sounds into foregrounds and backgrounds is not verified by any perceptual experiments. We use those terms to be consistent with the literature (McWalter and McDermott, 2018; McWalter and McDermott, 2019), including the paper we derived this definition from (Kell et al., 2019). These terms are widely used in studies where no perceptual or behavioral experiments are included, and even when animals are anesthetized. We have clarified and justified this choice in the beginning of the Results section:

      “We used three types of stimuli: foregrounds, backgrounds, and combinations of those. We use those terms to refer to sounds differing in their stationarity, under the assumption that stationary sounds carry less information than non-stationary sounds, and are thus typically ignored.”

      We have also added a paragraph in the discussion to emphasize the limits of this definition:

      “First, this study defined foregrounds and backgrounds solely based on their acoustic stationarity, rather than perceptual judgments. This choice allowed us to isolate the contribution of acoustic factors in a simplified setting. Within this controlled framework, we show that acoustic features of foreground and background sounds drive their separation in the brain and the hierarchical extraction of foreground sound features.”

      (4) Related to the above, I think further caveats need to be acknowledged in the study. We do not know what sounds are perceived as foreground or background sounds by ferrets, or indeed whether they make this distinction reliably to the degree that humans do. Furthermore, the individual sounds used here have not been tested for their foreground/background-ness. Thus, the analysis relies on two logical jumps - first, that the stationarity of these sounds predicts their foreground/background perception in humans, and second, that this perceptual distinction is similar in ferrets and humans. I don't think it is known to what degree these jumps are justified. These issues do not directly affect the results, but I think it is essential to address these issues in the Discussion, because they are potentially major caveats to our understanding of the work.

      We agree with the reviewer that the foreground-background distinction might be different in ferrets. In anticipation of that issue, we had enriched the sound set with more ecologically relevant sounds, such as ferret and other animal vocalizations. Nevertheless, we have emphasized this limitation in addition to the limitation of our definition of foregrounds and backgrounds in the discussion: 

      “In addition, most of the sounds included in our study likely have more relevance for humans than for ferrets (see Table 1). Despite including ferret vocalizations and environmental sounds that are more ecologically relevant for ferrets, it is not clear whether ferrets would behaviorally categorize foregrounds and backgrounds as humans do. Examining how ferrets naturally orient or respond to foreground and background sounds under more ecologically valid conditions, potentially with free exploration or spontaneous listening paradigms, could help address this issue.”

      Reviewer #2 (Public review):

      (1) Interpretation of the cerebral blood volume signal: While the results are compelling, the authors should exercise more caution in framing their results, given that they are measuring an indirect correlate of neural activity. This is the difference between stating "CBV in area MEG was less background invariant than in higher areas" and "MEG was less background invariant than other areas". Beyond framing, the basic properties of the CBV signal should be explored further:

      a) Cortical vasculature is highly structured (e.g., Kirst et al. (2020) Cell). One potential explanation for the results is simply differences in vasculature and blood flow between primary and secondary areas of auditory cortex, given that fUS is sensitive to changes in blood flow, changes in capillary beds, etc. (Mace et al. (2011) Nat. Methods). This concern could be addressed either by analyzing spontaneous fluctuations in the CBV signal during silent periods or by computing a signal-to-noise ratio of voxels across areas for all sound types. This is especially important given the complex 3D geometry of gyri and sulci in the ferret brain.

      We agree with the reviewer that there could be differences in vasculature across subregions of the auditory cortex and note that this point would also apply to the published human fMRI data. Nevertheless, even if small differences in vasculature were present, they are unlikely to affect our analyses and results, which are designed to be independent of local vascular density. First, we normalize the signal in each voxel using the silent periods, so that the absolute strength of the raw signal, or baseline blood volume in each voxel, is factored out of our analysis. Second, we only focus on reliably responsive voxels in each region and do see comparable sound-evoked responses in all regions (Figure S2). Third, our analysis mostly relies on voxel-based correlation across sounds, which is independent of the mean and variance of the voxel responses. Differences in noise, measured through test-retest reliability, can affect values of correlation, which is why we used a noise-correction procedure. After this procedure, invariance does not depend on test-retest reliability, and differences across regions are still seen when matching for test-retest reliability (new Figure S7). Thus, we believe that differences in vascular architecture across regions are unlikely to affect our results. We added this point to the Methods section when discussing the noise correction:

      “After this correction, the differences we observed between brain regions were present regardless of voxels' test-retest reliability, or noise level (Figure S7). Thus, potential differences in vasculature across regions are unlikely to affect our results.”
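
      As an illustration of why such a correction removes the dependence on noise level, here is a minimal Python sketch of one standard form of noise correction, Spearman's correction for attenuation. The exact procedure used in the manuscript may differ; the function name `noise_corrected_corr`, the simulated data, and the assumption of two independent repeats per condition are ours.

```python
import numpy as np


def noise_corrected_corr(a1, a2, b1, b2):
    """Noise-corrected correlation between responses A and B of one voxel.

    a1, a2: responses to a set of sounds in condition A (e.g., isolated
            sounds), measured in two independent repeats.
    b1, b2: the same sounds in condition B (e.g., mixtures), two repeats.

    The cross-condition correlation is computed across repeats, so that
    measurement noise is not shared, and is then divided by the geometric
    mean of the two test-retest reliabilities (assumed positive, i.e., the
    voxel is reliably responsive).
    """
    r = lambda x, y: np.corrcoef(x, y)[0, 1]
    cross = 0.5 * (r(a1, b2) + r(a2, b1))
    reliability = np.sqrt(r(a1, a2) * r(b1, b2))
    return cross / reliability


# Simulated voxel: the same underlying response in both conditions
# (perfect invariance), observed with noise as strong as the signal.
rng = np.random.default_rng(0)
signal = rng.normal(size=2000)                          # 2000 hypothetical sounds
noisy = lambda: signal + rng.normal(size=signal.size)   # one noisy repeat
a1, a2, b1, b2 = noisy(), noisy(), noisy(), noisy()

raw = np.corrcoef(a1, b1)[0, 1]                    # attenuated by noise, about 0.5
corrected = noise_corrected_corr(a1, a2, b1, b2)   # close to 1
```

      Because the raw correlation shrinks as noise grows while the corrected value does not, comparing corrected values across regions is much less sensitive to regional differences in test-retest reliability.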

      b) Figure 1 leaves the reader uncertain about what exactly is being encoded by the CBV signal, as temporal responses to different stimuli look very similar in the examples shown. One possibility is that the CBV signal is an acoustic change signal. In that case, sounds that are farther apart in acoustic space from preceding sounds would elicit larger responses, which is straightforward to test. Another possibility is that the fUS signal reflects time-varying features of the acoustic signal (e.g., the low-frequency envelope). This could be addressed by cross-correlating the stimulus envelope with the fUS waveform. The third possibility, which the authors argue for, is that the magnitude of the fUS signal encodes stimulus identity. A better justification for only looking at the fUS magnitude in a short time window (2-4.8 s re: stimulus onset) would increase my confidence in the results.

      We thank the reviewer for raising this point, as it highlights that the layout of Figure 1 is misleading. While Figure 1B shows an example snippet of our sound streams, Figure 1D shows the average timecourse of CBV time-locked to a change in sound (foreground or background, isolated or in a mixture). This is the average across all voxels and sounds, intended to illustrate the dynamics for the three broad categories. In Figure 1E, however, we show the cross-validated cross-correlation of CBV across sounds (and different time lags). To obtain this, we compute for each voxel the response to each sound at each time lag, thus obtaining two vectors (size: number of sounds) per lag, one per repeat. We then correlate all these vectors across the two repeats, obtaining one cross-correlation matrix per voxel, and finally average these matrices across all voxels. The presence of red squares with high correlations demonstrates that the signal encodes sound identity, since CBV is more similar across two repeats of the same sound (e.g., in the foreground-only matrix, 0-5 s vs. 0-5 s) than across two different sounds (0-5 s vs. 7-12 s). We modified the figure layout as well as the legend to improve clarity.
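
      The procedure described above can be sketched in a few lines of Python/NumPy. This is a hedged illustration rather than the authors' code: the array layout `(n_voxels, n_sounds, n_lags)`, the function names, and the use of Pearson correlation across sounds are our assumptions.

```python
import numpy as np


def zscore_over_sounds(x):
    # Standardize each (voxel, lag) response vector across sounds (axis 1).
    return (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)


def cross_validated_xcorr(rep1, rep2):
    """rep1, rep2: arrays of shape (n_voxels, n_sounds, n_lags), one per repeat.

    Returns an (n_lags, n_lags) matrix whose entry [i, j] is the Pearson
    correlation, across sounds, between repeat-1 responses at lag i and
    repeat-2 responses at lag j, averaged over voxels.
    """
    z1, z2 = zscore_over_sounds(rep1), zscore_over_sounds(rep2)
    n_sounds = rep1.shape[1]
    per_voxel = np.einsum("vsi,vsj->vij", z1, z2) / n_sounds  # one matrix per voxel
    return per_voxel.mean(axis=0)


rng = np.random.default_rng(0)
rep1 = rng.normal(size=(50, 30, 10))        # 50 voxels, 30 sounds, 10 time lags
rep2 = rep1 + rng.normal(size=rep1.shape)   # second repeat: same responses plus noise
C = cross_validated_xcorr(rep1, rep2)       # (10, 10): high diagonal, near-zero elsewhere
```

      Because the two axes come from independent repeats, high values can only arise from sound-specific responses that are reproducible across repeats, not from shared noise.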

      (2) Interpretation of the human data: The authors acknowledge in the discussion that there are several differences between fMRI and fUS. The results would be more compelling if they performed a control analysis in which they downsampled the ferret fUS data spatially and temporally to match the resolution of fMRI and demonstrated that their ferret results hold at lower spatiotemporal resolution.

      We agree with the reviewer that the use of different techniques might get in the way of cross-species comparison. We already control for the temporal aspect by using the average of stimulus-evoked activity across time (note that, due to scanner noise, sounds are presented cut into short pieces in the fMRI experiments). Regarding the spatial aspect, there are several things to consider. First, the two species have brains of very different sizes, a difference that is conveniently compensated for by the higher spatial resolution of fUSI compared to fMRI (0.1 vs. 2 mm). Downsampling to fMRI resolution would leave one voxel per region per slice, which is not feasible. We also summarize results with one value per region, a form of downsampling that is fairer across species. Furthermore, we believe that we already established in a previous study (Landemard et al., 2021, eLife) that fUSI and fMRI signals are comparable: we could predict human fMRI responses to most sounds from ferret fUSI responses to the same sounds. We clarified these points in the Discussion:

      “In addition, fMRI has a worse spatial resolution than fUSI (here, 2 vs. 0.1 mm voxels). However, this difference in resolution compensates for the difference in brain size between humans and ferrets. In our previous work, we showed that a large fraction of cortical responses to natural sounds could be predicted from one species to the other using these methods (Landemard et al., 2021).”

      Reviewer #3 (Public review):

      As mentioned above, interpretation of the invariance analyses using predictions from the spectrotemporal modulation encoding model hinges on the model's ability to accurately predict neural responses. Although Figure S5 suggests the encoding model was generally able to predict voxel responses accurately, the authors note in the introduction that, in human auditory cortex, this kind of tuning can explain responses in primary areas but not in non-primary areas (Norman-Haignere & McDermott, PLOS Biol. 2018). Indeed, the prediction accuracy histograms in Figure S5C suggest a slight difference in the model's ability to predict responses in primary versus non-primary voxels. Additional analyses should be done to a) determine whether the prediction accuracies are meaningfully different across regions and b) examine whether controlling for prediction accuracy across regions (i.e., subselecting voxels across regions with matched prediction accuracy) affects the outcomes of the invariance analyses.

      The reviewer is correct: the spectrotemporal model tends to perform less well in human non-primary cortex. We believe this does not contradict our results but goes in the same direction: while there is a gradient in invariance in both ferrets and humans, this gradient is predicted by the spectrotemporal model in ferrets but not in humans (possibly because predictions are less accurate in human non-primary auditory cortex). Regardless of the mechanism, this result points to a difference across species. In ferrets, we found significantly better prediction accuracy in VP (p=0.001, permutation test) and no difference between MEG and dPEG (p=0.89). In humans, prediction accuracy was slightly higher in primary than in non-primary auditory cortex, but this effect was not significant (p=0.076). In both species, the gradients in invariance were preserved when prediction accuracy was matched between regions. We have added these analyses to the manuscript (Figure S5).
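
      One simple way to implement such an accuracy-matching control is to bin voxels by prediction accuracy and randomly subsample each region to the same count per bin. The Python sketch below is our illustration under that assumption (bin count, function name, and simulated accuracies are hypothetical); the manuscript's exact matching procedure may differ.

```python
import numpy as np


def match_by_accuracy(acc_a, acc_b, bins=10, rng=None):
    """Return index arrays into acc_a and acc_b selecting voxels from two
    regions so that both subsets have the same histogram of prediction
    accuracy: within each bin, the larger region is randomly subsampled
    down to the count of the smaller one."""
    rng = np.random.default_rng() if rng is None else rng
    edges = np.histogram_bin_edges(np.concatenate([acc_a, acc_b]), bins=bins)
    # Interior edges only, so bin labels run from 0 to bins - 1.
    bin_a = np.digitize(acc_a, edges[1:-1])
    bin_b = np.digitize(acc_b, edges[1:-1])
    keep_a, keep_b = [], []
    for k in range(bins):
        idx_a = np.flatnonzero(bin_a == k)
        idx_b = np.flatnonzero(bin_b == k)
        n = min(idx_a.size, idx_b.size)
        if n == 0:
            continue
        keep_a.extend(rng.choice(idx_a, size=n, replace=False))
        keep_b.extend(rng.choice(idx_b, size=n, replace=False))
    return np.array(keep_a, dtype=int), np.array(keep_b, dtype=int)


rng = np.random.default_rng(1)
acc_primary = rng.uniform(0.2, 0.9, size=400)      # hypothetical accuracies
acc_nonprimary = rng.uniform(0.0, 0.8, size=600)
ia, ib = match_by_accuracy(acc_primary, acc_nonprimary, rng=rng)
```

      Invariance would then be recomputed on the selected voxels only (`ia` in one region, `ib` in the other); repeating the random subsampling several times gives a sense of its variability.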

      A related concern is the procedure used to train the encoding model. From the methods, it appears that the model may have been fit using responses to both isolated and mixture sounds. If so, this raises questions about the interpretability of the invariance analyses. In particular, fitting the model to all stimuli, including mixtures, may inflate the apparent ability of the model to "explain" invariance, since it is effectively trained on the phenomenon it is later evaluated on. Put another way, if a voxel exhibits invariance, and the model is trained to predict the voxel's responses to all types of stimuli (both isolated sounds and mixtures), then the model must also show invariance to the extent it can accurately predict voxel responses, making the result somewhat circular. A more informative approach would be to train the encoding model only on responses to isolated sounds (or even better, a completely independent set of sounds), as this would help clarify whether any observed invariance is emergent from the model (i.e., truly a result of low-level tuning to spectrotemporal features) or simply reflects what it was trained to reproduce.

      We thank the reviewer for this suggestion. We fit the model using only the sounds presented in isolation, and this analysis replicates our main results (new Figure S6). We have added this control to the manuscript:

      “Results were similar if the model was fit solely on isolated sounds, excluding mixtures from the training set (Figure S6).”
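
      To make the logic of this control concrete, here is a hedged Python sketch: a linear encoding model is fit on isolated sounds only and then used to predict responses to the held-out mixtures. Ridge regression, the feature matrix `X` (standing in for spectrotemporal modulation features), and all names are our assumptions; the manuscript's model and regularization may differ.

```python
import numpy as np


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return w, y_mean - x_mean @ w


def predict_mixtures_from_isolated(X, y, is_isolated, alpha=1.0):
    """Fit on isolated sounds only, then predict responses to the remaining
    (mixture) stimuli, which the model never saw during fitting."""
    w, b = fit_ridge(X[is_isolated], y[is_isolated], alpha)
    return X[~is_isolated] @ w + b


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                       # 200 stimuli x 5 hypothetical features
w_true = np.array([1.0, -0.5, 0.8, 0.0, 0.3])
y = X @ w_true + 3.0 + 0.1 * rng.normal(size=200)   # one simulated voxel
is_isolated = np.arange(200) < 120                  # first 120 stimuli: isolated sounds
pred = predict_mixtures_from_isolated(X, y, is_isolated)
```

      Any invariance in the predicted mixture responses can then only come from the model's feature tuning, not from having been fit to the mixtures themselves.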

      Finally, the interpretation of the foreground invariance results remains somewhat unclear. In ferrets (Figure 2I), the authors report relatively little foreground invariance, whereas in humans (Figure 5G), most participants appear to show relatively high levels of foreground invariance in primary auditory cortex (around 0.6 or greater). However, the paper does not explicitly address these apparent cross-species differences. Moreover, the findings in ferrets seem at odds with other recent work in ferrets (Hamersky et al. 2025 J. Neurosci.), which shows that background sounds tend to dominate responses to mixtures, suggesting a prevalence of foreground invariance at the neuronal level. Although this comparison comes with the caveat that the methods differ substantially from those used in the current study, given the contrast with the findings of this paper, further discussion would nonetheless be valuable to help contextualize the current findings and clarify how they relate to prior work.

      We thank the reviewer for this point. While we found a trend for higher background invariance than foreground invariance in ferret primary auditory cortex, this difference was not significant, and many voxels exhibited similar levels of background and foreground invariance (see, for example, Figure 2D, G). Thus, we do not think our results are inconsistent with Hamersky et al., 2025, though we agree that the bias towards background sounds is not as strong in our data. This might indeed reflect differences in methodology, both in the signal that is measured (blood volume vs. spikes) and in the sound presentation paradigm. Our timescales are much slower and likely reflect responses post-adaptation, which may be less the case in Hamersky et al. We have added this point to the Discussion, as well as a comment on the difference between ferrets and humans in foreground invariance in primary auditory cortex:

      “In ferrets, primary auditory cortex has been found to over-represent backgrounds in mixtures compared to foregrounds (Hamersky et al., 2025). In contrast, we found a slight, non-significant bias towards foregrounds in primary regions. This difference could be driven by a difference in timescales, as we looked at slower timescales in which adaptation might be more present, reducing the strength of background encoding. In humans, we found a much smaller gap between background and foreground invariance in primary auditory cortex, which was not predicted by the spectrotemporal model. Additional, more closely controlled experiments would be needed to confirm and understand this species difference.”

      Reviewer #1 (Recommendations for the authors):

      (1) In the introduction, explain the relationship between background/foreground and stationarity/non-stationarity, and thus why stationary/nonstationary stimuli could be used to probe differences in background/foreground processing.

      We have added a sentence at the beginning of the Results section to justify our choice (see public review).

      (2) Avoid use of the background/foreground terminology in Results (and probably Methods).

      For consistency with previous literature, we decided to keep this terminology, though imperfect. We further justified our choice in the beginning of the Results section (see previous point).

      (3) In the Discussion, explain what the implications of the results are for background/foreground processing, and, importantly, highlight any caveats that result from stationarity not being a direct measure of background/foreground.

      We added a paragraph in the Discussion to highlight this point (see public review).

      Reviewer #2 (Recommendations for the authors):

      (1) Figure 1: Showing a silent period in the examples would help in understanding the fUS signal.

      In Figure 1D, we show the average timecourse of CBV time-locked to a change in sound (foreground or background, isolated or in a mixture). This is the average across all voxels and sounds. Thus, it would not be very informative to show an equivalent plot for a silent period, as it would look flat by definition. However, we updated the layout and legend of Figure 1 to make it clearer and avoid confusion.

      (2) "Responses were not homogenous" - would make more sense to say something like "responses were not spatially distributed".

      We removed these words, which were indeed not necessary: “We found that reliable sound-evoked responses were confined to the central part of the ventral gyrus of the auditory cortex.”

      (3) Figure 2D: The maps shown in Figure 2D are difficult to understand for those not initiated in fUS. At a minimum, labels should be added to indicate A-P, M-L, D-V. I cannot see the white square in the primary figure. An additional graphic would be helpful here to understand the geometry of the measurement.

      We thank the reviewer for pointing out that reading these images is indeed an acquired skill. To guide the reader, we added to Figure 1 an annotated anatomical image indicating the main features. We also added the missing white squares.

      (4) Figure 2F: Can the authors better justify why the summary statistic is shown for all three areas, but the individual data only compares primary vs. higher order?

      We now show individual data for all three areas.

      (5) More methods information is needed to understand how recordings were stitched across days. Was any statistical modeling used to factor out the influence of day on overall response levels?

      We simply concatenated voxels recorded across different sessions and days. The slices were sampled randomly to avoid any systematic effect. Because different slices were sampled in different sessions, any spatial structure spanning several slices is unlikely to be artefactual. For instance, the map of average responses in Figure 2A shows a high level of continuity of spatial patterns across slices. This indicates that this pattern reflects a true underlying organization rather than session-specific noise. It also shows that the overall response levels are not affected by the day or recording session. We added a section in the Methods (“Combining different recordings”) to clarify this point:

      “The whole dataset consisted of multiple slices, each recorded in a different recording session. Slices to image on a given day were chosen at random to avoid any systematic bias. Responses were consistent across neighboring slices recorded in different sessions, as shown by the maps of average responses (Figure 2A, Figure S2), where any spatial continuity across different slices must reflect a true underlying signal in the absence of common noise.”

      Reviewer #3 (Recommendations for the authors):

      (1) Figures:

      The figures are generally very well done and visually appealing. However, I have a few suggestions and questions.

      a) In Figure 1G, the delta CBV ranges from 0.5 to 1.5, although in subsequent figures (e.g., Figure 2D), the range is much larger (-15 to 45). Is it possible that the first figure is a proportion rather than a percentage, or is there some other explanation for the massive difference in scale? Not being very familiar with this measure, I found it confusing.

      The same scale is used in both figures; the major difference is that in Figure 1D we take the average over all voxels and sounds (for each category), which includes many non-responsive voxels and, for responsive voxels, sounds to which they respond only weakly. In contrast, Figure 2D shows the response of a single responsive voxel. Thus, the values it reaches for its preferred sounds (45%) are extremes that carry little weight in the average shown in Figure 1D. We have changed the legend of Figure 1D to make this more explicit.

      b) Similar to the first point, the strength of the correlations in the matrices of Figure 1E is very small (~0.05) compared to the test-retest reliabilities plotted in Figure 2B (~0.5). Again, I was confused by this large difference in scale.

      Two main factors explain the difference in values between Figure 1E and Figure 2B. First, in Figure 1E, each correlation is computed on the average activity in a 0.3 s window, as opposed to 2.4 s in Figure 2B. More averaging leads to better SNR, which inevitably leads to higher test-retest correlations. Second, in Figure 1E, the cross-correlation matrices are averaged across all responsive voxels without any reliability criterion, whereas Figure 2B shows example voxels with good test-retest reliability.

      c) In Figure 2D, the example voxels are supposed to be shown in white. It appears that this example voxel is only shown for the non-primary voxel. Please be sure to add these voxels throughout the other panels and figures as well.

      We fixed this mistake and added the example voxel in all panels.

      d) Why do the invariance results (e.g., Figure 2F) for individual animals combine across dPEG and VP, while the overall results (across all animals) split things across all three regions? The results in Table 2 do, in fact, provide this data. Upon further examination of the data in Table 2, it seems like there is only a significant difference between background invariance between dPEG and VP for one of the two animals, and that this might be what drives the effect when pooling across all animals. This seems important to both show visually in the figure and to potentially discuss. There is still very clearly a difference between primary and non-primary, but whether there is a real difference between dPEG and VP seems more unclear.

      We added the values for single animals in the plot and highlighted this limitation in the text:

      “While background invariance was overall highest in VP, the differences within non-primary areas were more variable across animals (see Table 2).”

      e) Again, as in Figure 2F, the cross symbols seem like a bad choice as markers since the vertical components of the cross are suggestive of the error of the measurement. However, no error is actually plotted in these figures. I recommend using a different marker and including some measure of error in the invariance plots.

      We replaced the crosses with circles to avoid confusion. The measure of error is provided by the representation of values for single animals.

      f) The caption for Figure 4C states that each line corresponds to one animal, but does not precisely state what this line represents. Is this the median or something?

      Each line indeed represents the median across voxels for one animal. We added this information to the legend.

      g) In Figure 5, the captions for panels D and E are swapped.

      This has now been corrected.

      (2) Discussion:

      (a) In the paragraph on methodological differences, it mentions that the fMRI voxel size is around 2 mm. This may be true in general, but given the comparison to Kell & McDermott 2019, the voxel size should reflect that used in their study (1 mm).

      The reviewer may be referring to this sentence from the Methods of Kell et al., 2019: “T1-weighted anatomical images were collected in each participant (1-mm isotropic voxels) for alignment and cortical surface reconstruction.” However, this does not correspond to the resolution of the functional data, which is 2 mm, as stated further on in the Methods: “In-plane resolution was 2 × 2 mm (96 × 96 matrix), and slice thickness was 2.8 mm with a 10% gap, yielding an effective voxel size of 2 × 2 × 3.08 mm.”
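
      As a quick arithmetic check on the quoted values (nothing beyond the quoted thickness, gap, and in-plane size is assumed), the effective slice spacing and voxel volume follow as:

```latex
\Delta z_{\text{eff}} = 2.8\,\text{mm} \times (1 + 0.10) = 3.08\,\text{mm},
\qquad
V_{\text{eff}} = 2 \times 2 \times 3.08\,\text{mm}^3 = 12.32\,\text{mm}^3
```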

      (b) In the next paragraph on the control of attention, it mentions that attentional differences could play a role. However, in Kell & McDermott 2019, they manipulated attention (attend visual versus attend auditory) and found that it did not substantially affect the observed pattern invariance. I suppose it could potentially affect the degree to which an encoding model could explain the invariance. This seems important, and given that the data was already collected, it could be worth it to analyze that data.

      As the reviewer points out, Kell et al. 2019 ran an additional experiment in which they manipulated auditory vs. visual attention. However, the auditory task was based only on loudness; it ensured that participants were awake and attending to the stimuli, but not specifically to the foreground or background. This type of attention did not change the observed patterns of invariance, whereas selective attention to backgrounds or foregrounds in the mixture might have. Given that these manipulations were not done in the ferret experiments, we chose not to include the analysis of this dataset in the scope of this paper. However, future work investigating this topic further would indeed be of interest.

      (c) The mention of "a convolutional neural network trained to recognize digits in noise" should make more obvious that this is visual recognition rather than auditory recognition.

      We clarified this sentence to make clear that the recognition is visual and not auditory: “For instance, in a convolutional neural network trained to visually recognize digits in different types of noise, when local feedback is implemented, early layers encode noise properties, while later layers represent clean signal.”

      (d) Finally, one explanation of the results in the discussion is that "primary auditory areas could be recruited to maintain background representations, enabling downstream cortical regions to use these representations to specifically suppress background information and enhance foreground representations." This "background-related information" being used to "facilitate further extraction of foregrounds" is similar to what is argued in Hicks & McDermott PNAS 2024.

      We thank the reviewer for suggesting this relevant reference and added it in this paragraph of the discussion.

      (3) Methods:

      In the "Cross-correlation matrices" section, it mentions that time-averaged responses from 2.4 to 4.8 s were used. It would be helpful to provide an explanation of why this particular time window was used. Additionally, I wondered whether one could look at adaptation type effects (e.g., that of Khalighinejad et al., 2019) or whether fUSI does not offer this kind of temporal precision?

      The effects shown in Khalighinejad et al., 2019, are indeed likely too fast to be observed with our methods. However, there are still dynamics in the fUSI signal and in its invariance (Figure S1). Each individual combination of foreground and background is presented for 4.8 s (Figure 1B). Therefore, we chose the range 2.4-4.8 s as the largest window we could use (to improve SNR) while minimizing contamination from the previous or next sound (indeed, blood volume typically lags neuronal activity by 1.5-2 s). We added this explanation to the Methods.
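
      The reasoning behind the 2.4-4.8 s window can be checked with a simple pure-delay approximation, in which blood volume at time t reflects neuronal activity at t minus the lag. The Python sketch below is our illustration of that reasoning, with constants taken from the response above and hypothetical function names. It ignores the slow decay of the hemodynamic response, which is one reason to start the window somewhat after the maximal lag rather than exactly at it.

```python
import numpy as np

SEGMENT_S = 4.8          # duration of each foreground/background combination (s)
WINDOW = (2.4, 4.8)      # analysis window, s re: segment onset
LAG_RANGE = (1.5, 2.0)   # typical lag of blood volume behind neuronal activity (s)


def window_reflects_current_segment(window, lag_range, segment_s):
    """Pure-delay approximation: CBV at time t reflects neuronal activity at
    t - lag. Return True if, for every lag in lag_range, the window maps back
    onto the current segment [0, segment_s]."""
    start, stop = window
    lag_min, lag_max = lag_range
    return start - lag_max >= 0 and stop - lag_min <= segment_s


def window_mean(cbv, t, window=WINDOW):
    """Average CBV over window[0] <= t < window[1]; the last axis is time."""
    mask = (t >= window[0]) & (t < window[1])
    return cbv[..., mask].mean(axis=-1)


print(window_reflects_current_segment(WINDOW, LAG_RANGE, SEGMENT_S))      # True
print(window_reflects_current_segment((0.0, 4.8), LAG_RANGE, SEGMENT_S))  # False
```

      A window starting at sound onset would partly reflect the previous sound, which is what the second call flags.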

      In the "Human analyses" section, it is very unclear which set of data was used from Kell & McDermott 2019. For example, that paper contains 4 different experiments, none of which has 7 subjects. Upon closer reading, it seems that only 7 of the 11 participants from Experiment 1 also heard the background sounds in isolation (thus enabling the foreground invariance analyses). However, they stated that there were only 3 female participants in that experiment, while you state that you used data from 7 females. It would be helpful to double-check this and to more clearly state exactly which participants (i.e., from which experiment) were used and why (e.g., why not use data from Experiment 4 in the visual task/attention condition?).

      We added a sentence to clarify which datasets were used: “Specifically, we used data from Experiment 1 which provided the closest match to our experimental conditions, and only considered the last 7 subjects that heard both the foregrounds and the backgrounds in isolation, in addition to the mixtures.” 

      We mistakenly stated that all participants were female: the original dataset includes 3 females and 8 males, of whom we used 7 without information about their sex. We have therefore removed this mention from the text.

      In the "Statistical testing" section, why were some tests done with 1000 permutations/shuffles while others were done with 2000?

      We homogenized and used 1000 permutations/shuffles for all statistical tests.

      (4) Miscellany:

      (a) The Hamersky et al. 2023 preprint has recently been published (referenced in the public review), and so you could consider updating the reference.

      This reference has now been updated.

      (b) There are a few borderline statistical tests that could use a bit more nuance. For example (on page 4), "In primary auditory cortex (MEG), there was no significant difference between values of foreground invariance and background invariance (p = 0.063, obtained by randomly permuting the sounds' background and foreground labels, 1000 times)." This test is quite close to being significant, and this might be acknowledged.

      We emphasized the trend to nuance the interpretation of these results: “In primary auditory cortex (MEG), foreground invariance was slightly lower than background invariance, although this difference was not significant (p=0.063, obtained by randomly permuting the sounds' background and foreground labels, 1000 times).”
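
      For readers less familiar with this kind of test, here is a hedged Python sketch of a paired label-swap permutation test with 1000 permutations. It is a simplified stand-in: it swaps foreground/background labels on precomputed paired values, whereas the manuscript permutes the sounds' labels and recomputes invariance; names and data are hypothetical. In either case, a p-value of 0.063 means that about 6% of permutations produced a difference at least as large as the observed one.

```python
import numpy as np


def label_swap_permutation_test(fg, bg, n_perm=1000, rng=None):
    """Two-sided paired permutation test of mean(fg) - mean(bg).

    Under the null hypothesis the foreground/background labels are
    exchangeable, so on each permutation the labels of every pair are
    swapped with probability 0.5 (equivalent to flipping the sign of the
    paired difference).
    """
    rng = np.random.default_rng() if rng is None else rng
    diff = np.asarray(fg, dtype=float) - np.asarray(bg, dtype=float)
    observed = diff.mean()
    flips = rng.choice([-1.0, 1.0], size=(n_perm, diff.size))
    null = (flips * diff).mean(axis=1)
    # The +1 terms count the observed labelling as one of the permutations,
    # so p can never be exactly 0 (its minimum here is 1/1001).
    p = (np.sum(np.abs(null) >= abs(observed)) + 1) / (n_perm + 1)
    return observed, p


rng = np.random.default_rng(0)
bg_vals = rng.uniform(0.2, 0.8, size=50)   # hypothetical paired values
obs_same, p_same = label_swap_permutation_test(bg_vals, bg_vals, rng=rng)
obs_shift, p_shift = label_swap_permutation_test(bg_vals + 1.0, bg_vals, rng=rng)
```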

      (5) Potential typos:

      (a) Should the title be "natural sound mixtures" instead of "natural sounds mixtures"?

      (b) The caption for Figure 1 says "We imaged the whole auditory through successive slices across several days." I believe this should be "the whole auditory [cortex]."

      (c) In the first paragraph of the discussion, there is a sentence ending in "...are segregated in hemody-namic signal." I believe this should be "hemody-namic signal."

      These errors are now all corrected.

    1. eLife Assessment

      This valuable study characterises receptors for calcitonin-related peptides from a deuterostomian animal, the echinoderm Apostichopus japonicus, by a combination of heterologous expression, pharmacological experiments, and the quantification of gene-expression levels. The authors provide solid evidence for a functional calcitonin-related peptide system in the sea cucumber, but further work will be needed to confirm the proposed phylogenetic relationships and physiological functions of the PDF receptor system in this species. This work should be of interest to scientists studying the signaling pathways, functions, and evolution of neuropeptides, and could be of relevance to improving the culture conditions of this economically important species.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript characterizes a functional peptidergic system in the echinoderm Apostichopus japonicus that is related to the widely conserved family of calcitonin/diuretic hormone 31 (CT/DH31) peptides in bilaterian animals. In vitro analysis of receptor-ligand interactions, using multiple receptor activation assays, identifies three cognate receptors for two CT-like peptides in the sea cucumber, which stimulate cAMP, calcium, and ERK signaling. Only one of these receptors clusters within the family of calcitonin and calcitonin-like receptors (CTR/CLR) in bilaterian animals, whereas two other receptors cluster with invertebrate pigment dispersing factor receptors (PDFRs). In addition, this study sheds light on the expression and in vivo functions of CT-like peptides in A. japonicus, by quantitative real-time PCR, immunohistochemistry, pharmacological experiments on body wall muscle and intestine preparations, and peptide injection and RNAi knockdown experiments. This reveals a conserved function of CT-like peptides as muscle relaxants and growth regulators in A. japonicus.

      Strengths:

      This work combines both in vitro and in vivo functional assays to identify a CT-like peptidergic system in an economically relevant echinoderm species, the sea cucumber A. japonicus. A major strength of the study is that it identifies three G protein-coupled receptors for AjCT-like peptides, one related to the CTR/CLR family and two related to the PDFR family. A similar finding was previously reported for the CT-related peptide DH31 in Drosophila melanogaster that activates both CT-type and PDF-type receptors. Here, the authors expand this observation to a deuterostomian animal, which suggests that receptor promiscuity is a more general feature of the CT/DH31 peptide family and that CT/DH31-like peptides may activate both CT-type and PDF-type receptors in other animals as well.

      Besides the identification of receptor-ligand pairs, the downstream signaling pathways of AjCT receptors have been characterized, revealing broad and in some cases receptor-specific effects on cAMP, calcium, and ERK signaling.

      Functional characterization of the CT-related peptide system in heterologous cells is complemented with ex vivo and in vivo experiments. First, peptide injection and RNAi knockdown experiments establish transcriptional regulation of all three identified receptors in response to changing AjCT peptide levels. Second, ex vivo experiments reveal a conserved role for the two CT-like peptides as muscle relaxants, which have differential effects on body wall muscle and intestine preparations. Finally, peptide injection and knockdown experiments uncover a growth-promoting role for one CT-like peptide (AjCT2). Injection of AjCT2 at high concentration, or long-term knockdown of the AjCT precursor, affects diverse growth-related parameters including weight gain rate, specific growth rate, and transcript levels of growth-regulating transcription factors. The authors also reveal a growth-promoting function for the PDFR-like receptor AjPDFR2, suggesting that this receptor mediates the effects of AjCT2 on growth.

      Weaknesses:

      The authors present a more detailed phylogenetic analysis in the revised version, including a larger number of species. But some clusters in the analysis are not well supported because they have only low bootstrap values. This makes it difficult to interpret the clustering in some parts of the tree.

      Expression of CT-like peptides was investigated both at transcript and protein level, but insight into the expression of the three peptide receptors is limited. This makes it difficult to understand the mechanism underlying the (different) functions of the two CT-like peptides in vivo. The authors identify differences in signal transduction cascades activated by each peptide, which might underpin distinct functions, but these differences were established only in heterologous cells.

      The authors show overlapping phenotypes for a long-term knockdown of the AjCT precursor and the AjPDFR2 receptor, suggesting that the growth-regulating functions of AjCT2 are mediated by this receptor pathway. However, it remains unclear whether this mechanism underpins the growth-regulating function of AjCT2, until further in vivo evidence for this ligand-receptor interaction is presented. For example, the authors could investigate whether knockdown of AjPDFR2 attenuates the effects of AjCT2 peptide injection. In addition, a functional PDF system in this species remains uncharacterized, and a potential role of PDF-like peptides in growth regulation has not yet been investigated in A. japonicus. Therefore, it also remains unclear whether the ability of CT-like peptides to activate PDFRs is an evolutionarily ancient property of this peptide family or whether this is an example of convergent evolution in some protostomian (Drosophila) and deuterostomian (sea cucumber) species.

    3. Reviewer #2 (Public review):

      Summary:

      The authors show that A. japonicus calcitonins (AjCT1 and AjCT2) activate not only the calcitonin/calcitonin-like receptor, but they also activate the two "PDF receptors", ex vivo. They also explore secondary messenger pathways that are recruited following receptor activation. They determine the source of CT1 and CT2 using qPCR and in situ hybridization and finally test the effects of these peptides on tissue contractions, feeding and growth. This study provides solid evidence that CT1 and CT2 act as ligands for calcitonin receptors; however, evidence supporting cross-talk between CT peptides and "PDF receptors" is weak.

      Strengths:

      This is the first study to report pharmacological characterization of CT receptors in an echinoderm. Multiple lines of evidence in cell culture (receptor internalization and secondary messenger pathways) support this conclusion.

      Weaknesses:

      The authors claim that A. japonicus CTs activate "PDF" receptors and suggest that this cross-talk is evolutionarily ancient, since a similar phenomenon also exists in the fly Drosophila melanogaster. These conclusions are not fully supported. The authors perform a phylogenetic analysis to show that the two "PDF" receptors form an independent clade. The bootstrap support is quite low in many instances, especially for the deuterostomian and protostomian PDFR clades, where it is below 30. With such low support, it is unclear whether the clade of deuterostomian "PDFRs" in fact comprises PDFRs and not another receptor type whose endogenous ligand (besides CT) remains to be discovered.

    1. eLife Assessment

      This fundamental study examines infection of the liver and hepatocytes during tuberculosis infection. The authors convincingly demonstrate that aerosol infection of mice and guinea pigs leads to appreciable infection of the liver as well as the lung. A further strength of the study lies in clinical evaluation of the presence of tuberculosis bacteria in human autopsied liver samples from individuals with miliary tuberculosis and the presence of a clear granuloma-like structure, which will prompt further study.

    2. Reviewer #1 (Public review):

      Summary:

      The authors showed the presence of Mtb in human liver biopsy samples from TB patients and reported that chronic Mtb infection causes immune-metabolic dysregulation. The authors showed that Mtb replicates in hepatocytes in a lipid-rich environment created by upregulating the transcription factor PPARγ. The authors also reported that Mtb protects itself from anti-TB drugs by inducing drug-metabolizing enzymes.

      Strengths:

      It has been shown that Mtb induces triacylglycerol storage in macrophages through induction of WNT6/ACC2, which helps its replication and intracellular survival; however, the creation of a favorable replicative niche in hepatocytes by Mtb has not been reported. It is known that Mtb infects macrophages and induces the formation of lipid-laden foamy macrophages, which eventually causes tissue destruction in TB patients. A recent article reported that "A terpene nucleoside from M. tuberculosis induces lysosomal lipid storage in foamy macrophages", showing how Mtb manipulates host defense mechanisms for its survival. In this manuscript, the authors reported an increase in lipid droplets in Mtb-infected hepatocytes and convincingly showed that fatty acid synthesis and triacylglycerol formation are important for the growth of Mtb in hepatocytes. The authors also described the molecular mechanism of lipid accumulation, showing that PPARγ, a transcription factor associated with lipid biogenesis, and adipogenic genes were upregulated in Mtb-infected cells.

      The authors' comparison of gene expression data between macrophages and hepatocytes is important: it indicates that Mtb modulates different pathways in different cell types, with immune response pathways affected in macrophages and metabolic pathways in hepatocytes.

      The authors also reported that Mtb residing in hepatocytes showed a drug-tolerant phenotype due to upregulation of enzymes involved in drug metabolism, and showed that cytochrome P450 monooxygenases that metabolize rifampicin and the NAT2 gene responsible for N-acetylation of isoniazid were upregulated in Mtb-infected cells.

      Weaknesses:

      There are reports of hepatic tuberculosis in pulmonary TB patients, especially immunocompromised patients; therefore, finding granulomas in human liver biopsy samples is not surprising.

      Mtb-infected hepatic cells showed induction of DMEs and NAT, which could enhance drug metabolism by the hepatic cells; as a result, Mtb inside HepG2 cells would be exposed to a reduced drug concentration and show higher drug tolerance. The authors mentioned that "hepatocyte resident Mtb may display higher tolerance to rifampicin". In my opinion, higher drug tolerance by Mtb itself is possible only when the drug-metabolizing enzymes of the intracellular Mtb are upregulated or the drug target is modified. At the end, the authors do mention that the drug tolerance phenotype is better attributed to host intrinsic factors than to Mtb efflux pumps. It would be better if the drug-tolerant phenotype section were rewritten to clarify these points.

      In the revised manuscript, the authors used immunostaining to convincingly show that hepatocytes are a favourable niche for Mtb replication.

      The authors have rewritten the drug-tolerant phenotype section, which now reads better.

      Overall, this paper provides new and important information on how Mtb establishes a favourable niche for growth in hepatocytes and creates a drug-tolerant environment.

    3. Reviewer #2 (Public review):

      The manuscript by Sarkar et al. demonstrates the infection of liver cells/hepatocytes with Mtb and the significance of liver cells for Mtb replication through reprogramming of lipid metabolism during tuberculosis. In addition, the present study shows that, similar to Mtb infection of macrophages (reviewed in Chen et al., 2024; Toobian et al., 2021), Mtb infects liver cells but multiplies to a greater extent owing to the consumption of enhanced lipid resources mediated by PPARg, an effect that could be cleared by PPARg inhibitors. The strength of the study lies in the clinical evaluation of the presence of Mtb in human autopsied liver samples from individuals with miliary tuberculosis and the presence of a clear granuloma-like structure. The observation of a granuloma-like structure in the liver is interesting and prompts further investigation in the field.

      The modulation of lipid synthesis during Mtb infection, such as PPARg upregulation, appears generic to different cell types, including both liver cells and macrophages. It is also known that infection affects PPARγ expression and activity in hepatocytes, and that this can lead to lipid droplet accumulation in the liver and the development of fatty liver disease (as shown for HCV). This study follows a similar line for Mtb infection. Because the liver is the main site of lipid regulation, lipid resources are more abundant there, and the replication rate is correspondingly higher. In short, the observations from the study confirm earlier studies in these additional cell types. It is known that the higher the lipid content, the greater the proportion of lipid droplet-positive Mtb and the higher the drug resistance (Mekonnen et al., 2021). The DMEs of liver cells add further to this phenotype.

      Comments on revised version:

      The authors noted that even in experiments where mice were infected with lower CFUs, the presence of Mtb colonies could still be detected in the liver. It would be beneficial to include some experimental data related to this in the supplementary information, as it could provide valuable insights for the research field.

    4. Reviewer #3 (Public review):

      In this revised manuscript, the authors explore how Mtb can infect hepatocytes and create a favorable niche associated with upregulation of the transcription factor PPARγ which presumably allows the bacteria to scavenge lipids from lipid droplets in host cells and upregulate drug-metabolizing enzymes to protect against its elimination. In response to the review, the authors have performed some additional immunostaining of hepatocytes, added more detail to figure legends, added experiments somewhat showing improved colocalization and staining, clarified several points and paragraphs, and updated the referenced literature and discussion.

      The current manuscript provides evidence that human miliary TB patients have infection of hepatocytes with Mtb, with evidence that the bacteria survive at least partially through upregulation of PPARγ, which significantly changes the lipid milieu of the cells. There is also an examination of transcriptomics and lipid metabolism in response to Mtb infection, as well as drug tolerance of Mtb inside hepatocytes. The current manuscript is an improvement over the previous one.

      However, although the manuscript is improved, tissue immunophenotyping of the various cells in the liver remains weak and unconvincing. This is truly a missed opportunity and lessens the rigor of the central findings and conclusions. As pointed out by another reviewer, the literature has described different fates of Mtb in the liver. Given the tissue available to the authors, carefully dissecting which cells harbor the bacteria (especially hepatocytes versus Kupffer cells) is critical. The authors use only two generic markers and do not distinguish among cell types within the tissue slices. A review of the literature shows a variety of both human and mouse antibody markers. In fact, a liver atlas based on immunophenotyping has been published. Likewise, the authors comment on liver granulomas, but this is not justified without immunophenotyping.

    1. eLife Assessment

      This study presents an important finding on the role of GATA4 in aging- and OA-associated cartilage pathology. The conclusions are well supported by compelling in vitro and in vivo evidence. This work will be of broad interest to both cell biologists and orthopedic clinicians.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript assesses the differences between young and aged chondrocytes. Through transcriptomic analysis and further assessments in chondrocytes, GATA4 was found to be increased in chondrocytes from aged donors compared to young donors. In subsequent mechanistic analyses, lentiviral vectors, siRNAs, and a small molecule were used to study the role of GATA4 in young and old chondrocytes. Lastly, an in vivo study was used to assess the effect of GATA4 expression on osteoarthritis progression in a DMM mouse model.

      Strengths:

      This work linked the overexpression of GATA4 to NF-kB signaling pathway activation and alterations to the TGF-b signaling pathway, and found that GATA4 increased the progression of OA compared to the DMM control group, indicating that GATA4 contributes to the onset and progression of OA in aged individuals.

      Comments on revised version:

      Great work! All my concerns have been well addressed.

    3. Reviewer #2 (Public review):

      Summary:

      This study elucidated the impact of GATA4 on aging- and injury-induced cartilage degradation and osteoarthritis (OA) progression, based on the team's finding that GATA4 expression is positively correlated with aging in human chondrocytes. By integrating cell culture of human chondrocytes, gene manipulation tools (siRNA, lentivirus), biological/biochemical analyses and murine models of post-traumatic OA, the team found that increasing GATA4 levels reduced anabolism and increased catabolism of chondrocytes from young donors, likely through upregulation of the BMP pathway, and that this impact is not correlated with TGF-β stimulation. Conversely, silencing GATA4 by siRNA attenuated catabolism and elevated aggrecan/collagen II biosynthesis of chondrocytes from old donors. The physiological relevance of GATA4 was further validated by the accelerated OA progression observed in lentivirus-infected mice in the DMM model.

      Strengths:

      This is a highly significant and innovative study that provides new molecular insights into cartilage homeostasis and pathology in the context of aging and disease. The experiments were performed in a comprehensive and rigorous manner. The data were interpreted thoroughly in the context of the current literature.

      Weaknesses:

      The only aspect that would benefit from further clarification is a more detailed discussion of aging-associated ECM changes in the context of prior literature.

    4. Reviewer #3 (Public review):

      Summary:

      This is an exciting, comprehensive paper that demonstrates the role of GATA4 on OA-like changes in chondrocytes. The authors present elegant reverse translational experiments that justify this mechanism and demonstrate the sufficiency of GATA4 in a mouse model of osteoarthritis (DMM), where GATA4 drove cartilage degeneration and pain in a manner that was significantly worse than DMM alone. This could pave the way for new therapies for OA that account for both structural changes and pain.

      Strengths:

      (1) GATA4 was identified from human chondrocytes.

      (2) IHC and sequencing confirmed GATA4 presence.

      (3) Activation of SMADs is clearly shown in vitro with GATA4 overexpression.

      (4) The role of GATA4 was functionally assessed in vivo using the mouse DMM model, where the authors uncovered that GATA4 worsens OA structure and hyperalgesia in male mice.

      (5) It is interesting that GATA4 is largely known to be found in cardiac cells and to have a role in cardiac repair, metabolism, and inflammation, among other things listed by the authors in the discussion (in liver, lung, pancreas). What could this new knowledge of GATA4 mean for OA as a potentially systemically mediated disease, where cardiac disease and metabolic syndrome are often co-morbid?

      Weaknesses:

      (1) It would be useful to explain why GATA4 was chosen over HIF1a, which was the most differentially expressed.

      (2) In Figure 5, it would be useful to demonstrate the non-surgical or naive limbs to help contextualize OARSI scores and knee hyperalgesia changes.

      (3) While there appear to be GATA4 small molecule inhibitors in various stages of development that could be used to assess the effects in age-related OA, those experiments are out of scope for the current study.

      Comments on revised version:

      I do not have further comments. Thank you for addressing the previously mentioned concerns.

    1. eLife Assessment

      This important study reports the conservation of sperm-egg envelope binding by demonstrating successful recognition of the micropyle in fish eggs by mouse sperm. The evidence supporting the conclusions drawn is convincing. This study will be of interest to reproductive biologists and clinicians studying the biology of fertilization and fertility.

    2. Reviewer #1 (Public review):

      Summary:

      The paper is well written and investigates the cross-species insemination of fish eggs with mouse sperm. I have a few major and minor comments.

      Strengths:

      The experiments are well executed and could provide valuable insights into the complex mechanisms of fertilization in both species. I found the information presented to be very interesting.

      Weaknesses:

      The rationale of some of the experiments, in particular those using CatSper KO sperm, is, in my view, not well defined.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The paper is well written and investigates the cross-species insemination of fish eggs with mouse sperm. I have a few major and minor comments.

      Strengths:

      The experiments are well executed and could provide valuable insights into the complex mechanisms of fertilization in both species. I found the information presented to be very interesting.

      Thank you.

      Weaknesses:

      The rationale of some of the experiments is not well defined.

      Thank you. In the revised manuscript, we have clarified and expanded the rationale behind each experiment to better highlight the specific questions being addressed and how each approach contributes to our overall investigation. These clarifications have been integrated throughout the Results and Discussion sections. We provide detailed rationale in our point-by-point responses to both reviewers, outlining how each experimental design was motivated by prior findings, hypotheses, or specific gaps in knowledge. We hope these revisions make the experimental logic and progression better defined and more compelling.

      Major Comments:

      (1) Figure 5

      I do not understand the rationale for performing experiments using CatSper-null sperm and CD9-null oocytes. It is well established that CatSper-null sperm are unable to penetrate the zona pellucida (ZP), so the relevance of this approach is unclear.

      We thank the reviewer for this comment. This experiment was conducted as the basis for then evaluating the contributions of progressive and hyperactivated motility to the ability of mouse sperm to locate and traverse the zebrafish micropyle. In earlier experiments (Figures 1 and 3), we assessed whether the sperm-micropyle interaction was robust by comparing it to binding to the mouse zona pellucida and testing whether both interactions persisted after washing, which is a standard approach to distinguish specific binding from non-specific adherence (Avella et al., 2014; Baibakov et al., 2012). We then extended this analysis to CatSper1<sup>Null</sup> sperm: CatSper1<sup>Null</sup> sperm were still capable of binding the zona pellucida comparably to heterozygous controls, though they were unable to cross the zona of Cd9<sup>Null</sup> eggs. These observations served as a validation step for the use of CatSper1<sup>Null</sup> sperm in downstream micropyle interaction assays. We then proceeded to test whether hyperactivated motility, absent in CatSper1<sup>Null</sup> sperm, is required for locating and crossing the micropyle.

      It is indeed well established that CatSper1<sup>Null</sup> sperm are unable to penetrate the zona pellucida, and previous studies have typically used the absence of fertilized eggs as a readout. However, failed fertilization may result from multiple factors, including impaired sperm motility, reduced capacity to bind the zona pellucida, or an inability to penetrate it. To our knowledge, no study has quantitatively assessed the number of CatSper-deficient sperm that successfully bind, cross the zona and reach the perivitelline space. To address this, we used normal oocytes to assess sperm binding and Cd9<sup>Null</sup> oocytes (Le Naour et al., 2000), which allow direct quantification of sperm accumulation in the perivitelline space. We have included a detailed explanation in the Results to clarify this point, lines 352-365 and 376-369.

      (2) Micropyle penetration and sperm motility

      CatSper-null sperm are reportedly unable to cross the micropyle, but this could be due to their reduced motility rather than a lack of hyperactivation per se. Were these experiments conducted using capacitated or non-capacitated spermatozoa? What was the observed motility of CatSper-null sperm during these assays? Clarifying these conditions is essential to avoid drawing incorrect conclusions from the results.

      Thank you for raising these points. Under our IVF conditions, qualitative observations confirmed that CatSper1<sup>Null</sup> sperm displayed and maintained sufficient progressive motility during the first hour post-insemination and exhibited zona binding efficiency comparable to that of CatSper1<sup>Het</sup> controls (Figure 5A and B). This is consistent with previous reports showing that within the first 90 minutes of sperm incubation in media, approximately 20% of CatSper1<sup>Null</sup> sperm preserve motility (Qi et al., 2007). Given previous studies indicating that 15–35% of sperm undergo hyperactivation within 90 minutes (Goodson et al., 2011), and considering that 100,000 progressively motile sperm were used for insemination, we estimate that approximately 3,000 CatSper1<sup>Null</sup> sperm (100,000 × 20% × 15%, using the lower bound of the hyperactivation range) would have been hyperactivated in the cross-species insemination dish (mouse sperm x zebrafish eggs) if CatSper were not required. Based on these numbers, we would have expected at least some sperm to locate the micropyle if hyperactivation were not required for its detection and entry. Nevertheless, no CatSper1<sup>Null</sup> sperm were detected in proximity to the micropyle canal, its opening, or within the inter-chorion space (ICS). These observations support the conclusion that the inability of CatSper1<sup>Null</sup> sperm to locate and enter the micropyle is attributable to their failure to hyperactivate. Also, all sperm used in these assays were exposed to identical capacitating conditions (HTF/HSA, 37 °C, 5% CO2). We now clarify this in the Methods, line 624, and we added more rationale under the Results, lines 361-365 and in the Discussion, lines 470-483.

      (3) Rheotaxis and micropyle navigation

      Previous studies have shown that CatSper-null sperm fail to undergo rheotaxis. Could this defect be related to their inability to locate and penetrate the micropyle? Exploring a potential shared mechanism could be informative.

      Thank you for raising this interesting point. Indeed, homozygous mutant mice lacking expression of a different component of the CatSper channel, CatSperz, show reduced rheotactic efficiency and severe subfertility (Chung et al., 2017). We cannot exclude that complete lack of CatSper as shown in CatSper1<sup>Null</sup> mice could lead to reduced rheotactic efficiency, hence we include this interpretation in the Discussion (lines 484-486).

      (4) Lines 61-74

      This paragraph omits important information regarding acrosomal exocytosis, which occurs prior to sperm-egg fusion. Including this detail would strengthen the discussion.

      Thank you. We have revised the text in the discussion to describe the process of acrosome exocytosis, and its relevance for fertilization (lines 504-518).

      Reviewer #2 (Public review):

      Summary:

      Garibova et al. investigated the conservation of sperm recognition and interaction with the egg envelope in two groups of distantly related animals: mammals (mouse) and fish (zebrafish). Previous work and key physiological differences between these two animal groups strongly suggest that mouse sperm would be incapable of interaction with the zebrafish egg envelope (chorion) and its constituent proteins, though homologous to the mammalian zona pellucida (ZP). Indeed, the authors showed that mouse sperm do not bind recombinant zebrafish ZP proteins nor the intact chorion. Surprisingly, however, mouse sperm are able to locate and bind to the zebrafish micropyle, a specialized canal within the chorion that serves as the egg's entry point for sperm. This study suggests that sperm attraction to the egg might be highly conserved from fish to mammals and depends on the presence of a still unknown glycosylated protein within the micropyle. The authors further demonstrate that mouse sperm are able to enter the micropyle and accumulate within the intrachorionic space, potentially through a CatSper-dependent mechanism.

      Strengths:

      The authors convincingly demonstrate that mouse sperm do not bind zebrafish ZP proteins or the chorion. Furthermore, they make the interesting observation that mouse sperm are able to locate and enter the zebrafish micropyle in an MP-dependent manner, which is quite unexpected given the large evolutionary distance between these species, the many physiological differences between mouse and zebrafish gametes, and the largely different modes of both fertilization and reproduction in these species. This may indicate that the sperm chemoattractant in the egg is conserved between mammals and fish; however, whether zebrafish sperm are attracted to mouse eggs was not tested.

      Thank you. We performed an additional experiment with fish sperm used to inseminate ovulated mouse eggs, and results are reported in lines 183-187 and in Supplementary Figure 2.

      Weaknesses:

      The key weakness of this study lies in the rationale behind the overall investigation. In mammals, the zona pellucida (ZP) has been implicated in binding sperm in a taxon-specific manner, such that human sperm are incapable of binding the mouse ZP. Indeed, work by the corresponding author showed that this specificity is mediated by the N-terminal region of the ZP protein ZP2 (Avella et al., 2014). The N-termini of human and mouse ZP2 share 48% identity, which is higher than the overall identity between mouse and zebrafish ZP2, with the latter ortholog entirely lacking the N-terminal domain that is essential for sperm binding to the ZP. Given this known specificity for mouse vs. human sperm-ZP binding, it does not follow that mouse sperm would bind ZP proteins from not only a species that is much more distantly related, but also one that is not even a mammal, the zebrafish. Furthermore, the fish chorion does not play a role in sperm binding at all, while the mammalian ZP can bind sperm at any location. On the contrary, the zebrafish chorion prevents polyspermy by limiting sperm entry to the single micropyle.

      We thank the reviewer for this detailed comment. In this study, our goal was precisely to test the hypothesis that mouse sperm would not bind either recombinant fish ZP proteins or the chorion; in addition, we found it important to examine the observation that mouse sperm could detect the micropyle. We have further elaborated this rationale in the Introduction (lines 93-100).

      In addition, though this study provides some information regarding the broad conservation of sperm-egg interaction mechanisms, the biological relevance of these findings is difficult to describe. Fish and mammals are not only two very distinct and distantly related animal groups but also employ opposite modes of fertilization and reproduction (external vs. internal, oviparous vs. viviparous). Fish gametes interact in a very different environment compared to mammals and lack many typically mammalian features of fertilization (e.g., sperm capacitation, presence of an acrosome, interaction with the female reproductive tract), making it difficult to make any physiologically relevant claims from this study. While this study may indicate conserved mechanisms of sperm attraction to the egg, the identity of the molecular players involved is not investigated. Given this, the reader is forced to question the motivation behind much of the study.

      We thank the reviewer for their perspective, and we appreciate the opportunity to further elaborate on our rationale. As outlined in our Results and Discussion sections, a growing body of evidence supports the presence of conserved molecular players and signaling pathways involved in gamete interaction across species with diverse reproductive strategies. While zebrafish and mice do differ in their fertilization environments and modes of reproduction, these differences do not necessarily exclude conserved molecular mechanisms underlying gamete interaction. For example, the CatSper calcium channel, which plays a key role in regulating sperm motility and hyperactivation, is conserved across a broad range of taxa, from echinoderms such as sea urchins (external fertilizers) (Seifert et al., 2015) to mammals, including mice and humans (internal fertilizers) (Lishko and Mannowetz, 2018). Moreover, sperm from some fish species possess acrosomes that undergo exocytosis prior to fertilization, while sperm cross the micropyle (Psenicka et al., 2010). Also, in ovoviviparous species with internal fertilization, such as the black rockfish, sperm undergo molecular changes while in the female reproductive tract, including immunomodulatory adaptations, glycocalyx remodeling, and interactions with ovarian cells; these changes enable longer-term sperm survival and a selective persistence that ensures only the fittest sperm can successfully fertilize eggs (Li et al., 2024). As for mammalian capacitation, it is broadly defined as the process during which sperm undergo hyperactivation (Yanagimachi, 1970) and acquire the ability to undergo acrosome exocytosis, making them competent for gamete fusion and fertilization (Bhakta et al., 2019; Puga Molina et al., 2018; Yanagimachi, 1957; Yanagimachi et al., 2017). Of note, acrosome exocytosis and changes in sperm motility are not exclusive to internal fertilizers. For example, as we cite in our manuscript (and as stated above), acrosome exocytosis has been described to occur as sturgeon sperm cross the micropyle (Psenicka et al., 2010). As for changes in flagellar motility, investigations in the Pacific herring (Clupea sp.) demonstrated that sperm remain nearly immotile upon release into seawater and only initiate motility when approaching the micropyle region of the egg (Yanagimachi, 1957; Yanagimachi et al., 2017). In other fish, including bitterling and zebrafish, further enhancement of sperm motility is observed as sperm approach the micropyle area (Suzuki, 1958; Yanagimachi et al., 2017). These studies suggest that functional equivalents of capacitation may exist across taxa.

      We interpret the observation that mouse sperm can locate and enter the micropyle as suggesting that underlying guidance mechanisms may be more broadly conserved across distant species than previously recognized. We have now elaborated on these points in the revised Discussion (lines 531-552), and we hope the motivation behind our study is now more clearly articulated.

      During fertilization in fish, the sperm enters the micropyle and subsequently, the egg, as it is simultaneously activated by exposure to water. During egg activation, the chorion lifts as it separates from the egg and fills with water. This mechanism prevents supernumerary sperm from entering the egg after the successfully fertilizing sperm has bound and fused. In this study, the authors show that mouse sperm enter the micropyle and accumulate in the intrachorionic space. Whether any sperm successfully entered the egg is not addressed, and the status of egg activation is not reported.

      We appreciate the reviewer’s detailed comments and the opportunity to elaborate on this important aspect for our cross-insemination assay. We interpret the reviewer’s reference to “sperm entering the egg” as pertaining to sperm adhesion to the oocyte plasma membrane followed by fusion with the egg cell, two separate steps regulated by different molecular players for sperm-egg plasma membrane adhesion (Bianchi et al., 2014; Fujihara et al., 2021; Herberg et al., 2018; Inoue et al., 2005) and for fusion. It is important to note that proteins mediating gamete fusion are still unidentified in fish and mammals (Bianchi and Wright, 2020; Deneke and Pauli, 2021).

      In our cross-species insemination experiments, zebrafish oocytes were maintained in Hank’s solution to limit spontaneous activation; however, as the reviewer correctly notes, activation likely occurred upon exposure to HTF. While this model does not recapitulate full fertilization events, it serves as a platform to explore whether mammalian sperm can detect (within the scope of our study) and respond (future studies) to putative evolutionarily conserved signals, such as those guiding fish sperm toward the micropyle.

      While investigating cross-species sperm–oocyte fusion was not within the scope of this study and would require a distinct set of experimental approaches, we believe this question is an important one. However, we do not expect our platform to be informative for evaluating sperm adhesion to the fish oolemma or for enabling cross-species gamete fusion. In our assays focused on sperm-micropyle interaction, Hoechst staining of nuclei of transgenically-tagged acrosome sperm revealed no evidence of sperm adhesion to or fusion with the fish egg membrane (Figure 4D). Also, molecular incompatibilities may further prevent this interaction: in zebrafish, the Ly6/uPAR family protein Bouncer is expressed exclusively in the egg and is necessary for sperm–egg membrane adhesion (Herberg et al., 2018). Recent studies in zebrafish and mice have shown that a conserved trimeric complex composed of Izumo1, Spaca6, and Tmem81 on the sperm surface is required for mediating adhesion to the oocyte membrane by interacting with the mammalian oocyte receptor Izumo1R (also known as JUNO) or the zebrafish oocyte receptor Bouncer (Deneke et al., 2024). One would hypothesize that for mouse sperm to adhere to the zebrafish egg membrane, the mouse Izumo1-Spaca6-Tmem81 complex would need to establish binding with Bouncer. To explore this possibility, we performed AlphaFold2-Multimer structural predictions and docking analyses to mimic an interaction between mouse Izumo1-Spaca6-Tmem81 and zebrafish Bouncer, using mouse Izumo1-Spaca6-Tmem81 and Juno or zebrafish Izumo1-Spaca6-Tmem81 and Bouncer as positive controls. We observed low binding affinity between zebrafish Bouncer and the mouse trimeric complex (Izumo1, Spaca6, and Tmem81), as indicated by low ipTM scores and high predicted aligned error (PAE) values. These findings suggest that the mouse complex is unlikely to form an interaction with Bouncer (now shown in Suppl. Figure 7). 
These predictions are consistent with our observation that no sperm adhered to or fused with the egg cell. We describe the methods and results in the supplementary files (Supporting Info, lines 53-66) and in the Results section (lines 335-339).

      In Supplementary Videos 3-4, the egg shown has been activated for some time, as evidenced by the separation of yolk and cytoplasm, yet the chorion is only partially expanded (likely due to mouse IVF conditions). How multiple sperm were able to enter the micropyle but presumably not the egg is not addressed, yet this suggests that the zebrafish mechanism of blocking polyspermy (fertilization by multiple sperm) is not effective for mouse sperm or is rendered ineffective due to mouse IVF conditions. The authors do not discuss these observations in the context of either species' physiological process of fertilization, highlighting the lack of biological context in interpreting the results.

      Thank you for raising this important point. One model for mammalian gamete recognition at the zona supports the notion that mouse sperm can penetrate extracellular matrices as long as sperm can bind to them, and binding is dependent on the cleavage status of ZP2. Zonae surrounding unfertilized mouse eggs present uncleaved ZP2 and these zonae support sperm binding. After gamete fusion, the cortical granules release ovastacin, which cleaves ZP2 at the N-terminus; consequently, zonae presenting cleaved ZP2 no longer support sperm binding. This mechanism acts as a block to zona binding and prevents further crossing (Bhakta et al., 2019). Indeed, fertilized mouse eggs or 2-cell embryos surrounded by a zona containing uncleaved ZP2 support de novo sperm binding, and supernumerary sperm cross the zona and accumulate in the perivitelline space, unable to fuse with the fertilized oocyte plasma membrane or blastomere cells (Baibakov et al., 2012, 2007; Burkart et al., 2012; Gahlay et al., 2010). Thus, because mouse sperm could interact with the micropyle opening under our experimental conditions, we interpret these findings to suggest that, once interaction occurs at the micropyle opening, mouse sperm are capable of crossing it, even when the micropyle may be detached from the oocyte due to oocyte activation. Therefore, our data indicate that mouse sperm may be able to bypass the mechanism by which zebrafish oocytes prevent multiple sperm from passing through the micropyle, even after oocyte activation. This point has now been incorporated into the revised Discussion (lines 425-441).

      The authors further show that the zebrafish micropyle does not trigger the acrosome reaction in mouse sperm. Whether the acrosome reacts is not correlated with a sperm's ability to cross the micropyle opening, as both acrosome-intact and acrosome-reacted sperm were observed within the intrachorionic space. While the acrosome reaction is a key event during mammalian fertilization and is required for sperm to fertilize the egg, zebrafish sperm do not contain an acrosome. Thus, these results are particularly difficult to interpret biologically, bringing into question whether this observation has biological relevance or is a byproduct of egg activation/chorion lifting that indirectly draws sperm into the chorion.

      We thank the reviewer for raising this point and we appreciate the opportunity to elaborate on the biological relevance of this experiment. Our motivation to assess acrosome status in mouse sperm following entry into the zebrafish micropyle stemmed from the following biological considerations. In fish species such as the sturgeon, sperm present an acrosome and undergo acrosome exocytosis while passing through the micropyle, before gamete fusion (Alavi et al., 2012; Psenicka et al., 2010). By contrast, zebrafish sperm lack an acrosome, raising the hypothesis that the zebrafish micropyle may not be able to trigger acrosome exocytosis. However, this possibility has not been experimentally tested. We therefore considered it important to investigate whether passage through the zebrafish micropyle induces acrosome exocytosis in mouse sperm. We have revised the Discussion to better clarify the rationale behind the experiment as well as the interpretation of the findings (lines 504-518). Regarding the possibility that chorion lifting indirectly draws sperm into the chorion, we have not observed this phenomenon.

      The final experiments regarding CatSper1's role in mediating mouse sperm entry into the micropyle/chorion are not convincing. As no molecular interactions are described or perturbed, the reader cannot be sure whether the sperm's failure to enter is due to signaling via CatSper1 or whether the overall failure to undergo hyperactivation limits sperm motility such that the mutant sperm can no longer find and enter the zebrafish micropyle. Indeed, in Figure 5E, no CatSper1 mutant sperm are visible near any part of the egg, suggesting that overall motility is impaired, and this is not a phenotype specific to interactions with the micropyle.

      We appreciate the comment and the opportunity to further elaborate on the rationale of this experiment. While our data demonstrate a lack of CatSper1<sup>Null</sup> sperm accumulation within the micropyle and ICS, we appreciate that this may be interpreted as the result of general motility defects, rather than a specific failure of hyperactivation and micropyle recognition. CatSper1<sup>Null</sup> sperm are known to lack hyperactivated motility and exhibit a progressive loss of forward motility over time. After 90 minutes, only ~20% of CatSper1<sup>Null</sup> sperm remain motile, compared to over 70% of fertile sperm (Qi et al., 2007). Of note, under our IVF conditions, CatSper1<sup>Null</sup> sperm retained sufficient progressive motility during the first hour post-insemination to bind the zona pellucida with efficiency comparable to CatSper1<sup>Het</sup> controls. Based on prior reports indicating that 15–35% of sperm exhibit hyperactivation by 90 minutes (Goodson et al., 2011), and considering that we inseminated with 100,000 progressively motile sperm, we estimate that approximately 3,000 hyperactivated CatSper1<sup>Null</sup> sperm were present in the dish. Yet, none were observed near the micropyle canal, its opening, or within the ICS. This led us to conclude that failure to hyperactivate underlies the inability of CatSper1<sup>Null</sup> sperm to reach and traverse the micropyle. We also recognize that identifying the molecular components of the micropyle would allow direct testing of whether the CatSper channel is activated in response to micropyle-associated signals. Indeed, no targeted perturbation of the molecular interactions regulating micropyle recognition was performed in this study, as the molecular identity of the zebrafish micropyle guidance cue remains unknown. Efforts to identify and characterize this factor are ongoing in our lab and lie outside the scope of the current work.
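      The calculation behind the ~3,000 estimate is not broken down above. One reconstruction that reproduces it from the figures cited (an assumption, not necessarily the exact calculation used) applies the ~20% of CatSper1<sup>Null</sup> sperm still motile at 90 minutes and the lower bound of the 15–35% hyperactivation range to the 100,000 progressively motile sperm inseminated:

```latex
$$
N \approx \underbrace{100{,}000}_{\text{motile sperm inseminated}}
\times \underbrace{0.20}_{\text{still motile at 90 min}}
\times \underbrace{0.15}_{\text{hyperactivation, lower bound}}
= 3{,}000
$$
```

      Under the same assumptions, the upper bound of 35% would give 7,000, so ~3,000 sits at the conservative end of the range.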
Therefore, throughout the manuscript, we have clarified that it is the failure to undergo hyperactivation, rather than the absence of CatSper per se, that limits the ability of sperm to locate and traverse the micropyle. The rationale for the experiment, the interpretation of our findings, and relevant future directions have been further elaborated in the revised Abstract, Impact Statement and Discussion (lines 40-41; 46-47; 343-365; 376-379; 389-399; 470-486).

      Reviewer #1 (Recommendations for the authors):

      Minor Comments

      (1) Figure numbering

      There appear to be inconsistencies in the figure references. For example, what is referred to as Figure 3F in the text is actually Figure 4F. Please review and correct all figure labels for accuracy.

      We thank the reviewer for pointing this out. We have carefully reviewed the manuscript and corrected all figure references throughout the text. Also, for better flow and coherence, we have moved the paragraph describing the videos to the end of the Results section titled "Mouse sperm recognize the micropylar region of fish oocytes." Previously, the callout of panels in Figure 3 was out of order (3A, 3B, 3E, 3C, 3D), and this reorganization also helps maintain logical progression through the figure panels.

      (2) Figure 5 terminology:

      The term "normal" sperm should be replaced with "CatSper heterozygous (Het)" sperm to avoid confusion and improve precision.

      We thank the reviewer for this helpful suggestion. We have revised the terminology in Figure 5 and throughout the manuscript, replacing “normal” sperm with “CatSper1 heterozygous (Het)”.

      Reviewer #2 (Recommendations for the authors):

      In addition to my comments in the public review, I would encourage the authors to consider the following suggestions:

      The authors show that mouse sperm can find and enter the fish micropyle, and that this depends on the presence of MP. To better assess sperm binding to the micropyle region, the number of sperm binding to the micropyle vs. non-micropyle chorion should be clearly quantified, as well as the percentage of sperm that enter the micropyle compared to the total used for insemination. The authors state several times throughout the text that a "subpopulation" of mouse sperm finds and enters the micropyle, but it would be more precise and informative to give a percentage.

      We thank the reviewer for this suggestion. We now also report the number of sperm bound to other regions of the chorion ("away"; lines 231-233), as well as the percentage of sperm that entered the micropyle relative to the total number used for insemination (lines 276-279).

      To ensure that all sperm are inside the chorion, the egg should be removed from the insemination dish, washed thoroughly, and then the chorion should be torn open to definitively show that the sperm were indeed inside.

      We thank the reviewer for these excellent suggestions. To ensure that the sperm analyzed were inside the ICS (as now shown in Figures 4A, F, G, Supplementary Figure 6 and Supplementary Movies 3–5), the inseminated oocytes were thoroughly washed prior to imaging so that only sperm located inside the chorion were visualized (as described in the Methods, lines 646-648). In addition, to confirm the spatial localization of sperm within the ICS, we now include additional TEM images showing sperm in the ICS (Figure 4G, right panel). We also generated orthogonal views using ZEN Lite software (Zeiss, Germany) from a z-stack encompassing the full volume of the chorion, ICS, and oocyte (added to the supplementary materials as Supplementary Figure 6). These views display three focal planes: the surface of the WGA-stained chorion, the middle of the ICS, and the oocyte plasma membrane. Sperm nuclei stained with Hoechst are clearly visible below the chorion surface and above the oocyte plasma membrane, confirming their localization within the ICS. Additionally, in a separate set of experiments, as recommended by this reviewer, we mechanically disrupted the chorion and consistently detected sperm within the ICS. This procedure, however, was technically challenging: upon disruption, the chorion often collapsed onto the oocyte, and during the extraction process, sperm were sometimes displaced. As a result, it was not always possible to determine with complete confidence whether the sperm had originally been located inside or outside the chorion. However, we hope that the additional TEM and confocal images (Figure 4G and Supplementary Figure 6) offer further support for the localization of sperm within the ICS.

      I would further suggest that they examine the micropyle opening after the entry of multiple sperm, as well as the dynamics of egg activation during insemination with mouse sperm.

      Thank you. We now include one additional TEM image capturing the full structure of a micropyle that was traversed by multiple mouse sperm (shown in Figure 4G, left panel).

      At what point does the micropyle detach from the egg surface? Live imaging of this process with a confocal microscope would be very informative.

      During live imaging, the interval between placing the oocyte in the imaging dish, replacement of Hank’s solution with HTF and the addition of sperm, followed by the initiation of video acquisition, is approximately 2 to 3 min. By this time, the ICS is already apparent (Supplementary Video 2), although the micropyle appears to remain adherent to the egg cell. Partial detachment of the micropyle from the egg cell begins around 6–7 minutes after imaging starts and continues progressively over time. We provide time-lapse imaging frames to show the micropyle detachment under mouse IVF conditions (Supplementary Figure 5).

      Along the same lines, sperm should be doubly labeled with an acrosome-independent marker, i.e., a live DNA stain or MitoTracker. Then the authors could track if any sperm are actually able to enter the egg itself, which would be highly unlikely but an important detail to confirm.

      Thank you for pointing this out. In our assays designed to study sperm–micropyle interactions, Hoechst staining of the nuclei of sperm with transgenically labeled acrosomes showed no indication of sperm adhesion to, or fusion with, the zebrafish egg cell (Figure 4D).

      Line 242, 282: The text should refer to Figure 4, not 3. Please make sure all figure references correspond to the correct figure and panel.

      Thank you for bringing this to our attention. We have carefully reviewed the manuscript and corrected the reference to Figure 4, along with all other figure and panel citations to ensure they accurately correspond to the correct content. Also, to improve the overall flow, we relocated the paragraph describing the videos to the end of the Results section titled "Mouse sperm recognize the micropylar region of fish oocytes". This change also helped correct the sequence of figure panel references, which were previously cited out of order (i.e., 3A, 3B, 3E, 3C, 3D).

      Line 244: The authors quantify sperm that are "away" from the micropyle, but this is not clearly defined. This should be given as a set radius or distance from the center (e.g., in microns). If the sperm are still motile, can this be accurately measured?

      We thank the reviewer for this valuable suggestion. We have now defined “away from the micropyle” as a distance greater than 160 µm from the center of the micropyle. This measurement was determined using confocal z-stack projections of fixed samples. These details have been added to the revised Methods section (lines 670-674).
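      As an illustration of how the 160 µm criterion can be applied to nucleus positions from a projected z-stack, the sketch below labels each sperm relative to the micropyle center. It is a hypothetical helper, not the analysis pipeline used in the study: the function names, the "near" label for the complementary category, and the use of in-plane (x, y) distances in micrometres are all assumptions.

```python
import math

# Cut-off from the revised Methods: sperm more than 160 µm from the
# micropyle center are counted as "away".
AWAY_THRESHOLD_UM = 160.0


def classify_sperm(micropyle_center, sperm_positions, threshold_um=AWAY_THRESHOLD_UM):
    """Label each sperm as 'near' or 'away' from the micropyle (hypothetical helper).

    micropyle_center: (x, y) in micrometres, measured on a z-stack projection.
    sperm_positions: list of (x, y) nucleus positions in micrometres.
    """
    cx, cy = micropyle_center
    labels = []
    for x, y in sperm_positions:
        # Euclidean distance in the projection plane.
        distance = math.hypot(x - cx, y - cy)
        # Only distances strictly greater than the cut-off count as "away".
        labels.append("away" if distance > threshold_um else "near")
    return labels


def count_labels(labels):
    """Tally the two categories for reporting."""
    return {"near": labels.count("near"), "away": labels.count("away")}


labels = classify_sperm((0.0, 0.0), [(10.0, 0.0), (160.0, 0.0), (200.0, 50.0)])
print(labels)                # ['near', 'near', 'away']
print(count_labels(labels))  # {'near': 2, 'away': 1}
```

      A sperm lying exactly 160 µm from the center is counted as "near" here, matching the "greater than 160 µm" wording; this sketch also assumes fixed samples, since motile sperm would not have a single position.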

      To strengthen the conclusion that the sperm chemoattractant is indeed conserved from fish to mammals, the authors could show that zebrafish sperm are also able to find/approach mouse eggs. Even more compelling would be to show the same is true for other species combinations. As it stands, the choice of comparing mouse and zebrafish does not seem scientifically motivated but rather due to their availability.

      We thank the reviewer for this important suggestion. To test whether zebrafish sperm are capable of binding to the mammalian zona pellucida, we conducted the suggested experiment: ovulated, cumulus-free mouse oocytes were placed in water and incubated with zebrafish sperm. We did not observe any zebrafish sperm bound to the mouse zona pellucida, consistent with the hypothesis that zebrafish sperm do not recognize or interact with mammalian zonae or ZP proteins. This has now been added to the Results (lines 183-187) and is shown in Supplementary Figure 2. We interpret these findings in light of the fact that, in cross-species insemination assays, reciprocity in sperm–egg interaction is not always observed. For example, while human sperm bind only to human zonae and not to mouse zonae, mouse sperm are able to bind both mouse and human zonae (Avella et al., 2014; Baibakov et al., 2012; Bedford, 1977). This asymmetry may reflect species-specific adaptations in sperm–egg recognition. We have now added this point to the revised Discussion to clarify the rationale and context of our approach (lines 416-423).

      As for the choice of experimental models, while we agree that testing additional species combinations would broaden the scope of the findings, the choice to compare mouse and zebrafish was not based solely on availability. Rather, it was motivated by the opportunity to examine sperm guidance across two evolutionarily distant vertebrates. This contrast allows us to search for potential conservation of structural or molecular cues involved in gamete interaction. Additionally, both zebrafish and mouse offer extensive gene editing, blotting and imaging reagents, which are particularly valuable should future studies aim to identify and functionally disrupt genes encoding micropyle-associated proteins and their putative orthologs in mammals.

      For the CatSper experiment, I would suggest that the authors repeat this experiment with another mouse sperm mutant that is known to have reduced/altered motility. With the current data, I do not believe the failure to find/enter the micropyle is necessarily CatSper-specific. Because we do not know what the sperm interacts with in the micropyle or what the MP interacts with on the sperm, the signaling pathway cannot be tested, making other controls necessary for these results to be meaningful.

      Thank you for highlighting this important point. A wide range of mouse models with sperm motility defects exhibit subfertility or infertility due to structural abnormalities in the axoneme or midpiece rigidity (Miyata et al., 2024). These defects often result in impaired progressive motility, failure to reach the zona pellucida, or inability to bind or penetrate it. In contrast, we confirmed that CatSper1<sup>Null</sup> sperm display preserved early progressive motility but fail to transition into hyperactivated motility, making them particularly well suited for specifically assessing the role of hyperactivation in sperm navigation toward and entry into the micropyle. Taken together, these points, along with those discussed in our response to the public review, led us to conclude that the CatSper1<sup>Null</sup> model provides the most biologically relevant context currently available to assess the role of hyperactivation in guiding sperm to the micropyle.

      The authors could greatly strengthen the discussion by addressing the key points I raised in the public review, particularly in terms of interpreting these results in the context of each species' physiological mode of fertilization.

      We thank the reviewer for this important recommendation. We have carefully revised the Discussion to address the key points raised in the public review, particularly by framing our findings within the context of the distinct physiological modes of fertilization in each species, as indicated in our answers to the public review. We hope these additions have strengthened the manuscript as suggested.

    1. eLife Assessment

      The article presents important findings on the impact of climate change on odonates, integrating phenological and range shifts to broaden our understanding of biodiversity change. The study leverages extensive natural history data, offering a convincing analysis of temporal trends in phenology and range limit and their potential drivers.

    2. Reviewer #1 (Public review):

      Summary:

      This study evaluates whether species can shift geographically, temporally, or both ways in response to climate change. It also teases out the relative importance of geographic context, temperature variability, and functional traits in predicting the shifts. The study system is large occurrence datasets for dragonflies and damselflies split between two time periods and two continents. Results indicate that more species exhibited both shifts than one or the other (or neither), and that geographic context and temperature variability were more influential than traits. The results have implications for future analyses (e.g. incorporating habitat availability) and for choosing winner and loser species under climate change. The results also seem to support climate vulnerability assessments for species that rely on geographic range size and geospatial climate data layers rather than more detailed information (like demographic rates, abundances, or traits) that may not be so readily available. The methodology would be useful for other taxa and study regions with strong participatory ("citizen") science and extensive occurrence data.

      Strengths:

      This is an organized and well written paper that builds on a popular topic and moves it forward. It has the right idea and approach, and the results are useful answers to the predictions and for conservation planning (i.e. identifying climate winners and losers). There is technical proficiency and analytical rigor driven by an understanding of the data and its limitations.

    3. Reviewer #2 (Public review):

      Summary:

      This paper explores a highly interesting question regarding how species migration success relates to phenology shifts, and it finds a positive relationship. The findings are significant, and the strength of the evidence is solid. However, there are substantial issues with the writing, presentation, and analyses that need to be addressed. First, I disagree with the conclusion that species that don't migrate are "losers" - some species might not migrate simply because they have broad climatic niches and are less sensitive to climate change. Second, the results concerning species' southern range limits could provide valuable insights. These could be used to assess whether sampling bias has influenced the results. If species are truly migrating, we should observe northward shifts in their southern range limits. However, if this is an artifact of increased sampling over time, we would expect broader distributions both north and south. Finally, Figure 1 is missing panel B, which needs to be addressed.
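      The diagnostic proposed above, comparing the direction of change at the northern and southern range limits, can be written as a toy decision rule. This is an illustrative sketch only, not the authors' analysis: `interpret_limit_shifts` is a hypothetical name, shifts are assumed to be signed distances in km (positive = poleward), and no significance testing or sampling-based null model is applied.

```python
def interpret_limit_shifts(north_shift_km: float, south_shift_km: float) -> str:
    """Toy reading of changes in a species' northern and southern range limits.

    Both arguments are signed shifts in km: positive means the limit moved
    north (poleward), negative means it moved south.
    """
    if north_shift_km > 0 and south_shift_km > 0:
        # Both limits moved poleward: consistent with a genuine range shift.
        return "northward shift"
    if north_shift_km > 0 and south_shift_km < 0:
        # Range widened in both directions: consistent with increased sampling effort.
        return "broadening"
    # Any other combination is not resolved by this simple rule.
    return "unresolved"


print(interpret_limit_shifts(120.0, 40.0))   # northward shift
print(interpret_limit_shifts(120.0, -60.0))  # broadening
```

      In practice, each limit shift would itself carry uncertainty from uneven sampling, which is why such a rule would need to be paired with null models rather than used on raw limit estimates.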

      Comments on revised version:

      The revision has substantially improved the paper.

    4. Reviewer #3 (Public review):

      Summary:

      In their article "Range geography and temperature variability explain cross-continental convergence in range and phenology shifts in a model insect taxon" the authors rigorously investigate the spatial and temporal trends in the occurrence of odonate species and their potential drivers. Specifically, they examine whether species shift their geographic ranges poleward or alter their phenology to cope with changing conditions. Leveraging opportunistic observations of European and North American odonates, they find that species showing significant range shifts also exhibited shifts to earlier emergence. Considering a broad range of potential predictors, their results reveal that geographical factors, but not functional traits, are associated with these shifts.

      Strengths:

      The article addresses an important topic in ecology and conservation that is particularly timely in the face of reports of substantial insect declines in North America and Europe over the past decades. Through data integration the authors leverage the rich natural history record for odonates, broadening the taxonomic scope of analyses of temporal trends in phenology and distribution. The combination of phenological and range shifts in one framework presents an elegant way to reconcile previous findings and informs about the drivers of biodiversity loss.

      Weaknesses:

      To better understand whether species shifting both their ranges and phenology are more successful, or as stated here are 'clear winners', and hence whether those that do neither are more vulnerable would require integrating population trends alongside the discussed response. The ~10% of species that have not shifted their distribution or phenology might not have declined in abundance, if they have rapidly adapted to local changes in climatic conditions (i.e. they might show a plastic response). These species might be the real 'winners', while species that have recently shifted their ranges or phenology may eventually reach hard limits. The authors discuss this limitation but might want to adapt their wording, given the potential for misinterpretation. The finding that species with more northern ranges showed lesser northward shifts would speak to the fact that some species have already reached such a geographical range limit.

      Achievements and impact:

      The results support broad differences in the response of odonate species to climate change, and the prediction that range geography and temperature seasonality are more important predictors of these changes than functional traits. Simultaneously addressing range and phenological shifts highlights that most species exhibit coupled responses but also identifies a significant portion of species that do not respond in these ways that are of critical conservation concern. These results are important for improving forecasts of species' responses to climate change and identifying species of particular conservation concern. Although not exhaustive regarding abundance trends, the study presents an important step towards a general framework for investigating the drivers of multifaceted species responses.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      This study evaluates whether species can shift geographically, temporally, or both ways in response to climate change. It also teases out the relative importance of geographic context, temperature variability, and functional traits in predicting the shifts. The study system is large occurrence datasets for dragonflies and damselflies split between two time periods and two continents. Results indicate that more species exhibited both shifts than one or the other or neither, and that geographic context and temp variability were more influential than traits. The results have implications for future analyses (e.g. incorporating habitat availability) and for choosing winner and loser species under climate change. The methodology would be useful for other taxa and study regions with strong community/citizen science and extensive occurrence data.

      We thank Reviewer 1 for their time and expertise in reviewing our study. The suggestions are very helpful and will improve the quality of our manuscript.

      Strengths:

      This is an organized and well-written paper that builds on a popular topic and moves it forward. It has the right idea and approach, and the results are useful answers to the predictions and for conservation planning (i.e. identifying climate winners and losers). There is technical proficiency and analytical rigor driven by an understanding of the data and its limitations.

      We thank Reviewer 1 for this assessment.

      Weaknesses:

      (1) The habitat classifications (Table S3) are often wrong. "Both" is overused. In North America, for example, Anax junius, Cordulia shurtleffii, Epitheca cynosura, Erythemis simplicicollis, Libellula pulchella, Pachydiplax longipennis, Pantala flavescens, Perithemis tenera, Ischnura posita, the Lestes species, and several Enallagma species are not lotic breeding. These species rarely occur let alone successfully reproduce at lotic sites. Other species are arguably "both", like Rhionaeschna multicolor which is mostly lentic. Not saying this would have altered the conclusions, but it may have exacerbated the weak trait effects.

      We thank the reviewer for their expertise on this topic. We obtained these habitat classifications from field guides and trait databases, and reviewed our primary sources to clarify the trait classifications. We reclassified the species according to the expertise of this reviewer and performed our analysis again; please see details below.

      (2) The conservative spatial resolution (100 x 100 km) limits the analysis to wide-ranging and generalist species. There's no rationale given, so not sure if this was by design or necessity, but it limits the number of analyzable species and potentially changes the inference.

      It is really helpful to have the opportunity to contextualize study design decisions like this one, and we thank the reviewer for the query. Sampling intensity is always a meaningful issue in research conducted at this scale, and we addressed it head-on in this work.

      Very small quadrats covering massive geographical areas will be critically and increasingly afflicted by sampling weaknesses, and they also create a potentially large problem with pseudoreplication. There is no simple solution to this problem. It would be possible to create interpolated predictions of species’ distributions using Species Distribution Models, Joint Species Distribution Models, or various kinds of Occupancy Models. None of these approaches, however, leads to analyses that rely on directly observed patterns. Instead, they are extrapolations, and those extrapolations typically fail when tested (for example, papers by Lee-Yaw demonstrate that it is rare for SDMs to predict well, and occupancy models often perform less well than SDMs and do not capture how things change over time; Briscoe et al. 2021, Global Change Biology). Employing such techniques would therefore make all conclusions speculative, rather than directly observable.

      Rather than employing extrapolative models, we relied on transparent techniques that are used successfully in the core macroecology literature that address spatial variation in sampling explicitly and simply. Moreover, we constructed extensive null models that show that range and phenology changes, respectively, are contrary to expectations that arise from sampling difference. 100km quadrats make for a reasonable “middle-ground” in terms of the effects of sampling, and we added a reference to the methods section to clarify this (see details below).

      (3) The objective includes a prediction about generalists vs specialists (L99-103) yet there is no further mention of this dichotomy in the abstract, methods, results, or discussion.

      Thank you for pointing this out - it is an editing error that should have been resolved prior to submission. We replaced the terms specialist and generalist with specific predictions based on traits (see details below).

      (4) Key references were overlooked or dismissed, like in the new edition of Dragonflies & Damselflies model organisms book, especially chapters 24 and 27.

      We thank Reviewer 1 for making us aware of this excellent reference. We have reviewed the text and include it as a reference, in addition to other references recommended by Reviewer 1 and other reviewers (see details below).

      Reviewer #2 (Public review):

      Summary:

      This paper explores a highly interesting question regarding how species migration success relates to phenology shifts, and it finds a positive relationship. The findings are significant, and the strength of the evidence is solid. However, there are substantial issues with the writing, presentation, and analyses that need to be addressed. First, I disagree with the conclusion that species that don't migrate are "losers" - some species might not migrate simply because they have broad climatic niches and are less sensitive to climate change. Second, the results concerning species' southern range limits could provide valuable insights. These could be used to assess whether sampling bias has influenced the results. If species are truly migrating, we should observe northward shifts in their southern range limits. However, if this is an artifact of increased sampling over time, we would expect broader distributions both north and south. Finally, Figure 1 is missing panel B, which needs to be addressed.

      We thank Reviewer 2 for their time and expertise in reviewing our study.

      It is possible that some species with broad niches may not need to migrate, although in general failing to move with climate change is considered an indicator of “climate debt”, signaling that a species may be of concern for conservation (e.g., Duchenne et al. 2021, Ecology Letters). We revised the discussion to acknowledge potential differences in outcomes (please see details below).

      We used null models to test whether our results regarding range shifts were robust, and if they varied due to increased sampling over time. We found that observed northern range limit shifts are not consistent with expectations derived from changes in sampling intensity (Figure S1, S2). 

      We thank Reviewer 2 for pointing out this error in Figure 1. This conceptual figure was a challenge to construct, as it must illustrate how phenology and range shifts can occur simultaneously or uniquely to enable a hypothetical odonate to track its thermal niche over time. In a previous version of the figure, we had a second panel and we failed to remove the reference to that panel when we simplified the figure. We have updated the figure and figure caption (please see details below).

      Reviewer #3 (Public review):

      Summary:

      In their article "Range geographies, not functional traits, explain convergent range and phenology shifts under climate change," the authors rigorously investigate the temporal shifts in odonate species and their potential predictors. Specifically, they examine whether species shift their geographic ranges poleward or alter their phenology to avoid extreme conditions. Leveraging opportunistic observations of European and North American odonates, they find that species showing significant range shifts also exhibited earlier phenological shifts. Considering a broad range of potential predictors, their results reveal that geographical factors, but not functional traits, are associated with these shifts.

      We thank Reviewer 3 for their expertise and the time they spent reviewing our study. Their suggestions are very helpful and will improve the quality of our manuscript.

      Strengths:

      The article addresses an important topic in ecology and conservation that is particularly timely in the face of reports of substantial insect declines in North America and Europe over the past decades. Through data integration the authors leverage the rich natural history record for odonates, broadening the taxonomic scope of analyses of temporal trends in phenology and distribution to this taxon. The combination of phenological and range shifts in one framework presents an elegant way to reconcile previous findings improving our understanding of the drivers of biodiversity loss.

      We thank Reviewer 3 for this assessment.

      Weaknesses:

      The introduction and discussion of the article would benefit from a stronger contextualization of recent studies on biological responses to climate change and the underpinning mechanism.

      The presentation of the results (particularly in figures) should be improved to reflect the integrative character of the work and help readers extract the main results. While the article is generally well written, the captions and results in particular contain many inconsistencies and lack important detail. Given the many relationships that were tested (e.g., the influence of traits), the article needs more coherence.

      We thank Reviewer 3 for these suggestions. We revised the introduction and discussion to better contextualize species’ responses to climate change and the mechanisms behind them (see details below). We carefully reviewed all figures and captions, and made changes to improve the clarity of the text and the presentation of results (see details below).

      Reviewer #1 (Recommendations for the authors):

      Comment:

      (1) Following weakness #1 in the public review, the authors should review the habitat classifications, consult with an odonatologist, and reclassify many species from Both to Lentic and redo the analysis.

      Thank you for pointing out this disagreement between the expert habitat classifications that we cited and other literature. We reclassified species’ habitat preferences based on classifications by Hof et al., a source that was consistent with your suggestions, and identified additional species as Lentic that our other references had identified as Both. We performed our analysis with this new dataset and, as you suspected, our results did not change qualitatively: species habitat preferences did not predict their range shifts.

      Hof, Christian, Martin Brändle, and Roland Brandl. "Lentic odonates have larger and more northern ranges than lotic species." Journal of Biogeography 33.1 (2006): 63-70.

      Comment:

      (2) Following weakness #2, would it be worthwhile or interesting to analyze a smaller ranging group (e.g. cut the quad size in half, 50 x 50 km) to bring in more species and potentially change the inference? Or is the paper too tightly constructed to allow this, even as a secondary piece?

      Thank you for this comment, as it highlights an important consideration for macroecological analyses: choosing a quadrat size means balancing multiple factors. Identifying drivers of range boundaries is problematic for species with narrow ranges when they are analyzed separately from wide-ranging species, and examining larger quadrats can actually help clarify drivers (Szabo, Algar, and Kerr 2009). The smaller the quadrats, the higher the likelihood that a species is actually present but was never observed, or that a quadrat covers only unsuitable habitat so that the species is absent from all (or almost all) of it. Too many absences can violate model assumptions and create noise that makes it difficult to identify drivers of species’ range and phenology shifts.

      Moreover, we constructed extensive null models showing that the observed range and phenology changes run contrary to expectations arising from sampling differences. Quadrats of 100 × 100 km make for a reasonable “middle ground”, and we have included a brief explanation of this in the text: “We assigned species presences to 100×100 km quadrats, a scale that is large enough to maintain adequate sampling intensity but still relevant to conservation and policy (Soroye et al., 2020), to identify the best sampled species.”  (Lines 170-172).
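      Assigning point records to quadrats amounts to binning projected coordinates on a regular grid. The sketch below is a minimal illustration, not the processing code used in the study: it assumes records are already in an equal-area projection with coordinates in metres, and the function names and example species are hypothetical. Setting `size_m=200_000` gives the coarser 200 × 200 km grid.

```python
import math
from collections import defaultdict

# Hypothetical helper: assumes x/y are projected, equal-area coordinates in metres.
QUADRAT_SIZE_M = 100_000  # 100 × 100 km quadrats

def quadrat_id(x_m, y_m, size_m=QUADRAT_SIZE_M):
    """Return the (column, row) index of the quadrat that contains a projected point."""
    return (math.floor(x_m / size_m), math.floor(y_m / size_m))

def presences_by_quadrat(records, size_m=QUADRAT_SIZE_M):
    """Collapse (species, x_m, y_m) point records into the set of quadrats each species occupies."""
    occupied = defaultdict(set)
    for species, x_m, y_m in records:
        occupied[species].add(quadrat_id(x_m, y_m, size_m))
    return dict(occupied)

records = [
    ("species_A", 150_000, 420_000),
    ("species_A", 199_999, 401_000),  # falls in the same quadrat as the first record
    ("species_A", 250_000, 420_000),
]
print(presences_by_quadrat(records))
```

      Many records collapse into a single presence per quadrat, which is why larger quadrats reduce the chance that a species is present in a cell but never recorded there.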

      Szabo, Nora D., Adam C. Algar, and Jeremy T. Kerr. "Reconciling topographic and climatic effects on widespread and range‐restricted species richness." Global Ecology and Biogeography 18.6 (2009): 735-744.

      Comment:

      (3) Following weakness #3, are specialists the ones that "failed to shift" (L18)? If so please specify. The prediction about generalists vs specialists needs to be removed or incorporated in other parts of the paper.

      Thank you for pointing this out, we intended to suggest that species with more generalist habitat requirements might be better able to shift, but ultimately found that traits did not predict species’ shifts. We corrected our prediction regarding habitat generalists as follows: “We predicted that species able to use both lentic and lotic habitats would shift their phenologies and geographies more than those able to use just one habitat type, as generalists outperform specialists as climate and land uses change (Ball-Damerow et al., 2015, 2014; Hassall and Thompson, 2008; Powney et al., 2015; Rapacciuolo et al., 2017).” (Lines 128-132).

      Comment:

      (4) Following weakness #4, cite Pinkert et al at lines 70-73 and Rocha-Ortega et al at lines 73-77 along with https://doi.org/10.1098/rspb.2019.2645. Add Sandall et al https://doi.org/10.1111/jbi.14457 to L69 references.

      Thank you for the excellent reference suggestions, we have added them as suggested (Lines 80, 86, 77).

      Comment:

      Other comments/suggestions:

      (1) Title: consider adding temp variability 'Range geography and temperature variability, not functional traits,...'.

      Thank you for this suggestion, we have added temperature variability to the title: “Range geography and temperature variability explain cross-continental convergence in range and phenology shifts in a model insect taxon”.

      Comment:

      (2) L125: is (northern) Mexico included in North America?

      Yes, we did include observations from Northern Mexico, and have specified this in the text: “We retained ~1,100,000 records from Canada, the United States, and Northern Mexico, comprising 76 species (Figure 2).” (Lines 174-176).

      Comment:

      (3) L128: I'd label this section 'Temperature variability' rather than 'Climate data'.

      Thank you, we agree that this is a more appropriate title for this section, and have replaced ‘Climate data’ with ‘Temperature variability’ (Line 185).

      Comment:

      (4) Table 2: why are there no estimates for the traits?

      We apologise; this information should have been included in the main body of the manuscript, but was only explained in the Table 2 caption. We have added the following explanation: “Non-significant variables, specifically all functional traits, were excluded from the final models.” (Lines 312-323).

      Comment:

      (5) Figure 2: need to identify the A-D panels.

      We apologise for this error and have clarified the differences between panels in the figure caption:

      “Figure 2: Richness of 76 odonate species sampled in North America and Europe in the historic period (1980-2002; panels A and C) and the recent period (2008-2018; panels B and D). Species richness per 100 × 100 km quadrat is shown in panels A and B, while panels C and D show species richness per 200 × 200 km quadrat. Dark red indicates high species richness, while light pink indicates low species richness.” (Lines 1002-1006).

      Comment:

      (6) L163-173: I am not familiar with this analysis but it sounds interesting and promising, I am not sure if this can be clarified further. Why the -25 to 25, and -30 to 30, doesn't the -35 to 35 cover these? And what is meant by "include only phenology shifts that could be biologically meaningful", that larger shifts would not be meaningful or tied to climate change?

      We used different cutoffs for phenology shifts to screen for outliers that were likely to be errors, potentially due to insufficient sampling for calculating phenology. We clarified the text as follows:

      “We retained emergence estimates between March 1st and September 1st, as well as species and quadrats that showed a difference in emergence phenology of -25 to 25 days, -30 to 30 days, or -35 to 35 days between both time periods, to include only phenology shifts that could be biologically meaningful to environmental climate change (i.e. exclude errors).” (Lines 169-173).
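      The two retention rules quoted above (an emergence window from March 1 to September 1, and a symmetric cutoff on the between-period shift) can be sketched as a simple filter. This is a minimal illustration under stated assumptions: emergence estimates are stored as day of year in a non-leap reference calendar and keyed by (species, quadrat); the data structure, names, and values are hypothetical.

```python
from datetime import date

# Season window as day of year in a non-leap reference year (an assumption of this sketch).
SEASON_START = date(2001, 3, 1).timetuple().tm_yday  # March 1 -> 60
SEASON_END = date(2001, 9, 1).timetuple().tm_yday    # September 1 -> 244
CUTOFFS_DAYS = (25, 30, 35)  # the three symmetric windows described in the text

def retained_shifts(emergence, cutoff):
    """emergence maps (species, quadrat) -> (historic_doy, recent_doy).

    Returns {key: shift} for pairs whose two estimates both fall in the season window
    and whose shift (recent - historic; negative = earlier emergence) lies within ±cutoff days.
    """
    kept = {}
    for key, (historic, recent) in emergence.items():
        in_season = SEASON_START <= historic <= SEASON_END and SEASON_START <= recent <= SEASON_END
        shift = recent - historic
        if in_season and abs(shift) <= cutoff:
            kept[key] = shift
    return kept

emergence = {
    ("species_A", (1, 4)): (150, 140),  # 10 days earlier: kept at every cutoff
    ("species_A", (2, 4)): (150, 110),  # 40 days earlier: excluded as a likely error
    ("species_B", (1, 4)): (120, 150),  # 30 days later: kept only at the 30- and 35-day cutoffs
    ("species_C", (3, 3)): (50, 70),    # historic estimate before March 1: excluded
}
for cutoff in CUTOFFS_DAYS:
    print(cutoff, retained_shifts(emergence, cutoff))
```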

      Comment:

      (7) L193-200: I agree but would make a distinction between ecological vs functional traits, as other studies view geographic traits as ecological manifestations of functional biology, e.g. https://doi.org/10.1016/j.biocon.2019.07.001 and https://doi.org/10.1016/j.biocon.2023.110098.

      Thank you for this suggestion, and for making us aware of the thinking around range geographies as ecological traits. We have specified throughout the manuscript that the ‘traits’ we are considering are ‘functional traits’, changed the methods subsection title to “Range geographies and functional traits” (Line 252), and added a brief discussion of ecological traits: “Geographic range and associated climatic characteristics are often considered ecological traits, as they are consequences of functional traits and their interactions with geographic features (Bried and Rocha-Ortega, 2023; Chichorro et al., 2019).” (Lines 256-259).

      Comment:

      (8) L203: What's the rationale for egg-laying habitat as "biologically relevant to spatial and temporal responses to climate change"? That one's not as obvious as the others and needs a sentence more. Also, I am wondering why other traits were not considered here, like color lightness and voltinism. And why not wing size instead of body size, or better yet the two combined (wing loading) as a proxy for dispersal ability?

      We agree that our rationale for using this trait should be better explained, and we have included the following explanation: “Egg laying habitat was assigned according to whether species use exophytic egg-laying habitat (i.e. eggs laid in water or on land, relatively larger in number), or endophytic egg-laying habitat (i.e. eggs laid inside plants, usually fewer in number); species using exophytic habitats are associated with greater northward range limit shifts (Angert et al., 2011).” (Lines 271-275).

      We considered traits that have been found to be important for range and phenology shifts among odonates, as well as being key traits for expectations for species responses to climate change. Flight duration and body size are correlated with dispersal ability (Powney et al. 2015). Body size is also correlated with competitive ability (Powney et al. 2015), potentially making it an important predictor of a species’ ability to establish and maintain populations in expanding range areas. Traits correlated with range shifts also include breeding habitat type (Powney et al. 2015; Bowler et al. 2021) and egg laying habitat (Angert et al. 2011). Ideally, we would have used dispersal data from mark/release/recapture studies, but it was not available for many of the species included in this study. After finding that none of the functional traits we included were related to range shifts, there was no reason to believe that a further investigation of traits would be meaningful.

      Angert AL, Crozier LG, Rissler LJ, Gilman SE, Tewksbury JJ, Chunco AJ. 2011. Do species’ traits predict recent shifts at expanding range edges? Ecology Letters 14:677–689. doi:10.1111/j.1461-0248.2011.01620.x

      Bowler DE, Eichenberg D, Conze K-J, Suhling F, Baumann K, Benken T, Bönsel A, Bittner T, Drews A, Günther A, Isaac NJB, Petzold F, Seyring M, Spengler T, Trockur B, Willigalla C, Bruelheide H, Jansen F, Bonn A. 2021. Winners and losers over 35 years of dragonfly and damselfly distributional change in Germany. Diversity and Distributions 27:1353–1366. doi:10.1111/ddi.13274

      Powney GD, Cham SSA, Smallshire D, Isaac NJB. 2015. Trait correlates of distribution trends in the Odonata of Britain and Ireland. PeerJ 3:e1410. doi:10.7717/peerj.1410

      Comment:

      (9) L210: I count at least 5 migratory species in table S3, so although maybe not enough to analyze it's misleading to say "nearly all" were non-migratory, revise to "most" or "vast majority".

      Thank you for pointing this out, we have made the suggested correction (Line 277).

      Comment:

      (10) L252-254: save this for the Discussion and write a more generalized statement for results to avoid citations in the results.

      Thank you for this suggestion, we have moved this to the discussion (Lines 517-527).

      Comment:

      (11) Figures S5 & S6: these are pretty important, I'd consider elevating them to the main document as one figure with two panels.

      Thank you for this suggestion, we agree these figures should be elevated to the main text, and have made them into a panel figure (Figure 4).

      Comment:

      (12) L305-307: great point and recommendation!

      Thank you very much for this positive feedback!

      Comment:

      (13) L335-336: another place to cite https://doi.org/10.1098/rspb.2019.2645 which includes a thermal sensitivity index and would add an odonate citation behind the statement.

      Thank you for this excellent suggestion; we have added this citation (Rocha-Ortega et al. 2020; Line 480).

      Comment:

      (14) L352-353: again see also https://doi.org/10.1098/rspb.2019.2645.

      Thank you for highlighting this reference, we have added it to Line 505 as suggested.

      Comment:

      (15) L355: revise "populations that coexist" to "species that co-occur" (big difference between population and species levels and between coexistence and co-occurrence).

      Thank you very much for pointing this out, we have made the suggested change (Line 507).

      Comment:

      (16) L359-365: are the winners and losers depicted in Figures S5 & S6? If so reference the figure (which I suggest combining and promoting to the main text), if not create a table listing the analyzed species and their winner/loser status.

      We agree that this is an excellent place to bring up Figures S5 and S6 from the supplemental. We have moved them to the main document as one figure and referenced it at line 510.

      Reviewer #2 (Recommendations for the authors):

      Comment:

      (1) Line 53-55: The claim that "These relationships generalize poorly taxonomically and geographically" is valid, but the study only tests Odonata on two continents.

      Thank you for this comment – the word ‘generalize’ may imply that our study tries to find a general pattern across many groups. We have changed the language to: “However, these relationships are inconsistent across taxa and regions, and cross-continental tests have not been attempted (Angert et al., 2011; Buckley and Kingsolver, 2012; Estrada et al., 2016; MacLean and Beissinger, 2017).” (Lines 57-59).

      Comment:

      (2) Line 58-59: Is this statement only true for Odonata? It does not seem to hold for plants, for example.

      Thank you for this comment – this statement references a meta-analysis of multiple animal and plant taxa, but the evidence for the importance of range location comes from animal taxa. We have specified that we are referring to animal species to clarify (Line 60).

      Comment:

      (3) Line 87-91: This section is difficult to understand and needs clarification.

      We have clarified this section as follows: “While warm-adapted species with more equatorial distributions could expand their ranges poleward following warming (Devictor et al., 2008), they could also increase in abundance in this new range area relative to species that historically occupied those areas and are less heat-tolerant (Powney et al., 2015).” (Lines 95-121).

      Comment:

      (4) Line 99-100: Please define "generalist" and "specialist" more clearly here (e.g., based on climate niche?).

      Thank you for pointing this out, we intended to suggest that species with more generalist habitat requirements might be better able to shift, but ultimately found that traits did not predict species’ shifts. We corrected our prediction regarding habitat generalists as follows: “We predicted that species able to use both lentic and lotic habitats would shift their phenologies and geographies more than those able to use just one habitat type, as generalists outperform specialists as climate and land uses change (Ball-Damerow et al., 2015, 2014; Hassall and Thompson, 2008; Powney et al., 2015; Rapacciuolo et al., 2017).” (Lines 128-132).

      Comment:

      (5) Line 122: Replace the English letter "X" in "100x100 km" with the correct mathematical symbol.

      We have made the suggested replacement throughout the manuscript.

      Comment:

      (6) Line 148: To address sampling effects, you could check the paper: https://onlinelibrary.wiley.com/doi/full/10.1111/gcb.15524. Additionally, maximum and minimum values are sensitive to extreme data points, so using 95% percentiles might be more robust.

      Thank you for sharing this paper, as it offers a valuable perspective on the study of species’ ranges. While our dataset is substantially composed of observations from adult sampling protocols, unlike the suggested paper which compares adults and juveniles, this is an interesting alternative approach.

      For our purposes it is meaningful to include outliers, as otherwise we may have missed individuals at the leading edge of range expansions. Our intent here was to detect range limits, as opposed to finding the central tendency of species distributions. This approach is widely accepted in the macroecology literature (e.g., Devictor et al., 2012, 2008; Kerr et al., 2015).
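      The difference between the two definitions of a range limit can be illustrated with a small sketch. A northern limit taken as the maximum latitude responds fully to a single record at the leading edge, whereas a 95th percentile damps it. The latitudes are hypothetical, and the functions illustrate the two definitions rather than reproducing the analysis code.

```python
import statistics

def northern_limit_max(latitudes):
    """Northern range limit as the single most northerly occupied latitude."""
    return max(latitudes)

def northern_limit_p95(latitudes):
    """Northern range limit as the 95th percentile of occupied latitudes.

    statistics.quantiles(..., n=20) returns 19 cut points; the last one is the 95th percentile.
    """
    return statistics.quantiles(latitudes, n=20, method="inclusive")[-1]

def limit_shift(historic, recent, limit_fn):
    """Northward shift (degrees) of the range limit between two periods."""
    return limit_fn(recent) - limit_fn(historic)

# Hypothetical quadrat-centre latitudes: the recent period adds one far-northern record.
historic = [40.0, 41.0, 42.0, 43.0, 44.0]
recent = [40.0, 41.0, 42.0, 43.0, 44.0, 49.0]

print(limit_shift(historic, recent, northern_limit_max))  # full 5-degree shift
print(limit_shift(historic, recent, northern_limit_p95))  # smaller, damped shift
```

      Which definition is preferable depends on whether the aim is to detect the leading edge of an expansion or to describe a part of the range that is robust to rare records.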

      We have included the following discussion of our approach in the methods section:

      “We followed widely accepted methods to determine species range boundaries (Devictor et al., 2012, 2008; Kerr et al., 2015), although other methods exist that are appropriate for different data types and research questions (e.g., Ni and Vellend, 2021). We assigned species presences to 100×100 km quadrats, a scale that is large enough to maintain adequate sampling intensity but still relevant to conservation and policy (Soroye et al., 2020), to identify the best sampled species.” (Lines 168-173).

      Kerr JT, Pindar A, Galpern P, Packer L, Potts SG, Roberts SM, Rasmont P, Schweiger O, Colla SR, Richardson LL, Wagner DL, Gall LF, Sikes DS, Pantoja A. 2015. Climate change impacts on bumblebees converge across continents. Science 349:177–180. doi:10.1126/science.aaa7031

      Soroye P, Newbold T, Kerr J. 2020. Climate change contributes to widespread declines among bumble bees across continents. Science 367:685–688. doi:10.1126/science.aax8591

      Devictor V, Julliard R, Couvet D, Jiguet F. 2008. Birds are tracking climate warming, but not fast enough. Proceedings of the Royal Society B: Biological Sciences 275:2743–2748. doi:10.1098/rspb.2008.0878

      Devictor V, van Swaay C, Brereton T, Brotons L, Chamberlain D, Heliölä J, Herrando S, Julliard R, Kuussaari M, Lindström Å, Reif J, Roy DB, Schweiger O, Settele J, Stefanescu C, Van Strien A, Van Turnhout C, Vermouzek Z, WallisDeVries M, Wynhoff I, Jiguet F. 2012. Differences in the climatic debts of birds and butterflies at a continental scale. Nature Clim Change 2:121–124. doi:10.1038/nclimate1347

      Comment:

      (7) Line 195: The species' climate niche should also be considered a product of evolution.

      Thank you for this suggestion. To address this comment and a comment from another reviewer, we changed the text to the following: “Geographic range and associated climatic characteristics are often considered ecological traits, as they are consequences of functional traits and their interactions with geographic features (Bried and Rocha-Ortega, 2023; Chichorro et al., 2019).” (Lines 256-259).

      Comment:

      (8) Line 244: This speculative statement belongs in the Discussion section.

      Thank you for this suggestion, we have moved this statement to the discussion (Lines 451-453).

      Comment:

      (9) Line 252-254: The projection of Coenagrion mercuriale's range contraction is not part of your results and should be clarified or removed.

      Following this suggestion and a similar suggestion from another reviewer, we moved this text to the discussion (Line 517-527).

      Comment:

      (10) Line 314-316: If the species can tolerate warmer temperatures better, why would they migrate?

      We apologize for the confusion, and we have reworded the section as follows: “Emerging mean conditions in areas adjacent to the ranges of southern species may offer opportunities for range expansions of these relative climate specialists, which can then tolerate climate warming in areas of range expansion better than more cool-adapted historical occupants (Day et al., 2018).” (Lines 445-448).

      Comment:

      (11) Line 334-335: Species' tolerance to temperature likely depends on their traits, which were not tested in this study. This should be noted.

      We agree, and we have removed the wording “rather than traits” from this sentence (Line 479).

      Reviewer #3 (Recommendations for the authors):

      Comment:

      (1) Title: The title is too general not specifying that your results are on odonates only, but also stressing the implicit role of climate change to a degree the tests do not support.

      Following this comment and a suggestion from another reviewer, we changed the title to the following: “Range geography and temperature variability explain cross-continental convergence in range and phenology shifts in a model insect taxon”. We wanted to emphasize that we used odonates as a model taxon to ask broad questions, while being more specific about the climatic variable that we examined (temperature variability).

      Comment:

      (2) L32: consider including Novella-Fernandez et al. 2023 (NatCommun) which addresses this topic in Odonates.

      Thank you for suggesting this very interesting paper, we have added it as a citation (Line 31-32).

      Comment:

      (3) L35: consider including Grewe et al. 2013 (GEB) and Engelhardt et al. 2022 (GCB).

      Thank you for these excellent suggestions, we have added the citations (Line 35).

      Comment:

      (4) L47: rather write 'result from' instead of 'driven by'.

      We agree this is a better characterization and have corrected the wording (Line 48-49).

      Comment:

      (5) L49-52: There has been a recent study on this topic for birds (Neate-Clegg et al., 2024, NEE). However, specifying this to insects would make it no less relevant. This review for odonates might be helpful in this regard (Pinkert et al., 2022, Chapter: "Odonata as focal taxa for biological responses to climate change" in Córdoba-Aguilar et al. (2022) Dragonflies & Damselflies: Model Organisms for Ecological and Evolutionary Research).

      Thank you again for suggesting excellent references; we have added them to lines 52-53 and also added the Pinkert citation to lines 61 and 82.

      Comment:

      (6) L53-66: Combine into one paragraph about drivers. With traits first and the environment second. The natural land cover perspective may be too complicated in this context. Consider focusing on generalities of the impact of changes within species' ranges.

      As suggested we have combined these into one paragraph about drivers (Line 59).

      Comment:

      (7) L67-69: The book from before would be a much stronger reference for this claim. Kalkman et al. (2018) do not address the emphasis of global change research in insects on bees and butterflies. Also, I would highlight that most of the current work is at a national scale, rather than cross-continental.

      Thank you for this suggestion, we have added the suggested reference and included that “…recently assembled databases of odonate observations provide a rare opportunity to investigate species’ spatiotemporal responses at larger taxonomic and spatial scales, particularly as most work has been done at national scales.” (Lines 75-77).

      Comment:

      (8) L68: consider rephrasing this part to '..provide a rare opportunity to investigate spatiotemporal biotic responses at larger taxonomic and spatial scales'

      We appreciate this suggestion and really like the wording. We have changed the phrase to read as follows: “While global change research on insects often emphasizes butterfly and bee taxa, recently assembled databases of odonate observations provide a rare opportunity to investigate species’ spatiotemporal responses at larger taxonomic and spatial scales, particularly as most work has been done at national scales.” (Lines 74-77).

      Comment:

      (9) L69: This characteristic is not unique to odonates and would hamper drawing general conclusions. Honestly, I think the detailed and comprehensive data on them is the selling point.

      Thank you for this suggestion, we have edited the sentence to emphasize their use as an indicator species: “Due to their use of aquatic and terrestrial habitat across different life stages, dragonflies and damselflies are also considered indicator species for both terrestrial and aquatic insect responses to changing climates (Hassall, 2015; Pinkert et al., 2022; Šigutová et al., 2025), giving the study of these species broad relevance for conservation.” (Lines 78-81)

      Comment:

      (10) L73: Indicator for what? The first part of the sentence would suggest lesser surrogacy for responses of other taxa. Reconsider this statement. They are well-established indicators for habitat intactness and freshwater biodiversity. Darwell et al. suggested their diversity can serve as a surrogate for the diversity of both terrestrial and aquatic taxa.

      Thank you for this suggestion, we have edited the sentence to emphasize their use as an indicator species: “Due to their use of aquatic and terrestrial habitat across different life stages, dragonflies and damselflies are also considered indicator species for both terrestrial and aquatic insect responses to changing climates (Hassall, 2015; Pinkert et al., 2022; Šigutová et al., 2025), giving the study of these species broad relevance for conservation.” (Lines 78-81)

      Comment:

      (11) L76: Fritz et al., is a study on mammals, not odonates.

      Thank you for pointing out this error, the reference has been removed (Line 84-85).

      Comment:

      (12) L84: Lotic habitats are generally better connected than lentic ones. Lentic species are considered to have a greater propensity for dispersal DUE to the lower inherent spatiotemporal stability (implying lower connectivity) compared to lotic habitats.

      Thank you for your comment, we have rewritten this section as follows: “For example, differences in habitat connectivity and dispersal ability may constrain range shifts for lentic species (those species that breed in slow-moving water like lakes or ponds) and lotic species (those living in fast-moving water) in different ways (Kalkman et al., 2018). More southerly lentic species may expand their range boundaries more than lotic species, as species accustomed to ephemeral lentic habitats are better dispersers (Grewe et al., 2013), yet lotic species have also been found to expand their ranges more often than lentic species, potentially due to the loss of lentic habitat in some areas (Bowler et al., 2021).” (Lines 88-95).

      Comment:

      (13) L90: I would be cautious with this interpretation. If only part of the range is considered (here a country in the northern Hemisphere) southern species are moving more of their range into and northern species more of their range out of the study area in response to warming (implying northward shifts).

      We have clarified this section as follows: “While warm-adapted species with more equatorial distributions could expand their ranges poleward following warming (Devictor et al., 2008), they could also increase in abundance in this new range area relative to species that historically occupied those areas and are less heat-tolerant (Powney et al., 2015).” (Lines 95-121)

      Comment:

      (14) L117: Odonata Central contains many county centroids as occurrence records. These could be an issue for your use case. I may have overlooked the steps you took to address this, but I think this requires at least more detail and possibly further removal/checks using for instance CoordinateCleaner. The functions implemented in this package allow you to filter records based on political units to avoid exactly this source of error.

      Thank you for this suggestion; we weren’t aware of this issue with Odonata Central. We used the CoordinateCleaner tool in R to filter all odonate records that we used in our analyses. Fewer than 1% of observations in our dataset were identified as having potential problems by the tool, so we would not expect this to affect our inferences. However, in future we will employ this tool when using similar datasets.
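      For readers unfamiliar with the problem, the core idea of a centroid check is to flag records that sit almost exactly on the centroid of a political unit, since such coordinates often stand in for the county rather than the collecting site. The sketch below is a simplified, hypothetical analogue of that idea, not CoordinateCleaner's implementation or interface; the centroid coordinates and the 1 km buffer are assumptions chosen for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def flag_centroid_records(points, centroids, buffer_km=1.0):
    """Return the indices of (lat, lon) points lying within buffer_km of any supplied centroid."""
    return [
        i for i, (lat, lon) in enumerate(points)
        if any(haversine_km(lat, lon, clat, clon) <= buffer_km for clat, clon in centroids)
    ]

centroids = [(40.0, -100.0)]  # hypothetical county centroid
points = [(40.0, -100.0), (40.005, -100.0), (41.0, -100.0)]
print(flag_centroid_records(points, centroids))  # [0, 1]: the third point is ~111 km away
```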

      Comment:

      (15) L119: Please add a brief explanation of why this was necessary. I am ok with something along the lines in the supplement.

      We moved this information from the supplemental to the main text as follows: “If a species was found on both continents, we only retained observations from the continent that was the most densely sampled. If we merged data for one species found on both continents, we could not perform a cross-continental comparison. However, if the same species on different continents was treated as different species, this would lead to uninterpretable outcomes (and the creation of pseudo-replication) in the context of phylogenetic analyses. In addition, species found on both continents did not have sufficient data to meet criteria for the phenology analysis.” (Lines 161-167).
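      The continent-retention rule can be expressed in a few lines. This is a hypothetical sketch that uses the number of records per continent as a stand-in for sampling density (an assumption; density could also be measured per unit area or per quadrat), with ties resolved in favour of the continent encountered first.

```python
from collections import Counter

def retain_densest_continent(records):
    """records: list of (species, continent) observations (extra fields may follow).

    For each species, keep only the observations from the continent with the most records,
    so that no species contributes data from both continents.
    """
    counts = Counter((species, continent) for species, continent, *_ in records)
    best = {}
    for (species, continent), n in counts.items():
        # Counter preserves first-insertion order, so ties go to the first continent seen.
        if species not in best or n > counts[(species, best[species])]:
            best[species] = continent
    return [r for r in records if r[1] == best[r[0]]]

records = [
    ("species_A", "Europe"),
    ("species_A", "Europe"),
    ("species_A", "North America"),  # minority continent for species_A: dropped
    ("species_B", "North America"),
]
print(retain_densest_continent(records))
```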

      Comment:

      (16) L132: The letters 'X' and 'x' are not multiplication symbols! Please change them to the math symbol (×) everywhere.

      Thank you for pointing out this error, we have made the correction throughout the manuscript.

      Comment:

      (17) L133: add 'main' before 'flight period'

      Thank you for this suggestion, we have made the change. (Line 190)

      Comment:

      (18) L135: I suggest using the coefficient of variation, as it is controlled for the mean. Otherwise, what you see is partly the signature of temperature and not of its variation. It is very difficult for me to understand what this variation of the variation means; at the least, it needs more explanation.

      Thank you very much for this suggestion; we agree that the coefficient of variation is a better fit for the question we are asking. We re-ran our analyses with the coefficient of variation as the measure of climate variability: all results reported in the manuscript are now updated for that analysis (Line 377, Table 2), and we have also updated the methods section (Line 191). The results are qualitatively the same as in our previous analysis, but we agree that they are now easier to interpret.
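      To illustrate the reviewer's point, here is a minimal sketch of the coefficient of variation (standard deviation divided by the mean). The authors' analyses were run in R; this Python version and the two temperature series are hypothetical and only show why the raw standard deviation can hide differences once the mean is taken into account.

```python
import statistics


def coefficient_of_variation(values):
    """Sample coefficient of variation: standard deviation divided by the mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return statistics.stdev(values) / mean


# Hypothetical annual temperatures for two quadrats with the same spread
# but different means: the standard deviation is identical, the CV is not.
cool = [10.0, 12.0, 14.0]
warm = [20.0, 22.0, 24.0]
print(statistics.stdev(cool), statistics.stdev(warm))  # 2.0 2.0
print(coefficient_of_variation(cool), coefficient_of_variation(warm))  # about 0.167 and 0.091
```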

      Comment:

      (19) L155: Please adequately reference all R packages (state the name, and a reference for them including the authors' names, title, and version).

      Thank you for pointing out this omission, we have added reference information for the glm function in base R (Line 298) and ensured all other packages are properly referenced.

      Comment:

      (20) L207: Mention the literature sources here (again).

      We agree that they should be referenced here again, and we have done so (Lines 267-268).

      Comment:

      (21) L209: You could use the number of grid cells as a proxy for range size.

      Following this excellent suggestion, we re-analysed our data using range size, calculated as the number of quadrats occupied by a species in the historical time period, as a predictor. Range size was not significant in our models, but we believe this is the best way to analyze our data, and so have updated our methods (Lines 261-263) and results (375-378).
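      Range size as defined here (the number of quadrats a species occupied in the historical period, 1980-2002) reduces to counting distinct quadrats per species. A minimal Python sketch, assuming occurrence records of the form (species, quadrat ID, year); the record layout, quadrat IDs, and function name are hypothetical, and the authors' own pipeline was written in R.

```python
from collections import defaultdict

HISTORICAL = (1980, 2002)  # historical period used in the study


def range_size(records, start=HISTORICAL[0], end=HISTORICAL[1]):
    """Count distinct quadrats occupied by each species between start and end (inclusive)."""
    occupied = defaultdict(set)
    for species, quadrat, year in records:
        if start <= year <= end:
            occupied[species].add(quadrat)  # repeat records in a quadrat count once
    return {species: len(quadrats) for species, quadrats in occupied.items()}


# Hypothetical records; quadrat IDs are placeholders.
records = [
    ("Coenagrion mercuriale", "Q12", 1985),
    ("Coenagrion mercuriale", "Q12", 1990),  # same quadrat, not counted twice
    ("Coenagrion mercuriale", "Q13", 1999),
    ("Coenagrion mercuriale", "Q40", 2010),  # recent period, excluded
    ("Coenagrion resolutum", "Q77", 1995),
]
print(range_size(records))  # {'Coenagrion mercuriale': 2, 'Coenagrion resolutum': 1}
```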

      Comment:

      (22) L218: It would be preferable to say 'species-level' instead of 'by-species'.

      Thank you for this suggestion, we agree that this is clearer and made the change (Line 298).

      Comment:

      (23) L219-220: this is unclear. Please rephrase.

      We have clarified as follows: “We used both species-level frequentist (GLM; glm function in R) and Bayesian (Markov Chain Monte Carlo generalized linear mixed model, MCMCglmm; Hadfield, 2010) models to improve the robustness of the results.” (Lines 298-300).

      Comment:

      (24) L224: At least for Europe, there is a molecular phylogeny available, which you should preferably use (Pinkert et al. 2018, Ecography). Otherwise, I am OK with using what is available.

      We apologize that the nature of the phylogeny we used was not clear. It was built similarly to that in Pinkert et al. 2018, Ecography: a molecular phylogeny with a morphological/taxonomic tree as the backbone, so that species could only move within their named genera or families. We clarified this in the manuscript as follows:

      “We used the molecular phylogenetic tree published by the Odonate Phenotypic Database (Waller et al., 2019), which used a morphological and taxonomic phylogeny as the backbone tree, allowing species to move within their named genera or families according to molecular evidence (Waller and Svensson, 2017).” (Lines 302-305).

      Comment:

      (25) L233: You said so earlier (1st sentence of this paragraph).

      Thank you for pointing this out, we removed the repetitive sentence (Line 323).

      Comment:

      (26) L236-238: To me, it makes more sense to test this prior to fitting the phylogenetic models.

      MCMC-GLMM is considerably less familiar to most researchers than general linear models or their derivatives/descendants, such as PGLS. We report models both with and without phylogenetic relationships included for the sake of transparency, and we are happy to acknowledge that no interpretation here changes substantially relative to these decisions. However, failing to report models that included possible (if small) effects of phylogenetic relatedness might cause some readers to question what those models might have implied. For the moment, we are opting for the most transparent reporting approach.

      Comment:

      (27) L241: Rather say directly XX of XX species in our data....

      (28) L245: Same here. Provide the actual numbers, please.

      Thank you for this suggestion, we made this change on Line 332 and Line 334.

      Comment:

      (29) L247-249: Then not necessary.

      This issue highlights a challenge in the global biology literature and in biodiversity monitoring for understanding global change impacts on species. Almost no studies have been able to report simultaneous range and phenology shifts, and the literature predominantly treats these biotic responses to global change as distinct phenomena. Differences in the numbers of species for which these observations exist, even among the extremely widely observed odonates, seem to us to be a meaningful issue to report. If the reviewer prefers that we abbreviate or remove this sentence, we are happy to do so.

      Comment:

      (30) L251-261: This is discussion, as you interpret your results here.

      Following your suggestion and the suggestion of another reviewer, we moved the following lines to the discussion section: “Species that did not shift their ranges northwards or advance their phenology included Coenagrion mercuriale, a European species that is listed as near threatened by the IUCN Red List (IUCN, 2021), and is projected to lose 68% of its range by 2035 (Jaeschke et al., 2013).” (Lines 517-527).

      Comment:

      (31) L252: Good to mention, but why is the discussion limited to C. mercuriale?

      We feel that it is important to link the broad-scale results to the specific biological characteristics of individual species, and C. mercuriale is listed as near threatened on the IUCN Red List. We are happy to expand links to the natural history of this group and have added the following: “This group also includes Coenagrion resolutum, a common North American damselfly (Swaegers et al., 2014), for which we could not find evidence of decline. This may be due in part to the greater area of intact habitat available in North America compared to Europe, enabling C. resolutum to maintain larger populations that are less vulnerable to stochastic climate events. Still, this and other species failing to shift in range or phenology should be assessed for population health, as this species could be carrying an unobserved extinction debt.” (Lines 527-533).

      Comment:

      (32) L264: Insert 'being' before 'consistently'.

      Thank you for the suggestion, we made this change (Line 373).

      Comment:

      (33) L271: .'. However,'.

      Thank you for pointing out this grammatical error, we have corrected it (Line 382).

      Comment:

      (34) L273: 'affected' instead of 'predicted'

      Thank you for the suggestion, we made this change (Line 383).

      Comment:

      (35) L279: 'despite pronounced recent warming' sounds not relevant in this context.

      Thank you for this suggestion, we removed this portion of the sentence (Line 408).

      Comment:

      (36) L281: Rather 'the model performance did not improve....'

      Thank you for the suggestion, we made this change (Line 409).

      Comment:

      (37) L288: Add 'but' before 'not'.

      Thank you for the suggestion, we made this change (Line 416).

      Comment:

      (38) L311-316: Reconsider the causality here; perhaps rephrase to 'are associated' instead. Greater dispersal ability and developmental plasticity might well lead to higher growth rates, rather than the other way around.

      We agree that plasticity/evolution at range edges is important to consider and have included it as an alternative explanation: “Adaptive evolution and plasticity may enable higher population growth rates in newly-colonized areas (Angert et al., 2020; Usui et al., 2023), but this possibility can only be directly tested with long term population trend data.” (Lines 449-451).

      Comment:

      (39) L313-316: Maybe delete the second 'should be able to'.

      This phrase has been changed in response to other reviewer comments and now reads as follows:

      “Emerging mean conditions in areas adjacent to the ranges of southern species may offer opportunities for range expansions of these relative climate specialists, which can then tolerate climate warming in areas of range expansion better than more cool-adapted historical occupants (Day et al., 2018).” (Lines 445-448).

      Comment:

      (40) L331: Limit this statement ending with 'in North American and European Odonata'.

      Thank you for this suggestion, we made this addition (Lines 475-476).

      Comment:

      (41) L346-347: There are too many of these more-research-is-needed statements in the discussion (at least three in the last paragraphs). Please consider finishing the paragraphs rather with a significance statement.

      Thank you for this suggestion, we have changed the final sentence here to the following: “The extent to which species’ traits actually determine rates of range and phenological shifts, rather than occasionally correlated with them, is worth considering further, but functional traits do not systematically drive patterns in these shifts among Odonates in North America and Europe.” (Lines 480-483).

      We also made additional changes, removing a ‘more-research-is-needed’ statement from the following paragraph (Line 443), as well as from Line 499.

      Comment:

      (42) L349: See also Franke et al. (2022, Ecology and Evolution).

      Thank you for highlighting this excellent reference! We have added it to Line 501.

      Comment:

      (43) L363: Maybe a bit late in the text, but it is important to note that there is the third dimension 'abundance trends' or rather a common factor related to range and phenology shifts. I feel this fits better with the discussion of population growth.

      Thank you for this suggestion, we have addressed the importance of abundance trends in the following sentences: “Further mechanistic understanding of these processes requires abundance data.” (Lines 442-443); “It remains unclear if range and phenology shifts relate to trends in abundance, but our results suggest that there are clear ‘winners’ and ‘losers’ under climate change.” (Lines 509-510).

      Comment:

      (44) L375-377: This last sentence is very similar to L371-373. Please reduce the redundancy. Focus more on specifically stating the process instead of vaguely saying 'new insights into patterns' and 'suggesting processes'. Rather, deliver a strong concluding message here.

      Thank you for this suggestion, we feel that we now have a much stronger concluding message: “By considering both the seasonal and range dynamics of species, emergent and convergent climate change responses across continents become clear for this well-studied group of predatory insects.” (Lines 545-547).

      Comment:

      (45) Table 1: To me, the few estimates presented here do not justify a table. rather include them in the text. OR combine them with Table 2. Also, why not include the traits as predictors (from the range shift models) in these models as well?

      We have clarified in the text that the results displayed in Table 1 are from the analysis of the relationship between range and phenology shifts: “The effect of species’ range shifts on phenology range shifts was significant in our model investigating the relationship between these responses, indicating that species shifting their northern range limits to higher latitudes also showed stronger advances in their emergence phenology (Figure 3).” (Lines 341-344).

      As there were no significant effects in the model of phenology change drivers, we have not shown results of this model: “Emergence phenology shifts were not affected by species’ traits, range geography, nor climate variability; due to this, model results are not displayed here.” (Lines 383-384).

      Comment:

      (46) Table 2: L712-713: What does this mean? Are phenology shifts not used as a predictor of range shifts? (why then this comment?). Or do you want to say phenological shifts are not related to Southern range etc? Why do you present a phylosig here but not in Table 1? Why not include the traits as predictors (from the range shift models) in these models as well? Consider using the range size as a continuous predictor instead of 'Widespread'.

      We are glad the reviewer pointed this out to us. We did not emphasize this issue sufficiently. We DID evaluate traits as predictors both of geographical range and phenological shifts, and species-specific biological traits did not significantly affect models predicting either of those sets of responses. We state this on Lines 312-323, but we have also noted in the discussion (Lines 473-476) that the most commonly assessed traits, like body size, do not alter observed trends here. Instead, where species are found, rather than the characteristics of species, is the key determinant of their overall responses.

      Following this excellent suggestion, we re-analysed our data using range size, calculated as the number of quadrats occupied by a species in the historical time period, as a predictor. Range size was not significant in our models, but we believe this is the best way to analyze our data, and so have updated our methods (Lines 261-263) and results (375-378).

      Comment:

      (47) Figure 1: I don't see any grey points in the figure. Also, there is no A or B. If you are referring to the symbols, then write cross and triangle instead, and do not use capital letters, which usually refer to component plots of composite figures. Also, I highly recommend providing a similar figure based on your data (maybe each species as a dot for T1 and another symbol for T2). Given the small number of species, you could try to connect these points with arrows. For the set with only range shifts, maybe place the T2 dots at the center of the 'Emergence' axis.

      Thank you for pointing out this error: a previous version of Figure 1 included grey points and multiple panels. We have removed this text from the figure caption to be consistent with the final version of the figure (Line 989).

      The graphical depictions of the conceptual and empirical discoveries in this paper were challenging to create. The reviewer might be suggesting effectively decomposing Figure 3 (change in range on the y axis vs. change in phenology among all species) into two sets of points on the same graph, where each pair of points is a before and after value for each species. This would make for a very busy figure indeed. We have modified the conceptual Figure 1 to illustrate more clearly, we believe, that species can (in principle) remain within tolerable niche spaces by shifting their activity periods in time (phenology), in space (geographical range), or both.

      Comment:

      (48) Figure 2: Please add a legend. Also, black is a poor background color. The maps appear to be stretched; please check the aspect ratios. Here there are capital letters without an explanation in the caption. From the context, I assume the upper panel maps are for the data used to calculate range shifts and the bottom panel maps are for the data used to calculate the phenological shifts.

      We apologise for the error in the figure caption and have clarified the differences between panels in the text, as well as changing the map background colour and fixing the aspect ratio:

      “Figure 2: Richness of 76 odonate species sampled in North America and Europe in the historic period (1980-2002; panes A and C) and the recent period (2008-2018; panes B and D). Species richness per 100 × 100 km quadrat is shown in panes A and B, while panes C and D show species richness per 200 × 200 km quadrat. Dark red indicates high species richness, while light pink indicates low species richness.” (Lines 1002-1006).

      Comment:

      (49) Figure 3: Why this citation? Of terrestrial taxa? Please explain. Consider adding some stats here, such as the r-squared value for each of the relationships.

      We have better explained the citation in the figure caption, as well as adding r-squared values:

      “Figure 3: Relationship between range shifts and emergence phenology shifts among North American and European odonate species (N = 66; model R2 = 17.08 for glm, 14.9% for MCMCglmm). For reference, the shaded area shows mean latitudinal range shifts of terrestrial taxa as reported by Lenoir et al. (2020; calculated as the yearly mean dispersal rate of 1.11 +/- 0.96 km per year over 38 years).” (Lines 679-682)

      Comment:

      (50) L801: What are these underscored references?

      This was an issue with the reference software and has been resolved.

      Comment:

      (51) Table S1: L848: Consider starting with 'Samples of 76 North American and European odonate species from between ...'. Please use a horizontal line to separate the content from the table header. Add a horizontal line below the last row. Same for all tables.

      Thank you for this suggestion, we have edited the caption for Figure S1 as suggested (Line 1124). We have also made the suggested line additions to Table S1, S2, and S3.

      Comment:

      (52) Table S3: This is confusing. In Table 1 (main text) both 'southern range' and 'widespread' are used as predictors. Please explain.

      We originally included information on species range geography, including southern versus northern range, and widespread versus not, into one categorical variable. Following additional comments we re-analysed our data using range size, calculated as the number of quadrats occupied by a species in the historical time period, as a predictor. Now the methods section text (Lines 261-263) and Table 1 report results of that variable with distribution options northern, southern, or both. 

      Comment:

      (53) Figure S5 and S6: It would be more coherent if the colors refer to the continents and the suborders are indicated by shading. I would love to see a combination of the two figures with species ordered by the phylogenetic relationship and a dot matrix indicating the traits in the main text! This could really be a good starting point for a synthesis figure.

      The reviewer presents an interesting challenge for us. We have a choice, as we understand things, to present a figure showing phylogeny and traits (as requested here), or an ordered list of species relative to effect sizes in the two main responses to global change. The latter choice centers on the discoveries of the paper, while the former would be valuable for dragonfly biology but would depict information that proved to be biologically uninformative relative to our discovery. That is to say, there is no phylogenetic trend and biological traits among species did not affect results. We have gone some way toward illustrating that issue by retaining phylogeny in the MCMC-GLMM models, but we feel that a figure illustrating phylogeny and traits would (for most readers, at least) illustrate noise, rather than signal. For this reason, we have opted to take on the previous reviewer’s suggestion for a modified, main-text Figure 4, which we include below.

      Figure 4: Distribution of Northern range limit shifts (Panel A, kilometers) and emergence phenology shift (Panel B, Julian day) of 76 European and North American odonate species between a recent time period (2008 - 2018) and a historical time period (1980 - 2002). Anisoptera (dragonflies) are shown in pink, Zygoptera (damselflies) are shown in blue.


    1. eLife Assessment

      This study is a valuable contribution to the field of neuronal modeling by way of providing a method for rapidly obtaining neuronal physiology parameters from electrophysiological recordings. The method is solid as the generated models reproduce both ground-truth simulated data and empirical data, and there is now a quantitative comparison with other approaches.

    2. Reviewer #2 (Public review):

      Summary:

      Developing biophysically detailed computational models that accurately capture the characteristic physiological properties of neurons across diverse cell types is a key challenge in computational neuroscience. A major obstacle lies in determining the large number of model parameters, which are notoriously difficult to fit such that the model faithfully reproduces the empirically observed electrophysiological responses. Existing approaches require substantial computational resources to generate models for even a single neuron. Generating models for additional neurons typically requires starting from scratch, with no reuse of previous computations - making the process just as computationally expensive each time.

      Kim et al. introduce an innovative approach based on a Generative Adversarial Network (GAN) to overcome these limitations. Once trained, the network takes empirically observed electrophysiological responses as input and predicts the biophysical parameters with which a Hodgkin-Huxley model can reproduce these responses. The authors demonstrate this for nine non-spiking neurons in C. elegans. The resulting models generally provide a good fit to the empirical data. As the GAN has learned general relationships between biophysical parameters and the resulting electrophysiology, it can be used to generate models of diverse cell types without retraining - enabling model generation at low computational cost.

      Strengths:

      The authors address an important and technically challenging problem. A noteworthy strength of their approach is that, once trained, the GAN can generate models from new empirical data at low computational cost. The generated models reproduce the responses to current injections well.

      The authors have addressed all of my previous major concerns and have significantly improved their method:

      (1) Most importantly, the generated models reproduce both ground-truth simulated and empirical data well. Responses - including resting membrane potentials - are now well captured.

      (2) The comparison with other approaches has been extended to be more quantitative and rigorous.

      (3) The authors now convincingly demonstrate that the improved EP-GAN is relatively robust to data ablation.

      Weaknesses:

      Slow dynamics (e.g., slow ramps) are still not reliably captured. However, as the approach excels at other frontiers - the generation of models for diverse cell types at low computational cost - I consider this to be a relatively minor limitation.

    3. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      (1) The bad equilibria of the model still remain a concern, as do other features, like the transient overshoots, that do not match the data. I think the authors could achieve more accuracy here by assigning more weight to such specific features, by adding them explicitly as separate objectives for the generator. The traces contain a five-second current step, with one second before and one second after the current step. This means that in the RMSE, the current step amplitude will dominate as a feature, as this is simply the state for which the data trace contains the most time points. Note that this is further exacerbated by using the IV curve as an auxiliary objective. I believe a better exploration of specific response features, incorporated as independently weighted loss terms for the generator, could improve the fit. For example, one auxiliary term could be the equilibrium before and after the current step, and another could penalise response traces that do not converge back to their initial equilibrium.

      We thank the reviewer for the suggestion. We supplemented the membrane potential regression loss with errors computed for three intervals (pre-, mid-, and post-stimulation), improving the accuracy of EP-GAN for baseline membrane potential responses (Figure 2, 3, Table S2, S3). We also changed the simulation protocol for generated parameters to allow a longer simulation time of 15 seconds, where the stimulation is applied during t = [5, 10] seconds and no stimulation is applied during t = [0, 5) (pre-stimulation) and t = (10, 15] (post-stimulation). These time intervals were chosen to ensure sufficient stabilization periods before and after stimulation.
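      The interval-wise loss described above can be sketched as a sum of RMSEs computed separately over the pre-, mid-, and post-stimulation windows of the 15-second protocol. This is a minimal illustration, not EP-GAN's implementation: equal weights, the function name, and the sampling grid are assumptions, since the response does not state how the terms are weighted.

```python
import numpy as np


def interval_rmse_loss(v_pred, v_true, t, stim_on=5.0, stim_off=10.0,
                       weights=(1.0, 1.0, 1.0)):
    """Weighted sum of RMSEs over pre-, mid- and post-stimulation windows.

    Hypothetical sketch: equal weights are an assumption, not EP-GAN's setting.
    """
    v_pred = np.asarray(v_pred, dtype=float)
    v_true = np.asarray(v_true, dtype=float)
    t = np.asarray(t, dtype=float)
    windows = (
        t < stim_on,                        # pre-stimulation, [0, 5)
        (t >= stim_on) & (t <= stim_off),   # stimulation, [5, 10]
        t > stim_off,                       # post-stimulation, (10, 15]
    )
    loss = 0.0
    for weight, mask in zip(weights, windows):
        err = v_pred[..., mask] - v_true[..., mask]
        loss += weight * float(np.sqrt(np.mean(err ** 2)))
    return loss


# Hypothetical traces (mV): the prediction fails to return to rest by 2 mV.
t = np.linspace(0.0, 15.0, 1501)
v_true = np.where((t >= 5.0) & (t <= 10.0), -30.0, -70.0)
v_pred = v_true.copy()
v_pred[t > 10.0] += 2.0
print(interval_rmse_loss(v_pred, v_true, t))  # 2.0
```

      Because each window gets its own term, the 2 mV post-stimulation error enters the loss at full size; a single RMSE over the whole 15-second trace would report only about 1.15 mV for the same error.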

      (2) The explanation of what the authors mean by 'inverse gradient operation' is clear now. However, this term is mathematically imprecise: the inverse gradient does not exist because the gradient operator is not injective. The method is simply forward integration under the assumption that the derivative of the voltage is known at the grid time points, and it should be described as such.

      We thank the reviewer for the clarification on the inverse gradient operation terminology. In the Methods section, we changed the term describing this operation to ‘forward integration’, which describes the process more accurately.
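      The reviewer's point can be made concrete: recovering a voltage trace from its derivative on a uniform grid is forward integration and needs an initial value, which is why the "inverse gradient" is not unique. A minimal sketch using explicit Euler steps; the function name, the Euler scheme, and the example values are assumptions, not necessarily the scheme used in EP-GAN.

```python
import numpy as np


def forward_integrate(dvdt, v0, dt):
    """Reconstruct V on a uniform grid from dV/dt by explicit Euler steps.

    V[0] = v0 and V[k+1] = V[k] + dt * dvdt[k]. The initial value v0 must be
    supplied: the derivative alone fixes V only up to a constant.
    """
    dvdt = np.asarray(dvdt, dtype=float)
    steps = np.cumsum(dvdt[:-1] * dt)
    return v0 + np.concatenate(([0.0], steps))


# Hypothetical constant depolarizing slope of 2 mV/s sampled every 0.5 s.
v = forward_integrate([2.0] * 5, v0=-70.0, dt=0.5)
print(v)  # [-70. -69. -68. -67. -66.]
```

      Calling the same function with a different `v0` yields a trace with an identical derivative but a shifted baseline, which illustrates why the operation is not injective without an initial condition.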

      (3) I appreciate that the authors' method provides parameters of models at a minimal computational cost compared to running an evolutionary optimization for every new recording. I also believe that with some tweaking of the objective, the method could improve in accuracy. However, I share reviewer 2's concerns that the evolutionary baseline methods are not sufficiently explored, as these methods have been used to successfully fit considerably more complex response patterns. One way out of the dilemma is to show that the EP-GAN estimated parameters provide an initial guess that considerably narrows the search space for the evolutionary algorithm. In this context, the authors should also discuss the recent gradient based methods such as Deistler et al. (https://doi.org/10.1101/2024.08.21.608979) or Jones et al (https://doi.org/10.48550/arXiv.2407.04025).

      We supplemented the optimization setup for existing methods (GDE3, NSDE, DEMO, and NSGA2) by incorporating steady-state response constraints as the initial selection process. The process is similar to that of EP-GAN training data generation and the DEMO parameter selection process [16] (see Results section, page 6 for detail). We also expanded the testing scenarios by evaluating all methods with respect to both small and large HH-model estimation. The small HH-model scenario estimates 47 parameters consisting of channel conductances, reversal potentials, and initial conditions, with the channel parameters (n = 129) frozen to the default values in [41]. The large HH-model scenario estimates the channel parameters (i.e. 129) in addition to the 47 parameters, allowing ±50% variations from their default values. For both scenarios, we test total sample sizes of 32k and 64k for all methods to evaluate their scalability with the number of simulated samples given during optimization. The results show that existing methods perform well in small HH-model scenarios, with performance that scales with sample size, consistent with the literature. EP-GAN, on the other hand, shows overall better performance in predicting membrane potential responses in both small and large HH-model scenarios.
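      The large HH-model scenario lets each channel parameter vary by up to ±50% around its default value. A minimal sketch of drawing such parameter sets, assuming independent uniform draws (the response does not state the sampling distribution); the function name and example defaults are hypothetical and are not the values from [41]. Bounds are ordered explicitly so that negative defaults, such as half-activation voltages, are handled correctly.

```python
import numpy as np


def sample_around_defaults(defaults, n_samples, rel=0.5, rng=None):
    """Draw parameter sets uniformly within +/- rel of each default value.

    Hypothetical sketch: uniform, independent draws are an assumption.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = np.asarray(defaults, dtype=float)
    low = np.minimum(d * (1.0 - rel), d * (1.0 + rel))   # ordered bounds also
    high = np.maximum(d * (1.0 - rel), d * (1.0 + rel))  # work for negative defaults
    return rng.uniform(low, high, size=(n_samples, d.size))


# Hypothetical defaults: a conductance, a half-activation voltage (mV), a slope.
defaults = [1.0, -20.0, 0.3]
samples = sample_around_defaults(defaults, 1000, rng=np.random.default_rng(1))
print(samples.shape)  # (1000, 3)
```

      With 129 channel parameters, `defaults` would simply hold 129 entries and each row of the result would be one candidate parameter set.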

      Reviewer #2 (Public review):

      Major 1: Models do not faithfully capture empirical responses. While the models generated with EP-GAN reproduce the average voltage during current injections reasonably well, the dynamics of the response are generally not well captured. For example, for the neuron labeled RIM (Figure 2), the most depolarized voltage traces show an initial 'overshoot' of depolarization, i.e. they depolarize strongly within the first few hundred milliseconds but then fall back to a less depolarized membrane potential. In contrast, the empirical recording shows no such overshoot. Similarly, for the neuron labeled AFD, all empirically recorded traces slowly ramp up over time. In contrast, the simulated traces are mostly flat. Furthermore, all empirical traces return to the pre-stimulus membrane potential, but many of the simulated voltage traces remain significantly depolarized, far outside the range of empirically observed membrane potentials. The authors trained an additional GAN (EP-GAN Extended) to improve the fit to the resting membrane potential. Interestingly, for one neuron (AWB), this improved the response during stimulation, which now reproduced the slowly rising membrane potentials observed empirically; however, the neuron still does not reliably return to its resting membrane potential. For the other two neurons, the authors report a decrease in accuracy in comparison to EP-GAN. While such deviations may appear small in the root mean square error (RMSE), they likely indicate a large mismatch between the model and the electrophysiological properties of the biological neuron. The authors added a second metric during the revision: the percentage of predicted membrane potential trajectories within the empirical range. I appreciate this additional analysis. However, as the empirical ranges across neurons are far larger than the magnitude of the dynamical properties of the response ('slow ramps', etc.), this metric does not seem well suited to quantifying the degree to which these dynamical properties are captured by the models.

      We made improvements to the training data generation and architecture of EP-GAN to improve the overall accuracy of its predicted membrane potential responses. In particular, we divided training data generation into three neuron types found among C. elegans non-spiking neurons: 1) Transient outward rectifier, 2) Outward rectifier, and 3) Bistable [8, 16]. Each randomly generated training sample is categorized into one of the three types by evaluating its steady-state currents with respect to experimental dI/dV bound constraints (see the Generating training data section under Methods for more detail). This is followed by imposing minimum-maximum constraints on the simulated membrane potential responses. This setup generates training samples that are closer in distribution to experimentally recorded neurons, as further described in the Methods section (page 15) of the revised manuscript.

      We also improved the EP-GAN training process by incorporating random masking of input membrane potential responses. The masking forces EP-GAN to make predictions even with missing voltage traces, improving overall accuracy and allowing EP-GAN to use membrane potential inputs with arbitrary clamping protocols (see Methods page 13 for more detail). For the training loss functions, we further supplemented the membrane potential regression loss with errors computed for two intervals, pre- and post-stimulation, to improve EP-GAN's predictions of baseline membrane potentials.
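      Random masking of the input traces can be sketched as dropping whole voltage traces (one per current-clamp step) at random during training. This is a minimal illustration under stated assumptions: the input layout (batch, steps, time), zero-filling of masked traces, the masking probability, the rule that at least one trace stays visible, and the function name are all hypothetical, as the response does not specify them.

```python
import numpy as np


def mask_traces(v, p_mask, rng):
    """Randomly zero out whole voltage traces (one per current step).

    v has shape (batch, n_steps, n_time). Returns the masked array and a
    boolean array of shape (batch, n_steps) that is True where a trace was kept.
    Hypothetical sketch: zero-filling and the keep-one rule are assumptions.
    """
    v = np.asarray(v, dtype=float)
    keep = rng.random(v.shape[:2]) >= p_mask
    # Guarantee at least one visible trace per sample.
    empty = np.flatnonzero(~keep.any(axis=1))
    if empty.size:
        keep[empty, rng.integers(0, v.shape[1], size=empty.size)] = True
    return v * keep[..., None], keep


rng = np.random.default_rng(0)
v = np.ones((4, 10, 100))  # hypothetical batch: 4 neurons, 10 steps, 100 samples
masked, keep = mask_traces(v, 0.5, rng)
print(keep.sum(axis=1))  # number of visible traces per sample
```

      Masking whole traces rather than individual time points mimics recordings in which some current steps of the clamping protocol are simply absent.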

      Taken together, these modifications improved EP-GAN’s overall ability to better capture empirical membrane potential responses and we show the results in Figure 2 – 5, Table S2, S3.

      Major 2: Comparison with other approaches is potentially misleading. Throughout the manuscript, the authors claim that their approach outperforms the other approaches tested. But compare the responses of the models in the present manuscript (neurons RIM, AFD, AIY) to the ones provided for the same neurons in Naudin et al. 2022 (https://doi.org/10.1371/journal.pone.0268380). Naudin et al. present models that seem to match empirical data far more accurately than any model presented in the current study. Naudin et al. achieved this using DEMO, an algorithm that in the present manuscript is consistently shown to be among the worst of all algorithms tested. I therefore strongly disagree with the authors' claim that a "Comparison of EP-GAN with existing estimation methods shows EP-GAN advantage in the accuracy of estimated parameters". This may be true in the context of the benchmark performed in the study (i.e., a condition of very limited compute resources: 18 generations with a population size of 600, compared with the 2000 generations recommended in Naudin et al.). But while EP-GAN wins under these specific conditions (and yes, here the authors convincingly show that their EP-GAN produces by far the best results!), other approaches seem to win with respect to the quality of the models they can ultimately generate.

      We thank the reviewer for the feedback regarding the comparison with existing methods. We have revised the optimization setup for the existing methods (GDE3, NSDE, DEMO, and NSGA2) by incorporating steady-state response constraints as an initial selection step. This step is similar to the EP-GAN training data generation and the DEMO parameter selection process [16] (see the Results section, page 6, for details). Incorporating it improved the accuracy of the existing methods, especially in the small HH-model scenario, where DEMO stood out, together with NSGA2, as the best-performing method (Figure 5, Tables 1 and 2).

      We also expanded the testing scenarios by evaluating all methods on both small and large HH-model estimation. The small HH-model scenario estimates 47 parameters, consisting of channel conductances, reversal potentials, and initial conditions, with the channel parameters (n = 129) fixed at their default values in [41]. The large HH-model scenario additionally estimates the 129 channel parameters, allowing ±50% variation from their default values. For both scenarios, we tested total sample sizes of 32k and 64k for all methods to evaluate how they scale with the number of simulated samples available during optimization. The results show that existing methods perform well in the small HH-model scenario and improve with sample size. EP-GAN, on the other hand, shows better overall performance in predicting membrane potential responses in both the small and large HH-model scenarios.
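      The large HH-model scenario can be pictured as drawing each of the 129 channel parameters from a range spanning ±50% of its default value. The sketch below assumes uniform sampling and uses a hypothetical helper, `sample_within_fraction`, with placeholder defaults; the text above does not state the sampling distribution. Taking the elementwise minimum and maximum keeps the bounds ordered for parameters with negative defaults.

```python
import numpy as np

N_FIXED = 47      # conductances, reversal potentials, initial conditions
N_CHANNEL = 129   # channel parameters, frozen in the small scenario


def sample_within_fraction(defaults, rel_range, n_samples, rng):
    """Draw n_samples parameter vectors uniformly within +-rel_range
    (e.g. 0.5 for +-50%) of the default values."""
    defaults = np.asarray(defaults, dtype=float)
    a = defaults * (1.0 - rel_range)
    b = defaults * (1.0 + rel_range)
    low, high = np.minimum(a, b), np.maximum(a, b)  # ordered even if default < 0
    return rng.uniform(low, high, size=(n_samples, defaults.size))


rng = np.random.default_rng(0)
channel_defaults = np.linspace(-80.0, 80.0, N_CHANNEL)   # placeholder defaults
channel_samples = sample_within_fraction(channel_defaults, 0.5, 1000, rng)
n_total_large = N_FIXED + N_CHANNEL   # parameters estimated in the large scenario
```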

      In particular, with the extended membrane potential error covering the pre-, mid-, and post-activation periods, the mean membrane potential error of EP-GAN (trained with 32k samples, large HH-model, 9 neurons) was 2.82 mV, lower than that of DEMO trained on an identical setup (12.2 mV, 64k samples; Table 2) and that of DEMO applied to a simpler HH-model in [16] (7.78 mV, using 36,000k samples, 3 neurons). Compared with DEMO's performance in [16] under an identical simulation protocol (i.e., no stimulation during 0–5 s and 10–15 s, stimulation during 5–10 s), the EP-GAN-predicted RIM model (large HH-model) showed membrane potential accuracy on par with that of DEMO (simpler HH-model), and the EP-GAN-predicted AFD model showed better accuracy in the post-activation membrane potential response, where DEMO-predicted membrane potentials overshoot above the baseline (not shown in the paper).
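      For illustration, the pre-, mid-, and post-activation error under the protocol above (no stimulation during 0–5 s and 10–15 s, stimulation during 5–10 s) might be computed as follows. The sketch assumes the per-window error is an RMSE and that the reported value is the unweighted mean of the three windows; `window_rmse` is a hypothetical helper, not code from EP-GAN or DEMO.

```python
import numpy as np


def window_rmse(t, v_pred, v_emp, t_on=5.0, t_off=10.0):
    """RMSE (mV) between predicted and empirical voltage in the pre-, mid-
    and post-activation windows, plus the unweighted mean of the three."""
    windows = {
        "pre": t < t_on,
        "mid": (t >= t_on) & (t < t_off),
        "post": t >= t_off,
    }
    errors = {
        name: float(np.sqrt(np.mean((v_pred[mask] - v_emp[mask]) ** 2)))
        for name, mask in windows.items()
    }
    errors["mean"] = (errors["pre"] + errors["mid"] + errors["post"]) / 3.0
    return errors


t = np.arange(0.0, 15.0, 0.01)                # s
v_emp = np.full_like(t, -40.0)                # placeholder empirical trace
v_pred = np.where(t >= 10.0, -36.0, -40.0)    # model fails to return to baseline
errors = window_rmse(t, v_pred, v_emp)
```

      A model that stays depolarized after stimulation is penalized only in the post window, so splitting the error by period exposes exactly the failure mode that a single whole-trace RMSE can dilute.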

      Major 3: As long as the quality of the models generated by the EP-GAN cannot be significantly improved, I am doubtful that it indeed can contribute to the 'ElectroPhysiome', as it seems likely that dynamics that are currently poorly captured, like slow ramps, or the ability of the neuron to return to its resting membrane potential, will critically affect network computations. If the authors want to motivate their study based on this very ambitious goal, they should illustrate that single neuron model generation with their approach is robust enough to warrant well-constrained network dynamics. Based on the currently presented results, I find the framing of the manuscript far too bold.

      We thank the reviewer for the feedback regarding the paper's scope. With the revised methods, the overall quality of EP-GAN models is improved, with the most significant gains in baseline membrane potential accuracy. While high-quality neuron models can be attained with existing methods given a sufficient sample size, our results suggest that EP-GAN can predict models of enhanced quality from significantly fewer samples and without retraining, thus compensating for the main drawback of evolution-based methods. While EP-GAN still has limitations (e.g., difficulty in predicting slow ramps) that need to be addressed in the future, we believe that its overall performance, combined with its fast inference speed and the flexibility of its input data format (e.g., tolerance of missing membrane potential traces), is a step forward for large-scale neuron modeling tasks that can contribute to network models.

      Major 4: The conclusion of the ablation study 'In addition the architecture of EP-GAN permits inference of parameters even when partial membrane potential and steady-state currents profile are given as inputs' does not seem to be justified given the voltage traces shown in Figure 3. For example, for RIM, the resting membrane potential stays around 0 mV, but all empirical traces are around -40mV. For AFD, all simulated traces have a negative slope during the depolarizing stimuli, but a positive slope in all empirically observed traces. For AIY, the shape of hyperpolarized traces is off. While it may be that by their metric neurons in the 25% category are classified as 'preserving baseline accuracy', this doesn't seem justified given the voltage traces presented in the manuscript. It appears the metric is not strict enough.

      We improved EP-GAN’s training process by incorporating random masking of input membrane potential responses. The masking forces EP-GAN to make predictions even with missing voltage traces, improving overall accuracy and allowing EP-GAN to use membrane potential inputs with arbitrary clamping protocol.

      Such input masking during training has improved the ablation results: EP-GAN now retains close to its baseline membrane potential error (3.3 mV, averaged across pre-, mid-, and post-activation periods) when as little as 50% of the membrane potential inputs remain (3.5 mV) and when as little as 25% of the steady-state currents remain (3.5 mV).

    1. eLife Assessment

      This valuable study investigates the implementation of an efference copy mechanism in the visual flight control system of Drosophila, a topic of broad interest to sensorimotor neuroscientists. Although the behavioral data and computational analyses are each individually solid, there is limited quantitative evaluation of how the model predictions compare to the experimental data.

    2. Reviewer #1 (Public review):

      This study provides an integrative model of visuomotor control in Drosophila melanogaster. The model is derived experimentally from recordings of visually evoked wingbeat patterns in response to three strategically selected visual stimulus types with well-established behavioral response characteristics. By testing variations of these models, the authors demonstrate that the virtual model behavior can recapitulate the recorded wingbeat behavioral results, as well as those recorded by others, for these specific stimuli when presented individually. Yet, the novelty of this study and its model is that it allows predictions for natural visual scenes in which multiple visual stimuli occur simultaneously and may have opposing or enhancing effects on behavior. Testing three models that allow interactions between these visual modalities, the authors show that a visual efference copy signal allows visual streams to interact, replicating the behavior recorded when multiple stimuli are presented simultaneously. Importantly, they validated the model's predictions in real flies using a magnetic tether, e.g., presenting moving bars against varying backgrounds. In conclusion, the manuscript presents a commendable effort in developing and demonstrating the validity of a mixture model that enables predictions of Drosophila behavior in natural visual environments.

      The manuscript employs a thorough, logical approach, combining computational modeling with experimental behavioral validation using magnetically tethered flies. This iterative integration of simulation and empirical behavioral evidence enhances the credibility of the findings. The quantitative models and validating behavioral experiments make this a valuable contribution to the field. This study is well executed and addresses a significant gap in the modeling of fly behavior and holistic understanding of visuomotor behaviors.

      The associated code base is well documented and readily produces all figures in the document.

    3. Reviewer #2 (Public review):

      Summary:

      The fly visual circuit and its behavioral response to simple visual stimuli have been well investigated, yet how they respond to more complex visual patterns is less understood. Canelo et al. first characterized the fly's steering responses to simple stimuli and examined how combinations of those stimuli affect behavior. Combining behavioral experiments and simulation, the authors found that, for some combinations, the behavioral response can be explained by a linear summation of the responses to individual stimuli. However, for looming and background motion combinations, the behavioral response to one was suppressed by the other. Furthermore, the effect depended on the relative onset timing of the two stimuli.

      Strength:

      The authors tested various visual stimulus patterns and time delays between combinations of visual stimuli and found novel interactions in behavior. Their findings support the idea that, depending on the visual context, additional mechanisms kick into the visual-motor circuit to coordinate steering behavior flexibly.

      Weakness:

      The manuscript does not provide conclusive evidence on the presence of an efference copy signal, though there appears to be an intention to associate it with the result. However, demonstrating it is likely to be beyond the main scope of the revised version.

      The goal of this manuscript is to understand how the fly's steering behavior is coordinated upon complex visual stimuli, and a number of experiments and simulations support their conclusion.

      The behavioral findings presented in this paper will be helpful in further dissecting the underlying neural mechanisms of contextual sensory processing and in understanding visual processing in other species.

    4. Reviewer #3 (Public review):

      Summary:

      Canelo et al. used a combination of mathematical modeling and behavioral experiments to ask how flies orient to visual features and stabilize their gaze. In particular, the authors propose three models of visuomotor control, which lead to specific experimental predictions. To distinguish among the proposed models, the authors designed three flight experiments: 1) a bar-background experiment, 2) a looming-background experiment, and 3) a bar-background statistics experiment. The authors claim that experiment 1 data favor the addition-only and graded EC models, experiment 2 data favor the all-or-none EC model, and experiment 3 data appear to suggest a graded EC model.

      While the study is interesting, there are major issues with the conceptual framework. In general, there is a major disconnect between the models and the animal data. The manuscript lacks a statistical framework to support or refute the proposed models. In the end, it is unclear what the main conclusions of the manuscript and its contributions to the field are.

      Strengths:

      They ask a significant question related to efference copies during volitional movement.

      The figures are overall clear and salient.

      Weaknesses:

      Comparison of model to fly data:

      In general, the manuscript suffers from a lack of quantitative comparisons between the proposed models and fly data, which compromises the main findings of the work. While Figure 1–figure supplement 1 shows a direct comparison between experiment and model predictions, puzzlingly there is no such quantitative comparison in the main manuscript for the faster-moving stimuli. Please overlay model predictions and experimental data and provide statistical comparisons throughout. The three proposed models are hypotheses, but there is no statistical framework to reject or support them. Further, there is a disconnect between the new flight experiments and the models. In fact, we do not see the model predictions for the set of experimental conditions tested in Figs. 5-7.
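      One way to address this request, sketched here under stated assumptions, is to bootstrap over flies and ask whether one model's prediction lies closer to the resampled mean response than another's. The function `bootstrap_rmse_difference`, the per-fly array layout (flies × time points), the RMSE criterion, and the synthetic data are illustrative choices, not an analysis from the manuscript.

```python
import numpy as np


def bootstrap_rmse_difference(per_fly, pred_a, pred_b, n_boot=2000, seed=0):
    """Bootstrap distribution of RMSE(pred_a) - RMSE(pred_b), each measured
    against the mean of flies resampled with replacement.
    per_fly: array of shape (n_flies, n_time) with per-fly mean responses."""
    rng = np.random.default_rng(seed)
    n_flies = per_fly.shape[0]
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n_flies, size=n_flies)   # resample flies
        mean_trace = per_fly[idx].mean(axis=0)
        rmse_a = np.sqrt(np.mean((pred_a - mean_trace) ** 2))
        rmse_b = np.sqrt(np.mean((pred_b - mean_trace) ** 2))
        diffs[i] = rmse_a - rmse_b
    return diffs


rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 200)
true_response = np.sin(2 * np.pi * t)                 # placeholder fly response
per_fly = true_response + rng.normal(0.0, 0.3, size=(20, t.size))
diffs = bootstrap_rmse_difference(per_fly, true_response, 0.5 * true_response)
ci_low, ci_high = np.percentile(diffs, [2.5, 97.5])
# A 95% interval that excludes zero favors one model over the other.
```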

      Concerns about mechanical model: I have several concerns regarding the biomechanics block in Figure 2:

      (1) The inertia coefficient, derived from free-flight studies, does not take into account the fact that the center of rotation and the center of mass do not align in the magnetic tether (see Bender & Dickinson, 2006 for estimates). This must be corrected using the parallel axis theorem. As the authors compare the model predictions to experimental data from a magnetic tether, it is critical that they revise their analysis.

      (2) According to their chosen inertia and damping constants, the I/C time constant would be ~1E-3 ms, which is much smaller than what has been estimated for yaw turns in the magnetic tether (200 ms; Bender & Dickinson, 2006) or for free-flight saccades (~17 ms; see Cheng et al., 2010; 10.1242/jeb.038778). The bottom line is that the current model underestimates the influence of inertia in turn manoeuvres, i.e., the aerodynamic damping is cranked up too high relative to yaw inertia. This may explain the mismatch between data and model that the authors posit: "What causes the fly to undershoot the movement of the target object in the magnetically tethered assay? One hypothesis is that strong upward magnetic force or a blunt top end of the steel pin significantly dampens the flies' flight turns."
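      Both points can be made concrete with the damped yaw model commonly used for such a biomechanics block; the manuscript's own Equation 4 is not reproduced here, so this form is an assumption. With yaw angle θ, moment of inertia I about the rotation axis, damping coefficient C, and torque T, a constant torque T_0 applied from rest produces an exponential approach to a steady angular velocity with time constant τ = I/C. The parallel axis theorem, with fly mass m and distance d between the center of mass and the tether's rotation axis, shows that for a fixed damping coefficient the tethered inertia, and hence τ, can only be larger than the value computed about the center of mass.

```latex
\begin{aligned}
I\,\ddot{\theta}(t) + C\,\dot{\theta}(t) &= T(t), \\
\dot{\theta}(t) &= \frac{T_0}{C}\left(1 - e^{-t/\tau}\right), \qquad \tau = \frac{I}{C}, \\
I_{\text{tether}} &= I_{\text{cm}} + m\,d^{2}
\quad\Longrightarrow\quad
\tau_{\text{tether}} = \frac{I_{\text{cm}} + m\,d^{2}}{C} \;\ge\; \frac{I_{\text{cm}}}{C}.
\end{aligned}
```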

      Loom response experiment:

      As nicely shown by 10.1242/jeb.02369, looming stimuli evoke saccades in the magnetic tether. Is this also the case in Fig. 6? Without showing individual trials, it is not possible to know. If saccades are indeed present, the authors will have to reframe their results in light of the physiological evidence for saccade-related cancellation signals and the three proposed models.
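      Checking individual trials for saccades could start from a simple velocity-threshold detector applied to the heading trace recorded in the magnetic tether. The threshold, the minimum separation between events, the synthetic heading trace, and the helper `detect_saccades` are assumptions for illustration; a real analysis would calibrate them against the recorded yaw velocity distribution.

```python
import numpy as np


def detect_saccades(heading_deg, dt, vel_threshold, min_separation):
    """Onset times (s) of putative saccades, defined as episodes where the
    absolute yaw velocity exceeds vel_threshold (deg/s). Onsets closer than
    min_separation (s) to the previously accepted onset are discarded."""
    velocity = np.gradient(heading_deg, dt)            # deg/s
    above = np.abs(velocity) > vel_threshold
    onsets = np.flatnonzero(above[1:] & ~above[:-1]) + 1
    if above[0]:
        onsets = np.concatenate(([0], onsets))
    accepted = []
    for idx in onsets:
        t_on = idx * dt
        if not accepted or t_on - accepted[-1] >= min_separation:
            accepted.append(t_on)
    return np.array(accepted)


dt = 0.001
t = np.arange(0.0, 3.0, dt)
# Two synthetic 30 deg turns lasting 20 ms each, starting near 1 s and 2 s.
heading = 30 * np.clip((t - 1.0) / 0.02, 0, 1) + 30 * np.clip((t - 2.0) / 0.02, 0, 1)
onsets = detect_saccades(heading, dt, vel_threshold=500.0, min_separation=0.2)
```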

      Minor comments:

      Missing Equation 13 for saccade model in Methods.

      For the discussion and results related to flight responses to the mismatch between expected and actual visual feedback, which is germane to the proposed models, the authors should integrate a discussion of a recent paper which directly tested this idea through an augmented reality system: 10.1016/j.cub.2023.11.045. In particular, the authors argue that the optomotor response is not particularly flexible because it may not rely on an internal model, as suggested by recent physiological evidence (Fenk et al.). How do these findings relate to the 3 proposed models within your work?

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The manuscript "Drosophila Visuomotor Integration: An Integrative Model and Behavioral Evidence of Visual Efference Copy" provides an integrative model of visuomotor control in Drosophila melanogaster. The model is derived experimentally from recordings of visually evoked wingbeat patterns in response to three strategically selected visual stimulus types with well-established behavioral response characteristics. By testing variations of these models, the authors demonstrate that the virtual model behavior can recapitulate the recorded wingbeat behavioral results, as well as those recorded by others, for these specific stimuli when presented individually. Yet, the novelty of this study and its model is that it allows predictions for natural visual scenes in which multiple visual stimuli occur simultaneously and may have opposing or enhancing effects on behavior. Testing three models that allow interactions between these visual modalities, the authors show that a visual efference copy signal allows visual streams to interact, replicating the behavior recorded when multiple stimuli are presented simultaneously. Importantly, they validated the model's predictions in real flies using a magnetic tether, e.g., presenting moving bars against varying backgrounds. In conclusion, the manuscript presents a commendable effort in developing and demonstrating the validity of a mixture model that allows predictions of the behavior of Drosophila in natural visual environments.

      Strengths:

      Overall, the manuscript is well-structured and clear in its presentation, and the modeling and experimental research are methodically conducted and illustrated in visually appealing and easy-to-understand figures and their captions.

      The manuscript employs a thorough, logical approach, combining computational modeling with experimental behavioral validation using magnetically tethered flies. This iterative integration of simulation and empirical behavioral evidence enhances the credibility of the findings.

      The associated code base is well documented and readily produces all figures in the document.

      Suggestions:

      However, while the experiments provide evidence for the use of a visual efference copy, the manuscript would be even more impressive if it presented specific predictions for the neural implementation, or even neurophysiological data supporting this model, or, at the very least, a thorough discussion of these points. Nonetheless, these models and validating behavioral experiments make this a valuable contribution to the field; it is well executed and addresses a significant gap in the modeling of fly behavior and holistic understanding of visuomotor behaviors.

      We appreciate the reviewer’s thoughtful comments on the strengths and weaknesses of our manuscript. We agree that a biophysically realistic model reflecting the structure of neural circuits, as well as physiological data from those circuits, would be invaluable. However, we are currently unable to provide physiological evidence for EC-based suppression or to propose a circuit architecture for efference copy-based suppression of the stability circuit, because the neural pathway underlying this behavior remains unidentified. Extensive recordings from the HS/VS system have revealed cell-type-specific motor-related inputs during both spontaneous and loom-evoked flight turns (Fenk et al., 2021; Kim et al., 2017, 2015). These studies predicted suppression of the optomotor stability response during such turns, and our new experiments confirmed this suppression specifically during loom-evoked turns (Figures 5, 6). However, these neurons are primarily involved in the head optomotor response, not the body optomotor response. We hope to extend our current model in future studies to incorporate more cellular-level detail, as the feedforward circuits underlying stability behavior become more clearly defined.

      Here are a few points that should be addressed:

      (1) The biomechanics block (Figure 2) should be elaborated on, to explain its relevance to behavior and relation to the underlying neural mechanisms.

      We appreciate this suggestion. The mathematical representation of the biomechanics block has been developed by other groups in previous studies (Fry et al., 2003; Ristroph et al., 2010). We used exactly the same model, and its parameters were identical to those used in one of those studies (Fry et al., 2003; Ristroph et al., 2010), in which the parameters were estimated from the stabilizing responses to magnetic “stumbling” pulses. In the previous version of the manuscript, we had a description of the biomechanics block in the Methods section (see Equation 4). In response to the reviewer’s comment, we have made a few changes in Figure 2A and expanded the associated description in the main text, as follows.

      (Line 160) “To test the orientation behavior of the model, we developed an expanded model, termed “virtual fly model” hereafter. In this model, we added a biomechanics block that transforms the torque response of the fly to the actual heading change according to kinematic parameters estimated previously (Michael H Dickinson, 2005; Ristroph et al., 2010) (Figure 2A, see Equation 4 in Methods and Movie S1). The virtual fly model, featuring position and velocity blocks that are conditioned on the type of the visual pattern, can now change its body orientation, simulating the visual orientation behavior of flies in the free flight condition.”
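      A minimal sketch of how a biomechanics block of this kind can turn a torque command into a heading change, assuming the yaw dynamics I * domega/dt = T - C * omega and dtheta/dt = omega, integrated with forward Euler. `simulate_heading` and the parameter values below are placeholders, not the published kinematic parameters used in the virtual fly model.

```python
import numpy as np


def simulate_heading(torque, dt, inertia, damping, theta0=0.0, omega0=0.0):
    """Forward-Euler integration of inertia * domega/dt = torque - damping * omega
    and dtheta/dt = omega. Returns heading and angular velocity at each step
    (values recorded before that step's update)."""
    theta = np.empty(len(torque))
    omega = np.empty(len(torque))
    th, om = theta0, omega0
    for i, tq in enumerate(torque):
        theta[i], omega[i] = th, om
        alpha = (tq - damping * om) / inertia   # angular acceleration
        th += om * dt
        om += alpha * dt
    return theta, omega


# Placeholder values: tau = inertia / damping = 0.1 s, unit step torque.
dt = 1e-4
torque = np.ones(20000)                         # 2 s of constant torque
theta, omega = simulate_heading(torque, dt, inertia=1.0, damping=10.0)
# omega approaches torque / damping = 0.1 with time constant 0.1 s
```

      Forward Euler is used only for brevity; with a step much smaller than I/C it tracks the analytic exponential closely, and a stiffer parameter set would call for a smaller step or an implicit integrator.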

      (2) It is unclear how the three integrative models with different strategies were chosen or what relevance they have to neural implementation. This should be explained and/or addressed.

      Thank you for this valuable comment. We selected the three models based on previous studies investigating visuomotor integration across multiple species, under conditions where multiple sensory cues are presented simultaneously.

      The addition-only model represents the simplest hypothesis, analogous to the “additive model” proposed by Tom Collett in his 1980 study (Collett, 1980). We used this model as a baseline to illustrate behavior in the absence of any efference copy mechanism. Notably, some modeling studies have proposed linear (additive) integration for multimodal sensory cues at the behavioral level (Liu et al., 2023; Van der Stoep et al., 2021). However, experimental evidence demonstrating strictly linear integration—either behaviorally or physiologically—remains limited. In our study, new data (Figure 5) show that bar-evoked and background movement-evoked locomotor responses are combined linearly, supporting the addition-only model.

      The graded efference copy model has been most clearly demonstrated in the cerebellum-like circuit of Mormyrid fish during electrosensation (Bell, 1981; Kennedy et al., 2014). In this system, the efference copy signal forms a negative image of the predicted reafferent input and undergoes plastic changes as the environment changes—an idea that inspired our modifiable efference copy model (Figure 4–figure supplement 1). The all-or-none efference copy model is exemplified in the sensory systems of smaller organisms, such as the auditory neurons of crickets during stridulation (Poulet and Hedwig, 2006). Notably, in crickets, the motor-related input is referred to as corollary discharge rather than efference copy. Typically, “efference copy” refers to a graded, subtractive motor-related signal, while “corollary discharge” denotes an all-or-none signal, both counteracting the sensory consequences of self-generated actions. In this manuscript, we use the term efference copy more broadly, encompassing both types of motor-related feedback signals (Sommer and Wurtz, 2008).

      In response to this comment, we have made the following changes in the main text to enhance its accessibility to general readers.

      (Line#268) “This integration problem has been studied across animal sensory systems, typically by analyzing motor-related signals observed in sensory neurons (Bell, 1981; Collett, 1980; Kim et al., 2017; Poulet and Hedwig, 2006). Building on the results of these studies, we developed three integrative models. The first model, termed the “addition-only model”, assumes that the outputs of the object (bar) and the background (grating) response circuits are summed to control the flight orientation (Figure 4B, see Equation 14 in Methods).”

      (Line#272) “In the second and third models, an EC is used to set priorities between different visuomotor circuits (Figure 4C,D). In particular, the EC is derived from the object-induced motor command and sent to the object response system to nullify visual input associated with the object-evoked turn (Bell, 1981; Collett, 1980; Poulet and Hedwig, 2006). These motor-related inputs fully suppress sensory processing in some systems (Poulet and Hedwig, 2006), whereas in others they selectively counteract only the undesirable components of the sensory feedback (Bell, 1981; Kennedy et al., 2014).”
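      The three integration schemes can be contrasted in a deliberately simplified, static form. The sketch below is not the manuscript's Equation 14 or its dynamic model; it assumes unit gains, a stability reflex whose torque equals the wide-field optic flow, and reafferent flow equal in magnitude and opposite in sign to the object-evoked command. `combined_torque` and its arguments are hypothetical.

```python
def combined_torque(obj_cmd, external_flow, model, ec_gain=1.0, gate_threshold=1e-3):
    """Toy combination of an object-evoked command with the background
    (stability) response under three integration schemes.

    obj_cmd       -- torque command from the object (bar or loom) pathway
    external_flow -- wide-field motion of the background itself
    """
    reafference = -obj_cmd                 # assumed: flow caused by the fly's own turn
    bg_flow = external_flow + reafference  # what the background pathway sees
    if model == "addition":
        bg_input = bg_flow
    elif model == "graded":
        # subtract a scaled prediction of the reafferent flow
        bg_input = bg_flow - ec_gain * reafference
    elif model == "all_or_none":
        # silence the background pathway while an object-evoked turn is commanded
        bg_input = 0.0 if abs(obj_cmd) > gate_threshold else bg_flow
    else:
        raise ValueError(f"unknown model: {model}")
    return obj_cmd + bg_input


results = {name: combined_torque(1.0, 0.5, name)
           for name in ("addition", "graded", "all_or_none")}
```

      In this toy setting, the addition-only scheme lets the stability reflex oppose the object-evoked turn, the graded scheme cancels only the self-generated flow and still responds to external background motion, and the all-or-none scheme ignores the background entirely during the turn.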

      (3) There should be a discussion of how the visual efference copy could be represented in the biological model and an evaluation of its plausibility and of alternatives.

      Thank you for this helpful comment. We have now added the following discussion to share our perspective on the circuit-level implementation of the visual efference copy in Drosophila.

      (Line#481) “Efference copy in Drosophila vision

      Under natural conditions, various visual features in the environment may concurrently activate multiple motor programs. Because these may interfere with one another, it is crucial for the central brain to coordinate the motor signals originating from different sensory circuits. Among such coordination mechanisms, EC mechanisms were hypothesized to counteract so-called reafferent visual input, i.e., input caused specifically by self-movement (Collett, 1980; von Holst and Mittelstaedt, 1950). Recent studies reported such EC-like signals in Drosophila visual neurons during spontaneous as well as loom-evoked flight turns (Fenk et al., 2021; Kim et al., 2017, 2015). One type of EC-like signal was identified in a group of wide-field visual motion-sensing neurons that were shown to control neck movements for gaze stability (Kim et al., 2017). The EC-like signals in these cells were bidirectional, depending on the direction of flight turns, and their amplitudes were quantitatively tuned to those of the expected visual input across cell types. Although amplitude varies among cell types, it remains inconclusive whether it also varies within a given cell type to match the amplitude of expected visual feedback, thereby implementing a graded EC signal. A more recent study examined EC-like signal amplitude in the same visual neurons across individual loom-evoked turns (Fenk et al., 2021). Although the results showed a strong correlation between the wing response and the EC-like inputs, the authors pointed out that this apparent correlation could stem from noisy measurements of all-or-none motor-related inputs.

      Thus, these studies did not completely disambiguate between graded and all-or-none EC signaling. Another type of EC-like signal, observed in the visual circuit tuned to a moving spot, exhibited characteristics consistent with an all-or-none EC; that is, it entirely suppressed visual signaling, irrespective of the direction of the self-generated turn (Kim et al., 2015; Turner et al., 2022).

      Efference-copy (EC)–like signals have been reported in several Drosophila visual circuits, yet their behavioral role remains unclear. Indirect evidence comes from a behavioral study showing that the dynamics of spontaneously generated flight turns were unaffected by unexpected background motion (Bender and Dickinson, 2006a). Likewise, our behavioral experiments showed that, during loom-evoked turns, responses to background motion are suppressed in an all-or-none manner (Figures 6 and 7). Consistent with this, motor-related inputs recorded in visual neurons exhibit nearly identical dynamics during spontaneous and loom-evoked turns (Fenk et al., 2021). Together, these behavioral and physiological parallels support the idea that a common efference-copy mechanism operates during both spontaneous and loom-evoked flight turns.

      Unlike loom-evoked turns, bar-evoked turn dynamics changed in the presence of moving backgrounds (Figure 5), a result compatible with both the addition-only and graded EC models. However, when the static background was updated just before a bar-evoked turn—thereby altering the amplitude of optic flow—the turn dynamics remained unaffected (Figures 5 and 7), clearly contradicting the addition-only model. Thus, the graded EC model is the only one consistent with both findings. If a graded EC mechanism were truly at work, however, an unexpected background change should have modified turn dynamics because of the mismatch between expected and actual visual feedback (Figure 4–figure supplement 1)—yet we detected no such effect at any time scale examined (Figure 7–figure supplement 1). This mismatch would be ignored only if the amplitude of the graded EC adapted to environmental changes almost instantaneously—a mechanism that seems improbable given the limited computational capacity of the Drosophila brain. In electric fish, for example, comparable adjustments take more than 10 minutes (Bell, 1981; Muller et al., 2019). Further investigation is needed to clarify how reorienting flies ignore optic flow generated by static backgrounds, potentially by engaging EC mechanisms not captured by the models tested in this study.

      Why would Drosophila rely on the all-or-none EC mechanism instead of the graded one for loom-evoked turns? A graded EC must be adjusted adaptively depending on the environment, as the amplitude of visual feedback varies with both the dynamics of self-generated movement and environmental conditions (e.g., empty vs. cluttered visual backgrounds) (Figure 4–figure supplement 1). Recent studies on electric fish have suggested that a large array of neurons in a multi-layer network is crucial for generating a modifiable efference copy signal matched to the current environment (Muller et al., 2019). Given their small brains, flies might opt for a more economical design that suppresses unwanted visual inputs regardless of the visual environment. Circuits mediating this type of EC have been identified, for example, in the cricket auditory system during stridulation (Poulet and Hedwig, 2006). Our study strongly suggests the existence of a similar circuit in the Drosophila visual system.

      We tested the hypothesis that efference-copy (EC) signals guide action selection by suppressing specific visuomotor reflexes when multiple visual features compete. An alternative motif with a similar function is mutual inhibition between motor pathways (Edwards, 1991; Mysore and Kothari, 2020). In Drosophila, descending neurons form dense lateral connections (Braun et al., 2024), offering a substrate for such competitive interactions. Determining whether—and how—EC and mutual inhibition operate will require recordings from the neurons that ensure visual stability, which remain unidentified. Mapping these pathways and assessing how they are modulated by visual and behavioral context are important goals for future work.”

      Reviewer #2 (Public Review):

      It has been widely proposed that neural circuits use a copy of the motor command, an efference copy, to cancel out self-generated sensory stimuli so that intended movement is not disturbed by reafferent sensory inputs. However, how, quantitatively, such an efference copy suppresses sensory inputs is unknown. Here, Canelo et al. tried to demonstrate that an efference copy operates in an all-or-none manner and that its amplitude is independent of the amplitude of the sensory signal to be suppressed. Understanding the nature of such an efference copy is important because animals generally move during sensory processing, and without proper correction the movement would devastatingly distort sensory signals. The manuscript is concise and written very clearly. However, the experiments do not directly demonstrate whether the animal indeed uses an efference copy in the presented visual paradigms and whether such a signal is indeed non-scaled. As it is, it is not clear whether the suppression of the behavioral response to the visual background is due to an efference copy (a copy of the motor command) or to an alternative, more global inhibitory mechanism, such as feedforward inhibition at the sensory level or attentional modulation. To directly uncover the nature of an efference copy, physiological experiments are necessary. If that is technically challenging, it requires finding a behavioral signature that unambiguously reports a (copy of the) motor command and quantifying the nature of that behavior.

      We thank the reviewer for this insightful and constructive comment. We agree that our current behavioral evidence does not directly identify the underlying circuit mechanism, and that direct recordings from visual neurons modulated by an efference copy would be critical for distinguishing between potential mechanisms.

      A prerequisite for such physiological investigations would be the identification of both (1) the feedforward neurons directly involved in the optomotor response, and (2) the neurons conveying motor-related signals to the optomotor circuit. Despite efforts by several research groups, the location of the feedforward circuit mediating the optomotor response remains elusive. This limitation has prevented us from obtaining direct cellular evidence of flight turn-associated suppression of optomotor signaling.

      In light of the reviewer’s suggestion, we expanded our investigation to strengthen the behavioral evidence for efference copy (EC) mechanisms. In addition to our earlier experiments involving unexpected changes in the static background, we examined how object-evoked flight turns influence the optomotor stability reflex and vice versa (Figures 5 and 6). To quantify the interaction between different visuomotor behaviors, we systematically varied the temporal relationship between two types of visual motion—loom versus moving background, or moving bar versus moving background—and measured the resulting behavioral responses.

      Our findings support pattern- and time-specific suppressive mechanisms acting between flight turns associated with the different visual patterns. Specifically:

      The responses to a moving bar and a moving background add linearly, even when presented in close temporal proximity.

      Loom-evoked turns and the optomotor stability reflex mutually suppress each other in a time-specific manner.

      For both loom- and moving bar-evoked flight turns, changes in the static background had no measurable effect on the dynamics of the object-evoked responses.

      These results provide a detailed behavioral characterization of a suppressive interaction between distinct visuomotor responses. This, in turn, offers correlative evidence supporting the involvement of an efference copy-like mechanism acting on the visual system. While similar efference copy mechanisms have been documented in other parts of the visual system, we acknowledge that our findings do not exclude alternative explanations. In particular, it is still possible that lateral inhibition within the central brain or ventral nerve cord contributes to the suppression we observed.

      Ultimately, definitive proof will require identifying the specific neurons that convey efference copy signals and demonstrating that silencing these neurons abolishes the behavioral suppression. Until such experiments are feasible, our behavioral approach provides an important contribution toward understanding the nature of sensorimotor integration in this system.

      Reviewer #3 (Public Review):

      Summary:

      Canelo et al. used a combination of mathematical modeling and behavioral experiments to ask whether flies use an all-or-none EC model or a graded EC model (in which the turn amplitude is modulated by wide-field optic flow). In particular, the authors focus on the bar-ground discrimination problem, which has received significant attention in flies over the last 50-60 years. First, they use a model by Poggio and Reichardt to model flight responses to moving small-field bars and spots and wide-field gratings. They then simulate this model and compare simulation results to flight responses in a yaw-free tether and find generally good agreement. They then ask how flies may perform bar-background discrimination (i.e. in a complex visual environment) and invoke different EC models and an additive model (balancing torque production due to background and bar movement). Behavioral experiments and simulations support the notion that flies use an all-or-none EC, since flight turns are not influenced by the background optic flow. While the study is interesting, there are major issues with the conceptual framework.

      Strengths:

      They ask a significant question related to efference copies during volitional movement.

      The methods are well detailed and the data (and statistics) are presented clearly.

      The integration of behavioral experiments and mathematical modeling of flight behavior.

      The figures are overall very clear and salient.

      Weaknesses:

      Omission of saccades: While the authors ask a significant question related to the mechanism of bar-ground discrimination, they fail to integrate an essential component of the Drosophila visuomotor responses: saccades. Indeed, the Poggio and Reichardt model, which was developed almost 50 years ago, while appropriate to study body-fixed flight, has a severe limitation: it does not consider saccades. The authors identify this major issue in the Discussion by citing a recent switched, integrate-and-fire model (Mongeau & Frye, 2017). The authors admit that they "approximated" this model as a smooth pursuit movement. However, I disagree that it is an approximation; rather it is an omission of a motor program that is critical for volitional visuomotor behavior. Indeed, saccades are the main strategy by which Drosophila turn in free flight and prior to landing on an object (i.e. akin to a bar), as reported by the Dickinson group (Censi et al., van Breugel & Dickinson [not cited]). Flies appear to solve the bar-ground discrimination problem by switching between smooth movement and saccades (Mongeau & Frye, 2017; Mongeau et al., 2019 [not cited]). Thus, ignoring saccades is a major issue with the current study as it makes their model disconnected from flight behavior, which has been studied in a more natural context since the work of Poggio.

      Thank you for this helpful comment. We agree that including saccadic turns is essential and qualitatively improves the model. In the revised manuscript, we therefore expanded our bar-tracking model to incorporate an integrate-and-saccade strategy, now presented in Figure 2–figure supplement 2.

      The manuscript now introduces this result as follows:

      (Line#190) “Finally, an important feature of the locomotor dynamics that a flying Drosophila exhibits while tracking an object is a rapid orientation change, called a “saccade” (Breugel and Dickinson, 2012; Censi et al., 2013; Heisenberg and Wolf, 1979). For example, while tracking a slowly moving bar, flies perform relatively straight flights interspersed with saccadic flight turns (Collett and Land, 1975; Mongeau and Frye, 2017). During this behavior, it has been proposed that visual circuits compute an integrated error of the bar position with respect to the frontal midline and trigger a saccadic turn toward the bar when the integrated value reaches a threshold (Frighetto and Frye, 2023; Mongeau et al., 2019; Mongeau and Frye, 2017). We expanded our bar fixation model to incorporate this behavioral strategy (Figure 2--figure supplement 2). The overall structure of the modified model is akin to the one proposed in a previous study (Mongeau and Frye, 2017), and the amplitude of a saccadic turn was determined by the sum of the position and velocity functions (Figure 2--figure supplement 2A; see Equation 13 in Methods). When simulated, our model successfully reproduced experimental observations of saccade dynamics across different object velocities (Figure 2--figure supplement 2B-D) (Mongeau and Frye, 2017). Together, our models faithfully recapitulated the results of previous behavioral observations in response to singly presented visual patterns (Collett, 1980; Götz, 1987; H. Kim et al., 2023; Maimon et al., 2008; Mongeau and Frye, 2017).”
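      To make the integrate-and-saccade strategy concrete, here is a minimal Python sketch under stated assumptions. It is not the model of Figure 2–figure supplement 2 or Equation 13: the function name `simulate_integrate_and_saccade`, the threshold, and the gains `k_pos` and `k_vel` are hypothetical; the heading is held fixed between saccades (no smooth component); and each saccade is treated as instantaneous. The saccade amplitude is a position term plus a velocity term, echoing the description quoted above.

```python
import numpy as np


def simulate_integrate_and_saccade(bar_velocity_dps, duration_s=5.0, dt=0.001,
                                   threshold=10.0, k_pos=1.0, k_vel=0.1):
    """Toy integrate-and-saccade bar tracker (all parameters hypothetical).

    The bar moves at a constant angular velocity. The fly's heading stays fixed
    between saccades while the position error (bar minus heading) is integrated.
    When the integral reaches `threshold` (deg*s), the heading jumps by a
    saccade whose amplitude is a position term plus a velocity term, and the
    integrator resets. Returns saccade times (s) and amplitudes (deg).
    """
    n_steps = int(duration_s / dt)
    heading = 0.0    # fly heading (deg)
    integral = 0.0   # integrated position error (deg*s)
    times, amplitudes = [], []
    for i in range(n_steps):
        t = i * dt
        bar = bar_velocity_dps * t     # bar azimuth (deg)
        error = bar - heading          # error relative to the frontal midline
        integral += error * dt
        if abs(integral) >= threshold:
            # Hypothetical amplitude rule: position term + velocity term.
            amplitude = k_pos * error + k_vel * bar_velocity_dps
            heading += amplitude       # instantaneous saccade
            integral = 0.0             # reset the integrator
            times.append(t)
            amplitudes.append(amplitude)
    return np.array(times), np.array(amplitudes)


# Compare a slow and a fast bar.
t_slow, a_slow = simulate_integrate_and_saccade(30.0)
t_fast, a_fast = simulate_integrate_and_saccade(90.0)
print(len(t_slow), len(t_fast))
```

      With these illustrative values, a faster bar reaches the integration threshold sooner and with a larger accumulated error, so the sketch produces more frequent and larger saccades; the numbers themselves carry no meaning beyond the assumed parameters.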

      Apart from Figures 1 and 2, most of our data, whether from simulations or behavioral experiments, use brief visual patterns lasting 200 ms or less. These stimuli trigger a single, rapid orientation change reminiscent of a saccadic flight turn. In this part of the paper, we essentially examined how multiple visuomotor pathways interact to determine the direction of object-evoked turns when several visual patterns occur simultaneously.

      Critically, recent work showed that a group of columnar neurons (T3) appear specialized for saccadic bar tracking through integrate-and-fire computations, supporting the notion of parallel visual circuits for saccades and smooth movement (Frighetto & Frye, 2023 [not cited]).

      Thanks for bringing up this critical issue. We have now cited this paper in the following parts of the manuscript.

      (Line#193) “During this behavior, it has been proposed that visual circuits compute an integrated error of the horizontal bar position with respect to the frontal midline and trigger a saccadic turn toward the bar when the integrated value reaches a threshold (Frighetto and Frye, 2023; Mongeau and Frye, 2017).”

      (Line#462) “Visual systems extract features from the environment by calculating spatiotemporal relationships of neural activities within an array of photoreceptors. In Drosophila, these calculations occur initially on a local scale in the peripheral layers of the optic lobe (Frighetto and Frye, 2023; Gruntman et al., 2018; Ketkar et al., 2020).”

      A major theme of this work is bar fixation, yet recent work showed that in the presence of proprioceptive feedback, flies do not actually center a bar (Rimniceanu & Frye, 2023). Furthermore, the same study found that yaw-free flies do not smoothly track bars but instead generate saccades. Thus prior work is in direct conflict with the work here. This is a major issue that requires more engagement by the authors.

      Thank you for your thoughtful comments and for drawing our attention to this important paper. In our experiments, bar fixation on oscillating vertical objects emerges during the “alignment” phase of the magneto-tether protocol. The pattern movement dynamics were similar to those used by Rimniceanu & Frye (2023), yet the two studies differ in a key respect: Rimniceanu & Frye employed a motion-defined bar, whereas we presented a dark vertical bar against a uniform or random-dot background. The alignment success rate, defined as the proportion of trials in which the fly’s body angle is within ±25° of the target, was about 50% (data not shown). Our alignment pattern consisted of three vertical stripes spanning ~40° horizontally; when we replaced it with a single, narrower stripe, the success rate decreased (data not shown). These observations suggest that bar fixation in the magnetically tethered assay is less robust than in the rigidly tethered assay, although flies still orient toward highly salient vertical objects.

      We also observed that bar-evoked turns were elicited more reliably when the bar moved rapidly (45° in 200 ms) in the magneto-tether assay, although the turn magnitude was significantly smaller than the actual bar displacement (Figure 3).

      In response to the reviewer’s comment, we have now added the following description of bar fixation behavior to the paper, citing Rimniceanu & Frye (2023).

      (Line#239) “Another potential explanation arises from recent studies demonstrating that proprioceptive feedback provided during flight turns in a magnetically tethered assay strongly dampens the amplitude of wing and head responses (Cellini and Mongeau, 2022; Rimniceanu et al., 2023).”

      Relevance of the EC model: EC-related studies by the authors linked cancellation signals to saccades (Kim et al, 2014 & 2017). Puzzlingly, the authors applied an EC model to smooth movement, when the authors' own work showed that smooth course-stabilizing flight turns do not receive cancellation signals (Fenk et al., 2021). Thus, in Fig. 4C, based on the state of the field, the efference copy signal should originate from the torque commands to initiate saccades, and not from the torque that generates smooth movement. As this group previously showed, cancellation signals are quantitatively tuned to the expected visual input during saccades. Importantly, this tuning would be to the anticipated optic flow of the saccadic turn. Thus the authors' results supporting an all-or-none model appear to be in direct conflict with the authors' previous work. Further, the addition-only model is not particularly helpful, as it has already been refuted by behavioral experiments (Rimniceanu & Frye, Mongeau & Frye).

      Thank you for this constructive comment. Efference copy is best established for brief, discrete actions like flight saccades. While motor-related modulation of visual processing has been reported across short- and long-duration behaviours (Chiappe et al., 2010; Fujiwara et al., 2017; Kim et al., 2015, 2017; Maimon et al., 2010; Turner et al., 2022), only flight saccade-associated signals exhibit the temporal profile appropriate to cancel reafferent input. However, von Holst & Mittelstaedt (1950) originally formulated the efference copy concept to explain the smooth optomotor response of hoverflies. In our previous HS/VS recordings, we could not detect membrane-potential changes tied to baseline wing-beat amplitude (data not shown), although further work is needed.

      Note that visually evoked flight turns analyzed in this paper have relatively fast dynamics. Fenk et al. (2021) showed that HS cells carry EC-like motor signals during both loom-evoked turns and spontaneous saccades. Building on this, we tested whether object-evoked rapid turns modulate other visuomotor pathways. Although Fenk et al. also found that optomotor turns lack motor input to HS cells, the authors did not test whether the optomotor pathway suppresses other reflexes, such as loom-evoked turns. Our new behavioral data (Figure 6) show that optomotor turns indeed suppress loom-evoked turns, suggesting a potential EC signal arising from the optomotor pathway that inhibits loom-responsive visual neurons.

      In Kim et al. (2017), the authors argued that HS/VS neurons receive a “quantitatively tuned” efference copy that varies across cell types: yaw-sensitive LPTCs are strongly suppressed, roll-sensitive cells receive intermediate input, and pitch-sensitive cells receive little or none. We also showed that when the amplitude of ongoing visual drive changes, the amplitude of saccade-related potentials (SRPs) scales linearly. This proportionality does not imply a genuinely graded EC, however, because SRP amplitude could vary solely through changes in driving force (Vm – Vrest) with a fixed EC conductance. Crucially, SRPs do not fully suppress feed-forward visual signalling, arguing against an all-or-none EC mechanism.
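      The driving-force argument can be made explicit with a single-compartment approximation. This is a sketch under assumptions not stated in the original: the efference copy is modeled as a fixed conductance $g_{\mathrm{EC}}$ with reversal potential $E_{\mathrm{EC}}$ close to $V_{\mathrm{rest}}$, and $g_{\mathrm{tot}}$ is the total membrane conductance just before the saccade, treated as roughly constant across visual conditions. Adding $g_{\mathrm{EC}}$ to a membrane held at the visually driven potential $V_m$ moves the steady-state potential to a conductance-weighted average, which gives the SRP amplitude:

$$
V_m' = \frac{g_{\mathrm{tot}}\,V_m + g_{\mathrm{EC}}\,E_{\mathrm{EC}}}{g_{\mathrm{tot}} + g_{\mathrm{EC}}},
\qquad
\Delta V_{\mathrm{SRP}} = V_m' - V_m = -\,\frac{g_{\mathrm{EC}}}{g_{\mathrm{tot}} + g_{\mathrm{EC}}}\,\bigl(V_m - E_{\mathrm{EC}}\bigr) \approx -\,\frac{g_{\mathrm{EC}}}{g_{\mathrm{tot}} + g_{\mathrm{EC}}}\,\bigl(V_m - V_{\mathrm{rest}}\bigr).
$$

      With $g_{\mathrm{EC}}$ fixed, the SRP amplitude scales linearly with the pre-saccadic deviation $V_m - V_{\mathrm{rest}}$, and because the prefactor stays below 1 unless $g_{\mathrm{EC}} \gg g_{\mathrm{tot}}$, the visually driven deviation is only partly cancelled. Under these assumptions, both features are consistent with the observations summarized above without requiring a graded EC.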

      How, then, can the cellular and behavioural data be reconciled? Silencing HS/VS neurons—or their primary inputs, the T4/T5 neurons—does not markedly diminish the optomotor response in flight (Fenk et al., 2014; Kim et al., 2017), indicating the presence of additional, as-yet-unidentified pathways.

      Physiological recordings from other visual neurons that drive the optomotor response in flying Drosophila are therefore needed to determine how strongly they are suppressed during loom-evoked turns.

      Behavioral evidence for all-or-none EC model: The authors state "unless the stability reflex is suppressed during the flies' object evoked turns, the turns should slow down more strongly with the dense background than the sparse one". This hypothesis is based on the fact that the optomotor response magnitude is larger with a denser background, as would be predicted by an EMD model (because there are more pixels projected onto the eye). However, based on the authors' previous work, the EC should be tuned to optic flow and thus the turning velocity (or amplitude). Thus the EC need not be directly tied to the background statistics, as they claim. For instance, I think it would be important to distinguish whether a mismatch in reafferent velocity (optic flow) links to distinct turn velocities (and thus position). This would require moving the background at different velocities (co- and anti-directionally) at the onset of bar motion. Overall, there are alternative hypotheses here that need to be discussed and more fully explored (as presented by Bender & Dickinson and in work by the Maimon group).

      We appreciate the reviewer’s important suggestion. In response, we performed the recommended experiment. In Figures 5 and 6 of the revised manuscript, we now present how bar- or loom-evoked flight turns affect the response to a moving background pattern. These experiments revealed that bar-evoked turns do not suppress the optic flow response, whereas loom-evoked turns strongly suppress it. Specifically, when background motion began 100 ms after the onset of loom expansion, the response to the background was significantly suppressed. Although weak residual responses to the background motion were observed in this case, these could arise from background motion falling outside the suppression interval, which may match the duration of flight turns (Figure 6C,D).

      The lack of suppression of the optic flow response during and after bar-evoked turns appears to suggest that the responses are added linearly (Figure 5), seemingly contradicting the lack of dynamic change when the background dot density was altered (Figure 7, Figure 7–figure supplement 1). That is, the experimental result in Figure 5 supports either an addition-only or a graded efference copy (EC) model. However, the result in Figure 7 supports an all-or-none EC model. If a graded EC were used, the amplitude of the EC should be updated almost instantaneously when the static background changes.
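      The logic that separates the three model classes can be illustrated with a toy steady-state calculation. This is a minimal Python sketch, not the simulation used in the paper: the function `steady_turn_velocity`, the linear dependence of optomotor gain on dot density (`g_max * density`), and the assumption that reafferent slip over a static background equals the fly's own yaw velocity are all hypothetical simplifications. In the graded model, the EC is assumed to cancel only the reafference expected for a previously calibrated density (`expected_density`).

```python
def steady_turn_velocity(drive, density, model, expected_density=0.0, g_max=2.0):
    """Steady-state yaw velocity (deg/s) of an object-evoked turn over a static background.

    Hypothetical toy assumptions: the optomotor gain is g_max * density, and the
    reafferent slip produced by the static background equals the turn velocity w.
    """
    if model == "addition":
        # Optomotor reflex acts unopposed against the turn: w = drive - G * w
        return drive / (1.0 + g_max * density)
    if model == "all_or_none":
        # Reflex is fully gated off during the turn, whatever the density: w = drive
        return drive
    if model == "graded":
        # EC cancels only the reafference expected for the calibrated density:
        # w = drive - g_max * (density - expected_density) * w
        loop = 1.0 + g_max * (density - expected_density)
        if loop <= 0.0:
            raise ValueError("EC over-cancels reafference; no stable steady state")
        return drive / loop
    raise ValueError(f"unknown model: {model!r}")


# Sparse vs dense static background; the graded EC is calibrated to the sparse one.
for name in ("addition", "graded", "all_or_none"):
    sparse = steady_turn_velocity(100.0, 0.2, name, expected_density=0.2)
    dense = steady_turn_velocity(100.0, 0.8, name, expected_density=0.2)
    print(f"{name:12s} sparse={sparse:6.1f} dense={dense:6.1f}")
```

      Under these assumptions, only the all-or-none model leaves the turn speed unchanged when the static background switches from sparse to dense; the addition-only model slows the turn on the dense background, and a graded EC calibrated to the sparse background leaves a residual slowing. This steady-state picture does not by itself reconcile the linear summation observed with a moving background.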

      Another possibility is that the optic flow during self-generated turns in a static background is extremely weak compared to the optic flow input generated by physically moving the pattern, perhaps due to the rapid nature of head movements. Indeed, detailed kinematic analysis of head movement during spontaneous saccades in blow flies revealed that the head reaches the target angle before the body completes the orientation change, making the effective speed of reafferent optic flow higher than the speed of body rotation (Hateren and Schilstra, 1999). To test these hypotheses, further experiments will be needed for bar-evoked flight turns.

      Publishing the reviewed preprint:

      (1) The Reviewed Preprint (including the full text of the preprint we reviewed, the eLife assessment, and public reviews) will typically be published in two weeks' time.

      Please let us know if you would like to provide provisional author responses to be posted at the same time (if so, please send these by email). Please do not resubmit within the next two/three weeks, as we will need to publish the first version of the Reviewed Preprint first.

      If there are any factual errors in the eLife assessment or public reviews, or other issues we should be aware of, please let us know as soon as possible.

      (2) After publication of the Reviewed Preprint, you can use the link below to submit a revised version. There is no deadline to resubmit. Before resubmitting, please ensure that you update the preprint at the preprint server to correspond with the revised version. Upon submitting a revised version, we will ask the editors and reviewers if it's appropriate to update their assessment and public reviews, which will be included alongside the revised Reviewed Preprint. At that time we will also post the recommendations to the authors and the author responses you provide with the revised version. In the author response, please respond to the public reviews (where relevant) and the recommendations to the authors.

      (3) Alternatively, you can proceed with the current version of the Reviewed Preprint (once published), without revisions, and request an eLife Version of Record. See the Author Guide for further information: https://elife-rp.msubmit.net/html/elife-rp_author_instructions.html#vor. However, most authors decide to request a Version of Record after a round of revision.

      (4) After publication of eLife's Reviewed Preprint, you also have the option to submit/publish in another journal instead: if you choose to do this, please let us know so we can update our records.

      The reviewers identified two key revisions that could improve the assessment of the paper:

      (1) Consideration of saccades within the model framework (outlined by reviewer 3).

      (2) Addition of physiology data to support the conclusions of the paper (outlined by reviewer 2). If this is not feasible within the timescale of revisions, the paper would need to be revised to clarify that the model leads to a hypothesis that would need to be tested with future physiology experiments.

      Thank you for these comments.

      Regarding revision point #1, we have added Figure 2–figure supplement 2, where we incorporated our position-velocity model (estimated in Figure 1) into the framework of the integrate-and-saccade model. A detailed description of this model is now provided in the main text (Lines 190–203).

      For revision point #2, obtaining electrophysiological evidence for efference copy remains challenging, as neither the visual neurons nor the efference-copy neuron has been identified for the wing optomotor response. As suggested by the reviewers, we have revised the title of the paper to reduce emphasis on efference copy and have noted electrophysiological recordings as a direction for future work.

      old title: A visual efference copy-based navigation algorithm in Drosophila for complex visual environments

      new title: Integrative models of visually guided steering in Drosophila

      Specific recommendations are detailed below.

      Reviewer #2 (Recommendations For The Authors):

      To directly demonstrate whether an efference copy is non-scaled, the following experiments could be helpful: record from HS/VS cells and examine the relationship between the amplitude of the saccade-suppression signal and saccade amplitude.

      Thanks for raising this important point. We previously carried out the suggested analysis for loom-evoked saccades in Fenk et al. (2021). There, significant correlations emerged between wing-response amplitude and saccade-related potentials (Figures 2F and 3C). However, we did not interpret the strong correlation (r ≈ 0.8) as evidence for a graded efference copy, because the amplitude of saccade-related potentials appeared to be bimodal. Upon presentation of the looming stimulus, flies either executed large evasive turns or showed minimal changes in wing-stroke amplitude. Large wing responses were accompanied by strong, saturated suppression of HS-cell membrane potential, whereas trials without wing responses produced only weak modulations—reflected in the bimodal distribution of saccade-related potential amplitudes (Figure 3C). 

      Importantly, in rigidly tethered preparations—where these potentials are typically measured—the absence of proprioceptive feedback can itself drive wingbeat amplitudes to saturation during saccades. We therefore reasoned that the lack of intermediate-sized flight saccades would naturally yield correspondingly saturated saccade-related potentials, even if a graded EC system is in play. 

      In Kim et al. (2017), we also performed a comprehensive analysis of spontaneous saccade-related potentials across all HS/VS cell types. When we later examined the relationship between saccade amplitude and the corresponding saccade-related potentials in each cell type, we could not find any statistically significant correlation (unpublished data).

      Measure how much a weak visual stimulus and a strong visual stimulus are suppressed by the suppression signal. If the signal is non-scaled, visual stimuli should always be suppressed regardless of their intensity.

      Thank you for this important suggestion. As mentioned in our response to the previous comment, we believe it is not feasible to record from neurons responsible for the body optomotor response at this point, as their identity remains unknown. Regarding the HS/VS cells, our previous study showed that HS cells are not always fully suppressed. The changes in saccade-related potential amplitude can be described as a linear function of the pre-saccadic visually-evoked membrane potential (Figure 7 in Kim et al., 2017). 

      As suggested by Fenk et al. 2014 (doi: 10.1016/j.cub.2014.10.042), HS cells might also be responsive to a moving bar. If that is the case, and if you present a bar and background (either sparse or dense) in a closed-loop manner to a head-fixed fly, HS cells might be sensitive only to the bar but not to the background (independently of the density).

      Thanks for pointing out this important issue. HS cells indeed respond strongly to the horizontal movement of a vertical bar, as expected given that their receptive fields are formed by the integration of local optic flow vectors. In one of our previous studies (Supplemental Figure 1 in Kim et al., 2015), we showed that the response amplitude to a single vertical bar is roughly equivalent to that elicited by a vertical grating composed of 12 bars of the same size. Therefore, we believe that HS cells are likely to contribute to the head response to a moving vertical bar. In a body-fixed flight simulator, HS cells would respond only to the bar if the bar runs in a closed loop with a static background. In this scenario, HS cells are likely to play a role in the head optomotor response.

      Note also that the role of HS cells in the wing optomotor response remains unresolved. Unilateral activation of HS cells has been shown to elicit locomotor turns in walking Drosophila (Fujiwara et al., 2017), as well as in flying individuals (unpublished data from our lab). However, a previous study also showed that strong silencing of HS/VS cells significantly reduced the head optomotor response, but not the wing optomotor response (Kim et al., 2017).

      If neurophysiology is technically challenging, an alternative way might pay attention to a head movement that exclusively follows the background (Fox et al., 2014 (doi: 10.1242/jeb.080192)). Because HS cells are thought to promote head rotation to background motion, a non-scaled suppression signal on HS cells would always suppress the head rotation independently of the background density.

      Thanks for this helpful comment. We have analyzed head movements during bar-evoked flight turns (Figure 7–figure supplement 1B) and found no significant changes across different background dot densities. We think that this might suggest that HS cells are unlikely to receive suppressive inputs during bar-evoked turns, akin to the lack of modulation during optomotor turns.

      Another way to separate a potential efference copy from other mechanisms (more global inhibition) is the directionality. A global inhibition would suppress the response to the background even if the background moves in the same direction as self-motion, but the efference copy would not.

      Thanks for this important point. In Heisenberg and Wolf, 1979, it was proposed that modulation might be bidirectional, with behavioral effects observed only for perturbations in the “unexpected” direction. In our new data on loom-evoked turns (Figure 6), the suppression appears equally strong for background motion in either direction, supporting an all-or-none suppression mechanism.

      Besides, in general, it is unclear if you think an efference copy operates both in smooth pursuits and saccades or if such a signal is only present during saccades. Your previous neurophysiological work supports the latter. Are your behavioral results consistent with the previous saccade suppression idea, or do you propose a new type of efference copy that also operates in smooth pursuits?

      Thanks for raising this important point. von Holst and Mittelstaedt (1950) originally introduced the concept of efference copy to explain the smooth optomotor response. We previously analyzed electrophysiological recordings from HS cells for membrane-potential changes associated with slow deviations in wing-steering angle but found none. However, this negative result does not entirely rule out modulation of visual processing during smooth flight turns, given the slow drift in membrane potential observed in most whole-cell recordings.

      In this study, we examined only the interactions among visuomotor pathways during rapid flight turns, because the dynamics of visually evoked turns are almost as rapid as those of spontaneous saccades. Our data reveal that interactions between distinct visuomotor reflexes are more diverse than previously appreciated.

      Minor comments:

      Line 108, 109: match the description between here and the labels in Fig. 1F.

      Thank you for indicating this issue. We defined the general equation for obtaining the position and velocity components in the main text (lines 108–109), but because of a slight asymmetry in the data (Fig. 1E), we used the approach indicated in Fig. 1F and explained in lines 113–117.

      Fig.1 F: If the position-dependent component is due to fatigue, the tuning curve's shape is likely changed (shrunk or extended) depending on the stimulus speed. How can you generalize the tuning curve shown here? Does the result hold even if the stimulus speed/contrast/spatial frequency is changed?

      We appreciate this point. We believe that fatigue may explain the significant decay in the wing response to the grating stimulus (Fig. 1E). As you mention, increasing the stimulus speed would increase the amplitude of the fly’s response up to a saturation point. We addressed this in our model by multiplying the derived value by the angular velocity of the grating.

      We did not test contrast or spatial frequency experimentally. Instead, we simulated our model with varying visual feedback (Fig. 4A, B), which can be regarded as increasing or decreasing the contrast of a grating. An increase in contrast would increase the fly’s response to the grating and would thereby contribute to dampening the response to the foreground object (Fig. 4C).

      Line 233-255: Here, the description sounds like you will consider several parallel objects (e.g., two stripes) in the visual field instead of the combination of the figure and background (which is referred to in the following paragraph).

      Thank you for pointing it out. Indeed it was slightly ambiguous. We have addressed this by explaining the specific situation of a combination of an object and the background in lines 231-233.

      Figure 6C: you kept the foreground visual field between sparse and dense random dot backgrounds to keep the bar's saliency. Is it sure that this does not influence the difference in the fly's response to these two backgrounds (in Figure 6B)?

      This is a good point that we have also discussed internally. We also carried out similar experiments with a fully covered background and found no significant differences (Figure 7–figure supplement 1).

      Reviewer #3 (Recommendations For The Authors):

      Identify and analyze flight saccade dynamics in the raw trajectories (e.g., Fig. 3B). There should be some since the bar is near the 'sweet spot' for triggering saccades (see Mongeau & Frye, 2017).

      Thank you for bringing up this interesting point. Previous work reported that flies fixate a vertical bar through saccadic turns rather than smooth tracking (Mongeau & Frye, 2017). When the bar was thin (<15°), flies produced barely one saccade per second (Mongeau & Frye, 2017, Fig. 4). In our magnetic tether assay (Fig. 3A, B), the object was 11.25° wide and moved only for a short time window, so the fly generated only the saccade associated with object onset. Small turns of a few degrees, which are likely related to minor perturbations, are not comparable to the saccades previously reported (Mongeau & Frye, 2017) and were not classified as saccades. Additionally, in our protocol (Fig. 3A), from onset time (‘go’ mark) only a single object moved within an empty background, so in principle there was no trigger for a switch to smooth movement. We addressed this in lines x-x.

      Consider updating the Poggio model with flight saccades (switched, integrate-and-fire).

      We appreciate this suggestion. Following previous work (Mongeau et al., 2017), we expanded our model to include a saccade mechanism: the torque produced by the summed position- and velocity-dependent components is now replaced by an integrate-and-fire saccade (Figure 2—figure supplement 2). We optimized the saccade interval and amplitude so that both vary linearly with stimulus amplitude and faithfully reproduce the kinematic properties reported previously (Mongeau et al., 2017).  
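      For readers who want to see how such a mechanism behaves, the following is a minimal Python sketch of an integrate-and-fire saccade controller, not the authors' implementation. It assumes a perfect (non-leaky) integrator of the bar's angular error relative to the frontal midline, a hypothetical threshold, and instantaneous saccades whose amplitude is a fixed fraction (`gain`) of the current error; `SaccadeParams`, `simulate_bar_fixation` and all parameter values are illustrative placeholders rather than the fitted values of the revised model.

```python
from dataclasses import dataclass


@dataclass
class SaccadeParams:
    threshold: float = 20.0  # deg*s; hypothetical integrator threshold
    gain: float = 1.0        # fraction of the current error removed per saccade (assumption)
    dt: float = 0.001        # s; simulation time step


def wrap_deg(angle: float) -> float:
    """Wrap an angle to the interval [-180, 180) degrees."""
    return (angle + 180.0) % 360.0 - 180.0


def simulate_bar_fixation(bar_speed, duration, params=None, bar_start=0.0):
    """Simulate yaw heading for a bar rotating at a constant speed (deg/s).

    The bar error relative to the frontal midline is integrated over time;
    when |integral| reaches the threshold, an instantaneous saccade of size
    gain * error is executed and the integrator is reset to zero.
    Returns lists of saccade onset times (s) and amplitudes (deg).
    """
    params = params or SaccadeParams()
    heading, integral = 0.0, 0.0
    times, amplitudes = [], []
    for step in range(int(round(duration / params.dt))):
        t = step * params.dt
        error = wrap_deg(bar_start + bar_speed * t - heading)
        integral += error * params.dt
        if abs(integral) >= params.threshold:
            amplitude = params.gain * error
            heading += amplitude
            times.append(t)
            amplitudes.append(amplitude)
            integral = 0.0
    return times, amplitudes


# Usage: a faster bar drives the integrator to threshold sooner.
slow_times, _ = simulate_bar_fixation(bar_speed=75.0, duration=2.0)
fast_times, _ = simulate_bar_fixation(bar_speed=300.0, duration=2.0)
print(slow_times[0], fast_times[0])
```

      With a constant-speed bar starting at the midline, the error grows linearly, so in this sketch the first-saccade latency falls roughly as the inverse square root of speed; a leaky integrator or the authors' fitted interval and amplitude relations would change the numbers.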

      Please engage more with the literature, especially work that directly conflicts with your conclusions (see above). Also, highly relevant work by Bender & Dickinson was not sufficiently discussed. Spot results presented in Fig. 3 should be contextualized in light of the work of Mongeau et al., 2019, who performed similar experiments and identified a switch in saccade valence.

      We appreciate your pointing out the relevant previous work. We have added references to the following papers and tried to describe the relationship between our data and previous ones.

      Bender & Dickinson 2006

      (Line#162) “This simulation experiment is reminiscent of the magnetically tethered flight assay, where a flying fly remains fixed at a position but is free to rotate around its yaw axis (Bender and Dickinson, 2006b; Cellini et al., 2022; G. Kim et al., 2023; Mongeau and Frye, 2017).”

      (Line#218) “We tested the predictions of our models with flies flying in an environment similar to that used in the simulation (Figure 3A). A fly was tethered to a short steel pin positioned vertically at the center of a vertically oriented magnetic field, allowing it to rotate around its yaw axis with minimal friction (Bender and Dickinson, 2006b; Cellini et al., 2022; G. Kim et al., 2023).”

      (Line#238) “To determine if our assay imposes additional friction compared to other assays used in previous studies, we analyzed the dynamics of spontaneous saccades during the “freeze” phase (Figure 3–figure supplement 1A). We found their duration and amplitude to be within the range reported previously (Bender and Dickinson, 2006b; Mongeau and Frye, 2017) (Figure 3–figure supplement 1B-D).”

      Mongeau et al., 2019

      (Line#196) “During this behavior, it has been proposed that visual circuits compute an integrated error of the bar position with respect to the frontal midline and trigger a saccadic turn toward the bar when the integrated value reaches a threshold (Frighetto and Frye, 2023; Mongeau et al., 2019; Mongeau and Frye, 2017). We expanded our bar fixation model to incorporate this behavioral strategy (Figure 2–figure supplement 2).”

      This paper shows that the dynamics of saccadic flight turns elicited by a rotating bar or spot determine whether flies display attraction or aversion. In that study, the visual stimulus—a bar or spot—rotated slowly at a constant 75 deg s⁻¹. By contrast, in our Figure 3 the object moves much faster, driving the neural “integrator” to saturation and triggering an almost immediate flight turn. In Mongeau et al. (2019), saccades occur at variable times and their amplitudes and directions are more stochastic, again reflecting the slower stimulus speed. Because these differences all arise from the disparity in object speed, we did not cite Mongeau et al. (2019) in Figure 3 or the associated text.
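      The speed argument can be made concrete with a simple worked example. Assume, for illustration only, a perfect integrator with threshold Θ, an object that starts at angular offset θ0 from the midline and moves away at constant speed v, and no heading change before the first saccade. The integrated error I(t) then reaches threshold at time t*:

```latex
I(t) = \int_0^{t} (\theta_0 + v s)\, ds = \theta_0 t + \tfrac{1}{2} v t^2 = \Theta
\quad\Longrightarrow\quad
t^{*} = \frac{-\theta_0 + \sqrt{\theta_0^{2} + 2 v \Theta}}{v}
      = \frac{2\Theta}{\theta_0 + \sqrt{\theta_0^{2} + 2 v \Theta}} .
```

      The second form shows that t* decreases monotonically with v. For θ0 = 0 it reduces to t* = sqrt(2Θ/v), so under these assumptions quadrupling the object speed halves the latency to the first saccade, which is the direction of the effect invoked above for the faster object in Figure 3.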

      In addition to the two papers cited above, we have incorporated several relevant studies on Drosophila visuomotor control identified through the reviewers’ insightful comments. Examples include:

      Frighetto G, Frye MA. 2023 (Line#195, 464)

      Rimniceanu et al., 2023 (Line#241)

      Cellini & Mongeau 2020 (Line#91)

      Cellini & Mongeau 2022 (Line#241)

      Cellini et al., 2022 (Line#91, 162, 218)

      Many citations are not in the proper format (e.g. using numbers rather than authors' last name).

      Thank you for letting us know. We have changed the remaining citations to the proper format.

    1. eLife Assessment

      This valuable study reports evidence that items maintained in working memory can bias attention in an oscillatory manner, with the attentional capture effect fluctuating at theta frequency. The study provides incomplete evidence that this dynamic attentional bias is associated with oscillatory neural mechanisms, particularly in the alpha and theta bands, as measured by EEG. The study will be relevant for researchers studying attention, working memory, and neural oscillations, particularly those interested in how memory and perception interact over time.

    2. Reviewer #1 (Public review):

      Summary:

      In the presented paper, Lu and colleagues focus on how items held in working memory bias someone's attention. In a series of three experiments, they utilized a similar paradigm in which subjects were asked to maintain two colored squares in memory for a short and variable time. After this delay, they either tested one of the memory items or asked subjects to perform a search task.

      In the search task, items could share colors with the memory items, and the authors were interested in how these would capture attention, using reaction time as a proxy. The behavioral data suggest that attention oscillates between the two items. At different maintenance intervals, the authors observed that items in memory captured different amounts of attention (attentional capture effect).

      This attentional bias fluctuates over time at approximately the theta frequency range of the EEG spectrum. This part of the study is a replication of Peters and colleagues (2020).
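      To make the behavioural analysis concrete, here is one common way to test whether an effect fluctuates rhythmically across maintenance intervals: take the FFT of the attentional-capture time course and compare each frequency's amplitude with a null distribution obtained by shuffling the order of the intervals. This is a generic Python sketch under stated assumptions (evenly spaced intervals 20 ms apart, simulated data with a built-in 5 Hz rhythm); it is not necessarily the procedure used by the authors or by Peters and colleagues (2020).

```python
import numpy as np


def amplitude_spectrum(series, fs):
    """One-sided FFT amplitude of a mean-removed time course sampled at fs (Hz)."""
    centered = np.asarray(series, dtype=float)
    centered = centered - centered.mean()
    freqs = np.fft.rfftfreq(centered.size, d=1.0 / fs)
    return freqs, np.abs(np.fft.rfft(centered))


def permutation_spectrum_test(series, fs, n_perm=1000, seed=0):
    """Per-frequency p-values against a null built by shuffling interval order.

    Shuffling keeps the distribution of values but destroys temporal structure.
    No correction across frequencies is applied here.
    """
    rng = np.random.default_rng(seed)
    values = np.asarray(series, dtype=float)
    freqs, observed = amplitude_spectrum(values, fs)
    exceed = np.zeros_like(observed)
    for _ in range(n_perm):
        _, null_amps = amplitude_spectrum(rng.permutation(values), fs)
        exceed += null_amps >= observed
    p_values = (exceed + 1) / (n_perm + 1)
    return freqs, observed, p_values


# Usage with simulated data: 50 intervals, 20 ms apart, with a 5 Hz rhythm.
step = 0.02  # s between successive maintenance intervals
intervals = np.arange(50) * step
rng = np.random.default_rng(1)
capture = 20 * np.sin(2 * np.pi * 5 * intervals) + rng.normal(0, 2, intervals.size)
freqs, observed, p_values = permutation_spectrum_test(capture, fs=1 / step)
peak = 1 + np.argmax(observed[1:])  # skip the DC bin
print(freqs[peak], p_values[peak])
```

      In practice one would usually also correct across frequencies, for example by comparing against the maximum null amplitude within the frequency band of interest.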

      Next, the authors used EEG recordings to better understand the neural mechanisms underlying this process. They present results suggesting that this attentional capture effect is positively correlated with the mean amplitude of alpha power. Furthermore, they show that the weighted phase lag index (wPLI) between the alpha and theta bands across different electrodes also fluctuates at the theta frequency.

      Strengths:

      The authors focus on an interesting and timely topic: how items in working memory can bias our attention. This line of research could improve our understanding of the neural mechanisms underlying working memory, specifically how we maintain multiple items and how these interact with attentional processes. This approach is intriguing because it can shed light on neuronal mechanisms not only through behavioral measures but also by incorporating brain recordings, which is definitely a strength.

      Subjects performed several blocks of experiments, ranging from 4 to 30, over a few days, depending on the experiment. This makes the results - especially those from behavioral experiments 2 and 3, which included the most repetitions - particularly robust.

      Weaknesses:

      One of the main EEG results is based on the weighted phase lag index (wPLI) between oscillations in the alpha and theta bands. In my opinion, this is problematic, as wPLI measures the locking of oscillations at the same frequency. It quantifies how reliably the phase difference stays the same over time. If these oscillations have different frequencies, the phase difference cannot remain consistent. Even worse, modeling data show that even very small fluctuations in frequency between signals make wPLI artificially small (Cohen, 2015).
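      The concern can be checked with a small simulation. The Python sketch below computes a simplified weighted phase lag index, |mean(Im S)| / mean(|Im S|) with S the cross-spectrum of two analytic signals, averaged here over time samples of a single signal pair rather than over trials as in standard use. Two 10 Hz signals with a fixed lag give a value near 1, whereas a 10 Hz (alpha) and a 6 Hz (theta) signal give a value near 0 because their phase difference keeps rotating. Sampling rate, duration and noise level are illustrative assumptions.

```python
import numpy as np


def analytic_signal(x):
    """Analytic signal via the FFT (the construction used by scipy.signal.hilbert)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * weights)


def wpli(x, y):
    """Weighted phase lag index: |mean(Im S)| / mean(|Im S|), S = z_x * conj(z_y)."""
    imag_cross = np.imag(analytic_signal(x) * np.conj(analytic_signal(y)))
    return float(np.abs(imag_cross.mean()) / np.abs(imag_cross).mean())


def add_noise(signal, rng, scale=0.1):
    """Add white Gaussian noise at an assumed level."""
    return signal + scale * rng.normal(size=signal.shape)


fs = 500.0               # Hz, assumed sampling rate
t = np.arange(1000) / fs  # 2 s of data
rng = np.random.default_rng(0)
alpha = add_noise(np.cos(2 * np.pi * 10 * t), rng)
alpha_lagged = add_noise(np.cos(2 * np.pi * 10 * t - np.pi / 4), rng)
theta = add_noise(np.cos(2 * np.pi * 6 * t), rng)

same_freq = wpli(alpha, alpha_lagged)  # consistent phase lag
cross_freq = wpli(alpha, theta)        # phase difference rotates at 4 Hz
print(round(same_freq, 3), round(cross_freq, 3))
```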

      Another result from the electrophysiology data shows that the attentional capture effect is positively correlated with the mean amplitude of alpha power. In the presented scatter plot, it seems that this result is driven by one outlier. Unfortunately, Pearson correlation is very sensitive to outliers, and the entire analysis can be driven by an extreme case. I extracted data from the plot and obtained a Pearson correlation of 0.4, similar to what the authors report. However, the Spearman correlation, which is robust against outliers, was only 0.13 (p = 0.57), indicating a non-significant relationship.
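      The outlier argument can be illustrated with toy numbers, which are invented for illustration and are not the data extracted from the authors' figure. Nine weakly related points plus one extreme point yield a high Pearson correlation, while Spearman's rank correlation, computed here as Pearson's r on average ranks, is much less affected. The sketch is plain Python; `scipy.stats.pearsonr` and `scipy.stats.spearmanr` compute the same coefficients together with p-values.

```python
from statistics import fmean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson's r computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data: nine weakly related points plus one extreme point.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
y = [5, 1, 9, 3, 7, 2, 8, 4, 6, 30]
print(f"without outlier: r = {pearson(x[:-1], y[:-1]):.3f}")  # r = 0.167
print(f"with outlier:    r = {pearson(x, y):.3f}, rho = {spearman(x, y):.3f}")  # r = 0.920, rho = 0.394
```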

      The behavioral data are interesting, but in my opinion, they closely replicate Peters and colleagues (2020) using a different paradigm. In that study, participants memorized four spatial positions that formed the endpoints of two objects, and one object was cued. Similarly, reaction times fluctuated at theta frequency, and there was an anti-phase relationship between the two objects. The main novelty of the present study is that this bias can be transferred to an unrelated task. While the current study extends Peters and colleagues' findings to a different task context, the lack of a thorough, direct comparison with Peters et al. limits the clarity of the novel insights provided.

      Cohen, M. X. (2015). Effects of time lag and frequency matching on phase-based connectivity. Journal of Neuroscience Methods, 250, 137-146.

      Peters, B., Kaiser, J., Rahm, B., & Bledowski, C. (2020). Object-based attention prioritizes working memory contents at a theta rhythm. Journal of Experimental Psychology: General, 150(6), 1250-1256.

    3. Reviewer #2 (Public review):

      The information provided in the current version of the manuscript is not sufficient to assess the scientific significance of the study.

      (1) In many cases, the details of the experiments or behavioral tasks described in the main text are not consistent with those provided in the Materials and Methods section. Below, I list only a few of these discrepancies as examples:

      a) For Experiment 1, the Methods section states that the detection stimulus was presented for 2000 ms (lines 494 and 498), but Figure 1 in the main text indicates a duration of 1500 ms.

      b) For Experiment 2, not only is the range of SOAs mentioned in the Methods section inconsistent with that shown in the main text and the corresponding figure, but the task design also differs between sections.

      c) For Experiment 3, the main text indicates that EEG recordings were conducted, but in the Methods section, the EEG recording appears to have been part of Experiment 2 (lines 538-540).

      (2) The results described in the text often do not match what is shown in the corresponding figure. For example:

      a) In lines 171-178, the SOAs at which a significant difference was found between the two conditions do not appear to match those shown in Figure 2A.

      b) In Figure 4, the figure legend (lines 225-228) does not correspond to the content shown in the figure.

      c) In Figure 9, insufficient information is provided within the figure or in the text, making it difficult to understand. Consequently, the results described in the text cannot be clearly linked to the figure.

      (3) Insufficient information is provided regarding the data analysis procedures, particularly the permutation tests used for the data presented in Figures 2B, 4, and 10. The results shown in these figures are critical for the main conclusions drawn in the manuscript.

      Given these issues, it is not possible to provide a detailed review of the study, particularly regarding its scientific significance.

    1. eLife Assessment

      This study presents valuable computational findings on the neural basis of learning new motor memories and of savings, using recurrent neural networks. The evidence supporting the claims of the authors is solid, but it would benefit from more controls and from considering the role of explicit strategies and other brain regions. This work will be of interest to computational and experimental neuroscientists working in motor learning.

    2. Reviewer #1 (Public review):

      Summary:

      Shahbazi et al used a recurrent neural network model trained to control a musculoskeletal model of the arm to investigate how neural populations accommodate activity patterns underpinning savings. The paper draws upon the recent finding of a "uniform shift" in preparatory activity in monkey motor cortex associated with savings, and leverages full access to a computational model to establish causality.

      Strengths:

      The paper is well written, and the figures are clearly presented. The key finding that the uniform shift first reported based on neural recordings by Sun et al. emerges in artificial neural networks performing a similar task is interesting and well-backed by their analyses. Manipulating this uniform shift to show that it drives behavioural savings is an important causal confirmation of the proposal by Sun et al.

      Weaknesses / Comments:

      As mentioned earlier, the core results are well backed by the analyses. Most of my comments relate to adding more controls and additional questions that could be explored with the model to strengthen the paper.

      (1) Savings are quantified as more rapid relearning of the FF upon re-exposure (e.g., Figure 3). This finding is based on backpropagation through time, but would this hold when using a different optimiser, e.g., FORCE?

      (2) The authors should include a "null model" showing that training on a different reaching task following NF, as opposed to FF2, won't show something akin to a uniform shift during preparation due to the adoption of TDR and having similar targets.

      (3) The analyses of network activity during movement preparation (Figure 4) nicely replicate the key finding in Sun et al., but I think the authors could leverage the full access to their network and go further, e.g., by examining changes (or the lack thereof) during execution in FF2 with respect to FF (and perhaps in a future NF2 with respect to NF), including whether execution activity also lives in parallel hyperplanes, etc.

      (4) Related to the above, while the results are interesting and the paper is well done, I kept wishing that the authors had done "more" with their model. This could take the form of one or two final sections on "predictions" that would nicely complement their "validation" of the uniform shift and that, in my opinion, would greatly increase the impact of the paper. In particular:

      a) What would be the effect of learning more "tasks"? For example, is there a limit on how many fields can be learned? (You show something related by manipulating network size, but this is slightly different.)

      b) Figure 5 is a nice causal demonstration that the uniform shift is related to savings. However, and related to comment #3, it would be interesting to see in more detail how the behaviour and the network activity change as preparatory activity shifts along this axis, in particular how moving the preparatory states affects the organisation and dynamics of upcoming execution activity. These are the kinds of intuitions that modelling studies like this one can provide.

      c) The authors focus on a task design that spans baseline, FF, NF, FF2 to replicate the original study by Sun et al. However, it would be interesting if they generated predictions for neural changes in other types of tasks that have been studied behaviourally. These could include, for example: (i) modelling a visuomotor rotation or a mirror reversal task; (ii) having to adapt to a FF in the opposite direction; (iii) investigating the role of adding an explicit context and having the networks learn multiple FFs; and (iv) trying to learn FFs in opposite directions, perhaps restricted to specific targets. As the authors know, all these questions and more have been studied with similar behavioural paradigms, and it would be nice to see what neural predictions this model generates.

      (5) On the Discussion: When extrapolating from neural network results to animals, the fact that your networks can learn implicitly doesn't mean that animals do learn implicitly. Indeed, I think the consensus view is that different perturbations may lead to the expression of different types of savings (e.g., FF vs VR, which seems to be more explicit). Besides, these different mechanisms may be primarily implemented by brain regions less directly tied to motor control (e.g., cerebellum, parietal cortex?), which are not directly implemented in the authors' model.

      These aspects (limitations) should be discussed in the paper.

    3. Reviewer #2 (Public review):

      Summary:

      Shahbazi et al. trained recurrent neural networks (RNNs) to simulate human upper limb movement during adaptation to a force field perturbation. They demonstrated that throughout adaptation, the pattern of motor commands to the muscles of the simulated arm changed, allowing the perturbed movements to regain their typical, perturbation-free straight-line paths. After this initial learning block (FF1), the network encountered null-fields to wash out the adaptation, before re-experiencing the force in a second learning block (FF2). Upon re-exposure, the network learned faster than during initial learning, consistent with the savings observed in behavioral studies of adaptation. They also found that as the number of hidden units in the RNN increased, so did the probability of exhibiting savings. The authors concluded that these results propose a neural basis for savings that is independent of context and strategic processes.
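      Savings can be quantified in several ways. The hypothetical Python sketch below shows two simple ones applied to simulated exponential learning curves rather than to the paper's data: the relative reduction in early-trial error from FF1 to FF2, and the number of trials needed to bring lateral error below a criterion. The curve shapes, the 20-trial window and the criterion are assumptions, not the metrics used by Shahbazi et al.

```python
import numpy as np


def trials_to_criterion(errors, criterion):
    """Index of the first trial whose absolute error is below criterion (None if never)."""
    below = np.flatnonzero(np.abs(np.asarray(errors)) < criterion)
    return int(below[0]) if below.size else None


def savings_index(errors_ff1, errors_ff2, n_early=20):
    """Relative reduction in mean absolute error over the first n_early trials.

    Positive values mean that errors early in FF2 are smaller than early in
    FF1, i.e., faster relearning.
    """
    early1 = np.mean(np.abs(errors_ff1[:n_early]))
    early2 = np.mean(np.abs(errors_ff2[:n_early]))
    return float((early1 - early2) / early1)


# Simulated lateral errors (cm): same initial error, faster decay on re-exposure.
trials = np.arange(100)
ff1 = 2.0 * np.exp(-trials / 30.0)
ff2 = 2.0 * np.exp(-trials / 15.0)
print(savings_index(ff1, ff2))
print(trials_to_criterion(ff1, 0.5), trials_to_criterion(ff2, 0.5))  # 42 21
```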

      Strengths:

      The paper addresses an important and controversial topic in motor adaptation: the mechanism underlying motor memory. The RNN simulation reproduces behavioral hallmarks of adaptation, and it provides a useful illustration of the pattern of muscle activity underlying human-like movements under both normal and perturbing conditions. While the savings effect produced by the network, though significant, appears somewhat small, the simulation demonstrating an increase in savings with a greater number of hidden units is particularly intriguing.

      Weaknesses:

      (1) To be transparent, savings in motor adaptation have been a primary focus of my own research. Some core findings presented in this paper are at odds with the ideas I and others have previously put forward. While I don't want to impose my agenda on the authors of this paper, I do think the authors should address these issues.

      a) The authors acknowledge the ongoing debate in the literature regarding the mechanisms underlying savings, particularly whether it stems from explicit or implicit learning processes. However, it remains unclear how the current work addresses this debate. There is already a considerable body of research, particularly in visuomotor adaptation, demonstrating that savings is predominantly driven by explicit strategies. For example, when people are asked to report their strategy, they recall a strategy that was useful during the first learning block (Morehead et al. 2015). Furthermore, savings are abolished under experimental manipulations designed to eliminate strategic contributions (e.g., Haith et al., 2015; Huberdeau et al., 2019; Avraham et al., 2021). The authors briefly state that their findings support the hypothesis that a neural basis of memory retention underlying savings can be independent of cognitive or strategic learning components, and that savings can be characterized as implicit. While these statements may be true, it is not clear how this work substantiates these claims.

      b) Our research has also demonstrated that if implicit adaptation is completely washed out after the initial learning block, it not only fails to exhibit savings but is actually attenuated relative to the first learning block (Avraham et al., 2021). This phenomenon of attenuation upon relearning can also be seen in other studies of visuomotor adaptation (e.g., Leow et al., 2020; Yin and Wei, 2020; Hamel et al., 2021; Hamel et al., 2022; Wang and Ivry, 2023; Hadjiosif et al., 2023). More recently, we have shown that this attenuation is due to anterograde interference arising from experience with the washout block (Avraham and Ivry, 2025). We illustrated that the implicit system is highly susceptible to interference; it does not require exposure to salient opposite errors and can occur even following prolonged exposure to veridical feedback. The central thesis of this paper, namely that implicit savings can emerge through RNNs, is at odds with these empirical results. The authors should address this discrepancy.

      (2) This brings me to the question about neural correlates: The results are linked to activity in the primary motor cortex. How does that align with the well-established role of the cerebellum in implicit motor adaptation? And with the studies showing that savings are due to explicit strategies, which are generally associated with prefrontal regions?

      (3) The analysis of the complexity of the neural network (i.e., the number of hidden units) and its relationship to savings is very interesting. It makes sense to me that more complex networks would show more savings. I'm not sure I follow the authors' explanation, but my understanding is that increased network complexity makes it more difficult to override the formed memory through interference (e.g., from the experience with NF2). Also, the results indicate that a network with 32 units led to a less-than-chance level of networks exhibiting savings (Figure 3b). What behavioral output does this configuration produce? Could this behavior manifest as attenuation upon relearning? Furthermore, if one were to examine an even smaller, simpler network (perhaps one more closely reflecting cerebellar circuits), would such a model predict attenuation rather than savings?

      (4) The authors emphasize that their network did not receive any explicit contextual signals related to the presence or absence of the force field (FF), thus operating in a 'context-free' manner. From my understanding, some existing models of context's role in motor memories (e.g., Oh and Schweighofer, 2019; Heald et al., 2021) propose that memory-related changes can be observed even without explicit contextual information, as contextual changes can be inferred from sudden or significant environmental shifts (e.g., the introduction or removal of perturbations). Given this, could the observed savings in the current simulation be explained by some form of contextual retrieval, inferred by the network from the re-presentation of the perturbation in FF2?

      (5) If there is residual hidden unit activity related to the FF at the end of the NF2 phase, how does the simulated movement revert back to baseline? Are there any differences in the movement trajectory, beyond just lateral deviation, between NF1 and NF2? The authors state that "changes in the preparatory hidden unit activity did not result in substantive changes in the motor commands (Figure 5b), which emphasizes that the uniform shift resides in the null space of motor output." However, Figure 5b appears to show visible changes in hidden unit activity. Don't these changes reflect a pattern of muscle activity that is the basis for behavior? These changes are indeed small, but it seems that so is the effect size for savings (Figure 3a). Could this suggest that there is not, in fact, a complete washout of initial learning during NF2 within the network?
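      Point (5) turns on whether a visible change in hidden-unit activity must change the motor output. Under the simplifying assumption of a linear readout from hidden units to motor commands (the actual model drives a nonlinear musculoskeletal arm), any activity change can be split into an output-null part, which the readout cannot see, and an output-potent part. The Python sketch below does this with an SVD; the readout matrix and the shift vector are random placeholders, not values from the paper.

```python
import numpy as np


def null_space_basis(readout, tol=1e-10):
    """Orthonormal columns spanning {h : readout @ h = 0}."""
    _, s, vt = np.linalg.svd(readout)
    rank = int(np.sum(s > tol * s.max()))
    return vt[rank:].T


def split_shift(readout, shift):
    """Split an activity shift into output-null and output-potent components."""
    basis = null_space_basis(readout)
    null_part = basis @ (basis.T @ shift)
    return null_part, shift - null_part


rng = np.random.default_rng(0)
readout = rng.normal(size=(6, 64))  # hypothetical: 6 muscles, 64 hidden units
shift = rng.normal(size=64)         # hypothetical preparatory-state shift
null_part, potent_part = split_shift(readout, shift)

# The null part leaves the output unchanged; only the potent part moves it.
potent_fraction = np.linalg.norm(potent_part) ** 2 / np.linalg.norm(shift) ** 2
print(round(float(potent_fraction), 3))
```

      Reporting such a potent fraction for the activity changes in Figure 5b would be one way to show how much of a visible change is actually output-null.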

    1. eLife Assessment

      This useful study replicates a previous finding that information about peripherally presented visual stimuli is represented in the foveal visual cortex, and extends it by demonstrating that these representations are similar to those evoked by foveally presented stimuli. The authors' gaze-contingent fMRI design provides solid evidence for these findings. Some of the stronger theoretical claims, such as that the effects are due to predictive pre-saccadic remapping, are not fully supported by the current results.

    2. Reviewer #1 (Public review):

      Summary:

      The main contributions of this paper are: (1) a replication of the surprising prior finding that information about peripherally-presented stimuli can be decoded from foveal V1 (Williams et al 2008), (2) a new demonstration of cross-decoding between stimuli presented in the periphery and stimuli presented at the fovea, (3) a demonstration that the information present in the fovea is based on shape not semantic category, and (4) a demonstration that the strength of foveal information about peripheral targets is correlated with the univariate response in the same block in IPS.
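      Cross-decoding, finding (2), means training a classifier on foveal-cortex patterns evoked by foveally presented stimuli and testing it on patterns evoked when the same stimuli appeared only in the periphery (or vice versa). The Python sketch below illustrates the logic with a correlation-based nearest-centroid classifier on simulated voxel patterns; the classifier, the four-stimulus set-up, and the weaker, noisier "peripheral" patterns are assumptions for illustration, not the authors' decoding pipeline.

```python
import numpy as np


def zscore_rows(a):
    """Z-score each row (pattern) across voxels."""
    return (a - a.mean(axis=1, keepdims=True)) / a.std(axis=1, keepdims=True)


def cross_decode(train_x, train_y, test_x, test_y):
    """Train class centroids on one condition and test on another.

    Each test pattern is assigned to the centroid with the highest Pearson
    correlation; returns the proportion of correct labels.
    """
    classes = np.unique(train_y)
    centroids = np.stack([train_x[train_y == c].mean(axis=0) for c in classes])
    r = zscore_rows(test_x) @ zscore_rows(centroids).T / test_x.shape[1]
    predicted = classes[np.argmax(r, axis=1)]
    return float(np.mean(predicted == test_y))


rng = np.random.default_rng(0)
n_voxels, n_reps = 100, 10
prototypes = rng.normal(size=(4, n_voxels))  # one hypothetical pattern per stimulus
labels = np.repeat(np.arange(4), n_reps)
foveal = prototypes[labels] + rng.normal(size=(labels.size, n_voxels))
peripheral = 0.5 * prototypes[labels] + rng.normal(size=(labels.size, n_voxels))
accuracy = cross_decode(foveal, labels, peripheral, labels)
print(accuracy)  # chance is 0.25
```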

      Strengths:

      The design and methods appear sound, and finding (2) above is new, and importantly constrains our understanding of this surprising phenomenon. The basic effect investigated here is so surprising that even though it has been replicated several times since it was first reported in 2008, it is useful to replicate it again.

      Weaknesses:

      (1) The paper, including in the title ("Feedback of peripheral saccade targets to early foveal cortex") seems to assume that the feedback to foveal cortex occurs in conjunction with saccade preparation. However, participants in the original Williams et al (2008) paper never made saccades to the peripheral stimuli. So, saccade preparation is not necessary for this effect to occur. Some acknowledgement and discussion of this prior evidence against the interpretation of the effect as due to saccade preparation would be useful. (e.g., one might argue that saccade preparation is automatic when attending to peripheral stimuli.)

      (2) The most important new finding from this paper is the cross-decodability between stimuli presented in the fovea and stimuli presented in the periphery. This finding should be related to the prior behavioral finding (Yu & Shim, 2016) that when a foveal foil stimulus identical to a peripheral target is presented 150 ms after the onset of the peripheral target, visual discrimination of the peripheral target is improved; this congruency effect occurred even though participants did not consciously perceive the foveal stimulus (Yu, Q., & Shim, W. M. (2016). Modulating foveal representation can influence visual discrimination in the periphery. Journal of Vision, 16(3), 15).

      (3) The prior literature should be laid out more clearly. For example, most readers will not realize that the basic effect of decodability of peripherally-presented stimuli in the fovea was first reported in 2008, and that that original paper already showed that the effect cannot arise from spillover effects from peripheral retinotopic cortex because it was not present in a retinotopic location between the cortical locus corresponding to the peripheral target and the fovea. (For example, this claim on lines 56-57 is not correct: "it remains unknown 1) whether information is fed back all the way to early visual areas".) What is needed is a clear presentation of the prior findings in one place in the introduction to the paper, followed by an articulation and motivation of the new questions addressed in this paper. If I were writing the paper, I would focus on the cross-decodability between foveal and peripheral stimuli, as I think that is the most revealing finding.

    3. Reviewer #2 (Public review):

      Summary:

      This study investigated whether the identity of a peripheral saccade target object is predictively fed back to the foveal retinotopic cortex during saccade preparation, a critical prediction of the foveal prediction hypothesis proposed by Kroell & Rolfs (2022). To achieve this, the authors leveraged a gaze-contingent fMRI paradigm, where the peripheral saccade target was removed before the eyes landed near it, and used multivariate decoding analysis to quantify identity information in the foveal cortex. The results showed that the identity of the saccade target object can be decoded based on foveal cortex activity, despite the fovea never directly viewing the object, and that the foveal feedback representation was similar to passive viewing and not explained by spillover effects. Additionally, exploratory analysis suggested IPS as a candidate region mediating such foveal decodability. Overall, these findings provide neural evidence for the foveal cortex processing the features of the saccade target object, potentially supporting the maintenance of perceptual stability across saccadic eye movements.

      Strengths:

      This study is well-motivated by previous theoretical findings (Kroell & Rolfs, 2022), aiming to provide neural evidence for a potential neural mechanism of trans-saccadic perceptual stability. The question is important, and the gaze-contingent fMRI paradigm is a solid methodological choice for the research goal. The use of stimuli allowing orthogonal decoding of stimulus category vs stimulus shape is a nice strength, and the resulting distinctions in decoded information by brain region are clean. The results will be of interest to readers in the field, and they fill in some untested questions regarding pre-saccadic remapping and foveal feedback.

      Weaknesses:

      The conclusions feel a bit over-reaching; some strong theoretical claims are not fully supported, and the framing of prior literature is currently too narrow. A critical weakness lies in the inability to test a distinction between these findings (claiming to demonstrate that "feedback during saccade preparation must underlie this effect") and foveal feedback previously found during passive fixation (Williams et al., 2008). Discussions (and perhaps control analysis/experiments) about how these findings are specific to the saccade target and the temporal constraints on these effects are lacking. The relationship between the concepts of foveal prediction, foveal feedback, and predictive remapping needs more thorough treatment. The choice to use only 4 stimuli is justified in the manuscript, but remains an important limitation. The IPS results are intriguing but could be strengthened by additional control analysis. Finally, the manuscript claims the study was pre-registered ("detailing the hypotheses, methodology, and planned analyses prior to data collection"), but on the OSF link provided, there is just a brief summary paragraph, and the website says "there have been no completed registrations of this project".

      Specifics:

      (1) In the eccentricity-dependent decoding results (Figure 2B), are there any statistical tests to support the results being a U-shaped curve? The dip isn't especially pronounced. Is 4 degrees lower than the further ones? Are there alternative methods of quantifying this (e.g., fitting it to a linear and quadratic function)?

      (2) In the parametric modulation analysis, the evidence for IPS being the only region showing stronger fovea vs peripheral beta values was weak, especially given the exploratory nature of this analysis. The raw beta value can reflect other things, such as global brain fluctuations or signal-to-noise ratio. I would also want to see the results of the same analysis performed on the control condition decoding results.

      (3) Many of the claims feel overstated. There is an emphasis throughout the manuscript (including claims in the abstract) that these findings demonstrate foveal prediction, specifically that "image-specific feedback during saccade preparation must underlie this effect." To my understanding, one of the key aspects of the foveal prediction phenomenon that ties it closely to trans-saccadic stability is its specificity to the saccade target but not to other objects in the environment. However, it is not clear to what degree the observed findings are specific to saccade preparation and the peripheral saccade target. If observers were asked to make a saccade to another fixation location, or simply to maintain passive fixation, would the foveal retinotopic cortex similarly contain the object's identity information? Without these control conditions, the results are consistent with foveal prediction but do not definitively demonstrate it as the cause, so the claims need to be toned down.

      (4) Another critical aspect is the temporal locus of the feedback signal. In the paradigm, the authors ensured that the saccade target object was never foveated via the gaze-contingent procedure and a conservative data exclusion criterion, thus enabling the test of feedback signals to foveal retinotopic cortex. However, due to the temporal sluggishness of fMRI BOLD signals, it is unclear when the feedback signal arrives at the foveal retinotopic cortex. In other words, it is possible that the feedback signal arrives after the eyes land at the saccade target location. This possibility is also bolstered by Chambers et al. (2013)'s TMS study, where they found that TMS to the foveal cortex at 350-400 ms SOA interrupts the peripheral discrimination task. The authors should qualify their claims of the results occurring "during saccade preparation" (e.g., pg 1 ln 22) throughout the manuscript, and discuss the importance of temporal dynamics of the effect in supporting stability across saccades.

      (5) Relatedly, the claims that results in this paradigm reflect "activity exclusively related to predictive feedback" and "must originate from predictive rather than direct visual processes" (e.g., lines 60-65 and throughout) need to be toned down. The experimental design nicely rules out direct visual foveal stimulation, but predictive feedback is not the only alternative. The activation could also reflect mental imagery, visual working memory, attention, etc. Importantly, the experiment uses a block design in which the exact same image is presented multiple times over the block, and the activation is taken for the block as a whole. Thus, although the image was never presented at the fovea, more could be going on than temporally specific and saccade-specific predictive feedback.

      (6) The authors should avoid using the terms foveal feedback and foveal prediction interchangeably. To me, foveal feedback refers to the findings of Williams et al. (2008), where participants maintained passive fixation and discriminated objects in the periphery (see also Fan et al., 2016), whereas foveal prediction refers to the neural mechanism hypothesized by Kroell & Rolfs (2022), which occurs before a saccade to the target object and contains task-irrelevant feature information.

      (7) More broadly, the treatment of how foveal prediction relates to saccadic remapping is overly simplistic. The authors seem to take the perspective that remapping is an attentional phenomenon involving only the remapping of attentional/spatial pointers, but this is not the classic or widely accepted definition of remapping. Within the field of saccadic remapping, it is an ongoing debate whether (and how, where, and when) information about stimulus content is remapped alongside spatial location, and also whether the attentional pointer concept is even neurophysiologically viable. The relationship between saccadic remapping and foveal prediction needs clarification and deeper treatment, in both the introduction and the discussion.

      (8) As part of this enhanced discussion, the findings should be better integrated with prior studies. E.g., there is some evidence for predictive remapping inducing integration of non-spatial features (some by the authors themselves; Harrison et al., 2013; Szinte et al., 2015). How do these findings relate to the observed results? Could the results simply be a special case of non-spatial feature integration between the currently attended and remapped location (fovea)? How do the results differ from neurophysiological evidence for facilitation of the saccade target object's features across the visual field (Burrows et al., 2014)? How might the results be reconciled with a prior fMRI study that failed to find decoding of stimulus content in remapped responses (Lescroart et al., 2016)? Might this reflect a difference between peripheral-to-peripheral and peripheral-to-foveal remapping? A recent study by Chiu & Golomb (2025) provided supporting evidence for peripheral-to-foveal remapping (but not peripheral-to-peripheral remapping) of object-location binding (though in the post-saccadic time window), and suggested foveal prediction as the underlying mechanism.

    4. Reviewer #3 (Public review):

      Summary:

      In this paper, the authors used fMRI to determine whether peripherally viewed objects could be decoded from the foveal cortex, even when the objects themselves were never viewed foveally. Specifically, they investigated whether pre-saccadic target attributes (shape, semantic category) could be decoded from the foveal cortex. They found that object shape, but not semantic category, could be decoded, providing evidence that foveal feedback relies on low- to mid-level information. The authors claim that this provides evidence for a mechanism underlying visual stability and object recognition across saccades.

      Strengths:

      I think this is another nice demonstration that peripheral information can be decoded from (or is processed in) the foveal cortex. The methods seem appropriate, the experiments and analyses are carefully conducted, and the main results seem convincing. The paper itself was very clear and well-written.

      Weaknesses:

      There are a couple of reasons why I think the main theoretical conclusions drawn from the study might not be supported, and why a more thorough investigation might be needed to draw these conclusions.

      (1) The authors used a blocked design, with each object shown repeatedly within the same block. This meant that the stimulus was entirely predictable within each block, which weakens the authors' claims that this is a predictive mechanism that facilitates object recognition: if the stimulus is 100% predictable, no aspect of recognition or discrimination is actually being tested. I think that to strengthen these claims, an experiment would need to use unpredictable stimuli, and potentially combine behavioural reports with decoding, to see whether this mechanism can be linked to facilitating object recognition across saccades.

      (2) Given that foveal feedback has been found in previous studies that don't incorporate saccades, how is this a mechanism that might specifically contribute to stability across saccades, rather than just being a general mechanism that aids the processing/discrimination of peripherally-viewed stimuli? I don't think this paper addresses this point, which would seem to be crucial to differentiate the results from those of previous studies.

    1. eLife Assessment

      This important study uses a combination of eye-tracking and computational models based on Active Inference to explain behavior in a gaze-contingent cued-reversal paradigm with 6- to 10-month-old infants. The study provides solid evidence that the same rigorous computational modeling standards commonly applied in studies of adults can also be applied in studies of infants' learning, and a cluster analysis reveals that the parameters of the winning model provide better pattern separation between identified subgroups than behavior or questionnaire data alone. However, the evidence for some specific claims is incomplete, due to poor behavioral performance, unclear significance of the pupil data, and complexity of the model fitting; the claims regarding implications for psychiatry were also considered too strong and unsupported by evidence. This work will be of interest to developmental psychologists and cognitive neuroscientists.

    2. Reviewer #1 (Public review):

      Summary:

      The authors developed a new gaze-based reversal task to study 6- to 10-month-old infants, an age group in which behavior related to learning, exploration, and perseveration would typically be very challenging to study. Here, the research question is excellently motivated by pointing out a limitation of past work, which has typically studied adult clinical populations using similar approaches and therefore captures only the endpoint of the developmental process. Thus, there is important clinical and scientific value in studying much earlier stages of the developmental process. The authors accomplish this with a new gaze-based paradigm that allows them to fit a variety of complex computational models to data from 41 infants. The main advantage of their winning model is that its parameters provide better pattern separation between two identified clusters of participants than behavioral variables alone.

      Strengths:

      Overall, the paper is well-written, and the models and analyses are applied in a principled and thorough fashion. The authors do an excellent job of both motivating their research question and addressing it through their task and set of computational models. The scope is also quite ambitious, modeling both choices and pupillary responses, while also using the models to generate behavior that is comparable to the experimental data and performing a cluster analysis to compare the suitability of the model parameters vs. other behavioral/questionnaire data in performing pattern separation between participants.

      Weaknesses:

      However, despite these strengths, I had a number of concerns that may limit the reliability of the findings.

      First, given that the rewards for the initial pre-reversal setting are defined by the infants' first choice, it was unclear to me whether the behavioral patterns in Figure 2 really show that there was, in fact, (prediction-error-based) learning in the task at all. The behavioral analyses proceed very briskly without really addressing this question, before rapidly jumping off the complexity cliff to present the models. However, even among the models, the winning model only had free parameters for preference (c) and left-right dominance (epsilon), which don't really capture mechanisms related to learning. The epistemic and extrinsic components included in the model at the 2nd stage could potentially help shed light on this question, but (unless I've misunderstood) they seem to be all-or-nothing parts of the model, and thus don't reappear in later analyses (e.g., the cluster analysis) because they are not individual-specific parameters. Thus, the main learning-relevant aspects of the model seem divorced from the ability to perform clustering or other clinically relevant diagnoses downstream. As a result, it was unclear to me whether the results really capture the mechanisms related to cognitive flexibility that motivate the manuscript in the introduction.

      My other main concern was the complexity of the models and the way model comparison was performed using the three stages. First of all, the set of models is quite complex and risks alienating many developmental psychologists who would otherwise be very interested in these findings. Thus, I'm curious why the authors didn't consider including much simpler context-based RL models (e.g., Rescorla-Wagner/Q-learning models) that explicitly use prediction-error updates and whose simplicity might better match the simplicity of the behavior that 6- to 10-month-old infants are capable of displaying. Certainly, preference (as an inverse temperature parameter for a softmax policy) and left-right dominance (as a bias) could be implemented with these much simpler models. Second, while the three-stage model comparison seems somewhat principled, it left me questioning whether the 1st-stage or 2nd-stage results might be affected by later stages. For instance, would the Simple-discard model still win in the first stage once omega and eta have been eliminated as free parameters? Of course, I understand that there may be feasibility issues with testing all combinatorial variants of the model. But it was unclear why this specific order was chosen and what consequences this sequential dependency in the model fitting may have for the conclusions. And while model identifiability is stated in the abstract as one of the strengths of this approach, there don't seem to be any clear analyses supporting this claim. I would have loved to see a model recovery analysis (see Wilson & Collins, eLife 2019) to support this statement.
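      The simpler alternative suggested above can be sketched as a two-option Rescorla-Wagner (delta-rule) learner with a softmax inverse temperature (playing a role similar to the preference parameter c) and an additive side bias (similar to the left-right dominance epsilon), together with a toy parameter-recovery loop in the spirit of Wilson & Collins (2019). This is a minimal illustration under stated assumptions, not the manuscript's models: the function names, the reward schedule (left rewarded before the reversal, right after), the trial counts, and the parameter grid are all hypothetical.

```python
import math
import random


def choice_prob_left(q_left, q_right, beta, bias):
    """Softmax over two options; `bias` adds a fixed preference for the left side."""
    z = beta * (q_left - q_right) + bias
    return 1.0 / (1.0 + math.exp(-z))


def simulate(alpha, beta, bias, n_pre=10, n_post=10, seed=0):
    """Simulate a Rescorla-Wagner agent on a hypothetical two-location reversal schedule.

    Location 0 (left) is rewarded for the first n_pre trials; the contingency
    then reverses and location 1 (right) is rewarded for n_post trials.
    """
    rng = random.Random(seed)
    q = [0.0, 0.0]
    choices, rewards = [], []
    for t in range(n_pre + n_post):
        correct = 0 if t < n_pre else 1
        p_left = choice_prob_left(q[0], q[1], beta, bias)
        choice = 0 if rng.random() < p_left else 1
        reward = 1.0 if choice == correct else 0.0
        q[choice] += alpha * (reward - q[choice])  # prediction-error update
        choices.append(choice)
        rewards.append(reward)
    return choices, rewards


def neg_log_lik(params, choices, rewards):
    """Negative log-likelihood of observed choices under (alpha, beta, bias)."""
    alpha, beta, bias = params
    q = [0.0, 0.0]
    nll = 0.0
    for choice, reward in zip(choices, rewards):
        p_left = choice_prob_left(q[0], q[1], beta, bias)
        p = p_left if choice == 0 else 1.0 - p_left
        nll -= math.log(max(p, 1e-12))
        q[choice] += alpha * (reward - q[choice])
    return nll


def fit_grid(choices, rewards):
    """Maximum-likelihood fit by brute-force grid search (kept dependency-free)."""
    alphas = [i / 10 for i in range(1, 10)]
    betas = [0.5, 1, 2, 4, 8]
    biases = [-1.0, -0.5, 0.0, 0.5, 1.0]
    grid = [(a, b, e) for a in alphas for b in betas for e in biases]
    return min(grid, key=lambda prm: neg_log_lik(prm, choices, rewards))


if __name__ == "__main__":
    # Parameter recovery: simulate agents with known parameters, then refit each one.
    true_params = (0.5, 4.0, 0.0)
    for seed in range(5):
        c, r = simulate(*true_params, seed=seed)
        print(seed, fit_grid(c, r))
```

      Repeating the loop across seeds, across a range of true parameter values, and at different numbers of post-reversal trials shows how tightly each parameter is recovered for a given amount of data; the grid search is used only to keep the sketch short and free of external dependencies.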

    3. Reviewer #2 (Public review):

      Summary:

      This paper examines infants' learning in a novel gaze-contingent cued reversal learning task. The study provides strong evidence that infants learn in the task, and the authors characterize individual differences in learning using computational modeling. The best-fitting model among those compared reflects learning of mappings between context cues and outcomes that do not carry over across blocks. Infants are then clustered into two groups based on model parameter estimates capturing primacy bias and reward sensitivity. These groupings exhibited differences in infant temperament and other developmental measures. The modeling is rigorous, with model predictions accounting for substantial variance in infants' choices, and parameter estimates showing high recoverability. This study is important in that it demonstrates that such rigorous standards in computational modeling of behavior can be successfully deployed in infant studies.

      Strengths:

      The study provides evidence that infants exhibit cognitive flexibility within a reversal learning task and do not simply perseverate.

      The methods used within the novel gaze-contingent task will be useful for other groups interested in studying learning and decision-making in infants.

      The study applies rigorous computational modeling approaches to infants' choices (inferred from gaze) and their physiological responses (i.e., pupil dilation) in the task, demonstrating that infants' reward learning is well-captured by an error-driven learning process.

      The authors conduct model comparison, posterior predictive checks, and parameter recoverability analyses and demonstrate that model parameters can be well estimated and that the model can recapitulate infant choice behavior.

      Physiological pupil dilation measures that correlate with prediction error signals from the model further validate the model as capturing the learning process.

      Weaknesses:

      It is not entirely clear that the individual differences in reversal learning identified between the two clusters of infants (ostensibly reflecting differences in cognitive flexibility) have construct validity or specificity for the associated developmental abilities that differ between groups (daily living, communication, motor function, and socialization).

      Similarly, it's not clear why the paper is framed as an advance for infant computational *psychiatry* rather than simply an advance in computational modeling of infant behavior. It seems to me that a more general framing is warranted. Basic cognitive development research can also benefit from cognitive hypothesis testing via computational model comparison and precise measurement of infants' behavior in reward learning tasks. Is there reason to believe that infants' behavior in this task might have construct validity for mental health problems related to cognitive flexibility later in development? Do the Vineland or IBQ-R-VSF prospectively predict clinical symptoms?

      A large proportion of the recruited infants (14 of 55) were excluded, but few details are provided on why and when they were excluded. Did the excluded infants differ on any of the non-task measures? This information would be helpful to understand limitations in the utility of the task or the generalizability of the findings.

      It is stated that: "The infants who completed at least three trials following the reversal were included in the analysis, as it is more likely that their expectations were violated in this interval." Are three trials post-reversal sufficient to obtain reliable estimates of model parameters? More details should be provided on the number of trials completed for all of the included/excluded infants.

    4. Reviewer #3 (Public review):

      This paper used computational modeling of infants' performance in a reversal learning paradigm to identify two subgroups of infants: one that initially learned a bit faster but then perseverated more and failed to switch after the reversal (yellow cluster), and one that sampled more before the switch but then perseverated less and switched better (magenta cluster; though see below for comments about infants' overall weak performance). The authors describe magenta babies as showing a profile of greater cognitive flexibility, which they note in adults is linked to better outcomes and a lower incidence of psychiatric disorder. Indeed, the yellow cluster scored less well on several scales of the Vineland and showed lower surgency on the IBQ than the magenta cluster. The authors argue that this paper paves the way for the field of "infant computational neuropsychiatry."

      In general, I think this is a fun and intriguing paper. That said, I have a number of concerns with how it is currently written.

      First, the role of pupil dilation in the models was really unclear: I've read it through a few times and came away with different impressions each time. I am now pretty sure the models were based only on infants' behavioural responses (e.g., choice of the correct versus incorrect location) rather than on differences in pupil size, but pupil size kept popping up throughout, and so I initially thought the clusters were based on it. The authors should clarify this so other readers are not confused. (One thing that might help is avoiding the word "behaviour" on its own, unless it is further specified as looking behaviour or not, as I assume that some would characterize pupil dilation as a behaviour as well.)

      If the clusters were NOT based on pupil size (e.g., reaction to prediction error), why not? Was this attempted, and did no clusters emerge? Did the yellow and magenta groups also differ in their reaction to prediction error, or not? It seems that the argument that this work will be the basis of infant computational psychiatry would require not simply a link between behaviour in an infant study and other measurements of infants' functioning (many other papers to date have demonstrated such relationships, many longitudinally), but instead a link to a measure for which the neurobiology of the behaviour being studied is better understood. I assume this is why pupil dilation kept coming up, but again, it didn't actually seem to be part of the modelling unless I missed something. That is, although I think that this is a nice finding, I currently think the novelty of the finding, as well as the suggestion that it will start a whole new field, may be overblown. I certainly think the pupillometry data have promise, as do the LUMO data, which the authors alluded to being in the works. But perhaps the implications should be toned down a bit in this paper until those data are further along.

      My final substantial comment (a few more minor ones below) is that, overall, babies did quite poorly at this task. Even after 9 post-switch trials, the magenta group was still responding at chance, and the yellow group seemed not to switch at all. Infants then all seemed to perform very well again during block 2, which makes it seem like they still had the original contingency in mind. That said, from what I could see, no data were provided about how many babies looked to the originally correct side first during block 2. But based on the data, I assume they basically all went back to predicting the first side, as otherwise their return to high levels of successful trials would not make sense, unless they somehow forgot the entire thing. It would be good to know for sure, and to have those data (specifically, how many babies looked to the original side again at the start of block 2) in the main paper. Given this overall lack of sensitive performance in the paradigm, even though the cues signaling where the rewarding video would appear changed completely (that is, the contingency between cue and outcome did not itself switch; the cues themselves did), it seems odd to discuss things like statistical or even skillful learning alongside these data.

    1. eLife Assessment

      This valuable study shows the impact of the metabolic state of bacteria on phage infection. The experimental results, based on various phages infecting E. coli, are solid and consistent with a two-step adsorption mathematical model, although the detailed evidence supporting this model is currently incomplete. This study should be of interest to the communities working on cell metabolism and on host-pathogen interactions.

    2. Reviewer #1 (Public review):

      In the wild, bacteria can be found in a wide range of metabolic states, including states in which they are resource-limited. Because phages heavily rely on the infected cell's molecular machinery to replicate, it is natural to wonder how phage-bacteria interactions depend on the metabolic state of the cell. In this work, Marantos et al. investigate specifically how the rate of infection of 5 different phages changes between cells grown in energy-rich conditions and cells grown in energy-depleted conditions. Their results clearly show that 4 out of the 5 phages studied display a significant reduction in infection rate in cells that are energetically depleted and provide a potential explanation for this observation by looking into the mechanisms that these phages use to irreversibly infect their host cells.

      The work also tries to explain the observation using a mathematical/mechanistic model that describes infection as a sequence of two steps, where a phage first needs to bind to a cell receptor, from which it can potentially unbind, and then irreversibly infects by injecting its genome. While the model is sensible from a mechanistic perspective, the experimental evidence supporting how each of the model's rates is affected by the cell's metabolic state is weak, as only ratios of these rates can be inferred from the data.

    3. Reviewer #2 (Public review):

      Summary:

      The authors investigate the dependence of phage adsorption rates on host metabolic state, using five coliphages that differ in their infection cycles and host receptors. They find that four of the five phages showed significantly reduced infection under low metabolic states, with phages that generally have weaker adsorption being more strongly affected by low metabolism. The authors complement their findings with a 2-step infection model in which phages can disengage from their hosts after initial adsorption. The paper illustrates the power of standardized experimental protocols for quantitative trait comparisons and highlights the dependence of phage infection success on host physiology.

      Strengths:

      The paper is well written and clearly structured.

      The experiments are well-designed, and particularly commendable is the diligent use of control scenarios to allow for quantitative comparison between phages. This standardized protocol will be valuable for the entire phage community.

      The authors convincingly show the impact of host physiology on phage adsorption success. This dependence has so far mainly been considered for intracellular phage replication, and the paper shows that host physiology has to be taken into account at all steps of phage infection.

      Weaknesses:

      There are some concerns about the experimental setup and which conclusions can be drawn from it:

      Before phage infection, bacterial cultures are grown to exponential phase, washed, and then resuspended with glucose or arsenate-azide for 10 min. It is, however, questionable whether 10 minutes is enough to realistically simulate high and low metabolic states. Ten minutes seems quite short to go from exponential growth to a low metabolic state, given the transcriptional memory of previous environments. It seems more likely that the population will be quite heterogeneous, with cells in various states of transition towards low metabolic states.

      Given that arsenate and azide inhibit cellular metabolism, i.e., have antimicrobial effects, cells might not just downregulate metabolism but also activate the stress response, and this causes some of the observed effects on phage adsorption. Therefore, the 'low metabolic state' of the cells in this paper could mean that cells are starved or that they are stressed or both.

      The abundance of receptors could change between the high and low metabolic media conditions and contribute to the observed differences in adsorption, while the authors seem to assume in their model that the initial adsorption rate always remains the same.

    4. Reviewer #3 (Public review):

      Summary:

      Marantos et al. showed that for some coliphages, the energetic state of the bacterial host cell has a strong impact on whether phage infection is initiated. The authors drew this conclusion from the observation that more free phages remain in the medium after infection of arsenate-azide-treated cells than after infection of untreated cells. These data were analyzed and reported both as ratios of the treated vs. untreated conditions and using a mass-action kinetic model of phage-cell collision in the infection mixture. The data supported the finding that for four phages infecting Escherichia coli, namely phages λ, 𝜙80, M13, and T6, the phages are less likely to initiate infection if the host bacteria are energy-depleted. However, for phage T5, the authors found that infection propensity is not affected.

      Strengths:

      The data presented by the authors clearly supported the principal conclusion of the study ("Viral commitment to infection depends on host metabolism"). The five phages chosen by the authors represent different viral lifestyles and infection mechanisms, highlighting the potential applicability to other Escherichia coli phages. Finally, the authors successfully used a classic mass-action model of phage-cell collision to interpret their data. The simplicity of their experimental assay, combined with the use of this mathematical model, offers other investigators who study phage-bacterial interactions in other contexts a potentially useful toolkit to examine infection in general, and specifically, the dependence of phage infection on the host's metabolic state.

      Weaknesses:

      (1) The authors isolated and measured the numbers of free phages in the medium after infection of bacteria under different treatments. These measurements were analyzed in two different ways: (1) simply as ratios (corrected/normalized using different controls), and (2) fitted using a simple mathematical model. I have concerns regarding both analyses.

      1.1) For the first method, having different time points at which the sample of each phage is collected critically complicates data interpretation. As one incubates the phage-bacteria mixture for longer, more infection occurs, and the number of phages collected from the mixture decreases. Therefore, the different incubation times defeat the goal of "a systematic and quantitative comparison across different phages [...]" (line 81), as the authors themselves acknowledge. Conceivably, the authors could have used the shortest measurement time for all phages (i.e., 10 minutes, as for phage λ). Alternatively, the authors could have applied a systematic criterion such as half (or any other fraction) of the latent period of each phage, which would still "maximize the incubation period while ensuring that manipulations were completed before the first infection cycle concluded" (lines 126-127). In my view, the seemingly arbitrary measurement time for each phage renders the entire first analysis very difficult to interpret. It also goes against the authors' proposition that the protocol was "standardized" (line 92) or "consistent" (line 200). It is not clear what readers are supposed to take away from this first analysis, or rather, which evidence, finding, or conclusion the manuscript would lose if the authors presented only the modeling-based analysis.
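      A worked equation makes this concern concrete. Assuming the single-step mass-action model discussed in 1.2, with a constant bacterial density B that is the same in both metabolic conditions, the ratio of free-phage fractions between the low- and high-metabolism conditions grows with the sampling time t:

```latex
\frac{P(t)}{P(0)} = e^{-\eta B t}
\quad\Longrightarrow\quad
R(t) \equiv \frac{P_{\text{low}}(t)}{P_{\text{high}}(t)}
     = e^{(\eta_{\text{high}} - \eta_{\text{low}})\, B\, t},
\qquad
\ln R(t) = (\eta_{\text{high}} - \eta_{\text{low}})\, B\, t .
```

      Under these assumptions, two phages with the same difference in η would yield a log-ratio 5.5 times larger if sampled at 55 minutes rather than at 10 minutes, which is why ratios collected at phage-specific time points are not directly comparable.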

      1.2) The second method of analysis sought to remove the dependence of the measurements on time. I completely agree with this goal, and the findings extracted from this analysis contribute significantly to the merits of this manuscript. However, the authors achieved this goal using a single time point for each phage to calculate the infection rate (η). As shown in Figure S3, each of the phage depletion curves is anchored by only one data point (note that P(t)/P(0) = 1 at t = 0 is assumed, not measured). This departs from the typical way this collision model is used in the literature, where a time series is measured and used to fit the model (e.g., DOI 10.1007/978-1-60327-164-6_18, or more recently, PMID 39700139). This practice reduces the robustness of the inferred η values. The problem is exacerbated by assumptions the authors used in formulating this model. For instance, the authors used a constant value for the bacterial concentration, B, because "bacterial growth and lysis were negligible" (lines 135-136). However, considering that the bacteria were cultured at 37 °C in a very rich medium (first in YT broth, then in 2% glucose), the measurement times of 20, 30, and 55 minutes most likely span one or a few generations of bacterial growth and division.
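      How much a single-time-point estimate can depend on the constant-B assumption can be illustrated with a short sketch. It assumes the mass-action model dP/dt = -ηB(t)P and, hypothetically, exponential host growth B(t) = B0·exp(rt) with a 30-minute doubling time; the values of η and B0 are placeholders, not numbers from the manuscript, and the function names are illustrative. Regressing -ln(P/P0) on the integrated bacterial density recovers η, whereas assuming a constant B0 inflates the estimate by the factor (exp(rt) - 1)/(rt).

```python
import math


def integrated_density(b0, t, growth_rate=0.0):
    """Integral of B(s) ds over [0, t] for B(s) = b0 * exp(r * s); r = 0 means constant B."""
    if growth_rate == 0.0:
        return b0 * t
    return b0 * (math.exp(growth_rate * t) - 1.0) / growth_rate


def free_fraction(eta, b0, t, growth_rate=0.0):
    """P(t)/P(0) under mass-action adsorption, dP/dt = -eta * B(t) * P."""
    return math.exp(-eta * integrated_density(b0, t, growth_rate))


def eta_from_points(times, fractions, b0, growth_rate=0.0):
    """Least-squares slope through the origin of -ln(P/P0) against the integral of B.

    With a single time point this reduces to eta = -ln(P/P0) / integral(B).
    """
    xs = [integrated_density(b0, t, growth_rate) for t in times]
    ys = [-math.log(f) for f in fractions]
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


if __name__ == "__main__":
    # Hypothetical values: eta in mL/min, B0 in cells/mL, doubling time of 30 min.
    eta_true, b0, t = 1e-10, 1e8, 55.0
    r = math.log(2) / 30.0
    frac = free_fraction(eta_true, b0, t, r)        # "data" generated with growth
    naive = eta_from_points([t], [frac], b0)        # assumes constant B = b0
    corrected = eta_from_points([t], [frac], b0, r)  # accounts for growth
    print(f"naive/true = {naive / eta_true:.2f}, corrected/true = {corrected / eta_true:.2f}")
```

      Under these illustrative assumptions, a single 55-minute sample overestimates η roughly twofold. With a measured time series, the same regression can also be checked for the systematic curvature that unaccounted-for growth would produce, which a single point cannot reveal.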

      Related note: I suggest that one of the panels in Figure S3 should be moved to the main text, since it is critical to the second method of analysis.

      (2) The data were able to distinguish phages that successfully infected bacteria from those that remained free in the medium, and the authors appropriately interpreted the data as such throughout the Results section. However, in the Discussion (starting from the very first sentence, line 172), the authors used terms such as "adsorption" and "entry" interchangeably (for example, see the three sentences in lines 310-313, for "viral entry efficiency is shaped by [...]", then "adsorption kinetics modeling"). I do not see how the authors' data could distinguish between adsorption (the phage particles attaching to the outside of the cell) and entry (the phage DNA being injected into the cell). Conceivably, any phage particles that irreversibly attach to a cell but have not yet injected their genome would still be removed from the medium and therefore not quantified. Another example: in lines 189-191, the authors interpreted that "[...] when the bacterium is in a low metabolic state, the phage does not bind irreversibly to the host", but how do the authors exclude the case in which no binding (i.e., the reversible step) occurs to begin with? Similarly, in lines 283-293, how do the authors determine whether energy depletion increases the k_off term or decreases the k_inj term, given that either would result in more free phages in the medium, as observed in the data? I believe that the writing of the Discussion, as it stands now, does a disservice to the conclusions presented in the Results section.
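      The k_off/k_inj ambiguity can be made explicit with a minimal sketch of a two-step scheme: free phage binds reversibly at rate k_on·B, unbinds at rate k_off, and commits irreversibly at rate k_inj. Assuming the reversibly bound state is in quasi-steady state (binding slow relative to unbinding and injection), free phage is lost at the effective rate η = k_on·k_inj/(k_off + k_inj). This reconstruction, including the function name, is an assumption made for illustration and is not necessarily the manuscript's exact formulation.

```python
from fractions import Fraction


def effective_rate(k_on, k_off, k_inj):
    """Effective irreversible adsorption rate constant for the two-step scheme

        free --(k_on * B)--> reversibly bound --(k_inj)--> committed
             <--(k_off)--

    assuming quasi-steady state for the reversibly bound phage:
    eta = k_on * k_inj / (k_off + k_inj).
    """
    return k_on * k_inj / (k_off + k_inj)


if __name__ == "__main__":
    # Exact rational arithmetic so that equal rates compare exactly equal.
    k_on, k_off, k_inj = Fraction(1), Fraction(1), Fraction(1)
    baseline = effective_rate(k_on, k_off, k_inj)
    more_unbinding = effective_rate(k_on, 2 * k_off, k_inj)     # depletion raises k_off
    slower_commitment = effective_rate(k_on, k_off, k_inj / 2)  # depletion lowers k_inj
    print(baseline, more_unbinding, slower_commitment)  # prints: 1/2 1/3 1/3
```

      Because η depends on k_off and k_inj only through their ratio, doubling k_off and halving k_inj predict the same free-phage depletion in this approximation, so free-phage counts alone cannot tell which step energy depletion affects.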

      (3) The authors presented an argument that performing infection of all five phages in the same condition is an advantage, allowing for comparison across different phages. While this goal is a completely valid one, it is difficult to reconcile that with the fact that different phages require different optimal conditions for successful infection. For instance, phage T5 famously requires Ca2+ for successful infection into the host bacterium (and later successful replication); see PMID 13174489. However, all infections were performed in TMG, which lacks Ca2+. Perhaps the absence of T5 dependence on the host metabolism is because the infection condition used by the authors was not optimal for T5 to begin with? Similar arguments could be made for other phages.

      (4) Whereas the manuscript examined five coliphages, only phage T5 and phage λ were discussed extensively. I believe some discussion points for these two phages need clarification.

      4.1) Phage T5: The data obtained by the authors show that the infection rate of phage T5 is not impacted by the metabolic state of the host cell. Considering that the authors used the terms "infection", "adsorption", and "entry" interchangeably to refer to the irreversible commitment of a phage to a host cell (see point 2), this discussion regarding phage T5 lacks one critical literature context: DNA entry of phage T5 is known to occur in two phases (first-step transfer and second-step transfer). Critically, the second step can only occur if phage proteins encoded by the phage DNA transferred in the first step are expressed (see PMID 10577483 and the cited papers therein). In that context, metabolic poisoning of the host bacteria should have impeded T5 infection. The authors should comment on this point.

      4.2) Phage λ: The experiment using phage λ in the current study closely resembles that in Brown et al. 2022. That feature alone is not a problem, but in many places the writing is ambiguous as to whether it is discussing the results in Brown et al. 2022 or in the current manuscript. I give three examples below, but this list is not exhaustive: (i) Lines 67-69: there is no Brown et al. 2022 reference immediately after "a mutant phage variant (λh) could bypass this dependency [...]" (not just in the previous sentence); (ii) Line 228 should clearly say "Our previous findings suggested that phage λ is capable of [...]", since it concerns Brown et al., 2022, not the current study; and (iii) Lines 245-246: there is no Brown et al., 2022 reference immediately after "we observed that a mutant variant [...] even energy-depleted host" (without a reference, it reads as though the authors "observed" that finding in the current manuscript).

      Also, regarding phage λ: The discussion between line 230 and line 249 is very interesting, but since it concerns the differences between λ PaPa and Ur-λ, the authors should consider mentioning and discussing a very relevant recent study, PMCID: PMC6312755.

      (5) Control experiments, or references to prior studies, are needed to support that the As/Az treatment at this concentration and duration (at least 10 minutes) is sufficient to deplete the metabolic state of the cell. For instance, this can be shown by impeded or null cell growth, arrested motility (using a standard swimming assay), or a fluorescent reporter for the energetic state of the cell.

    1. eLife Assessment

      Zandvoort and colleagues describe respiration-brain coupling in the context of apnoea in human newborns. The authors have addressed an important question and supported their claims with solid data. The rigor of the findings could perhaps be further strengthened with some relatively minor changes to the analysis methodology.

    2. Reviewer #1 (Public review):

      Summary:

      The authors investigated the extent to which phase-amplitude coupling (PAC) of respiratory and electrophysiological brain activity recordings was related to episodes of life-threatening apnoea in human newborns.

      Strengths:

      I want to commend the authors for acquiring unique and illuminating data; the difficulty in recording and handling these data has to be appreciated. As far as I can tell, Zandvoort and colleagues are the first to provide robust evidence for respiration-brain coupling in newborns. Their creative use of the phase-slope index for peripheral-central interactions is innovative and credible. If proven to be robust, the authors' findings have important implications well beyond the field of brain-body research.

      Weaknesses:

      While the analyses were overall competently conducted and well-justified, I was not entirely convinced by a few methodological choices, specifically i) the computation of PAC surrogates, ii) details of the linear mixed-effects model, and iii) the electrode selection for linking phase-amplitude coupling to apnoea frequency.

    3. Reviewer #2 (Public review):

      Summary:

      The authors' central hypothesis was that the strength of cortico-respiratory coupling in infants is negatively associated with apnoea rate. To test this, they first investigated the existence of cortico-respiratory coupling in premature and term-born infants, the spatial localisation of the cortical activity and its relationship with the phase of the respiratory cycle, and the directionality of coupling.

      Strengths:

      The researchers used synchronised EEG and impedance pneumography to detect phase-amplitude coupling.

      They have studied a wide range of gestations, from 28 weeks to 42 weeks, including males and females. Their exclusion criteria ensured that healthy babies were studied and potential confounders of impaired respiratory activity were avoided. Their sequential approach in addressing the objectives was appropriate.

      Weaknesses:

      As a neonatal clinician and neuroscientist, I have commented based on my expertise. I have not commented on signal processing.

      I did not identify any major weaknesses in the study. Some minor weaknesses include:

      (1) Data relating to the cortical oscillations and the respiratory phase is given. However, whether this would lead to their hypothesis that the strength of cortico-respiratory coupling is negatively associated with apnoea rate is unclear. What preceding data enabled the authors to link the strength of coupling to the rate of apnoea?

      (2) If we did not know of data showing the existence of cortico-respiratory coupling in newborn infants, then should it not be the first research question to examine?

      (3) What are the characteristics of the infants who contributed data to establish the cortico-respiratory coupling (Figures 2 and 3)?

      (4) Although it is the most plausible direction of the relationship, with neural activation driving respiratory muscle contraction, how can the authors prove this with their data? Given that they show coherence between signals, how do we know that the cortical signal precedes the respiratory muscle contraction?

      (5) Apgar score is an ordinal variable. The authors should summarise this as median (range).

    4. Reviewer #3 (Public review):

      Summary:

      This is a strong and important report that presents a framework for understanding cortical contributions to neonatal respiration. Overall, the authors successfully achieved their goal of linking cortical activity to respiratory drive. Despite the correlational nature of this study, it is a crucial step in establishing a foundation for future work to elucidate the interaction between cortical activity and breathing.

      Strengths:

      (1) The introduction and use of workflows that establish correlational relationships between breathing and brain activity.

      (2) The execution of these workflows in human neonates.

      Weaknesses:

      Interpretations related to causal inference, confounds of sleep and caffeine, and the spatial interpretation of EEG data need to be addressed to ensure that the data appropriately support the conclusions.

    5. Author response:

      We would like to thank the reviewers for their helpful comments and critique of our manuscript. We plan to make the following revisions, which will improve the clarity of our manuscript and the robustness of our findings.

      We will revise methodological details and interpretation throughout the manuscript. In particular, we will consider alternative methods for calculating surrogates. We intend to investigate the relationship between apnoea rate and phase-amplitude coupling at other electrodes as suggested by Reviewer 1, and we will revise the details of the linear mixed-effects models.

      In relation to the comments raised by both Reviewers 2 and 3, we will carefully address the wording throughout the manuscript, including the order of hypotheses, our interpretation of the directionality of the relationship between cortical and respiratory activity, and the connection between cortico-respiratory coupling and apnoea. We will further clarify the limitations of our recording setup and approach, in particular the limited EEG montage, and add further details with regard to sleep state and caffeine.

    1. eLife Assessment

      This study presents valuable and compelling evidence that β-glucan-induced trained immunity can protect against intestinal inflammation by reprogramming innate immune cells toward a reparative phenotype. The authors employ a convincing combination of functional assays, adoptive transfers, and single-cell transcriptomics to uncover mechanistic insights and demonstrate the therapeutic potential of innate immune memory in IBD. While the work is robust, addressing the underlying epigenetic mechanisms and including additional controls would further reinforce the trained immunity-specific interpretation.

    2. Reviewer #1 (Public review):

      Summary:

      This study presents an interesting investigation into the role of trained immunity in inflammatory bowel disease, demonstrating that β-glucan-induced reprogramming of innate immune cells can ameliorate experimental colitis. The findings are novel and clinically relevant, with potential implications for therapeutic strategies in IBD. The combination of functional assays, adoptive transfer experiments, and single-cell RNA sequencing provides comprehensive mechanistic insights. However, some aspects of the study could benefit from further clarification to strengthen the conclusions.

      Strengths:

      (1) This study elegantly connects trained immunity with IBD, demonstrating how β-glucan-induced innate immune reprogramming can mitigate chronic inflammation.

      (2) Adoptive transfer experiments robustly confirm the protective role of monocytes/macrophages in colitis resolution.

      (3) Single-cell RNA sequencing provides mechanistic depth, revealing the expansion of reparative Cx3cr1⁺ macrophages and their contribution to epithelial repair.

      (4) The work highlights the therapeutic potential of trained immunity in restoring gut homeostasis, offering new directions for IBD treatment.

      Weaknesses:

      While β-glucan may exert its training effect on hematopoietic stem cells, performing ATAC-seq on HSCs or monocytes to profile chromatin accessibility at antibacterial defense and mucosal repair-related genes would further validate the trained immunity mechanism. Alternatively, the authors could acknowledge this as a study limitation and future research direction.

    3. Reviewer #2 (Public review):

      Summary:

      The study investigates whether β-glucan (BG) can reprogram the innate immune system to protect against intestinal inflammation. The authors show that mice pretreated with BG prior to DSS-induced colitis experience reduced colitis severity, including less weight loss and colon damage, improved gut repair, and reduced inflammation. These effects were independent of adaptive immunity and were linked to changes in monocyte function.

      The authors show that the BG-trained monocytes not only help control inflammation but confer non-specific protection against experimental infections (Salmonella), suggesting the involvement of trained immunity (TI) mechanisms. Using single-cell RNA sequencing, they map the transcriptional changes in these cells and show enhanced differentiation of monocytes into reparative CX3CR1⁺ macrophages. Importantly, these protective effects were transferable to other mice via adoptive cell transfer and bone marrow transplantation, suggesting that the innate immune system had been reprogrammed at the level of stem/progenitor cells.

      Overall, this study provides evidence that TI, often associated with heightened inflammatory programs, can also promote tissue repair and resolution of inflammation. Moreover, this BG-induced functional reprogramming can be further harnessed to treat chronic inflammatory disorders like IBD.

      Strengths:

      (1) The authors use advanced experimental approaches to explore the potential therapeutic use of myeloid reprogramming by β-glucan in IBD.

      (2) The authors follow a data-to-function approach, integrating bulk and single-cell RNA sequencing with in vivo functional validation to support their conclusions.

      (3) The study adds to the growing evidence that TI is not a singular pro-inflammatory program, but can adopt distinct functional states, including anti-inflammatory and reparative phenotypes, depending on the context.

      Weaknesses:

      (1) The epigenetic and metabolic basis of TI is not explored, which weakens the mechanistic claim of TI. This is especially relevant given that a novel reparative, anti-inflammatory TI program is proposed.

      (2) The absence of a BG-only group limits interpretation of the results. Since the authors report tissue-level effects such as enhanced mucosal repair and transcriptional shifts in intestinal macrophages (colonic RNA-Seq), it is important to rule out whether BG alone could influence the gut independently of DSS-induced inflammation.

      Without a BG-only control, it is hard to distinguish a true trained response from a potential modulation caused directly by BG.

      (3) Although monocyte transfer experiments show protection in colitis, the fate of the transferred cells is not described (e.g., homing or differentiation into Cx3cr1⁺ macrophage subsets). This weakens the link between specific monocyte subsets and the observed phenotype.

      (4) While scRNA-seq reveals distinct monocyte/macrophage subclusters (Mono1-3), their specific functional roles remain speculative. The authors assign reparative or antimicrobial functions based on transcriptional signatures, but do not perform causal experiments (depletion or in vitro assays). The biological roles of these cells remain correlative.

      (5) While Rag1⁻/⁻ mice were used to rule out adaptive immunity, the potential role of innate lymphoid cells (ILCs), particularly ILC2s and ILC3s, which are known to promote mucosal repair (PMID: 27484190), was not explored. Given the reparative phenotype observed, the contribution of ILCs remains a confounding factor.

    4. Reviewer #3 (Public review):

      Summary:

      In the present work, Yinyin Lv et al. offer evidence for the therapeutic potential of trained immunity in the context of inflammatory bowel disease (IBD). Prior research has demonstrated that innate cells pre-treated (trained) with β-glucan show an enhanced pro-inflammatory response upon a second challenge.

      While an increased immune response can be beneficial and protect against bacterial infections, there is also the risk that it will worsen symptoms in various inflammatory disorders. In the present study, the authors show that mice preconditioned with β-glucan have enhanced resistance to Staphylococcus aureus infection, indicating heightened immune responses.

      The authors demonstrate that β-glucan training of bone marrow hematopoietic progenitors and peripheral monocytes mitigates the pro-inflammatory effects of colitis, with protection extending to naïve recipients of the trained cells.

      Using a dextran sulfate sodium (DSS)-induced model of colitis, β-glucan pre-treatment significantly dampens disease severity. Importantly, the use of Rag1⁻/⁻ mice, which lack adaptive immune cells, confirms that the protective effects of β-glucan are mediated by innate immune mechanisms. Further, experiments using Ccr2⁻/⁻ mice underline the necessity of monocyte recruitment in mediating this protection, highlighting CCR2 as a key factor in the mobilization of β-glucan-trained monocytes to inflamed tissues. Transcriptomic profiling reveals that β-glucan training upregulates genes associated with pattern recognition, antimicrobial defense, immunomodulation, and interferon signaling pathways, suggesting broad functional reprogramming of the innate immune compartment. In addition, β-glucan training induces a distinct monocyte subpopulation with enhanced activation and phagocytic capacity. These monocytes exhibit an increased ability to infiltrate inflamed colonic tissue and differentiate into macrophages, marked by increased expression of Cx3cr1. Moreover, among these trained monocyte and macrophage subsets, other gene expression signatures are associated with tissue and mucosal repair, suggesting a role in promoting resolution and regeneration following inflammatory insult.

      Strengths:

      (1) Overall, the authors present a mechanistically insightful investigation that advances our understanding of trained immunity in IBD.

      (2) By employing a range of well-characterized murine models, the authors investigate specific mechanisms involved in the effects of β-glucan training.

      (3) Furthermore, the study provides functional evidence that the protection conferred by the trained cells persists within the hematopoietic progenitors and can be transferred to naïve recipients. The integration of transcriptomic profiling allows the identification of changes in key genes and molecular pathways underlying the trained immune phenotype.

      (4) This is an important study that demonstrates that β-glucan-trained innate cells confer protection against colitis and promote mucosal repair, and these findings underscore the potential of harnessing innate immune memory as a therapeutic approach for chronic inflammatory diseases.

      Weaknesses:

      However, FPKM is not ideal for between-sample comparisons due to its within-sample normalization approach. Best practices recommend using raw counts (with DESeq2) for more robust statistical inference.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study presents an interesting investigation into the role of trained immunity in inflammatory bowel disease, demonstrating that β-glucan-induced reprogramming of innate immune cells can ameliorate experimental colitis. The findings are novel and clinically relevant, with potential implications for therapeutic strategies in IBD. The combination of functional assays, adoptive transfer experiments, and single-cell RNA sequencing provides comprehensive mechanistic insights. However, some aspects of the study could benefit from further clarification to strengthen the conclusions.

      We are grateful for the reviewer’s positive assessment of our study and constructive suggestions to improve the manuscript.

      Strengths:

      (1) This study elegantly connects trained immunity with IBD, demonstrating how β-glucan-induced innate immune reprogramming can mitigate chronic inflammation.

      (2) Adoptive transfer experiments robustly confirm the protective role of monocytes/macrophages in colitis resolution.

      (3) Single-cell RNA sequencing provides mechanistic depth, revealing the expansion of reparative Cx3cr1⁺ macrophages and their contribution to epithelial repair.

      (4) The work highlights the therapeutic potential of trained immunity in restoring gut homeostasis, offering new directions for IBD treatment.

      Weaknesses:

      While β-glucan may exert its training effect on hematopoietic stem cells, performing ATAC-seq on HSCs or monocytes to profile chromatin accessibility at antibacterial defense and mucosal repair-related genes would further validate the trained immunity mechanism. Alternatively, the authors could acknowledge this as a study limitation and future research direction.

      We agree that further epigenetic profiling—such as ATAC-seq analysis on HSCs or monocytes—would provide additional mechanistic depth to our current findings. We will acknowledge this as a limitation of the present study and highlight it as an important direction for future research.

      Comment (1): It’s better to include a schematic summarizing the proposed mechanism for reader clarity.

      We agree that a visual summary will enhance the clarity and accessibility of our findings. We will add a new schematic diagram (Figure 6) illustrating the proposed mechanism of β-glucan–induced myeloid reprogramming and its protective effects in the experimental colitis model.

      Comment (2): Discuss potential off-target effects of β-glucan-induced trained immunity (e.g., risk of exacerbated inflammation in other contexts).

      We appreciate this important comment regarding the potential off-target effects of β-glucan pretreatment. As trained immunity is known to amplify inflammatory responses upon heterologous stimulation and has been implicated in chronic inflammation–prone conditions such as atherosclerosis, this is an important consideration. Previous in vivo studies have shown that β-glucan pretreatment can enhance antibacterial or antitumor responses without inducing basal inflammation after one week of administration (PMID: 22901542, PMID: 30380404, PMID: 36604547, PMID: 33125892). Nevertheless, it remains possible that β-glucan–induced trained immunity could have unintended effects in certain contexts, which warrants further investigation and caution. We will expand the Discussion section to include a dedicated paragraph addressing these potential off-target effects.

      Reviewer #2 (Public review):

      Summary:

      The study investigates whether β-glucan (BG) can reprogram the innate immune system to protect against intestinal inflammation. The authors show that mice pretreated with BG prior to DSS-induced colitis experience reduced colitis severity, including less weight loss and colon damage, improved gut repair, and reduced inflammation. These effects were independent of adaptive immunity and were linked to changes in monocyte function.

      The authors show that the BG-trained monocytes not only help control inflammation but confer non-specific protection against experimental infections (Salmonella), suggesting the involvement of trained immunity (TI) mechanisms. Using single-cell RNA sequencing, they map the transcriptional changes in these cells and show enhanced differentiation of monocytes into reparative CX3CR1⁺ macrophages. Importantly, these protective effects were transferable to other mice via adoptive cell transfer and bone marrow transplantation, suggesting that the innate immune system had been reprogrammed at the level of stem/progenitor cells.

      Overall, this study provides evidence that TI, often associated with heightened inflammatory programs, can also promote tissue repair and resolution of inflammation. Moreover, this BG-induced functional reprogramming can be further harnessed to treat chronic inflammatory disorders like IBD.

      Strengths:

      (1) The authors use advanced experimental approaches to explore the potential therapeutic use of myeloid reprogramming by β-glucan in IBD.

      (2) The authors follow a data-to-function approach, integrating bulk and single-cell RNA sequencing with in vivo functional validation to support their conclusions.

      (3) The study adds to the growing evidence that TI is not a singular pro-inflammatory program, but can adopt distinct functional states, including anti-inflammatory and reparative phenotypes, depending on the context.

      We are grateful for the reviewer’s positive assessment of our study and recognition of its translational implications. We particularly appreciate the acknowledgment that our work expands the therapeutic potential of β-glucan–mediated trained immunity in ameliorating colitis.

      Weaknesses:

      (1) The epigenetic and metabolic basis of TI is not explored, which weakens the mechanistic claim of TI. This is especially relevant given that a novel reparative, anti-inflammatory TI program is proposed.

      We appreciate the reviewer’s valuable comment highlighting the importance of the epigenetic and metabolic basis of TI in providing mechanistic insight. While previous studies, including work from our group (S.-C. Cheng), have extensively characterized the epigenetic and metabolic signatures of monocytes from BG-trained mice—primarily in the context of inflammatory genes—we acknowledge that these aspects are not directly addressed in our current manuscript.

      To strengthen the mechanistic component, we plan to (1) reanalyze relevant public datasets, focusing on pathways related to reparative and antibacterial function, and (2) perform monocyte ATAC-seq in our current model to validate the epigenetic changes in these pathways.

      (2) The absence of a BG-only group limits interpretation of the results. Since the authors report tissue-level effects such as enhanced mucosal repair and transcriptional shifts in intestinal macrophages (colonic RNA-Seq), it is important to rule out whether BG alone could influence the gut independently of DSS-induced inflammation.

      Without a BG-only control, it is hard to distinguish a true trained response from a potential modulation caused directly by BG.

      We thank the reviewer for this important suggestion. Although we did not perform qPCR for mucosal repair genes in Figure S1C and Figure S1D, our colon RNA-seq analysis in Figure 5G included a BG-only control group (Colitis_d0). The results from this group indicate that BG preconditioning alone does not alter baseline expression of colon mucosal repair genes, supporting the conclusion that the observed effects occur in the context of DSS-induced inflammation.

      (3) Although monocyte transfer experiments show protection in colitis, the fate of the transferred cells is not described (e.g., homing or differentiation into Cx3cr1⁺ macrophage subsets). This weakens the link between specific monocyte subsets and the observed phenotype.

      (4) While scRNA-seq reveals distinct monocyte/macrophage subclusters (Mono1-3), their specific functional roles remain speculative. The authors assign reparative or antimicrobial functions based on transcriptional signatures, but do not perform causal experiments (depletion or in vitro assays). The biological roles of these cells remain correlative.

      We agree that the functional role of CX3CR1⁺ macrophages is not comprehensively validated and is currently inferred from scRNA-seq clustering. While our flow cytometry data show increased CX3CR1⁺ macrophages in the BG-TI group, and our CCR2 KO and monocyte adoptive transfer experiments indicate these macrophages are monocyte-derived, we lack direct depletion experiments due to the unavailability of effective depletion antibodies for this subset.

      We acknowledge this as a limitation and will clarify in the Discussion that our conclusions regarding CX3CR1⁺ macrophage function are based on transcriptional profiling and association with protective phenotypes, rather than direct causal evidence.

      (5) While Rag1⁻/⁻ mice were used to rule out adaptive immunity, the potential role of innate lymphoid cells (ILCs), particularly ILC2s and ILC3s, which are known to promote mucosal repair (PMID: 27484190), was not explored. Given the reparative phenotype observed, the contribution of ILCs remains a confounding factor.

      We appreciate the reviewer’s valuable comment regarding the potential role of ILCs in the observed mucosal repair. Indeed, in examining the BG-trained immunity effect, the contribution of ILCs was not evaluated. We will explicitly acknowledge in the Discussion that Rag1⁻/⁻ mice retain ILCs (including ILC3s) and that BG-induced activation of these cells remains possible.

      The literature (PMID: 21502992; PMID: 32187516) supports a role for ILC3-mediated IL-22 production in tissue repair, which could overlap with our observed effects. However, our monocyte adoptive transfer experiments show that monocytes alone can alleviate DSS-induced colitis, suggesting a dominant role for monocytes in this context. Nonetheless, we will make it clear that ILC contributions cannot be excluded.

      Reviewer #3 (Public review):

      Summary:

      In the present work, Yinyin Lv et al. offer evidence for the therapeutic potential of trained immunity in the context of inflammatory bowel disease (IBD). Prior research has demonstrated that innate cells pre-treated (trained) with β-glucan show an enhanced pro-inflammatory response upon a second challenge.

      While an increased immune response can be beneficial and protect against bacterial infections, there is also the risk that it will worsen symptoms in various inflammatory disorders. In the present study, the authors show that mice preconditioned with β-glucan have enhanced resistance to Staphylococcus aureus infection, indicating heightened immune responses.

      The authors demonstrate that β-glucan training of bone marrow hematopoietic progenitors and peripheral monocytes mitigates the pro-inflammatory effects of colitis, with protection extending to naïve recipients of the trained cells.

      Using a dextran sulfate sodium (DSS)-induced model of colitis, β-glucan pre-treatment significantly dampens disease severity. Importantly, the use of Rag1⁻/⁻ mice, which lack adaptive immune cells, confirms that the protective effects of β-glucan are mediated by innate immune mechanisms. Further, experiments using Ccr2⁻/⁻ mice underline the necessity of monocyte recruitment in mediating this protection, highlighting CCR2 as a key factor in the mobilization of β-glucan-trained monocytes to inflamed tissues. Transcriptomic profiling reveals that β-glucan training upregulates genes associated with pattern recognition, antimicrobial defense, immunomodulation, and interferon signaling pathways, suggesting broad functional reprogramming of the innate immune compartment. In addition, β-glucan training induces a distinct monocyte subpopulation with enhanced activation and phagocytic capacity. These monocytes exhibit an increased ability to infiltrate inflamed colonic tissue and differentiate into macrophages, marked by increased expression of Cx3cr1. Moreover, among these trained monocyte and macrophage subsets, other gene expression signatures are associated with tissue and mucosal repair, suggesting a role in promoting resolution and regeneration following inflammatory insult.

      Strengths:

      (1) Overall, the authors present a mechanistically insightful investigation that advances our understanding of trained immunity in IBD.

      (2) By employing a range of well-characterized murine models, the authors investigate specific mechanisms involved in the effects of β-glucan training.

      (3) Furthermore, the study provides functional evidence that the protection conferred by the trained cells persists within the hematopoietic progenitors and can be transferred to naïve recipients. The integration of transcriptomic profiling allows the identification of changes in key genes and molecular pathways underlying the trained immune phenotype.

      (4) This is an important study that demonstrates that β-glucan-trained innate cells confer protection against colitis and promote mucosal repair, and these findings underscore the potential of harnessing innate immune memory as a therapeutic approach for chronic inflammatory diseases.

      We thank the reviewer for their positive evaluation and constructive feedback on our manuscript.

      Weaknesses:

      However, FPKM is not ideal for between-sample comparisons due to its within-sample normalization approach. Best practices recommend using raw counts (with DESeq2) for more robust statistical inference.

      We appreciate the reminder about best practices for RNA-seq analysis. We apologize for the inaccurate description in the Materials and Methods section. For all differential expression analyses, we have in fact used raw count data as input for DESeq2. FPKM values were only used for visualization purposes, such as in heatmaps and clustering analyses. We will correct this description in the revised manuscript to accurately reflect our analysis workflow.
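      The reason FPKM is reserved for visualization can be shown with a minimal Python sketch that uses hypothetical counts and gene lengths, not data from this study. FPKM divides each gene's count by that sample's own total, so a single strongly induced gene lowers the FPKM of every unchanged gene. The second function is a simplified re-implementation of the median-of-ratios idea behind DESeq2's default size factors. It is not the DESeq2 R package itself, and it is unaffected by that composition shift.

```python
import math
import statistics

def fpkm(counts, lengths_bp):
    """FPKM = count / (gene length in kb * total mapped reads in millions)."""
    total_millions = sum(counts) / 1e6
    return [c / ((length / 1e3) * total_millions) for c, length in zip(counts, lengths_bp)]

def median_of_ratios_size_factors(samples):
    """Simplified DESeq-style size factors: for each sample, the median over genes
    of count / geometric-mean count. Genes with a zero count in any sample are skipped."""
    n_genes = len(samples[0])
    log_geo_means = []
    for g in range(n_genes):
        values = [s[g] for s in samples]
        if min(values) > 0:
            log_geo_means.append((g, sum(math.log(v) for v in values) / len(values)))
    factors = []
    for s in samples:
        ratios = [math.exp(math.log(s[g]) - lgm) for g, lgm in log_geo_means]
        factors.append(statistics.median(ratios))
    return factors

# Hypothetical example: genes 1-3 are unchanged; only gene 4 is strongly induced in sample B.
lengths = [1000, 2000, 1500, 3000]
sample_a = [100, 200, 300, 400]
sample_b = [100, 200, 300, 4000]

fpkm_a = fpkm(sample_a, lengths)
fpkm_b = fpkm(sample_b, lengths)
size_factors = median_of_ratios_size_factors([sample_a, sample_b])

print("gene 1 FPKM ratio A/B:", round(fpkm_a[0] / fpkm_b[0], 2))  # 4.6 despite identical counts
print("size factors:", size_factors)                             # both 1.0
```

      In this toy case, gene 1 appears about 4.6-fold lower in sample B by FPKM even though its raw count is identical. The median-of-ratios size factors stay equal because most genes did not change, which is why raw counts with this kind of normalization are preferred for between-sample testing.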

    1. eLife Assessment

      The study by Takagi and colleagues is an important contribution to the question of how homologous neuronal circuits might be wired differently to elicit specific behaviours. The authors combine genetic, neuroanatomical, and behavioral data to provide convincing evidence that DFz2/DWnt4 signaling controls the innervation pattern of wave command neurons in the fly larva, and thereby behavioral locomotion program selection.

    2. Reviewer #1 (Public review):

      Summary

      In this study, Takagi and colleagues demonstrate that changes in axonal arborization of the segmental wave motor command neurons are sufficient to change behavioral motor output.

      The authors identify the Wnt receptors DFz2 and DFz4 and the ligand Wnt4 as modulators of the stereotypic segmental arborization pattern of segmental wave neurons along the anterior-posterior body axis. Based on both embryonic expression pattern analysis and genetic manipulation of the signaling components in wave neurons (receptors) and the neuropil (Wnt4), the authors convincingly demonstrate that Wnt4 acts as a repulsive ligand for DFz2 that restricts posterior axon guidance of both anterior and posterior wave neurons. They also provide the first evidence that Wnt4 potentially acts as an attractive ligand for DFz4 to promote posterior extension of p-wave neurons. Interestingly, artificial optogenetic activation of all wave neurons, which normally induces backward locomotion due to the activity of anterior wave neurons, fails to induce backward locomotion in a DFz2 knockdown condition with altered axonal extensions of all wave neurons towards posterior segments. In addition, the authors now observe enhanced fast-forward locomotion, a feature normally induced by posterior wave neurons. Consistent with these findings, they observe that the natural response to an anterior tactile stimulus is similarly altered in DFz2 knockdown animals. The animals respond with less backward movement and increased fast-forward motion. These results suggest that alterations in the innervation pattern of wave motor command neurons are sufficient to switch behavioral response programs.

      Strengths

      The authors convincingly demonstrate the importance of Wnt signaling for anterior-posterior axon guidance of a single class of motor command neurons in the larval CNS. The demonstration that alteration of the expression level of a single axon guidance receptor is sufficient not only to alter the innervation pattern but also to significantly modify the behavioral response program of the animal provides a potential entry point to understanding behavioral adaptations during evolution.

      Weaknesses

      The authors demonstrate an alteration of the behavioral response to a natural tactile stimulus and correlate this to morphological alterations observed in the single-neuron analyses. As the authors suggest an alteration of the command circuitry, a direct observation of the downstream activation pattern in response to selective optogenetic stimulation of anterior wave neurons (if possible with appropriate genetic tools in the future) would further strengthen their claims.

    3. Reviewer #2 (Public review):

      Summary:

      In the manuscript, the authors aim to determine the molecular mechanisms that wire the segmentally homologous a- and p-Wave neurons differently, such that they are functionally distinct in modulating forward or backward locomotion. The genetic screen focused on Wnt/Fz signaling due to its known anterior-to-posterior guidance roles in mammals and nematodes.

      Strengths:

      The conclusion that the Frizzled receptors DFz2 and DFz4, as well as the DWnt4 ligand, are essential for normal segment-specific axon projections of Wave command neurons is strongly supported by the elaborate morphological analyses of numerous Wnt/Fz gain- and loss-of-function mutants. The distinctive Wnt/Fz ligand-receptor gradients also imply that they contribute to the diversification of Wave neurons in a location-dependent manner and that DFz2 and DFz4 may have opposing effects on axon extension.

      Labeling of the synaptic marker Bruchpilot in DFz2 mutants in this revised manuscript now supports the conclusion that the ectopic projections of a-Wave neurons make synaptic connections. Finally, the altered responses in two behavioral assays (optogenetic stimulation of all Wave neurons, or tactile stimuli on heads using a von Frey filament) further strongly support the main conclusion that Wnt/Fz signaling is essential for the guidance of both Wave neurons and for diversifying their projection pattern in a segment-specific manner.

      Weaknesses:

      There are no major weaknesses in the revised version of this work.

      Re-analysis of DFz2 expression now shows that it is bidirectionally distributed. This new result does not affect the previous and current conclusions for the a-Wave neurons but leaves alternative interpretations for p-Wave neurons, which the authors have now included in their discussion. It seems unlikely that the complex wiring of the numerous segmental a- and p-Wave neurons depends solely on Wnt4-DFz2/4; it likely also involves other Wnt/Fz components (see Figure 1-figure supplement 2) or distinct guidance signaling pathways. However, unraveling all factors involved is certainly beyond the scope of this study, and the main conclusions made by the authors are well supported by the data provided.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary

      In this study, Takagi and colleagues demonstrate that changes in axonal arborization of the segmental wave motor command neurons are sufficient to change behavioral motor output.

      The authors identify the Wnt receptors DFz2 and DFz4 and the ligand Wnt4 as modulators of stereotypic segmental arborization patterns of segmental wave neurons along the anterior-posterior body axis. Based on both embryonic expression pattern analysis and genetic manipulation of the signaling components in wave neurons (receptors) and the neuropil (Wnt4), the authors convincingly demonstrate that Wnt4 acts as a repulsive ligand for DFz2 that restricts posterior axon guidance of both anterior and posterior wave neurons. They also provide the first evidence that Wnt4 potentially acts as an attractive ligand for DFz4 to promote the posterior extension of p-wave neurons. Interestingly, artificial optogenetic activation of all wave neurons, which normally induces backward locomotion due to the activity of anterior wave neurons, fails to induce backward locomotion in a DFz2 knockdown condition with altered axonal extensions of all wave neurons towards posterior segments. In addition, the authors now observe enhanced fast-forward locomotion, a feature normally induced by posterior wave neurons. Consistent with these findings, they observe that the natural response to an anterior tactile stimulus is similarly altered in DFz2 knockdown animals. The animals respond with less backward movement and increased fast-forward motion. These results suggest that alterations in the innervation pattern of wave motor command neurons are sufficient to switch behavioral response programs.

      Strengths

      The authors convincingly demonstrate the importance of Wnt signaling for anterior-posterior axon guidance of a single class of motor command neurons in the larval CNS. The demonstration that alteration of the expression level of a single axon guidance receptor is sufficient not only to alter the innervation pattern but also to significantly modify the behavioral response program of the animal provides a potential entry point to understanding behavioral adaptations during evolution.

      Weaknesses

      While the authors demonstrate an alteration of the behavioral response to a natural tactile stimulus, the observed effects (a reduction of backward motion and increased fast-forward locomotion) currently cannot be directly correlated with the morphological alterations observed in the single-neuron analyses. The authors do not report any loss of innervation in the "normal" target region, only a small additional innervation of more posterior regions. An analysis of synaptic connectivity and/or a more detailed morphological analysis supported by a larger number of analyzed neurons in both control and experimental animals would further strengthen confidence in the study. As the authors suggest an alteration of the command circuitry, a direct observation of the downstream activation pattern in response to selective optogenetic stimulation of anterior wave neurons would further strengthen their claims (analogous to Takagi et al., 2017, Figure 4).

      We sincerely thank the reviewer for their insightful comments, which were instrumental in improving our manuscript. In response to the reviewers’ suggestion, we have now studied Brp expression and demonstrate that the ectopically extending Wave axons in the posterior region do contain synapses (new Figure 2). This finding supports the idea that these axons are functionally connected to ectopic downstream circuits. 

      Additionally, we have increased the number of analyzed Wave clones in Figure 1F-J (WT and DFz2 KD) and new Figure 3C-G (WT; formerly Figure 2C-G) to strengthen the morphological analyses. We fully agree with the reviewer that “direct observation of the downstream activation pattern in response to selective optogenetic stimulation” would further reinforce our conclusions. However, this was not feasible in the current study since we found that the Wave-Gal4 driver used in this study, which drives expression during embryonic stages, does not drive sufficiently strong expression in the larvae to enable selective optogenetic stimulation (please see below for details). 

      Reviewer #2 (Public Review):

      Summary:

      The authors previously demonstrated that anterior-located a-Wave neurons (neuromeres A1-A3) extend axons anteriorly to connect to circuits inducing backward locomotion, while p-Wave axons (neuromeres A4-A7) project posteriorly to promote forward locomotion in Drosophila larvae. In the manuscript, the authors aim to determine the molecular mechanisms that wire the segmentally homologous Wave neurons differently, such that they are functionally distinct in modulating forward or backward locomotion. The genetic screen focused on Wnt/Fz signaling due to its known anterior-to-posterior guidance roles in mammals and nematodes.

      Strengths:

      Knockdown (KD) of DFz2 with two independent RNAi lines caused ectopic posterior axon and dendrite extension for all a- and p-Wave neurons, with a-Wave axons extending into regions where p-Wave axons normally project. Both behavioral assays (optogenetic stimulation of all Wave neurons, or tactile stimuli on heads using a von Frey filament) show that backward movement is reduced or absent and that the speed of evoked fast-forward locomotion is increased. This demonstrates that the altered projections of Wave neurons do alter behavior, and the DFz2 KD phenotype is consistent with potential aberrant wiring of a-Wave neurons to forward locomotion-promoting circuits instead of backward locomotion-promoting circuits.

      The main conclusion, that Wnt/Fz signaling is essential for the guidance of Wave neurons and for diversifying their projection pattern in a segment-specific manner, is further supported by the results showing that DFz2 gain of function causes shortening of a-Wave, but not p-Wave, axon extensions towards the posterior end, and that KD of DFz4 causes axonal shortening only in A6 p-Wave neurons but does not affect dendrites or processes of other Wave neurons. A role for the ligand Wnt4 is demonstrated by results indicating that, in Wnt4 mutants, the posterior extension of a-Wave axons was elongated, similar to DFz2 KD animals, and p-Wave axon extension towards the posterior end was shortened, similar to DFz4 KD animals. Finally, a DWnt4 gradient decreasing from the posterior (A8) to the anterior end (A2), similar to that described in other species, is supported by analyses of DWnt4 gene expression (using Wnt4 Trojan-Gal4) and protein expression (using antibodies). In contrast, DFz2 receptor levels seemed to decrease from the anterior (A2) to the posterior end (A5/6). Together, the results support the conclusion that opposing Wnt/Fz ligand-receptor gradients contribute to the diversification of Wave neurons in a location-dependent manner and that DFz2 and DFz4 have opposing effects on axon extension.

      Weaknesses:

      Wave axon and dendrite projections are not exclusively determined by Wnt4, DFz2, and DFz4, and are likely to involve other Fz receptors, Wnt ligands, and other types of receptor-ligand signaling pathways. This is in part supported by the fact that Wnt4 loss of function also resulted in phenotypes that do not mimic DFz2 KD or DFz4 KD (Figures 3D, E, and F) and that other Fz/Wnt mutants caused Wave neuron phenotypes (Figure 1-supplement 2, D+E). This is not a weakness per se, since it doesn't affect the main conclusion of the manuscript. However, the description and analyses of the data, in particular for Figure 1-supplement 2D, should be clarified in the legend. The numbers within the bars and the asterisks are not defined. It's presumed that they refer to the numbers of animals assessed and that the asterisks next to DFz2 and DFz4 indicate statistically significant differences. However, only one p-value is provided in the legend. It is also unclear whether p-values for the other mutants have not been determined or are non-significant. At least for mutants like Corin, which also exhibit altered axon projections, the p-values should be provided.

      We appreciate this reviewer’s careful attention to detail and intellectual curiosity. We apologize for the confusion caused by the statistical reporting in Figure 1 – figure supplement 2D. The numbers shown in the bars represent the number of neurons (i.e. Wave neurons from the left or right hemisphere). As mentioned in the Materials and Methods section, we applied a Chi-square test followed by Haberman's adjusted residual analysis to determine the statistical significance of each RNAi group. The p-value provided in the figure legend corresponds to the Chi-square test. P-values for Haberman's adjusted residual analysis were calculated for all RNAi groups, and groups without an asterisk are not statistically significant. We have clarified these points in the corresponding figure legend.
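      For readers unfamiliar with the procedure named in this response, the following Python sketch shows one standard way to compute a Chi-square test of independence together with Haberman's adjusted residuals for an RNAi-group by phenotype contingency table. It is an illustration under stated assumptions, not the authors' analysis code: the table layout, the function names, and the use of an uncorrected two-sided normal p-value per cell are all assumptions. Each cell's adjusted residual is (observed - expected) / sqrt(expected * (1 - row total / N) * (1 - column total / N)), which is approximately standard normal under independence.

```python
import math


def chi_square_adjusted_residuals(table):
    """Chi-square test of independence plus Haberman's adjusted residuals.

    table: list of rows of non-negative counts (e.g. one row per RNAi line,
    one column per phenotype category). Needs at least two rows and two
    columns, and every row and column total must be > 0.
    Returns (chi2 statistic, degrees of freedom, adjusted residual per cell).
    """
    n_rows, n_cols = len(table), len(table[0])
    row_tot = [sum(row) for row in table]
    col_tot = [sum(row[j] for row in table) for j in range(n_cols)]
    n = sum(row_tot)

    chi2 = 0.0
    residuals = []
    for i in range(n_rows):
        res_row = []
        for j in range(n_cols):
            expected = row_tot[i] * col_tot[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
            # Haberman's variance of (observed - expected) under independence.
            var = expected * (1 - row_tot[i] / n) * (1 - col_tot[j] / n)
            res_row.append((table[i][j] - expected) / math.sqrt(var))
        residuals.append(res_row)
    return chi2, (n_rows - 1) * (n_cols - 1), residuals


def two_sided_normal_p(z):
    """Two-sided p-value of a standard normal deviate (no multiple-testing correction)."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical counts (rows: control, RNAi line 1; columns: normal, ectopic).
chi2, dof, res = chi_square_adjusted_residuals([[10, 0], [5, 5]])
print(round(chi2, 3), dof, [[round(r, 3) for r in row] for row in res])
```

      In this 2 x 2 example every adjusted residual has magnitude sqrt(chi2), about 2.58, which is a general property of 2 x 2 tables; a cell with |r| > 1.96 corresponds to p < 0.05 for that cell alone. If SciPy is available, `scipy.stats.chi2_contingency(table, correction=False)` should reproduce the uncorrected statistic and also returns its p-value.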

      Figure 4 D, F. The gradient for Wnt4 was determined by comparing the expression levels of other segments to A8, but the gradient for DFz2 was determined by comparison to A2, and the data support opposing gradients. However, for DFz2 (Figure 4, F) it seems that the gradient is bidirectional, with the lowest level in A5 and increasing towards A2 as well as A8. The analysis should also be performed in reference to A8 to determine whether it is indeed bidirectional. While such a finding would not affect the interpretation of a-Wave neurons, it may impact conclusions about p-Wave neuron projections.

      We thank the reviewer for highlighting this interesting possibility. In response, we performed an additional analysis of the DFz2 gradient by comparing the signal from each neuromere to that from A8 (new Figure 5—figure supplement 3). This analysis confirmed that the gradient is indeed bidirectional. We revised the description of DFz2 expression accordingly in the revision. We believe this finding does not affect our main conclusions since only the anterior gradient is relevant for a-Wave axon guidance. 

      As discussed above, the DFz2 KD phenotypes are consistent with the potential aberrant wiring of a-Wave neurons to forward locomotion-promoting circuits instead of backward locomotion-promoting circuits. However, since the axons and dendrites of both a-Wave and p-Wave neurons are affected, the actual dendritic and axonal contributions to the altered behavior remain elusive. The authors certainly considered a potential contribution of altered dendrite projections of a-Wave neurons to the phenotype, and their conclusion that altered axonal projections are involved is supported by the optogenetic experiment "bypassing" sensory input (although it seems unlikely that all Wave neurons are activated simultaneously when perceiving natural stimuli). However, the authors should also consider that altered perception and projection of p-Wave neurons may directly (e.g. extended p-Wave axon projections increase forward locomotion input, thereby overriding backward locomotion) or indirectly (e.g. feedback loops between forward and backward circuits) contribute to the altered behavioral phenotypes in both assays. It is probably noteworthy that the more complex behavioral alterations observed with mechanical stimulation are likely also caused by altered dendritic projections.

      We fully agree with the reviewer’s thoughtful interpretation. We have now included these important possibilities in the revised Discussion section. Specifically, we acknowledge that while the DFz2 knockdown phenotypes are consistent with aberrant wiring of a-Wave neurons to forward locomotion-promoting circuits, the contributions of both axonal and dendritic alterations remain unclear. We also recognize that altered perception and projection of p-Wave neurons may directly or indirectly contribute to the observed behavioral phenotypes, particularly in response to mechanical stimulation.

      Presynaptic varicosities of a-Wave neurons in DFz2 KD animals are indicated by orange arrows in Figure 1. However, no presynaptic markers have been used to confirm actual ectopic synaptic connections. At least the authors should more clearly define what parameters they used to "visually" define potential presynaptic varicosities. Some arrows seem to point to more "globular structures" but for several others, it's unclear what they are pointing at.

      As mentioned in our response to Reviewer #1, we have now performed Brp immunostaining to confirm the presence of ectopic synaptic connections (new Figure 2). This analysis supports the interpretation that the presynaptic varicosities observed in DFz2 knockdown animals represent actual synaptic sites. We also clarified in the figure legend the visual criteria used to identify potential presynaptic varicosities.

      Reviewing Editor (Recommendations For The Authors):

      There are a few major concerns that we recommend the authors address:

      (1) Neuroanatomy: The claim of aberrant synaptic connectivity of a-Wave neurons following DFz2 knockdown could be substantiated. This could be done by using a presynaptic marker and showing ectopic posterior presynaptic sites (and/or reduced anterior presynaptic sites) in a-Wave neurons.

      As mentioned in our response to the public review, we have now used Brp as a presynaptic marker to quantify the number and distribution of presynaptic sites along the normal and ectopic a-Wave axons (new Figure 2). We show that ectopic posterior Wave axons do contain presynaptic sites.

      (2) Gradient calculations: As detailed in the reviews below, the DFz2 gradient looks like it may be bidirectional. Changing the way the gradient is calculated might help address this point.

      As mentioned in our response above, we have now recalculated the gradient by comparing the DFz2 signal to A8 and show that it is indeed bidirectional (new Figure 5—figure supplement 2; formerly Figure 4—figure supplement 2).

      (3) Statistics and sample sizes: As detailed in the reviews, some of the statistical reporting could be improved. Further, increasing sample sizes could help bolster confidence in the data as well.

      As mentioned above, we have added a description on the sample size, asterisks, and p-values in Figure 1 – figure supplement 2 legend. We also increased sample sizes of single Wave neurons in control and DFz2 knock-down animals (Figure 1F-J (WT and DFz2 KD) and new Figure 3C-G (WT; formerly Figure 2C-G)).

      (4) It would help to include some discussion of the potential contributions of altered p-wave neurons to the observed phenotypes.

      As described above, we have added to the Discussion the potential contributions of altered p-Wave neurons to the observed phenotypes.

      Reviewer #1 (Recommendations For The Authors):

      (1) In the current model the authors assume that posterior elongation of a-wave neuron connectivity (axonal projections) induces a loss of connectivity to their natural targets, as backward motion is no longer induced, and a gain of connectivity to posterior wave neuron targets. Is this at the cost of innervation of p-wave neurons, e.g. did these neurons now lose connectivity to their natural targets as well? Therefore, it would be very interesting if the authors would test the behavioral responses to tactile stimuli in the posterior parts of the animal - does the response pattern change?

      The possibility that p-Wave function is altered upon DFz2 knockdown, and hence that the behavioral response to posterior touch is changed, is indeed interesting. However, it is technically challenging to test this with tactile stimuli, due to the difficulty of (1) distinguishing between normal and fast-forward locomotion and (2) delivering a posterior touch stimulus while the larva is moving forward, which is the default behavior of larvae on an agar plate.

      As highlighted above, the authors should provide additional evidence that the circuit response to a-wave neurons is changed after a DFz2 knockdown. The authors should monitor the activation wave in response to optogenetic activation of anterior wave neurons - analogous to the data provided in Figure 4 of their 2017 paper. If this response is now switched for a-wave activation but not p-wave activation it would greatly support their claims and this data would be less ambiguous compared to the behavioral locomotion data.

      As described in our response to the public review, we attempted this approach but found that the in vitro optogenetics experiment is unfortunately not feasible due to the relatively weak expression of R60G09-GAL4 in the larvae. Local activation of control a-Wave neurons induced fictive backward locomotion only at low frequencies, making comparison with experimental a-Wave neurons very difficult. The MB120B-spGAL4 used in our 2017 study could not be employed in this study as it does not drive expression during the embryonic stages and thus cannot be used to knock down DFz2 during development.

      (2) Related to this point: why would the normal "backward" circuitry of a-wave neurons be functionally suppressed in DFz2 knockdowns? Do the authors observe reduced synaptic connectivity in these segments? Vesicle clustering of synaptotagmin or other presynaptic markers could be used as a first step. As the innervation pattern is only extended by approximately one segment, it is surprising that the changes are so significant.

      We agree that these are important and interesting points, which remain to be explored in future studies. As described above, we have performed Brp immunostaining and showed that the posterior ectopic axons of a-Wave neurons do contain synapses (new Figure 2). We also found a slight decrease in the number of synapses in the anterior region, which could partially contribute to the weaker activation of downstream neurons responsible for eliciting backward locomotion. Another possibility is that backward suppression occurs through lateral interaction among downstream circuits. Since forward and backward locomotion do not occur simultaneously, it is likely that the circuits driving these two behaviors are mutually inhibitory. Upon DFz2 knockdown in a-Wave neurons, downstream neurons inducing fast-forward locomotion may become more strongly activated than those inducing backward locomotion, resulting in inhibition of the latter via a “winner-take-all” mechanism. Since these discussions are highly speculative, we chose not to include them in the revised manuscript.

      (3) The low number of neurons analyzed per segment is of slight concern. This is particularly the case for the control data set used in Figure 1 and Figure 2. As stated, the same datasets are used for both figures. However, at most 6 neurons were analyzed (and for two segments only 3). The control morphology may be more variable than indicated by this data.

      As mentioned above, we have now dissected 50 larvae each for the control and experimental groups, obtained seven and six clones respectively, and included these data in the revised manuscript. We apologize that the sample sizes are still relatively small but hope the reviewer understands the inherently low “hit rate” of the stochastic labelling method.

      It is somewhat curious that in Figure 1-Supplement 3 the authors report the same number of control clones per segment as in Figure 1/2; is this simply a coincidence? And if this is an independent dataset, why did the authors use new controls here but not for Figure 2? It is clear that it is very difficult to generate these data, but increasing the n-number beyond 3-6 per segment would significantly increase confidence in the presented data.

      We apologize for the confusion. The data in Figure 1 – figure supplement 3 represent the innervation pattern of dendrites, not axons. We have corrected the figure caption accordingly. These data were obtained from the same samples used to analyze axonal innervation, as shown in the original version of Figure 1F-J.

      (2) The name of the RNAi lines should be indicated in Figure 1 and Figure Supplement 3 to facilitate reading - at least the precise names should be given in both figure legends.

      We have added these labels in the revised figure legends as requested.

      (3) In Figure 4E again the control numbers of Figure 1 for the A2-wave axon are reused. This does not seem appropriate as now a different Gal4 driver is used and a different method to induce individual neuronal clones. Both components may induce significant variability in expression or arborization. As only 3 clones for the wnt4 mutant condition are analyzed (and compared to 5 control clones), this data does not allow for strong conclusions. The authors clearly state the reuse and different methods in the legend of Figure 4 F/G but should also highlight it for the E panel.

      Here, we assume that the reviewer is referring to the former Figure 3 (now Figure 4). We have added a note in the legend that the control data, obtained using a different method, were reused in this panel.

      (4) The expression levels of DWnt4 and DFz2 were analyzed at the end of embryogenesis. At what developmental stage does the axonal extension of wave neurons take place? Is the gradient maintained throughout the first larval stages?

      Based upon the lateral view of Wave neurons in Figure 1—figure supplement 1D, we think that the axonal extension is already established by approximately 20 hr after egg laying. Previously, we performed Wnt4<sup>MI03717-Trojan-GAL4</sup> > GFP.nls immunostaining in the third instar larva and observed a similar gradient of GFP signals towards the posterior end of the ventral nerve cord (VNC). We have included this data in the revised manuscript (new Figure 5—figure supplement 1).

      (5) The authors state that either 2nd or 3rd instar larvae were used for the optogenetic experiments. This may induce unnecessary variation in their assay and should be avoided. As natural variance exists in larvae regarding forward stride duration, the comparison of "on" state forward stride duration between control and experimental genotype is potentially not the best measurement of effect size. What is the difference between OFF and ON stage within the control and experimental genotype? In both cases stride duration decreases but there may not be a significant difference between the delta of the two genotypes. Thus, the observed effect may in part be due to "slower" animals in the control pool. The authors should discuss this more carefully.

      We thank the reviewer for bringing up this critical issue. Indeed, the stride durations of control and DFz2 knock-down larvae differ slightly in the OFF condition, although the difference is not statistically significant. In addition, the effect size of Wave activation on mean stride duration is -0.14 s in control larvae versus -0.21 s in DFz2 knock-down larvae, which we interpret as DFz2 knock-down resulting in stronger fast-forward locomotion upon Wave activation. We have incorporated this note in the corresponding figure legends (new Figure 6; formerly Figure 5).

      (6) While the study clearly provides convincing evidence for their model, the authors should tune down their conclusions in the discussion a little bit and highlight that parts of their discussion are speculative.

      We have revised the discussion as suggested.

      Reviewer #2 (Recommendations For The Authors):

      Although the optogenetic behavioral experiments strongly support the conclusion that the altered axonal projections affect normal locomotion, simultaneous labeling of Wave neurons in DFz2 KD animals with presynaptic markers would strengthen the conclusion that the extended axons make ectopic connections with other circuits.

      Please see our response to your public review.

      Figure 1 K+L, Figure 2H, I, Figure 3 F+G: many of the individual data points are not visible in the whisker plots; changing their color would help visualize them better.

      We have changed the outline width of the box plots to make the individual data points visible.

      Figure 1-Supplement 2: In addition to the comments in the public review, (a) the asterisk font size changes across panels, e.g. it is much smaller in G', and (b) the font size in some graphs/legends should be increased; in particular, in E the hyphenated letters in the genotypes are so small that they are almost illegible.

      We have unified the font size to make them readable in the figure. We thank the reviewer for the suggestions.

    1. eLife Assessment

      This valuable paper describes the crystal structure of a complex of the Sld3-Cdc45-binding domain (CBD) with Cdc45, which is essential for the assembly of an active Cdc45-MCM-GINS (CMG) double-hexamer at the replication origin. The structural and biochemical analyses of protein-protein interactions and DNA binding provided solid evidence to support the authors' conclusion. The results shown in the paper are of interest to researchers in DNA replication and genome stability.

    2. Reviewer #1 (Public review):

      Summary:

      The crystal structure of the Sld3CBD-Cdc45 complex presented by Li et al. is a significant contribution that enhances our understanding of CMG formation during the rate-limiting step of DNA replication initiation. This structure provides crucial insights into the intermediate steps of CMG formation, and the particle analysis and model predictions compellingly describe the mechanism of Cdc45 loading.

      Building upon previously known Sld3 and Cdc45 structures, this study offers new perspectives on how Cdc45 is recruited to MCM DH through the Sld3-Sld7 complex. The most notable finding is the structural rearrangement of Sld3CBD upon Cdc45 binding, particularly the α8-helix conformation, which is essential for Cdc45 interaction and may also be relevant to its metazoan counterpart, Treslin. Additionally, the conformational shift in the DHHA1 domain of Cdc45 suggests a potential mechanism for its binding to Mcm2NTD.

      Furthermore, the ssDNA-binding experiments involving Sld3 further support a broader functional role in the replication process, beyond its established role in recruiting Cdc45. This adds an intriguing new layer to our understanding of Sld3's activity in the yeast.

    3. Reviewer #2 (Public review):

      Summary

      The manuscript presents valuable findings, particularly in the crystal structure of the Sld3CBD-Cdc45 interaction and the identification of additional sequences involved in their binding. The modeling of the Sld7-Sld3CBD-CDC45 subcomplex is novel, and the results provide insights into potential conformational changes that occur upon interaction. Although the single-stranded DNA binding data from Sld3 of different species is a minor weakness, the experiments support a model in which the release of Sld3 from the complex may be promoted by its binding to origin single-stranded DNA exposed by the helicase.

    4. Reviewer #3 (Public review):

      Summary:

      The paper by Li et al. describes the crystal structure of a complex of the Sld3 Cdc45-binding domain (CBD) with Cdc45, and a model in which a dimer of the Sld3-binding protein Sld7 tethers two Sld3CBD-Cdc45 complexes. In addition, the authors present a genetic analysis of amino acid substitutions at Sld3 residues in the Cdc45 interface, a biochemical analysis of the Sld3-Cdc45 interaction, and an analysis of the binding of Sld3 to single-stranded DNA of the ARS sequence.

    5. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      Summary:

      The crystal structure of the Sld3CBD-Cdc45 complex presented by Li et al. is a significant contribution that enhances our understanding of CMG formation during the rate-limiting step of DNA replication initiation. This structure provides crucial insights into the intermediate steps of CMG formation, and the particle analysis and model predictions compellingly describe the mechanism of Cdc45 loading. Building upon previously known Sld3 and Cdc45 structures, this study offers new perspectives on how Cdc45 is recruited to MCM DH through the Sld3-Sld7 complex. The most notable finding is the structural rearrangement of Sld3CBD upon Cdc45 binding, particularly the α8-helix conformation, which is essential for Cdc45 interaction and may also be relevant to its metazoan counterpart, Treslin. Additionally, the conformational shift in the DHHA1 domain of Cdc45 suggests a potential mechanism for its binding to Mcm2NTD. Furthermore, Sld3's ssDNA-binding experiments provide evidence of its novel functions in the DNA replication process in yeast, expanding our understanding of its role beyond Cdc45 recruitment.

      Strengths:

      The manuscript is generally well-written, with a precise structural analysis and a solid methodological section that will significantly advance future studies in the field. The predictions based on structural alignments are intriguing and provide a new direction for exploring CMG formation, potentially shaping the future of DNA replication research. This research also opens up several new opportunities to utilize structural biology to unravel the molecular details of the model presented in the paper.

      Weaknesses:

      The main weakness of the manuscript lies in the lack of detailed structural validation for the proposed Sld3-Sld7-Cdc45 model, and its CMG bound models, which could be done in the future using advanced structural biology techniques such as single particle cryo-electron microscopy. It would also be interesting to explore how Sld7 interacts with the MCM helicase, and this would help to build a detailed long-flexible model of Sld3-Sld7-Cdc45 binding to MCM DH and to show where Sld7 will lie on the structure. This will help us to understand how Sld7 functions in the complex. Also, future experiments would be needed to understand the molecular details of how Sld3 and Sld7 release from CMG is associated with ssARS1 binding.

      The proposals based on this study provide new knowledge of the CMG formation process. We agree that our Sld3-Sld7-Cdc45 model will be further confirmed by cryo-EM. We improved our ssARS1-binding assay and quantified data (See the response to Recommendations for the authors of #3 review).

      Reviewer #2 (Public review):

      Summary

      The manuscript presents valuable findings, particularly in the crystal structure of the Sld3CBD-Cdc45 interaction and the identification of additional sequences involved in their binding. The modeling of the Sld7-Sld3CBD-CDC45 subcomplex is novel, and the results provide insights into potential conformational changes that occur upon interaction. Although the single-stranded DNA binding data from Sld3 of different species is a minor weakness, the experiments support a model in which the release of Sld3 from the complex may be promoted by its binding to origin single-stranded DNA exposed by the helicase.

      Strengths

      The Sld3CBD-Cdc45 structure is a novel contribution, revealing critical residues involved in the interaction.

      The model structures generated from the crystal data are well presented and provide valuable insights into the interaction sequences between Sld3 and Cdc45.

      The experiments testing the requirements for interaction sequences are thorough and conducted well, with clear figures supporting the conclusions.

      The conformational changes observed in Sld3 and Cdc45 upon binding are interesting and enhance our understanding of the interaction.

      The modeling of the Sld7-Sld3CBD-CDC45 subcomplex is a new and valuable addition to the field.

      The proposed model of Sld3 release from the complex through binding to single stranded DNA at the origin is intriguing.

      Weaknesses

      The section on the binding of Sld3 complexes to origin single-stranded DNA is somewhat weakened by the use of Sld3 proteins from different species. The comparisons between Sld3-CBD, Sld3CBD-Cdc45, and Sld7-Sld3CBD-Cdc45 involve complexes from different species, limiting the comparisons' value.

      Although the study reveals that Sld3 binds to different residues of Cdc45 than those previously shown to bind Mcm or GINS, the data in the paper do not shed any additional light on how GINS binding and Sld3 binding to Cdc45 or Mcm would affect each other. Other previous research has suggested that the binding of GINS and Sld3 to Mcm or Cdc45 may be mutually exclusive. The authors acknowledge that a structural investigation of Sld3, Sld7, Cdc45, and MCM during the stage of GINS recruitment will be a significant goal for future research.

      We agree that it is better to use all samples from a single source; however, due to limitations in protein expression, we used Sld7-Sld3CBD-Cdc45 from a different source. The two sources used in this study belong to the same family, and their Sld7, Sld3 and Cdc45 proteins share sequence conservation, with similar structures predicted by AlphaFold3 (RMSD = 0.356, 1.392, and 0.891 Å for Cα atoms of Sld7CTD, Sld7NTD-Sld3NTD, and Sld3CBD-Cdc45, respectively). This similarity in source and protein structure supports the comparison. We also state in the manuscript that a cryo-EM study of Sld3-Sld7-Cdc45-MCM and Sld3-Sld7-CMG structures will be a significant goal for future research.
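      To illustrate what a Cα RMSD between two predicted models measures, the sketch below computes RMSD after optimal rigid-body superposition (the Kabsch algorithm) using NumPy. It assumes the matched Cα coordinates of both models have already been extracted in the same residue order; the function name `kabsch_rmsd` is hypothetical, and it is not necessarily the procedure used in the manuscript. Alignment programs that iterate or reject outlier atoms may report somewhat different values.

```python
import numpy as np


def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD between two (N, 3) sets of matched atoms after optimal superposition."""
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both be (N, 3) arrays of matched atoms")
    # Remove translation by centring each coordinate set on its centroid
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # SVD of the 3x3 covariance matrix gives the optimal rotation
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation, not a reflection
    d = 1.0 if np.linalg.det(Vt.T @ U.T) > 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # Rotate P onto Q and average the squared deviations per atom
    diff = (R @ Pc.T).T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))
```

      Because translation and rotation are removed first, two models that differ only by a rigid-body motion give an RMSD of essentially zero; any remaining value reflects genuine differences in the Cα trace.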

      Reviewer #3 (Public review):

      Summary:

      The paper by Li et al. describes the crystal structure of a complex of the Sld3 Cdc45-binding domain (CBD) with Cdc45, and a model in which a dimer of the Sld3-binding protein Sld7 tethers two Sld3CBD-Cdc45 complexes. In addition, the authors present a genetic analysis of amino acid substitutions at Sld3 residues in the Cdc45 interface, a biochemical analysis of the Sld3-Cdc45 interaction, and an analysis of the binding of Sld3 to single-stranded DNA of the ARS sequence.

      Strengths:

      The authors provided a nice model of an intermediate step in the assembly of an active Cdc45-MCM-GINS (CMG) double hexamers at the replication origin, which is mediated by the Sld3-Sld7 complex. The dimer of the Sld3-Sld7 complexes tethers two MCM hexamers together for the recruitment of GINS-Pol epsilon on the replication origin.

      Weaknesses:

      The biochemical analysis should be carefully evaluated with more quantitative ways to strengthen the authors' conclusion even in the revised version.

      In this revision, we improved our ssARS1-binding assay in more quantitative ways (See the response to Recommendations for the authors).

      Reviewer #1 (Recommendations for the authors):

      I thank the authors for all their replies to my previous questions and for making all the necessary corrections. I am satisfied with most of their replies; however, upon a second reading, I have a few more suggestions that could further improve the manuscript and increase its impact in the field. My comments are listed below.

      (1) In general, the manuscript is well structured, but I feel that it requires professional English correction. In many places it was difficult to understand the sentences and I had to read it several times to understand it. Also, very long sentences should be avoided. The flow should be easy to read and understand, and that is why I feel it requires professional English correction.

      Following the comment, we carefully checked the English and shortened the very long sentences.

      (2) Page 5, line 103, please include molecule after the word complex to make it like- "Only one complex molecule exists within an asymmetric unit."

      We revised this sentence (P5/L103).

      (3) Line 113- "more than the N-terminal half of the protruding long helix α7 was disordered in the Sld3CBD-Cdc45 complex." This sentence is not clear. What does "more than the N-terminal half" mean? Please rewrite it.

      We revised this sentence to give the corresponding residue number “(D219–H231)” (P5/L114).

      (4) Page 5, result 2- Conformation changes in Sld3CBD and Cdc45 for binding each other: this section may require a little restructuring. Lines 130-131- "Therefore, the helix α8CTP seems to be an intrinsically disordered segment when Sld3 alone but folds into a helix coupled to the binding partner Cdc45 in the Sld3CBD-Cdc45 complex." This statement is the crux of the structural finding; therefore, I feel it should be moved after the first sentence.

      Thank you for your comments. We rewrote this part (P5/L128-131).

      (5) Lines 121-122: "Compared to the isolated form (PDBIDs: 5DGO for huCdc45 [31] and 6CC2 for EhCdc45 [33]) and the CMG form (PDBID: 3JC6." Write the PDB IDs in the same format: put 6CC2 in brackets like the other PDB IDs. Please also restructure this sentence.

      We revised this sentence (P5/L122-123).

      (6) Line 127-129: This sentence is also not very clear.

      We revised this sentence together with above No (4). (P5/L128-131)

      (7) In my question 4- "Can authors add a supplementary figure showing the probability of disordernes...", I meant using a disorder prediction tool such as IUPred on the protein sequences to show that α8 is predicted to be disordered by sequence analysis. This would show the inherent property of the α8 helix and support the interpretation that a disordered region becomes structured in the complex.

      The structures showed that α8CTP is stabilized by binding with Cdc45 but disordered in Sld3CBD alone, indicating that this part is flexible, like an intrinsically disordered segment. We have deposited the structure in the PDB, so predictions such as IUPred cannot show meaningful information.

      (8) Question 9 regarding Supplementary Figure 8- Please include your statement in the figure legend - "WT Sld3CBD was prepared in a complex with Cdc45, while the mutants of Sld3CBD existed alone, we calculated the elements of secondary structure from the crystal structure of Sld3CBD-Cdc45. The concentration of samples was controlled to the same level for CD measurement."

      Following the comment, we optimized the figure legend of Supplementary Figure 8.

      (9) Question 13- I understand that negative staining and SEC-SAXS experiments can be very tricky for such protein complexes, which have very long, flexible loops. Did the authors try GraFix cross-linking before negative-stain TEM? If not, it might be a good idea to try it, as it may give much cleaner particles and easier class averaging. Although I completely understand the technical challenges the authors describe and agree with them, I still feel that one good experiment showing this dimer model would be very helpful to strengthen the claim. I am concerned that if people start using a similar DLS experiment to calculate intermolecular distances, citing your paper, it might in many cases lead to a wrong interpretation. If negative staining still does not work, at least discuss your technical challenges in the Discussion section, mention that SEC-SAXS showed a similar length of the complex, and show the Guinier and Porod plots in the supplementary data.
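      The reviewer's caution about DLS can be made concrete with the Stokes-Einstein relation, R_h = k_B·T / (6π·η·D), which converts a measured translational diffusion coefficient D into the hydrodynamic radius R_h of a sphere that diffuses at the same rate. The minimal sketch below assumes a water-like viscosity at 25 °C (about 0.89 mPa·s); the function name is hypothetical. Because R_h describes an equivalent sphere, it is not a direct measure of the length of, or the distances within, an elongated and flexible complex.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23  # exact SI value of the Boltzmann constant


def hydrodynamic_radius_m(diffusion_m2_s: float,
                          temperature_k: float = 298.15,
                          viscosity_pa_s: float = 0.00089) -> float:
    """Stokes-Einstein radius (metres) of a sphere with diffusion coefficient D.

    Defaults assume a water-like viscosity (0.89 mPa*s) at 25 degrees C.
    """
    if diffusion_m2_s <= 0:
        raise ValueError("diffusion coefficient must be positive")
    return BOLTZMANN_J_PER_K * temperature_k / (
        6 * math.pi * viscosity_pa_s * diffusion_m2_s)


# Example: D = 4e-11 m^2/s corresponds to an equivalent sphere of roughly 6 nm radius
print(f"{hydrodynamic_radius_m(4e-11) * 1e9:.1f} nm")
```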

      We believe that DLS is one of the methods for analyzing particle size, although confirmation by multiple methods gives more compelling evidence. Following the comment, we added SEC-SAXS data in the [Results] (P7/L194-196) (Cdc45 recruitment to MCM DH by Sld3 with partner Sld7) and Supplementary Figure 11. The Sld7-Sld3-Cdc45 complex forms a flexible, elongated shape: each binding domain is rigid, but the domains are linked by long loops. The flexibility problems are caused by the long loop linkers rather than by binding, so we did not try cross-linking for these analyses.
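      For readers unfamiliar with the Guinier analysis the reviewer mentions, the sketch below fits ln I(q) = ln I(0) - (R_g²/3)·q² by least squares and iteratively restricts the fit to points with q·R_g ≤ 1.3, a limit commonly used for compact particles (elongated particles usually call for a stricter limit). It uses only the Python standard library; the function and parameter names are hypothetical, q is assumed to be in Å⁻¹, and it is not the SAXS pipeline used in the manuscript.

```python
import math


def _linear_fit(x, y):
    """Ordinary least-squares slope and intercept of y against x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def guinier_fit(q, intensity, qrg_max=1.3, n_init=10, max_iter=50):
    """Estimate (Rg, I0) from the low-q region of a SAXS curve.

    Starts from the n_init lowest-q points, then repeatedly refits using only
    points with q * Rg <= qrg_max until Rg stops changing.
    """
    pts = sorted((qi, ii) for qi, ii in zip(q, intensity) if ii > 0)
    use = pts[:n_init]
    rg = i0 = None
    for _ in range(max_iter):
        if len(use) < 3:
            raise ValueError("too few points in the Guinier region")
        slope, intercept = _linear_fit([qi * qi for qi, _ in use],
                                       [math.log(ii) for _, ii in use])
        if slope >= 0:
            raise ValueError("non-negative Guinier slope (aggregation or repulsion?)")
        new_rg = math.sqrt(-3.0 * slope)  # slope = -Rg^2 / 3
        i0 = math.exp(intercept)
        if rg is not None and abs(new_rg - rg) < 1e-9 * new_rg:
            rg = new_rg
            break
        rg = new_rg
        use = [(qi, ii) for qi, ii in pts if qi * rg <= qrg_max]
    return rg, i0
```

      A linear Guinier region is consistent with a monodisperse sample; upward curvature at the lowest q usually signals aggregation, which this sketch does not test for.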

      (10) Page 8, line 221- litter sequence specificity: Correct the word "litter" with little. Also, the word shaped is written as sharped at a few places in the manuscript. Please correct it.

      We apologize for making such mistakes. We have modified these words.

      (11) Page 9, lines 237-238: Would it be possible to add a lane showing Sld7 binding to the ssDNA in Figure 4? I recommend showing this to understand the ssDNA-binding affinity of Sld7 by itself; it would also help us compare its affinity when it is in complex with Sld3.

      Considering that Sld7 on CMG is always a complex with Sld3, the ssDNA binding affinity should use the Sld3-Sld7 complex. Additionally, we attempted to overexpress Sld7, but could not obtain the target protein.

      Reviewer #2 (Recommendations for the authors):

      Thank you for the improved manuscript. The following sentence is unclear: "Cdc45 binds tighter to long ssDNA (>60 bases) with a litter sequence specificity".

      We apologize for making such a mistake. We modified “litter” to “little”.

      I found it challenging to understand which species were used while reading the results section and figure legends. I recommend that the authors revise the text in both the results and figure legends to clearly indicate when proteins from different species are being compared. Additionally, it would be valuable to explicitly acknowledge this limitation in the text.

      Following the comment, we added a description for using different species in results (P8/L224-225) and figure legends (Supplementary Figure 14). We added more information in the Methods to explain why we used two species for preparing proteins.

      Reviewer #3 (Recommendations for the authors):

      Major points:

      (1) The current title is not appropriate for the general readers. At least, DNA replication or DNA replication initiation should be added and abbreviations such as CBD should be avoided.

      Following the comment, we added “DNA replication” into the title. Regarding “CBD”, since the full name of “Cdc45 binding domain” is too long, we continue to use Sld3CBD.

      (2) As in my previous review, I asked for quantification of the EMSA assay shown in Figure 4 and Supplemental Figures 13 and 14. Since some of the band signals are very weak, it is hard to draw conclusions. Given the different protein concentrations used in the experiment, the authors should provide some quantitative values. For example, Sld3CBD-CDC45 shows weaker DNA binding than Sld3CBD alone (line 231). Is this true (and reproducible)? It is hard to conclude without any quantification.

      We repeated the EMSA assay four or more times, using independent rounds of overexpression, purification and DNA synthesis, and the results were reproducible. In this revision, we changed the DNA stain and adjusted the ratio between protein and ssDNA with increasing concentrations. The smeared bands of ssDNA with Sld7–Sld3ΔC–Cdc45 or Sld7–Sld3ΔC are now more discernible, and the ssDNA bands are intense enough for grayscale quantification (Figure 4 in the second revised version). A series of t-tests confirmed that the residual ssDNA level with Sld3CBD–Cdc45 differs significantly from those with Sld3CBD, Sld7–Sld3ΔC–Cdc45, and Sld7–Sld3ΔC (t-test, ****: P<0.0001). We also carefully controlled the sample amount in the EMSA assay and described it in the [Methods].
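      The response does not specify how the grayscale values were normalized or which form of t-test was used. As a hedged illustration only, the sketch below assumes that each free-ssDNA band is background-subtracted and expressed relative to the no-protein lane, and that replicate experiments are compared with Welch's unequal-variance t-test. The function names are hypothetical.

```python
import math
import statistics


def residual_fraction(band, control, background=0.0):
    """Free-ssDNA band intensity relative to the no-protein lane (background-subtracted)."""
    return (band - background) / (control - background)


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for two samples."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)  # sample variances (n - 1)
    se2_a, se2_b = va / na, vb / nb
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return t, df
```

      The two-sided p-value follows from the t distribution with `df` degrees of freedom; with SciPy available, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same Welch test and returns the p-value directly.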

      Moreover, in this EMSA assay (in Figure 4), the authors suggest that the disappearance of ssDNA bands corresponds to the binding of the protein to the DNA. However, it is also possible that the DNA is degraded. It is very important to show the band of protein-DNA complexes on the gel (the whole gel, not the parts of the gel shown in the Figure). Why did the authors use this "insensitive" assay with SYBR Green rather than radio-labelled ssDNA?

      In this revision, we added a negative control with no ssDNA binding by using ssARS1-3_3 for all protein samples (Sld3CBD, Sld3CBD–Cdc45, Sld7–Sld3ΔC–Cdc45 and Sld7–Sld3ΔC), which came from the same round of expression and purification as the samples bound to ssARS1s (ssARS1-2 and ssARS1-5) (Figure 4). This shows that the disappearance of ssDNA bands is caused by binding to proteins, not by degradation. Moreover, by changing the DNA stain and increasing the sample concentration, the smeared ssDNA bands are now more discernible in the high-molecular-weight regions when mixed with Sld7–Sld3ΔC–Cdc45 or Sld7–Sld3ΔC, whereas no bands appeared in the NC (ssARS1-3_1). The positions of the smeared ssDNA bands correspond to those of the proteins in the protein-stained gels, indicating that the ssARS1s were complexed with the proteins. Following the comment, we show all bands on the gel in Figure 4 and Supplementary Figure 14. In contrast to Sld7–Sld3ΔC–Cdc45 or Sld7–Sld3ΔC, bands of Sld3CBD and its ssDNA complex could not be observed, because the pI of Sld3CBD affects the entry of the samples into the gel.
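      The gel-entry argument rests on a protein's net charge at the running-buffer pH: in a native gel run toward the anode, a protein whose pI lies above the buffer pH carries a net positive charge and does not migrate into the gel. The sketch below estimates net charge with the Henderson-Hasselbalch equation and finds the pI by bisection. The pKa table is an illustrative assumption (published tables differ noticeably, so treat the result as approximate), the function names are hypothetical, and the Sld3CBD sequence itself is not reproduced here.

```python
# Illustrative pKa values (assumed); published tables differ, so results are approximate
PKA_POSITIVE = {"N_TERM": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"C_TERM": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(sequence: str, ph: float) -> float:
    """Approximate net charge of a polypeptide at a given pH (Henderson-Hasselbalch)."""
    seq = sequence.upper()
    positive = 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE["N_TERM"]))
    positive += sum(seq.count(aa) / (1.0 + 10 ** (ph - PKA_POSITIVE[aa])) for aa in "KRH")
    negative = 1.0 / (1.0 + 10 ** (PKA_NEGATIVE["C_TERM"] - ph))
    negative += sum(seq.count(aa) / (1.0 + 10 ** (PKA_NEGATIVE[aa] - ph)) for aa in "DECY")
    return positive - negative


def isoelectric_point(sequence: str, tol: float = 1e-4) -> float:
    """pH at which the net charge is zero, found by bisection on [0, 14]."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # Net charge falls monotonically with pH, so move towards the sign change
        if net_charge(sequence, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

      Comparing the estimated pI with the pH of the running buffer indicates whether a given protein is expected to enter a native gel toward the anode.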

      We agree that radio-labelled ssDNA would give a more sensitive binding assay. However, current laboratory constraints did not allow us to use radio-labelled ssDNA. Furthermore, considering the characteristics of our target proteins, Sld3CBD, Sld3CBD–Cdc45, Sld7–Sld3ΔC–Cdc45, and Sld7–Sld3ΔC, we planned to perform the binding assay in a more natural state without any modifications, labelling or linkers. Additionally, we attempted ITC measurements, but they failed, presumably because the conformational flexibility of Sld7-Sld3-Cdc45 and Sld7-Sld3 caused a thermodynamic anomaly.

      Minor points:

      (1) Line 215, 80b: This should be "80 nucleotides(nt)". Throughout the text, nucleotides is better than base to show the length of ssDNAs.

      Thank you for your comments. We modified these words throughout the text.

    1. eLife Assessment

      This important study provides a description of how single-neuron firing rates in the human medial temporal lobe and frontal cortex are modulated by theta-burst stimulation of the basolateral amygdala. The results are supported by convincing evidence obtained from a rigorous task design and analysis of an incredibly rare dataset. The results may help guide future studies incorporating amygdala stimulation to improve patient health.

    2. Reviewer #1 (Public review):

      In this manuscript, Campbell et al. assess how intracranial theta-burst stimulation (TBS) applied to the basolateral amygdala in 23 epilepsy patients affects neuronal spiking in the medial temporal lobe and prefrontal cortex during a visual recognition memory task. This is an incredibly rare dataset; collecting single-unit spiking data from behaving humans during active intracranial stimulation is a Herculean task, with immense potential for translational studies of how stimulation may be applied to modulate biological mechanisms of memory. The authors utilize careful, high quality methodology throughout (e.g. task design, spike recording and sorting, statistical analysis), providing high confidence in the validity of their findings.

      In providing such a detailed and deep investigation into the single-unit responses to intracranial stimulation, the authors provide a very useful resource to any researchers in the fields of brain stimulation and human neurophysiology. This work could be instrumental in guiding diverse research studies, from basic science investigating the role of theta oscillations in human cognition to translational work investigating deep-brain stimulation for memory.

      The authors have adequately addressed all prior concerns.

    3. Reviewer #2 (Public review):

      Summary:

      This study presents a valuable characterization of the effects of intracranial theta-burst stimulation of the basolateral amygdala on single-unit spiking activity in several memory-related areas of the human brain. It is written clearly and concisely, allowing readers to fully understand the analysis used.

      The authors used a visual recognition memory task previously employed by their group to characterize the effects of basolateral amygdala stimulation upon memory consolidation (Inman et al, 2018). This current report presents an interesting analysis that complements the results reported in the 2018 paper.

      Strengths:

      Rare combination of human neurophysiology and behavior

      The type of experiment performed in the manuscript, which combines neurophysiological data, behavior, and a deep brain stimulation (DBS) intervention, is incredibly rare and takes many years to accomplish, requiring tight collaboration between clinical and research teams. Our understanding of the spiking dynamics of human neurons is very limited, and this report is an important piece of the puzzle that allows DBS to be used in future interventions that will benefit patients' health.

      Multiple brain areas included

      It is important to note that the report analyzes brain areas with which the amygdala has extensive connections (Fig. 1A): the hippocampus, OFC, amygdala, and ACC. It seems that neurons in all these areas were modulated by the stimulation, except those in the ACC, where firing rates were so low that only a handful of neurons were included in the analysis. This is an important demonstration that low-amplitude stimulation (even when reduced to 0.5 mA) can travel far and wide across the human brain.

      The experiment is cleverly designed to tease apart responses due to visual stimuli (image presentation) and electrical stimulation. The authors suggest that the units modulated by stimulation are largely distinct from those responsive to image offset during trials without stimulation. The subpopulation that responds strongly also tends to have a higher baseline firing rate. It is important to add that the chosen modulation index is more likely to be significant in neurons with higher firing rates (Figure S8). The authors discuss the tradeoff of using a nonparametric modulation index vs. other methods (for example, percent change in trial-averaged firing rate from baseline).

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      “This is an exploratory study that doesn't explore quite enough. Critically, the authors make a point of mentioning that neuronal firing properties vary across cell types, but only use baseline firing rate as a proxy metric for cell type. This leaves several important explorations on the table, not limited to the following:”

      1a: “Do waveform shape features, which can also be informative of cell type, predict the effect of stimulation?”

      To address this question, we modeled our approach to cell type classification after Peyrache et al. 2012. More specifically, we extracted two features from the mean unit waveforms—the valley-to-peak time (VP) and the peak half-width (PHW). These features were then used to classify units into two distinct clusters (k-means, clusters = 2, based on a strong prior from existing literature), representing putative excitatory and inhibitory neurons. Our approach recapitulated many of the same observations in Peyrache et al. 2012, namely (1) identification of two clusters (low PHW/VP: inhibitory, high PHW/VP: excitatory), (2) an ~80/20 ratio of excitatory/inhibitory neurons, and (3) greater baseline firing rates in the inhibitory vs. excitatory neurons. However, we did not observe a preferential modulation of one cell type compared to another (see newly created Figure 4). A description of this analysis and its takeaways has been incorporated into the manuscript.
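      The two waveform features and the two-cluster k-means step can be sketched in plain Python as follows. The conventions are assumptions rather than the authors' definitions: the spike is taken to be negative-going with a baseline near zero, VP is the time from the waveform minimum (valley) to the following maximum, PHW is taken as the full width of the main negative deflection at half its amplitude (at sample resolution, without interpolation), and the features are z-scored before clustering. All function names are hypothetical.

```python
import math


def waveform_features(wave, fs_hz):
    """Return (VP, PHW) in ms for one mean waveform sampled at fs_hz."""
    i_valley = min(range(len(wave)), key=wave.__getitem__)
    i_peak = max(range(i_valley, len(wave)), key=wave.__getitem__)
    vp_ms = (i_peak - i_valley) / fs_hz * 1e3
    # Full width of the valley at half its (negative) amplitude, in samples
    half = wave[i_valley] / 2
    left = right = i_valley
    while left > 0 and wave[left - 1] <= half:
        left -= 1
    while right < len(wave) - 1 and wave[right + 1] <= half:
        right += 1
    phw_ms = (right - left + 1) / fs_hz * 1e3
    return vp_ms, phw_ms


def kmeans_two(points, max_iter=100):
    """Lloyd's k-means with k = 2 on z-scored feature vectors; returns labels."""
    n, dims = len(points), len(points[0])
    means = [sum(p[d] for p in points) / n for d in range(dims)]
    sds = [math.sqrt(sum((p[d] - means[d]) ** 2 for p in points) / n) or 1.0
           for d in range(dims)]
    z = [[(p[d] - means[d]) / sds[d] for d in range(dims)] for p in points]
    # Deterministic start: the points with the smallest and largest first feature
    centroids = [list(min(z, key=lambda p: p[0])), list(max(z, key=lambda p: p[0]))]
    labels = None
    for _ in range(max_iter):
        # Assign each unit to its nearest centroid
        new_labels = [min((0, 1), key=lambda k: math.dist(p, centroids[k])) for p in z]
        if new_labels == labels:
            break
        labels = new_labels
        # Move each centroid to the mean of its members
        for k in (0, 1):
            members = [p for p, lab in zip(z, labels) if lab == k]
            if members:
                centroids[k] = [sum(m[d] for m in members) / len(members)
                                for d in range(dims)]
    return labels
```

      With k = 2, the cluster with the shorter VP and PHW would be labeled putative fast-spiking (inhibitory), following the convention described in the response.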

      Change to Text:

      Created Figure 4 (Separation of presumed excitatory and inhibitory neurons by waveform morphology).

      Caption: (A) Two metrics were calculated using the averaged waveforms for each detected unit: the valley-to-peak width (VP) and peak half-width (PHW). (B) Scatterplot of the relationship between VP and PHW; note that units with identical metrics are overlaid. Using k-means clustering, we identified two distinct response clusters, representing presumed excitatory (E, blue) and inhibitory (I, red) neurons. The units from which the example waveforms were taken are outlined in black. Probability distributions for each metric are shown along the axes. (C) Total number of units within each cluster, separated by region. (D) Comparison of baseline firing rates, separated by cluster. (E) Percent of modulated units in each cluster. * p < 0.05, NS = not significant.

      Added a description of clustering methodology to lines 132-137: “We calculated two metrics from the averaged waveform from each detected unit: the valley-to-peak width (VP) and the peak half-width (PHW) (Figure 4A); previously, these two properties of waveform morphology have been used to discriminate pyramidal cells (excitatory) from interneurons (inhibitory) in human intracranial recordings (Peyrache et al., 2012). Next, we performed k-means clustering (n = 2 clusters) on the waveform metrics, in line with previous approaches to cell type classification.”

      Added a section in the Results titled “Theta Burst Stimulation Modulates Excitatory and Inhibitory Neurons Equally”. Lines 370-378: “Using k-means clustering, we grouped neurons into two distinct clusters based on waveform morphology, representing neurons that were presumed to be excitatory (E) and inhibitory (I) (Figure 4B). Inhibitory (fast-spiking) neurons exhibited shorter waveform VP and PHW, compared with excitatory (regular-spiking) neurons (I cluster centroid: VP = 0.50ms, PHW = 0.51ms; E cluster centroid: VP = 0.32ms, PHW = 0.31ms), and greater baseline firing rates (U(N_I = 23, N_E = 133) = 1074.50, p = 0.023) (Figure 4D). Although we observed a much greater proportion of excitatory vs. inhibitory neurons (E: 85.3%, I: 14.7%), stimulation appeared to affect excitatory and inhibitory neurons equally, suggesting that one cell type is not preferentially activated over another (Figure 4E).”
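      The firing-rate comparison above reports a Mann-Whitney U statistic. To show what that statistic counts, the sketch below computes U for the first sample using midranks for ties, with a two-sided p-value from the tie-corrected normal approximation. It is a generic illustration rather than the authors' code, and the function name is hypothetical; software packages differ in which sample's U they report and in whether they apply a continuity correction or an exact method, so p-values can differ slightly.

```python
import math


def mann_whitney_u(x, y):
    """Return (U for sample x, two-sided p) using midranks and a normal approximation."""
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y], key=lambda t: t[0])
    n = len(combined)
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        # Find the run of tied values starting at position i
        j = i
        while j + 1 < n and combined[j + 1][0] == combined[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = midrank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    n1, n2 = len(x), len(y)
    r1 = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        return u1, 1.0  # every value tied: no evidence of a difference
    z = (u1 - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, p
```

      With SciPy, `scipy.stats.mannwhitneyu(x, y)` also reports U for the first sample.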

      Modified discussion of the effects of stimulation on different cell types. Lines 475-483: “…To test these hypotheses directly, we clustered neurons into presumed excitatory and inhibitory neurons based on waveform morphology. In doing so, we observed ~85% excitatory and ~15% inhibitory neurons, which is very similar to what has been reported previously in human intracranial recordings (Cowan et al. 2024, Peyrache et al., 2012). Interestingly, stimulation appeared to modulate approximately the same proportion of neurons for each cell type (~30%), despite the differently-sized groups. Recent reports, however, have suggested that the extent to which electrical fields entrain neuronal spiking, particularly with respect to phase-locking, may be specific to distinct classes of cells (Lee et al., 2024).”

      1b:  “Is the autocorrelation of spike timing, which can be informative about temporal dynamics, altered by stimulation? This is especially interesting if theta-burst stimulation either entrains theta-rhythmic spiking or is more modulatory of endogenously theta-modulated units.”

      The reviewer is correct in suggesting that rate-modulation represents only one of many possible ways by which exogenous theta burst stimulation may influence neuronal activity. Indeed, intracranial theta burst stimulation has previously been shown to evoke theta-frequency oscillatory responses in local field potentials (Solomon et al. 2021), and other forms of stimulation (e.g., transcranial alternating current stimulation) may modulate the rhythm, rather than the rate, of neuronal spiking (Krause et al. 2019).

      To investigate whether stimulation altered rhythmicity in neuronal firing, we contrasted the spike timing autocorrelograms, as suggested. More specifically, we computed the pairwise differences in spike timing for each trial, separating spikes into the same pre-, during-, and post-stimulation epochs described in the manuscript (bin size = 5 ms, max lag = 250 ms), grouped neurons by whether they were modulated, and then contrasted the differences in the latencies of the peak normalized autocorrelation value between epochs. Only neurons with a firing rate of ≥ 1 Hz (n = 70/203, 34.5%) were included in this analysis since sparse firing resulted in noisy autocorrelation estimates. Subsequent statistical testing of the peak latency differences between pre-/during- and pre-/post-stimulation did not reveal any group-level differences (Mann-Whitney U tests, p > 0.05). Thus, we were not able to identify neuronal responses suggestive of altered rhythmicity (see Figure S5). A description of this analysis and its takeaways has been incorporated into the manuscript.
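      The autocorrelogram procedure described above can be sketched as follows for one neuron and one epoch: within each trial, all pairwise spike-time differences below the maximum lag are binned (5 ms bins, 250 ms maximum lag), the counts are pooled across trials and smoothed with a Gaussian kernel, and the lag of the peak is returned. The kernel width (one bin) and the function names are assumptions, since the response does not state them.

```python
import math


def autocorrelogram(trials, bin_ms=5.0, max_lag_ms=250.0):
    """Pooled spike-time autocorrelogram (positive lags only) across trials.

    trials: list of spike-time lists (ms) from one epoch. Each within-trial
    pair of spikes contributes its lag; a spike is never paired with itself.
    """
    n_bins = int(max_lag_ms // bin_ms)
    counts = [0.0] * n_bins
    for spikes in trials:
        s = sorted(spikes)
        for i in range(len(s)):
            for j in range(i + 1, len(s)):
                lag = s[j] - s[i]
                if lag >= max_lag_ms:
                    break  # later spikes are even further away
                idx = int(lag // bin_ms)
                if idx < n_bins:
                    counts[idx] += 1
    return counts


def gaussian_smooth(counts, sigma_bins=1.0):
    """Convolve with a truncated (+/- 3 sigma) normalized Gaussian kernel."""
    half = int(3 * sigma_bins)
    offsets = range(-half, half + 1)
    kernel = [math.exp(-0.5 * (k / sigma_bins) ** 2) for k in offsets]
    total = sum(kernel)
    kernel = [w / total for w in kernel]
    return [sum(w * counts[i + k] for k, w in zip(offsets, kernel)
                if 0 <= i + k < len(counts))
            for i in range(len(counts))]


def peak_lag_ms(counts, bin_ms=5.0, sigma_bins=1.0):
    """Lag (bin centre, ms) of the maximum of the smoothed, normalized ACG."""
    smoothed = gaussian_smooth(counts, sigma_bins)
    total = sum(smoothed)
    if total == 0:
        raise ValueError("no spike pairs within the maximum lag")
    normalized = [v / total for v in smoothed]  # scaling does not move the peak
    i_max = max(range(len(normalized)), key=normalized.__getitem__)
    return (i_max + 0.5) * bin_ms
```

      Computing `peak_lag_ms` separately for the pre-, during-, and post-stimulation epochs gives the per-neuron latencies whose differences are then compared across modulated and unaffected neurons.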

      Of note, there are two elements of the data that constrain our ability to detect modulation in the rhythm of firing. First, the baseline activity recorded across neurons modulated by stimulation was relatively low (i.e., median firing rate = 1.77 Hz). Second, stimulation often resulted in a suppression, rather than an enhancement, of firing rate. Taken together, the sparse firing afforded limited opportunity to characterize changes to subtle patterns of spiking. 

      Change to Text:

      Created Figure S5 (Analysis of modulation in spiking rhythmicity)

      Caption: (A) Representative autocorrelograms (ACG) for a single neuron. The pairwise differences in spike timing were computed for each trial and epoch (bin size = 5 ms, max lag = 250 ms), then smoothed with a Gaussian kernel. The peak in the normalized ACG across trials was computed for each epoch. (B) Kernel density estimate of the peak ACG lag, separated by epoch. (C) The peak ACG lags were split by whether the neuron was modulated (Mod) or unaffected by stimulation (NS = not significant) for each of the two contrasts: pre- vs. during-stim (left) and pre- vs. post-stim (right).

      Details about the autocorrelation methodology have been incorporated. Lines 166-172: “To investigate whether stimulation altered rhythmicity in neuronal firing, we analyzed the spike timing autocorrelograms. More specifically, we computed the pairwise differences in spike timing for each trial (bin size = 5 ms, max lag = 250 ms) and then contrasted the differences in the latencies of the peak normalized autocorrelation value between epochs (pre-, during-, post-stimulation). Only neurons with a firing rate of ≥ 1 Hz (n = 70/203, 34.5%) were included in this analysis since sparse firing resulted in noisy autocorrelation estimates.”

      The results from contrasting the autocorrelograms are now mentioned briefly. Lines 297-298: “Stimulation, however, did not appear to alter the rhythmicity in neuronal firing, as measured by spiking autocorrelograms (Figure S5).”

      1c: “The authors reference the relevance of spike-field synchrony (30-55 Hz) in animal work, but ignore it here. Does spike-field synchrony (comparing the image presentation to post-stimulation) change in this frequency range? This does not seem beyond the scope of investigation here.”

      We agree that a further characterization of spike-field and spike-phase relationships may provide rich insights into more complex regional and interregional dynamics that may be altered by stimulation. Given that many metrics are biased by sample size (e.g., number of spikes), which can vary considerably, computing the pairwise phase consistency (PPC) between spikes and LFP is a preferred metric (Vinck et al. 2010). Although PPC is unbiased, its variance nonetheless increases considerably with low spike counts; pooling spike counts across trials, however, decouples the temporal relationship between spiking and the LFP phase for each trial, confounding results and yielding an unstable estimate.

      To determine whether such an analysis is indeed possible, we calculated the percentage of stimulation trials with ≥ 10 spikes in both the 1 s pre- and post-stimulation epochs (a relatively low threshold for inclusion). Only a very small proportion of the total number of trials across all neurons met this criterion (2.5%). Thus, because of the sparse spiking in our data, we are unable to reliably characterize spike-field or spike-phase modulation in detected neurons.

      Change to Text:

      In the manuscript, we have added a description of why our data is not well-suited to investigate these relationships.

      Lines 532-538: “The present study did not investigate interactions between spiking activity and local field potentials because neuronal spiking was sparse at baseline and often further suppressed by stimulation; only a very small proportion of the total number of trials across all neurons exhibited ≥ 10 spikes in both the 1 s pre- and post-stimulation epochs (~2.5%). Although certain metrics are not biased by sample size (e.g., pairwise phase consistency), low spike counts can dramatically affect variance and, therefore, result in unstable estimates (Vinck et al., 2011).”

      1d: “How does multi-unit activity respond to stimulation? At this somewhat low count of neurons (total n=156 included) it would be valuable to provide input on multi-unit responses to stimulation as well.”

      We thank the reviewer for this suggestion. We have incorporated an analysis of multiunit activity (MUA), which similarly identifies robust modulation via permutation-based statistical testing and characterizes the different profiles of responses (i.e., increased vs. decreased MUA threshold crossings pre- vs. post-stimulation).

      Change to Text:

      Created Figure S8 (Analysis of multiunit activity response to stimulation)

      Caption: (A) Example trace of multiunit activity (MUA) in one channel during a single stimulation trial. Threshold crossings are highlighted with a pink dot overlaid on the MUA signal with a corresponding hash below. (B) The percentage of channels with significantly modulated MUA, separated by the direction of effect. (C) The percentage of channels with significantly modulated MUA, separated by the direction of effect and region. Inc (red; post > pre) vs. Dec (blue; post < pre). HIP = hippocampus, OFC = orbitofrontal cortex, AMY = amygdala, ACC = anterior cingulate cortex. *** p < 0.001, NS = not significant.

      Details about the MUA methodology have been incorporated. Lines 174-180: “Finally, we measured modulation in multiunit activity (MUA) by filtering the microelectrode signals in a 300-3,000 Hz window and counting the number of threshold crossings. Thresholds were determined on a per-channel basis and defined as -3.5 times the root mean square of the signal during the baseline period; activity during stimulation was excluded since stimulation artifact is difficult to separate from MUA in the absence of spike sorting.”
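      The thresholding procedure described above can be sketched as follows. The response specifies only the 300-3,000 Hz band and the −3.5 × RMS threshold computed over baseline samples; the 4th-order Butterworth filter applied forward and backward, the 30 kHz sampling rate, and the synthetic signal (noise plus 20 injected spike-like dips in the second half) are assumptions for illustration.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def mua_threshold_crossings(signal, fs, baseline_mask, k=3.5, band=(300.0, 3000.0), order=4):
    """Band-pass the raw signal and return indices of negative threshold crossings.

    The threshold is -k times the RMS of the filtered signal, computed only over
    baseline samples, so stimulation periods do not inflate the estimate.
    """
    sos = butter(order, band, btype="bandpass", fs=fs, output="sos")
    filtered = sosfiltfilt(sos, signal)                    # zero-phase filtering
    rms = np.sqrt(np.mean(filtered[baseline_mask] ** 2))
    threshold = -k * rms
    below = filtered < threshold
    onsets = np.flatnonzero(below[1:] & ~below[:-1]) + 1   # first sample of each crossing
    return onsets, threshold


# Synthetic check: 1 s of noise as "pre", 1 s of noise plus 20 spike-like dips as "post"
fs = 30_000
rng = np.random.default_rng(1)
raw = rng.normal(0.0, 1.0, 2 * fs)
t = np.arange(-30, 31)
spike_shape = -30.0 * np.exp(-0.5 * (t / 6.0) ** 2)
for center in np.linspace(fs + 1000, 2 * fs - 1000, 20).astype(int):
    raw[center - 30:center + 31] += spike_shape
baseline = np.zeros(raw.size, dtype=bool)
baseline[:fs] = True
crossings, thr = mua_threshold_crossings(raw, fs, baseline)
n_pre = int(np.sum(crossings < fs))
n_post = int(np.sum(crossings >= fs))
```

      Counting onsets rather than samples below threshold keeps one broad deflection from being counted many times.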

      MUA results are now incorporated. Lines 365-367: “Additional characterization of MUA revealed a dominant signature of increased activity post- vs. pre-stimulation, in line with these trends observed at the single-neuron level (Figure S8).”

      1e: “Several intracranial studies have implicated proximity to white matter in determining the effects of stimulation on LFPs; do the authors see an effect of white matter proximity here?”

      We thank the reviewer for the interesting question. Subsequent characterization revealed only small differences in the proximity of stimulation contacts to white matter (range 1.5-8.0 mm), likely because the chosen target (i.e., basolateral amygdala) has several nearby white matter structures (e.g., stria terminalis). Nonetheless, we performed a linear regression between the proximity to white matter and the stimulation-induced effect on behavior (stimulation vs. no-stimulation d’ difference), the results of which indicate no clear association (p > 0.05; see Figure S9). Critically, this is not to suggest that white matter proximity has no interaction with the reported behavioral effects, but rather, that we could not identify such an association within our data.

      Change to Text:

      Created Figure S9 (The effect of stimulation proximity to white matter and distance to recorded neurons).

      Caption: (A) Kernel density estimate of the Euclidean distance from stimulation contacts to nearest WM structure (in mm); hash marks represent individual observations. (B) The change in memory performance (Δd’) was linearly regressed onto the distance from the stimulated contacts to white matter.

      The following has been added to lines 405-426: “Proximity to white matter has been shown to influence the effects of stimulation on behavior and the strength of evoked responses (Mankin et al., 2021; Mohan et al., 2020; Paulk et al., 2022). Across all stimulated contacts, we observed only small differences in the proximity of stimulation contacts to white matter (median = 4.5 mm, range = 1.5-8.0 mm), likely because the chosen target (i.e., basolateral amygdala) has several nearby white matter structures (e.g., stria terminalis). Nonetheless, we performed a linear regression between the proximity to white matter and the stimulation-induced effect on behavior (stimulation vs. no-stimulation d’ difference), the results of which indicate no clear association (p > 0.05; see Figure S9).”
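      The two steps described here, measuring the distance from a stimulated contact to the nearest white matter and regressing Δd’ on that distance, can be sketched as follows. The sketch assumes white matter is available as a set of points in millimetre coordinates (for example, voxel centres from a segmentation) and uses ordinary least squares via scipy.stats.linregress; no patient data are used.

```python
import numpy as np
from scipy.stats import linregress


def distance_to_white_matter(contact_xyz, wm_xyz):
    """Euclidean distance (mm) from one contact to the nearest white-matter point."""
    offsets = np.asarray(wm_xyz, dtype=float) - np.asarray(contact_xyz, dtype=float)
    return float(np.min(np.linalg.norm(offsets, axis=1)))


def regress_effect_on_distance(distances_mm, delta_dprime):
    """Ordinary least-squares fit of delta d' on distance; returns slope, r and p."""
    fit = linregress(distances_mm, delta_dprime)
    return fit.slope, fit.rvalue, fit.pvalue
```

      With a narrow range of distances (here 1.5-8.0 mm), the regression has limited leverage, which is consistent with the cautious interpretation given in the response.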

      Comment 2: “It is a little confusing to interpret stimulation-induced modulation of neuronal spiking in the absence of stimulation-induced change in behavior. How do the authors findings tell us anything about the neural mechanisms of stimulation-modulated memory if memory isn't altered? In line with point #1, I would suggest a deeper dive into behavior (e.g. reaction time? Or focus on individual sessions that do change in Figure 4A?) to make a stronger statement connecting the neural results to behavioral relevance.”

      We agree that the connection between the observed stimulation-induced neuronal modulation and effects on behavior is unclear and has proven challenging to elucidate. Per the reviewer’s suggestion, we further focused our analyses on the neuronal modulation effects in the individual sessions that resulted in a robust change in memory performance (stimulation vs. no-stimulation d’ difference threshold of ± 0.5, based on a moderate effect size for Cohen’s d); both a positive and negative threshold were used to capture robust changes in memory performance associated with firing rate modulation, whether enhancement or suppression. To this end, we contrasted the proportion of modulated neurons in the sessions where stimulation resulted in a robust behavioral change (Δd’) with those that did not (~d’). We did not observe a difference in the proportions between groups when collapsed across all sampled regions, or when separately evaluated (Fisher’s exact tests, p > 0.05; see Figure 5C).
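      The contrast of modulated-unit proportions between responder and non-responder sessions amounts to Fisher’s exact test on a 2 × 2 table, sketched below. The unit counts passed to the function are hypothetical. The sketch treats units as independent observations, which is an assumption; the same function can be called once per region for the region-wise contrasts.

```python
from scipy.stats import fisher_exact


def compare_modulated_fraction(n_mod_responder, n_units_responder,
                               n_mod_nonresponder, n_units_nonresponder):
    """Fisher's exact test: modulated vs. not, in responder vs. non-responder sessions."""
    table = [
        [n_mod_responder, n_units_responder - n_mod_responder],
        [n_mod_nonresponder, n_units_nonresponder - n_mod_nonresponder],
    ]
    odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
    return odds_ratio, p_value
```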

      Given that this approach did not further clarify the connection between our neural and behavioral results, we believe it is most appropriate to deemphasize claims in the manuscript regarding the potential insights for behavioral modulation (e.g., memory enhancement), and have done so.

      Change to Text:

      Toned down reference to the memory-related effects of stimulation in the abstract by removing the following lines from the abstract: “Previously, we demonstrated that intracranial theta burst stimulation (TBS) of the basolateral amygdala (BLA) can enhance declarative memory, likely by modulating hippocampal-dependent memory consolidation…” and “…and motivate future neuromodulatory therapies that aim to recapitulate specific patterns of activity implicated in cognition and memory.”

      Changed Figure 4 to Figure 5

      Created Figure 5C (Interaction between behavioral effects and neuronal modulation).

      Caption: (C) Change in recognition memory performance was split into two categories using a d’ difference threshold of ± 0.5: responder (positive or negative; Δd’, pink) and non-responder (~d’, grey). Individual d’ scores are shown (left) with points colored by outcome category; dotted lines demarcate category boundaries, and the grey-shaded region represents negligible change. The number of sessions within each outcome category (middle) and the proportion of modulated units as a function of outcome category, separated by region (right), are also shown. NS = not significant.

      The description of the behavioral results has been updated. Lines 394-403: “At the level of individual sessions, we observed enhanced memory (Δd’ > +0.5) in 36.7%, impaired memory (Δd’ < -0.5) in 20.0%, and negligible change (-0.5 ≤ Δd’ ≤ 0.5) in 43.3% when comparing performance between the stim and no-stim conditions; a threshold of Δd’ ± 0.5 was chosen for this classification based on the defined range of a “medium effect” for Cohen’s d. To test our hypothesis that neuronal modulation would be associated with changes in memory performance, we combined the sessions that resulted in either memory enhancement or impairment and contrasted the proportion of modulated units across regions sampled. We did not, however, observe a meaningful difference in the proportion of modulated units when grouped by behavioral outcome (all contrasts p > 0.05) (Figure 5C).”
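      The session classification above can be expressed compactly. The sketch uses the standard signal-detection definition d’ = Z(hit rate) − Z(false-alarm rate) and the ± 0.5 boundaries from the text, with the boundary values themselves counted as negligible change. Hit or false-alarm rates of exactly 0 or 1 would need a correction, which the response does not specify and which is not applied here.

```python
from scipy.stats import norm


def dprime(hit_rate, false_alarm_rate):
    """Signal-detection sensitivity d' = Z(hit rate) - Z(false-alarm rate).

    Rates of exactly 0 or 1 give infinite z-scores; a correction (not shown) is needed then.
    """
    return float(norm.ppf(hit_rate) - norm.ppf(false_alarm_rate))


def classify_session(d_stim, d_nostim, threshold=0.5):
    """Label a session by delta d' = d'(stim) - d'(no-stim) using a symmetric threshold."""
    delta = d_stim - d_nostim
    if delta > threshold:
        return "enhanced"
    if delta < -threshold:
        return "impaired"
    return "negligible"      # -threshold <= delta <= threshold
```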

      Lines 213-214 and 394-397 have been edited to reflect a change in the d’ threshold used for categorizing behavioral results (from Δd’ ± 0.2 to Δd’ ± 0.5).

      Comment 3: “It is not clear to me why the assessment of firing rates after image onset and after stim offset is limited to one second - this choice should be more theoretically justified, particularly for regions that spike as sparsely as these.”

      We thank the reviewer for this question and acknowledge that no clear justification was provided for this decision in the manuscript. We limited each analysis epoch to 1 s for two reasons. First, the maximum possible length of the during-stimulation epoch was 1 s (stimulation was on for 1 s). Although the pre- and post-stimulation epochs could have been extended without issue, we were concerned that variable time windows could introduce a bias, for instance, by producing different variances between epochs. Second, we anticipated, both from empirical observations and prior literature, that the neural response following stimulation or task features (e.g., image onset/offset) was likely to be transient rather than sustained over many seconds. Keeping the windows short helped ensure that our approach to detecting modulation (i.e., contrasting trial-wise spike counts between each pair of epochs) captured the intended effect rather than random noise. We have incorporated a discussion of this rationale in the Peri-Stimulation Modulation Analyses section.
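      The equal-length epochs can be illustrated with a short sketch that extracts trial-wise spike counts in pre-, during- and post-stimulation windows of 1 s each. It assumes stimulation onset times are known, windows are half-open intervals, and the post window begins at stimulation offset (onset + 1 s); the spike times in the example are made up.

```python
import numpy as np


def epoch_spike_counts(spike_times_s, stim_onsets_s, epoch_s=1.0):
    """Trial-wise spike counts in pre-, during- and post-stimulation windows.

    Windows are half-open [start, stop) intervals relative to stimulation onset,
    assuming stimulation lasts exactly epoch_s so the post window starts at offset.
    """
    spikes = np.sort(np.asarray(spike_times_s, dtype=float))
    onsets = np.asarray(stim_onsets_s, dtype=float)
    windows = {"pre": (-epoch_s, 0.0), "during": (0.0, epoch_s), "post": (epoch_s, 2 * epoch_s)}
    counts = {}
    for name, (start, stop) in windows.items():
        first = np.searchsorted(spikes, onsets + start, side="left")
        last = np.searchsorted(spikes, onsets + stop, side="left")
        counts[name] = last - first          # one count per trial
    return counts


demo_counts = epoch_spike_counts([9.2, 9.9, 10.1, 10.5, 10.99, 11.0, 11.7, 12.0], [10.0])
```

      Because every window has the same length, the per-trial counts are directly comparable between epochs without rescaling to rates.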

      Change to Text:

      Lines 156-158 have been added: “Each epoch was constrained to 1 s to ensure that subsequent firing rate contrasts were unbiased and to capture potential transient effects (e.g., image onset/offset).”

      Comment 4: “This work coincides with another example of human intracranial stimulation investigating the effect on firing rates (doi: https://doi.org/10.1101/2024.11.28.625915). Given how incredibly rare this type of work is, I think the authors should discuss how their work converges with this work (or doesn't).”

      Thank you for bringing this highly relevant work to our attention. We were unaware of this recent preprint and have incorporated a discussion of its main findings into the manuscript.

      Change to Text:

      New citations: van der Plas et al. 2024 (bioRxiv), Cowan et al. 2024 (bioRxiv)

      The discussion of related studies has been updated. Lines 447-457: “Few studies, however, have characterized the impact of electrical stimulation via macroelectrodes on the spiking activity of human cortical neurons, none of which involve intracranial theta burst stimulation. One study reported a long-lasting reduction in neural excitability among parietal neurons, with variable onset time and recovery following continuous transcranial TBS in non-human primates (Romero et al., 2022). In a similar vein, it was recently shown that human neurons are largely suppressed by single-pulse electrical stimulation (Cowan et al., 2024; van der Plas et al., 2024). Other emerging evidence suggests that transcranial direct current stimulation may entrain the rhythm rather than rate of neuronal spiking (Krause et al., 2019) and that stimulation-evoked modulation of spiking may meaningfully impact behavioral performance on cognitive tasks (Fehring et al., 2024).”

      Comment 5: “What information does the pseudo-population analysis add? It's not totally clear to me.”

      We recognize the need to further contextualize the motivation for the exploratory pseudo-population analysis and thank the reviewer for bringing the lack of detail to our attention. In brief, the analysis allowed us to observe trends in activity across populations of neurons that, in principle, may not be visible when characterizing modulation solely in individual neurons. Additional details have been incorporated into the manuscript, as suggested.

      Change to Text:

      Additional justification has been incorporated in the description of the methodology. Lines 185-187: “…This approach enables the identification of dominant patterns of coordinated neural activity that may not be apparent when examining individual neurons in isolation.”, lines 192-194: “…By collapsing across subjects into a common pseudo-population, this analysis provides a mesoscale view of how stimulation modulates shared activity patterns across anatomically distributed neural populations.”
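      The response does not state which dimensionality-reduction method underlies the pseudo-population analysis; principal component analysis (PCA) is assumed here purely for illustration. The sketch stacks neurons from all subjects into one matrix, z-scores each neuron over time, and returns a low-dimensional trajectory. The toy data (neurons sharing a single latent time course) are synthetic.

```python
import numpy as np


def pseudo_population_pca(psths_by_subject, n_components=3):
    """PCA on a pseudo-population built by stacking neurons from all subjects.

    psths_by_subject: list of arrays shaped (n_neurons_in_subject, n_time_bins),
    all aligned to the same event and binned identically.
    Returns the trajectory (time x components), neuron loadings and variance explained.
    """
    X = np.vstack(psths_by_subject)                    # (all neurons) x time
    mu = X.mean(axis=1, keepdims=True)
    sd = X.std(axis=1, keepdims=True)
    sd[sd == 0] = 1.0                                  # leave flat neurons at zero
    Z = ((X - mu) / sd).T                              # time x neurons, columns centered
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = S ** 2 / np.sum(S ** 2)
    trajectory = U[:, :n_components] * S[:n_components]
    return trajectory, Vt[:n_components], explained[:n_components]


# Toy pseudo-population: two "subjects" whose neurons share one latent time course
rng = np.random.default_rng(2)
latent = np.sin(np.linspace(0, 2 * np.pi, 50))


def toy_subject(n_neurons):
    weights = rng.choice([-1.0, 1.0], size=(n_neurons, 1)) * (1.0 + rng.random((n_neurons, 1)))
    return weights * latent + 0.01 * rng.normal(size=(n_neurons, latent.size))


traj, loadings, var_explained = pseudo_population_pca([toy_subject(5), toy_subject(7)])
```

      In this toy case, a single component captures nearly all variance, which is the kind of shared pattern that single-neuron tests would not summarize.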

      A summary interpretation has been added to the paragraph describing the results. Lines 326-328: “Taken together, these analyses reveal global structure in the state space of responses to BLA stimulation within hippocampal circuits.”

      Reviewer #2 (Public review):

      Comment 1 “Authors suggest that the units modulated by stimulation are largely distinct from those responsive to image offset during trials without stimulation. The subpopulation that responds strongly also tends to have a higher baseline of firing rate. It's important to add that the chosen modulation index is more likely to be significant in neurons with higher firing rates.”

      This is an important point that was not previously addressed in our manuscript. We suspect there are likely two factors at play worth considering with respect to our chosen nonparametric modulation index: neurons with lower activity require smaller changes in spike counts to be significantly modulated (easier to flip ranks), and neurons with higher activity empirically exhibit greater absolute shifts in the number of spikes. Our further use of permutation testing, while mitigating false positives, may also somewhat constrain the ability to detect modulation in sparsely active neurons. Nonetheless, given that many trials entailed few or no spikes, we believe this approach is preferable to alternatives that may be more susceptible to noise (e.g., percent change in trial-averaged firing rate from baseline).
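      The exact modulation index used in the manuscript is not given in this response, so the sketch below uses a hypothetical rank-based index with a sign-flip permutation test to illustrate the tradeoff. Trials with identical pre and post counts (for example, 0 vs. 0 spikes) carry no sign information and drop out. With k remaining trials, an exact sign-flip test has 2^k equally likely sign patterns, and at least two of them (all positive, all negative) are as extreme as any observation, so the smallest attainable two-sided p-value is 2/2^k. At least six non-tied trials are therefore needed to reach p < 0.05, which constrains detection in sparsely firing neurons.

```python
import numpy as np
from scipy.stats import rankdata


def signed_rank_modulation(pre_counts, post_counts, n_perm=5000, seed=0):
    """Hypothetical rank-based modulation index with a sign-flip permutation p-value.

    Index = sum(sign(d_i) * rank(|d_i|)) / sum(rank(|d_i|)) over trials with d_i != 0,
    where d_i = post_i - pre_i; it ranges from -1 (all decreases) to +1 (all increases).
    Under the null, epoch labels are exchangeable within a trial, so signs are flipped at random.
    """
    d = np.asarray(post_counts, dtype=float) - np.asarray(pre_counts, dtype=float)
    d = d[d != 0]                     # tied trials carry no sign information
    if d.size == 0:
        return 0.0, 1.0
    ranks = rankdata(np.abs(d))
    observed = np.sum(np.sign(d) * ranks) / ranks.sum()
    rng = np.random.default_rng(seed)
    flips = rng.choice([-1.0, 1.0], size=(n_perm, d.size))
    null = flips @ ranks / ranks.sum()
    p_value = (np.sum(np.abs(null) >= abs(observed)) + 1) / (n_perm + 1)
    return float(observed), float(p_value)
```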

      To better understand the tradeoffs with detection probability, we performed a sensitivity analysis. We generated synthetic data with different baseline firing rates (0.1-5.0 Hz) and effect sizes (± 0.1-0.7 Hz) and simulated the likelihood of detection with our given modulation index across neurons. The results of the simulation support the notion that the probability of detecting modulation is lower for sparsely active neurons (Figure S7C). Further discussion of this consideration for the chosen modulation index, as well as details regarding the sensitivity analysis, have been incorporated into the manuscript.
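      A simulation of this kind can be sketched as below. It is not the authors’ code: the Poisson spiking model, the 30 trials per neuron, the 1 s epochs and the signed-rank sign-flip test are all assumptions, and the manuscript’s own modulation index is not reproduced. Each call returns the fraction of simulated neurons whose rate change is detected at α = 0.05.

```python
import numpy as np
from scipy.stats import rankdata


def sign_flip_pvalue(pre, post, rng, n_perm=1000):
    """Two-sided sign-flip permutation p-value for the signed-rank statistic."""
    d = (post - pre).astype(float)
    d = d[d != 0]
    if d.size == 0:
        return 1.0
    ranks = rankdata(np.abs(d))
    observed = np.sum(np.sign(d) * ranks)
    null = rng.choice([-1.0, 1.0], size=(n_perm, d.size)) @ ranks
    return (np.sum(np.abs(null) >= abs(observed)) + 1) / (n_perm + 1)


def detection_probability(rate_hz, effect_hz, n_trials=30, epoch_s=1.0,
                          n_sim=200, alpha=0.05, seed=0):
    """Fraction of simulated Poisson neurons whose rate change is detected at level alpha."""
    rng = np.random.default_rng(seed)
    post_rate = max(rate_hz + effect_hz, 0.0)      # rates cannot go below zero
    detected = 0
    for _ in range(n_sim):
        pre = rng.poisson(rate_hz * epoch_s, n_trials)
        post = rng.poisson(post_rate * epoch_s, n_trials)
        if sign_flip_pvalue(pre, post, rng) < alpha:
            detected += 1
    return detected / n_sim


p_null = detection_probability(2.0, 0.0)     # false-positive rate, expected near or below alpha
p_strong = detection_probability(5.0, 5.0)   # large increase at a high baseline rate
```

      Looping `detection_probability` over the ranges in the response (baseline 0.1-5.0 Hz, effects ± 0.1-0.7 Hz) yields a detection-probability grid; the values depend on the assumed trial count and test, so they should be read qualitatively.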

      Change to Text:

      Created Figure S7C (Detection probability analysis)

      Caption: The same permutation-based analyses reported in the manuscript were repeated under different control conditions… (C) Visualization of the predicted probability of detecting modulation across synthetic neurons with variable firing rates and modulation effect sizes; FR = firing rate.

      Lines 223-224 have been added to the Methods section titled “Firing Rate Control Analyses”: “We performed a series of control analyses to test whether our approach to firing rate detection was robust…”

      A description of the simulation has been incorporated into the same section as above. Lines 234-237: “Finally, to better understand the tradeoffs with our statistical approach, we generated synthetic data with different baseline firing rates (0.1-5.0 Hz) and effect sizes (± 0.1-0.7 Hz), then simulated the likelihood of detecting modulation across variable conditions (Figure S7C).”

      The description of the results from the control analyses has been updated. Lines 330-339: “Finally, we performed three supplementary analyses to evaluate the robustness of our approach to detecting firing rate modulation: a sensitivity analysis assessing the proportion of modulated units at different firing rate thresholds for inclusion/exclusion, a data dropout analysis designed to control for the possibility that non-physiological stimulation artifacts may preclude the detection of temporally adjacent spiking, and a synthetic detection probability analysis. These results recapitulate our observation that units with higher baseline firing are most likely to exhibit modulation (though the probability of detecting modulation is lower for sparsely active neurons) and suggest that suppression in firing rate is not solely attributable to amplifier saturation following stimulation (Figure S7).”

      Comment 2: “Readers can benefit from understanding with more details the locations chosen for stimulation - in light of previous studies that found differences between effects based on proximity to white matter (For example - PMID 32446925, Mohan et al, Brain Stimul. 2020 and PMID 33279717 Mankin et al Brain Stimul. 2021).”

      This has been addressed in the above response to Reviewer 1’s comment 1e.

      Change to Text:

      See changes related to Reviewer 1, comment 1e.

      Comment 3: “Missing information in the manuscript…”

      3a: “Images of stimulation anatomical locations for all subjects included in this study. Ideally information about the impedance of the contacts to be able to calculate the actual current used.”

      As requested, we have provided an image from the coronal T1 MRI sequence, which highlights the position of the stimulated contacts for each of the 16 patients. Though we did not measure the impedances directly, the stimulation was current-controlled, which ensured that the desired current and charge density were consistent regardless of the tissue or electrode impedance.

      Change to Text:

      Created Figure S1 (Anatomical location of stimulated electrodes).

      Caption: A coronal slice from the T1-weighted MRI scan is shown for each patient who participated in the study (n = 16). Electrode contacts within the same plane of the image are shown with blue circles, and the bipolar pair of stimulated contacts within the basolateral amygdala is highlighted in red.

      Lines 144-145 have been edited to reflect that the delivered stimulation was current-controlled: “Specifically, we administered current-controlled, charge-balanced, …”

      3b: “The studied population is epilepsy patients, and the manuscript lacks description of their condition, proximity to electrodes included in the study to pathological areas, and the number of units from each patient/hemisphere.”

      We agree that additional information regarding patient demographics, experimental details, and clinical characteristics would further contextualize this unique patient population. A new table has been included, which contains the following information: patient ID, sex, age, number of experimental sessions, number of SEEG leads (and number of microelectrodes), number of detected units (left vs. right hemisphere), and suspected seizure onset zone.

      Change to Text:

      Created Table S1 (Patient demographics and clinical characteristics).

      Lines 258-259 have been added: “…(see Table S1 for patient demographics).”

      3c: “I haven't seen any comments on code availability (calculating modulation indices and statistics) and data sharing.”

      For clarification, a section titled Resource Availability is already appended to the end of the manuscript following the Conclusion, which describes the data and code availability.

      Change to Text:

      None

      3d: “Small comment - Figure legend 3E - Define gray markers (non-modulated units?)”

      Thank you for highlighting this omission. We have updated the relevant figure caption.

      Change to Text:

      The following has been added to the Figure 3 caption: “…whereas units without a significant change in activity are shown in grey.”

    1. eLife Assessment

      This study presents an important discovery regarding the diversity and evolution of gall-forming microbial effectors. Supported by convincing computational structural predictions and analyses, the research provides insights into the unique mechanisms by which gall-forming microbes exert their pathogenicity in plants. This study also offers guidance that is of value for future studies on pathogen effector function and co-evolution with host plants.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript presents a comprehensive structure-guided secretome analysis of gall-forming microbes, providing valuable insights into effector diversity and evolution. The authors have employed AlphaFold2 to predict the 3D structures of the secretome from selected pathogens and conducted a thorough comparative analysis to elucidate commonalities and unique features of effectors among these phytopathogens.

      Strengths:

      The discovery of conserved motifs such as 'CCG' and 'RAYH' and their central role in maintaining the overall fold is an insightful finding. Additionally, the discovery of a nucleoside hydrolase-like fold conserved among various gall-forming microbes is interesting.

      Weaknesses:

      Important conclusions are not verified by experiments.

      Comments on revisions: I acknowledge the authors' revision efforts.

    3. Reviewer #2 (Public review):

      Summary:

      Soham Mukhopadhyay et al. investigated the protein folding of the secretome from gall-forming microbes using the AI-based structure-modeling tool AlphaFold2. Their study analyzed six gall-forming species, including two Plasmodiophorid species and four others spanning different kingdoms, along with one non-gall-forming Plasmodiophorid species, Polymyxa betae. The authors found no effector fold specifically conserved among gall-forming pathogens, leading to the conclusion that their virulence strategies are likely achieved through diverse mechanisms. However, they identified an expansion of the Ankyrin repeat family in two gall-forming Plasmodiophorid species, with a less pronounced presence in the non-gall-forming Polymyxa betae. Additionally, the study revealed that known effectors such as CCG and AvrSen1 belong to sequence-unrelated but structurally similar (SUSS) effector clusters.

      Strengths:

      (1) The bioinformatics analyses presented in this study are robust, and the AlphaFold2-derived resources deposited in Zenodo provide valuable resources for researchers studying plant-microbe interactions. The manuscript is also logically organized and easy to follow.

      (2) The inclusion of the non-gall-forming Polymyxa betae strengthens the conclusion that no effector fold is specifically conserved in gall-forming pathogens and highlights the specific expansion of the Ankyrin repeat family in gall-forming Plasmodiophorids.

      (3) Figure 4a and 4b effectively illustrate the SUSS effector clusters, providing a clear visual representation of this finding.

      (4) Figure 1 is a well-designed, comprehensive summary of the number and functional annotations of putative secretomes in gall-forming pathogens. Notably, it reveals that more than half of the analyzed effectors lack known protein domains in some pathogens, yet some were annotated based on their predicted structures, despite the absence of domain annotations.

      Weaknesses:

      (1) The effector families discussed in this paper remain hypothetical in terms of their functional roles, which is understandable given the challenges of demonstrating their functions experimentally. However, this highlights the need for experimental validation as a next step.

      Authors' response: Thank you. Yes, there is a lot of work to do in the coming years.

      Reviewer's response: Incorporating experimental validation substantially strengthened the manuscript. Did you try the AlphaFold-Multimer prediction of the interaction between PBTT_00818 and the GroES-like protein? Does the model indicate a high-confidence interface?
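      For reference, interface confidence in AlphaFold-Multimer models is usually judged from the ipTM score and from the predicted aligned error (PAE) between residues of different chains. The sketch below assumes the PAE matrix and the ipTM/pTM values have already been loaded from the prediction output (file formats vary between pipelines and are not shown). The 0.8 × ipTM + 0.2 × pTM weighting is the ranking confidence described for AlphaFold-Multimer, and no particular cutoff for a “high-confidence” interface is implied.

```python
import numpy as np


def interface_pae(pae, len_chain_a):
    """Mean predicted aligned error (Å) over the two inter-chain blocks of a PAE matrix.

    pae: square matrix (total residues x total residues), chain A residues first.
    Lower values suggest a more confidently placed interface.
    """
    pae = np.asarray(pae, dtype=float)
    a_to_b = pae[:len_chain_a, len_chain_a:]
    b_to_a = pae[len_chain_a:, :len_chain_a]
    return float((a_to_b.mean() + b_to_a.mean()) / 2.0)


def multimer_model_confidence(iptm, ptm):
    """AlphaFold-Multimer ranking confidence: 0.8 * ipTM + 0.2 * pTM."""
    return 0.8 * iptm + 0.2 * ptm
```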

      (2) Some analyses, such as those in Figure 4e, emphasize motifs derived from sequence alignments of SUSS effector clusters. Since these effectors are sequence-unrelated, sequence alignments might be unreliable. It would be more rigorous to perform structure-based alignments in addition to sequence-based ones for motif confirmation. For instance, methods described in Figure 3E of de Guillen et al. (2015, https://doi.org/10.1371/journal.ppat.1005228) or tools like Foldseek could be useful for aligning structures of multiple sequences.

      Authors' response: In Fig. 4e, we highlight the conserved cysteine residues. While there is no clearly conserved overall motif, the figure illustrates that despite the high sequence divergence, the key cysteines involved in disulfide-bridge formation are consistently conserved across the sequences.

      Reviewer's response: Understood. Nevertheless, if a reliable sequence alignment can indeed be generated, I would interpret this to mean that the CCG effectors constitute a highly diversified family rather than being truly sequence unrelated. By comparison, members of the MAX effector family share a common fold, yet their sequences are so divergent that sequence alignment is impossible.
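      The structure-based comparison suggested by the reviewer rests on a common core step. Once a structural aligner (for example, Foldseek or TM-align) has paired residues between two models independently of sequence, the superposition can be scored by RMSD after an optimal rigid-body fit. The Kabsch sketch below assumes that residue correspondence is already given; it does not perform the alignment search, and the toy coordinates are random.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two N x 3 coordinate sets after optimal rigid superposition (Kabsch).

    Rows must already correspond (e.g., aligned C-alpha atoms); finding that
    correspondence is the job of a structural aligner.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    H = Pc.T @ Qc                                   # 3 x 3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))          # exclude improper rotations (reflections)
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    residual = Pc @ R.T - Qc
    return float(np.sqrt(np.mean(np.sum(residual ** 2, axis=1))))


# Toy check: a rotated and translated copy superposes perfectly; a mirror image does not
rng = np.random.default_rng(3)
coords = rng.normal(size=(20, 3))
angle = 0.7
rot_z = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                  [np.sin(angle), np.cos(angle), 0.0],
                  [0.0, 0.0, 1.0]])
moved = coords @ rot_z.T + np.array([5.0, -2.0, 1.0])
mirrored = coords * np.array([1.0, 1.0, -1.0])
rmsd_moved = kabsch_rmsd(coords, moved)
rmsd_mirror = kabsch_rmsd(coords, mirrored)
```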

      (3) When presenting AlphaFold-generated structures, it is essential to include confidence scores such as pLDDT and PAE. For example, in Figure 1D of Derbyshire and Raffaele (2023, https://doi.org/10.1038/s41467-023-40949-9), the structural representations were colored red due to their high pLDDT scores, emphasizing their reliability.

      Authors' response: Thank you for the observation. Due to the restrictive parameters used in our analysis, over 90% of the structure would appear red. For this reason, we chose not to include the color scale, as it would not provide additional informative value in this context.

      Reviewer's response: Understood.
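      Readers who want to check model confidence without a color scale can summarize it directly. AlphaFold writes per-residue pLDDT into the B-factor field of its PDB output, so the sketch below reads that column (columns 61-66) from C-alpha ATOM records and reports the fraction of residues above a cutoff. The 90 cutoff (commonly read as “very high” confidence) and the four-line demo record are illustrative assumptions.

```python
def ca_plddt(pdb_text):
    """Per-residue pLDDT read from the B-factor column (cols 61-66) of C-alpha ATOM records."""
    scores = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores.append(float(line[60:66]))
    return scores


def fraction_confident(scores, cutoff=90.0):
    """Fraction of residues with pLDDT above cutoff."""
    return sum(score > cutoff for score in scores) / len(scores) if scores else 0.0


def _atom_line(serial, atom_name, res_seq, b_factor):
    # Minimal fixed-column ATOM record; only the fields read above matter here.
    return (f"ATOM  {serial:5d} {atom_name:<4s} ALA A{res_seq:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.00:6.2f}{b_factor:6.2f}")


demo_pdb = "\n".join([
    _atom_line(1, " N", 1, 95.1),
    _atom_line(2, " CA", 1, 95.3),
    _atom_line(3, " CA", 2, 92.0),
    _atom_line(4, " CA", 3, 61.5),
])
demo_scores = ca_plddt(demo_pdb)
```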

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      This manuscript presents a comprehensive structure-guided secretome analysis of gall-forming microbes, providing valuable insights into effector diversity and evolution. The authors have employed AlphaFold2 to predict the 3D structures of the secretome from selected pathogens and conducted a thorough comparative analysis to elucidate commonalities and unique features of effectors among these phytopathogens.

      Strengths:

      The discovery of conserved motifs such as 'CCG' and 'RAYH' and their central role in maintaining the overall fold is an insightful finding. Additionally, the discovery of a nucleoside hydrolase-like fold conserved among various gall-forming microbes is interesting.

      Weaknesses:

      Important conclusions are not verified by experiments.

      Thank you very much. There are many aspects of this study that could be further validated, each potentially requiring years of work. Therefore, we chose to focus on two specific questions: (1) Are AlphaFold-Multimer predictions accurate? (2) Can ANK proteins target more than one host protein? In particular, we focused on identifying putative targets for one of the ankyrin repeat proteins, PBTT_00818 (Fig. 6). Using one-by-one yeast two-hybrid (Y2H) assays, we tested the AlphaFold-Multimer prediction of an interaction between PBTT_00818 and MPK3. No interaction was detected in yeast, suggesting that it might not take place under those conditions.

      This negative result led us to perform a Y2H screen using an Arabidopsis cDNA library, which identified a GroES-like protein, highly expressed in roots, as a potential target of the ANK effector. Surprisingly, both the PBTT_00818–MPK3 and PBTT_00818–GroES-like protein interactions were later confirmed in planta using BiFC assays. These findings suggest two key points: (1) AlphaFold predictions can be accurate for ANK proteins, and (2) ANK domains, known for mediating protein-protein interactions, may enable these effectors to target multiple host proteins.

      Although the precise biological implications remain unclear, it is possible that ANK proteins act as scaffolds or adaptors for other effectors during infection. The validations presented here open exciting avenues for further research into the role of ANK proteins in Plasmodiophorid pathogenesis and gall formation. These results are presented in the revised preprint (Fig. 7, Table S12, and Figs. S7-S8).

      Reviewer #2 (Public review):

      Summary:

      Soham Mukhopadhyay et al. investigated the protein folding of the secretome from gall-forming microbes using the AI-based structure modeling tool AlphaFold2. Their study analyzed six gall-forming species, including two Plasmodiophorid species and four others spanning different kingdoms, along with one non-gall-forming Plasmodiophorid species, Polymyxa betae. The authors found no effector fold specifically conserved among gall-forming pathogens, leading to the conclusion that their virulence strategies are likely achieved through diverse mechanisms. However, they identified an expansion of the Ankyrin repeat family in two gall-forming Plasmodiophorid species, with a less pronounced presence in the non-gall-forming Polymyxa betae. Additionally, the study revealed that known effectors such as CCG and AvrSen1 belong to sequence-unrelated but structurally similar (SUSS) effector clusters.

      Strengths:

      (1) The bioinformatics analyses presented in this study are robust, and the AlphaFold2-derived resources deposited in Zenodo provide valuable resources for researchers studying plant-microbe interactions. The manuscript is also logically organized and easy to follow.

      (2) The inclusion of the non-gall-forming Polymyxa betae strengthens the conclusion that no effector fold is specifically conserved in gall-forming pathogens and highlights the specific expansion of the Ankyrin repeat family in gall-forming Plasmodiophorids.

      (3) Figure 4a and 4b effectively illustrate the SUSS effector clusters, providing a clear visual representation of this finding.

      (4) Figure 1 is a well-designed, comprehensive summary of the number and functional annotations of putative secretomes in gall-forming pathogens. Notably, it reveals that more than half of the analyzed effectors lack known protein domains in some pathogens, yet some were annotated based on their predicted structures, despite the absence of domain annotations.

      Weaknesses:

      (1) The effector families discussed in this paper remain hypothetical in terms of their functional roles, which is understandable given the challenges of demonstrating their functions experimentally. However, this highlights the need for experimental validation as a next step.

      Thank you. Yes, there is a lot of work to do in the coming years.

      (2) Some analyses, such as those in Figure 4e, emphasize motifs derived from sequence alignments of SUSS effector clusters. Since these effectors are sequence-unrelated, sequence alignments might be unreliable. It would be more rigorous to perform structure-based alignments in addition to sequence-based ones for motif confirmation. For instance, the methods described in Figure 3E of de Guillen et al. (2015, https://doi.org/10.1371/journal.ppat.1005228) or tools like Foldseek could be useful for aligning multiple predicted structures.

      In Fig. 4e, we highlight the conserved cysteine residues. While there is no clearly conserved overall motif, the figure illustrates that despite the high sequence divergence, the key cysteines involved in disulfide bridge formation are consistently conserved across the sequences.
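      As a hedged illustration of what "conserved cysteine residues" means in alignment terms, the sketch below flags the columns of a gapped multiple alignment in which every sequence (or a chosen fraction of sequences) carries a cysteine. The function name, the `min_fraction` parameter and the toy alignment are hypothetical; the function works equally on sequence- or structure-based alignments and is not the authors' pipeline.

```python
from typing import List


def conserved_cysteine_columns(alignment: List[str], min_fraction: float = 1.0) -> List[int]:
    """Return 0-based alignment columns in which at least `min_fraction`
    of the aligned sequences carry a cysteine (gaps count as non-cysteine)."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    columns = []
    for col in range(length):
        # Count sequences with a cysteine at this column.
        n_cys = sum(1 for seq in alignment if seq[col].upper() == "C")
        if n_cys / len(alignment) >= min_fraction:
            columns.append(col)
    return columns


# Toy gapped alignment (hypothetical sequences).
toy = ["AC-GC", "SC-AC", "TCGKC"]
print(conserved_cysteine_columns(toy))  # [1, 4]
```

      A cysteine that is present in a sequence but placed in a different column by the aligner lowers that column's fraction, so a relaxed `min_fraction` or a structure-based alignment may be needed to recover it.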

      (3) When presenting AlphaFold-generated structures, it is essential to include confidence scores such as pLDDT and PAE. For example, in Figure 1D of Derbyshire and Raffaele (2023, https://doi.org/10.1038/s41467-023-40949-9), the structural representations were colored red due to their high pLDDT scores, emphasizing their reliability.

      Thank you for the observation. Due to the restrictive parameters used in our analysis, over 90% of the structure would appear red. For this reason, we chose not to include the color scale, as it would not provide additional informative value in this context.
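      AlphaFold writes per-residue pLDDT into the B-factor field of its PDB output, so the confidence information discussed above is already contained in standard model files. A minimal sketch for summarising it, assuming AlphaFold-style PDB files (mmCIF would need a different parser): it reads CA atoms by fixed-column parsing, maps scores to the confidence bands used by the AlphaFold Protein Structure Database (very high > 90, confident 70-90, low 50-70, very low < 50), and reports the fraction of very-high-confidence residues, which is one way to quantify a statement such as "over 90% of the structure would appear red". All function names are hypothetical.

```python
from typing import List, Tuple


def ca_plddt(pdb_text: str) -> List[Tuple[str, int, float]]:
    """Read (chain, residue number, pLDDT) for every CA atom.

    Assumes an AlphaFold-style PDB file, where pLDDT is stored in the
    B-factor field (fixed columns 61-66 of each ATOM record).
    """
    records = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain = line[21]
            resnum = int(line[22:26])
            plddt = float(line[60:66])
            records.append((chain, resnum, plddt))
    return records


def plddt_band(score: float) -> str:
    """Map a pLDDT value to the confidence bands used by the AlphaFold DB."""
    if score > 90:
        return "very high"
    if score > 70:
        return "confident"
    if score > 50:
        return "low"
    return "very low"


def fraction_very_high(records: List[Tuple[str, int, float]]) -> float:
    """Fraction of residues with pLDDT > 90."""
    if not records:
        return 0.0
    return sum(1 for _, _, score in records if score > 90) / len(records)
```

      Applied to each deposited model, `fraction_very_high` yields one number per effector that could be reported alongside the figures even when the structures are not recoloured.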

      Reviewer #1 (Recommendations for the authors):

      Experimental validation of the significance of 'CCG' and 'RAYH' motifs would further strengthen this study.

      Regarding the Mig1-like protein in Ustilago maydis, the presence of four conserved cysteine residues that are pivotal for maintaining the stability of its folded structure raises an intriguing question. Specifically, while many Mig cluster effectors contain four cysteine residues that form two conserved disulfide bridges, this arrangement appears to be absent from the Mig protein itself. The authors speculate that these four cysteine residues form two conserved disulfide bonds that are crucial for the stability of the Mig fold; however, this hypothesis remains untested. To test this prediction, it would be prudent to model mutations of the cysteine residues involved in the disulfide bonds in Mig and use molecular dynamics simulations to compare folding stability before and after mutation.

      Mig-1 does contain the four conserved cysteine residues responsible for forming disulfide bridges. However, due to the high divergence among Mig-1-like sequences, the alignment software was unable to properly align all the cysteine residues. As a result, Mig-1 may appear to lack these conserved cysteines in the alignment, although they are indeed present upon individual inspection. This is an area that research groups working with U. maydis as a model could explore further to expand our understanding of this effector family.

      Could you please clarify why Ankyrins and LRRs in Arabidopsis thaliana are discussed (line 252)? Additionally, what are the structural and functional differences between the LRR sequences of P. brassicae and those of the host plants?

      This sentence refers to the identification of the ANK motif in P. brassicae and S. spongospora, not in Arabidopsis thaliana. While the hydrophobic core of the ANK domains appears conserved between the host and the pathogen, the surface residues are highly polymorphic.

      The evidence supporting the interaction between the ANK effector and Arabidopsis immunity-related proteins, as predicted using AlphaFold-Multimer, is currently limited. To enhance the reliability of these data, it is advisable for the authors to select several pairs of proteins predicted to interact for experimental verification.

      We conducted a large-scale yeast two-hybrid (Y2H) screen using the ANK domain effector PBTT_00818, which was selected due to its high ipTM+pTM score. The Y2H interactions were subsequently validated through BiFC assays. Our results show that PBTT_00818 interacts with Arabidopsis MPK3 in the nucleus, consistent with predictions from the AlphaFold2-multimer model. PBTT_00818 was also found to target AT3G56460, a GroES-like zinc-binding alcohol dehydrogenase that also localizes to the nucleus.
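      In AlphaFold-Multimer the default model-ranking confidence is the weighted sum 0.8*ipTM + 0.2*pTM. Assuming that is the "ipTM+pTM score" meant in the response above, shortlisting effector-target pairs for experimental testing could be sketched as follows. The class, function names and example values are hypothetical and this is not the authors' selection procedure.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MultimerPrediction:
    pair: str    # hypothetical label, e.g. "effector|host protein"
    iptm: float  # interface predicted TM-score
    ptm: float   # predicted TM-score of the whole complex


def ranking_confidence(pred: MultimerPrediction) -> float:
    # AlphaFold-Multimer's default model confidence: 0.8 * ipTM + 0.2 * pTM
    return 0.8 * pred.iptm + 0.2 * pred.ptm


def rank_predictions(preds: List[MultimerPrediction]) -> List[MultimerPrediction]:
    """Sort candidate complexes from most to least confident."""
    return sorted(preds, key=ranking_confidence, reverse=True)
```

      Because ipTM carries most of the weight, the ranking mainly reflects confidence in the predicted interface rather than in each partner's fold, which is the quantity of interest when choosing pairs for Y2H or BiFC validation.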

      While the manuscript is well-composed, certain sections could be enhanced for clarity and readability. For example, the discussion section could be expanded to include a more in-depth analysis of the implications of the findings for understanding the virulence mechanisms of gall-forming microbes. Additionally, a comparison of the findings with previous studies on related pathogens would provide a more comprehensive perspective.

      Certain sections of the discussion have been expanded. However, we chose to focus on the novel aspects of the study and to avoid comparisons with other plant pathogens, as those mechanisms are already well known and extensively studied. Studies using AlphaFold in plant pathology are also limited.

      Reviewer #2 (Recommendations for the authors):

      The results of clustering analyses are highly dependent on the chosen thresholds. Given that the authors provide clear and well-designed visualizations of SUSS effectors in Figures 4a and 4b, applying the same presentation methods to Figures 5a and 5b could make these analyses more convincing.

      We were able to generate the all-vs-all matrix for Figures 4a and 4b because it involved only 13 proteins. However, Figure 5b includes over 40 effectors, making it impractical to visualize the data in the same way. Instead, we presented the sequence-based clusters as nodes and connected them based on structural similarity.
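      The reviewer's point that clustering depends on the chosen threshold can be illustrated with single-linkage clustering: clusters are the connected components of a graph whose edges are pairs with a similarity score at or above the threshold. In the minimal sketch below the scores stand in for a structural similarity measure such as TM-score; the effector labels, scores and thresholds are hypothetical, and the authors' actual clustering method is not described in this excerpt.

```python
from typing import Dict, List, Tuple


def cluster_by_threshold(
    items: List[str],
    pair_scores: Dict[Tuple[str, str], float],
    threshold: float,
) -> List[List[str]]:
    """Single-linkage clustering: connected components of the graph whose
    edges are pairs with a similarity score >= threshold (union-find)."""
    parent = {x: x for x in items}

    def find(x: str) -> str:
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), score in pair_scores.items():
        if score >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_a] = root_b

    groups: Dict[str, List[str]] = {}
    for x in items:
        groups.setdefault(find(x), []).append(x)
    # Sort members and clusters so the output is deterministic.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])
```

      For example, with scores e1-e2 = 0.8, e2-e3 = 0.55 and e3-e4 = 0.3, a threshold of 0.5 gives the clusters {e1, e2, e3} and {e4}, whereas 0.7 also splits off e3, so reporting results at more than one threshold shows how robust the clusters are.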

    1. eLife Assessment

      This valuable study presents computational analyses of over 5,000 predicted extant and ancestral nitrogenase structures. The data analyses are convincing and offer unique insights into the relationship between structural evolution and environmental and biological phenotypes. The data generated in this study provide a vast resource that can serve as a starting point for studies of reconstructed and extant nitrogenases.

    2. Reviewer #1 (Public review):

      This was a clearly written manuscript that did an excellent job summarizing complex data. In this manuscript, Cuevas-Zuviría et al. use protein modeling to generate over 5,000 predicted structures of nitrogenase components, encompassing both extant and ancestral forms across different clades. The study highlights that key insertions define the various Nif groups. The authors also examined the structures of three ancestral nitrogenase variants that had been previously identified and experimentally tested. These ancestral forms were shown in earlier studies to exhibit reduced activity in Azotobacter vinelandii, a model diazotroph.

    3. Reviewer #2 (Public review):

      Summary:

      This work aims to study the evolution of nitrogenases, to understand how their structure and function adapted to changes in the environment, including oxygen levels and metal availability.

      The study predicts > 3000 structures of nitrogenases, corresponding to extant, ancestral and alternative ancestral sequences. It is observed that structural variations in the nitrogenases correlate with phylogenetic relationships. The amount of data generated in this study represents a massive and admirable undertaking. The study also provides strong insight into how structural evolution correlates with environmental and biological phenotypes.

    4. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      Comments on revisions:

      I appreciate the authors responding to my comments. I think Fig. S10 helps put the structural data into more context. It would be helpful to make clearer in the legend what proteins are being compared, especially in 10C.

      Although I can see why the authors focus on the NifK extension and its potential connection to oxygen protection, I would point out that Vnf and Anf do not have this extension in their K subunit, and you find both Vnf and Anf in aerobic and facultative anaerobic diazotrophs. This is a minor point, but I think it is important to mention in the discussion.

      We thank the reviewer for their thoughtful comments. We have now added a line to the Discussion following their recommendation and moved Figure S10 to the main text.

      Reviewer #2 (Public review):

      Summary: 

      This work aims to study the evolution of nitrogenases, to understand how their structure and function adapted to changes in the environment, including oxygen levels and metal availability.

      The study predicts > 3000 structures of nitrogenases, corresponding to extant, ancestral and alternative ancestral sequences. It is observed that structural variations in the nitrogenases correlate with phylogenetic relationships. The amount of data generated in this study represents a massive and admirable undertaking. The study also provides strong insight into how structural evolution correlates with environmental and biological phenotypes. 

      We thank the reviewer for their summary and positive appraisal.

    1. eLife Assessment

      This fundamental study characterizes the mechanics and stability of bolalipids from archaeal membranes using a minimalist, physics-based computational model. The authors present a robust mesoscale model of bolalipids-containing membranes, systematically evaluating it across diverse membrane configurations. The results are compelling, demonstrating that the incorporation of bolalipids and regular bilayer lipids in archaeal membranes significantly enhances membrane fluidity and structural stability.

    2. Reviewer #2 (Public review):

      Summary:

      The authors aimed to understand the biophysical properties of archaeal membranes made of bolalipids. Bacterial and eukaryotic membranes are made of lipids that self-assemble into bilayers. Archaea, instead, use bolalipids, lipids that have two headgroups and can span the entire bilayer. The authors wanted to determine whether the unique characteristics of archaea, which are often extremophiles, are in part due to the fact that their membranes contain bolalipids.

      The authors develop a minimal computational model to compare the biophysics of bilayers made of lipids, bolalipids, and mixtures of the two. Their model enables them to determine essential parameters such as bilayer phase diagrams, mechanical moduli, and the bilayer behavior upon cargo inclusion and remodeling.

      The authors demonstrate that bolalipid bilayers behave as binary mixtures, containing bolalipids organized either in a straight conformation, spanning the entire bilayer, or in a U-shaped one, confined to a single leaflet. This dynamic mixture allows bolalipid bilayers to be very sturdy while still permitting remodeling. However, remodeling is energetically more expensive than with standard lipids. The authors speculate that this might be why regular bilayer-forming lipids became more abundant over the course of evolution.

      Strengths:

      This is a wonderful paper, a very fine piece of scholarship. It is interesting from the point of view of biology, biophysics, and material science. The authors mastered the modeling and analysis of these complex systems. The evidence for their findings is really strong and complete. The paper is written superbly, the language is precise and the reading experience very pleasant. The plots are very well thought out.

      Weaknesses:

      None. The authors have addressed all the potential weaknesses that were raised by the reviewers.

    3. Reviewer #3 (Public review):

      Summary:

      The authors have studied the mechanics of bolalipid and archaeal mixed-lipid membranes via comprehensive molecular dynamics simulations. The Cooke-Deserno 3-bead-per-lipid model is extended to bolalipids with 6 beads. Phase diagrams, bending rigidity, mechanical stability of curved membranes, and cargo uptake are studied. Effects such as the formation of U-shaped bolalipids, pore formation in highly curved regions, and changes in membrane rigidity are studied and discussed. The main aim has been to show how the mixture of bolalipids and regular bilayer lipids in archaeal membrane models enhances the fluidity and stability of these membranes.
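      For readers unfamiliar with the Cooke-Deserno model, the sketch below evaluates its standard non-bonded potentials as published for three-bead lipids: a WCA-type repulsion cut at r_c = 2^(1/6) b (with b = 0.95 sigma for head-head and head-tail pairs and b = sigma for tail-tail pairs) and a cosine-squared tail-tail attraction whose range w_c controls membrane fluidity. The bolalipid-specific parameters used by the authors are not given in this excerpt; function names and defaults here are illustrative, in reduced units.

```python
import math


def wca_repulsion(r: float, b: float, eps: float = 1.0) -> float:
    """Purely repulsive, shifted Lennard-Jones (WCA) term, cut at r_c = 2**(1/6) * b."""
    r_c = 2 ** (1 / 6) * b
    if r > r_c:
        return 0.0
    x6 = (b / r) ** 6
    return 4 * eps * (x6 * x6 - x6 + 0.25)


def tail_attraction(r: float, w_c: float, sigma: float = 1.0, eps: float = 1.0) -> float:
    """Cosine-squared tail-tail attraction: -eps below r_c, decaying to 0 over width w_c."""
    r_c = 2 ** (1 / 6) * sigma
    if r < r_c:
        return -eps
    if r <= r_c + w_c:
        return -eps * math.cos(math.pi * (r - r_c) / (2 * w_c)) ** 2
    return 0.0


def tail_tail(r: float, w_c: float, sigma: float = 1.0, eps: float = 1.0) -> float:
    """Total tail-tail pair potential (repulsion with b = sigma plus attraction)."""
    return wca_repulsion(r, sigma, eps) + tail_attraction(r, w_c, sigma, eps)
```

      Head-head and head-tail pairs would use `wca_repulsion(r, 0.95)` alone; the slightly smaller head size is what favours bilayer rather than micellar packing in the original model.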

      The authors have presented a wide range of simulation results for different membrane conditions and conformations. Analyses and findings are presented clearly and concisely. Figures, supplementary information and movies are of very high quality and very well present what has been studied. The manuscript is well written and is easy to follow.

      The authors have provided detailed response to the points I raised on the first version and have revised their manuscript accordingly. Hence, I only mention what, in my opinion, still deserves to be noted.

      Comments:

      I previously raised an issue with respect to the use of the Hamm-Kozlov model for fitting the power spectrum of membrane undulations. The authors provided very nice arguments against my concerns. For the sake of completeness, I include a simple scenario that better highlights the issue:

      The tilt contribution to the Helfrich Hamiltonian can be written as a quadratic term 1/2 k_t |T|^2, where T is a tilt vector field, defined as the difference between the surface normal and the director field aligned with the lipid orientations. In the small-deviation Monge description, with z = h(x, y) as the height function, the (unnormalized) surface normal has the form N = (-dh/dx, -dh/dy, 1). Now assume a director field n = (b_x, b_y, 1) with small components b_x and b_y. The tilt contribution to the energy then reads 1/2 k_t |N - n|^2 = 1/2 k_t [|grad h|^2 + 2 b . grad h + |b|^2]. The first term, 1/2 k_t |grad h|^2, has the same form as the surface-tension term 1/2 \sigma |grad h|^2 obtained from the (1 + 1/2 |grad h|^2) approximation to the area element. Therefore, if only height fluctuations are analysed and the membrane actually carries some surface tension, the tilt contribution cannot be distinguished from the tension contribution in the linear Monge gauge.

      However, considering that the authors have made sure that the membrane is indeed tensionless, this argument is settled.
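      For reference, the tilt argument above can be written out term by term, using the same unnormalized vectors as in the comment (a sketch, not taken from the manuscript). The h-independent |b|^2 term does not couple to the height field, and the two contributions that are quadratic in grad h alone share a single coefficient, which is why a height-only analysis of a tensioned membrane cannot separate them.

```latex
\begin{aligned}
\mathbf{N}-\mathbf{n} &= \left(-\partial_x h - b_x,\; -\partial_y h - b_y,\; 0\right),\\
\tfrac{1}{2}k_t\,|\mathbf{N}-\mathbf{n}|^2 &= \tfrac{1}{2}k_t\left(|\nabla h|^2 + 2\,\mathbf{b}\cdot\nabla h + |\mathbf{b}|^2\right),\\
\sigma\sqrt{1+|\nabla h|^2} &\approx \sigma + \tfrac{1}{2}\sigma\,|\nabla h|^2,\\
\text{terms quadratic in } \nabla h \text{ alone:}\quad &\tfrac{1}{2}\left(\sigma + k_t\right)|\nabla h|^2 .
\end{aligned}
```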

      I had also raised an issue about the correct NpT sampling in the simulations, and I'm glad that the authors also set up more rigorously thermostatted/barostatted simulations to check the validity of their findings.

      Also, from the SI, I previously noted that the authors had neglected the longest wavelength mode because it was not equilibrated. This was an important problem and the authors looked into it and ran more simulations that were better equilibrated.

      The analysis of energy of U-shaped lipids with the linear model E=c_0 + c_1 * k_bola is indeed very interesting. I am glad that the authors have expanded this analysis and included mean energy measurements.
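      The linear model E = c_0 + c_1 * k_bola mentioned above is an ordinary least-squares line fit. A minimal sketch, assuming E is the mean energy measured at each value of k_bola (taken here to be the bolalipid stiffness parameter varied in the simulations, which is an assumption); the function name and data are hypothetical.

```python
from typing import Sequence, Tuple


def fit_linear(k_bola: Sequence[float], energy: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of E = c0 + c1 * k_bola; returns (c0, c1)."""
    n = len(k_bola)
    if n != len(energy) or n < 2:
        raise ValueError("need at least two paired (k_bola, E) values")
    mean_k = sum(k_bola) / n
    mean_e = sum(energy) / n
    s_kk = sum((k - mean_k) ** 2 for k in k_bola)
    if s_kk == 0:
        raise ValueError("k_bola values must not all be identical")
    s_ke = sum((k - mean_k) * (e - mean_e) for k, e in zip(k_bola, energy))
    c1 = s_ke / s_kk          # slope: energy cost per unit stiffness
    c0 = mean_e - c1 * mean_k  # intercept: energy at k_bola = 0
    return c0, c1


# Hypothetical, exactly linear data: E = 1 + 2 * k_bola
print(fit_linear([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]))  # (1.0, 2.0)
```

      With real simulation data the residuals of this fit, not just c_0 and c_1, indicate whether a linear dependence on k_bola is adequate.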

    1. eLife Assessment

      This compelling work describes how the cell cycle-regulating phosphatase subunit, RepoMan, is regulated by the oxygen-dependent, metabolite-sensing hydroxylase PHD1. The characterisation of how proline hydroxylation alters signalling at the molecular and cellular level provides important evidence to enhance our understanding of how 2-oxoglutarate-dependent dioxygenases influence the cell cycle and mitosis.

    2. Reviewer #1 (Public review):

      Summary:

      The study by Druker et al. shows that siRNA depletion of PHD1, but not PHD2, increases H3T3 phosphorylation in cells arrested in prometaphase. Additionally, the expression of wild-type RepoMan, but not the RepoMan P604A mutant, restored normal H3T3 phosphorylation localization in cells arrested in prometaphase. Furthermore, the study demonstrates that expression of the RepoMan P604A mutant leads to defects in chromosome alignment and segregation, resulting in increased cell death. These data support a role for PHD1-mediated prolyl hydroxylation in controlling progression through mitosis. This occurs, at least in part, by hydroxylating RepoMan at P604, which regulates its interaction with PP2A during chromosome alignment.

      Strengths:

      The data support most of the conclusions made. However, some issues need to be addressed.

      Weaknesses:

      (1) Although ectopically expressed PHD1 interacts with ectopically expressed RepoMan, there is no evidence that endogenous PHD1 binds to endogenous RepoMan or that PHD1 directly binds to RepoMan.

      (2) There is no genetic evidence indicating that PHD1 controls progression through mitosis by catalyzing the hydroxylation of RepoMan.

      (3) Data demonstrating the correlation between dynamic changes in RepoMan hydroxylation and H3T3 phosphorylation throughout the cell cycle are needed.

      (4) The authors should provide biochemical evidence of the difference in binding ability between RepoMan WT/PP2A and RepoMan P604A/PP2A.

      (5) PHD2 is the primary proline hydroxylase in cells. Why does PHD1, but not PHD2, affect RepoMan hydroxylation and subsequent control of mitotic progression? The authors should discuss this issue further.

    3. Reviewer #2 (Public review):

      Summary:

      This is a concise and interesting article on the role of PHD1-mediated proline hydroxylation of proline residue 604 on RepoMan and its impact on RepoMan-PP1 interactions with phosphatase PP2A-B56 complex leading to dephosphorylation of H3T3 on chromosomes during mitosis. Through biochemical and imaging tools, the authors delineate a key mechanism in the regulation of the progression of the cell cycle. The experiments performed are conclusive with well-designed controls.

      Strengths:

      The authors have utilized cutting-edge imaging and colocalization detection technologies to support the conclusions in the manuscript.

      Weaknesses:

      Lack of in vitro reconstitution and binding data.

    4. Reviewer #3 (Public review):

      Summary:

      The manuscript is a comprehensive molecular and cell biological characterisation of the effects of P604 hydroxylation by PHD1 on RepoMan, a regulatory subunit of the PP1γ complex. The identification and molecular characterisation of the hydroxylation site have been written up and deposited in bioRxiv in a separate manuscript. I reviewed the data and came to the conclusion that the hydroxylation site has been identified and characterised to a very high standard by LC-MS, both in cells and in in vitro reactions. I conclude that we should have no question about the validity of the PHD1-mediated hydroxylation.

      In the context of the presented manuscript, the authors postulate that hydroxylation on P604 by PHD1 leads to the inactivation of the complex, resulting in the retention of pThr3 in H3.

      Strengths:

      Compelling data characterising how P604 hydroxylation is likely to induce the interaction between RepoMan and a phosphatase complex, resulting in the loading of RepoMan on chromatin. Loss of the regulation of the hydroxylation site by PHD1 results in mitotic defects.

      Weaknesses:

      Reliance on a proline-to-alanine mutation in RepoMan to mimic an unhydroxylatable protein. The mutation will introduce structural alterations, and inhibition or knockdown of PHD1 would be necessary to strengthen the data on how hydroxylation regulates chromatin loading and interactions with B56/PP2A.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      We appreciate the reviewer’s agreement that our data, "support most of the conclusions made”.

      With respect to the concerns raised by reviewer 1:

      (1) Although ectopically expressed PHD1 interacts with ectopically expressed RepoMan, there is no evidence that endogenous PHD1 binds to endogenous RepoMan or that PHD1 directly binds to RepoMan.

      We do not fully agree that this comment is accurate. The implication is that we only show interaction between two exogenously expressed proteins, i.e. both exogenous PHD1 and RepoMan, when in fact we show that tagged PHD1 interacts with endogenous RepoMan. The major technical challenge here is the well-known difficulty of detecting endogenous PHD1 in such cell lines. We agree that co-IP studies do not prove that this interaction is direct and we never claim to have shown this, though we do feel that a direct interaction is most likely, albeit not proven.

      (2) There is no genetic evidence indicating that PHD1 controls progression through mitosis by catalyzing the hydroxylation of RepoMan.

      We agree that our current study is primarily a biochemical and cell biological study, rather than a genetic study. Nonetheless, similar biochemical and cellular approaches have been widely used and validated in previous studies of the mechanisms regulating cell cycle progression, and we are confident in the conclusions drawn based on the data obtained so far.

      (3) Data demonstrating the correlation between dynamic changes in RepoMan hydroxylation and H3T3 phosphorylation throughout the cell cycle are needed.

      We agree that it will be very interesting to analyse in more detail the cell cycle dynamics of RepoMan hydroxylation and H3T3 phosphorylation - along with other cell cycle parameters. We view this as outside the scope of our present study and are actively engaged in raising the additional funding needed to pursue such future experiments.

      (4) The authors should provide biochemical evidence of the difference in binding ability between RepoMan WT/PP2A and RepoMan P604A/PP2A.

      Here again we agree that it will be very interesting to analyse in future the detailed binding interactions between wt and mutant RepoMan and other interacting proteins, including PP2A. We view this as outside the scope of our present study and are actively engaged in raising the additional funding needed to pursue such future experiments.

      (5) PHD2 is the primary proline hydroxylase in cells. Why does PHD1, but not PHD2, affect RepoMan hydroxylation and subsequent control of mitotic progression? The authors should discuss this issue further.

      We agree with the main point underlining this comment, i.e., that there are still many things to be learned concerning the specific roles and mechanisms of the different PHD enzymes in vivo. We look forward to addressing these questions in future studies.

      Reviewer #2 (Public review):

      We appreciate the reviewer’s comments that our manuscript uses biochemical and imaging tools to delineate a key mechanism in the regulation of the progression of the cell cycle and their appreciation that our experiments performed are, 'conclusive with well-designed controls.'

      With respect to the specific concern raised by reviewer 2:

      Lack of in vitro reconstitution and binding data.

      We agree that it will be very interesting to pursue in vitro reconstitution studies and detailed binding data. We view this as outside the scope of our present study and are actively engaged in raising the additional funding needed to pursue such future experiments.

      Reviewer #3 (Public review):

      We appreciate the reviewer’s comments that our study “is a comprehensive molecular and cell biological characterisation of the effects of P604 hydroxylation by PHD1 on RepoMan, a regulatory subunit of the PP1γ complex” and their conclusion that “we should have no question about the validity of the PHD1-mediated hydroxylation”.

      With respect to the specific concern raised by reviewer 3:

      Reliance on a proline-to-alanine mutation in RepoMan to mimic an unhydroxylatable protein. The mutation will introduce structural alterations, and inhibition or knockdown of PHD1 would be necessary to strengthen the data on how hydroxylation regulates chromatin loading and interactions with B56/PP2A.

      We do not agree that we rely solely on analysis of the single-site Pro-Ala mutation in RepoMan for our conclusions, since we also present a raft of additional experimental evidence, including knock-down data and experiments using both fumarate and FG. We would also refer to the data we present on RepoMan in the parallel study by Jiang et al., which has also been reviewed by eLife and is currently available on bioRxiv (doi: https://doi.org/10.1101/2025.05.06.652400). Of course, we agree with the reviewer that even though the mutant RepoMan features only a single amino acid change, this could still result in undetermined structural effects on the RepoMan protein that could conceivably contribute, at least in part, to some of the phenotypic effects observed. Hopefully future studies will help to clarify this.

    1. eLife Assessment

      This manuscript presents solid experimental data using Fmr1 knockout mice to explore the fundamental role of Fmr1 in sleep regulation. The study supports the hypothesis that scheduled feeding can improve circadian rhythm and behavior in a mouse model of Fragile X syndrome. These findings may offer new insights into neurodevelopmental disorders and their potential treatment strategies.

    2. Reviewer #1 (Public review):

      The authors conducted a comprehensive investigation into sleep and circadian rhythm disturbances in Fmr1 knockout (KO) mice, a model for Fragile X Syndrome (FXS). They began by monitoring daily home cage behaviors to identify disruptions in sleep and circadian patterns, then assessed the mice's adaptability to altered light conditions through photic suppression and skeleton photoperiod experiments. To uncover potential mechanisms, they examined the connectivity between the retina and the suprachiasmatic nucleus. The study also included an analysis of social behavior deficits in the mutant mice and tested whether scheduled feeding could alleviate these issues. Notably, scheduled feeding not only improved sleep, circadian, and social behaviors but also normalized plasma cytokine levels. The manuscript is strengthened by its focus on a significant and underexplored area (sleep deficits in an FXS model) and by its robust experimental design, which integrates a variety of methodological approaches to provide a thorough understanding of the observed phenomena and potential therapeutic avenues.

    3. Reviewer #2 (Public review):

      Summary:

      In the present study, the authors, using a mouse model of Fragile X syndrome, explore the intriguing hypothesis that restricting food access over the daily schedule will improve sleep patterns and subsequently enhance behavioral capacities. By restricting food access from 12h to 6h over the nocturnal period (the active period for mice), they show, in these KO mice, an improvement in the sleep pattern accompanied by reduced systemic levels of inflammatory markers and improved behavior. These data, using a classical mouse model of neurodevelopmental disorder (NDD), suggest that modifying eating patterns might improve sleep quality, leading to reduced inflammation and enhanced cognitive/behavioral capacities in children with NDD.

      Overall, the paper is well written and easy to follow. The rationale of the study is generally well introduced. The data are sound overall, and the interpretation is supported by the provided data.

    4. Author response:

      The following is the authors’ response to the previous reviews

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Thank you for the extensive response to my comments and questions.

      Reviewer #2 (Recommendations for the authors):

      (1) The Fmr1/Fxr2 double KO mice are not well described in the Introduction.

      We have changed the sentence in the Introduction to clarify that Zhang et al., 2008 used a mouse lacking both the Fmr1 gene and its paralog Fxr2.

      (3) The authors decided not to discuss the potential translation of the present study to human patients, despite their final conclusion statement.

      The paragraph below has been added to the end of the discussion:

      “Translational Implications”

      The present findings support the view that circadian disruption is not merely a downstream consequence of disease processes but actively contributes to symptom expression. This raises the possibility that interventions designed to reinforce circadian rhythms hold therapeutic value for individuals with FXS and related neurodevelopmental conditions. Given that sleep and circadian dysfunction are detectable early in development and are predictive of more severe clinical phenotypes, circadian-based interventions may be particularly beneficial if applied during periods of heightened neural plasticity. Importantly, time-restricted feeding represents a relatively low-cost, non-invasive strategy that could feasibly be implemented in real-world settings. Further translational work is needed to evaluate whether the mechanistic links identified here, between circadian misalignment, immune dysregulation, and behavioral impairments, are conserved in humans, and whether similar approaches can be implemented for clinical use.

    1. eLife Assessment

      This study presents an important finding on the signaling mechanisms underlying Treg cell homeostasis by identifying the simultaneous requirement of diacylglycerol (DAG) kinases (DGK) alpha and zeta for Foxp3+ Treg cell function and follicular responses, with implications for the pathogenesis of some autoimmune diseases. Whereas the data based on the characterization of double knock-out mice (for DGK alpha and zeta) are solid, showing the emergence of autoimmune manifestations, the study has gaps in its experimental approaches, since it is not clear what can be attributed to the simultaneous DGKα and ζ deficiency versus the individual deficiency of either one. Experiments on the pathogenic potential of the DKO Tregs in the absence of other T cells were not presented, and results on the role of CD25 downregulation and CD28-independent activation of Treg cells were not properly discussed. Nonetheless, the reported data will be of interest to immunologists working on T-cell intracellular signaling and autoimmunity.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript by Li and colleagues describes the impact of DGKα and ζ deficiency on Treg cells and follicular responses. The experimental approach is based on the characterization of double KO mice that show the emergence of autoimmune manifestations, including the production of autoantibodies. Additionally, there is an increase in Tfh cells, but also in Tfr cells, in these mice deficient in both DGKα and ζ. Although the observations are interesting, their interpretation is difficult in the absence of data on the single mutants. While a supplementary figure shows that the autoimmune manifestations are more severe in the DGKα and ζ deficient mice, prior observations show that DGKα deficiency alone has an impact on Treg homeostasis. As such, the contribution of each of the two kinases to the overall phenotype is hard to establish.

      Strengths:

      Well-conducted experiments with informative mouse models with defined genetic defects.

      Weaknesses:

      The major weakness is the lack of clarity concerning what can be attributed to simultaneous DGKα and ζ deficiency versus deficiency of DGKα or ζ alone. Technical concerns related to a number of figures were raised in the initial report and were not adequately addressed by the authors in the revised manuscript.

      In conclusion, the claims in the manuscript are not convincingly supported by the data.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Li et al. investigate the combined role of diacylglycerol (DAG) kinases (DGK) α and ζ in the function of Foxp3+ Treg cells that prevent autoimmunity. The authors generated DGKα and ζ Treg-specific double knockout (DKO) mice by crossing Dgkalpha-/- mice to DgKzf/f and Foxp3YFPCre/+ mice. The resulting "DKO" mice thus lack DGKα in all cells and DGKζ in Foxp3+ Treg cells. The authors show that the DKO mice spontaneously develop autoimmunity, characterized by multiorgan inflammatory infiltration and elevated anti-double-stranded DNA (dsDNA), anti-single-stranded DNA (ssDNA), and antinuclear autoantibodies. The authors attribute the DKO mouse phenotype to Foxp3+ Treg dysfunction, including accelerated conversion into "exTreg" cells with pathogenic activity. Interestingly, the combined deficiency of DGKα and ζ seems to release Treg cells from their dependence on CD28-mediated costimulatory signals, which the authors show by crossing their DKO mice to CD28-/- mice (TKO mice), which also develop autoimmunity.

      Strengths:

      The phenotypes of the mutant mice described in the manuscript are striking, and the authors provide a comprehensive analysis of the functional processes altered by the lack of DGKs.

      Weaknesses:

      One aspect that could be better explored is the direct role of "ex-Tregs" in causing pathogenesis in the models utilized.

      Overall, this is an important report that makes a significant addition to the understanding of DAG kinases in Treg cell biology.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript by Li and colleagues describes the impact of DGKα and ζ deficiency on Treg cells and follicular responses. The experimental approach is based on the characterization of double KO mice that show the emergence of autoimmune manifestations, including the production of autoantibodies. Additionally, there is an increase in Tfh cells, but also in Tfr cells, in these mice deficient in both DGKα and ζ. Although the observations are interesting, their interpretation is difficult in the absence of data on single mutations. While a supplementary figure shows that the autoimmune manifestations are more severe in the DGKα and ζ deficient mice, prior observations show that DGKα deficiency alone has an impact on Treg homeostasis. As such, the contribution of each of the two chains to the overall phenotype is hard to establish.

      Strengths:

      Well-conducted experiments with informative mouse models with defined genetic defects.

      Weaknesses:

      The major weakness is the lack of clarity concerning what can be attributed to simultaneous DGKα and ζ deficiency versus deficiency of DGKα or ζ alone.

      Some interpretations are also not conclusively supported by data.

      We appreciate reviewer 1’s positive comments about our manuscript and the suggestion to include DGKα‑ or DGKζ‑single‑knockout (SKO) Tregs in the mechanistic studies. Unfortunately, this seemingly simple but truly extensive experiment would exceed our current budget and personnel capacity. Importantly, it is well known that DGKα and DGKζ act redundantly or synergistically in T cells, with single loss producing minimal or partial phenotypes compared with the double knockout. The comprehensive mechanistic data already presented for DGKαζ‑DKO Tregs therefore capture the combined functional and mechanistic deficit that is most relevant to DGK functions in Treg biology, and they support the conclusions drawn in this manuscript. The reviewer also pointed out some interpretation issues, such as CD25 downregulation in Tfr cells, as well as some minor issues. We appreciate the reviewer’s expertise and have revised the text and discussion accordingly.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Li et al. investigate the combined role of diacylglycerol (DAG) kinases (DGK) α and ζ in the function of Foxp3+ Treg cells in preventing autoimmunity. The authors generated DGK α and ζ Treg-specific double knockout mice (DKO) by crossing Dgkalpha-/- mice to DgKzf and Foxp3YFPCre/+ mice. The resulting "DKO" mice thus lack DGK α in all cells and DGK ζ in Foxp3+ Treg cells. The authors show that the DKO mice spontaneously develop autoimmunity, characterized by multiorgan inflammatory infiltration and elevated anti-double-strand DNA (dsDNA), -single-strand DNA (ssDNA), and -nuclear autoantibodies. The authors attribute the DKO mice phenotype to Foxp3+ Treg dysfunction, including accelerated conversion into "exTreg" cells with pathogenic activity. Interestingly, the combined deficiency of DGK α and ζ seems to release Treg cell dependence on CD28-mediated costimulatory signals, which the authors show by crossing their DKO mice to CD28-/- mice (TKO mice), which also develop autoimmunity.

      Strengths:

      The phenotypes of the mutant mice described in the manuscript are striking, and the authors provide a comprehensive analysis of the functional processes altered by the lack of DGKs.

      Weaknesses:

      One aspect that could be better explored is the direct role of "ex-Tregs" in causing pathogenesis in the models utilized.

      However, overall, this is an important report that makes a significant addition to the understanding of DAG kinases in Treg cell biology.

      We greatly appreciate reviewer 2’s positive comments about the manuscript. The data presented in the manuscript show that DGKαζDKO Tregs, but not WT Tregs, are able to trigger autoimmunity in T cell-deficient mice in the presence of WT CD4 T cells, supporting the conclusion that DGKαζDKO Tregs are pathogenic. Reviewer 2 suggested testing the direct role of DGKαζDKO Tregs/ex-Tregs in the pathogenesis of autoimmune diseases in the absence of conventional T cells. This is a very interesting idea that we will test in the future should resources for the experiment become available.

    1. eLife Assessment

      This important study decoded target-associated information in prefrontal and sensory cortex during the preparatory period of a visual search task, suggesting a memory component in human subjects performing such a visual attention task. The evidence supporting this claim is compelling, based on multivariate pattern analyses of fMRI data. The results will be of interest to psychologists and cognitive neuroscientists.

    2. Reviewer #1 (Public review):

      When you search for something, you need to maintain some representation (a "template") of that target in your mind/brain. Otherwise, how would you know what you were looking for? If your phone is in a shocking pink case, you can guide your attention to pink things based on a target template that includes the attribute 'pink'. That guidance should get you to the phone pretty effectively, if it is in view. Most real-world searches are more complicated. If you are looking for the toaster, you will make use of your knowledge of where toasters can be. Thus, if you are asked to find a toaster, you might first activate a template of a kitchen or a kitchen counter. You might worry about pulling up the toaster template only after you are reasonably sure you have restricted your attention to a sensible part of the scene.

      Zhou and Geng are looking for evidence of this early stage of guidance by information about the surrounding scene in a search task. They train Os to associate four faces with four places. Then, with Os in the scanner, they show one face - the target for a subsequent search. After an 8 sec delay, they show a search display where the face is placed on the associated scene 75% of the time. Thus, attending to the associated scene is a good idea. The questions of interest are "When can the experimenters decode which face Os saw from fMRI recording?" "When can the experimenters decode the associated scene?" and "Where in the brain can the experimenters see evidence of this decoding?" The answer is that the face but not the scene can be read out during the face's initial presentation. The key finding is that the scene can be read out (imperfectly but above chance) during the subsequent delay when Os are looking at just a fixation point. Apparently, seeing the face conjures up the scene in the mind's eye.

      This is a solid and believable result. The only issue, for me, is whether it is telling us anything specifically about search. Suppose you trained Os on the face-scene pairing but never did anything connected to search. If you presented the face, would you not see evidence of recall of the associated scene? Maybe you would see the activation of the scene in different areas and you could identify some areas as search specific. I don't think anything like that was discussed here.

      You might also expect this result to be asymmetric. The idea is that the big scene gives the search information about the little face. The face should activate the larger useful scene more than the scene should activate the more incidental face, if the task was reversed. That might be true if the finding is related to search where the scene context is presumed to be the useful attention guiding stimulus. You might not expect an asymmetry if Os were just learning an association.

      It is clear in this study that the face and the scene have been associated and that this can be seen in the fMRI data. It is also clear that a valid scene background speeds the behavioral response in the search task. The linkage between these two results is not entirely clear but perhaps future research will shed more light.

      It is also possible that I missed the clear evidence of the search-specific nature of the activation by the scene during the delay period. If so, I apologize and suggest that the point be underlined for readers like me.

      Comments on revised version:

      I am satisfied with the revision.

    3. Reviewer #2 (Public review):

      Summary:

      This work is one of the best instances of a well-controlled experiment and theoretically impactful findings within the literature on templates guiding attentional selection. I am a fan of the work that comes out of this lab and this particular manuscript is an excellent example as to why that is the case. Here, the authors use fMRI (employing MVPA) to test whether during the preparatory search period, a search template is invoked within the corresponding sensory regions, in the absence of physical stimulation. By associating faces with scenes, a strong association was created between two types of stimuli that recruit very specific neural processing regions - FFA for faces and PPA for scenes. The critical results showed that scene information that was associated with a particular cue could be decoded from PPA during the delay period. This result strongly supports invoking of a very specific attentional template.

      Strengths:

      There is so much to be impressed with in this report. The writing of the manuscript is incredibly clear. The experimental design is clever and innovative. The analysis is sophisticated and also innovative. The results are solid and convincing.

      Weaknesses:

      I only have a few weaknesses to point out.

      This point is not so much a weakness as a further test of the hypothesis put forward by the authors. The delay period was long: 8 seconds. It would be interesting to split the delay period into the first 4 seconds and the last 4 seconds and run the same decoding analyses. The hypothesis here is that semantic associations take time to evolve, and it would be great to show that decoding gets stronger in the second half of the delay period than in the period right after the cue. I think it would be a stronger test of the template hypothesis.

      Typo in the abstract "curing" vs "during."

      It is hard to know what to do with significant results in ROIs that are not motivated by specific hypotheses. However, for Figure 3, what are the explanations for ROIs that show significant differences above and beyond the direct hypotheses set out by the authors?

      Following the revision, I have no further comments or concerns.

    4. Reviewer #3 (Public review):

      The manuscript contains a carefully designed fMRI study, using MVPA pattern analysis to investigate which high-level association cortices contain target-related information to guide visual search. A special focus is hereby on so-called 'target-associated' information, which has previously been shown to help guide attention during visual search. For this purpose, the authors trained their participants to learn specific target-associations, in order to then test which brain regions may contain neural representations of those learnt associations. They found that at least some of the associations tested were encoded in prefrontal cortex during the cue and delay period.

      The manuscript is very carefully prepared. As far as I can see, the statistical analyses are all sound and the results integrate well with previous findings.

      I have no strong objections against the presented results and their interpretation.

      The authors have addressed all my previous comments and questions in their revision of the text.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      When you search for something, you need to maintain some representation (a "template") of that target in your mind/brain. Otherwise, how would you know what you were looking for? If your phone is in a shocking pink case, you can guide your attention to pink things based on a target template that includes the attribute 'pink'. That guidance should get you to the phone pretty effectively if it is in view. Most real-world searches are more complicated. If you are looking for the toaster, you will make use of your knowledge of where toasters can be. Thus, if you are asked to find a toaster, you might first activate a template of a kitchen or a kitchen counter. You might worry about pulling up the toaster template only after you are reasonably sure you have restricted your attention to a sensible part of the scene.

      Zhou and Geng are looking for evidence of this early stage of guidance by information about the surrounding scene in a search task. They train Os to associate four faces with four places. Then, with Os in the scanner, they show one face - the target for a subsequent search. After an 8 sec delay, they show a search display where the face is placed on the associated scene 75% of the time. Thus, attending to the associated scene is a good idea. The questions of interest are "When can the experimenters decode which face Os saw from fMRI recording?" "When can the experimenters decode the associated scene?" and "Where in the brain can the experimenters see evidence of this decoding? The answer is that the face but not the scene can be read out during the face's initial presentation. The key finding is that the scene can be read out (imperfectly but above chance) during the subsequent delay when Os are looking at just a fixation point. Apparently, seeing the face conjures up the scene in the mind's eye.

      This is a solid and believable result. The only issue, for me, is whether it is telling us anything specifically about search. Suppose you trained Os on the face-scene pairing but never did anything connected to the search. If you presented the face, would you not see evidence of recall of the associated scene? Maybe you would see the activation of the scene in different areas and you could identify some areas as search specific. I don't think anything like that was discussed here.

      You might also expect this result to be asymmetric. The idea is that the big scene gives the search information about the little face. The face should activate the larger useful scene more than the scene should activate the more incidental face, if the task was reversed. That might be true if the finding is related to a search where the scene context is presumed to be the useful attention guiding stimulus. You might not expect an asymmetry if Os were just learning an association.

      It is clear in this study that the face and the scene have been associated and that this can be seen in the fMRI data. It is also clear that a valid scene background speeds the behavioral response in the search task. The linkage between these two results is not entirely clear but perhaps future research will shed more light.

      It is also possible that I missed the clear evidence of the search-specific nature of the activation by the scene during the delay period. If so, I apologize and suggest that the point be underlined for readers like me.

      We have added text related to this issue, particularly in the discussion (page 19, line 6), and have also added citations of studies in humans and non-human primates showing a causal relationship between preparatory activity in prefrontal and visual cortex and visual search performance (page 6, line 16).

      Reviewer #2 (Public review):

      Summary:

      This work is one of the best instances of a well-controlled experiment and theoretically impactful findings within the literature on templates guiding attentional selection. I am a fan of the work that comes out of this lab and this particular manuscript is an excellent example as to why that is the case. Here, the authors use fMRI (employing MVPA) to test whether during the preparatory search period, a search template is invoked within the corresponding sensory regions, in the absence of physical stimulation. By associating faces with scenes, a strong association was created between two types of stimuli that recruit very specific neural processing regions - FFA for faces and PPA for scenes. The critical results showed that scene information that was associated with a particular cue could be decoded from PPA during the delay period. This result strongly supports the invoking of a very specific attentional template.

      Strengths:

      There is so much to be impressed with in this report. The writing of the manuscript is incredibly clear. The experimental design is clever and innovative. The analysis is sophisticated and also innovative. The results are solid and convincing.

      Weaknesses:

      I only have a few weaknesses to point out.

      This point is not so much a weakness as a further test of the hypothesis put forward by the authors. The delay period was long: 8 seconds. It would be interesting to split the delay period into the first 4 seconds and the last 4 seconds and run the same decoding analyses. The hypothesis here is that semantic associations take time to evolve, and it would be great to show that decoding gets stronger in the second half of the delay period than in the period right after the cue. I don't think this is necessary for publication, but I think it would be a stronger test of the template hypothesis.

      We conducted the suggested analysis, and we did not find clear evidence of differences in decoding scene information between the earlier and later portions of the delay period. This may be due to insufficient power when the data are divided, individual differences in when preparatory activation is the strongest, or truly no difference in activation over the delay period. More details of this analysis can be found in the supplementary materials (page 12, line 16; Figure S1).

      Typo in the abstract: "curing" vs "during."

      Fixed.

      It is hard to know what to do with significant results in ROIs that are not motivated by specific hypotheses. However, for Figure 3, what are the explanations for ROIs that show significant differences above and beyond the direct hypotheses set out by the authors?

      We added reasoning for the other a priori ROIs in the introduction (page 4, line 26). There is substantial evidence suggesting that frontoparietal areas are involved in cognitive control, attentional control, and working memory. The ROIs we selected from frontal and parietal cortex are based on parcels within resting-state networks defined by the 17-network atlases (Schaefer et al., 2018). The IFJ was defined by the HCP-MMP1 (Glasser et al., 2016). These regions are commonly used in studies of attention and cognitive control, and the exact ROIs selected are described in the section on “Regions of interest (ROI) definition”. While we have the strongest hypothesis for IFJ based on relatively recent work from the Desimone lab, the other ROIs in lateral frontal cortex and parietal cortex are also well documented in similar studies, although the exact computation being done by these regions during tasks can be hard to differentiate with fMRI.

      Reviewer #3 (Public review):

      The manuscript contains a carefully designed fMRI study, using MVPA pattern analysis to investigate which high-level association cortices contain target-related information to guide visual search. A special focus is hereby on so-called 'target-associated' information, which has previously been shown to help guide attention during visual search. For this purpose, the authors trained their participants to learn specific target-associations, in order to then test which brain regions may contain neural representations of those learnt associations. They found that at least some of the associations tested were encoded in prefrontal cortex during the cue and delay period.

      The manuscript is very carefully prepared. As far as I can see, the statistical analyses are all sound and the results integrate well with previous findings.

      I have no strong objections against the presented results and their interpretation.

      Reviewer #1 (Recommendations for the authors):

      One bit of trivia. In the abstract, you should define IFJ on its first appearance in the text. You get to that a bit later.

      Fixed.

      Reviewer #2 (Recommendations for the authors):

      I really don't have much to suggest, as I thought that this was a clearly written report that offered a clever paradigm and data that supported the conclusions. My only suggestion would be to split the delay period activity and test whether the strength of the template evolves over time. Even though fMRI is not the best tool for this, you would still predict stronger decoding in the second half of the delay period.

      Please see above for our response to the same comment.

      Reviewer #3 (Recommendations for the authors):

      I would just like to point out some minor aspects that might be worth improving before publishing this work.

      Abstract: While in general, the writing is clear and concise, I felt that the abstract of the manuscript was particularly hard to follow, probably because the authors at some point re-arranged individual sentences. For example, they write in line 12 about 'the preparatory period', but explain only in the following sentence that the preparatory period ensues 'before search begins'. This made it a bit hard to follow the overall logic and I think could easily be fixed. 

      We have addressed this comment and updated the abstract.

      Also in the abstract: 'The CONTENTS of the template typically CONTAIN...' sounds weird, no? Also, 'information is used to modulate sensory processing in preparation for guiding attention during search' sounds like a very over-complicated description of attentional facilitation. I'm not convinced either whether the sequence is correct here. Is the information really used to (first) modulate sensory processing (which is a sort of definition of attention in itself) to (then) prepare the guidance of attention in visual search?

      We have addressed this comment and updated the abstract.

      The sentence in line 7, 'However, many behavioral studies have shown that target-associated information is used to guide attention,...' (and the following sentence) assumes that the reader is somewhat familiar with the term 'target-associations'. I'm afraid that, for a naive reader, this term may only become fully understandable once the idea is introduced a bit later when mentioning that participants of the study were trained on face-scene pairings. I think it could help to give some very short explanation of 'target-associations' already when it is first mentioned. The term 'statistically co-occurring object pairs', for example, could be of great help here.

      Thank you for the suggestion. We have added it to the abstract.

      page 2, line 22: 'prefrotnal'

      Fixed.

      page 2, line 24/25: 'information ... can SUPPLANT (?) ... information'. (That's also a somewhat unfortunate repetition of 'information')

      Fixed.

      page 4, line 23-25: 'Working memory representations in lateral prefrontal and parietal regions are engaged in cognitive control computations that ARE (?) task non-specific but essential to their functioning'

      Fixed.

      page 7, line 1: maybe a comma before 'suggesting'?

      Fixed.

      page 7, line 14-16: Something seems wrong with this sentence: 'The distractor face was a race-gender match, which we previously FOUND MADE (?) target discrimination difficult enough to make the scene useful for guiding attention'

      We have addressed this comment and rewritten this part (now on page 7, line 18).

      Results / Discussion sections:

      In several figures, like Fig 3A, the three different IFJ regions are grouped separately from the other frontal areas, which makes sense given the special role IFJ plays in representing task-related templates. However, IFJ is still part of PFC. I think it would be more correct to label the other frontal areas (like FEF, vLPFC, etc.) as 'Other Frontal' or even 'Other PFC'.

      We have made the changes based on the reviewer’s suggestion.

      In some of the figures, e.g. Fig 3 and 5, I had the impression that the activation patterns of some conditions in vLPFC were rather close to the location of IFJ, which is just a bit posterior. I think I remember that functional localisers of IFJ can actually vary quite a bit in localisation (see e.g. the Baldauf/Desimone paper). Also, I think it has been shown for other regions, like the human FEF, that their position when defined by localisation tasks is not always nicely and fully congruent with the respective labels in an atlas like the Glasser atlas. It might help to take this into consideration when discussing the results, particularly since the term vLPFC is a rather vague collection of several brain parcels and not a parcel name in the Glasser atlas. Some people might even argue that vLPFC in the broad sense contains IFJ, similar to how 'Frontal' contains IFJ (see above). How strong of a point do the authors want to make about activation in IFJ versus in vLPFC?

      We have now added text discussing the inability to truly differentiate between subregions of IFJ and other parts of vLPFC in the methods section on ROIs (page 25, line 13) and in the discussion (page 18, line 25). However, given the likely imprecision of ROI boundaries, it is perhaps even more surprising that we see distinct patterns between the subregions of IFJ defined by the Glasser HCP-MMP1 atlas and the other vLPFC regions defined by the 17-network atlases. We do not wish to overstate the precision of IFJ regions, but we note the ROI results within the context of the larger literature. We are sure that our findings will have to be reinterpreted when newer methods allow for better localization of functional subregions of the vLPFC in individuals.

      Given that the authors nicely explain in the introduction how important templates are in visual search, and given that FEF has such an important role in serially guiding saccades through visual search templates, I think it would be worth discussing the finding that FEF did not hold representation of these targets. Of course, this could be in part due to the specific task at hand, but it may still be interesting to note in the Discussion section that here FEF, although important for some top-down attention signals, did not keep representations of the 'search' templates. Is it because there is no spatial component to the task at hand (like proposed in Bedini 2021)?

      We have now added text directly addressing this point and citing the Bedini et al. paper in the discussion (page 18, line 18). Besides our current findings, the relationship between IFJ and FEF is really interesting and will hopefully be investigated more in the future.

      Page 18, line 5: 'we the(N) associated...'

      Fixed.

    1. eLife Assessment

      This manuscript by Li, Lu et al. presents important findings on the role of cDC1 in atherosclerosis and their influence on the adaptive immune system. Using Xcr1Cre-Gfp Rosa26LSL-DTA ApoE-/- mouse models, these data convincingly reveal an unexpected, non-redundant role of the XCL1-XCR1 axis in mediating cDC1 contributions to atherosclerosis.

    2. Reviewer #1 (Public review):

      Summary:

      In this study by Li et al., the authors re-investigated the role of cDC1 in atherosclerosis progression using the ApoE model. First, the authors confirmed the accumulation of cDC1 in atherosclerotic lesions in mice and humans. Then, in order to examine the functional relevance of this cell type, the authors developed a new mouse model to selectively target cDC1. Specifically, they inserted the Cre recombinase directly after the start codon of the endogenous XCR1 gene, thereby avoiding off-target activity. Following validation of this model, the authors crossed it with ApoE-deficient mice and found a striking reduction of aortic lesions (numbers and size) following a high-fat diet. The authors further characterized the impact of cDC1 depletion on lesional T cells and their activation state. Also, they provide in-depth transcriptomic analyses of lesional cDC1 in comparison to splenic and nodal cDC1. These results imply cellular interactions between lesion T cells and cDC1. Finally, the authors show that the chemokine XCL1, which is produced by activated CD8 T cells (and NK cells), plays a key role in the interaction with XCR1-expressing cDC1 and particularly in atherosclerotic disease progression.

      Strengths:

      The surprising results on XCL1 represent a very important gain in knowledge. The role of cDC1 is clarified with a new genetic mouse model.

      Comments on revised version:

      The authors have addressed my concerns in the revised version of this manuscript.

    3. Reviewer #2 (Public review):

      This study investigates the role of cDC1 in atherosclerosis progression using Xcr1Cre-Gfp Rosa26LSL-DTA ApoE-/- mice. The authors demonstrate that selective depletion of cDC1 reduces atherosclerotic lesions in hyperlipidemic mice. While cDC1 depletion did not alter macrophage populations, it suppressed T cell activation (both CD4+ and CD8+ subsets) within aortic plaques. Further, targeting the chemokine Xcl1 (ligand of Xcr1) effectively inhibits atherosclerosis. The manuscript is well-written, and data are clearly presented. The data provided in the article support the authors' conclusions well.

      Comments on revised version:

      The authors have addressed all previous concerns and made appropriate revisions to the data. I have no further questions.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      In this study by Li et al., the authors re-investigated the role of cDC1 in atherosclerosis progression using the ApoE model. First, the authors confirmed the accumulation of cDC1 in atherosclerotic lesions in mice and humans. Then, in order to examine the functional relevance of this cell type, the authors developed a new mouse model to selectively target cDC1. Specifically, they inserted the Cre recombinase directly after the start codon of the endogenous XCR1 gene, thereby avoiding off-target activity. Following validation of this model, the authors crossed it with ApoE-deficient mice and found a striking reduction of aortic lesions (numbers and size) following a high-fat diet. The authors further characterized the impact of cDC1 depletion on lesional T cells and their activation state. Also, they provide in-depth transcriptomic analyses of lesional cDC1 in comparison to splenic and nodal cDC1. These results imply cellular interactions between lesion T cells and cDC1. Finally, the authors show that the chemokine XCL1, which is produced by activated CD8 T cells (and NK cells), plays a key role in the interaction with XCR1-expressing cDC1 and particularly in the atherosclerotic disease progression.

      Strengths:

      The surprising results on XCL1 represent a very important gain in knowledge. The role of cDC1 is clarified with a new genetic mouse model.

      Thank you

      Weaknesses:

      My criticism is limited to the analysis of the scRNA-seq data of the cDC1. I think it would be important to match these data with published data sets on cDC1. In particular, the data set by Sophie Janssen's group on splenic cDC1 might be helpful here (PMID: 37172103; https://www.single-cell.be/spleen_cDC_homeostatic_maturation/datasets/cdc1). It would be good to assign clusters based on the categories used there (early/late, immature/mature), at least for splenic DCs.

      Thank you very much for this suggestion. Using the scRNA-seq data of Xcr1<sup>+</sup> cDC1 sorted from ApoE<sup>–/–</sup> mice, we re-annotated the populations following the methodology proposed by Sophie Janssen's group. These results are presented in Figure S9 and Figure S10 and described in detail in the Results and Discussion sections.

      Please refer to the Results section from line 264 to 284: “Using the scRNA-seq data of Xcr1<sup>+</sup> cDC1 sorted from hyperlipidemic mice, we annotated the 10 populations as shown in Figure S9A, following the methodology of a previous study [41]. Ccr7<sup>+</sup> mature cDC1s (Clusters 3, 7 and 9) and Ccr7<sup>–</sup> immature cDC1s (remaining clusters) were identified among cDC1 cells sorted from aorta, spleen and lymph nodes (Figure S9B). Further stratification based on marker genes revealed that Cluster 10 is the pre-cDC1 population, with a high expression level of CD62L (Sell) and a low expression level of CD8a (Figure S9C). Clusters 6 and 8 are proliferating cDC1s, which express high levels of the cell-cycle genes Stmn1 and Top2a (Figure S9D). Clusters 1 and 4 are early immature cDC1s, and Clusters 2 and 5 are late immature cDC1s, according to the expression pattern of Itgae and Nr4a2 (Figure S9E). Cluster 9 cells are early mature cDC1s, with elevated expression of Cxcl9 and Cxcl10 (Figure S9F). Clusters 3 and 7 are late mature cDC1s, characterized by the expression of Cd63 and Fscn1 (Figure S9G). As shown in Figure 5C and Figure S9, the major difference among the 10 populations is that aortic cDC1 cells lack pre-cDC1s (Cluster 10) and mature cells (Clusters 3, 7 and 9). Interestingly, in hyperlipidemic mice, splenic cDC1s possess only Cluster 3 as late mature cells, while lymph node cDC1s have two late mature populations, namely Clusters 3 and 7. In further analysis, we also compared splenic cDC1 cells from HFD mice with those from ND mice. As shown in Figure S10, HFD appears to affect early immature cDC1-1 cells (Cluster 1) and increases the abundance of late immature cDC1 cells (Clusters 2 and 5), although all 10 populations are present in both sample sources. We also found that Tnfaip3 and Serinc3 are among the most upregulated genes, while Apol7c and Tifab are downregulated, in splenic cDC1 cells sorted from HFD mice”.

      Please refer to the Discussion section from line 380 to 385: “Based on the maturation analysis of the cDC1 scRNA-seq data [41], our findings suggest that the aortic cDC1 cells display a major difference from those of spleen and lymph nodes by lacking the mature clusters, whereas lymph node cDC1 cells contain an additional Fabp5<sup>+</sup> S100a4<sup>+</sup> late mature cluster. Our results also suggest that hyperlipidemia contributes to alterations in early immature cDC1 and in the abundance of late immature cDC1 cells, which were associated with dramatic changes in the expression of Tnfaip3, Serinc3, Apol7c and Tifab”.

      Reviewer #2 (Public review):

      This study investigates the role of cDC1 in atherosclerosis progression using Xcr1Cre-Gfp Rosa26LSL-DTA ApoE-/- mice. The authors demonstrate that selective depletion of cDC1 reduces atherosclerotic lesions in hyperlipidemic mice. While cDC1 depletion did not alter macrophage populations, it suppressed T cell activation (both CD4+ and CD8+ subsets) within aortic plaques. Further, targeting the chemokine Xcl1 (ligand of Xcr1) effectively inhibits atherosclerosis. The manuscript is well-written, and the data are clearly presented. However, several points require clarification:

      (1) In Figure 1C (upper plot), it is not clear what the Xcr1 single-positive region in the aortic root represents, or whether this is caused by unspecific staining. So I wonder whether Xcr1 single-positive staining can reliably represent cDC1. For accurate cDC1 gating in Figure 1E, Xcr1+CD11c+ co-staining should be used instead.

      The false-positive signal observed in the wavy structures in the immunofluorescence image in Figure 1C (upper panel) results from the strong autofluorescence of elastic fibers, a major vascular wall component (alongside collagen). This intrinsic property of elastic fibers is a well-documented confounder in immunofluorescence studies [A, B].

      In contrast, immunohistochemistry (IHC) employs an enzymatic chromogenic reaction (HRP with DAB substrate) that generates a brown precipitate exclusively at antigen-antibody binding sites. Importantly, vascular elastic fibers lack endogenous enzymatic activity capable of catalyzing the DAB reaction, thereby preventing this source of false positivity in IHC.

      Given that Xcr1 is exclusively expressed on conventional type 1 dendritic cells [C], and considering that IHC lacks the multiplexing capability inherent to immunofluorescence for antigen co-localization, single-positive Xcr1 staining reliably identifies cDC1s in IHC results.

      [A] König, K et al. “Multiphoton autofluorescence imaging of intratissue elastic fibers.” Biomaterials vol. 26,5 (2005): 495-500. doi:10.1016/j.biomaterials.2004.02.059

      [B] Andreasson, Anne-Christine et al. “Confocal scanning laser microscopy measurements of atherosclerotic lesions in mice aorta. A fast evaluation method for volume determinations.” Atherosclerosis vol. 179,1 (2005): 35-42. doi:10.1016/j.atherosclerosis.2004.10.040

      [C] Dorner, Brigitte G et al. “Selective expression of the chemokine receptor XCR1 on cross-presenting dendritic cells determines cooperation with CD8+ T cells.” Immunity vol. 31,5 (2009): 823-33. doi:10.1016/j.immuni.2009.08.027

      (2) Figure 4D suggests that cDC1 depletion does not affect CD4+/CD8+ T cells. However, only the proportion of these subsets within total T cells is shown. To fully interpret effects, the authors should provide:

      (a) Absolute numbers of total T cells in aortas.

      (b) Absolute counts of CD4+ and CD8+ T cells.

      Thank you for your suggestions. We agree that assessing both proportions and absolute numbers in Figure 4 provides a more complete picture of the effects of cDC1 depletion on T cell populations. In addition, we have added to Figure 4 the absolute counts of cDC1 cells and total T cells, as well as the CD44 MFI (mean fluorescence intensity) of CD4<sup>+</sup> and CD8<sup>+</sup> T cells, and have added the corresponding descriptions to the revised manuscript.

      Please refer to the Results section from line 183 to 187: “Subsequently, we assessed T cell phenotype in the two groups of mice. While neither the frequencies nor absolute counts of aortic CD4<sup>+</sup> and CD8<sup>+</sup> T cells differed significantly between the two groups of mice (Figure 4D-F), CD69 frequency and CD44 MFI (Mean Fluorescence Intensity), the T cell activation markers, were significantly reduced in both CD4<sup>+</sup> and CD8<sup>+</sup> T cells from Xcr1<sup>+</sup> cDC1 depleted mice compared to controls (Figure 4G and H)”.

      (3) How does T cell activation mechanistically influence atherosclerosis progression? Why was CD69 selected as the sole activation marker? Were other markers (e.g., KLRG1, ICOS, CD44) examined to confirm activation status?

      We sincerely appreciate these insightful comments. As extensively documented in the literature, activated effector T cells (both CD4+ and CD8+) critically promote plaque inflammation and instability through their production of pro-inflammatory cytokines (particularly IFN-γ and TNF-α), which drive endothelial activation, exacerbate macrophage inflammatory responses, and impair smooth muscle cell function [A].

      In our study, we specifically investigated the role of cDC1 cells in atherosclerosis progression. Our key findings demonstrate that cDC1 depletion attenuates T cell activation (as shown by reduced CD69/CD44 expression) and that this reduction in activation is functionally linked to the observed decrease in atherosclerosis burden in our model. 

      Regarding CD44 as an activation marker, we performed quantitative analyses of CD44 mean fluorescence intensity (MFI) in aortic T cells (Figure 4). Importantly, the MFI of CD44 was significantly lower on both CD4+ and CD8+ T cells from Xcr1<sup>Cre-Gfp</sup> Rosa26<sup>LSL-DTA</sup> ApoE<sup>–/–</sup> mice compared to the control ApoE<sup>–/–</sup> mice (data shown below), which is consistent with the CD69 results in Figure 4. We have added the corresponding description to the Results section.

      Please refer to the Results section from line 185 to 187 “CD69 frequency and CD44 MFI (Mean Fluorescence Intensity), the T cell activation markers, were significantly reduced in both CD4+ and CD8+ T cells from Xcr1+ cDC1 depleted mice compared to controls (Figure 4G and H)”.

      Similarly, the CD44 MFI of both CD4<sup>+</sup> and CD8<sup>+</sup> T cells from Xcl1<sup>–/–</sup> ApoE<sup>–/–</sup> mice was comparable to that of the control ApoE<sup>–/–</sup> mice (data shown below), which is consistent with the CD69 results in Figure 7. We have also added the corresponding description to the Results section.

      Please refer to the Results section from line 308 to 309: “Crucially, CD69<sup>+</sup> frequency and CD44 MFI remained comparable in both aortic CD4<sup>+</sup> and CD8<sup>+</sup> T cells between the two groups (Figure 7D-F).”

      [A] Hansson, Göran K, and Andreas Hermansson. “The immune system in atherosclerosis.” Nature immunology vol. 12,3 (2011): 204-12. doi:10.1038/ni.2001

      (4) Figure 7B: Beyond cDC1/2 proportions within cDCs, please report absolute counts of: Total cDCs, cDC1, and cDC2 subsets. Figure 7D: In addition to CD4+/CD8+ T cell proportions, the following should be included:

      (a) Total T cell numbers in aortas

      (b) Absolute counts of CD4+ and CD8+ T cells.

      Thank you for your suggestions. We have now included in Figure 7 the absolute counts of cDC, cDC1, and cDC2 cells, along with CD4<sup>+</sup> and CD8<sup>+</sup> T cells in aortic tissues. Additionally, we provide the corresponding CD44 mean fluorescence intensity (MFI) measurements for both CD4<sup>+</sup> and CD8<sup>+</sup> T cell populations. We have added the corresponding description to the Results section.

      Please refer to the Results section from line 303 to 311: “The flow cytometric results illustrated that both frequencies and absolute counts of Xcr1<sup>+</sup> cDC1 cells in the aorta were significantly reduced, but cDCs and cDC2 cells from Xcl1<sup>–/–</sup> ApoE<sup>–/–</sup> mice were comparable with those from ApoE<sup>–/–</sup> mice (Figure 7A-C). Moreover, in both lymph node and spleen, the absolute numbers of pDC, cDC1 and cDC2 from Xcl1<sup>–/–</sup> ApoE<sup>–/–</sup> mice were comparable with those from ApoE<sup>–/–</sup> mice (Figure S11). Crucially, CD69<sup>+</sup> frequency and CD44 MFI remained comparable in both aortic CD4<sup>+</sup> and CD8<sup>+</sup> T cells between the two groups (Figure 7D-F). However, aortic CD8<sup>+</sup> T cells exhibited reduced frequency and absolute count, while CD4<sup>+</sup> T cells showed increased frequency but unchanged counts in Xcl1<sup>–/–</sup> ApoE<sup>–/–</sup> mice versus controls (Figure 7G and H).”

      (5) cDC1 depletion reduced CD69+CD4+ and CD69+CD8+ T cells, whereas Xcl1 depletion decreased Xcr1+ cDC1 cells without altering activated T cells. How do the authors explain these different results? This discrepancy needs explanation.

      We sincerely appreciate your professional and insightful comments regarding the mechanistic relationship between cDC1 depletion and T cell activation. Direct cDC1 depletion in the Xcr1<sup>Cre-Gfp</sup> Rosa26<sup>LSL-DTA</sup> ApoE<sup>–/–</sup> mouse model removes both recruited and tissue-resident cDC1s, eliminating their multifunctional roles in antigen presentation, co-stimulation and cytokine secretion essential for T cell activation. In contrast, Xcl1 depletion reduces, but does not eliminate, cDC1 migration into plaques. Furthermore, alternative chemokine axes (e.g., CCL5/CCR5, CXCL9/CXCR3, BCL9/BCL9L) may partially rescue cDC1 recruitment [13, 68, 69], and non-cDC1 APCs (e.g., monocytes, cDC2s) may compensate for T cell activation [55, 70]. We emphasize that Xcl1 depletion specifically failed to alter T cell activation in hyperlipidemic ApoE<sup>–/–</sup> mice; its impact may differ in other pathophysiological contexts due to compensatory mechanisms. We thank you again for highlighting this nuance, which strengthens our mechanistic interpretation. We have added these points to the Discussion section and included new references.

      Please refer to the Discussion section from line 407 to 413: “Notably, while complete ablation of Xcr1<sup>+</sup> cDC1s impaired T cell activation, reduction of Xcr1<sup>+</sup> cDC1 recruitment via Xcl1 deletion did not significantly compromise this process. This discrepancy may arise through compensatory mechanisms: alternative chemokine axes (e.g., CCL5/CCR5, CXCL9/CXCR3, BCL9/BCL9L) may partially rescue Xcr1<sup>+</sup> cDC1 homing [13, 68, 69], while non-cDC1 antigen-presenting cells (e.g., monocytes, cDC2s) may sustain T cell activation [55, 70]. Furthermore, tissue-specific microenvironment factors could potentially modulate its role in other diseases.”

      [13] Eisenbarth, S C. “Dendritic cell subsets in T cell programming: location dictates function.” Nature reviews. Immunology vol. 19,2 (2019): 89-103. doi:10.1038/s41577-018-0088-1

      [55] Brewitz, Anna et al. “CD8+ T Cells Orchestrate pDC-XCR1+ Dendritic Cell Spatial and Functional Cooperativity to Optimize Priming.” Immunity vol. 46,2 (2017): 205-219. doi:10.1016/j.immuni.2017.01.003

      [68] de Oliveira, Carine Ervolino et al. “CCR5-Dependent Homing of T Regulatory Cells to the Tumor Microenvironment Contributes to Skin Squamous Cell Carcinoma Development.” Molecular cancer therapeutics vol. 16,12 (2017): 2871-2880. doi:10.1158/1535-7163.MCT-17-0341

      [69] He F, Wu Z, Liu C, Zhu Y, Zhou Y, Tian E, et al. Targeting BCL9/BCL9L enhances antigen presentation by promoting conventional type 1 dendritic cell (cDC1) activation and tumor infiltration. Signal Transduct Target Ther. 2024;9(1):139. Epub 2024/05/30. doi: 10.1038/s41392-024-01838-9. PubMed PMID: 38811552; PubMed Central PMCID: PMCPMC11137111.

      [70] Böttcher, Jan P et al. “Functional classification of memory CD8(+) T cells by CX3CR1 expression.” Nature communications vol. 6 8306. 25 Sep. 2015, doi:10.1038/ncomms9306.

      Reviewer #1 (Recommendations for the authors):

      (1) Line 32 - The authors might want to add that the mouse model leads to a "constitutive" depletion of cDC1.

      Thanks for your advice, we have revised the sentence as follows.

      Please refer to the Results section from line 31 to 33: “we established Xcr1<sup>Cre-Gfp</sup> Rosa26<sup>LSL-DTA</sup> ApoE<sup>–/–</sup> mice, a novel and complex genetic model, in which cDC1 was constitutively depleted in vivo during atherosclerosis development”.

      (2) Lines 187-188: The authors claim that T cell activation was "inhibited" if cDC1 was depleted. The data show that the T cells were less activated, but there is no indication of any kind of inhibition; this should be corrected.

      Thanks for your advice, we have revised the sentence as follows.

      Please refer to the Results section from line 183 to 187: “Subsequently, we assessed T cell phenotype in the two groups of mice. While neither the frequencies nor absolute counts of aortic CD4<sup>+</sup> and CD8<sup>+</sup> T cells differed significantly between the two groups of mice (Figure 4D-F), CD69 frequency and CD44 MFI (Mean Fluorescence Intensity), the T cell activation markers, were significantly reduced in both CD4<sup>+</sup> and CD8<sup>+</sup> T cells from Xcr1<sup>+</sup> cDC1 depleted mice compared to controls (Figure 4G and H)”.

      (3) Why are some splenic DC clusters absent in LNs and vice versa? This is not obvious to this reviewer and should at least be discussed.

      We appreciate the insightful question regarding the absence of certain splenic DC clusters in LNs. This phenomenon in Figure 5 aligns with the 'division of labor' paradigm in dendritic cell biology: tissue microenvironments evolve specialized DC subsets to address local immunological challenges. The absence of universal clusters reflects functional adaptation, not technical artifacts. We acknowledge that this tissue-specific heterogeneity warrants further discussion and have expanded our analysis to address this point in the discussion part of our manuscript.

      Please refer to the Discussion section from line 375 to 385: “This pronounced tissue-specific compartmentalization of Xcr1<sup>+</sup> cDC1 subsets may be related to multiple mechanisms, including developmental imprinting that instructs precursor differentiation into transcriptionally distinct subpopulations [62], and microenvironmental filtering through organ-specific chemokine axes (e.g., CCL2/CCR2 in spleen) that selectively recruit receptor-matched subsets [63, 64]. This spatial specialization optimizes pathogen surveillance for local immunological challenges. Based on the maturation analysis of the cDC1 scRNA-seq data [41], our findings suggest that the aortic cDC1 cells display a major difference from those of spleen and lymph nodes by lacking the mature clusters, whereas lymph node cDC1 cells contain an additional Fabp5<sup>+</sup> S100a4<sup>+</sup> late mature cluster. Our results also suggest that hyperlipidemia contributes to alterations in early immature cDC1 and in the abundance of late immature cDC1 cells, which were associated with dramatic changes in the expression of Tnfaip3, Serinc3, Apol7c and Tifab”.

      [62]. Liu Z, Gu Y, Chakarov S, Bleriot C, Kwok I, Chen X, et al. Fate Mapping via Ms4a3-Expression History Traces Monocyte-Derived Cells. Cell. 2019;178(6):1509-25 e19. Epub 2019/09/07. doi: 10.1016/j.cell.2019.08.009. PubMed PMID: 31491389.

      [63]. Bosmans LA, van Tiel CM, Aarts S, Willemsen L, Baardman J, van Os BW, et al. Myeloid CD40 deficiency reduces atherosclerosis by impairing macrophages' transition into a pro-inflammatory state. Cardiovasc Res. 2023;119(5):1146-60. Epub 2022/05/20. doi: 10.1093/cvr/cvac084. PubMed PMID: 35587037; PubMed Central PMCID: PMCPMC10202633.

      [64]. Mildner A, Schonheit J, Giladi A, David E, Lara-Astiaso D, Lorenzo-Vivas E, et al. Genomic Characterization of Murine Monocytes Reveals C/EBPbeta Transcription Factor Dependence of Ly6C(-) Cells. Immunity. 2017;46(5):849-62 e7. Epub 2017/05/18. doi: 10.1016/j.immuni.2017.04.018. PubMed PMID: 28514690.

      [41]. Bosteels V, Marechal S, De Nolf C, Rennen S, Maelfait J, Tavernier SJ, et al. LXR signaling controls homeostatic dendritic cell maturation. Sci Immunol. 2023;8(83):eadd3955. Epub 2023/05/12. doi: 10.1126/sciimmunol.add3955. PubMed PMID: 37172103.

      (4) The authors should discuss how XCL1 could impact lesional cDC1 and T cell abundance. Notably, preDCs do not express XCR1, and T cells express XCL1 following TCR activation. Is there a recruitment or local proliferation defect of cDC1 in the absence of XCL1? Could there also be a role for NK cells as a potential source of XCL1?

      We appreciate your insightful questions regarding the differential effects of Xcl1 on cDC1s and T cells. Xcl1 primarily mediates the recruitment of mature cDC1s. Our data demonstrate that Xcl1 deletion significantly reduces aortic cDC1 abundance, which correlates with a concomitant decrease in CD8<sup>+</sup> T cell numbers within the aorta. These findings strongly suggest that the Xcl1-Xcr1 axis plays a regulatory role in T cell accumulation in aortic plaques.

      Consistent with prior studies [A, B], cDC1 recruitment can occur in the absence of Xcl1, which echoes our finding that cDC1 cells were still present, though at lower abundance, in aortic plaques of Xcl1 knockout mice. We agree that further studies are required to determine how Xcl1-dependent and Xcl1-independent cDC1 cells activate T cells and whether they differ in their capacity to proliferate in tissue. We have added these points to the Discussion section.

      Please refer to the Discussion section from line 407 to 415: “Notably, while complete ablation of Xcr1<sup>+</sup> cDC1s impaired T cell activation, reduction of Xcr1<sup>+</sup> cDC1 recruitment via Xcl1 deletion did not significantly compromise this process. This discrepancy may arise through compensatory mechanisms: alternative chemokine axes (e.g., CCL5/CCR5, CXCL9/CXCR3, BCL9/BCL9L) may partially rescue Xcr1<sup>+</sup> cDC1 homing [13, 68, 69], while non-cDC1 antigen-presenting cells (e.g., monocytes, cDC2s) may sustain T cell activation [55, 70]. Furthermore, tissue-specific microenvironment factors could potentially modulate its role in other diseases. In summary, our findings identify Xcl1 as a potential therapeutic target for atherosclerosis, though its cellular origins and its regulation of lesional Xcr1<sup>+</sup> cDC1 and T cell dynamics require further study”.

      In the literature, Xcl1 is expressed by NK cells and subsets of T cells, and NK cells may be a potential source of Xcl1 during atherosclerosis, which deserves further investigation [A, C, D].

      [A] Böttcher, Jan P et al. “NK Cells Stimulate Recruitment of cDC1 into the Tumor Microenvironment Promoting Cancer Immune Control.” Cell vol. 172,5 (2018): 1022-1037.e14. doi:10.1016/j.cell.2018.01.004

      [B] He, Fenglian et al. “Targeting BCL9/BCL9L enhances antigen presentation by promoting conventional type 1 dendritic cell (cDC1) activation and tumor infiltration.” Signal transduction and targeted therapy vol. 9,1 139. 29 May. 2024, doi:10.1038/s41392-024-01838-9

      [C] Woo, Yeon Duk et al. “The invariant natural killer T cell-mediated chemokine X-C motif chemokine ligand 1-X-C motif chemokine receptor 1 axis promotes allergic airway hyperresponsiveness by recruiting CD103+ dendritic cells.” The Journal of allergy and clinical immunology vol. 142,6 (2018): 1781-1792.e12. doi:10.1016/j.jaci.2017.12.1005

      [D] Winkels, Holger et al. “Atlas of the Immune Cell Repertoire in Mouse Atherosclerosis Defined by Single-Cell RNA-Sequencing and Mass Cytometry.” Circulation research vol. 122,12 (2018): 1675-1688. doi:10.1161/CIRCRESAHA.117.312513

      Reviewer #2 (Recommendations for the authors):

      There is a logical error in line 298. I suggest revising to: "Collectively, these data suggest that Xcl1 promotes atherosclerosis by recruiting Xcr1+ cDC1 cells, which subsequently drive T cell activation in lesions."

      Thank you for your advice. Since Xcl1 deficiency reduced both the frequencies and absolute counts of Xcr1+ cDC1 and CD8+ T cells in lesions without affecting T cell activation, we revised the sentence along the lines you suggested, emphasizing CD8+ T cell accumulation rather than T cell activation.

      Please refer to the Results section from line 314 to 315: “Collectively, these data suggest that Xcl1 promotes atherosclerosis by recruiting Xcr1<sup>+</sup> cDC1 cells, and facilitating CD8<sup>+</sup> T cell accumulation in lesions”.

    1. eLife Assessment

      This important study elucidates the molecular function of the SARS-CoV-2 helicase NSP13, which inhibits the transcriptional activity of the YAP/TEAD complex in vitro and in vivo. The evidence supporting the authors' claims is compelling, based on cell biological assays and multi-omic studies. This work contributes to the understanding of a new regulatory mechanism of YAP/TEAD after SARS-CoV-2 infection and will be of interest to researchers investigating COVID-19 infection and the Hippo-YAP signaling pathway.

    2. Reviewer #1 (Public review):

      In the revised manuscript, Meng et al. report that SARS-CoV-2 infection suppresses YAP target gene transcription in both patient lung samples and iPSC-derived cardiomyocytes. Among the tested viral proteins, the helicase nonstructural protein 13 (NSP13) was identified as a key factor that impairs YAP/TEAD transcriptional activity. Through mutagenesis and protein-protein interaction studies, the authors propose a mechanism where NSP13 binds YAP/TEAD complex, remodels chromatin structure, and recruits transcriptional repressors to inhibit YAP/TEAD's transcriptional activity.

      Overall, this study uncovers a novel regulation of Hippo signaling by SARS-CoV-2 through NSP13, suggesting a potential role of this growth-related pathway in host innate immune response to viral infection. While these findings are intriguing, future studies are needed to validate the involvement of YAP/TEAD in patient tissues and to assess their potential as therapeutic targets against SARS-CoV-2.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript by Meng et al. describes a role for the coronavirus helicase NSP13 in the regulation of YAP-TEAD-mediated transcription. The authors present data that NSP13 expression in cells reduces YAP-induced TEAD luciferase reporter activity and that NSP13 transduction in cardiomyocytes blocks hyperactive YAP-mutant phenotypes in vivo. The mechanisms by which viral proteins (particularly those from coronaviruses) intersect with cellular signaling events are an important research topic, and the intersection of NSP13 with YAP-TEAD transcriptional activity (independent of upstream Hippo pathway mediated signals) offers new knowledge that is of interest to a broad range of researchers.

      Strengths:

      The manuscript presents convincing data mapping the effects of NSP13 on YAP-TEAD reporter activity to the helicase domain. Moreover, the in vivo data demonstrating that NSP13 expression in YAP5SA mouse cardiomyocytes increased animal survival rates and restored cardiac function are striking and support the model presented.

      Weaknesses:

      While there are some hints at the mechanisms by which NSP13 regulates YAP-TEAD activity through the identification of NSP13-associated proteins by mass spec, the relationships and functions of these factors in the context of YAP-TEAD regulation require further study in the future.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Major points

      (1) The authors discovered a novel regulation of the Hippo-YAP pathway by SARS-CoV-2 infection but did not address the pathological significance of this finding. It remains unclear why YAP downstream gene transcription needs to be inhibited in response to SARS-CoV-2 infection. Is this inhibition crucial for the innate immune response to SARS-CoV-2? The authors should re-analyze their snRNA-seq and bulk RNA-seq data described in Figure 1 to determine whether any of the affected YAP downstream genes are involved in this process.

      We appreciate the reviewer’s suggestion to clarify the pathological significance of YAP pathway inhibition in SARS-CoV-2 infection. To address this, we re-analyzed our snRNA-seq and bulk RNA-seq datasets to determine whether YAP target genes overlap with known mediators of the innate immune response. As described in Fig. 1C, bulk RNA-seq revealed decreased expression of multiple YAP downstream targets linked to innate immune regulation (e.g., Thbs1, Ccl2, Axl, and Csf1) in SARS-CoV-2–infected cells in vitro.

      snRNA-seq of alveolar type I (AT1) cells from COVID-19 patients revealed a more complex landscape: while we observed reduced YAP activity overall (Fig. 1G), multiple YAP target genes involved in innate immunity and cytokine signaling were paradoxically elevated (Supplemental Fig. 1E). Several factors likely explain these conflicting observations: 1. In the lung, AT1 cells (which are critical for gas exchange) may respond to virus infection in a cell-specific manner by upregulating immune-response genes through other signaling pathway(s); 2. In vivo, SARS-CoV-2 infection triggers a surge in cytokines, chemokines, and other local factors that can differentially modulate YAP binding sites and thus affect its downstream targets, a complexity not fully captured in vitro; 3. YAP is highly sensitive to mechanical signals and tissue architecture. The 3D tissue structure, altered cell–cell junctions in infected lung tissue, and fluid shear stress in the alveolar space could shape YAP target gene transcription differently from simplified monolayer cell cultures.

      We have expanded the Results section of the revised version to include the above points. We also acknowledge that ongoing and future work is needed to delineate the exact molecular and tissue-specific pathways through which YAP inhibition confers a potential advantage in combating SARS-CoV-2.

      (2) The authors concluded that helicase activity is required for NSP13-induced inhibition of YAP transcriptional activity based on mutation studies (Figure 3B). This finding is somewhat confusing, as K131, K345/K347, and R567 are all essential residues for NSP13 helicase activity while mutating K131 did not affect NSP13's ability to inhibit YAP (Figure 3B). Additionally, there are no data showing exactly how NSP13 inhibits the YAP/TEAD complex through its helicase function. This point was also not reflected in their proposed working model (Figure 4H).

      We appreciate the reviewer’s concerns regarding the helicase‐dependent inhibition of YAP by NSP13, particularly the roles of K131, K345/K347, and R567. Based on published structural and biochemical studies, each of these residues contributes to helicase function in a distinct way (1). K131 is crucial for stabilizing the NSP13 stalk region by interacting with S424; substituting K131 with alanine (K131A) reduces helicase efficiency but does not completely abolish it. K345/K347 are key DNA‐binding residues, and mutating both (K345A/K347A) largely prevents NSP13 from binding DNA, thus eliminating unwinding. R567 is critical for ATP hydrolysis, and the R567A mutant retains DNA binding capacity but fails to unwind DNA. In Fig. 3B, K131A suppresses YAP transactivation to nearly the same extent as wild‐type NSP13, suggesting that partial helicase activity is sufficient for complete YAP/TEAD inhibition. Conversely, the K345A/K347A and R567A mutants show markedly diminished repression, underscoring the importance of DNA binding and ATP hydrolysis.

      As the new Fig. 4J illustrates, NSP13 must bind DNA and hydrolyze ATP to unwind nucleic acids. This helicase‐dependent process likely enables NSP13, while bound to TEAD, to remodel chromatin structure and to properly organize YAP repressors at the YAP/TEAD complex, thereby preventing YAP/TEAD transactivation. In support of this mechanism, the K345A/K347A mutant, which cannot anchor to DNA, fails to repress YAP and slightly increases YAP‐driven transcription (Fig. 3B), presumably by mislocalizing YAP repressors. Likewise, the ATPase‐dead R567A mutant can bind DNA but cannot unwind it and remodel chromatin to recruit YAP repressors, resulting in a loss of YAP suppression (Fig. 3B and 3F). Our revised model proposes that both DNA binding and ATP‐dependent unwinding are essential for NSP13 to suppress YAP transcriptional activity. We have updated the Results, Discussion, and model accordingly.

      (3) The proposed model that NSP13 binds TEAD4 to recruit repressor proteins and inhibits YAP/TEAD downstream gene transcription (Figure 4H) needs further characterization. Second, NSP13 is a DNA-binding protein, and its nucleic acid-binding mutant K345A/K347A failed to inhibit YAP transcriptional activity (Figure 3B). The authors should investigate whether NSP13 could bind to the TEAD binding sequence or the nearby sequence on the genome to modulate TEAD's DNA binding ability. Third, regarding the identified nuclear repressors, the authors should validate the interaction of NSP13 with the ones whose loss activates YAP transcriptional activity (Figure 4G). Lastly, why can't NSP13 bind TEAD4 in the cytoplasmic fractionation if both NSP13 and TEAD4 are detected there (Figure 3B)? This finding indicates their interaction is not a direct protein-protein interaction but is mediated by something in the nucleus, such as genomic DNA.

      (3a) We acknowledge that our NSP13 immunoprecipitation–mass spectrometry (IP-MS) did not identify any TEAD proteins (Fig. 4G and IP-MS tables). Several factors likely contributed to this outcome:

      (1) Low TEAD expression in HEK293T cells: Our IP-MS experiments were performed in HEK293T cells, which, according to the Human Protein Atlas, express TEAD1–4 at comparatively low levels (TEAD1: 16.5, TEAD2: 16.4, TEAD3: 4.9, TEAD4: 38.7 nTPM). In contrast, HeLa cells, in which we validated NSP13-mediated YAP suppression (Fig. 4H, Supplementary Fig. 5B-D), express these TEAD isoforms at higher levels (TEAD1: 97.1, TEAD2: 27.3, TEAD3: 12.2, TEAD4: 48.1 nTPM). Therefore, insufficient TEAD abundance in HEK293T cells may have limited the sensitivity needed to detect TEAD–NSP13 interactions in our proteomic screens.

      (2) Transience and potential DNA dependence: Our co-immunoprecipitation (co-IP) experiments (Fig. 4B, Supplementary Fig. 4C-E) indicated that NSP13–TEAD4 binding is of low affinity. Under standard IP-MS conditions (which typically do not include chemical cross-linkers or nucleic acids to stabilize transient complexes), weak or short-lived interactions can be lost during washes or sample processing.

      (3) Additional supporting evidence: We carefully checked our IP-MS data and found that well-known TEAD-binding proteins, including CTBP1/2 and GATA4, were pulled down, suggesting that the absence of TEAD from the dataset does not rule out an NSP13–TEAD association.
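      The nTPM figures in point (1) can be turned into per-isoform fold differences with a few lines of arithmetic. This sketch only restates the quoted Human Protein Atlas values; it makes no assumption about how transcript levels translate into protein abundance or IP-MS detectability.

```python
# nTPM values as quoted in point (1) (Human Protein Atlas, per the response).
HEK293T = {"TEAD1": 16.5, "TEAD2": 16.4, "TEAD3": 4.9, "TEAD4": 38.7}
HELA = {"TEAD1": 97.1, "TEAD2": 27.3, "TEAD3": 12.2, "TEAD4": 48.1}


def fold_differences(reference, comparison):
    """Per-isoform ratio comparison/reference, rounded to two decimals."""
    return {iso: round(comparison[iso] / reference[iso], 2) for iso in reference}


folds = fold_differences(HEK293T, HELA)
total_fold = round(sum(HELA.values()) / sum(HEK293T.values()), 2)

for iso, fold in folds.items():
    print(f"{iso}: {fold}x higher in HeLa than in HEK293T")
print(f"TEAD1-4 combined: {total_fold}x")
```

      The output lists one HeLa/HEK293T ratio per isoform plus the ratio of the summed TEAD1–4 values.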

      (3b) We sincerely appreciate the reviewer’s insightful suggestion. While we agree that mapping NSP13 occupancy at individual TEAD-binding motifs is valuable, we respectfully consider this to be beyond the scope of the current study. Biochemical and structural work on coronavirus NSP13 shows that it recognizes nucleic‑acid substrates primarily through their 5′ single‑stranded overhang and duplex architecture, not through a defined base sequence (2, 3). Accordingly, our data (Fig. 3B and 3F) indicate that DNA-binding ability, rather than recognition of a specific motif, enables NSP13 to perform its helicase activity in proximity to TEAD and to recruit repressors. Moreover, the DNA‑binding mutant K345A/K347A and the ATPase‑dead mutant R567A both fail to suppress YAP/TEAD transcription despite retaining the ability to interact with TEAD (Fig. 3B). These loss‑of‑function phenotypes indicate that NSP13’s chromatin engagement and unwinding activity, rather than sequence‑restricted targeting, are essential for repression. For these reasons, motif‑specific binding assays were not pursued in this revision, but we have clarified in the Discussion that NSP13’s DNA engagement is likely structural or TEAD-dependent rather than sequence‑directed. We have also highlighted this as an important avenue for future investigation.

      (3c) To validate the NSP13-interacting proteins identified in our IP-MS data, we generated plasmids expressing several candidates (CCT3, SMARCD1, EIF4A1, LMNA, TTF2, and YY2) and performed co-IP assays. As expected, we confirmed the robust interaction between NSP13 and TEAD (Supplemental Fig. 5E). However, these putative nuclear repressors bound NSP13 only weakly compared with TEAD4, suggesting that NSP13 associates with them indirectly, possibly as part of a larger multiprotein complex or in a chromatin-dependent manner, rather than through direct protein–protein interaction (Fig. 4J).

      (3d) We appreciate the reviewer’s question. To investigate whether their association might be DNA‐dependent, we performed co‐IP experiments using nuclear lysates in the presence or absence of various nucleases: Universal Nuclease (which degrades all forms of DNA and RNA), DNase I (which cleaves both single‐ and double‐stranded DNA), and RNase H (which selectively cleaves the RNA strand in RNA/DNA hybrids). Our findings revealed that nucleic acid removal did not disrupt the NSP13/TEAD4 interaction (Supplemental Fig. 4E), indicating that their binding is not solely mediated by DNA or RNA.

      Reviewer #2 (Public Review):

      Specific comments and suggestions for improvement of the manuscript:

      (1) NSP13 has been reported to block, in a helicase-dependent manner, episomal DNA transcription (PMID: 37347173), raising questions about the effects observed in the HOP-Flash and 8xGTIIC assays. It would be valuable to demonstrate the specificity of the proposed effect of NSP13 on TEAD activation by YAP (versus broad effects on reporter assays) and also to show that NSP13 reduces endogenous YAP-TEAD transcriptional activity (i.e., does ectopic NSP13 expression reduce the expression of YAP-induced TEAD target genes in cells?).

      We appreciate the reviewer’s comments and have carefully revisited the conclusions of the published paper (4) (PMID: 37347173), which reported that NSP13 suppresses episomal DNA transcription, as evidenced by reduced Renilla luciferase (driven by the herpes simplex virus thymidine kinase promoter) and GFP expression upon co‐expression with NSP13. In our experiments, we used a dual‐luciferase assay with Renilla luciferase (under the same promoter) as an internal control. After re-examining our raw Renilla luciferase data (now provided in the supplemental Excel file “Supporting data value”), we found that while 100 ng of NSP13 did not affect Renilla luciferase levels, 400 ng of NSP13 reduced them by approximately 50% relative to the YAP5SA‐only group (Supplemental Fig. 2B, Fig. 3C-D). We observed a similar reduction with NSP13 truncation mutants, an outcome not fully consistent with the published study (Supplemental Fig. 3D, PMID: 37347173). However, unlike their finding of robust episomal DNA suppression, our data indicate that the K345A/K347A mutant of NSP13, which lacks DNA‐binding ability, completely lost its suppressive effect (Fig. 3B).
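      Why the Renilla internal control matters for this argument can be shown with a small sketch of dual-luciferase normalization (Firefly reporter signal divided by Renilla control signal). The luminescence values below are made up for illustration and are not data from the study; the sketch assumes that a non-specific effect, if present, lowers both reporters by the same factor.

```python
def normalized_activity(firefly, renilla):
    """Firefly (reporter) luminescence divided by Renilla (internal control)."""
    return firefly / renilla


def relative_to_control(sample, control):
    """Normalized activity of a sample relative to the control group (= 1.0)."""
    return normalized_activity(*sample) / normalized_activity(*control)


# Hypothetical (firefly, renilla) readings in arbitrary units.
control = (1000.0, 200.0)      # reporter-only control group
uniform_drop = (500.0, 100.0)  # both reporters halved by a non-specific effect
extra_drop = (250.0, 100.0)    # Renilla halved, Firefly reduced four-fold

print(relative_to_control(uniform_drop, control))  # 1.0: uniform loss cancels out
print(relative_to_control(extra_drop, control))    # 0.5: residual reporter-specific drop
```

      Under the equal-suppression assumption, only the part of the Firefly decrease that exceeds the Renilla decrease survives normalization; whether NSP13 affects both promoters equally is itself an assumption rather than something the sketch establishes.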

      We performed additional Notch reporter assays to address the concern that NSP13 might nonspecifically inhibit episomal DNA transcription (including the HOP‑Flash and 8×GTIIC reporters). These experiments revealed that co‑expression of NSP13 with NICD (Notch intracellular domain) does not suppress Notch signaling (Supplemental Fig. 2C), indicating that NSP13 does not globally block all reporter systems. To evaluate whether NSP13 reduces endogenous YAP‑TEAD activity, we transiently overexpressed NSP13 WT and its R567A mutant in HeLa cells. However, bulk RNA‑seq and qPCR analyses did not reveal a clear decrease in YAP target gene expression, possibly due to low transfection efficiency (< 50%, Supplemental Fig. 4D). Interestingly, we observed that YAP5SA was predominantly retained in the nucleus upon co‑expression of NSP13 or R567A, suggesting that NSP13 (alone or together with its interacting partners) restricts shuttling of YAP5SA to the cytoplasm. Future studies will use stable cell lines expressing NSP13 WT or R567A to better characterize the mechanisms driving YAP5SA nuclear retention and to clarify how NSP13 specifically suppresses YAP activity.

      (2) While the IP-MS experiment may have revealed new regulators of TEAD activity, the data presented are preliminary and inconclusive. No interactions are validated; beyond slight changes in TEAD reporter activity following knockdown, no direct links to YAP-TEAD are demonstrated, and no link to NSP13 is shown. Also, no details are provided about the methods used for the IP-MS experiment, raising some concerns about potential false-positive associations within the data.

      We appreciate the reviewer’s feedback regarding our IP-MS findings and acknowledge that additional validation is required to establish definitive links between the identified putative regulators, YAP-TEAD, and NSP13. We have taken the following steps (and plan further experiments) to address these concerns:

      (2a) Co-IP validation: As in our response to Reviewer #1 (3c), we generated plasmids expressing several top candidate interactors from the IP-MS data (CCT3, SMARCD1, EIF4A1, LMNA, TTF2, and YY2) and performed direct co-IP assays in a more controlled setting. The results indicated that these putative NSP13 interactors bound more weakly than TEAD4, implying that NSP13 may associate with them as part of a larger complex or in a chromatin-dependent manner rather than through a direct protein–protein interaction (Fig. 4J).

      (2b) qPCR validation: Beyond reporter assays evaluating YAP transactivation after knockdown of the candidate YAP suppressors (Fig. 4H and Supplemental Fig. 5C), we performed qPCR to measure endogenous YAP-TEAD target genes (e.g., CTGF, CYR61, and AMOTL2) after CCT3 knockdown. Expression of CTGF and CYR61 was higher than in controls (Supplemental Fig. 5D), strengthening the case that CCT3 is functionally relevant to YAP-TEAD signaling.

      (2c) To investigate how NSP13‐interacting proteins link to the YAP/TEAD complex, we examined the IP‑MS dataset and identified several well‐known YAP and TEAD binding partners, including CTBP1/2 (TEAD‐binding), GATA4 (TEAD‐binding), and multiple 14‐3‐3 isoforms (YWHAZ/YWHAB/YWHAH/YWHAQ, YAP binding). These findings suggest that NSP13 may form a larger nuclear complex with YAP/TEAD and associated cofactors. In the future, we will determine whether these putative TEAD regulators also interact with NSP13 under various conditions (e.g., in the presence or absence of DNA) and whether co‐expression of NSP13 influences their association with YAP or TEAD. This approach will clarify how NSP13 might leverage these factors to regulate YAP‐TEAD function.

      (2e) For the mass spectrometry experiments, HEK293T cells were transfected with Flag‐YAP1, HA‐NSP13, or Flag‐YAP1 + HA‐NSP13 according to the manufacturer’s standard protocols. After nuclear extraction and lysis, the supernatant was incubated with HA magnetic beads to immunoprecipitate (IP) NSP13. The IP samples were subsequently analyzed by mass spectrometry to identify NSP13‐associated proteins (Fig. 4F). Each experimental condition was performed in duplicate to ensure reproducibility. We included an appropriate negative control (Flag‐YAP1) and stringent data‐filtering criteria to minimize false positives. We apologize for not including these details in our original Methods section; in this revised manuscript, we have fully described the number of replicates, the controls used, and our data analysis pipelines.
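      The response does not spell out the filtering criteria, so the following is only a minimal sketch of one common style of IP-MS filter, written to make the reviewer's false-positive concern concrete: a protein is kept only if it is detected in every bait replicate and its mean spectral count exceeds the negative-control IP by a chosen fold. The spectral-count input, the 2-fold threshold, the pseudocount, and the protein names are all hypothetical.

```python
def enriched_proteins(bait_reps, control_reps, min_fold=2.0, pseudocount=1.0):
    """Keep proteins seen in every bait replicate and enriched over the control IP.

    bait_reps / control_reps: one {protein: spectral_count} dict per replicate.
    min_fold and pseudocount are illustrative defaults, not the study's settings.
    """
    hits = {}
    for prot in sorted(set().union(*bait_reps)):
        bait_counts = [rep.get(prot, 0) for rep in bait_reps]
        if min(bait_counts) == 0:
            continue  # not reproducibly detected in the bait IP
        ctrl_counts = [rep.get(prot, 0) for rep in control_reps]
        bait_mean = sum(bait_counts) / len(bait_counts)
        ctrl_mean = sum(ctrl_counts) / len(ctrl_counts)
        # Pseudocount avoids division by zero for proteins absent from the control.
        fold = (bait_mean + pseudocount) / (ctrl_mean + pseudocount)
        if fold >= min_fold:
            hits[prot] = round(fold, 2)
    return hits


# Hypothetical spectral counts for duplicate bait IPs and duplicate control IPs.
bait = [{"PROT_A": 20, "PROT_B": 5, "PROT_C": 3}, {"PROT_A": 18, "PROT_B": 6}]
control = [{"PROT_A": 1, "PROT_B": 5}, {"PROT_A": 0, "PROT_B": 4}]
print(enriched_proteins(bait, control))  # {'PROT_A': 13.33}
```

      PROT_B is dropped because it is nearly as abundant in the control IP, and PROT_C because it appears in only one bait replicate.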

    1. eLife Assessment

      The study presents valuable theoretical insights by attempting to classify pattern-forming gene subnetworks and exploring their potential mechanisms. However, the results are incomplete, as they rely on oversimplified models, limited classifications, and assumptions that may not hold in more complex or realistic scenarios.

    2. Reviewer #1 (Public review):

      Summary:

      The authors tackle a long-standing question in developmental theory: given a gene-regulatory network that includes extracellular signalling, which topologies are even capable of transforming an initial spatial profile into a genuinely new pattern? Building on the classical reaction-diffusion framework in one dimension, but imposing biologically motivated constraints, they prove that every one-signal sub-network must be either Hierarchical (H), self-activating (L+), or self-inhibiting (L-). They further demonstrate that only three composite classes of full networks - pure H, a coupled L+ L- "Turing" pair, and an L- module fed by an intracellular positive loop ("noise-amplifying") - can create non-trivial spatial transformations. Analytical criteria and illustrative simulations are provided, together yielding a closed taxonomy that the authors propose is relevant to real systems.

      Strengths:

      (1) Useful classification framework. Reducing a vast number of possible gene circuits to three canonical pattern-forming motifs is a valuable organising insight for both theorists and experimentalists.

      (2) Logical completeness. All required cases are addressed, and the proofs elevate previous computational observations to formal statements.

      (3) Practical interpretability. Given a reaction network diagram, one can now decide (assuming the model applies to the real systems) whether spatial patterning is even possible, saving experimental effort on in-silico screens that could never succeed.

      Weaknesses:

      (1) The Results section is difficult to follow. Key logical steps and network configurations are described only briefly in prose, which constantly requires the reader to consult either the SI or other parts of the text (see the numerous references to the requirements R1-R5 listed at the beginning of the paper) to gain even a minimal understanding. As a result, a scientifically literate but non-specialist reader may struggle to grasp the argument in a reasonable amount of time.

      (2) A central step in the model formulation is the linearisation of the reaction term around a homogeneous steady state; higher-order kinetics, including ubiquitous bimolecular sinks such as A + B → AB, are simply collapsed into the Jacobian without any stated amplitude bound on the perturbations. Because the manuscript never analyses how far this assumption can be relaxed, the robustness of the three-class taxonomy under realistic nonlinear reactions or large spike amplitudes remains uncertain.
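      To make the concern about bimolecular sinks concrete, consider a generic mass-action sink A + B → AB with an assumed rate constant k, expanded around a homogeneous state (a*, b*). This is an illustrative sketch, not the manuscript's reaction scheme.

```latex
% Deviations from the homogeneous state: a = a^* + \delta a, \; b = b^* + \delta b
\[
  -k\,a\,b \;=\; \underbrace{-k\,a^{*}b^{*}}_{\text{zeroth order}}
  \;\underbrace{-\,k\left(b^{*}\,\delta a + a^{*}\,\delta b\right)}_{\text{kept in the Jacobian}}
  \;\underbrace{-\,k\,\delta a\,\delta b}_{\text{dropped by linearisation}}
\]
% The dropped term is small compared with the kept one roughly when
% |\delta a| \ll a^{*} and |\delta b| \ll b^{*}.
% Around a^{*} = b^{*} = 0 the Jacobian contribution vanishes entirely, so the
% sink acts only through the neglected second-order term.
```

      The last remark is directly relevant to spike initial patterns on a zero background.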

      (3) All modelling is confined to one spatial dimension, and the very definition of a "non-trivial" transformation is framed in terms of peak positions along a line, which clearly must be reformulated for higher dimensions. It is well known that diffusion behaves very differently in one, two, and three dimensions, so the relevance of the three-class taxonomy to real multicellular tissues remains unclear, or at least should be explained in more detail.
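      For reference, the standard results behind this point are shown below for a single species with an assumed diffusion coefficient D and, in the steady-state case, an assumed linear decay rate k and a constant point source of strength q. These are textbook results, not taken from the manuscript.

```latex
% Free diffusion from a point source in d dimensions (r = distance from the source)
\[
  G_d(r,t) = (4\pi D t)^{-d/2} \exp\!\left(-\frac{r^{2}}{4Dt}\right),
  \qquad G_d(0,t) \propto t^{-d/2}.
\]
% Steady state of D\nabla^2 c - k c = -q\,\delta(\mathbf{r}), with \lambda = \sqrt{D/k}
\[
  c_{1\mathrm{D}}(x) = \frac{q\lambda}{2D}\, e^{-|x|/\lambda}, \qquad
  c_{2\mathrm{D}}(r) = \frac{q}{2\pi D}\, K_0\!\left(\frac{r}{\lambda}\right), \qquad
  c_{3\mathrm{D}}(r) = \frac{q}{4\pi D r}\, e^{-r/\lambda}.
\]
```

      The 1D profile stays finite at the source, whereas the 2D profile diverges logarithmically and the 3D profile as 1/r, so peak shapes and decay behaviour along a line need not carry over directly to tissues.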

      Discussion:

      As stated above, there are several uncertainties about the relevance of the presented framework for real systems. However, if the results hold, researchers could look at a gene-network diagram and quickly judge whether it can make spatial patterns and, if so, which of the three known mechanisms it will use. That shortcut would save experimental and computational time. In the case that the results don't hold for the real systems, the authors' proof tools at least give theorists a solid base they can extend to more complex cases.

    3. Reviewer #2 (Public review):

      Summary:

      This study explores how gene regulatory networks that include intra- and extracellular signaling can give rise to spatial patterns of gene expression in cells. The authors investigate this question in a simplified theoretical framework, where all cells are assumed to respond identically to signals, and spatial details such as cell boundaries and extensions are abstracted away. Within this setting, they identify three distinct signaling topologies, referred to as L and H types, and combine them into three minimal subnetworks capable of generating patterns. The study analyzes possible combinations of these topologies and examines how each subnetwork behaves under three different initial conditions. Combining the analyses with mathematical proofs and heuristic arguments, the authors define necessary conditions under which such networks can produce non-trivial spatial patterns.

      Strengths:

      The authors break down larger gene regulatory networks into smaller subnetworks, which allows for a more tractable analysis of pattern formation. These minimal subnetworks are examined under different initial conditions, providing a range of examples for how patterns can emerge in simplified settings. The study also proposes necessary conditions for pattern formation, which may be useful for identifying relevant network structures. In addition, the manuscript offers heuristic explanations for the emergence of patterns in each subnetwork, which help to interpret the simulation results and analytical criteria.

      Weaknesses:

      (1) We have serious concerns regarding the validity of the simulation results presented in the manuscript. Rather than simulating the full nonlinear system described by Equation (1), the authors base their results on a truncated expansion (Equation S8.2) that captures only the time evolution of small deviations around a spatially homogeneous steady state. However, it remains unclear how this reduced system is derived from the full equations - specifically, which terms are retained or neglected and why - and how the expansion of the nonlinear function can be steady-state independent, as claimed. Additionally, in simulations involving the spike plus homogeneous initial condition, it is not evident - or, where equations are provided, it is not correct - that the assumed global homogeneous background actually corresponds to a steady state of the full dynamics. We elaborate on these concerns in the following:

      It is assumed that the homogeneous steady states are given by g_i=0 and g_i=c_i, where 1/c_i = \mu_i or \hat{\mu}_i, independently of the specific network structure. However, the basis for this assumption is unclear, especially since some of the functions do not satisfy this condition - for example, f_5 as defined below Eq. S8.10.5. Moreover, if g_i=c_i does not correspond to a true steady state, then the time evolution of deviations from this state is not correctly described by Eq. S8.2, as the zeroth-order terms do not vanish in that case.

      Additionally, the equations used contain only linear terms and a cubic degradation term for each species g_i, while neglecting all quadratic terms and cubic terms involving cross-species interactions (i≠j). An explanation for this selective truncation is not provided, and without knowledge of the full equation (f), it is impossible to assess whether this expansion is mathematically justified. If, as suggested in the Supplementary Information, the linear and cubic terms are derived from f, then at the very least, the Jacobian matrix should depend on the background steady-state concentration. However, the equations for the small deviation around a steady state (including the Jacobian matrix) used in the simulations appear to be independent of the particular steady state concentration.
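      The reviewers' argument can be written out for a generic system ∂_t g_i = D_i ∇² g_i + f_i(g), with D_i = 0 for non-diffusing species; the manuscript's actual f is not reproduced here. Expanding around a homogeneous state g* with g = g* + v gives:

```latex
\[
  \partial_t v_i = D_i \nabla^2 v_i
  + f_i(\mathbf g^{*})
  + \sum_j J_{ij}(\mathbf g^{*})\, v_j
  + \frac{1}{2}\sum_{j,k} \frac{\partial^2 f_i}{\partial g_j \partial g_k}\Big|_{\mathbf g^{*}} v_j v_k
  + \frac{1}{6}\sum_{j,k,l} \frac{\partial^3 f_i}{\partial g_j \partial g_k \partial g_l}\Big|_{\mathbf g^{*}} v_j v_k v_l
  + O(\lVert \mathbf v \rVert^{4}),
\]
\[
  J_{ij}(\mathbf g^{*}) = \frac{\partial f_i}{\partial g_j}\Big|_{\mathbf g^{*}} .
\]
```

      Three points follow directly: the zeroth-order term f_i(g*) vanishes only if g* is a true steady state; J_ij generally changes with g*; and a model containing only linear terms plus a diagonal cubic degradation term implicitly sets every quadratic derivative, and every cubic derivative other than ∂³f_i/∂g_i³, to zero.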

      This is why we believe that the differences observed between the spike-only initial condition and the spike superimposed on a homogeneous background are not due to the initial conditions themselves, but rather result from a modified reaction scheme introduced through a questionable cutoff.

      "In simulations with spike initial patterns, the reference value g≡0 represents an actual concentration of 0 and therefore, we must add to (S8.2) a Heaviside function Φ acting of f (i.e., Φ(f(g))=f(g) if f(g)>0, Φ(f(g))=0 if f(g)≤0) to prevent the existence of negative concentrations for any gene product (i.e., g_i<0 for some i)." (SI chapter S8).

      This cutoff alters the dynamics (no inhibition) and introduces a different reaction scheme between the two simulations. The need for this correction may itself reflect either a problem in the original equations (which should fulfill the necessary conditions and prevent negative concentrations (R4 in main text)) or the inappropriateness of using an expanded approximation that assumes independence from the steady-state concentration. It is already questionable whether the linearized equations with a cubic degradation term are valid for the spike initial conditions (with different background concentration values), as the amplitude of this perturbation seems rather large.
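      The effect of the cutoff can be seen in a toy forward-Euler integration of a single concentration whose net reaction rate is negative. The reaction term (pure decay at an assumed rate of 0.5), the step size, and the step count are hypothetical choices for illustration, not the manuscript's equations.

```python
def integrate(g0, rate, dt=0.1, steps=50, cutoff=False):
    """Forward-Euler integration of dg/dt = rate(g) for a single concentration."""
    g = g0
    for _ in range(steps):
        r = rate(g)
        if cutoff:
            r = max(r, 0.0)  # Heaviside-style cutoff: keep f only where f > 0
        g += dt * r
    return g


def net_decay(g):
    """Hypothetical reaction term whose net effect is inhibition/degradation."""
    return -0.5 * g


without_cutoff = integrate(1.0, net_decay)
with_cutoff = integrate(1.0, net_decay, cutoff=True)
print(f"without cutoff: {without_cutoff:.3f}")
print(f"with cutoff:    {with_cutoff:.3f}")
```

      Without the cutoff the concentration decays to roughly 0.08 of its starting value; with it, the concentration never changes, because every negative reaction rate is set to zero.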

      Lastly, we note that under the current simulation scheme, it is not possible to meaningfully assess criteria RH2a and RH2b, as they rely on nonlinear interactions that are absent from the implemented dynamics.

      (2) Most of the proofs presented in the Supplementary Information rely on linearized versions of the governing equations, and it remains unclear how these results extend to the fully nonlinear system. We are concerned that the generality of the conclusions drawn from the linear analysis may be overstated in the main text. For example, in Section S3, the authors introduce the concept of dynamic equivalence of transitive chains (Proposition S3.1) and intracellular transitive M-branching (Proposition S3.2), which pertains to the system's steady-state behavior. However, the proof is based solely on the linearized equations, without additional justification for why the result should hold in the presence of nonlinearities. Moreover, the linearized system is used to analyze the response to a "spike initial pattern of arbitrary height C" (SI Chapter S5.1), yet it is not clear how conclusions derived from the linear regime can be valid for large perturbations, where nonlinear effects are expected to play a significant role. We encourage the authors to clarify the assumptions under which the linearized analysis remains valid and to discuss the potential limitations of applying these results to the nonlinear regime.
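      Why a linear analysis is insensitive to spike height, and why that need not hold for the full system, can be seen from a short scaling argument. It assumes the linear-plus-cubic-degradation form described in weakness (1), with an assumed positive degradation coefficient γ_i, a diagonal diffusion matrix D, a unit spike profile s(x), and a spike of height C in species m.

```latex
% Linear part, spike of height C in species m (e_m = unit vector, s(x) = unit spike profile)
\[
  \partial_t \mathbf v = \mathbf D\,\partial_x^2 \mathbf v + J\,\mathbf v,
  \qquad \mathbf v(x,0) = C\,\mathbf e_m\, s(x)
  \;\;\Longrightarrow\;\; \mathbf v(x,t) = C\,\mathbf u(x,t),
\]
% where u solves the same linear problem with C = 1, so peak positions do not depend on C.
% Adding the cubic degradation term and substituting v = C u:
\[
  \partial_t u_i = D_i\,\partial_x^2 u_i + \sum_j J_{ij}\,u_j - \gamma_i\,C^{2}\,u_i^{3}.
\]
```

      The weight of the nonlinear term therefore grows as C², so conclusions drawn for arbitrary C in the linear regime need not carry over to large spikes.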

      (3) Several statements in the main text are presented without accompanying proof or sufficient explanation, which makes it difficult to assess their validity. In some cases, the lack of justification raises serious doubts about whether the claims are generally true. Examples are:

      "For the purpose of clarity we will explain our results as if these cells have a simple arrangement in space (e.g., a 1D line or a 2D square lattice) but, as we will discuss, our results shall apply with the same logic to any distribution of cells in space." (Main text l.145-l.148).

      "For any non-trivial pattern transformation (as long as it is symmetric around the initial spike), there exists an H gene network capable of producing it from a spike initial pattern." (Main text l.366f).

      "In 2D there are no peaks but concentric rings of high gene product concentration centered around the spike, while in 3D there are concentric spherical shells." (Main text l. 447ff).

      (4) The study identifies one-signal networks and examines how combinations of these structures can give rise to minimal pattern-forming subnetworks. However, the analysis of the combinations of these minimal pattern-forming subnetworks remains relatively brief, and the manuscript does not explore how the results might change if the subnetworks were combined in upstream and downstream configurations. In our view, it is not evident that all possible gene regulatory networks can be fully characterized by these categories, nor that the resulting patterns can be reliably predicted. Rather, the approach appears more suited to identifying which known subnetworks are present within a larger network, without necessarily capturing the full dynamics of more complex configurations.

      (5) The definition of non-trivial pattern formation is provided only in the Supplementary Information, despite its central importance for interpreting the main results. It would significantly improve clarity if this definition were included and explained in the main text. Additionally, it remains unclear how the definition is consistently applied across the different initial conditions. In particular, the authors should clarify how slope-based measures are determined for both the random noise and sharp peak/step function initial states. Furthermore, the authors do not specify how the sign function is evaluated at zero. If the standard mathematical definition sgn(0)=0 is used, then even a simple widening of a peak could fulfill the criterion for non-trivial pattern transformation.
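      A minimal sketch of why the convention at zero matters, assuming a hypothetical criterion that compares the signs of discrete slopes between the initial and final profiles (the manuscript's exact definition is in its SI and is not reproduced here). A peak that merely widens changes the slope-sign vector when sgn(0)=0, while a convention that ignores flat segments sees the same single peak in both profiles.

```python
def sgn(x):
    """Standard sign function with sgn(0) = 0."""
    return (x > 0) - (x < 0)


def slope_signs(profile):
    """Signs of the discrete slopes between neighbouring cells."""
    return [sgn(b - a) for a, b in zip(profile, profile[1:])]


def naive_nontrivial(initial, final):
    """Hypothetical criterion: any change in slope sign counts as non-trivial."""
    return slope_signs(initial) != slope_signs(final)


def up_down_structure(profile):
    """Alternative convention: drop flat segments and collapse repeated signs."""
    runs = []
    for s in slope_signs(profile):
        if s != 0 and (not runs or runs[-1] != s):
            runs.append(s)
    return runs


spike = [0, 0, 0, 1, 0, 0, 0]
widened = [0, 0, 0.5, 1, 0.5, 0, 0]  # same single peak, just broader

print(naive_nontrivial(spike, widened))  # True: widening alone passes
print(up_down_structure(spike) == up_down_structure(widened))  # True: one peak in both
```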

      (6) The manuscript lacks a clear and detailed explanation of the underlying model and its assumptions. In particular, it is not well-defined what constitutes a "cell" in the context of the model, nor is it justified why spatial features of cells - such as their size or boundaries - can be neglected. Furthermore, the concept of the extracellular space in the one-dimensional model remains ambiguous, making it unclear which gene products are assumed to diffuse.

    4. Reviewer #3 (Public review):

      Pattern formation is responsible for generating the spatial organization of cells, tissues, and organs during embryogenesis. It operates within a multifactorial system including initial conditions, gene regulatory networks, extracellular signals, mechanical forces, stochastic noise, and environmental inputs. Ultimately, it establishes the functional anatomy of an organism.

      This study focuses on one central aspect of pattern formation: how spatial heterogeneity arises from an initial condition and evolves into a more complex or distinct spatial pattern (termed non-trivial pattern formation by the authors). The authors set out to explore and characterize all possible ways of achieving such pattern formation. They do this by discussing how extracellular signals spread, how individual cells respond to those signals, and how those responses, in turn, modulate signal propagation.

      Finally, their comprehensive analysis concludes that there are three classes of interactions between extracellular signals and intracellular responses, corresponding to previously known mechanisms that can generate spatial patterns: differences in morphogen concentration across space, noise amplification, and Turing patterns.

    1. eLife Assessment

      This study presents a sequence-based method for predicting drug-interacting residues in intrinsically disordered proteins (IDPs), addressing an important challenge in understanding small-molecule:IDP interactions. The findings have solid support in illustrative examples that underscore the role of aromatic interactions. While predicted binding sites remain coarse, validation was done on a total of 10 IDPs, four of which thoroughly and six others less so. The method builds on previous work from the authors, with necessarily ad hoc modifications, and offers a starting point for further exploration in this emerging field.

    2. Reviewer #1 (Public review):

      Summary:

      The authors developed a sequence-based method to predict drug-interacting residues in IDPs, building on their recent work on predicting the transverse relaxation rates (R2) of IDPs, which was trained on 45 IDP sequences and their corresponding R2 values. The key finding is that IDPs interact with drugs mostly through aromatic residues, which is easy to rationalize because most drugs contain aromatic rings. They validated the method using several case studies, and the predictions agree with chemical shift perturbations and MD simulations. The location of the predicted residues serves as a starting point for ligand optimization.

      Strengths:

      This work provides the first sequence-based prediction method to identify potential drug-interacting residues in IDP. The validity of the method is supported by case studies. It is easy to use, and no time-consuming MD simulations and NMR studies are needed.

      Weaknesses:

      The method does not depend on information about the binding compounds, which may allow it to capture general features of IDP-drug binding. However, the number of interacting residues varies with the size and chemical structure of the compound (for example, how many aromatic rings it has), and this is not considered in this work. The lack of compound-specific information may restrict the method's application in compound optimization, where the aim is to derive specific and potent binders.

    3. Reviewer #2 (Public review):

      Summary:

      In this work, the authors introduce DIRseq, a fast, sequence-based method that predicts drug-interacting residues (DIRs) in IDPs without requiring structural or drug information. DIRseq builds on the authors' prior work looking at NMR relaxation rates, and presumes that those residues that show enhanced R2 values are the residues that will interact with drugs, allowing these residues to be nominated from the sequence directly. By making small modifications to their prior tool, the authors enable DIRseq to predict residues seen to interact with small molecules in vivo.

      Strengths:

      The preprint is well written and easy to follow.

      Weaknesses:

      (1) The DIRseq method is based on SeqDYN, which itself is a simple (which I do not mean as a negative - simple is good!) statistical predictor for R2 relaxation rates. The challenge here is that R2 rates cover a range of timescales, so the physical intuition as to what exactly elevated R2 values mean is not necessarily consistent with "drug interacting". Presumably, the authors are not using the helix boost component of SeqDYN here (it would be good to explicitly state this). This is not necessarily a weakness, but I think it would behove the authors to compare a few alternative models before settling on the DIRseq method, given the somewhat ad hoc modifications to SeqDYN to get DIRseq.

      Specifically, the authors previously showed good correlation between the stickiness parameter of Tesei et al and the inferred "q" parameter for SeqDYN; as such, I am left wondering whether comparable accuracy would be obtained simply by taking the stickiness parameters directly and using these to predict "drug interacting residues", at which point I'd argue we're not really predicting "drug interacting residues" so much as "sticky" residues, using the stickiness parameters. It would, I think, be worth the authors comparing the predictive power obtained from DIRseq with that obtained by using the lambda coefficients from Tesei et al in the model, local density of aromatic residues, local hydrophobicity (note that Tesei et al have tabulated a large set of hydrophobicity scores!) and the raw SeqDYN predictions. In the absence of lots of data to compare against, this is another way to convince readers that DIRseq offers reasonable predictive power.
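      One of the simple baselines suggested above, local density of aromatic residues, can be sketched in a few lines. The window size, the threshold, and the indicator scale are arbitrary assumptions; a published per-residue scale such as the stickiness or hydrophobicity values mentioned above could be passed in place of the aromatic indicator without changing the structure.

```python
# Indicator scale: 1 for aromatic residues (F, W, Y), 0 otherwise. A published
# per-residue scale could be supplied instead as a {residue: value} dict.
AROMATIC_SCALE = {aa: 1.0 for aa in "FWY"}


def local_average(seq, scale, half_window=3):
    """Mean scale value within +/- half_window residues of each position."""
    n = len(seq)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        window = seq[lo:hi]
        out.append(sum(scale.get(aa, 0.0) for aa in window) / len(window))
    return out


def predicted_sites(seq, scale=AROMATIC_SCALE, half_window=3, threshold=0.3):
    """1-based positions whose local average exceeds the (hypothetical) threshold."""
    profile = local_average(seq, scale, half_window)
    return [i + 1 for i, value in enumerate(profile) if value > threshold]


print(predicted_sites("GGGGGYGGGGG", half_window=1))  # [5, 6, 7]
```

      Scoring baselines like this against the same experimentally identified residues used to assess DIRseq would indicate how much of its predictive power comes from the SeqDYN-derived parameters.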

      (2) Second, DIRseq is essentially SeqDYN with some changes, but those changes appear somewhat ad hoc. I recognize that there is very limited data, but tweaking parameters based on physical intuition feels a bit stochastic as a way of developing a method; presumably (while not explicitly spelt out) those tweaks were chosen to give better agreement with the very limited experimental data (otherwise why make the changes?), which raises the question of whether the DIRseq implementation of SeqDYN is over-parameterized to the (very limited) data available now. I want to be clear: the authors should not be critiqued for attempting to develop a model despite a paucity of data, and I'm not necessarily saying this is a problem, but I think it is important for the authors to acknowledge that, with such limited data, the model may be over-fit to the specific sequences studied previously, and that its ability to generalize will only become clear as more data are collected.

      (3) Third, perhaps my biggest concern here is that - implicit in the authors' assumptions - all "drugs" interact with IDPs in the same way and all drugs are "small" (motivating the change in correlation length). Prescribing a specific lengthscale and chemistry to all drugs seems broadly inconsistent with a world in which we presume drugs offer some degree of specificity. While it is perhaps not unexpected that aromatic-rich small molecules tend to interact with aromatic residues, the logical conclusion from this work, if one assumes DIRseq has utility, is that all IDRs bind drugs with similar chemical biases. This, at the very least, deserves some discussion.

      (4) Fourth, the authors make some general claims in the introduction regarding the state of the art, which appear to lack sufficient data to be made. I don't necessarily disagree with the authors' points, but I'm not sure the claims (as stated) can be made absent strong data to support them. For example, the authors state: "Although an IDP can be locked into a specific conformation by a drug molecule in rare cases, the prevailing scenario is that the protein remains disordered upon drug binding." But is this true? The authors should provide evidence for this assertion: both examples in which this happens and evidence to support the idea that it is the "prevailing scenario", with specific examples where these types of interactions have been biophysically characterized.

      Similarly, they go on to say:

      "Consequently, the IDP-drug complex typically samples a vast conformational space, and the drug molecule only exhibits preferences, rather than exclusiveness, for interacting with subsets of residues." But again, where is the data to support this assertion? I don't necessarily disagree, but we need specific empirical studies to justify declarative claims like this; otherwise, we propagate lore into the scientific literature. The use of "typically" here is a strong claim, implying most IDP complexes behave in a certain way, yet how can the authors make such a claim?

      Finally, they continue to claim:

      "Such drug interacting residues (DIRs), akin to binding pockets in structured proteins, are key to optimizing compounds and elucidating the mechanism of action." But again, is this a fact or a hypothesis? If the latter, it must be stated as such; if the former, we need data and evidence to support the claim.

    4. Author response:

      Reviewer #1 (Public review):

      Summary:

      The authors developed a sequence-based method to predict drug-interacting residues in IDPs. It builds on their recent method for predicting the transverse relaxation rates (R2) of IDPs, which was trained on 45 IDP sequences and their corresponding R2 values. The key finding is that IDPs interact with drugs mostly through aromatic residues. This is easy to understand, as most drugs contain aromatic rings. They validated the method using several case studies, and the predictions agree with chemical shift perturbations and MD simulations. The locations of the predicted residues serve as a starting point for ligand optimization.

      Strengths:

      This work provides the first sequence-based prediction method to identify potential drug-interacting residues in IDPs. The validity of the method is supported by case studies. It is easy to use and requires no time-consuming MD simulations or NMR studies.

      Weaknesses:

      The method does not depend on information about the binding compounds, which may allow it to capture general features of IDP-drug binding. However, the number of interacting residues varies with the size and chemical structure of the compound (for example, how many aromatic rings it contains), and this is not considered in this work. The lack of compound-specific information may restrict the method's application in compound optimization aimed at deriving specific and potent binding compounds.

      We fully recognize that different compounds may have different interaction propensity profiles along the IDP sequence. In future studies, we will investigate compound-specific parameter values. The limiting factor is training data, but such data are beginning to be available.

      Reviewer #2 (Public review):

      Summary:

      In this work, the authors introduce DIRseq, a fast, sequence-based method that predicts drug-interacting residues (DIRs) in IDPs without requiring structural or drug information. DIRseq builds on the authors' prior work on NMR relaxation rates. It presumes that residues showing enhanced R2 values are the residues that will interact with drugs, which allows these residues to be nominated directly from the sequence. Obtained through small modifications to the authors' prior tool, DIRseq enables the prediction of residues seen to interact with small molecules in vivo.

      Strengths:

      The preprint is well written and easy to follow.

      Weaknesses:

      (1) The DIRseq method is based on SeqDYN, which itself is a simple (which I do not mean as a negative - simple is good!) statistical predictor for R2 relaxation rates. The challenge here is that R2 rates cover a range of timescales, so the physical intuition as to what exactly elevated R2 values mean is not necessarily consistent with "drug interacting". Presumably, the authors are not using the helix boost component of SeqDYN here (it would be good to explicitly state this). This is not necessarily a weakness, but I think it would behove the authors to compare a few alternative models before settling on the DIRseq method, given the somewhat ad hoc modifications to SeqDYN to get DIRseq.

      Actually, the factors that elevate R2 are well established: local interactions and residual secondary structure (if any). The basic assumption of our method is that intra-IDP interactions that elevate R2 convert to IDP-drug interactions. This assumption was supported by our initial observation that the drug interaction propensity profiles predicted using the original SeqDYN parameters already showed good agreement with CSP profiles. We only made relatively small adjustments to the parameters to improve the agreement. Indeed, we did not apply the helix boost portion of SeqDYN to DIRseq, and we will state this explicitly. We will also compare DIRseq with several alternative models.

      Specifically, the authors previously showed good correlation between the stickiness parameter of Tesei et al. and the inferred "q" parameter for SeqDYN. As such, I am left wondering whether comparable accuracy would be obtained simply by using the stickiness parameters directly to predict "drug interacting residues". At that point, I'd argue we're not really predicting "drug interacting residues" so much as "sticky" residues. I think it would be worth comparing the predictive power of DIRseq with that obtained from several alternatives: the lambda coefficients from Tesei et al. in the model, the local density of aromatic residues, local hydrophobicity (note that Tesei et al. have tabulated a large set of hydrophobicity scores!), and the raw SeqDYN predictions. In the absence of lots of data to compare against, this is another way to convince readers that DIRseq offers reasonable predictive power.

      We will compare predictions of these various parameter sets, and summarize the results in a table.

      (2) Second, DIRseq is essentially SeqDYN with some changes, but those changes appear somewhat ad hoc. I recognize that there is very limited data, but tweaking parameters based on physical intuition feels a bit stochastic as a way to develop a method. Presumably (while not explicitly spelt out) those tweaks were chosen to give better agreement with the very limited experimental data (otherwise why make the changes?). This raises the question of whether the DIRseq implementation of SeqDYN is over-parameterized to the (very limited) data available now. I want to be clear: the authors should not be critiqued for attempting to develop a model despite a paucity of data, and I'm not necessarily saying this is a problem. However, I think it is really important for the authors to acknowledge to the reader that, with such limited data, the model may be over-fit to the specific sequences studied previously, and whether it generalizes will become clear only as more data are collected.

      We have explained the rationale for the parameter tweaks (p. 4-5). They were limited to the q values for four amino-acid types and were made to deemphasize hydrophobic interactions and slightly enhance electrostatic interactions. We will add that these tweaks were motivated by observations from MD simulations of drug interactions with α-syn (ref 20). As noted in the response to the preceding comment, we will also present results for the original parameter values, as well as for changing the four q values one at a time.

      (3) Third, perhaps my biggest concern here is that, implicit in the authors' assumptions, all "drugs" interact with IDPs in the same way and all drugs are "small" (motivating the change in correlation length). Prescribing a specific lengthscale and chemistry to all drugs seems broadly inconsistent with a world in which we presume drugs offer some degree of specificity. While it is perhaps not unexpected that aromatic-rich small molecules tend to interact with aromatic residues, the logical conclusion from this work, if one assumes DIRseq has utility, is that all IDRs bind drugs with similar chemical biases. This, at the very least, deserves some discussion.

      The reviewer raises a very important point. In Discussion, we will add that it is important to further develop DIRseq to include drug-specific parameters when data for training become available.

      (4) Fourth, the authors make some general claims in the introduction regarding the state of the art that appear to lack sufficient data to support them. I don't necessarily disagree with the authors' points, but I'm not sure the claims (as stated) can be made absent strong data to support them. For example, the authors state: "Although an IDP can be locked into a specific conformation by a drug molecule in rare cases, the prevailing scenario is that the protein remains disordered upon drug binding." But is this true? The authors should provide evidence to support this assertion: examples in which this happens, evidence to support the idea that it's the "prevailing view", and specific examples where these types of interactions have been biophysically characterized.

      We will cite several studies showing that IDPs remain disordered upon drug binding.

      Similarly, they go on to say:

      "Consequently, the IDP-drug complex typically samples a vast conformational space, and the drug molecule only exhibits preferences, rather than exclusiveness, for interacting with subsets of residues." But again, where is the data to support this assertion? I don't necessarily disagree, but we need specific empirical studies to justify declarative claims like this; otherwise, we propagate lore into the scientific literature. The use of "typically" here is a strong claim, implying most IDP complexes behave in a certain way, yet how can the authors make such a claim? 

      Here again we will add citations to support the statement.

      Finally, they continue to claim:

      "Such drug interacting residues (DIRs), akin to binding pockets in structured proteins, are key to optimizing compounds and elucidating the mechanism of action." But again, is this a fact or a hypothesis? If the latter, it must be stated as such; if the former, we need data and evidence to support the claim. 

      We will add citations to both compound optimization and mechanism of action.

    1. eLife Assessment

      This work presents valuable new information on the microtubule-binding mode of the kinesin-13 MCAK. The authors use quantitative single-molecule studies to propose that MCAK preferentially binds to a GDP-Pi-tubulin portion of the microtubule end. However, the evidence provided to support this claim remains incomplete and would benefit from more rigorous methodology. In particular, the diffraction-limited experiments do not provide sufficient spatial resolution to support the authors' conclusions. In addition, a more thorough discussion of the existing literature would further strengthen the manuscript.

    2. Reviewer #1 (Public review):

      The authors responded to multiple criticisms with additional data and more detailed statistics, in some instances improving the quality of the work. However, I had difficulty understanding some of the authors' responses. The logic was not always apparent, the writing was occasionally confusing or would benefit from more careful wording, and some of the provided responses were superficial or raised new concerns. In some cases, the underlying data needed to support their responses were not shown. Thus, the current version of the manuscript does not sufficiently resolve the following critical issues raised by myself and other reviewers.

      (1) A clear new insight into a physiological process or cellular behavior remains lacking. The study largely confirms prior observations of MCAK binding to both the microtubule wall and end. However, it is still unclear whether direct binding to the tip (as opposed to accumulation via wall diffusion or interaction with other tip-binding proteins) is a significant mechanism.

      (2) The newly revealed adenosine-nucleotide-dependent binding preferences do not help clarify MCAK's catalytic function or its mechanisms of tip recognition. Consequently, the final summary figure remains speculative and is not convincingly supported by the data. It is also unclear what exactly is meant by the "working model" (figure title), or by the claim of "a simple rule of how the end-binding regulators coordinate their activities" (abstract).

      (3) As noted in my previous review, the effects of adding different adenosine nucleotides on MCAK binding to microtubules are much more pronounced than the differences in MCAK binding to tubulin with various guanosine-containing nucleotides, or to lattice versus tip (e.g., Fig. 5E). Therefore, the manuscript title, "MCAK recognizes the nucleotide-dependent feature at growing microtubule ends", does not do justice to the scale of these effects.

      (4) The title implies that MCAK selectively recognizes a feature determined by the tubulin-bound guanosine nucleotide. However, the authors frequently claim that MCAK binds to the "entire GTP cap." It appears that they exclude structural protrusions from their definition of the cap, which is debatable. Even using their definition, the conclusion that MCAK recognizes a specific "nucleotide-dependent feature" seems inconsistent with the claim that it binds uniformly across the cap. These distinctions were not made clear.

      (5) Some important technical details are still absent. For example, when reading the authors' response to another reviewer's question, I could not find an explanation of how the kon values for end and wall binding were calculated. These calculations clearly require assumptions, e.g. about the number of binding sites, but these details are not described. In addition, the binding data are expressed in units per tubulin dimer, which are non-standard and make comparisons to other published results difficult. There are other instances where more technical detail would be desirable, but they are too numerous to list here.

      (6) Several aspects of the graphical data presentation will make it difficult for other researchers to analyze or interpret the findings. Numerical Excel-style data sheets should be provided for all measurements, including raw data, not just the ratios or derived values shown in plots. Other, more significant issues include the use of mean values for non-Gaussian distributions (e.g., dwell times); binding affinities inferred from single-concentration measurements, often under varying conditions (e.g., Figs. 3C, 4); and the absence of side-by-side plotted controls (e.g., Fig. 6).

      (7) While the authors have added some quantitative values and descriptive detail, the manuscript still lacks a critical comparison of their findings with existing literature. This weakens the impact of the study and limits the reader's ability to place the results in a broader context.

    3. Reviewer #4 (Public review):

      The revised manuscript from Chen et al. implements many of the changes requested by the 3 reviewers of the initial submission. These changes are well-described in the corresponding Response to Reviews document. Of course, not every request from the reviewers was addressed, and the following major concerns remain:

      (1) The authors argue that MCAK binds to the same region as EB proteins, which they refer to as the "EB cap". Reviewers asked for experiments that would increase the size of the EB cap to create "comets" (e.g. by increasing the microtubule growth rate); the prediction is that the MCAK signal should increase in size as well. The authors declined to pursue these experiments. As a result, the EB signals and MCAK signals are diffraction-limited spots, as opposed to the predicted exponential decay signals characteristic of EB comets. The various diffraction-limited spots are then aligned with the diffraction-limited signal of the microtubule end. These alignments and sub-pixel comparisons are technically challenging. The revised manuscript does not go far enough to provide compelling evidence that all technical challenges were overcome. Thus, while the authors can safely conclude that MCAK, EBs, and the microtubule end do occupy the same diffraction-limited spot, more precise conclusions are not supported.

      (2) The reviewers criticized the initial manuscript for neglecting key references, particularly Kinoshita et al., Science 2001. Indeed, I cannot fathom writing a manuscript about MCAK and XMAP215 without putting a citation to such a landmark paper front and center. The authors have responded by including more discussion of the relevant literature (and citing Kinoshita et al.). However, the revised manuscript is often still cursory in giving credit where credit is due, contextualizing the new data, and generally engaging with the scholarship on MCAK.

      (3) The data presented does not include a simple measurement of the impact of MCAK on the catastrophe frequency of microtubules. The authors explain this absence by pointing out that their movies are short (5 min) and high frame rate (10 fps). While I understand that such imaging parameters are necessary to capture single molecule end-binding events, I do not understand why a separate set of experiments could not be performed. This type of "positive control" is often missing, as pointed out by the 3 reviewers.

      (4) Salt conditions, protein concentrations, and other key experimental parameters are not varied, even when varying them would provide excellent tests of the authors' hypotheses.

      In summary, the revised manuscript is improved in many ways, but the interested reader should look carefully at the previous reviews and compare the measurements presented here with those of other labs.

    1. eLife Assessment

      By taking advantage of noise in gene expression, this important study introduces a new approach for detecting directed causal interactions between two genes without perturbing either. The main theoretical result is supported by a proof. Preliminary simulations and experiments on small circuits are solid, but further investigations are needed to demonstrate the broad applicability and scalability of the method.

    2. Reviewer #2 (Public Review):

      Summary:

      This paper describes a new approach to detecting directed causal interactions between two genes without directly perturbing either gene. To check whether gene X influences gene Z, a reporter gene (Y) is engineered into the cell in such a way that (1) Y is under the same transcriptional control as X, and (2) Y does not influence Z. Then, under the null hypothesis that X does not affect Z, the authors derive an equation that describes the relationship between the covariance of X and Z and the covariance of Y and Z. Violation of this relationship can then be used to detect causality.

      The authors benchmark their approach experimentally in several synthetic circuits. In 4 positive control circuits, X is a TetR-YFP fusion protein that represses Z, which is an RFP reporter. The proposed approach detected the repression interaction in 2 of the 4 positive control circuits. The authors constructed 16 negative control circuit designs in which X was again TetR-YFP, but where Z was either a constitutively expressed reporter, or simply the cellular growth rate. The proposed method detected a causal effect in two of the 16 negative controls, which the authors argue is perhaps not a false positive, but due to an unexpected causal effect. Overall, the data support the potential value of the proposed approach.

      Strengths:

      The idea of a "no-causality control" in the context of detected directed gene interactions is a valuable conceptual advance that could potentially see play in a variety of settings where perturbation-based causality detection experiments are made difficult by practical considerations.

      By proving their mathematical result in the context of a continuous-time Markov chain, the authors use a more realistic model of the cell than, for instance, a set of deterministic ordinary differential equations.

      The authors have improved the clarity and completeness of their proof compared to a previous version of the manuscript.

      Limitations:

      The authors themselves clearly outline the primary limitations of the study: The experimental benchmark is a proof of principle, and limited to synthetic circuits involving a handful of genes expressed on plasmids in E. coli. As acknowledged in the Discussion, negative controls were chosen based on the absence of known interactions, rather than perturbation experiments. Further work is needed to establish that this technique applies to other organisms and to biological networks involving a wider variety of genes and cellular functions. It seems to me that this paper's objective is not to delineate the technique's practical domain of validity, but rather to motivate this future work, and I think it succeeds in that.

      Might your new "Proposed additional tests" subsection be better housed under Discussion rather than Results?

      I may have missed this, but it doesn't look like you ran simulation benchmarks of your bootstrap-based test for checking whether the normalized covariances are equal. It would be useful to see in simulations how the true and false positive rates of that test vary with the usual suspects like sample size and noise strengths.

      It looks like you estimated the uncertainty for eta_xz and eta_yz separately. Can you get the joint distribution? If you can do that, my intuition is you might be able to improve the power of the test (and maybe detect positive control #3?). For instance, if you can get your bootstraps for eta_xz and eta_yz together, could you just use a paired t-test to check for equality of means?
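      A minimal sketch of the paired comparison suggested above, written in Python with NumPy. It assumes the normalized covariance is defined as eta_ab = Cov(a, b) / (<a><b>). It also assumes that x (regulator), y (co-regulated reporter) and z (target) are per-cell measurements from the same cells. The function names are hypothetical and not taken from the manuscript. The sketch does not run a t-test on bootstrap replicates, because those replicates are not independent samples. Instead, it resamples cells jointly and reports a percentile interval for the difference eta_xz - eta_yz.

```python
import numpy as np


def normalized_cov(a, b):
    """Normalized covariance Cov(a, b) / (<a><b>) (assumed convention)."""
    return np.mean(a * b) / (np.mean(a) * np.mean(b)) - 1.0


def paired_bootstrap_difference(x, y, z, n_boot=2000, seed=0, level=0.95):
    """Paired bootstrap of eta_xz - eta_yz, resampling cells jointly.

    x, y, z: 1-D arrays of per-cell measurements from the same cells.
    Returns the observed difference and a percentile interval for it.
    """
    x, y, z = (np.asarray(v, dtype=float) for v in (x, y, z))
    rng = np.random.default_rng(seed)
    n = x.size
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        # The same cell indices are used for x, y and z, so both
        # normalized covariances come from one resample (paired).
        idx = rng.integers(0, n, size=n)
        diffs[i] = normalized_cov(x[idx], z[idx]) - normalized_cov(y[idx], z[idx])
    observed = normalized_cov(x, z) - normalized_cov(y, z)
    alpha = 1.0 - level
    lo, hi = np.percentile(diffs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return observed, (lo, hi)


if __name__ == "__main__":
    # Hypothetical toy data: x and y share an upstream driver u.
    # z_causal is driven by x itself; z_null is driven only by u.
    rng = np.random.default_rng(1)
    u = rng.gamma(5.0, 1.0, 5000)
    x = rng.poisson(2 * u)
    y = rng.poisson(2 * u)
    z_causal = rng.poisson(1 + 2 * x)
    z_null = rng.poisson(1 + 2 * u)
    print(paired_bootstrap_difference(x, y, z_causal, n_boot=500))
    print(paired_bootstrap_difference(x, y, z_null, n_boot=500))
```

      Under the no-causality null, the interval would be expected to cover zero most of the time. An interval that excludes zero points against the null. Running the same routine on simulated circuits with known ground truth is one way to chart true and false positive rates against sample size and noise strength.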

      The proof is a lot better, and it's great that you nailed down the requirement on the decay of beta, but the proof is still confusing in some places:

      On pg 29, it says "That is, dividing the right equation in Eq. 5.8 with alpha, we write the ..." but the next equation doesn't obviously have anything to do with Eq. 5.8, and instead (I think) it comes from Eq 5.5. This could be clarified.

      Later on page 29, you write "We now evoke the requirement that the averages xt and yt are stationary", but then you just repeat Eq. 5.11 and set it to zero. Clearly you needed the limit condition to set Eq. 5.11 to zero, but it's not clear what you're using stationarity for. I mean, if you needed stationarity for 5.11 presumably you would have referenced it at that step.

      It could be helpful for readers if you could spell out the practical implications of the theorem's assumptions (other than the no-causality requirement) by discussing examples of setups where it would or wouldn't hold.

    3. Author response:

      The following is the authors’ response to the previous reviews

      We have made the following small adjustments and resubmit the manuscript to be published as a Version of Record with eLife.

      Changes in main text of the manuscript:

      We have moved the “Proposed additional tests” subsection to the Discussion section as suggested by the referee. 

      We have added a link to a Github repository and a link to a Zenodo data repository at the beginning of the Materials and Methods section in the “Data and materials availability” subsection. The Github repository contains simulation code and data, and single-cell data analysis code. The Zenodo link contains our experimental data (we await your confirmation before we publish it officially on Zenodo).   

      Changes in the supplemental information files

      We have fixed the typo on page 29 of the SI in which Eq. (8) was referred to in a derivation. It should be Eq. (5) instead. We thank the referee for catching this mistake which has now been corrected.

      We have fixed a typo on page 29 of SI, in which the word “evoke” is now “invoke”.  

      We have clarified the derivation on page 29 of the SI. The referee is correct that the limit condition was used to set the right-hand side of Eq. (5.11) to zero.

    1. eLife Assessment

      This important study reports an advancement in the diagnosis of Animal African Trypanosomosis (AAT), which adapts a CRISPR-based diagnostic tool (SHERLOCK4AAT) to detect different trypanosome species responsible for AAT. The evidence supporting the conclusions is convincing and in line with the current state-of-the-art diagnostics. This study will be of interest to the fields of Epidemiology, Public Health, and Veterinary Medicine.

    2. Reviewer #1 (Public review):

      Summary:

      The authors developed SHERLOCK4AAT, a CRISPR-Cas13a-based diagnostic toolbox for detecting multiple trypanosome species responsible for animal African trypanosomiasis. They created species-specific assays targeting six prevalent parasite species and validated the system using dried blood spots from domestic pigs in Guinea and Côte d'Ivoire. Field testing revealed high infection rates (62.7% of pigs infected) and, notably, the presence of human-infective parasites in domestic animals.

      Major Strengths:

      This study represents a valuable application of CRISPR-based detection technology to veterinary diagnostics, with strong potential for practical implementation. The authors conducted comprehensive validation, including statistical analyses to determine sensitivity and specificity, and demonstrated field utility through large-scale testing of 424 samples from two geographically distinct regions. The detection of human-infective parasites in pigs at both sites provides important One Health insights supporting integrated disease surveillance and has direct implications for public health policy and disease elimination programs. The methodology is robust, incorporating Bayesian statistical modeling and offering clear practical advantages such as dried blood spot compatibility and detection of active infections. The revised manuscript also addresses implementation considerations, including cost, training needs, and field logistics.

      Major Weaknesses:

      Some technical limitations constrain broader applicability. The assay for one key parasite species (T. vivax) shows suboptimal sensitivity, which may limit its utility in detecting this important pathogen. The current assay design does not distinguish between closely related species within the same subgenus, an important factor for certain epidemiological studies. Additionally, some assays relied on synthetic controls due to unavailable biological material, and the discussion of potential cross-reactivity with related kinetoplastid parasites is limited.

      Achievement of Aims:

      The authors clearly achieved their primary objectives of developing a sensitive, species-specific diagnostic system and demonstrating its applicability in real-world settings. The detection of human-infective trypanosomes in domestic pigs provides valuable epidemiological evidence in support of One Health strategies and targeted disease elimination efforts.

      Impact and Utility:

      This work responds to a well-documented need in veterinary diagnostics, where current methods often lack sensitivity or species discrimination. The system offers practical benefits for resource-limited settings through a short assay duration and compatibility with dried blood spot samples. While certain performance limitations may restrict broader adoption, the species identification capability represents a substantial advancement over existing approaches. The findings enhance our understanding of parasite diversity in livestock and their potential role as zoonotic reservoirs, with implications extending beyond veterinary medicine to public health surveillance and policy development.

      Context:

      This study makes a timely and relevant contribution to diagnostic epidemiology and One Health surveillance frameworks. The field-adapted use of advanced molecular detection technologies represents a significant step toward improved disease monitoring in regions where trypanosomiasis poses ongoing threats to animal health, agriculture, and human livelihoods. The cross-disciplinary implications for veterinary medicine, public health, and disease elimination programs underscore the broader significance of this work.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript is fundamental due to the significance of its findings. The strength of the evidence is compelling, and the manuscript is publishable since the corrections have been made.

      Strengths:

      Use of the novel SHERLOCK4AAT toolkit for diagnosis.

      Identification of various sub-species of Trypanosomes.

      Differentiating the animal sub-species from the human one.

      Corrections Made:

      Definite articles have been removed from the title.

      The words of the title have been reduced to 15.

      Typographical errors have been corrected.

      Weaknesses:

      None

    4. Reviewer #3 (Public review):

      Summary:

      The study adapts a CRISPR-based detection toolkit (the SHERLOCK assay) to detect members of the Trypanosomatidae family of veterinary importance, using conserved and species-specific targets. It adds species-specific assays to differentiate between the six most common animal trypanosome species responsible for AAT (SHERLOCK4AAT). The assays were able to discriminate between Trypanozoon (T. b. brucei, T. evansi and T. equiperdum), T. congolense (Savannah, Forest, Kilifi and Dzanga sangha), T. vivax, T. theileri, T. simiae and T. suis. Both the broad and the species-specific assays were designed primarily on sequences of the 18S rRNA, GAPDH (glyceraldehyde-3-phosphate dehydrogenase) and invariant flagellum antigen (IFX) genes for species identification. Most importantly, the authors showed varying limits of detection for the different SHERLOCK assays. These limits are somewhat comparable to those of the PCR-derived molecular techniques currently used for detecting animal trypanosomes, even though some of those methodologies use other primers that target genes such as ITS1 and 7SL sRNA.

      The data presented in the study are particularly useful and of significant interest for diagnosis of AAT in affected areas.

      Strengths:

      The assays convincingly allow for the analysis and detection of most trypanosomes in AAT

      Weaknesses:

      The 18S rRNA-based assay cannot distinguish T. b. brucei, T. evansi and T. equiperdum, and the IFX gene-based assay does not achieve the sensitivity required to detect T. vivax. T. b. brucei and T. vivax are, together with T. congolense, the predominant infective species in animals. A reliable assay should therefore be able to detect them convincingly before it can be properly used for diagnosis.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      This study addresses a critical gap in veterinary diagnostics by developing a CRISPR-based diagnostic toolbox (SHERLOCK4AAT) for detecting animal African trypanosomosis. It describes the development and field deployment of SHERLOCK4AAT, a CRISPR-Cas13-based diagnostic toolbox for the eco-epidemiological surveillance of animal African trypanosomosis (AAT) in West Africa. The authors successfully created and validated species-specific assays for multiple trypanosomes, including T. congolense, T. vivax, T. theileri, T. simiae, and T. suis, alongside pan-trypanosomatid and pan-Trypanozoon assays. The field validation in pigs from Guinea and Côte d'Ivoire revealed high trypanosome prevalence (62.7%) and frequent co-infections. Importantly, it also identified T. b. gambiense in one animal at each site, suggesting that pigs may serve as potential reservoirs for this human-infective parasite.

      A major strength of the study lies in its methodological innovation. By adapting SHERLOCK to target both conserved and species-discriminating sequences, the authors achieved high sensitivity and specificity in detecting Trypanosoma species. Their use of dried blood spots, validated thresholds through ROC analyses, and statistical robustness (e.g., Bayesian latent class modeling) provides a strong foundation for their conclusions.
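      ROC-based thresholds of the kind mentioned above are often chosen by maximising Youden's J statistic (sensitivity + specificity - 1). The review does not say which criterion the authors used, so the sketch below simply assumes Youden's J; the function name `youden_threshold` and the fluorescence values are hypothetical and are not study data.

```python
def youden_threshold(positives, negatives):
    """Return (threshold, J) maximising sensitivity + specificity - 1.

    A sample is called positive when its signal is >= threshold.
    Candidate thresholds are the observed signal values; ties keep the lowest.
    """
    best = (None, -1.0)
    for t in sorted(set(positives) | set(negatives)):
        sens = sum(x >= t for x in positives) / len(positives)
        spec = sum(x < t for x in negatives) / len(negatives)
        j = sens + spec - 1.0
        if j > best[1]:
            best = (t, j)
    return best


# Hypothetical normalised readouts for known-positive and known-negative controls
pos = [3.1, 2.8, 4.0, 2.2, 3.5]
neg = [1.0, 1.3, 0.9, 2.4, 1.1]
threshold, j = youden_threshold(pos, neg)
print(threshold, round(j, 2))
```

Other criteria, such as fixing specificity at a target level, are also common in diagnostics and can yield a different cut-off from the same data.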

      The results are significant: over 60% of pigs tested positive for at least one trypanosome species, with co-infections observed frequently and T. b. gambiense detected in pigs at both sites. These findings have direct implications for the role of animal reservoirs in human disease transmission and underscore the value of pigs as sentinel hosts in gHAT elimination efforts.

      The limitations are well acknowledged, particularly the suboptimal sensitivity of the T. vivax assay and the reliance on synthetic controls for T. suis and T. simiae. However, these limitations do not undermine the overall conclusions, and the paper provides a clear roadmap for further assay refinement and implementation.

      This study offers a timely, impactful, and well-substantiated contribution to the field. The SHERLOCK4AAT toolbox holds promise for improving AAT diagnostics in resource-limited settings and advancing One Health surveillance frameworks.

      Thank you

      Strengths: 

      (1) The adaptation of SHERLOCK technology for AAT represents a significant technical advancement, offering higher sensitivity than traditional parasitological methods and the ability to detect multiple species simultaneously.

      (2) The work is rigorously performed, with validation using appropriate controls, ROC curve analyses, and Bayesian latent class modelling, establishing clear analytical sensitivity and specificity for most assays.

      (3) Testing 424 pig samples across two countries provides robust evidence of the tool's utility and reveals important epidemiological insights about trypanosome diversity and prevalence.

      (4) The identification of T. b. gambiense in pigs at both sites has significant implications for HAT elimination strategies and highlights the need for integrated One Health approaches.

      (5) The use of dried blood spots and RNA detection for active infections makes the approach practical for field surveillance in resource-limited settings.

      Thank you

      Weaknesses: 

      (1) The manuscript would benefit from more detailed discussion of practical considerations such as cost, equipment requirements, and training needs for implementing SHERLOCK in endemic areas and rural settings which would improve applicability.

      This is now addressed in the revised discussion (end of the first section).

      (2) Limited discussion of pig selection criteria: More justification for choosing pigs as sentinel animals and discussion of potential limitations of this approach would strengthen the manuscript.

      Yes, this is now more clearly explained in the revised discussion (beginning of the first section).

      (3) More details on why certain genes were targeted would strengthen the methods.

      The first result section ‘Selection of targets for broad and species-specific SHERLOCK assays targeting AAT species (SHERLOCK4AAT)’ is already dedicated to extensively explaining target selection, hence we’re afraid we don’t know what could be added.  

      (4) Table formatting could be improved for readability. 

      (5) Some figures are complex and would benefit from additional explanations in the legends.

      We have tried to improve these two aspects as much as possible in the revised manuscript.

      Reviewer #2 (Public review): 

      Summary: 

      The manuscript is important due to the significance of the findings. The strength of evidence is convincing.

      Thank you

      Strengths: 

      (1) Use of a novel SHERLOCK4AAT toolkit for diagnosis.

      (2) Identification of various sub-species of Trypanosomes. 

      (3) Differentiating the animal subspecies from the human one. 

      Thank you

      Weaknesses: 

      (1) The title is too long, and the use of definite articles should be reduced in the title.

      The title has been improved in the revised version.

      (2) The route of blood sample collection in the animals should be well defined and explained.

      This has been more clearly explained in the revised method section.

      Reviewer #3 (Public review):

      Summary: 

      The study adapts a CRISPR-based detection toolkit (SHERLOCK assay), using conserved and species-specific targets, for the detection of some members of the Trypanosomatidae family of veterinary importance, and develops species-specific assays to differentiate between the six most common animal trypanosome species responsible for AAT (SHERLOCK4AAT). The assays were able to discriminate between Trypanozoon (T. b. brucei, T. evansi, and T. equiperdum), T. congolense (Savannah, Forest, Kilifi, and Dzanga-Sangha), T. vivax, T. theileri, T. simiae, and T. suis. The design of both broad and species-specific assays was based primarily on sequences of the 18S rRNA, GAPDH (glyceraldehyde-3-phosphate dehydrogenase), and invariant flagellum antigen (IFX) genes for species identification. Most importantly, the authors showed varying limits of detection for the different SHERLOCK assays, which are somewhat comparable to those of the PCR-based molecular techniques currently used for detecting animal trypanosomes, even though some of these methods use primers that target other loci such as ITS1 and 7SL sRNA.

      The data presented in the study are particularly useful and of significant interest for the diagnosis of AAT in affected areas.

      Thank you

      Strengths: 

      The assays convincingly allow for the analysis and detection of most trypanosomes in AAT.

      Thank you

      Weaknesses: 

      The assays cannot distinguish T. b. brucei, T. evansi, and T. equiperdum using the 18S rRNA gene, and the IFX-based assay does not achieve the sensitivity required to detect T. vivax. T. b. brucei and T. vivax are, together with T. congolense, the most predominant infective species in animals; a reliable assay should therefore detect them convincingly before it can be properly used for diagnosis.

      We agree with this point and aim to improve the toolbox for future studies.

      Reviewer #1 (Recommendations for the authors):

      (1) Provide additional details on the practicality of SHERLOCK deployment in the field, including training, costs, and infrastructure (potential challenges for field deployment, including suggestions for how to overcome these barriers).

      This is now addressed in the revised discussion (end of the first section).

      (2) Provide more detailed justification for choosing pigs as the main study species and discuss potential benefits and limitations of extending the approach to other livestock species.

      Yes, this is now more clearly explained in the revised discussion (beginning of the first section).

      (3) Add a comparison table comparing SHERLOCK4AAT performance metrics (sensitivity, specificity, LoD) with existing molecular diagnostic methods for AAT for ease of reference.

      There are dozens of different serological, immunological and molecular approaches, with highly variable levels of sensitivity and specificity, that have already been reviewed and compared in detail in two references from 2022 (Desquesnes et al., 2022a and 2022b), which we have cited, as well as in a newly added reference (Ebhodaghe F, Acta Trop. 2018). Hence, we decided to refer only to the most comparable studies in the present article.

      (4) Review complex figures and improve legends for better readability and interpretation.

      We have tried to improve this as much as possible in the revised manuscript.

      Reviewer #2 (Recommendations for the authors): 

      (1) Reduce the number of words in the title from 28 to not more than 20.

      The title has been improved in the revised version.

      (2) Specify the particular route of collection of blood samples in the various animals.

      Yes, this is now more clearly explained in the revised method section.

      (3) Correct all typographical errors. 

      We have tried to improve this as much as possible in the revised manuscript.

      Thanks. I wish you the best in your publication process. 

      Thank you

      Reviewer #3 (Recommendations for the authors): 

      Minor comments 

      (1) The authors can expand the discussion to include other recent diagnostic assays for animal trypanosomiasis, such as those that target other genes like tubulin.

      Please see our response to Reviewer #1, point (3), above.

      (2) The cost-effectiveness of the use of the assay can be discussed since the assay is expected to be used for work in some resource-deprived areas. For example, will it cost a researcher less to do a diagnosis with this assay relative to what is already available?

      This is now addressed in the revised discussion (end of the first section).

      (3) Is Côte d'Ivoire more endemic for AAT than Guinea? Would this account for the apparently consistent differences in the percentage of positive samples, or are these differences just due to the type of samples used from the two locations?

      As the sampling method, sample preservation and sample analysis were the same for both groups, it does appear that pigs (at least domestic ones) in the study region of Côte d'Ivoire were more frequently infected than those in the study region of Guinea. It is, however, risky to extrapolate these observations to AAT prevalence in the entire countries and/or to other mammals.

      (4) Can the authors comment on how long one can store the samples for an effective and reliable assay?

      The samples can be stored for several months at ambient temperature in a sealed bag with silica gel packages to reduce humidity. We have added this detail in the revised methods section.

      (5) It is not clear whether the authors used conventional molecular diagnostics to compare the data obtained from this particular cohort of animals, as reference is made to published data. It is not surprising that SHERLOCK performed better than parasitology-based methods.

      This is now addressed in the revised discussion.

      (6) (Figure 4D-5D) should be 4D and 5D.

      Thank you, this has been corrected.

    1. eLife Assessment

      This useful study integrates experimental methods from materials science with psychophysical methods to investigate how frictional instabilities influence tactile surface discrimination. The authors argue that force fluctuations arising from transitions between frictional sliding conditions facilitate the discrimination of surfaces with similar friction coefficients. However, the reliance on friction data from an artificial finger, combined with correlational analyses that fall short of establishing a mechanistic link to perception, renders the findings incomplete.

    2. Reviewer #2 (Public review):

      This is a revised version of a paper I reviewed previously.

      Again, the purpose of the paper is to suggest that common metrics, such as friction or any given physical property of the surface, are probably inadequate to predict the perception of a surface or its discriminability. Instead, the authors propose the very interesting and original idea that frictional instabilities are related to fine touch perception (title).

      Overall, the authors have put much effort into improving the manuscript, enhancing clarity, and avoiding overstatements. And I feel the narrative is indeed much improved and less ambiguous.

      However, the authors have systematically avoided addressing the main comment of all reviewers: the link made between the passive mock-finger experiment and the active human psychophysics is incorrect and should not be made, because its interpretation could be flawed.
      - First, this link is very weak (the correlation of 6 datapoints is barely significant).
      - Second, the real and mock fingers have very different properties (think about moisture, compliance, roughness, ...).
      - Third, the comparison is made between a passive, well-controlled experiment and an active exploration, yet the comparison metric (number of events) clearly depends on the exploration procedure.
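      To make the "barely significant" point concrete: with six datapoints, a Pearson correlation has n - 2 = 4 degrees of freedom and must reach roughly |r| = 0.81 for a two-sided p < 0.05. The sketch below assumes a standard Pearson t-test (the exact test used in the manuscript is not stated here) and computes that threshold from the closed-form Student-t CDF for 4 degrees of freedom; the helper names are illustrative.

```python
import math


def t_cdf_df4(t):
    """Exact CDF of Student's t distribution with 4 degrees of freedom."""
    scale = 1.0 + t * t / 4.0
    return 0.5 + 0.375 * (t / math.sqrt(scale)) * (1.0 - (t * t / scale) / 12.0)


def pearson_p_two_sided(r, n=6):
    """Two-sided p-value for Pearson's r via t = r*sqrt(n-2)/sqrt(1-r^2)."""
    if n != 6:
        raise ValueError("the closed-form CDF above only covers df = n - 2 = 4")
    t = abs(r) * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)
    return 2.0 * (1.0 - t_cdf_df4(t))


def critical_r(alpha=0.05, n=6):
    """Smallest |r| with p <= alpha, found by bisection (p falls as |r| rises)."""
    lo, hi = 0.0, 0.999999
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if pearson_p_two_sided(mid, n) > alpha:
            lo = mid
        else:
            hi = mid
    return hi


print(round(critical_r(), 3))
```

A rank-based or permutation test on the same six points would give a different threshold.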

      In your response to my comments:
      "We have made changes throughout the manuscript to acknowledge that our findings are correlative, clarifying this throughout, and incorporating into the discussion how our work may enable biomechanical measurements and tactile decision making models"

      The authors admit that the analysis is flawed, yet they did not remove it. If they cannot demonstrate that the mock finger and the human finger behave the same way during the perceptual experiment, then they should remove Fig. 2, which combines apples and oranges. Alternatively, they should compute the same metrics on the active exploration data.

      "This "weird choice" is the central innovation of this paper. This choice was necessary because we demonstrated that the common usage of friction coefficient is fundamentally flawed: we see that friction coefficient suggests that surface which are more different would feel more similar - indeed the most distinctive surfaces would be two surfaces that are identical, which is clearly spurious. "

      They did not "demonstrate" such a flaw. Again, the difference in friction is between the mock finger trials. At the very least, the authors should verify that it is true of the active human experiment.

      "To fully implement this, a decision-making model is necessary because, as a counter example, a participant could have generated 10 swipes of SFW and 1 swipe of a Sp, but the Sp may have been the most important event for making a tactile decision. This type of scenario is not compatible with the analysis suggested - and similar counterpoints can be made for other types of seemingly straightforward analysis."

      The suggested analyses are straightforward and would be much more valuable than the data from the mock finger, even with the potential variability stated above.

      "We recognize that, with all factors being equal, this sample size is on the smaller end"

      Yet, the authors did not collect additional data to confirm their findings.

    3. Reviewer #3 (Public review):

      Strengths:

      The paper describes a new perspective on friction perception, with the hypothesis that humans are sensitive to the instabilities of the surface rather than the coefficient of friction. The paper is very well written and with a comprehensive literature survey.

      One of the central tools used by the authors to characterize the frictional behavior is the frictional instability map. With these maps, it becomes clear that two different surfaces can have both similar and different behavior depending on the normal force and the speed of exploration. It puts forward that friction is a complicated phenomenon, especially for soft

      The psychophysics study is centered around an odd-one-out protocol, which has the advantage of avoiding any external reference to what friction or texture would mean, for example. The comparisons are based only on whether the textures are similar or not.
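      In an odd-one-out task with three surfaces per trial (an assumption here; the set size is not stated in this review), guessing is correct one time in three, so accuracy should be tested against a chance level of 1/3 rather than 1/2. A minimal sketch of an exact one-sided binomial test against chance, using hypothetical trial counts rather than study data:

```python
from math import comb


def binomial_p_at_least(k, n, p0):
    """One-sided exact binomial p-value: P(X >= k) for X ~ Binomial(n, p0)."""
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))


# Hypothetical session: 20 correct answers out of 36 trials
p_odd = binomial_p_at_least(20, 36, 1 / 3)  # three-alternative odd-one-out
p_2afc = binomial_p_at_least(20, 36, 1 / 2)  # two-alternative forced choice
print(p_odd < 0.05, p_2afc < 0.05)
```

The same 20/36 score clears chance at p < 0.05 with three alternatives but not with two, which is one reason the protocol's chance level matters when comparing experiments.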

      The results show a significant relationship between the distance between frictional maps and the success rate in discriminating two kinds of surface.

      Weaknesses:

      The main weakness of the paper comes from the fact that the frictional maps and the extensive psychophysics study are not made at the same time, nor with the same finger. The frictional maps are produced with an artificial finger made of PDMS, which is a poor substitute for the complex tribological properties of skin.

      The evidence would have been much stronger if the interaction had been measured during the psychophysical experiment. In addition, because of the protocol, the correlation is based on aggregates rather than on individual interactions. However, the current data already shed new light on the nature of frictional oscillations and their link to perception.

      The authors compensate with a third experiment where they used a 2AFC protocol and an online force measurement. But the results of this third study fail to solidify the relation.

      No map of the real finger interaction is shown, bringing doubt to the validity of the frictional map for something as variable as human fingers.

    4. Reviewer #4 (Public review):

      Summary:

      In this paper, Derkaloustian et al. look at the important topic of what affects fine touch perception. The observations that there may be some level of correlation with instabilities are intriguing. They attempted to characterize different materials by counting the frequency (occurrence #, not of vibration) of instabilities at various speeds and forces of a PDMS slab pulled lengthwise over the material. They then had humans perform the same vertical motion to discriminate between these samples. They correlated the % correct in discrimination with differences in frequency of steady sliding over the design space as well as other traditional parameters such as friction coefficient and roughness.

      The authors pose an interesting hypothesis and make an interesting observation about the occurrences of instability regimes in different materials while in contact with PDMS, which is interesting for the community to see in publication. It should be noted however that the finger is complex, and there are many factors that may be over simplified, and perhaps even incorrect, with the use of the PDMS finger. There are trends, such as the trend of surfaces that are more similar in PDMS friction coefficient being easier to discriminate than those with more different PDMS friction coefficient, that contradict multiple other papers in the literature (Fehlberg et al., 2024; Smith and Scott, 1996). This may be due to the PDMS finger not being representative of the real finger conditions. A measurement of friction and the instabilities with a human finger, or demonstration that the PDMS finger is producing the same results (friction coefficient, instabilities, etc.) as a human finger, is needed.

      Strengths:

      The strength of this paper is in its intriguing hypothesis and important observation that instabilities may contribute to what humans are detecting as differences in these apparently similar samples.

      Weaknesses:

      There are significant weaknesses in how representative the PDMS finger, the vertical motion, and the sliding speed are of real human exploration. The real finger has multiple layers with different moduli. In fact, the stratum corneum cells, which form the outer layer at the interface and determine the friction, have a much higher modulus than PDMS. In addition, the flat contact area can cause shifting of contact points. Both can make the PDMS finger exhibit much more stick-slip than a real finger. Indeed, the regime maps show very little space with steady sliding. This does not represent human exploration of surfaces well. We do not tend to use forces and velocities that cause extensive stick-slip (frequent regions of 100% stick-slip), and the speeds used in the study are on the slow side, which also contributes to more stick-slip. At higher speeds and lower forces, all of the materials had steady sliding regions. Further, on these very smooth surfaces, friction and stiction are more complex, and considerations such as changes in finger material properties due to sweat pore occlusion and sweat capillary forces cannot be dismissed. Also, the vertical motion of both the PDMS finger and the instructed human subjects is not the motion that humans typically use to discriminate between surfaces.

      This all leads to the critical question: why were friction, normal force, and velocity not measured during human exploration with the real human finger? An alternative would be to show that the PDMS finger reproduces the results of the human finger. I have checked the authors' previous papers with this setup and did not find one that showed that the PDMS finger produced the same results as a human finger (Carpenter et al., 2018; Dhong et al., 2018; Nolin et al., 2022, 2021). The reviewer is not asking for a more detailed psychophysical study with a decision-making model. All that is being asked is to use a human finger for the friction coefficient and instability measurements at typical human forces and speeds, or at least to perform these measurements with both fingers for one or two samples to show that the PDMS finger produces the same results as a human finger. The authors posed an extremely interesting hypothesis that humans may alter their speed to feel the instability transition regions. This is something that could be measured with a real finger, but it is unlikely to correlate accurately enough to match regime boundaries determined with such a simplified artificial finger.

      References

      Carpenter CW, Dhong C, Root NB, Rodriquez D, Abdo EE, Skelil K, Alkhadra MA, Ramírez J, Ramachandran VS, Lipomi DJ. 2018. Human ability to discriminate surface chemistry by touch. Mater Horiz 5:70-77. doi:10.1039/C7MH00800G

      Dhong C, Kayser LV, Arroyo R, Shin A, Finn M, Kleinschmidt AT, Lipomi DJ. 2018. Role of fingerprint-inspired relief structures in elastomeric slabs for detecting frictional differences arising from surface monolayers. Soft Matter 14:7483-7491. doi:10.1039/C8SM01233D

      Fehlberg M, Monfort E, Saikumar S, Drewing K, Bennewitz R. 2024. Perceptual Constancy in the Speed Dependence of Friction During Active Tactile Exploration. IEEE Transactions on Haptics 17:957-963. doi:10.1109/TOH.2024.3493421

      Nolin A, Licht A, Pierson K, Lo C-Y, Kayser LV, Dhong C. 2021. Predicting human touch sensitivity to single atom substitutions in surface monolayers for molecular control in tactile interfaces. Soft Matter 17:5050-5060. doi:10.1039/D1SM00451D

      Nolin A, Pierson K, Hlibok R, Lo C-Y, Kayser LV, Dhong C. 2022. Controlling fine touch sensations with polymer tacticity and crystallinity. Soft Matter 18:3928-3940. doi:10.1039/D2SM00264G

      Smith AM, Scott SH. 1996. Subjective scaling of smooth surface friction. Journal of Neurophysiology 75:1957-1962. doi:10.1152/jn.1996.75.5.1957

    1. eLife Assessment

      This valuable simulation study proposes a new coarse-grained model to explain the effects of CpG methylation on nucleosome wrapping energy and nucleosome positioning. The evidence to support the claims in the paper looks solid and this work will be of interest to the researchers working on gene regulation and mechanisms of DNA methylation.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors used a coarse-grained DNA model (cgNA+) to explore how DNA sequences and CpG methylation/hydroxymethylation influence nucleosome wrapping energy and the probability density of optimal nucleosomal configuration. Their findings indicate that both methylated and hydroxymethylated cytosines lead to increased nucleosome wrapping energy. Additionally, the study demonstrates that methylation of CpG islands increases the probability of nucleosome formation.

      Strengths:

      The major strength of this method is that the model explicitly includes the phosphate groups as DNA-histone binding-site constraints, enhancing the accuracy and computational efficiency of the coarse-grained model and allowing comprehensive calculations of DNA mechanical properties and deformation energies.

      Weaknesses:

      A significant limitation of this study is that the parameter sets for the methylated and hydroxymethylated CpG steps in the cgNA+ model are derived from all-atom molecular dynamics (MD) simulations that use previously established force field parameters for modified cytosines (Pérez A, et al. Biophys J. 2012; Battistini, et al. PLOS Comput Biol. 2021). These parameters suggest that both methylated and hydroxymethylated cytosines increase DNA stiffness and nucleosome wrapping energy, which could predispose the coarse-grained model to replicate these findings. Notably, conflicting results from other all-atom MD simulations, such as those by Ngo T in Nat. Commun. 2016, show that hydroxymethylated cytosines increase DNA flexibility, contrary to methylated cytosines. If the cgNA+ model were trained on these latter parameters or other all-atom MD force fields, different conclusions might be obtained regarding the effects of methylation and hydroxymethylation on nucleosome formation.

      Setting aside the training parameters of the cgNA+ model, the results presented in the manuscript indicate that methylated cytosines increase both DNA stiffness and nucleosome wrapping energy. However, when comparing nucleosome occupancy scores with predicted nucleosome wrapping energies and optimal configurations, the authors find that methylated CGIs exhibit higher nucleosome occupancies than unmethylated ones, which seems to contradict the expected relationship where increased stiffness should reduce nucleosome formation affinity. In the manuscript, the authors also admit that these conclusions "apparently runs counter to the (perhaps naive) intuition that high nucleosome forming affinity should arise for fragments with low wrapping energy". Previous all-atom MD simulations (Pérez A, et al. Biophys J. 2012; Battistini, et al. PLOS Comput Biol. 2021; Ngo T, et al. Nat. Commun. 2016) show that the stiffer DNA upon CpG methylation reduces the affinity of DNA to assemble into nucleosomes or destabilizes nucleosomes. Given these findings, the authors need to address and reconcile these seemingly contradictory results, as the influence of epigenetic modifications on DNA mechanical properties and nucleosome formation are critical aspects of their study.

      Understanding the influence of sequence-dependent and epigenetic modifications of DNA on mechanical properties and nucleosome formation is crucial for comprehending various cellular processes. The authors' study, focusing on these aspects, definitely will garner interest from the DNA methylation research community.

      Comments on revised version:

      The authors have addressed most of my comments and concerns regarding this manuscript.

    3. Reviewer #2 (Public review):

      Summary:

      This study uses a coarse-grained model for double stranded DNA, cgNA+, to assess nucleosome sequence affinity. cgNA+ coarse-grains DNA on the level of bases and accounts also explicitly for the positions of the backbone phosphates. It has been proven to reproduce all-atom MD data very accurately. It is also ideally suited to be incorporated into a nucleosome model because it is known that DNA is bound to the protein core of the nucleosome via the phosphates.

      It is still unclear whether this harmonic model parametrized for unbound DNA is accurate enough to describe DNA inside the nucleosome. Previous models by other authors, using more coarse-grained models of DNA, have been rather successful in predicting base pair sequence dependent nucleosome behavior. This is at least the case as long as DNA shape is concerned whereas assessing the role of DNA bendability (something this paper focuses on) has been consistently challenging in all nucleosome models to my knowledge.

      It is thus of major interest whether this more sophisticated model is also more successful in handling this issue. As far as I can tell the work is technically sound and properly accounts for not only the energy required in wrapping DNA but also entropic effects, namely the change in entropy that DNA experiences when going from the free state to the bound state. The authors make an approximation here which seems to me to be a reasonable first step.
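      The energy/entropy distinction can be written compactly for any harmonic (Gaussian) model of free DNA. As a generic sketch, not a transcription of the authors' exact approximation, assume a fragment with N internal coordinates w, sequence-dependent ground state ŵ and stiffness matrix K, with energies in units of k_B T. The probability density of a configuration and its logarithm are then:

```latex
\begin{aligned}
E(w) &= \tfrac{1}{2}\,(w-\hat{w})^{\mathsf{T}} K\,(w-\hat{w}),\\
\rho(w) &= \sqrt{\frac{\det K}{(2\pi)^{N}}}\;\exp\!\bigl(-E(w)\bigr),\\
\ln \rho(w) &= -E(w) + \tfrac{1}{2}\ln\det K - \tfrac{N}{2}\ln 2\pi .
\end{aligned}
```

The term ½ ln det K is a sequence-dependent entropic (normalisation) contribution, so ranking fragments by wrapping energy E and ranking them by log probability density need not give the same order.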

      Of interest is also that the authors have the parameters at hand to study the effect of methylation of CpG steps. This is especially interesting as it allows one to study a scenario where changes in the physical properties of base pair steps via methylation might influence nucleosome positioning and stability in a cell-type specific way.

      Overall, this is an important contribution to the questions of how sequence affects nucleosome positioning and affinity. The findings suggest that cgNA+ has something new to offer. But the problem is complex, also on the experimental side, so many questions remain open. Despite this, I highly recommend publication of this manuscript.

      Strengths:

      The authors use their state-of-the-art coarse grained DNA model which seems ideally suited to be applied to nucleosomes as it accounts explicitly for the backbone phosphates.

      Weaknesses:

      The authors introduce penalty coefficients c_i to avoid steric clashes between the two DNA turns in the nucleosome. This requires c_i-values that are so high that standard deviations in the fluctuations of the simulation are smaller than in the experiments.
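      Why large penalty coefficients shrink fluctuations can be seen in a one-coordinate sketch. This assumes, for illustration only, that each penalty is quadratic in a coordinate x_i that already has harmonic stiffness k_i, and ignores couplings between coordinates. Two quadratics add to a single quadratic with stiffness k_i + c_i, whatever their centres x̂_i and x_i^0:

```latex
\begin{aligned}
U_i(x_i) &= \tfrac{1}{2}\,k_i\,(x_i-\hat{x}_i)^2 + \tfrac{1}{2}\,c_i\,(x_i-x_i^{0})^2
          = \tfrac{1}{2}\,(k_i+c_i)\,(x_i-\bar{x}_i)^2 + \text{const},\\
\bar{x}_i &= \frac{k_i\,\hat{x}_i + c_i\,x_i^{0}}{k_i+c_i},\qquad
\sigma_i^2 = \frac{k_B T}{k_i + c_i} \;\longrightarrow\; 0 \quad (c_i \to \infty).
\end{aligned}
```

Under these assumptions, any c_i large enough to suppress clashes also narrows the thermal spread of that coordinate, which is consistent with the reviewer's observation.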

    4. Reviewer #3 (Public review):

      Summary:

      In this study, authors utilize biophysical modeling to investigate differences in free energies and nucleosomal configuration probability density of CpG islands and nonmethylated regions in the genome. Toward this goal, they develop and apply the cgNA+ coarse-grained model, an extension of their prior molecular modeling framework.

      Strengths:

      The study utilizes biophysical modeling to gain mechanistic insight into nucleosomal occupancy differences in CpG and nonmethylated regions in the genome.

      Weaknesses:

      Although the overall study is interesting, the manuscript needs more clarity in places. Moreover, the rationale and conclusions for some of the analyses are not well described.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors used a coarse-grained DNA model (cgNA+) to explore how DNA sequences and CpG methylation/hydroxymethylation influence nucleosome wrapping energy and the probability density of optimal nucleosomal configuration. Their findings indicate that both methylated and hydroxymethylated cytosines lead to increased nucleosome wrapping energy. Additionally, the study demonstrates that methylation of CpG islands increases the probability of nucleosome formation.

      Strengths:

      The major strength of this method is that the model explicitly includes elastic constraints on the positions of phosphate groups facing a histone octamer, as DNA-histone binding site constraints. The authors claim that their model enhances the accuracy and computational efficiency and allows comprehensive calculations of DNA mechanical properties and deformation energies.

      Weaknesses:

      A significant limitation of this study is that the parameter sets for the methylated and hydroxymethylated CpG steps in the cgNA+ model are derived from all-atom molecular dynamics (MD) simulations that suggest that both methylated and hydroxymethylated cytosines increase DNA stiffness and nucleosome wrapping energy (Pérez A, et al. Biophys J. 2012; Battistini, et al. PLOS Comput Biol. 2021). This could predispose the coarse-grained model to replicate these findings. Notably, conflicting results from other all-atom MD simulations, such as those by Ngo T in Nat. Commun. 2016, show that hydroxymethylated cytosines increase DNA flexibility, contrary to methylated cytosines. If the cgNA+ model were trained on these latter parameters or other all-atom force fields, different conclusions might be obtained regarding the effects of methylation and hydroxymethylation on nucleosome formation.

      Regardless of the training parameters of the cgNA+ model, the results presented in the manuscript indicate that methylated cytosines increase both DNA stiffness and nucleosome wrapping energy. However, when comparing nucleosome occupancy scores with predicted nucleosome wrapping energies and optimal configurations, the authors find that methylated CGIs exhibit higher nucleosome occupancies than unmethylated ones, which seems to contradict their findings in the same paper, which showed that increased stiffness should reduce nucleosome formation affinity. In the manuscript, the authors also admit that these conclusions “apparently runs counter to the (perhaps naive) intuition that high nucleosome forming affinity should arise for fragments with low wrapping energy”. Previous all-atom MD simulations (Pérez A, et al. Biophys J. 2012; Battistini, et al. PLOS Comput Biol. 2021; Ngo T, et al. Nat. Commun. 2016) show that the stiffer DNA resulting from CpG methylation reduces the affinity of DNA to assemble into nucleosomes or destabilizes nucleosomes. Given these findings, the authors need to address and reconcile these seemingly contradictory results, as the influence of epigenetic modifications on DNA mechanical properties and nucleosome formation is a critical aspect of their study. Understanding how sequence-dependent and epigenetic modifications of DNA influence mechanical properties and nucleosome formation is crucial for comprehending various cellular processes. The authors’ study, focusing on these aspects, will definitely garner interest from the DNA methylation research community.

      Training the cgNA+ model on alternative MD simulation datasets is certainly of interest to us. However, due to the significant computational cost, this remains a goal for future work. The relationship between nucleosome occupancy scores and nucleosome wrapping energy is still debated, with conflicting findings reported in the literature, as noted in our Discussion section. Interestingly, we find that our predicted log probability density of DNA spontaneously acquiring a nucleosomal configuration is a better indicator of nucleosome occupancy than our predicted DNA nucleosome wrapping energy.

      Reviewer #2 (Public Review):

      Summary:

      This study uses a coarse-grained model for double-stranded DNA, cgNA+, to assess nucleosome sequence affinity. cgNA+ coarse-grains DNA on the level of bases and accounts also explicitly for the positions of the backbone phosphates. It has been proven to reproduce all-atom MD data very accurately. It is also ideally suited to be incorporated into a nucleosome model because it is known that DNA is bound to the protein core of the nucleosome via the phosphates.

      It is still unclear whether this harmonic model, parametrized for unbound DNA, is accurate in describing DNA inside the nucleosome. Previous models by other authors, using more coarse-grained models of DNA, have been rather successful in predicting base-pair sequence-dependent nucleosome behavior. This is at least the case as far as DNA shape is concerned, whereas assessing the role of DNA bendability (the focus of this paper) has, to my knowledge, been consistently challenging in all nucleosome models.

      It is thus of major interest whether this more sophisticated model is also more successful in handling this issue. As far as I can tell the work is technically sound and properly accounts for not only the energy required in wrapping DNA but also entropic effects, namely the change in entropy that DNA experiences when going from the free state to the bound state. The authors make an approximation here which seems to me to be a reasonable first step.

      Of interest is also that the authors have the parameters at hand to study the effect of methylation of CpG-steps. This is especially interesting as it allows us to study a scenario where changes in the physical properties of base pair steps via methylation might influence nucleosome positioning and stability in a cell-type-specific way.

      Overall, this is an important contribution to the question of how the sequence affects nucleosome positioning and affinity. The findings suggest that cgNA+ has something new to offer. But the problem is complex, also on the experimental side, so many questions remain open.

      Strengths:

      The authors use their state-of-the-art coarse-grained DNA model which seems ideally suited to be applied to nucleosomes as it accounts explicitly for the backbone phosphates.

      Weaknesses:

      (1) According to the abstract, the authors consider two “scalar measures of the sequence-dependent propensity of DNA to wrap into nucleosomes”. One is the bending energy and the other is the free energy. Specifically, for the latter, the authors take the difference between the free energies of the wrapped and the free DNA. Whereas the entropy of the free DNA can be calculated exactly, they assume that the bound DNA always has the same entropy (independent of sequence) in its more confined state. The problem is that the relevant passage (e.g. below Eq. 6) is hard to understand. The authors should mention that the negative of Eq. 6 is what physicists call free energy, specifically the free energy difference between bound and free DNA.

      We have included the necessary clarifications in the revised manuscript, below Eq. 6.

      (2) In Eq. 5 the authors introduce penalty coefficients c<sub>i</sub>. They write that values are “set by numerical experiment to keep distances ... within the ranges observed in the PDB structure, while avoiding sterical clashes in DNA.” This is rather vague, especially since it is unclear to me what type of steric clashes might occur. Figure 1 then shows a comparison between crystal structures and simulated structures. They are reasonably similar, but the standard deviations of the fluctuations in the simulation are smaller than in the experiments. Why did the authors not choose smaller c<sub>i</sub> values to obtain a better fit? Do smaller values lead to unwanted large fluctuations that would cause steric clashes between the two DNA turns? I also wonder what side views of the nucleosomes look like (experiments and simulations) and whether, in this side view, larger fluctuations of the phosphates can be observed in the simulation that would eventually lead to turn-turn clashes for smaller c<sub>i</sub> values.

      The side view plots of the experimental and predicted nucleosome structures are now added to Supplementary material (Figure S8). Indeed, smaller c<sub>i</sub> values lead to steric clashes between the two turns of DNA – this is now specified in the Methods section. A possible improvement of our optimisation method and a direction of future work would be adding a penalty which prevents steric clashes to the objective function. Then the c<sub>i</sub> values could be reduced to have bigger fluctuations that are even closer to the experimental structures. We added this explanation to the Results section.

      Reviewer #3 (Public Review):

      Summary:

      In this study, the authors utilize biophysical modeling to investigate differences in free energies and nucleosomal configuration probability density of CpG islands and nonmethylated regions in the genome. Toward this goal, they develop and apply the cgNA+ coarse-grained model, an extension of their prior molecular modeling framework.

      Strengths:

      The study utilizes biophysical modeling to gain mechanistic insight into nucleosomal occupancy differences in CpG and nonmethylated regions in the genome.

      Weaknesses:

      Although the overall study is interesting, the manuscript needs more clarity in places. Moreover, the rationale and conclusions for some of the analyses are not well described.

      We edited the manuscript according to the reviewer’s suggestions and hopefully improved its readability.

      Reviewer #1 (Recommendations For The Authors):

      (1) The cgNA+ model parameters are derived from all-atom molecular dynamics (MD) simulations, yet there is no consensus among all-atom MD simulations regarding the impact of CpG methylation on DNA mechanical properties. The authors could consider fitting the coarse-grained model with a different all-atom force field to verify whether the conclusions regarding the effects of methylation and hydroxymethylation on DNA nucleosome wrapping energies still hold. For further details on MD simulations related to CpG methylation effects, the authors are advised to consult the review paper by Li et al. (2022) titled “DNA methylation: Precise modulation of chromatin structure and dynamics” published in Current Opinion in Structural Biology.

      Parametrizing the cgNA+ model using MD simulations with various force fields is certainly of interest to us. However, due to the computational cost involved, it remains a goal for future work.

      (2) Beyond DNA mechanical properties, which are directly linked to nucleosome wrapping energies in this study, the authors might also consider other factors such as geometric properties that could influence nucleosome formation. This approach might help the authors to reconcile the observed higher nucleosome occupancy scores for methylated CpGs. The authors are encouraged to review the aforementioned paper for additional experimental and MD simulation studies that could support this perspective.

      Geometric properties of DNA are directly incorporated into our method through the cgNA+ model equilibrium shape prediction µ. We compute the mechanical energy needed to deform µ to a nucleosomal configuration. Notably, the equilibrium shape µ is sensitive to methylation, as demonstrated in Figure 3.

      (3) There are some issues with citation accuracy in the manuscript. For instance, in the Discussion section, the authors attribute a statement to Collings et al. and Anderson (2017), claiming that “methylated regions, known to have high wrapping energy, are among the highest nucleosome occupied elements in the genome.” However, upon reviewing this paper, it appears that it does not make any claims about the high wrapping energy of methylated regions.

      The paragraph has been edited, and a separate citation, Pérez et al. (2012), is now given for the statement that methylated regions have high wrapping energy.

      Reviewer #2 (Recommendations For The Authors):

      Please improve the readability by:

      (1) making clear that -ln ρ in Eq. 6 on page 4 is actually the free energy. Also, the word entropy comes too late (on page 7) where the best explanation of Eq. 6 is presented.

      We added a comment about -ln ρ being the free energy after Eq. 6 and also included an equation, relating ln ρ and entropy.
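      The exchange above turns on how ln ρ, the deformation energy, and entropy are related. The following is a minimal sketch, not the authors' Eq. 6: it assumes only that the free-DNA distribution of the cgNA+ coordinates w is Gaussian with ground state µ and a positive-definite stiffness matrix K, with energies in units of k_BT. Under that assumption, −ln ρ(w) = E(w) + S − n/2, where E is the harmonic deformation energy and S the differential entropy of the free-DNA ensemble, so ln ρ differs from the wrapping energy by a sequence-dependent entropic term. The function names and the random stiffness matrix are hypothetical.

```python
import numpy as np

def wrapping_energy(w, mu, K):
    """Harmonic deformation energy E(w) = 1/2 (w - mu)^T K (w - mu), in units of k_B T."""
    d = w - mu
    return 0.5 * d @ K @ d

def gaussian_entropy(K):
    """Differential entropy of N(mu, K^{-1}): n/2 * ln(2*pi*e) - 1/2 * ln det K."""
    n = K.shape[0]
    sign, logdet = np.linalg.slogdet(K)
    if sign <= 0:
        raise ValueError("stiffness matrix must be positive definite")
    return 0.5 * n * np.log(2 * np.pi * np.e) - 0.5 * logdet

def log_density(w, mu, K):
    """ln rho(w) for a Gaussian with normalisation sqrt(det K) / (2*pi)^(n/2)."""
    n = K.shape[0]
    _, logdet = np.linalg.slogdet(K)
    return 0.5 * logdet - 0.5 * n * np.log(2 * np.pi) - wrapping_energy(w, mu, K)

# Hypothetical example: a random positive-definite stiffness and configuration.
rng = np.random.default_rng(0)
n = 4
A = rng.normal(size=(n, n))
K = A @ A.T + n * np.eye(n)
mu = np.zeros(n)
w = rng.normal(size=n)

lhs = -log_density(w, mu, K)
rhs = wrapping_energy(w, mu, K) + gaussian_entropy(K) - n / 2
print(np.isclose(lhs, rhs))  # True: -ln rho = E + S - n/2
```

In this toy setting, two sequences with the same wrapping energy E can still differ in ln ρ through ln det K, which is one way a log probability density can rank sequences differently from the energy alone.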

      (2) Pages 12 and 13 show two sets of experimental data that are quite different from each other. When reading this, I wondered why there is this difference, but only on page 16 do you explain that these come from different cell types. The difference should already be explained when the papers are introduced on page 12.

      A corresponding sentence already appeared on page 12: “The observations about nucleosome occupancy should be regarded as preliminary, and be treated with caution, as they are based on experimental data obtained for the cancerous HeLa cells Schwartz et al. (2019) and human genome embryonic stem cells Yazdi et al. (2015)”. We have now also added this information to the first paragraph of the subsection for clarity.

      Finally, I add here some general thoughts that came up when reading the paper, comparing your findings with earlier findings in the field. This is not a strict one-to-one comparison and thus does not have to find its way into this manuscript, but it might give ideas for future studies. Experiments suggest that nucleosomes prefer DNA with a high content of C’s and G’s. Figure 2 does not look at the GC content but at the number of CpG’s; in any case, let’s use this as a proxy for GC content. Figure 2a suggests that there is no strong dependence of the bending energy on the number of CpG steps. This is consistent with earlier work with the rigid base-pair model, which shows the same behavior for GC content (for both MD and crystal parametrizations). Figure 2c (related to the negative free energy) shows that the propensity to bind goes down with an increasing number of CpG steps. This suggests that the entropic cost to confine CpG-rich DNA increases, which in turn reflects that these DNA stretches are softer. This is rather interesting since, in the case of the rigid base-pair model, this effect is observed only when stiffnesses are extracted from crystal data, not MD data (however, this again refers to GC content). This might indicate a difference between the rigid base-pair model and cgNA+, which will be interesting to study in the future. Also interesting is the effect of CpG methylation. The stiffer methylated steps lead to an increase in the energy with the number of such steps (Figure 2a). The entropic cost for binding is thus expected to be smaller, and this is indeed observed in Figure 2c when compared to the non-methylated steps.

      We thank the reviewer for this comment. As for the GC content, the energy and ln ρ plots are indeed very similar to those in Figure 2.

      Reviewer #3 (Recommendations For The Authors):

      (1) The formulation of the cgNA+ model in the method section was not easy to follow and can be described better to improve clarity.

      We have revised the model description and hope that its clarity has been improved.

      (2) The authors mention utilizing 100 human genome sequences with 100 configurations from DB. It would be helpful to clarify the source of these 100 human genome sequences. Are these 100 distinct regions on the human reference genome, or are they from a specific dataset or database?

      We now include an explanation about the origin of sequences: “The human genome sequences are a random subset of our sequence sample for the CGI and NMI intersection in the Chromosome 1, but the following observations remain unchanged for sequence samples from different genomic regions.”

      (3) The authors mention the lack of tail unwrapping in their model. It would be beneficial to understand the magnitude of this issue and its potential impact on the overall results. How significant is the lack of unwrapping events in their current model?

      We observed the unwrapping of approximately five base pairs at each end of our predicted nucleosome configurations, in comparison to the experimental configurations (Figure 1). This could be addressed by adding constraints at the ends of the 147 bp sequence. The wrapping energy would increase only marginally, as only about 10 of the 147 bp would be affected. We added this remark to the main text.

      (4) Observations from Figure 3 are not described properly. Are these differences statistically significant? Why is twist higher for CpG sites, whereas roll is lower?

      We added an explanation of how the statistics were computed to the caption of Figure 3. In fact, we did not use statistical estimates here, but generated all the possible cases and computed the exact statistics (for the given set of our model parameters). Regarding the changes in twist and roll, we have added the following comment on page 7: “The ground state changes resulting from cytosine modifications – primarily characterized by an average increase in roll and a decrease in twist – may be linked to steric hindrance caused by the cytosine 5-substituent (Battistini et al. (2021)). Notably, the negative coupling between twist and roll has already been observed in X-ray crystallography data (Olson et al. (1998)).”

      (5) Figure 4 does not clarify the authors’ conclusion of higher stiffness for ApT and TpA dinucleotides. The authors should provide further explanation for this observation.

      We revised the text to clarify that the statement regarding ApT and TpA being the stiffest and the most flexible dinucleotides is not a conclusion derived from Figure 4, but rather from earlier work that we cite.

      (6) In Figure 7, the authors note that methylated CGIs have higher nucleosome occupancy on average than unmethylated sequences. Is this observation statistically significant?

      We observe that methylated sequences have a higher average occupancy than unmethylated sequences in the Yazdi et al. data when the CpG count falls into the intervals from 5 to 14 and from 15 to 24. For each of the two intervals this difference is statistically significant: the permutation test, used due to the lack of normality, yields a p-value of 0.0001 in both cases. The differences in mean scores shown in Figure 8 are also statistically significant. Such test results are expected given the large sample sizes and the observed differences in means; therefore, we prefer not to include this discussion in the main text.
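      The response above relies on a permutation test chosen because the occupancy scores are not normally distributed. As a minimal illustration of that technique (not the authors' script; the function name, the number of permutations, the add-one correction, and the gamma-distributed example scores are all assumptions), a two-sided Monte Carlo permutation test on a difference in means can be written as follows.

```python
import numpy as np

def permutation_test_mean_diff(x, y, n_perm=10_000, seed=0):
    """Two-sided Monte Carlo permutation test for a difference in means.

    Returns the observed difference mean(x) - mean(y) and a p-value.
    No normality assumption is made: the null distribution is built by
    randomly reassigning the pooled values to the two groups.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = x.mean() - y.mean()
    pooled = np.concatenate([x, y])
    n_x = len(x)
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        diff = perm[:n_x].mean() - perm[n_x:].mean()
        if abs(diff) >= abs(observed):
            count += 1
    # Add-one correction: the observed labelling counts as one permutation,
    # so the p-value is never exactly zero.
    p_value = (count + 1) / (n_perm + 1)
    return observed, p_value

# Hypothetical skewed "occupancy scores" for two groups (illustration only).
rng = np.random.default_rng(1)
methylated = rng.gamma(shape=2.0, scale=1.2, size=500)
unmethylated = rng.gamma(shape=2.0, scale=1.0, size=500)
diff, p = permutation_test_mean_diff(methylated, unmethylated, n_perm=2000)
print(f"mean difference = {diff:.3f}, p = {p:.4f}")
```

With this construction the smallest p-value the test can report is 1/(n_perm + 1), so the resolution of a reported p-value depends on how many permutations were drawn.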

      (7) The authors note that their analyses to correlate nucleosome occupancy profile with the methylation state of underlying sequences are preliminary, as different cell lines were used to perform these analyses. Given this inconsistency, it needs to be clarified why this analysis was performed and what the takeaway is.

      We added the following comment at the end of the Results section: “Although comparing data from different cell lines is not optimal, to the best of our knowledge, no publicly available methylation and nucleosome occupancy data exist for the entire human genome within the same cell type. Nevertheless, since the lowest log probability densities in the human genome are predicted for CpG-rich sequences regardless of their methylation state (Figure 2d), and the same holds for both sets of the nucleosome occupancy scores (Figure 7), we conclude that the lowest occupancies occur for sequences with the lowest log probability densities.”

    1. eLife Assessment

      The authors addressed an important biological question, namely the role of glutamine metabolism in humoral responses, and they obtained solid conclusions. The strength of this study is that the authors used state-of-the-art transgenic mouse models together with in vitro analysis, thereby providing significant insights into the question posed. The following would strengthen the manuscript: i) adding more in-depth functionality/physiological relevance in the discussion part, and ii) regarding the experiments, the inclusion of more appropriate controls and a clearer and more accurate description of the methods.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Cho et al. present a comprehensive and multidimensional analysis of glutamine metabolism in the regulation of B cell differentiation and function during immune responses. They further demonstrate how glutamine metabolism interacts with glucose uptake and utilization to modulate key intracellular processes. The manuscript is clearly written, and the experimental approaches are informative and well-executed. The authors provide a detailed mechanistic understanding through the use of both in vivo and in vitro models. The conclusions are well supported by the data, and the findings are novel and impactful. I have only a few, mostly minor, concerns related to data presentation and the rationale for certain experimental choices.

      Detailed Comments:

      (1) In Figure 1b, it is unclear whether total B cells or follicular B cells were used in the assay. Additionally, the in vitro class-switch recombination and plasma cell differentiation experiments were conducted without BCR stimulation, which makes the system appear overly artificial and limits physiological relevance. Although the effects of glutamine concentration on the measured parameters are evident, the results cannot be confidently interpreted as true plasma cell generation or IgG1 class switching under these conditions. The authors should moderate these claims or provide stronger justification for the chosen differentiation strategy. Incorporating a parallel assay with anti-BCR stimulation would improve the rigor and interpretability of these findings.

      (2) In Figure 1c, the DMK-alone condition is not presented. This hinders readers' ability to properly assess the glutaminolysis dependency of the cells for the measured readouts. Also, CD138 upregulation in developing PCs goes hand in hand with decreased B220 expression. A representative FACS plot showing the gating strategy for the in vitro PCs should be added as a supplementary figure. Similarly, division numbers (going all the way to #7) may be tricky to gate and interpret. A representative FACS plot showing the separation of B cells according to their division numbers, with subsequent gating of CD138 or IgG1 within these gates, would be ideal for demonstrating the authors' ability to distinguish these populations effectively.

      (3) A brief explanation should be provided for the exclusive use of IgG1 as the readout in class-switching assays, given that naïve B cells are capable of switching to multiple isotypes. Clarifying why IgG1 was preferentially selected would aid in the interpretation of the results.

      (4) The immunization experiments presented in Figures 1 and 2 are well designed, and the data are comprehensively presented. However, to prevent potential misinterpretation, it should be clarified that the observed differences between NP and OVA immunizations cannot be attributed solely to the chemical nature of the antigens - hapten versus protein. A more significant distinction lies in the route of administration (intraperitoneal vs. intranasal) and the resulting anatomical compartment of the immune response (systemic vs. lung-restricted). This context should be explicitly stated to avoid overinterpretation of the comparative findings.

      (5) NP immunization is known to be an inducer of an IgG1-dominant Th2-type immune response in mice. IgG2c is not a major player unless a nanoparticle delivery system is used. However, the authors arbitrarily included IgG2c in their assays in Figures 2 and 3. This may be confusing for the readers. The authors should either justify the IgG2c-mediated analyses or remove them from the main figures. (It can be added as supplemental information with proper justification).

      (6) Similarly, in affinity maturation analyses, including IgM is somewhat uncommon. I do not see any point in showing high affinity (NP2/NP20) IgMs (Figure 3d), since that data probably does not mean much.

      (7) Following on my comment for the PC generation in Figure 1 (see above), in Figure 4, a strategy that relies solely on CD40L stimulation is performed. This is highly artificial for the PC generation and needs to be justified, or more physiologically relevant PC generation strategies involving anti-BCR, CD40L, and various cytokines should be shown.

      (8) The effects of CB839 and UK5099 on cell viability are not shown. Including viability data under these treatment conditions would be a valuable addition to the supplementary materials, as it would help readers more accurately interpret the functional outcomes observed in the study.

      (9) It is not clear how the RNA-seq analysis in Figure 4h was generated. The experimental strategy and setup need to be better explained.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors investigate the functional requirements for glutamine and glutaminolysis in antibody responses. The authors first demonstrate that the concentrations of glutamine in lymph nodes are substantially lower than in plasma, and that at these levels, glutamine is limiting for plasma cell differentiation in vitro. The authors go on to use genetic mouse models in which B cells are deficient in glutaminase 1 (Gls), the glucose transporter Slc2a1, and/or mitochondrial pyruvate carrier 2 (Mpc2) to test the importance of these pathways in vivo.

      Interestingly, deficiency of Gls alone showed clear antibody defects when ovalbumin was used as the immunogen, but not the hapten NP. For the latter response, defects in antibody titers and affinity were observed only when both Gls and either Mpc2 or Slc2a1 were deleted. These latter findings form the basis of the synthetic auxotrophy conclusion. The authors go on to test these conclusions further using in vitro differentiations, Seahorse assays, pharmacological inhibitors, and targeted quantification of specific metabolites and amino acids. Finally, the authors document reduced STAT3 and STAT1 phosphorylation in response to IL-21 and interferon (both type 1 and 2), respectively, when both glutaminolysis and mitochondrial pyruvate metabolism are prevented.

      Strengths:

      (1) The main strength of the manuscript is the overall breadth of experiments performed. Orthogonal experiments using genetic models, pharmacological inhibitors, in vitro assays, and in vivo experiments support the claims. Multiple antigens are used as test immunogens; this is particularly important given the differing results.

      (2) B cell metabolism is an area of interest but understudied relative to other cell types in the immune system.

      (3) The importance of metabolic flexibility and caution when interpreting negative results is made clear from this study.

      Weaknesses:

      (1) All of the in vivo studies were done in the context of boosters at 3 weeks and recall responses 1 week later. This makes specific results difficult to interpret. Primary responses, including germinal centers, are still ongoing at 3 weeks after the initial immunization. Thus, untangling what proportion of the defects are due to problems in the primary vs. memory response is difficult.

      (2) Along these lines, the defects shown in Figure 3h-i may not be due to the authors' interpretation that Gls and Mpc2 are required for efficient plasma cell differentiation from memory B cells. This interpretation would only be correct if the absence of Gls/Mpc2 leads to preferential recruitment of low-affinity memory B cells into secondary plasma cells. The more likely interpretation is that ongoing primary germinal centers are negatively impacted by Gls and Mpc2 deficiency, and this, in turn, leads to reduced affinities of serum antibodies.

      (3) The gating strategies for germinal centers and memory B cells in Supplemental Figure 2 are problematic, especially given that these data are used to claim only modest and/or statistically insignificant differences in these populations when Gls and Mpc2 are ablated. Neither strategy shows distinct flow cytometric populations, and it does not seem that the quantification focuses on antigen-specific cells.

      (4) Along these lines, the conclusions in Figure 6a-d may need to be tempered if the analysis was done on polyclonal, rather than antigen-specific cells. Alum induces a heavily type 2-biased response and is not known to induce much of an interferon signature. The authors' observations might be explained by the inclusion of other ongoing GCs unrelated to the immunization.

    4. Reviewer #3 (Public review):

      Summary:

      In their manuscript, the authors investigate how glutaminolysis (GLS) and mitochondrial pyruvate import (MPC2) jointly shape B cell fate and the humoral immune response. Using inducible knockout systems and metabolic inhibitors, they uncover a "synthetic auxotrophy": when GLS activity/glutaminolysis is lost together with either GLUT1-mediated glucose uptake or MPC2, B cells fail to upregulate mitochondrial respiration, IL-21/STAT3 and IFN/STAT1 signaling is impaired, and plasma cell output and antigen-specific antibody titers drop significantly. This work thus demonstrates that plasma cell differentiation and cytokine signaling are promoted through the parallel activation of two metabolic pathways. The dataset is technically comprehensive and conceptually novel, but some aspects leave the in vivo and translational significance uncertain.

      Strengths:

      (1) Conceptual novelty: the study goes beyond single-enzyme deletions to reveal conditional metabolic vulnerabilities and fate-deciding mechanisms in B cells.

      (2) Mechanistic depth: the study uncovers a novel "metabolic bottleneck" that impairs mitochondrial respiration and elevates ROS, and directly ties these changes to cytokine-receptor signaling. This is both mechanistically compelling and potentially clinically relevant.

      (3) Breadth of models and methods: inducible genetics, pharmacology, metabolomics, Seahorse assays, ELISpot/ELISA, RNA-seq, and two immunization models.

      (4) Potential clinical angle: the synergy of CB839 with UK5099 and/or hydroxychloroquine hints at a druggable pathway targeting autoantibody-driven diseases.

      Weaknesses:

      (1) Physiological relevance of "synthetic auxotrophy"

      The manuscript demonstrates that GLS loss is only crippling when glucose influx or mitochondrial pyruvate import is concurrently reduced, which the authors name "synthetic auxotrophy". I think it would help readers to clarify the terminology more and add a concise definition of "synthetic auxotrophy" versus "synthetic lethality" early in the manuscript and justify its relevance for B cells.

      While the overall findings, especially the subset specificity and the clinical implications, are generally interesting, the "synthetic auxotrophy" condition feels a little engineered. Therefore, the findings strongly raise the question of the likelihood of such a "double hit" in vivo and whether there are conditions, disease states, or drug regimens that would realistically generate such a "bottleneck". Hence, the authors should document or at least discuss whether GC or inflamed niches naturally show simultaneous downregulation/lack of glutamine and/or pyruvate. The authors should also aim to provide evidence that infections (e.g., influenza), hypoxia, treatments (e.g., rapamycin), or inflammatory diseases like lupus co-limit these pathways.

      It would hence also be beneficial to test the CB839 + UK5099/HCQ combinations in a short, proof-of-concept treatment in vivo, e.g., shortly before and after the booster immunization or in an autoimmune model. Likewise, it may also be insightful to discuss potential effects of existing treatments (especially CB839, HCQ) on human memory B cell or PC pools.

      (2) Cell survival versus differentiation phenotype

      Claims that the phenotypes (e.g., reduced PC numbers) are "independent of death" and are not merely the result of artificial cell stress would benefit from Annexin-V/active-caspase 3 analyses of GC B cells and plasmablasts. Please also show viability curves for inhibitor-treated cells.

      (3) Subset specificity of the metabolic phenotype

      Could the metabolic differences, mitochondrial ROS, and membrane-potential changes shown for activated pan-B cells (Figure 5) also be demonstrated ex vivo for KO mouse-derived GC B cells and plasma cells? This would also be insightful to investigate following NP-immunization (e.g., NP+ GC B cells 10 days after NP-OVA immunization).

      (4) Memory B cell gating strategy

      I am not fully convinced that the memory-B-cell gate in Supplementary Figure 2d is appropriate. The legend implies the population is defined simply as CD19+GL7-CD38+ (or CD19+CD38++?), with no further restriction to NP-binding cells. Such a gate could also capture naïve or recently activated B cells. From the descriptions in the figure and the figure legend, it is hard to verify that the events plotted truly represent memory B cells. Please clarify the full gating hierarchy and, ideally, restrict the MBC gate to NP+CD19+GL7-CD38+ B cells (or add additional markers such as CD80 and CD273). Generally, the manuscript would benefit from a more transparent presentation of gating strategies.

      (5) Deletion efficiency

      mRNA data show residual GLS/MPC2 transcripts (Supplementary Figure 8). Please quantify deletion efficiency in GC B cells and plasmablasts.
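The mRNA data in Supplementary Figure 8 may be qPCR measurements. If so, one conventional way to turn them into a deletion-efficiency figure for sorted GC B cells and plasmablasts is the comparative Ct (2^-ΔΔCt) method. The Python sketch below makes three assumptions that the review does not state: that method, roughly 100% primer efficiency, and normalisation to a single reference gene. All Ct values are hypothetical. A transcript-level estimate cannot, by itself, distinguish incomplete deletion from undeleted cells contaminating the sorted population.

```python
def relative_expression(ct_target_ko: float, ct_ref_ko: float,
                        ct_target_ctrl: float, ct_ref_ctrl: float) -> float:
    """Fold change of the target mRNA in KO versus control cells (2^-ddCt).

    Assumes ~100% amplification efficiency for both primer pairs."""
    delta_ko = ct_target_ko - ct_ref_ko        # normalise KO sample to reference gene
    delta_ctrl = ct_target_ctrl - ct_ref_ctrl  # normalise control sample to reference gene
    return 2.0 ** -(delta_ko - delta_ctrl)


def deletion_efficiency(fold_change: float) -> float:
    """Fraction of transcript lost relative to control (1 - residual)."""
    return 1.0 - fold_change


# Hypothetical Ct values for sorted GC B cells (target e.g. Gls, reference a housekeeping gene)
fc = relative_expression(ct_target_ko=27.0, ct_ref_ko=18.0,
                         ct_target_ctrl=24.0, ct_ref_ctrl=18.0)
print(f"residual transcript: {fc:.3f}, deletion efficiency: {deletion_efficiency(fc):.1%}")
```

With these made-up values, the script prints `residual transcript: 0.125, deletion efficiency: 87.5%`.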

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This valuable study revisits the effects of substitution model selection on phylogenetics by comparing reversible and non-reversible DNA substitution models. The authors provide evidence that 1) non time-reversible models sometimes perform better than general time-reversible models when inferring phylogenetic trees out of simulated viral genome sequence data sets, and that 2) non time-reversible models can fit the real data better than the reversible substitution models commonly used in phylogenetics, a finding consistent with previous work. However, the methods are incomplete in supporting the main conclusion of the manuscript, that is that non time-reversible models should be incorporated in the model selection process for these data sets.

      The non-reversible models should be incorporated in the model selection process not because they perform significantly better, but because they do not perform worse than the reversible models, and because the biochemical processes underlying nucleotide substitution support non-reversibility.

      Reviewer #1 (Public Review):

      The study by Sianga-Mete et al revisits the effects of substitution model selection on phylogenetics by comparing reversible and non-reversible DNA substitution models. This topic is not new, previous works already showed that non-reversible, and also covarion, substitution models can fit the real data better than the reversible substitution models commonly used in phylogenetics. In this regard, the results of the present study are not surprising. Specific comments are shown below.

      True

      It is well known that non-reversible models can fit the real data better than the commonly used reversible substitution models, see for example,

      https://academic.oup.com/sysbio/article/71/5/1110/6525257

      https://onlinelibrary.wiley.com/doi/10.1111/jeb.14147?af=R

      The manuscript indicates that the results (better fitting of non-reversible models compared to reversible models) are surprising, but I do not think so; I think the results would be surprising if the reversible models provided a better fit.

      I think the introduction of the manuscript should be expanded with more information about non-reversible models and the diverse previous studies that have already evaluated them. Also, I think the manuscript should indicate that the results are not surprising, or more clearly justify why they are surprising.

      The surprising finding is that NREV12 performed better than NREV6 for double-stranded DNA viruses, as we expected NREV6 to perform better given the biochemical processes discussed in the introduction.

      In the introduction and/or discussion I missed a discussion about the recent works on the influence of substitution model selection on phylogenetic tree reconstruction. Some works indicated that substitution model selection is not necessary for phylogenetic tree reconstruction,

      https://academic.oup.com/mbe/article/37/7/2110/5810088

      https://www.nature.com/articles/s41467-019-08822-w

      https://academic.oup.com/mbe/article/35/9/2307/5040133

      While others indicated that substitution model selection is recommended for phylogenetic tree reconstruction,

      https://www.sciencedirect.com/science/article/pii/S0378111923001774

      https://academic.oup.com/sysbio/article/53/2/278/1690801

      https://academic.oup.com/mbe/article/33/1/255/2579471

      The results of the present study seem to support this second view. I think this study could be improved by providing a discussion about this aspect, including the specific contribution of this study to that.

      In our conclusion we have stated that:

      The lack of available data regarding the proportions of viral life cycles during which genomes exist in single- and double-stranded states makes it difficult to rationally predict the situations where the use of models such as GTR, NREV6 and NREV12 might be most justified, particularly in light of the poor overall performance of NREV6 and GTR relative to NREV12 in describing mutational processes in viral genome sequence datasets. We therefore recommend case-by-case assessments of NREV12 vs NREV6 vs GTR model fit when deciding whether it is appropriate to apply non-reversible models for phylogenetic inference and/or phylogenetic model-based analyses, such as those intended to test for evidence of natural selection or the existence of molecular clocks.

      The real data were downloaded from the Los Alamos HIV database. I am wondering whether there was any criterion for selecting the sequences, or whether all the sequences in the database were analysed for every studied virus category. Also, was any quality filter applied? How were gaps and ambiguous nucleotides handled? Notice that these aspects could affect the fit of the models to the data.

      We selected a varying number of sequences from the database for each virus type. We performed quality filtering in AliView by re-aligning the sequences for each virus type.

      How are the non-reversible model and the data compared, considering the non-reversible substitution process? In particular, given an input MSA, how can one know whether a nucleotide substitution went from state x to state y or from state y to state x in the real data if there is no reference (i.e., wild-type) sequence? All the sequences are mutants, and one may not have a reference to identify the direction of the mutation, which is required for the non-reversible model. One could consider the most abundant state to be the wild-type state, but that may not be the case in reality. I think this is a main problem for the practical application of non-reversible substitution models in phylogenetics.

      True

      Reviewer #1 (Recommendations for the authors):

      The reversible and non-reversible models used in this study assume that all the sites evolve under the same substitution matrix, which can be unrealistic. This aspect could be mentioned.

      Done

      The manuscript indicates that "a phylogenetic tree was inferred from an alignment of real sequences (Avian Leukosis virus) with an average sequence identity (API) of ~90%.". I was wondering under which substitution model that phylogenetic tree reconstruction was performed. Could the use of that model bias subsequent results by favoring results based on that model?

      We have stated that the GTR+G model was used to reconstruct the tree. Yes, the use of the GTR+G model could bias the subsequent results, as we also state in the paper.

      I was wondering which specific R function was used to calculate the weighted Robinson-Foulds metric. I think this should be included in the manuscript.

      We have stated that we used the weighted Robinson-Foulds metric (wRF), as implemented in the R package phangorn (Schliep, 2011).

      Although they were a minority, several datasets fitted a reversible model better than a non-reversible model. I think that should be clearly indicated. In addition, in my opinion, the AIC does not penalize the number of model parameters enough and favors the non-reversible models over the reversible models, but this is only my opinion, based on the definition of AIC, and it is not supported. Thus, I think the comparison between phylogenetic trees reconstructed under different substitution models was a good idea (but see also my second major comment).
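The weighted RF distance is commonly defined as follows. For every bipartition (split) present in either tree, take the absolute difference between the corresponding branch lengths, and sum these differences. A split that is absent from one tree contributes a length of zero for that tree. The Python sketch below illustrates this definition on hypothetical split-to-length dictionaries. It is not the phangorn implementation, whose normalisation and input handling may differ.

```python
from typing import Dict, FrozenSet, Iterable

Split = FrozenSet[str]


def canonical_split(side: Iterable[str], taxa: Iterable[str]) -> Split:
    """Represent a bipartition by the side NOT containing the alphabetically
    first taxon, so that both sides of the same split map to one key."""
    side, taxa = frozenset(side), frozenset(taxa)
    anchor = min(taxa)
    return frozenset(taxa - side) if anchor in side else side


def weighted_rf(t1: Dict[Split, float], t2: Dict[Split, float]) -> float:
    """Sum of |b1(s) - b2(s)| over all splits in either tree (missing = 0)."""
    splits = set(t1) | set(t2)
    return sum(abs(t1.get(s, 0.0) - t2.get(s, 0.0)) for s in splits)


taxa = ["A", "B", "C", "D", "E"]
# Hypothetical internal-branch lengths for two 5-taxon trees;
# terminal branches could be added the same way as single-taxon splits.
true_tree = {
    canonical_split({"A", "B"}, taxa): 0.10,
    canonical_split({"D", "E"}, taxa): 0.05,
}
inferred_tree = {
    canonical_split({"C", "D", "E"}, taxa): 0.10,  # same split as {A, B}
    canonical_split({"C", "D"}, taxa): 0.02,       # split absent from the true tree
}
print(weighted_rf(true_tree, inferred_tree))
```

The shared split contributes nothing because its lengths match. The distance of about 0.07 comes entirely from the two splits found in only one tree.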

      Noted

      When comparing phylogenetic trees, I was wondering whether one should consider the effect of the estimation method and the quality of the studied data. For example, should bootstrap values be estimated for all the ancestral nodes, and only ancestral nodes with high support be evaluated in the comparison among trees?

      Yes, the estimation method and the quality of the studied data should be considered. With the unweighted RF distance this does not matter, but with the weighted RF distance it does. When building the trees with RAxML, only high-support nodes are added to the tree.
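The reviewer suggests evaluating only well-supported nodes. One way to do this is to discard splits whose bootstrap support falls below a threshold before computing an unweighted RF distance. The Python sketch below assumes hypothetical splits, already in a canonical orientation and annotated with bootstrap percentages, and an arbitrary threshold of 70. Neither the splits nor the threshold come from the manuscript.

```python
from typing import Dict, FrozenSet, Set

Split = FrozenSet[str]


def supported_splits(support: Dict[Split, float], threshold: float = 70.0) -> Set[Split]:
    """Keep only splits whose bootstrap support meets the (hypothetical) threshold."""
    return {s for s, value in support.items() if value >= threshold}


def rf_distance(splits1: Set[Split], splits2: Set[Split]) -> int:
    """Unweighted RF distance: number of splits present in exactly one tree."""
    return len(splits1 ^ splits2)


# Hypothetical inferred splits with bootstrap percentages, and the true splits
inferred = {frozenset({"C", "D", "E"}): 98.0, frozenset({"C", "D"}): 45.0}
true_splits = {frozenset({"C", "D", "E"}), frozenset({"D", "E"})}

all_rf = rf_distance(set(inferred), true_splits)
filtered_rf = rf_distance(supported_splits(inferred), true_splits)
print(all_rf, filtered_rf)  # prints: 2 1
```

Collapsing the weakly supported split lowers the distance from 2 to 1, because that split no longer counts as a false positive. The true split that the inference failed to recover still counts.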

      In Figure 3, I do not see (by eye) significant differences among the models. I see in the legend that the statistical evaluation was based on a t test but I am not much convinced. Maybe it is only my view. Exactly, which pairs of datasets are evaluated with the t test? Next, I would expect that the influence of the substitution model on the phylogenetic tree reconstruction is higher at large levels of nucleotide diversity because with more substitution events there is more information to see the effects of the model. However, the t test seems to show that differences are only at low levels of nucleotide diversity (and large DNR), what could be the cause of this?

      The paired t-tests compare the wRF distances between the true tree and the trees inferred under the GTR model with the wRF distances between the true tree and the trees inferred under the NREV12 model.

      The influence of the NREV12 model on the reconstructed tree may not be significantly higher at large levels of nucleotide diversity because, beyond a certain level, the DNR values are simply unrealistic.

      Can the user perform substitution model selection (i.e., AIC) among reversible and non-reversible substitution models with IQTREE? If yes, then doing that should be the recommendation from this study, correct?

      But, can DNR be estimated from a real dataset? DNR seems to be the key factor (Figure 3) for the phylogenetic analysis under a proper model.

      Substitution model selection among reversible and non-reversible models can be performed with both HyPhy and IQTREE, and we have recommended that model tests be done as a first step before tree building. Estimating DNR from real datasets requires a substitution rate matrix estimated under a non-reversible model.
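Every time-reversible model, such as GTR, satisfies the detailed-balance condition π_i q_ij = π_j q_ji. A fitted rate matrix can therefore be checked for departures from reversibility against that condition. The Python sketch below computes the stationary distribution of a 4×4 rate matrix by uniformisation and power iteration. It then reports the largest net probability flux between any pair of nucleotides. This measure is assumed here for illustration only and is not necessarily the DNR defined in the manuscript. The example matrices are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def stationary(q: Matrix, iters: int = 10_000) -> List[float]:
    """Stationary distribution of rate matrix q via uniformisation:
    P = I + Q / c is stochastic and shares Q's stationary vector."""
    n = len(q)
    c = 1.1 * max(-q[i][i] for i in range(n))  # c > every exit rate, so P has a positive diagonal
    p = [[(1.0 if i == j else 0.0) + q[i][j] / c for j in range(n)] for i in range(n)]
    pi = [1.0 / n] * n
    for _ in range(iters):  # power iteration on the row vector pi
        pi = [sum(pi[i] * p[i][j] for i in range(n)) for j in range(n)]
    total = sum(pi)
    return [x / total for x in pi]


def max_net_flux(q: Matrix) -> float:
    """Largest |pi_i q_ij - pi_j q_ji| over nucleotide pairs; zero if reversible."""
    pi = stationary(q)
    n = len(q)
    return max(abs(pi[i] * q[i][j] - pi[j] * q[j][i])
               for i in range(n) for j in range(i + 1, n))


def with_diagonal(rates: Matrix) -> Matrix:
    """Set each diagonal entry so that every row sums to zero."""
    n = len(rates)
    return [[rates[i][j] if i != j else -sum(rates[i][k] for k in range(n) if k != i)
             for j in range(n)] for i in range(n)]


# GTR-style matrix q_ij = r_ij * pi_j with symmetric exchangeabilities (hypothetical values)
freqs = [0.1, 0.2, 0.3, 0.4]
r = [[0, 1, 2, 1], [1, 0, 1, 3], [2, 1, 0, 1], [1, 3, 1, 0]]
gtr = with_diagonal([[r[i][j] * freqs[j] for j in range(4)] for i in range(4)])

# Purely cyclic A->C->G->T->A matrix: an extreme non-reversible example (hypothetical)
cyclic = with_diagonal([[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [1, 0, 0, 0]])

print(max_net_flux(gtr), max_net_flux(cyclic))
```

The GTR-style matrix gives a flux of essentially zero. The cyclic matrix gives 0.25, because at its uniform stationary distribution every forward substitution has no matching reverse substitution.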

      The manuscript has many text errors (including typos and incorrect citations). For example, many citations in page 20 show "Error! Reference source not found.". I think authors should double check the manuscript before submitting. Also, some text is not formally written. For example, "G represents gamma-distributed rates", rates of what? The text should be clear for readers that are not familiar with the topic (i.e., G represents gamma-distributed substitution rates among sites). In general, I recommend a detailed revision of the whole text of the manuscript.

      Done

      Reviewer #2 (Public Review):

      The authors evaluate whether non-time-reversible models fit data presenting strand-specific substitution biases better than time-reversible models do. Specifically, the authors consider what they call NREV6 and NREV12 as candidate non-time-reversible models. On the one hand, they show that AIC tends to select NREV12 more often than GTR on real virus data sets. On the other hand, they show using simulated data that NREV12 leads to inferred trees that are closer to the true generating tree when the data incorporates a certain degree of non-time-reversibility.

      Based on these two experimental results, the authors conclude that "We show that non-reversible models such as NREV12 should be evaluated during the model selection phase of phylogenetic analyses involving viral genomic sequences". This is a valuable finding, and I agree that this is potentially good practice.

      However, I miss an experiment that links the two findings to support the conclusion: in particular, an experiment that answers the following question: does the best-fit model also lead to better tree topologies?

      Because NREV12 yields inferred trees that are closer to the true generating tree than GTR does, this shows that, in this case, the best-fit model (NREV12) leads to better tree topologies.

      On simulated data, the significance of the difference between GTR and NREV12 inferences is evaluated using a paired t test. I miss a rationale or a reference to support that a paired t test is suitable to measure the significance of the differences of the wRF distance. Also, the results show that on average NREV12 performs better than GTR, but a pairwise comparison would be more informative: for how many sequence alignments does NREV12 perform better than GTR?

      We used the paired t-test because it is the most widely used test for comparing mean values between two matched samples when the paired differences are normally distributed, and the wRF distances meet this requirement.

      The paired t-test incorporates the pairwise comparison, and the side-by-side boxplots show the pairwise wRF comparisons.
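The reviewer asks for both a paired test and a per-alignment count, and both can be computed from the same paired wRF values. The Python sketch below computes the paired t statistic t = mean(d) / (sd(d) / sqrt(n)), where d_i is the per-alignment difference (GTR minus NREV12). It also counts the alignments for which NREV12 gives the smaller wRF distance. The wRF values are hypothetical. A p-value would additionally require the t distribution with n − 1 degrees of freedom, for example via `scipy.stats.ttest_rel`. That step is omitted to keep the sketch dependency-free.

```python
import math
from statistics import mean, stdev
from typing import List, Tuple


def paired_t(gtr: List[float], nrev12: List[float]) -> Tuple[float, int]:
    """Paired t statistic for (GTR - NREV12) differences and its degrees of freedom."""
    if len(gtr) != len(nrev12) or len(gtr) < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    diffs = [a - b for a, b in zip(gtr, nrev12)]
    se = stdev(diffs) / math.sqrt(len(diffs))  # sample SD of the differences / sqrt(n)
    return mean(diffs) / se, len(diffs) - 1


def nrev12_wins(gtr: List[float], nrev12: List[float]) -> int:
    """Number of alignments where NREV12 is closer to the true tree than GTR."""
    return sum(1 for a, b in zip(gtr, nrev12) if b < a)


# Hypothetical wRF distances to the true tree, one pair per simulated alignment
wrf_gtr = [0.30, 0.25, 0.40, 0.35]
wrf_nrev12 = [0.28, 0.20, 0.38, 0.30]

t, df = paired_t(wrf_gtr, wrf_nrev12)
print(f"t = {t:.2f} (df = {df}); NREV12 better in "
      f"{nrev12_wins(wrf_gtr, wrf_nrev12)} of {len(wrf_gtr)} alignments")
```

A scatterplot of `wrf_gtr` against `wrf_nrev12` with the identity line would show the same per-alignment information graphically.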

      Reviewer #2 (Recommendations for the authors):

      The authors reference Baele et al., 2010 for describing NREV6 and NREV12. I suggest using the same name used in the referenced paper: GNR-SYM and GNR respectively. Although I do not think there is a standard name for these models, I would use a previously used one.

      Our previous studies used the names NREV6 and NREV12, and we would like to keep this naming consistent across our studies.

      GTR and NREV12 models are already described in many other papers. I do not see the need to include such an extensive description. Also, a reference to the discrete Gamma rate categories should be included [1].

      We included the extensive description to help readers who are less familiar with these models, particularly since our naming differs from that used in other papers.

      As recommended, we have added a reference for the discrete gamma rate categories (Yang, 1994).
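For reference, the discrete gamma approximation of Yang (1994) can be summarised as below. The summary assumes K equal-probability categories and the common "mean" variant, with a mean-one gamma distribution (shape α, rate α). G denotes the gamma cumulative distribution function. The notation is ours, not the manuscript's.

```latex
\begin{align}
  g(x;\alpha,\alpha) &= \frac{\alpha^{\alpha}}{\Gamma(\alpha)}\, x^{\alpha-1} e^{-\alpha x},
    \qquad \mathbb{E}[x] = 1, \\
  a_k &= G^{-1}\!\left(\tfrac{k}{K};\alpha,\alpha\right), \qquad k = 0,\dots,K,
    \quad a_0 = 0,\; a_K = \infty, \\
  r_k &= K \int_{a_{k-1}}^{a_k} x\, g(x;\alpha,\alpha)\,dx
       = K\left[G(a_k;\alpha+1,\alpha) - G(a_{k-1};\alpha+1,\alpha)\right].
\end{align}
```

The second form of r_k uses the identity x g(x; α, α) = g(x; α + 1, α). Summing over k shows that the K category rates average to exactly one.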

      To evaluate the exhaustiveness and correctness of the results, I would recommend publishing as supplementary material the simulated data sets or the scripts for generating the data set, the scripts or command lines for the analysis, and the versions of the software used (e.g., IQTREE). Also, to strongly support the main conclusion of the manuscript, I suggest adding to the simulations section results the RF-distances of the best-fit selected model under AIC, AICc, and BIC as well.

      We will submit all the required datasets. The RF-distance results for the simulated data are available and will also be submitted. However, we cannot add them to the main document, as this would create very long data tables.

      In some instances, it is mentioned that the selection criterion used is AIC, while in others, AIC-c is referenced. Even in the table captions, both terms are mixed. It should be made clearer which criterion is being employed, as AIC is not suitable for addressing the overparameterization of evolutionary models, given that it does not account for the sample size. A previous preprint of this article [2] does not mention AIC-c, explicitly includes the formulas for AIC that do not take the sample size into account, and reports the same results as this manuscript, which indicates that AIC and not AIC-c was used here. This should be clarified. It is recommended to use AIC-c instead of AIC, especially if the ratio of sample size to model parameters is low [3]. Two points should be noted here. First, some authors consider tree branch lengths to be free model parameters and others do not, and this paper does not specify how the model parameters are counted. Second, AIC tends to select more parameterized models than AIC-c, and overparameterization can lead to different tree inferences, as evidenced in Hoff et al., 2016. Therefore, it is expected that NREV12 is selected more frequently than NREV6 and GTR.

      In my opinion, a pairwise comparison between GTR and NREV12 performance is of great interest here, and the box-and-whisker plots are not useful. Scatterplots would display the results better.

      The boxplots are intended to offer a simplified view of the results, as the paired t-tests perform all of the comparisons. As recommended, we will provide scatterplots as supplementary information so that readers can see the full detailed plots.
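The difference between the two criteria is easiest to see numerically. AIC = 2k − 2 ln L, and AIC-c adds the small-sample correction 2k(k + 1)/(n − k − 1), where n is the sample size. For alignments, n is often taken as the number of sites. The Python sketch below uses hypothetical log-likelihoods. Its parameter counts exclude branch lengths and are assumed here: 8 for GTR (five exchangeabilities plus three base frequencies) and 11 for NREV12 (twelve relative rates, one fixed). As the reviewer notes, whether branch lengths are counted is a choice that must be reported, and it changes k substantially.

```python
from typing import Dict, Tuple


def aic(log_l: float, k: int) -> float:
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_l


def aicc(log_l: float, k: int, n: int) -> float:
    """AIC with the small-sample correction 2k(k+1)/(n-k-1); needs n > k + 1."""
    if n <= k + 1:
        raise ValueError("AIC-c is undefined when n <= k + 1")
    return aic(log_l, k) + 2 * k * (k + 1) / (n - k - 1)


def preferred(fits: Dict[str, Tuple[float, int]], n: int) -> Tuple[str, str]:
    """Return (model chosen by AIC, model chosen by AIC-c); the lowest score wins."""
    by_aic = min(fits, key=lambda m: aic(*fits[m]))
    by_aicc = min(fits, key=lambda m: aicc(*fits[m], n))
    return by_aic, by_aicc


# Hypothetical fits: (log-likelihood, free parameters excluding branch lengths)
fits = {"GTR": (-1000.0, 8), "NREV12": (-996.5, 11)}

print(preferred(fits, n=50))    # short alignment
print(preferred(fits, n=5000))  # long alignment
```

With these numbers, AIC prefers NREV12 at both lengths, while AIC-c switches to GTR for the 50-site alignment. The correction therefore matters most for short alignments, or when branch lengths are counted in k.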

      Some references are missing.

      Missing references added

    2. Reviewer #1 (Public Review):

      The study by Sianga-Mete et al revisits the effects of substitution model selection on phylogenetics by comparing reversible and non-reversible DNA substitution models. This topic is not new, previous works already showed that non-reversible, and also covarion, substitution models can fit the real data better than the reversible substitution models commonly used in phylogenetics. In this regard, the results of the present study are not surprising.

    3. Reviewer #2 (Public Review):

      The authors evaluate whether non-time-reversible models fit data presenting strand-specific substitution biases better than time-reversible models do. Specifically, the authors consider what they call NREV6 and NREV12 as candidate non-time-reversible models. On the one hand, they show that AIC tends to select NREV12 more often than GTR on real virus data sets. On the other hand, they show using simulated data that NREV12 leads to inferred trees that are closer to the true generating tree when the data incorporates a certain degree of non-time-reversibility. Based on these two experimental results, the authors conclude that "We show that non-reversible models such as NREV12 should be evaluated during the model selection phase of phylogenetic analyses involving viral genomic sequences". This is a valuable finding, and I agree that this is potentially good practice. However, I miss an experiment that links the two findings to support the conclusion: in particular, an experiment that answers the following question: does the best-fit model also lead to better tree topologies?

      [Editors' note: the reviewers were sent the revised submission and rebuttal and based on their response, an amended eLife Assessment has been formulated.]

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      In this manuscript, Gruber et al. perform serial-section EM of the antennal lobe and reconstruct the neurites innervating two types of glomeruli: one narrowly tuned to geosmin and one broadly tuned to other odours. They quantify and describe various aspects of the innervation by olfactory sensory neurons (OSNs), uniglomerular projection neurons (uPNs), multiglomerular local interneurons (LNs) and multiglomerular PNs (mPNs). They find that narrowly tuned glomeruli had stronger connectivity from OSNs to PNs and LNs, and considerably more connections between sister OSNs and between sister PNs, than the broadly tuned glomeruli. They also had less connectivity with the contralateral glomeruli. These observations suggest strong feed-forward information flow with minimal presynaptic inhibition in narrowly tuned glomeruli, which might be ecologically relevant, for example, when making quick decisions such as avoiding a geosmin-laden landing site. In contrast, information flow in more broadly tuned glomeruli shows much more lateralisation of connectivity to the contralateral glomerulus, as well as to other ipsilateral glomeruli.

      The data are well presented, the manuscript is clearly written, and the results will be useful to the olfaction community. Given that the hemibrain and FAFB datasets exist, I wonder whether the authors have considered verifying whether the trends they observe in connectivity hold across three brains. Are they stereotypic?

      We appreciate the reviewer’s positive view of our study and their thoughtful and relevant comment on the issue of individual variation. We agree that this is a very important question and note that the second Reviewer raised it as well. It reflects both our limited understanding of the range of individual variation in synaptic connectivity, whether in flies, humans, or other species, and the challenge of determining which of the differences observed in our study are stereotypical features of each glomerulus type. Undoubtedly, this criticism addresses a crucial problem of practically all connectome studies so far, for which there is no immediate solution. Such studies require so much time, effort and money that increasing the number of samples is seldom feasible. The Reviewer wonders whether we could compare our data with those made available by two of the largest connectome studies of Drosophila. This seemed to us a very good idea, and we tried to follow the advice, but unfortunately it was impracticable for the reasons we explain below. The hemibrain data cannot be used for this purpose because they do not contain the full glomerulus DA2 (Schlegel et al., 2021). A different problem prevented us from using the FAFB dataset, the other dataset mentioned by the Reviewer. In this case, the three glomeruli were sectioned and reconstructed, but the dataset lacks an annotated list of all synaptic connections corresponding to each glomerulus. Such an annotation (a compendium of all synaptic connections inside each glomerulus, specifying for each connection which type of neuron provides the presynaptic site and which the postsynaptic site) is essential for direct comparison with our data. It is important to keep in mind that the analytical tools currently available for these datasets (e.g., NeuPrint, FlyWire and CATMAID) do not offer the ability to extract data on synapses exclusively from the glomerular volume of DA2 or DL5.
      In principle, we could obtain these data by performing the annotation ourselves. However, such a study would demand so much time, effort and financial resources that we do not believe it would be justified solely to increase the number of individuals from one to two. Instead, our manuscript includes a comparison of the OSN connectivity in VA1v and DL5 using the hemibrain dataset published by Schlegel et al. (2021) (see revised manuscript: lines 311–315; 431–434; 558–562; 602–606).

      Although we fully share the Reviewer's view that a comparison including three flies would be better than one based on a single glomerulus of each type, we are still faced with the question of which, if any, of the differences are stereotypic. Distinguishing stereotypical differences between particular glomeruli, in features such as those discussed in our study, from differences within the normal range of individual variation is essentially a statistical problem. A first attempt at a comprehensive comparison focusing on intra- and inter-individual variability was recently made by comparing two connectome datasets from two different Drosophila individuals (Dorkenwald et al., 2024; Schlegel et al., 2024). At present, it is still unclear how many samples are needed to make a statistically robust comparison of olfactory synaptic circuits in adult flies: perhaps 3, 6, or even 18 individuals.

      Reviewer #2 (Public Review):

      The chemoreceptor proteins expressed by olfactory sensory neurons differ in their selectivity, such that glomeruli vary in the breadth of volatile chemicals to which they respond. Prior work assessing the relationship between tuning breadth and the demographics of principal neuron types that innervate a glomerulus demonstrated that narrowly tuned glomeruli are innervated by more projection neurons (output neurons) and fewer local interneurons than more broadly tuned glomeruli. The present study used high-resolution electron microscopy to determine which synaptic relationships between principal cell types also vary with glomerulus tuning breadth, using a narrowly tuned glomerulus (DA2) and a broadly tuned glomerulus (DL5). The strength of this study lies in the comprehensive, synapse-level resolution of the approach. Furthermore, the authors implement a very elegant approach of using a 2-photon microscope to score the upper and lower bounds of each glomerulus, thus defining the bounds of their restricted regions of interest. There were several interesting differences, including more axo-axonic afferent synapses and dendrodendritic output neuron synapses in the narrowly tuned glomerulus, and more synapses upon sensory afferents from multiglomerular neurons and output neuron autapses in the broadly tuned glomerulus. The study is limited by a few factors. There was a technical need to group all local interneurons, centrifugal neurons, and multiglomerular projection neurons into one category ("multiglomerular neurons"), which complicates any interpretation, as even multiglomerular projection neurons are very diverse. Additionally, there were as many differences between the two narrowly tuned glomeruli as there were between the narrowly and broadly tuned glomeruli. Architectural differences may therefore reflect not differences in tuning breadth, but rather the ecological significance of the odors detected by cognate sensory afferents.
      Finally, some synaptic relationships are described as differing and others as being the same between glomeruli, but with only one sample from each glomerulus and no measure of inter-animal variability, it is difficult to determine whether measures actually differ. If these caveats are kept in mind, this work reveals some very interesting potential differences in circuit architecture associated with glomerular tuning breadth.

      This work establishes specific hypotheses about network function within the olfactory system that can be pursued using targeted physiological approaches. It also identifies key traits that can be explored using other high-resolution EM datasets and other glomeruli that vary in their tuning selectivity. Finally, the laser "branding" technique used in this study establishes a reduced-cost procedure for obtaining smaller EM datasets from targeted volumes of interest by leveraging the ability to transgenically label brain regions in Drosophila.

      CLASSIFICATION OF NEURONAL TYPES

      We agree that grouping diverse types of interneurons into a single category (referred to as MGNs) limits the ability to make interpretations about synaptic similarities and differences between specific neuronal types. This was, however, an unavoidable compromise resulting from our decision to generate a comprehensive, synapse-level reconstruction of the restricted regions encompassing the DA2 and DL5 glomeruli. As both reviewers have noted, this approach offers significant value and we hope the Editor will also recognize that this limitation does not prevent readers from gaining important and novel insights into the synaptic circuitry of these two glomeruli.  

      Similar to the approach taken by Tobin et al. (2017), we prioritized producing a densely reconstructed neuropile in which no synapses were omitted. The downside of this method is that not all synaptic connections could be reliably assigned to specific neuronal types, with about 12% remaining unassigned. We anticipate that future research, supported by advances in semi-automated tracing methods, improved imaging technologies, and increased personnel resources, will allow not only for the generation of more complete connectomes of the entire brain (Scheffer et al., 2020; Zheng et al., 2018), but also for the accurate reconstruction and classification of individual synapses, even in highly complex regions such as the olfactory glomeruli. We also expect that a second complete connectome of a male Drosophila will soon become available, which will provide valuable opportunities for comparisons across individuals and between male and female brains in future studies.

      INTERGLOMERULAR DIFFERENCES

      Thank you for this insightful comment. It is indeed true that despite both DA2 and VA1v being narrowly tuned glomeruli, they exhibit considerable differences in specific connectivity features (e.g., relative synaptic strengths above certain thresholds) and that those differences can be as pronounced as those observed between DA2 and the broadly tuned DL5. For this reason, comparing each individual glomerulus to every other is not a practical or informative approach. To derive robust interpretations, we focused instead on whether two glomeruli that share a particular functional characteristic—namely, being narrowly tuned for single odorants—also share connectivity patterns that distinguish them from a broadly tuned reference glomerulus.

      Our results support this. Furthermore, additional connectomics data reinforce our conclusions.

      For example, OSN-OSN connectivity is stronger in the two narrowly tuned glomeruli (DA2 and VA1v) than in the broadly tuned glomerulus (DL5). While these pairwise differences alone are not conclusive, the finding that the two narrowly tuned glomeruli studied here share features that distinguish them from the broadly tuned glomerulus supports our interpretation. We found further support for this idea in the data reported by Schlegel et al. (2021). In that dataset, other narrowly tuned glomeruli (DA1, DL3, and DL4) also exhibit stronger OSN-OSN connectivity than other broadly tuned glomeruli (DM1 or DM4).

      We do not deny that there are many differences between any given pair of glomeruli, regardless of whether they are narrowly or broadly tuned. Instead, we propose that our findings on circuit features indicate that most of the observed differences actually group the two narrowly tuned glomeruli together relative to the broadly tuned glomerulus. A more concise summary is now provided in the newly added Figure 8. We have also added explanatory text at the beginning of the section ‘Specific features of narrowly tuned glomerular circuits’.

      ECOLOGICAL SIGNIFICANCE

      This is an interesting point. However, it is difficult to disentangle the "ecological significance" of processed odorants from the "tuning breadth" of a glomerulus. In the Drosophila olfactory system, glomerular circuits that respond to ecologically important odorants—such as those involved in reproduction or danger—tend to be more narrowly tuned. Moreover, while we refer to odorants with specific ecological significance as those linked to survival or reproductive behaviors, defining the significance of an odorant with precision is inherently challenging, as it can vary depending on context and environmental conditions.

      What both circuits share is their narrow tuning breadth. We therefore propose that the common circuit features of VA1v and DA2, highlighted in this study, are functionally related to the fact that each circuit processes single odorants. Consequently, their specificity is most likely determined at the level of the receptor. 

      INDIVIDUAL VARIABILITY

      We agree that accounting for inter-animal variability would strengthen the study. However, we are confident that even a modest, statistically sound assessment of this variability would require a larger sample size, certainly more than just two or three flies, which is presently not feasible.

      We refer the reviewer to our response to Reviewer #1 regarding this important issue.

      Initial insights into variability between flies have been provided through comparative analyses of the two most comprehensive female Drosophila melanogaster connectomes—the FAFB and hemibrain datasets (Schlegel et al., 2024). For more detailed quantitative comparisons regarding inter-animal variability, please refer to our response to the second major point raised by Reviewer #2. As highlighted by Schlegel et al. (2024), making definitive statements about the stereotypy of neuron numbers, unitary cell-cell connections (edges), or synaptic strengths (weights) remains a complex challenge."

      While appreciating the rigour of this work, we were surprised to notice the omission of a comparison of their observations with the two other existing datasets. This would not only have addressed the technical limitation of this particular study (the inability to identify specific neuron types due to imaging a small part of the brain) but would also have shed light on inter-animal variability.

      We strongly recommend that the authors do make this comparison: the datasets are currently extremely user-friendly, and so we do not expect the replication of their key findings to be too onerous. This will be particularly important to resolve the issue of having to classify all multiglomerular local interneurons and multiglomerular projection neurons broadly as "MGNs". Such a comparison will dramatically strengthen this study, which poses very interesting questions but, in its current form, has this striking shortcoming.

      INDIVIDUAL VARIABILITY AS EXPRESSED HERE:

      Earlier on, we were of the same opinion as the Reviewer expresses here but, unfortunately, it was not possible to follow this advice. As far as possible, we have compared some of our results to the values from the two datasets that the Reviewer refers to, but the absence of glomerulus DA2 in one of the datasets and the absence of synapse annotation for all the relevant glomeruli in the other prevented us from making a full comparison. Moreover, we believe that the problem of individual variation most probably cannot be solved by extending the comparison to one or two more flies.

      Reviewer #1 (Recommendations for The Authors): 

      Lines 270-282 confused me in the context of Figure 3B.

      The concern may stem from our inclusion of a comparison between the uPNs of glomerulus DA2 and the single uPN of glomerulus DL5 in the statistical analysis presented in Figure 3. This comparison was included to ensure a comprehensive representation of the data, highlighting the variability across all major cell groups. We have clarified this rationale in the revised manuscript (see lines 274-282).

      Reviewer #2 (Recommendations for The Authors): 

      I commend the authors for taking such a thorough approach to advance an interesting topic in olfaction. The following suggestions are intended to strengthen this study: 

      Major points: 

      A color-blind-friendly palette should be used for all figures. Currently, five of seven figures use red and green, and in particular, Figure 5 will be uninterpretable for red/green color-blind readers. 

      We are thankful for this important comment. We changed the color palette as suggested by the reviewer, replacing red with magenta, and updated the figure legends accordingly.

      This level of analysis is extremely resource and time-consuming, so even obtaining this information at this resolution is an impressive achievement. However, this study would be well served by strategically supplementing the analysis of this dataset with information from other publicly available connectomics datasets. For instance, some interpretations are limited because there is information from only a single DL5 and DA2 glomerulus. Any claims in which one glomerulus has more, less, or the same of a metric must be tempered because without replicates, there are no measures of inter-animal variability. As an example, on lines 386-387 the authors state "The relative synaptic strength between MGN>uPN was stronger in DA2 (12%) than DL5 (10%)". It is difficult to assess whether this represents a difference that is outside of the range of inter-animal variability inherent to the olfactory system. Taking select measures from the Hemibrain and FAFB (via FlyWire) datasets could help strengthen these claims. 

      We fully agree with the Reviewer’s opinion that since our data is from one glomerulus of each type “It is difficult to assess whether this represents a difference that is outside of the range of inter-animal variability inherent to the olfactory system.” This is a weakness of practically all connectome studies based on electron microscopy, in both Drosophila and other animals. We cannot be sure that measurements from the Hemibrain and FAFB datasets would help strengthen our claims, because the magnitude of the range of individual variation is presently not known, and solving this problem will most probably require more than one or two more flies. In any case, it is not possible to follow this advice and compare our data with those of the hemibrain, because DA2 was not included in that study. We refer the Reviewer to the more detailed explanation in our response to Reviewer #1.

      In the particular case raised by the Reviewer above, the relative difference in synaptic strength exceeds 20%. Whether such a difference has functional relevance remains an open question, but Schlegel et al. (2024) support our interpretation. They showed that synaptic weights with differences larger than 20% tend to be consistent across individuals, with strong correlations within and between animals (Pearson’s R = 0.97 and R = 0.8; Fig. 4).

      Grouping all local interneurons, centrifugal neurons, and multiglomerular PNs into one category limits the ability to make interpretations about similarities or differences in the synaptic relationships involving MGNs. The authors could get an estimate of the number of multiglomerular PNs in DL5, VA1v, and DA2 from the Hemibrain and FlyWire platforms to get a better sense of differences between glomeruli in the MGN category.

      We agree that grouping a variety of interneurons into a single category (called MGNs) limits the ability to make interpretations about similarities or differences in the synaptic relationships involving different neurons. This was the unavoidable price to be paid once we decided to register a “comprehensive, synapse-level resolution” map of these two glomeruli. It appears to us that both reviewers have clearly recognized the intrinsic value of this approach, and we hope that the Editor will share this opinion.

      Consistent with the assumptions of Tobin et al. (2017), our hypothesis on LN connectivity differences is based on the fact that LNs are the most numerous and broadly arborizing neurons of the class that we call multiglomerular neurons in the AL (Chou et al., 2010; Lin et al., 2012; Tanaka et al., 2012). Recent connectome studies confirm this feature across all glomeruli (Bates et al., 2020; Horne et al., 2018; Scheffer et al., 2020; Schlegel et al., 2021; Zheng et al., 2018).

      In response to the reviewer’s question, we conducted a case-specific reanalysis of the data from Horne et al. (2018), which provides comprehensive connectivity information for the VA1v glomerulus. This allowed us to quantify the proportional contributions of LNs (n = 56) and mPNs (n = 13) to all MGN connections (MGN-MGN, MGN>OSN, MGN>uPN, uPN>MGN, OSN>MGN).

      Our analysis showed that 84% of MGN output originates from LNs, whereas 57% of the input to MGNs comes from LNs and 43% from mPNs, the latter largely due to strong OSN>mPN input. Thus, for the filtered MGN connections relevant to distinguishing narrowly from broadly tuned circuits (e.g., MGN>OSN, uPN>MGN; see Fig. 8), LNs are the dominant contributors in VA1v. (These data are not included in the resubmitted manuscript.) This supports our interpretation that LNs are responsible for the majority of the MGN connections underlying the observed differences between glomeruli.

      For instance, prior work has reported fewer local interneurons innervating DA2, but this study found, unexpectedly, greater MGN innervation density and synapse number for DA2 relative to DL5. This discrepancy could be due to differences in the number of multiglomerular PNs innervating each glomerulus, which would be obscured when these PNs are combined with local interneurons in the MGN category.

      We agree that the greater MGN innervation density in DA2 in our study could reflect a stronger contribution from mPNs. However, innervation density alone does not indicate how many mPNs actually innervate DA2 or DL5. Alternatively, increased innervation and/or synaptic frequency of local interneurons (LNs) could also account for this observation. In our view, neuron number does not necessarily correlate with branching complexity or synaptic density.

      For example, the dendritic length of the single uPN in glomerulus DL5 is approximately equal to the combined dendritic length of the multiple uPNs of DA2. Similarly, Tobin et al. (2017) reported that, when comparing uPNs in glomerulus DM6 between the left and right brain hemispheres, they found variability in cell number but not in dendritic length. More recently, the FAFB and hemibrain datasets showed a similar pattern in another neuronal type: a substantial variation in Kenyon cell number was observed between the two Drosophila individuals, but in both individuals this cell type consistently makes and receives similar numbers of presynapses and postsynapses (Schlegel et al., 2024).

      On line 33 the authors cannot claim that DA2-OSNs experience less presynaptic inhibition based on the data in this study. Even without the limitations of the MGN category (described above), presynaptic inhibition depends on more than just the number of synapses, rather it is affected by GABA B receptor expression levels and the second messenger components downstream of this receptor. Physiological experiments are needed to justify this claim, so I recommend adjusting accordingly.

      We agree with the Reviewer and have adjusted the text on line 33 and in the main body of the text by referring to this finding as “presynaptic input”, which is what we have quantified, instead of “less presynaptic inhibition”.

      Figures 5 and 6 seek to distill the wealth of information from this study into broad take-home points for the reader, while still providing a good amount of detail. I think a final, more concise graphic summary (similar to the graphical abstract or Figure 6 of Grabe et al. 2016) depicting the most critical differences between glomeruli would further clarify the broad findings of this study.

      We appreciate this comment and have added a “graphic summary” as the Reviewer proposed. The new figure (Figure 8) summarizes our results and highlights differences between narrowly and broadly tuned glomeruli in a more concise, graphical-abstract format.

      Minor points: 

      Much of the manuscript provides details about synapse fractions or % synapses for a given synaptic relationship. Please ensure that it is clear which principal cell types are being described, as it can be easy to get lost.

      - Should line 284 say "...than DL5 as it has been reported that DA2 is innervated by fewer LNs..."?

      We appreciate the reviewer’s comment and have corrected this sentence (see text beginning at line 290).

      Taisz et al. has been published, so the citation should be updated.

      We have updated the corresponding citation.  

      On line 233, the authors ascribe the small electron-dense vesicles as likely housing sNPF released by MGNs. However, Carlsson et al. (2010) demonstrated that sNPF is released by OSNs, which was further functionally characterized by Root et al. (2011) and Ko et al. (2014). In terms of MGNs that release neuropeptides, Carlsson et al. 2010 demonstrated that local interneurons immunolabel for tachykinin, myoinhibitory peptide, and allatostatin-A, while two extrinsic neurons release SIFamide. In theory, aminergic neurons could also have small electron-dense vesicles, but this can be variable. 

      The Reviewer is completely right in his criticism. The MGNs certainly include neurons that have been reported to contain neuropeptides other than sNPF. We have corrected this sentence, and it now reads as follows (page 7, line 236): “Interestingly, besides the abundant clear small vesicles...”

      On line 636, the Berck and Schlegel studies demonstrated that panglomerular local interneurons synapse upon OSN, but not that they induce presynaptic inhibition (which was demonstrated in the studies cited in the next sentence). I recommend adjusting this sentence.

      We agree and have corrected the text following the Reviewer’s advice. It now reads as follows (page 19, line 663): “We also observed that OSNs received less MGN feedback.”

    1. eLife Assessment

      The manuscript presents a valuable finding that CCDC32, beyond its reported role in AP2 assembly, follows AP2 to the plasma membrane and regulates clathrin-coated pit assembly and dynamics. The authors further identify an alpha-helical region within CCDC32 that is essential for its interaction with AP2 and its cellular function. While live-cell and ultrastructural imaging data are solid, future biochemical studies will be needed to confirm the proposed CCDC32-AP2 interaction.

      [Editors' note: this paper was reviewed by Review Commons.]

    2. Reviewer #1 (Public review):

      Yang et al. describe CCDC32 as a new clathrin-mediated endocytosis (CME) accessory protein. The authors show that CCDC32 binds directly to AP2 via a small alpha-helical region and that cells depleted of this protein show defective CME. Finally, the authors show that the CCDC32 nonsense mutations found in patients with cardio-facio-neuro-developmental syndrome (CFNDS) disrupt the interaction of this protein with the AP2 complex. The results presented suggest that CCDC32 may act as both a chaperone (as recently published) and a structural component of the AP2 complex.

    3. Reviewer #2 (Public review):

      Summary:

      The authors responded to my previous concerns with additional arguments and discussion. While I do not object to the publication of this work, two critical experiments are still missing.

      Weaknesses:

      First, biochemical assays using recombinant proteins should be conducted to determine whether CCDC32 binds to the full AP2 adaptor or to specific AP2 intermediates, such as hemicomplexes. The current co-IP data from mammalian cell lysates are too complex to interpret conclusively. Second, cell fractionation should be performed to assess whether, and how, CCDC32 associates with membrane-bound AP2.

    4. Reviewer #3 (Public review):

      In this manuscript, Yang et al. characterize the endocytic accessory protein CCDC32, which has implications in cardio-facio-neuro-developmental syndrome (CFNDS). The authors clearly demonstrate that the protein CCDC32 has a role in the early stages of endocytosis, mainly through the interaction with the major endocytic adaptor protein AP2, and they identify regions taking part in this recognition. Through live-cell fluorescence imaging and electron microscopy of endocytic pits, the authors characterize the lifetimes of endocytic sites, the formation rate of endocytic sites and pits, and the invagination depth, in addition to performing transferrin receptor (TfnR) uptake experiments. Binding of CCDC32 and CCDC32 mutants to the AP2 alpha appendage domain is assessed by pull-down experiments.

      Together, these experiments allow the derivation of a phenotype for CCDC32 knockdown and CCDC32 mutants in endocytosis, a very robust system in which defects are not easily detected. A mutation of CCDC32 mimicking CFNDS mutations is also addressed in this study and shown to cause endocytic defects.

      Experimental proof that the different CCDC32 mutants are resistant to siRNA treatment would have helped to strengthen the conclusions.

      In summary, the authors present a strong combination of techniques, assessing the impact of CCDC32 in clathrin mediated endocytosis and its binding to AP2.

    5. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      This is a revision of a manuscript previously submitted to Review Commons. The authors have partially addressed my comments, mainly by expanding the introduction and discussion sections. Sandy Schmid, a leading expert on the AP2 adaptor and CME, has been added as a co-corresponding author. The main message of the manuscript remains unchanged. Through overexpression of fluorescently tagged CCDC32, the authors propose that, in addition to its established role in AP2 assembly, CCDC32 also follows AP2 to the plasma membrane and regulates CCP maturation. The manuscript presents some interesting ideas, but there are still concerns regarding data inconsistencies and gaps in the evidence.

      With due respect, we would argue that a role for CCDC32 in AP2 assembly is hardly ‘established’. Rather, a single publication reporting its role as a co-chaperone for AAGAB appeared while our manuscript was under review. We find some similar and some conflicting results, which are described in our revised manuscript. However, in combination, our two papers clearly show that CCDC32, a previously unrecognized endocytic accessory protein, deserves further study.

      (1) eGFP-CCDC32 was expressed at 5-10 times higher levels than endogenous CCDC32. This high expression can artificially drive CCDC32 to the cell surface via binding to the alpha appendage domain (AD), an interaction that may not occur under physiological conditions.

      While we acknowledge that overexpression of eGFP-CCDC32 could result in artificially driving it to CCPs, we do not believe this is the case for the following reasons:

      i. The bulk of our studies (Figures 2-4) demonstrate the effects of siRNA knockdown of CCDC32 on early stages of CME, so it is likely that these functions require the presence of endogenous CCDC32 at nascent CCPs, as detected with overexpressed eGFP-CCDC32 by TIRF imaging.

      ii. At these levels of overexpression, eGFP-CCDC32 fully rescues the effects of siRNA KD of endogenous CCDC32 on Tfn uptake and CCP dynamics (Figure 6F,G). If the protein were artificially recruited to the AP2 appendage domain, one would expect it to compete with the recruitment of other EAPs to CCPs and hence cause defects in CCP dynamics. Indeed, we see the opposite: CCPs that are positive for eGFP-CCDC32 show normal dynamics and maturation rates, while CCPs lacking eGFP-CCDC32 are short-lived and more likely to be aborted (Figure 1C).

      iii. We have identified two modes of binding of CCDC32 to AP2 adaptors: one is through canonical AP2-AD binding motifs; the second is through an α-helix in CCDC32 that, by modeling, docks only to the open conformation of AP2. Overexpressed CCDC32 lacking this α-helix is not recruited to CCPs (Fig. 6D,E), indicating that the canonical AP2 binding motifs are not sufficient to recruit CCDC32 to CCPs, even when overexpressed.

      (2) Which region of CCDC32 mediates alpha AD binding? Strangely, the only mutant tested in this work, Δ78-98, still binds AP2, but shifts to binding only mu and beta. If the authors claim that CCDC32 is recruited to mature AP2 via the alpha AD, then a mutant deficient in alpha AD binding should not bind AP2 at all. Such a mutant is critical for establishing the model proposed in this work.

      We understand the reviewer’s confusion and have thus devoted a paragraph in the Discussion to this issue. As revealed by AlphaFold 3.0 modeling (Figure S6), binding of CCDC32 to the alpha AD likely occurs via the two canonical AP2-AD binding motifs encoded in CCDC32. Given the highly divergent nature of AP2-AD binding motifs, we had not identified these motifs before the AlphaFold 3.0 modeling. While these interactions could be detected by GST pull-downs, they are apparently not of sufficient affinity to recruit CCDC32 to CCPs in cells. In the text, we now describe the α-helix that we identified as essential for CCP recruitment as ‘a’ AP2 binding site on CCDC32 rather than ‘the’ AP2 binding site. Interestingly, and as also discussed, AlphaFold 3.0 identifies a highly predicted docking site on α-adaptin that is only accessible in the open, cargo-bound conformation of intact AP2. This is also consistent with the inability of CCDC32(Δ78-99) to bind the α:σ2 hemicomplex in cell lysates.

      We agree that further structural studies on CCDC32’s interactions with AP2 and its targeting to CCPs will be of interest for future work.

      (3) The concept of hemicomplexes is introduced abruptly. What is the evidence that such hemicomplexes exist? If CCDC32 binds to hemicomplexes, this must occur in the cytosol, as only mature AP2 tetramers are recruited to the plasma membrane. The authors state that CCDC32 binds the AD of alpha but not beta, so how can the Δ78-98 mutant bind mu and beta?

      We introduced the concept of hemicomplexes based on our unexpected (and now explicitly stated as such) finding that the CCDC32(Δ78-99) mutant efficiently co-IPs with a β2:µ2 hemicomplex. As stated, the efficiency of this pulldown suggests that the presumed stable AP2 heterotetramer must indeed exist in equilibrium between the two α:σ2 and β2:µ2 hemicomplexes, such that CCDC32(Δ78-99) can sequester and efficiently co-IP with the β2:µ2 hemicomplex. A previous study, now cited, had shown that the β2:µ2 hemicomplex could partially rescue null mutations of α in C. elegans (PMID: 23482940). We do not know how CCDC32 binds to the β2:µ2 hemicomplex, and we did not detect these interactions using AlphaFold 3.0. However, these interactions could be indirect and involve the AAGAB chaperone. It is also likely, based on the results of Wan et al. (PMID: 39145939), that the binding is through the µ2 subunit rather than β2. As mentioned above, and in our Discussion, further studies are needed to define the complex and multifaceted nature of CCDC32-AP2 interactions.

      (4) The reported ability of CCDC32 to pull down AP2 beta is puzzling. Beta is not found in the CCDC32 interactome in two independent studies using 293 and HCT116 cells (BioPlex). In addition, clathrin is also absent in the interactome of CCDC32, which is difficult to reconcile with a proposed role in CCPs. Can the authors detect CCDC32 binding to clathrin?

      Based on the studies of Wan et al. (PMID: 39145939), it is likely that CCDC32 binds to µ2, rather than to β2, in the β2:µ2 hemicomplex. As to clathrin being absent from the CCDC32 pull-down, this is expected, since the interactions of clathrin even with AP2 are weak in solution (as shown in Figure 5C, clathrin is not detected in our AP2 pull-down), so as to avoid spontaneous assembly of clathrin coats in the cytosol. Rather, these interactions are strengthened both by the reduction in dimensionality that occurs on the membrane and by the avidity of multivalent interactions. For example, Kirchhausen reported that two AP2 complexes are required to recruit one clathrin triskelion to the PM.

      (5) Figure 5B appears unusual-is this a chimera?

      Figure 5B shows an internal insertion of the eGFP tag into an unstructured region in the AP2 hinge. As we have previously shown (PMID: 32657003), this construct, unique among other commonly used AP2 tags, is fully functional.  We have rearranged the text in the Figure legend to make this clearer.

      Figure 5C likely reflects a mixture of immature and mature AP2 adaptor complexes.

      This is possible, but mature heterotetramers are by far the dominant species; otherwise the four subunits would not be immunoprecipitated at near-stoichiometric levels with the α subunit. Near-stoichiometric IP with antibodies to the α-AD has been shown by many others in many cell types.

      (6) CCDC32 is reduced by about half in siRNA knockdown. Why not use CRISPR to completely eliminate CCDC32 expression?

      Fortuitously, partial knockdown was essential to reveal this second function of CCDC32, as we have emphasized in our Discussion. Wan et al. used CRISPR to knock out CCDC32 and revealed its essential role as an AAGAB co-chaperone. In the complete absence of CCDC32, mature AP2 complexes fail to form. However, under our conditions of partial CCDC32 depletion, the expression of AP2 heterotetramers is unaffected, revealing a second function of CCDC32 at early stages of CME. We expect that the co-chaperone function of CCDC32 is catalytic, while its role in CME is more structural; hence the different concentration dependencies, the former being less sensitive to KD than the latter. This is one reason that many researchers are turning to CRISPRi for whole-genome perturbation studies, as many proteins play multiple roles that can be masked in KO studies.

      Reviewer #2 (Public review):

      Yang et al. describe CCDC32 as a new clathrin-mediated endocytosis (CME) accessory protein. The authors show that CCDC32 binds directly to AP2 via a small alpha-helical region and that cells depleted of this protein show defective CME. Finally, the authors show that the CCDC32 nonsense mutations found in patients with cardio-facio-neuro-developmental syndrome (CFNDS) disrupt the interaction of this protein with the AP2 complex. The results presented suggest that CCDC32 may act as both a chaperone (as recently published) and a structural component of the AP2 complex.

      Strengths:

      The conclusions presented are generally well supported by experimental data and the authors carefully point out the differences between their results and the results by Wan et al. (PNAS 2024).

      Weaknesses:

      The experiments regarding the role of CCDC32 in CFNDS still require some clarifications to make them clearer to scientists working on this disease. The authors fail to describe that the CCDC32 isoform they use in their studies is different from the one used when CFNDS patient mutations were described. This may create some confusion. Also, the authors did not discuss that the frame-shift mutations in patients may be leading to nonsense mediated decay.

      As requested, we have more clearly described our construct with regard to the human mutations and have added the possibility of NMD in the context of the human mutations.

      Reviewer #3 (Public review):

      In this manuscript, Yang et al. characterize the endocytic accessory protein CCDC32, which has implications in cardio-facio-neuro-developmental syndrome (CFNDS). The authors clearly demonstrate that the protein CCDC32 has a role in the early stages of endocytosis, mainly through the interaction with the major endocytic adaptor protein AP2, and they identify regions taking part in this recognition. Through live-cell fluorescence imaging and electron microscopy of endocytic pits, the authors characterize the lifetimes of endocytic sites, the formation rate of endocytic sites and pits, and the invagination depth, in addition to performing transferrin receptor (TfnR) uptake experiments. Binding of CCDC32 and CCDC32 mutants to the AP2 alpha appendage domain is assessed by pull-down experiments. While the interaction between CCDC32 and the alpha appendage domain of AP2 is clearly described, a discussion of potential association with other AP2 domains would be beneficial to understand the impact of CCDC32 in endocytosis.

      The reviewer is correct. That CCDC32 also interacts with other subunits of AP2 is evident from the findings of Wan et al. and from the fact that the CCDC32(Δ78-99) mutant efficiently co-IPs with the β2:µ2 hemicomplex. We have expanded our discussion around this point. CCDC32 remains an as-yet poorly characterized but, we now believe, very interesting EAP worthy of further study.

      Together, these experiments allow the derivation of a phenotype for CCDC32 knockdown and CCDC32 mutants in endocytosis, a very robust system in which defects are not easily detected. A mutation of CCDC32 mimicking CFNDS mutations is also addressed in this study and shown to cause endocytic defects.

      In summary, the authors present a strong combination of techniques, assessing the impact of CCDC32 in clathrin mediated endocytosis and its binding to AP2.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) The authors must be clear about the differences between the CCDC32 isoform they used in their manuscript and the one used to describe the patient mutations. This could be done, for example, in the methods. This is essential for the capacity of other labs to reproduce, follow up and correctly cite these results.

      We have added this information to the Methods. 

      (2) I believe the authors have misunderstood what nonsense-mediated decay is. NMD occurs at the mRNA level and requires a full genomic context to occur (introns and exons). The fact that a mutant protein is expressed normally from a construct by no means proves that it does not happen. I believe that adding the possibility of NMD occurring would enrich the discussion.

      Thank you; we have now done more homework and have added this possibility to our discussion of the mutant phenotype. However, if a robust NMD mechanism resulted in a complete loss of CCDC32 protein, then the essential co-chaperone function reported by Wan et al. would result in complete loss of AP2. A more detailed characterization of the cellular phenotype of these mutations, including assessment of AP2 expression levels, would be informative.

      Reviewer #3 (Recommendations for the authors):

      - It is not clear what the authors mean by '~30s lifetime cohort' (line 159). They refer to Figure 2H, which shows the % of CCPs. Can the authors explain exactly what kind of tracks they used for this analysis, for example which lifetime variations were accepted? Do they refer to the cohorts in Figure S4? In Figure S4, the most frequent tracks have lifetimes < 20 s (in contrast to what is stated in the main text). Why was this cohort not used?

      The ‘30s cohort’ refers to CCPs with lifetimes between 25 and 35 s, which encompasses the most abundant species in control and CCDC32 KD cells, as shown by the probability curves in Figure 2H. Given the large number of CCPs analyzed, this cohort still provides large numbers for our analyses (n = 5998 and 4418 for control and siRNA-treated conditions, respectively). Figure 2H shows that the lifetimes of CCPs in cells treated with CCDC32 siRNA are shifted to shorter values. We have clarified this in the text.

      - Figure S1: It is now clear why the mutant versions of CCDC32 are not detected in this western blot. However, data showing the resistance of these proteins to siCCDC32 are still missing (S1A is in the absence of siCCDC32, I assume, as the legend suggests). A western blot using an anti-GFP antibody, like the one used in Figure S1, after siRNA knockdown would provide clarity.

      That these constructs all contain the same mutation in the siRNA target sequence gives us confidence that they are indeed resistant to siRNA.

      - 'Note that the anti-CCDC32 antibody does not detect the eGFP-CCDC32(∆78-98) as well as full-length and is unable to detect eGFP-CCDC32(1-54).' This phrase should belong to Figure S1 (B), not (A).

      Corrected.

      - The immunoprecipitations of CCDC32 and its mutants with AP2 and its subunits are partially confusing. In Figure 5, the authors show that CCDC32 interacts specifically with the alpha-AD, but not with the beta-AD of AP2. In Figure 6B and C, on the other hand, Co-IPs are shown also with the beta and the mu domain of AP2. This is understandable in the context of the full AP2. However, when interaction with the alpha domain (and sigma) is abolished through mutation of helix 78-98, why would beta and mu still interact, when the beta-AD cannot interact with CCDC32 on its own. Are there interaction sites expected outside the ADs in the beta or mu domains?

      See our responses to Reviewer #1 above. This result likely reflects the co-chaperone activity of CCDC32 reported by Wan et al. and is likely due to the interactions they reported between CCDC32 and the µ2 subunit of β2:µ2 hemicomplexes.

      - Figure S6 D, E and F: How much confidence do the authors have on the AlphaFold predictions? Have the same binding poses been obtained repeatedly by independent predictions?

      We provide, with a color scale, the confidence score for each interaction, which is very high (>90%). Of course, this is still a prediction that will need to be verified by further structural studies as we have stated.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      Cook et al. have presented an important study on the transcriptomic and epigenomic signature underlying craniofacial development in marsupials. Given the lack of a dunnart genome, the authors also prepared long- and short-read sequence datasets to assemble and annotate a novel genome, allowing the mapping of RNA-seq data and of ChIP-seq data for H3K4me3 and H3K27ac, which in turn allowed the identification of putative promoter and enhancer sites in the dunnart. They found that genes proximal to these regulatory loci were enriched for functions related to bone, skin, muscle and embryonic development, highlighting the precocious state of newborn dunnart facial tissue. When compared with mouse, the authors found a much higher proportion of promoter regions aligned between species than of enhancer regions, and subsequent profiling identified regulatory elements that are conserved across species and important for mammalian craniofacial development. In contrast, the identification of dunnart-specific enhancers and patterns of RNA expression further confirms the precocious state of muscle development, as well as of sensory system development, in the dunnart, suggesting that early formation of these features is critical for neonatal marsupials, likely because they assist with detecting and responding to cues that direct the joeys to the mother's teat after birth. This is one of the few epigenomic studies performed in marsupials (of any organ) and the first performed in the fat-tailed dunnart (also of any organ). Marsupials are emerging as an important model for studying mammalian development and evolution, and the authors have performed a novel and thorough analysis, impressively including the assembly of a new marsupial reference genome that will benefit many future studies.

      Strengths:

      The study provides multiple pieces of evidence supporting the important role enhancer elements play in mammalian phenotypic evolution, namely the finding of a lower proportion of peaks present in both dunnart and mouse for enhancers than for promoters, and the dunnart showing more genes uniquely associated with its active enhancers than any other combination of mouse and dunnart samples, a pattern that was less pronounced for promoter-associated genes. In addition, rigorous parameters were used for the cross-species analyses to identify the conserved regulatory elements and the dunnart-specific enhancers. For example, for the results presented in Figure 1, I agree that it is a little surprising that the average promoter-TSS distance is greater than that for enhancers, and that this could be related to the possible presence of unannotated transcripts between genes. The authors addressed this well by examining the distribution of promoter-TSS distances and using proximal promoters (cluster #1) as high confidence promoters for downstream analyses.

      The genome assembly method was thorough, using two different long-read methods (PacBio and ONT) to generate the long reads for contig and scaffold construction, increasing the quality of the final assembled genome.

      Weaknesses:

      Biological replicates of facial tissue were collected at a single developmental time point of the fat-tailed dunnart within the first postnatal day (P0) and analysed in the context of similar mouse facial samples from the ENCODE consortium at six developmental time points; previous work from the authors has shown that the younger mouse samples (E11.5-12.5) correspond approximately to the dunnart developmental stage (Cook et al. 2021). However, it would be useful to have samples from at least one older dunnart time point, for example, at a developmental stage equivalent to mouse E15.5. This would provide additional insight into the extent of accelerated face development in dunnart relative to mouse, i.e. how long do the regulatory elements that are activated early in dunnart remain active, and does their function later influence other aspects of craniofacial development?

      We thank the reviewer for their feedback and agree that the inclusion of multiple postnatal stages in the dunnart would give further valuable insights to the comparative analyses. Unfortunately, we were limited by the pouch young available and prioritized ensuring robust data at a single stage for this study. We hope to expand this work to more stages in future studies.

      The authors refer to the development of the CNS being delayed in marsupials relative to placental mammals; however, evidence shows that development of the dunnart brain (whole brain or cortex) is protracted compared to mouse, by a factor of at least 2, rather than delayed per se (Workman et al. 2013; Paolino et al. 2023). In addition, there is evidence that cortical formation and cell birth may begin at approximately the same stage across species, equivalent to the neonate period in dunnart (E10.5 in mouse), and that shortly after this, at the stage equivalent to mouse E12.5, the dunnart cortex shows signs of advanced neurogenesis followed by a protracted phase of neuronal maturation (Paolino et al. 2023). Therefore, it is possible that marsupial CNS development appears delayed relative to mouse but in fact begins at the same stage and then proceeds on a different timescale.

      The comparison here is not directly between CNS development in placental and marsupials but CNS development relative to development of a subset of structures of the cranial skeleton and musculature (as first proposed by Kathleen Smith 1997). For example, Smith 1997 found that in eutherians, evagination of the telencephalon and appearance of the pigment in the eye occur before the ossification of the premaxilla, maxilla, and dentary. However, in marsupials, evagination of the telencephalon and appearance of the pigment in the eye occur concurrently with condensation of cartilage in the basicranium and the ossification of the premaxilla, maxilla, and dentary. Smith 1997 reports both a delay in the initiation of CNS development in marsupials relative to craniofacial ossification and a protraction of CNS development compared to placental mammals.

      This also highlights the challenges of correlating different staging systems between placentals and marsupials, as stages determined as equivalent can change depending on which developmental events are used. The protracted development of the CNS in marsupials (Smith 1997, Workman et al. 2013; Paolino et al. 2023) still supports the hypothesis that during the short gestation period in marsupials, structures required for life outside the womb in an embryonic-like state, such as the orofacial region, are likely prioritized.

      We have clarified this based on the reviewer's feedback and added text referring to the protraction of marsupial CNS development to the Discussion section.

      [New text]: Marsupials display advanced development of the orofacial region relative to development of the central nervous system when compared to placental mammals[3,6].

      [New text]: Although development of the central nervous system is protracted in marsupials compared to placentals, marsupials have well-developed peripheral motor nerves and sensory nerves (e.g. the trigeminal) at birth [5].

      Reviewer #2 (Public review):

      This study by Cook and colleagues utilizes genomic techniques to examine gene regulation in the craniofacial region of the fat-tailed dunnart at perinatal stages. Their goal is to understand how accelerated craniofacial development is achieved in marsupials compared to placental mammals.

      The authors employ state-of-the-art genomic techniques, including ChIP-seq, transcriptomics, and high-quality genome assembly, to explore how accelerated craniofacial development is achieved in marsupials compared to placental mammals. This work addresses an important biological question and contributes a valuable dataset to the field of comparative developmental biology. The study represents a commendable effort to expand our understanding of marsupial development, a group often underrepresented in genomic studies.

      The dunnart's unique biology, characterized by a short gestation and rapid craniofacial development, provides a powerful model for examining developmental timing and gene regulation. The authors successfully identified putative regulatory elements in dunnart facial tissue and linked them to genes involved in key developmental processes such as muscle, skin, bone, and blood formation. Comparative analyses between dunnart and mouse chromatin landscapes suggest intriguing differences in deployment of regulatory elements and gene expression patterns.

      Strengths

      (1) The authors employ a broad range of cutting-edge genomic tools to tackle a challenging model organism. The data generated - particularly ChIP-seq and RNA-seq from craniofacial tissue - are a valuable resource for the community, which can be employed for comparative studies. The use of multiple histone marks in the ChIP-seq experiments also adds to the utility of the datasets.

      (2) Marsupials occupy an important phylogenetic position, but they remain an understudied group. By focusing on the dunnart, this study addresses a significant gap in our understanding of mammalian development and evolution. Obtaining enough biological specimens for these experiments was likely a big challenge that the authors were able to overcome.

      (3) The comparison of enhancer landscapes and transcriptomes between dunnarts and mice can serve as the basis of subsequent studies that will examine the mechanisms of developmental timing shifts. The authors also carried out liftover analyses to identify orthologous enhancers and promoters in mice and dunnart.

      Weaknesses and Recommendations

      (1) The absence of genome browser tracks for ChIP-seq data makes it difficult to assess the quality of the datasets, including peak resolution and signal-to-noise ratios. Including browser tracks would significantly strengthen the paper by providing further support for adequate data quality.

      We have put together an IGV session with the dunnart genome, annotation and ChIP-seq tracks. This is now available in the FigShare data repository (10.7554/eLife.103592.1).

      (2) The first two figures of the paper rely heavily on gene orthology analysis, motif enrichment, etc., to describe the genomic data generated from the dunnart. The main point of these figures is to demonstrate that the authors are capturing the epigenetic signature of the craniofacial region, but this is not clearly supported in the results. The manuscript should directly state what these analyses aim to accomplish and provide statistical tests that strengthen confidence in the quality of the datasets.

      As this is the first epigenomic profiling for this species we performed extensive data quality control (See Supplementary Tables 2-3, 18, 20-23 and Supplementary Figures 1-3, 6-11). These figures and corresponding Supplementary Tables show the robustness of the data, including well-described metrics for assessing promoters and enhancers, GO terms relevant to craniofacial development and binding motifs for key developmental TF families.

      We have emphasised this aspect of the work more strongly in the results section, particularly in [Defining craniofacial putative enhancer- and promoter regions in the dunnart].

      (3) The observation that "promoters are located on average 106 kb from the nearest TSS" raises significant concerns about the quality of the ChIP-seq data and/or genome annotation. The results and supplemental information suggest a combination of factors, including unannotated transcripts and enhancer-associated H3K4me3 peaks, but this issue is not fully resolved in the manuscript. The authors should confirm that this is not caused by spurious peaks in the ChIP-seq analysis, and possibly improve the genome annotation with the transcriptomic datasets presented in the study.

      Spurious ChIP-seq peaks could be possible as there is no “blacklisted regions” database for the dunnart to filter on, however we used a no-IP control, a stringent FDR of 0.01 and peaks had to be reproducible in two biological replicates when calling peaks - all of which should reduce the likelihood of false positives.

      H3K4me3 activity at enhancers is well-established, in particular when enhancer sequences are also bound by RNA Pol II (Koch and Andrau, 2011; Pekowska et al., 2011). However, compared to H3K4me3 activity at promoters, H3K4me3 levels at enhancers are low (Calo and Wysocka, 2013). This is in line with our observations that H3K4me3 levels at enhancers are much lower than those observed at promoter regions (see Supplementary Note 2). We found that H3K4me3 peaks located closer to the TSS had a stronger peak signal (mean = 46.10) than distal H3K4me3 peaks (mean = 6.95; Wilcoxon FDR-adjusted p < 2.2 × 10⁻¹⁶). This suggests that although some distal promoter peaks may be due to missingness in the annotation, the majority likely represent peaks associated with enhancer regions. We have emphasized this finding more strongly in the results section:

      [New text]: H3K4me3 activity at enhancers is well-established[25,26], however, compared to H3K4me3 activity at promoters, H3K4me3 levels at enhancers are low[27]. This is in line with our observations where H3K4me3 levels at distal enhancer peaks are nearly 7 times lower than those observed at promoter regions (see SupNote2).
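      As a worked check of the "nearly 7 times" figure, using the mean peak signals reported above for TSS-proximal and distal H3K4me3 peaks:

```latex
\frac{\bar{s}_{\text{proximal}}}{\bar{s}_{\text{distal}}} = \frac{46.10}{6.95} \approx 6.6
```

      That is, roughly a 6.6-fold difference in mean signal.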

      (4) The comparison of gene regulation between a single dunnart stage (P1) and multiple mouse stages lacks proper benchmarking. Morphological and gene expression comparisons should be integrated to identify equivalent developmental stages. This "alignment" is essential for interpreting observed differences as true heterochrony rather than intrinsic regulatory differences.

      Given the developmental differences between eutherian and marsupial mammals, it is challenging to assign the dunnart a precise “equivalent” developmental stage in the mouse. From our morphological and developmental characterisation (see Cook et al. 2020 Nat Comms Bio) based on ossification patterns, the dunnart orofacial region on the day of birth appears similar to that of an E12.5 mouse embryo (just prior to the observation of ossified craniofacial bones). However, when we compared both regulatory elements and expressed genes between the dunnart at this stage (P1) and 5 developmental stages in the mouse, there was no obvious equivalent stage. For example, when we simply compare genes linked to enhancer peaks, the group with the largest intersection between dunnart and any mouse stage is a set of ~500 genes present in dunnart and in mouse stages E10.5 and E12.5-E15.5 (Figure 5B). When we then compare genes expressed in the dunnart to temporal gene expression dynamics during mouse development, we find that the largest overlap is with genes highly expressed at E14.5 or E15.5 in the mouse (Figure 6, Supplementary Figure 5). We have strengthened the rationale for the selected mouse stages in the comparative analyses section of the results.

      (5) The low conservation of putative enhancers between mouse and dunnart (0.74-6.77%) is surprising given previous reports of higher tissue-specific enhancer conservation across mammals. The authors should address whether this low conservation reflects genuine biological divergence or methodological artifacts (e.g., peak-calling parameters or genome quality). Comparisons with published studies could contextualize these findings.

      The reported range (0.74 - 6.77%) refers to the number of regions called as an active enhancer peak in both species (conserved activity) divided by the total number of dunnart peaks alignable to the mouse genome, which we expect to be low given sequence turnover rates and the evolutionary distance separating dunnart and mice. The alignability (conserved sequence) of dunnart enhancers to the mouse genome was ~13% for 100bp regions and can be found in Supplementary Table 22; we have now clarified this in the main text.

      [New Text]: After building dunnart-mm10 liftover chains (see Methods and SupNote5) we compared mouse and dunnart regulatory elements. The alignability (conserved sequence) for dunnart enhancers to the mouse genome was ~13% for 100bp regions (Supplementary Table 22).

      The activity conservation range reported here is consistent with that previously reported for marsupial-placental enhancer comparisons (Villar et al. 2015), where ~1% of conserved liver-specific human enhancers had conserved activity in opossum. Follow-up studies in Berthelot et al. 2018 also found that approximately 1% of human liver enhancers were conserved across the placental mammals included in the study.
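      To make explicit the two quantities discussed in this response (a sketch of the definitions as described above, not the authors' exact pipeline), alignability and activity conservation can be written as:

```latex
\text{alignability} = \frac{N_{\text{dunnart peaks alignable to mm10}}}{N_{\text{dunnart peaks}}},
\qquad
\text{activity conservation} = \frac{N_{\text{peaks active in both species}}}{N_{\text{dunnart peaks alignable to mm10}}}
```

      Their product, N_active in both species / N_dunnart peaks, is the fraction of all dunnart peaks with conserved activity, so the 0.74-6.77% range is conditional on a peak first being alignable.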

      (6) Focusing only on genes associated with shared enhancers excludes potentially relevant genes without clear regulatory conservation. A broader analysis incorporating all orthologous genes may reveal additional insights into craniofacial heterochrony.

      We appreciate the reviewer's comment. We understand that a broader analysis may provide some additional insights into this question; however, in this study our focus was on understanding the enhancers driving craniofacial development in these species. We linked enhancers with gene expression data as additional evidence of regulatory programs involved in craniofacial development. The majority (~70%) of reproducibly expressed genes were linked to an active enhancer and/or promoter. This has now been highlighted in the results section.

      [New Text]: There were 12,153 genes reproducibly expressed at a level > 1 TPM across three biological replicates, with the majority of these genes (67%; 8158/12153) associated with an active enhancer and/or promoter peak.

      In conclusion, this study provides an important dataset for understanding marsupial craniofacial development and highlights the potential of genomic approaches in non-traditional model organisms. However, methodological limitations, including incomplete genome annotation and lack of developmental benchmarking, weaken the robustness of the findings. Addressing these issues would significantly enhance the study's utility to the field and its ability to support the study's central conclusion that dunnart-specific enhancers drive accelerated craniofacial development.

      Reviewer #1 (Recommendations for the authors):

      Minor comments and corrections:

      (1) ChIP-seq FRiP fractions were much higher in dunnart samples than in mouse. Is this related to any differences in sample preparation the authors are aware of in the mouse ENCODE datasets, such as different anti-histone antibodies used (and therefore different efficiency of binding to the same histone marks across species)? The authors appear to have addressed something similar with respect to the much lower enriched peak number observed in the mouse sample relative to dunnart in Supp note 4. I suspect the "technical confounder" they refer to there is affecting both the FRiP scores and the higher correlation coefficients between IP and input in mouse.

      We chose the same antibodies used in the mouse craniofacial tissue ENCODE experiments; however, the procedure is slightly different. We used the MAGnify Chromatin Immunoprecipitation System, while the ENCODE assays performed by Bing Ren’s group in 2012 used an in-house MicroChIP protocol. Given that the mouse and dunnart samples were not processed together, by the same researcher, or with the same protocol, any number of technical confounders could be impacting enrichment. A low FRiP score suggests low specificity, as the majority of reads are in non-specific regions (low enrichment), consistent with the higher correlation between IP and input in mouse. The data quality also appears to vary between H3K27ac and H3K4me3 in the mouse (Supplementary Table 21), with H3K4me3 FRiP scores more similar to those observed in our dunnart experiments. This suggests a potential confounder specific to the mouse H3K27ac IP. QC metrics (FRiP, bam correlation) are consistent between H3K27ac and H3K4me3 IPs in our experiments (Supplementary Table 20).
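      For readers less familiar with the metric, FRiP (fraction of reads in peaks) is the proportion of mapped reads that fall inside called peaks, so a low value means most reads sit in background. The following is a minimal Python sketch, assuming BED-style 0-based half-open coordinates and counting a read as "in peaks" if it overlaps any peak by at least 1 bp; the function names are hypothetical, and this is not the pipeline used by the authors or by ENCODE.

```python
from bisect import bisect_left
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping or touching half-open [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def frip(reads, peaks):
    """Fraction of reads overlapping at least one peak.

    reads, peaks: iterables of (chrom, start, end), 0-based half-open.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        by_chrom[chrom].append((start, end))
    # Per chromosome: sorted, non-overlapping peaks, so both starts and ends are sorted
    index = {}
    for chrom, intervals in by_chrom.items():
        merged = merge_intervals(intervals)
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])

    total = in_peaks = 0
    for chrom, start, end in reads:
        total += 1
        if chrom not in index:
            continue
        starts, ends = index[chrom]
        # Rightmost merged peak that starts before the read ends
        i = bisect_left(starts, end) - 1
        # It has the largest end among candidates, so checking it alone suffices
        if i >= 0 and ends[i] > start:
            in_peaks += 1
    return in_peaks / total if total else 0.0
```

      Merging peaks first keeps each lookup to a single binary search per read, which matters when a library contains tens of millions of reads.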

      (2) Some of the promoter peak numbers in Supp table 1 do not match the numbers in the main text.

      We have corrected the incorrect number reported in the text for promoter peaks with orthologous genes (8590 -> 8597).

      (3) In Supp tables 2 and 3, the number of GO terms similar across tables is 466, which is ~42% of total number of enriched GO terms. However the authors mention that only 23% of terms were the same between promoters and enhancers, and a value of 42% was applied to the proportion of terms uniquely enriched for terms associated with genes assigned to promoters only. Unless I'm reading these Supp tables incorrectly, is it possible the proportions were mixed up?

      Thanks for catching this. The lists provided in Supplementary Table 2 were incorrect. The Supplementary Tables and in text description has been corrected to reflect this.

      (4) Would be helpful to add a legend for the mouse samples in Supp Figure 10.

      We have added the labels to the plot.

      (5) In Supp note 5, regarding the percentage of alignable peaks recovered, the percentages mentioned for the 50bp and 500bp peak summit lengths for enhancers and promoters do not seem to match the values in Supp tables 22 and 23.

      Thank you for catching this - we have corrected the Supplementary Tables and in text.

      (6) Please provide additional information to explain how dunnart RNA expression was associated with the five temporal expression clusters found in the mouse data shown in Figure 6, given that there is only one dunnart time point and so the species' temporal patterns could not be compared, i.e. how was the odds ratio calculated, and was this applied iteratively for dunnart against each mouse age and within each temporal cluster?

      The TCseq package takes the mouse expression data across all 6 stages and calls differentially expressed genes with an absolute log2 fold-change > 2 relative to the starting time point (E10.5). The mouse gene expression patterns were clustered into 5 clusters, each showing a distinct temporal expression pattern (see Supplementary Figure 5D). The output is 5 lists, each containing unique genes that share a temporal pattern. Each of these lists of mouse genes was then compared to the orthologous genes expressed in the dunnart using Fisher’s exact test, with correction for multiple testing using the Holm method. We have added additional details in the methods:

      [New text]: Orthologous genes reproducibly expressed >1 TPM in the dunnart were compared to the list of genes for each cluster using Fisher’s Exact Test followed by p-value corrections for multiple testing with the Holm method.
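      The per-cluster comparison described above can be sketched as follows. This is a minimal, standard-library Python illustration, not the authors' code: each mouse temporal cluster becomes a 2×2 table of (in cluster / not in cluster) × (ortholog expressed in dunnart / not), which is tested with a two-sided Fisher's exact test (assumed here, matching the default of R's fisher.test; the sidedness is not stated above), and the per-cluster p-values are Holm-adjusted. The gene universe passed to the hypothetical cluster_table helper (for example, all mouse genes with a dunnart ortholog) is also an assumption, and the odds ratio returned is the sample odds ratio ad/bc, which differs slightly from the conditional maximum-likelihood estimate that R reports.

```python
from math import comb


def cluster_table(cluster_genes, dunnart_expressed, universe):
    """Build the 2x2 table (a, b, c, d) for one mouse temporal cluster.

    a: cluster genes whose dunnart ortholog is expressed
    b: cluster genes whose ortholog is not expressed
    c: non-cluster genes whose ortholog is expressed
    d: all remaining genes in the (assumed) universe
    """
    cluster = set(cluster_genes) & universe
    expressed = set(dunnart_expressed) & universe
    a = len(cluster & expressed)
    b = len(cluster - expressed)
    c = len(expressed - cluster)
    d = len(universe) - a - b - c
    return a, b, c, d


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test on [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    Returns (sample odds ratio, p-value).
    """
    total = a + b + c + d
    row1, col1 = a + b, a + c
    denom = comb(total, row1)

    def pmf(k):
        # P(cell a == k) given fixed row and column margins
        return comb(col1, k) * comb(total - col1, row1 - k) / denom

    p_obs = pmf(a)
    lo, hi = max(0, row1 + col1 - total), min(row1, col1)
    # Small relative tolerance so ties are not lost to floating-point error
    p_value = sum(p for p in map(pmf, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, min(1.0, p_value)


def holm_adjust(pvalues):
    """Holm step-down adjustment; returns adjusted p-values in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):  # rank 0 is the smallest p-value
        running_max = max(running_max, min(1.0, (m - rank) * pvalues[idx]))
        adjusted[idx] = running_max
    return adjusted
```

      In use, one would build one table per mouse cluster, test each with fisher_exact_two_sided, and pass all five p-values to holm_adjust together so that the correction accounts for every cluster tested.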

      (7) SupFile1 and SupFile2 - which supplementary note or figure are these referring to?

      Apologies for this error. These items were meant to link to the FigShare repository where the supplementary files can be found. We have corrected this using the DOI for the repository.

      Reviewer #2 (Recommendations for the authors):

      (1) Authors should clarify that the mouse ENCODE data used for the comparisons was obtained from craniofacial tissue.

      This has now been corrected to clarify that the mouse ENCODE data used was from craniofacial tissues. ENCODE mouse embryonic facial prominence ChIP-seq and gene expression quantification file accession numbers and details used in study can be found in Supplementary Table 17.

      (2) Given the large differences in TPM for highly expressed genes shown in Figure 5, an MA or volcano plot would provide a more comprehensive view of global transcriptome differences between species.

      We have added this plot as Supplementary Figure 13.

      (3) It is unclear whether the enrichment analysis was performed for mouse genes, dunnart genes, or both.

      In reference to Figure 5, Gene Ontology enrichment analysis was performed on the top 500 highly expressed genes in dunnart. Because there is not an ontology database for dunnart gene IDs, these top 500 dunnart gene IDs were converted to the orthologous gene ID in mouse before performing the enrichment analysis. We apologise for the lack of clarity and have added additional text in the results section to make this clearer. In addition, the relevant methods section now reads:

      [New text]: As there is no equivalent gene ontology database for dunnart, we converted the Tasmanian devil RefSeq IDs to Ensembl v103 using biomaRt v2.46.3 and then converted these to mouse Ensembl v103 IDs. In this way we were able to use the mouse Ensembl Gene Ontology annotations for the dunnart gene domains. All gene ontology analyses were performed using clusterProfiler v4.1.4[117], with Gene Ontology from the org.Mm.eg.db v3.12.0 database[118], setting an FDR-corrected p-value threshold of 0.01 for statistical significance.

    1. eLife Assessment

      This study presents a valuable comparison of the efficiency and precision of two prime editing methods to introduce single-nucleotide variants and longer exogenous DNA sequences into the zebrafish genome. Solid data support the conclusion that the PE2 prime editor Nickase is more effective at introducing single-nucleotide variants, while the PEn prime editor nuclease is more effective at integrating short sequences from 3 up to 30 base pairs, for both somatic and germline editing. The results will be of interest to the zebrafish community, in particular to model human disease variants in this model organism.

    2. Reviewer #1 (Public review):

      Ono et al. compared the activity of prime editor Nickase PE2 and prime editor nuclease PEn in introducing SNPs and short exogenous DNA sequences into the zebrafish genome to model human disease variants. They find the nickase PE2 prime editor had a higher rate of precise integration for introducing single-nucleotide substitutions, whereas the nuclease PEn prime editor showed improved precision of integration of short DNA sequences. In somatic tissue, the percentage of SNP variant precision edits improved when using PE2 RNP injection instead of mRNA injection, but increased precision editing correlated with elevated indel formation. While PEn overall had higher rates of precision edits, the indel rate was also elevated. Similar rates were observed when introducing a 3 bp stop codon into the ror2 gene using a standard pegRNA with a 13-nucleotide homology arm, or a springRNA lacking the homology arm that drives integration via NHEJ. Inclusion of an abasic sequence in the springRNA prevented imprecise edits caused by scaffold incorporation, but did not improve the overall percentage of precise edits in somatic tissue. Recovery of a germline ror2-TGA integration allele using PEn with RNP was robust, resulting in 5 out of 10 founders transmitting a precise allele. Lastly, the authors demonstrate that PEn was effective at the integration of a 30 bp nuclear localization signal into the 5' end of GFP in an existing muscle-specific reporter line. However, the undefined number of cassettes in this multicopy transgene complicates accurate measurements of editing frequency. Integration of the NLS or other longer sequences at an endogenous locus would demonstrate the broad utility of this approach.

      From the work presented, it is unclear how prime editing could be used to transiently model human pathogenic variants, given the low frequency of precision edits in somatic tissue, or to isolate stable germline alleles of variants that are potentially dominant negative or gain-of-function in nature. Without a direct comparison with CRISPR/Cas9 nuclease HDR-based methods that use oligonucleotide templates to introduce edits, the advantage of prime editing is unclear. A cost comparison between prime editing and HDR methods would also be of interest, particularly for integration of longer DNA sequences.

      The conclusions of the paper are mostly well supported, but some changes to the text and additional analyses would strengthen the conclusion about whether PE2 or PEn is preferred for introducing variants and short or long DNA sequences.

      (1) In Figure 3, the data indicate a significant increase in precise edits of the 3 bp TGA using PE2 RNP (11.5%) vs. PE2 mRNA (1.3%). At the adgrf3b locus, only PEn mRNA was tested for introducing the 3 bp and 12 bp insertions. The previous study testing PE2 for 3 and 12 bp insertions was mentioned, but the frequency was not listed, and the study wasn't cited (lines 204 - 207). A comparison of germline transmission rates using PE2 vs. PEn would support the conclusion that PEn allows precise integration of longer templates and recovery of germline integration alleles.

      (2) Figure 4 shows the results of introducing a TGA stop codon that is predicted to result in nonsense-mediated decay. Testing the ability to also isolate different substitution mutations in the germline would be useful information for identifying the most effective approach for generating human disease variant models.

      (3) A comparison with the prime editing variant knock-in frequencies reported in the recent publication by Vanhooydonck et al., 2025, Lab Animal should be included in the Discussion.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript provides a comparison of nickase-based (PE2) and nuclease-based (PEn) Prime Editors in zebrafish, evaluating their efficiencies for substitutions, short insertions (3-30 bp), and germline transmission.

      Strengths:

      The manuscript has demonstrated for the first time that nuclease-based PEn more efficiently inserts nucleotide sequences up to 30 bp (nuclear localization sequence) than PE2, providing an improvement for the application of gene editing in functional genetics research. Additionally, the demonstration of stable zebrafish lines with edited ror2 and smyhc1:gfp loci is well-supported by sequencing and phenotypic data, confirming functional consequences of edits.

      Weaknesses:

      The study lacks conceptual innovation, as the central methodology (RNP-based Prime Editor delivery in zebrafish) was previously established by Petri et al. (2022). The present study extends this by testing longer insertions (30 bp) with nuclease-based PEn, but this incremental advance does not substantially shift the field's understanding or capabilities. The manuscript does not sufficiently differentiate its contributions from these precedents.

      The comparative analysis between PE2 and PEn systems suffers from limited evidentiary support. The comparison relies on single loci for substitutions (crbn) and insertions (ror2), raising concerns about generalizability. Additional validation across multiple loci is necessary to support broad conclusions about PE2/PEn performance.

    4. Reviewer #3 (Public review):

      The manuscript by Ono et al describes the application of prime editors to introduce precise genetic changes in the zebrafish model system. Probably the most important observation is that, compared to the "standard" PE2, the prime editor with full nuclease activity appears to be more efficient at introducing insertions into the genome. Although many laboratories around the world have successfully used oligonucleotide-mediated HDR to insert short exogenous sequences such as epitope tags or loxP sites into the zebrafish genome, the method suffers from a high frequency of indels at the edit site. Thus, additional tools are badly needed, making this manuscript very important. The length of the longest reported insertion (+30) is quite close to the range of V5 (14 amino acids) and ALFA (12 amino acids without "spacer" prolines) epitope tags, as well as the loxP site (34 nucleotides). Conclusions drawn in the paper are supported by compelling evidence. I only have a few minor comments:

      (1) The logic for introducing two nucleotide changes (at +3 and +10) to change a single amino acid (I378) should be explained explicitly in the main body of the manuscript. It is self-explanatory from Supplementary Figure 1; one option would be to include Supplementary Figure 1a in Figure 1.

      (2) It is not clear why a 3-nucleotide insertion was used to generate W722X. The human W720X is a single-nucleotide polymorphism, and it should be possible to make a corresponding zebrafish mutant by introducing two nucleotide changes.

      (3) Lines 137-138: The T7 endonuclease assay used in Figure 2d detects all polymorphisms, both precise changes and indels. Thus, if this assay were performed on the embryos shown in Figure 1c-d, the overall percentage of modified alleles (precise prime edits plus indels) would likewise be higher for PEn than for PE2. I therefore believe the conclusion in the last sentence of that paragraph is incorrect.

      (4) Use of terminology. "Germline transmission" is typically used to refer to the fraction of F0s transmitting desired changes (or transgenes) to their progeny, while "germline mosaicism" refers to the fraction of F1s with the desired change in the progeny of a given F0. "Germline transmission" in line 217 should be replaced with "germline mosaicism".

      (5) Lines 253-255: The fraction of injected embryos that had mosaic nuclear expression of GFP, indicative of NLS insertion, should be clarified. It should also be clarified whether embryos positive for nuclear GFP were preselected for amplicon sequencing and germline transmission analyses. This is extremely important for extrapolation to scenarios like epitope tagging, where preselection is not possible.

      (6) Statistical analyses. It would be helpful to clarify why different statistical tests are sometimes used to assess seemingly very similar datasets (Figures 1c, 1d, 2b, 2c, 2f).

      (7) Discussion. Since the authors suggest that PEn might be especially beneficial for inserting additional sequences, it is important to stress the locus-to-locus variability of success. While the precise +3 insertion was indeed tremendously efficient at both tested loci (ror2 and adgrf3b), the +12 insertion into adgrf3b was over 10 times less efficient (lines 193-194). In contrast, the +30 insertion into smyhc:GFP using the shorter pegRNA was again highly efficient, with an average of 8.5% of sequence reads indicating precise integration (line 257, Figure 5c). The longer pegRNA did not work nearly as well (Figure 5c), but was still much better than the +12 insertion into adgrf3b. As dangerous as it is to extrapolate from small datasets, perhaps these observations indicate that the RT template and PBS may need to be optimized for each new locus in order to significantly outperform oligonucleotide-mediated HDR? If so, would the cost of ordering several pegRNAs and the effort needed to compare them factor into the decision of which method to use? The reported germline transmission rates for both ror2 W722X (+3, Figure 4a) and smyhc:NLS-GFP (+30, Figure 5f) are tantalizingly high.

    1. eLife Assessment

      This important study demonstrates that disruption of a common protein-folding system renders drug-resistant clinical bacteria susceptible to antibiotics. The work convincingly shows that targeting protein folding can be used to combat multidrug-resistant pathogens, both by potentiating the efficacy of existing drugs and by therapeutic use of small-molecule inhibitors. This study is significant and timely, as it points to a new strategy relevant to microbiologists and clinicians interested in combating antimicrobial resistance.